IMO the base+index encoding (in 16-bit x86) was a mistake; it should have simply allowed indirection through any single register (+ immediate offset), including SP.
• the address calculation would be simple enough to do hardwired even on the original 8086, instead of using slow microcode. The 8086 actually took a different number of clock cycles depending on the registers used for base and index, because there wasn't enough ROM space to duplicate the code for every combination without using jumps!
• you could use [SP+xxxx] to access local variables (frame pointer omission; see the sketch after this list)
• would be able to emulate the 8080's LDAX/STAX and XTHL in a single instruction
• with so few registers, you really want every one available for memory addressing. Especially AX, which is used implicitly by instructions like LODSW and MUL.
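To make the [SP+xxxx] point concrete, here's a rough sketch; the [sp+...] form is of course hypothetical and not valid on any real 8086:

    ; what you actually have to do today: burn BP as a frame pointer
    push bp
    mov  bp, sp
    mov  ax, [bp+4]    ; first stack argument (return address sits at [bp+2])
    pop  bp
    ret

    ; with the proposed [SP+disp] mode (hypothetical):
    mov  ax, [sp+2]    ; first stack argument, no frame pointer needed
    ret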
TBH I'm still flabbergasted that modern SoCs will "just put a CPU there" rather than a DMA, because we can actually afford it. 6502 cores are sprinkled around like fairy dust.
I remember someone who was acutely familiar with the design of my first computer (the Ohio Scientific Superboard II, an awesome 1970s 6502 machine! http://oldcomputers.net/osi-600.html ) saying that he spotted the same hardware being used, in its entirety, as a printer buffer a few years later. (A printer buffer was basically a device used to cache the output of a computer to a slow printer so the original machine didn't have to wait for the printer to finish. A glorified cable, in other words.)
I cut my computing teeth on a sibling of that machine, an Ohio Scientific C2-8P that my brother and I bought in 1979. I was 12, and reverse engineering the games that came on the cassette tapes with the machine is how I got started in software. The Arduino Uno (Atmel/Microchip ATmega328P) is a much more powerful computer lol.
The main reason we got networks was not PC-to-PC communication but sharing printers and hard disks. Communication came later. Do not underestimate the productivity gain of a printer buffer.
Well, I was pre-emptively explaining it, because I realised in mid-anecdote that printers don't need that kind of thing these days. Maybe you are right, and it is obvious to someone skilled in the art, as the patent applications say!
They do need them, it's just that the buffer is built in to the printer now because memory and integration are so cheap.
The giant Kyocera printer at work has a little screen where it shows you it not only loading up the print buffer, but purging the information from the disk afterward so there is no trace of PII remaining.
They have them, because, as you said, it's so cheap. But I think dannyobrien is correct; they probably don't need them, because modern OSes have print queues and the computer is no longer useless while printing.
Even if the printer is dumb and doesn’t interpret a page description language such as PostScript, I don’t think it’s feasible to get the timing right to drive modern printing hardware when the buffer is on some random other piece of hardware whose OS you don’t even control, even if the connection is by cable.
It’s nice to be able to run off a long document, pack up your laptop, and pick up the document on your way out the door to the (courtroom|client|boardroom|exam hall|etc.).
The 1982 Victor 9000 / Sirius 1, designed by Chuck Peddle of Commodore/MOS 6502 fame, had its floppy capacity extended from the IBM PC's standard 180KB all the way up to 600KB per side. A 1.2MB floppy in 1982, using the same magnetic medium! How? A software-controlled drive with a 6502 doing GCR, combined with Zoned Constant Linear Velocity and zone bit recording. There were still hard drives being manufactured without ZBR eight years later (Seagate ST-157A)!
There was also an IBM PC 8088 clone in ~1984 with "dedicated video DMA" implemented by an Intel MCS-48 microcontroller (8042/8048, the same chip as the keyboard controllers of the time); sadly I can't remember the name and Google fails me :( I do remember nobody used it for anything, and the only coding example I could find at the time was a single YT video where someone reverse engineered the manufacturer's demo and reused it for their own sprite engine.
Most FPGA projects need a processor anyway, usually adding a soft core, but a hard core is superior in area and performance -- though you cannot choose the type of core anymore, obviously. I'd argue this is a different scenario, because they didn't "sprinkle cores" just because they had the area to spare; the hard core actually reduced the area needed.
In college (more than a decade ago for me, but still in this millennium) our MOS 6502 hardware/assembly programming lab was called "Microcontroller Design". This was as opposed to the later lab class "Microcomputer Design", which used the more recent Motorola 68k. I suppose the chip that powered the original Macintoshes is slightly closer to today's idea of a microcomputer than the 6502 and its original "microcomputer revolution" Apple II/TI Home Computer/Commodore 64. One of the takeaways from those courses was that the "microcomputer" has moved on so much from what you can breadboard in a lab (the Motorola 68k was one of the last "microcomputer" chipsets that you could breadboard, and even that was still a hardware reliability challenge), but the MOS 6502 was cemented as a "forever" tool as a "microcontroller". You can still get ICs in breadboard form and build it next to classic TTL logic chips and test/prototype it at human-scale clock speeds (things you can see/read on an oscilloscope in real time). You can merge it into an SoC core and run it at modern processor clock speeds. There are so many wild uses for the MOS 6502 as a "microcontroller" on the periphery of computing. Between MOS' bankruptcy, the many licensees of the 6502, and the many people who cut their programming teeth in the "microcomputer revolution" of the Apple II et al, the 65xx family likely really is a vestigial "forever chip".
As a programmer, and I've heard this sentiment from others too, I also think the 6502 was the last "fun" assembly language to write directly without a higher level language, which adds to its legacy.
For a lot of things a 6502 or other 8bit cores are more than enough. Don't need to put a giant ARM or RISC-V into everything to drive a simple display.
For the core alone, sure, but don't you have to run them off static ram or something (that is, not slow DRAM) to get anywhere near those speeds in practice?
While it does exist, I think the answer is not really. Yes, sometimes you don't need more than 8 bits, but existing solutions from other vendors beat it with integrated RAM + flash + IO, etc.
I'm not talking about adding the 32-bit ModRM & SIB byte to the 8086, just the single-register subset with no special cases. Decoding would be exactly the same as for register operands. And there would still be microcode, just not for adding the base+index register - maybe for adding an offset if it is present.
Yes, the 8086 was simple, but all the parts needed to do this should be present: whether an instruction uses register or memory operands is entirely under the control of hardwired logic; the microcode is the same for both (the address calculation subroutine runs first, but is invoked automatically by the hardware).
I would say Intel chose to make addressing more complicated in order to add a feature that complicated register allocation for very little benefit. If you see something like [BX+SI] in disassembled code, in 99% of cases it's not an actual instruction, but data or misaligned code.
The base+index encoding is a significant code size reduction though, and as the element size is a restricted set of powers of two, the scaling is cheaper than a generic shift. So base+index uses less memory (less icache specifically), less instruction dispatch overhead, and is able to perform the needed math more efficiently. I assume in modern chips most of these costs aren't really an issue, but that wasn't the case back when the ISA was created.
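A rough illustration of the size difference (byte counts from memory, so treat them as approximate):

    ; base+index: one 3-byte instruction (opcode, ModRM, disp8)
    mov ax, [bx+si+8]

    ; without it: an extra register gets clobbered and the sequence is
    ; more than twice the size
    mov di, si
    add di, bx
    mov ax, [di+8]

    ; (and on the 386+, the scale in [ebx+esi*4] likewise replaces a
    ; separate shift instruction)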
Another big mistake was to make the paragraph size 4 bits. Should have been at least 8 -- that would have given the 8086 a 24-bit address space, which would have made things verrry interesting in the MSDOS / early Windows world.
(I know, they didn't have address pins on the 40-pin package. Still, an architecturally guaranteed 16MB of room to play in would have changed the face of the software industry in the 1980s).
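For anyone who hasn't done the segment arithmetic lately:

    physical = segment * 16  + offset
    max      = 0xFFFF  * 16  + 0xFFFF = 0x10FFEF    (a hair over 1 MB)

    physical = segment * 256 + offset               (8-bit paragraph)
    max      = 0xFFFF  * 256 + 0xFFFF = 0x100FEFF   (a hair over 16 MB)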
> I know, they didn't have address pins on the 40-pin package.
You are right that it doesn't have to be a limitation on the internal architecture: the 386SX internally had the 386's full 32-bit architecture, but only 24 address bits were carried out to the external pins. This, along with squeezing the 32-bit data lines through a 16-bit data bus by doubling up requests, made it able to be used on cheaper motherboards based on 286 designs.
I'm not sure that this would have made enough difference to be worth the cost of the extra silicon at the time of the 8086's design, though. While a flatter and roomier address space would have been convenient for developers, there would have been little or no benefit for end users. The 386SX was a success because it could run software intended for the original 386 (then renamed 386DX) on cheaper hardware (and had the same MMU allowing virtual memory, so it could run larger software than its 16MB physical RAM limit made possible, at a further performance cost). Also, the extra silicon requirements were not as expensive at the time the 386SX was being designed as when the 8086 was.
In fact there was a chip that did something similar for the 8086: the 8088. It had an 8-bit external data bus and broke up 16-bit requests to make them over it. Though unlike the 386SX, it maintained the same address bus as its relative.
> an architecturally guaranteed 16MB of room to play in would have changed the face of the software industry in the 1980s
I don't think it would have, because such amounts of RAM were prohibitively expensive into the late 80s. It would have changed the way software accessed the available RAM, made life easier for developers and improved compatibility by removing various kludges, but largely computers would have looked and behaved similarly, having largely similar specs.
VisiCorp was developing their VisiOn platform in the 1984-ish timeframe. They were bending heaven and earth to fit into 640K.
Telling their corporate customers, "Buy another 512K of RAM" would probably have saved their company and let them ship a number of world-class productivity products, and given Windows some interesting competition.
It's easy to second-guess this stuff; on the other hand, 512K of RAM was about $120 in 1985. We shipped the Atari ST with that much, when having that amount of memory on a Mac was pretty rare.
Given that the 8086 has SI and DI with a +imm8 mode, like the (IX+imm8) and (IY+imm8) instructions on the Z80, I do wonder if Z80 CP/M machines were the real target rather than the 8080.
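The shapes do line up almost one to one:

    ; Z80:   LD A, (IX+5)
    mov al, [si+5]      ; 8086: same idea, index register plus 8-bit displacement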
I did write an 8086 assembly based Z80 emulator (in the late 80's) that was surprisingly compact and had acceptable Z80 performance on a 12MHz AMD 80286. It passed CP/M system calls to Concurrent DOS.
I should see if I still have the code and run an emulator torture test to see how bad it was.
I did see some opportunities for additional code-golf. Also noticed that EX AF,AF' did not properly exchange the flags (so that will need to get fixed!).
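For the curious, one cheap way to do EX AF,AF' on the 8086 (just an illustrative sketch: it assumes the Z80's A lives in AL, the Z80 flags live in the 8086 FLAGS, and the shadow pair sits in a memory word called alt_af):

    lahf                  ; flags -> AH, so AX now holds A and F together
    xchg ax, [alt_af]     ; swap the live pair with the shadow A'/F'
    sahf                  ; the shadow flags become the live flags

    ; caveat: SAHF only restores SF/ZF/AF/PF/CF, so the Z80-specific flag
    ; bits (P/V and the undocumented ones) still need separate bookkeeping.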
When I first got into assembly language programming, I used MS-DOS DEBUG as my assembler, since it was included on basically every PC in those days.
I was extremely frustrated because my textbook's later chapters used MASM as the assembler, which at the time cost some ghastly amount of money (somewhere around $180 if I recall correctly, which was a truly staggering number of birthday and Christmas presents).
Anyway, my point is, in 2022 everyone takes for granted that an assembler for any system that matters can be downloaded for free from the Internet -- or if one doesn't exist yet, it will soon be written by some enthusiast and posted on GitLab.
But back in the 1980's and well into the 1990's, assemblers were expensive products that companies actually got away with charging a lot of money for. Presumably they made a significant amount of revenue from those sales.
So the use case that immediately springs to my mind is being able to prove a particular commercial binary was assembled by your assembler, in order to make a successful accusation of software piracy and extract a monetary settlement from its developer.
Back in the old Zortech days in the 1980s, a programmer sent us a macro assembler he had coded up. He wished us to include it with Zortech C++, and pay him a royalty.
My partner was impressed with it, and sent it along to me to evaluate.
My tests showed close compatibility. A bit too good. It didn't take me too long to figure out it was the Microsoft MASM program with the copyright notice patched out.
Dodged a bullet with that one.
The reason assemblers were expensive in those days is they were written in assembler, which is far more expensive than writing in C.
There were other reasons assemblers were expensive in those days, as you know:
① The number of customers was smaller than it is today, so to remain profitable the NRE cost had to be paid by a smaller number of sales.
② Legal proprietary PC software distribution was competing with user group meetings, Hamvention, and 1200-baud underground BBSes where people used names like "Warlord of Chaos", not pop-up PC repair shops, anonymous FTP sites, Eggdrop, Napster, BitTorrent, Debian, and GitHub. Faster and cheaper ways of sharing software both made it cheaper for enthusiasts to collaborate on writing free software and made illegal copies cheaper and more accessible. It's easier to stomach paying US$180 (say, US$500 today) for MASM when your alternative is to spend your weekend and US$60 making long-distance calls to BBSes that might or might not have a copy.
Yes, this is very much true. For the first three systems I owned I wrote my own assembler, just to save some money (which was incredibly tight in those days, as in: that was three months worth of food or so).
Time was worth relatively little in comparison and I figured I will at least know the instruction set inside-out.
Getting it to the point where it would assemble itself was pretty much the goal of the whole exercise. Hand assembly with Leventhal's book (which I still have :) ) to get it to work, usually one subroutine at a time. And then once it worked, things got much quicker; the bigger problem then was a proper editor.
Memory management was a tricky part and macro expansion only got done after the whole thing worked (that would have saved a lot of time but I couldn't see the wood for the trees at that point).
The nasty part was that instructions were variable length so you could get into these weird errors where your initial guess for a jump target was too small, which would then move everything else as well, which might lead to other jumps becoming too small. I don't have the code any more but I think I fixed that by making all the jumps long jumps initially. A bit wasteful but that way I could at least get it to work in one pass. Later on when I got better at assembly that sort of thing got fixed.
The end product was endless 'data' statements that were poked into memory at 48K and up, right over the basic interpreter, which we had figured out a way to page out and replace from software. This later led to an annotated version of the basic that came with that computer (a version of MS Basic) and subsequently a replacement interpreter with a lot of built in goodies (double speed tape interface, a whole raft of new statements and functions). Good times!
Shout out to my friend Henri G. who did at least half of that work, we had identical home computers (Dragon 32) and worked together a lot on this. He probably still has his in working order :)
> ... initial guess for a jump target was too small ...
Yeah, that one is annoying.
I am making a VM compiler right now, and I have the same problem. I would probably have to make some intermediate tree to solve it without making every jump a "far jump".
Sometimes you can get into a series of these where each subsequent increase in length results in a previous jump that worked before now being out of range. That can get quite tedious.
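For a concrete example of why it cascades, in 8086 terms (the same thing happens on any CPU with short branches): a conditional jump only gets a signed 8-bit displacement, so once its target drifts out of range the assembler has to rewrite it, and the rewrite itself grows the code:

    je  target          ; 2 bytes, only legal while target is within -128..+127

    ; ...has to become...

    jne skip            ; 2 bytes, inverted condition
    jmp target          ; 3 bytes, 16-bit displacement
    skip:

    ; that 3 bytes of growth can push some *other* short jump out of range,
    ; so you either iterate until nothing changes or start with the long
    ; form everywhere, as described above.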
When I was in high school, I hand-assembled my 6502 code. I’d write it out longhand on graph paper, then write down addresses in the leftmost column, opcodes in the middle & finally I’d type in all the hex from the Apple Monitor. It was slow, painstaking and error-prone, but it made for clear thinking about how the program worked before I sat at the computer.
There’s a story about someone watching Woz entering hex into the computer directly to write a program, and every now and then he’d pause for a little. They asked him why and he said he was calculating the offset for a forward branch, which would mean he’d have to think of all the code up until the branch destination and count the bytes. Then he could carry on and enter those bytes.
That's pretty amazing. I would do something similar but easier: invert the branch, do a jmp [word], and then continue as though nothing changed. Then, after you know the destination address, fill it into [word]. That way you don't end up running out of memory (your own, not the computer's).
It probably also really honed your ability to simulate a CPU in your head. That's invaluable when looking at code, to build up a mental model of what it really does (rather than what the comments say it does).
I did the same (after reverse-engineering the opcodes by filling memory with sequential bytes and listing the output) until I discovered that the Apple //e (and maybe others) included a built-in mini assembler: https://www.youtube.com/watch?v=yJp2TnKhnzY
The Mini Assembler was part of the Apple ][ ROM with Integer BASIC but not on the ][+ ROM with AppleSoft BASIC. I think I eventually got a floppy from someone at a math competition or something like that which had a loadable Integer BASIC image that could be loaded into the RAM of a 64K Apple ][+ or //e, but that was not long before I graduated high school and at a point where I was doing most of my programming on a VM/CMS system (and then later VAX/VMS).
The //e "Enhanced" model (which I had) had the mini assembler built-in. [1] The monitor even had a shortcut to start it -- "!".
(Had to Google to remember this bit of trivia again -- funnily enough turned up this [2] HN comment I replied to 9 years ago with nearly the exact same comment I made above.)
The company I worked for in the 1970s, Aph, developed a lot of embedded systems. Their secret weapon was that they developed macro assemblers for the various microprocessors in BLISS on a PDP-10. (I think Dan O'Dowd, yes, that Dan! developed them.) With a proper macro assembler, as good as the ones on mainframes, that ran on a mainframe, you could develop code much faster.
There was the a86/d86 suite for assembly and disassembly, which was shareware as long as it wasn't used for commercial purposes. These worked well enough.
They were shareware, but that's not the same as freeware. He did expect payment for the software if you actually used it beyond evaluation, commercially or not. As I recall he wanted $50 for a license.
As incentive, he'd send you enhanced versions of the software (with 386 opcode support iirc... or was it 32-bit support?), and a ringbound hardcopy of the manuals.
Before that it was a case of writing code on graph paper, looking up the opcodes in the manual, and assembling by hand. Needless to say that writing programs was slow and difficult.
I think I jumped straight from paper -> A86, without using debug.com. Not sure why.
The cool thing to do with it is add it to your favorite text editor as a command - highlight some of your text and disassemble it! Great party game - what is your name disassembled? Valid code, or are you a seg fault?
Yes, it’s difficult these days to imagine the software situation of the early 1980s: how little software there was, how expensive it was, and how much effort was spent on copy protection that sometimes prevented legitimate customers from using it.
I was just reading a book called “Almost Perfect”, a first-hand history of WordPerfect by one of the founders. He mentions that they bought an IBM PC when it came out and wanted to port their word processor (over from the Data General minis that had been their initial platform). But they had to wait five months for an assembler to become available.
Imagine buying a $5,000 computer and you can’t even get an assembler for it for any money.
(WordPerfect was written entirely in x86 assembly, even when it was a billion-dollar product.)
Writing an assembler for the 8086 doesn't take five months! Especially if you have a Data General mini to run it on, which already had an assembler and scripting with domacros.
In fact, if memory serves, RDOS had FORTRAN, ALGOL, and BASIC available, even if most of the system software was written in assembler.
Writing an assembler in BASIC on either the Nova or the IBM PC would have been considerably less hassle than writing it in assembly and hand-assembling it.
Presumably they didn't want to spend programmer time writing their own macro assembler when they knew others were working on a commercial product. The DG version of WordPerfect was shipping already, so they probably preferred to spend the time improving that and wait for the PC-DOS ecosystem to evolve a bit.
> $180 if I recall correctly, which was a truly staggering number of birthday and Christmas presents
I had the same problems, only I was programming a Laser 128 (an Apple II clone) that had a built-in System Monitor, which let you directly change memory addresses, but no assembler.
So I photocopied the Assembly to Machine Language chart in a book I got from the library, and did it manually - I learned quite a lot. I remember writing a layout and edge detection routine for a Tetris clone I tried to write (never finished it). Basic was too slow for those critical parts, so I called out to the routine, but did most of the code in Basic.
I remember being utterly wowed that Linux came with a free compiler - I could only dream of having software like that back then.
If you sell a commercial compiler or assembler, and some company starts selling a product produced with your compiler/assembler, and you know you never sold them a license, you've got yourself someone to sue. For a less successful/commercial product, it could also just be for the joy of finding it used in the wild.
Similar fingerprinting techniques were also used by assembly coders as proof of copyright. When reverse-engineering to a spec, it's not unusual for routines, even long routines, as well as lookup tables, etc. to end up being coded the same, or nearly so, as the original software. This isn't necessarily infringing of copyright -- for example there are only a few sensible ways to check if a character is uppercase in ASCII. It can be hard to tell reverse-engineered and re-implemented code, from disassembled and reshuffled code (which is copyright infringement). If the suspect routine incorporates something like an idiosyncratic sequence of NOPs that serve no particular purpose, coincidence is a lot harder to argue.
Can anyone find a version of A86 that actually encodes some information in its output, or is this just a legend? It produces different opcodes for some instructions than MASM, but so do DEBUG and various other assemblers. And the same instruction repeated multiple times always assembles to the same opcode as far as I have seen.
To prove some program was made with A86, there would have to be a more specific signature that wouldn't match any other assembler, like a copyright message encoded bit-by-bit in the opcode choice.
Tried it out now with different instructions. There's a pattern, though it's not quite a steganographic message, just a different encoding chosen depending on the second register operand.
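Concretely, most register-to-register operations have two legal encodings thanks to the direction bit in the opcode, so which one an assembler picks is a fingerprint in itself (byte values from memory, so double-check before relying on them):

    add bx, ax      ; 01 C3   "r/m <- reg" form
    add bx, ax      ; 03 D8   "reg <- r/m" form: same instruction, different bytes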
It was sold under a shareware model, and if you used it commercially you were supposed to buy the licence. I guess they randomly sampled other people's software to see if it looked like it was built with a86 or not, but honestly I'd never actually heard of them doing that and suspected that it was just enough of a threat to know it was possible that they hoped people would pay up.
I am looking into writing some basic compiler(s) but I really want to emphasize micro-optimizations, maybe even the stochastic ones -- don't ask why, it's just an obsession.
Question: is there Arduino-like hardware that allows FULL visibility into the efficiency of every piece of code run on it? On my first job, 20 years ago, there was a team working on embedded boards that maybe did something similar.
But again, I am looking for a piece of hardware -- and a software stack for it -- where you can say "here's this chunk of assembly / machine code, run it and give me detailed statistics on which instruction ran the most times, which one took the longest time", etc.?
Using 64-bit registers, or any of R8-R15, requires a prefix byte. The single-byte INC/DEC opcodes have been repurposed for this. And for performance reasons having to do with how it affects the flags, you don't want to use INC/DEC (now two bytes) at all.
So now the very common operation of incrementing a register takes at least three bytes, four if it needs the prefix. Legacy instructions have been made invalid - depending more on the whims of Intel/AMD than whether they are actually useful - but still clutter up the opcode space.
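In byte terms (encodings quoted from memory):

    inc eax            ; 40           1 byte in 32-bit code
                       ; in 64-bit code, 40-4F are REX prefixes, so instead:
    inc eax            ; FF C0        2 bytes
    add eax, 1         ; 83 C0 01     3 bytes, the flags-friendly form
    add rax, 1         ; 48 83 C0 01  4 bytes once the REX.W prefix is needed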
Obviously it has expanded, and not saying "let's completely replace the ISA in 64-bit modes" means that a bunch of cruft is going to keep consuming the opcode space. At the same time there is a lot of code that isn't using 64-bit values, so the compact encoding can still be used.
But while talking about ISA size and compactness, we need to compare to the real-world alternatives: the various ARM and RISC-V ISAs, etc., are all 4 bytes per instruction, with the occasional 2-byte variable-length extension*. Four bytes does seem to be what has been selected for.
My experience with code size is wrapped up in jitting JS so I could believe my experience isn't as reflective of real world code size as it could be :D
* fun story: Thumb-2 had an early bug/erratum where a 4-byte branch spanning a page boundary would jump to the wrong page.
I thought the trailing-off title was some sort of clickbait (which would be weird for Chen!); but actually the title is "Yes, the 8086 wanted to be mechanically translatable from the 8080, but why not add the ability to indirect through AX, CX and DX?", and I guess it just got asemantically chopped on HN's end. (To avoid my own clickbait, the one-sentence answer from the article, on which of course the rest of the article elaborates, is "Basically, because there was no room.")
There is a more-or-less comprehensive archive[1] up to 2019 (which should probably be scraped and hoarded). The article in question is there, comments included[2].
Side note: Michael Kaplan’s blog, which Microsoft took down in a remarkably shameful manner, has also been archived[3], while Eric Lippert reposted (most of?) his old articles on his personal WordPress instance[4].
It’s still there[1], although a number of the posts seem to be missing. Inexplicably, (most of) the comments are in place. (There are other Microsoft blogs that have disappeared without a trace, but not this one.)
I don't understand the reasoning behind this. Why do you need 5 bytes of unexecuted patch space before the program _and_ 2 bytes of patch space at the beginning of the program?
Wouldn't it be the same to have a single 5-byte effectless operation to patch a single long jump instead of needing space for two jumps?
Because you can’t atomically replace the NOPs. So there’s nothing to prevent you from inserting your patch while a thread is partway through consuming the NOPs, resulting in a portion of your patch being decoded out of order.
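The layout, as I understand it (a sketch, not literal compiler output):

    ; five bytes of padding emitted just before the function:
        nop
        nop
        nop
        nop
        nop
    func:
        mov edi, edi      ; 2-byte do-nothing at the entry point
        push ebp
        mov  ebp, esp
        ...

    ; To patch: write a 5-byte "jmp detour" into the padding (no thread ever
    ; executes those bytes, so that write doesn't need to be atomic), then
    ; atomically overwrite the 2-byte "mov edi, edi" with a 2-byte short jump
    ; back into the padding. No thread ever sees a half-written instruction.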
Modern x86 processors decode multiple instructions per clock. By “slots”, I’m assuming he means entries in the dispatcher or reservation stations. But NOPs don’t even make it to there. As I said, the decoder that encounters it will probably swallow it and emit nothing.
Besides, it sounds like premature optimization. This isn’t the 1980s; an extra clock cycle per function call is not going to make or break your program.
There is a very good chance this dates back to 16-bit Windows. Even Windows 98 supported the 486, which was not capable of independent execution (that’s the P5) or of separating decode from execution (that arrived with the Pentium Pro).
Bygone? It's a fairly regular occurrence where I search for some Win32 API and end up on Chen's blog. I get what you are saying about articles like this one though. I can't imagine much new work is happening with the 8086 (or 8088 or '186 or '286 or ...)
I mean even w.r.t. Windows stuff, quite a large part of what he talks about is variations on "here's an interesting old anecdote from 1995" or "You know that thing that's totally useless now and doesn't matter ? Here's why we absolutely needed it in 1989."