IMO the base+index encoding (in 16-bit x86) was a mistake, it should have simply allowed indirection through any single register (+ immediate offset), including SP.
• the address calculation would be simple enough to do hardwired even on the original 8086, instead of using slow microcode. The 8086 actually took a different number of clock cycles depending on the registers used for base and index, because there wasn't enough ROM space to duplicate the code for every combination without using jumps!
• you could use [SP+xxxx] to access local variables (frame pointer omission; see the sketch below)
• would be able to emulate the 8080's LDAX/STAX and XTHL in a single instruction
• with so few registers, you really want every one available for memory addressing. Especially AX which is used implicitly by instructions like LODSW and MUL.
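Roughly what the [SP+xxxx] point would look like in practice (NASM-style syntax; the [sp+...] form is the hypothetical encoding being argued for, not anything a real 8086 accepts):

  ; what you actually do on a real 8086: burn BP as a frame pointer
  push bp
  mov  bp, sp
  sub  sp, 4          ; reserve two word-sized locals
  mov  ax, [bp-2]     ; read a local through BP
  mov  sp, bp
  pop  bp
  ret

  ; with a hypothetical [SP+disp] mode, BP stays free for real work
  sub  sp, 4
  mov  ax, [sp+2]     ; not encodable on a real 8086; the wished-for form
  add  sp, 4
  ret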
TBH I'm still flabbergasted that modern SoCs will "just put a CPU there" rather than a DMA engine, because we can actually afford it. 6502 cores are sprinkled around like fairy dust.
I remember someone who was acutely familiar with the design of my first computer (the Ohio Scientific Superboard II, an awesome 1970s 6502 machine! http://oldcomputers.net/osi-600.html ) saying that he spotted the same hardware being used, in its entirety, as a printer buffer a few years later. (A printer buffer was basically a device used to cache the output of a computer to a slow printer so the original machine didn't have to wait for the printer to finish. A glorified cable, in other words.)
I cut my computing teeth on a sibling of that machine, an Ohio Scientific C2-8P that my brother and I bought in 1979. I was 12, and reverse engineering the games that came on the cassette tapes with the machine is how I got started in software. The Arduino Uno (Atmel/Microchip ATmega328P) is a much more powerful computer lol.
The main reason we got networks in the first place was not PC-to-PC communication but sharing printers and hard disks. Communication came later. Don't underestimate the productivity gain of a printer buffer.
Well, I was pre-emptively explaining it, because I realised in mid-anecdote that printers don't need that kind of thing these days. Maybe you are right, and it is obvious to someone skilled in the arts, as the patent applications say!
They do need them, it's just that the buffer is built in to the printer now because memory and integration are so cheap.
The giant Kyocera printer at work has a little screen where it shows you it not only loading up the print buffer, but purging the information from the disk afterward so there is no trace of PII remaining.
They have them, because, as you said, it's so cheap. But I think dannyobrien is correct; they probably don't need them, because modern OSes have print queues and the computer is no longer useless while printing.
Even if the printer is dumb and doesn’t interpret a page description language such as PostScript, I don’t think it’s feasible to get the timing right to drive modern printing hardware when the buffer is on some random other piece of hardware whose OS you don’t even control, even if the connection is by cable.
It’s nice to be able to run off a long document, pack up your laptop, and pick up the document on your way out the door to the (courtroom|client|boardroom|exam hall|___etc___).
The 1982 Victor 9000 / Sirius 1, designed by Chuck Peddle of Commodore/MOS 6502 fame, had its floppy capacity extended from the standard IBM PC 180KB all the way up to 600KB per side: a 1.2MB floppy in 1982, using the same magnetic medium! How? A software-controlled drive with a 6502 doing GCR encoding, combined with zoned constant linear velocity and zone bit recording. There were still hard drives being manufactured 8 years later that didn't use ZBR (the Seagate ST-157A)!
There was also an IBM PC 8088 clone in ~1984 with "dedicated Video DMA" implemented by an Intel MCS-48 microcontroller (8042/8048, the same chips used as keyboard controllers at the time); sadly I can't remember the name and Google fails me :( I do remember nobody used it for anything, and the only coding example I could find at the time was a single YT video where someone reverse engineered the manufacturer's demo and reused it for their own sprite engine.
Most FPGA projects need a processor anyway, usually by adding a soft core, but a hard core is superior in area and performance -- though you obviously can no longer choose the type of core. I'd argue this is a different scenario: they didn't "sprinkle cores" just because they had the area to spare; the hard core actually reduced the area needed.
In college (more than a decade ago for me, but still in this millennium) our MOS 6502 hardware/assembly programming lab was called "Microcontroller Design", as opposed to the later lab class "Microcomputer Design", which used the more recent Motorola 68k. I suppose the chip that powered the original Macintoshes is slightly closer to today's idea of a microcomputer than the 6502 and its original "microcomputer revolution" machines, the Apple II/TI Home Computer/Commodore 64. One of the takeaways from those courses was that the "microcomputer" has moved on so far from what you can breadboard in a lab (the Motorola 68k was one of the last "microcomputer" chipsets you could breadboard, and even that was a hardware reliability challenge), while the MOS 6502 was cemented as a "forever" tool as a "microcontroller". You can still get it in breadboard-friendly packages, build it next to classic TTL logic chips, and test/prototype it at human-scale clock speeds (things you can see/read on an oscilloscope in real time). You can merge it into an SoC as a core and run it at modern processor clock speeds. There are so many wild uses for the MOS 6502 as a "microcontroller" on the periphery of computing. Between MOS' bankruptcy, the many licensees of the 6502, and the many people who cut their programming teeth in the "microcomputer revolution" of the Apple II et al, the 65xx family likely really is a vestigial "forever chip".
As a programmer, and I've heard this sentiment from others too, I also think the 6502 was the last "fun" assembly language to write directly without a higher-level language, which adds to its legacy.
For a lot of things a 6502 or another 8-bit core is more than enough. You don't need to put a giant ARM or RISC-V into everything just to drive a simple display.
For the core alone, sure, but don't you have to run them off static ram or something (that is, not slow DRAM) to get anywhere near those speeds in practice?
While it does exist, I think the answer is: not really. Yes, sometimes you don't need more than 8 bits, but existing solutions from other vendors beat it with integrated RAM + flash + IO, etc.
I'm not talking about adding the 32-bit ModRM & SIB byte to the 8086, just the single-register subset with no special cases. Decoding would be exactly the same as for register operands. And there would still be microcode, just not for adding the base+index register - maybe for adding an offset if it is present.
Yes, the 8086 was simple, but all the parts needed to do this should be present: whether an instruction uses register or memory operands is entirely under the control of hardwired logic, and the microcode is the same for both (the address calculation subroutine runs first, but is invoked automatically by the hardware).
I would say Intel chose to make addressing more complicated in order to add a feature that complicated register allocation for very little usefulness. If you see something like [BX+SI] in disassembled code, in 99% of cases it's not an actual instruction, but data or misaligned code.
The base+index encoding is a significant code size reduction though, and as the element size is a restricted set of powers of two, the scaling is cheaper than a generic shift. So base+index uses less memory (less icache specifically), less instruction dispatch overhead, and is able to perform the needed math more efficiently. I assume in modern chips most of these costs aren't really an issue, but that wasn't the case back when the ISA was created.
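To put rough numbers on the code-size point, here's a base+index access next to the single-register-only alternative the parent proposes (standard 8086 encodings; byte counts are opcode + ModRM):

  add al, [bx+si]     ; 2 bytes, one instruction

  ; without base+index you have to materialize the address first:
  mov di, bx          ; 2 bytes
  add di, si          ; 2 bytes
  add al, [di]        ; 2 bytes  (6 bytes total, and a register clobbered)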
Another big mistake was to make the paragraph size 4 bits. Should have been at least 8 -- that would have given the 8086 a 24-bit address space, which would have made things verrry interesting in the MSDOS / early Windows world.
(I know, they didn't have enough address pins on the 40-pin package. Still, an architecturally guaranteed 16MB of room to play in would have changed the face of the software industry in the 1980s).
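For concreteness, the arithmetic (the 8-bit shift is the hypothetical variant, nothing Intel ever shipped):

  shipped 8086: physical = segment * 16  + offset -> 0xFFFF * 16  + 0xFFFF = 0x10FFEF  (just over 1MB)
  hypothetical: physical = segment * 256 + offset -> 0xFFFF * 256 + 0xFFFF = 0x100FEFF (just over 16MB)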
> I know, they didn't have enough address pins on the 40-pin package.
You are right: that doesn't have to be a limitation on the internal architecture. The 386SX internally had the 386's full 32-bit architecture, but only 24 address bits were carried out to the external pins. This, along with squeezing 32-bit data accesses through a 16-bit external data bus by doubling up requests, made it usable on cheaper motherboards based on 286 designs.
I'm not sure that this would have made enough difference to be worth the cost of the extra silicon at the time of the 8086's design, though. While a flatter and roomier address space would have been convenient for developers, there would be little or no benefit for end users. The 386SX was a success because it could run software intended for the original 386 (then renamed 386DX) on cheaper hardware (and had the same MMU allowing virtual memory, so it could run larger software than its 16MB physical RAM limit made possible, at a further performance cost). Also, the extra silicon was not as expensive at the time the 386SX was being designed as it was when the 8086 was.
In fact there was a chip that did something similar for the 8086: the 8088. It had an 8-bit external data bus and broke 16-bit requests up to make them over it, though unlike the 386SX it kept the same address bus as its relative.
> an architecturally guaranteed 16MB of room to play in would have changed the face of the software industry in the 1980s
I don't think it would have, because such amounts of RAM were prohibitively expensive into the late 80s. It would have changed the way software accessed the available RAM, made life easier for developers, and improved compatibility by removing various kludges, but largely computers would have looked and behaved similarly, having largely similar specs.
VisiCorp was developing their VisiOn platform in the 1984-ish timeframe. They were bending heaven and earth to fit into 640K.
Telling their corporate customers, "Buy another 512K of RAM" would probably have saved their company and let them ship a number of world-class productivity products, and given Windows some interesting competition.
It's easy to second-guess this stuff; on the other hand, 512K of RAM was about $120 in 1985. We shipped the Atari ST with that much, when having that amount of memory on a Mac was pretty rare.