This vulnerability was not caused by OoO or speculative execution. It was caused...

gumby · on Nov 14, 2023

> It was caused by the fact that x86 was designed 45 years ago, and has had feature after feature piled on the same base, which has never been adequately rebuilt.

Itanic would like to object! Unfortunately it can’t get through the door.

epcoa · on Nov 14, 2023

Not entirely pointless: redundant prefixes are occasionally a useful method for alignment.

TheCoreh · on Nov 14, 2023

A more sensible approach for that use-case would be IMO to have well-defined specialized prefixes for padding, instead of relying on the case-by-case behavior of redundant prefixes. (However I understand that there's almost certainly a good historical reason why this was not the way it was done)

kccqzy · on Nov 14, 2023

The easiest way of doing padding is to add a bunch of `nop` instructions which are one byte each.

If you read the manual, Intel encourages minor variations of the `nop` instruction that can be lengthened to different numbers of bytes (like `nop dword ptr [eax]` or `nop dword ptr [eax + eax*1 + 00000000h]`).
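
As a rough illustration of those longer forms, here is a minimal Python sketch that fills a gap with the multi-byte NOP encodings listed in the Intel SDM's `NOP` entry. The names `NOPS` and `nop_padding` are made up for this example, and the greedy longest-first choice is just one reasonable policy, not something the manual prescribes.

```python
# Multi-byte NOP encodings from the Intel SDM's NOP instruction entry.
# Index i holds the (i + 1)-byte form.
NOPS = [
    bytes.fromhex("90"),                  # nop
    bytes.fromhex("6690"),                # 66 nop
    bytes.fromhex("0f1f00"),              # nop dword ptr [eax]
    bytes.fromhex("0f1f4000"),            # nop dword ptr [eax + 00h]
    bytes.fromhex("0f1f440000"),          # nop dword ptr [eax + eax*1 + 00h]
    bytes.fromhex("660f1f440000"),        # 66 nop dword ptr [eax + eax*1 + 00h]
    bytes.fromhex("0f1f8000000000"),      # nop dword ptr [eax + 00000000h]
    bytes.fromhex("0f1f840000000000"),    # nop dword ptr [eax + eax*1 + 00000000h]
    bytes.fromhex("660f1f840000000000"),  # 66 nop dword ptr [eax + eax*1 + 00000000h]
]


def nop_padding(n: int) -> bytes:
    """Return exactly n bytes of padding, picking the longest NOP that fits each time."""
    if n < 0:
        raise ValueError("padding length must be non-negative")
    out = bytearray()
    while n > 0:
        chunk = NOPS[min(n, len(NOPS)) - 1]
        out += chunk
        n -= len(chunk)
    return bytes(out)
```

For instance, `nop_padding(10)` emits one 9-byte NOP followed by a single `90`, so the decoder sees two instructions instead of ten.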

To my knowledge, it is never recommended anywhere to rely on redundant prefixes of random non-nop instructions.

epcoa · on Nov 14, 2023

NOPs are not generally free.

It's a pretty old and well known technique:

https://stackoverflow.com/questions/48046814/what-methods-ca...

Note that this technique is really only legitimate where the used prefix already has defined behavior with the given instruction ("Use of repeat prefixes and/or undefined opcodes with other Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable behavior."), and of course the REX prefix has special limitations. The key is redundant, not spurious. It is not a good idea to be doing rep add for example. But otherwise, there is no issue.
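
To make the "redundant, not spurious" distinction concrete, here is a hedged Python sketch that lengthens an instruction only by repeating a prefix it already starts with. The helper name `pad_with_redundant_prefix` and the `REPEATABLE` set are assumptions for illustration; the sketch trusts that the caller's encoding is already valid and deliberately refuses LOCK/REP prefixes and anything past the 15-byte x86 instruction limit.

```python
MAX_INSN_LEN = 15  # architectural limit on the length of one x86 instruction

# Prefixes this sketch is willing to repeat: operand-size, address-size,
# and segment overrides. LOCK (F0) and REP/REPNE (F3/F2) are left out on purpose.
REPEATABLE = {0x66, 0x67, 0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65}


def pad_with_redundant_prefix(insn: bytes, target_len: int) -> bytes:
    """Grow insn to target_len bytes by repeating the prefix it already begins with."""
    if not insn or insn[0] not in REPEATABLE:
        raise ValueError("instruction does not start with a repeatable prefix")
    if target_len < len(insn):
        raise ValueError("target length is shorter than the instruction")
    if target_len > MAX_INSN_LEN:
        raise ValueError("x86 instructions cannot exceed 15 bytes")
    extra = target_len - len(insn)
    # Copies go in front, so a REX byte (if any) still sits right before the opcode.
    return bytes([insn[0]]) * extra + insn
```

With `mov ax, bx` encoded as `66 89 D8`, calling `pad_with_redundant_prefix(bytes.fromhex("6689d8"), 6)` gives `66 66 66 66 89 D8`: the same operation, three bytes longer, with no extra instruction to decode.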

epcoa · on Nov 14, 2023

The prefixes are redundant so it's not really case-by-case behavior. You're just repeating the prefix you would be using anyway in that location.

Using specialized prefixes wastes encoding space for no real gain. You realize that on most common processors NOP itself is a pseudo-instruction? Even on the apparently meme-worthy (see sibling comment) RISC-V, it's `ADDI x0, x0, 0`.
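
A small sketch of the RISC-V case, assuming the standard I-type layout (imm[11:0], rs1, funct3, rd, opcode) from the base ISA: encoding `ADDI x0, x0, 0` yields the word usually shown as the canonical NOP. The function name `encode_addi` is made up for this example.

```python
def encode_addi(rd: int, rs1: int, imm: int) -> int:
    """Encode ADDI rd, rs1, imm as a 32-bit RISC-V I-type instruction word."""
    if not (0 <= rd < 32 and 0 <= rs1 < 32):
        raise ValueError("register numbers must be in 0..31")
    if not (-2048 <= imm <= 2047):
        raise ValueError("immediate must fit in 12 signed bits")
    opcode_op_imm = 0b0010011  # OP-IMM major opcode
    funct3_addi = 0b000        # selects ADDI within OP-IMM
    return (
        ((imm & 0xFFF) << 20)
        | (rs1 << 15)
        | (funct3_addi << 12)
        | (rd << 7)
        | opcode_op_imm
    )


nop = encode_addi(0, 0, 0)
print(f"{nop:#010x}")  # 0x00000013
```

No dedicated NOP opcode is spent: the all-zero operands of an ordinary ADDI fall out as `0x00000013`.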

tedunangst · on Nov 14, 2023

And then there are CPUs that retcon behavioral changes onto nops.

> Moving a register to itself is functionally a nop, but the processor overloads it to signal information about priority.

https://devblogs.microsoft.com/oldnewthing/20180809-00/?p=99...

_a_a_a_ · on Nov 14, 2023

> A program can voluntarily set itself to low priority if it is waiting for a spin lock

What does this even mean? How can a program do this when thread priority is an OS thing? It just seems weird.

epcoa · on Nov 15, 2023

Hardware threads as in SMT means thread priority is also a hardware thing.

tedunangst · on Nov 15, 2023

It's an SMT CPU that dynamically assigns decode, registers, etc. https://course.ece.cmu.edu/~ece740/f13/lib/exe/fetch.php?med...

shadowgovt · on Nov 15, 2023

Usually, the historical reason is that adding the logic to do something well-defined when unexpected prefixes are used is going to cost ten more transistors per chip, which adds cost to handle a corner case that almost nobody will hit anyway. Far better to let whatever the implementation does happen, as long as what happens doesn't break the system.

The issue here is their verification of possible internal CPU states didn't account for this one.

(There is, perhaps, an argument to be made that the x86 architecture has become so complex that the emulator between its embarrassingly stupid PDP-11-style single-thread codeflow and the embarrassingly parallel computation it does under the hood to give the user more performance than a really fast PDP-11 cannot be reliably tested to exhaustion, so perhaps something needs to give on the design or the cost of the chips).

bobim · on Nov 14, 2023

Are new ISAs solving this? Time to move to RISC-V?

dontlaugh · on Nov 14, 2023

RISC-V is not great at this either, with the compression extension being common and variable length.

ARM 64 gets this right, with fixed length 32 bit instructions.

snvzz · on Nov 15, 2023

>ARM 64 gets this right, with fixed length 32 bit instructions.

At the expense of code density, yet RISC-V is easy to decode, with implementations going up to 12-way decode (Veyron V2) despite variable length.

ARM64 hardly "gets it right".
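
For context on the "easy to decode" claim, here is a minimal Python sketch of the RISC-V length rule: the low bits of the first 16-bit parcel say whether an instruction is 16 or 32 bits long. The function names are hypothetical, encodings longer than 32 bits are not handled, and a real wide decoder would check parcels in parallel rather than walk the stream sequentially as this does.

```python
def insn_length(first_halfword: int) -> int:
    """Return the instruction length in bytes from its first 16-bit parcel."""
    if first_halfword & 0b11 != 0b11:
        return 2  # compressed (C extension) instruction
    if first_halfword & 0b11100 != 0b11100:
        return 4  # standard 32-bit instruction
    raise ValueError("encodings longer than 32 bits are not handled in this sketch")


def split_instructions(code: bytes) -> list[bytes]:
    """Split a little-endian RISC-V byte stream into individual instructions."""
    out = []
    pos = 0
    while pos < len(code):
        if pos + 2 > len(code):
            raise ValueError("truncated instruction stream")
        half = int.from_bytes(code[pos:pos + 2], "little")
        n = insn_length(half)
        if pos + n > len(code):
            raise ValueError("truncated instruction stream")
        out.append(code[pos:pos + n])
        pos += n
    return out


# c.nop (0x0001), nop (0x00000013), c.nop (0x0001)
stream = bytes.fromhex("0100" "13000000" "0100")
print([len(i) for i in split_instructions(stream)])  # [2, 4, 2]
```

Contrast this with x86, where finding the length means parsing prefixes, opcode, ModRM, SIB, and immediates before the next boundary is known.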

camel-cdr · on Nov 15, 2023

I wouldn't say ARM64 gets it wrong either, I think both are viable approaches.

snvzz · on Nov 15, 2023

Both approaches are viable, but RISC-V's approach is better, as it provides higher code density without imposing a significant increase in complexity in exchange.

Higher code density is valuable. E.g.:

- The decoders can see more by looking at a window of code of the same size, or we can have a narrower window.

- We can have less cache and save area and power. We can also clock the cache higher, enabled by it being smaller, lowering latency cycles.

- Smaller binaries or ROM images.

Soon to be available (2024) large, high performance implementations will demonstrate RISC-V advantages well.

epcoa · on Nov 14, 2023

N/A and No.

iforgotpassword · on Nov 14, 2023

Because they cost no/fewer cycles compared to NOPs?

tedunangst · on Nov 14, 2023

See http://repzret.org/p/repzret/