Speculative execution is what makes our sequential code fast by exploiting parallel execution pipelines that would otherwise sit idle.
Avoiding this would increase complexity somewhere else, e.g. compilers would have to become even more clever or programmers would have to work harder to reduce the branches in their code. Sprinkling code with annotations to prevent speculative execution around sensitive data is, of course, just another way complexity goes up so that we can keep the performance gains of speculation.
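To make the "annotations around sensitive data" point concrete, here is a minimal sketch of one such annotation: clamping an array index with a branch-free mask, so that a mispredicted bounds check cannot speculatively read out of bounds. It is loosely modeled on the idea behind the Linux kernel's `array_index_nospec`; the names `index_mask` and `load_clamped` are made up for illustration, and the sketch assumes both `idx` and `len` are below `SIZE_MAX / 2`. A real implementation also has to stop the compiler from turning the mask back into a branch, which plain C like this cannot guarantee.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns all-ones when idx < len, zero otherwise,
 * using only arithmetic (no conditional branch in the source).
 * Assumption: idx and len are both below SIZE_MAX / 2. */
static size_t index_mask(size_t idx, size_t len)
{
    /* If idx >= len, (len - 1 - idx) wraps around and sets the top bit. */
    size_t top = (idx | (len - 1 - idx)) >> (sizeof(size_t) * CHAR_BIT - 1);
    return top - 1; /* top == 0 -> all ones, top == 1 -> zero */
}

/* Hypothetical bounds-checked load: the branch is the architectural check,
 * the mask is the extra "annotation" that covers the speculative window. */
static uint8_t load_clamped(const uint8_t *table, size_t len, size_t idx)
{
    assert(table != NULL);
    if (idx >= len)
        return 0;
    /* Even if the branch above was mispredicted, idx is forced to 0 here. */
    idx &= index_mask(idx, len);
    return table[idx];
}
```

Every sensitive lookup needs this treatment by hand, which is exactly the kind of complexity that moves from the CPU to the programmer.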
> Avoiding this would increase complexity somewhere else, e.g. compilers would have to become even more clever
Not really. Compilers are already stupendously complicated, since they inherit a ton of techniques developed for older processors that required, e.g., instruction scheduling and other tricks. CPUs getting bigger and faster just added more complexity in hardware, both to extract even more dynamic parallelism and to do the job that simpler compilers don't.
It's much easier to turn off the complexity in a compiler and reason about the resulting program. The complexity in the CPU cannot be switched off and is closed source.
They were definitely onto something. They made something competitive performance-wise that wasn't vulnerable to all of these speculative execution vulnerabilities.
Itanium's slowness is generally greatly exaggerated (at least in part because the first Itanium had a rather slow memory subsystem, and the performance kind of sucked as a result). Circa 2008 or so, the fastest database servers available were Itanium. Unfortunately, it emulated x86 extremely slowly and amd64 ran x86 very quickly, so AMD kinda ate Intel's lunch.
If I were in Intel management, I might explore with engineering the idea of resurrecting the Itanium (rebranded and modernized, of course). Today, with so much open source, there is less instruction set lock-in, and with all these vulnerabilities you might be able to market it as a more secure architecture. In that case you might only need to equal x64 performance.
Maybe just don't speculate across power domains? I.e. if a speculative execution would require bringing up the AVX2 unit that might be a bad idea anyway because it slows down everything else even if the branch is mispredicted.
Or at least execute it on the slow path (lower execution width) if the unit is not up, without triggering the ramp-up as long as the instruction is speculative. Only ramp up when the instruction is committed.
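As a toy model of that policy (not how any real core is implemented), the rule could be sketched like this. Everything here is hypothetical: `core_state`, `issue_wide_op`, `squash_wide_op` and `retire_wide_op` are invented names, and the ramp-up is treated as instantaneous, whereas real hardware takes time to power the unit up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one core's wide vector unit. */
typedef struct {
    bool wide_unit_up;        /* is the wide (e.g. AVX2) unit powered? */
    unsigned ramp_requests;   /* how many times a ramp-up was triggered */
    unsigned narrow_path_ops; /* ops executed on the slow, narrow path */
} core_state;

typedef enum { RUN_FULL_WIDTH, RUN_NARROW } exec_path;

/* Issue: an op that has not committed yet never powers the unit up.
 * If the unit is down, it runs on the narrow path instead. */
static exec_path issue_wide_op(core_state *c)
{
    assert(c != NULL);
    if (c->wide_unit_up)
        return RUN_FULL_WIDTH;
    c->narrow_path_ops++;
    return RUN_NARROW;
}

/* Squash: the op was on a mispredicted path, so the power state
 * must not change and nothing is left behind to observe. */
static void squash_wide_op(core_state *c)
{
    assert(c != NULL);
    (void)c;
}

/* Retire: only a committed op is allowed to trigger the ramp-up. */
static void retire_wide_op(core_state *c)
{
    assert(c != NULL);
    if (!c->wide_unit_up) {
        c->ramp_requests++;
        c->wide_unit_up = true; /* assumption: ramp-up modeled as instant */
    }
}
```

In this model a mispredicted wide op costs some narrow-path cycles, but it never changes the power state that other code (or an attacker) could measure.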