This absolutely vindicates Xen's approach vs Linux's. Performance gained by spec...

ajross · on Aug 26, 2022

> This absolutely vindicates Xen's approach vs Linux's

Meh. It likely just points out that Xen is a much more limited software environment with much less dependence on indirect branching[1]. There are environments where IBRS has high cost and ones where it doesn't. Linux is in the former category.

Xen also has the advantage of being a hypervisor, meaning that if all they do is expose IBRS to the guest, they can (somewhat cleverly) claim that any resulting vulnerabilities are the fault of the guest software not implementing them. Linux exposes a Unix userspace, and no one told userspace apps they need to use speculation barriers.

Really this article is mostly just marketing. It's a win for Xen, sure, and they should crow about it. But we should recognize crowing vs. genuine security analysis, too.

[1] Vs. say, Linux, which has an extremely robust polymorphic device/bus/probe model where all the methods are function pointers.

gizmo686 · on Aug 26, 2022

It is easy to be fast if you are willing to ship broken software. The Linux solution was broken. People knew the Linux solution was broken when it was shipped. The Linux developers knew what a non-broken solution would be because the CPU manufacturer told them. [0] Linux decided to go with the broken solution. This attitude is not specific to Linux; it is pervasive throughout the entire industry. It is the reason that few people take a security vulnerability report seriously until someone turns it into a full exploit.

Frankly, Xen's work here was not at all impressive; they just applied a fix that Intel told everyone to apply. The fact that this is a differentiating thing for them to market with is an indictment of everyone who didn't apply it, and the industry conditions that led to them.

[0] In fairness, the reason we are in this mess is that said CPU manufacturer has been releasing broken products in the name of speed themselves.

ajross · on Aug 26, 2022

> The Linux solution was broken. People knew the Linux solution was broken when it was shipped.

That is not a fair characterization. There are endless mountains of theoretical vulnerabilities[1], and no one (certainly not including Xen) tries to mitigate them all blindly. The dwm post linked in the article explicitly says he's not losing sleep over the issue. Everyone (yes, likely including Xen) believed in good faith that this was not practically exploitable.

> Frankly, Xen's work here was not at all impressive; they just applied a fix that Intel told everyone to apply.

And this seems like a misunderstanding too. My gathering from the linked article is that Xen virtualized the barrier mechanism such that the job could be farmed out to guest OSes. Someone running an unpatched Linux under Xen (which is, what, 90+% of the worldwide cloud?) is still vulnerable. But "Xen" is not, which seems maybe less impactful than the marketing being presented would have you believe.

[1] Rowhammer says hi.

bonzini · on Aug 26, 2022

> Xen virtualized the barrier mechanism such that the job could be farmed out to guest OSes.

All hypervisors do that, including KVM. The difference is that because Xen has to let the guest control the speculation control MSRs, it has to read and write the MSR anyway on every guest<->host context switch. Using IBRS in Xen comes essentially for free.

Linux on the other hand does not have to access the speculation control MSR on every userspace<->kernel context switch, and doing so would have had a bigger performance impact than retpolines. Therefore it took a different approach.

Now the performance impact wasn't that bad on Skylake and it probably would have been good to use IBRS on those processors. FWIW very old versions of RHEL (6 and 7) in fact did use IBRS instead of retpoline because we had little time (there were less than two months from the time the team was put together to the time we had to have something ready to be shipped to customers) and it even took days to read people in on the issue because of how secret it was. So we didn't want to put the compiler update on the critical path.

paulmd · on Aug 26, 2022

> There are endless mountains of theoretical vulnerabilities[1], and no one (certainly not including Xen) tries to mitigate them all blindly.

I mean, not nobody. That's sort of the raison d'etre of OpenBSD.

We're talking about a distro that wasn't affected by the latest round of speculation vulnerabilities in AMD's SMT implementation because as soon as they heard about Spectre/Meltdown they immediately realized that SMT was gonna be a giant pile of sidechannels and disabled it on all processors, even the ones that were believed safe at the time. They take "defensive engineering" extremely seriously and will mitigate anything that seems plausible.

That was controversial at the time (extreme performance cost! and AMD isn't affected so why do they have to suffer!?) and they ended up being right, there were more vulnerabilities to come based on SMT leaking data to the other thread.

Nobody mitigates implausible/theoretical ideas that don't seem likely to work, but, a good software engineer certainly should be mitigating things that seem like reasonably feasible extensions of existing attacks, and hardening their environments in general to mitigate the impact if something should pop up. That's not extraordinary foresight, that's just part of the job.

Linus's decision did not follow good engineering practices, and there are examples of other OSs and distros that did do it properly. Xen may or may not be one of them, it's certainly possible to accidentally fall into a safe path (as AMD likely did on Meltdown, given the broad multi-vendor scope of the vuln), or the "right path" could simply have been easy for them to take, but nobody should be defending Linus on the basis of "nobody could have known". The decisions he made were unsafe and incompatible with a defensive-engineering mindset, and he was told this at the time.

Linus's "why are we doing all this over a handful of broken Intel processors" mindset is exactly the trap that OpenBSD avoided falling into. They knew it wasn't going to be just a handful of broken Intel processors: SMT is fundamentally a shared resource, and once they saw the basis of Spectre-style sidechannels they knew SMT was gonna be a steady drip-drip-drip of vulnerabilities across all architectures. That was very foreseeable; when I saw the OpenBSD thing at the time it was like "yeah, probably gonna end up being a good call...".

For another "yeah, probably gonna be a problem down the road": KPTI really needs to be enabled-by-default on AMD processors. The Prefetch+TLB attack is still un-mitigated in hardware and AMD relies on KPTI for protection, but still recommends it be disabled by default for performance reasons. The data bleed rate is faster than Meltdown and it's really past time to turn it on by default regardless of what it does to AMD's benchmark numbers. It should have been on-by-default in the first place, and now it's actually got demonstrated exploits leaking kernel memory. Another risky, non-defensive call from the Linux tech-leads.

ajross · on Aug 26, 2022

Uh... OpenBSD is susceptible to Retbleed, which doesn't involve SMT behavior, only branch prediction state on a single CPU. The very subject under discussion seems to invalidate your point. OpenBSD, like everyone else, made a call not to patch this vulnerability proactively because it didn't seem exploitable. And like the rest of us, they were wrong (a little -- Retbleed remains a very slow channel, but it's real).

leoc · on Aug 26, 2022

> [0] In fairness, the reason we are in this mess is that said CPU manufacturer has been releasing broken products in the name of speed themselves.

As bad as Intel's record there was, it's hard to really single it out either. It certainly seems as if the whole industry—CPU manufacturers, integrators, academics, kernel devs, the lot—simply agreed not to notice that this category of vulnerability existed until the moment it was fully impossible to ignore.

bsder · on Aug 27, 2022

IBM noticed. DEC noticed. There were a few others sounding the alarm from way back.

But, hey, those Intel/AMD chips sure are cheap! And, let's be honest, that's all that anybody cares about.

plam503711 · on Aug 26, 2022

The real golden-"I told you so" (that triggered the idea to write this very blog post) comes from a tweet of David Woodhouse last July: https://twitter.com/dwmw2/status/1549042968320811008

eru · on Aug 26, 2022

Direct link: https://lkml.org/lkml/2018/1/22/598

dwmw2 · on Aug 28, 2022

Hah, I wondered why there was an uptick of attention to that old tweet.

We should be slightly careful — while I can't deny that there's a small element of "ITYS" about that tweet, should it really count if I then left it up to Intel to follow up?

I whined that we hadn't shown that it was safe. The credit really does need to go to the RETbleed folks who put in the work to truly demonstrate that it wasn't.

jscipione · on Aug 27, 2022

I understand why in a data center hosting VMs speculative execution would be impractical, yet on my personal computer speculative execution gives a nice performance boost copying small files.

dontlaugh · on Aug 27, 2022

Your computer also runs attacker-controlled code, especially in browsers.