Is it time to rewrite the Operating System in Rust? (2018) [video]

bcantrill · on Oct 18, 2021

This talk proved prophetic for me; since giving it three years ago, most of my technical work has been in the development of a de novo operating system in Rust -- not for its own sake, but rather because it is the right approach for the problem at hand. I talked about this a bit in a recent talk[0], but expect much more here soon, as we will be open sourcing this system in a few weeks.

Beyond the new system (called, appropriately enough, Hubris), we have been using Rust at more or less every layer of the stack: microcontroller firmware, early boot software, operating system kernel, hypervisor, and (of course) user-level. Again, this is not by fiat or to do it for its own sake; we are using Rust emphatically because it is the right tool for the job.

More generally: Rust has proved to be even more important that I thought it was when I gave this talk. My blog post from a year ago goes into some of this updated thinking[1], but honestly, my resolve on this continues to strengthen based on my own experience: Rust is -- truly -- a revolution for systems software.

[0] https://www.youtube.com/watch?v=cuvp-e4ztC0

[1] http://dtrace.org/blogs/bmc/2020/10/11/rust-after-the-honeym...

infogulch · on Oct 18, 2021

I liked your talk, and the talk you referenced by Timothy Roscoe [0]. My understanding of your talks is that the issue we seem to be running into with system architecture design is that OS and userspace developers are clinging desperately to a dead model of the system as a homogeneous array of cores attached to large bank of unified memory. This falsehood is so deep that systems basically lie to us about 95% of their internal structure just so that we can continue playing out our little fantasy in obliviousness.

The biggest component of that lie is the unified memory assumption. To be fair to OS & app developers, writing for NUMA is hard, there are enough invariants that must be continuously upheld that it's impossible to just expect authors to keep everything in their in their head at all times. And using a language like C, described as "portable assembler", does not help at all.

Enter Rust, where the novel ownership system and strong type system allows encapsulating special knowledge of the system and packaging it up into contained bundle that can't be misused. Now you can compose multiple of these un-misuse-able (it's a word now) lego bricks reliably because the compiler enforces these invariants, freeing the author from the burden of reflecting on their design from N! perspectives every time they add a line. (Well, they still reflect on it, but they are privileged to reflect on it right after they make a mistake when the compiler complains instead of in an hours-long debugging session using a debugger or, worse, a specialized hardware debugging device.)

---

Your talk focuses on no_std, which is really a foundational necessity of such a new OS (the term "Operating System" feels too small now, maybe "Operating Environment"/OE? idk). I think the next important component is a path out of UMA-land, which I don't think is fully solved in Rust at the moment (that's not a knock, it's not solved anywhere else either). There's an ongoing Internals thread that started as a discussion about usize vs size_t across different architectures and now has dug down to questions such as "what even is a pointer?", "how are pointers represented as data?", and "how should you convert a bucket of bytes into a usable pointer?" [1] -- these are exactly the type of questions that Timothy's talk reveals as important (and has hitherto remained unanswered) and that you hinted at.

During the discussion, Ralf Jung presented an interface that would enable constructing a pointer from an integer by also separately identifying its provenance; I feel like this is a good direction.

    /// Returns a pointer pointing to `addr`, with the provenance
    /// taken from `provenance`.
    fn ptr_from_int<T>(addr: usize, provenance: *const T) -> *const T

---

What do you think of my summary? What do you think of the ongoing discussion on this Internals thread about the right way to construe Rust's memory model? What do you think of think of this idea presented by Ralf?

[0]: https://www.youtube.com/watch?v=36myc8wQhLo

[1]: https://internals.rust-lang.org/t/pre-rfc-usize-is-not-size-...

[2]: https://internals.rust-lang.org/t/pre-rfc-usize-is-not-size-...

infogulch · on Oct 18, 2021

I should start a blog instead of using HN comments as an outlet.

IanSanders · on Oct 18, 2021

Is it time to redesign the entire technology stack?

Edit: Feels like half measures won't move us very far (and taking a step back and refactoring everything, taking into account lessons we learned, while awesome, would likely not be financially viable)

KronisLV · on Oct 18, 2021

For better or worse, we'll eventually reach a stage a few centuries from now when the idea of rewriting all of the software that we're using won't be a matter of not having enough finances or manpower, but rather that the capacity of the human brain won't be able to deal with all of the layers upon layers of abstractions that will have accumulated.

Of course, perhaps it's therefore useful to improve the things that we can for now and push for more and more open software in all the levels of the stack, as opposed to introducing more and more abstractions and shiny new technologies before they're truly ready. Is Rust ready? I have no idea. Of course, people will always prefer having working software now rather than the ideal software for their grandchildren, as evidenced by the mentions of PL/I and the evolution of Linux.

naasking · on Oct 18, 2021

I doubt that software will look anything remotely like what we have now, so I don't think I can agree about all of the layers of abstraction. If I were to hazard a guess, I'd say that software will be described by high-level specifications and compiled to running code subject to a separable executable specification (defining things like resource limits, allocation behaviours, latency requirements, etc).

Superoptimizers and machine learning will be used to ensure the final code conforms to all required specifications, so people generally won't be dealing with lots of abstraction layers in the way they do now.

KronisLV · on Oct 18, 2021

> Superoptimizers and machine learning will be used to ensure the final code conforms to all required specifications, so people generally won't be dealing with lots of abstraction layers in the way they do now.

Have you seen the current state of model driven architecture and the state of trying to do metaprogramming of any sort?

The latter is very stack specific and is really hard to get right, whereas the former has never made its way out of academia. Admittedly, even the people who were forced to use something like the Eclipse Modeling Project ( http://www.eclipse.org/modeling/emf/ ) won't deny that there is definitely potential there, but to a large degree it has failed to materialize.

If there was as much hype around code generation as there was around the new releases of React, then maybe things would be different, but at large i personally doubt that the industry wants to take a step back and spend a decade on the building blocks, rather than getting things done in their day to day lives. Perhaps i'm really skeptical.

naasking · on Oct 18, 2021

> Have you seen the current state of model driven architecture and the state of trying to do metaprogramming of any sort?

Sure, but you described circumstances "centuries from now". Your objection about not taking a step back for a decade or two due to immaturity evaporates on that timeline.

IanSanders · on Oct 18, 2021

> capacity of the human brain won't be able to deal with all of the layers upon layers of abstractions that will have accumulated

That's already the case though.

Maybe we need to take a step back and design an elegantly simple, unambiguous, auditable/analyzable infrastructure, one that's more robust against becoming too complicated in the future. "Structural minimalism", to complement minimalism in user interfaces becoming more and more popular in the recent years.

KronisLV · on Oct 18, 2021

> Maybe we need to take a step back and design an elegantly simple, unambiguous, auditable/analyzable infrastructure, one that's more robust against becoming too complicated in the future. "Structural minimalism", to complement minimalism in user interfaces becoming more and more popular in the recent years.

This sounds good, but in practice there are two types of complexity:

  - domain complexity, which pertains to the domain that we're working in and essentially is forced upon us by it
  - accidental complexity, which is created by all of the sub-optimal decisions and implementation details

You can and should minimize the latter, absolutely, but a lot of the former isn't viable - making systems simpler in some ways (memory allocation and access, for example) could make them perform noticeably worse. No one (amongst the end users) wants to have their hardware become slower, especially when Wirth's Law is still in full effect ( https://en.wikipedia.org/wiki/Wirth%27s_law ).

It would be nice if we could, but even booting operating systems nowadays gets more and more abstractions and complexities added, just look at what Microsoft are doing with Windows 11, essentially creating e-waste for controversial reasons. However, i think that people can at the very least still be competent within small subsets of the larger system, with clearly defined boundaries amongst them, as well as interfaces.

kaba0 · on Oct 18, 2021

There is no such thing. Useful computations are by necessity complex (in both a human-centric view, as well as from a computability view).

There are bad abstractions, of course. But this is just a hay dream that we can design a system that would be less complex than the essential complexity of the problem at hand - and dealing with all the hardware, networking, etc is just simply hard.

worrycue · on Oct 18, 2021

> capacity of the human brain won't be able to deal with all of the layers upon layers of abstractions that will have accumulated

Isn't the point of abstraction so we don't have to deal with the complexity?

mamcx · on Oct 18, 2021

Totally, and probably to be like the older but better stacks :)

ie: Amiga, BeOS, Plan9, HyperCard, FoxPro, ...

ie2: Is not as much a start from zero. Good ideas are not in short supply, is that the bad idea go a full speed and by pure mass it become bigger and bigger. Eventually, a clean start is 100% the cheaper option.

We are due that point!

Sadly, the BEST starting point for that was the introduction of the iPhone (to make stuff like this possible helps a ton to be along a game-change)

guerrilla · on Oct 18, 2021

Anyone who watched the hour long video want to summarize? I dropped out after five minutes of pseudophilosophical rambling.

bitofhope · on Oct 18, 2021

The conclusion slide appears at about 1:02:00 so you can skip there if you just want a four-sentence answer to the question posed in the title.

The first five minutes or so lead up to a rough definition of what he means by "Operating System" for the purposes of this talk, essentially systems software that abstracts some physical or virtual model of hardware and runs programs that use those abstractions. However, in addition to the privileged mode OS kernel he includes things like system libraries and services in the scope of the subject.

This is followed by a historical overview of various operating systems and programming languages — particularly systems languages — and how they shaped each other.

Finally, Rust is compared and contrasted with historical and incumbent languages and the systems developed in those languages.

Despite his enthusiasm about Rust, Bryan finds that rewriting extant OS kernels in Rust should not be the top priority. The Operating System as defined in the introduction includes firmware and user mode services that can benefit from Rust's memory safety even if the system core remains in C or C++. Since Rust interoperates well with other systems programming languages, there is no need for an all-or-nothing approach.

I don't think my summary does justice to the talk. Bryan's talks very often look at things from a historical perspective, which I and others quite enjoy. The rambly tangents are actually my favourite parts of a Cantrill presentation and the conclusions may feel rather mundane if taken on their own. If you don't enjoy the first minutes, you may need to adjust your expectations or simply decide this style is not for you.

KronisLV · on Oct 18, 2021

I'll give it a watch and will summarize, because while i'm not always on board with the Rust hype (and therefore want to know more, to help eliminate my biases, one way or the other).

That said, i rather enjoyed Bryan Cantrill's talk from 2017, "Debugging Under Fire: Keep your Head when Systems have Lost their Mind • Bryan Cantrill • GOTO 2017": https://youtu.be/30jNsCVLpAE

So i wouldn't necessarily turn away from any of his videos just because of the sometimes humorous or not awfully serious tone.

KronisLV · on Oct 18, 2021

Okay, so here's a slightly delayed summary (had to fix some prod issues):

  - a discussion that the parent commenter took an issue with: "What's software? What's hardware? It's hard to answer that."
  -   essentially, an OS is a program that abstracts away hardware
  -   kernel: a piece of the OS that runs with the highest level of privileges, but only a part of the OS
  -   the OS as a whole includes libraries, daemons etc.
  - expansion on the history of OSes and how we got to where we are now
  -   developing OSes isn't always lucrative, you don't hear about the innovative companies that didn't survive
  -   mentions of https://en.wikipedia.org/wiki/Second-system_effect
  -   a brief story about how trying to outsource the PL/I compiler wasn't a good idea
  -   the Unix approach was way more organic in comparison to PL/I, less of a waterfall
  - a little bit about programming languages
  -   a little bit about the history of C and how it wasn't created the exact time as Unix
  -   some words about languages that now seem esoteric, like https://en.wikipedia.org/wiki/Language_H
  -   thoughts on the imporance of macros and the C preprocessor
  - more about OSes in the 1990s
  -   languages like C++ and Java got more popular
  -   many of the OSes of the time suffered from the aforementioned second system effort, were overcomplicated
  -   oftentimes overcomplication also lead to great resource usage with little tangible benefit
  -   with the arrival of Linux, the C based OSes became more entrenched
  -   at the same time, the actual languages that focused on the ease of development (Java, Python, Ruby) also gained popularity, though in a different context
  - software systems in 2010s
  -   without a doubt, it's nice to be able to use higher level abstractions
  -   Node.js got surprisingly popular due to a high-performance runtime with the aforementioned benefits
  -   Go was also developed, though its garbage collector is mentioned as a problem here, because it makes C interop harder
  -   a bit of elaboration about GC and the problems with it, how easy it is to have a reference into a huge graph
  -   essentially, it has its use cases, but at the same time there are problems it's just not suited for (a certain class of software)
  - how Bryan got started with Rust and a bit about it
  -   initially he didn't want to go back to C++, because of a variety of past issues
  -   he got increasingly interested in Rust and its potential benefits
  -   curiously, there is a category of people who are curious about Rust, but haven't actually written any
  -   it's nice to have a language that's built around safety, parallelism and speed
  - more about Rust
  -   its ownership system allows for the power of garbage collection, but the performance of manual memory management
  -   being able to determine when a memory object is no longer in use and being able to do so statically is like a super power
  -   the compiler itself just feels really *friendly* by pointing you to directly where the problems are
  -   composing software becomes easier all of the sudden, when compared to C, since it's hard to get it right there
  -   going back to C or porting C software can actually be somewhat difficult because of unclear ownership
  -   some of the Rust performance gains are actually from good implementation of language internals, like them using b-trees
  -   algebraic types are also nice to have and the FFI in Rust is really well thought out
  -   there's also the "unsafe" keyword which allows loosening the security guarantees when necessary
  - about OS development and Rust
  -   no-one cares about how easy OS components were to develop or how long they took to develop, everyone just wants things to work
  -   a bit of information about having to deal with failed memory allocations, design discussions, language features etc.
  -   lots of OS projects out there in Rust
  -   however, writing your own OS essentially forsakes Linux binary compatibility, so a lot of software won't run anymore
  -   you have to consider what is the actual advantage of rewriting existing software in Rust, safety alone might not be enough
  -   a callback to the fact that an OS is more than just the kernel! you could rewrite systemd in Rust and other pieces of software, however not all software is a good candidate for being rewritten
  -   firmware in user space (e.g. OpenBMC) could probably benefit greatly from Rust as well, in addition to just having open software in the first place

tl;dr - Rust is promising, yet isn't a silver bullet. That said, there are certainly domains and perhaps particular OS components which could benefit from the safety that Rust provides, especially because its performance is also good!

puzzledobserver · on Oct 18, 2021

Haven't watched the video, but I recently watched Timothy Roscoe's OSDI keynote on operating systems and hardware [0], where he argues for the operating system to include all of the kernel, device drivers and blobs that manage the SOC. He points out that the system emerging from the interaction of these components is complex and essentially unarchitected.

After watching his talk, I wanted to figure out how a computer, say a Raspberry PI, _really_ works. And build an operating system for it. Maybe in Rust?

[0] https://www.usenix.org/conference/osdi21/presentation/fri-ke...

guerrilla · on Oct 18, 2021

Wow, impressive notes. Thank you.

That reminds me of the GNU coreutils being rewritten in Rust: https://lib.rs/crates/coreutils

dijit · on Oct 18, 2021

TL;DW: No, it's not that simple, replacing working code is probably not a good idea, hybrid approaches are best.

IshKebab · on Oct 18, 2021

> replacing working code is probably not a good idea

Depends what you mean by "working".

dijit · on Oct 20, 2021

I think that's a pretty moronic take when we're talking about operating systems.

They obviously work. I'm using one right now and the site we're communicating on is using one also.

Or do you have another definition which encapsulate the basic notion of "work" and adds to it?

lnsp · on Oct 18, 2021

I highly recommend to take the time. First, I find Bryan's presentation style very engaging and fun to watch. The short recap on language history is also quite fun. I also would not consider myself to be part of any Rust hype, but the language has sparked my interest once again.

My TLDR: Rust is a great systems language, however in-kernel C code in itself is very safe, its the ecosystem (drivers, firmware etc) around it that could benefit greatly from being rewritten in a safer language.

kaba0 · on Oct 18, 2021

> in-kernel C code in itself is very safe

Any citation on that? Obviously it went through numerous people hours where many bugs were reported/fixed, but afaik it still contains plenty of race conditions, and whole other areas of bugs that are just being discovered through better static analysis programs, and any future code can easily introduce new problems.

lnsp · on Oct 18, 2021

It's around 0:56:12, AFAIK kernel dev has relatively strict guidelines regarding memory management. Cantrill also says that he considers the borrow checker less important if you only write code for a system that does not interact with other libraries, however as soon as you interact with others, things get messy on who owns what and when stuff should be free'd. Since the kernel is a sense a sealed system (ignoring kernel modules), the memory argument probably isn't that important anyway.

I totally agree regarding race conditions though, however I don't think Rust does anything to solve this.

selectnull · on Oct 18, 2021

[flagged]

mijoharas · on Oct 18, 2021

He addresses this in the second slide. :)

jfk13 · on Oct 18, 2021

I'm waiting for a post with the headline: "Is Betteridge's law of headlines always reliable?"

Verdex · on Oct 18, 2021

I mean ... sure? However, I won't be funding it. At least not upfront. [That is, I might buy it after you finish and show it to be good.]

Rust is probably better than C or C++ for writing an OS ... however it has some behaviors that don't make it ideal. Rewriting the OS in Rust will perhaps simply have to start over again when some frustrated OS devs go off and make an even better language for writing the OS.

Heavily augment the OS in Rust is a much more believable effort. The main architecture can be in C and then all the utilities, misc functionality, etc can be in Rust. This is also kind of what Rust was built to do (heavily augment a web browser already written in C++).

Finally, my money is on the far future having most OSes built with multiple languages. Each one specializing in what is required for the given domain. It sounds like Windows is already taking this route (iirc Windows 8 and 10 have some drivers (probably usb) written in the P language).

OSes are important enough for our society that I think it makes sense for us to put some extra effort into their construction. Several custom languages and 50ish years of calendar time to get an output feels like it's going to be worth the cost.

scrubs · on Oct 18, 2021

Assume Rust is a better system language than C, which Linux is written in. In order to get something 10x better ... you're gonna need better algos, or come with something that solves more problems in less code. Otherwise I don't see the point. A better, stronger claim is: let's rewrite sub-system 'X' in Rust which implements simplifications 'A,B,C,...' Unless there's an order of 10 improvement ... the cost will not be overcome by benefits.

matt_s · on Oct 18, 2021

Just stop at "rewrite" - when is a full rewrite of anything really necessary? Whenever I see this, I immediately jump to "its because they want to".

ashtonkem · on Oct 18, 2021

Usually. But then again Linux was a rewrite too. Sometimes unnecessary things become critical, and the distinction is only clear in hindsight.

matt_s · on Oct 18, 2021

From what I recall, Linux was also Linus doing something because he wanted to, not necessarily because there was a business driver. I'm not saying don't, its just good to understand the motivation.

The concept of OSes use file cabinets of papers as part of the abstraction layer. Breaking from that mold might prove fruitful.

colejohnson66 · on Oct 18, 2021

> From what I recall, Linux was also Linus doing something because he wanted to, not necessarily because there was a business driver. I'm not saying don't, its just good to understand the motivation.

People rewrite things in Rust, not because of business reasons, but because they want to. What’s different here?

ink404 · on Oct 18, 2021

isn't it possible for there to be overlap between "because they want to" and "because it's necessary"?

Shadonototra · on Oct 18, 2021

Rust is nice, it tries to solve certain kinds of memory bugs

But it introduce even bigger issues; slow build times

Being able to iterate SUPER QUICK is very important, you want to be able to test and see results as fast as possible, so you get to fix bugs or implement feature with ease

Imagine you get to work on super import product that needs super low latency and close to 0 downtime

You notice a bug, you have to deploy a fix ASAP

If your language makes you test/deploy the issue in hours, then it's very bad

mlindner · on Oct 22, 2021

Most development bugs in other languages are issues of type mismatch (in dynamic languages) or memory issues, both of which aren't a problem in Rust. You don't need to iterate as fast.

Shadonototra · on Oct 26, 2021

> You don't need to

I do need to

nso95 · on Oct 18, 2021

Rust isn't a silver bullet

thewakalix · on Oct 18, 2021

Well, of course. Those are two different metals.

hoseja · on Oct 18, 2021

Perhaps we need a new lang, "Tarnish".

jamincan · on Oct 18, 2021

Projects to integrate Rust into an existing code base are often referred to as oxidation going back to the beginning with Mozilla and Firefox.

thesuperbigfrog · on Oct 18, 2021

No silver bullet exists.

"There is no single development, in either technology or management technique, which by itself promises even one order of magnitude improvement ... in productivity, in reliability, in simplicity." -- Fredrick Brooks

Source: http://worrydream.com/refs/Brooks-NoSilverBullet.pdf

Nevertheless, Rust has shown great promise and implemented great ideas for high performance and safer low-level / system programming.

We need to keep on improving and using Rust and its ideas to improve the high performance, low-level / system programming space.

politician · on Oct 18, 2021

I’d be onboard with a Rust rewrite of Windows. Go for it! It’s the least memory-safe OS with most problems and the widest desktop deployment. Do the Rust folks do rewrites of closed source?

masklinn · on Oct 18, 2021

That’s probably in the works (on an exploratory basis at least) behind closed doors. Microsoft’s infrastructure team has noted they were switching away from C++, and the Security Response Center team considers it the best chance to move forward.

daniel-thompson · on Oct 18, 2021

The business side puts a very high premium on backward compatibility. Until that changes, they need to keep around the cruft / shims / architectural bridges / glue that make that possible. Working with the raw Windows API is already painful enough^; I can't even imagine the effort it would take to reimplement it in Rust.

^ random example: https://caseymuratori.com/blog_0025

sys_64738 · on Oct 18, 2021