From L3 to seL4: What Have We Learnt in 20 Years of L4 Microkernels? [pdf] (nicta.com.au)
140 points by pjscott on April 12, 2014 | 23 comments


Microkernels are a nice step toward security, but they're a concept ahead of current hardware design and they don't really bring the flexibility typically promised.

Services (virtual memory/swapping, file systems, the network stack, etc.) in microkernel systems typically can't be modified or replaced by applications any more than in monolithic kernels, which is probably part of why microkernels have stayed in the realm of embedded systems and similar settings where you have control over the whole system.

Exokernels bring the flexibility that microkernels don't, by moving the security boundary down the stack. Instead of moving services into trusted user-level processes, they manage protection at the level of hardware resources rather than services. This enables those services to be in untrusted shared libraries that can be securely modified or bypassed on a per-application basis.

Thus, instead of the lingering "eh, it's a little slower but we can ignore that," exokernels provide much better opportunities for optimization and tend to be much faster. For example, a database could choose to discard and regenerate index pages rather than swap them out to disk and back; a file copy program could issue large, asynchronous reads and writes of all the copied files at once; a web server could use its knowledge of HTTP to merge packets, or co-locate files from web pages to improve disk seek time.
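
To make the file-copy example concrete, here's a rough sketch of that access pattern using plain POSIX AIO as a familiar stand-in (an exokernel would let the application go further and schedule the underlying disk blocks itself); error handling and the write side are omitted, and none of this comes from the paper:

    /* Issue reads for every source file up front instead of copying one
     * file at a time.  POSIX AIO is used here only as a stand-in for
     * application-directed I/O; link with -lrt on glibc. */
    #include <aio.h>
    #include <fcntl.h>
    #include <stdlib.h>

    #define CHUNK (1 << 20)                 /* 1 MiB per request */

    void start_reads(const char **paths, int n, struct aiocb *cbs, char **bufs)
    {
        for (int i = 0; i < n; i++) {
            bufs[i] = malloc(CHUNK);
            cbs[i] = (struct aiocb){
                .aio_fildes = open(paths[i], O_RDONLY),
                .aio_buf    = bufs[i],
                .aio_nbytes = CHUNK,
                .aio_offset = 0,
            };
            aio_read(&cbs[i]);              /* all reads are in flight at once */
        }
        /* later: aio_suspend()/aio_return() on each, then issue the writes */
    }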

Further, exokernels and microkernels are not mutually exclusive; they are rather orthogonal concepts (you could move an exokernel's drivers into user-space processes if you wanted). If we had hardware more conducive to a microkernel design, for example with direct process switching rather than going through the kernel (32-bit x86 had this in the form of task gates, but they were rarely used and were dropped in 64-bit mode), that combination would probably be the optimal design, rather than a purist microkernel approach. Incidentally, the in-development Mill CPU design does this very efficiently, along with a few other things that are good for both micro- and exokernels.


Exokernels tend to be a good idea for quite a number of things. In fact, all modern desktop systems use what is essentially an exokernel design in their graphics driver stack:

Rather than implementing OpenGL (or Direct3D or whatever) in the kernel, the kernel drivers provide an interface that allows user space to submit hardware-dependent command lists for the GPU.

OpenGL is then implemented as a very thin layer that loads a hardware-dependent user space library. It is this hardware-dependent library that contains the vast majority of the OpenGL implementation, including the shader compilers and everything else that is needed to build the hardware-dependent command list that is ultimately submitted to the kernel.
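
To make that split concrete, here's a rough sketch of what such a kernel interface can look like. The ioctl number, struct layout, and names are invented for illustration; they aren't any real driver's API:

    /* The kernel driver only accepts opaque, hardware-specific command
     * buffers; everything that builds them (GL state tracking, shader
     * compilation) lives in the userspace library. */
    #include <stdint.h>
    #include <stddef.h>
    #include <sys/ioctl.h>

    struct gpu_submit {                 /* hypothetical submission descriptor  */
        uint64_t cmds_ptr;              /* userspace address of command buffer */
        uint32_t cmds_len;              /* length in bytes                     */
        uint32_t fence_out;             /* kernel returns a fence to wait on   */
    };

    #define GPU_IOCTL_SUBMIT _IOWR('G', 0x01, struct gpu_submit)

    /* Userspace GL library: encode state changes and draw commands into buf,
     * then hand the finished buffer to the kernel, which only validates and
     * queues it. */
    static int submit_cmds(int gpu_fd, const void *buf, size_t len, uint32_t *fence)
    {
        struct gpu_submit req = {
            .cmds_ptr = (uint64_t)(uintptr_t)buf,
            .cmds_len = (uint32_t)len,
        };
        int ret = ioctl(gpu_fd, GPU_IOCTL_SUBMIT, &req);
        if (ret == 0 && fence)
            *fence = req.fence_out;
        return ret;
    }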


And you've just described the current state of Open-Source video drivers in Linux.


I believe both Windows NT and OS X do it this way, in addition to closed-source video drivers for Linux and Android? Not certain, and I won't speculate about iOS.


Windows (NT) does since Vista. But I thought OS X still had all of it in kernel-space? Not sure though.


All the major OSes evolved through different paths, but eventually ended up with very similar schemes.

Windows (ignoring the 9x series): In NT 3.x, GDI was just a thin DLL that RPC'd over to CSRSS.EXE. CSRSS did the actual drawing directly from userspace. As recently as XP (I haven't checked later) CSRSS still had an elevated IOPL, which meant that despite being a userspace process, VMware had to use binary translation on it instead of direct execution. In NT4/XP, GDI (and some other stuff) moved into win32k.sys. Userspace would essentially do a syscall, and the kernel would talk to the video card. For 3D graphics, the model is that userspace apps load a video-card-specific library to generate command buffers for the video card. When the app wants to execute a command buffer, it uses a syscall; the kernel verifies that the commands are safe and submits the buffer to the hardware. In Vista and later, a path similar to the 3D path is used for all drawing, except that the drawing is done on an app-specific offscreen surface. Another userspace process generates command buffers to composite those offscreen buffers together to generate what you see on the screen.

Linux/X11: In the dark ages, it was very similar to NT 3.x (X came first, I just ended up writing in this order). Applications used xlib to generate X protocol messages which were sent via a unix domain socket, or a TCP socket to the X server. The X server then, from userspace, programmed the video card. This had the same IOPL implications for the X server as CSRSS.EXE. When 3D acceleration was added, it worked very similarly to 3D in NT4/XP. Finally, with compositing and now Wayland, the model is similar to Vista+.

OSX: In NextStep/early OSX, applications drew (software only) into shared memory chunks. A userspace compositor process did software compositing from those shared memory chunks into the video card's VRAM. With middle OSX (can't recall exact versions here), the compositor process started to upload dirty regions from the shared memory chunks into offscreen VRAM surfaces, and then programmed the video card to composite them together. Finally, modern OSX works similarly to modern Linux and Vista+.

I just wrote them up in this order arbitrarily. X did drawing via RPC to a user-space process long before NT and NextStep existed. NextStep did compositing long before the other two. Ironically, given the flamewars and marketing of the 90's, Linux/X was exactly as "microkerneley" as NT3.x and NextStep, and more so than NT4. And they all evolved towards very similar architectures.


What hardware changes would enable microkernels or exokernels to compare favorably to monolithic kernels?


>but they're a concept ahead of current hardware design

How so?

>they don't really bring the flexibility typically promised.

Which ones have you tried?


On current hardware, RPC is expensive no matter how you optimize it, because it goes through the kernel. With better context-switching primitives (tagged TLBs or a single address space, context switching directly from user space, etc.) it could even be cheaper than system calls.

I have played a lot with L4, but the flexibility problems I'm talking about are intrinsic to the traditional microkernel organization. No user or application can replace a privileged system server; they can only replace a shared library.


> No user or application can replace a privileged system server; they can only replace a shared library.

In my view, "privileged system servers" should only exist inasmuch as they are necessary to be an arbiter of resources or a router of hardware messages.

Take memory: at some level, something has to have authority over which of several competing processes will get ownership of memory pages. And when a page fault occurs for some virtual memory page, there has to be some way of dispatching this to the code that can handle it without triggering a page fault itself.

Both of these cases require centralized, privileged processes by nature.
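
Roughly, the L4-style answer to both points is a user-level pager: the kernel turns page faults into IPC to one process that holds the authority over physical frames. A sketch of the idea follows; every name in it is invented for illustration and is not the real L4 or seL4 API:

    /* Hypothetical kernel interface for a user-level pager. */
    typedef struct {
        unsigned long task;   /* which task faulted       */
        unsigned long vaddr;  /* faulting virtual address */
        int           write;  /* was it a write access?   */
    } fault_msg_t;

    extern void          ipc_wait(fault_msg_t *out);
    extern void          ipc_reply(unsigned long task);
    extern unsigned long grant_frame(unsigned long task);   /* arbitration point */
    extern void          map_frame(unsigned long task, unsigned long vaddr,
                                   unsigned long frame, int writable);
    extern void          kill_task(unsigned long task);

    void pager_loop(void)
    {
        fault_msg_t f;
        for (;;) {
            ipc_wait(&f);                        /* kernel delivers faults as IPC */
            unsigned long frame = grant_frame(f.task);
            if (!frame) {
                kill_task(f.task);               /* out of memory: policy lives here */
                continue;
            }
            /* the pager's own code/data must be pinned so this path never faults */
            map_frame(f.task, f.vaddr & ~0xfffUL, frame, f.write);
            ipc_reply(f.task);                   /* resume the faulting thread */
        }
    }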

Anything else that does not similarly require a centralized privileged process is better handled as a shared library, I agree.

I'm not sure I see what parts of the "traditional microkernel organization" mandate privileged system servers for things. I haven't looked at L4 in a while, but as I recall the only server that is inherently privileged is a pager, which must be privileged for the two reasons I mentioned before.


Yes. I don't know much about how servers are typically done today, but from the research I've read, server processes are used for things like file systems, the network stack, virtual memory swapping, and other abstractions on top of the hardware, in addition to L4's user-space drivers. However, these servers don't have the right domain-specific knowledge.

For example, when the kernel (or a server) needs to revoke some physical memory from a process, it doesn't have the information it needs to do this well: paging the least-recently-used page out to disk is not always the best policy. A database application could instead discard an index page if regenerating it is faster than loading it back from disk, which is both faster and more power-efficient.
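
A sketch of what that could look like under a hypothetical exokernel interface; all of the function names here are made up for illustration:

    /* Instead of the kernel silently swapping an LRU page, it asks the
     * application to give a page back, and the application applies its
     * own domain knowledge. */
    typedef unsigned long page_id;

    /* placeholders for application- and kernel-side functions */
    extern page_id find_clean_index_page(void);   /* 0 if none available */
    extern page_id pick_lru_data_page(void);
    extern void    write_page_to_disk(page_id p);
    extern void    exo_set_revoke_handler(page_id (*handler)(void));
    extern void    run_database(void);

    /* Called (hypothetically) by the exokernel when it needs a page back. */
    static page_id revoke_one_page(void)
    {
        page_id p = find_clean_index_page();
        if (p)
            return p;              /* cheap: the index can be regenerated later */
        p = pick_lru_data_page();
        write_page_to_disk(p);     /* expensive fallback: behave like swap */
        return p;
    }

    int main(void)
    {
        exo_set_revoke_handler(revoke_one_page);
        run_database();
        return 0;
    }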

When the kernel needs to allocate disk space to a file system, it doesn't know the expected usage of that space as well as the application often does. Databases, web servers, version control systems, etc. all know (or can profile) their file system usage; for that matter, the file system abstraction itself is sometimes suboptimal. A web server knows, for example, that "this web page will also send these js, css, and image files", so letting the application choose disk blocks from the available ones can bring massive performance improvements.

The same applies to network packet merging, file copy operations, scheduling of threads in an application, using the virtual memory system for things like garbage collection and persistent storage, etc.


>On current hardware, RPC is expensive no matter how you optimize

You mean IPC? While hardware support could make it faster, it is already plenty fast. The whole "L4 only introduces 3% overhead" thing is pretty old news at this point.

>I have played a lot with L4

Then you should know that the performance myths that come from Mach's slowness are not accurate.

>No user or application can replace a privileged system server

Multi-user systems are a vanishingly small minority at this point. I don't think "you need to be in control of your computer" is a huge show stopper.


I mean RPC, because that is the common use case of IPC between applications and servers in microkernel systems. It is more expensive than shared library calls or system calls no matter how much you optimize it on current architectures.

I'm also well aware of L4's performance vs Mach, but like I said: "instead of the lingering 'eh, it's a little slower but we can ignore that,' exokernels provide much better opportunities for optimization and tend to be much faster" than monolithic or microkernel designs.

For example, MIT's Cheetah static-file HTTP server (comparable to Squid, which at the time was still called Harvest) achieved a 4x performance improvement just by moving to an exokernel, and an 8x improvement by implementing some of the optimizations I've described in previous comments. This depends on the ability to bypass the privileged servers a microkernel typically uses.

Microkernels are nice for security, at the cost of a few percent overhead, and that's good, but that's about all they offer, so they're not seen as worthwhile for many uses. Exokernels can provide the same level of security while also providing far greater flexibility and performance. This essentially provides, with a performance gain, what current data centers use virtualization for, which is itself another few-percent slowdown.


I can't tell if that is the most subtle trolling I've seen since the 90s, or if you are seriously saying "yes I know I was lying, but I am still lying so it is cool".


This article is a PDF document. Here's the abstract:

The L4 microkernel has undergone 20 years of use and evolution. It has an active user and developer community, and there are commercial versions which are deployed on a large scale and in safety-critical systems. In this paper we examine the lessons learnt in those 20 years about microkernel design and implementation. We revisit the L4 design papers, and examine the evolution of design and implementation from the original L4 to the latest generation of L4 kernels, especially seL4, which has pushed the L4 model furthest and was the first OS kernel to undergo a complete formal verification of its implementation as well as a sound analysis of worst-case execution times. We demonstrate that while much has changed, the fundamental principles of minimality and high IPC performance remain the main drivers of design and implementation decisions.


Oh, the conundrum that is the very technical HN story. Do I dive in and devote the time to learn whether this paper is as interesting as it seems on the surface, or do I wait for some explanatory posts or even a TL;DR summary to help me decide?

Edit: The paper helpfully provides much of this itself, with boxed section footers noting the change from then to now in how that component is handled. It makes for an interesting way to skim and zero in on sections you may find of interest.

e.g. 4.2 Lazy scheduling ends with "Replaced: Lazy scheduling by Benno scheduling"


When the story is about L4, and the reader is interested in operating systems, the answer to this conundrum is "very yes". L4 is an incredibly simple, elegant, and important OS design. I have a hard time reading about it and not wanting to play around with it (thankfully, I've had the opportunity on projects).

Two simple uses for L4: first, as a bare-bones but coherent, complete OS for a purpose-built device; second, as the basis for another OS built on top of it as a "personality". L4 implements most of the hairy parts of an OS (perhaps modulo filesystems), allowing you to build the upper-level OS in terms of L4 abstractions.
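
As a taste of what "building in terms of L4 abstractions" means, here is a minimal sketch using today's seL4 libsel4 calls (the 2014-era names differ slightly): an upper-level OS service is just a thread blocked on an endpoint, and a client request is a single seL4_Call. The endpoint slot number and the label-1 "read block" protocol are assumptions made up for the example:

    #include <sel4/sel4.h>

    #define SERVICE_EP ((seL4_CPtr)10)   /* assumed: endpoint cap installed by the root task */

    extern seL4_Word do_read_block(seL4_Word block);   /* placeholder driver call */

    /* Server side: the entire dispatch loop of a user-level service. */
    void block_server(void)
    {
        seL4_Word badge;
        for (;;) {
            seL4_MessageInfo_t info = seL4_Recv(SERVICE_EP, &badge);
            seL4_Word result = 0;
            if (seL4_MessageInfo_get_label(info) == 1)       /* "read block" */
                result = do_read_block(seL4_GetMR(0));
            seL4_SetMR(0, result);
            seL4_Reply(seL4_MessageInfo_new(0, 0, 0, 1));
        }
    }

    /* Client side: what a "system call" of the upper-level OS boils down to. */
    seL4_Word read_block(seL4_Word block)
    {
        seL4_SetMR(0, block);
        seL4_Call(SERVICE_EP, seL4_MessageInfo_new(1, 0, 0, 1));
        return seL4_GetMR(0);
    }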

This paper is a survey of the last 20 years of that microkernel design, which has been picked up and evolved by several teams. One way to approach it would be to read an introduction to one of the original L4 OS's, and then read this one to see which concepts survived into 2014.


The way to read papers: read the abstract; if still interested, read the conclusion; if still interested, read the rest.


While working as a cell biologist, I followed a different approach: ignore the abstract and look at the figures; if still interested, skim the materials and methods; and if still interested, read the rest.

The rationale for this is that the most interesting thing is what the authors have discovered, but it's only interesting if they've used sensible and relevant methods, and what is very unlikely to be interesting is the authors' attempt to convince you that the result is stronger and more important than it is.


Yeah, "my" approach[1] is probably best suited for computer science papers.

[1] It's not really mine, it's what I was told at university.


That's probably good in general, but I'm not sure how well it holds up for a survey such as this. The paper helpfully assists in this respect though, as I outlined in my edit above.


The OOTB/Mill people are apparently working on porting L4/Linux to their architecture: http://millcomputing.com/topic/security/#post-802

Their machine-supported security features will be very interesting to see realized.


"For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled." --Richard Feynman

All of us need to learn this, re-learn it, revisit it, internalise it, live it and breathe it every day. I'm sure I could do better at attaining such an ideal. So too can these gentlemen.



