This is crazy. I remember thinking when I first heard about stack and heap growing towards each other: uh-oh. But the problem was so blindingly obvious that I just assumed any system written for anything beyond co-operative multitasking had a fix - because if not there was obviously no actual memory safety...
What would the fix be? The fact that you can point the stack pointer at arbitrary memory and the CPU will treat that memory as the stack is a feature, and an important one, of the ISA.
The real issue here is that writing programs in memory-unsafe languages is inherently difficult and risky, and fewer programs should be written that way.
It looks like the fix is going to involve adding some code to LLVM to probe each stack page when you make a large stack allocation, but once that happens, it's straightforward for clang to implement -fstack-check for C and C++ programs, too.
This is a weird emergent problem from the fact that stack and heap memory are part of the same address space and that the stack is designed to implicitly grow as needed. It's not clear that it's a language or compiler's fault for relying on the stack doing that, nor that it's the platform or kernel's fault for making that approach possible.
I'm not totally sure how you would design a language so that you don't have this problem. I guess you could forbid a function from using more than some small amount of stack, and make sure your stack guard area is at least that big, but that seems like an unfortunate restriction. Maybe something like the Stackless Python approach, where all variables are heap-allocated, would work?
Also, all that said, note that Windows gets this right: MSVC inserts calls to _chkstk() to do stack probing. I believe this entire class of vulnerability doesn't exist on Windows, and probably some people at MS who spend their entire lives in memory-unsafe languages are feeling very smug today.
You're really barking up the wrong tree here. The user of the memory-unsafe language isn't at fault for doing things the implementation will use the stack for. The language specification -- the contract between him and the implementation -- never stated or implied that he is responsible for making sure that these objects on the stack aren't accessed in larger-than-page-sized increments. The word "stack" never appears in the C specification.
At a source level, these programs may as well be 100% bug free.
This is nothing more than a quirk of implementation. And one that affects all languages that use the stack (and by which I mean the stack, not some heap-allocated structure the language provides stack-like operations on). Memory safety doesn't really enter the picture.
The design might be fundamentally broken, but if so, you're saying that even with an MMU, we can only have co-operative multitasking. That's not the promise of a multi-user/multi-process system.
If "all programs are safe", we could just use the Amiga OS kernel and no longer need an mmu, or a similar design.
[ed: apparently windows NT takes steps to avoid this according to a sibling comment. Not clear, but I assume it implies a performance hit for certain heavy stack usage?]
I don't entirely follow what you're saying in your first sentence, but I'm going to try to respond to what I think you're saying. If I'm off base please let me know!
1. Memory-safe languages are about making sure that the programmer's intended behavior matches the actual behavior, that is, eliminating a class of bugs related to memory unsafety. They are a security scheme insofar as these bugs are security bugs, but they're not an interprocess security scheme. In particular, you can write a memory-safe debugger that goes and makes arbitrary modifications to other processes the OS gives it access to. You can even write a memory-safe program in Rust that goes and edits /proc/self/mem. But in these cases, the programmer is intending to mess with process memory directly, so the language isn't obligated to stop the programmer. It is obligated to stop the programmer from, say, overflowing a string and overwriting the return address.
2. It is certainly possible to design a memory-safe language that is usable for interprocess memory protection. Microsoft had a research OS called Singularity that did exactly this: https://www.microsoft.com/en-us/research/wp-content/uploads/... But it's another step on top of memory safety.
3. Preemptive multitasking and protected memory aren't inherently related (although, yes, in the market, most cooperatively multitasked OSes lacked memory protection, and most OSes with memory protections were preemptively multitasked). You can have a preemptively multitasked system with no MMU at all; you just need to respond to timer interrupts and switch tasks.
You're right, I leapt over some points and landed slightly outside the discussion - I guess I think of growing the stack and allocating heap memory as something the kernel should be the arbiter of - and that the API should never allow you to grow into your own (or another process's) memory.
I suppose it's fine to say malloc will return memory and it's up to the process to check if there are any overlaps - but that sounds a little crazy?
That is basically how it works, with the caveat that if you want the kernel to arbitrate your stack expansion you must only expand by a page at a time (and that's what gcc's -fstack-check does).
If you decrement your stack pointer by a large value and then access memory at an offset from it - which is essentially what's happening in these cases - the kernel can't arbitrate that, because if the access lands in otherwise-allocated memory it doesn't fault, and so the kernel never sees the access at all.
I meant that the equivalent for malloc would be that if you allocate a 1mb buffer and a 2mb buffer, the kernel might return a 2mb buffer overlapping your earlier 1mb buffer - and be all like: "you asked for 1, you asked for 2 - and you've got 2 - if you wanted 3, you should've asked for 3". Afaik malloc doesn't work like that - it assumes that you want more memory (and can fail or succeed etc).
I can see how the current stack/heap thing evolved - but I still think it's crazy :-)
All that's happening here is that userspace is moving its stack pointer into the heap it had previously allocated. Note that "moving the stack pointer" is not a kernel-mediated operation.
No, of course - but the fact that you can "ask for more memory" by growing the stack onto your heap (rather than, say, having the two start somewhere together and grow apart) means that there's an asymmetry: malloc will give you more RAM or fail; growing the stack can make your allocated memory overlap.
Your stack has to grow towards something. Sure, you can have it grow towards the bottom of the address space (which, due to wraparound, is also the top - where it will safely collide with the kernel addresses) but that only works for one stack - as soon as you create another thread, its stack has to grow towards something else.
The thing is that (as I mentioned upthread) "growing the stack" isn't a well-defined operation. What actually happens is that you overflow the stack by a very little bit, and the kernel says "Oh, I bet you want more stack pages" and maps some virtual memory for you. But the kernel is guessing; it has no way of knowing that you meant to grow the stack. Maybe you dereferenced a wild pointer that, by chance, happened to point right below the current stack.
Conversely, if you grow your stack by some value on the order of gigabytes, you're basically coming up with a pointer that appears to have no relation to the stack, and dereferencing it. So the platform is going to do exactly what it does if you were to dereference the same pointer value with no stack involved: read/write memory if it's mapped and segfault if not.
You could totally imagine a platform where growing the stack was a better-defined operation. You want to avoid each function call and local allocation having the overhead of a system call, though: the nice thing about the current scheme is that it's zero-overhead if there's a mapped stack page. So the scheme was designed (or probably emerged more than it was intentionally designed) for the case where syscalls are very slow, MMUs work fine, and perfect memory safety isn't the goal, i.e., the original UNIX target audience. :-)
You could keep a thread-local variable somewhere indicating the current stack limit, and make a system call when you need to increment it. That doesn't require an MMU at all: the userspace API is that you call some system routine when you need to expand the stack, and it says yes or no (or it either says yes or kills you with a segfault, or whatever). In an MMU-less system, you can just keep track of the amount of heap allocation, and have the system routine fail when you're too close to your heap.
Or you could do stack probing, which works but requires an MMU.
There would be a performance hit, but that is likely to be insignificant. Apparently the stack check code touches every stack page.
If performance were a concern (and I don't really think it is), it should be possible to reduce the impact by assuming a sufficiently large stack guard that the majority of functions with fixed size stack frames could never leap over. Then these functions wouldn't need any runtime checks. The only checking you'd have to do is on code that uses VLAs, alloca, or such, along with the few outlier functions that use ridiculously large fixed size buffers. I don't see why you should need to touch every page.