"Also in his Turing Award lecture, he described how he had incorporated a backdoor security hole in the original UNIX C compiler. To do this, the C compiler recognized when it was recompiling itself and the UNIX login program. When it recompiled itself, it modified the compiler so the compiler backdoor was included. When it recompiled the UNIX login program, the login program would allow Thompson to always be able to log in using a fixed set of credentials."
OpenBSD specifically modified malloc() a few years ago to prevent this sort of sneakiness (http://www.tw.openbsd.org/papers/eurobsdcon2009/otto-malloc.... [pdf]). So they route their malloc() calls through mmap(), which returns pages at randomized addresses, and free() immediately returns memory to the kernel rather than leaving it mapped in the current process (sketched below).
I'd be surprised if these changes haven't made it into FreeBSD, but afaik Linux doesn't work this way (by default, anyway).
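A minimal sketch of the idea (function names are mine; the real OpenBSD allocator does much more, including tracking allocation sizes and randomizing placement):

    #include <stddef.h>
    #include <sys/mman.h>

    /* Back each allocation with its own anonymous pages. */
    void *page_alloc(size_t n) {
        void *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }

    /* Return the pages to the kernel immediately, so a later
     * use-after-free faults instead of reading stale data. */
    void page_free(void *p, size_t n) {
        munmap(p, n);
    }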
> And of course, some programs use replacements like dlmalloc and do all their own allocation management anyway.
Yeah. I wrote my own allocator in C++ a long time ago. I wouldn't be surprised if there were quite a few other bits of software out there doing the same thing.
Partly. They were using their own allocator (OPENSSL_malloc()), but even then they would've been OK if it weren't for the off-by-one error elsewhere in the heartbeat implementation. If they had been using an OS-supplied malloc() instead of OPENSSL_malloc(), the bug would've still been exploitable on some operating systems, but not others.
Either way, "don't write your own allocator" is a good lesson to learn.
Unless, of course, you're doing it for fun. In which case, efficient heap management really is a neat exercise.
I don't think that source refutes the parent's point: only large allocations go directly to mmap/munmap as opposed to being cached. Of course, there are other anti-exploitation measures too...
It also makes the assumption that it's a little-endian system. On a big-endian system, the high order byte of the timestamp would be modified, which would probably be too obvious.
"In C/C++, you can use bugs in one part of a program to cause trouble in another. That’s pretty darn underhanded."
I would argue every language has that property. But with C/C++ being so closely tied to the ABI of the machine, perhaps they are more underhanded than others. Even so, to me this branding feels a bit unfair.
...Except, of course, that the compiler can and will optimize away any such memset.
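For example (a sketch; read_password and use_password are hypothetical):

    #include <string.h>

    void read_password(char *buf, size_t n);   /* hypothetical */
    void use_password(const char *buf);        /* hypothetical */

    void handle_login(void) {
        char pw[64];
        read_password(pw, sizeof pw);
        use_password(pw);
        /* pw is never read again before its lifetime ends, so under
         * the as-if rule the compiler may delete this store entirely. */
        memset(pw, 0, sizeof pw);
    }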
Do not attempt to write secure software with C / C++. It is a Bad Idea(TM). Because what you wrote is not what gets run.
For instance: there is no way to write a portable secure memset in portable C / C++. (You can write a "secure" memset that works in all current compilers, but that is not the same thing. What doesn't get optimized now can and will be optimized tomorrow.)
I am not talking about whether it does currently optimize away a call to an alternative to memset; I am talking about whether it is allowed to optimize it away.
The compiler is allowed to optimize away that function call regardless of whether it is memset or your own alternative.
Note that this is not the same as saying it does currently optimize away that function call.
I'm not aware of any portable language that offers semantics that guarantee data is never rendered accessible to the outside world (I could well believe that current implementations don't, but I'm not aware of anything that specifies it). It always comes down to platform-specific APIs. But that's not a reason to not use C - unless you can suggest a better alternative?
Quite frankly, assembly currently is about it. And that's not the most portable thing out there, I know.
I think that C or C++ could, without too much effort, support semantics that would allow for this sort of thing. Something as simple as a "secure" keyword that could be applied to variables (where it means "leak as little as possible about this variable when it goes out of scope") or functions (where it means the same, but for the function itself and all locals of the function).
I don't get it; the compiler operates within the memory model specified by the language. If it "optimizes" away a memset, it does not change the behavior of the program (or it is a bug in the compiler, which is a different topic).
Common misconception with C. A pointer does not mean a pointer to a sequence of capacitors in your RAM. It really means a pointer to an abstract and temporary variable. How this abstract variable is realized on your hardware is implementation-specific. Everything except input, output and explicitly defined side effects (volatile) is of no interest.
Really, you could print a C program on a piece of paper and ask someone to "execute" it in their head given some input x. How they "implement" memset will surely differ from what a computer does, and if you only ask for the output y, they will see that the memset doesn't affect y at all and skip it.
You're correct - the compiler operates within the memory model of the language. But C / C++'s memory model is broken w.r.t. security.
There is no way to ensure that something is actually overwritten, because under the memory models of C and C++ the program can never legitimately read that memory again, even though in actuality it can be read.
I'm not sure what you mean here. If you have a volatile pointer that points to a memory buffer returned by malloc, how can the compiler prevent a write through the pointer from happening?
Edit: unless your point is that a temporary copy can be spilled to memory, and this copy will stay there and won't be overwritten?
Because the compiler will optimize it out. Even if you return the random data / constant the compiler will optimize out the store to the variable and just pass it through directly.
The problem is that the memory model specified by the language is a subset of the memory model specified by the hardware. This leads to exploitable systems when you lift those blinders.
Use a compiler that has extensions that _do_ guarantee that memory gets erased. That's what the gcc function attributes are for.
Alternatively, use a library that the C compiler doesn't know enough about to attempt to remove calls to its functions.
If you copy your standard library's memset in a separate DLL that is not the standard C library, the compiler will not even see the code during compilation, so it has to compile a function call.
The linker (or a JIT in your C runtime) is allowed to remove calls to the function if it can prove that it doesn't have side effects. However, to prove that, it has to look at the assembly of the function; it cannot use the far simpler heuristic "it came from <string.h> and is called memset".
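One concrete technique in that vein, at least for GCC and Clang (my sketch, using an inline-asm barrier rather than a function attribute): an empty asm statement with a "memory" clobber claims it may read the buffer, so the preceding stores can't be proven dead.

    #include <string.h>

    void barrier_memzero(void *p, size_t n)  /* hypothetical name */
    {
        memset(p, 0, n);
        /* GCC/Clang extension: the optimizer must assume this asm
         * reads *p, so the memset above cannot be eliminated. */
        __asm__ __volatile__("" : : "r"(p) : "memory");
    }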
> If you copy your standard library's memset in a separate DLL that is not the standard C library, the compiler will not even see the code during compilation, so it has to compile a function call.
Although I'd like things to behave this way, I don't think this is true. The C standard library was incorporated into the language spec for C89. The behaviors of the named functions within it are specified, and the compiler is allowed to inline its own version (ignoring your custom code) and then optimize out the inlined portion.
So while it's possible that the external linkage approach still works with certain compilers, it's not portable. I believe you are OK with the external approach if you use a non-standard name (my_secure_memset_pretty_please()), but that just shifts the problem to forcing the compiler to generate your external function without making the same dangerous optimizations.
In the end, I fear you are left with three options: blind faith, non-standard language extensions, or switching to a more secure language (likely assembly). If there are other options, I'd love to hear about them.
In practice, of course, memset actually works, because it's a function and the compiler's usage tracing is nowhere near able to spot that you don't reference those zeros that you write.
(IoT security is doom for other reasons though, mostly UI, updatability and cloud services)
LLVM can and will do it. It will assume it knows what a function named "memcpy" (for example) does and optimize accordingly. (Look at TargetLibraryInfo.cpp and grep for LibFunc::memset in, for example, SimplifyLibCalls.cpp.)
(That said, I think TheLoneWolfling is being too strong with his/her claims. You can get modern compilers to avoid dangerous optimizations; it's just not for the faint of heart.)
Also: isn't that a bug? Is there something in a C / C++ standard that states that a function named "memcpy" (for example) is necessarily the normal function?
Compilers have been doing this for a long time. The optimizations that this enables are essential for performance. They shouldn't stop; if the spec prohibits it, the spec should change (and if it doesn't, the compilers should ignore the spec).
And as for the second part... Meh. I don't see any optimizations enabled by hard-coding the semantics of something named "memcpy" (or whatever) that couldn't also be enabled by looking at the actual code that gets linked. Albeit with more difficulty.
W.r.t. 1, the compiler's definition of no optimization today is not the same thing as it was last version, or will be next version. For instance, on IA-64 there are things the compiler has to do that are typically considered optimizations.
W.r.t. 2, you have to make sure there is no link-time optimization happening.
However, that is not an inherent restriction - that is only a restriction on current compilers. It is entirely possible for a compiler to read the assembly of things being linked and optimize based on that.
That does not solve the problem. That only hides it and means it will be deadly later.
For instance, when someone runs it in an emulator for backwards compatibility purposes. Or when someone runs it in a JITter. Or even just if the compiler decides to special-case for the existing link target.
C11 actually adds the memset_s function (in the optional Annex K), which is guaranteed by the language spec not to be optimized away:
> memset may be optimized away (under the as-if rules) if the object modified by this function is not accessed again for the rest of its lifetime. For that reason, this function cannot be used to scrub memory (e.g. to fill an array that stored a password with zeroes). This optimization is prohibited for memset_s: it is guaranteed to perform the memory write.
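Usage looks something like this. Note that Annex K is optional: you have to request it and check that the implementation actually provides it (glibc, notably, does not).

    #define __STDC_WANT_LIB_EXT1__ 1
    #include <string.h>

    void scrub(void *buf, size_t len)
    {
    #ifdef __STDC_LIB_EXT1__
        /* Annex K guarantees these writes are performed. */
        memset_s(buf, len, 0, len);
    #else
        /* No Annex K here; fall back to another scrubbing strategy. */
    #endif
    }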
> For instance: there is no way to write a portable secure memset in portable C / C++.
Of course there is, you just use the volatile keyword. volatile guarantees that all reads and writes have corresponding memory accesses and cannot be optimized away.
It's not going to be as fast as memset, but it's definitely portable and it won't be THAT slow. Then for platforms that have memset_s, defer to that instead; otherwise fall back to the totally portable volatile + for loop.
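A sketch of that loop (the name is mine):

    #include <stddef.h>

    /* Every store goes through a volatile-qualified lvalue, so the
     * compiler must emit each write. Byte-at-a-time, but portable C. */
    void volatile_memzero(void *p, size_t n)
    {
        volatile unsigned char *vp = (volatile unsigned char *)p;
        while (n--)
            *vp++ = 0;
    }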
And that doesn't even get into the other aspect of it:
Namely that C / C++ allows temporary copies of variables that are not cleared afterwards. The most obvious case of this is things being temporarily copied into registers or onto the stack, but there are other examples as well.
But that does not work. Full stop. The compiler can optimize in ways that still leak the contents of the thing that memset was supposed to wipe.
C / C++ are very good languages in all sorts of ways. However, there are components that currently have... flaws. This being one of them. As such, I complain about said flaws, in the hopes that someone will take notice, and/or someone will point me in the direction of things that contain the good parts of C / C++ without said flaws.
I have already learned a fair bit about bounds checking, SIMD instructions, etc, etc from this. And I always want to know more.
And no, it is the same problem. Namely, that the memory models of C and C++ don't match the underlying hardware, and the mismatch is such that things that are trivial to do on the underlying hardware are literally impossible to do in C and C++.
Part of this is for compatibility purposes, but there are ways to keep the compatibility that don't present this sort of problem.
Volatile should only be used with hardware registers. It doesn't do exactly what you want here. It guarantees that memory will be accessed, but it doesn't guarantee the ordering, which can lead to some really nasty behaviour.
The only place that keyword should be used is as a qualifier for member functions or in an embedded sense. It's not well defined outside of that scope.
If you're modifying memory immediately before freeing it (i.e. after the last time you read it), don't you have to be extra super careful to do so in a way that the compiler won't optimize the operation into nothingness? (I don't program in compiled languages very much, so I don't know the details about this.)
The compiler can, and will, make copies of data behind the scenes. And not erase said copies.
What we really need is a keyword / modifier that says that when X passes out of scope no state related to X may be leaked. Ideally, that can be applied to a function / block as well as a variable.
(Or rather, not necessarily no state. Read "as little state as possible", preferably with modifiers that panic unless the compiler can ensure specific things.)
The C standard works at an abstraction level that makes it unsuitable for security applications; I would advocate for a new language here. It needs serious PL research with information-flow reasoning. What we need is a new kind of language, much more machine-aware (yes, more low-level) than C.
On the other hand, something as simple as a keyword marking a variable as "as secure as possible given hardware constraints" (read: wipe any temporary copies and the variable itself after it goes out of scope, attempt to prevent it from being written to non-volatile storage, that sort of thing), working somewhat like inline does, with compilers required to bail if the constraint cannot be met to the level specified, would be a massive step in the right direction.
1. memcpy is less safe than memmove and strncpy. strncpy should be used.
2. The two character arrays should use the same constant in defining their length, and that constant should be used both in the struct definitions and here in the copy operation (sketched below).
3. The code is written in C in spite of it being 2014 at the time.
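A sketch of what point 2 asks for (struct and field names made up):

    #include <string.h>

    #define PAYLOAD_LEN 64  /* the single shared length constant */

    struct request  { unsigned char payload[PAYLOAD_LEN]; };
    struct response { unsigned char payload[PAYLOAD_LEN]; };

    void copy_payload(struct response *dst, const struct request *src)
    {
        /* Both arrays are declared with PAYLOAD_LEN, so the copy
         * length cannot drift out of sync with either definition. */
        memmove(dst->payload, src->payload, PAYLOAD_LEN);
    }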
"Also in his Turing Award lecture, he described how he had incorporated a backdoor security hole in the original UNIX C compiler. To do this, the C compiler recognized when it was recompiling itself and the UNIX login program. When it recompiled itself, it modified the compiler so the compiler backdoor was included. When it recompiled the UNIX login program, the login program would allow Thompson to always be able to log in using a fixed set of credentials."