
The real troubles are undefined behavior and aliasing. Buffer overflows are just a well-known gimmick of the language that is more or less controllable with some discipline. Aliasing is hell. You cannot even use a global variable safely!


Isn't much of the undefined behavior in C that people love to complain about intentionally left in the standard for the purpose of optimization? Similarly, bounds checks are necessary in security-sensitive contexts (i.e., most places), but you probably don't want them slowing down, for example, an MD simulation.

Edit: But to be clear, C really ought to have first class arrays. If you truly don't want bounds checks in a specific scenario for some arcane reason, you could still explicitly pass a raw pointer and index on that. (The same as you would in any sane systems language.)
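For illustration, a first-class array can be approximated today with a fat-pointer struct; the `slice` name and `slice_get` API below are invented for this sketch, not anything standard:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat pointer: what a first-class C array might carry. */
struct slice {
    int *data;
    size_t len;
};

/* Checked access by default. */
static int slice_get(struct slice s, size_t i) {
    assert(i < s.len);
    return s.data[i];
}
```

When you really do want the unchecked path, you index `s.data` directly, which makes the opt-out explicit at the call site.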


UB has nothing to do with optimization. It's about working around the differences between all the platforms that a C program might need to be compiled for. UB covers things like the representation of a signed integer (it might be two's complement, or it might not). It's about letting the platform or compiler dictate what the program does in the rare case where the program does something that might result in different behavior on different compilers and platforms.

Note that I'm using "platform" to refer to the CPU instruction set.


Signed integer overflow could have been marked as implementation-defined rather than undefined behavior. That would have meant that compiling a program with overflows on most systems would produce the same results, but compiling it for the occasional rare sign-and-magnitude machine would produce slightly different results. However, they didn't do this. Instead, they said that it's undefined behavior, which means that any program that overflows integers has no guarantees about its behavior at all - it could crash right away, generate the correct result 99 out of 100 times, or the compiler could outright reject the program.
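A minimal sketch of the difference (the function names here are made up):

```c
#include <limits.h>

/* UB if x == INT_MAX: the optimizer may assume signed addition never
   overflows and fold this whole function to "return 1". */
int naive_has_room(int x) { return x + 1 > x; }

/* Well-defined: compare against INT_MAX instead of overflowing. */
int safe_has_room(int x) { return x < INT_MAX; }
```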

A good example of this is calling functions with the wrong parameter types. UB in C, but practically allowed by every compiler. No machine would care if you do this... until WASM came along and suddenly every function call is checked at module instantiation time for exactly this behavior. This is because all WASM embedders are fundamentally optimizing compilers. And what is the mother of all optimizations? Inlining: the process of copypasting code from a function into wherever it is called. If a function is being called with the wrong arguments, how do you practically do that? You can't.
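A sketch of such a mismatched call in C (the cast makes it compile; the call through the cast pointer is the UB):

```c
int add(int a, int b) { return a + b; }

typedef int (*wrong_fn)(int, int, int);

/* Casting a function pointer to an incompatible type is allowed, but
   calling through it is undefined behavior (C17 6.3.2.3p8). Most native
   calling conventions would silently ignore the extra argument; a WASM
   engine checks the signature and traps instead. */
int demo(void) {
    wrong_fn f = (wrong_fn)add;
    (void)f;            /* f(1, 2, 3) here would be the UB call */
    return add(1, 2);   /* the correctly typed call is fine */
}
```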

It is meaningless to talk about UB without also talking about optimizations. If you do not optimize code, then you do not have UB. You have behavior that is defined by something - if not the language spec, then the implementation of that spec, or a particular version of a compiler. There are plenty of systems with undocumented behavior that is nonetheless still defined, deterministic, and accessible. Saying that something is UB goes one step beyond that: it is saying that regardless of your mental model of the underlying machine, the language does not work that way, and the optimizer is free to delete or misinterpret any code that relies on UB.
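A classic concrete case of "the optimizer is free to delete code that relies on UB": a null check that follows a dereference can be removed, because the dereference already lets the compiler assume the pointer is non-null.

```c
#include <stddef.h>

int first_or_minus_one(int *p) {
    int v = *p;          /* dereference: UB if p is NULL, so the
                            compiler may assume p != NULL from here on */
    if (p == NULL)       /* ...which makes this branch dead code that
                            an optimizer is entitled to delete */
        return -1;
    return v;
}
```

With a valid pointer the function behaves as written; with NULL, the "safety" check the programmer intended may no longer exist in the compiled output.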


> UB has nothing to do with optimization. It's about working around the differences between all the platforms that a C program might need to be compiled for.

That's what it used to mean. But at some point compiler people decided that since UB means literally "anything can happen", they can make optimizers optimize the shit out of the code under the assumption that UB can't be there.

C code that used to work 20 years ago, because the UB in it resulted in some weird but non-catastrophic behavior, doesn't work at all when compiled with modern compilers.


> UB has nothing to do with optimization.

Other commenters already responded to this, but I thought I'd link an article I came across a while back that gives a concrete and easy to understand example of how UB can be leveraged for optimization by modern compilers. (https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...)


> You cannot even use a global variable safely!

Sorry? I’m unsure what you mean here, because there are plenty of ways to use globals in ways I would call “safe”: no undefined behavior, correct output, …


In one C file you can declare a global:

    int x;
and in another:

    int* x;
and you'll be mixing pointers and ints and it won't be detected.


I was not talking about this, but about aliasing a variable in the same translation unit.

    int x = 7;
    void f() { /* do things using x */ }
    void insidious_function(int *p) { *p = 3; }
now, inside f you cannot be sure that x equals 7, even if you never write into it. You may call some functions, that in turn call the insidious function that receives the address of x as a parameter. There's no way to be sure that the value of x is not changed, just by looking at your code.
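Concretely, the chain described above might look like this (the helper name is invented for the sketch):

```c
int x = 7;

void insidious_function(int *p) { *p = 3; }

/* Somewhere down the call graph, someone hands out x's address. */
void innocent_looking_helper(void) { insidious_function(&x); }

int f(void) {
    innocent_looking_helper();  /* f never assigns to x directly... */
    return x;                   /* ...yet this returns 3, not 7 */
}
```

Nothing in `f`'s own body writes `x`, so no local inspection of `f` reveals that the global changes out from under it.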


Isn't that what asan/ubsan is for?

Granted, it's not static analysis, but it should catch most aliasing related errors, no?


No. There has been some effort in that direction, somebody proposed a Clang "type sanitizer" patch, but it wasn't merged.


I'm fully in the camp of C plus powerful analysis tools, plus a high-level language (Python or Scheme).


Since ISO doesn't define what those powerful analysis tools are supposed to be, there are plenty of C compilers that will never get them.


only if your tests exercise that code path


You can't do compile time bounds checking, if that's what you're implying.


Most, but not all. They are excellent tools but not perfect by any means.



