
The real troubles are undefined behavior and aliasing. Buffer overflows are just a well-known gimmick of the language that is more or less controllable with some discipline. Aliasing is hell. You cannot even use a global variable safely!


Isn't much of the undefined behavior in C that people love to complain about intentionally left in the standard for the purpose of optimization? Similarly, bounds checks are necessary in security-sensitive contexts (i.e., most places), but you probably don't want them slowing down, for example, an MD simulation.

Edit: But to be clear, C really ought to have first class arrays. If you truly don't want bounds checks in a specific scenario for some arcane reason, you could still explicitly pass a raw pointer and index on that. (The same as you would in any sane systems language.)
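For illustration, a first-class array can be approximated today with a fat-pointer struct; the `slice` name and `slice_get` API below are invented for this sketch, not anything standard:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat pointer: what a first-class C array might carry. */
struct slice {
    int *data;
    size_t len;
};

/* Checked access by default. */
static int slice_get(struct slice s, size_t i) {
    assert(i < s.len);
    return s.data[i];
}
```

When you really do want the unchecked path, you index `s.data` directly, which makes the opt-out explicit at the call site.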


UB has nothing to do with optimization. It's about working around the differences between all the platforms that a C program might need to be compiled for. UB covers things like the representation of a signed integer (it might be two's complement, or it might not). It's about letting the platform or compiler dictate what the program does in the rare case where the program does something that might result in different behavior on different compilers and platforms.

Note that I'm using "platform" to refer to the CPU instruction set.


Signed integer overflow could have been marked as implementation-defined rather than undefined behavior. That would have meant that compiling a program with overflows on most systems would produce the same results, but compiling it for the occasional rare sign-and-magnitude machine would produce slightly different results. However, they didn't do this. Instead, they said that it's undefined behavior, which means that any program that overflows integers has no guarantees about its behavior at all - it could crash right away, generate the correct result 99 out of 100 times, or the compiler could outright reject the program.
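A minimal sketch of the difference (the function names here are made up):

```c
#include <limits.h>

/* UB if x == INT_MAX: the optimizer may assume signed addition never
   overflows and fold this whole function to "return 1". */
int naive_has_room(int x) { return x + 1 > x; }

/* Well-defined: compare against INT_MAX instead of overflowing. */
int safe_has_room(int x) { return x < INT_MAX; }
```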

A good example of this is calling functions with the wrong parameter types. UB in C, but practically allowed by every compiler. No machine would care if you do this... until WASM came along and suddenly every function call is checked at module instantiation time for exactly this behavior. This is because all WASM embedders are fundamentally optimizing compilers. And what is the mother of all optimizations? Inlining: the process of copypasting code from a function into wherever it is called. If a function is being called with the wrong arguments, how do you practically do that? You can't.
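A sketch of such a mismatched call in C (the cast makes it compile; the call through the cast pointer is the UB):

```c
int add(int a, int b) { return a + b; }

typedef int (*wrong_fn)(int, int, int);

/* Casting a function pointer to an incompatible type is allowed, but
   calling through it is undefined behavior (C17 6.3.2.3p8). Most native
   calling conventions would silently ignore the extra argument; a WASM
   engine checks the signature and traps instead. */
int demo(void) {
    wrong_fn f = (wrong_fn)add;
    (void)f;            /* f(1, 2, 3) here would be the UB call */
    return add(1, 2);   /* the correctly typed call is fine */
}
```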

It is meaningless to talk about UB without also talking about optimizations. If you do not optimize code, then you do not have UB. You have behavior that is defined by something - if not the language spec, then the implementation of that spec, or a particular version of a compiler. There are plenty of systems with undocumented behavior that is nonetheless still defined, deterministic, and accessible. Saying that something is UB goes one step beyond that: it is saying that regardless of your mental model of the underlying machine, the language does not work that way, and the optimizer is free to delete or misinterpret any code that relies on UB.
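A classic concrete case of "the optimizer is free to delete code that relies on UB": a null check that follows a dereference can be removed, because the dereference already lets the compiler assume the pointer is non-null.

```c
#include <stddef.h>

int first_or_minus_one(int *p) {
    int v = *p;          /* dereference: UB if p is NULL, so the
                            compiler may assume p != NULL from here on */
    if (p == NULL)       /* ...which makes this branch dead code that
                            an optimizer is entitled to delete */
        return -1;
    return v;
}
```

With a valid pointer the function behaves as written; with NULL, the "safety" check the programmer intended may no longer exist in the compiled output.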


> UB has nothing to do with optimization. It's about working around the differences between all the platforms that a C program might need to be compiled for.

That's what it used to mean. But at some point compiler people decided that since UB means literally "anything can happen", they can make optimizers optimize the shit out of the code under the assumption that UB can't be there.

C code that used to work 20 years ago, because the UB in it resulted in some weird but non-catastrophic behavior, doesn't work at all when compiled with modern compilers.


> UB has nothing to do with optimization.

Other commenters already responded to this, but I thought I'd link an article I came across a while back that gives a concrete and easy to understand example of how UB can be leveraged for optimization by modern compilers. (https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...)


> You cannot even use a global variable safely!

Sorry? I’m unsure what you mean here, because there are plenty of ways to use globals in ways I would call “safe”: no undefined behavior, correct output, …


In one C file you can declare a global:

    int x;
and in another:

    int* x;
and you'll be mixing pointers and ints and it won't be detected.


I was not talking about this, but about aliasing a variable in the same translation unit.

    int x = 7;
    void f() { /* do things using x */ }
    void insidious_function(int *p) { *p = 3; }
now, inside f you cannot be sure that x equals 7, even if you never write into it. You may call some functions, that in turn call the insidious function that receives the address of x as a parameter. There's no way to be sure that the value of x is not changed, just by looking at your code.
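Concretely, the chain described above might look like this (the helper name is invented for the sketch):

```c
int x = 7;

void insidious_function(int *p) { *p = 3; }

/* Somewhere down the call graph, someone hands out x's address. */
void innocent_looking_helper(void) { insidious_function(&x); }

int f(void) {
    innocent_looking_helper();  /* f never assigns to x directly... */
    return x;                   /* ...yet this returns 3, not 7 */
}
```

Nothing in `f`'s own body writes `x`, so no local inspection of `f` reveals that the global changes out from under it.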


Isn't that what asan/ubsan is for?

Granted, it's not static analysis, but it should catch most aliasing related errors, no?


No. There has been some effort in that direction, somebody proposed a Clang "type sanitizer" patch, but it wasn't merged.


I'm fully in the camp of C plus powerful analysis tools, plus a high-level language (Python or Scheme).


Since ISO doesn't define what those powerful analysis tools are supposed to be, there are plenty of C compilers that will never get them.


only if your tests exercise that code path


You can't do compile time bounds checking, if that's what you're implying.


Most, but not all. They are excellent tools but not perfect by any means.



