Agreed, for a sufficiently hard definition of hard real time.
Hard real time gets a lot of theoretical attention because it's sometimes provable. But whether the decisions you're making by the deadline are sensible (and not, say, cranking the horizontal stabilizer all the way down when one of the angle-of-attack sensors is broken) is far more consequential in most systems than missing a tick.
Companies definitely shoot themselves in the foot by focusing so hard on the hard real time constraint that they make it 10x harder to reason about the actual behavior of the system, and then they get the more important thing wrong. I've seen this in a few systems, where it's almost impossible to discover what the control loop behavior is from reading the code.
The problem with GC is that its preferred mode of operation is to stop the world for much more than one tick.
Scanning a respectable-size heap on a respectably fast machine, sans fancy GC optimizations, could easily take 30 seconds. Modern production GCs rarely pause you for 30 seconds. Real time GCs certainly try very hard to avoid ever doing that. But:
- The optimizations that make a GC run in far less than 30 seconds are speculative: they target empirically found common cases. Not common cases of something the programmer has control over, but common cases of heap shape, which is a chaotic function of the GC itself, the way the OS lays out memory, the program, the program's input, and lots of other stuff. Those common-case optimizations are successful enough that GC cycle times often look more like 30 milliseconds than 30 seconds. So the terrifying thought, if you're using a GC in real time, is: what if, at the worst possible time, the heap shape becomes something that isn't the common case, and the GC you thought took 30 ms now takes 30 seconds?
- Real time GCs can let the program do work while the GC is happening, so even if it takes 30 seconds, the program can keep chugging along. But here's the catch: no memory is reclaimed until the GC cycle reaches the end, and some or all memory allocated during the cycle stays unfreed until the next cycle. So if your 30 ms concurrent collector decides to take 30 seconds instead, you'll either run out of memory and crash, or run out of memory and pause the world for 30 seconds (rough arithmetic below).
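To make that failure mode concrete, here's a back-of-envelope sketch in Java. All the numbers are made-up assumptions (a 200 MB/s allocation rate, an 8 GB heap with a 6 GB live set), and GcHeadroom is just an illustrative name; the point is only that the same allocation rate a 30 ms cycle shrugs off blows straight through the headroom when the cycle stretches to 30 seconds.

    // Back-of-envelope check of how long a concurrent GC cycle a program can
    // ride out before it exhausts the heap. All numbers are illustrative
    // assumptions, not measurements from any particular collector.
    public class GcHeadroom {
        public static void main(String[] args) {
            double allocRateMBps = 200.0;  // assumed steady-state allocation rate, MB/s
            double heapMB = 8 * 1024.0;    // assumed total heap
            double liveMB = 6 * 1024.0;    // assumed live set left after a collection
            double headroomMB = heapMB - liveMB;

            // During a concurrent cycle nothing is reclaimed, so everything the
            // program allocates has to fit in the leftover headroom.
            double maxCycleSec = headroomMB / allocRateMBps;
            System.out.printf("Headroom %.0f MB tolerates a cycle of at most %.1f s%n",
                    headroomMB, maxCycleSec);

            for (double cycleSec : new double[]{0.030, 30.0}) {
                double allocatedMB = allocRateMBps * cycleSec;
                System.out.printf("cycle %6.2f s -> %7.0f MB allocated: %s%n",
                        cycleSec, allocatedMB,
                        allocatedMB <= headroomMB
                                ? "fits"
                                : "out of memory (crash or stop-the-world fallback)");
            }
        }
    }

With those assumptions the 30 ms cycle allocates 6 MB against 2048 MB of headroom, while the 30 second cycle needs roughly 6 GB and hits exactly the crash-or-pause choice described above.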
Basically, the more you know about RTGCs, the less you'll want to use them when lives matter.