It's like watching a murder mystery unfold. It feels really daunting to dive this deep into a bug based on such vague symptoms. It's probably selection bias in what gets on the HN front page, but it feels like a large minority here could tackle something like this. I have trouble imagining having enough of a handle on Linux to feel comfortable hot patching the kernel just because I suspect something is wrong in the networking stack.
As somebody who implemented a small user-space TCP long ago, I always get uneasy when people tell me they just put events into some message queue and never consider the edge cases that come up when either the MQ or the consuming servers choke. The problems are pretty much the same as with TCP flow control: a sender that outruns the receiver has to be slowed down, or something gets dropped or piles up without bound. It is easy to build software that only appears to be working well.
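
To make the flow-control point concrete, here is a minimal sketch using Python's standard `queue` module as a stand-in for a real MQ. The names `offer` and `produce` are made up for illustration, and a real broker client will have its own API and failure modes. The only idea shown is that the queue is bounded and the producer gets an explicit "full" signal it has to act on, roughly like a closed TCP receive window.

```python
import queue


def offer(q, event, timeout=0.0):
    """Try to enqueue one event on a bounded queue.

    Returns True if the event was accepted and False if the queue was full,
    so the caller is forced to decide what to do under backpressure.
    """
    try:
        if timeout > 0:
            # Wait a bounded amount of time for the consumer to catch up.
            q.put(event, timeout=timeout)
        else:
            # Fail immediately rather than block the producer.
            q.put_nowait(event)
        return True
    except queue.Full:
        return False


def produce(q, events):
    """Offer every event and report which were accepted and which were rejected."""
    accepted, rejected = [], []
    for event in events:
        if offer(q, event):
            accepted.append(event)
        else:
            # Hypothetical policy: hand rejected events back to the caller,
            # who can retry later, spill to disk, or drop them on purpose.
            rejected.append(event)
    return accepted, rejected


if __name__ == "__main__":
    q = queue.Queue(maxsize=3)  # the bound is what makes overload visible
    accepted, rejected = produce(q, range(5))
    print("accepted:", accepted)
    print("rejected:", rejected)
```

With nothing consuming, the first three events fit and the rest are rejected. What to do with the rejected ones (retry, shed load, push back on the upstream caller) is exactly the edge case that tends to get skipped.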