There likely was monitoring for various "problems" in production - error rates, validation failures etc, or even just good old crash counts.
An alert may have fired that lead to someone debugging the issue in detail.
I can totally imagine a slow creeping Metric Of Death that has slowly slowly slowly been creeping up for ages and then suddenly breaches some threshold and then becomes a problem.
An alert may have fired that lead to someone debugging the issue in detail.
I can totally imagine a slow creeping Metric Of Death that has slowly slowly slowly been creeping up for ages and then suddenly breaches some threshold and then becomes a problem.