I suspect that most of the knowledge on this topic is embedded in the source code of the most prominent compiler toolchains and in the heads of their devs.
:(
I think there might be an interesting intersection here with ML, where we could learn the common mistake patterns made by real users and either do better error correction, or at least provide pinpoint-accurate error messages.