Re Distributed Debugging / Centralized Monitoring, Logging and Alerting: this is exactly the kind of problem that our team at Takipi (www.takipi.com) tackles. Takipi is a new way to get all of the information you need (source, stack, and state) to understand what's going on in a large distributed deployment in production, without relying on logs.