We've fleshed this out a bit more in the Prometheus best practices at http://prometheus.io/docs/practices/alerting/
Taking this approach at our company greatly reduced the alert count and improved responsiveness with no degradation in service.