
I often see rate limits framed as a way to protect system capacity, but with too little follow-through on why, even with appropriate rate limits, systems can still fall over and require manual intervention to recover.

The easiest way to explain this is with a simple sequence of events:

- The database has a temporary issue and system capacity drops.

- Clients start timing out or getting errors.

- Load amplification kicks in as clients retry and requests queue up.

- Load is now higher than normal while capacity is lower than normal.

- Devs work hard to get the database back in order.

- The database looks restored, but the system now has 3x the load it had before the incident.

- More heroic efforts are needed to shed load and/or scale up capacity.

- Whew, it's all working again!

In the post-mortem there are lots of questions about why rate limiting didn't protect the system. Unfortunately, the rate limit values required to recover a saturated system are far too low for normal usage, and the values needed for normal operation are too high to prevent the system from getting saturated in the first place.
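To make that last point concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it (demand, capacities, retry count, rate limit) is invented purely for illustration, and the amplification estimate is first-order only:

    # Illustrative numbers only -- not from any real system.
    NORMAL_LOAD_RPS = 1_000      # steady-state client demand
    NORMAL_CAPACITY_RPS = 1_500  # what the healthy system can serve
    DEGRADED_CAPACITY_RPS = 300  # capacity during the database incident
    RETRIES_PER_FAILURE = 3      # each failed/timed-out request is retried 3 times

    # A sane rate limit must sit above normal demand, or it breaks normal traffic.
    rate_limit_rps = 1_200

    # During the incident most requests fail, and each failure spawns retries,
    # so offered load grows well past steady-state demand. (First-order only:
    # this ignores retries of retries, which make things even worse.)
    failure_fraction = 1 - DEGRADED_CAPACITY_RPS / NORMAL_LOAD_RPS      # ~0.7
    offered_load = NORMAL_LOAD_RPS * (1 + failure_fraction * RETRIES_PER_FAILURE)
    print(f"offered load during incident: {offered_load:.0f} rps")      # ~3100 rps

    # The rate limit still admits up to 1200 rps -- roughly 4x what the degraded
    # database can serve -- so the system stays saturated even after the database
    # itself recovers. A limit low enough to protect it (~300 rps) would reject
    # most traffic in normal operation.
    print(f"admitted load: {min(offered_load, rate_limit_rps):.0f} rps "
          f"vs capacity {DEGRADED_CAPACITY_RPS} rps")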

Fundamentally, there's really no way for rate limiting (which only sees incoming load) to balance the equation load <= capacity. For that, we need a back-pressure mechanism like circuit breaking, concurrency limiting, or adaptive request queueing. Fortunately, rate limiting and back-pressure work well together and don't have to know about each other.
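As one sketch of what back-pressure can look like, here is a minimal concurrency limiter for an asyncio service. The names, the limit of 100, and the do_work callback are all illustrative assumptions, not a real library API:

    import asyncio

    MAX_IN_FLIGHT = 100  # illustrative limit; tune to what the backend sustains
    _in_flight = 0       # requests currently being processed

    class Overloaded(Exception):
        """Tell the caller to shed load (e.g. respond with HTTP 503)."""

    async def handle_request(do_work):
        """Admit a request only if an in-flight slot is free.

        Admission depends on work currently in progress, not on arrival rate:
        when the backend slows down, slots are held longer, so new requests
        are rejected automatically without retuning any limits.
        """
        global _in_flight
        if _in_flight >= MAX_IN_FLIGHT:
            raise Overloaded("too many requests in flight")
        _in_flight += 1  # safe without a lock: asyncio handlers run in one thread
        try:
            return await do_work()  # e.g. the database call
        finally:
            _in_flight -= 1

A static rate limit in front of this doesn't need to know it exists: the rate limit caps how fast clients can offer work in the best case, while the concurrency limit sheds whatever the backend can't actually absorb at the moment.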


