
A few extra considerations picked up over many years of hard lessons:

1. Rate limits don't really protect against backend capacity issues, especially if they are statically configured. Think of rate limits as "policy" limits: they enforce a usage policy rather than protect limited backend resources from overuse.

2. If the goal is to protect against bad traffic, consider additional steps besides simple rate limits. It may make sense to perform some sort of traffic prioritization based on authentication status, user/session priority, customer priority, etc. This comes in handy if you have a bad actor!

3. Know what to communicate and what action(s) to take if and when the rate limits are hit, particularly by valuable customers or internal teams. Rate limits that will be lifted when someone complains might as well be advisory-only and not actually return a 429.

4. If you need to protect against concertina effects (all fixed windows, or many sliding windows, expiring at the same time), add a deterministic offset to each user/session window so that no large group of rate limits can expire at the same time; see the sketch below.
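
A minimal sketch of point 4, assuming a fixed-window limiter keyed by user id (window size and names are illustrative):

    import hashlib

    WINDOW_SECONDS = 86_400  # illustrative: a daily quota

    def window_start(user_id: str, now: float) -> float:
        # Hash the user id into a stable offset in [0, WINDOW_SECONDS),
        # so each user's window rolls over at a different moment instead
        # of every quota resetting together at midnight UTC.
        digest = hashlib.sha256(user_id.encode()).digest()
        offset = int.from_bytes(digest[:8], "big") % WINDOW_SECONDS
        return now - ((now - offset) % WINDOW_SECONDS)

Because the offset is derived from the id rather than drawn at random, a user's window boundary is stable across processes and restarts.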

Hope that helps someone!



> add a deterministic offset to each user/session window so that no large group of rate limits can expire at the same time

Did you mean non-deterministic (like jitter)?


GP meant adding jitter deterministically.

Long ago I was responsible for implementing a “rate limiting algorithm”, but not for HTTP requests. It was for an ML pipeline, with human technicians in a lab preparing reports for doctors and, in dire cases, calling their phones directly. Well, my algorithm worked great: it reduced a lot of redundant work while preserving sensitivity to critical events. Except some of the most common and benign events had a rate limit of 1 per day.

So every midnight UTC, the rate limit quotas for all patients would “reset” as the time stamp rolled over. Suddenly the humans in the lab would be overwhelmed with a large amount of work in a very short time. But by the end of the shift, there would be hardly anything left to do.

Fortunately it was trivial to add a random but deterministic per patient offset (I hashed the patient id into a numeric offset).

That smoothly distributed the work throughout the day, to the relief of quite a few folks.


> Rate limits don't really protect against backend capacity issues

Yes and no; there's a little more nuance here. You're correct that the business signing up X new users, each with new rate ~limits~ allocations, does not in and of itself scale up your backend resources, i.e. it's not naively going to vertically scale a Postgres database you rely on. But having a hardware rate limiter in front is like setting the "max" on your autoscaler: it prevents autoscaling costs from skyrocketing out of control when the traffic is malicious, the result of a bug, or otherwise "bad"; instead, a human is put in the loop to gauge that the traffic is "good" and that the rate limit should therefore be increased.

> Rate limits that will be lifted when someone complains might as well be advisory-only and not actually return a 429

How does one set an "advisory-only" rate limit that's not a 429? You can still return a 429 with a body containing directions on how to ask for a rate limit increase. I don't read 4xx as meaning the URL will never return anything other than a 4xx, but rather that it will continue to return 4xx without human intervention. For example, if you're going to write blog.example.com/new-blog-entry, it's a 404 before you publish it; after the blog post is published, it returns a 200 instead.
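
For what it's worth, a hypothetical sketch of what that could look like (Retry-After is the standard HTTP header; the body fields and URL are made up):

    def rate_limited_response(retry_after_seconds: int) -> dict:
        # Still a 429, but the body tells the caller what to do next
        # instead of leaving them to guess (or email support).
        return {
            "status": 429,
            "headers": {"Retry-After": str(retry_after_seconds)},
            "body": {
                "error": "rate_limited",
                "detail": "Quota exceeded for the current window.",
                "request_increase": "https://example.com/docs/rate-limit-increases",
            },
        }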


You could set it in a header (W3C baggage?) that's monitored by a dashboard, for example.


What exactly do you mean by your first point?


I often see rate limits framed as a way to protect system capacity, but not enough follow-through on why, even with appropriate rate limits, systems can fall over and require manual intervention to recover.

The easiest way to explain this is with a simple sequence of events: the database has a temporary issue; system capacity drops; clients start timing out or getting errors; load amplification kicks in with retries and request queueing; load is now higher than normal while capacity is lower than normal; devs work hard to get the database back in order; the database looks restored, but now the system has 3x the load it did before the incident; other heroic efforts are needed to shed load and/or scale up capacity; whew, it's all working again! In the post-mortem there are lots of questions about why rate limiting didn't protect the system. Unfortunately, the rate limit values required to restore the saturated system are far too low for normal usage, and the values needed for normal operation are too high to prevent the system from getting saturated.

Fundamentally, there's really no way for rate limiting (which only understands incoming load) to balance the equation load <= capacity. For that, we need a back-pressure mechanism like circuit breaking, concurrency limiting, or adaptive request queueing. Fortunately, rate limiting and back-pressure work well together and don't have to know about each other.
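
As a rough illustration, here's a minimal concurrency-limiting sketch, assuming an asyncio service; reject_with_503 and process are hypothetical stand-ins:

    import asyncio

    MAX_IN_FLIGHT = 100  # tied to measured backend capacity, not policy

    slots = asyncio.Semaphore(MAX_IN_FLIGHT)

    async def handle(request):
        # Bounds in-flight work rather than arrival rate: when the
        # database slows down, requests hold their slot longer, fewer
        # new ones are admitted, and the excess is shed at the door
        # before it can amplify into retries and queues.
        if slots.locked():
            return reject_with_503(retry_after=1)  # hypothetical helper
        async with slots:
            return await process(request)          # hypothetical helper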


A rate limit is most often some arbitrarily configured static value, e.g. each org can make 10 req/s. It's much harder to configure rate limits against some dynamic value like system load, so most go with the static approach.

Like the OP said, this doesn't protect you from going over your system capacity: you can still have 10 million orgs all requesting 10 req/s, which can take down your system while abiding by your rate limits.
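
For illustration, the static approach often boils down to something like this token bucket sketch, where RATE is a hard-coded policy number with no connection to actual backend capacity:

    import time

    RATE = 10.0   # req/s per org: a policy number, not a capacity number
    BURST = 20.0  # tokens an idle org can accumulate

    buckets: dict = {}  # org_id -> (tokens, last_seen)

    def allow(org_id: str) -> bool:
        now = time.monotonic()
        tokens, last = buckets.get(org_id, (BURST, now))
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(BURST, tokens + (now - last) * RATE)
        allowed = tokens >= 1.0
        buckets[org_id] = (tokens - 1.0 if allowed else tokens, now)
        return allowed

Nothing in allow() knows whether the backend is healthy, which is exactly the gap the OP is describing.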


Maybe that per-customer rate limits don't guarantee the whole backend won't go over capacity? Though I guess many APIs will have global rate limits as well for these cases.


> protect against backend capacity issues

That's our primary use case, so I am also curious to hear more.


Answered above!


Great advice!

Ideally, you can provide isolation between users on the same "tier" so that no one user can crowd out others.
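
One way to sketch that isolation, again assuming an asyncio service (limits and helper names are illustrative): nest a per-user cap inside the tier-wide cap.

    import asyncio
    from collections import defaultdict

    TIER_SLOTS = 100     # illustrative: shared capacity for the tier
    PER_USER_SLOTS = 10  # no single user may hold more than this

    tier = asyncio.Semaphore(TIER_SLOTS)
    per_user = defaultdict(lambda: asyncio.Semaphore(PER_USER_SLOTS))

    async def handle(user_id: str, request):
        # The per-user cap is checked first, so a noisy neighbor runs
        # out of its own slots long before it can starve the tier.
        if per_user[user_id].locked():
            return reject_429()                # hypothetical helper
        async with per_user[user_id]:
            async with tier:
                return await process(request)  # hypothetical helper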



