Thinking this could be useful in a multi-tenant service where you need to fairly allocate job processing capacity across tenants among a number of background workers (e.g. data export API requests, encoding requests, etc.).
That was my first thought as well. However, in a lot of real-world cases what matters is not the frequency of requests but the duration of the jobs. For instance, one client might request a job that takes minutes or hours to complete, while another's requests may only take a couple of seconds. I don't think this library handles such cases.
Lots of heuristics continue to work pretty well as long as the smallest and largest costs are within an order of magnitude of each other. It's one of the reasons why we break stories down to 1-10 business days. Anything bigger and the statistical characteristics begin to break down.
That said, it’s quite easy for a big job to exceed 50x the cost of the smallest job.
Defining a unit of processing (like duration or quantity) and then feeding the algorithm the equivalent units consumed (pre- or post-processing a request) might help.
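To sketch what that could look like: a per-tenant token bucket denominated in units of work rather than requests, with an admission check before the job and a settlement charge after it. This is purely illustrative; the `CostBucket` name, its API, and the choice of seconds as the unit are assumptions, not anything from the library under discussion:

```python
import time
from collections import defaultdict

class CostBucket:
    """Per-tenant token bucket charged in units of work (e.g. seconds of
    processing) rather than in requests. Hypothetical sketch."""

    def __init__(self, capacity_units: float, refill_units_per_sec: float):
        self.capacity = capacity_units
        self.rate = refill_units_per_sec
        self._level = defaultdict(lambda: capacity_units)  # tenant -> budget left
        self._stamp = defaultdict(time.monotonic)          # tenant -> last refill

    def _refill(self, tenant: str) -> None:
        now = time.monotonic()
        elapsed = now - self._stamp[tenant]
        self._level[tenant] = min(self.capacity,
                                  self._level[tenant] + elapsed * self.rate)
        self._stamp[tenant] = now

    def try_admit(self, tenant: str, estimated_units: float) -> bool:
        """Pre-processing: admit only if the tenant's budget covers the estimate."""
        self._refill(tenant)
        return self._level[tenant] >= estimated_units

    def charge(self, tenant: str, actual_units: float) -> None:
        """Post-processing: settle what the job actually cost. The budget may
        go negative, so a long job is repaid via refill before new work runs."""
        self._refill(tenant)
        self._level[tenant] -= actual_units
```

Usage would look something like:

```python
bucket = CostBucket(capacity_units=60.0, refill_units_per_sec=1.0)  # ~1 min of work per min
if bucket.try_admit("tenant-a", estimated_units=5.0):
    started = time.monotonic()
    # ... run the job ...
    bucket.charge("tenant-a", actual_units=time.monotonic() - started)
```

The post-processing charge is what makes the hours-long job and the two-second job comparable: both draw down the same budget, just by very different amounts.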
To mitigate this case you could limit capacity in terms of concurrency instead of request rate. Basically it would be like a fairly-acquired semaphore.
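A rough sketch of what such a fairly-acquired semaphore could look like, assuming a single-process asyncio worker pool (the `FairSemaphore` name and API are made up for illustration, not an existing library):

```python
import asyncio
from collections import deque

class FairSemaphore:
    """Global pool of N slots whose waiters are granted slots round-robin
    across tenants, so one tenant queueing thousands of jobs cannot starve
    the others. Minimal single-process sketch, not production code."""

    def __init__(self, slots: int):
        self._free = slots
        self._waiters = {}        # tenant -> deque of futures awaiting a slot
        self._rotation = deque()  # tenants that currently have waiters

    async def acquire(self, tenant: str) -> None:
        if self._free > 0 and not self._rotation:
            self._free -= 1       # fast path: free slot and nobody queued
            return
        fut = asyncio.get_running_loop().create_future()
        if tenant not in self._waiters:
            self._waiters[tenant] = deque()
            self._rotation.append(tenant)
        self._waiters[tenant].append(fut)
        await fut                 # resolved by release() when it's our turn

    def release(self) -> None:
        while self._rotation:
            tenant = self._rotation.popleft()
            fut = self._waiters[tenant].popleft()
            if self._waiters[tenant]:
                self._rotation.append(tenant)  # still waiting: back of the line
            else:
                del self._waiters[tenant]
            if not fut.cancelled():
                fut.set_result(None)  # hand the slot directly to this waiter
                return
        self._free += 1           # nobody waiting: return the slot to the pool
```

A caller would wrap each job in acquire/release:

```python
sem = FairSemaphore(slots=8)

async def handle(tenant: str, job) -> None:
    await sem.acquire(tenant)
    try:
        await job()
    finally:
        sem.release()
```

The nice property is that a slot is held for the job's actual duration, so long jobs automatically consume more capacity than short ones without anyone having to estimate costs up front.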
I believe nginx+ has a feature that does max-conns by IP address. It's a similar solution to what you describe. Of course, that falls down with respect to fairness when fan-out means a request's cost isn't proportional to its response time.
The article suggests a method for managing GPU or otherwise rate-limited resources across multiple clients. It highlights the problem of spiky workloads, where one client might generate a large burst of events (e.g. from a CSV upload) and starve the others, and it advises against naive solutions like a single FIFO queue, which would disadvantage clients with steady live traffic.
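For contrast with FIFO, the usual fair-queuing baseline is one queue per tenant, drained round-robin. A minimal illustration (class name hypothetical, and simpler than what the article is proposing):

```python
from collections import defaultdict, deque

class RoundRobinQueue:
    """One queue per tenant, dequeued round-robin: a tenant that dumps
    100k events (e.g. a CSV upload) only delays its own backlog."""

    def __init__(self):
        self._queues = defaultdict(deque)  # tenant -> pending jobs
        self._active = deque()             # tenants with pending jobs

    def push(self, tenant, job):
        if not self._queues[tenant]:
            self._active.append(tenant)
        self._queues[tenant].append(job)

    def pop(self):
        if not self._active:
            return None
        tenant = self._active.popleft()
        job = self._queues[tenant].popleft()
        if self._queues[tenant]:
            self._active.append(tenant)    # still has work: back of the rotation
        return tenant, job
```

With this shape, the steady live-traffic client gets a slot on every rotation regardless of how deep any other tenant's backlog is.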
Rate limiters are used to protect servers from overload and to prevent attackers (or even legitimate but unintentionally greedy tenants) from starving other tenants of resources. They are a key component of a resilient distributed system.