Below, I'm going to provide a deep dive into rate limiting algorithms and how we selected the right algorithm for rate limiting with Dragonfly. If you want to jump to the section on using this API, click here.
At my previous company, we faced the challenge of handling over 300K incoming requests per second. Although we successfully constructed a stateless auto-scaling system that could handle the volume, there were some backend services that struggled with the load. Consequently, we developed a rate limiter to manage the traffic to these services. Despite several iterations and multiple post-mortem analyses, our solution remained imperfect. In reality, creating a reliable rate limiter for high throughput workloads is a significant challenge.
This problem was not unique to our team. Rate limiters are used in a variety of use cases to regulate the rate of executing actions or accessing resources. Common use cases include:
As developers, we all need to use or build a rate limiter at some point in our careers. There are many algorithms to choose from, so let's look at some of the most common ones first.
Each algorithm fits a slightly different use case. In the case of Dragonfly, we were looking for an algorithm versatile enough to cover a wide range of use cases while requiring low computational and memory overhead. We decided to implement a variation of the leaky bucket algorithm called the "Generic Cell Rate Algorithm" (GCRA). Now, let's delve into the specifics of this algorithm.
Although its name can be confusing ("Cell Rate"), GCRA is a form of leaky bucket algorithm that was used to regulate traffic in a communication technology called Asynchronous Transfer Mode (ATM). In ATM, data was split into fixed-size packets called "cells" (hence the name). GCRA was used by the ATM network's scheduler to control the flow of cells. If too many cells came in too quickly, GCRA would delay or drop them.
In GCRA, the “emission interval” (T) represents the rate at which the bucket is drained. Each request has a "cost", a multiplier of the emission interval that determines how much time is added to the bucket. GCRA uses a single time variable called the “theoretical arrival time” (TAT) to mark the next time a request can arrive without being rejected.
TAT is seeded on the first request as the arrival time of the request plus the request cost. When a subsequent request arrives, we calculate the earliest allowed arrival time by subtracting the burst capacity (τ + T) from TAT. If the earliest arrival time is in the past, we allow the request; otherwise, the request is rejected. After a successful request, a new TAT is calculated by adding the request cost (filling the bucket).
The textual description is admittedly dense, so let's look at a visual abstraction.
As you can see, the algorithm requires storing a single variable (TAT) and performing only a few simple arithmetic operations. This makes it perfect for in-memory data stores. Now let's see how it is implemented in Dragonfly.
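The steps above can also be expressed as a short, self-contained sketch. This is a simplified single-key model for illustration, not Dragonfly's actual implementation; note that, as in redis-cell, a bucket with capacity N admits N + 1 requests in an instantaneous burst:

```python
class GCRA:
    """Simplified GCRA rate limiter (illustrative model, not Dragonfly's code)."""

    def __init__(self, capacity: int, rate: int, period: float):
        self.T = period / rate        # emission interval: seconds per unit
        self.tau = capacity * self.T  # burst tolerance derived from capacity
        self.tat = None               # theoretical arrival time (seconds)

    def allow(self, now: float, cost: int = 1) -> bool:
        if self.tat is None:
            self.tat = now            # seed TAT on the first request
        if now < self.tat - self.tau:
            return False              # bucket is full: reject
        # Fill the bucket: push TAT forward by the request cost.
        self.tat = max(now, self.tat) + cost * self.T
        return True
```

With capacity=0 and a rate of 120 units per 60 seconds, this model admits exactly one request every 0.5 seconds; with a non-zero capacity, it additionally absorbs an initial burst.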
Dragonfly is an in-memory datastore that scales vertically to support millions of operations per second and terabyte-sized workloads on a single instance. It is fully compatible with the Redis ecosystem and requires no code changes to implement. Using Dragonfly's CL.THROTTLE command (shout out to the redis-cell project that originally added this command for Redis), you can run various rate limiter configurations to best suit your use case.
```
CL.THROTTLE emailGW 20 120 60 1
               ▲    ▲   ▲  ▲  ▲
               |    |   |  |  └── apply 1 token
               |    |   └──┴───── 120 tokens / 60 seconds
               |    └──────────── capacity of 20
               └───────────────── key "emailGW"
```
Let's take a look at each of the parameters in the CL.THROTTLE emailGW 20 120 60 1 command. emailGW is a bucket with a capacity of 20, leaking at a rate of 120 units per 60 seconds (2 units per second). A single token is applied against the rate limit of the key emailGW.
- Key: The key name (emailGW) identifies the rate limiter in Dragonfly. It can be any string you choose and is used to store the state of the rate limiter.
- Capacity: The bucket capacity, i.e., the maximum number of units allowed in a time window. In this case, 20.
- Leak rate: The rate at which the bucket leaks, i.e., the number of units (requests) leaking from the bucket in every time window. In this case, 120 units per 60 seconds (two units every second).
- Time window: The time window for the rate limiter. In this case, 60 seconds.
- Apply unit: The cost of the request. In this case, a single unit (1) is applied against the rate limit of the key. If omitted, the default value is 1.
```
127.0.0.1:6379> CL.THROTTLE emailGW 20 120 60 1
1) (integer) 0  ──── limited?
2) (integer) 21 ──── RateLimit-Limit
3) (integer) 20 ──── RateLimit-Remaining
4) (integer) -1 ──── Retry-After
5) (integer) 0  ──── RateLimit-Reset
```
Let's look at the meaning of each item in the result array.
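To make those positional fields easier to work with in application code, you can map them to named values. Here is a minimal helper sketch; the field names follow the annotations in the output above, and the helper itself is illustrative, not part of any official client:

```python
def parse_throttle_reply(reply):
    """Map CL.THROTTLE's 5-element reply array to named fields."""
    limited, limit, remaining, retry_after, reset = reply
    return {
        "limited": bool(limited),     # 1 means the request was rejected
        "limit": limit,               # RateLimit-Limit (capacity + 1)
        "remaining": remaining,       # RateLimit-Remaining
        "retry_after": retry_after,   # seconds to wait; -1 if allowed
        "reset": reset,               # RateLimit-Reset (seconds)
    }

# Example with the reply shown above:
info = parse_throttle_reply([0, 21, 20, -1, 0])
```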
I assume you have a Dragonfly instance running. If not, please refer to the getting started guide to get Dragonfly running on your machine. Also, ensure you are connected to Dragonfly using the redis-cli tool.
Let's start with a simple example where you have an application that accepts new user registrations. Suppose, after each registration, you have to send a welcome email, where the email gateway has a 120 emails per minute limit.
A trivial use of CL.THROTTLE will look like this (with capacity = 0).
```
127.0.0.1:6379> CL.THROTTLE emailGW 0 120 60
1) (integer) 0
2) (integer) 1
3) (integer) 0
4) (integer) -1
5) (integer) 2
```
In this case, a request is allowed every 0.5 seconds, and any additional request within that 0.5-second interval is rejected. This is nice, but in most cases, emails don't arrive precisely every 0.5 seconds. This "discrete" behavior can be useful when the operation occupies the CPU for 0.5 seconds, but usually we would like smoother behavior. For this, we can use the capacity parameter.
The capacity parameter allows us to define the number of units the limiter will accept in a given window regardless of when the requests arrive. In the following example, we allow a burst of up to 10 simultaneous requests.
```
127.0.0.1:6379> CL.THROTTLE emailGW 10 120 60
1) (integer) 0
2) (integer) 11
3) (integer) 10
4) (integer) -1
5) (integer) 0
```
Note: In this configuration, we theoretically allow 130 requests per 60 seconds, as a bucket with a capacity of 10 leaks at a rate of 120 units per minute. To smooth this behavior, you can configure CL.THROTTLE emailGW 10 1200 600 (now it will allow a max of 121 requests per minute on average, instead of 130).
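We can sanity-check the 130-requests figure with a small simulation of the leaky-bucket logic described earlier. This is an illustrative model in integer milliseconds, not Dragonfly's code; like redis-cell, it admits capacity + 1 requests in the initial burst, matching the RateLimit-Limit of 11 shown above:

```python
# Flood a capacity-10, 120-per-60-seconds limiter with a request every
# 10 ms and count how many are allowed during the first minute.
T_MS = 60_000 // 120   # emission interval: 500 ms per unit
TAU_MS = 10 * T_MS     # burst tolerance for capacity 10: 5000 ms

tat = None
allowed = 0
for now in range(0, 60_000, 10):   # one attempt every 10 ms
    if tat is None:
        tat = now                  # seed TAT on the first request
    if now >= tat - TAU_MS:        # bucket has room: allow and fill
        tat = max(now, tat) + T_MS
        allowed += 1

print(allowed)  # 130: a burst of 11, then 2 per second for the rest
```

The total matches the note above: the burst and the steady leak together admit 130 requests in the first minute.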
With request cost
Not all requests are equal. Some take more resources than others. For example, an HTTP server can count a simple GET request as a single unit but a more complex "search" request as two units or more. The "apply units" parameter allows you to assign a cost factor to every request. In the next example, we apply a cost of 2 units to emails with attachments, so 120 plain emails could be sent per minute but only 60 emails with attachments (or, say, 90 plain emails and 15 with attachments).
```
127.0.0.1:6379> CL.THROTTLE emailGW 10 120 60 2
1) (integer) 0
2) (integer) 11
3) (integer) 9
4) (integer) -1
5) (integer) 1
```
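To check that the budget works out, here is a small simulation using the same simplified millisecond model as above (again illustrative, not Dragonfly's implementation): 90 plain emails and 15 emails with attachments add up to exactly 120 units, and when paced evenly over the minute, all of them are admitted:

```python
T_MS = 60_000 // 120    # emission interval: 500 ms per unit
TAU_MS = 10 * T_MS      # capacity 10 -> 5000 ms burst tolerance

# 90 cost-1 emails and 15 cost-2 emails = 120 units in total, each
# request arriving once its share of the minute has elapsed.
costs = [1] * 90 + [2] * 15
tat = None
units_sent = 0
allowed = 0
for cost in costs:
    now = units_sent * T_MS          # pace requests by units consumed
    if tat is None:
        tat = now
    if now >= tat - TAU_MS:          # room in the bucket: allow and fill
        tat = max(now, tat) + cost * T_MS
        allowed += 1
    units_sent += cost

print(allowed)  # all 105 requests fit within the 120-unit minute
```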
What if we keep the same key and change the other parameters?
Rate limiting parameters are provided with every call to CL.THROTTLE so that limits can easily be reconfigured on the fly. In most cases, you will use the exact same parameters (configuration) for a given key, but in some cases, like with the use of apply units, requests for the same key with different configurations make sense.
Rate limiters are an essential part of software, especially software that deals with heavy traffic, in order to regulate resource access and prevent overloading or spamming. We added CL.THROTTLE to Dragonfly to simplify the work of developers and allow you to create real-time applications easily while protecting your users and backend resources.
The IETF is currently publishing a standard for HTTP rate-limiting headers to be used with RESTful API servers, for example. So maybe in my next post, I will create a real example using the Dragonfly rate limiter for a RESTful API server. Stay tuned!