Rate Limiting Algorithms Explained

Every API needs rate limiting. Without it, a single misbehaving client can overwhelm your servers, a bot can scrape your entire database, or a billing exploit can cost you thousands. But not all rate limiters are equal - each algorithm has distinct tradeoffs in accuracy, memory usage, and burst handling.

This guide implements three production-grade algorithms from scratch, explains when to use each, and shows how to deploy them with Redis for distributed systems.

Why Rate Limiting Matters

Protection: Prevent DDoS attacks and resource exhaustion
Fairness: Ensure no single client monopolizes capacity
Cost Control: Limit expensive operations (AI API calls, database queries)
Compliance: Enforce contractual API usage limits per tier

Algorithm 1: Token Bucket

The token bucket is the most widely used algorithm (used by AWS, Stripe, and most API gateways). Imagine a bucket that holds tokens. Tokens are added at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected.

Key Properties

Allows bursts: A full bucket can handle a burst of requests up to the bucket capacity
Smooth refill: Tokens regenerate at a constant rate regardless of traffic
Simple and efficient: O(1) per request, minimal memory

import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        """
        capacity: maximum tokens in the bucket
        refill_rate: tokens added per second
        """
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill

        # Refill tokens based on elapsed time
        self.tokens = min(
            self.capacity,
            self.tokens + elapsed * self.refill_rate
        )
        self.last_refill = now

        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Usage: 10 requests per second, burst up to 20
limiter = TokenBucket(capacity=20, refill_rate=10)

for i in range(25):
    if limiter.allow_request():
        print(f"Request {i}: allowed")
    else:
        print(f"Request {i}: rate limited (429)")

Distributed Token Bucket with Redis

import redis
import time

r = redis.Redis()

def token_bucket_redis(key: str, capacity: int, refill_rate: float) -> bool:
    pipe = r.pipeline()
    now = time.time()

    # Atomic Lua script for thread safety
    lua_script = """
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    if tokens >= 1 then
        tokens = tokens - 1
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) + 1)
        return 1
    else
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) + 1)
        return 0
    end
    """
    result = r.eval(lua_script, 1, key, capacity, refill_rate, now)
    return result == 1

Algorithm 2: Sliding Window Log

The sliding window log tracks the exact timestamp of every request. To check if a new request is allowed, count all requests in the past N seconds. This is the most accurate algorithm but uses more memory.

Key Properties

Exact counting: No boundary issues - the window truly slides
No bursts at boundaries: Unlike fixed windows, you cannot get 2x the limit at a window edge
Higher memory: Stores a timestamp per request (O(n) where n is the limit)

import time
from collections import deque

class SlidingWindowLog:
    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests = deque()

    def allow_request(self) -> bool:
        now = time.monotonic()
        window_start = now - self.window_seconds

        # Remove expired timestamps
        while self.requests and self.requests[0] <= window_start:
            self.requests.popleft()

        if len(self.requests) < self.max_requests:
            self.requests.append(now)
            return True
        return False

# Usage: 100 requests per 60 seconds
limiter = SlidingWindowLog(max_requests=100, window_seconds=60)

Redis Implementation with Sorted Sets

def sliding_window_redis(key: str, max_requests: int, window_secs: int) -> bool:
    lua_script = """
    local key = KEYS[1]
    local max_req = tonumber(ARGV[1])
    local window = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    -- Remove expired entries
    redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

    -- Count current requests
    local count = redis.call('ZCARD', key)

    if count < max_req then
        redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
        redis.call('EXPIRE', key, window)
        return 1
    end
    return 0
    """
    result = r.eval(lua_script, 1, key, max_requests, window_secs, time.time())
    return result == 1

Algorithm 3: Leaky Bucket

The leaky bucket processes requests at a fixed rate, like water dripping from a bucket with a hole. Incoming requests are queued. If the queue is full, new requests are rejected. This produces the smoothest output rate.

Key Properties

Constant output rate: Requests are processed at exactly the configured rate - no bursts
Queue-based: Excess requests wait in a queue rather than being immediately rejected
Best for: APIs that call expensive downstream services at a fixed rate

import time
import threading
from collections import deque

class LeakyBucket:
    def __init__(self, capacity: int, leak_rate: float):
        """
        capacity: max queued requests
        leak_rate: requests processed per second
        """
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.queue = deque()
        self.lock = threading.Lock()
        self.last_leak = time.monotonic()

    def allow_request(self) -> bool:
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last_leak

            # Drain processed requests
            leaked = int(elapsed * self.leak_rate)
            if leaked > 0:
                for _ in range(min(leaked, len(self.queue))):
                    self.queue.popleft()
                self.last_leak = now

            if len(self.queue) < self.capacity:
                self.queue.append(now)
                return True
            return False

# Usage: queue up to 50 requests, process 10/second
limiter = LeakyBucket(capacity=50, leak_rate=10)

Algorithm Comparison

Feature	Token Bucket	Sliding Window	Leaky Bucket
Burst Handling	Allows controlled bursts	No bursts	No bursts (queued)
Accuracy	Good	Exact	Exact output rate
Memory	O(1)	O(n) per client	O(queue size)
Complexity	Simple	Moderate	Moderate
Best For	General API limiting	Strict per-client limits	Fixed-rate downstream calls
Used By	AWS, Stripe, nginx	Custom implementations	Network traffic shaping

HTTP Response Headers

Always communicate rate limit status to clients via standard headers:

# Successful response
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 67
X-RateLimit-Reset: 1714233600

# Rate limited response
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714233600
Retry-After: 32

Django Middleware Example

from django.http import JsonResponse

class RateLimitMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        client_key = self.get_client_key(request)

        if not token_bucket_redis(f"rate:{client_key}", capacity=100, refill_rate=1.67):
            return JsonResponse(
                {"error": "Rate limit exceeded. Try again later."},
                status=429,
                headers={"Retry-After": "60"}
            )

        return self.get_response(request)

    def get_client_key(self, request):
        # Prefer API key, fall back to IP
        api_key = request.headers.get("X-API-Key")
        if api_key:
            return f"api:{api_key}"
        return f"ip:{request.META.get('REMOTE_ADDR')}"

Key Takeaways

Token bucket is the default choice - simple, efficient, handles bursts gracefully
Sliding window log is most accurate - use when strict per-client limits matter more than memory
Leaky bucket smooths output rate - ideal when downstream services need constant throughput
Use Redis Lua scripts for atomic, distributed rate limiting across multiple servers
Always return rate limit headers so clients can self-throttle
Rate limit by API key first, IP second - IP-based limiting breaks behind shared proxies

Rate limiting is not just a security feature - it is a reliability feature. A well-implemented rate limiter protects your servers, your budget, and your users from each other. Pick the right algorithm for your use case, implement it with Redis for distribution, and always communicate limits clearly to your clients.

Rate Limiting Algorithms Explained: Token Bucket, Sliding Window, and Leaky Bucket

Why Rate Limiting Matters

Algorithm 1: Token Bucket

Key Properties

Distributed Token Bucket with Redis

Algorithm 2: Sliding Window Log

Key Properties

Redis Implementation with Sorted Sets

Algorithm 3: Leaky Bucket

Key Properties

Algorithm Comparison

HTTP Response Headers

Django Middleware Example

Key Takeaways

Stuck on implementation?

Related Production Resources

Free learning tracks

Interactive engineering labs

Production cheatsheets

Key terms

Discussion

Discussion is unavailable

Why Rate Limiting Matters

Algorithm 1: Token Bucket

Key Properties

Distributed Token Bucket with Redis

Algorithm 2: Sliding Window Log

Key Properties

Redis Implementation with Sorted Sets

Algorithm 3: Leaky Bucket

Key Properties

Algorithm Comparison

HTTP Response Headers

Django Middleware Example

Key Takeaways

Stuck on implementation?

Related Production Resources

Free learning tracks

Interactive engineering labs

Production cheatsheets

Key terms

Discussion

Discussion is unavailable

Continue Reading

OAuth2 Private Key JWT: Build Client Authentication Without Shared Secrets

OIDC Workload Federation: Build Secretless Service Access