May 24, 2024 (last modified May 28, 2025)
Notes on rate limiting and related algorithms
Rate limiting protects your services from overuse or misuse caused by an excessive number of requests. Rate limiters are often part of an API gateway, where the gateway configuration specifies a transactions-per-second (TPS) limit for each route.
Rate limiters can also be implemented in the application itself, in service middleware, or through cloud services.
Why Implement Rate Limiting?
- Prevent Resource Starvation: Protect servers from being overwhelmed
- Security: Mitigate brute force attacks and DDoS attempts
- Fair Usage: Ensure equitable access for all users
- Cost Control: Manage operational costs by limiting excessive API usage
- Traffic Shaping: Smooth out request spikes
Rate limiting algorithms
- Token bucket
- Leaky bucket
- Fixed window
- Sliding window (log and counter variants)
- GCRA (generic cell rate algorithm)
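To make the first of these concrete, here is a minimal token bucket sketch. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for illustration; this is a single-process sketch, not a production or distributed implementation.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        # Credit tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=1` and `capacity=2`, two back-to-back requests pass, a third is rejected, and one more passes after a second has elapsed. The capacity sets the burst size, while the rate sets the long-run average.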
Comparing all algorithms
Algorithm | When/Where Used | Key Nuance | Pros | Cons |
---|---|---|---|---|
Fixed Window Counter | Simple APIs, quick implementations | Resets counter abruptly at window end | | |
Token Bucket | APIs needing burst handling (e.g., payment gateways) | Refills tokens gradually but allows bursts up to capacity | | |
Leaky Bucket | Network traffic shaping, smooth output systems | Converts bursts into steady stream via queue | | |
Sliding Window Log | High-precision systems (e.g., banking APIs) | Tracks exact timestamp of every request | | |
Sliding Window Counter | General-purpose APIs (best balance) | Hybrid: weights current + previous window counts | | |
GCRA | Telecom systems, ATM networks | Uses "theoretical arrival time" calculations | | |
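The "weights current + previous window counts" idea behind the sliding window counter can be sketched as follows. The class name `SlidingWindowCounter` and its parameters are illustrative assumptions, and the estimate assumes requests in the previous window were spread evenly, which is the usual approximation for this approach.

```python
import time


class SlidingWindowCounter:
    """Approximate a sliding window using two fixed-window counters."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit      # max requests per sliding window
        self.window = window    # window length in seconds
        self.clock = clock      # injectable for testing
        self.current_index = 0
        self.current_count = 0
        self.previous_count = 0

    def allow(self):
        now = self.clock()
        index = int(now // self.window)
        if index != self.current_index:
            # Roll over: the old count only matters if its window is adjacent.
            adjacent = index == self.current_index + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_count = 0
            self.current_index = index
        # Fraction of the previous window still covered by the sliding window.
        overlap = 1.0 - (now % self.window) / self.window
        estimated = self.previous_count * overlap + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False
```

For example, with `limit=4` and `window=10`, four requests at t=0 fill the window. At t=15 half of the previous window still overlaps, so the estimate starts at 2 and only two more requests are admitted. A plain fixed window counter would have admitted four at that point.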
HTTP Headers for Rate Limiting
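Well-implemented APIs communicate rate limits through response headers. The X-RateLimit-* names are a widely used convention rather than a formal standard, so exact semantics vary between APIs:
- X-RateLimit-Limit: Maximum allowed requests in the window
- X-RateLimit-Remaining: Remaining requests in the current window
- X-RateLimit-Reset: Time when the limit resets (commonly a UTC Unix timestamp; some APIs send seconds until reset instead)
- Retry-After: How long to wait after being rate limited (a number of seconds or an HTTP date)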
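As a rough illustration of how a server might populate these headers, the sketch below builds a status code and header dictionary from a limiter's decision. The function name `rate_limit_response` and its parameters are hypothetical, and it assumes `X-RateLimit-Reset` carries a Unix timestamp in seconds and `Retry-After` a delay in seconds.

```python
import math
import time


def rate_limit_response(allowed, limit, remaining, reset_epoch, now=None):
    """Return (status, headers) describing the limiter's decision.

    `remaining` is the number of requests left after this decision and
    `reset_epoch` is when the window resets, in Unix seconds (assumption).
    """
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if allowed:
        return 200, headers
    # Round up so a client that waits exactly this long does not retry early.
    headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return 429, headers
```

Calling `rate_limit_response(False, 100, 0, 1060, now=1000.5)` yields status 429 with `Retry-After: 60`. An allowed request gets the three `X-RateLimit-*` headers and no `Retry-After`.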
Implementation best practices
- Return Proper Status Codes:
  - 429 Too Many Requests when rate limited
  - 503 Service Unavailable for severe throttling
- Provide Clear Documentation about your rate limits
- Implement Graceful Degradation instead of hard cuts
- Consider Tiered Limits for different user types
- Monitor and Adjust limits based on actual usage patterns
- Use Exponential Backoff in clients when retrying after being limited
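The last practice can be sketched on the client side as below. This is a minimal sketch under stated assumptions: `send` is a hypothetical callable that returns `(status, headers)`, the delay uses "full jitter" (a random value up to the exponential ceiling) to avoid synchronized retries, and a numeric `Retry-After` from the server takes precedence when it is longer.

```python
import random
import time


def backoff_delays(base=0.5, cap=30.0, attempts=5, rng=random.random):
    """Yield one delay per retry: exponential growth, capped, with full jitter."""
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng() * ceiling


def call_with_retry(send, attempts=5, sleep=time.sleep):
    """Call send() until it stops returning 429 or retries are exhausted.

    `send` is a hypothetical callable returning (status, headers).
    """
    status, headers = send()
    for delay in backoff_delays(attempts=attempts):
        if status != 429:
            break
        retry_after = headers.get("Retry-After")
        # Honor the server's hint when it is a plain number of seconds.
        if retry_after is not None and retry_after.isdigit():
            delay = max(delay, float(retry_after))
        sleep(delay)
        status, headers = send()
    return status, headers
```

With `rng` fixed at 1.0, `base=1`, and `cap=5`, the delay ceilings are 1, 2, 4, and then 5 seconds once the cap applies. In real use the random factor spreads clients out so they do not all retry at the same instant.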