May 24, 2024 · Last modified May 28, 2025

Notes on rate limiting and related algorithms

Rate limiting is a mechanism that protects your services from being overused or misused through an excessive number of requests. Rate limiters are often part of an API gateway, whose configuration specifies the TPS (transactions per second) limits for each route.

Rate limiters can also be implemented in your application code, in service middleware, or through managed cloud services.

Why Implement Rate Limiting?

  • Prevent Resource Starvation: Protect servers from being overwhelmed
  • Security: Mitigate brute force attacks and DDoS attempts
  • Fair Usage: Ensure equitable access for all users
  • Cost Control: Manage operational costs by limiting excessive API usage
  • Traffic Shaping: Smooth out request spikes

Rate limiting algorithms

  1. Token bucket
const DEFAULT_BUCKET_CAPACITY = 10;
const DEFAULT_REFILL_INTERVAL = 1000; // ms
const DEFAULT_REFILL_RATE = 2; // tokens added per interval

class TokenBucketRL {
  constructor(options = {}) {
    this.bucketCapacity = options.bucketCapacity ?? DEFAULT_BUCKET_CAPACITY;
    this.refillInterval = options.refillInterval ?? DEFAULT_REFILL_INTERVAL;
    this.refillRate = options.refillRate ?? DEFAULT_REFILL_RATE;
    this.callers = new Map(); // one bucket per caller
  }

  _refillBucket(callerBucket, currentTime) {
    const durationSinceLastRefill = currentTime - callerBucket.lastRefillTime;
    const intervalsSinceLastRefill = Math.floor(durationSinceLastRefill / this.refillInterval);

    if (intervalsSinceLastRefill > 0) {
      callerBucket.tokens = Math.min(
        this.bucketCapacity,
        callerBucket.tokens + intervalsSinceLastRefill * this.refillRate
      );
      callerBucket.lastRefillTime += intervalsSinceLastRefill * this.refillInterval;
    }
  }

  shouldAllowRequest(callerId) {
    const currentTime = Date.now();

    // case: new caller starts with a full bucket
    if (!this.callers.has(callerId)) {
      this.callers.set(callerId, {
        tokens: this.bucketCapacity,
        lastRefillTime: currentTime,
      });
    }

    // case: existing caller — refill, then try to spend a token
    const callerBucket = this.callers.get(callerId);
    this._refillBucket(callerBucket, currentTime);

    if (callerBucket.tokens > 0) {
      callerBucket.tokens -= 1;
      return true; // allow request
    }

    return false;
  }
}

// Allow bursts of 10 requests, with 2 tokens added every second
const limiter = new TokenBucketRL();

// Simulate a burst of requests
console.log("Burst test:");
for (let i = 0; i < 12; i++) {
  console.log(`Request ${i + 1}: ${limiter.shouldAllowRequest("user123") ? "Allowed" : "Rate limited"}`);
}

// Simulate waiting and trying again
setTimeout(() => {
  console.log("\nAfter 1 second:");
  for (let i = 0; i < 3; i++) {
    console.log(`Request ${i + 1}: ${limiter.shouldAllowRequest("user123") ? "Allowed" : "Rate limited"}`);
  }
}, 1000);
  2. Leaky bucket
const BUCKET_CAPACITY = 10;
const DRIP_RATE = 2; // requests processed per drip interval
const DRIP_INTERVAL = 2000; // ms

class LeakyBucketRL {
  constructor(options = {}) {
    this.bucketCapacity = options.bucketCapacity ?? BUCKET_CAPACITY;
    this.dripRate = options.dripRate ?? DRIP_RATE;
    this.dripInterval = options.dripInterval ?? DRIP_INTERVAL;

    this.callers = new Map();
  }

  _drip(callerBucket, currentTime) {
    const passedDuration = currentTime - callerBucket.lastDripTime;
    const intervalsPassed = Math.floor(passedDuration / this.dripInterval);

    if (intervalsPassed > 0) {
      // process (drip) this many queued requests
      const requestsToDrip = intervalsPassed * this.dripRate;
      callerBucket.requests = callerBucket.requests.slice(requestsToDrip);
      callerBucket.lastDripTime += intervalsPassed * this.dripInterval;
    }
  }

  allowRequest(callerId) {
    if (!this.callers.has(callerId)) {
      this.callers.set(callerId, {
        requests: [],
        lastDripTime: Date.now(),
      });
    }

    const callerBucket = this.callers.get(callerId);
    this._drip(callerBucket, Date.now());

    // enqueue the request only if the bucket has room
    if (callerBucket.requests.length < this.bucketCapacity) {
      callerBucket.requests.push(Date.now());
      return true;
    }

    return false;
  }
}
  3. Fixed window
const FIXED_WINDOW_SIZE = 5000; // ms
const MAX_ALLOWED_REQUESTS = 5;

class FixedWindowRL {
  constructor(options = {}) {
    this.windowSize = options.windowSize ?? FIXED_WINDOW_SIZE;
    this.maxAllowedRequests = options.maxAllowedRequests ?? MAX_ALLOWED_REQUESTS;
    this.tracker = new Map();
  }

  allowUserRequest(userId) {
    const record = this.tracker.get(userId);
    const currentTime = Date.now();

    // new user, or the time window has expired: start a fresh window
    if (!record || currentTime - record.startTime >= this.windowSize) {
      this.tracker.set(userId, {
        count: 1,
        startTime: currentTime,
      });
      return true;
    }

    // request limit for this window reached
    if (record.count >= this.maxAllowedRequests) {
      return false;
    }

    record.count++;
    return true;
  }
}

// Allow 5 requests per minute
const limiter = new FixedWindowRL({ windowSize: 60000, maxAllowedRequests: 5 });

// Simulate requests
for (let i = 0; i < 6; i++) {
  const allowed = limiter.allowUserRequest("user123");
  console.log(`Request ${i + 1}: ${allowed ? "Allowed" : "Rate limited"}`);
}
  4. Sliding window (log)
const DEFAULT_WINDOW_SIZE = 10000; // ms
const DEFAULT_MAX_REQUESTS = 10;

class SlidingWindowRL {
  constructor(options = {}) {
    this.msWindowSize = options.msWindowSize ?? DEFAULT_WINDOW_SIZE;
    this.maxRequests = options.maxRequests ?? DEFAULT_MAX_REQUESTS;
    this.callers = new Map();
  }

  shouldAllowRequest(caller) {
    // case - first request from caller: start an empty log
    // (the request itself is recorded below, so it isn't counted twice)
    if (!this.callers.has(caller)) {
      this.callers.set(caller, {
        requests: [],
      });
    }

    const callerRequestsLog = this.callers.get(caller);

    // clean up requests older than the current time window
    const windowStart = Date.now() - this.msWindowSize;
    callerRequestsLog.requests = callerRequestsLog.requests.filter(ts => ts >= windowStart);

    if (callerRequestsLog.requests.length < this.maxRequests) {
      callerRequestsLog.requests.push(Date.now());
      return true;
    }

    return false;
  }
}

Comparing all algorithms

| Algorithm | When/Where Used | Key Nuance | Pros | Cons |
| --- | --- | --- | --- | --- |
| Fixed Window Counter | Simple APIs, quick implementations | Resets counter abruptly at window end | Simple to implement; low memory usage | Allows bursts at window edges; less precise |
| Token Bucket | APIs needing burst handling (e.g., payment gateways) | Refills tokens gradually but allows bursts up to capacity | Allows controlled bursts; flexible rate adjustment | More complex than fixed window; still permits some bursts |
| Leaky Bucket | Network traffic shaping, smooth output systems | Converts bursts into a steady stream via a queue | Ensures smooth output rate; prevents all bursts | Queue memory overhead; stricter than necessary for some use cases |
| Sliding Window Log | High-precision systems (e.g., banking APIs) | Tracks exact timestamp of every request | Most accurate limiting; no edge-case bursts | High memory usage; CPU-intensive cleanup |
| Sliding Window Counter | General-purpose APIs (best balance) | Hybrid: weights current + previous window counts | Good precision with low memory; no sudden resets | Slightly more complex than fixed window; still approximates |
| GCRA | Telecom systems, ATM networks | Uses "theoretical arrival time" calculations | Extremely precise; good for cell-based systems | Complex to implement; overkill for most web APIs |
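The sliding window counter from the table is the one algorithm not implemented above. A minimal sketch in the style of the earlier examples (the class and option names are mine): it keeps only two counters per caller and weights the previous window's count by how much of it still overlaps the sliding window.

```javascript
const SWC_WINDOW_SIZE = 10000; // ms
const SWC_MAX_REQUESTS = 10;

class SlidingWindowCounterRL {
  constructor(options = {}) {
    this.windowSize = options.windowSize ?? SWC_WINDOW_SIZE;
    this.maxRequests = options.maxRequests ?? SWC_MAX_REQUESTS;
    this.callers = new Map(); // callerId -> { windowStart, currentCount, previousCount }
  }

  shouldAllowRequest(callerId, now = Date.now()) {
    const windowStart = Math.floor(now / this.windowSize) * this.windowSize;

    let record = this.callers.get(callerId);
    if (!record) {
      record = { windowStart, currentCount: 0, previousCount: 0 };
      this.callers.set(callerId, record);
    }

    // roll the counters forward if we've crossed a window boundary
    if (windowStart > record.windowStart) {
      const windowsElapsed = (windowStart - record.windowStart) / this.windowSize;
      record.previousCount = windowsElapsed === 1 ? record.currentCount : 0;
      record.currentCount = 0;
      record.windowStart = windowStart;
    }

    // weight the previous window by its remaining overlap with the sliding window
    const elapsedFraction = (now - windowStart) / this.windowSize;
    const weighted = record.previousCount * (1 - elapsedFraction) + record.currentCount;

    if (weighted < this.maxRequests) {
      record.currentCount += 1;
      return true;
    }
    return false;
  }
}
```

Compared to the sliding window log above, this stores three numbers per caller instead of one timestamp per request, which is where the "good precision with low memory" trade-off comes from.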

HTTP Headers for Rate Limiting

Well-implemented APIs communicate rate limits through headers:

  • X-RateLimit-Limit: Maximum allowed requests
  • X-RateLimit-Remaining: Remaining requests in window
  • X-RateLimit-Reset: Time when the limit resets (typically a Unix timestamp)
  • Retry-After: How long to wait after being rate limited
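For illustration, a hypothetical helper (the function name and shape are mine, not a standard API) that assembles these headers for a response:

```javascript
// Hypothetical helper: builds rate-limit headers for an HTTP response.
// `resetAt` is assumed to be a Unix timestamp in seconds.
function rateLimitHeaders({ limit, remaining, resetAt, retryAfterSeconds }) {
  const headers = {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(Math.max(0, remaining)),
    "X-RateLimit-Reset": String(resetAt),
  };
  // Retry-After only makes sense when the caller is actually limited
  if (remaining <= 0 && retryAfterSeconds !== undefined) {
    headers["Retry-After"] = String(retryAfterSeconds);
  }
  return headers;
}
```

With Node's built-in http server this could be used as `res.writeHead(429, rateLimitHeaders({ ... }))`.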

Implementation best practices

  1. Return Proper Status Codes:
  • 429 Too Many Requests when rate limited
  • 503 Service Unavailable for severe throttling
  2. Provide Clear Documentation about your rate limits
  3. Implement Graceful Degradation instead of hard cuts
  4. Consider Tiered Limits for different user types
  5. Monitor and Adjust limits based on actual usage patterns
  6. Use Exponential Backoff for clients to retry after being limited
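On the client side, exponential backoff might be sketched as follows (function names, base delay, and cap are illustrative choices; the jitter follows the common "full jitter" pattern):

```javascript
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30000;

// Exponential backoff with full jitter: the delay doubles per attempt,
// is capped, and a random point in [0, delay) is picked to avoid
// synchronized retries (the "thundering herd" problem).
function backoffDelay(attempt, random = Math.random) {
  const exponential = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  return Math.floor(random() * exponential);
}

async function fetchWithBackoff(doRequest, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await doRequest();
    if (response.status !== 429) return response;

    // honor Retry-After if the server sent one, otherwise back off
    const retryAfter = Number(response.headers?.["retry-after"]);
    const delayMs = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter * 1000
      : backoffDelay(attempt);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("Rate limited: retries exhausted");
}
```

`doRequest` stands in for whatever HTTP call the client makes; the key points are respecting the server's Retry-After header when present and jittering the fallback delay.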