May 24, 2024 · Last modified May 28, 2025

Notes on rate limiting and related algorithms

Rate limiting is a mechanism that protects your services from being overused or misused through an excessive number of requests. Rate limiters are often part of an API gateway, whose configuration specifies the TPS (transactions per second) limits for each route.

Rate limiters can also be implemented in your application code, in service middleware, or through managed cloud services.

Why Implement Rate Limiting?

  • Prevent Resource Starvation: Protect servers from being overwhelmed
  • Security: Mitigate brute force attacks and DDoS attempts
  • Fair Usage: Ensure equitable access for all users
  • Cost Control: Manage operational costs by limiting excessive API usage
  • Traffic Shaping: Smooth out request spikes

Rate limiting algorithms

  1. Token bucket
const DEFAULT_BUCKET_CAPACITY = 10;
const DEFAULT_REFILL_INTERVAL = 1000; // ms
const DEFAULT_REFILL_RATE = 2; // tokens added per interval

class TokenBucketRL {
  constructor(options = {}) {
    this.bucketCapacity = options.bucketCapacity ?? DEFAULT_BUCKET_CAPACITY;
    this.refillInterval = options.refillInterval ?? DEFAULT_REFILL_INTERVAL;
    this.refillRate = options.refillRate ?? DEFAULT_REFILL_RATE;
    this.callers = new Map(); // one bucket per caller
  }

  _refillBucket(callerBucket, currentTime) {
    const durationSinceLastRefill = currentTime - callerBucket.lastRefillTime;
    const intervalsSinceLastRefill = Math.floor(durationSinceLastRefill / this.refillInterval);

    if (intervalsSinceLastRefill > 0) {
      callerBucket.tokens = Math.min(
        this.bucketCapacity,
        callerBucket.tokens + intervalsSinceLastRefill * this.refillRate
      );
      callerBucket.lastRefillTime += intervalsSinceLastRefill * this.refillInterval;
    }
  }

  shouldAllowRequest(callerId) {
    const currentTime = Date.now();

    // case: new caller starts with a full bucket
    if (!this.callers.has(callerId)) {
      this.callers.set(callerId, {
        tokens: this.bucketCapacity,
        lastRefillTime: currentTime,
      });
    }

    // case: existing caller — refill, then try to spend a token
    const callerBucket = this.callers.get(callerId);
    this._refillBucket(callerBucket, currentTime);

    if (callerBucket.tokens > 0) {
      callerBucket.tokens -= 1;
      return true; // allow request
    }

    return false;
  }
}

// Allow bursts of 10 requests, with 2 tokens added every second
const limiter = new TokenBucketRL();

// Simulate a burst of requests
console.log("Burst test:");
for (let i = 0; i < 12; i++) {
  console.log(`Request ${i + 1}: ${limiter.shouldAllowRequest("user123") ? "Allowed" : "Rate limited"}`);
}

// Simulate waiting and trying again
setTimeout(() => {
  console.log("\nAfter 1 second:");
  for (let i = 0; i < 3; i++) {
    console.log(`Request ${i + 1}: ${limiter.shouldAllowRequest("user123") ? "Allowed" : "Rate limited"}`);
  }
}, 1000);
  2. Leaky bucket
const BUCKET_CAPACITY = 10;
const DRIP_RATE = 2; // requests processed per drip interval
const DRIP_INTERVAL = 2000; // ms

class LeakyBucketRL {
  constructor(options = {}) {
    this.bucketCapacity = options.bucketCapacity ?? BUCKET_CAPACITY;
    this.dripRate = options.dripRate ?? DRIP_RATE;
    this.dripInterval = options.dripInterval ?? DRIP_INTERVAL;

    this.callers = new Map();
  }

  _drip(callerBucket, currentTime) {
    const passedDuration = currentTime - callerBucket.lastDripTime;
    const intervalsPassed = Math.floor(passedDuration / this.dripInterval);

    if (intervalsPassed > 0) {
      // process (drip) this many queued requests
      const requestsToDrip = intervalsPassed * this.dripRate;
      callerBucket.requests = callerBucket.requests.slice(requestsToDrip);
      callerBucket.lastDripTime += intervalsPassed * this.dripInterval;
    }
  }

  allowRequest(callerId) {
    if (!this.callers.has(callerId)) {
      this.callers.set(callerId, {
        requests: [],
        lastDripTime: Date.now(),
      });
    }

    const callerBucket = this.callers.get(callerId);
    this._drip(callerBucket, Date.now());

    // enqueue the request only if the bucket has room
    if (callerBucket.requests.length < this.bucketCapacity) {
      callerBucket.requests.push(Date.now());
      return true;
    }

    return false;
  }
}
  3. Fixed window
const FIXED_WINDOW_SIZE = 5000; // ms
const MAX_ALLOWED_REQUESTS = 5;

class FixedWindowRL {
  constructor(options = {}) {
    this.windowSize = options.windowSize ?? FIXED_WINDOW_SIZE;
    this.maxAllowedRequests = options.maxAllowedRequests ?? MAX_ALLOWED_REQUESTS;
    this.tracker = new Map();
  }

  allowUserRequest(userId) {
    const record = this.tracker.get(userId);
    const currentTime = Date.now();

    // new user, or the time window has expired: start a fresh window
    if (!record || currentTime - record.startTime >= this.windowSize) {
      this.tracker.set(userId, {
        count: 1,
        startTime: currentTime,
      });
      return true;
    }

    // request limit for this window reached
    if (record.count >= this.maxAllowedRequests) {
      return false;
    }

    record.count++;
    return true;
  }
}

// Allow 5 requests per minute
const limiter = new FixedWindowRL({ windowSize: 60000, maxAllowedRequests: 5 });

// Simulate requests
for (let i = 0; i < 6; i++) {
  const allowed = limiter.allowUserRequest("user123");
  console.log(`Request ${i + 1}: ${allowed ? "Allowed" : "Rate limited"}`);
}
  4. Sliding window (log)
const DEFAULT_WINDOW_SIZE = 10000; // ms
const DEFAULT_MAX_REQUESTS = 10;

class SlidingWindowRL {
  constructor(options = {}) {
    this.msWindowSize = options.msWindowSize ?? DEFAULT_WINDOW_SIZE;
    this.maxRequests = options.maxRequests ?? DEFAULT_MAX_REQUESTS;
    this.callers = new Map();
  }

  shouldAllowRequest(caller) {
    // case - first request from caller: start an empty log
    // (the request itself is recorded below, so it isn't counted twice)
    if (!this.callers.has(caller)) {
      this.callers.set(caller, {
        requests: [],
      });
    }

    const callerRequestsLog = this.callers.get(caller);

    // clean up requests older than the current time window
    const windowStart = Date.now() - this.msWindowSize;
    callerRequestsLog.requests = callerRequestsLog.requests.filter(ts => ts >= windowStart);

    if (callerRequestsLog.requests.length < this.maxRequests) {
      callerRequestsLog.requests.push(Date.now());
      return true;
    }

    return false;
  }
}

Comparing all algorithms

| Algorithm | When/Where Used | Key Nuance | Pros | Cons |
| --- | --- | --- | --- | --- |
| Fixed Window Counter | Simple APIs, quick implementations | Resets counter abruptly at window end | Simple to implement; low memory usage | Allows bursts at window edges; less precise |
| Token Bucket | APIs needing burst handling (e.g., payment gateways) | Refills tokens gradually but allows bursts up to capacity | Allows controlled bursts; flexible rate adjustment | More complex than fixed window; still permits some bursts |
| Leaky Bucket | Network traffic shaping, smooth output systems | Converts bursts into a steady stream via a queue | Ensures smooth output rate; prevents all bursts | Queue memory overhead; stricter than necessary for some use cases |
| Sliding Window Log | High-precision systems (e.g., banking APIs) | Tracks exact timestamp of every request | Most accurate limiting; no edge-case bursts | High memory usage; CPU-intensive cleanup |
| Sliding Window Counter | General-purpose APIs (best balance) | Hybrid: weights current + previous window counts | Good precision with low memory; no sudden resets | Slightly more complex than fixed window; still approximates |
| GCRA | Telecom systems, ATM networks | Uses "theoretical arrival time" calculations | Extremely precise; good for cell-based systems | Complex to implement; overkill for most web APIs |
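The sliding window counter from the table is the one algorithm not implemented above. A minimal sketch in the style of the earlier examples (the class and option names are mine): it keeps only two counters per caller and weights the previous window's count by how much of it still overlaps the sliding window.

```javascript
const SWC_WINDOW_SIZE = 10000; // ms
const SWC_MAX_REQUESTS = 10;

class SlidingWindowCounterRL {
  constructor(options = {}) {
    this.windowSize = options.windowSize ?? SWC_WINDOW_SIZE;
    this.maxRequests = options.maxRequests ?? SWC_MAX_REQUESTS;
    this.callers = new Map(); // callerId -> { windowStart, currentCount, previousCount }
  }

  shouldAllowRequest(callerId, now = Date.now()) {
    const windowStart = Math.floor(now / this.windowSize) * this.windowSize;

    let record = this.callers.get(callerId);
    if (!record) {
      record = { windowStart, currentCount: 0, previousCount: 0 };
      this.callers.set(callerId, record);
    }

    // roll the counters forward if we've crossed a window boundary
    if (windowStart > record.windowStart) {
      const windowsElapsed = (windowStart - record.windowStart) / this.windowSize;
      record.previousCount = windowsElapsed === 1 ? record.currentCount : 0;
      record.currentCount = 0;
      record.windowStart = windowStart;
    }

    // weight the previous window by its remaining overlap with the sliding window
    const elapsedFraction = (now - windowStart) / this.windowSize;
    const weighted = record.previousCount * (1 - elapsedFraction) + record.currentCount;

    if (weighted < this.maxRequests) {
      record.currentCount += 1;
      return true;
    }
    return false;
  }
}
```

Compared to the sliding window log above, this stores three numbers per caller instead of one timestamp per request, which is where the "good precision with low memory" trade-off comes from.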

HTTP Headers for Rate Limiting

Well-implemented APIs communicate rate limits through headers:

  • X-RateLimit-Limit: Maximum allowed requests
  • X-RateLimit-Remaining: Remaining requests in window
  • X-RateLimit-Reset: Time when the limit resets (typically a Unix timestamp)
  • Retry-After: How long to wait after being rate limited
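For illustration, a hypothetical helper (the function name and shape are mine, not a standard API) that assembles these headers for a response:

```javascript
// Hypothetical helper: builds rate-limit headers for an HTTP response.
// `resetAt` is assumed to be a Unix timestamp in seconds.
function rateLimitHeaders({ limit, remaining, resetAt, retryAfterSeconds }) {
  const headers = {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(Math.max(0, remaining)),
    "X-RateLimit-Reset": String(resetAt),
  };
  // Retry-After only makes sense when the caller is actually limited
  if (remaining <= 0 && retryAfterSeconds !== undefined) {
    headers["Retry-After"] = String(retryAfterSeconds);
  }
  return headers;
}
```

With Node's built-in http server this could be used as `res.writeHead(429, rateLimitHeaders({ ... }))`.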

Implementation best practices

  1. Return Proper Status Codes:
  • 429 Too Many Requests when rate limited
  • 503 Service Unavailable for severe throttling
  2. Provide Clear Documentation about your rate limits
  3. Implement Graceful Degradation instead of hard cuts
  4. Consider Tiered Limits for different user types
  5. Monitor and Adjust limits based on actual usage patterns
  6. Use Exponential Backoff for clients to retry after being limited
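On the client side, exponential backoff might be sketched as follows (function names, base delay, and cap are illustrative choices; the jitter follows the common "full jitter" pattern):

```javascript
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30000;

// Exponential backoff with full jitter: the delay doubles per attempt,
// is capped, and a random point in [0, delay) is picked to avoid
// synchronized retries (the "thundering herd" problem).
function backoffDelay(attempt, random = Math.random) {
  const exponential = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  return Math.floor(random() * exponential);
}

async function fetchWithBackoff(doRequest, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await doRequest();
    if (response.status !== 429) return response;

    // honor Retry-After if the server sent one, otherwise back off
    const retryAfter = Number(response.headers?.["retry-after"]);
    const delayMs = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter * 1000
      : backoffDelay(attempt);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("Rate limited: retries exhausted");
}
```

`doRequest` stands in for whatever HTTP call the client makes; the key points are respecting the server's Retry-After header when present and jittering the fallback delay.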