Rate Limiter

Overview
Introduction
A rate limiter is a tool or mechanism that controls the number of requests or actions a user or system can perform within a specific time frame.
Think of it like a bouncer at a club, deciding how many people can enter at a time to keep everything running smoothly.
Use Cases
- Prevent Overload: They stop too many requests from overwhelming a system, ensuring it remains responsive and doesn’t crash.
- Protect Against DDoS Attacks: Rate limiters help mitigate Distributed Denial of Service (DDoS) attacks by controlling the flood of incoming traffic from multiple sources aimed at overwhelming a system.
- Fair Usage: They ensure that all users have equal access to resources, preventing a few users from hogging all the bandwidth or computing power.
- Cost Management: By controlling usage, they help manage and predict costs, especially in services that charge based on usage.
Where to Implement?
Client-Side Rate Limiting
- Where: On the user’s device, such as within a mobile app or web browser.
- Purpose: To prevent the client from making too many requests to the server, either accidentally or intentionally.
- Pros:
  - Immediate Feedback: Users receive instant responses about request limits without needing server interaction.
  - Reduced Server Load: Limits some traffic before it even reaches the server, saving resources.
- Cons:
  - Easily Bypassed: Users can modify or disable the limiter, rendering it ineffective.
  - Inconsistent Enforcement: Different clients might implement rate limiting differently, leading to uneven control.
Server-Side Rate Limiting
- Where: On the backend servers that handle requests from users.
- Purpose: To protect the server from being overwhelmed by too many requests at once.
- Pros:
  - Centralized Control: Ensures consistent enforcement of rate limits across all clients.
  - Enhanced Security: More difficult for attackers to bypass compared to client-side limits.
- Cons:
  - Increased Server Load: Rate limiting logic consumes server resources.
  - Scalability Challenges: May require additional infrastructure to handle high traffic volumes efficiently.
Middleware Rate Limiting
- Where: In the intermediary layer between the client and the server, often part of infrastructure such as API gateways or reverse proxies.
- Purpose: To manage and distribute incoming traffic before it reaches the main server, providing an additional layer of protection.
- Pros:
  - Scalable Management: Can handle high volumes of traffic more efficiently by offloading rate limiting from the main servers.
  - Flexible Policies: Easily apply different rate limiting rules for various services or user groups.
- Cons:
  - Additional Complexity: Introduces another component to manage and maintain within the system architecture.
  - Potential Bottleneck: If not properly scaled, the middleware itself can become a point of failure or congestion.
Middleware rate limiters are commonly used in large-scale systems with microservices architectures. They are typically implemented in API gateways (e.g., AWS API Gateway), which enforce rate limits, authenticate requests, and route traffic efficiently; a brief configuration sketch follows.
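As one concrete illustration, AWS API Gateway expresses throttling through usage plans. The boto3 sketch below is a minimal example assuming an existing API and stage; the identifiers and limit values are placeholders, not recommendations.

```python
import boto3

apigateway = boto3.client("apigateway")

# Create a usage plan that throttles clients to a steady 100 requests/second
# with bursts up to 200. "abc123" and "prod" are placeholder identifiers.
apigateway.create_usage_plan(
    name="standard-tier",
    apiStages=[{"apiId": "abc123", "stage": "prod"}],
    throttle={"rateLimit": 100.0, "burstLimit": 200},
    quota={"limit": 1_000_000, "period": "MONTH"},
)
```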
Requirements
- Define Rate Limits: Specify the maximum number of requests allowed per user, IP address, or service within a certain time frame (e.g., 100 requests per minute).
- Configurability and Flexibility: Allow dynamic configuration of rate limits without requiring system downtime, and support different policies (e.g., fixed window, sliding window); see the policy sketch after this list.
- Low Latency and High Performance: Implement the rate limiter with minimal impact on request processing time.
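To make the configurability requirement concrete, the sketch below models rate-limit policies as plain records held in a mutable registry that can be reloaded at runtime. The field names and policy keys are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class RateLimitPolicy:
    # Illustrative schema; field names are assumptions, not a standard.
    key: str             # what to limit on: "user", "ip", or "service"
    limit: int           # maximum requests allowed per window
    window_seconds: int  # length of the window
    algorithm: str       # "fixed_window", "sliding_window", "token_bucket", ...

# A mutable registry lets operators change limits at runtime (e.g., by
# reloading from a config store) without redeploying or restarting.
POLICIES = {
    "api.default": RateLimitPolicy(key="user", limit=100, window_seconds=60,
                                   algorithm="sliding_window"),
    "auth.login": RateLimitPolicy(key="ip", limit=10, window_seconds=3600,
                                  algorithm="fixed_window"),
}
```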
Popular Rate Limiting Algorithms
Token Bucket
- How It Works:
  - Imagine a bucket that holds a fixed number of tokens.
  - Tokens are added to the bucket at a steady rate (e.g., 5 tokens per second).
  - Each incoming request requires a token to proceed.
  - If a token is available, it is removed from the bucket and the request is allowed.
  - If no tokens are available, the request is denied or queued until a token becomes available.
- Example:
  - API Rate Limiting: An API allows users to make up to 100 requests per minute, so tokens are added at a rate of ~1.67 per second. If a user sends a burst of 50 requests, they are allowed as long as enough tokens remain; subsequent requests are limited by token availability. A minimal sketch follows this list.
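The refill-and-consume logic fits in a few lines. Here is a minimal, single-process Python sketch (class and parameter names are illustrative); a production limiter would add locking and shared storage such as Redis so all servers see the same bucket.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`; refill at `refill_rate` tokens/second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity              # start full so bursts are allowed
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                # consume one token for this request
            return True
        return False                        # no tokens: deny (or queue) the request

# 100 requests/minute ≈ 1.67 tokens/second, with bursts up to 100.
limiter = TokenBucket(capacity=100, refill_rate=100 / 60)
```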
Leaky Bucket
- How It Works:
  - Visualize a bucket with a small hole at the bottom through which water (requests) leaks out at a constant rate.
  - Incoming requests are added to the bucket.
  - If the bucket overflows (i.e., too many requests arrive too quickly), excess requests are discarded or delayed.
  - This ensures that requests are processed at a steady, predictable rate.
- Example:
  - Streaming Services: A video streaming service uses the leaky bucket algorithm to ensure smooth playback. Requests to stream video segments are regulated so that data flows at a consistent rate, preventing buffering and ensuring uninterrupted viewing. A minimal sketch follows this list.
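The same drain-and-fill idea, sketched in Python under the same single-process assumptions as the token bucket above; names are illustrative.

```python
import time

class LeakyBucket:
    """Queue up to `capacity` requests; drain them at `leak_rate` per second."""

    def __init__(self, capacity: float, leak_rate: float):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.water = 0.0                    # current fill level
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket in proportion to elapsed time.
        self.water = max(0.0, self.water - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.water < self.capacity:
            self.water += 1                 # this request fits in the bucket
            return True
        return False                        # overflow: discard (or delay) the request

# Smooth output to 5 requests/second, absorbing bursts of up to 10.
limiter = LeakyBucket(capacity=10, leak_rate=5)
```

Note the contrast with the token bucket: a full token bucket permits a burst through immediately, while the leaky bucket smooths output to a constant rate regardless of how bursty the input is.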
Fixed Window Counter
- How It Works:
  - Time is divided into fixed intervals or "windows" (e.g., 1 minute).
  - The number of requests is counted within each window.
  - Once the limit is reached within a window, additional requests are blocked until the next window starts.
  - Simple to implement, but can allow bursts at window boundaries (up to twice the limit in a short span straddling two windows).
- Example:
  - Login Attempts: A website restricts users to 10 login attempts per hour using a fixed window. If a user exceeds this limit within the hour, further login attempts are blocked until the next hour begins, reducing the risk of brute-force attacks. A minimal sketch follows this list.
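A per-key counter keyed by window index is enough for a sketch (names are illustrative; a real implementation would evict counters for expired windows and share state across servers):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window`-second interval, per key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)      # (key, window index) -> request count

    def allow(self, key: str) -> bool:
        window_id = int(time.time() // self.window)   # which window we are in
        if self.counts[(key, window_id)] < self.limit:
            self.counts[(key, window_id)] += 1
            return True
        return False                        # blocked until the next window starts

# 10 login attempts per hour, per user.
limiter = FixedWindowLimiter(limit=10, window=3600)
```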
Sliding Window
- How It Works:
  - Similar to the fixed window but offers more granularity.
  - Instead of fixed intervals, it continuously tracks the number of requests in the past defined period (e.g., the last 60 seconds).
  - As time moves forward, the window "slides," dropping old requests and admitting new ones.
  - Prevents the boundary-burst issue seen in fixed windows by distributing the request limit evenly over time.
- Example:
  - Chat Applications: A messaging app limits users to sending 20 messages per minute using a sliding window. This ensures that users can't flood the chat with messages all at once and maintains a steady flow of communication. A minimal sketch follows this list.
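The simplest variant keeps a log of recent timestamps (the "sliding window log"); the sketch below is single-process and per-limiter rather than per-user, purely for brevity:

```python
import time
from collections import deque

class SlidingWindowLog:
    """Allow at most `limit` requests in any trailing `window`-second span."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.timestamps = deque()           # times of accepted requests

    def allow(self) -> bool:
        now = time.monotonic()
        # Slide the window: drop timestamps older than `window` seconds.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

# 20 messages per minute, evaluated over a continuously sliding window.
limiter = SlidingWindowLog(limit=20, window=60)
```

Storing every timestamp costs memory at high request rates; the sliding window counter variant approximates the same behavior with one counter per adjacent window instead.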
Architecture
Real World Example
Additional Discussion Points