API rate limiting: 4 algorithms every developer should know
Your batch job fires 200 requests in 3 seconds and every response comes back 429 Too Many Requests.
Your webhook processor hammers a third-party API and gets blocked for 15 minutes. A customer's integration
goes silent because their retry loop burns through the daily quota in the first hour. These failures share
one root cause: the code doesn't respect rate limits.
This guide covers the four core rate limiting algorithms, shows you how to read X-RateLimit
headers from any API, and gives you copy-paste Node.js code for retry logic with exponential backoff.
The four rate limiting algorithms
Every rate limiter answers the same question: "should this request go through, or should I reject it?" The four algorithms differ in how they track time and handle bursts.
1. Fixed window
The simplest approach. Divide time into fixed intervals (e.g., 1 minute). Count requests per interval. When the count hits the limit, reject everything until the next interval starts.
Window 1 (00:00-01:00) Window 2 (01:00-02:00)
|----|----|----|----|----|----|----|----|----|
R R R R R X X R R R R R X
R = allowed request (5 per window)
X = rejected (429 Too Many Requests)
Problem: 5 requests at 00:59 + 5 at 01:01 = 10 in 2 seconds

Fixed window is easy to build: one counter and one timestamp per client. The drawback is the boundary problem. A client can send the full limit at the end of one window and the full limit at the start of the next, getting 2x the intended rate in a short burst. GitHub's older API rate limiter used fixed windows; they've since moved to more sophisticated approaches.
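A fixed window limiter needs only a counter and a window start time per client. Here is a minimal in-memory sketch (the class and constructor parameters are illustrative, not any provider's actual implementation):

```javascript
// Fixed window: one counter, one timestamp. Resets when a new window begins.
class FixedWindowLimiter {
  constructor(limit, windowMs, now = Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.count = 0;
    this.windowStart = now;
  }

  allow(now = Date.now()) {
    // A new window has started: reset the counter
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count < this.limit) {
      this.count++;
      return true;
    }
    return false; // over the limit: caller should respond 429
  }
}
```

The `now` parameter is injectable so the limiter is easy to test; in production you would omit it and let `Date.now()` apply.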
2. Sliding window
Instead of resetting at fixed boundaries, the window slides with each request. At any given moment, the limiter looks back over the last N seconds and counts requests in that span.
Time: 00:00 00:15 00:30 00:45 01:00 01:15
|------|------|------|------|------|
R R R R R | R
|
<-- 60s window slides -->
At 01:00, the window looks back to 00:00.
At 01:15, the window looks back to 00:15.
Requests from 00:00-00:15 drop off the count.
Sliding window eliminates the boundary burst problem. The cost is higher memory: you store a timestamp
for every request, not a single counter. Redis ZRANGEBYSCORE makes this practical at scale.
Cloudflare and many API gateways use sliding windows for per-user rate limits.
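An in-memory sketch of the sliding window log (a Redis sorted set per client implements the same idea at scale; the class and parameter names here are illustrative):

```javascript
// Sliding window log: store one timestamp per accepted request and
// count only the timestamps inside the trailing window.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.log = []; // timestamps of accepted requests, oldest first
  }

  allow(now = Date.now()) {
    // Drop timestamps that have fallen out of the trailing window
    while (this.log.length && now - this.log[0] >= this.windowMs) {
      this.log.shift();
    }
    if (this.log.length < this.limit) {
      this.log.push(now);
      return true;
    }
    return false;
  }
}
```

Note there is no boundary to exploit: the window always looks back exactly `windowMs` from the current request, so a burst can never exceed the limit in any 60-second span.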
3. Token bucket
Picture a bucket that holds tokens. Each request costs one token. Tokens refill at a fixed rate. If the bucket is empty, the request gets rejected. If the bucket is full, excess tokens don't accumulate.
Bucket capacity: 10 tokens
Refill rate: 2 tokens/second
Time 0s: [##########] 10 tokens - Full bucket
Request: [#########-] 9 tokens - 1 token consumed
Request: [########--] 8 tokens - 1 token consumed
...burst of 8 requests...
Time 0s: [----------] 0 tokens - Empty, next request blocked
Time 1s: [##--------] 2 tokens - 2 tokens refilled
Request: [#---------] 1 token - 1 token consumed
Time 2s: [###-------] 3 tokens - 2 more refilled

Token bucket is the most popular algorithm in production. Stripe, AWS API Gateway, and most cloud providers use variants of it. The bucket capacity controls burst size, and the refill rate controls sustained throughput. Two parameters give you fine-grained control over traffic shape.
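A minimal token bucket with the diagram's parameters (capacity 10, 2 tokens/second). Lazy refill on each check is one common implementation choice; the class itself is an illustrative sketch:

```javascript
// Token bucket: tokens refill continuously at a fixed rate, capped at
// capacity. Each allowed request consumes one token.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now;
  }

  allow(now = Date.now()) {
    // Refill lazily based on elapsed time; excess tokens don't accumulate
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Capacity bounds the burst (10 back-to-back requests succeed from a full bucket) while the refill rate bounds sustained throughput (2 requests/second over time).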
4. Leaky bucket
The inverse of token bucket. Requests fill a bucket. The bucket drains at a constant rate. If the bucket overflows, excess requests get rejected. The output rate stays constant regardless of input bursts.
Incoming requests fill the bucket.
Bucket drains at a fixed rate (1 req/200ms).
Fast burst: R R R R R R
|||||
Bucket: [######----] 6 queued
|
Drain: |-R---R---R---R---R---R-|
0ms 200ms 400ms 600ms 800ms 1000ms
Overflow: If bucket is full, new requests get 429.

Leaky bucket works well for traffic shaping where you need a steady output rate: queue workers, webhook delivery, and video encoding pipelines. The trade-off is that bursts get queued rather than served; latency increases under load.
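One way to sketch a leaky bucket in Node.js is a promise queue drained on a timer. The class and parameter names are illustrative; the point is the constant drain rate regardless of how bursty arrivals are:

```javascript
// Leaky bucket: arrivals queue up, the queue drains one request per
// interval, and a full queue rejects new arrivals.
class LeakyBucket {
  constructor(capacity, drainIntervalMs) {
    this.capacity = capacity;
    this.drainIntervalMs = drainIntervalMs;
    this.queue = []; // pending release callbacks
    this.timer = null;
  }

  // Resolves when the request may proceed, or returns null when the
  // bucket overflows (caller should respond 429).
  offer() {
    if (this.queue.length >= this.capacity) return null;
    let release;
    const promise = new Promise((resolve) => { release = resolve; });
    this.queue.push(release);
    this.startDraining();
    return promise;
  }

  startDraining() {
    if (this.timer !== null) return; // already draining
    const tick = () => {
      const release = this.queue.shift();
      if (release) release();
      if (this.queue.length > 0) {
        this.timer = setTimeout(tick, this.drainIntervalMs);
      } else {
        this.timer = null;
      }
    };
    this.timer = setTimeout(tick, 0); // begin draining on the next turn
  }
}
```

A burst of offers resolves one at a time, spaced `drainIntervalMs` apart, which is exactly the "Drain" row in the diagram above.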
Comparing the four algorithms
| Algorithm | Burst allowed? | Memory | Common use case |
|---|---|---|---|
| Fixed window | Edge bursts (2x at boundary) | Low (1 counter) | Simple counters, analytics |
| Sliding window | Smooth, no boundary spikes | Medium (per-request timestamp) | Per-user API limits |
| Token bucket | Controlled bursts up to capacity | Low (2 values) | Most production APIs (Stripe, AWS) |
| Leaky bucket | Queued, constant output rate | Medium (queue) | Traffic shaping, queue workers |
Reading X-RateLimit headers
Most APIs include rate limit information in response headers. Three headers tell you everything you need to stay under the limit:
- X-RateLimit-Limit: maximum requests allowed per window
- X-RateLimit-Remaining: requests you have left in the current window
- X-RateLimit-Reset: Unix timestamp (seconds) when the window resets
When you exceed the limit, the response status is 429 Too Many Requests and the
Retry-After header tells you how many seconds to wait.
Try it against botoi's API. This curl command hashes a string and prints the rate limit headers:
curl -s -D - -X POST https://api.botoi.com/v1/hash/sha256 \
  -H "Content-Type: application/json" \
  -d '{"text": "rate limit test"}' 2>&1 | grep -i "x-ratelimit\|retry-after"

Response headers:

x-ratelimit-limit: 5
x-ratelimit-remaining: 4
x-ratelimit-reset: 1743897600

This tells you the limit is 5 requests per window, you have 4 remaining after this request, and the window resets at the given Unix timestamp. Track these values in your HTTP client to avoid hitting 429s in the first place.
Tip: Convert X-RateLimit-Reset to a wait time:
waitMs = (resetTimestamp - Math.floor(Date.now() / 1000)) * 1000
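A small helper applying that formula, assuming the lowercase header names shown in the response above (some APIs spell them differently):

```javascript
// Read the three rate limit headers off a fetch() Response's headers
// and compute how long to pause once the window is exhausted.
function readRateLimit(headers, nowMs = Date.now()) {
  const limit = Number(headers.get("x-ratelimit-limit"));
  const remaining = Number(headers.get("x-ratelimit-remaining"));
  const resetSec = Number(headers.get("x-ratelimit-reset")); // Unix seconds
  const waitMs = Math.max(0, (resetSec - Math.floor(nowMs / 1000)) * 1000);
  return { limit, remaining, waitMs };
}
```

It works with the Headers object from fetch() or anything exposing a .get() method: `const info = readRateLimit(response.headers)`.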
Retry logic with exponential backoff in Node.js
When a 429 hits, don't retry immediately. A tight retry loop makes the problem worse: you stay rate-limited longer and the server marks you as abusive. Use exponential backoff with jitter instead.
async function fetchWithRetry(url, options, maxRetries = 4) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429) {
      return response;
    }
    // Read the server's preferred wait time
    const retryAfter = response.headers.get("Retry-After");
    let waitMs;
    if (retryAfter) {
      // Retry-After can be seconds or an HTTP date
      const seconds = Number(retryAfter);
      waitMs = Number.isNaN(seconds)
        ? new Date(retryAfter).getTime() - Date.now()
        : seconds * 1000;
      // Guard against an unparseable or stale header
      if (!Number.isFinite(waitMs) || waitMs < 0) {
        waitMs = Math.pow(2, attempt) * 1000;
      }
    } else {
      // Exponential backoff: 1s, 2s, 4s, 8s
      waitMs = Math.pow(2, attempt) * 1000;
    }
    // Add random jitter (0-500ms) to prevent thundering herd
    waitMs += Math.random() * 500;
    console.log(
      "Rate limited. Attempt " + (attempt + 1) +
      "/" + maxRetries +
      ". Waiting " + Math.round(waitMs) + "ms"
    );
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  throw new Error("Max retries exceeded for " + url);
}

Use it with any endpoint:

// Call botoi's hash endpoint with automatic retry
const response = await fetchWithRetry(
  "https://api.botoi.com/v1/hash/sha256",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": process.env.BOTOI_API_KEY,
    },
    body: JSON.stringify({ text: "hello world" }),
  }
);
const data = await response.json();
console.log(data);
The function checks for a Retry-After header first. If the server tells you how long
to wait, respect it. If no header exists, it falls back to exponential backoff: 1 second, 2 seconds,
4 seconds, 8 seconds. Random jitter (0-500ms) prevents the thundering herd problem where hundreds of
clients retry at the exact same moment.
Proactive throttling: avoid 429s before they happen
Reactive retry handles failures after they occur. Proactive throttling prevents them. If you know the rate limit (from docs or headers), pace your requests on the client side.
class RateLimiter {
  constructor(maxPerWindow, windowMs) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  async waitForSlot() {
    const now = Date.now();
    // Remove timestamps outside the current window
    this.timestamps = this.timestamps.filter(
      (t) => now - t < this.windowMs
    );
    if (this.timestamps.length >= this.maxPerWindow) {
      // Wait until the oldest request falls out of the window
      const oldestInWindow = this.timestamps[0];
      const waitMs = this.windowMs - (now - oldestInWindow) + 10;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
    this.timestamps.push(Date.now());
  }
}

// botoi anonymous: 5 req/min
const limiter = new RateLimiter(5, 60_000);

async function callBotoiSafely(endpoint, body) {
  await limiter.waitForSlot();
  return fetch("https://api.botoi.com/v1/" + endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
}

// Process a batch without hitting 429s
const urls = ["stripe.com", "github.com", "vercel.com"];
for (const domain of urls) {
  const res = await callBotoiSafely("dns/a", { domain });
  const data = await res.json();
  console.log(domain, data);
}

This client-side rate limiter tracks request timestamps in a sliding window. Before each request, it checks whether the window is full and waits if needed. You send requests at the maximum safe rate without a single 429.
Botoi's rate limiting model
Botoi uses a two-tier rate limiting system:
| Plan | Burst (per minute) | Quota | Auth |
|---|---|---|---|
| Free ($0) | 5 req/min | 100/day | None (IP-based) |
| Starter ($9/mo) | 30 req/min | 300,000/month | API key |
| Pro ($49/mo) | 300 req/min | 3,000,000/month | API key |
| Business ($199/mo) | 1,000 req/min | 30,000,000/month | API key |
Anonymous access tracks requests by IP address. The daily count resets at midnight UTC via a Cloudflare KV counter. Authenticated requests use the API key for identification, and limits are enforced through Unkey's token bucket rate limiter at the edge.
Every response from api.botoi.com includes the three X-RateLimit headers
described above, so your retry logic works the same way regardless of plan.
Proven approaches for API consumers
- Read the headers on every response. Don't hard-code rate limits from documentation. APIs change limits without notice. The headers are the source of truth.
- Use exponential backoff with jitter. Fixed retry intervals cause synchronized retries across clients. Jitter spreads the load.
- Batch where the API supports it. One request with 10 items costs 1 rate limit token. Ten individual requests cost 10.
- Cache responses. If the data doesn't change between requests, store the result and skip the API call. DNS records, SSL certificates, and WHOIS data rarely change within minutes.
- Use a queue for background work. Don't fire API calls from a hot loop. Push work onto a queue (BullMQ, SQS, Cloudflare Queues) and process items at a controlled rate.
- Monitor your remaining quota. Log X-RateLimit-Remaining to your metrics dashboard. Set an alert when it drops below 20% of the limit.
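The caching advice above can be sketched as a tiny TTL wrapper around any async lookup. The function name, TTL, and the lookup it wraps are all illustrative; swap in your real API call:

```javascript
// TTL cache around an async lookup: repeated calls for the same key
// within ttlMs return the stored value and skip the API entirely.
function cached(lookup, ttlMs) {
  const store = new Map(); // key -> { value, expires }
  return async function (key) {
    const hit = store.get(key);
    if (hit && hit.expires > Date.now()) {
      return hit.value; // cache hit: no rate limit token spent
    }
    const value = await lookup(key);
    store.set(key, { value, expires: Date.now() + ttlMs });
    return value;
  };
}
```

For data like DNS records or WHOIS results, even a short TTL of a few minutes can eliminate most duplicate requests in a batch job.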
Key points
- Four algorithms dominate. Fixed window is simplest. Token bucket is most popular. Sliding window eliminates boundary bursts. Leaky bucket smooths output.
- X-RateLimit headers are your API. Read Limit, Remaining, and Reset on every response to stay under the cap.
- Exponential backoff with jitter handles 429s. Copy the fetchWithRetry function above into your codebase and wrap every external API call.
- Proactive throttling prevents 429s. Pace your requests on the client side instead of waiting for the server to push back.
- No account required to test. Hit any botoi endpoint at api.botoi.com with 5 free requests per minute to see rate limit headers in action.
Frequently asked questions
- What is API rate limiting and why do APIs use it?
- Rate limiting caps how many requests a client can make in a time window. APIs use it to protect servers from overload, prevent abuse, ensure fair resource sharing across clients, and keep infrastructure costs predictable. Without it, a single client could starve all others.
- What do X-RateLimit headers mean?
- X-RateLimit-Limit is the max requests allowed per window. X-RateLimit-Remaining is how many you have left. X-RateLimit-Reset is a Unix timestamp when the window resets. Retry-After (on 429 responses) tells you how many seconds to wait before retrying.
- How should I handle a 429 Too Many Requests response?
- Read the Retry-After header and wait that many seconds. If no Retry-After header exists, use exponential backoff: wait 1 second after the first 429, 2 seconds after the second, 4 after the third, and so on. Add random jitter (0-500ms) to prevent thundering herd problems when many clients retry at the same time.
- Which rate limiting algorithm is the most common?
- Token bucket is the most common in production APIs. Stripe, AWS, and most cloud providers use variants of it. Token bucket allows controlled bursts while enforcing a sustained rate, which matches real traffic patterns better than fixed windows.
- Does botoi rate limit anonymous requests?
- Yes. Anonymous requests (no API key) get 5 requests per minute burst and 100 requests per day, tracked by IP address. Authenticated requests on paid plans get higher limits: Starter allows 30/min, Pro allows 300/min, and Business allows 1,000/min.
Start building with botoi
150+ API endpoints for lookup, text processing, image generation, and developer utilities. Free tier, no credit card.