API rate limiting: 4 algorithms every developer should know
Your batch job fires 200 requests in 3 seconds and every response comes back 429 Too Many Requests.
Your webhook processor hammers a third-party API and gets blocked for 15 minutes. A customer's integration
goes silent because their retry loop burns through the daily quota in the first hour. These failures share
one root cause: the code doesn't respect rate limits.
This guide covers the four core rate limiting algorithms, shows you how to read X-RateLimit
headers from any API, and gives you copy-paste Node.js code for retry logic with exponential backoff.
The four rate limiting algorithms
Every rate limiter answers the same question: "should this request go through, or should I reject it?" The four algorithms differ in how they track time and handle bursts.
1. Fixed window
The simplest approach. Divide time into fixed intervals (e.g., 1 minute). Count requests per interval. When the count hits the limit, reject everything until the next interval starts.
Window 1 (00:00-01:00) Window 2 (01:00-02:00)
|----|----|----|----|----|----|----|----|----|
R R R R R X X R R R R R X
R = allowed request (5 per window)
X = rejected (429 Too Many Requests)
Problem: 5 requests at 00:59 + 5 at 01:01 = 10 in 2 seconds

Fixed window is easy to build: one counter and one timestamp per client. The drawback is the boundary problem. A client can send the full limit at the end of one window and the full limit at the start of the next, getting 2x the intended rate in a short burst. GitHub's older API rate limiter used fixed windows; they've since moved to more sophisticated approaches.
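A fixed window limiter needs only a counter and a window start time per client. Here is a minimal in-memory sketch (the class and constructor parameters are illustrative, not any provider's actual implementation):

```javascript
// Fixed window: one counter, one timestamp. Resets when a new window begins.
class FixedWindowLimiter {
  constructor(limit, windowMs, now = Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.count = 0;
    this.windowStart = now;
  }

  allow(now = Date.now()) {
    // A new window has started: reset the counter
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count < this.limit) {
      this.count++;
      return true;
    }
    return false; // over the limit: caller should respond 429
  }
}
```

The `now` parameter is injectable so the limiter is easy to test; in production you would omit it and let `Date.now()` apply.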
2. Sliding window
Instead of resetting at fixed boundaries, the window slides with each request. At any given moment, the limiter looks back over the last N seconds and counts requests in that span.
Time: 00:00 00:15 00:30 00:45 01:00 01:15
|------|------|------|------|------|
R R R R R | R
|
<-- 60s window slides -->
At 01:00, the window looks back to 00:00.
At 01:15, the window looks back to 00:15.
Requests from 00:00-00:15 drop off the count.
Sliding window eliminates the boundary burst problem. The cost is higher memory: you store a timestamp
for every request, not a single counter. Redis ZRANGEBYSCORE makes this practical at scale.
Cloudflare and many API gateways use sliding windows for per-user rate limits.
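An in-memory sketch of the sliding window log (a Redis sorted set per client implements the same idea at scale; the class and parameter names here are illustrative):

```javascript
// Sliding window log: store one timestamp per accepted request and
// count only the timestamps inside the trailing window.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.log = []; // timestamps of accepted requests, oldest first
  }

  allow(now = Date.now()) {
    // Drop timestamps that have fallen out of the trailing window
    while (this.log.length && now - this.log[0] >= this.windowMs) {
      this.log.shift();
    }
    if (this.log.length < this.limit) {
      this.log.push(now);
      return true;
    }
    return false;
  }
}
```

Note there is no boundary to exploit: the window always looks back exactly `windowMs` from the current request, so a burst can never exceed the limit in any 60-second span.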
3. Token bucket
Picture a bucket that holds tokens. Each request costs one token. Tokens refill at a fixed rate. If the bucket is empty, the request gets rejected. If the bucket is full, excess tokens don't accumulate.
Bucket capacity: 10 tokens
Refill rate: 2 tokens/second
Time 0s: [##########] 10 tokens - Full bucket
Request: [#########-] 9 tokens - 1 token consumed
Request: [########--] 8 tokens - 1 token consumed
...burst of 8 requests...
Time 0s: [----------] 0 tokens - Empty, next request blocked
Time 1s: [##--------] 2 tokens - 2 tokens refilled
Request: [#---------] 1 token - 1 token consumed
Time 2s: [###-------] 3 tokens - 2 more refilled

Token bucket is the most popular algorithm in production. Stripe, AWS API Gateway, and most cloud providers use variants of it. The bucket capacity controls burst size, and the refill rate controls sustained throughput. Two parameters give you fine-grained control over traffic shape.
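A minimal token bucket with the diagram's parameters (capacity 10, 2 tokens/second). Lazy refill on each check is one common implementation choice; the class itself is an illustrative sketch:

```javascript
// Token bucket: tokens refill continuously at a fixed rate, capped at
// capacity. Each allowed request consumes one token.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now;
  }

  allow(now = Date.now()) {
    // Refill lazily based on elapsed time; excess tokens don't accumulate
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Capacity bounds the burst (10 back-to-back requests succeed from a full bucket) while the refill rate bounds sustained throughput (2 requests/second over time).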
4. Leaky bucket
The inverse of token bucket. Requests fill a bucket. The bucket drains at a constant rate. If the bucket overflows, excess requests get rejected. The output rate stays constant regardless of input bursts.
Incoming requests fill the bucket.
Bucket drains at a fixed rate (1 req/200ms).
Fast burst: R R R R R R
|||||
Bucket: [######----] 6 queued
|
Drain: |-R---R---R---R---R---R-|
0ms 200ms 400ms 600ms 800ms 1000ms
Overflow: If bucket is full, new requests get 429.

Leaky bucket works well for traffic shaping where you need a steady output rate: queue workers, webhook delivery, and video encoding pipelines. The trade-off is that bursts get queued rather than served; latency increases under load.
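One way to sketch a leaky bucket in Node.js is a promise queue drained on a timer. The class and parameter names are illustrative; the point is the constant drain rate regardless of how bursty arrivals are:

```javascript
// Leaky bucket: arrivals queue up, the queue drains one request per
// interval, and a full queue rejects new arrivals.
class LeakyBucket {
  constructor(capacity, drainIntervalMs) {
    this.capacity = capacity;
    this.drainIntervalMs = drainIntervalMs;
    this.queue = []; // pending release callbacks
    this.timer = null;
  }

  // Resolves when the request may proceed, or returns null when the
  // bucket overflows (caller should respond 429).
  offer() {
    if (this.queue.length >= this.capacity) return null;
    let release;
    const promise = new Promise((resolve) => { release = resolve; });
    this.queue.push(release);
    this.startDraining();
    return promise;
  }

  startDraining() {
    if (this.timer !== null) return; // already draining
    const tick = () => {
      const release = this.queue.shift();
      if (release) release();
      if (this.queue.length > 0) {
        this.timer = setTimeout(tick, this.drainIntervalMs);
      } else {
        this.timer = null;
      }
    };
    this.timer = setTimeout(tick, 0); // begin draining on the next turn
  }
}
```

A burst of offers resolves one at a time, spaced `drainIntervalMs` apart, which is exactly the "Drain" row in the diagram above.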
Comparing the four algorithms
| Algorithm | Burst allowed? | Memory | Common use case |
|---|---|---|---|
| Fixed window | Edge bursts (2x at boundary) | Low (1 counter) | Simple counters, analytics |
| Sliding window | Smooth, no boundary spikes | Medium (per-request timestamp) | Per-user API limits |
| Token bucket | Controlled bursts up to capacity | Low (2 values) | Most production APIs (Stripe, AWS) |
| Leaky bucket | Queued, constant output rate | Medium (queue) | Traffic shaping, queue workers |
Reading X-RateLimit headers
Most APIs include rate limit information in response headers. Three headers tell you everything you need to stay under the limit:
- X-RateLimit-Limit: maximum requests allowed per window
- X-RateLimit-Remaining: requests you have left in the current window
- X-RateLimit-Reset: Unix timestamp (seconds) when the window resets
When you exceed the limit, the response status is 429 Too Many Requests and the
Retry-After header tells you how many seconds to wait.
Try it against botoi's API. This curl command hashes a string and prints the rate limit headers:
curl -s -D - -X POST https://api.botoi.com/v1/hash/sha256 \
  -H "Content-Type: application/json" \
  -d '{"text": "rate limit test"}' 2>&1 | grep -i "x-ratelimit\|retry-after"

Response headers:

x-ratelimit-limit: 5
x-ratelimit-remaining: 4
x-ratelimit-reset: 1743897600

This tells you the limit is 5 requests per window, you have 4 remaining after this request, and the window resets at the given Unix timestamp. Track these values in your HTTP client to avoid hitting 429s in the first place.
Tip: Convert X-RateLimit-Reset to a wait time:
waitMs = (resetTimestamp - Math.floor(Date.now() / 1000)) * 1000
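A small helper applying that formula, assuming the lowercase header names shown in the response above (some APIs spell them differently):

```javascript
// Read the three rate limit headers off a fetch() Response's headers
// and compute how long to pause once the window is exhausted.
function readRateLimit(headers, nowMs = Date.now()) {
  const limit = Number(headers.get("x-ratelimit-limit"));
  const remaining = Number(headers.get("x-ratelimit-remaining"));
  const resetSec = Number(headers.get("x-ratelimit-reset")); // Unix seconds
  const waitMs = Math.max(0, (resetSec - Math.floor(nowMs / 1000)) * 1000);
  return { limit, remaining, waitMs };
}
```

It works with the Headers object from fetch() or anything exposing a .get() method: `const info = readRateLimit(response.headers)`.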
Retry logic with exponential backoff in Node.js
When a 429 hits, don't retry immediately. A tight retry loop makes the problem worse: you stay rate-limited longer and the server marks you as abusive. Use exponential backoff with jitter instead.
async function fetchWithRetry(url, options, maxRetries = 4) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429) {
      return response;
    }
    // Read the server's preferred wait time
    const retryAfter = response.headers.get("Retry-After");
    let waitMs;
    if (retryAfter) {
      // Retry-After can be seconds or an HTTP date
      const seconds = Number(retryAfter);
      waitMs = Number.isNaN(seconds)
        ? new Date(retryAfter).getTime() - Date.now()
        : seconds * 1000;
      // Guard against an unparseable or stale header
      if (!Number.isFinite(waitMs) || waitMs < 0) {
        waitMs = Math.pow(2, attempt) * 1000;
      }
    } else {
      // Exponential backoff: 1s, 2s, 4s, 8s
      waitMs = Math.pow(2, attempt) * 1000;
    }
    // Add random jitter (0-500ms) to prevent thundering herd
    waitMs += Math.random() * 500;
    console.log(
      "Rate limited. Attempt " + (attempt + 1) +
      "/" + maxRetries +
      ". Waiting " + Math.round(waitMs) + "ms"
    );
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  throw new Error("Max retries exceeded for " + url);
}

Use it with any endpoint:

// Call botoi's hash endpoint with automatic retry
const response = await fetchWithRetry(
  "https://api.botoi.com/v1/hash/sha256",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": process.env.BOTOI_API_KEY,
    },
    body: JSON.stringify({ text: "hello world" }),
  }
);
const data = await response.json();
console.log(data);
The function checks for a Retry-After header first. If the server tells you how long
to wait, respect it. If no header exists, it falls back to exponential backoff: 1 second, 2 seconds,
4 seconds, 8 seconds. Random jitter (0-500ms) prevents the thundering herd problem where hundreds of
clients retry at the exact same moment.
Proactive throttling: avoid 429s before they happen
Reactive retry handles failures after they occur. Proactive throttling prevents them. If you know the rate limit (from docs or headers), pace your requests on the client side.
class RateLimiter {
  constructor(maxPerWindow, windowMs) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  async waitForSlot() {
    const now = Date.now();
    // Remove timestamps outside the current window
    this.timestamps = this.timestamps.filter(
      (t) => now - t < this.windowMs
    );
    if (this.timestamps.length >= this.maxPerWindow) {
      // Wait until the oldest request falls out of the window
      const oldestInWindow = this.timestamps[0];
      const waitMs = this.windowMs - (now - oldestInWindow) + 10;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
    this.timestamps.push(Date.now());
  }
}

// botoi anonymous: 5 req/min
const limiter = new RateLimiter(5, 60_000);

async function callBotoiSafely(endpoint, body) {
  await limiter.waitForSlot();
  return fetch("https://api.botoi.com/v1/" + endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
}

// Process a batch without hitting 429s
const urls = ["stripe.com", "github.com", "vercel.com"];
for (const domain of urls) {
  const res = await callBotoiSafely("dns/a", { domain });
  const data = await res.json();
  console.log(domain, data);
}

This client-side rate limiter tracks request timestamps in a sliding window. Before each request, it checks whether the window is full and waits if needed. You send requests at the maximum safe rate without a single 429.
Botoi's rate limiting model
Botoi uses a two-tier rate limiting system:
| Plan | Burst (per minute) | Quota | Auth |
|---|---|---|---|
| Free ($0) | 5 req/min | 100/day | None (IP-based) |
| Starter ($9/mo) | 30 req/min | 300,000/month | API key |
| Pro ($49/mo) | 300 req/min | 3,000,000/month | API key |
| Business ($199/mo) | 1,000 req/min | 30,000,000/month | API key |
Anonymous access tracks requests by IP address. The daily count resets at midnight UTC via a Cloudflare KV counter. Authenticated requests use the API key for identification, and limits are enforced through Unkey's token bucket rate limiter at the edge.
Every response from api.botoi.com includes the three X-RateLimit headers
described above, so your retry logic works the same way regardless of plan.
Proven approaches for API consumers
- Read the headers on every response. Don't hard-code rate limits from documentation. APIs change limits without notice. The headers are the source of truth.
- Use exponential backoff with jitter. Fixed retry intervals cause synchronized retries across clients. Jitter spreads the load.
- Batch where the API supports it. One request with 10 items costs 1 rate limit token. Ten individual requests cost 10.
- Cache responses. If the data doesn't change between requests, store the result and skip the API call. DNS records, SSL certificates, and WHOIS data rarely change within minutes.
- Use a queue for background work. Don't fire API calls from a hot loop. Push work onto a queue (BullMQ, SQS, Cloudflare Queues) and process items at a controlled rate.
- Monitor your remaining quota. Log X-RateLimit-Remaining to your metrics dashboard. Set an alert when it drops below 20% of the limit.
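The caching advice above can be sketched as a tiny TTL wrapper around any async lookup. The function name, TTL, and the lookup it wraps are all illustrative; swap in your real API call:

```javascript
// TTL cache around an async lookup: repeated calls for the same key
// within ttlMs return the stored value and skip the API entirely.
function cached(lookup, ttlMs) {
  const store = new Map(); // key -> { value, expires }
  return async function (key) {
    const hit = store.get(key);
    if (hit && hit.expires > Date.now()) {
      return hit.value; // cache hit: no rate limit token spent
    }
    const value = await lookup(key);
    store.set(key, { value, expires: Date.now() + ttlMs });
    return value;
  };
}
```

For data like DNS records or WHOIS results, even a short TTL of a few minutes can eliminate most duplicate requests in a batch job.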
Key points
- Four algorithms dominate. Fixed window is simplest. Token bucket is most popular. Sliding window eliminates boundary bursts. Leaky bucket smooths output.
- X-RateLimit headers are your API. Read Limit, Remaining, and Reset on every response to stay under the cap.
- Exponential backoff with jitter handles 429s. Copy the fetchWithRetry function above into your codebase and wrap every external API call.
- Proactive throttling prevents 429s. Pace your requests on the client side instead of waiting for the server to push back.
- No account required to test. Hit any botoi endpoint at api.botoi.com with 5 free requests per minute to see rate limit headers in action.
Frequently asked questions
- What is API rate limiting and why do APIs use it?
- Rate limiting caps how many requests a client can make in a time window. APIs use it to protect servers from overload, prevent abuse, ensure fair resource sharing across clients, and keep infrastructure costs predictable. Without it, a single client could starve all others.
- What do X-RateLimit headers mean?
- X-RateLimit-Limit is the max requests allowed per window. X-RateLimit-Remaining is how many you have left. X-RateLimit-Reset is a Unix timestamp when the window resets. Retry-After (on 429 responses) tells you how many seconds to wait before retrying.
- How should I handle a 429 Too Many Requests response?
- Read the Retry-After header and wait that many seconds. If no Retry-After header exists, use exponential backoff: wait 1 second after the first 429, 2 seconds after the second, 4 after the third, and so on. Add random jitter (0-500ms) to prevent thundering herd problems when many clients retry at the same time.
- Which rate limiting algorithm is the most common?
- Token bucket is the most common in production APIs. Stripe, AWS, and most cloud providers use variants of it. Token bucket allows controlled bursts while enforcing a sustained rate, which matches real traffic patterns better than fixed windows.
- Does botoi rate limit anonymous requests?
- Yes. Anonymous requests (no API key) get 5 requests per minute burst and 100 requests per day, tracked by IP address. Authenticated requests on paid plans get higher limits: Starter allows 30/min, Pro allows 300/min, and Business allows 1,000/min.
Start building with botoi
150+ API endpoints for lookup, text processing, image generation, and developer utilities. Free tier, no credit card.