API Rate Limiting: Protecting Your Services Without Hurting Your Users
A complete guide to API rate limiting — algorithms, Redis implementation, per-endpoint limits, rate limit headers, graceful degradation, and strategies that protect without frustrating legitimate users.

James Ross Jr.
Strategic Systems Architect & Enterprise Software Developer
Rate limiting is one of those things you do not think about until you need it urgently — usually because something is hammering your API and degrading service for everyone. At that point, implementing it under pressure is much harder than having it in place from the start.
Good rate limiting protects your services while being nearly invisible to legitimate users. Bad rate limiting blocks legitimate users and fails to stop determined abusers.
The Algorithms
Four algorithms are in common use:
Token Bucket: A bucket starts with N tokens. Each request consumes one token. Tokens are added at a fixed rate. Requests are rejected when the bucket is empty. Allows short bursts up to the bucket size, then throttles to the replenishment rate.
Fixed Window Counter: Count requests per fixed time window (e.g., 100 requests per minute, resetting at the top of each minute). Simple but has a burst vulnerability at the window boundary — a user can make 100 requests at 11:59:55 and 100 more at 12:00:05.
Sliding Window Counter: Approximate the count over the last N seconds by weighting the previous fixed window's count by how much of it still overlaps the current sliding window. More accurate than a fixed window but more complex to implement efficiently.
Sliding Window Log: Store the timestamp of each request. Count requests in the window by looking at the log. Most accurate but consumes more memory.
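The token bucket is the easiest to sketch. A minimal single-process version (names are illustrative; a distributed limiter would keep the bucket state in Redis rather than in memory):

```typescript
// In-memory token bucket: capacity bounds the burst, refillPerSecond
// bounds the sustained rate. Single process only.
class TokenBucket {
  private tokens: number
  private lastRefill: number

  constructor(
    private capacity: number,        // bucket size = max burst
    private refillPerSecond: number  // sustained request rate
  ) {
    this.tokens = capacity
    this.lastRefill = Date.now()
  }

  tryConsume(): boolean {
    const now = Date.now()
    const elapsedSec = (now - this.lastRefill) / 1000
    // Refill at the fixed rate, capped at the bucket capacity
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond
    )
    this.lastRefill = now
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }
}

// A bucket of 5 tokens refilled at 1/sec allows a burst of 5 requests,
// then throttles to roughly one request per second.
const bucket = new TokenBucket(5, 1)
const accepted = Array.from({ length: 6 }, () => bucket.tryConsume())
console.log(accepted)
```

Called six times back to back, the first five requests pass and the sixth is rejected until the refill catches up.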
For most web APIs, a sliding window approach is the right balance: accurate enough to close the fixed-window burst loophole, efficient enough to run at scale. The Redis implementation below uses a sliding window log, which is exact and simple to reason about; at very high request volumes, the counter approximation trades a little accuracy for far less memory.
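The counter approximation needs only one number per fixed window: weight the previous window's count by how much of it still overlaps the sliding window. A minimal sketch of that calculation (function name is illustrative; in practice the two counts would live in Redis):

```typescript
// Sliding window counter estimate: the previous fixed window's count is
// scaled by the fraction of it that still falls inside the sliding window.
function slidingWindowCount(
  prevWindowCount: number,    // requests in the previous fixed window
  currWindowCount: number,    // requests so far in the current fixed window
  windowMs: number,           // window length in ms
  elapsedInCurrentMs: number  // time elapsed since the current window began
): number {
  const overlap = (windowMs - elapsedInCurrentMs) / windowMs
  return currWindowCount + prevWindowCount * overlap
}

// 30s into a 60s window, half the previous window still counts:
// prev = 80, curr = 40 → estimate 40 + 80 * 0.5 = 80
console.log(slidingWindowCount(80, 40, 60_000, 30_000))
```

Compare the estimate against the limit exactly as you would a log-based count; the approximation error is bounded and vanishes as the window rolls over.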
Redis Implementation
// lib/rateLimit.ts
import Redis from 'ioredis'

export interface RateLimitResult {
  allowed: boolean
  limit: number
  remaining: number
  resetAt: Date
  retryAfter?: number // seconds until next request allowed
}

export async function rateLimit(
  redis: Redis,
  identifier: string,
  config: {
    limit: number
    windowMs: number
    keyPrefix?: string
  }
): Promise<RateLimitResult> {
  const { limit, windowMs, keyPrefix = 'rl' } = config
  const now = Date.now()
  const windowStart = now - windowMs
  const key = `${keyPrefix}:${identifier}`

  const pipeline = redis.pipeline()
  // Remove entries outside the current window
  pipeline.zremrangebyscore(key, '-inf', windowStart)
  // Count current requests in window
  pipeline.zcard(key)
  // Add current request with timestamp as score. This runs even for requests
  // that end up rejected, so rejected requests also count toward the window,
  // a deliberate choice that penalizes clients that keep retrying.
  pipeline.zadd(key, now, `${now}-${Math.random()}`)
  // Set expiry on the key
  pipeline.pexpire(key, windowMs)
  const results = await pipeline.exec()

  const count = (results?.[1]?.[1] as number) ?? 0
  const allowed = count < limit
  const remaining = Math.max(0, limit - count - 1)

  if (!allowed) {
    // Calculate when the oldest request in the window expires
    const oldestEntry = await redis.zrange(key, 0, 0, 'WITHSCORES')
    const oldestTimestamp = Number(oldestEntry[1]) || now
    const resetAt = new Date(oldestTimestamp + windowMs)
    return {
      allowed: false,
      limit,
      remaining: 0,
      resetAt,
      retryAfter: Math.ceil((resetAt.getTime() - now) / 1000),
    }
  }

  return {
    allowed: true,
    limit,
    remaining,
    resetAt: new Date(now + windowMs),
  }
}
Middleware Integration
Add rate limiting as middleware in your HTTP framework:
// middleware/rateLimit.ts (Hono)
import type { Context } from 'hono'
import { createMiddleware } from 'hono/factory'
import { rateLimit, RateLimitResult } from '../lib/rateLimit'
import { redis } from '../lib/redis' // shared ioredis client
import { getRequestIP } from '../lib/request' // app-level helper to extract the client IP

export function rateLimitMiddleware(config: {
  limit: number
  windowMs: number
  keyPrefix?: string
  keyGenerator?: (c: Context) => string
  onRejected?: (c: Context, result: RateLimitResult) => Response
}) {
  return createMiddleware(async (c, next) => {
    const identifier = config.keyGenerator
      ? config.keyGenerator(c)
      : getRequestIP(c) ?? 'unknown'
    const result = await rateLimit(redis, identifier, config)

    // Always set rate limit headers
    c.header('X-RateLimit-Limit', String(config.limit))
    c.header('X-RateLimit-Remaining', String(result.remaining))
    c.header('X-RateLimit-Reset', String(Math.ceil(result.resetAt.getTime() / 1000)))

    if (!result.allowed) {
      c.header('Retry-After', String(result.retryAfter))
      if (config.onRejected) {
        return config.onRejected(c, result)
      }
      return c.json({
        error: {
          code: 'RATE_LIMITED',
          message: 'Too many requests. Please wait before retrying.',
          retryAfter: result.retryAfter,
        },
      }, 429)
    }

    await next()
  })
}
Apply at different granularities:
// Global: 1000 requests per 15 minutes per IP
app.use('*', rateLimitMiddleware({
  limit: 1000,
  windowMs: 15 * 60 * 1000,
  keyPrefix: 'global',
}))

// Auth endpoints: 10 attempts per 15 minutes
app.use('/api/auth/*', rateLimitMiddleware({
  limit: 10,
  windowMs: 15 * 60 * 1000,
  keyPrefix: 'auth',
}))

// Authenticated users: by user ID instead of IP
app.use('/api/*', rateLimitMiddleware({
  limit: 500,
  windowMs: 60 * 1000,
  keyPrefix: 'user',
  keyGenerator: (c) => {
    const userId = c.get('userId')
    return userId ?? getRequestIP(c) ?? 'unknown'
  },
}))
Multi-Tier Rate Limiting
Different API consumers have different needs. Tiered limits give premium users more capacity without removing protection:
interface RateLimitTier {
  limit: number
  windowMs: number
}

const tiers: Record<string, RateLimitTier> = {
  free: { limit: 100, windowMs: 60 * 60 * 1000 },         // 100/hour
  pro: { limit: 1000, windowMs: 60 * 60 * 1000 },         // 1000/hour
  enterprise: { limit: 10000, windowMs: 60 * 60 * 1000 }, // 10000/hour
}

app.use('/api/*', async (c, next) => {
  const apiKey = c.req.header('X-API-Key')
  const tier = apiKey ? await getTierForKey(apiKey) : 'free'
  const config = tiers[tier]
  const result = await rateLimit(redis, apiKey ?? getRequestIP(c)!, {
    ...config,
    keyPrefix: `tier:${tier}`,
  })
  if (!result.allowed) {
    return c.json({ error: 'Rate limit exceeded', tier }, 429)
  }
  await next()
})
Per-Endpoint Limits
Some endpoints are more expensive than others. Apply different limits:
// Search is expensive — limit more aggressively
app.get('/api/search', rateLimitMiddleware({
  limit: 30,
  windowMs: 60 * 1000, // 30 per minute
  keyPrefix: 'search',
}), searchHandler)

// Webhooks and mutations
app.post('/api/webhooks', rateLimitMiddleware({
  limit: 5,
  windowMs: 60 * 1000, // 5 per minute
  keyPrefix: 'webhooks',
}), webhookHandler)

// AI/expensive operations
app.post('/api/ai/generate', rateLimitMiddleware({
  limit: 10,
  windowMs: 60 * 60 * 1000, // 10 per hour
  keyPrefix: 'ai',
}), aiHandler)
Graceful Degradation
Rate limiting should degrade gracefully when Redis is unavailable. Fail open (allow the request) rather than fail closed (block everything):
// Matches the config parameter of rateLimit above
type RateLimitConfig = { limit: number; windowMs: number; keyPrefix?: string }

export async function rateLimitWithFallback(
  redis: Redis | null,
  identifier: string,
  config: RateLimitConfig
): Promise<RateLimitResult> {
  if (!redis) {
    // Redis unavailable: allow all requests, log the issue
    console.error('Rate limiter unavailable: Redis connection failed')
    return { allowed: true, limit: config.limit, remaining: config.limit, resetAt: new Date() }
  }
  try {
    return await rateLimit(redis, identifier, config)
  } catch (err) {
    console.error('Rate limit check failed:', err)
    return { allowed: true, limit: config.limit, remaining: config.limit, resetAt: new Date() }
  }
}
Avoiding Common Mistakes
Using IP address as the only identifier. Many legitimate users share IPs (corporate NATs, VPN services). Rate limiting by IP alone is blunt. Rate limit by user ID for authenticated endpoints and use IP only as a fallback.
Setting limits too low. If your legitimate users frequently hit the limit, the limit is wrong. Check your rate limit hit logs — if mostly legitimate users are getting 429s, raise the limit.
Not communicating the limit to clients. Return X-RateLimit-* headers on every response, not just 429s. Well-implemented API clients use these headers to throttle themselves and never hit the limit.
No Retry-After header on 429. When you return 429, always include Retry-After. Clients that respect this header stop hammering immediately and retry at the right time. Clients that never receive it just keep retrying, making the problem worse.
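On the client side, honoring the header is straightforward. A hypothetical wrapper (fetchWithRetryAfter and the injectable fetchFn parameter are illustrative, not part of any library) that backs off exactly as long as the server asks:

```typescript
// On a 429, wait the number of seconds the server specifies in
// Retry-After before retrying, up to maxRetries attempts.
async function fetchWithRetryAfter(
  url: string,
  fetchFn: typeof fetch = fetch, // injectable for testing
  maxRetries = 3
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn(url)
    if (res.status !== 429 || attempt >= maxRetries) return res
    // The server tells us exactly how long to back off
    const retryAfterSec = Number(res.headers.get('Retry-After') ?? '1')
    await new Promise((resolve) => setTimeout(resolve, retryAfterSec * 1000))
  }
}
```

A stricter client would also watch X-RateLimit-Remaining on successful responses and slow down before ever hitting the limit.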
Rate limiting is a protection mechanism and a service quality guarantee. Legitimate users should never notice it. Abusive requests should be stopped cleanly and quickly.
Implementing rate limiting for an API or dealing with traffic abuse issues? I can help design a strategy that protects without frustrating legitimate users. Book a call: calendly.com/jamesrossjr.