API Rate Limiting: Protecting Your Services Without Hurting Your Users
A complete guide to API rate limiting — algorithms, Redis implementation, per-endpoint limits, rate limit headers, graceful degradation, and strategies that protect without frustrating legitimate users.

James Ross Jr.
Strategic Systems Architect & Enterprise Software Developer
Rate limiting is one of those things you do not think about until you need it urgently — usually because something is hammering your API and degrading service for everyone. At that point, implementing it under pressure is much harder than having it in place from the start.
Good rate limiting protects your services while being nearly invisible to legitimate users. Bad rate limiting blocks legitimate users and fails to stop determined abusers.
The Algorithms
Four algorithms are in common use:
Token Bucket: A bucket starts with N tokens. Each request consumes one token. Tokens are added at a fixed rate. Requests are rejected when the bucket is empty. Allows short bursts up to the bucket size, then throttles to the replenishment rate.
Fixed Window Counter: Count requests per fixed time window (e.g., 100 requests per minute, resetting at the top of each minute). Simple but has a burst vulnerability at the window boundary — a user can make 100 requests at 11:59:55 and 100 more at 12:00:05.
Sliding Window Counter: Approximate the count over the last N seconds by weighting the previous fixed window's count by how much of it still overlaps the current sliding window. More accurate than a fixed window but more complex to implement efficiently.
Sliding Window Log: Store the timestamp of each request. Count requests in the window by looking at the log. Most accurate but consumes more memory.
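The token bucket is the easiest to sketch. A minimal single-process version (names are illustrative; a distributed limiter would keep the bucket state in Redis rather than in memory):

```typescript
// In-memory token bucket: capacity bounds the burst, refillPerSecond
// bounds the sustained rate. Single process only.
class TokenBucket {
  private tokens: number
  private lastRefill: number

  constructor(
    private capacity: number,        // bucket size = max burst
    private refillPerSecond: number  // sustained request rate
  ) {
    this.tokens = capacity
    this.lastRefill = Date.now()
  }

  tryConsume(): boolean {
    const now = Date.now()
    const elapsedSec = (now - this.lastRefill) / 1000
    // Refill at the fixed rate, capped at the bucket capacity
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond
    )
    this.lastRefill = now
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }
}

// A bucket of 5 tokens refilled at 1/sec allows a burst of 5 requests,
// then throttles to roughly one request per second.
const bucket = new TokenBucket(5, 1)
const accepted = Array.from({ length: 6 }, () => bucket.tryConsume())
console.log(accepted)
```

Called six times back to back, the first five requests pass and the sixth is rejected until the refill catches up.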
For most web APIs, a sliding window approach is the right balance: accurate enough to close the fixed-window burst loophole, efficient enough to run at scale. The Redis implementation below uses a sliding window log, which is exact and simple to reason about; at very high request volumes, the counter approximation trades a little accuracy for far less memory.
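The counter approximation needs only one number per fixed window: weight the previous window's count by how much of it still overlaps the sliding window. A minimal sketch of that calculation (function name is illustrative; in practice the two counts would live in Redis):

```typescript
// Sliding window counter estimate: the previous fixed window's count is
// scaled by the fraction of it that still falls inside the sliding window.
function slidingWindowCount(
  prevWindowCount: number,    // requests in the previous fixed window
  currWindowCount: number,    // requests so far in the current fixed window
  windowMs: number,           // window length in ms
  elapsedInCurrentMs: number  // time elapsed since the current window began
): number {
  const overlap = (windowMs - elapsedInCurrentMs) / windowMs
  return currWindowCount + prevWindowCount * overlap
}

// 30s into a 60s window, half the previous window still counts:
// prev = 80, curr = 40 → estimate 40 + 80 * 0.5 = 80
console.log(slidingWindowCount(80, 40, 60_000, 30_000))
```

Compare the estimate against the limit exactly as you would a log-based count; the approximation error is bounded and vanishes as the window rolls over.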
Redis Implementation
// lib/rateLimit.ts
import Redis from 'ioredis'

export interface RateLimitResult {
  allowed: boolean
  limit: number
  remaining: number
  resetAt: Date
  retryAfter?: number // seconds until next request allowed
}

export async function rateLimit(
  redis: Redis,
  identifier: string,
  config: {
    limit: number
    windowMs: number
    keyPrefix?: string
  }
): Promise<RateLimitResult> {
  const { limit, windowMs, keyPrefix = 'rl' } = config
  const now = Date.now()
  const windowStart = now - windowMs
  const key = `${keyPrefix}:${identifier}`

  const pipeline = redis.pipeline()
  // Remove entries outside the current window
  pipeline.zremrangebyscore(key, '-inf', windowStart)
  // Count current requests in window
  pipeline.zcard(key)
  // Add current request with timestamp as score. This runs even for requests
  // that end up rejected, so rejected requests also count toward the window,
  // a deliberate choice that penalizes clients that keep retrying.
  pipeline.zadd(key, now, `${now}-${Math.random()}`)
  // Set expiry on the key
  pipeline.pexpire(key, windowMs)
  const results = await pipeline.exec()

  const count = (results?.[1]?.[1] as number) ?? 0
  const allowed = count < limit
  const remaining = Math.max(0, limit - count - 1)

  if (!allowed) {
    // Calculate when the oldest request in the window expires
    const oldestEntry = await redis.zrange(key, 0, 0, 'WITHSCORES')
    const oldestTimestamp = Number(oldestEntry[1]) || now
    const resetAt = new Date(oldestTimestamp + windowMs)
    return {
      allowed: false,
      limit,
      remaining: 0,
      resetAt,
      retryAfter: Math.ceil((resetAt.getTime() - now) / 1000),
    }
  }

  return {
    allowed: true,
    limit,
    remaining,
    resetAt: new Date(now + windowMs),
  }
}
Middleware Integration
Add rate limiting as middleware in your HTTP framework:
// middleware/rateLimit.ts (Hono)
import type { Context } from 'hono'
import { createMiddleware } from 'hono/factory'
import { rateLimit, RateLimitResult } from '../lib/rateLimit'
import { redis } from '../lib/redis' // shared ioredis client
import { getRequestIP } from '../lib/request' // app-level helper to extract the client IP

export function rateLimitMiddleware(config: {
  limit: number
  windowMs: number
  keyPrefix?: string
  keyGenerator?: (c: Context) => string
  onRejected?: (c: Context, result: RateLimitResult) => Response
}) {
  return createMiddleware(async (c, next) => {
    const identifier = config.keyGenerator
      ? config.keyGenerator(c)
      : getRequestIP(c) ?? 'unknown'
    const result = await rateLimit(redis, identifier, config)

    // Always set rate limit headers
    c.header('X-RateLimit-Limit', String(config.limit))
    c.header('X-RateLimit-Remaining', String(result.remaining))
    c.header('X-RateLimit-Reset', String(Math.ceil(result.resetAt.getTime() / 1000)))

    if (!result.allowed) {
      c.header('Retry-After', String(result.retryAfter))
      if (config.onRejected) {
        return config.onRejected(c, result)
      }
      return c.json({
        error: {
          code: 'RATE_LIMITED',
          message: 'Too many requests. Please wait before retrying.',
          retryAfter: result.retryAfter,
        },
      }, 429)
    }

    await next()
  })
}
Apply at different granularities:
// Global: 1000 requests per 15 minutes per IP
app.use('*', rateLimitMiddleware({
  limit: 1000,
  windowMs: 15 * 60 * 1000,
  keyPrefix: 'global',
}))

// Auth endpoints: 10 attempts per 15 minutes
app.use('/api/auth/*', rateLimitMiddleware({
  limit: 10,
  windowMs: 15 * 60 * 1000,
  keyPrefix: 'auth',
}))

// Authenticated users: by user ID instead of IP
app.use('/api/*', rateLimitMiddleware({
  limit: 500,
  windowMs: 60 * 1000,
  keyPrefix: 'user',
  keyGenerator: (c) => {
    const userId = c.get('userId')
    return userId ?? getRequestIP(c) ?? 'unknown'
  },
}))
Multi-Tier Rate Limiting
Different API consumers have different needs. Tiered limits give premium users more capacity without removing protection:
interface RateLimitTier {
  limit: number
  windowMs: number
}

const tiers: Record<string, RateLimitTier> = {
  free: { limit: 100, windowMs: 60 * 60 * 1000 },         // 100/hour
  pro: { limit: 1000, windowMs: 60 * 60 * 1000 },         // 1000/hour
  enterprise: { limit: 10000, windowMs: 60 * 60 * 1000 }, // 10000/hour
}

app.use('/api/*', async (c, next) => {
  const apiKey = c.req.header('X-API-Key')
  const tier = apiKey ? await getTierForKey(apiKey) : 'free'
  const config = tiers[tier]
  const result = await rateLimit(redis, apiKey ?? getRequestIP(c)!, {
    ...config,
    keyPrefix: `tier:${tier}`,
  })
  if (!result.allowed) {
    return c.json({ error: 'Rate limit exceeded', tier }, 429)
  }
  await next()
})
Per-Endpoint Limits
Some endpoints are more expensive than others. Apply different limits:
// Search is expensive — limit more aggressively
app.get('/api/search', rateLimitMiddleware({
  limit: 30,
  windowMs: 60 * 1000, // 30 per minute
  keyPrefix: 'search',
}), searchHandler)

// Webhooks and mutations
app.post('/api/webhooks', rateLimitMiddleware({
  limit: 5,
  windowMs: 60 * 1000, // 5 per minute
  keyPrefix: 'webhooks',
}), webhookHandler)

// AI/expensive operations
app.post('/api/ai/generate', rateLimitMiddleware({
  limit: 10,
  windowMs: 60 * 60 * 1000, // 10 per hour
  keyPrefix: 'ai',
}), aiHandler)
Graceful Degradation
Rate limiting should degrade gracefully when Redis is unavailable. Fail open (allow the request) rather than fail closed (block everything):
// Matches the config parameter of rateLimit above
type RateLimitConfig = { limit: number; windowMs: number; keyPrefix?: string }

export async function rateLimitWithFallback(
  redis: Redis | null,
  identifier: string,
  config: RateLimitConfig
): Promise<RateLimitResult> {
  if (!redis) {
    // Redis unavailable: allow all requests, log the issue
    console.error('Rate limiter unavailable: Redis connection failed')
    return { allowed: true, limit: config.limit, remaining: config.limit, resetAt: new Date() }
  }
  try {
    return await rateLimit(redis, identifier, config)
  } catch (err) {
    console.error('Rate limit check failed:', err)
    return { allowed: true, limit: config.limit, remaining: config.limit, resetAt: new Date() }
  }
}
Avoiding Common Mistakes
Using IP address as the only identifier. Many legitimate users share IPs (corporate NATs, VPN services). Rate limiting by IP alone is blunt. Rate limit by user ID for authenticated endpoints and use IP only as a fallback.
Setting limits too low. If your legitimate users frequently hit the limit, the limit is wrong. Check your rate limit hit logs — if mostly legitimate users are getting 429s, raise the limit.
Not communicating the limit to clients. Return X-RateLimit-* headers on every response, not just 429s. Well-implemented API clients use these headers to throttle themselves and never hit the limit.
No Retry-After header on 429. When you return 429, always include Retry-After. Clients that respect this header stop hammering immediately and retry at the right time. Clients that never receive it just keep retrying, making the problem worse.
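On the client side, honoring the header is straightforward. A hypothetical wrapper (fetchWithRetryAfter and the injectable fetchFn parameter are illustrative, not part of any library) that backs off exactly as long as the server asks:

```typescript
// On a 429, wait the number of seconds the server specifies in
// Retry-After before retrying, up to maxRetries attempts.
async function fetchWithRetryAfter(
  url: string,
  fetchFn: typeof fetch = fetch, // injectable for testing
  maxRetries = 3
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn(url)
    if (res.status !== 429 || attempt >= maxRetries) return res
    // The server tells us exactly how long to back off
    const retryAfterSec = Number(res.headers.get('Retry-After') ?? '1')
    await new Promise((resolve) => setTimeout(resolve, retryAfterSec * 1000))
  }
}
```

A stricter client would also watch X-RateLimit-Remaining on successful responses and slow down before ever hitting the limit.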
Rate limiting is a protection mechanism and a service quality guarantee. Legitimate users should never notice it. Abusive requests should be stopped cleanly and quickly.
Implementing rate limiting for an API or dealing with traffic abuse issues? I can help design a strategy that protects without frustrating legitimate users. Book a call: calendly.com/jamesrossjr.