▶ Token bucket vs sliding window vs fixed window: which one should I use?
Fixed window: simplest, per-minute counters, prone to edge-case bursts at window boundary. Token bucket: smooth refill at constant rate, handles bursts gracefully, industry standard for public APIs (GitHub, Stripe use it). Sliding window: accurate but complex, higher memory cost in distributed systems. Default to token bucket for new APIs; fixed window only for simple internal services.
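The token bucket logic is compact enough to show inline. A minimal single-process sketch in Python (class name and parameters are illustrative, not from any particular library):

```python
import time

class TokenBucket:
    """Token bucket: at most `capacity` tokens, refilled at `rate` tokens/sec."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity        # max burst size
        self.rate = rate                # steady refill rate, tokens per second
        self.tokens = capacity          # start full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A bucket with `capacity=5, rate=1` permits a burst of 5 requests, then settles to roughly one request per second as tokens refill.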
▶ How do I implement distributed rate limiting across multiple servers?
Single-server rate limiting (in-memory counters) fails immediately when you scale. Use Redis as the source of truth: each request increments a counter with TTL = window size. Lua scripts atomically check-and-increment to avoid race conditions. For geo-distributed systems (multi-region), replicate Redis across regions or accept slightly stale limits for performance.
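As a sketch, the atomic check-and-increment can be a short Lua script (assuming a client such as redis-py that exposes script evaluation); the `FakeWindowStore` below mirrors the same fixed-window semantics in memory purely for illustration, since the real state lives in Redis:

```python
import time

# Lua for Redis: INCR the per-window counter, set its TTL on the first hit,
# return 1 if under the limit and 0 if over. Runs atomically server-side.
FIXED_WINDOW_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
if current > tonumber(ARGV[2]) then
  return 0
end
return 1
"""

class FakeWindowStore:
    """In-memory stand-in with the same check-and-increment semantics."""

    def __init__(self):
        self.data = {}  # key -> (count, window_expires_at)

    def incr_with_limit(self, key: str, window: float, limit: int) -> bool:
        now = time.monotonic()
        count, expires = self.data.get(key, (0, 0.0))
        if now >= expires:                  # window elapsed: start fresh
            count, expires = 0, now + window
        count += 1
        self.data[key] = (count, expires)
        return count <= limit
```

With redis-py the script would be registered once via `client.register_script(FIXED_WINDOW_LUA)` and invoked per request with the counter key, the window TTL, and the limit.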
▶ What headers should I return: RateLimit-* or X-RateLimit-*?
The RateLimit-Limit / RateLimit-Remaining / RateLimit-Reset headers come from the IETF httpapi draft "RateLimit header fields for HTTP" (draft-ietf-httpapi-ratelimit-headers); RFC 6585 defines the 429 status code, not these headers. GitHub uses X-RateLimit-* (a legacy de-facto standard, still widely recognized). Return both for compatibility: RateLimit-* for modern clients, X-RateLimit-* for legacy ones. Always include a Retry-After header (seconds or HTTP-date format) with 429 responses.
▶ Per-user, per-IP, or per-API-key rate limiting: how do I choose?
Per-user (for authenticated requests): fairest, prevents account abuse. Per-IP (for public/unauthenticated endpoints): blocks abusers but punishes users behind a shared NAT (e.g. an entire office on one IP). Per-API-key: best for B2B APIs, allows tiered limits per subscription. Combine all three: the strictest limit applies. Example: 1000/day per user, 100/min per IP, 10k/day per API key.
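A "strictest limit wins" check can be sketched as below (plain in-memory counters with no window expiry, purely to show the combination logic; dimension names and limits are illustrative):

```python
def allowed(counters: dict, user: str, ip: str, api_key: str,
            limits: dict) -> bool:
    """Strictest-limit-wins: every dimension must be under its own cap."""
    keys = [("user", user), ("ip", ip), ("key", api_key)]
    # Check all dimensions before incrementing any, so a denied request
    # does not consume quota on the dimensions that still had room.
    for dim, ident in keys:
        if counters.get((dim, ident), 0) >= limits[dim]:
            return False
    for dim, ident in keys:
        counters[(dim, ident)] = counters.get((dim, ident), 0) + 1
    return True
```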
▶ How do I handle rate limiting with exponential backoff on the client side?
Check the Retry-After header (in seconds or HTTP-date). Implement exponential backoff: wait 2^attempt seconds (1, 2, 4, 8, 16…) with jitter (±20%) to avoid thundering herd. For 429 responses, obey Retry-After strictly. For other transient failures (5xx), backoff is optional. Libraries: `axios-retry` (Node.js), `tenacity` (Python), `@shopify/network` (TypeScript).
▶ Free tier vs paid tier rate limiting: how do I differentiate?
Embed the tier in the request context (user object or API key lookup). Apply tight limits to the free tier (100 req/min) and generous limits to premium (1000 req/min). Use the same algorithm for both; just parameterize the limit. Track limit exhaustion per tier for analytics/billing. Offer a burst allowance to premium tiers (smooths out spikes).
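Parameterizing the same limiter per tier might look like this (tier names and numbers are illustrative, taken from the figures above; real values would come from your billing/config system):

```python
from dataclasses import dataclass

@dataclass
class TierLimits:
    requests_per_min: int
    burst: int              # extra burst allowance on top of the steady rate

# Illustrative tiers; in practice loaded from billing/config.
TIERS = {
    "free":    TierLimits(requests_per_min=100,  burst=0),
    "premium": TierLimits(requests_per_min=1000, burst=200),
}

def limit_for(tier: str) -> int:
    """Same algorithm for every tier; only the parameters differ."""
    t = TIERS[tier]
    return t.requests_per_min + t.burst
```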
▶ Should I return 429 Too Many Requests or 503 Service Unavailable when rate limited?
Use HTTP 429 Too Many Requests (RFC 6585) when the client exceeded its rate limit. Use 503 Service Unavailable only for server overload (not the client's fault). 429 signals 'retry later' (safe to retry after the indicated delay), while 503 suggests 'the backend is struggling' (retrying may just add load). Always pair 429 with a Retry-After header.
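A small dispatch helper makes the distinction concrete (a sketch; the `reason` strings are hypothetical):

```python
def throttle_response(reason: str, retry_after: int = 30):
    """429 when the client exhausted its quota; 503 when the server itself
    is overloaded. Both may carry Retry-After, but only 429 requires it here."""
    if reason == "client_quota":
        return 429, {"Retry-After": str(retry_after)}
    if reason == "server_overload":
        return 503, {"Retry-After": str(retry_after)}   # optional for 503
    raise ValueError(f"unknown throttle reason: {reason}")
```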