JobCannon

API Rate Limiting

Protecting APIs from abuse while ensuring fair access

⬢ TIER 3 · Tech

Salary impact: +$10k
Time to learn: 4 months
Difficulty: Medium
Careers: 4
AT A GLANCE

API rate limiting controls request volume per client using token bucket, sliding window, or fixed window algorithms. Distributed implementations via Redis handle multi-server environments. Understanding rate limit headers (RateLimit-* vs X-RateLimit-*), per-user/IP/key strategies, and retry-after semantics is essential for backend systems. Career path: Practitioner (fixed window, basic HTTP 429, $95-125k) → Architect (distributed token bucket, multi-tier limits, $140-190k) → Expert (adaptive limits, quota billing integration, $180-260k) over 4-6 months.

What is API Rate Limiting

API rate limiting controls how many requests clients can make within a time window, protecting services from abuse, DDoS attacks, and noisy neighbors. Implementing effective rate limiting requires understanding token bucket, sliding window, and fixed window algorithms, along with distributed rate limiting in multi-server environments. This is a critical system design skill tested in senior engineering interviews and essential for anyone building public APIs or multi-tenant platforms.

πŸ”§ TOOLS & ECOSYSTEM
Redis · Upstash · Cloudflare Rate Limiting · AWS API Gateway throttling · Kong · Envoy · Tyk · Bottleneck · Lua scripts in Redis · NGINX limit_req · Node.js rate-limiter-flexible · express-rate-limit

πŸ’° Salary by region

Region    Junior    Mid       Senior
USA       $110k     $155k     $210k
UK        £65k      £95k      £135k
EU        €72k      €105k     €145k
Canada    C$118k    C$165k    C$225k

❓ FAQ

Token bucket vs sliding window vs fixed window: which one should I use?
Fixed window: simplest, per-minute counters, but prone to bursts at window boundaries (a client can send up to 2× the limit straddling the boundary). Token bucket: refills at a constant rate, absorbs bursts gracefully, and is the de-facto standard for public APIs (Stripe, for example, has described using it). Sliding window: most accurate, but more complex and memory-hungry in distributed systems. Default to token bucket for new APIs; use fixed window only for simple internal services.
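A minimal in-process sketch of the token bucket idea (class and parameter names are illustrative, not any library's API):

```python
import time

class TokenBucket:
    """Minimal single-process token bucket; illustrative, not production code."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill continuously since the last call, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# 5-request burst allowance, 1 request/second sustained rate.
bucket = TokenBucket(capacity=5, refill_rate=1.0)
results = [bucket.allow() for _ in range(6)]
# The first 5 rapid requests pass; the 6th is rejected until a token refills.
```

Note how the burst size (capacity) and the sustained rate (refill_rate) are independent knobs, which is exactly what fixed windows lack.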
How do I implement distributed rate limiting across multiple servers?
Single-server rate limiting (in-memory counters) fails immediately when you scale. Use Redis as the source of truth: each request increments a counter with TTL = window size. Lua scripts atomically check-and-increment to avoid race conditions. For geo-distributed systems (multi-region), replicate Redis across regions or accept slightly stale limits for performance.
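The atomic check-and-increment can be sketched as a Lua script plus a thin wrapper. This assumes a redis-py-style client (an object with an `eval` method); the key naming and function names are made up for illustration:

```python
# Lua runs atomically inside Redis, so check-and-increment cannot race.
CHECK_AND_INCR = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])  -- first hit starts the window TTL
end
return count
"""

def allowed(r, client_id: str, limit: int, window_s: int) -> bool:
    """True if this request is within the fixed-window limit.

    `r` is assumed to be a redis-py client; r.eval(script, numkeys, key, arg)
    executes the script server-side.
    """
    count = r.eval(CHECK_AND_INCR, 1, f"ratelimit:{client_id}", window_s)
    return int(count) <= limit
```

Because the counter lives in Redis rather than in process memory, every app server sees the same count, which is the whole point of the distributed setup.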
What headers should I return: RateLimit-* or X-RateLimit-*?
The IETF-standardized names (from the httpapi working group's RateLimit header fields draft) are RateLimit-Limit / RateLimit-Remaining / RateLimit-Reset; HTTP 429 itself is defined in RFC 6585. GitHub and many others use X-RateLimit-* (a legacy de-facto standard, still widely recognized). Return both for compatibility: RateLimit-* for modern clients, X-RateLimit-* for legacy ones. Always include a Retry-After header with 429 responses (delta-seconds or HTTP-date format).
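A sketch of emitting both header sets; the function name is made up, and the choice of delta-seconds for RateLimit-Reset versus epoch seconds for X-RateLimit-Reset reflects common convention rather than a hard rule:

```python
import time

def rate_limit_headers(limit: int, remaining: int, reset_epoch: int) -> dict:
    """Build modern (RateLimit-*) and legacy (X-RateLimit-*) header sets."""
    reset_in = max(0, reset_epoch - int(time.time()))
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(remaining),
        "RateLimit-Reset": str(reset_in),       # delta-seconds until reset
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_epoch),  # legacy convention: epoch seconds
    }
    if remaining == 0:
        # Pair the 429 with Retry-After, as advised above.
        headers["Retry-After"] = str(reset_in)
    return headers
```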
Per-user, per-IP, or per-API-key rate limiting β€” how do I choose?
Per-user (for authenticated requests): fairest, prevents account-level abuse. Per-IP (for public/unauthenticated endpoints): blocks anonymous abusers but penalizes legitimate users behind a shared NAT, such as an office. Per-API-key: best for B2B APIs, allows tiered limits per subscription. Combine all three: the strictest limit applies. Example: 1000/day per user, 100/min per IP, 10k/day per API key.
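The combine-all-three, strictest-wins rule might look like this (the limits are the illustrative numbers above; the function and variable names are assumptions):

```python
# Example limits per scope, taken from the numbers above.
LIMITS = {"user": 1000, "ip": 100, "api_key": 10_000}

def allow(counts: dict) -> bool:
    """counts maps scope -> requests already made in that scope's window.
    A request passes only if every applicable scope is still under its limit,
    i.e. the strictest limit wins."""
    return all(counts[scope] < LIMITS[scope] for scope in counts)

allow({"user": 999, "ip": 50, "api_key": 5_000})   # True: all scopes under limit
allow({"user": 999, "ip": 100, "api_key": 5_000})  # False: per-IP limit exhausted
```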
How do I handle rate limiting with exponential backoff on the client side?
Check the Retry-After header (in seconds or HTTP-date). Implement exponential backoff: wait 2^attempt seconds (1, 2, 4, 8, 16, …) with jitter (±20%) to avoid thundering herd. For 429 responses, obey Retry-After strictly. For other transient failures (5xx), backoff is optional. Libraries: `axios-retry` (Node.js), `tenacity` (Python), `@shopify/network` (TypeScript).
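A sketch of that client-side policy (function names are made up; the HTTP-date form of Retry-After is not handled in this sketch, only delta-seconds):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with +/-20% jitter, capped: 1, 2, 4, 8, 16, ... seconds."""
    delay = min(cap, base * (2 ** attempt))
    return delay * random.uniform(0.8, 1.2)  # jitter avoids a thundering herd

def retry_after_seconds(headers: dict, attempt: int) -> float:
    """Obey the server's Retry-After strictly when present (429 case);
    otherwise fall back to computed exponential backoff."""
    ra = headers.get("Retry-After")
    return float(ra) if ra is not None else backoff_delay(attempt)
```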
Free tier vs paid tier rate limiting β€” how do I differentiate?
Embed tier in the request context (user object or API key lookup). Apply loose limits to free tier (100 req/min), tight to premium (1000 req/min). Use same algorithm for both; just parameterize the limit. Track limit exhaustion per tier for analytics/billing. Offer burst allowance to premium tiers (smooth out spikes).
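Parameterizing the limit by tier can be as simple as a lookup (the per-minute numbers are the illustrative ones above; the burst-allowance figure is an assumed example):

```python
# Same rate-limiting algorithm for every tier; only the parameters differ.
TIER_LIMITS = {"free": 100, "premium": 1000}  # req/min, from the example above
TIER_BURST  = {"free": 0,   "premium": 200}   # assumed burst allowance for paid tiers

def per_minute_limit(tier: str) -> int:
    """Effective per-minute ceiling for a tier: base limit plus burst allowance."""
    return TIER_LIMITS[tier] + TIER_BURST[tier]
```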
Should I return 429 Too Many Requests or 503 Service Unavailable when rate limited?
Use HTTP 429 Too Many Requests (RFC 6585) when a client exceeds its rate limit. Use 503 Service Unavailable only for server-side overload (not the client's fault). 429 tells the client it exceeded its quota and may safely retry after the indicated delay, while 503 suggests the backend itself is struggling and may warrant longer backoff. Always pair 429 with a Retry-After header.
