
Load Balancing & Scaling

Distribute traffic, scale horizontally: high availability, performance

TIER 3 · Tech
Salary impact: +$20k
Time to learn: 5 months
Difficulty: Medium
Careers: 1
TL;DR

Load balancing distributes requests across multiple servers to eliminate single points of failure and enable horizontal scaling. Backend/SRE career path: Practitioner (Layer 4/7 LBs, basic sticky sessions, $95-130k) → Specialist (multi-region failover, auto-scaling policies, health checks, $140-190k) → Staff-level SRE (global LB architectures, DNS-based routing, blue-green deployments, $200-280k+) over 4-6 months. Core patterns: round-robin, least-connections, weighted round-robin, IP hash for sticky sessions. Tools: NGINX, HAProxy, AWS ALB/NLB, GCP LB, Cloudflare, Envoy, Traefik, Istio, Kong.

What is Load Balancing & Scaling

Load balancing distributes traffic across multiple servers. Scaling means adding capacity — horizontal (more servers) or vertical (bigger servers). Essential for high-traffic systems. Level 1: NGINX/HAProxy basics, auto-scaling.

🔧 TOOLS & ECOSYSTEM
NGINX · HAProxy · AWS ALB · AWS NLB · GCP Load Balancer · Cloudflare Load Balancing · Envoy · Traefik · Istio · Kong · kube-proxy

💰 Salary by region

Region | Junior | Mid | Senior
USA | $95k | $155k | $220k
UK | £70k | £105k | £145k
EU | €75k | €115k | €155k
Canada | C$100k | C$165k | C$225k

🎯 Careers using Load Balancing & Scaling

❓ FAQ

Layer 4 vs Layer 7 load balancing — which should I use?
L4 (transport: TCP/UDP) is fast and protocol-agnostic, but has no request awareness — good for raw throughput, VoIP, gaming. L7 (application: HTTP/HTTPS) inspects headers/cookies, enables path-based routing, cookie stickiness, compression — best for web apps. Hybrid: L4 in front for DDoS/capacity, then L7 for app logic. Latency: L7 adds ~5-15ms vs L4's <1ms.
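The difference is visible directly in an NGINX configuration: the stream module forwards raw TCP (L4), while the http module can route by path (L7). A minimal sketch — all pool names, IPs, and ports are illustrative, not from any real deployment:

```nginx
# L4: stream module forwards raw TCP; NGINX never parses the payload.
stream {
    upstream tcp_pool {
        server 10.0.0.1:5432;
        server 10.0.0.2:5432;
    }
    server {
        listen 5432;
        proxy_pass tcp_pool;
    }
}

# L7: http module parses each request, enabling path-based routing.
http {
    upstream api_pool { server 10.0.1.1:8080; server 10.0.1.2:8080; }
    upstream web_pool { server 10.0.2.1:8080; }
    server {
        listen 80;
        location /api/ { proxy_pass http://api_pool; }
        location /     { proxy_pass http://web_pool; }
    }
}
```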
When do I need sticky sessions and what's the tradeoff?
Sticky sessions bind a user to one backend server — needed for in-memory session state (PHP $_SESSION, user shopping carts without Redis). Cost: reduces load balancing efficiency, breaks auto-scaling (can't kill a sticky server mid-request). Better: externalize sessions to Redis/Memcached (stateless backends), use JWT tokens, or store in database. If sticky required: use IP hash or cookie-based affinity, set TTL, monitor uneven load.
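IP-hash affinity can be sketched in a few lines of Python — the server names here are hypothetical. Note the caveat this illustrates: the mapping is only stable while the pool is unchanged, which is one reason external session stores are preferred.

```python
import hashlib

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical backend pool

def pick_backend(client_ip: str, servers: list[str] = SERVERS) -> str:
    """IP-hash affinity: the same client IP always maps to the same backend."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# Deterministic per client — but add or remove a server and most
# clients get remapped, losing their in-memory sessions.
backend = pick_backend("203.0.113.7")
```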
What autoscaling pitfalls should I avoid?
Metrics lag: CPU average measured over 5 min, so burst traffic creates 5-min delay before scaling. Too aggressive scaling = flapping/thrashing. Too conservative = customers hit rate limits. Solutions: use multiple metrics (CPU + request queue depth + network), set scale-up faster than scale-down, use predictive scaling (forecast based on time-of-day). Always test with load generator; don't trust defaults.
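A target-tracking policy (the idea behind the Kubernetes HPA formula, simplified) can be sketched as follows; the target of 60% CPU and the replica bounds are illustrative assumptions:

```python
import math
import statistics

def desired_replicas(current: int, cpu_samples: list[float],
                     target_cpu: float = 0.6,
                     min_replicas: int = 2, max_replicas: int = 20) -> int:
    """Scale so that average CPU utilization moves toward target_cpu.

    desired = ceil(current * observed / target): more load -> more replicas.
    Clamping to [min_replicas, max_replicas] guards against runaway
    scale-out and against scaling to zero on a quiet night.
    """
    avg = statistics.mean(cpu_samples)
    desired = math.ceil(current * avg / target_cpu)
    return max(min_replicas, min(max_replicas, desired))
```

In a real policy you would also apply asymmetric cooldowns (scale up immediately, scale down only after a sustained quiet period) to avoid the flapping described above.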
What load balancing algorithms exist and when do I use each?
Round-robin: simple, even distribution (good when servers identical). Least-connections: routes to server with fewest active connections (good for long-lived connections). Weighted: route proportionally (good when servers different sizes: 70%→new, 30%→old during canary). IP hash: deterministic per client (good for session stickiness without cookies). Random: mathematically optimal at scale (used in CDNs). Latency-aware: active measurement of each backend's response time (rare, cutting-edge).
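The first three algorithms fit in a few lines each — a Python sketch with hypothetical server names and connection counts:

```python
import itertools
import random

servers = ["s1", "s2", "s3"]

# Round-robin: cycle through the pool in fixed order.
_rr = itertools.cycle(servers)
def round_robin() -> str:
    return next(_rr)

# Least-connections: pick the backend with the fewest active connections.
active = {"s1": 12, "s2": 3, "s3": 7}  # hypothetical live counts
def least_connections() -> str:
    return min(active, key=active.get)

# Weighted: route ~70% of traffic to the new build, ~30% to the old
# (the canary split mentioned above).
def weighted() -> str:
    return random.choices(["new", "old"], weights=[70, 30])[0]
```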
DNS-based load balancing vs anycast — what's the difference?
DNS LB: client asks DNS, gets A record pointing to one LB instance, then connects via that LB. High latency (DNS lookup ~100ms), client not aware of failover (stale TTL). Anycast: multiple servers advertise same IP, network routes packet to nearest. Sub-1ms latency, automatic failover, but requires BGP + complex operations. Hybrid: use DNS for geo-routing (client → nearest region), then anycast within region.
How do I implement health checks without false positives?
Simple ping (ICMP) = unreliable. Better: TCP connect (proves port open), or HTTP GET + 200 status (proves app responding). Advanced: synthetic transactions (hit /health endpoint, verify database connectivity). Pitfall: health check interval too short = traffic overhead; too long = 30-60sec before detecting failure. Rule: 3-5 sec interval, 3 failures to mark unhealthy, 1 success to mark healthy. Avoid: health checks that trigger expensive operations (full DB scan).
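The "3 failures to mark unhealthy, 1 success to mark healthy" rule is a small state machine — a sketch with those thresholds as defaults:

```python
class HealthChecker:
    """Track consecutive results: mark a backend unhealthy after
    fail_threshold consecutive failures, healthy after rise_threshold
    consecutive successes."""

    def __init__(self, fail_threshold: int = 3, rise_threshold: int = 1):
        self.fail_threshold = fail_threshold
        self.rise_threshold = rise_threshold
        self.failures = 0
        self.successes = 0
        self.healthy = True

    def record(self, check_passed: bool) -> bool:
        if check_passed:
            self.failures = 0
            self.successes += 1
            if self.successes >= self.rise_threshold:
                self.healthy = True
        else:
            self.successes = 0
            self.failures += 1
            if self.failures >= self.fail_threshold:
                self.healthy = False
        return self.healthy
```

Requiring consecutive failures is what absorbs one-off timeouts without pulling a healthy backend out of rotation.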
How do blue-green deployments interact with load balancing?
Blue (live) + Green (new) environments run in parallel. LB switches traffic 100% at once (instant, easy rollback) or gradually (canary: 5% → 50% → 100%). Requires: health checks detect bad green, instant DNS/LB update capability. Gotcha: if sessions sticky to blue servers, green servers sit idle. Solution: external session store, or use canary to slowly drain blue.
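The gradual (canary) cutover amounts to a weighted coin flip per request, with the green weight stepped up over time — a sketch, with the 5% → 50% → 100% schedule from above:

```python
import random

def route(green_weight: float) -> str:
    """Send green_weight fraction of requests to green, the rest to blue."""
    return "green" if random.random() < green_weight else "blue"

# Step through the cutover schedule; at each stage you would watch
# green's error rate and latency, and roll back (green_weight = 0)
# if it misbehaves.
for stage in (0.05, 0.50, 1.0):
    sample = [route(stage) for _ in range(10_000)]
```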
