▶Layer 4 vs Layer 7 load balancing — which should I use?
L4 (transport: TCP/UDP) is fast with minimal per-packet overhead and handles any protocol, but has no request awareness — good for raw throughput, VoIP, gaming. L7 (application: HTTP/HTTPS) inspects headers/cookies, enabling path-based routing, cookie stickiness, and compression — best for web apps. Hybrid: L4 in front for DDoS absorption/capacity, then L7 for app logic. Latency: L7 typically adds ~5-15ms vs L4's <1ms.
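A minimal sketch of the L7 difference using Go's standard reverse proxy — it inspects the request path and picks a backend per request, which an L4 balancer (forwarding raw TCP bytes) never could. Backend addresses are placeholders:

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical backends — an L7 balancer routes on the request path,
	// which never reaches an L4 balancer's decision logic.
	api, _ := url.Parse("http://10.0.0.10:8080")
	static, _ := url.Parse("http://10.0.0.20:8080")

	mux := http.NewServeMux()
	mux.Handle("/api/", httputil.NewSingleHostReverseProxy(api))
	mux.Handle("/", httputil.NewSingleHostReverseProxy(static))

	// An L4 balancer would forward the whole connection to one backend;
	// here each HTTP request is inspected and routed individually.
	http.ListenAndServe(":80", mux)
}
```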
▶When do I need sticky sessions and what's the tradeoff?
Sticky sessions bind a user to one backend server — needed for in-memory session state (PHP $_SESSION, shopping carts without Redis). Cost: reduces load-balancing efficiency and complicates scale-in (draining a sticky server strands its sessions). Better: externalize sessions to Redis/Memcached (stateless backends), use JWT tokens, or store them in the database. If sticky is required: use IP hash or cookie-based affinity, set a TTL, and monitor for uneven load.
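If sticky is unavoidable, cookie-based affinity can look roughly like this Go sketch — cookie name, TTL, and backend list are illustrative, not a standard:

```go
package main

import (
	"hash/fnv"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strconv"
)

var backends = mustParse("http://10.0.0.10:8080", "http://10.0.0.11:8080")

func mustParse(raw ...string) []*url.URL {
	out := make([]*url.URL, len(raw))
	for i, r := range raw {
		u, err := url.Parse(r)
		if err != nil {
			panic(err)
		}
		out[i] = u
	}
	return out
}

func pickBackend(w http.ResponseWriter, r *http.Request) *url.URL {
	// Reuse the backend pinned in the affinity cookie, if present and valid.
	if c, err := r.Cookie("lb_affinity"); err == nil {
		if i, err := strconv.Atoi(c.Value); err == nil && i >= 0 && i < len(backends) {
			return backends[i]
		}
	}
	// First visit: hash the client address for a deterministic pick,
	// then pin it with a TTL so load can eventually rebalance.
	h := fnv.New32a()
	h.Write([]byte(r.RemoteAddr))
	i := int(h.Sum32() % uint32(len(backends)))
	http.SetCookie(w, &http.Cookie{Name: "lb_affinity", Value: strconv.Itoa(i), MaxAge: 3600})
	return backends[i]
}

func main() {
	// A real balancer would reuse proxies; one-per-request keeps the sketch short.
	http.ListenAndServe(":80", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		httputil.NewSingleHostReverseProxy(pickBackend(w, r)).ServeHTTP(w, r)
	}))
}
```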
▶What autoscaling pitfalls should I avoid?
Metrics lag: CPU is averaged over ~5 min, so burst traffic sees a ~5-min delay before scaling kicks in. Too aggressive = flapping/thrashing; too conservative = users see latency spikes and errors before capacity arrives. Solutions: use multiple metrics (CPU + request queue depth + network), make scale-up faster than scale-down (sketched below), and use predictive scaling (forecast by time of day). Always test with a load generator; don't trust defaults.
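A Go sketch of the asymmetric-threshold idea — scale up on any hot signal immediately, scale down only after a long quiet cooldown. The thresholds, cooldown, and queue-depth signal are illustrative assumptions:

```go
package main

import (
	"fmt"
	"time"
)

// Scaler scales up aggressively and down conservatively to avoid flapping.
type Scaler struct {
	replicas      int
	lastScaleDown time.Time
}

// Decide combines two signals — CPU alone lags bursty traffic.
func (s *Scaler) Decide(cpu float64, queueDepth int) {
	switch {
	case cpu > 0.70 || queueDepth > 100:
		// Scale up immediately, no cooldown.
		s.replicas++
		fmt.Printf("scale up -> %d replicas\n", s.replicas)
	case cpu < 0.30 && queueDepth == 0 && time.Since(s.lastScaleDown) > 10*time.Minute:
		// Scale down only after a long quiet period.
		if s.replicas > 1 {
			s.replicas--
			s.lastScaleDown = time.Now()
			fmt.Printf("scale down -> %d replicas\n", s.replicas)
		}
	}
}

func main() {
	s := &Scaler{replicas: 2}
	s.Decide(0.85, 40) // burst: scales up right away
	s.Decide(0.20, 0)  // quiet: waits out the cooldown before shrinking
}
```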
▶What load balancing algorithms exist and when do I use each?
Round-robin: simple, even distribution (good when servers are identical). Least-connections: routes to the server with the fewest active connections (good for long-lived connections). Weighted: routes proportionally (good when servers differ in size: 70%→new, 30%→old during a canary). IP hash: deterministic per client (good for session stickiness without cookies). Random: simplest to implement; its power-of-two-choices variant (pick two at random, route to the less loaded) is provably near-optimal at scale and common in large CDN/proxy fleets. Latency-aware: actively measures each backend's response time (rarer, heavier to operate).
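Two of these in miniature, assuming a balancer that tracks per-backend connection counts (addresses are placeholders):

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

type Backend struct {
	Addr        string
	ActiveConns int64 // incremented/decremented by the connection handler
}

var backends = []*Backend{
	{Addr: "10.0.0.10"}, {Addr: "10.0.0.11"}, {Addr: "10.0.0.12"},
}

var rrCounter uint64

// Round-robin: rotate through the backends in order.
func roundRobin() *Backend {
	n := atomic.AddUint64(&rrCounter, 1)
	return backends[n%uint64(len(backends))]
}

// Power-of-two-choices: sample two backends at random,
// route to whichever has fewer active connections.
func powerOfTwo() *Backend {
	a := backends[rand.Intn(len(backends))]
	b := backends[rand.Intn(len(backends))]
	if atomic.LoadInt64(&a.ActiveConns) <= atomic.LoadInt64(&b.ActiveConns) {
		return a
	}
	return b
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println("rr :", roundRobin().Addr)
	}
	fmt.Println("p2c:", powerOfTwo().Addr)
}
```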
▶DNS-based load balancing vs anycast — what's the difference?
DNS LB: client asks DNS, gets an A record pointing to one LB instance, then connects through it. Slow to react: an uncached lookup costs ~100ms, and clients cache answers, so failover waits on TTL expiry. Anycast: multiple sites advertise the same IP and the network routes each packet to the nearest one — no extra lookup step, failover happens automatically via route withdrawal, but it requires BGP + complex operations. Hybrid: use DNS for geo-routing (client → nearest region), then anycast within the region.
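The client's side of DNS LB, sketched in Go — once the resolver's answer is cached, the balancer has no say until the TTL expires (hostname is a placeholder):

```go
package main

import (
	"fmt"
	"math/rand"
	"net"
)

func main() {
	// A DNS-balanced name resolves to several A records; which one the
	// client uses, and for how long (per TTL), is out of the balancer's
	// hands once the answer is cached.
	ips, err := net.LookupIP("lb.example.com") // placeholder hostname
	if err != nil {
		panic(err)
	}
	target := ips[rand.Intn(len(ips))]
	fmt.Println("connecting to", target) // failover waits on TTL expiry
}
```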
▶How do I implement health checks without false positives?
Simple ping (ICMP) = unreliable: it proves the host is up, not the app. Better: TCP connect (proves the port is open), or HTTP GET + 200 status (proves the app responds). Advanced: synthetic transactions (hit a /health endpoint that verifies database connectivity). Pitfall: interval too short = probe traffic overhead; too long = failures sit undetected for interval × threshold (easily 30-60 sec). Rule of thumb: 3-5 sec interval, 3 consecutive failures to mark unhealthy, 1 success to mark healthy. Avoid health checks that trigger expensive operations (e.g. a full DB scan).
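The 3-failures-down / 1-success-up rule from above, sketched in Go (backend URL and exact timings are illustrative):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

type Checker struct {
	url      string
	failures int
	healthy  bool
}

func (c *Checker) probe(client *http.Client) {
	resp, err := client.Get(c.url) // cheap GET, not an expensive synthetic op
	ok := err == nil && resp.StatusCode == http.StatusOK
	if resp != nil {
		resp.Body.Close()
	}
	if ok {
		c.failures = 0
		c.healthy = true // 1 success to mark healthy
		return
	}
	c.failures++
	if c.failures >= 3 { // 3 consecutive failures to mark unhealthy
		c.healthy = false
	}
}

func main() {
	c := &Checker{url: "http://10.0.0.10:8080/health", healthy: true}
	client := &http.Client{Timeout: 2 * time.Second} // timeout shorter than the interval
	for range time.Tick(4 * time.Second) {           // 3-5 sec interval
		c.probe(client)
		fmt.Println("healthy:", c.healthy, "failures:", c.failures)
	}
}
```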
▶How do blue-green deployments interact with load balancing?
Blue (live) + Green (new) environments run in parallel. The LB switches traffic either 100% at once (instant, easy rollback) or gradually (canary: 5% → 50% → 100%). Requires: health checks that catch a bad green before full cutover, plus the ability to update DNS/LB routing instantly. Gotcha: if sessions are sticky to blue servers, green servers sit idle. Solution: an external session store, or a canary that slowly drains blue, as sketched below.
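The gradual cutover is essentially a weight dial; a Go sketch where pool names and percentages follow the canary steps above:

```go
package main

import (
	"fmt"
	"math/rand"
)

// greenPercent is the dial moved during the canary: 5 -> 50 -> 100.
var greenPercent = 5

func pickPool() string {
	if rand.Intn(100) < greenPercent {
		return "green" // new version
	}
	return "blue" // current live version
}

func main() {
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pickPool()]++
	}
	fmt.Println(counts) // roughly 95% blue, 5% green at this stage
	// Rollback is just greenPercent = 0; full cutover is 100.
}
```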