▶ When should I use Redis vs. Memcached?
Memcached: simple key-value caching, a 'dumb' data store with no persistence, good for session caches. Redis: rich data structures (strings, hashes, sets, sorted sets, streams, HyperLogLog), persistence options (RDB snapshots, AOF logs), pub/sub, transactions, Lua scripting, cluster mode. Use Redis when: you need data durability, complex operations (leaderboards, rate limiting, messaging), or multi-command transactions. Use Memcached when: you only need a TTL'd key-value cache and want minimal overhead. Most modern backends prefer Redis unless you have existing Memcached expertise. The raw-throughput advantage that old benchmarks gave Memcached has largely closed in recent Redis versions.
▶ How do I design Redis keys for scalability?
Key naming convention: `namespace:entity:id:attribute`, e.g. `user:123:sessions:abc`, `leaderboard:game:today`. This enables scanning by prefix (SCAN with MATCH user:123:*), organized cache invalidation, and reasoning about memory usage. Avoid generic keys like 'cache' or 'data': they hide your data structure. Use hashes for related fields (e.g., `user:123` = {name, email, role}) rather than separate string keys. Avoid large unbounded collections (e.g., don't store all comments as one array in a single key). For leaderboards, use sorted sets (ZSET) where the score is the user's points and the member is the user ID; rank then falls out of ZRANK/ZREVRANK. Monitor per-key size with MEMORY USAGE <key>: oversized keys indicate design issues.
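A minimal sketch of these conventions. Plain Python dicts stand in for a Redis connection so the snippet runs without a server; the helper names and sample data are invented for illustration:

```python
def make_key(*parts):
    """Build a namespaced key like 'user:123:sessions:abc'."""
    return ":".join(str(p) for p in parts)

# One hash per user ('user:123' -> {name, email, role}) instead of three
# separate string keys; `store` stands in for Redis hashes (HSET/HGETALL).
store = {}

def hset_user(user_id, **fields):
    store.setdefault(make_key("user", user_id), {}).update(fields)

def hget_user(user_id):
    return store.get(make_key("user", user_id), {})

# Leaderboard: with real Redis this would be ZADD/ZREVRANGE on a sorted
# set; here a dict of member -> score stands in.
board = {}

def zadd(member, score):
    board[member] = score

def top(n):
    return sorted(board, key=board.get, reverse=True)[:n]

hset_user(123, name="Ada", email="ada@example.com", role="admin")
zadd("user:1", 300)
zadd("user:2", 500)
zadd("user:3", 100)
# top(2) -> ["user:2", "user:1"]
```

With a real client such as redis-py, the hash half maps to `r.hset("user:123", mapping=fields)` / `r.hgetall(...)` and the leaderboard half to `r.zadd(...)` / `r.zrevrange(...)`.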
▶ How do I implement a production caching strategy with Redis?
Patterns: (1) Cache-aside: app queries Redis; on a miss it fetches from the DB and writes to Redis with a TTL. Most control, explicit. (2) Write-through: app writes to Redis first, and the write propagates to the DB synchronously. Ensures consistency but slower. (3) Write-behind: app writes to Redis, an async process flushes to the DB. Fast but risks data loss. For most backends, use cache-aside with TTL=300-3600s depending on staleness tolerance. Always handle cache misses gracefully (check DB, update cache). Use GETEX to refresh TTL on access. For hot data, don't expire (rely on manual invalidation or the eviction policy). For cold data, use a short TTL (5 min). Implement cache stampede prevention: use probabilistic early expiration or a lock (SETNX on a mutex key) when hot keys near expiration.
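The cache-aside pattern with a simple stampede lock can be sketched as below. An in-memory dict stands in for Redis so the example runs without a server; with a real client, `acquire_lock` would be `SET lock:<key> 1 NX EX 5` and `cache_set` would be `SET <key> <value> EX <ttl>`. The function and variable names are invented for illustration:

```python
import time

cache = {}    # stand-in for Redis: key -> (value, expires_at)
LOCK_TTL = 5  # seconds a rebuild lock is held

def cache_get(key):
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]
    return None  # missing or expired

def cache_set(key, value, ttl):
    cache[key] = (value, time.time() + ttl)

def acquire_lock(key):
    # Equivalent to SET lock:<key> 1 NX EX 5 with real Redis.
    lock = "lock:" + key
    if cache_get(lock) is None:
        cache_set(lock, 1, LOCK_TTL)
        return True
    return False

def get_user(user_id, fetch_from_db, ttl=300):
    key = f"user:{user_id}"
    value = cache_get(key)
    if value is not None:
        return value                  # cache hit
    if acquire_lock(key):             # only one caller rebuilds the key
        value = fetch_from_db(user_id)
        cache_set(key, value, ttl)
        return value
    time.sleep(0.05)                  # others back off briefly and retry
    return get_user(user_id, fetch_from_db, ttl)

calls = []
def fake_db(user_id):
    calls.append(user_id)
    return {"id": user_id, "name": "Ada"}

assert get_user(1, fake_db)["name"] == "Ada"  # miss -> one DB fetch
assert get_user(1, fake_db)["name"] == "Ada"  # hit -> no DB fetch
assert calls == [1]
```

The lock's TTL matters: if the rebuilding process crashes, the lock expires on its own and another caller can take over.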
▶ What's the difference between RDB and AOF persistence?
RDB (snapshotting): periodic snapshots (BGSAVE forks a background process) capturing the entire dataset at a moment in time. Fast recovery, compact file. Downside: all writes since the last snapshot are lost on a crash. AOF (append-only file): logs every write command and replays them on restart. Durability: with everysec fsync you lose at most ~1s of data. Slower recovery (replays all writes). Combined strategy: enable both; RDB gives compact backups and fast restarts, AOF covers the writes between snapshots. Tuning: `appendfsync always` (fsync after every write, safest but very slow); `appendfsync everysec` (reasonable default, ~1s max data loss); `appendfsync no` (the OS decides when to flush, risky). Redis rewrites the AOF once it doubles in size (auto-aof-rewrite-percentage 100); keep that enabled to prevent bloat.
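A redis.conf fragment illustrating the combined strategy (the thresholds below mirror common defaults and are illustrative, not a recommendation for every workload):

```conf
# RDB: snapshot if at least N changes occurred within M seconds
save 900 1
save 300 10
save 60 10000

# AOF: log every write, fsync once per second (~1s max data loss)
appendonly yes
appendfsync everysec

# Rewrite the AOF once it doubles relative to its size after the last
# rewrite, but never for files under 64mb
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
```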
▶ How do I debug memory issues and prevent OOM?
Monitor with INFO memory: used_memory, used_memory_peak, mem_fragmentation_ratio. If fragmentation exceeds ~1.5, enable activedefrag or restart to compact. If used_memory approaches the maxmemory limit, writes start failing (under noeviction) or keys get evicted; if the process outgrows system RAM, the OS OOM killer can terminate it. Set maxmemory-policy: allkeys-lru (evict any least-recently-used key) for caches, noeviction for critical data (then monitor actively). Use MEMORY DOCTOR for insights. Commands to analyze: MEMORY STATS (allocator-level breakdown), SCAN + MEMORY USAGE <key> (per-key size). Common causes: unbounded lists (LPUSH without LTRIM), storing large objects (compress before storing), no TTL on ephemeral data. Use `redis-cli --bigkeys` to find oversized keys. Lua scripts can cause memory spikes: keep them simple or process data in batches.
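The monitoring logic above can be sketched as a small health check over the INFO-memory fields. The field names (`used_memory`, `mem_fragmentation_ratio`) match what INFO memory reports; the 1.5 and 90% thresholds are the illustrative cutoffs from this answer, and `memory_warnings` is an invented helper:

```python
def memory_warnings(info, maxmemory):
    """Return human-readable warnings from an INFO-memory style dict."""
    warnings = []
    ratio = info["mem_fragmentation_ratio"]
    if ratio > 1.5:
        warnings.append(
            f"fragmentation ratio {ratio:.2f}: consider activedefrag or a restart"
        )
    # maxmemory of 0 means "unlimited" in Redis, so skip the check then.
    if maxmemory and info["used_memory"] > 0.9 * maxmemory:
        warnings.append(
            "used_memory above 90% of maxmemory: evictions or write errors are near"
        )
    return warnings

info = {"used_memory": 950_000_000, "mem_fragmentation_ratio": 1.8}
for w in memory_warnings(info, maxmemory=1_000_000_000):
    print(w)
```

In practice `info` would come from a client call such as redis-py's `r.info("memory")`, polled by whatever monitoring you already run.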
▶ How do I set up Redis Sentinel for high availability?
Sentinel = a separate process that monitors Redis instances and handles failover. Setup: 3 Sentinel processes, 1 primary Redis, 2+ replicas. Sentinel pings the primary every second and declares it down after down-after-milliseconds (30s by default) without a valid reply, then promotes a replica to primary. Configuration: sentinel.conf defines the quorum (e.g. 2 of 3 Sentinels must agree the primary is down) and failover timing. Clients use Sentinel to discover the current primary (connect to sentinel:26379 and ask for the primary's address before writing). Failover is automatic, with roughly 30s+ of downtime under default settings. Gotchas: Sentinel assumes network partitions are rare (a slow network causes flapping); replication lag means a promoted replica may be missing the most recent writes; and a single Sentinel is itself a point of failure, so always run 3+ on separate hosts. For larger deployments, use Redis Cluster instead (horizontal sharding) or a managed offering such as Redis Cloud.
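A minimal sentinel.conf sketch for the 3-Sentinel layout above (the master name `mymaster`, the host 10.0.0.1, and the timings are placeholders):

```conf
port 26379
# Monitor the primary at 10.0.0.1:6379; 2 of the 3 Sentinels must agree
# it is down before a failover starts (the quorum).
sentinel monitor mymaster 10.0.0.1 6379 2
# Declare the primary down after 30s without a valid reply.
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 180000
# Re-sync replicas with the new primary one at a time after failover.
sentinel parallel-syncs mymaster 1
```

Sentinel-aware clients (e.g. redis-py's `Sentinel` class) take the list of Sentinel addresses plus the master name and resolve the current primary themselves.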
▶ What are common Redis pitfalls and how do I avoid them?
Pitfall 1: using the KEYS command in production: it scans the entire keyspace and blocks the server. Fix: use SCAN's cursor-based iteration with a modest COUNT. Pitfall 2: not setting maxmemory-policy, so Redis runs out of memory. Fix: set a policy immediately and test eviction. Pitfall 3: assuming persistence is automatic: RDB needs save rules configured and AOF needs fsync tuning. Fix: explicitly configure persistence in redis.conf. Pitfall 4: replication lag on replicas means stale reads. Fix: route writes (and read-your-own-write reads) to the primary. Pitfall 5: Pub/Sub without persistence: subscribers miss messages while offline. Fix: use Streams instead (messages are persisted, with consumer groups and replay). Pitfall 6: storing serialized objects without compression causes memory bloat. Fix: compress with zlib before SETEX. Pitfall 7: Lua scripts with side effects can't be safely retried. Fix: keep scripts idempotent. Test with redis-benchmark to catch bottlenecks early.
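The fix for pitfall 6 is a two-line round-trip worth seeing concretely. This sketch uses only the standard library; with real Redis the compressed bytes would be the value passed to SETEX and decompressed after GET (the payload below is invented and deliberately repetitive to show the size win):

```python
import json
import zlib

def pack(obj):
    """Serialize and compress an object before caching it."""
    return zlib.compress(json.dumps(obj).encode("utf-8"))

def unpack(blob):
    """Decompress and deserialize a cached value."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

doc = {"user": 123, "tags": ["redis"] * 500}  # repetitive -> compresses well
blob = pack(doc)

assert unpack(blob) == doc                    # lossless round-trip
assert len(blob) < len(json.dumps(doc))       # smaller in memory and on the wire
```

The trade-off is CPU on every read and write; it pays off for large, compressible values (JSON, HTML) and rarely for small ones.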