▶How do indexes actually speed up queries? What's the tradeoff?
Indexes create sorted lookup structures (B-trees, hashes) that let the DB find rows without scanning every row. On a 1M-row table: full scan = 1M ops, indexed = log₂(1M) ≈ 20 ops. Tradeoff: indexes slow down INSERT/UPDATE/DELETE (each write must update the index too) and consume disk space. Rule of thumb: add an index only if SELECT queries outnumber writes roughly 10:1. Always use EXPLAIN ANALYZE to verify the index is actually used.
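The scan-vs-search difference is easy to see with SQLite's EXPLAIN QUERY PLAN, used here as a minimal stand-in for EXPLAIN ANALYZE (the table and column names are illustrative, not from the article):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"u{i}@example.com") for i in range(10_000)])

def plan(sql: str) -> str:
    # The last column of an EXPLAIN QUERY PLAN row describes the access path
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]

query = "SELECT * FROM users WHERE email = 'u42@example.com'"
plan_before = plan(query)   # no index: every row is examined (SCAN)
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = plan(query)    # B-tree lookup on the index (SEARCH)
print(plan_before)
print(plan_after)
```

The "SCAN" → "SEARCH ... USING INDEX" change in the plan output is exactly the 1M-ops-to-20-ops jump described above.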
▶When should I denormalize instead of optimizing queries?
Denormalize when: JOIN-heavy queries dominate (5+ tables per query), normalization creates deep hierarchies (10+ levels), or single-column analytics queries run hourly. Example: store a precomputed customer_total_spent on the customers table instead of summing orders on every request. Denormalization should cover only a small slice (~5%) of your schema. Never denormalize your entire DB: it kills consistency and makes updates fragile. Always maintain a trigger to keep denormalized columns in sync.
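A sketch of the sync trigger for the customer_total_spent example, using SQLite for brevity (PostgreSQL needs a trigger function for the same idea; UPDATE/DELETE triggers are omitted here but required in production):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, total_spent REAL DEFAULT 0);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
-- Keep the denormalized column in sync on every order insert
CREATE TRIGGER orders_after_insert AFTER INSERT ON orders BEGIN
  UPDATE customers SET total_spent = total_spent + NEW.amount
  WHERE id = NEW.customer_id;
END;
""")
conn.execute("INSERT INTO customers (id) VALUES (1)")
conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 25.0)")
conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 75.0)")
total = conn.execute(
    "SELECT total_spent FROM customers WHERE id = 1").fetchone()[0]
print(total)  # 100.0, without summing the orders table at read time
```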
▶When and how should I partition a table?
Partition when: table > 100GB, queries filter on a date range or region, or DELETE operations are slow. Common strategies: RANGE (by date), HASH (by ID to spread rows evenly), LIST (by category). PostgreSQL: declare `CREATE TABLE orders (...) PARTITION BY RANGE (created_at)`, then attach partitions with `CREATE TABLE p_2026q1 PARTITION OF orders FOR VALUES FROM ('2026-01-01') TO ('2026-04-01')`. MySQL: `PARTITION BY KEY (user_id) PARTITIONS 4`. Cost: setup ~4h, query speedup 2–5x, maintenance burden +2h/month.
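The routing logic behind the RANGE and HASH strategies can be sketched in a few lines (the quarter naming mirrors the p_2026q1 example; the modulo hash is an illustrative assumption, since MySQL's KEY hashing is internal and differs):

```python
from datetime import date

def range_partition(created_at: date) -> str:
    """RANGE routing: map a timestamp to its quarterly partition name."""
    quarter = (created_at.month - 1) // 3 + 1
    return f"p_{created_at.year}q{quarter}"

def hash_partition(user_id: int, partitions: int = 4) -> int:
    """HASH routing: spread rows evenly across N partitions by ID."""
    return user_id % partitions

print(range_partition(date(2026, 2, 15)))  # p_2026q1
print(hash_partition(7))                   # 3
```

This is also why partition pruning needs the partition key in the WHERE clause: the planner runs the same mapping to skip partitions that cannot match.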
▶When do I use read replicas vs vertical scaling?
Vertical scaling (bigger server): simpler, no code changes, ~$500–$5k/month, but it eventually hits a hard ceiling on a single machine. Read replicas: scale reads horizontally, with throughput growing roughly linearly per replica, but they add replication lag (0–5s). Use replicas when: (1) your read:write ratio > 10:1, (2) you have 100+ concurrent readers, (3) analytics queries slow down prod. Start vertical, add replicas when prod CPU > 70% for 2+ weeks.
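The read/write split that replicas require can be sketched as a small router (the class and naming are hypothetical; real deployments usually put this in a proxy or the framework's DB layer rather than hand-rolling it):

```python
import itertools

class ReadWriteRouter:
    """Writes go to the primary; reads round-robin across replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        # Caveat: because of replication lag, reads that must see their
        # own just-committed writes should also hit the primary
        # (not modeled in this sketch).
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary

router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM users"))         # replica-1
print(router.route("UPDATE users SET name = ?"))   # primary
print(router.route("select count(*) FROM orders")) # replica-2
```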
▶How do I find the actual bottleneck: I/O, CPU, or lock contention?
Use pg_stat_statements (PostgreSQL) or the Performance Schema (MySQL) to log every query with its duration, then filter to the slowest 10%. Run EXPLAIN ANALYZE on those: a seq scan points to I/O, a large sort to CPU, and lock waits to contention. Datadog/New Relic can chart CPU and disk reads/writes over time. If your slow query is waiting rather than running, it's lock contention: look for transactions that hold locks too long. Start with `EXPLAIN ANALYZE`, always.
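The triage step, collect timings and keep the slowest decile, can be sketched as follows (the timing data is fabricated for illustration; in production the samples come from pg_stat_statements or the Performance Schema):

```python
def slowest_decile(samples: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Return the slowest 10% of (query, ms) samples, worst first."""
    ranked = sorted(samples, key=lambda s: s[1], reverse=True)
    keep = max(1, len(ranked) // 10)  # always keep at least one
    return ranked[:keep]

# Fabricated timings standing in for real query-log output
samples = [(f"q{i}", ms) for i, ms in enumerate(
    [2.1, 450.0, 5.3, 3.8, 1.2, 980.5, 4.4, 2.9, 6.7, 3.1])]
print(slowest_decile(samples))  # [('q5', 980.5)]
```

Only the surviving queries get the expensive EXPLAIN ANALYZE treatment; everything else is noise at this stage.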
▶What's the difference between covering indexes, composite indexes, and partial indexes?
Covering index (includes all columns the SELECT needs): `CREATE INDEX idx ON orders (user_id) INCLUDE (total, status)` → no table lookup needed. Composite (multiple columns): `CREATE INDEX idx ON orders (user_id, status, created_at)` → left-to-right matching only; `WHERE user_id=1 AND status='paid'` uses it, but `WHERE status='paid'` alone does not. Partial (filtered): `CREATE INDEX idx ON orders (user_id) WHERE status='paid'` → smaller and faster, but only useful if most queries filter on status='paid'. Use partial indexes for soft deletes (`WHERE deleted_at IS NULL`).
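The left-to-right rule is easy to verify with SQLite's EXPLAIN QUERY PLAN (SQLite has no INCLUDE clause, so the covering example above is PostgreSQL-specific; the schema here is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (user_id INT, status TEXT, created_at TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(i % 50, "paid" if i % 2 else "shipped", "2026-01-01", 9.99)
                  for i in range(1000)])
conn.execute("CREATE INDEX idx_comp ON orders (user_id, status, created_at)")

def plan(sql: str) -> str:
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]

# Leading column present: the composite index is usable (SEARCH)
p_prefix = plan("SELECT * FROM orders WHERE user_id = 1 AND status = 'paid'")
# Leading column missing: the index is skipped, full scan (SCAN)
p_no_prefix = plan("SELECT * FROM orders WHERE status = 'shipped'")
print(p_prefix)
print(p_no_prefix)
```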
▶How do I estimate database server costs: how much capacity do I really need?
Three metrics: queries per second (QPS), avg query time (ms), and data size (GB). A single server can handle ~1000 QPS at 10ms avg. Cloud pricing: AWS RDS t4g.large (~$150/mo, ~500 QPS), r6i.xlarge (~$500/mo, ~5000 QPS). Data size: SSD ~$1/GB/mo, HDD ~$0.10/GB/mo. For 500GB: SSD replica ≈ $500/mo, HDD backup ≈ $50/mo. Start small (t4g.medium), monitor CPU and disk I/O in production, and scale after 2 weeks of data. Most startups overprovision by 3–5x.
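The arithmetic above fits in a back-of-envelope sizing helper (the QPS-per-server and $/GB figures are this answer's rough estimates, not quoted vendor prices):

```python
import math

def servers_needed(peak_qps: int, per_server_qps: int = 1000) -> int:
    """Server count from the ~1000 QPS/server rule of thumb."""
    return math.ceil(peak_qps / per_server_qps)

def monthly_storage_cost(gb: int, ssd: bool = True) -> float:
    """~$1/GB/mo for SSD, ~$0.10/GB/mo for HDD (rough estimates)."""
    return gb * (1.0 if ssd else 0.10)

print(servers_needed(3500))                  # 4
print(monthly_storage_cost(500))             # 500.0 (SSD replica)
print(monthly_storage_cost(500, ssd=False))  # 50.0  (HDD backup)
```

Run it against your measured peak QPS, not your average; the gap between the two is usually where the 3–5x overprovisioning comes from.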