▶Relational vs NoSQL — when should I pick which for my application?
Relational (PostgreSQL, MySQL): ACID guarantees, complex queries, normalize data, consistency first. Pick for multi-table relationships (e-commerce, banking, SaaS). NoSQL (MongoDB, DynamoDB): eventual consistency, horizontal scale, document/key-value storage. Pick for unstructured data, high throughput, schemaless flexibility. Hybrid: use both — relational for transactions, NoSQL for logs/caches/events. 2026 trend: PostgreSQL wins for most apps; NoSQL reserved for specific use cases (real-time analytics, IoT, social feeds).
â–¶What are the normalization tradeoffs, and when should I denormalize?
Normalization (1NF → 3NF → BCNF) eliminates redundancy, saves space, ensures consistency. Downside: complex joins slow queries. Denormalize when: frequent reads of denormalized data (reporting), joins are expensive, you control consistency (event-sourcing), non-critical data. Real example: e-commerce—normalize product catalog, denormalize customer orders (save address snapshot to avoid broken links). Monitor query plans; denormalize only if profiler shows join bottleneck.
â–¶Should I use JSON columns in PostgreSQL instead of a separate table?
JSON columns (jsonb type) shine for: flexible schema (user metadata, webhooks), nested data you rarely filter by, semi-structured data. Avoid if: you query deeply into the JSON, need ACID guarantees on JSON fields (use separate tables), data is highly relational. Best practice: jsonb + indexes (e.g., GIN index) for fast querying. PostgreSQL jsonb is 70% replacement for NoSQL; most apps stay relational + jsonb hybrid.
â–¶How do I design a multi-tenant SaaS database schema?
Two patterns: (1) Row-Level Security (RLS)—one table per entity, add tenant_id column, use RLS policies to isolate data per tenant. Cheaper, simpler queries, shared infrastructure risk. (2) Separate schemas—one schema per tenant, same table structure, complete isolation, higher maintenance. Pick RLS for <100 tenants, separate schemas for high-security/regulatory isolation. Ensure indexes on tenant_id + query columns; avoid SELECT * queries that bypass tenant filtering.
â–¶What's the difference between sharding and partitioning?
Partitioning: split one table by range (e.g., date ranges for logs), list (enum), or hash—same database. Faster queries on specific partitions, easier pruning. Sharding: split data across multiple databases/servers by hash—horizontal scale, no single-server bottleneck. Pick partitioning for single-server scale-out (PostgreSQL partitions). Pick sharding when one server can't handle volume. Both require app-side logic or proxy (e.g., Django ORM, Prisma, Citus).
â–¶How do I handle zero-downtime migrations on production databases?
Backward-compatible steps: (1) add new column + backfill in batches, (2) dual-write old + new, (3) migrate reads to new column, (4) remove old column. Use tools: Prisma migrations (versioned), Flyway, Liquibase. Test on staging first. For >1GB tables, use online migration tools (percona-xtrabackup, pg_repack) to avoid locks. Always checkpoint every 1M rows backfilled; if something breaks, rollback that checkpoint.
â–¶What indexes should I add to optimize slow queries?
Use EXPLAIN to find seq scans and slow joins. Add indexes on: (1) WHERE columns, (2) JOIN keys, (3) ORDER BY / GROUP BY columns. Avoid: wide indexes (>4 cols), indexes on low-cardinality columns, indexes on UPDATE-heavy columns (slows writes). Composite indexes: order matters (equality first, then range, then sort). Partial indexes: WHERE clause to skip null/archived rows. Trade-off: faster reads, slower writes; monitor disk usage and query time weekly.