▶ When should I use MongoDB instead of PostgreSQL?
Use MongoDB when: (1) your data structure evolves frequently (schema migrations are expensive in SQL); (2) you have hierarchical/nested data (documents map naturally to objects); (3) you need horizontal scaling beyond a single PostgreSQL server; (4) your read patterns are highly denormalized (avoiding JOIN complexity). Use PostgreSQL when: your data is highly relational, ACID guarantees across multiple tables matter, or your data structure is stable and well-defined. MongoDB excels at content management, user profiles, IoT sensor logs, and real-time analytics. PostgreSQL excels at transactional systems (banking, e-commerce orders, inventory). Hybrid: use both, with PostgreSQL as the transactional core and MongoDB for logs, caching, and user activity.
▶ How do I design documents efficiently for MongoDB?
Three patterns: (1) Embed related data in one document if it's accessed together (e.g., user profile + address + preferences = one doc). Limit: 16MB max per document, so don't embed unbounded arrays. (2) Reference data with ObjectIds if it's updated independently (e.g., users reference order IDs rather than embedding full orders). (3) Hybrid: embed hot data (user name, email) and reference cold data (full order history), as in the sketch below. Use dot notation to query embedded fields: `db.users.find({'address.city': 'NYC'})`. Profile your queries first, then denormalize intentionally; over-normalization defeats MongoDB's strengths.
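A minimal mongosh sketch of the hybrid pattern; the `users` and `orders` collections and their fields are hypothetical:

```javascript
// Embed hot data that is read together with the user; keep only a
// bounded list of references to data that changes independently.
const userId = db.users.insertOne({
  name: 'Ada',
  email: 'ada@example.com',
  address: { city: 'NYC', zip: '10001' },  // embedded: read with the profile
  recentOrderIds: []                       // referenced: bounded array of order ids
}).insertedId;

// Orders live in their own collection and are updated independently.
db.orders.insertOne({ userId: userId, amount: 42, status: 'new' });

// Dot notation queries embedded fields directly.
db.users.find({ 'address.city': 'NYC' });
```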
▶ What's the aggregation pipeline and when do I use it?
Aggregation pipeline = multi-stage data transformation (think map, filter, reduce, sort). Stages: $match (filter), $project (select fields), $group (aggregate), $sort (order), $lookup (join), $unwind (explode arrays), $facet (multi-dimensional output). Use it for complex analytics (sum revenue by region, count users per cohort), transforming data before sending it to the client, or replacing client-side processing. Example: `db.orders.aggregate([{$match: {date: {$gt: startDate}}}, {$group: {_id: '$customerId', total: {$sum: '$amount'}}}])` gets total revenue per customer. The pipeline is faster than fetching raw docs and processing in code because filtering and grouping happen server-side. Limit: a plain aggregate() doesn't modify the stored documents; write pipeline results back with the $out or $merge stages, or use updateMany for in-place updates.
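The same example spelled out as a runnable mongosh pipeline (the `orders` collection and its fields are hypothetical), with a sort and limit added to show stage chaining:

```javascript
// Total revenue per customer since a given date, biggest spenders first.
const startDate = ISODate('2024-01-01');

db.orders.aggregate([
  { $match: { date: { $gt: startDate } } },                        // filter first (can use indexes)
  { $group: { _id: '$customerId', total: { $sum: '$amount' } } },  // aggregate per customer
  { $sort: { total: -1 } },                                        // order by revenue
  { $limit: 10 }                                                   // top 10 only
]);
```

Putting $match first matters: every later stage then operates on the filtered subset rather than the whole collection.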
▶ How do I optimize queries: indexing strategy?
Index = a sorted B-tree copy of a field; it speeds up reads but slows down writes. Strategy: (1) Use `db.collection.find().explain('executionStats')` to see how many documents a query scans. Scan count >> returned documents = missing index. (2) Index your most common query filters first (e.g., if 90% of queries filter by userId, index userId). (3) Compound indexes match query shape: a query that filters on userId and sorts by createdAt descending matches index `{userId: 1, createdAt: -1}` exactly (field order matters for the sort). (4) Prefix rule: index `{a, b, c}` serves queries on `{a}`, `{a, b}`, or `{a, b, c}`, but NOT `{b, c}` alone. (5) Avoid over-indexing; each index slows inserts. Monitor index sizes with `db.collection.stats()` and per-index usage with the `$indexStats` aggregation stage. Atlas also has a Performance Advisor that suggests indexes.
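A sketch of that workflow in mongosh, against a hypothetical `orders` collection:

```javascript
// 1. Diagnose: compare totalDocsExamined vs. nReturned in the output.
db.orders.find({ userId: 42 }).sort({ createdAt: -1 }).explain('executionStats');

// 2. Fix: create a compound index matching the query shape
//    (equality field first, then the sort field).
db.orders.createIndex({ userId: 1, createdAt: -1 });

// 3. Monitor: per-index usage counts, to spot indexes that are never used.
db.orders.aggregate([{ $indexStats: {} }]);
```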
▶ What's the difference between transactions and atomicity in MongoDB?
Single-document atomicity = updates to a single document are always atomic (all-or-nothing). Multi-document transactions (v4.0+ on replica sets, v4.2+ on sharded clusters) = multiple operations across multiple documents treated as one atomic unit. Use transactions when updating a user balance AND logging the transaction (both must succeed or both must fail). Syntax: `session.startTransaction()`, execute ops, then `session.commitTransaction()` or `session.abortTransaction()`, as sketched below. Cost: transactions are slower (roughly 2-5x) than single-document updates and hold locks on the documents they touch. Best practice: design documents to minimize the need for transactions (embed related data instead). Transactions on sharded clusters have higher latency; avoid them where possible.
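A minimal transaction sketch in mongosh, assuming a hypothetical `bank` database with `accounts` and `ledger` collections:

```javascript
const session = db.getMongo().startSession();
const accounts = session.getDatabase('bank').getCollection('accounts');
const ledger = session.getDatabase('bank').getCollection('ledger');

session.startTransaction({ writeConcern: { w: 'majority' } });
try {
  accounts.updateOne({ _id: 'alice' }, { $inc: { balance: -100 } });
  ledger.insertOne({ account: 'alice', amount: -100, at: new Date() });
  session.commitTransaction();   // both writes become visible together
} catch (e) {
  session.abortTransaction();    // neither write is applied
  throw e;
} finally {
  session.endSession();
}
```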
▶ How do I handle data consistency and replication?
Replica sets = a primary (accepts writes) plus secondaries (replicas). Write concern controls how many members must acknowledge a write before it returns success: `w: 1` (primary only; the default before v5.0) or the stronger `w: 'majority'` (the default since v5.0). Read preference = primary (default, consistent), primaryPreferred (fall back to a secondary if the primary is down), or secondary (read from replicas, e.g., for analytics). For high consistency: use `{w: 'majority'}` writes (slower but safer). For high throughput: use `w: 1` (fast, but the write can be lost if the primary fails before a secondary replicates it). Change streams = listen for real-time updates: `collection.watch([{$match: {operationType: 'insert'}}])` yields an event for each new insert. Use them for microservices notifications, syncing to search indexes, and real-time dashboards; a sketch follows.
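A sketch of both knobs in mongosh; the `events` collection is hypothetical:

```javascript
// Durable write: wait for a majority of replica-set members to acknowledge.
db.events.insertOne(
  { type: 'signup', at: new Date() },
  { writeConcern: { w: 'majority' } }
);

// Change stream: react to inserts in real time (e.g., sync to a search index).
const stream = db.events.watch([{ $match: { operationType: 'insert' } }]);
while (!stream.isClosed()) {
  const change = stream.tryNext();       // non-blocking; null if nothing new yet
  if (change) printjson(change.fullDocument);
}
```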
▶ What are common MongoDB mistakes and how do I avoid them?
Mistake 1: Using MongoDB for highly relational data (the equivalent of 10+ JOINs); it's slower than SQL there. Fix: normalize in PostgreSQL. Mistake 2: Embedding unbounded arrays (e.g., all comments in a post); the document bloats toward the 16MB cap. Fix: store comment IDs in the post and fetch comments separately. Mistake 3: No indexes on filter fields, so queries scan 100% of the collection. Fix: run explain(), add indexes. Mistake 4: Ignoring write concern; you can lose data if the primary crashes. Fix: use `w: 'majority'`. Mistake 5: Denormalizing everything, which causes update anomalies (changing a user's name in 1000 docs). Fix: reference instead of embed for mutable data. Mistake 6: Not using projection, so queries fetch unnecessary fields. Fix: `.find({}, {_id: 1, name: 1})` to fetch only id + name. Test consistency with read-your-own-write semantics: write, then immediately read it back to verify, as sketched below.
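Two of those fixes as mongosh one-liners, against a hypothetical `users` collection:

```javascript
// Mistake 6 fix: project only the fields you need.
db.users.find({ active: true }, { _id: 1, name: 1 });

// Read-your-own-write check: durable write, then read from the primary
// so the read cannot hit a stale secondary.
db.users.insertOne({ _id: 'u1', name: 'Ada' }, { writeConcern: { w: 'majority' } });
db.users.find({ _id: 'u1' }).readPref('primary');
```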