JobCannon

Distributed Systems

⬢ TIER 1 · Tech
Salary impact: High
Time to learn: 15 months
Difficulty: Hard
Careers: 11
TL;DR

Design systems across multiple machines without shared clock or synchronous guarantees. CAP theorem, Raft/Paxos consensus, replicas, partitions, eventual consistency. Career path: Senior Backend Engineer (CAP + sharding, $140-180k) → Staff Engineer (consensus + event sourcing, $180-280k) → Principal (9s availability + chaos engineering, $280-500k+) over 12-18 months. Essential for FAANG Staff+ roles, cloud infrastructure, distributed databases (Cassandra, Spanner, DynamoDB), message brokers (Kafka), coordination services (etcd, Consul, ZooKeeper).

What is Distributed Systems

Distributed Systems is the art of designing fault-tolerant services across multiple independent machines with no shared clock, shared state, or synchronous guarantees. CAP theorem (Consistency vs Availability during Partitions) defines the core tradeoff: databases choose CP (wait for consensus, consistency guaranteed) or AP (accept writes, sync later, eventual consistency). Consensus algorithms (Raft, Paxos) solve the leader election and state synchronization problems. Replication strategies (leader-follower, multi-master) enable scale. Partitioning/sharding splits data across machines.

In 2026, every system handling >100 QPS is distributed. Engineers who understand these patterns design the infrastructure powering Google, Amazon, and Netflix. This skill separates senior engineers (understand CAP, replication) from staff/principal engineers (design consensus, fault recovery, 99.99%+ availability at scale).

The salary impact is massive: staff engineers mastering distributed systems earn $180–300k+; principal engineers designing infrastructure that scales to millions of users earn $250–500k+. Companies with >$100M ARR have infrastructure teams entirely focused on distributed systems challenges. The structural demand is permanent: scale requires distribution. As companies grow, every engineer eventually hits the hard problems of distributed systems (inconsistent replicas, split-brain elections, eventual consistency bugs).
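Partitioning/sharding in one picture: a minimal consistent-hash ring in Python (node names and the virtual-node count are illustrative). Keys map to the next node clockwise on the ring, so adding or removing a node remaps only a fraction of the keys, which is the property schemes like Cassandra's and DynamoDB's partitioning rely on.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring: each key is owned by the nearest
    node clockwise. Virtual nodes smooth out the key distribution."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}:{i}")
                bisect.insort(self._ring, (h, node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        h = self._hash(key)
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # deterministic owner for this key
```

The same key always hashes to the same owner, so routing needs no central lookup table.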

🔧 TOOLS & ECOSYSTEM
Kafka, RabbitMQ, NATS, etcd, ZooKeeper, Consul, Cassandra, DynamoDB, CockroachDB, FoundationDB, Google Cloud Spanner, gRPC, Temporal, Akka, Erlang/OTP

📋 Before you start

💰 Salary by region

Region   Junior   Mid      Senior
USA      $0       $160k    $240k
UK       £0       £95k     £145k
EU       €0       €105k    €160k
Canada   C$0      C$170k   C$260k

❓ FAQ

CAP theorem says pick 2 of 3. Is that real?
Oversimplified. CAP applies only during network partitions, which are rare; when the network is healthy you get both consistency and availability. The everyday trade-off is latency vs consistency (formalized as PACELC). Most systems pick a side by design: CP systems wait for consensus before responding, AP systems accept writes and sync later. Traditional RDBMSs are CP; DynamoDB and Cassandra are AP.
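The CP-vs-AP knob shows up concretely in quorum math: with N replicas, writes acknowledged by W replicas and reads touching R replicas, read and write quorums are guaranteed to overlap (so a read sees the latest committed write) exactly when R + W > N. A one-line rule, sketched with Cassandra-style consistency levels for N = 3:

```python
def is_strongly_consistent(n, w, r):
    """With n replicas, writes ack'd by w and reads from r replicas,
    read and write quorums overlap exactly when r + w > n."""
    return r + w > n

# N=3, QUORUM writes + QUORUM reads (2 + 2 > 3): overlapping quorums.
assert is_strongly_consistent(3, 2, 2)
# N=3, ONE/ONE (1 + 1 <= 3): a read may miss the latest write.
assert not is_strongly_consistent(3, 1, 1)
```

Dialing W and R up buys consistency at the cost of latency and availability; dialing them down does the reverse. That knob, not the partition case, is what you tune day to day.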
When do I actually need distributed consensus (Raft, Paxos)?
Only when you need a single source of truth across multiple machines that tolerates failures. Examples: leader election, config distribution (etcd, Consul), distributed locks, blockchains. Most application-level code doesn't need this; use the database's built-in consensus instead (MySQL InnoDB Group Replication, PostgreSQL replication, Cassandra's quorum reads).
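A toy sketch of what you'd get from a consensus-backed store: leader election via compare-and-set, plus a monotonically increasing fencing token so downstream services can reject writes from a stale leader. The store below is an in-memory stand-in, not etcd's or Consul's real API; in the real systems, the atomicity provided here by a local lock comes from Raft consensus.

```python
import threading

class LeaseStore:
    """In-memory stand-in for a consensus-backed KV store: a single
    compare-and-set register plus a fencing token counter."""

    def __init__(self):
        self._mu = threading.Lock()  # stands in for consensus-atomicity
        self._leader = None
        self._token = 0

    def try_acquire(self, node):
        """CAS: become leader only if no one holds the lease.
        Returns a fencing token on success, None on failure."""
        with self._mu:
            if self._leader is None:
                self._leader = node
                self._token += 1
                return self._token
            return None

    def release(self, node):
        with self._mu:
            if self._leader == node:
                self._leader = None

store = LeaseStore()
t1 = store.try_acquire("node-a")  # wins: gets fencing token 1
t2 = store.try_acquire("node-b")  # loses: None, exactly one leader
```

The fencing token matters because a paused-then-resumed old leader may still believe it holds the lease; storage that checks "token must increase" safely ignores it.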
Raft vs Paxos: which should I learn?
Raft was designed for understandability and is used in etcd and Consul; Paxos is harder and underlies Google's Chubby (and through it, Bigtable). Learn Raft for interviews, Paxos for deep dives. Both guarantee safety (once committed, never lost) and, under reasonable timing assumptions, liveness (progress despite failures). Key insight: both solve the same consensus problem; implementation quality matters more than algorithm choice.
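To make Raft's election concrete, here is just the vote-granting rule, as a simplified sketch rather than a full implementation: a node grants at most one vote per term, steps down to newer terms, and votes only for candidates whose log is at least as up to date as its own.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RaftNode:
    """Only the RequestVote handling from Raft leader election."""
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_term: int = 0
    last_log_index: int = 0

    def handle_request_vote(self, term, candidate, c_log_term, c_log_index):
        if term < self.current_term:
            return False                  # stale candidate: reject
        if term > self.current_term:
            self.current_term = term      # newer term: adopt it,
            self.voted_for = None         # vote resets per term
        # Candidate's log must be at least as up to date as ours
        # (compare last log term first, then last log index).
        log_ok = (c_log_term, c_log_index) >= (self.last_log_term,
                                               self.last_log_index)
        if self.voted_for in (None, candidate) and log_ok:
            self.voted_for = candidate
            return True
        return False

follower = RaftNode(current_term=1)
granted = follower.handle_request_vote(2, "A", 1, 5)   # vote granted
denied = follower.handle_request_vote(2, "B", 1, 5)    # one vote per term
```

The one-vote-per-term rule is what prevents two leaders in the same term; the log comparison is what prevents a leader that would lose committed entries.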
Distributed transactions (2PC) always fail. Why use them?
2PC (two-phase commit) blocks during failures and doesn't scale. Better: the Saga pattern (choreography via events, or an orchestration service). The trade: the databases involved see eventual consistency instead of immediate atomicity. That's acceptable for most systems; 2PC is required only when you absolutely need immediate ACID guarantees across databases (rare).
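A minimal orchestrated saga in Python (the step names are invented for illustration): each step pairs an action with a compensation, and a failure rolls back already-completed steps in reverse order instead of holding locks the way 2PC does.

```python
def run_saga(steps):
    """Orchestrated saga: run each (action, compensation) pair in order;
    on any failure, run the registered compensations in reverse."""
    done = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for comp in reversed(done):
                comp()              # undo steps that already committed
            return False
        done.append(compensate)
    return True

log = []

def fail_payment():
    raise RuntimeError("payment declined")  # injected failure

ok = run_saga([
    (lambda: log.append("reserve_inventory"),
     lambda: log.append("release_inventory")),
    (fail_payment, lambda: log.append("refund")),
])
# The failed payment triggers compensation of the inventory reservation.
```

Between a step committing and its compensation running, other readers can observe the intermediate state; that window is exactly the eventual consistency you trade for not blocking.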
Eventual consistency means stale data forever. Is it broken?
No. Eventual consistency means reads may see old data, but writes always propagate, and staleness is typically bounded (often under 100ms). Good UX masks it: optimistically show what the user just wrote rather than querying immediately after. Eventual consistency is what enables scale (read replicas, global CDNs): strict ACID is slower and typically confined to one region, while eventually consistent systems scale globally.
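Eventual convergence in miniature: a last-writer-wins register, one of the simplest merge rules of the kind Cassandra uses for conflict resolution. Replicas exchange (timestamp, value) pairs in any order and all end up agreeing once updates stop.

```python
class LWWRegister:
    """Last-writer-wins register: keep the value with the highest
    timestamp; merging is commutative, so sync order doesn't matter."""

    def __init__(self):
        self.ts, self.value = 0, None

    def set(self, ts, value):
        if ts > self.ts:
            self.ts, self.value = ts, value

    def merge(self, other):
        self.set(other.ts, other.value)

a, b = LWWRegister(), LWWRegister()
a.set(1, "draft")       # earlier write lands on replica a
b.set(2, "published")   # later write lands on replica b
a.merge(b)              # anti-entropy sync, either direction,
b.merge(a)              # in any order: both converge to "published"
```

The cost is that LWW silently drops the losing concurrent write; when that's unacceptable, richer CRDTs or vector clocks are used instead.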
How do I monitor distributed systems for hidden failure modes?
Build observability: distributed tracing (Jaeger/Zipkin), structured logging (ELK, Datadog), metrics (Prometheus). Test failure modes with chaos engineering (Gremlin, Netflix's Chaos Monkey). Manual testing: simulate network latency, packet loss, and process crashes. Many issues only surface in production, so run blameless postmortems and treat incidents as learning.
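One of those failure modes, sketched: retrying through injected faults with capped exponential backoff plus full jitter, the standard defense against retry storms. The fault injector and rates below are invented for the demo, and the backoff is computed rather than slept so the sketch runs instantly.

```python
import random

def flaky_call(failure_rate, rng):
    """Chaos stand-in: fail with the given probability, like an
    injected network fault."""
    if rng.random() < failure_rate:
        raise ConnectionError("injected fault")
    return "ok"

def call_with_retries(attempts=5, failure_rate=0.5, seed=0):
    """Retry with capped exponential backoff + full jitter."""
    rng = random.Random(seed)  # seeded so the demo is deterministic
    delay = 0.05
    for _ in range(attempts):
        try:
            return flaky_call(failure_rate, rng)
        except ConnectionError:
            backoff = rng.uniform(0, min(delay, 1.0))  # full jitter
            delay *= 2  # a real client would time.sleep(backoff) here
    raise ConnectionError(f"gave up after {attempts} attempts")

result = call_with_retries()
```

Jitter spreads retries out in time; without it, clients that failed together retry together and re-overload the recovering service.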
What's the hardest part of distributed systems in practice?
Debugging. A request spans 10 services, each with replicas, retries, timeouts, and race conditions. Tracing + logging are non-negotiable. Second-hardest: keeping mental models accurate (what state can we be in?). Third: operational overhead (monitoring, alerting, runbooks). Start with boring centralized systems; move to distributed only when you've hit concrete limits.
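The tracing idea in a few lines: propagate one trace id through a request's whole call tree, so every log line from that request can be correlated. Python's contextvars serves here as a stand-in for real Jaeger/Zipkin instrumentation; the service names are invented for the demo.

```python
import contextvars
import uuid

# Context-propagated trace id: set once at the edge, visible to every
# function the request calls without threading it through arguments.
trace_id = contextvars.ContextVar("trace_id", default="-")

lines = []  # captured log lines, so the demo is inspectable

def log(service, msg):
    line = f"trace={trace_id.get()} service={service} {msg}"
    lines.append(line)
    print(line)

def handle_request():
    trace_id.set(uuid.uuid4().hex[:8])  # assigned at the gateway
    log("gateway", "received request")
    checkout()                          # "downstream" calls inherit it

def checkout():
    log("checkout", "reserving inventory")
    log("payments", "charging card")

handle_request()
# All three log lines carry the same trace id.
```

Grepping logs for one trace id reconstructs the request's path across services, which is the first step in debugging any multi-service failure.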
