Design fault-tolerant systems across multiple machines. Consensus, replication, partitioning. Essential for Staff+ engineers at scale. Powers Google, Amazon, Netflix infrastructure. Learning Curve: Hard (complex theory + practical trade-offs)