Distributed data processing engine for big data analytics. Processes petabyte-scale datasets in parallel across clusters of machines. Widely used for both batch and streaming workloads at scale. Learning Curve: Medium-Hard (requires familiarity with distributed computing concepts)