Apache Impala is a massively parallel processing (MPP) SQL engine for Hadoop. It executes SQL queries directly on HDFS and other storage (HBase, S3) without MapReduce overhead. Impala's in-memory, vectorized execution returns results in milliseconds, enabling interactive analytics and ad-hoc exploration. Architecture: coordinator distributes query to executors, executors process data in parallel, results aggregated and returned. Built on Hadoop infrastructure (uses Hive metastore, runs on HDFS), making it compatible with existing data lakes.