Speech-to-text, also called automatic speech recognition (ASR), converts audio recordings into text automatically. Modern ASR systems such as OpenAI's Whisper, Google Cloud Speech-to-Text, and Amazon Transcribe can exceed 95% word accuracy on clean audio and handle many languages, accents, and dialects. Applications range from accessibility (captions for deaf and hard-of-hearing users) and content creation (podcast transcripts, video subtitles) to voice interfaces (Alexa, Siri). An ASR system combines audio signal processing, acoustic modeling, and language modeling to predict which words were spoken.
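
The accuracy figure above is not defined in the text. ASR quality is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's output into the reference transcript, divided by the number of reference words. Assuming "accuracy" here means roughly 1 minus WER, the sketch below shows one way to compute it. The function name `word_error_rate`, the lowercase normalization, and the whitespace tokenization are illustrative assumptions; real evaluations usually also normalize punctuation and number formats.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Hypothetical helper for illustration: tokens are lowercase,
    whitespace-separated words, with no punctuation handling.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only one row at a time.
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over a lazy dog"
    # Two substitutions out of nine reference words.
    print(round(word_error_rate(reference, hypothesis), 3))  # prints 0.222
```

Under that reading, a system at 95% accuracy would get about one word in twenty wrong. WER can also exceed 1.0 when the output contains many inserted words, so it is not strictly a percentage of words correct.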