Edge inference means running machine learning models on edge devices or on servers located close to users, rather than in centralized cloud data centers. "Edge" here spans smartphones, IoT devices, edge compute platforms such as Cloudflare Workers, and regional servers. Key technologies include ONNX (a portable model format), TensorFlow Lite (mobile and embedded), WebAssembly (in-browser execution), and specialized inference runtimes.
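To make the workflow concrete, here is a minimal sketch of running an ONNX model locally with ONNX Runtime's CPU execution provider. The model path (`model.onnx`) and the input shape are placeholder assumptions; a real edge deployment would use the shape declared by your exported model.

```python
import numpy as np
import onnxruntime as ort

# Load an exported ONNX model using the CPU execution provider
# (the typical choice on resource-constrained edge hardware).
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Look up the name of the model's first input tensor.
input_name = session.get_inputs()[0].name

# Run inference on a single dummy input; shape 1x3x224x224 is only
# an assumption for a common image model.
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)
```

The same exported ONNX model can be served from a regional server or, with runtimes compiled to WebAssembly, executed directly in the browser, which is what makes the format attractive for edge scenarios.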