KServe is a Kubernetes-native platform for deploying and serving machine learning models. It provides a model-serving abstraction that supports PyTorch, TensorFlow, scikit-learn, XGBoost, and custom models, along with traffic-based autoscaling, traffic splitting for canary rollouts and A/B testing, monitoring, and model versioning. KServe runs on Kubernetes via Knative, enabling serverless inference: idle models scale to zero and spin up on demand when requests arrive.
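As a concrete illustration, a minimal KServe deployment is declared with an `InferenceService` custom resource. The sketch below assumes a scikit-learn model; the name `sklearn-iris` and the `storageUri` bucket path are placeholders you would replace with your own model location.

```yaml
# Minimal InferenceService: KServe pulls the model from storageUri
# and serves it with the built-in sklearn runtime.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris        # placeholder service name
spec:
  predictor:
    minReplicas: 0          # allow scale-to-zero when idle (Knative)
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://my-bucket/models/sklearn/iris  # placeholder path
```

Applying this with `kubectl apply -f` creates the service; because `minReplicas` is 0, Knative tears the pod down when traffic stops and cold-starts it on the next request.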