OpenLLM is a framework (built on BentoML) for serving open-source language models (Llama, Mistral, Qwen, Baichuan, etc.). It exposes models via an OpenAI API-compatible server, enabling drop-in replacement for proprietary LLMs. Deploy anywhere: Kubernetes, EC2, bare metal, serverless. Full control, no vendor lock-in.
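As a sketch of the drop-in compatibility claim: once a model is served locally (e.g. via `openllm serve`), any OpenAI-style client can talk to it by pointing at the local base URL. The snippet below builds such a request with only the standard library; the port (3000, BentoML's default), the `/v1` base path, and the model id are assumptions — adjust them to your deployment.

```python
# Sketch: an OpenAI-compatible /chat/completions request aimed at a local
# OpenLLM server. Assumes the server listens on port 3000 (BentoML's default)
# and that the model id below is whatever you served; both are illustrative.
import json
import urllib.request

BASE_URL = "http://localhost:3000/v1"  # assumed default; change for your deployment


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("llama3.2:1b", "Hello!")  # model id is a placeholder
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` (or swapping in the official `openai` SDK with `base_url=BASE_URL`) is what makes the server a drop-in substitute: existing OpenAI client code keeps working against the self-hosted model.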