Welcome to OME

Open Model Engine

OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs). It streamlines deployment and operation through automated model management, intelligent runtime selection, and sophisticated resource scheduling.

OME provides advanced serving capabilities including prefill-decode disaggregation, multi-node inference, and cache-aware load balancing. With first-class support for SGLang, vLLM, and TensorRT-LLM, OME ensures optimal performance for your LLM workloads.

Build production-ready LLM services with comprehensive model lifecycle management, automatic runtime selection, and seamless integration with Kubernetes ecosystem components such as KEDA, Gateway API, and Istio.
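As a rough sketch of how these pieces fit together, an operator like OME is typically driven by custom resources: one to register a model and one to request a serving endpoint, with the runtime chosen automatically. The kinds, API group, and field names below are illustrative assumptions and may not match the actual OME CRD schema; consult the reference documentation for the real API.

```yaml
# Hypothetical manifests (kind names, API group, and fields are assumptions).
# First, register a model available cluster-wide.
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-70b-instruct
spec:
  storage:
    storageUri: oci://models/llama-3-70b-instruct  # assumed storage reference
---
# Then request a serving endpoint; the operator would match the model
# against available runtimes (e.g. SGLang, vLLM, TensorRT-LLM).
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: llama-3-70b-instruct
spec:
  model:
    name: llama-3-70b-instruct
```

Under this model, the operator reconciles the declared state into pods, services, and autoscaling configuration, so users describe *what* to serve rather than *how* to deploy it.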