Installation

Installing OME to a Kubernetes Cluster

Before you begin

OME supports multiple deployment modes to enable InferenceService deployment with Kubernetes resources:

  • RawDeployment (Default): Uses standard Kubernetes Deployment, Service, Ingress, and HorizontalPodAutoscaler resources. Supports mounting multiple volumes but does not support scaling to or from zero. Optionally supports custom metrics scaling with KEDA and Prometheus.
  • Serverless: Enables request-based autoscaling, including scaling down to zero and back up from zero. Supports revision management and canary rollouts. Requires: Knative Serving and Istio.
  • MultiNode: Enables multi-node deployment for models that require distributed computing. Requires: LeaderWorkerSet (LWS).
  • PDDisaggregated: Enables prefill-decode disaggregated deployment, where the prefill and decode phases run on separate workers, for models that need the best possible serving performance. Requires: LeaderWorkerSet (LWS), but only for larger models that require distributed computing.

Required Components

Make sure the following conditions are met:

  • A Kubernetes cluster with version 1.27 or newer is running. Learn how to install the Kubernetes tools.
  • The kubectl command-line tool is configured to communicate with your cluster.
  • cert-manager (minimum version 1.9.0) is installed in the cluster.
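The requirements above can be checked with a short script before you continue. This is a minimal sketch, not something shipped with OME: it assumes bash with GNU `sort -V`, reads the API server version from `kubectl get --raw /version`, and assumes cert-manager runs as the `cert-manager` Deployment in the `cert-manager` namespace with a tag-based image reference (for example `...:v1.14.5`). The function names `version_ge` and `check_prereqs` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers for the required components (not part of OME).

# version_ge HAVE WANT: succeeds if HAVE >= WANT.
# Strips a leading "v" and any "-suffix"/"+suffix" (v1.28.3-eks-1 -> 1.28.3).
version_ge() {
  local have="${1#v}" want="${2#v}"
  have="${have%%[-+]*}"
  want="${want%%[-+]*}"
  # sort -V orders versions numerically; WANT must sort first (or tie)
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

check_prereqs() {
  local k8s cm_image cm_version
  # /version returns the API server's version info as JSON
  k8s="$(kubectl get --raw /version | sed -n 's/.*"gitVersion": *"\([^"]*\)".*/\1/p')"
  if ! version_ge "$k8s" 1.27; then
    echo "Kubernetes ${k8s:-unknown} is older than 1.27" >&2
    return 1
  fi
  # Assumption: default cert-manager layout and a tag-based (not digest) image
  cm_image="$(kubectl -n cert-manager get deployment cert-manager \
    -o jsonpath='{.spec.template.spec.containers[0].image}')" || return 1
  cm_version="${cm_image##*:}"
  if ! version_ge "$cm_version" 1.9.0; then
    echo "cert-manager $cm_version is older than 1.9.0" >&2
    return 1
  fi
  echo "OK: Kubernetes $k8s, cert-manager $cm_version"
}
```

Source the file and run `check_prereqs`; it returns non-zero and names the failing requirement.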

Optional Components

The following components are optional and only required for specific features:

| Component | Required For | Description |
|-----------|--------------|-------------|
| Istio | Serverless mode, Virtual Services | Service mesh for traffic management (minimum version 1.19) |
| Knative Serving | Serverless mode | Serverless container deployment and serving |
| KEDA | Custom metrics autoscaling | Kubernetes Event-driven Autoscaling |
| Prometheus | Custom metrics autoscaling | Metrics collection and monitoring |
| LeaderWorkerSet (LWS) | MultiNode and MultiNodeRayVLLM modes; PDDisaggregated for larger models | Kubernetes API for deploying a group of pods as a single unit, used for multi-host workloads |
| Kueue | Job scheduling | Kubernetes-native job queueing |

!!! warning Important: If you plan to use MultiNode or MultiNodeRayVLLM deployment modes, you MUST install the corresponding optional components (Ray and/or LWS) BEFORE installing OME. The controller may panic if these CRDs are not available when needed.
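Before installing OME, you can confirm that the CRDs for your chosen mode are present. This is a hedged sketch, not OME's own validation logic: the mode-to-CRD mapping follows the component requirements on this page, the CRD names are the upstream ones for Knative Serving, Istio, and LeaderWorkerSet, and it does not check for Ray CRDs. The helpers `required_crds` and `check_mode` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: list the CRDs a deployment mode depends on.
required_crds() {
  case "$1" in
    RawDeployment)
      ;;  # core Kubernetes resources only
    Serverless)
      echo "services.serving.knative.dev virtualservices.networking.istio.io" ;;
    MultiNode|MultiNodeRayVLLM|PDDisaggregated)
      # PDDisaggregated needs LWS only for larger, distributed models
      echo "leaderworkersets.leaderworkerset.x-k8s.io" ;;
    *)
      echo "unknown deployment mode: $1" >&2
      return 1 ;;
  esac
}

# check_mode MODE: report every required CRD that is not installed
check_mode() {
  local crds crd missing=0
  crds="$(required_crds "$1")" || return 1
  for crd in $crds; do
    if ! kubectl get crd "$crd" >/dev/null 2>&1; then
      echo "missing CRD for $1: $crd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_mode MultiNode` fails until LeaderWorkerSet is installed.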

1. Install Istio

Optional - Required only for Serverless mode and Virtual Service ingress

The minimum required Istio version is 1.19; refer to the Istio install guide.

Once Istio is installed, create an IngressClass resource for Istio:

```yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: istio
spec:
  controller: istio.io/ingress-controller
```

!!! note If you are running on a managed Kubernetes service, you can use the managed Istio offering from your cloud provider.

!!! note Istio ingress is recommended for Serverless mode, but you can use another Ingress controller instead; in that case, create an IngressClass resource for that controller.

2. Install Cert Manager

Required

The minimum required cert-manager version is 1.9.0; refer to the cert-manager installation guide.

!!! note cert-manager is required to provision webhook certificates for a production-grade installation. Alternatively, you can run a script that generates self-signed certificates.
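One common way to install cert-manager is to apply its static release manifest and then wait for its Deployments. The sketch below assumes that method; pick a release of at least v1.9.0 from the cert-manager releases page (the tag in the usage line is only an example). The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: URL of cert-manager's static manifest for a release tag
cert_manager_manifest_url() {
  printf 'https://github.com/cert-manager/cert-manager/releases/download/%s/cert-manager.yaml\n' "$1"
}

install_cert_manager() {
  local version="$1"
  if [ -z "$version" ]; then
    echo "usage: install_cert_manager vX.Y.Z" >&2
    return 1
  fi
  kubectl apply -f "$(cert_manager_manifest_url "$version")"
  # Block until the controller, cainjector and webhook Deployments are Available
  kubectl -n cert-manager wait --for=condition=Available deployment --all --timeout=300s
}
```

Usage: `install_cert_manager v1.14.5`.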

3. Install Knative Serving (Optional - Serverless mode only)

Optional - Required only for Serverless deployment mode

Refer to the Knative Serving install guide.

!!! note To use PodSpec fields such as nodeSelector, affinity, or tolerations, which are supported in the v1beta1 API spec, you must enable the corresponding feature flags in your Knative configuration.
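Knative gates these PodSpec fields behind flags in the `config-features` ConfigMap in the `knative-serving` namespace (`kubernetes.podspec-nodeselector`, `kubernetes.podspec-affinity`, `kubernetes.podspec-tolerations`). The sketch below builds a JSON merge patch that enables them; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge patch setting kubernetes.podspec-<flag>=enabled per argument
podspec_features_patch() {
  local json="" flag
  for flag in "$@"; do
    # Prefix a comma only when json already holds an entry
    json="${json:+$json,}\"kubernetes.podspec-${flag}\":\"enabled\""
  done
  printf '{"data":{%s}}\n' "$json"
}

enable_podspec_features() {
  kubectl -n knative-serving patch configmap config-features --type merge \
    -p "$(podspec_features_patch nodeselector affinity tolerations)"
}
```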

!!! note If you pull images from a private registry, configure Knative to skip resolving image tags to digests for that registry. Edit the config-deployment ConfigMap:

```shell
kubectl -n knative-serving edit configmap config-deployment
```

Add the following to the data section:

```yaml
data:
  registriesSkippingTagResolving: ko.local,dev.local,ghcr.io
```
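Instead of editing the ConfigMap interactively, the same setting can be applied with `kubectl patch`. This is a minimal sketch with a hypothetical helper; it reuses the key shown above, and because a merge patch replaces the key's whole value, list every registry you need.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge patch for config-deployment's registriesSkippingTagResolving
skip_tag_resolving_patch() {
  local IFS=,   # "$*" joins the arguments with the first character of IFS
  printf '{"data":{"registriesSkippingTagResolving":"%s"}}\n' "$*"
}

# Example, using the registries listed above:
# kubectl -n knative-serving patch configmap config-deployment --type merge \
#   -p "$(skip_tag_resolving_patch ko.local dev.local ghcr.io)"
```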

4. Install KEDA (Optional - Custom metrics scaling)

Optional - Required only for custom metrics autoscaling

Refer to the KEDA install guide.
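The KEDA documentation describes a Helm-based install, sketched below. The release name `keda` and the `keda` namespace follow KEDA's documented defaults; the wrapper function is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around KEDA's Helm install steps
install_keda() {
  helm repo add kedacore https://kedacore.github.io/charts
  helm repo update
  # Installs the KEDA operator and metrics server into its own namespace
  helm install keda kedacore/keda --namespace keda --create-namespace
}
```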

5. Install Prometheus (Optional - Custom metrics scaling)

Optional - Required only for custom metrics autoscaling with KEDA

  1. Add the Helm repository:
     ```shell
     helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
     helm repo update
     ```
  2. Install kube-prometheus-stack:
     ```shell
     helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack
     ```
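KEDA's Prometheus scaler points at a Prometheus server through a `serverAddress` URL. Assuming the release name from the command above and the namespace it was installed into (`default` unless your context says otherwise), the chart's Prometheus Service should be `kube-prometheus-stack-prometheus` on port 9090. That naming is an assumption worth confirming with `kubectl get svc`, and both helpers are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: Service name is <release>-prometheus, which holds when the
# release name already contains the chart name (as "kube-prometheus-stack" does).
prometheus_url() {
  local release="${1:-kube-prometheus-stack}" ns="${2:-default}"
  printf 'http://%s-prometheus.%s.svc:9090\n' "$release" "$ns"
}

# Confirm the Service exists before pointing KEDA at it
check_prometheus_service() {
  local release="${1:-kube-prometheus-stack}" ns="${2:-default}"
  kubectl -n "$ns" get service "${release}-prometheus"
}
```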

6. Install LeaderWorkerSet (Optional - MultiNode and related modes)

Optional - Required for the MultiNode and MultiNodeRayVLLM deployment modes, and for PDDisaggregated deployments of larger models

Refer to the LeaderWorkerSet installation guide.

Example installation:

```shell
kubectl apply --server-side -f https://github.com/kubernetes-sigs/lws/releases/download/v0.3.0/lws-webhook.yaml
```
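After applying the manifest, you can wait for the LWS controller before moving on, so the CRD is in place before OME is installed. This sketch assumes the upstream defaults, an `lws-controller-manager` Deployment in the `lws-system` namespace. Release asset names can differ between LWS versions, so check the release page for the version you pick. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: LWS release manifest URL (asset name from the example above)
lws_manifest_url() {
  printf 'https://github.com/kubernetes-sigs/lws/releases/download/%s/lws-webhook.yaml\n' "$1"
}

install_lws() {
  local version="${1:-v0.3.0}"
  kubectl apply --server-side -f "$(lws_manifest_url "$version")"
  # Wait for the controller so the LWS CRD and webhook are ready
  kubectl wait deploy/lws-controller-manager -n lws-system --for=condition=available --timeout=5m
}
```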

7. Install Kueue (Optional - Job scheduling)

Optional - Required only for advanced job scheduling features

Refer to the Kueue installation guide.
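The Kueue installation guide applies a release manifest with server-side apply. This sketch assumes that method and the default `kueue-controller-manager` Deployment in `kueue-system`. The helper is hypothetical; pass the Kueue release tag you want.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a Kueue release and wait for its controller
install_kueue() {
  local version="$1"
  if [ -z "$version" ]; then
    echo "usage: install_kueue vX.Y.Z" >&2
    return 1
  fi
  kubectl apply --server-side -f "https://github.com/kubernetes-sigs/kueue/releases/download/${version}/manifests.yaml"
  kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=5m
}
```

Usage: `install_kueue <release-tag>`.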

8. Clone OME repository

The Go tools require that you clone the repository to the src/github.com/sgl-project/ome directory in your GOPATH.

To check out this repository:

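  1. Create your own fork of this repository.
  2. Clone the repository to your machine:
```shell
# Fall back to Go's default GOPATH if the variable is not set
export GOPATH="${GOPATH:-$(go env GOPATH)}"
mkdir -p "${GOPATH}/src/github.com/sgl-project"
cd "${GOPATH}/src/github.com/sgl-project"
git clone https://github.com/sgl-project/ome.git
cd ome
```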
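If you plan to push changes from your own GitHub fork, one common layout keeps the upstream clone as `origin` and adds the fork as a second remote. This is a sketch only: the remote name `fork` and the helper are hypothetical, and `<your-github-user>` is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add your GitHub fork as a remote named "fork" (run inside ome/)
add_fork_remote() {
  local user="$1"
  if [ -z "$user" ]; then
    echo "usage: add_fork_remote <your-github-user>" >&2
    return 1
  fi
  git remote add fork "https://github.com/${user}/ome.git"
  git fetch fork
}
```

After `add_fork_remote <your-github-user>`, push branches with `git push fork <branch>`.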

At this point, you are ready to build and deploy OME as described below.

Install the latest development version

To install the latest development version of OME in your cluster, run the following command:

```shell
make install
```

The controller runs in the ome namespace.
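To confirm the installation, you can wait for the pods in the `ome` namespace to become Ready. This minimal sketch uses the standard `kubectl wait`; the default timeout and the helper name are arbitrary choices, not OME defaults.

```shell
#!/usr/bin/env bash
# Hypothetical check: wait for every pod in the ome namespace, then list them
wait_for_ome() {
  local timeout="${1:-300s}"
  kubectl -n ome wait --for=condition=Ready pod --all --timeout="$timeout" || return 1
  kubectl -n ome get pods
}
```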

Uninstall

To uninstall OME, run the following command:

```shell
make uninstall
```