Labels and Annotations

This document is a reference for the labels and annotations used throughout OME.

Annotations

InferenceService Annotations

These annotations are used to configure InferenceService behavior:

| Annotation | Description |
|---|---|
| ome.io/enable-tag-routing | Enables tag-based routing for the InferenceService |
| ome.io/autoscalerClass | Specifies the autoscaler class to use. Valid values: hpa, keda, external |
| ome.io/metrics | Defines the scaling metric type. Valid values: cpu, memory |
| ome.io/targetUtilizationPercentage | Sets the target utilization percentage for autoscaling |
| ome.io/deprecation-warning | Displays deprecation warnings for legacy configurations |
| ome.io/enable-metric-aggregation | Enables metric aggregation for the InferenceService |
| ome.io/enable-prometheus-scraping | Enables Prometheus scraping for metrics collection |
| ome.io/volcano-queue | Specifies the Volcano queue name for job scheduling |
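
As an illustration, the autoscaling annotations above could be assembled and checked before they are attached to an InferenceService. This is a minimal sketch, not OME's own validation: the helper name `autoscaling_annotations` is hypothetical, the valid values are copied from the table, and the 1 to 100 range check on the percentage is an assumption. Kubernetes annotation values are always strings, so the percentage is converted.

```python
# Valid values as listed in the annotation table above.
VALID_AUTOSCALER_CLASSES = {"hpa", "keda", "external"}
VALID_METRICS = {"cpu", "memory"}


def autoscaling_annotations(autoscaler_class: str, metric: str, target_percent: int) -> dict[str, str]:
    """Build the autoscaling annotations for an InferenceService (hypothetical helper)."""
    if autoscaler_class not in VALID_AUTOSCALER_CLASSES:
        raise ValueError(f"unsupported autoscaler class: {autoscaler_class!r}")
    if metric not in VALID_METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    # Assumption: a utilization target is a percentage between 1 and 100.
    if not 1 <= target_percent <= 100:
        raise ValueError(f"target utilization out of range: {target_percent}")
    return {
        "ome.io/autoscalerClass": autoscaler_class,
        "ome.io/metrics": metric,
        # Annotation values must be strings, even for numeric settings.
        "ome.io/targetUtilizationPercentage": str(target_percent),
    }


annotations = autoscaling_annotations("hpa", "cpu", 70)
```

The resulting dictionary can be placed under `metadata.annotations` of a manifest built in the same way.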

Model and Runtime Annotations

| Annotation | Description |
|---|---|
| ome.io/inject-model-init | Enables injection of model initialization containers |
| ome.io/inject-fine-tuned-adapter | Enables injection of fine-tuned adapter containers |
| ome.io/inject-serving-sidecar | Enables injection of serving sidecar containers |
| ome.io/fine-tuned-weight-ft-strategy | Specifies the fine-tuning strategy for weights |
| ome.io/base-model-name | Specifies the base model name |
| ome.io/base-model-vendor | Specifies the base model vendor |
| ome.io/serving-runtime | Specifies the serving runtime to use |
| ome.io/base-model-format | Specifies the base model format |
| ome.io/base-model-format-version | Specifies the base model format version |
| ome.io/fine-tuned-serving-with-merged-weights | Enables fine-tuned serving with merged weights |

Model Security Annotations

These annotations control model encryption and decryption:

| Annotation | Description |
|---|---|
| ome.io/base-model-decryption-key-name | Specifies the decryption key name for the base model |
| ome.io/base-model-decryption-secret-name | Specifies the secret name containing decryption credentials |
| ome.io/disable-model-decryption | Disables model decryption |

Service Configuration Annotations

| Annotation | Description |
|---|---|
| ome.io/service-type | Specifies the Kubernetes service type |
| ome.io/load-balancer-ip | Sets the load balancer IP address |

RDMA Annotations

| Annotation | Description |
|---|---|
| rdma.ome.io/auto-inject | Enables automatic RDMA injection |
| rdma.ome.io/profile | Specifies the RDMA profile to use |
| rdma.ome.io/container-name | Specifies the container name for RDMA configuration |

Knative Annotations

| Annotation | Description |
|---|---|
| autoscaling.knative.dev/min-scale | Sets the minimum number of replicas |
| autoscaling.knative.dev/max-scale | Sets the maximum number of replicas |
| serving.knative.dev/rollout-duration | Specifies the rollout duration |
| serving.knative.openshift.io/enablePassthrough | Enables passthrough on OpenShift |
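
The two scale-bound annotations are normally set together, so a small consistency check can catch an inverted range early. The following is a sketch only: `knative_scale_annotations` is a hypothetical helper, and it assumes both bounds are explicit replica counts. Any special-case values that Knative itself may accept are out of scope here.

```python
def knative_scale_annotations(min_replicas: int, max_replicas: int) -> dict[str, str]:
    """Build min/max scale annotations after a basic sanity check (hypothetical helper)."""
    # Assumption: both bounds are explicit counts and the minimum must not exceed the maximum.
    if min_replicas < 0:
        raise ValueError("min_replicas must not be negative")
    if max_replicas < min_replicas:
        raise ValueError("max_replicas must be greater than or equal to min_replicas")
    return {
        # Annotation values must be strings.
        "autoscaling.knative.dev/min-scale": str(min_replicas),
        "autoscaling.knative.dev/max-scale": str(max_replicas),
    }


scale = knative_scale_annotations(1, 5)
```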

Labels

Model and Runtime Labels

| Label | Description |
|---|---|
| base-model-name | Base model name label |
| base-model-size | Base model size label |
| base-model-type | Base model type label |
| base-model-vendor | Base model vendor label |
| fine-tuned-serving | Fine-tuned serving label |
| fine-tuned-serving-with-merged-weights | Fine-tuned serving with merged weights label |
| serving-runtime | Serving runtime label |
| fine-tuned-weight-ft-strategy | Fine-tuning strategy label |

Scheduling Labels

| Label | Description |
|---|---|
| ray.io/scheduler-name | Ray scheduler name |
| ray.io/priority-class-name | Ray priority class name |
| raycluster/unavailable-since | Ray cluster unavailable timestamp |
| volcano.sh/queue-name | Volcano queue name |
| volcano.sh/job-name | Volcano job name |

Kueue Labels

| Label | Description |
|---|---|
| kueue.x-k8s.io/queue-name | Kueue queue name |
| kueue.x-k8s.io/priority-class | Kueue workload priority class |
| kueue-enabled | Enables Kueue for the resource |

Model Agent Labels

| Label | Description |
|---|---|
| node.kubernetes.io/instance-type | Node instance shape |
| models.ome/{uid} | Model label with UID |
| models.ome.io/target-instance-shapes | Target instance shapes for models |
| models.ome/basemodel-status | Base model status |

Component Labels

| Label | Description |
|---|---|
| component | KService component label |
| endpoint | KService endpoint label |
| ome.io/inferenceservice | InferenceService label, used both on TrainedModel resources and on InferenceService pods |
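
Because pods carry the `ome.io/inferenceservice` label, a label selector can be used to find the pods that belong to one InferenceService. The sketch below renders a standard Kubernetes equality-based selector string. The helper `label_selector` and the service name `llama-chat` are hypothetical, and it is an assumption that the label value is the InferenceService name.

```python
def label_selector(labels: dict[str, str]) -> str:
    """Render an equality-based Kubernetes label selector (hypothetical helper)."""
    # Keys are sorted so the output is stable regardless of insertion order.
    return ",".join(f"{key}={value}" for key, value in sorted(labels.items()))


# Assumption: the label value is the name of the InferenceService.
selector = label_selector({"ome.io/inferenceservice": "llama-chat"})
print(selector)  # ome.io/inferenceservice=llama-chat
```

A string in this form can be passed to `kubectl get pods -l`, or as the `label_selector` argument of a Kubernetes client list call.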

Network Visibility Labels

| Label | Description |
|---|---|
| networking.ome.io/visibility | Network visibility configuration |
| networking.knative.dev/visibility | Knative network visibility |
| sidecar.istio.io/inject | Istio sidecar injection |

Special Values

Autoscaler Classes

  • hpa: Horizontal Pod Autoscaler
  • keda: Kubernetes Event-driven Autoscaling
  • external: External autoscaler

Scale Metrics

  • cpu: CPU utilization
  • memory: Memory utilization
  • concurrency: Request concurrency (Knative)
  • rps: Requests per second (Knative)
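
The list above can be captured as a small lookup table, for example to warn when a metric is chosen that is marked as Knative. This is only a sketch: `requires_knative` is a hypothetical helper, and reading the "(Knative)" marker as "applies only to Knative-based autoscaling" is an interpretation of the list, not documented behavior.

```python
# Scale metrics and their descriptions, as listed above.
SCALE_METRICS = {
    "cpu": "CPU utilization",
    "memory": "Memory utilization",
    "concurrency": "Request concurrency",
    "rps": "Requests per second",
}

# Metrics marked "(Knative)" in the list above.
KNATIVE_METRICS = {"concurrency", "rps"}


def requires_knative(metric: str) -> bool:
    """Return True if the metric is marked as Knative in the list (hypothetical helper)."""
    if metric not in SCALE_METRICS:
        raise ValueError(f"unknown scale metric: {metric!r}")
    return metric in KNATIVE_METRICS


for name in SCALE_METRICS:
    marker = " (Knative)" if requires_knative(name) else ""
    print(f"{name}: {SCALE_METRICS[name]}{marker}")
```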

Priority Classes

  • volcano-scheduling-high-priority: High priority for Volcano scheduling
  • kueue-scheduling-high-priority: High priority for Kueue workload scheduling