Concepts

Core OME Concepts

This section of the documentation helps you learn about the components, APIs and abstractions that OME uses to represent model serving, model training, models, and dedicated AI clusters.

APIs

Base Model

The BaseModel CRD manages the lifecycle of foundational Hugging Face compatible and TensorRT LLM/AI models such as GPT, BERT, and other architectures, model type, model format, capabilities, model sizes and model configurations. These base models can be used for both training and serving. This resource has both namespace-scoped and cluster-scoped which can be used to define base models for different models.

Fine-Tuned Weight

The FineTunedWeight CRD manages the weights of models fine-tuned from a base model, allowing for task-specific optimization.

Serving Runtime

The ServingRuntime CRD manages the runtime environment for model serving, allowing for dynamic scaling and configuration of model-serving containers. This resource has both namespace-scoped and cluster-scoped which can be used to define serving runtimes for different models.

Inference Service

The InferenceService CRD manages the entire lifecycle of model-serving workloads, including model versioning, scaling, and traffic routing. It supports real-time inference for both single-node and multi-node deployments, ensuring seamless model updates and efficient scaling.

Ingress

OME supports a range of ingress controllers for external access to model serving workloads. This section provides an overview of the available ingress controller options, including their capabilities, configuration, and features.

Benchmark

OME integrates with the latest GenAI-Bench to provide real-time benchmarking capabilities for AI models. This section provides an overview of the GenAI-Bench benchmarks, how they work, and how to use them with OME.