Base Model

Base Model defines foundation models that can be automatically downloaded, parsed, and served across your cluster.

What is a Base Model?

A Base Model in OME is a Kubernetes resource that represents a foundation AI model (like GPT, Llama, or Mistral) that you want to use for inference workloads. Think of it as a blueprint that tells OME where to find your model, how to download it, and where to store it on your cluster nodes.

When you create a BaseModel resource, OME automatically handles the complex process of downloading the model files, parsing the model’s configuration to understand its capabilities, and making it available across your cluster nodes where AI workloads can use it.

BaseModel vs ClusterBaseModel

OME provides two types of model resources:

BaseModel is namespace-scoped, meaning it exists within a specific Kubernetes namespace. If you create a BaseModel in the “team-a” namespace, only workloads in that namespace can use it. This is perfect for team-specific models or when you want to isolate model access.

ClusterBaseModel is cluster-scoped, meaning it’s available to workloads in any namespace across your entire cluster. This is ideal for organization-wide models that multiple teams need to access, like a shared Llama-3 model that everyone uses.

Both types use exactly the same specification format - the only difference is their visibility scope.
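
For example, scoping a model to one team only changes the kind and adds a namespace. A minimal sketch (the model ID and path are illustrative; the remaining metadata can be filled in exactly as in the basic example below):

apiVersion: ome.io/v1beta1
kind: BaseModel
metadata:
  name: mistral-7b-instruct
  namespace: team-a   # only workloads in team-a can reference this model
spec:
  storage:
    storageUri: "hf://mistralai/Mistral-7B-Instruct-v0.3"
    path: "/models/mistral-7b-instruct"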

Basic Example

Here’s a simple BaseModel to get you started:

apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-70b-instruct
spec:
  vendor: meta
  version: "3.1"
  disabled: false
  modelType: llama
  modelArchitecture: LlamaForCausalLM
  modelParameterSize: "70B"
  maxTokens: 8192
  modelCapabilities:
    - text-to-text
  modelFormat:
    name: safetensors
    version: "1.0.0"
  modelFramework:
    name: transformers
    version: "4.36.0"
  storage:
    storageUri: oci://n/ai-models/b/llm-store/o/meta/llama-3.1-70b-instruct/
    path: /raid/models/llama-3.1-70b-instruct
    storageKey: oci-credentials
    parameters:
      region: us-phoenix-1
      auth_type: InstancePrincipal
    nodeSelector:
      node.kubernetes.io/instance-type: GPU.A100.4

Specification Reference

Available attributes in the BaseModel/ClusterBaseModel spec:

| Attribute | Type | Description |
|-----------|------|-------------|
| **Core Configuration** | | |
| vendor | string | Vendor of the model (e.g., "meta", "mistral", "openai") |
| version | string | Version of the model (e.g., "3.1", "1.0") |
| disabled | boolean | Whether the model is disabled. Defaults to false |
| displayName | string | User-friendly name of the model |
| **Model Identification** | | |
| modelType | string | Architecture family (e.g., "llama", "mistral", "deepseek_v3") |
| modelArchitecture | string | Specific implementation (e.g., "LlamaForCausalLM", "MistralForCausalLM") |
| modelParameterSize | string | Human-readable parameter count (e.g., "7B", "70B", "405B") |
| maxTokens | int32 | Maximum number of tokens the model can process |
| modelCapabilities | []string | Model capabilities (see Model Capabilities) |
| **Model Format and Framework** | | |
| modelFormat.name | string | Format name (e.g., "safetensors", "onnx", "pytorch") |
| modelFormat.version | string | Format version (e.g., "1", "2.0") |
| modelFramework.name | string | Framework name (e.g., "transformers", "onnx", "tensorrt") |
| modelFramework.version | string | Framework version (e.g., "4.36.0", "1.14.0") |
| quantization | string | Quantization scheme (see Quantization Types) |
| **Storage Configuration** | | |
| storage.storageUri | string | Source URI of the model |
| storage.path | string | Local path where the model will be stored on nodes |
| storage.schemaPath | string | Path to model schema or configuration within storage |
| storage.storageKey | string | Name of the Kubernetes Secret containing storage credentials |
| storage.parameters | map[string]string | Storage-specific parameters (region, auth_type, etc.) |
| storage.nodeSelector | map[string]string | Node labels that must match for model placement |
| storage.nodeAffinity | NodeAffinity | Advanced node selection rules |
| **Serving Configuration** | | |
| modelConfiguration | RawExtension | Model-specific configuration as JSON |
| additionalMetadata | map[string]string | Additional key-value metadata |

Storage Backends

OME supports multiple storage backends to work with your existing infrastructure:

Cloud Object Storage

Store your models in OCI Object Storage using this URI format:

oci://n/{namespace}/b/{bucket}/o/{object_path}

Example:

storage:
  storageUri: "oci://n/mycompany/b/ai-models/o/llama/llama-3-70b/"
  path: "/raid/models/llama-3-70b-instruct"
  parameters:
    region: "us-phoenix-1"
    auth_type: "InstancePrincipal"

Hugging Face Hub

Download models directly from Hugging Face Hub:

hf://{model-id}[@{branch}]

Example:

storage:
  storageUri: "hf://meta-llama/Llama-3.3-70B-Instruct"
  path: "/models/llama-3.3-70b"
  storageKey: "huggingface-token"

Hugging Face Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| revision | Git revision to download | main, v1.0 |
| cache_dir | Local cache directory | /tmp/hf_cache |
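
These parameters go under storage.parameters alongside the URI. A sketch extending the example above (the revision and cache directory values are illustrative):

storage:
  storageUri: "hf://meta-llama/Llama-3.3-70B-Instruct"
  path: "/models/llama-3.3-70b"
  storageKey: "huggingface-token"
  parameters:
    revision: "main"
    cache_dir: "/tmp/hf_cache"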

Persistent Volume Claims (PVC)

Reference models already stored in Kubernetes persistent volumes:

pvc://{pvc-name}/{sub-path}

Example:

storage:
  storageUri: "pvc://model-storage/llama-models/llama-3-70b"
  path: "/local/models/llama-3-70b"

Vendor Storage

For proprietary or vendor-specific storage systems:

vendor://{vendor-name}/{resource-type}/{resource-path}

Example:

storage:
  storageUri: "vendor://nvidia/models/llama-70b-tensorrt"
  path: "/opt/models/llama-70b-tensorrt"

Node Selection

Control which nodes download and store your models using node selectors and affinity rules:

Simple Node Selection

storage:
  storageUri: "oci://n/mycompany/b/models/o/llama-70b/"
  nodeSelector:
    node.kubernetes.io/instance-type: "GPU.A100.4"
    models.ome.io/storage-tier: "fast-ssd"

Advanced Node Affinity

The nodeAffinity field supports standard Kubernetes node affinity with these operators:

| Operator | Description |
|----------|-------------|
| In | Node label value must be in the list |
| NotIn | Node label value must not be in the list |
| Exists | Node must have the label key |
| DoesNotExist | Node must not have the label key |
| Gt | Numeric label value must be greater than the given value |
| Lt | Numeric label value must be less than the given value |

Example:

storage:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["GPU.A100.4", "GPU.H100.8"]
        - key: "models.ome.io/available-storage"
          operator: Gt
          values: ["500Gi"]

Automatic Model Discovery

OME automatically analyzes your models to extract important metadata. When the Model Agent downloads a model, it looks for a config.json file and uses specialized parsers for different model architectures.

Supported Model Types

OME currently supports automatic parsing for:

  • Llama Family Models (Llama 3, 3.1, 3.2, and 4)
  • DeepSeek Models (including DeepSeek V3 with MoE architecture)
  • Mistral and Mixtral (standard and mixture-of-experts models)
  • Microsoft Phi Models (including Phi-3 Vision for multimodal)
  • Qwen Models (Qwen2 family)
  • Multimodal Models (MLlama Vision models)

What Gets Detected

The system automatically determines:

  • Model Type: Architecture family (e.g., “llama”, “mistral”)
  • Model Architecture: Specific implementation (e.g., “LlamaForCausalLM”)
  • Parameter Count: Total number of parameters
  • Context Length: Maximum input context length
  • Framework Information: AI framework and version
  • Data Type: Model precision (float16, bfloat16, etc.)
  • Capabilities: What the model can do (text generation, embeddings, vision)
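
Because this metadata is discovered automatically, a model resource can often be created with little more than its storage location, leaving the Model Agent to fill in the rest after download. A minimal sketch, assuming the identification fields are optional and automatic parsing is left enabled (the default):

apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-8b-instruct
spec:
  storage:
    storageUri: "hf://meta-llama/Meta-Llama-3-8B-Instruct"
    path: "/models/llama-3-8b-instruct"
    storageKey: "huggingface-token"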

Quantization Types

Valid values for quantization:

| Type | Description |
|------|-------------|
| fp8 | 8-bit floating point quantization |
| fbgemm_fp8 | Facebook GEMM FP8 quantization |
| int4 | 4-bit integer quantization |
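
To declare quantized weights, set the field at the top level of the spec so it matches what is actually stored at the storage URI (the complete example later on this page uses fp8):

spec:
  quantization: "fp8"   # one of: fp8, fbgemm_fp8, int4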

Disabling Automatic Parsing

If you need to specify model information manually:

apiVersion: ome.io/v1beta1
kind: BaseModel
metadata:
  name: custom-model
  annotations:
    ome.io/skip-config-parsing: "true"
spec:
  modelType: "custom"
  modelArchitecture: "CustomForCausalLM"
  modelParameterSize: "70B"
  maxTokens: 4096
  modelCapabilities:
    - TEXT_GENERATION

Model Status and Lifecycle

Model States

Each model on each node goes through these states:

  • Ready: Successfully downloaded and available for use
  • Updating: Currently being downloaded or updated
  • Failed: Download or validation failed
  • Deleted: Removed from the node

Status Fields

The BaseModel status contains these fields:

| Field | Type | Description |
|-------|------|-------------|
| state | string | Overall model state (Creating, Ready, Failed) |
| lifecycle | string | Lifecycle stage of the model |
| nodesReady | []string | List of nodes where model is ready |
| nodesFailed | []string | List of nodes where model failed |

Example status:

status:
  state: Ready
  lifecycle: Ready
  nodesReady:
    - worker-node-1
    - worker-node-2
  nodesFailed: []

Checking Model Status

View model status across your cluster:

# Check all models
kubectl get clusterbasemodels

# Check model status on specific nodes
kubectl get configmaps -n ome -l models.ome/basemodel-status=true

# Find nodes where a specific model is ready (substitute the model's UID for <model-uid>)
kubectl get nodes -l "models.ome/<model-uid>=Ready"

Authentication

OCI Authentication Methods

  • Instance Principal: Uses the compute instance’s identity (recommended for OCI)
  • User Principal: Uses specific user credentials stored in secrets
  • Resource Principal: For OKE with resource principals
  • OKE Workload Identity: Service account-based authentication
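
Which method is used is controlled by the auth_type entry under storage.parameters. Only InstancePrincipal appears verbatim elsewhere on this page, so treat the other values in this sketch as assumptions that simply mirror the method names:

storage:
  storageUri: "oci://n/mycompany/b/ai-models/o/llama-3-70b/"
  path: "/raid/models/llama-3-70b"
  storageKey: "oci-credentials"   # Secret holding user credentials; needed for the User Principal method
  parameters:
    region: "us-phoenix-1"
    # Assumed values mirroring the methods above: "InstancePrincipal",
    # "UserPrincipal", "ResourcePrincipal", "OkeWorkloadIdentity"
    auth_type: "InstancePrincipal"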

Hugging Face Authentication

For private or gated models, provide an access token:

# Create a secret with your Hugging Face token
apiVersion: v1
kind: Secret
metadata:
  name: hf-token
data:
  token: <base64-encoded-token>
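
If you prefer not to base64-encode the token yourself, the standard Kubernetes stringData field accepts the plain value and encodes it for you (the token shown is a placeholder):

apiVersion: v1
kind: Secret
metadata:
  name: hf-token
stringData:
  token: hf_your_access_token_here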

Using Custom Secret Key Names

By default, the Model Agent looks for a key named “token” in your secret. However, you can specify a custom key name using the secretKey parameter:

# Create a secret with a custom key name
apiVersion: v1
kind: Secret
metadata:
  name: hf-credentials
data:
  access-token: <base64-encoded-token>
  
---
# Reference it in your BaseModel
apiVersion: ome.io/v1beta1
kind: BaseModel
metadata:
  name: private-model
spec:
  storage:
    storageUri: "hf://my-org/private-model"
    storageKey: "hf-credentials"
    parameters:
      secretKey: "access-token"  # Specify the custom key name

This is useful when:

  • You have existing secrets with different key names
  • You’re following specific naming conventions in your organization
  • You need to store multiple tokens in the same secret

Complete Configuration Example

Here’s a comprehensive BaseModel configuration showing all available options:

apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-70b-instruct
  labels:
    vendor: "meta"
    model-family: "llama"
    parameter-size: "70b"
  annotations:
    ome.io/skip-config-parsing: "false"
spec:
  # Basic model information
  vendor: "meta"
  version: "3.1"
  disabled: false
  displayName: "Llama 3.1 70B Instruct"
  
  # Model identification
  modelType: "llama"
  modelArchitecture: "LlamaForCausalLM"
  modelParameterSize: "70B"
  maxTokens: 8192
  modelCapabilities:
    - text-to-text
  
  # Model format and framework
  modelFormat:
    name: "safetensors"
    version: "1.0.0"
  modelFramework:
    name: "transformers"
    version: "4.36.0"
  quantization: "fp8"
  
  # Storage configuration
  storage:
    storageUri: "oci://n/ai-models/b/llm-store/o/meta/llama-3.1-70b-instruct/"
    path: "/raid/models/llama-3.1-70b-instruct"
    schemaPath: "config.json"
    storageKey: "oci-model-credentials"
    
    parameters:
      region: "us-phoenix-1"
      auth_type: "InstancePrincipal"
    
    # Target appropriate hardware
    nodeSelector:
      node.kubernetes.io/instance-type: "GPU.A100.4"
      models.ome.io/storage-tier: "nvme"
    
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "accelerator.nvidia.com/gpu-product"
            operator: In
            values: ["A100-SXM4-80GB", "H100-SXM5-80GB"]
          - key: "models.ome.io/available-storage"
            operator: Gt
            values: ["200Gi"]
  
  # Model-specific configuration
  modelConfiguration: |
    {
      "temperature": 0.7,
      "top_p": 0.9,
      "max_new_tokens": 2048
    }
  
  # Additional metadata
  additionalMetadata:
    license: "Llama 3.1 Community License"
    description: "Meta Llama 3.1 70B Instruct model for chat and instruction following"
    use_cases: "chat,assistant,instruction_following"
    cost_center: "ai-research"
    owner: "ml-platform-team"

Fine-Tuned Models

FineTunedWeight Specification

FineTunedWeight resources reference BaseModels and add fine-tuning-specific configuration:

apiVersion: ome.io/v1beta1
kind: FineTunedWeight
metadata:
  name: llama-70b-finance-lora
spec:
  # Reference to base model
  baseModelRef:
    name: llama-3-70b-instruct
    namespace: default
  
  # Fine-tuning configuration
  modelType: LoRA
  hyperParameters: |
    {
      "lora_rank": 16,
      "lora_alpha": 32,
      "learning_rate": 1e-4
    }
  
  # Storage for fine-tuned weights
  storage:
    storageUri: oci://n/mycompany/b/fine-tuned/o/llama-70b-finance-lora/
    path: /raid/fine-tuned/llama-70b-finance-lora
  
  # Training job reference
  trainingJobRef:
    name: llama-finance-training-job
    namespace: training

FineTunedWeight Spec Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| baseModelRef | ObjectReference | Reference to the base model |
| modelType | string | Fine-tuning method (e.g., "LoRA", "Adapter") |
| hyperParameters | RawExtension | Fine-tuning hyperparameters as JSON |
| configuration | RawExtension | Additional configuration as JSON |
| storage | StorageSpec | Storage configuration for fine-tuned weights |
| trainingJobRef | ObjectReference | Reference to the training job |

Best Practices

Model Organization

  • Use consistent naming conventions including version and parameter size
  • Use labels to organize models by team, use case, or model family
  • Use ClusterBaseModels for widely-used models, BaseModels for team-specific models

Resource Planning

  • Ensure nodes have sufficient storage (large models can be 100GB+)
  • Use node affinity to target appropriate hardware
  • Consider using fast local storage (NVMe SSDs) for model paths

Security

  • Store all credentials in Kubernetes Secrets
  • Use workload identity or instance principals when possible
  • Implement appropriate RBAC for model resource management

Labels and Annotations

  • Use labels for filtering and organization
  • Use annotations for metadata that doesn’t affect selection
  • Consider using ome.io/skip-config-parsing for custom models

Storage Configuration

  • Use appropriate storage backends for your infrastructure
  • Configure node selectors to target appropriate hardware
  • Set reasonable storage paths with sufficient capacity

Using Models in InferenceServices

Once your BaseModel is ready, reference it in an InferenceService:

apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: llama-chat
spec:
  model:
    name: llama-3-70b-instruct
  engine:
    minReplicas: 1
    maxReplicas: 3

Next Steps

For detailed technical and operational information, see the Administration section.