How to Support New Models#

This document explains how to add support for new language models and vision-language models (VLMs) in SGLang. It also covers how to test new models and register external implementations.

How to Support a New Language Model#

To support a new model in SGLang, you only need to add a single file under the SGLang Models Directory. You can learn from existing model implementations and create a new file for your model. For most models, you should be able to find a similar model to start from (e.g., Llama). See also the section on porting a model from vLLM to SGLang.

How to Support a New Vision-Language Model#

To support a new vision-language model (VLM) in SGLang, you need to implement several key components in addition to the standard LLM support:

  1. Register your new model as multimodal: Extend is_multimodal_model in model_config.py to return True for your model.

  2. Process Images: Define a new Processor class that inherits from BaseProcessor and register this processor as your model’s dedicated processor. See multimodal_processor.py for more details.

  3. Handle Image Tokens: Implement a pad_input_ids function for your new model. In this function, each image token in the prompt should be expanded and replaced with image hashes so that SGLang can distinguish different images when using RadixAttention.

  4. Replace Vision Attention: Replace the multi-headed Attention of ViT with SGLang’s VisionAttention.
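The image-token handling in step 3 can be sketched as follows. This is a minimal, self-contained illustration of the idea, not SGLang's actual pad_input_ids: the signature, the parameter names (image_token_id, tokens_per_image, image_hashes, pad_value_range), and the way the hash is folded into a pad value are all assumptions made for this example.

```python
from typing import List


def pad_input_ids(
    input_ids: List[int],
    image_token_id: int,
    tokens_per_image: List[int],
    image_hashes: List[int],
    pad_value_range: int = 1 << 30,
) -> List[int]:
    """Expand each image placeholder and fill it with a value derived from the image hash.

    Hypothetical signature for illustration; the real method may take different arguments.
    """
    padded: List[int] = []
    image_index = 0
    for token in input_ids:
        if token == image_token_id:
            if image_index >= len(image_hashes):
                raise ValueError("more image placeholders than images")
            # Assumption: fold the hash into a fixed integer range to use it as a pad value.
            pad_value = image_hashes[image_index] % pad_value_range
            # One placeholder becomes as many slots as the image occupies.
            padded.extend([pad_value] * tokens_per_image[image_index])
            image_index += 1
        else:
            padded.append(token)
    return padded
```

Because the pad values depend on the image hash, two prompts with identical text but different images produce different token sequences, which is what allows prefix matching to tell them apart.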

You can refer to Qwen2VL or other VLM implementations. These models demonstrate how to correctly handle both multimodal and textual inputs.

You should test the new VLM locally against Hugging Face models. See the mmmu benchmark for an example.

Test the Correctness#

Interactive Debugging#

For interactive debugging, compare the outputs of Hugging Face/Transformers and SGLang. The following two commands should give the same text output and very similar prefill logits:

  • Get the reference output:

    python3 scripts/playground/reference_hf.py --model-path [new model] --model-type {text,vlm}
    
  • Get the SGLang output:

    python3 -m sglang.bench_one_batch --correct --model [new model]
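Once both commands have run, the comparison itself can be sketched as below. This helper is not part of SGLang; it assumes you have copied the generated text and a slice of the prefill logits from each output by hand, and the tolerance of 1e-2 is an arbitrary assumption that you should adjust to your model and dtype.

```python
from typing import Sequence


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Return the largest element-wise absolute difference between two logit vectors."""
    if len(reference) != len(candidate):
        raise ValueError("logit vectors must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def outputs_match(
    ref_text: str,
    cand_text: str,
    ref_logits: Sequence[float],
    cand_logits: Sequence[float],
    atol: float = 1e-2,  # assumed tolerance, not an SGLang default
) -> bool:
    """Texts must be identical; logits only need to agree within atol."""
    return ref_text == cand_text and max_abs_diff(ref_logits, cand_logits) <= atol
```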
    

Add the Model to the Test Suite#

To ensure the new model is well maintained, add it to the test suite by including it in the ALL_OTHER_MODELS list in the test_generation_models.py file. Then test the new model on your local machine and report the results on demonstrative benchmarks (GSM8K, MMLU, MMMU, MMMU-Pro, etc.) in your PR.

This is the command to test a new model on your local machine:

ONLY_RUN=Qwen/Qwen2-1.5B python3 -m unittest test_generation_models.TestGenerationModels.test_others
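The following sketch illustrates how an environment variable such as ONLY_RUN can restrict a model list to a single entry. The contents of test_generation_models.py are not shown in this document, so ModelCase, its model_path field, the select_models helper, and the "my-org/my-new-model" entry are hypothetical stand-ins rather than the test suite's actual code.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class ModelCase:
    """Hypothetical stand-in for a test-suite entry; only the model path is modeled."""

    model_path: str


ALL_OTHER_MODELS = [
    ModelCase("Qwen/Qwen2-1.5B"),
    ModelCase("my-org/my-new-model"),  # hypothetical new entry
]


def select_models(
    cases: List[ModelCase], env: Mapping[str, str] = os.environ
) -> List[ModelCase]:
    """Return every case, or only the one named by ONLY_RUN when it is set."""
    only_run = env.get("ONLY_RUN")
    if only_run is None:
        return list(cases)
    return [case for case in cases if case.model_path == only_run]
```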

Port a Model from vLLM to SGLang#

The vLLM Models Directory is a valuable resource, as vLLM covers many models. SGLang reuses vLLM’s interface and some layers, making it easier to port models from vLLM to SGLang.

To port a model from vLLM to SGLang:

  • Compare these two files for guidance:

  • The major differences include:

    • Replace vLLM’s Attention with RadixAttention (ensure you pass layer_id to RadixAttention).

    • Replace vLLM’s LogitsProcessor with SGLang’s LogitsProcessor.

    • Replace the multi-headed Attention of ViT with SGLang’s VisionAttention.

    • Replace other vLLM layers (such as RMSNorm, SiluAndMul) with SGLang layers.

    • Remove Sample.

    • Change the forward() functions and add a forward_batch() method.

    • Add EntryClass at the end.

    • Ensure that the new implementation uses only SGLang components and does not rely on any vLLM components.
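Two of the conventions in the list above (passing layer_id to the attention layer and assigning EntryClass at the end of the file) can be sketched structurally as follows. The sketch uses plain-Python stand-ins so that it runs without SGLang installed: RadixAttentionStub, DecoderLayer, and MyModelForCausalLM are hypothetical names, and the constructor arguments are assumptions rather than SGLang's real signatures.

```python
from typing import List


class RadixAttentionStub:
    """Hypothetical stand-in for SGLang's RadixAttention; it only records layer_id."""

    def __init__(self, num_heads: int, head_dim: int, layer_id: int):
        self.num_heads = num_heads
        self.head_dim = head_dim
        self.layer_id = layer_id


class DecoderLayer:
    def __init__(self, num_heads: int, head_dim: int, layer_id: int):
        # Pass the layer's own index through to the attention layer.
        self.self_attn = RadixAttentionStub(num_heads, head_dim, layer_id=layer_id)


class MyModelForCausalLM:
    def __init__(self, num_layers: int, num_heads: int, head_dim: int):
        self.layers: List[DecoderLayer] = [
            DecoderLayer(num_heads, head_dim, layer_id=i) for i in range(num_layers)
        ]


# The model file ends by exporting the model class through EntryClass.
EntryClass = MyModelForCausalLM
```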

Registering an External Model Implementation#

In addition to the methods above, you can register your new model with the ModelRegistry before launching the server. This allows you to integrate your model without modifying the source code.

For example:

from functools import lru_cache

from sglang.srt.models.registry import ModelRegistry
from sglang.srt.entrypoints.http_server import launch_server

# For a single model, add it to the registry.
# model_name (the architecture name) and model_class are placeholders for your own values.
ModelRegistry.models[model_name] = model_class

# For multiple models, you can imitate the import_model_classes() function:

@lru_cache()
def import_new_model_classes():
    model_arch_name_to_cls = {}
    # Populate model_arch_name_to_cls with your new model classes.
    ...
    return model_arch_name_to_cls

ModelRegistry.models.update(import_new_model_classes())

# Launch the server with your server arguments:
launch_server(server_args)
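One possible way to populate the mapping returned by import_new_model_classes() is sketched below. It assumes that the architecture name used as the registry key is the class's own name; the two model classes and the build_arch_name_to_cls helper are hypothetical and stand in for your real implementations.

```python
from typing import Dict, Iterable


class MyModelForCausalLM:
    """Hypothetical text model class."""


class MyVisionModelForConditionalGeneration:
    """Hypothetical vision-language model class."""


def build_arch_name_to_cls(model_classes: Iterable[type]) -> Dict[str, type]:
    """Map each class name to its class, rejecting duplicate names."""
    mapping: Dict[str, type] = {}
    for cls in model_classes:
        name = cls.__name__  # assumption: architecture name equals class name
        if name in mapping:
            raise ValueError(f"duplicate architecture name: {name}")
        mapping[name] = cls
    return mapping
```

The resulting dictionary has the same shape as model_arch_name_to_cls in the example above, so it could be returned from import_new_model_classes() as is.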

By following these guidelines, you can add support for new language models and vision-language models in SGLang and ensure they are thoroughly tested and easily integrated into the system.