# SGLang on AMD
## Introduction
This document describes how to set up an AMD-based environment for SGLang. If you encounter issues or have questions, please open an issue on the SGLang repository.
## System Configuration
When using AMD GPUs (such as the MI300X), certain system-level optimizations help ensure stable performance. Here we take the MI300X as an example. AMD provides official documentation for MI300X optimization and system tuning; see the AMD Instinct MI300X system optimization and workload optimization guides in the ROCm documentation.

NOTE: We strongly recommend reading these docs in their entirety to fully utilize your system.
Below are a few key settings to confirm or enable:
### Update GRUB Settings
In `/etc/default/grub`, append the following to `GRUB_CMDLINE_LINUX`:

```bash
pci=realloc=off iommu=pt
```

Afterward, run `sudo update-grub` (or your distro's equivalent) and reboot.
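After rebooting, you can confirm that the parameters took effect by inspecting the kernel command line (a quick sanity check, not part of the AMD tuning docs):

```bash
# The boot line should include pci=realloc=off and iommu=pt
grep -E 'pci=realloc=off|iommu=pt' /proc/cmdline
```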
### Disable NUMA Auto-Balancing
```bash
sudo sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
```
You can automate or verify this change using this helpful script.
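You can also verify it manually; the kernel reports `0` when auto-balancing is disabled:

```bash
# Prints 0 if NUMA auto-balancing is disabled
cat /proc/sys/kernel/numa_balancing
```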
Again, please go through the entire documentation to confirm your system is using the recommended configuration.
## Installing SGLang
For general installation instructions, see the official SGLang Installation Docs. Below are the AMD-specific steps summarized for convenience.
### Install from Source
```bash
git clone https://github.com/sgl-project/sglang.git
cd sglang

pip install --upgrade pip
pip install sgl-kernel --force-reinstall --no-deps
pip install -e "python[all_hip]"
```
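As a quick sanity check (not part of the official steps), you can confirm that the package imports and prints its version:

```bash
python3 -c "import sglang; print(sglang.__version__)"
```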
### Install Using Docker (Recommended)
Build the Docker image.

```bash
docker build -t sglang_image -f Dockerfile.rocm .
```
Create a convenient alias.

```bash
alias drun='docker run -it --rm --network=host --privileged --device=/dev/kfd --device=/dev/dri \
    --ipc=host --shm-size 16G --group-add video --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v $HOME/dockerx:/dockerx \
    -v /data:/data'
```
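Before launching the server, you can use the alias to confirm the GPUs are visible inside the container (this assumes `rocm-smi` is present in the image, which is typical for ROCm-based images):

```bash
# Lists the AMD GPUs visible inside the container
drun sglang_image rocm-smi
```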
If you are using RDMA, please note:

- `--network=host` and `--privileged` are required by RDMA. If you don't need RDMA, you can remove them.
- You may need to set `NCCL_IB_GID_INDEX` if you are using RoCE, for example: `export NCCL_IB_GID_INDEX=3`.

Launch the server.

NOTE: Replace `<secret>` below with your Hugging Face Hub token.
```bash
drun -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    sglang_image \
    python3 -m sglang.launch_server \
    --model-path NousResearch/Meta-Llama-3.1-8B \
    --host 0.0.0.0 \
    --port 30000
```
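Once the server is up, you can send it a quick test request. Below is a minimal sketch against SGLang's native `/generate` endpoint (the prompt and sampling parameters are illustrative):

```bash
curl http://localhost:30000/generate \
    -H "Content-Type: application/json" \
    -d '{
        "text": "The capital of France is",
        "sampling_params": {"max_new_tokens": 32, "temperature": 0}
    }'
```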
To verify the setup, you can run a benchmark in another terminal, or refer to the other docs to send requests to the engine.
```bash
drun sglang_image \
    python3 -m sglang.bench_serving \
    --backend sglang \
    --dataset-name random \
    --num-prompts 4000 \
    --random-input 128 \
    --random-output 128
```
With your AMD system properly configured and SGLang installed, you can now fully leverage AMD hardware to power SGLang’s machine learning capabilities.
## Examples
### Running DeepSeek-V3
The only difference when running DeepSeek-V3 is the `--model-path` passed when starting the server. Here's an example command:
```bash
drun -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --ipc=host \
    --env "HF_TOKEN=<secret>" \
    sglang_image \
    python3 -m sglang.launch_server \
    --model-path deepseek-ai/DeepSeek-V3 \
    --tp 8 \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
```
### Running Llama3.1
Running Llama3.1 is nearly identical; the only difference is the `--model-path` specified when starting the server, as in the following example command:
```bash
drun -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --ipc=host \
    --env "HF_TOKEN=<secret>" \
    sglang_image \
    python3 -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --tp 8 \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
```
### Warmup Step
When the server displays "The server is fired up and ready to roll!", it means the startup was successful.
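If you script against the server, you can poll it until it is ready instead of watching the logs. A minimal sketch (assuming the server exposes the `/health` and `/get_model_info` endpoints, as recent SGLang versions do):

```bash
# Wait until the server answers, then print the loaded model info
until curl -sf http://localhost:30000/health > /dev/null; do
    sleep 5
done
curl http://localhost:30000/get_model_info
```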