Running Benchmark Using genai-bench Container

Using Pre-built Docker Image

Pull the pre-built Docker image:

docker pull ghcr.io/moirai-internal/genai-bench:v0.0.1

Building from Source

Alternatively, you can build the image locally from the Dockerfile:

docker build . -f Dockerfile -t genai-bench:dev
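As a quick sanity check for either image, run it with --help; since genai-bench is the container's entrypoint, this should print the CLI usage (substitute the ghcr.io tag if you pulled the pre-built image):

docker run --rm genai-bench:dev --help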

To avoid internet disruptions and extra network latency, it is recommended to run the benchmark on the same Docker network as the target inference server. You can also use --network host if you prefer.

To create a bridge network in docker:

docker network create benchmark-network -d bridge
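You can verify the network was created and, once containers join it, see which ones are attached:

docker network inspect benchmark-network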

Then, start the inference server using the standard Docker command with the additional flag --network benchmark-network.

Example:

docker run -itd \
    --gpus '"device=0,1,2,3"' \
    --shm-size 10g \
    -v /raid/models:/models \
    --ulimit nofile=65535:65535 \
    --network benchmark-network \
    --name sglang-v0.4.7.post1-llama4-scout-tp4 \
    lmsysorg/sglang:v0.4.7.post1-cu124 \
    python3 -m sglang.launch_server \
    --model-path=/models/meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --tp 4 \
    --port=8080 \
    --host 0.0.0.0 \
    --context-length=131072
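Before starting the benchmark, it can be worth confirming that the server is reachable from inside the bridge network. A minimal sketch using the public curlimages/curl image (an assumption; any image with curl works) against the server's OpenAI-compatible model listing endpoint:

# Resolve the server by container name on benchmark-network and list served models.
docker run --rm --network benchmark-network curlimages/curl:latest \
    curl -s http://sglang-v0.4.7.post1-llama4-scout-tp4:8080/v1/models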

Next, start the genai-bench container with the same network flag. Containers on a user-defined bridge network can reach each other by container name, so the benchmark targets the server container's name (sglang-v0.4.7.post1-llama4-scout-tp4) rather than localhost.

Example:

First, create a dataset configuration file that specifies the dataset source and split:

llava-config.json:

{
  "source": {
    "type": "huggingface",
    "path": "lmms-lab/llava-bench-in-the-wild",
    "huggingface_kwargs": {
      "split": "train"
    }
  },
  "prompt_column": "question",
  "image_column": "image"
}
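Before mounting the file into the container, a quick way to confirm the JSON parses cleanly is Python's standard json.tool module:

python3 -m json.tool llava-config.json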

Then run the benchmark with the configuration file:

docker run \
    -tid \
    --shm-size 5g \
    --ulimit nofile=65535:65535 \
    --env HF_TOKEN="your_HF_TOKEN" \
    --network benchmark-network \
    -v /mnt/data/models:/models \
    -v $(pwd)/llava-config.json:/genai-bench/llava-config.json \
    --name llama-4-scout-benchmark \
    genai-bench:dev \
    benchmark \
    --api-backend openai \
    --api-base http://sglang-v0.4.7.post1-llama4-scout-tp4:8080 \
    --api-key your_api_key \
    --api-model-name /models/meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --model-tokenizer /models/meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --task image-to-text \
    --max-time-per-run 10 \
    --max-requests-per-run 100 \
    --server-engine "SGLang" \
    --server-gpu-type "H100" \
    --server-version "v0.4.7.post1" \
    --server-gpu-count 4 \
    --traffic-scenario "I(512,512)" \
    --traffic-scenario "I(2048,2048)" \
    --num-concurrency 1 \
    --num-concurrency 2 \
    --num-concurrency 4 \
    --dataset-config /genai-bench/llava-config.json

Note that genai-bench is already the container's entrypoint, so you only need to provide its subcommand and arguments.

The genai-bench runtime UI should be available through:

docker logs --follow <CONTAINER_ID>

You can also use tmux for additional parallelism and session control.
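For example, a minimal sketch that keeps the log view alive in a detached session (the session name is arbitrary; the container name matches the example above):

# Start a detached tmux session that follows the benchmark logs.
tmux new-session -d -s llama4-bench 'docker logs --follow llama-4-scout-benchmark'

# Re-attach later; detach again with Ctrl-b d.
tmux attach -t llama4-bench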

Monitoring the Benchmark Using a Volume Mount

To monitor interim benchmark results from the genai-bench container, you can use a volume mount together with the --experiment-base-dir option.

HOST_OUTPUT_DIR=$HOME/benchmark_results
CONTAINER_OUTPUT_DIR=/genai-bench/benchmark_results
docker run \
    -tid \
    --shm-size 5g \
    --ulimit nofile=65535:65535 \
    --env HF_TOKEN="your_HF_TOKEN" \
    --network benchmark-network \
    -v /mnt/data/models:/models \
    -v $HOST_OUTPUT_DIR:$CONTAINER_OUTPUT_DIR \
    -v $(pwd)/llava-config.json:/genai-bench/llava-config.json \
    --name llama-4-scout-benchmark-monitored \
    genai-bench:dev \
    benchmark \
    --api-backend openai \
    --api-base http://sglang-v0.4.7.post1-llama4-scout-tp4:8080 \
    --api-key your_api_key \
    --api-model-name /models/meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --model-tokenizer /models/meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --task image-to-text \
    --max-time-per-run 10 \
    --max-requests-per-run 100 \
    --server-engine "SGLang" \
    --server-gpu-type "H100" \
    --server-version "v0.4.7.post1" \
    --server-gpu-count 4 \
    --traffic-scenario "I(512,512)" \
    --traffic-scenario "I(2048,2048)" \
    --num-concurrency 1 \
    --num-concurrency 2 \
    --num-concurrency 4 \
    --dataset-config /genai-bench/llava-config.json \
    --experiment-base-dir $CONTAINER_OUTPUT_DIR
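While the run is in progress, interim results land in the mounted host directory. A simple way to watch them from the host (the exact layout under the experiment directory depends on the genai-bench version):

# Re-list the mounted results directory every 30 seconds.
watch -n 30 "ls -lR $HOST_OUTPUT_DIR"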