Performance Visualization & Plotting¶
Quick Start¶
You can check out genai-bench plot --help
to find how to generate a 2x4 Plot containing:
- Output Inference Speed (tokens/s) vs Output Throughput of Server (tokens/s)
- TTFT (s) vs Output Throughput of Server (tokens/s)
- Mean E2E Latency (s) per Request vs RPS
- Error Rates by HTTP Status vs Concurrency
- Output Inference Speed per Request (tokens/s) vs Total Throughput (Input + Output) of Server (tokens/s)
- TTFT (s) vs Total Throughput (Input + Output) of Server (tokens/s)
- P90 E2E Latency (s) per Request vs RPS
- P99 E2E Latency (s) per Request vs RPS
Note: TTFT plots automatically use logarithmic scale for better visualization of the wide range of values. You can override this by specifying "y_scale": "linear"
in custom plot configurations.
Advanced Plot Configuration¶
This section provides comprehensive configuration examples and templates for customizing GenAI Bench's flexible plotting system to meet your specific analysis needs.
Usage¶
Use these configurations with the genai-bench plot
command:
# Use a custom configuration file
genai-bench plot --experiments-folder /path/to/experiments \
--group-key traffic_scenario \
--plot-config examples/plot_configs/custom_2x2.json
# Use a built-in preset for multiple scenarios
genai-bench plot --experiments-folder /path/to/experiments \
--group-key traffic_scenario \
--preset simple_2x2
# Use multi-line preset for single scenario analysis
genai-bench plot --experiments-folder /path/to/experiments \
--group-key none \
--preset single_scenario_analysis
# List available fields with actual data from your experiment
genai-bench plot --experiments-folder /path/to/experiments \
--group-key traffic_scenario \
--list-fields
# Validate a configuration without generating plots
genai-bench plot --experiments-folder /path/to/experiments \
--group-key traffic_scenario \
--plot-config examples/plot_configs/custom_2x2.json \
--validate-only
Available Configurations¶
custom_2x2.json¶
A simple 2x2 grid layout focusing on key performance metrics: - Throughput vs Mean Latency - RPS vs P99 Latency - Concurrency vs TTFT - Error Rate Analysis
performance_focused.json¶
A comprehensive 2x3 grid for detailed performance analysis: - Token generation speed analysis - Time to first token trends - Latency percentiles - Token efficiency scatter plot - Request success rates - Throughput scaling
multi_line_latency.json¶
Demonstrates multi-line plotting capabilities with a 2x2 layout: - Latency Percentiles Comparison: Multiple latency percentiles (mean, P90, P99) on one plot - TTFT Performance Analysis: Mean and P95 TTFT comparison - Token Processing Speed: Output speed vs input throughput comparison - Request Success Metrics: Single-line error rate plot
comprehensive_multi_line.json¶
Advanced multi-line example with 1x3 layout showcasing complex comparisons: - E2E Latency Distribution: All percentiles (P25, P50, P75, P90, P99) with custom colors - Throughput Components: Input, output, and total throughput comparison - Token Statistics: Input, output, and total token counts as scatter plot
Configuration Format¶
Plot configurations use the following JSON schema:
Single-Line Plots¶
{
"layout": {
"rows": 2,
"cols": 2,
"figsize": [16, 12] // Optional: [width, height] in inches
},
"plots": [
{
"title": "Plot Title",
"x_field": "field.path.from.AggregatedMetrics",
"y_field": "another.field.path", // Single field
"x_label": "Custom X Label", // Optional
"y_label": "Custom Y Label", // Optional
"plot_type": "line", // line, scatter, or bar
"position": [0, 0] // [row, col] in grid
}
]
}
Multi-Line Plots¶
{
"plots": [
{
"title": "Multi-Line Comparison",
"x_field": "requests_per_second",
"y_fields": [ // Multiple fields on same plot
{
"field": "stats.e2e_latency.mean",
"label": "Mean Latency", // Optional custom label
"color": "blue", // Optional custom color
"linestyle": "-" // Optional: '-', '--', '-.', ':'
},
{
"field": "stats.e2e_latency.p90",
"label": "P90 Latency",
"color": "red",
"linestyle": "--"
}
],
"x_label": "RPS",
"y_label": "Latency (s)",
"plot_type": "line",
"position": [0, 0]
}
]
}
Key Features¶
- Single vs Multi-Line: Use
y_field
for single line,y_fields
for multiple lines - Custom Styling: Each line can have custom color, linestyle, and label
- Flexible Layout: Any NxM grid layout from 1x1 to 5x6
- Plot Types:
line
,scatter
,bar
(multi-line bar creates grouped bars) - Automatic Legends: Multi-line plots automatically generate legends
When to Use Multi-Line Plots¶
✅ GOOD Use Cases:
- Single scenario analysis: Use --group-key ""
(empty string) for one traffic scenario
- Deep metric comparison: Comparing mean, P90, P99 latency on same plot
- Performance analysis: Related metrics on the same scale
❌ AVOID Multi-Line Plots When:
- Multiple scenarios: --group-key traffic_scenario
with multiple scenarios
- Multiple server versions: --group-key server_version
- Any grouping: Multi-line + grouping creates cluttered, hard-to-read plots
The system will automatically convert multi-line plots to single-line plots when it detects multiple groups/scenarios for better visualization.
Usage Patterns¶
# ✅ GOOD: Multi-line for single scenario analysis
genai-bench plot --preset single_scenario_analysis --group-key ""
# ✅ GOOD: Single-line for multiple scenarios
genai-bench plot --preset 2x4_default --group-key traffic_scenario
# ⚠️ AUTO-CONVERTED: Multi-line + grouping → single-line
genai-bench plot --preset multi_line_latency --group-key traffic_scenario
Available Fields¶
Run genai-bench plot --experiments-folder /path/to/experiments --group-key traffic_scenario --list-fields
to see all available field paths with actual data from your experiments.
Common field paths include:
Direct Metrics¶
num_concurrency
- Concurrency levelrequests_per_second
- RPSerror_rate
- Error ratemean_output_throughput_tokens_per_s
- Server output throughputmean_total_tokens_throughput_tokens_per_s
- Total throughputrun_duration
- Duration of the run
Statistical Fields¶
Access statistics using stats.{metric}.{statistic}
:
Metrics: ttft, tpot, e2e_latency, output_latency, output_inference_speed, num_input_tokens, num_output_tokens, total_tokens, input_throughput, output_throughput
Statistics: min, max, mean, stddev, sum, p25, p50, p75, p90, p95, p99
Examples:
- stats.ttft.mean
- Mean time to first token
- stats.e2e_latency.p99
- 99th percentile end-to-end latency
- stats.output_inference_speed.mean
- Mean output inference speed
Built-in Presets¶
2x4_default¶
The original 2x4 layout with all 8 standard plots. This maintains backwards compatibility with the existing plotting system.
simple_2x2¶
A simplified 2x2 layout with the most important metrics for quick analysis.
Creating Custom Configurations¶
- Start with an example configuration
- Modify the layout dimensions and plot specifications
- Use
--list-fields
to find available metrics - Use
--validate-only
to test your configuration - Generate plots with your custom config
Tips¶
- Use descriptive titles for your plots
- Choose appropriate plot types (line for trends, scatter for relationships, bar for comparisons)
- Ensure field paths are valid using
--validate-only
- Consider your audience when selecting metrics to display
- Use figsize to adjust the output image dimensions