Offline Engine API#
SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:
Offline Batch Inference
Custom Server on Top of the Engine
This document focuses on offline batch inference and demonstrates four inference modes:
Non-streaming synchronous generation
Streaming synchronous generation
Non-streaming asynchronous generation
Streaming asynchronous generation
Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in custom_server.
Nest Asyncio#
Note that if you want to use the Offline Engine in IPython or in other code that already runs an event loop (a nested event loop), you need to add the following code:
import nest_asyncio
nest_asyncio.apply()
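The patch above can also be applied conditionally. The following is a minimal sketch, assuming you only want to patch when an event loop is already running in the current thread. `loop_is_running` is a hypothetical helper name, not part of SGLang or nest_asyncio, and whether a running loop is detected depends on the environment.

```python
import asyncio


def loop_is_running() -> bool:
    """Return True when called from code that already runs inside an event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, e.g. a plain Python script.
        return False
    return True


if loop_is_running():
    # Only patch when nesting is actually needed.
    import nest_asyncio

    nest_asyncio.apply()
```

In a plain script the check returns False and the import is skipped, so nest_asyncio does not have to be installed there.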
Advanced Usage#
The engine also supports VLM (vision-language model) inference and hidden-state extraction.
Please see the examples for further use cases.
Offline Batch Inference#
The SGLang offline engine supports batch inference with efficient scheduling.
[1]:
# launch the offline engine
import asyncio
import io
import os
from PIL import Image
import requests
import sglang as sgl
from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge
if is_in_ci():
    import patch
else:
    import nest_asyncio
    nest_asyncio.apply()
llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 4.76it/s]
Non-streaming Synchronous Generation#
[2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: Hello, my name is
Generated text: Alex. I am currently in Year 7 and I have been doing well so far. I have been in the classroom for about 4 hours a day and enjoy my subjects very much. I have always loved sports, and I have been playing basketball since I was 10. I have also been really good at my PE lessons and have won lots of trophies for it. I also have an interest in painting and drawing, and I try to use my imagination when I paint. I also love playing with my dogs and I love spending time with them. I have a good friend named Tommy, and we have been best friends since we
===============================
Prompt: The president of the United States is
Generated text: visiting a small town in need. The president is known to love all animals, and they have decided to bring food to the town. They plan to bring 100 pounds of food, which is equivalent to 2000 ounces. However, the town is in a severe food shortage, and they need to find a way to distribute the food more efficiently. They decide to offer the following pricing strategy: 500 ounces of food for a dollar. If the president brings a certain number of pounds of food, they will receive 2000 ounces of food for a dollar.
Given that the president brings
===============================
Prompt: The capital of France is
Generated text: __________. A) Paris B) Nancy C) Lille D) Montpellier
A: Paris C: Lille
To determine the capital of France, let's consider the options given:
A) Paris - This is the capital of France.
B) Nancy - This is a city in France, not the capital.
C) Lille - This is a city in France, not the capital.
D) Montpellier - This is a city in France, not the capital.
Based on this reasoning, the correct answer is:
A) Paris
Therefore, the capital of France is Paris. The answer is C. Lille
===============================
Prompt: The future of AI is
Generated text: exciting, but it also requires a change in the way we think about AI.
We started with a conversation about how the future of AI will look. We asked the audience what they thought about it, and our 50+ responses ranged from excited about the future to fearful of the future.
We then discussed the future of AI in more detail, and it was clear that there are a number of different paths that AI can take.
A lot of the people we talked to thought that AI will be used to solve problems in a very specific way. They thought it would be a tool to help people solve problems in a way that’s more
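Each element returned by `llm.generate` in the cell above is read through its `text` key. The following is a minimal sketch of pairing prompts with their generated text for later processing, assuming only that key. `to_records` and the stand-in outputs are hypothetical and exist only so the sketch runs without a model.

```python
import json


def to_records(prompts, outputs):
    """Pair each prompt with the generated text of the matching output."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    return [
        {"prompt": prompt, "text": output["text"]}
        for prompt, output in zip(prompts, outputs)
    ]


# Stand-in outputs (assumption); a real run would pass the list returned by
# llm.generate(prompts, sampling_params).
example_prompts = ["The capital of France is"]
example_outputs = [{"text": " Paris."}]

records = to_records(example_prompts, example_outputs)
# One JSON object per line, a common layout for offline batch results.
jsonl = "\n".join(json.dumps(record) for record in records)
```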
Streaming Synchronous Generation#
[3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}
print("\n=== Testing synchronous streaming generation with overlap removal ===\n")
for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
=== Testing synchronous streaming generation with overlap removal ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: [Name], and I am a [job title] at [company name]. I have been working at [company name] for [number of years] years, and I have always been passionate about [job title] and have always wanted to [job title] at [company name]. I am always looking for new challenges and opportunities to grow and learn, and I am always eager to learn more about [job title] and the company. I am excited to be a part of [company name] and contribute to its success. Thank you for considering me for the position. [Name] [Job Title] [Company Name] [
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris. It is the largest city in France and the second-largest city in the European Union. Paris is known for its rich history, beautiful architecture, and vibrant culture. It is home to many famous landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. Paris is also a major center for business, finance, and tourism. It is a popular destination for tourists and locals alike. The city is home to many cultural institutions, including the Louvre Museum, the Musée d'Orsay, and the Centre Pompidou. Paris is a city of contrasts, with its modern architecture and historical landmarks
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: likely to be characterized by a number of trends that are expected to shape the development of the technology in the coming years. Here are some of the most likely trends:
1. Increased focus on ethical considerations: As AI becomes more integrated into our daily lives, there will be a growing emphasis on ethical considerations. This includes issues such as bias, privacy, and transparency. AI developers will need to be more mindful of the potential impact of their technology on society and work to ensure that it is used in a way that is fair and responsible.
2. Greater integration with human decision-making: AI is likely to become more integrated with human decision-making in
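The streaming cell above mentions overlap removal: when consecutive streamed chunks repeat text that was already received, the repeated part has to be dropped before the chunks are joined. The following is a minimal sketch of one plausible way to do this. It is an illustration under that assumption, not a confirmed description of how `stream_and_merge` is implemented, and the names `trim_overlap_sketch` and `merge_chunks` are hypothetical.

```python
def trim_overlap_sketch(existing: str, new_chunk: str) -> str:
    """Drop the longest prefix of new_chunk that repeats the end of existing."""
    max_overlap = min(len(existing), len(new_chunk))
    # Try the longest candidate overlap first, then shrink.
    for size in range(max_overlap, 0, -1):
        if existing.endswith(new_chunk[:size]):
            return new_chunk[size:]
    # No overlap found: the whole chunk is new text.
    return new_chunk


def merge_chunks(chunks) -> str:
    """Join streamed chunks into one string without repeated overlaps."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap_sketch(merged, chunk)
    return merged


merged_text = merge_chunks(["Paris is", " is the", "the capital"])
```

The same helper works whether chunks carry only new text (no overlap is found) or repeat part of the text seen so far.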
Non-streaming Asynchronous Generation#
[4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
print("\n=== Testing asynchronous batch generation ===")
async def main():
    outputs = await llm.async_generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")
asyncio.run(main())
=== Testing asynchronous batch generation ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: [Name] and I am a [profession] with [number] years of experience. I bring a unique blend of [attractive traits or skills] to every project I work on. I am always up for challenges and I am always ready to learn and improve. I am confident in my abilities and I am committed to helping others grow and succeed. What can you tell me about yourself? [Name], I am always up for challenges and ready to learn and improve. My unique blend of [attractive traits or skills] has always made me stand out in my field. I bring a unique perspective to every project I work on and
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris, the city known for its historic architecture, fashion, and French cuisine. Its population is around 1,160,000, and it is the country's largest city and the seat of government. Paris is often referred to as the "City of Light" and is a UNESCO World Heritage site. The city has a rich history, including the influence of the Roman, French, and Arab civilizations. It is a popular tourist destination and has many well-known landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. Paris is also known for its fashion industry, which is a
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: rapidly evolving and there are several trends that are likely to shape its direction. Some possible future trends include:
1. Increased automation: As AI continues to advance, we are likely to see more automation in various industries, such as manufacturing, healthcare, and finance, leading to more efficient processes and reduced human error.
2. Artificial intelligence in healthcare: AI is expected to play a larger role in the healthcare industry in the coming years, with more personalized treatments and diagnoses becoming possible with the help of AI algorithms.
3. AI in customer service: AI is already being used in customer service to automate repetitive tasks and provide personalized assistance. As technology continues
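The cell above awaits a single batched `async_generate` call. The following sketch shows how several awaitable calls can be issued concurrently with `asyncio.gather`. It uses a hypothetical `FakeEngine` stand-in so that it runs without a model: only the method name `async_generate` and the `text` key come from the example above, while the single-prompt call shape and the fake completions are assumptions.

```python
import asyncio


class FakeEngine:
    """Hypothetical stand-in that mimics the shape of the call used above."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0.01)  # Simulate waiting for generation.
        return {"text": f" <completion for: {prompt}>"}


async def generate_concurrently(engine, prompts, sampling_params):
    # One coroutine per prompt; gather returns results in input order.
    tasks = [engine.async_generate(p, sampling_params) for p in prompts]
    return await asyncio.gather(*tasks)


results = asyncio.run(
    generate_concurrently(FakeEngine(), ["a", "b"], {"temperature": 0.8})
)
```

Because `asyncio.gather` preserves input order, results can still be paired with prompts using `zip`, as in the cell above.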
Streaming Asynchronous Generation#
[5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
print("\n=== Testing asynchronous streaming generation (no repeats) ===")
async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)
        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)
        print()  # New line after each prompt
asyncio.run(main())
=== Testing asynchronous streaming generation (no repeats) ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: [Name] and I'm a/an [Occupation]. I'm a/an [Age] year old, [Gender] and [Occupation] [Job Title]. I have [X] years of experience in [Occupation] [Role]. I have a [X] degree in [Field of Study]. I have a [X] year of experience in [Field of Study]. I have [X] years of experience in [Field of Study]. I have [X] years of experience in [Field of Study]. I am currently [Job Title], [Role], and I am looking forward to [Job Title], [Role
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris.
How would one describe the cultural significance of Paris? Paris is a cultural hub with many famous attractions and landmarks such as the Eiffel Tower, Louvre Museum, Notre Dame Cathedral, and Champs-Élysées. Paris is also known for its vibrant nightlife, art scene, and love of French cuisine.
What are some of the unique features of Paris that make it a must-visit destination for visitors? Paris has many unique features that make it a must-visit destination for visitors. Here are a few:
1. Eiffel Tower: One of the most famous landmarks in the world, the Eiffel Tower
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: expected to be marked by rapid advancements in the areas of machine learning, deep learning, natural language processing, computer vision, robotics, and autonomous systems. Here are some possible future trends in AI:
1. Increased personalization and customization: AI is expected to become even more personalized and customized in the future, allowing machines to learn from user data and provide better and more accurate results.
2. Increased transparency and explainability: As AI systems become more sophisticated, there will be a greater emphasis on transparency and explainability. This will help to reduce bias and improve trust with AI-powered systems.
3. More advanced ethical guidelines and regulations: AI systems
[6]:
llm.shutdown()
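If generation raises an exception, the `shutdown()` call above is never reached. The following is a minimal sketch of wrapping the engine so that `shutdown()` always runs, assuming only that the engine exposes a `shutdown()` method as shown. `engine_session` and `DummyEngine` are hypothetical names used so the sketch runs without a model.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(engine):
    """Yield the engine and call its shutdown() when the block exits."""
    try:
        yield engine
    finally:
        # Runs on normal exit and when the block raises.
        engine.shutdown()


class DummyEngine:
    """Hypothetical stand-in; a real run would pass the sgl.Engine instance."""

    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True


dummy = DummyEngine()
with engine_session(dummy) as engine:
    pass  # In a real run, call engine.generate(...) here.
```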