Offline Engine API
SGLang provides a direct inference engine that runs without an HTTP server. This is useful when an extra HTTP server would add unnecessary complexity or overhead. There are two general use cases:
Offline Batch Inference
Custom Server on Top of the Engine
This document focuses on offline batch inference and demonstrates four inference modes:
Non-streaming synchronous generation
Streaming synchronous generation
Non-streaming asynchronous generation
Streaming asynchronous generation
You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a standalone Python script, can be found in custom_server.
Nest Asyncio
If you use the offline engine in IPython, Jupyter, or any other code that already runs inside an event loop, add the following code first:
import nest_asyncio
nest_asyncio.apply()
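Why this step is needed: IPython and Jupyter already run an asyncio event loop, and the async examples on this page call `asyncio.run()`. The standard-library sketch below (no SGLang required) reproduces the error you would otherwise hit. The coroutine names `inner` and `outer` are illustrative only.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # While `outer` runs, an event loop is already active, as in a notebook cell.
    coro = inner()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never started; close it to avoid a warning
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)  # prints: asyncio.run() cannot be called from a running event loop
```

`nest_asyncio.apply()` patches asyncio so that this kind of nested call is allowed, which is why the notebook applies it before using the engine.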
Advanced Usage
The engine also supports vision-language model (VLM) inference and extracting hidden states.
Please see the examples for further use cases.
Offline Batch Inference
The SGLang offline engine supports batch inference with efficient scheduling.
# Launch the offline engine
import asyncio
import sglang as sgl
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge
if is_in_ci():
    # In CI, import the `patch` module instead of nest_asyncio
    import patch
else:
    # Allow asyncio.run() inside the notebook's already-running event loop
    import nest_asyncio
    nest_asyncio.apply()
llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 5.04it/s]
Non-streaming Synchronous Generation
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: Hello, my name is
Generated text: Robert. My name is Robert. I am a middle school student. I like to have a good time. I like to have a good time. I like to go to the movies and listen to music. I like to watch cartoons and play computer games. I like to eat healthy food and get enough exercise. I don't like to smoke or drink. But I like to eat out. My favorite food is fish and chips. I like to do my homework and play sports with my friends. I like to eat ice cream. I like to watch the TV. I like to play with my friends. I like to play the guitar
===============================
Prompt: The president of the United States is
Generated text: 32 years younger than his daughter. The president is 43 years old. If the president decides to donate 5% of his wealth to charity, how old will his daughter be when the president's gift is completed? Let's denote the age of the president's daughter as \( x \). According to the problem, the president is 43 years old and is 32 years younger than his daughter, so we can write the equation:
\[ x = 43 + 32 \]
\[ x = 75 \]
So, the daughter is 75 years old.
Next, we need
===============================
Prompt: The capital of France is
Generated text: located in ____
A. North of the Seine
B. North of Paris
C. North of the Alps
D. North of the Rhine
Answer:
C
In the field of public relations, the principle of 'media first' is to convey the most important information through the most appropriate means, which can be considered the core of the principle of 'media first'.
A. Correct
B. Incorrect
Answer:
A
Which of the following is NOT a characteristic of the network economy? ____
A. The market is open and dynamic
B. The value and benefits of information flow freely
C. The
===============================
Prompt: The future of AI is
Generated text: here: a world without the need for human intervention
The future of AI is here: a world without the need for human intervention
While many of us are acutely aware of the remarkable and transformative impact of artificial intelligence on our lives, some of us may not realize that the development of AI has the potential to revolutionize the world.
AI is a powerful tool that can help us solve complex problems more efficiently and effectively. It can assist us in the areas of healthcare, finance, and education by automating many of our tasks. However, the development and deployment of AI also present some challenges that must be carefully managed. This includes ensuring
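In the example above, `llm.generate` takes the whole prompt list and returns one output dict per prompt, each with a `'text'` key, which is why the outputs can be zipped back to the prompts. For a very large prompt list you may want to submit it in slices, for example to checkpoint results as you go. The helper below is a hypothetical sketch, not part of SGLang: `generate_in_chunks` and `chunk_size` are illustrative names, and `EchoEngine` is a stand-in that mimics only the `generate(prompts, sampling_params)` call shown above, so the example runs without a GPU.

```python
from typing import Any, Dict, List


class EchoEngine:
    """Stand-in for sgl.Engine: returns one {'text': ...} dict per prompt, in order."""

    def generate(self, prompts: List[str], sampling_params: Dict[str, Any]) -> List[Dict[str, str]]:
        return [{"text": f"<completion of {p!r}>"} for p in prompts]


def generate_in_chunks(
    engine, prompts: List[str], sampling_params: Dict[str, Any], chunk_size: int = 64
) -> List[Dict[str, str]]:
    """Hypothetical helper: call engine.generate on fixed-size slices of `prompts`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    outputs: List[Dict[str, str]] = []
    for start in range(0, len(prompts), chunk_size):
        chunk = prompts[start : start + chunk_size]
        # Each call returns outputs in the same order as `chunk`.
        outputs.extend(engine.generate(chunk, sampling_params))
    return outputs


demo_prompts = [f"Prompt {i}" for i in range(5)]
results = generate_in_chunks(EchoEngine(), demo_prompts, {"temperature": 0.8, "top_p": 0.95}, chunk_size=2)
for prompt, output in zip(demo_prompts, results):
    print(prompt, "->", output["text"])
```

With the real engine you would pass `llm` and your own `prompts` and `sampling_params`. Smaller slices give more frequent checkpoints but give the engine's scheduler fewer requests to batch at once.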
Streaming Synchronous Generation
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}
print("\n=== Testing synchronous streaming generation with overlap removal ===\n")
for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
=== Testing synchronous streaming generation with overlap removal ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: [Name] and I am a [Age] year old [Occupation]. I am a [Type of Character] who has [Number of Years of Experience] years of experience in [Field of Interest]. I am passionate about [What I Love to Do]. I am [What I Like to Do]. I am [What I Like to Do]. I am [What I Like to Do]. I am [What I Like to Do]. I am [What I Like to Do]. I am [What I Like to Do]. I am [What I Like to Do]. I am [What I Like to Do]. I am [
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris, the city known for its iconic Eiffel Tower and the Louvre Museum. It is also the seat of the French government and the country's cultural and political center. Paris is a bustling metropolis with a rich history dating back to the Roman Empire and the Middle Ages, and is known for its diverse culture, cuisine, and fashion. It is also home to many famous landmarks such as the Notre-Dame Cathedral and the Arc de Triomphe. Paris is a popular tourist destination and a major economic hub, with a thriving economy and a diverse population. The city is known for its fashion industry, art scene, and food
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: likely to be characterized by a number of trends that are expected to shape the technology's direction and impact on society. Here are some of the most likely trends:
1. Increased integration with human intelligence: As AI becomes more advanced, it is likely to become more integrated with human intelligence. This could lead to more sophisticated forms of AI that can perform tasks that are currently only possible with human intelligence, such as creative problem-solving and emotional intelligence.
2. Greater emphasis on ethical considerations: As AI becomes more advanced, there will be a greater emphasis on ethical considerations. This could lead to more stringent regulations and guidelines for AI development and deployment, as
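`stream_and_merge` comes from `sglang.utils`. The "overlap removal" idea can be sketched as follows, assuming a streamed chunk may repeat the tail of the text already received: find the longest suffix of the accumulated text that is also a prefix of the new chunk, and append only the remainder. This is an illustrative re-implementation, not necessarily the exact code in `sglang.utils`. `trim_overlap` and `merge_stream` are names chosen here, and `fake_stream` stands in for the `{'text': ...}` dicts a streaming call would yield.

```python
from typing import Iterable


def trim_overlap(existing_text: str, new_chunk: str) -> str:
    """Drop the longest prefix of `new_chunk` that is already a suffix of `existing_text`."""
    for size in range(min(len(existing_text), len(new_chunk)), 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


def merge_stream(chunks: Iterable[dict]) -> str:
    """Accumulate streamed {'text': ...} chunks, appending only the non-overlapping part."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap(merged, chunk["text"])
    return merged


# Simulated stream in which each chunk repeats the end of the previous text.
fake_stream = [{"text": "Paris is"}, {"text": " is the capital"}, {"text": "capital of France."}]
print(merge_stream(fake_stream))  # prints: Paris is the capital of France.
```

One caveat of this heuristic: if the model legitimately repeats text across a chunk boundary, that repetition is trimmed as well.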
Non-streaming Asynchronous Generation
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
print("\n=== Testing asynchronous batch generation ===")
async def main():
    outputs = await llm.async_generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")
asyncio.run(main())
=== Testing asynchronous batch generation ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: [Name], and I'm a [job title] with [number of years of experience] years of experience in [what you do or what you specialize in]. I'm a [type of coffee] lover who enjoys brewing and enjoying the aroma of freshly brewed coffee. I also have a passion for [food] and enjoy cooking, baking, and experimenting with new recipes. I'm a [color] person who loves to have [number of hobbies or interests] in my free time. I enjoy traveling and trying new places, and I love to write my own stories and share my thoughts and experiences with others. I'm [gender]
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris, also known as the "City of Light," which is a French-speaking city located on the river Seine in the northwestern suburbs of the Paris region. It was founded in the 7th century, initially as a Roman colony but later conquered by the Franks in the 6th century. Paris is known for its rich cultural heritage, iconic landmarks such as Notre-Dame Cathedral, Eiffel Tower, and Louvre Museum, as well as its welcoming and lively atmosphere. It is the largest city in France by population and has a strong economy, particularly in the fashion, music, and food industries. Paris is also
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: likely to be characterized by a combination of improvements in processing power, better accuracy and more complex tasks. Some possible future trends in AI include:
1. Improved accuracy: As AI systems become more complex, they will become more accurate at performing a wide range of tasks. This includes tasks that require higher levels of detail, such as understanding human emotions or recognizing cultural differences.
2. Personalization: AI systems will become increasingly able to learn and adapt to user behavior, resulting in more personalized experiences. This could involve things like personalized recommendations for products or services, or even personalized healthcare recommendations.
3. Integration with other technologies: AI will likely become more
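The example above submits the whole list in a single `await llm.async_generate(prompts, sampling_params)` call. When requests arrive independently, for example inside a custom server, another pattern is to issue one call per prompt and await them together with `asyncio.gather`, which returns results in submission order even if they finish out of order. In the sketch below, `FakeAsyncEngine` is a hypothetical stand-in that assumes a single-prompt `async_generate(prompt, sampling_params)` returning a `{'text': ...}` dict. Whether the real engine batches such concurrent calls as efficiently as one list call is an assumption worth measuring.

```python
import asyncio
from typing import Any, Dict, List


class FakeAsyncEngine:
    """Hypothetical stand-in for an engine with a single-prompt async_generate."""

    async def async_generate(self, prompt: str, sampling_params: Dict[str, Any]) -> Dict[str, str]:
        # Simulate variable latency so requests complete out of order.
        await asyncio.sleep(0.01 * (len(prompt) % 3))
        return {"text": prompt.upper()}


async def generate_concurrently(
    engine, prompts: List[str], sampling_params: Dict[str, Any]
) -> List[Dict[str, str]]:
    # Start every request at once; gather returns results in the order of `prompts`.
    tasks = [engine.async_generate(p, sampling_params) for p in prompts]
    return await asyncio.gather(*tasks)


demo_prompts = ["a", "bb", "ccc"]
demo_outputs = asyncio.run(generate_concurrently(FakeAsyncEngine(), demo_prompts, {"temperature": 0.8}))
for prompt, output in zip(demo_prompts, demo_outputs):
    print(prompt, "->", output["text"])
```

Here `"ccc"` finishes first, yet its result still appears last, matching its position in `demo_prompts`.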
Streaming Asynchronous Generation
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
print("\n=== Testing asynchronous streaming generation (no repeats) ===")
async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)
        # Stream through the overlap-aware helper instead of calling async_generate directly
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)
        print()  # New line after each prompt
asyncio.run(main())
=== Testing asynchronous streaming generation (no repeats) ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: [Your Name], and I'm a [Your Age, Gender, and Occupation]. I've always had a passion for exploring the world and trying new things, and I'm always eager to learn and grow. I'm always looking for new experiences and opportunities to grow and learn, and I'm always willing to put in the time and effort to achieve my goals. I'm confident in my abilities and always aim to become a better version of myself, and I'm excited to start my journey towards self-improvement. Looking forward to meeting you! #selfintroduction #growthhappens #selfimprovement #outreach #
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris, which is known for its iconic Eiffel Tower and iconic landmarks like the Louvre Museum and Notre-Dame Cathedral. It is the largest city in the world by population and the second-largest by area after New York City. Paris is home to over a million people and is a major cultural and tourist hub. The city is known for its romantic and historical architecture, vibrant nightlife, and cuisine. It has been influenced by various cultures and is a melting pot of diverse cultures. The city is also home to world-renowned universities, art galleries, and cultural institutions. Its identity and influence have made Paris a global center for arts,
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: exciting and constantly evolving, and there are many possible trends that could shape how we see the technology in the coming years. Here are some of the most likely future trends in AI:
1. Increased reliance on AI in healthcare: With more and more people suffering from chronic diseases, the demand for advanced AI-powered healthcare solutions is expected to grow. AI will be used to develop and improve diagnostic tools, assist in personalized medicine, and improve treatment outcomes.
2. Autonomous vehicles: As the technology behind self-driving cars becomes more advanced, we may see a shift towards fully autonomous vehicles becoming a reality. This could lead to reduced human drivers and potentially eliminate
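`async_stream_and_merge` is the asynchronous counterpart of `stream_and_merge`: an async generator that yields each cleaned chunk as it arrives, so the caller can print text incrementally. The sketch below shows that shape under stated assumptions: `fake_token_stream` is a hypothetical stand-in for the engine's streaming output (an async iterator of `{'text': ...}` dicts), and the overlap rule is a longest suffix/prefix trim. It is not necessarily the exact implementation in `sglang.utils`.

```python
import asyncio
from typing import AsyncIterator, Dict, List


def trim_overlap(existing_text: str, new_chunk: str) -> str:
    """Drop the longest prefix of `new_chunk` that is already a suffix of `existing_text`."""
    for size in range(min(len(existing_text), len(new_chunk)), 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


async def fake_token_stream(pieces: List[str]) -> AsyncIterator[Dict[str, str]]:
    """Hypothetical stand-in for a streaming generation call."""
    for piece in pieces:
        await asyncio.sleep(0)  # hand control back to the loop between chunks
        yield {"text": piece}


async def merge_async_stream(stream: AsyncIterator[Dict[str, str]]) -> AsyncIterator[str]:
    """Yield only the new, non-overlapping text of each chunk (empty results are skipped)."""
    merged = ""
    async for chunk in stream:
        cleaned = trim_overlap(merged, chunk["text"])
        merged += cleaned
        if cleaned:
            yield cleaned


async def demo() -> List[str]:
    received = []
    async for piece in merge_async_stream(fake_token_stream(["The future", "future of AI", " is bright."])):
        print(piece, end="", flush=True)
        received.append(piece)
    print()
    return received


pieces = asyncio.run(demo())
```

The caller only ever sees new text, which is what lets the notebook cell above print chunks with `end=""` without duplicated words.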
Finally, shut down the engine:
llm.shutdown()
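In a script, an exception raised between creating the engine and calling `llm.shutdown()` would skip the shutdown. One way to guarantee cleanup, sketched here as an assumption rather than an SGLang feature, is a small `contextlib.contextmanager` wrapper. `engine_session` is a hypothetical name, and `DummyEngine` stands in for `sgl.Engine` so the example runs without a model.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Protocol, TypeVar


class SupportsShutdown(Protocol):
    def shutdown(self) -> None: ...


T = TypeVar("T", bound=SupportsShutdown)


@contextmanager
def engine_session(factory: Callable[[], T]) -> Iterator[T]:
    """Hypothetical helper: build an engine and always call its shutdown()."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()  # runs on normal exit and when the body raises


class DummyEngine:
    """Stand-in exposing only the shutdown() method used on this page."""

    def __init__(self) -> None:
        self.closed = False

    def shutdown(self) -> None:
        self.closed = True


with engine_session(DummyEngine) as engine:
    print("engine running:", not engine.closed)
print("engine shut down:", engine.closed)
```

With the real engine this would read `with engine_session(lambda: sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")) as llm: ...`, and `llm.shutdown()` would run even if a generation call inside the block fails.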