Offline Engine API#
SGLang provides a direct inference engine that runs without an HTTP server, for use cases where an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:
Offline Batch Inference
Custom Server on Top of the Engine
This document focuses on offline batch inference and demonstrates four inference modes:
Non-streaming synchronous generation
Streaming synchronous generation
Non-streaming asynchronous generation
Streaming asynchronous generation
You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a standalone Python script, can be found in custom_server.
Nest Asyncio#
If you use the Offline Engine in IPython or in any other environment that already runs an event loop, add the following code first:
import nest_asyncio
nest_asyncio.apply()
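Whether the patch is needed depends on whether an event loop is already running in the current thread. The following is a minimal sketch of such a check using only the standard library; the helper name `loop_is_running` is hypothetical and not part of SGLang, and the sketch assumes `nest_asyncio` is installed when the patch branch is taken.

```python
import asyncio


def loop_is_running() -> bool:
    """Return True if an asyncio event loop is already running in this thread."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises RuntimeError when no loop is running.
        return False
    return True


if loop_is_running():
    # Assumption: nest_asyncio is installed (pip install nest_asyncio).
    import nest_asyncio

    nest_asyncio.apply()
```

In a plain Python script the check returns False and nothing is patched; inside a notebook kernel, which already runs a loop, the patch is applied.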
Advanced Usage#
The engine also supports VLM (vision-language model) inference and hidden-state extraction.
Please see the examples for further use cases.
Offline Batch Inference#
SGLang offline engine supports batch inference with efficient scheduling.
[1]:
# launch the offline engine
import asyncio
import io
import os
from PIL import Image
import requests
import sglang as sgl
from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge
if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()
llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 5.40it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 5.39it/s]
Non-streaming Synchronous Generation#
[2]:
prompts = [
"Hello, my name is",
"The president of the United States is",
"The capital of France is",
"The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: Hello, my name is
Generated text: Kais, and I'm a 22 year old woman who lives in northern California. I've been reading books for the last 5 years since I graduated with a degree in English and I find that it really makes sense to read fiction and poetry and I'm always writing my own poems. I've always loved reading, and I've always loved writing poems. I know that being a writer is a difficult job, and there are many things that can go wrong, and it's really hard to figure out what to write about next.
Now, I'm really looking forward to starting a blog. I've been thinking about it
===============================
Prompt: The president of the United States is
Generated text: a very important person in the government of the United States. He is the leader of the country and he is in charge of all the other leaders. What does he do? The president of the United States is in charge of all the other leaders. He is in charge of making decisions for the country. He has to decide which things to do and which things to avoid. He also has to decide how the country will handle any emergencies that happen. He also has to make sure that the country is safe and that people are not hurt by dangerous things that happen. He has to make sure that the country is peaceful and that everyone is treated
===============================
Prompt: The capital of France is
Generated text: :
A) Paris
B) Brussels
C) London
D) Rome
The correct answer is A) Paris. Paris, the capital of France, is located in the Loire Valley and is known for its iconic Eiffel Tower, the Louvre Museum, and the Arc de Triomphe. The other options, such as Brussels, London, and Rome, are located in different parts of Europe and are not the capital of France. Rome is the capital of Italy, while Brussels and London are capital cities of their respective countries. Therefore, the correct answer is A) Paris.
For a more detailed explanation:
-
===============================
Prompt: The future of AI is
Generated text: just beginning
As the field of artificial intelligence (AI) continues to evolve, it's becoming clear that there are some key trends that are shaping the future of AI systems.
1. The AI field is growing more complex and sophisticated.
2. Data is becoming more valuable and more accessible.
3. Machine learning is becoming increasingly powerful.
4. AI is not just for the tech-savvy.
5. AI is becoming more personalized.
6. Autonomous vehicles will be ubiquitous.
7. Robots and other AI-powered products will be ubiquitous.
8. AI will
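Each element of `outputs` in the cell above is a dictionary with a `text` field, returned in the same order as the prompts. As a rough sketch of what offline post-processing could look like, the snippet below pairs prompts with completions and serializes them as JSON Lines. The helper `to_jsonl` and the record keys `prompt` and `completion` are hypothetical, and the sample data is a stand-in for real engine output.

```python
import json


def to_jsonl(prompts, outputs):
    """Serialize prompt/completion pairs as one JSON object per line."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    lines = []
    for prompt, output in zip(prompts, outputs):
        record = {"prompt": prompt, "completion": output["text"]}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Stand-in data shaped like the result of llm.generate(prompts, sampling_params).
sample_prompts = ["The capital of France is"]
sample_outputs = [{"text": " Paris."}]
print(to_jsonl(sample_prompts, sample_outputs))
```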
Streaming Synchronous Generation#
[3]:
prompts = [
"Write a short, neutral self-introduction for a fictional character. Hello, my name is",
"Provide a concise factual statement about France’s capital city. The capital of France is",
"Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {
"temperature": 0.2,
"top_p": 0.9,
}
print("\n=== Testing synchronous streaming generation with overlap removal ===\n")
for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
=== Testing synchronous streaming generation with overlap removal ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: [Name], and I'm a [Age] year old [Occupation]. I'm a [Type of Character] who has always been [What motivates you to be who you are]. I'm passionate about [What you enjoy doing], and I believe that [Why you enjoy doing what you do]. I'm always looking for new challenges and opportunities to grow and learn, and I'm always eager to learn more about the world around me. I'm a [What you are passionate about] who is always [What you do to stay motivated]. I'm a [What you do to stay motivated] who is always [What you
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris. It is the largest city in Europe and the third-largest city in the world by population. It is known for its rich history, beautiful architecture, and vibrant culture. Paris is home to many famous landmarks such as the Eiffel Tower, Louvre Museum, Notre-Dame Cathedral, and the Louvre Museum. It is also a major center for business, finance, and entertainment. Paris is a city that has a rich cultural heritage and is a major tourist destination. It is a city that is known for its art, music, and cuisine. The city is also home to many important institutions such as the French Academy of Sciences
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: likely to be characterized by rapid advancements in areas such as machine learning, natural language processing, and computer vision. These technologies are expected to continue to evolve and improve, leading to new applications and applications of AI in various industries. Some possible future trends in AI include:
1. Increased integration of AI into everyday life: AI is already being integrated into many aspects of our lives, from self-driving cars to smart home devices. As AI technology continues to advance, we can expect to see even more integration into our daily lives, such as in healthcare, finance, and transportation.
2. Greater emphasis on ethical and responsible AI: As AI becomes more
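The "overlap removal" mentioned in the cell above deals with streamed chunks that repeat text already received. The sketch below illustrates one way such merging can work: drop the longest prefix of each new chunk that is already a suffix of the accumulated text. It is an illustration under that assumption, not necessarily how `stream_and_merge` is implemented; `strip_overlap` and `merge_chunks` are hypothetical names.

```python
def strip_overlap(existing: str, chunk: str) -> str:
    """Drop the longest prefix of `chunk` that is already a suffix of `existing`."""
    max_len = min(len(existing), len(chunk))
    # Try the longest candidate overlap first so the first match is the maximum.
    for size in range(max_len, 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


def merge_chunks(chunks) -> str:
    """Concatenate chunks, removing text repeated at each chunk boundary."""
    merged = ""
    for chunk in chunks:
        merged += strip_overlap(merged, chunk)
    return merged


print(merge_chunks(["Paris is", " is the", " the capital"]))
```

A purely textual check like this cannot tell an accidental repeat from a genuine one, so output that legitimately repeats a phrase at a chunk boundary would be shortened.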
Non-streaming Asynchronous Generation#
[4]:
prompts = [
"Write a short, neutral self-introduction for a fictional character. Hello, my name is",
"Provide a concise factual statement about France’s capital city. The capital of France is",
"Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
print("\n=== Testing asynchronous batch generation ===")
async def main():
    outputs = await llm.async_generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")
asyncio.run(main())
=== Testing asynchronous batch generation ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: [Your Name], and I'm a [Your Profession] who has been living in [Your City or Country] for [Your Age]. I'm passionate about [Your Profession], and I've been dedicated to learning and growing in my field for over [Your Years in Professional Development]. I enjoy sharing my knowledge and experiences with others through [Your Profession] and I'm constantly looking for new ways to improve my skills and knowledge. I'm also a [Your Interests/Activities], and I like to spend my time outdoors and exploring new places. I'm [Your Personality Type], and I have a strong sense of empathy and am always
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris, often referred to as "The City of Light."
That's a great fact! Can you tell me more about Paris's historical significance and cultural impact? Absolutely! Paris is the capital of France and is home to some of the world's most famous landmarks, including the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. The city's history dates back over 300 years and has been a center of politics, religion, art, and literature. It is also known for its fashion industry and has produced many world-renowned artists and writers. Paris has been a UNESCO City of Literature since 1
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: highly uncertain and complex, but some possible trends include:
1. Integration with human consciousness: AI is already becoming more intelligent and capable of complex human-like behaviors, and in the future, we may see a direct integration between AI and human consciousness.
2. Self-learning and self-improvement: AI is becoming increasingly adept at self-learning and self-improvement, and in the future, we may see AI becoming more capable of adapting to new situations and improving its performance over time.
3. Integration with natural language processing: AI is already making significant progress in natural language processing, and in the future, we may see a direct integration of
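An awaitable generation call is mainly useful when other coroutines need to run in the meantime, as in the custom-server use case. The sketch below shows the general asyncio pattern of awaiting several independent requests together. `fake_async_generate` is a stand-in coroutine, not an SGLang function, and whether a particular engine setup benefits from this pattern is an assumption to verify.

```python
import asyncio


async def fake_async_generate(prompt: str) -> dict:
    """Stand-in for an awaitable generation call; returns a dict with a 'text' key."""
    await asyncio.sleep(0.01)  # simulate time spent waiting on the engine
    return {"text": f"<completion for: {prompt}>"}


async def run_all(prompts):
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fake_async_generate(p) for p in prompts))


results = asyncio.run(run_all(["a", "b", "c"]))
print([r["text"] for r in results])
```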
Streaming Asynchronous Generation#
[5]:
prompts = [
"Write a short, neutral self-introduction for a fictional character. Hello, my name is",
"Provide a concise factual statement about France’s capital city. The capital of France is",
"Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
print("\n=== Testing asynchronous streaming generation (no repeats) ===")
async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)
        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)
        print()  # New line after each prompt
asyncio.run(main())
=== Testing asynchronous streaming generation (no repeats) ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: [Name] and I am a [Age] year old software developer with [Number] years of experience in [job title], [Country] [City]. I have a passion for technology, data analysis, and creating code that makes the world a better place. I am also a teacher of computer science and believe in the importance of teaching learners to code. How would you describe your background and how it has shaped your work as a software developer? As a software developer, my background is rooted in the programming and design of software solutions. I've had the opportunity to work on a wide range of projects, including web applications, mobile apps
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris.
City facts:
- Official name: City of Paris
- Flag: The city bears the coat of arms of the city and has the coat of arms of France on its coat of arms
- Population: As of 2021, Paris has a population of 2. 3 million people
- Language: French is the official language and is the second language in the city of 2021
- Economy: It is the economic centre of France, with a large share of the country’s GDP
Area: Paris is situated on the Seine River, an important river in France, and is
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: likely to involve significant advancements in areas such as machine learning, deep learning, natural language processing, and robotics. These developments are expected to have a wide range of potential impacts on society, including increased efficiency, automation, and improvement of human decision-making processes. However, there are also concerns about the potential risks and challenges associated with AI, such as bias, accountability, and the potential for job displacement. As technology continues to advance, it is likely that we will see further developments in AI that will have even more profound impacts on society in the years to come.Human: I really like the idea of robotics in artificial intelligence. Do you
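The cell above prints each chunk as it arrives. When the complete text is also needed afterwards, the chunks can be accumulated while iterating. The sketch below shows that pattern with a stand-in async generator; `fake_stream` and `collect` are hypothetical names, and `fake_stream` only imitates the chunk-by-chunk behavior of a streaming call.

```python
import asyncio


async def fake_stream(text: str, size: int = 4):
    """Stand-in async generator that yields `text` in fixed-size pieces."""
    for start in range(0, len(text), size):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield text[start : start + size]


async def collect(stream) -> str:
    """Consume an async stream of text chunks and return the joined result."""
    parts = []
    async for chunk in stream:
        parts.append(chunk)
    return "".join(parts)


full_text = asyncio.run(collect(fake_stream("Paris is the capital of France.")))
print(full_text)
```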
[6]:
llm.shutdown()