Offline Engine API#
SGLang provides a direct inference engine that does not require an HTTP server, which suits use cases where an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:
Offline Batch Inference
Custom Server on Top of the Engine
This document focuses on offline batch inference and demonstrates four inference modes:
Non-streaming synchronous generation
Streaming synchronous generation
Non-streaming asynchronous generation
Streaming asynchronous generation
You can also build a custom server on top of the SGLang offline engine. A detailed example in the form of a Python script can be found in custom_server.
Nest Asyncio#
Note that if you want to use the offline engine in IPython or in other code that already runs an event loop, you need to add the following code:
import nest_asyncio
nest_asyncio.apply()
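The patch is needed because `asyncio.run()` raises a `RuntimeError` when an event loop is already running in the current thread, which is the situation inside IPython and Jupyter. The standard-library sketch below shows one way to detect that situation. The helper name `in_running_event_loop` is made up for illustration and is not part of SGLang or `nest_asyncio`.

```python
import asyncio


def in_running_event_loop() -> bool:
    """Return True when the caller is already being driven by an event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises when no loop is running in this thread.
        return False
    return True


async def probe() -> bool:
    # Inside a coroutine a loop is always running, similar to a notebook cell.
    return in_running_event_loop()


outside = in_running_event_loop()  # plain script: no loop is running yet
inside = asyncio.run(probe())      # called from within a running loop
print(outside, inside)
```

In a plain Python script the first check reports that no loop is running, so the `nest_asyncio` patch is only relevant in the nested case.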
Advanced Usage#
The engine supports VLM (vision-language model) inference as well as extracting hidden states.
Please see the examples for further use cases.
Offline Batch Inference#
The SGLang offline engine supports batch inference with efficient scheduling.
# launch the offline engine
import asyncio
import io
import os
from PIL import Image
import requests
import sglang as sgl
from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge
if is_in_ci():
    import patch
else:
    import nest_asyncio
    nest_asyncio.apply()
llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 6.54it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 6.53it/s]
Non-streaming Synchronous Generation#
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: Hello, my name is
Generated text: Dina and I'm a 17-year-old designer living in Saskatoon. I've been designing clothing and accessories for 10 years and I've worked for companies like Smith & Wesson and Chili's. I'm passionate about fashion, and my interests have led me to pursue a career in fashion design.
I'm still working on my resume, but my LinkedIn profile shows I have experience working at both Nordstrom and P&G. I like to share my designs and thoughts on fashion and lifestyle topics, as well as work with other designers and brands.
I'm looking for a new job or internship at the moment. I
===============================
Prompt: The president of the United States is
Generated text: visiting a small country that is not a member of the United States. As the president, you are planning to attend a special event. To ensure the event is memorable, you decide to wear an outfit that is both unique and practical for the climate in that country. The country has a temperature range of -5°C to 25°C, and the dress code is to be a simple and practical outfit that can be easily dressed up or down. You have a choice of two different styles to choose from: a lightweight jacket and shorts or a dress and top.
Given this scenario, can you provide a combination of outfits that you think
===============================
Prompt: The capital of France is
Generated text: :
A. Paris
B. London
C. Tokyo
D. Madrid
A. Paris. France's capital is Paris. It is the seat of government, the cultural and political center of the country, and a major financial hub. Other cities such as London, Tokyo, and Madrid are also important cities in France, but Paris is the most famous and iconic city in France. Madrid is the capital of Spain, not France. The other cities are not capitals of France.Human Resources: It is a human resource management tool that is used to create a training programme to improve the performance of the individuals.
Job Analysis: It
===============================
Prompt: The future of AI is
Generated text: bright, but it is also risky
In my view, the future of artificial intelligence (AI) is bright, but it is also risky. We're at a crossroads. Where should we go next? Will it be to build a different, artificial world, or will it be a more human one?
I've been studying the future of AI for years. The field's current models are too simple to be helpful. They don't understand what makes a good Turing test, and they're not prepared to deal with the complexity of human decision making.
AI is here to stay, but it's evolving much too quickly. In the next
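For batch jobs it is often useful to store each completion next to its prompt. The sketch below pairs the two and serializes them as JSON Lines. It assumes only what the loop above relies on: one output per prompt, in input order, each a dict with a `"text"` key. The helper `collect_results` and the sample data are hypothetical, and no engine is started.

```python
import json


def collect_results(prompts, outputs):
    """Pair each prompt with its generated text, preserving input order."""
    if len(prompts) != len(outputs):
        raise ValueError("expected one output per prompt")
    return [
        {"prompt": prompt, "completion": output["text"]}
        for prompt, output in zip(prompts, outputs)
    ]


# Stand-in for the list returned by llm.generate(prompts, sampling_params).
fake_prompts = ["The capital of France is", "The future of AI is"]
fake_outputs = [{"text": " Paris."}, {"text": " bright."}]

records = collect_results(fake_prompts, fake_outputs)
jsonl = "\n".join(json.dumps(record) for record in records)
print(jsonl)
```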
Streaming Synchronous Generation#
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}
print("\n=== Testing synchronous streaming generation with overlap removal ===\n")
for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
=== Testing synchronous streaming generation with overlap removal ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: [Name], and I'm a [job title] at [company name]. I'm excited to meet you and learn more about your career and interests. Let's chat! [Name] [Job Title] [Company Name] [Company Address] [Company Phone Number] [Company Email] [Company Website] [Name] [Job Title] [Company Name] [Company Address] [Company Phone Number] [Company Email] [Company Website] [Name] [Job Title] [Company Name] [Company Address] [Company Phone Number] [Company Email] [Company Website] [Name] [Job Title] [Company
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris, known for its iconic landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. It is also a major cultural and economic center, hosting numerous museums, theaters, and other attractions. Paris is a popular tourist destination and a major hub for international business and diplomacy. The city is known for its rich history, art, and cuisine, and is a UNESCO World Heritage site. It is also home to the French Parliament, the French Academy of Sciences, and the French National Library. Paris is a vibrant and dynamic city with a rich cultural and historical heritage. The city is also known for its fashion industry
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: likely to be characterized by a number of trends that are expected to shape the technology's direction and impact on society. Here are some of the most likely trends that could be expected in the future:
1. Increased automation and robotics: As AI technology continues to advance, we are likely to see an increase in automation and robotics in various industries. This could lead to the creation of new jobs, but it could also lead to the displacement of human workers in certain areas.
2. AI ethics and privacy concerns: As AI technology becomes more advanced, there will be increasing concerns about its ethical implications and potential privacy violations. This could lead to the development
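To illustrate what "overlap removal" can mean when merging streamed chunks, here is a minimal sketch. It is not presented as the implementation behind `stream_and_merge`. The function names and the sample chunks are hypothetical, and it assumes that a new chunk may begin with text that already ends the accumulated output.

```python
def strip_repeated_prefix(existing: str, chunk: str) -> str:
    """Drop the longest prefix of `chunk` that already ends `existing`."""
    limit = min(len(existing), len(chunk))
    # Try the longest candidate overlap first, then shrink.
    for size in range(limit, 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


def merge_chunks(chunks) -> str:
    """Concatenate streamed chunks, skipping text repeated across boundaries."""
    merged = ""
    for chunk in chunks:
        merged += strip_repeated_prefix(merged, chunk)
    return merged


streamed = ["Paris is", " is the capital", "capital of France."]
print(merge_chunks(streamed))
```

A character-level heuristic like this would also collapse genuine repetition that happens to straddle a chunk boundary, so it fits only streams where overlap is known to be an artifact.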
Non-streaming Asynchronous Generation#
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
print("\n=== Testing asynchronous batch generation ===")
async def main():
    outputs = await llm.async_generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")
asyncio.run(main())
=== Testing asynchronous batch generation ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: [Name] and I'm a [Age] year old [Nationality] [Occupation]. I am a/an [Skill/Ability] expert. I'm a creative, analytical thinker with a strong sense of humor. My favorite hobby is [Favorite Activity] and I enjoy [Best Friend's Name] the most. I love [What other hobby you have]. I'm a/an [Type of Person] who is always looking for new challenges and experiences. I am an [Company's] valued member and I enjoy [Job Title] at [Company's] [Position]. I am a/an [Person's] friend. I
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris, located on the Seine River in the south of the country. The city is famous for its art, cuisine, and culture, including the iconic Eiffel Tower. It is home to many prestigious institutions and landmarks, including the Louvre Museum, Notre-Dame Cathedral, and the Palace of Versailles. Paris is a bustling metropolis with a diverse population and vibrant culture, which draws tourists from all over the world each year. It is also known for its iconic fashion and entertainment scenes, such as the fashion week and the famous World Cup football tournament. Overall, Paris is a fascinating and unforgettable destination for visitors.
Sum
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: likely to be shaped by several trends, including:
1. Increased accuracy and efficiency: As AI technology improves and becomes more sophisticated, it is likely to become even more accurate and efficient in its tasks. This will make it easier for businesses and organizations to use AI to automate processes and improve efficiency.
2. Integration of machine learning and natural language processing: As AI continues to grow, it is likely to become more integrated with machine learning and natural language processing technologies. This will allow AI to learn and adapt to new situations more quickly and effectively.
3. Greater emphasis on ethical considerations: As AI technology continues to advance, it is likely to become
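Because the asynchronous call is awaitable, it can be combined with other coroutines using standard `asyncio` tools. The sketch below awaits two batches concurrently with `asyncio.gather`. `fake_async_generate` is a hypothetical stand-in that only mimics the return shape used above (a list of dicts with a `"text"` key), so it says nothing about how the real engine schedules concurrent calls.

```python
import asyncio


async def fake_async_generate(prompts, sampling_params):
    """Hypothetical stand-in for llm.async_generate: one dict per prompt."""
    await asyncio.sleep(0.01)  # simulate time spent waiting on the engine
    return [{"text": f" <completion for: {p}>"} for p in prompts]


async def run_batches(batches, sampling_params):
    """Await several batches concurrently; results keep the order of `batches`."""
    tasks = [fake_async_generate(batch, sampling_params) for batch in batches]
    return await asyncio.gather(*tasks)


batches = [
    ["Hello, my name is"],
    ["The capital of France is", "The future of AI is"],
]
results = asyncio.run(run_batches(batches, {"temperature": 0.8, "top_p": 0.95}))
for batch, outputs in zip(batches, results):
    for prompt, output in zip(batch, outputs):
        print(f"{prompt!r} ->{output['text']}")
```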
Streaming Asynchronous Generation#
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
print("\n=== Testing asynchronous streaming generation (no repeats) ===")
async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)
        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)
        print()  # New line after each prompt
asyncio.run(main())
=== Testing asynchronous streaming generation (no repeats) ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: [insert name] and I'm a [insert occupation]. I'm an [insert role or profession], and I enjoy [insert passion or hobby]. I like to [insert an activity or hobby], and I [insert a personality trait or characteristic]. I'm confident in my abilities and always ready to help others. I thrive on learning and adapting to new experiences, and I enjoy [insert a skill or talent]. I'm a [insert age range or general age], and I'm always eager to learn and grow. If you have any questions or need assistance, please don't hesitate to reach out. How can I assist you today?
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris.
Paris is the cultural, economic, and political center of France and has been the capital of France since 1871. It is also the second largest city in France with an estimated population of over 2 million people. The city is known for its historical landmarks, beautiful architecture, and its role in French culture and politics. Paris is home to many notable museums, art galleries, and cultural institutions, and is also a popular tourist destination. Despite its size, Paris remains one of the most important cities in the world, and its location at the crossroads of Europe and the Mediterranean has made it a vital hub for commerce
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: set to continue to grow and evolve rapidly as new technologies emerge and meet new challenges. Here are some potential future trends in AI:
1. Automation: One of the most significant trends in AI is the increasing automation of tasks, particularly in manufacturing, transportation, and retail. AI-driven robots and automation systems will become more prevalent, reducing the need for human intervention in these industries and freeing up time for human workers to focus on higher-level tasks.
2. AI ethics: AI will continue to gain acceptance in the global marketplace, but it will also face ethical challenges. As AI systems become more sophisticated and integrated into our daily lives, there will be
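The `async for` pattern above works with any asynchronous generator. The following sketch consumes a stream piece by piece and also keeps the full text for later use. `fake_stream` is a hypothetical stand-in for `async_stream_and_merge` that yields fixed pieces, and it ignores the prompt it receives.

```python
import asyncio


async def fake_stream(prompt):
    """Hypothetical stand-in for async_stream_and_merge: yields text pieces."""
    for piece in [" Paris", " is", " the", " capital."]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield piece


async def collect(prompt):
    """Consume the stream chunk by chunk and return the joined text."""
    pieces = []
    async for piece in fake_stream(prompt):
        # A real caller could print or forward each piece here as it arrives.
        pieces.append(piece)
    return "".join(pieces)


full_text = asyncio.run(collect("The capital of France is"))
print(full_text)
```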
llm.shutdown()
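In a script, an exception raised before the `shutdown()` call would skip it. One way to guard against that is a small context manager, sketched below with a hypothetical `FakeEngine` that exposes only a `shutdown()` method. The `managed` helper is an illustration, not an SGLang API.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in exposing only the shutdown() method used above."""

    def __init__(self):
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def managed(engine):
    """Yield the engine and always call shutdown(), even if the body raises."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = FakeEngine()
try:
    with managed(engine) as llm:
        raise RuntimeError("simulated failure during generation")
except RuntimeError:
    pass

print(engine.is_shut_down)
```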