Offline Engine API#
SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:
Offline Batch Inference
Custom Server on Top of the Engine
This document focuses on offline batch inference and demonstrates four inference modes:
Non-streaming synchronous generation
Streaming synchronous generation
Non-streaming asynchronous generation
Streaming asynchronous generation
You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in custom_server.
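As a rough illustration of the custom-server idea, the sketch below wraps an engine-like object in a plain request handler. It is not the custom_server example itself: `EchoEngine` is a hypothetical stand-in for `sgl.Engine` so the snippet runs without a model, and `handle_request` and the JSON request shape are assumptions made for illustration. The only thing borrowed from this document is the calling convention, `generate(prompts, sampling_params)` returning one dict with a `text` key per prompt.

```python
import json


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a model."""

    def generate(self, prompts, sampling_params):
        # Mirror the result shape used in this document: one dict per prompt.
        return [{"text": f" <completion of {p!r}>"} for p in prompts]


def handle_request(engine, body: bytes) -> bytes:
    """Decode a JSON request, call the engine, and encode a JSON response."""
    request = json.loads(body)
    prompts = request["prompts"]
    # The "sampling_params" field is optional in this assumed request format.
    sampling_params = request.get("sampling_params", {})
    outputs = engine.generate(prompts, sampling_params)
    response = {"completions": [out["text"] for out in outputs]}
    return json.dumps(response).encode("utf-8")


engine = EchoEngine()
raw = json.dumps(
    {"prompts": ["Hello, my name is"], "sampling_params": {"temperature": 0.8}}
).encode("utf-8")
print(handle_request(engine, raw).decode("utf-8"))
```

Keeping the handler independent of any web framework means the same function could be called from whichever transport a custom server uses.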
Advanced Usage#
The engine also supports VLM (vision-language model) inference and hidden-state extraction.
Please see the examples for further use cases.
Offline Batch Inference#
The SGLang offline engine supports batch inference with efficient scheduling.
[1]:
# launch the offline engine
import asyncio
import io
import os
from PIL import Image
import requests
import sglang as sgl
from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge
if is_in_ci():
    import patch
llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:02<00:06, 2.29s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:05<00:05, 2.61s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:07<00:02, 2.62s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:08<00:00, 1.95s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:08<00:00, 2.17s/it]
Non-streaming Synchronous Generation#
[2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: Hello, my name is
Generated text: Rachel, and I'm a bit of a foodie. I love trying new recipes, experimenting with flavors, and, of course, eating delicious food! When I'm not in the kitchen, you can find me hiking with my dog, practicing yoga, or snuggled up with a good book and a warm cup of tea.
My favorite type of food is anything and everything Italian - pasta, pizza, gelato... you name it, I love it! But, I'm also a fan of trying new cuisines and flavors, and I'm always on the lookout for the next great recipe.
In this blog, I'll be
===============================
Prompt: The president of the United States is
Generated text: on a trip to Vietnam to attend the APEC summit, which is a significant event that brings together leaders from 21 countries in the Asia-Pacific region. The President is expected to meet with various leaders to discuss trade, economic growth, and regional security issues. In the run-up to the summit, the President has made a series of speeches and statements highlighting the importance of the US-Vietnam relationship and the need for the two countries to cooperate more closely on economic and security issues.
However, the President has also faced criticism from some quarters for his policies on trade and immigration, which have been seen as hurtful to the Vietnamese-American community
===============================
Prompt: The capital of France is
Generated text: full of iconic landmarks, charming neighborhoods, and world-class museums. But, as with any large city, there are also areas that are best avoided, especially at night. Here are some tips for staying safe in Paris:
1. Be aware of your surroundings: As with any city, be mindful of your belongings and keep an eye out for pickpocketing or scams.
2. Avoid walking alone in dimly lit or deserted areas: Some areas of Paris can be quite dark and empty, especially at night. Stick to well-lit streets and avoid walking alone in areas that seem deserted.
3. Use licensed taxis or ride-sharing services
===============================
Prompt: The future of AI is
Generated text: shaped by the data it’s trained on. As AI systems become increasingly autonomous, they will rely on high-quality, diverse, and unbiased data to make accurate decisions. But this data is often obtained from flawed sources, including biased language models, algorithms with inherent racial and gender biases, and incomplete datasets. This has led to many AI systems replicating and amplifying existing social biases, rather than mitigating them. To create a more equitable and inclusive future, it’s essential to understand the data’s role in shaping AI decisions and take steps to address these issues. In this article, we will explore the concept of data and its impact on AI
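The `sampling_params` dictionary used in this section sets `temperature` and `top_p`. As a conceptual sketch of what these two settings conventionally mean (temperature scaling followed by nucleus filtering), and not a description of SGLang's actual sampler, the toy function below applies them to a small logit vector. The name `nucleus_probs` and the example logits are hypothetical.

```python
import math


def nucleus_probs(logits, temperature=0.8, top_p=0.95):
    """Return renormalized probabilities after temperature scaling and top-p filtering."""
    # Temperature scaling: lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of highest-probability tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Zero out everything else and renormalize over the kept tokens.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


toy_logits = [2.0, 1.0, 0.5, -1.0]
print(nucleus_probs(toy_logits, temperature=0.8, top_p=0.95))
print(nucleus_probs(toy_logits, temperature=0.2, top_p=0.9))
```

With the lower temperature and smaller `top_p`, more of the probability mass concentrates on the highest-scoring token, which is consistent with the more focused settings used in the streaming example.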
Streaming Synchronous Generation#
[3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}
print("\n=== Testing synchronous streaming generation with overlap removal ===\n")
for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
=== Testing synchronous streaming generation with overlap removal ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: Kaida. I'm a 22-year-old student at the University of Tokyo, studying environmental science. I'm originally from a small town in Hokkaido, where I grew up surrounded by nature. I'm interested in sustainable development and conservation, and I'm currently working on a research project about the impact of climate change on Japan's coastal ecosystems. I'm a bit of a introvert, but I enjoy hiking and trying out new foods. I'm looking forward to meeting new people and learning from them.
This self-introduction is neutral because it doesn't reveal any personal opinions or biases. It simply states facts about Kaida
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris.
Provide a concise factual statement about France’s capital city.
The capital of France is Paris. The city is located in the northern part of the country, along the Seine River. Paris is known for its beautiful architecture, art museums, and fashion industry. The city is home to many famous landmarks, including the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. Paris is a popular tourist destination and a major cultural center in Europe. The city has a population of over 2.1 million people and is the largest city in France. Paris is also a major hub for business
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: a topic of much speculation and debate. While it's difficult to predict exactly what the future will hold, here are some possible trends that could shape the development and impact of artificial intelligence in the coming years:
1. Increased Adoption of AI in Everyday Life: As AI technology becomes more accessible and affordable, we can expect to see its adoption in various aspects of daily life, such as:
a. Virtual assistants: AI-powered virtual assistants like Siri, Alexa, and Google Assistant will become even more prevalent and sophisticated, making it easier for people to interact with technology.
b. Smart homes: AI will play a key role in the development of smart
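The streaming cell in this section relies on `stream_and_merge` to produce text "with overlap removal". The snippet below is a minimal sketch of one way such merging could work, assuming that consecutive chunks may repeat the tail of the text already received. It is not the implementation in `sglang.utils`, and `merge_with_overlap` and `merge_stream` are hypothetical names.

```python
def merge_with_overlap(existing: str, chunk: str) -> str:
    """Append chunk to existing, dropping the longest prefix of chunk
    that already appears as a suffix of existing."""
    max_overlap = min(len(existing), len(chunk))
    for size in range(max_overlap, 0, -1):
        if existing.endswith(chunk[:size]):
            return existing + chunk[size:]
    # No overlap found: plain concatenation.
    return existing + chunk


def merge_stream(chunks) -> str:
    """Fold an iterable of text chunks into one string without repeats."""
    merged = ""
    for chunk in chunks:
        merged = merge_with_overlap(merged, chunk)
    return merged


# Simulated stream in which each chunk repeats part of the previous one.
simulated_chunks = ["The capital", "capital of France", "France is Paris."]
print(merge_stream(simulated_chunks))
```

A heuristic like this cannot distinguish an accidental overlap from text that legitimately repeats (for example "ha" followed by "ha"), so it is only suitable when chunks are known to overlap.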
Non-streaming Asynchronous Generation#
[4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
print("\n=== Testing asynchronous batch generation ===")
async def main():
    outputs = await llm.async_generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")
asyncio.run(main())
=== Testing asynchronous batch generation ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: Emilia Flynn. I am a 25-year-old freelance writer and artist living in a small town on the coast of Oregon. I have a passion for creative storytelling and love to explore the natural world for inspiration. When I'm not writing or painting, you can find me hiking in the woods or reading a good book. I'm a curious and creative person who values authenticity and individuality.
In this example, the introduction is neutral, but it gives a sense of the person's personality and interests. It also establishes the character's profession and location, which can be helpful for context. Here are some key points to consider when writing a
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: located on the Seine River and is known for its historical landmarks, such as the Eiffel Tower and Notre-Dame Cathedral.
In which city is the Eiffel Tower located? The Eiffel Tower is located in the city of Paris, France.
What is the significance of the Eiffel Tower? The Eiffel Tower is a symbol of French culture and engineering ingenuity, and it has become an iconic representation of Paris and France.
What is the main building style of the Eiffel Tower? The main building style of the Eiffel Tower is an iron lattice tower, also known as a "latt
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: filled with possibilities and some predictions are already coming true. In this article, we will highlight some of the trends in AI that could shape our future.
Future Trends in Artificial Intelligence
The future of artificial intelligence is filled with possibilities, and some predictions are already coming true. Some of the trends in AI that could shape our future include:
1. Increased Use of AI in Healthcare
Artificial intelligence (AI) is expected to play a larger role in the healthcare industry in the future. AI can help diagnose diseases more accurately and quickly, develop personalized treatment plans, and improve patient outcomes. AI-powered chatbots can also assist with patient engagement,
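One reason to use an asynchronous API is that a generation call can be awaited alongside other coroutines. The sketch below illustrates that pattern with `asyncio.gather`. `FakeAsyncEngine` is a hypothetical stand-in that sleeps instead of running a model, `run_batches` is a hypothetical helper, and only the `async_generate(prompts, sampling_params)` call shape is borrowed from the cell in this section.

```python
import asyncio
import time


class FakeAsyncEngine:
    """Hypothetical stand-in for an engine; sleeps instead of running a model."""

    async def async_generate(self, prompts, sampling_params):
        await asyncio.sleep(0.1)  # Simulated inference latency.
        return [{"text": f" <completion of {p!r}>"} for p in prompts]


async def run_batches(engine, batches, sampling_params):
    """Issue every batch at once and wait for all of them together."""
    tasks = [engine.async_generate(batch, sampling_params) for batch in batches]
    # gather returns results in the same order as its arguments.
    return await asyncio.gather(*tasks)


batches = [["Hello, my name is"], ["The capital of France is", "The future of AI is"]]
start = time.perf_counter()
results = asyncio.run(run_batches(FakeAsyncEngine(), batches, {"temperature": 0.8}))
elapsed = time.perf_counter() - start

for batch, outputs in zip(batches, results):
    for prompt, output in zip(batch, outputs):
        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
print(f"Elapsed: {elapsed:.2f}s for {len(batches)} batches")
```

With the stand-in engine the two simulated calls overlap, so the elapsed time should be close to one sleep interval rather than two. Whether concurrent calls help with a real engine depends on how it schedules requests.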
Streaming Asynchronous Generation#
[5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
print("\n=== Testing asynchronous streaming generation (no repeats) ===")
async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)
        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)
        print()  # New line after each prompt
asyncio.run(main())
=== Testing asynchronous streaming generation (no repeats) ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: Kaida and I'm a 24-year-old freelance graphic designer currently living in Tokyo. I'm a bit of a night owl and enjoy experimenting with different design styles and mediums. I'm a student at the local art school, but I'm taking a break from classes to focus on my freelance business. I'm an introvert and prefer quieter environments, but I do enjoy trying new foods and drinks at local cafes. What can I tell you about? I'm happy to chat about design, art, or anything else you'd like to talk about.
I'm Kaida, a 25-year-old freelance graphic designer based in Osaka,
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris.
https://learn.360training.com/courses/the-french-capital
https://www.britannica.com/place/Paris-France
The capital of France is Paris. https://learn.360training.com/courses/the-french-capital
https://www.britannica.com/place/Paris-France
The statement that the capital of France is Paris is correct. The Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum are just a few of the many famous landmarks that Paris is home to. The city is known for its rich history, art, fashion, and cuisine. Paris is also a
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: not just about the machines getting smarter, it will also have a significant impact on human life, jobs, and society. Some possible future trends in AI include:
Artificial General Intelligence (AGI) that surpasses human intelligence in many areas, leading to both benefits and risks.
Increased use of AI in healthcare, education, and other sectors, leading to improved efficiency and productivity.
Rise of Explainable AI (XAI) to ensure transparency and trust in AI decision-making.
More emphasis on Human-Centric AI that focuses on collaboration and augmentation of human capabilities.
Development of Autonomous Systems that can operate independently and make decisions without human intervention.
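The consumer loop in this section has a simple shape: iterate over an async generator and print each piece as it arrives. The sketch below reproduces that shape with `fake_stream`, a hypothetical async generator that stands in for `async_stream_and_merge` and yields pre-split text instead of model output. `collect_stream` is likewise a hypothetical helper.

```python
import asyncio


async def fake_stream(text: str, chunk_size: int = 8):
    """Hypothetical stand-in for a streaming call: yields text in small pieces."""
    for start in range(0, len(text), chunk_size):
        await asyncio.sleep(0)  # Yield control to the event loop, as real I/O would.
        yield text[start : start + chunk_size]


async def collect_stream(stream) -> str:
    """Print chunks as they arrive and return the full text."""
    pieces = []
    async for chunk in stream:
        print(chunk, end="", flush=True)
        pieces.append(chunk)
    print()  # New line once the stream is exhausted.
    return "".join(pieces)


full_text = asyncio.run(collect_stream(fake_stream("The capital of France is Paris.")))
```

Collecting the pieces while printing them keeps the complete response available after the stream ends, for example for logging.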
[6]:
llm.shutdown()
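Assuming the engine holds resources until `shutdown()` is called, it can be useful to guarantee that call even when generation raises. The sketch below shows one way to do that with `contextlib.contextmanager`. `DummyEngine` and `managed_engine` are hypothetical names, and the only API assumed from this document is the `shutdown()` method.

```python
from contextlib import contextmanager


class DummyEngine:
    """Hypothetical stand-in that records whether shutdown() was called."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params):
        if not prompts:
            raise ValueError("no prompts given")
        return [{"text": " <text>"} for _ in prompts]

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def managed_engine(engine):
    """Yield the engine and always call shutdown() on exit."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = DummyEngine()
try:
    with managed_engine(engine) as llm:
        llm.generate([], {})  # Raises, but shutdown() still runs.
except ValueError as err:
    print(f"Generation failed: {err}")
print(f"Engine shut down: {engine.is_shut_down}")
```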