Offline Engine API#
SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where an additional HTTP server adds unnecessary complexity or overhead. Two general use cases are:
Offline Batch Inference
Custom Server on Top of the Engine
This document focuses on offline batch inference and demonstrates four inference modes:
Non-streaming synchronous generation
Streaming synchronous generation
Non-streaming asynchronous generation
Streaming asynchronous generation
You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in custom_server.
Advanced Usage#
The engine also supports VLM (vision-language model) inference and hidden-state extraction.
Please see the examples for further use cases.
Offline Batch Inference#
The SGLang offline engine supports batch inference with efficient scheduling.
[1]:
# launch the offline engine
import asyncio
import io
import os
from PIL import Image
import requests
import sglang as sgl
from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge
if is_in_ci():
    import patch
llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:00<00:02, 1.28it/s]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:01<00:01, 1.08it/s]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:02<00:00, 1.01it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.36it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.24it/s]
Non-streaming Synchronous Generation#
[2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: Hello, my name is
Generated text: Catherine Schuessler and I am excited to be your host for the upcoming Online Retreat and Conference. As a speaker, author, and educator, my passion is helping people find hope, healing, and transformation in the midst of life's challenges. I am grateful to be a part of this community and I look forward to connecting with you!
I have been in ministry for over 20 years, teaching and speaking at conferences, retreats, and local churches. My ministry focuses on helping people develop a deeper understanding of God's love, identity, and purpose. My books, "The Beauty of Brokenness" and "The Power of Un
===============================
Prompt: The president of the United States is
Generated text: not the CEO of the country. The president is the head of the executive branch of the federal government, which is just one of the three branches of the federal government established by the Constitution.
The Constitution divides power among three branches:
The legislative branch (Congress), which makes laws.
The executive branch (headed by the president), which carries out laws.
The judicial branch (the Supreme Court), which interprets laws.
The president is not the CEO of the country because he or she is not the head of the entire government. Instead, the president is the head of the executive branch, which is responsible for executing the laws passed by Congress.
===============================
Prompt: The capital of France is
Generated text: famous for its art, history, fashion, and romantic atmosphere. But, there's more to the City of Light than meets the eye. Paris has a rich cultural heritage, and one of the most fascinating aspects of this heritage is the numerous cemeteries and mausoleums that dot the city. Yes, you read that right – cemeteries! In Paris, even death is a work of art. Here are some of the most fascinating cemeteries and mausoleums to visit in Paris:
1. Père Lachaise Cemetery
This cemetery is one of the most famous in the world, and
===============================
Prompt: The future of AI is
Generated text: more than just machines doing tasks – it’s about augmenting human capabilities and creating new opportunities for collaboration, creativity and innovation.
The concept of artificial intelligence (AI) has captivated human imagination for decades, with movies and books often portraying a dystopian future where machines have surpassed human intelligence. While we are indeed witnessing rapid advancements in AI capabilities, the reality is far from a “robots-taking-over-the-world” narrative.
In fact, the future of AI is more about augmenting human capabilities, creating new opportunities for collaboration, creativity, and innovation, and helping us solve some of the world’s most pressing challenges. Here are some potential
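In the example above, each element of `outputs` is a dict with a `text` field, and the results are paired with the prompts via `zip`. A minimal sketch of persisting such pairs as JSON Lines follows; `write_jsonl` is a hypothetical helper (not part of SGLang), and the demo data is a made-up placeholder shaped like the results above, not real model output.

```python
import io
import json


def write_jsonl(prompts, outputs, fp):
    """Write one JSON object per prompt/output pair to the open text file `fp`."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    for prompt, output in zip(prompts, outputs):
        record = {"prompt": prompt, "text": output["text"]}
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


# Placeholder data shaped like the engine results (assumption, not real output).
demo_prompts = ["The capital of France is"]
demo_outputs = [{"text": " Paris."}]

buffer = io.StringIO()
write_jsonl(demo_prompts, demo_outputs, buffer)
print(buffer.getvalue(), end="")
```

With a real engine, the same helper could be called with the `prompts` and `outputs` variables from the cell above and a file opened in text mode.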
Streaming Synchronous Generation#
[3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}
print("\n=== Testing synchronous streaming generation with overlap removal ===\n")
for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
=== Testing synchronous streaming generation with overlap removal ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: Kaida. I'm a 25-year-old freelance writer and editor. I live in a small apartment in the city with my cat, Luna. I enjoy reading, hiking, and trying out new coffee shops. I'm a bit of a introvert, but I'm always up for a good conversation.
This self-introduction is neutral because it doesn't reveal any personal opinions or biases. It simply states the character's name, age, occupation, living situation, and interests. It also mentions a few personality traits, but in a way that is neutral and doesn't make any judgments. For example, calling herself a "bit of
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris.
Provide a concise factual statement about the population of France’s capital city. The population of Paris is approximately 2.1 million people.
Provide a concise factual statement about the location of France’s capital city. Paris is located in the northern part of France, in the Île-de-France region.
Provide a concise factual statement about the climate of France’s capital city. Paris has a temperate oceanic climate, characterized by mild winters and warm summers.
Provide a concise factual statement about the economy of France’s capital city. Paris is a major economic hub, with a strong focus on finance, fashion, and tourism.
Provide a
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: a topic of much speculation and debate. While it is difficult to predict exactly what the future will hold, here are some possible future trends in artificial intelligence:
1. Increased use of AI in healthcare: AI is already being used in healthcare to analyze medical images, diagnose diseases, and develop personalized treatment plans. In the future, AI is likely to play an even more significant role in healthcare, with applications such as:
2. AI-powered robots: Robots are becoming increasingly common in industries such as manufacturing, logistics, and healthcare. In the future, AI-powered robots are likely to become even more sophisticated, with capabilities such as:
3. AI
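The streaming example above relies on "overlap removal" when merging chunks. The sketch below illustrates one way such merging can work: before appending a chunk, drop the longest prefix of the chunk that already appears as a suffix of the accumulated text. The names `trim_overlap` and `merge_chunks` and the demo chunks are assumptions for illustration; this is not presented as the actual implementation of `stream_and_merge`.

```python
def trim_overlap(existing: str, chunk: str) -> str:
    """Return `chunk` without the longest prefix that is also a suffix of `existing`."""
    max_len = min(len(existing), len(chunk))
    # Try the longest candidate overlap first and shrink until one matches.
    for size in range(max_len, 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk  # no overlap found


def merge_chunks(chunks) -> str:
    """Concatenate streamed chunks, skipping text repeated at chunk boundaries."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap(merged, chunk)
    return merged


# Made-up chunks with overlapping boundaries (not real model output).
demo_chunks = ["Paris is", " is the capital", "capital of France."]
print(merge_chunks(demo_chunks))  # prints: Paris is the capital of France.
```

One limitation of this approach: a chunk that legitimately begins with the same characters the text already ends with (for example, a repeated word) would be trimmed as well, so it fits best when chunks are known to overlap.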
Non-streaming Asynchronous Generation#
[4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
print("\n=== Testing asynchronous batch generation ===")
async def main():
    outputs = await llm.async_generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")
asyncio.run(main())
=== Testing asynchronous batch generation ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: Petra. I'm a 25-year-old artist living in a small coastal town in the Pacific Northwest. I spend most of my time painting landscapes and working on my latest project, a series of abstract pieces inspired by the ocean's moods.
I'm a creative person with a passion for art and a love for the outdoors. My favorite things to do are hiking, kayaking, and simply sitting on the beach, watching the waves roll in. I'm also an avid reader and enjoy collecting rare books on art and history.
I'm a bit of a introvert, but once you get to know me, I'm a warm and
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris.
This is a factual statement that contains no opinion or bias. It simply presents information about the capital of France.
Provide a statement that conveys a positive opinion about Paris. Paris is a city of unparalleled beauty and culture, offering a unique blend of art, history, and cuisine that draws millions of visitors each year.
This statement expresses a positive opinion about Paris, highlighting its beauty, culture, and appeal to tourists.
Provide a statement that conveys a negative opinion about Paris. Paris is a crowded and overpriced city that can be overwhelming to visitors, with long lines and high prices for even the most basic services.
This statement
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: a topic of much discussion and speculation, with predictions ranging from the optimistic to the ominous. While it’s difficult to predict the future with certainty, here are some possible future trends in artificial intelligence:
1. Increased automation: AI will continue to automate various aspects of our lives, including jobs, tasks, and processes. This will lead to increased productivity and efficiency but also potential job displacement.
2. Advancements in natural language processing: AI will become more proficient in understanding and generating human-like language, enabling more seamless human-AI interactions and potentially revolutionizing industries such as customer service, healthcare, and education.
3. Growing use of AI in
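Because the asynchronous call is awaited inside a coroutine, it can share an event loop with other coroutines. The sketch below shows the general `asyncio.gather` pattern for awaiting several independent calls at once while keeping results in input order. `fake_async_generate` is a hypothetical stand-in that only sleeps and returns a placeholder dict; it does not call SGLang.

```python
import asyncio


async def fake_async_generate(prompt: str, delay: float) -> dict:
    """Hypothetical stand-in for an awaitable generation call."""
    await asyncio.sleep(delay)
    return {"text": f"<completion for: {prompt}>"}


async def run_concurrently(prompts):
    # Earlier prompts get longer delays, so they finish last on purpose.
    coros = [
        fake_async_generate(p, delay=0.01 * (len(prompts) - i))
        for i, p in enumerate(prompts)
    ]
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*coros)


results = asyncio.run(run_concurrently(["first", "second", "third"]))
for item in results:
    print(item["text"])
```

Even though the first call completes last here, `results` still lines up with the prompt order, which is what makes `zip(prompts, outputs)` style pairing safe with this pattern.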
Streaming Asynchronous Generation#
[5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
print("\n=== Testing asynchronous streaming generation (no repeats) ===")
async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)
        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)
        print()  # New line after each prompt
asyncio.run(main())
=== Testing asynchronous streaming generation (no repeats) ===
Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text: Lena Taylor. I'm a young adult in my early twenties, living in the city. I've been working as a freelance artist, focusing on digital media and graphic design. I'm currently taking online courses to further develop my skills in animation. When I'm not working or studying, I enjoy exploring the city, trying out new restaurants and cafes, and spending time with friends. I'm a bit of a introvert, but I'm working on becoming more outgoing and confident. I'm excited to see where my creative journey takes me.
This self-introduction is neutral because it doesn't reveal any specific personality traits, interests, or goals
Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text: Paris.
Provide a concise factual statement about France’s largest city. The largest city in France is Lyon.
Provide a concise factual statement about the economic status of France. France has the sixth largest economy in the world.
Provide a concise factual statement about the population of France. The population of France is approximately 67 million people.
Provide a concise factual statement about France’s official language. The official language of France is French.
Provide a concise factual statement about France’s system of government. France is a semi-presidential constitutional republic.
Provide a concise factual statement about the currency of France. The official currency of France is the
Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text: far from certain, and there are numerous potential trends that could shape the development of this technology. Some of these trends include: - Rise of Explainable AI (XAI): As AI becomes more integrated into our lives, there will be a growing need for transparency and accountability in AI decision-making. XAI aims to make AI more interpretable and transparent, allowing users to understand the reasoning behind AI-driven decisions. - Increased Adoption of Edge AI: As the number of IoT devices continues to grow, edge AI will become increasingly important for processing data locally and making decisions in real-time. This will enable faster and more efficient decision-making, while also
[6]:
llm.shutdown()
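The final cell calls `shutdown()` explicitly. In a standalone script, an exception raised earlier would skip that call. A minimal sketch of one way to guarantee cleanup is a small context manager; `managed_engine` and `StubEngine` are hypothetical names introduced here for illustration, and the stub only records whether `shutdown()` ran.

```python
from contextlib import contextmanager


@contextmanager
def managed_engine(factory):
    """Create an engine via `factory` and always call shutdown() on exit."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


class StubEngine:
    """Hypothetical placeholder that only records whether shutdown() ran."""

    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True


try:
    with managed_engine(StubEngine) as engine:
        raise RuntimeError("simulated failure during generation")
except RuntimeError:
    pass

print(engine.closed)  # prints: True
```

With the real engine, the factory could be a lambda that constructs `sgl.Engine(model_path=...)` as in the first cell.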