Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

  • Offline Batch Inference

  • Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

  • Non-streaming synchronous generation

  • Streaming synchronous generation

  • Non-streaming asynchronous generation

  • Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example written as a Python script can be found in custom_server.
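
As a rough illustration of what the core of such a server might look like, the sketch below wraps an engine in a single async request handler. `handle_generate`, the payload keys `prompt` and `sampling_params`, and `StubEngine` are hypothetical names introduced here and are not part of SGLang. The only engine call assumed is `async_generate(prompt, sampling_params)` returning a dict with a `text` field, matching its use later in this document. A web framework route would call the handler with the raw request body.

```python
import asyncio
import json


class StubEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a GPU."""

    async def async_generate(self, prompt, sampling_params=None):
        # A real engine would run the model; the stub echoes the prompt.
        return {"text": f" <completion for: {prompt}>"}


async def handle_generate(engine, body: str) -> str:
    """Parse a JSON request body, call the engine, and return a JSON response."""
    try:
        payload = json.loads(body)
        prompt = payload["prompt"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return json.dumps({"error": "request must be JSON with a 'prompt' field"})

    sampling_params = payload.get("sampling_params", {})
    output = await engine.async_generate(prompt, sampling_params)
    return json.dumps({"text": output["text"]})


if __name__ == "__main__":
    request = json.dumps({"prompt": "The capital of France is"})
    print(asyncio.run(handle_generate(StubEngine(), request)))
```

Keeping the handler independent of any web framework makes it easy to test with a stub and to swap in a real engine later.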

Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

[1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Non-streaming Synchronous Generation

[2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: Hello, my name is
Generated text:  Drew. I’m a 25-year-old from the UK. I am looking for a pen pal from the UK, USA or Australia.
I’m interested in learning about your daily life, hobbies and culture. I enjoy playing guitar, reading and watching films. I’m a bit of a music lover and enjoy listening to a variety of genres.
I would love to hear from you if you are interested in exchanging letters and learning more about each other’s lives.
Hi Drew, I'm Emily. I'm 24 from the USA. I'm a big fan of music too! I play the piano and love listening to indie rock and folk
===============================
Prompt: The president of the United States is
Generated text:  the head of state and head of government of the United States, and is the ceremonial and political leader of the nation. The president is indirectly elected by the people through the Electoral College, and serves a four-year term. The president is the commander-in-chief of the armed forces, and has the power to negotiate treaties, appoint federal judges, and veto legislation.
The president of the United States is a unique position that has evolved over time, with powers and responsibilities that have been shaped by the Constitution, laws, and historical events. The president has a wide range of duties, including:
Leading the federal government and setting national policy
Representing
===============================
Prompt: The capital of France is
Generated text:  Paris. It is located in the north-central part of the country.
The official language is French.
Paris is a city that is known for its art, fashion, cuisine, and architecture. It is home to many famous landmarks, including the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral.
The city has a population of over 2.1 million people and a metropolitan area population of over 12.2 million people. It is a major tourist destination and a hub for business, finance, and culture.
Paris is also known for its beautiful parks and gardens, including the Luxembourg Gardens and the Tuileries
===============================
Prompt: The future of AI is
Generated text:  in the cloud
Artificial intelligence (AI) is no longer just a buzzword, but a rapidly evolving technology that’s being integrated into various industries. From chatbots and virtual assistants to predictive analytics and autonomous vehicles, AI is transforming the way we live and work. However, as AI continues to grow in complexity and applications, it requires massive computing power, storage, and data processing capabilities. This is where the cloud comes in.
The cloud is not just a cost-effective way to store and process data; it also provides the scalability, flexibility, and on-demand access to resources that AI needs to thrive. In this article, we’ll
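
For offline batch jobs, results are usually written to disk rather than printed. The following is a minimal sketch, assuming each output is a dict with a `text` field as in the cell above. `save_results`, `load_results`, and the file name are hypothetical, and the sample outputs are placeholders standing in for what `llm.generate` returns.

```python
import json
import os
import tempfile


def save_results(prompts, outputs, path):
    """Write one JSON object per line, pairing each prompt with its generated text."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    with open(path, "w", encoding="utf-8") as f:
        for prompt, output in zip(prompts, outputs):
            record = {"prompt": prompt, "text": output["text"]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_results(path):
    """Read the JSONL file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Placeholder data; in practice: outputs = llm.generate(prompts, sampling_params)
    prompts = ["The capital of France is"]
    outputs = [{"text": " Paris."}]
    path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
    save_results(prompts, outputs, path)
    print(load_results(path))
```

JSON Lines keeps one record per line, so a long run can be inspected or resumed without parsing the whole file.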

Streaming Synchronous Generation

[3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()

=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Jack and I am a 12-year-old boy. I was diagnosed with cystic fibrosis (CF) when I was just two years old. My mom says I'm a fighter and that's because of all the medicine I take and the treatments I do every day. It's not always easy, but it's worth it because it helps me live a healthy life.
I love playing soccer and riding my bike. I'm really good at soccer and I'm on a team with my friends. I like to ride my bike to school and around my neighborhood. Sometimes I get tired easily, but that's okay because I just stop and

Prompt: The capital of France is
Generated text:  a city that needs no introduction. Paris, the City of Light, has been a cultural and artistic hub for centuries, attracting visitors from all over the world. From the iconic Eiffel Tower to the world-class museums like the Louvre and Orsay, Paris has something to offer for everyone.
But beyond the popular tourist attractions, there are many more hidden gems to explore in the City of Light. Here are some unique experiences to help you discover the authentic Paris:
1. Explore the Street Art of Montmartre
Montmartre, a historic neighborhood in the north of Paris, is a treasure trove of street art. Wander

Prompt: The future of AI is
Generated text:  here. As AI technology continues to advance, we're witnessing an exciting new wave of innovation across industries. But, what does this mean for the workforce, and how can we prepare our employees for this new reality?
In this video, we explore the trends shaping the future of work, the impact of AI on jobs, and provide practical advice on how to upskill and reskill your workforce for success in an AI-driven world.
Here are some key takeaways from the video:
1. AI will augment human capabilities, not replace them. While AI will certainly automate some tasks, it will also create new opportunities for humans to work alongside machines
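
Whether each streamed chunk carries only the newly generated text or the full text so far may depend on the SGLang version and configuration, so it is worth checking on your installation. The sketch below shows a hypothetical helper, `iter_new_text`, that yields only the new portion in either case. It works on any iterable of dicts with a `text` field, so plain lists stand in for `llm.generate(..., stream=True)` here.

```python
def iter_new_text(chunks, cumulative):
    """Yield only newly generated text from a stream of {"text": ...} chunks.

    cumulative=True  : each chunk holds the full text so far, so slice off
                       what was already emitted.
    cumulative=False : each chunk already holds only the new text.
    """
    emitted = 0
    for chunk in chunks:
        text = chunk["text"]
        if cumulative:
            new_text = text[emitted:]
            emitted = len(text)
        else:
            new_text = text
        if new_text:
            yield new_text


if __name__ == "__main__":
    # Placeholder chunks standing in for llm.generate(prompt, sampling_params, stream=True)
    cumulative_chunks = [{"text": " Par"}, {"text": " Paris"}, {"text": " Paris."}]
    for piece in iter_new_text(cumulative_chunks, cumulative=True):
        print(piece, end="", flush=True)
    print()
```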

Non-streaming Asynchronous Generation

[4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())

=== Testing asynchronous batch generation ===

Prompt: Hello, my name is
Generated text:  Ms. Brenda Goodman. I am the principal of our school, and I am excited to welcome you to our community! Our school is a place where students can grow and learn in a safe and supportive environment.
Our school offers a wide range of educational programs and activities to help students succeed. We have a dedicated team of teachers and staff who are committed to providing high-quality education to our students. We also have a variety of extracurricular activities and clubs that cater to different interests and talents.
At our school, we believe in the importance of social responsibility and community service. We encourage our students to get involved in local volunteer work and

Prompt: The capital of France is
Generated text:  Paris. Paris is located in the north-central part of France and is known for its rich history, art, fashion, and culture. Paris is the most visited city in the world and is home to many famous landmarks such as the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum.
Paris is a popular destination for tourists and business travelers alike. The city has a wide range of accommodations, from budget-friendly hostels to luxury hotels. Visitors can enjoy a variety of activities, including visiting museums, galleries, and historic sites, as well as exploring the city's charming neighborhoods and markets.
Paris is also known for its fashion

Prompt: The future of AI is
Generated text:  bright, and it will bring about exciting new technologies and innovations that will change the world in ways both big and small. But it will also bring about significant challenges and concerns that need to be addressed.
In this section, we'll explore the future of AI, including the benefits and risks, and how we can ensure that AI is developed and used responsibly.
AI and the Future of Work
The future of work will be significantly impacted by AI. While AI has the potential to bring about many benefits, such as increased productivity and efficiency, it also raises concerns about job displacement and the future of work.
On the one hand, AI has the
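
The asynchronous API can be useful when generation has to share an event loop with other work. As a sketch, independent requests with different sampling parameters can be awaited together using `asyncio.gather`. `StubEngine` and `generate_all` are hypothetical; the stub mimics only the `async_generate(prompt, sampling_params)` call shown above so that the example runs without a model. With a real engine you would pass `llm` instead.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in for sgl.Engine; returns a canned completion."""

    async def async_generate(self, prompt, sampling_params=None):
        await asyncio.sleep(0.01)  # simulate generation latency
        temperature = (sampling_params or {}).get("temperature")
        return {"text": f" <completion at temperature={temperature}>"}


async def generate_all(engine, requests):
    """Await several (prompt, sampling_params) requests concurrently.

    asyncio.gather preserves input order, so results line up with requests.
    """
    tasks = [engine.async_generate(prompt, params) for prompt, params in requests]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    requests = [
        ("Hello, my name is", {"temperature": 0.8, "top_p": 0.95}),
        ("The capital of France is", {"temperature": 0.0}),
    ]
    outputs = asyncio.run(generate_all(StubEngine(), requests))
    for (prompt, _), output in zip(requests, outputs):
        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```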

Streaming Asynchronous Generation

[5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())

=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Emily, and I am a fashion enthusiast! I have always loved expressing myself through clothes, shoes, and accessories. As a fashion blogger, I share my personal style, trends, and tips with you to help you feel confident and stylish.
On my blog, you will find a wide range of topics, from fashion trends and style advice to product reviews and shopping guides. I love discovering new brands and styles, and I am always on the lookout for inspiration to share with you.
My fashion style is eclectic and modern, with a touch of vintage flair. I love mixing high-end and low-end pieces, and I'm not afraid to take

Prompt: The capital of France is
Generated text:  a city of grandeur and elegance, a place that is steeped in history and culture. Paris, the City of Light, is known for its iconic landmarks, artistic treasures, and romantic atmosphere. From the Eiffel Tower to the Louvre Museum, there is no shortage of exciting things to see and do in this incredible city. Here are some of the top attractions and experiences that you shouldn’t miss on your trip to Paris:
1. The Eiffel Tower: This iconic iron lady is a must-visit attraction in Paris. You can take the elevator to the top for breathtaking views of the city, or dine at the

Prompt: The future of AI is
Generated text:  here – and it's all about human-centered AI
The future of AI is not just about machines getting smarter; it's about humans becoming more effective. By creating AI that is human-centered, organizations can unlock new levels of productivity, efficiency, and innovation.
The latest advancements in AI are not just about processing power, but about creating technologies that augment human capabilities and enhance our work lives. By leveraging machine learning, natural language processing, and computer vision, organizations can now create AI that is more intuitive, more flexible, and more human-centric.
But what does it mean to create an AI that is truly human-centered? It means that AI
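
The loop above streams one prompt at a time. As a sketch of an alternative, several streams can be consumed concurrently and their text collected per prompt. `StubEngine`, `collect_stream`, and `collect_streams` are hypothetical names. The stub only mirrors the call pattern used above (awaiting `async_generate(..., stream=True)` yields an async generator of dicts with a `text` field) and assumes each chunk carries only new text.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in for sgl.Engine that streams canned chunks."""

    async def async_generate(self, prompt, sampling_params=None, stream=False):
        if not stream:
            return {"text": " one two three"}

        async def chunks():
            for piece in (" one", " two", " three"):
                await asyncio.sleep(0.01)  # simulate per-token latency
                yield {"text": piece}

        # Mirrors the call pattern above: awaiting returns an async generator.
        return chunks()


async def collect_stream(engine, prompt, sampling_params):
    """Consume one stream and return the concatenated text.

    Assumes incremental chunks; cumulative chunks would need slicing first.
    """
    parts = []
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    async for chunk in generator:
        parts.append(chunk["text"])
    return "".join(parts)


async def collect_streams(engine, prompts, sampling_params):
    """Consume one stream per prompt concurrently; results keep prompt order."""
    texts = await asyncio.gather(
        *(collect_stream(engine, p, sampling_params) for p in prompts)
    )
    return dict(zip(prompts, texts))


if __name__ == "__main__":
    prompts = ["Hello, my name is", "The capital of France is"]
    params = {"temperature": 0.8, "top_p": 0.95}
    print(asyncio.run(collect_streams(StubEngine(), prompts, params)))
```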

[6]:
# shut down the engine and release its resources
llm.shutdown()
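
In a standalone script, an exception raised during generation could skip the `shutdown()` call. One way to guard against that is a small context manager, sketched below. `managed_engine` and `StubEngine` are hypothetical names; the only engine method assumed is `shutdown()`, as used above.

```python
from contextlib import contextmanager


@contextmanager
def managed_engine(factory):
    """Create an engine via factory() and always call shutdown() on exit."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


class StubEngine:
    """Hypothetical stand-in for sgl.Engine that records whether it was shut down."""

    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True


if __name__ == "__main__":
    # With SGLang installed, the factory could be:
    #   lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
    with managed_engine(StubEngine) as engine:
        pass  # call engine.generate(...) here
    print(engine.closed)
```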