Structured Outputs For Reasoning Models#
When working with reasoning models that use special tokens like <think>...</think>
to denote reasoning sections, you might want to allow free-form text within these sections while still enforcing grammar constraints on the rest of the output.
SGLang provides a feature to disable grammar restrictions within reasoning sections. This is particularly useful for models that need to perform complex reasoning steps before providing a structured output.
To enable this feature, specify the --reasoning-parser flag when launching the server. The reasoning parser determines the end-of-reasoning token, such as </think>; grammar constraints are enforced only on the output generated after that token.
Supported Models#
Currently, SGLang supports the following reasoning models:
- DeepSeek R1 series: the reasoning content is wrapped with <think> and </think> tags.
- QwQ: the reasoning content is wrapped with <think> and </think> tags.
Usage#
OpenAI Compatible API#
Specify the --grammar-backend and --reasoning-parser options.
[1]:
import openai
import os

from sglang.test.test_utils import is_in_ci

if is_in_ci():
    from patch import launch_server_cmd
else:
    from sglang.utils import launch_server_cmd

from sglang.utils import wait_for_server, print_highlight, terminate_process

os.environ["TOKENIZERS_PARALLELISM"] = "false"

server_process, port = launch_server_cmd(
    "python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --host 0.0.0.0 --reasoning-parser deepseek-r1"
)

wait_for_server(f"http://localhost:{port}")
client = openai.Client(base_url=f"http://127.0.0.1:{port}/v1", api_key="None")
[2025-04-25 07:49:11] server_args=ServerArgs(model_path='deepseek-ai/DeepSeek-R1-Distill-Qwen-7B', tokenizer_path='deepseek-ai/DeepSeek-R1-Distill-Qwen-7B', tokenizer_mode='auto', skip_tokenizer_init=False, enable_tokenizer_batch_encode=False, load_format='auto', trust_remote_code=False, dtype='auto', kv_cache_dtype='auto', quantization=None, quantization_param_path=None, context_length=None, device='cuda', served_model_name='deepseek-ai/DeepSeek-R1-Distill-Qwen-7B', chat_template=None, completion_template=None, is_embedding=False, revision=None, host='0.0.0.0', port=38822, mem_fraction_static=0.88, max_running_requests=200, max_total_tokens=20480, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='fcfs', schedule_conservativeness=1.0, cpu_offload_gb=0, page_size=1, tp_size=1, stream_interval=1, stream_output=False, random_seed=626545175, constrained_json_whitespace_pattern=None, watchdog_timeout=300, dist_timeout=None, download_dir=None, base_gpu_id=0, gpu_id_step=1, log_level='info', log_level_http=None, log_requests=False, log_requests_level=0, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_path='sglang_storage', enable_cache_report=False, reasoning_parser='deepseek-r1', dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend=None, sampling_backend='flashinfer', grammar_backend='xgrammar', speculative_algorithm=None, speculative_draft_model_path=None, speculative_num_steps=None, speculative_eagle_topk=None, speculative_num_draft_tokens=None, speculative_accept_threshold_single=1.0, speculative_accept_threshold_acc=1.0, speculative_token_map=None, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_cuda_graph=True, disable_cuda_graph_padding=False, enable_nccl_nvls=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, enable_multimodal=None, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_deepep_moe=False, deepep_mode='auto', enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=None, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None, enable_hierarchical_cache=False, hicache_ratio=2.0, hicache_size=0, hicache_write_policy='write_through_selective', flashinfer_mla_disable_ragged=False, warmups=None, moe_dense_tp_size=None, n_share_experts_fusion=0, disable_chunked_prefix_cache=False, disable_fast_image_processor=False, debug_tensor_dump_output_folder=None, debug_tensor_dump_input_file=None, debug_tensor_dump_inject=False, disaggregation_mode='null', disaggregation_bootstrap_port=8998, disaggregation_transfer_backend='mooncake', disaggregation_ib_device=None)
[2025-04-25 07:49:22 TP0] Attention backend not set. Use fa3 backend by default.
[2025-04-25 07:49:22 TP0] Init torch distributed begin.
[2025-04-25 07:49:23 TP0] Init torch distributed ends. mem usage=0.00 GB
[2025-04-25 07:49:23 TP0] Load weight begin. avail mem=65.41 GB
[2025-04-25 07:49:23 TP0] Ignore import error when loading sglang.srt.models.arctic. No module named 'sglang.srt.layers.fused_moe'
[2025-04-25 07:49:23 TP0] Ignore import error when loading sglang.srt.models.llama4.
[2025-04-25 07:49:23 TP0] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:01<00:01, 1.41s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:02<00:00, 1.34s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:02<00:00, 1.35s/it]
[2025-04-25 07:49:26 TP0] Load weight end. type=Qwen2ForCausalLM, dtype=torch.bfloat16, avail mem=39.29 GB, mem usage=26.13 GB.
[2025-04-25 07:49:26 TP0] KV Cache is allocated. #tokens: 20480, K size: 0.55 GB, V size: 0.55 GB
[2025-04-25 07:49:26 TP0] Memory pool end. avail mem=37.91 GB
[2025-04-25 07:49:27 TP0]
CUDA Graph is DISABLED.
This will cause significant performance degradation.
CUDA Graph should almost never be disabled in most usage scenarios.
If you encounter OOM issues, please try setting --mem-fraction-static to a lower value (such as 0.8 or 0.7) instead of disabling CUDA Graph.
[2025-04-25 07:49:27 TP0] max_total_num_tokens=20480, chunked_prefill_size=8192, max_prefill_tokens=16384, max_running_requests=200, context_len=131072
[2025-04-25 07:49:27] INFO: Started server process [183237]
[2025-04-25 07:49:27] INFO: Waiting for application startup.
[2025-04-25 07:49:27] INFO: Application startup complete.
[2025-04-25 07:49:27] INFO: Uvicorn running on http://0.0.0.0:38822 (Press CTRL+C to quit)
[2025-04-25 07:49:28] INFO: 127.0.0.1:51214 - "GET /v1/models HTTP/1.1" 200 OK
[2025-04-25 07:49:28] INFO: 127.0.0.1:51220 - "GET /get_model_info HTTP/1.1" 200 OK
[2025-04-25 07:49:28 TP0] Prefill batch. #new-seq: 1, #new-token: 7, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-04-25 07:49:31] INFO: 127.0.0.1:51228 - "POST /generate HTTP/1.1" 200 OK
[2025-04-25 07:49:31] The server is fired up and ready to roll!
NOTE: Typically, the server runs in a separate terminal.
In this notebook, we run the server and notebook code together, so their outputs are combined.
To improve clarity, the server logs are displayed in the original black color, while the notebook outputs are highlighted in blue.
These notebooks run in a CI environment with parallel jobs, so the throughput shown is not representative of actual performance.
JSON#
You can directly define a JSON schema or use Pydantic to define and validate the response.
Using Pydantic
[2]:
from pydantic import BaseModel, Field


# Define the schema using Pydantic
class CapitalInfo(BaseModel):
    name: str = Field(..., pattern=r"^\w+$", description="Name of the capital city")
    population: int = Field(..., description="Population of the capital city")


response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    messages=[
        {
            "role": "user",
            "content": "Please generate the information of the capital of France in the JSON format.",
        },
    ],
    temperature=0,
    max_tokens=2048,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "foo",
            # convert the pydantic model to json schema
            "schema": CapitalInfo.model_json_schema(),
        },
    },
)

print_highlight(
    f"reasoning_content: {response.choices[0].message.reasoning_content}\n\ncontent: {response.choices[0].message.content}"
)
[2025-04-25 07:49:34 TP0] Prefill batch. #new-seq: 1, #new-token: 18, #cached-token: 1, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-04-25 07:49:35 TP0] Decode batch. #running-req: 1, #token: 52, token usage: 0.00, gen throughput (token/s): 5.09, #queue-req: 0,
...
[2025-04-25 07:49:55 TP0] Decode batch. #running-req: 1, #token: 2052, token usage: 0.10, gen throughput (token/s): 96.95, #queue-req: 0,
[2025-04-25 07:49:55] INFO: 127.0.0.1:51230 - "POST /v1/chat/completions HTTP/1.1" 200 OK
I should probably start by listing the basic facts. The capital of France is Paris, so that's straightforward. The country it's the capital of is France, which I can confirm. The location is in the northern part of the country, near the Seine River. I remember that Paris is located in the Île-de-France region, which is a large area including other cities like Lyon and Marseille.
Next, I should think about the population. I think Paris is the second-largest city in France, after metropolitan Paris, which includes a much larger area. The population numbers might be around 2 million for the city proper and 8 million for the metropolitan area. I should double-check that, but I'm pretty sure that's correct.
Moving on to landmarks, the Eiffel Tower is a must. It's a symbol of the city and the country. The Louvre Museum is another famous landmark, one of the largest art museums in the world. The Paris Opera House is also iconic, especially for its architecture. The Arc de Triomphe is a significant historical monument, and Notre-Dame, despite the recent issues, is still a major attraction, though it's currently undergoing renovations.
I should include some key facts about Paris. It's known for its rich history, being the birthplace of many famous people like Victor Hugo, Ernest Hemingway, and others. It's also a global city with a vibrant cultural scene, hosting events like the French Open and the Tour de France. The cuisine is a big part of its identity, with famous dishes like croissant and boeuf bourguignon.
Transportation is another area. Paris has an extensive public transportation system, including the Métro, which is a large subway network. The RER is another rail network that connects to other cities. Taxis are also a common mode of transportation, and there are bike lanes throughout the city, especially in the Île-de-France region.
I should structure this information into a JSON format. The JSON should have a key for the capital, which is "Paris", and then an object containing the details. I'll list each piece of information as a key-value pair under the "capital" key. I need to make sure the JSON is properly formatted with commas and brackets, and that strings are enclosed in quotes.
Wait, I should also consider the population numbers. I think the population of Paris itself is around 2.1 million, while the metropolitan area is about 8.5 million. I should include that. Also, the area of Paris is approximately 105 square kilometers, and the metropolitan area is about 12,500 square kilometers.
I should also mention the time zone. Paris is in Central European Time (CET) during standard time and Central European Summer Time (CEST) in summer. That's important for international users.
Let me organize all this information into a JSON structure. I'll start with the capital key, then include the population, location, landmarks, key facts, transportation, and area. I'll make sure each key is descriptive and the values are accurate.
I think I've covered all the main points. Now, I'll format it correctly, ensuring that the JSON syntax is correct with proper commas and brackets. I'll avoid any markdown formatting as per the instructions and just present the JSON.
content: {
"name": "Paris",
"population": 214300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
JSON Schema Directly
[3]:
import json

json_schema = json.dumps(
    {
        "type": "object",
        "properties": {
            "name": {"type": "string", "pattern": "^[\\w]+$"},
            "population": {"type": "integer"},
        },
        "required": ["name", "population"],
    }
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    messages=[
        {
            "role": "user",
            "content": "Give me the information of the capital of France in the JSON format.",
        },
    ],
    temperature=0,
    max_tokens=2048,
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "foo", "schema": json.loads(json_schema)},
    },
)

print_highlight(
    f"reasoning_content: {response.choices[0].message.reasoning_content}\n\ncontent: {response.choices[0].message.content}"
)
[2025-04-25 07:49:55 TP0] Prefill batch. #new-seq: 1, #new-token: 17, #cached-token: 2, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-04-25 07:49:55 TP0] Decode batch. #running-req: 1, #token: 44, token usage: 0.00, gen throughput (token/s): 74.33, #queue-req: 0,
...
[2025-04-25 07:50:15 TP0] Decode batch. #running-req: 1, #token: 2044, token usage: 0.10, gen throughput (token/s): 99.43, #queue-req: 0,
[2025-04-25 07:50:16] INFO: 127.0.0.1:51230 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Now, moving on to the population. I think Paris is a very large city, one of the biggest in the world. I remember reading somewhere that it's over 3 million people, but I'm not sure of the exact figure. Maybe around 3.5 million? I should look that up to be accurate, but since I'm just brainstorming, I'll go with that estimate.
Next, the area. Paris is a big city, but it's also a dense urban area. I think the metropolitan area covers a large region, maybe around 12,000 square kilometers? But the city proper is smaller. I'm not exactly sure, but I'll put 10,500 square kilometers for the city area and 12,000 for the metropolitan area.
Language is another point. Paris is a center for French culture, so the predominant language there is definitely French. I don't think they speak any other language there predominantly, though there might be some English, especially in tourist areas or with expatriates.
Cuisine is interesting. Paris is known for its high-end, fine dining, especially French cuisine. I know places like Le Faitout and others that are famous for their intricate dishes. Parisians are also known for their coffee culture, so maybe that's another point to include.
Transportation-wise, Paris has an extensive public transit system. The RER and the Métro are part of the BTP, which I think stands for Bahn, Tram, and Metro in German, but in French, it's the same. The city is well-connected by train, with major stations like Gare du Nord and Châtelet. The Eiffel Tower is a major landmark, and it's accessible by train from Paris.
I should also mention some of the main attractions. The Eiffel Tower is iconic, along with the Louvre Museum, Notre-Dame Cathedral, and the Sacré-Cœur Basilica. These are must-see spots for tourists.
Now, putting this all together into JSON format. I'll structure it with an "info" key that contains a "capital" object with "name," "population," "area," and "language." Then, an "attractions" array that lists the main points of interest. I'll make sure the numbers are approximate since I don't have exact figures on hand.
Wait, I should check if the population is over 3 million or 3.5. I think it's around 3.5 million as of recent estimates. The area, I'm pretty sure the metropolitan area is about 12,000 km², and the city proper is a bit less, maybe 10,500 km². That seems right.
So, the JSON structure would have the info object with the necessary details, and the attractions array listing the main landmarks. I think that covers everything the user asked for. I should present it clearly, making sure the JSON is properly formatted with commas and quotes.
content: {
"name": "Paris",
"population": 350000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
EBNF#
[4]:
ebnf_grammar = """
root ::= city | description
city ::= "London" | "Paris" | "Berlin" | "Rome"
description ::= city " is " status
status ::= "the capital of " country
country ::= "England" | "France" | "Germany" | "Italy"
"""
response = client.chat.completions.create(
model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
messages=[
{"role": "system", "content": "You are a helpful geography bot."},
{
"role": "user",
"content": "Give me the information of the capital of France.",
},
],
temperature=0,
max_tokens=2048,
extra_body={"ebnf": ebnf_grammar},
)
print_highlight(
f"reasoing_content: {response.choices[0].message.reasoning_content}\n\ncontent: {response.choices[0].message.content}"
)
[2025-04-25 07:50:16 TP0] Prefill batch. #new-seq: 1, #new-token: 21, #cached-token: 1, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-04-25 07:50:16 TP0] Decode batch. #running-req: 1, #token: 39, token usage: 0.00, gen throughput (token/s): 89.48, #queue-req: 0,
[2025-04-25 07:50:16 TP0] Decode batch. #running-req: 1, #token: 79, token usage: 0.00, gen throughput (token/s): 101.09, #queue-req: 0,
[2025-04-25 07:50:17 TP0] Decode batch. #running-req: 1, #token: 119, token usage: 0.01, gen throughput (token/s): 101.36, #queue-req: 0,
[2025-04-25 07:50:17] INFO: 127.0.0.1:51230 - "POST /v1/chat/completions HTTP/1.1" 200 OK
content: Paris is the capital of France
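The grammar above admits only a bare city name or a "city is the capital of country" sentence, so the constrained content can be checked mechanically. A quick sanity-check sketch (not part of the original notebook):

# Every string derivable from the EBNF root starts with one of the four cities.
allowed_cities = {"London", "Paris", "Berlin", "Rome"}
content = response.choices[0].message.content
assert any(content.startswith(city) for city in allowed_cities)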
Regular expression#
[5]:
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0,
    max_tokens=2048,
    extra_body={"regex": "(Paris|London)"},
)

print_highlight(
    f"reasoning_content: {response.choices[0].message.reasoning_content}\n\ncontent: {response.choices[0].message.content}"
)
[2025-04-25 07:50:17 TP0] Prefill batch. #new-seq: 1, #new-token: 10, #cached-token: 2, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-04-25 07:50:17 TP0] Decode batch. #running-req: 1, #token: 33, token usage: 0.00, gen throughput (token/s): 92.06, #queue-req: 0,
[2025-04-25 07:50:17 TP0] Decode batch. #running-req: 1, #token: 73, token usage: 0.00, gen throughput (token/s): 98.69, #queue-req: 0,
[2025-04-25 07:50:18 TP0] Decode batch. #running-req: 1, #token: 113, token usage: 0.01, gen throughput (token/s): 101.77, #queue-req: 0,
[2025-04-25 07:50:18 TP0] Decode batch. #running-req: 1, #token: 153, token usage: 0.01, gen throughput (token/s): 103.47, #queue-req: 0,
[2025-04-25 07:50:19 TP0] Decode batch. #running-req: 1, #token: 193, token usage: 0.01, gen throughput (token/s): 101.60, #queue-req: 0,
[2025-04-25 07:50:19 TP0] Decode batch. #running-req: 1, #token: 233, token usage: 0.01, gen throughput (token/s): 101.65, #queue-req: 0,
[2025-04-25 07:50:19 TP0] Decode batch. #running-req: 1, #token: 273, token usage: 0.01, gen throughput (token/s): 101.62, #queue-req: 0,
[2025-04-25 07:50:20 TP0] Decode batch. #running-req: 1, #token: 313, token usage: 0.02, gen throughput (token/s): 101.69, #queue-req: 0,
[2025-04-25 07:50:20 TP0] Decode batch. #running-req: 1, #token: 353, token usage: 0.02, gen throughput (token/s): 95.04, #queue-req: 0,
[2025-04-25 07:50:20] INFO: 127.0.0.1:51230 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Wait, I think the capital is the official seat of government, right? So maybe Paris is both the capital and the most famous city. But I'm not entirely certain. I recall that some countries have their capital in a different city than their main tourist attraction. For example, I think Brazil's capital is not Rio de Janeiro, which is more famous. So maybe France is like that too.
Let me try to remember any specific information. I think the French government declares Paris as the capital. Yeah, that sounds right. I also remember that the Eiffel Tower is in Paris, and it's a symbol of the country. So if Paris is the capital, then that makes sense. But I'm a bit confused because sometimes people say "the capital of France is Paris," but I also think about other capitals I know, like London for the UK or Berlin for Germany. So maybe it's the same for France.
I should also consider if there are any other capitals in France. I don't think so. France has only one capital city, which is Paris. So, putting it all together, I'm pretty confident that Paris is the capital of France. It's the main government building area, and it's the most well-known city in the country. Yeah, I think that's correct.
content: Paris
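Since the regex applies only to the content (the reasoning stays free-form), the final answer can be verified with Python's re module. A small sketch, not part of the original notebook:

import re

# The constrained content must match the regex exactly.
assert re.fullmatch(r"(Paris|London)", response.choices[0].message.content)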
Structural Tag#
[6]:
tool_get_current_weather = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to find the weather for, e.g. 'San Francisco'",
                },
                "state": {
                    "type": "string",
                    "description": "the two-letter abbreviation for the state that the city is"
                    " in, e.g. 'CA' which would mean 'California'",
                },
                "unit": {
                    "type": "string",
                    "description": "The unit to fetch the temperature in",
                    "enum": ["celsius", "fahrenheit"],
                },
            },
            "required": ["city", "state", "unit"],
        },
    },
}

tool_get_current_date = {
    "type": "function",
    "function": {
        "name": "get_current_date",
        "description": "Get the current date and time for a given timezone",
        "parameters": {
            "type": "object",
            "properties": {
                "timezone": {
                    "type": "string",
                    "description": "The timezone to fetch the current date and time for, e.g. 'America/New_York'",
                }
            },
            "required": ["timezone"],
        },
    },
}

schema_get_current_weather = tool_get_current_weather["function"]["parameters"]
schema_get_current_date = tool_get_current_date["function"]["parameters"]


def get_messages():
    return [
        {
            "role": "system",
            "content": f"""
# Tool Instructions
- Always execute python code in messages that you share.
- When looking for real time information use relevant functions if available else fallback to brave_search

You have access to the following functions:

Use the function 'get_current_weather' to: Get the current weather in a given location
{tool_get_current_weather["function"]}

Use the function 'get_current_date' to: Get the current date and time for a given timezone
{tool_get_current_date["function"]}

If you choose to call a function ONLY reply in the following format:
<{{start_tag}}={{function_name}}>{{parameters}}{{end_tag}}
where

start_tag => `<function`
parameters => a JSON dict with the function argument name as key and function argument value as value.
end_tag => `</function>`

Here is an example,
<function=example_function_name>{{"example_name": "example_value"}}</function>

Reminder:
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line
- Always add your sources when using search results to answer the user query

You are a helpful assistant.""",
        },
        {
            "role": "user",
            "content": "You are in New York. Please get the current date and time, and the weather.",
        },
    ]


messages = get_messages()

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    messages=messages,
    response_format={
        "type": "structural_tag",
        "max_new_tokens": 2048,
        "structures": [
            {
                "begin": "<function=get_current_weather>",
                "schema": schema_get_current_weather,
                "end": "</function>",
            },
            {
                "begin": "<function=get_current_date>",
                "schema": schema_get_current_date,
                "end": "</function>",
            },
        ],
        "triggers": ["<function="],
    },
)

print_highlight(
    f"reasoning_content: {response.choices[0].message.reasoning_content}\n\ncontent: {response.choices[0].message.content}"
)
[2025-04-25 07:50:21 TP0] Prefill batch. #new-seq: 1, #new-token: 471, #cached-token: 1, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-04-25 07:50:21 TP0] Decode batch. #running-req: 1, #token: 495, token usage: 0.02, gen throughput (token/s): 44.41, #queue-req: 0,
[2025-04-25 07:50:22 TP0] Decode batch. #running-req: 1, #token: 535, token usage: 0.03, gen throughput (token/s): 100.38, #queue-req: 0,
[2025-04-25 07:50:22 TP0] Decode batch. #running-req: 1, #token: 575, token usage: 0.03, gen throughput (token/s): 99.73, #queue-req: 0,
[2025-04-25 07:50:22 TP0] Decode batch. #running-req: 1, #token: 615, token usage: 0.03, gen throughput (token/s): 96.90, #queue-req: 0,
[2025-04-25 07:50:23 TP0] Decode batch. #running-req: 1, #token: 655, token usage: 0.03, gen throughput (token/s): 100.98, #queue-req: 0,
[2025-04-25 07:50:23 TP0] Decode batch. #running-req: 1, #token: 695, token usage: 0.03, gen throughput (token/s): 98.88, #queue-req: 0,
[2025-04-25 07:50:24 TP0] Decode batch. #running-req: 1, #token: 735, token usage: 0.04, gen throughput (token/s): 87.04, #queue-req: 0,
[2025-04-25 07:50:24 TP0] Decode batch. #running-req: 1, #token: 775, token usage: 0.04, gen throughput (token/s): 98.61, #queue-req: 0,
[2025-04-25 07:50:24 TP0] Decode batch. #running-req: 1, #token: 815, token usage: 0.04, gen throughput (token/s): 99.72, #queue-req: 0,
[2025-04-25 07:50:25 TP0] Decode batch. #running-req: 1, #token: 855, token usage: 0.04, gen throughput (token/s): 98.69, #queue-req: 0,
[2025-04-25 07:50:25 TP0] Decode batch. #running-req: 1, #token: 895, token usage: 0.04, gen throughput (token/s): 96.24, #queue-req: 0,
[2025-04-25 07:50:25] INFO: 127.0.0.1:51230 - "POST /v1/chat/completions HTTP/1.1" 200 OK
First, I need to determine which functions to use. The user mentioned they are in New York, so I should get the current date and time for that location. Looking at the functions provided, there's 'get_current_date' which requires a timezone parameter. New York is in the 'America/New_York' timezone, so I'll use that.
Next, for the weather, the user wants the current conditions in New York. The function 'get_current_weather' requires a city, state, and unit. I know the city is New York, but I need the state abbreviation. New York is NY, so the state is 'NY'. The unit can be either Celsius or Fahrenheit; the user didn't specify, so I'll include both options in the parameters to show flexibility.
Now, I'll structure the function calls. I'll start with 'get_current_date', providing the timezone as 'America/New_York'. Then, I'll call 'get_current_weather' with city, state, and both units. This way, the user gets both the date/time and the weather in their preferred temperature unit.
I should make sure each function call is on its own line, following the required format strictly. I'll include the parameters as JSON objects within each function call. Also, I'll add sources at the end to indicate where the timezone information comes from, as it's sourced from Wikipedia.
Putting it all together, I'll write the two function calls: one for the date and time, and another for the weather with both units. This should provide the user with the information they're seeking in a clear and organized manner.
content:
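The structural tag guarantees that any function call in the content follows the <function=NAME>{json}</function> format, so the calls can be recovered with a simple regex. A minimal extraction sketch, not part of the original notebook (in this particular run the model spent its token budget on reasoning, so the content is empty):

import json
import re

# Extract structured tool calls of the form <function=NAME>{...}</function>.
calls = re.findall(
    r"<function=(\w+)>(.*?)</function>", response.choices[0].message.content or ""
)
for name, arguments in calls:
    print_highlight(f"{name}: {json.loads(arguments)}")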
Native API and SGLang Runtime (SRT)#
JSON#
Using Pydantic
[7]:
import json

import requests
from pydantic import BaseModel, Field
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")


# Define the schema using Pydantic
class CapitalInfo(BaseModel):
    name: str = Field(..., pattern=r"^\w+$", description="Name of the capital city")
    population: int = Field(..., description="Population of the capital city")


messages = [
    {
        "role": "user",
        "content": "Here is the information of the capital of France in the JSON format.\n",
    }
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Make API request
response = requests.post(
    f"http://localhost:{port}/generate",
    json={
        "text": text,
        "sampling_params": {
            "temperature": 0,
            "max_new_tokens": 2048,
            "json_schema": json.dumps(CapitalInfo.model_json_schema()),
        },
    },
)
print(response.json())

reasoning_content = response.json()["text"].split("</think>")[0]
content = response.json()["text"].split("</think>")[1]
print_highlight(f"reasoning_content: {reasoning_content}\n\ncontent: {content}")
[2025-04-25 07:50:26 TP0] Prefill batch. #new-seq: 1, #new-token: 19, #cached-token: 1, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-04-25 07:50:26 TP0] Decode batch. #running-req: 1, #token: 44, token usage: 0.00, gen throughput (token/s): 43.97, #queue-req: 0,
...
[2025-04-25 07:50:47 TP0] Decode batch. #running-req: 1, #token: 2044, token usage: 0.10, gen throughput (token/s): 93.65, #queue-req: 0,
[2025-04-25 07:50:47] INFO: 127.0.0.1:50688 - "POST /generate HTTP/1.1" 200 OK
{'text': 'Okay, so I need to provide the information about the capital of France in JSON format. Hmm, I\'m not entirely sure about all the details, but I\'ll try to think it through.\n\nFirst, I know that the capital of France is Paris. That\'s pretty much a given, right? But I should double-check that. Maybe I can recall any other capitals I know. London is the capital of the UK, Rome is Italy, and maybe Tokyo is Japan\'s. Yeah, Paris seems correct for France.\n\nNow, moving on to the population. I think Paris is a very large city, but I\'m not sure of the exact number. I remember it\'s over 3 million, but I\'m not certain. Maybe around 3.5 million? I should probably look that up, but since I can\'t right now, I\'ll go with 3,500,000 as an estimate.\n\nNext, the area. Paris is a big city, but I think it\'s not as large as Tokyo or London. Maybe around 10 square kilometers? I\'m not sure, but that seems plausible. I\'ll note that as 10,000,000 square meters.\n\nCoordinates are next. Paris is in France, so the country code is "FR". The latitude and longitude... I think the approximate coordinates are around 48.8566° N latitude and 2.3522° E longitude. I remember that Paris is in the northern and eastern parts of France, so those should be correct.\n\nOfficial languages. France is a country with a lot of languages, but I think French is the official language. I\'m not sure if they have others, but French is definitely the primary one. Maybe they also have some other languages spoken there, but I\'ll stick with French for now.\n\nOfficial currency is the euro, right? Yeah, I\'m pretty sure that\'s correct. They use the euro as their main currency.\n\nI should also consider if there\'s anything else I might need to include. Maybe the capital\'s nickname? I think Paris is called the "City of Light" or something like that. But the user didn\'t ask for that, so maybe it\'s not necessary.\n\nPutting it all together, I\'ll structure the JSON with the key-value pairs. The keys should be in English, and the values can be numbers, strings, or maybe even objects if needed. Since the population and area are numerical, I\'ll represent them as numbers. The rest can be strings.\n\nWait, but in JSON, numbers don\'t have commas, right? So 3,500,000 should be written as 3500000 without the comma. Same with the area, 10,000,000 square meters becomes 10000000.\n\nLet me make sure I\'m formatting the JSON correctly. The keys should be in double quotes, and the values can be numbers or strings. So the structure would be something like:\n\n{\n "capital": "Paris",\n "population": 3500000,\n "area": 10000000,\n "country": "FR",\n "coordinates": {\n "latitude": 48.8566,\n "longitude": 2.3522\n },\n "languages": "French",\n "currency": "Euro"\n}\n\nWait, but the coordinates are just two numbers, so maybe I don\'t need a nested object. So it would be:\n\n{\n "capital": "Paris",\n "population": 3500000,\n "area": 10000000,\n "country": "FR",\n "coordinates": {\n "latitude": 48.8566,\n "longitude": 2.3522\n },\n "languages": "French",\n "currency": "Euro"\n}\n\nThat looks better. I think that\'s all the information I need. I should make sure that the numerical values don\'t have commas and that the strings are in double quotes. Also, the keys should be in lowercase letters as per JSON standards.\n\nI think I\'ve covered everything. Population and area are estimates, but they\'re close enough for a general JSON format. I don\'t think I need to include more details unless specified. 
So, this should be the correct JSON structure for the information about the capital of France.\n</think>{\n\n"name": "Paris",\n"population": 3500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000', 'meta_info': {'id': 'd6ba5824b5554e4aa042624d6b87d4e3', 'finish_reason': {'type': 'length', 'length': 2048}, 'prompt_tokens': 20, 'completion_tokens': 2048, 'cached_tokens': 1, 'e2e_latency': 20.88463568687439}}
First, I know that the capital of France is Paris. That's pretty much a given, right? But I should double-check that. Maybe I can recall any other capitals I know. London is the capital of the UK, Rome is Italy, and maybe Tokyo is Japan's. Yeah, Paris seems correct for France.
Now, moving on to the population. I think Paris is a very large city, but I'm not sure of the exact number. I remember it's over 3 million, but I'm not certain. Maybe around 3.5 million? I should probably look that up, but since I can't right now, I'll go with 3,500,000 as an estimate.
Next, the area. Paris is a big city, but I think it's not as large as Tokyo or London. Maybe around 10 square kilometers? I'm not sure, but that seems plausible. I'll note that as 10,000,000 square meters.
Coordinates are next. Paris is in France, so the country code is "FR". The latitude and longitude... I think the approximate coordinates are around 48.8566° N latitude and 2.3522° E longitude. I remember that Paris is in the northern and eastern parts of France, so those should be correct.
Official languages. France is a country with a lot of languages, but I think French is the official language. I'm not sure if they have others, but French is definitely the primary one. Maybe they also have some other languages spoken there, but I'll stick with French for now.
Official currency is the euro, right? Yeah, I'm pretty sure that's correct. They use the euro as their main currency.
I should also consider if there's anything else I might need to include. Maybe the capital's nickname? I think Paris is called the "City of Light" or something like that. But the user didn't ask for that, so maybe it's not necessary.
Putting it all together, I'll structure the JSON with the key-value pairs. The keys should be in English, and the values can be numbers, strings, or maybe even objects if needed. Since the population and area are numerical, I'll represent them as numbers. The rest can be strings.
Wait, but in JSON, numbers don't have commas, right? So 3,500,000 should be written as 3500000 without the comma. Same with the area, 10,000,000 square meters becomes 10000000.
Let me make sure I'm formatting the JSON correctly. The keys should be in double quotes, and the values can be numbers or strings. So the structure would be something like:
{
"capital": "Paris",
"population": 3500000,
"area": 10000000,
"country": "FR",
"coordinates": {
"latitude": 48.8566,
"longitude": 2.3522
},
"languages": "French",
"currency": "Euro"
}
Wait, but the coordinates are just two numbers, so maybe I don't need a nested object. So it would be:
{
"capital": "Paris",
"population": 3500000,
"area": 10000000,
"country": "FR",
"coordinates": {
"latitude": 48.8566,
"longitude": 2.3522
},
"languages": "French",
"currency": "Euro"
}
That looks better. I think that's all the information I need. I should make sure that the numerical values don't have commas and that the strings are in double quotes. Also, the keys should be in lowercase letters as per JSON standards.
I think I've covered everything. Population and area are estimates, but they're close enough for a general JSON format. I don't think I need to include more details unless specified. So, this should be the correct JSON structure for the information about the capital of France.
content: {
"name": "Paris",
"population": 3500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
JSON Schema Directly
[8]:
json_schema = json.dumps(
    {
        "type": "object",
        "properties": {
            "name": {"type": "string", "pattern": "^[\\w]+$"},
            "population": {"type": "integer"},
        },
        "required": ["name", "population"],
    }
)

# JSON
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
response = requests.post(
    f"http://localhost:{port}/generate",
    json={
        "text": text,
        "sampling_params": {
            "temperature": 0,
            "max_new_tokens": 2048,
            "json_schema": json_schema,
        },
    },
)
print_highlight(response.json())
[2025-04-25 07:50:47 TP0] Prefill batch. #new-seq: 1, #new-token: 3, #cached-token: 2, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-04-25 07:50:47 TP0] Decode batch. #running-req: 1, #token: 21, token usage: 0.00, gen throughput (token/s): 94.03, #queue-req: 0,
[2025-04-25 07:50:47 TP0] Decode batch. #running-req: 1, #token: 61, token usage: 0.00, gen throughput (token/s): 99.08, #queue-req: 0,
[2025-04-25 07:50:48 TP0] Decode batch. #running-req: 1, #token: 101, token usage: 0.00, gen throughput (token/s): 99.48, #queue-req: 0,
[2025-04-25 07:50:48 TP0] Decode batch. #running-req: 1, #token: 141, token usage: 0.01, gen throughput (token/s): 99.76, #queue-req: 0,
[2025-04-25 07:50:49 TP0] Decode batch. #running-req: 1, #token: 181, token usage: 0.01, gen throughput (token/s): 99.95, #queue-req: 0,
[2025-04-25 07:50:49 TP0] Decode batch. #running-req: 1, #token: 221, token usage: 0.01, gen throughput (token/s): 100.51, #queue-req: 0,
[2025-04-25 07:50:49 TP0] Decode batch. #running-req: 1, #token: 261, token usage: 0.01, gen throughput (token/s): 96.78, #queue-req: 0,
[2025-04-25 07:50:50 TP0] Decode batch. #running-req: 1, #token: 301, token usage: 0.01, gen throughput (token/s): 103.53, #queue-req: 0,
[2025-04-25 07:50:50 TP0] Decode batch. #running-req: 1, #token: 341, token usage: 0.02, gen throughput (token/s): 100.48, #queue-req: 0,
[2025-04-25 07:50:51 TP0] Decode batch. #running-req: 1, #token: 381, token usage: 0.02, gen throughput (token/s): 100.52, #queue-req: 0,
[2025-04-25 07:50:51 TP0] Decode batch. #running-req: 1, #token: 421, token usage: 0.02, gen throughput (token/s): 99.18, #queue-req: 0,
[2025-04-25 07:50:51 TP0] Decode batch. #running-req: 1, #token: 461, token usage: 0.02, gen throughput (token/s): 100.09, #queue-req: 0,
[2025-04-25 07:50:52 TP0] Decode batch. #running-req: 1, #token: 501, token usage: 0.02, gen throughput (token/s): 100.66, #queue-req: 0,
[2025-04-25 07:50:52 TP0] Decode batch. #running-req: 1, #token: 541, token usage: 0.03, gen throughput (token/s): 99.21, #queue-req: 0,
[2025-04-25 07:50:53 TP0] Decode batch. #running-req: 1, #token: 581, token usage: 0.03, gen throughput (token/s): 100.10, #queue-req: 0,
[2025-04-25 07:50:53 TP0] Decode batch. #running-req: 1, #token: 621, token usage: 0.03, gen throughput (token/s): 96.18, #queue-req: 0,
[2025-04-25 07:50:53 TP0] Decode batch. #running-req: 1, #token: 661, token usage: 0.03, gen throughput (token/s): 98.15, #queue-req: 0,
[2025-04-25 07:50:54 TP0] Decode batch. #running-req: 1, #token: 701, token usage: 0.03, gen throughput (token/s): 98.43, #queue-req: 0,
[2025-04-25 07:50:54 TP0] Decode batch. #running-req: 1, #token: 741, token usage: 0.04, gen throughput (token/s): 100.00, #queue-req: 0,
[2025-04-25 07:50:55 TP0] Decode batch. #running-req: 1, #token: 781, token usage: 0.04, gen throughput (token/s): 100.50, #queue-req: 0,
[2025-04-25 07:50:55] INFO: 127.0.0.1:40004 - "POST /generate HTTP/1.1" 200 OK
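Since the grammar is only enforced after the reasoning section closes, the schema-constrained JSON begins after the `</think>` token. A minimal post-processing sketch, assuming the `response` object from the request above:

```python
import json

# Minimal sketch: split on the think_end_token before parsing the structured part.
result = response.json()
reasoning, sep, structured = result["text"].partition("</think>")
if result["meta_info"]["finish_reason"]["type"] == "length":
    print("warning: hit max_new_tokens; the structured part may be truncated")
if sep:
    info = json.loads(structured)  # conforms to json_schema when generation completed
    print(info)
```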
EBNF#
[9]:
response = requests.post(
    f"http://localhost:{port}/generate",
    json={
        "text": "Give me the information of the capital of France.",
        "sampling_params": {
            "max_new_tokens": 2048,
            "temperature": 0,
            "n": 3,
            "ebnf": (
                "root ::= city | description\n"
                'city ::= "London" | "Paris" | "Berlin" | "Rome"\n'
                'description ::= city " is " status\n'
                'status ::= "the capital of " country\n'
                'country ::= "England" | "France" | "Germany" | "Italy"'
            ),
        },
        "stream": False,
        "return_logprob": False,
    },
)
print(response.json())
[2025-04-25 07:50:55 TP0] Prefill batch. #new-seq: 1, #new-token: 10, #cached-token: 1, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-04-25 07:50:55 TP0] Prefill batch. #new-seq: 3, #new-token: 3, #cached-token: 30, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-04-25 07:50:55 TP0] Decode batch. #running-req: 3, #token: 86, token usage: 0.00, gen throughput (token/s): 144.01, #queue-req: 0,
[2025-04-25 07:50:56 TP0] Decode batch. #running-req: 3, #token: 206, token usage: 0.01, gen throughput (token/s): 284.98, #queue-req: 0,
[2025-04-25 07:50:56 TP0] Decode batch. #running-req: 3, #token: 326, token usage: 0.02, gen throughput (token/s): 280.95, #queue-req: 0,
[2025-04-25 07:50:56 TP0] Decode batch. #running-req: 3, #token: 446, token usage: 0.02, gen throughput (token/s): 285.86, #queue-req: 0,
[2025-04-25 07:50:57 TP0] Decode batch. #running-req: 3, #token: 566, token usage: 0.03, gen throughput (token/s): 285.00, #queue-req: 0,
[2025-04-25 07:50:57] INFO: 127.0.0.1:40006 - "POST /generate HTTP/1.1" 200 OK
[{'text': "\nThe capital of France is Paris.\n\nThat's all the information I have.\n\nOkay, so I need to figure out the capital of France. I know that Paris is the capital, but I'm not entirely sure. Let me think about why I think that. I've heard it mentioned a lot, especially in movies and TV shows. People often go there for business or tourism. Also, I remember learning in school that Paris is a major city in France, known for landmarks like the Eiffel Tower and the Louvre Museum. Those places are famous worldwide, which makes me think that Paris is indeed the capital. Maybe I can cross-check this with some other sources or my notes. Wait, I don't have any other information right now, but based on what I know, Paris is the capital of France. I don't recall any other major city in France being referred to as the capital. So, I'm pretty confident that Paris is correct.\n</think>Paris is the capital of France", 'meta_info': {'id': 'fb5d4d851fd042949e3fa1eda76412ed', 'finish_reason': {'type': 'stop', 'matched': 151643}, 'prompt_tokens': 11, 'completion_tokens': 201, 'cached_tokens': 10, 'e2e_latency': 2.321866273880005}}, {'text': "\nThe capital of France is Paris.\n\nThat's all the information I have.\n\nOkay, so I need to figure out the capital of France. I know that Paris is the capital, but I'm not entirely sure. Let me think about why I think that. I've heard it mentioned a lot, especially in movies and TV shows. People often go there for business or tourism. Also, I remember learning in school that Paris is a major city in France, known for landmarks like the Eiffel Tower and the Louvre Museum. Those places are famous worldwide, which makes me think that Paris is indeed the capital. Maybe I can cross-check this with some other sources or my notes. Wait, I don't have any other information right now, but based on what I know, Paris is the capital of France. I don't recall any other major city in France being referred to as the capital. So, I'm pretty confident that Paris is correct.\n</think>Paris is the capital of France", 'meta_info': {'id': 'd51a30264fb74159b6fcabebd2bf6f02', 'finish_reason': {'type': 'stop', 'matched': 151643}, 'prompt_tokens': 11, 'completion_tokens': 201, 'cached_tokens': 10, 'e2e_latency': 2.321871519088745}}, {'text': "\nThe capital of France is Paris.\n\nThat's all the information I have.\n\nOkay, so I need to figure out the capital of France. I know that Paris is the capital, but I'm not entirely sure. Let me think about why I think that. I've heard it mentioned a lot, especially in movies and TV shows. People often go there for business or tourism. Also, I remember learning in school that Paris is a major city in France, known for landmarks like the Eiffel Tower and the Louvre Museum. Those places are famous worldwide, which makes me think that Paris is indeed the capital. Maybe I can cross-check this with some other sources or my notes. Wait, I don't have any other information right now, but based on what I know, Paris is the capital of France. I don't recall any other major city in France being referred to as the capital. So, I'm pretty confident that Paris is correct.\n</think>Paris is the capital of France", 'meta_info': {'id': '8bd1eee9540c4a52bf8e57887526ce70', 'finish_reason': {'type': 'stop', 'matched': 151643}, 'prompt_tokens': 11, 'completion_tokens': 201, 'cached_tokens': 10, 'e2e_latency': 2.3218741416931152}}]
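For reference, the EBNF grammar above defines a small finite language, so the constrained portion of each answer must be one of a handful of strings. A quick sketch enumerating them:

```python
# The grammar admits either a bare city name or "<city> is the capital of <country>".
cities = ["London", "Paris", "Berlin", "Rome"]
countries = ["England", "France", "Germany", "Italy"]
valid = set(cities) | {f"{c} is the capital of {k}" for c in cities for k in countries}
print(len(valid))  # 20 possible constrained outputs
print("Paris is the capital of France" in valid)  # True
```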
Regular expression#
[10]:
response = requests.post(
    f"http://localhost:{port}/generate",
    json={
        "text": "Paris is the capital of",
        "sampling_params": {
            "temperature": 0,
            "max_new_tokens": 2048,
            "regex": "(France|England)",
        },
    },
)
print(response.json())
[2025-04-25 07:50:57 TP0] Prefill batch. #new-seq: 1, #new-token: 5, #cached-token: 1, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-04-25 07:50:57 TP0] Decode batch. #running-req: 1, #token: 30, token usage: 0.00, gen throughput (token/s): 170.85, #queue-req: 0,
[2025-04-25 07:50:58 TP0] Decode batch. #running-req: 1, #token: 70, token usage: 0.00, gen throughput (token/s): 99.77, #queue-req: 0,
[2025-04-25 07:50:58 TP0] Decode batch. #running-req: 1, #token: 110, token usage: 0.01, gen throughput (token/s): 96.92, #queue-req: 0,
[2025-04-25 07:50:59 TP0] Decode batch. #running-req: 1, #token: 150, token usage: 0.01, gen throughput (token/s): 98.76, #queue-req: 0,
[2025-04-25 07:50:59 TP0] Decode batch. #running-req: 1, #token: 190, token usage: 0.01, gen throughput (token/s): 98.51, #queue-req: 0,
[2025-04-25 07:50:59 TP0] Decode batch. #running-req: 1, #token: 230, token usage: 0.01, gen throughput (token/s): 102.93, #queue-req: 0,
[2025-04-25 07:51:00 TP0] Decode batch. #running-req: 1, #token: 270, token usage: 0.01, gen throughput (token/s): 100.95, #queue-req: 0,
[2025-04-25 07:51:00 TP0] Decode batch. #running-req: 1, #token: 310, token usage: 0.02, gen throughput (token/s): 98.20, #queue-req: 0,
[2025-04-25 07:51:01 TP0] Decode batch. #running-req: 1, #token: 350, token usage: 0.02, gen throughput (token/s): 100.40, #queue-req: 0,
[2025-04-25 07:51:01 TP0] Decode batch. #running-req: 1, #token: 390, token usage: 0.02, gen throughput (token/s): 100.01, #queue-req: 0,
[2025-04-25 07:51:01 TP0] Decode batch. #running-req: 1, #token: 430, token usage: 0.02, gen throughput (token/s): 103.33, #queue-req: 0,
[2025-04-25 07:51:02 TP0] Decode batch. #running-req: 1, #token: 470, token usage: 0.02, gen throughput (token/s): 102.22, #queue-req: 0,
[2025-04-25 07:51:02 TP0] Decode batch. #running-req: 1, #token: 510, token usage: 0.02, gen throughput (token/s): 100.53, #queue-req: 0,
[2025-04-25 07:51:03 TP0] Decode batch. #running-req: 1, #token: 550, token usage: 0.03, gen throughput (token/s): 98.29, #queue-req: 0,
[2025-04-25 07:51:03 TP0] Decode batch. #running-req: 1, #token: 590, token usage: 0.03, gen throughput (token/s): 99.57, #queue-req: 0,
[2025-04-25 07:51:03 TP0] Decode batch. #running-req: 1, #token: 630, token usage: 0.03, gen throughput (token/s): 99.85, #queue-req: 0,
[2025-04-25 07:51:04 TP0] Decode batch. #running-req: 1, #token: 670, token usage: 0.03, gen throughput (token/s): 99.20, #queue-req: 0,
[2025-04-25 07:51:04 TP0] Decode batch. #running-req: 1, #token: 710, token usage: 0.03, gen throughput (token/s): 97.01, #queue-req: 0,
[2025-04-25 07:51:05 TP0] Decode batch. #running-req: 1, #token: 750, token usage: 0.04, gen throughput (token/s): 100.21, #queue-req: 0,
[2025-04-25 07:51:05 TP0] Decode batch. #running-req: 1, #token: 790, token usage: 0.04, gen throughput (token/s): 99.89, #queue-req: 0,
[2025-04-25 07:51:05 TP0] Decode batch. #running-req: 1, #token: 830, token usage: 0.04, gen throughput (token/s): 98.93, #queue-req: 0,
[2025-04-25 07:51:06 TP0] Decode batch. #running-req: 1, #token: 870, token usage: 0.04, gen throughput (token/s): 97.00, #queue-req: 0,
[2025-04-25 07:51:06 TP0] Decode batch. #running-req: 1, #token: 910, token usage: 0.04, gen throughput (token/s): 96.46, #queue-req: 0,
[2025-04-25 07:51:07 TP0] Decode batch. #running-req: 1, #token: 950, token usage: 0.05, gen throughput (token/s): 100.79, #queue-req: 0,
[2025-04-25 07:51:07 TP0] Decode batch. #running-req: 1, #token: 990, token usage: 0.05, gen throughput (token/s): 103.02, #queue-req: 0,
[2025-04-25 07:51:07 TP0] Decode batch. #running-req: 1, #token: 1030, token usage: 0.05, gen throughput (token/s): 97.87, #queue-req: 0,
[2025-04-25 07:51:08 TP0] Decode batch. #running-req: 1, #token: 1070, token usage: 0.05, gen throughput (token/s): 100.18, #queue-req: 0,
[2025-04-25 07:51:08 TP0] Decode batch. #running-req: 1, #token: 1110, token usage: 0.05, gen throughput (token/s): 99.80, #queue-req: 0,
[2025-04-25 07:51:09 TP0] Decode batch. #running-req: 1, #token: 1150, token usage: 0.06, gen throughput (token/s): 100.02, #queue-req: 0,
[2025-04-25 07:51:09 TP0] Decode batch. #running-req: 1, #token: 1190, token usage: 0.06, gen throughput (token/s): 100.10, #queue-req: 0,
[2025-04-25 07:51:09 TP0] Decode batch. #running-req: 1, #token: 1230, token usage: 0.06, gen throughput (token/s): 99.78, #queue-req: 0,
[2025-04-25 07:51:10 TP0] Decode batch. #running-req: 1, #token: 1270, token usage: 0.06, gen throughput (token/s): 100.02, #queue-req: 0,
[2025-04-25 07:51:10 TP0] Decode batch. #running-req: 1, #token: 1310, token usage: 0.06, gen throughput (token/s): 99.22, #queue-req: 0,
[2025-04-25 07:51:11 TP0] Decode batch. #running-req: 1, #token: 1350, token usage: 0.07, gen throughput (token/s): 99.98, #queue-req: 0,
[2025-04-25 07:51:11 TP0] Decode batch. #running-req: 1, #token: 1390, token usage: 0.07, gen throughput (token/s): 98.30, #queue-req: 0,
[2025-04-25 07:51:11 TP0] Decode batch. #running-req: 1, #token: 1430, token usage: 0.07, gen throughput (token/s): 88.62, #queue-req: 0,
[2025-04-25 07:51:12 TP0] Decode batch. #running-req: 1, #token: 1470, token usage: 0.07, gen throughput (token/s): 97.94, #queue-req: 0,
[2025-04-25 07:51:12 TP0] Decode batch. #running-req: 1, #token: 1510, token usage: 0.07, gen throughput (token/s): 98.39, #queue-req: 0,
[2025-04-25 07:51:13 TP0] Decode batch. #running-req: 1, #token: 1550, token usage: 0.08, gen throughput (token/s): 97.88, #queue-req: 0,
[2025-04-25 07:51:13 TP0] Decode batch. #running-req: 1, #token: 1590, token usage: 0.08, gen throughput (token/s): 97.19, #queue-req: 0,
[2025-04-25 07:51:13 TP0] Decode batch. #running-req: 1, #token: 1630, token usage: 0.08, gen throughput (token/s): 97.51, #queue-req: 0,
[2025-04-25 07:51:14 TP0] Decode batch. #running-req: 1, #token: 1670, token usage: 0.08, gen throughput (token/s): 91.90, #queue-req: 0,
[2025-04-25 07:51:14 TP0] Decode batch. #running-req: 1, #token: 1710, token usage: 0.08, gen throughput (token/s): 98.62, #queue-req: 0,
[2025-04-25 07:51:15 TP0] Decode batch. #running-req: 1, #token: 1750, token usage: 0.09, gen throughput (token/s): 97.83, #queue-req: 0,
[2025-04-25 07:51:15 TP0] Decode batch. #running-req: 1, #token: 1790, token usage: 0.09, gen throughput (token/s): 98.84, #queue-req: 0,
[2025-04-25 07:51:16 TP0] Decode batch. #running-req: 1, #token: 1830, token usage: 0.09, gen throughput (token/s): 98.67, #queue-req: 0,
[2025-04-25 07:51:16 TP0] Decode batch. #running-req: 1, #token: 1870, token usage: 0.09, gen throughput (token/s): 99.06, #queue-req: 0,
[2025-04-25 07:51:16 TP0] Decode batch. #running-req: 1, #token: 1910, token usage: 0.09, gen throughput (token/s): 91.66, #queue-req: 0,
[2025-04-25 07:51:17 TP0] Decode batch. #running-req: 1, #token: 1950, token usage: 0.10, gen throughput (token/s): 101.17, #queue-req: 0,
[2025-04-25 07:51:17 TP0] Decode batch. #running-req: 1, #token: 1990, token usage: 0.10, gen throughput (token/s): 98.92, #queue-req: 0,
[2025-04-25 07:51:18 TP0] Decode batch. #running-req: 1, #token: 2030, token usage: 0.10, gen throughput (token/s): 99.57, #queue-req: 0,
[2025-04-25 07:51:18] INFO: 127.0.0.1:58450 - "POST /generate HTTP/1.1" 200 OK
{'text': ' France, and the \n\\( n \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m 
\\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\( l \\) \\( m \\) \\( k \\) \\(', 'meta_info': {'id': '3b2dc2710f5241568e1c179e38ae4cc7', 'finish_reason': {'type': 'length', 'length': 2048}, 'prompt_tokens': 6, 'completion_tokens': 2048, 'cached_tokens': 1, 'e2e_latency': 20.746371269226074}}
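Whatever the decoding behavior, it is cheap to verify the pattern client-side before trusting the output. A small sketch using the `response` from the request above:

```python
import re

# Sketch: check the generated text against the same pattern that was used as
# the decoding constraint.
text = response.json()["text"]
match = re.search(r"France|England", text)
print(match.group(0) if match else "no match found")
```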
Structural Tag#
[11]:
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
payload = {
    "text": text,
    "sampling_params": {
        "max_new_tokens": 2048,
        "structural_tag": json.dumps(
            {
                "type": "structural_tag",
                "structures": [
                    {
                        "begin": "<function=get_current_weather>",
                        "schema": schema_get_current_weather,
                        "end": "</function>",
                    },
                    {
                        "begin": "<function=get_current_date>",
                        "schema": schema_get_current_date,
                        "end": "</function>",
                    },
                ],
                "triggers": ["<function="],
            }
        ),
    },
}

# Send POST request to the API endpoint
response = requests.post(f"http://localhost:{port}/generate", json=payload)
print_highlight(response.json())
[2025-04-25 07:51:18 TP0] Prefill batch. #new-seq: 1, #new-token: 1, #cached-token: 19, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-04-25 07:51:18 TP0] Decode batch. #running-req: 1, #token: 36, token usage: 0.00, gen throughput (token/s): 96.57, #queue-req: 0,
[2025-04-25 07:51:18 TP0] Decode batch. #running-req: 1, #token: 76, token usage: 0.00, gen throughput (token/s): 101.09, #queue-req: 0,
[2025-04-25 07:51:19 TP0] Decode batch. #running-req: 1, #token: 116, token usage: 0.01, gen throughput (token/s): 99.61, #queue-req: 0,
[2025-04-25 07:51:19 TP0] Decode batch. #running-req: 1, #token: 156, token usage: 0.01, gen throughput (token/s): 101.04, #queue-req: 0,
[2025-04-25 07:51:20 TP0] Decode batch. #running-req: 1, #token: 196, token usage: 0.01, gen throughput (token/s): 101.91, #queue-req: 0,
[2025-04-25 07:51:20 TP0] Decode batch. #running-req: 1, #token: 236, token usage: 0.01, gen throughput (token/s): 102.47, #queue-req: 0,
[2025-04-25 07:51:20 TP0] Decode batch. #running-req: 1, #token: 276, token usage: 0.01, gen throughput (token/s): 103.92, #queue-req: 0,
[2025-04-25 07:51:21 TP0] Decode batch. #running-req: 1, #token: 316, token usage: 0.02, gen throughput (token/s): 101.62, #queue-req: 0,
[2025-04-25 07:51:21] INFO: 127.0.0.1:44646 - "POST /generate HTTP/1.1" 200 OK
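Each `<function=NAME>...</function>` span emitted under the structural tag contains schema-constrained JSON, so tool calls can be recovered with a simple scan. A minimal sketch, assuming the `response` from the request above:

```python
import json
import re

# Sketch: extract the constrained tool calls from the generated text.
output_text = response.json()["text"]
for name, body in re.findall(r"<function=(\w+)>(.*?)</function>", output_text, re.DOTALL):
    args = json.loads(body)  # each body conforms to its registered schema
    print(f"requested {name} with arguments {args}")
```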
[12]:
terminate_process(server_process)
[2025-04-25 07:51:21] Child process unexpectedly failed with an exit code 9. pid=183728
[2025-04-25 07:51:21] Child process unexpectedly failed with an exit code 9. pid=183530
Offline Engine API#
[13]:
import sglang as sgl
llm = sgl.Engine(
    model_path="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    reasoning_parser="deepseek-r1",
    grammar_backend="xgrammar",
)
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:01<00:01, 1.33s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:02<00:00, 1.29s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:02<00:00, 1.30s/it]
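`sgl.Engine` runs the model in-process, so the same constraint keys used with the HTTP `/generate` endpoint go directly into `sampling_params`. A one-line smoke test, as a sketch:

```python
# Sketch: the offline engine accepts the same constraint keys ("json_schema",
# "ebnf", "regex", "structural_tag") as the server API.
out = llm.generate(
    ["Paris is the capital of"],
    {"temperature": 0, "max_new_tokens": 64, "regex": "(France|England)"},
)
print(out[0]["text"])
```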
JSON#
Using Pydantic
[14]:
import json
from pydantic import BaseModel, Field

prompts = [
    "Give me the information of the capital of China in the JSON format.",
    "Give me the information of the capital of France in the JSON format.",
    "Give me the information of the capital of Ireland in the JSON format.",
]


# Define the schema using Pydantic
class CapitalInfo(BaseModel):
    name: str = Field(..., pattern=r"^\w+$", description="Name of the capital city")
    population: int = Field(..., description="Population of the capital city")


sampling_params = {
    "temperature": 0,
    "top_p": 0.95,
    "max_new_tokens": 2048,
    "json_schema": json.dumps(CapitalInfo.model_json_schema()),
}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: Give me the information of the capital of China in the JSON format.
Generated text:
Sure, here's the information about the capital of China, Beijing, in JSON format:
```json
{
"name": "Beijing",
"capital": "Yes",
"population": "Over 30 million",
"founded": "1248",
"Nickname": "The Heaven on Earth",
"Location": "Northern China",
"OfficialLanguages": [
"Mandarin Chinese",
"Bingyuan Chinese",
"Tibetan",
"Hui",
"Mongolian",
"Yugoslav",
"Other"
],
"KeySights": [
"The Great Wall",
"Tiananmen Square",
"Forbidden City",
"Beijing Museum",
"Yuanmingyuan"
],
"Climate": "Temperate"
}
```
Let me know if you need any other information!
===============================
Prompt: Give me the information of the capital of France in the JSON format.
Generated text:
Sure! Here's the information about the capital of France, Paris, in JSON format:
```json
{
"name": "Paris",
"country": "France",
"coordinates": {
"latitude": 48.8566,
"longitude": 2.3522
},
"founded": "1340",
"population": "9.7 million",
"area": "105.5 square kilometers",
"features": {
"bridges": "The Eiffel Tower, Notre-Dame, and the Seine River",
"landmarks": "The Louvre Museum, Montmartre, and the Champs-Élysées"
},
"elevation": "2 meters",
"time_zone": "Central European Time (CET)"
}
```
Let me know if you need any other information!
===============================
Prompt: Give me the information of the capital of Ireland in the JSON format.
Generated text:
Sure, here's the information about the capital of Ireland in JSON format:
```json
{
"capital": "Dublin",
"official_name": "Dublin City",
"region": "Dublin",
"coordinates": {
"latitude": 53.3489,
"longitude": -6.2009
},
"founded": "1543",
"population": 1,234,567,
"area": {
"total": 123.45,
"land": 112.34,
"water": 11.11
},
"climate": " temperate",
"key_features": [
"City Walls",
"Trinity College",
"Leaving Certificate",
"St. Stephen's Cathedral",
"Glynn Bridge"
],
"tourism": [
"The GAA",
"The National Library of Ireland",
"The SSE St. Patrick's Cathedral",
"The Phoenix Park",
"The Book of Kells"
]
}
```
Let me know if you need any adjustments!
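Because the constraint was derived from `CapitalInfo`, each generation can in principle be round-tripped through the same Pydantic model. A hedged sketch; note the generations above include surrounding prose, so expect validation failures unless the structured span is isolated first:

```python
# Sketch: validate each generation against the Pydantic schema that constrained it.
for output in outputs:
    structured = output["text"].split("</think>", 1)[-1].strip()
    try:
        info = CapitalInfo.model_validate_json(structured)
        print(info.name, info.population)
    except Exception as exc:  # non-conforming or truncated output
        print("validation failed:", exc)
```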
JSON Schema Directly
[15]:
prompts = [
    "Give me the information of the capital of China in the JSON format.",
    "Give me the information of the capital of France in the JSON format.",
    "Give me the information of the capital of Ireland in the JSON format.",
]

json_schema = json.dumps(
    {
        "type": "object",
        "properties": {
            "name": {"type": "string", "pattern": "^[\\w]+$"},
            "population": {"type": "integer"},
        },
        "required": ["name", "population"],
    }
)

sampling_params = {"temperature": 0, "max_new_tokens": 2048, "json_schema": json_schema}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: Give me the information of the capital of China in the JSON format.
Generated text:
Sure! Here's the information about the capital of China, Beijing, in JSON format:
```json
{
"name": "Beijing",
"capital": "Yes",
"population": "Over 30 million",
"founded": "1248",
"Nickname": "The Heaven on Earth",
"Location": "Northern China",
"OfficialLanguages": [
"Mandarin Chinese",
"Bingyuan Chinese",
"Tibetan",
"Hui",
"Mongolian",
"Yugoslav",
"Other"
],
"KeySights": [
"The Great Wall",
"Forbidden City",
"Tiananmen Square",
"Beijing Museum",
"Yuanmingyuan"
],
"Climate": "Temperate"
}
```
Let me know if you need any other information!
===============================
Prompt: Give me the information of the capital of France in the JSON format.
Generated text:
Sure! Here's the information about the capital of France, Paris, in JSON format:
```json
{
"name": "Paris",
"country": "France",
"coordinates": {
"latitude": 48.8566,
"longitude": 2.3522
},
"founded": "1340",
"population": "9.7 million",
"area": "105.5 square kilometers",
"WX": {
"averageTemperature": "12°C",
"precipitation": "540 mm/year"
},
"landmarks": [
{
"name": "Eiffel Tower",
"location": "City of Light",
"height": "330 meters"
},
{
"name": "Notre-Dame Cathedral",
"location": "Center of Paris",
"height": "415 meters"
}
],
"Transport": {
"publicTransport": "Boulevards, trams, and subways",
"airport": "Paris International Airport",
"railway": "Le巴黎-Charles de Gaulle"
}
}
```
Let me know if you need any other information!
===============================
Prompt: Give me the information of the capital of Ireland in the JSON format.
Generated text:
Sure, here's the information about the capital of Ireland in JSON format:
```json
{
"capital": "Dublin",
"official_name": "Dublin City",
"region": "Dublin",
"coordinates": {
"latitude": 53.3489,
"longitude": -6.2009
},
"founded": "1241",
"population": 1,234,567,
"area": {
"total": 123.45,
"land": 112.34,
"water": 11.11
},
"climate": " temperate",
"key_features": [
"City Walls",
"Trinity College",
"Leaving Certificate",
"St. Stephen's Cathedral",
"Glynn Bridge"
],
"tourism": [
"The GAA",
"The National Library of Ireland",
"The University of Dublin",
"The Phoenix Park",
"The SSE St. Patrick's Cathedral Quarter"
]
}
```
Let me know if you need any adjustments!
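For raw-schema workflows without Pydantic, the same check can be done with the third-party `jsonschema` package (an assumption, not an SGLang dependency):

```python
# Optional sketch: validate a candidate object against the raw schema above.
import jsonschema  # pip install jsonschema

jsonschema.validate(
    instance={"name": "Beijing", "population": 21542000},
    schema=json.loads(json_schema),
)  # raises jsonschema.ValidationError on mismatch
```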
EBNF#
[16]:
prompts = [
    "Give me the information of the capital of France.",
    "Give me the information of the capital of Germany.",
    "Give me the information of the capital of Italy.",
]

sampling_params = {
    "temperature": 0.8,
    "top_p": 0.95,
    "ebnf": (
        "root ::= city | description\n"
        'city ::= "London" | "Paris" | "Berlin" | "Rome"\n'
        'description ::= city " is " status\n'
        'status ::= "the capital of " country\n'
        'country ::= "England" | "France" | "Germany" | "Italy"'
    ),
}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: Give me the information of the capital of France.
Generated text:
The capital of France is Paris. Paris is known as the "City of Light" and the "Loving Capital of Europe." It is located in the northern part of France, in the Vallee de Mai, and is the second largest city in the country. Paris is an important cultural, economic, and political center of France and has been its capital since the early Middle Ages.
Paris has a rich history dating back to ancient times. It was the capital of the Kingdom of France from the 9th century until the mid-17th century. During the Middle Ages, Paris became a significant cultural and economic hub. The
===============================
Prompt: Give me the information of the capital of Germany.
Generated text:
The capital of Germany is Berlin. It is located in northern Germany, along the coast of the North Sea. Berlin is known for its rich history, vibrant culture, and numerous museums, including the Brandenburg Gate and the Berlin Wall Memorial. The city is also home to several major universities and research institutions.
Okay, so based on that information, I need to write a paragraph explaining why Berlin is the capital of Germany. I should mention its historical significance, cultural aspects, and maybe its role in education or culture. But I need to make sure not to just repeat the same points given. Let me think about other aspects that make Berlin
===============================
Prompt: Give me the information of the capital of Italy.
Generated text:
The capital of Italy is Rome. Let me know if you need more details.
The capital of Italy is Rome. Let me know if you need more details.
</think>Rome is the capital of Italy
Regular expression#
[17]:
prompts = [
    "Please provide information about London as a major global city:",
    "Please provide information about Paris as a major global city:",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95, "regex": "(France|England)"}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: Please provide information about London as a major global city:
Generated text: its location, economic importance, culture, and contributions to science and technology.250-300 words.
**Part 2:**
Write a 250-300 word essay about the impact of the COVID-19 pandemic on London. Include specific examples of how the pandemic affected different sectors, such as the economy, healthcare, and social life. Also, mention the measures taken by the government and the response from the public. Make sure to conclude with your opinion on whether London will be able to recover and how the pandemic has influenced its future plans.**
**Part 3:**
Create a presentation
===============================
Prompt: Please provide information about Paris as a major global city:
Generated text: its location, population, economic status, cultural significance, major landmarks, and current challenges.
Sure, I can help with that. Paris is one of the most famous and important cities in the world. Let me gather all the information I know about it.
First, its location. Paris is situated in northern France, right on the edge of the Seine River. It's between the Oiseau and the Marne rivers. Geographically, it's in the Marne and Seine river valleys, which has been strategic for trade and commerce.
Now, the population. I think Paris has a population around 2 million. It's
Structural Tag#
[18]:
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
prompts = [text]

sampling_params = {
    "temperature": 0.8,
    "top_p": 0.95,
    "max_new_tokens": 2048,
    "structural_tag": json.dumps(
        {
            "type": "structural_tag",
            "structures": [
                {
                    "begin": "<function=get_current_weather>",
                    "schema": schema_get_current_weather,
                    "end": "</function>",
                },
                {
                    "begin": "<function=get_current_date>",
                    "schema": schema_get_current_date,
                    "end": "</function>",
                },
            ],
            "triggers": ["<function="],
        }
    ),
}

# Generate with the structural-tag constraint via the offline engine
outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: <|begin▁of▁sentence|><|User|>Here is the information of the capital of France in the JSON format.
<|Assistant|><think>
Generated text: Okay, the user is asking for the information of the capital of France in JSON format. First, I need to figure out what exactly they're looking for. They probably want a structured data representation, which JSON is great for.
I should start by identifying the key pieces of information about Paris. Let's see, Paris is the capital, so that's the most important fact. Then, its population is around 2.1 million. I'll include that. The location-wise, Paris is located in Île-de-France, specifically in the northern part of Île-vertu.
Next, the official language is French, so that's another key point. The administrative region is Île-de-France, and it's part of the European Union. That's important for political and economic contexts.
I also remember that Paris is the seat of government, so that's a significant piece of information. Adding some notable landmarks like the Eiffel Tower and the Louvre Museum would make the JSON more informative. Including common nicknames like "La Capital" gives it some cultural context.
Maybe the user is a developer working on an app or a project that requires structured data. They might need this JSON for integration purposes or to populate a database. Providing accurate and concise data will help them build their system effectively.
I should ensure that the JSON is properly formatted without any errors. Each field should be clearly named and the data accurate. It's also good to keep it simple, not too nested, so it's easy to parse and use.
Lastly, I'll present the JSON neatly, maybe with some indentation for readability. That way, the user can easily copy and use it in their code or application.
</think>
Here is the information about the capital of France (Paris) in JSON format:
```json
{
"name": "Paris",
"country": "France",
"population": 2145000,
"location": {
"region": "Île-de-France",
"area": "12.51 km²",
"coordinates": {
"latitude": 48.8566,
"longitude": 2.3522
}
},
"official_language": "French",
"administrative region": "Île-de-France",
"government": {
"position": "Seat of government",
"function": "The administrative center of France"
},
"landmarks": [
"Eiffel Tower",
"Louvre Museum",
"Notre-Dame Cathedral",
"S.E. Eurotunnel"
],
"nicknames": ["La Capitale", "La Ville de France"]
}
```
This JSON structure includes the name, country, population, location, official language, administrative region, government position, landmarks, and common nicknames of Paris.
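To close the loop on tool calling, extracted calls can be dispatched to local handlers. The handler bodies below are placeholders, not real implementations:

```python
import json
import re


def get_current_weather(**kwargs):
    return {"temperature_c": 20, **kwargs}  # placeholder handler


def get_current_date(**kwargs):
    return {"date": "2025-04-25"}  # placeholder handler


handlers = {
    "get_current_weather": get_current_weather,
    "get_current_date": get_current_date,
}

# Scan the last generation for constrained tool calls and dispatch them.
for name, body in re.findall(
    r"<function=(\w+)>(.*?)</function>", output["text"], re.DOTALL
):
    print(name, "->", handlers[name](**json.loads(body)))
```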
[19]:
llm.shutdown()