name	streaming-responses
description	Use when streaming tokens incrementally from an LLM via liter-llm over SSE or async iterators. Covers chat_stream, delta handling, and null-content chunks.

Streaming Responses

Use chat_stream(...) to receive tokens as they are produced instead of waiting for the full completion. The proxy streams over SSE; bindings expose async iterators.

Python

import asyncio, os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])
    async for chunk in await client.chat_stream(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Tell me a story"}],
    ):
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(main())

TypeScript

const chunks = await client.chatStream({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Tell me a story" }],
});
for (const chunk of chunks) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Notes

The first and last chunks often carry null content. Always null-check chunk.choices[0].delta.content (Python) or chunk.choices[0]?.delta?.content (TypeScript) before using it.
Tool-call deltas arrive in delta.tool_calls; accumulate function.arguments fragments across chunks before parsing.
Through the proxy, request streaming with "stream": true on /v1/chat/completions.

name	streaming-responses
description	Use when streaming tokens incrementally from an LLM via liter-llm over SSE or async iterators. Covers chat_stream, delta handling, and null-content chunks.

Streaming Responses

Use chat_stream(...) to receive tokens as they are produced instead of waiting for the full completion. The proxy streams over SSE; bindings expose async iterators.

Python

import asyncio, os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])
    async for chunk in await client.chat_stream(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Tell me a story"}],
    ):
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(main())

TypeScript

const chunks = await client.chatStream({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Tell me a story" }],
});
for (const chunk of chunks) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Notes

The first and last chunks often carry null content. Always null-check chunk.choices[0].delta.content (Python) or chunk.choices[0]?.delta?.content (TypeScript) before using it.
Tool-call deltas arrive in delta.tool_calls; accumulate function.arguments fragments across chunks before parsing.
Through the proxy, request streaming with "stream": true on /v1/chat/completions.

streaming-responses

Streaming Responses

Python

TypeScript

Notes

More from this repository

More from this repository

Streaming Responses

Python

TypeScript

Notes