| Field | Value |
|---|---|
| name | liter-llm |
| description | Universal LLM API client for 142+ providers with native bindings for 11 languages. Use when writing code that calls LLM APIs via liter-llm in Python, TypeScript, Rust, Go, Java, C#, Ruby, PHP, Elixir, WASM, or C. Covers chat, streaming, embeddings, image generation, speech, transcription, moderation, reranking, search, OCR, tool calling, and configuration. |
| license | MIT |
| metadata | {"author":"kreuzberg-dev","version":"1.0","repository":"https://github.com/kreuzberg-dev/liter-llm"} |
Liter-LLM Universal LLM Client
Liter-LLM is a universal LLM API client with a Rust core and native bindings for Python, TypeScript/Node.js, Go, Java, C#, Ruby, PHP, Elixir, WebAssembly, and C (FFI). It provides a unified interface to 142+ LLM providers (OpenAI, Anthropic, Google Gemini, Groq, Mistral, Cohere, AWS Bedrock, Azure, and many more) with built-in caching, budgets, rate limiting, hooks, streaming, cost tracking, health checks, and tracing.
Use this skill when writing code that:
- Calls LLM APIs (chat completions, streaming, embeddings) via liter-llm
- Configures liter-llm clients (API keys, timeouts, retries, cache, budget, hooks)
- Uses tool calling / function calling with LLM providers
- Implements streaming responses from LLMs
- Uses search, OCR, image generation, speech, transcription, moderation, or reranking APIs
- Routes requests to specific providers using model prefixes
- Handles LLM API errors across any of the 11 supported languages
Installation
Python
```bash
pip install liter-llm
```
TypeScript / Node.js
```bash
pnpm add @kreuzberg/liter-llm
```
Rust
```toml
[dependencies]
liter-llm = "0.1"
```
Go
```bash
go get github.com/kreuzberg-dev/liter-llm/packages/go
```
Java
```xml
<dependency>
  <groupId>dev.kreuzberg</groupId>
  <artifactId>liter-llm</artifactId>
  <version>1.4.0-rc.17</version>
</dependency>
```
C# (.NET)
```bash
dotnet add package LiterLlm
```
Ruby
```bash
gem install liter_llm
```
PHP
```bash
composer require kreuzberg/liter-llm
```
Elixir
```elixir
# mix.exs: add to the deps list
{:liter_llm, "~> 1.4.0-rc.17"}
```
WebAssembly
```bash
pnpm add @kreuzberg/liter-llm-wasm
```
C / FFI
```bash
cargo build --release -p liter-llm-ffi
```
Quick Start
Python (Async)
```python
import asyncio
import os

from liter_llm import LlmClient


async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)


asyncio.run(main())
```
TypeScript
```typescript
import { LlmClient } from "@kreuzberg/liter-llm";

const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });
const response = await client.chat({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```
Rust
```rust
use liter_llm::{
    ChatCompletionRequest, ClientConfigBuilder, DefaultClient, LlmClient,
    Message, UserContent, UserMessage,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ClientConfigBuilder::new(std::env::var("OPENAI_API_KEY")?)
        .build();
    let client = DefaultClient::new(config, Some("openai/gpt-4o"))?;

    let request = ChatCompletionRequest {
        model: "openai/gpt-4o".into(),
        messages: vec![Message::User(UserMessage {
            content: UserContent::Text("Hello!".into()),
            name: None,
        })],
        ..Default::default()
    };

    let response = client.chat(request).await?;
    if let Some(choice) = response.choices.first() {
        println!("{}", choice.message.content.as_deref().unwrap_or(""));
    }
    Ok(())
}
```
Go
```go
package main

import (
	"context"
	"fmt"
	"os"

	llm "github.com/kreuzberg-dev/liter-llm/packages/go"
)

func main() {
	client := llm.NewClient(llm.WithAPIKey(os.Getenv("OPENAI_API_KEY")))
	resp, err := client.Chat(context.Background(), &llm.ChatCompletionRequest{
		Model: "openai/gpt-4o",
		Messages: []llm.Message{
			llm.NewTextMessage(llm.RoleUser, "Hello!"),
		},
	})
	if err != nil {
		panic(err)
	}
	if len(resp.Choices) > 0 && resp.Choices[0].Message.Content != nil {
		fmt.Println(*resp.Choices[0].Message.Content)
	}
}
```
Configuration
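All languages use the same configuration structure with language-appropriate naming conventions: snake_case for Python, Rust, Ruby, Elixir, and PHP; camelCase for TypeScript/Node.js, WASM, C#, and Java; and exported PascalCase identifiers in Go (for example WithAPIKey and Model, as in the Go Quick Start).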
Python
```python
import os

from liter_llm import LlmClient

client = LlmClient(
    api_key=os.environ["OPENAI_API_KEY"],  # never hardcode keys
    base_url="https://custom-proxy.example.com/v1",
    model_hint="openai",
    max_retries=3,
    timeout=60,  # seconds
    cache={"max_entries": 256, "ttl_seconds": 300},
    budget={"global_limit": 10.0, "model_limits": {"openai/gpt-4o": 5.0}, "enforcement": "hard"},
    cooldown=30,  # seconds
    rate_limit={"rpm": 60, "tpm": 100000},
    health_check=60,  # seconds between background checks
    cost_tracking=True,
    tracing=True,
)
# MyLoggingHook is a user-defined class; see the Hooks section for an example.
client.add_hook(MyLoggingHook())
```
TypeScript
```typescript
import { LlmClient } from "@kreuzberg/liter-llm";

const client = new LlmClient({
  apiKey: process.env.OPENAI_API_KEY!,
  baseUrl: "https://custom-proxy.example.com/v1",
  modelHint: "openai",
  maxRetries: 3,
  timeout: 60,
  cache: { maxEntries: 256, ttlSeconds: 300 },
  budget: { globalLimit: 10.0, modelLimits: { "openai/gpt-4o": 5.0 }, enforcement: "hard" },
  cooldown: 30,
  rateLimit: { rpm: 60, tpm: 100000 },
  healthCheck: 60,
  costTracking: true,
  tracing: true,
});
```
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| api_key | string | required | Provider API key. Wrapped in SecretString internally. |
| base_url | string | from registry | Override the provider's base URL. |
| model_hint | string | none | Pre-resolve a provider at construction (e.g. "openai"). |
| timeout | duration | 60s | Request timeout. |
| max_retries | int | 3 | Retries on 429/5xx responses with exponential backoff. |
| cache | object | none | Response caching config (max_entries, ttl_seconds). |
| budget | object | none | Spending limits (global_limit, model_limits, enforcement). |
| cooldown | int | none | Circuit breaker cooldown in seconds after transient errors. |
| rate_limit | object | none | Rate limiting (rpm, tpm). |
| health_check | int | none | Background health check interval in seconds. |
| cost_tracking | bool | false | Enable per-request cost tracking. |
| tracing | bool | false | Enable OpenTelemetry tracing spans. |
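The budget option combines a global cap, optional per-model caps, and an enforcement mode. The sketch below illustrates the "hard" versus "soft" semantics described in Common Pitfalls; the BudgetTracker class, its charge method, and the choice to check the projected spend before recording it are hypothetical and are not liter-llm's internal implementation.

```python
import warnings
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Hypothetical error raised in "hard" mode when a limit would be exceeded."""


@dataclass
class BudgetTracker:
    # Hypothetical mirror of the budget config: global_limit, model_limits, enforcement.
    global_limit: float
    model_limits: dict[str, float] = field(default_factory=dict)
    enforcement: str = "hard"  # "hard" rejects, "soft" only warns
    spent_total: float = 0.0
    spent_by_model: dict[str, float] = field(default_factory=dict)

    def charge(self, model: str, cost: float) -> None:
        """Check limits against the projected spend, then record the cost."""
        projected_total = self.spent_total + cost
        projected_model = self.spent_by_model.get(model, 0.0) + cost
        limit = self.model_limits.get(model)
        over_global = projected_total > self.global_limit
        over_model = limit is not None and projected_model > limit
        if over_global or over_model:
            msg = (f"budget exceeded for {model} "
                   f"(model spend {projected_model:.2f}, total {projected_total:.2f})")
            if self.enforcement == "hard":
                raise BudgetExceeded(msg)  # rejected: nothing is recorded
            warnings.warn(msg)  # soft: warn but let the request through
        self.spent_total = projected_total
        self.spent_by_model[model] = projected_model
```

With BudgetTracker(global_limit=10.0, model_limits={"openai/gpt-4o": 5.0}), a 4.0 charge followed by a 2.0 charge on openai/gpt-4o fails in hard mode because the model cap would be crossed, while the global cap still has room.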
Configuration File
Instead of passing all options to the constructor, create a liter-llm.toml file in your project directory. liter-llm auto-discovers it by searching the current directory and parent directories.
```toml
api_key = "sk-..."
base_url = "https://api.openai.com/v1"
model_hint = "openai"
timeout_secs = 120
max_retries = 5
# Top-level keys must come before the first [table] header;
# TOML assigns every key after a header to that table.
cooldown_secs = 30
health_check_secs = 60
cost_tracking = true
tracing = true

[cache]
max_entries = 512
ttl_seconds = 600

[budget]
global_limit = 50.0
enforcement = "hard"

[budget.model_limits]
"openai/gpt-4o" = 25.0

[rate_limit]
rpm = 60
tpm = 100000

[[providers]]
name = "my-provider"
base_url = "https://my-llm.example.com/v1"
model_prefixes = ["my-provider/"]
```
Load from code:
Python
```python
client = LlmClient.from_config()  # auto-discover liter-llm.toml
client = LlmClient.from_config("path/to/config.toml")  # explicit path
```
TypeScript
```typescript
const client = await LlmClient.fromConfig();
```
Rust
```rust
if let Some(config) = FileConfig::discover()? {
    let client = ManagedClient::new(config.into_builder().build(), None)?;
}
```
API Key Environment Variables
| Provider | Environment Variable |
|---|---|
| OpenAI | OPENAI_API_KEY |
| Anthropic | ANTHROPIC_API_KEY |
| Google (Gemini) | GEMINI_API_KEY |
| Groq | GROQ_API_KEY |
| Mistral | MISTRAL_API_KEY |
| Cohere | CO_API_KEY |
| AWS Bedrock | AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY |
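Reading keys from the environment keeps them out of source code. The helper below maps a routing prefix to a variable from the table and fails early with a clear message when that variable is unset. PROVIDER_ENV and api_key_for are hypothetical helpers, not part of liter-llm; the "cohere" prefix is an assumption (it does not appear in Provider Routing), and AWS Bedrock is left out because it uses a credential pair rather than a single key.

```python
import os

# Hypothetical mapping from routing prefix to the environment variable in the table above.
# The "cohere" prefix is assumed; the others match the Provider Routing examples.
PROVIDER_ENV = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "cohere": "CO_API_KEY",
}


def api_key_for(provider: str) -> str:
    """Return the API key for a provider, raising a clear error if it is unset."""
    try:
        var = PROVIDER_ENV[provider]
    except KeyError:
        raise ValueError(f"no API key variable known for provider {provider!r}") from None
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"set {var} before calling {provider} models")
    return key
```

The result can then be passed as api_key when constructing LlmClient.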
Provider Routing
Model routing uses a name prefix convention. The prefix before the / determines which provider handles the request:
```python
response = await client.chat(model="openai/gpt-4o", messages=[...])
response = await client.chat(model="anthropic/claude-sonnet-4-20250514", messages=[...])
response = await client.chat(model="google/gemini-2.0-flash", messages=[...])
response = await client.chat(model="groq/llama3-70b", messages=[...])
response = await client.chat(model="mistral/mistral-large-latest", messages=[...])
response = await client.chat(model="azure/gpt-4o", messages=[...])
response = await client.chat(model="bedrock/anthropic.claude-v2", messages=[...])
```
With model_hint, you can skip the prefix:
```python
client = LlmClient(api_key="sk-...", model_hint="openai")
response = await client.chat(model="gpt-4o", messages=[...])
```
Streaming
Python
```python
import asyncio
import os

from liter_llm import LlmClient


async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])
    async for chunk in await client.chat_stream(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Tell me a story"}],
    ):
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()


asyncio.run(main())
```
TypeScript
```typescript
import { LlmClient } from "@kreuzberg/liter-llm";

const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });
const chunks = await client.chatStream({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Tell me a story" }],
});
for (const chunk of chunks) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
console.log();
```
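Because individual chunks can carry no content, a common pattern is to collect only the non-empty deltas into one string. The sketch below runs offline by feeding hand-built chunks shaped like the Python streaming example (chunk.choices[0].delta.content); make_chunk, fake_stream, and collect_text are hypothetical names, and SimpleNamespace stands in for liter-llm's chunk objects.

```python
import asyncio
from types import SimpleNamespace


def make_chunk(content):
    """Build a stand-in chunk with the chunk.choices[0].delta.content shape."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])


async def fake_stream():
    # First and last chunks carry no text, as real streams often do.
    for content in [None, "Once ", "upon ", "a time.", None]:
        yield make_chunk(content)
    yield SimpleNamespace(choices=[])  # a chunk with no choices at all


async def collect_text(stream) -> str:
    """Join the non-empty delta contents of an async stream of chunks."""
    parts = []
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


text = asyncio.run(collect_text(fake_stream()))
print(text)  # Once upon a time.
```

With a real client, the same helper could presumably be called as await collect_text(await client.chat_stream(model=..., messages=...)), mirroring the Python streaming example.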
Tool Calling
Python
```python
import asyncio
import os

from liter_llm import LlmClient


async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"},
                    },
                    "required": ["location"],
                },
            },
        }
    ]
    response = await client.chat(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "What is the weather in Berlin?"}],
        tools=tools,
    )
    choice = response.choices[0]
    if choice.message.tool_calls:
        for call in choice.message.tool_calls:
            print(f"Tool: {call.function.name}, Args: {call.function.arguments}")


asyncio.run(main())
```
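After the model returns tool_calls, the application runs the matching local function and sends the result back in a follow-up request. The sketch below covers the local half: it parses call.function.arguments as a JSON string and builds tool-result messages in the OpenAI chat format (role "tool" with a tool_call_id). That message shape, the call.id attribute, the get_weather stub, and TOOL_REGISTRY are assumptions; check the Python API reference for the exact types liter-llm expects.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> dict:
    """Hypothetical local implementation of the get_weather tool."""
    return {"location": location, "forecast": "sunny", "temp_c": 21}


# Registry of callable tools, keyed by the function name declared in `tools`.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls) -> list[dict]:
    """Execute each tool call locally and build OpenAI-style tool messages."""
    messages = []
    for call in tool_calls:
        func = TOOL_REGISTRY[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        result = func(**args)
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
        )
    return messages


# Stand-in for one entry of choice.message.tool_calls.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Berlin"}'),
)
print(run_tool_calls([fake_call]))
```

In the OpenAI conversation format, these messages follow the assistant message that contained the tool calls in the messages list of the next chat request.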
Search and OCR
Search
```python
import asyncio
import os

from liter_llm import LlmClient


async def main() -> None:
    client = LlmClient(api_key=os.environ["BRAVE_API_KEY"])
    response = await client.search(
        model="brave/web-search",
        query="What is Rust programming language?",
        max_results=5,
    )
    for result in response.results:
        print(f"{result.title}: {result.url}")


asyncio.run(main())
```
OCR
```python
import asyncio
import os

from liter_llm import LlmClient


async def main() -> None:
    client = LlmClient(api_key=os.environ["MISTRAL_API_KEY"])
    response = await client.ocr(
        model="mistral/mistral-ocr-latest",
        document={"type": "document_url", "url": "https://example.com/invoice.pdf"},
    )
    for page in response.pages:
        print(f"Page {page.index}: {page.markdown[:100]}...")


asyncio.run(main())
```
Error Handling
Python
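```python
from liter_llm import LlmClient, LlmError

# await is only valid inside an async function (see Common Pitfalls).
async def safe_chat(client: LlmClient) -> None:
    try:
        response = await client.chat(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hello!"}])
        print(response.choices[0].message.content)
    except LlmError as e:
        print(f"LLM error: {e}")
```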
TypeScript
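```typescript
import { LlmClient } from "@kreuzberg/liter-llm";

const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });
try {
  const response = await client.chat({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Hello!" }],
  });
  console.log(response.choices[0].message.content);
} catch (err) {
  // The error category is encoded as a "[Category]" prefix in the message.
  const msg = (err as Error).message;
  if (msg.startsWith("[RateLimited]")) {
    // Back off before retrying.
  } else if (msg.startsWith("[Authentication]")) {
    // Check the API key.
  }
  console.error(msg);
}
```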
Rust
```rust
use liter_llm::{LlmClient, LiterLlmError};

// `client` and `request` are built as in the Quick Start.
match client.chat(request).await {
    Ok(response) => println!("{}", response.choices[0].message.content.as_deref().unwrap_or("")),
    Err(LiterLlmError::Authentication { message }) => eprintln!("Auth failed: {message}"),
    Err(LiterLlmError::RateLimited { message, retry_after }) => {
        eprintln!("Rate limited: {message}, retry after: {retry_after:?}");
    }
    Err(LiterLlmError::BadRequest { message }) => eprintln!("Bad request: {message}"),
    Err(LiterLlmError::ContextWindowExceeded { message }) => eprintln!("Too long: {message}"),
    Err(LiterLlmError::Timeout) => eprintln!("Request timed out"),
    Err(e) => eprintln!("Error: {e}"),
}
```
Hooks
Register lifecycle hooks for request/response/error events:
Python
```python
from liter_llm import LlmClient


class LoggingHook:
    def on_request(self, request):
        print(f"Sending request to {request['model']}")

    def on_response(self, request, response):
        print(f"Got response: {response.usage.total_tokens} tokens")

    def on_error(self, request, error):
        print(f"Error: {error}")


client = LlmClient(api_key="sk-...")
client.add_hook(LoggingHook())
```
TypeScript
```typescript
import { LlmClient } from "@kreuzberg/liter-llm";

const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });
client.addHook({
  onRequest(req) {
    console.log(`Sending: ${req.model}`);
  },
  onResponse(req, res) {
    console.log(`Tokens: ${res.usage?.totalTokens}`);
  },
  onError(req, err) {
    console.error(`Error: ${err}`);
  },
});
```
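Hooks can also aggregate data across calls. The class below follows the Python hook shape shown in this section (on_request, on_response, on_error) and keeps running totals; the UsageTotalsHook name and its counters are hypothetical, and the demo calls the methods directly with stand-in objects instead of registering the hook on a live client.

```python
from types import SimpleNamespace


class UsageTotalsHook:
    """Hypothetical hook that tallies requests, errors, and total tokens."""

    def __init__(self) -> None:
        self.requests = 0
        self.errors = 0
        self.total_tokens = 0

    def on_request(self, request):
        self.requests += 1

    def on_response(self, request, response):
        usage = getattr(response, "usage", None)
        if usage is not None:  # usage may be missing on some responses
            self.total_tokens += usage.total_tokens

    def on_error(self, request, error):
        self.errors += 1


hook = UsageTotalsHook()
# With a real client this would be registered via client.add_hook(hook).
req = {"model": "openai/gpt-4o"}
hook.on_request(req)
hook.on_response(req, SimpleNamespace(usage=SimpleNamespace(total_tokens=42)))
hook.on_request(req)
hook.on_error(req, RuntimeError("timeout"))
print(hook.requests, hook.errors, hook.total_tokens)  # 2 1 42
```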
Common Pitfalls
- Python: all methods are async. You must use await and run inside an async context. Use asyncio.run(main()) at the top level. There are no synchronous methods.
- Naming conventions differ by language. TypeScript, Node.js, WASM, Java, and C# use camelCase (chatStream, apiKey, maxRetries). Python, Rust, Ruby, Elixir, and PHP use snake_case (chat_stream, api_key, max_retries). Go uses exported PascalCase identifiers (Chat, WithAPIKey, Model).
- Provider prefix is required. Always use the "provider/model-name" format (e.g. "openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"). Without the prefix, routing fails unless model_hint is set.
- API keys are wrapped in SecretString. Keys passed to the constructor are never logged, serialized, or included in error messages. Read keys from environment variables; never hardcode them.
- Streaming: the first and last chunks may have null content. Always check chunk.choices[0].delta.content (Python) or chunk.choices[0]?.delta?.content (TypeScript) for null/undefined before using the value.
- Rust: DefaultClient::new requires a config from ClientConfigBuilder. Build it with ClientConfigBuilder::new(api_key).build(), then pass it to DefaultClient::new(config, model_hint).
- Rust: chat is async. Use #[tokio::main] or call it from an async context. The LlmClient trait defines async fn chat(...).
- Budget enforcement modes. "hard" rejects requests that exceed the budget. "soft" logs a warning but lets the request through. By default, no budget is enforced.
- Cache is per-client. Each LlmClient instance has its own cache. Cache keys are derived from the full request (model + messages + parameters).
- Go: check error returns and nil pointers. Response fields such as Content are pointers, so always nil-check them before dereferencing.
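The cache pitfall says keys are derived from the full request. One way to picture this is hashing a canonical JSON encoding of the request, so identical requests map to the same entry and any change to the model, messages, or parameters produces a new key. cache_key is a hypothetical helper, and SHA-256 over sorted-key JSON is an illustrative choice, not necessarily the scheme liter-llm uses.

```python
import hashlib
import json


def cache_key(request: dict) -> str:
    """Hash a canonical JSON form of the request (model + messages + parameters)."""
    # sort_keys applies recursively, so dict key order never affects the key.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = cache_key({"model": "openai/gpt-4o", "temperature": 0.2,
               "messages": [{"role": "user", "content": "Hello!"}]})
b = cache_key({"messages": [{"content": "Hello!", "role": "user"}],
               "temperature": 0.2, "model": "openai/gpt-4o"})
c = cache_key({"model": "openai/gpt-4o", "temperature": 0.7,
               "messages": [{"role": "user", "content": "Hello!"}]})
print(a == b)  # True: key order does not matter
print(a == c)  # False: a different parameter changes the key
```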
CLI Installation
Homebrew
```bash
brew tap kreuzberg-dev/tap
brew install liter-llm
```
Cargo
```bash
cargo install liter-llm-cli
```
Docker
```bash
docker pull ghcr.io/kreuzberg-dev/liter-llm
```
Proxy Server
liter-llm includes an OpenAI-compatible API gateway with 22 endpoints. It acts as a drop-in replacement for the LiteLLM proxy, routing requests to 142+ LLM providers.
Features
- 22 OpenAI-compatible REST endpoints (chat, embeddings, images, audio, files, batches, responses, models)
- Automatic model routing via provider prefixes
- Virtual API keys for multi-tenant access control
- Rate limiting (RPM/TPM per key or globally)
- Cost tracking and budget enforcement
- Response caching
- SSE streaming
- OpenAPI 3.1 spec at /openapi.json
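Per-key RPM limits are commonly enforced with a time window tracked per key. The sketch below shows a sliding one-minute window for requests per minute only; the SlidingWindowLimiter class is hypothetical, it ignores TPM, and it is not the proxy's documented algorithm. Timestamps are passed in explicitly so the behavior is deterministic.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-key requests-per-minute limiter (RPM only, no TPM)."""

    def __init__(self, rpm: int, window_seconds: float = 60.0) -> None:
        self.rpm = rpm
        self.window = window_seconds
        self.hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        """Record and allow the request if `key` is under its limit at time `now`."""
        q = self.hits[key]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.rpm:
            return False
        q.append(now)
        return True


limiter = SlidingWindowLimiter(rpm=2)
print([limiter.allow("sk-team-frontend", t) for t in (0.0, 1.0, 2.0)])  # [True, True, False]
print(limiter.allow("sk-team-frontend", 60.5))  # True: the request at t=0 has expired
```

Unlike a fixed per-minute counter, a sliding window avoids allowing a double burst across a minute boundary.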
Running the Proxy
```bash
liter-llm api --config liter-llm-proxy.toml
```
Docker Quickstart
```bash
docker run -p 4000:4000 \
  -e LITER_LLM_MASTER_KEY=sk-key \
  ghcr.io/kreuzberg-dev/liter-llm
```
The Docker image (ghcr.io/kreuzberg-dev/liter-llm) is a 35MB Chainguard-based image.
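Because the proxy speaks the OpenAI wire format, any HTTP client can call it. The standard-library sketch below builds, but does not send, a chat request against the Docker quickstart. The /v1/chat/completions path and Bearer authentication follow OpenAI conventions and are assumptions here; confirm them against the proxy's /openapi.json. PROXY_URL, API_KEY, and build_chat_request are illustrative names.

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4000"  # port from the Docker quickstart
API_KEY = "sk-key"  # the LITER_LLM_MASTER_KEY value, or a virtual key


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the proxy (not sent)."""
    body = json.dumps({"model": model, "messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(
        f"{PROXY_URL}/v1/chat/completions",  # assumed OpenAI-compatible path
        data=body.encode("utf-8"),
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("openai/gpt-4o", "Hello!")
# To send it once the proxy is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url, req.get_method())
```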
Proxy Configuration
The proxy uses TOML configuration with ${ENV_VAR} interpolation. It auto-discovers liter-llm-proxy.toml in the current directory.
```toml
[server]
host = "0.0.0.0"
port = 4000

[auth]
master_key = "${LITER_LLM_MASTER_KEY}"

[[virtual_keys]]
key = "sk-team-frontend"
models = ["openai/*", "anthropic/*"]
rpm = 60
tpm = 100000
budget = 50.0

[[providers]]
name = "openai"
api_key = "${OPENAI_API_KEY}"

[[providers]]
name = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
```
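The models list on a virtual key uses glob-style patterns such as "openai/*". A minimal way to check a requested model against such a list is Python's fnmatchcase; VIRTUAL_KEYS and key_allows_model are hypothetical, and treating the patterns as case-sensitive shell-style globs is an assumption about how the proxy interprets them.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of the [[virtual_keys]] entry in the config.
VIRTUAL_KEYS = {
    "sk-team-frontend": {"models": ["openai/*", "anthropic/*"], "rpm": 60, "tpm": 100000, "budget": 50.0},
}


def key_allows_model(key: str, model: str) -> bool:
    """Return True if the virtual key exists and one of its patterns matches the model."""
    entry = VIRTUAL_KEYS.get(key)
    if entry is None:
        return False
    return any(fnmatchcase(model, pattern) for pattern in entry["models"])


print(key_allows_model("sk-team-frontend", "openai/gpt-4o"))    # True
print(key_allows_model("sk-team-frontend", "groq/llama3-70b"))  # False
print(key_allows_model("sk-unknown", "openai/gpt-4o"))          # False
```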
MCP Server
liter-llm includes a Model Context Protocol (MCP) server exposing 22 tools that match the REST API endpoints. This allows MCP-compatible clients (Claude Desktop, Claude Code, etc.) to call LLM APIs through liter-llm.
Running the MCP Server
```bash
liter-llm mcp --transport stdio
liter-llm mcp --transport http --port 3001
```
MCP Tools
The MCP server exposes tools matching the proxy API: chat completions, streaming, embeddings, image generation, speech, transcription, moderation, search, OCR, reranking, file operations, batch operations, responses, and model listing.
Additional Resources
Detailed reference files for specific topics:
- Python API Reference -- All functions, config, types, error hierarchy
- TypeScript API Reference -- All functions with camelCase conventions
- Rust API Reference -- Traits, Tower middleware, feature flags
- Other Language Bindings -- Go, Java, C#, Ruby, PHP, Elixir, WASM, C FFI
- Configuration Reference -- All config options, cache backends, middleware
- Provider Reference -- Routing, auth types, custom providers
- Advanced Features -- Search, OCR, OpenDAL cache, Tower stack, tracing
Full documentation: https://docs.liter-llm.kreuzberg.dev
GitHub: https://github.com/kreuzberg-dev/liter-llm