원클릭으로 Manus에서 모든 스킬 실행

$pwd:

liter-llm

Name: Liter Llm
Author: kreuzberg-dev

// Universal LLM API client for 142+ providers with native bindings for 11 languages. Use when writing code that calls LLM APIs via liter-llm in Python, TypeScript, Rust, Go, Java, C#, Ruby, PHP, Elixir, WASM, or C. Covers chat, streaming, embeddings, image generation, speech, transcription, moderation, reranking, search, OCR, tool calling, and configuration.

Manus에서 실행

$ git log --oneline --stat

stars:197

forks:15

updated:2026년 5월 2일 15:24

파일 탐색기

8 개 파일

SKILL.md

readonly

related-skills.json

같은 저장소

add-language-generator.md

from "kreuzberg-dev/liter-llm"

Add a new language to Alef-based e2e test generation

2026-05-28197

create-e2e-fixture.md

from "kreuzberg-dev/liter-llm"

Create a new e2e test fixture for cross-language test generation

2026-05-28197

package.json

"author": "kreuzberg-dev"

"repository": "kreuzberg-dev/liter-llm"

GitHub 저장소 열기 Creator 저장소 보기

$ install --global

$ download --local

Manus에서 실행

$ useful --forSOC

소프트웨어 개발자컴퓨터 및 수학직15-1252L4

원클릭으로 모든 스킬 실행

name	liter-llm
description	Universal LLM API client for 142+ providers with native bindings for 11 languages. Use when writing code that calls LLM APIs via liter-llm in Python, TypeScript, Rust, Go, Java, C#, Ruby, PHP, Elixir, WASM, or C. Covers chat, streaming, embeddings, image generation, speech, transcription, moderation, reranking, search, OCR, tool calling, and configuration.
license	MIT
metadata	{"author":"kreuzberg-dev","version":"1.0","repository":"https://github.com/kreuzberg-dev/liter-llm"}

Liter-LLM Universal LLM Client

Liter-LLM is a universal LLM API client with a Rust core and native bindings for Python, TypeScript/Node.js, Go, Java, C#, Ruby, PHP, Elixir, WebAssembly, and C (FFI). It provides a unified interface to 142+ LLM providers (OpenAI, Anthropic, Google Gemini, Groq, Mistral, Cohere, AWS Bedrock, Azure, and many more) with built-in caching, budgets, rate limiting, hooks, streaming, cost tracking, health checks, and tracing.

Use this skill when writing code that:

Calls LLM APIs (chat completions, streaming, embeddings) via liter-llm
Configures liter-llm clients (API keys, timeouts, retries, cache, budget, hooks)
Uses tool calling / function calling with LLM providers
Implements streaming responses from LLMs
Uses search, OCR, image generation, speech, transcription, moderation, or reranking APIs
Routes requests to specific providers using model prefixes
Handles LLM API errors across any of the 11 supported languages

Installation

Python

pip install liter-llm

TypeScript / Node.js

pnpm add @kreuzberg/liter-llm

Rust

# Cargo.toml
[dependencies]
liter-llm = "0.1"

Go

go get github.com/kreuzberg-dev/liter-llm/packages/go

Java

<!-- pom.xml -->
<dependency>
    <groupId>dev.kreuzberg</groupId>
    <artifactId>liter-llm</artifactId>
    <version>1.4.0-rc.17</version>
</dependency>

C# (.NET)

dotnet add package LiterLlm

Ruby

gem install liter_llm

PHP

composer require kreuzberg/liter-llm

Elixir

# mix.exs
{:liter_llm, "~> 1.4.0-rc.17"}

WebAssembly

pnpm add @kreuzberg/liter-llm-wasm

C / FFI

cargo build --release -p liter-llm-ffi
# Produces libliter_llm_ffi.so / .dylib / .dll

Quick Start

Python (Async)

import asyncio
import os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())

TypeScript

import { LlmClient } from "@kreuzberg/liter-llm";

const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });
const response = await client.chat({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

Rust

use liter_llm::{
    ChatCompletionRequest, ClientConfigBuilder, DefaultClient, LlmClient,
    Message, UserContent, UserMessage,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ClientConfigBuilder::new(std::env::var("OPENAI_API_KEY")?)
        .build();
    let client = DefaultClient::new(config, Some("openai/gpt-4o"))?;

    let request = ChatCompletionRequest {
        model: "openai/gpt-4o".into(),
        messages: vec![Message::User(UserMessage {
            content: UserContent::Text("Hello!".into()),
            name: None,
        })],
        ..Default::default()
    };

    let response = client.chat(request).await?;
    if let Some(choice) = response.choices.first() {
        println!("{}", choice.message.content.as_deref().unwrap_or(""));
    }
    Ok(())
}

Go

package main

import (
 "context"
 "fmt"
 "os"

 llm "github.com/kreuzberg-dev/liter-llm/packages/go"
)

func main() {
 client := llm.NewClient(llm.WithAPIKey(os.Getenv("OPENAI_API_KEY")))
 resp, err := client.Chat(context.Background(), &llm.ChatCompletionRequest{
  Model: "openai/gpt-4o",
  Messages: []llm.Message{
   llm.NewTextMessage(llm.RoleUser, "Hello!"),
  },
 })
 if err != nil {
  panic(err)
 }
 if len(resp.Choices) > 0 && resp.Choices[0].Message.Content != nil {
  fmt.Println(*resp.Choices[0].Message.Content)
 }
}

Configuration

All languages use the same configuration structure with language-appropriate naming conventions (snake_case for Python/Rust/Ruby/Go/Elixir/PHP, camelCase for TypeScript/Node.js/WASM/C#/Java).

Python

from liter_llm import LlmClient

client = LlmClient(
    api_key="sk-...",
    base_url="https://custom-proxy.example.com/v1",  # override provider URL
    model_hint="openai",            # pre-resolve provider (skip prefix lookup)
    max_retries=3,                  # retries on 429/5xx with exponential backoff
    timeout=60,                     # request timeout in seconds
    cache={"max_entries": 256, "ttl_seconds": 300},
    budget={"global_limit": 10.0, "model_limits": {"openai/gpt-4o": 5.0}, "enforcement": "hard"},
    cooldown=30,                    # circuit breaker after transient errors
    rate_limit={"rpm": 60, "tpm": 100000},
    health_check=60,                # background provider health checks
    cost_tracking=True,             # per-request cost tracking
    tracing=True,                   # OpenTelemetry tracing spans
)
client.add_hook(MyLoggingHook())

TypeScript

import { LlmClient } from "@kreuzberg/liter-llm";

const client = new LlmClient({
  apiKey: process.env.OPENAI_API_KEY!,
  baseUrl: "https://custom-proxy.example.com/v1",
  modelHint: "openai",
  maxRetries: 3,
  timeout: 60,
  cache: { maxEntries: 256, ttlSeconds: 300 },
  budget: { globalLimit: 10.0, modelLimits: { "openai/gpt-4o": 5.0 }, enforcement: "hard" },
  cooldown: 30,
  rateLimit: { rpm: 60, tpm: 100000 },
  healthCheck: 60,
  costTracking: true,
  tracing: true,
});

Configuration Options

Option	Type	Default	Description
`api_key`	string	required	Provider API key. Wrapped in `SecretString` internally.
`base_url`	string	from registry	Override the provider's base URL.
`model_hint`	string	none	Pre-resolve a provider at construction (e.g. `"openai"`).
`timeout`	duration	60s	Request timeout.
`max_retries`	int	3	Retries on 429/5xx responses with exponential backoff.
`cache`	object	none	Response caching config (`max_entries`, `ttl_seconds`).
`budget`	object	none	Spending limits (`global_limit`, `model_limits`, `enforcement`).
`cooldown`	int	none	Circuit breaker cooldown in seconds after transient errors.
`rate_limit`	object	none	Rate limiting (`rpm`, `tpm`).
`health_check`	int	none	Background health check interval in seconds.
`cost_tracking`	bool	false	Enable per-request cost tracking.
`tracing`	bool	false	Enable OpenTelemetry tracing spans.

Configuration File

Instead of passing all options to the constructor, create a liter-llm.toml file in your project directory. liter-llm auto-discovers it by searching the current directory and parent directories.

api_key = "sk-..."
base_url = "https://api.openai.com/v1"
model_hint = "openai"
timeout_secs = 120
max_retries = 5

[cache]
max_entries = 512
ttl_seconds = 600

[budget]
global_limit = 50.0
enforcement = "hard"

[budget.model_limits]
"openai/gpt-4o" = 25.0

[rate_limit]
rpm = 60
tpm = 100000

cooldown_secs = 30
health_check_secs = 60
cost_tracking = true
tracing = true

[[providers]]
name = "my-provider"
base_url = "https://my-llm.example.com/v1"
model_prefixes = ["my-provider/"]

Load from code:

# Python -- auto-discover
client = LlmClient.from_config()
# Or explicit path
client = LlmClient.from_config("path/to/config.toml")

// TypeScript -- auto-discover
const client = await LlmClient.fromConfig();

// Rust -- auto-discover
if let Some(config) = FileConfig::discover()? {
    let client = ManagedClient::new(config.into_builder().build(), None)?;
}

API Key Environment Variables

Provider	Environment Variable
OpenAI	`OPENAI_API_KEY`
Anthropic	`ANTHROPIC_API_KEY`
Google (Gemini)	`GEMINI_API_KEY`
Groq	`GROQ_API_KEY`
Mistral	`MISTRAL_API_KEY`
Cohere	`CO_API_KEY`
AWS Bedrock	`AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY`

Provider Routing

Model routing uses a name prefix convention. The prefix before the / determines which provider handles the request:

# OpenAI
response = await client.chat(model="openai/gpt-4o", messages=[...])

# Anthropic
response = await client.chat(model="anthropic/claude-sonnet-4-20250514", messages=[...])

# Google Gemini
response = await client.chat(model="google/gemini-2.0-flash", messages=[...])

# Groq
response = await client.chat(model="groq/llama3-70b", messages=[...])

# Mistral
response = await client.chat(model="mistral/mistral-large-latest", messages=[...])

# Azure OpenAI
response = await client.chat(model="azure/gpt-4o", messages=[...])

# AWS Bedrock
response = await client.chat(model="bedrock/anthropic.claude-v2", messages=[...])

With model_hint, you can skip the prefix:

client = LlmClient(api_key="sk-...", model_hint="openai")
response = await client.chat(model="gpt-4o", messages=[...])  # routes to OpenAI

Streaming

Python

import asyncio
import os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])
    async for chunk in await client.chat_stream(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Tell me a story"}],
    ):
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(main())

TypeScript

import { LlmClient } from "@kreuzberg/liter-llm";

const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });
const chunks = await client.chatStream({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Tell me a story" }],
});

for (const chunk of chunks) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
console.log();

Tool Calling

Python

import asyncio
import os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"},
                    },
                    "required": ["location"],
                },
            },
        }
    ]

    response = await client.chat(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "What is the weather in Berlin?"}],
        tools=tools,
    )

    choice = response.choices[0]
    if choice.message.tool_calls:
        for call in choice.message.tool_calls:
            print(f"Tool: {call.function.name}, Args: {call.function.arguments}")

asyncio.run(main())

Search and OCR

Search

import asyncio
import os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["BRAVE_API_KEY"])
    response = await client.search(
        model="brave/web-search",
        query="What is Rust programming language?",
        max_results=5,
    )
    for result in response.results:
        print(f"{result.title}: {result.url}")

asyncio.run(main())

OCR

import asyncio
import os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["MISTRAL_API_KEY"])
    response = await client.ocr(
        model="mistral/mistral-ocr-latest",
        document={"type": "document_url", "url": "https://example.com/invoice.pdf"},
    )
    for page in response.pages:
        print(f"Page {page.index}: {page.markdown[:100]}...")

asyncio.run(main())

Error Handling

Python

from liter_llm import LlmClient, LlmError

try:
    response = await client.chat(model="openai/gpt-4o", messages=[...])
except LlmError as e:
    # All liter-llm errors inherit from LlmError
    # Specific variants: AuthenticationError, RateLimitedError,
    # BadRequestError, ContextWindowExceededError, ContentPolicyError,
    # NotFoundError, ServerError, ServiceUnavailableError, LlmTimeoutError,
    # BudgetExceededError
    print(f"LLM error: {e}")

TypeScript

import { LlmClient } from "@kreuzberg/liter-llm";

try {
  const response = await client.chat({ model: "openai/gpt-4o", messages: [...] });
} catch (err) {
  // Errors are plain Error objects with bracketed category in the message:
  // "[Authentication] Invalid API key"
  // "[RateLimited] Too many requests"
  // "[BadRequest] Messages must not be empty"
  const msg = (err as Error).message;
  if (msg.startsWith("[RateLimited]")) {
    // back off and retry
  } else if (msg.startsWith("[Authentication]")) {
    // check API key
  }
  console.error(msg);
}

Rust

use liter_llm::{LlmClient, LiterLlmError};

match client.chat(request).await {
    Ok(response) => println!("{}", response.choices[0].message.content.as_deref().unwrap_or("")),
    Err(LiterLlmError::Authentication { message }) => eprintln!("Auth failed: {message}"),
    Err(LiterLlmError::RateLimited { message, retry_after }) => {
        eprintln!("Rate limited: {message}, retry after: {retry_after:?}");
    }
    Err(LiterLlmError::BadRequest { message }) => eprintln!("Bad request: {message}"),
    Err(LiterLlmError::ContextWindowExceeded { message }) => eprintln!("Too long: {message}"),
    Err(LiterLlmError::Timeout) => eprintln!("Request timed out"),
    Err(e) => eprintln!("Error: {e}"),
}

Hooks

Python

from liter_llm import LlmClient

class LoggingHook:
    def on_request(self, request):
        print(f"Sending request to {request['model']}")

    def on_response(self, request, response):
        print(f"Got response: {response.usage.total_tokens} tokens")

    def on_error(self, request, error):
        print(f"Error: {error}")

client = LlmClient(api_key="sk-...")
client.add_hook(LoggingHook())

TypeScript

import { LlmClient } from "@kreuzberg/liter-llm";

const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });
client.addHook({
  onRequest(req) {
    console.log(`Sending: ${req.model}`);
  },
  onResponse(req, res) {
    console.log(`Tokens: ${res.usage?.totalTokens}`);
  },
  onError(req, err) {
    console.error(`Error: ${err}`);
  },
});

Common Pitfalls

Python: all methods are async. You must use await and run inside an async context. Use asyncio.run(main()) at the top level. There are no synchronous methods.
Naming conventions differ by language. TypeScript, Node.js, WASM, Java, and C# use camelCase (chatStream, apiKey, maxRetries). Python, Rust, Ruby, Go, Elixir, and PHP use snake_case (chat_stream, api_key, max_retries).
Provider prefix is required. Always use "provider/model-name" format (e.g. "openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"). Without the prefix, routing will fail unless model_hint is set.
API keys are wrapped in SecretString. Keys passed to the constructor are never logged, serialized, or included in error messages. Read keys from environment variables, never hardcode them.
Streaming: first/last chunks may have null content. Always check chunk.choices[0].delta.content (Python) or chunk.choices[0]?.delta?.content (TypeScript) for null/undefined before using the value.
Rust: DefaultClient::new requires ClientConfigBuilder. Build config with ClientConfigBuilder::new(api_key).build(), then pass to DefaultClient::new(config, model_hint).
Rust: chat is async. Use #[tokio::main] or call from an async context. The LlmClient trait defines async fn chat(...).
Budget enforcement modes. "hard" rejects requests that exceed the budget. "soft" logs a warning but allows the request through. Default is no budget enforcement.
Cache is per-client. Each LlmClient instance has its own cache. Cache keys are derived from the full request (model + messages + parameters).
Go: check error returns and nil pointers. Response fields like Content are pointers -- always nil-check before dereferencing.

CLI Installation

Homebrew

brew tap kreuzberg-dev/tap
brew install liter-llm

Cargo

cargo install liter-llm-cli

Docker

docker pull ghcr.io/kreuzberg-dev/liter-llm

Proxy Server

liter-llm includes an OpenAI-compatible API gateway with 22 endpoints. It acts as a drop-in replacement for litellm proxy, routing requests to 142+ LLM providers.

Features

22 OpenAI-compatible REST endpoints (chat, embeddings, images, audio, files, batches, responses, models)
Automatic model routing via provider prefixes
Virtual API keys for multi-tenant access control
Rate limiting (RPM/TPM per key or globally)
Cost tracking and budget enforcement
Response caching
SSE streaming
OpenAPI 3.1 spec at /openapi.json

Running the Proxy

liter-llm api --config liter-llm-proxy.toml

Docker Quickstart

docker run -p 4000:4000 \
  -e LITER_LLM_MASTER_KEY=sk-key \
  ghcr.io/kreuzberg-dev/liter-llm

The Docker image (ghcr.io/kreuzberg-dev/liter-llm) is a 35MB Chainguard-based image.

Proxy Configuration

The proxy uses TOML configuration with ${ENV_VAR} interpolation. It auto-discovers liter-llm-proxy.toml in the current directory.

[server]
host = "0.0.0.0"
port = 4000

[auth]
master_key = "${LITER_LLM_MASTER_KEY}"

[[virtual_keys]]
key = "sk-team-frontend"
models = ["openai/*", "anthropic/*"]
rpm = 60
tpm = 100000
budget = 50.0

[[providers]]
name = "openai"
api_key = "${OPENAI_API_KEY}"

[[providers]]
name = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"

MCP Server

liter-llm includes a Model Context Protocol (MCP) server exposing 22 tools that match the REST API endpoints. This allows MCP-compatible clients (Claude Desktop, Claude Code, etc.) to call LLM APIs through liter-llm.

Running the MCP Server

# stdio transport (for Claude Desktop / Claude Code)
liter-llm mcp --transport stdio

# HTTP transport
liter-llm mcp --transport http --port 3001

MCP Tools

The MCP server exposes tools matching the proxy API: chat completions, streaming, embeddings, image generation, speech, transcription, moderation, search, OCR, reranking, file operations, batch operations, responses, and model listing.

Additional Resources

Detailed reference files for specific topics:

Python API Reference -- All functions, config, types, error hierarchy
TypeScript API Reference -- All functions with camelCase conventions
Rust API Reference -- Traits, Tower middleware, feature flags
Other Language Bindings -- Go, Java, C#, Ruby, PHP, Elixir, WASM, C FFI
Configuration Reference -- All config options, cache backends, middleware
Provider Reference -- Routing, auth types, custom providers
Advanced Features -- Search, OCR, OpenDAL cache, Tower stack, tracing

Full documentation: https://docs.liter-llm.kreuzberg.dev GitHub: https://github.com/kreuzberg-dev/liter-llm

name	liter-llm
description	Universal LLM API client for 142+ providers with native bindings for 11 languages. Use when writing code that calls LLM APIs via liter-llm in Python, TypeScript, Rust, Go, Java, C#, Ruby, PHP, Elixir, WASM, or C. Covers chat, streaming, embeddings, image generation, speech, transcription, moderation, reranking, search, OCR, tool calling, and configuration.
license	MIT
metadata	{"author":"kreuzberg-dev","version":"1.0","repository":"https://github.com/kreuzberg-dev/liter-llm"}

Liter-LLM Universal LLM Client

Use this skill when writing code that:

Calls LLM APIs (chat completions, streaming, embeddings) via liter-llm
Configures liter-llm clients (API keys, timeouts, retries, cache, budget, hooks)
Uses tool calling / function calling with LLM providers
Implements streaming responses from LLMs
Uses search, OCR, image generation, speech, transcription, moderation, or reranking APIs
Routes requests to specific providers using model prefixes
Handles LLM API errors across any of the 11 supported languages

Installation

Python

pip install liter-llm

TypeScript / Node.js

pnpm add @kreuzberg/liter-llm

Rust

# Cargo.toml
[dependencies]
liter-llm = "0.1"

Go

go get github.com/kreuzberg-dev/liter-llm/packages/go

Java

<!-- pom.xml -->
<dependency>
    <groupId>dev.kreuzberg</groupId>
    <artifactId>liter-llm</artifactId>
    <version>1.4.0-rc.17</version>
</dependency>

C# (.NET)

dotnet add package LiterLlm

Ruby

gem install liter_llm

PHP

composer require kreuzberg/liter-llm

Elixir

# mix.exs
{:liter_llm, "~> 1.4.0-rc.17"}

WebAssembly

pnpm add @kreuzberg/liter-llm-wasm

C / FFI

cargo build --release -p liter-llm-ffi
# Produces libliter_llm_ffi.so / .dylib / .dll

Quick Start

Python (Async)

import asyncio
import os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())

TypeScript

import { LlmClient } from "@kreuzberg/liter-llm";

const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });
const response = await client.chat({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

Rust

use liter_llm::{
    ChatCompletionRequest, ClientConfigBuilder, DefaultClient, LlmClient,
    Message, UserContent, UserMessage,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ClientConfigBuilder::new(std::env::var("OPENAI_API_KEY")?)
        .build();
    let client = DefaultClient::new(config, Some("openai/gpt-4o"))?;

    let request = ChatCompletionRequest {
        model: "openai/gpt-4o".into(),
        messages: vec![Message::User(UserMessage {
            content: UserContent::Text("Hello!".into()),
            name: None,
        })],
        ..Default::default()
    };

    let response = client.chat(request).await?;
    if let Some(choice) = response.choices.first() {
        println!("{}", choice.message.content.as_deref().unwrap_or(""));
    }
    Ok(())
}

Go

package main

import (
 "context"
 "fmt"
 "os"

 llm "github.com/kreuzberg-dev/liter-llm/packages/go"
)

func main() {
 client := llm.NewClient(llm.WithAPIKey(os.Getenv("OPENAI_API_KEY")))
 resp, err := client.Chat(context.Background(), &llm.ChatCompletionRequest{
  Model: "openai/gpt-4o",
  Messages: []llm.Message{
   llm.NewTextMessage(llm.RoleUser, "Hello!"),
  },
 })
 if err != nil {
  panic(err)
 }
 if len(resp.Choices) > 0 && resp.Choices[0].Message.Content != nil {
  fmt.Println(*resp.Choices[0].Message.Content)
 }
}

Configuration

All languages use the same configuration structure with language-appropriate naming conventions (snake_case for Python/Rust/Ruby/Go/Elixir/PHP, camelCase for TypeScript/Node.js/WASM/C#/Java).

Python

from liter_llm import LlmClient

client = LlmClient(
    api_key="sk-...",
    base_url="https://custom-proxy.example.com/v1",  # override provider URL
    model_hint="openai",            # pre-resolve provider (skip prefix lookup)
    max_retries=3,                  # retries on 429/5xx with exponential backoff
    timeout=60,                     # request timeout in seconds
    cache={"max_entries": 256, "ttl_seconds": 300},
    budget={"global_limit": 10.0, "model_limits": {"openai/gpt-4o": 5.0}, "enforcement": "hard"},
    cooldown=30,                    # circuit breaker after transient errors
    rate_limit={"rpm": 60, "tpm": 100000},
    health_check=60,                # background provider health checks
    cost_tracking=True,             # per-request cost tracking
    tracing=True,                   # OpenTelemetry tracing spans
)
client.add_hook(MyLoggingHook())

TypeScript

import { LlmClient } from "@kreuzberg/liter-llm";

const client = new LlmClient({
  apiKey: process.env.OPENAI_API_KEY!,
  baseUrl: "https://custom-proxy.example.com/v1",
  modelHint: "openai",
  maxRetries: 3,
  timeout: 60,
  cache: { maxEntries: 256, ttlSeconds: 300 },
  budget: { globalLimit: 10.0, modelLimits: { "openai/gpt-4o": 5.0 }, enforcement: "hard" },
  cooldown: 30,
  rateLimit: { rpm: 60, tpm: 100000 },
  healthCheck: 60,
  costTracking: true,
  tracing: true,
});

Configuration Options

Option	Type	Default	Description
`api_key`	string	required	Provider API key. Wrapped in `SecretString` internally.
`base_url`	string	from registry	Override the provider's base URL.
`model_hint`	string	none	Pre-resolve a provider at construction (e.g. `"openai"`).
`timeout`	duration	60s	Request timeout.
`max_retries`	int	3	Retries on 429/5xx responses with exponential backoff.
`cache`	object	none	Response caching config (`max_entries`, `ttl_seconds`).
`budget`	object	none	Spending limits (`global_limit`, `model_limits`, `enforcement`).
`cooldown`	int	none	Circuit breaker cooldown in seconds after transient errors.
`rate_limit`	object	none	Rate limiting (`rpm`, `tpm`).
`health_check`	int	none	Background health check interval in seconds.
`cost_tracking`	bool	false	Enable per-request cost tracking.
`tracing`	bool	false	Enable OpenTelemetry tracing spans.

Configuration File

Instead of passing all options to the constructor, create a liter-llm.toml file in your project directory. liter-llm auto-discovers it by searching the current directory and parent directories.

api_key = "sk-..."
base_url = "https://api.openai.com/v1"
model_hint = "openai"
timeout_secs = 120
max_retries = 5

[cache]
max_entries = 512
ttl_seconds = 600

[budget]
global_limit = 50.0
enforcement = "hard"

[budget.model_limits]
"openai/gpt-4o" = 25.0

[rate_limit]
rpm = 60
tpm = 100000

cooldown_secs = 30
health_check_secs = 60
cost_tracking = true
tracing = true

[[providers]]
name = "my-provider"
base_url = "https://my-llm.example.com/v1"
model_prefixes = ["my-provider/"]

Load from code:

# Python -- auto-discover
client = LlmClient.from_config()
# Or explicit path
client = LlmClient.from_config("path/to/config.toml")

// TypeScript -- auto-discover
const client = await LlmClient.fromConfig();

// Rust -- auto-discover
if let Some(config) = FileConfig::discover()? {
    let client = ManagedClient::new(config.into_builder().build(), None)?;
}

API Key Environment Variables

Provider	Environment Variable
OpenAI	`OPENAI_API_KEY`
Anthropic	`ANTHROPIC_API_KEY`
Google (Gemini)	`GEMINI_API_KEY`
Groq	`GROQ_API_KEY`
Mistral	`MISTRAL_API_KEY`
Cohere	`CO_API_KEY`
AWS Bedrock	`AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY`

Provider Routing

Model routing uses a name prefix convention. The prefix before the / determines which provider handles the request:

# OpenAI
response = await client.chat(model="openai/gpt-4o", messages=[...])

# Anthropic
response = await client.chat(model="anthropic/claude-sonnet-4-20250514", messages=[...])

# Google Gemini
response = await client.chat(model="google/gemini-2.0-flash", messages=[...])

# Groq
response = await client.chat(model="groq/llama3-70b", messages=[...])

# Mistral
response = await client.chat(model="mistral/mistral-large-latest", messages=[...])

# Azure OpenAI
response = await client.chat(model="azure/gpt-4o", messages=[...])

# AWS Bedrock
response = await client.chat(model="bedrock/anthropic.claude-v2", messages=[...])

With model_hint, you can skip the prefix:

client = LlmClient(api_key="sk-...", model_hint="openai")
response = await client.chat(model="gpt-4o", messages=[...])  # routes to OpenAI

Streaming

Python

import asyncio
import os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])
    async for chunk in await client.chat_stream(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Tell me a story"}],
    ):
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(main())

TypeScript

import { LlmClient } from "@kreuzberg/liter-llm";

const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });
const chunks = await client.chatStream({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Tell me a story" }],
});

for (const chunk of chunks) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
console.log();

Tool Calling

Python

import asyncio
import os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"},
                    },
                    "required": ["location"],
                },
            },
        }
    ]

    response = await client.chat(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "What is the weather in Berlin?"}],
        tools=tools,
    )

    choice = response.choices[0]
    if choice.message.tool_calls:
        for call in choice.message.tool_calls:
            print(f"Tool: {call.function.name}, Args: {call.function.arguments}")

asyncio.run(main())

Search and OCR

Search

import asyncio
import os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["BRAVE_API_KEY"])
    response = await client.search(
        model="brave/web-search",
        query="What is Rust programming language?",
        max_results=5,
    )
    for result in response.results:
        print(f"{result.title}: {result.url}")

asyncio.run(main())

OCR

import asyncio
import os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["MISTRAL_API_KEY"])
    response = await client.ocr(
        model="mistral/mistral-ocr-latest",
        document={"type": "document_url", "url": "https://example.com/invoice.pdf"},
    )
    for page in response.pages:
        print(f"Page {page.index}: {page.markdown[:100]}...")

asyncio.run(main())

Error Handling

Python

from liter_llm import LlmClient, LlmError

try:
    response = await client.chat(model="openai/gpt-4o", messages=[...])
except LlmError as e:
    # All liter-llm errors inherit from LlmError
    # Specific variants: AuthenticationError, RateLimitedError,
    # BadRequestError, ContextWindowExceededError, ContentPolicyError,
    # NotFoundError, ServerError, ServiceUnavailableError, LlmTimeoutError,
    # BudgetExceededError
    print(f"LLM error: {e}")

TypeScript

import { LlmClient } from "@kreuzberg/liter-llm";

try {
  const response = await client.chat({ model: "openai/gpt-4o", messages: [...] });
} catch (err) {
  // Errors are plain Error objects with bracketed category in the message:
  // "[Authentication] Invalid API key"
  // "[RateLimited] Too many requests"
  // "[BadRequest] Messages must not be empty"
  const msg = (err as Error).message;
  if (msg.startsWith("[RateLimited]")) {
    // back off and retry
  } else if (msg.startsWith("[Authentication]")) {
    // check API key
  }
  console.error(msg);
}

Rust

use liter_llm::{LlmClient, LiterLlmError};

match client.chat(request).await {
    Ok(response) => println!("{}", response.choices[0].message.content.as_deref().unwrap_or("")),
    Err(LiterLlmError::Authentication { message }) => eprintln!("Auth failed: {message}"),
    Err(LiterLlmError::RateLimited { message, retry_after }) => {
        eprintln!("Rate limited: {message}, retry after: {retry_after:?}");
    }
    Err(LiterLlmError::BadRequest { message }) => eprintln!("Bad request: {message}"),
    Err(LiterLlmError::ContextWindowExceeded { message }) => eprintln!("Too long: {message}"),
    Err(LiterLlmError::Timeout) => eprintln!("Request timed out"),
    Err(e) => eprintln!("Error: {e}"),
}

Hooks

Python

from liter_llm import LlmClient

class LoggingHook:
    def on_request(self, request):
        print(f"Sending request to {request['model']}")

    def on_response(self, request, response):
        print(f"Got response: {response.usage.total_tokens} tokens")

    def on_error(self, request, error):
        print(f"Error: {error}")

client = LlmClient(api_key="sk-...")
client.add_hook(LoggingHook())

TypeScript

import { LlmClient } from "@kreuzberg/liter-llm";

const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });
client.addHook({
  onRequest(req) {
    console.log(`Sending: ${req.model}`);
  },
  onResponse(req, res) {
    console.log(`Tokens: ${res.usage?.totalTokens}`);
  },
  onError(req, err) {
    console.error(`Error: ${err}`);
  },
});

Common Pitfalls

Python: all methods are async. You must use await and run inside an async context. Use asyncio.run(main()) at the top level. There are no synchronous methods.
Naming conventions differ by language. TypeScript, Node.js, WASM, Java, and C# use camelCase (chatStream, apiKey, maxRetries). Python, Rust, Ruby, Go, Elixir, and PHP use snake_case (chat_stream, api_key, max_retries).
Provider prefix is required. Always use "provider/model-name" format (e.g. "openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"). Without the prefix, routing will fail unless model_hint is set.
API keys are wrapped in SecretString. Keys passed to the constructor are never logged, serialized, or included in error messages. Read keys from environment variables, never hardcode them.
Streaming: first/last chunks may have null content. Always check chunk.choices[0].delta.content (Python) or chunk.choices[0]?.delta?.content (TypeScript) for null/undefined before using the value.
Rust: DefaultClient::new requires ClientConfigBuilder. Build config with ClientConfigBuilder::new(api_key).build(), then pass to DefaultClient::new(config, model_hint).
Rust: chat is async. Use #[tokio::main] or call from an async context. The LlmClient trait defines async fn chat(...).
Budget enforcement modes. "hard" rejects requests that exceed the budget. "soft" logs a warning but allows the request through. Default is no budget enforcement.
Cache is per-client. Each LlmClient instance has its own cache. Cache keys are derived from the full request (model + messages + parameters).
Go: check error returns and nil pointers. Response fields like Content are pointers -- always nil-check before dereferencing.

CLI Installation

Homebrew

brew tap kreuzberg-dev/tap
brew install liter-llm

Cargo

cargo install liter-llm-cli

Docker

docker pull ghcr.io/kreuzberg-dev/liter-llm

Proxy Server

liter-llm includes an OpenAI-compatible API gateway with 22 endpoints. It acts as a drop-in replacement for litellm proxy, routing requests to 142+ LLM providers.

Features

22 OpenAI-compatible REST endpoints (chat, embeddings, images, audio, files, batches, responses, models)
Automatic model routing via provider prefixes
Virtual API keys for multi-tenant access control
Rate limiting (RPM/TPM per key or globally)
Cost tracking and budget enforcement
Response caching
SSE streaming
OpenAPI 3.1 spec at /openapi.json

Running the Proxy

liter-llm api --config liter-llm-proxy.toml

Docker Quickstart

docker run -p 4000:4000 \
  -e LITER_LLM_MASTER_KEY=sk-key \
  ghcr.io/kreuzberg-dev/liter-llm

The Docker image (ghcr.io/kreuzberg-dev/liter-llm) is a 35MB Chainguard-based image.

Proxy Configuration

The proxy uses TOML configuration with ${ENV_VAR} interpolation. It auto-discovers liter-llm-proxy.toml in the current directory.

[server]
host = "0.0.0.0"
port = 4000

[auth]
master_key = "${LITER_LLM_MASTER_KEY}"

[[virtual_keys]]
key = "sk-team-frontend"
models = ["openai/*", "anthropic/*"]
rpm = 60
tpm = 100000
budget = 50.0

[[providers]]
name = "openai"
api_key = "${OPENAI_API_KEY}"

[[providers]]
name = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"

MCP Server

Running the MCP Server

# stdio transport (for Claude Desktop / Claude Code)
liter-llm mcp --transport stdio

# HTTP transport
liter-llm mcp --transport http --port 3001

MCP Tools

Additional Resources

Detailed reference files for specific topics:

Python API Reference -- All functions, config, types, error hierarchy
TypeScript API Reference -- All functions with camelCase conventions
Rust API Reference -- Traits, Tower middleware, feature flags
Other Language Bindings -- Go, Java, C#, Ruby, PHP, Elixir, WASM, C FFI
Configuration Reference -- All config options, cache backends, middleware
Provider Reference -- Routing, auth types, custom providers
Advanced Features -- Search, OCR, OpenDAL cache, Tower stack, tracing

Full documentation: https://docs.liter-llm.kreuzberg.dev GitHub: https://github.com/kreuzberg-dev/liter-llm

liter-llm

이 저장소의 다른 Skills

이 저장소의 다른 Skills

Liter-LLM Universal LLM Client

Installation

Python

TypeScript / Node.js

Rust

Go

Java

C# (.NET)

Ruby

PHP

Elixir

WebAssembly

C / FFI

Quick Start

Python (Async)

TypeScript

Rust

Go

Configuration

Python

TypeScript

Configuration Options

Configuration File

API Key Environment Variables

Provider Routing

Streaming

Python

TypeScript

Tool Calling

Python

Search and OCR

Search

OCR

Error Handling

Python

TypeScript

Rust

Hooks

Python

TypeScript

Common Pitfalls

CLI Installation

Homebrew

Cargo

Docker

Proxy Server

Features

Running the Proxy

Docker Quickstart

Proxy Configuration

MCP Server

Running the MCP Server

MCP Tools

Additional Resources

Liter-LLM Universal LLM Client

Installation

Python

TypeScript / Node.js

Rust

Go

Java

C# (.NET)

Ruby

PHP

Elixir

WebAssembly

C / FFI

Quick Start

Python (Async)

TypeScript

Rust

Go

Configuration

Python

TypeScript

Configuration Options

Configuration File