ai-infrastructure-litellm
LiteLLM proxy server setup, TypeScript client patterns via OpenAI SDK, model routing, fallbacks, load balancing, spend tracking, virtual keys, and production deployment
Quick Guide: LiteLLM is an OpenAI-compatible proxy (AI gateway) that routes requests to 100+ LLM providers. TypeScript clients connect via the standard OpenAI SDK with baseURL pointed at the proxy. Configure models, fallbacks, load balancing, and budgets in config.yaml. Use the provider/model-name format in litellm_params.model (e.g., anthropic/claude-sonnet-4-20250514). The model_name in config is the user-facing alias clients request. Virtual keys require PostgreSQL. The master key must start with sk-.
<critical_requirements>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)
(You MUST use the provider/model-name format in litellm_params.model -- e.g., anthropic/claude-sonnet-4-20250514, openai/gpt-4o, azure/my-deployment -- the provider prefix is how LiteLLM routes to the correct API)
(You MUST set model_name as the user-facing alias that clients request -- this is NOT the provider model ID, it is the name your TypeScript client passes as model)
(You MUST point the OpenAI SDK baseURL at the proxy URL (e.g., http://localhost:4000) and pass the proxy key as apiKey -- do NOT use provider API keys directly in client code)
(You MUST start master keys with sk- -- LiteLLM rejects master keys that do not follow this prefix convention)
(You MUST configure database_url pointing to PostgreSQL before using virtual keys, spend tracking, or team/user management -- these features require persistent storage)
</critical_requirements>
Auto-detection: LiteLLM, litellm, litellm_params, litellm_settings, LLM proxy, LLM gateway, model_list, master_key, virtual keys, model fallback, load balancing LLM, provider/model, anthropic/claude, openai/gpt, azure/, litellm --config, LITELLM_MASTER_KEY, LITELLM_SALT_KEY
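When to use: you call more than one LLM provider, or you need budgets, rate limits, virtual keys, fallbacks, or load balancing in front of your LLM calls.
Key patterns covered: proxy config.yaml setup, TypeScript clients via the OpenAI SDK, model routing, fallbacks, load balancing, virtual keys, and spend tracking.
When NOT to use: you call a single provider and need none of those governance or reliability features; use the provider SDK directly (simpler).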
LiteLLM Proxy is an AI gateway -- a single OpenAI-compatible endpoint that routes to 100+ LLM providers. TypeScript applications never talk to providers directly; they talk to the proxy using the standard OpenAI SDK.
Core principles:
- One client for every provider: applications use the standard OpenAI SDK with baseURL pointed at the proxy. Switching providers means changing config.yaml, not application code.
- Two-layer model naming: model_name is what clients request (e.g., "claude-sonnet"); litellm_params.model is the actual provider route (e.g., "anthropic/claude-sonnet-4-20250514"). This decouples client code from provider specifics.
- Reliability lives in the proxy: retries, fallbacks, and load balancing are declared in config.yaml. No application-level retry logic is needed.

The proxy needs a config.yaml with at least one model defined. model_name is client-facing; litellm_params.model is the provider route.
```yaml
# config.yaml
model_list:
  - model_name: claude-sonnet # What clients request
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514 # Provider/model route
      api_key: os.environ/ANTHROPIC_API_KEY # Never hardcode keys
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
```
Why good: Two-layer naming decouples clients from providers, os.environ/ syntax reads secrets from environment at runtime
```yaml
# BAD: Missing provider prefix, hardcoded key
model_list:
  - model_name: claude-sonnet-4-20250514 # Using provider model ID as name
    litellm_params:
      model: claude-sonnet-4-20250514 # No provider prefix -- routing fails
      api_key: sk-ant-abc123 # Hardcoded API key
```
Why bad: Without anthropic/ prefix, LiteLLM cannot route to the correct provider; hardcoded keys are a security risk; using the provider model ID as model_name couples clients to provider naming
See: examples/core.md for complete config with general_settings, Docker setup
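Because the proxy matches model_name exactly, client code can keep the aliases in one module of named constants instead of scattering string literals. A minimal sketch, assuming the two aliases from the config above; MODEL_ALIASES, ModelAlias, and isModelAlias are hypothetical names, not part of LiteLLM or the OpenAI SDK:

```typescript
// lib/model-aliases.ts
// Hypothetical module mirroring the model_name values in config.yaml.
// Keep in sync with the proxy config by hand; LiteLLM matches model_name exactly.
const MODEL_ALIASES = {
  CLAUDE_SONNET: "claude-sonnet",
  GPT_4O: "gpt-4o",
} as const;

// Union of the alias strings: "claude-sonnet" | "gpt-4o"
type ModelAlias = (typeof MODEL_ALIASES)[keyof typeof MODEL_ALIASES];

const KNOWN_ALIASES: ReadonlySet<string> = new Set<string>(Object.values(MODEL_ALIASES));

// Narrow an arbitrary string (e.g., from user input) to a configured alias.
// Comparison is case-sensitive, like the proxy's own model_name matching.
function isModelAlias(value: string): value is ModelAlias {
  return KNOWN_ALIASES.has(value);
}

export { MODEL_ALIASES, isModelAlias };
export type { ModelAlias };
```

Clients then pass model: MODEL_ALIASES.CLAUDE_SONNET, so a typo fails at compile time instead of as a failed request at the proxy.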
Connect to the proxy using the standard OpenAI SDK. Point baseURL at the proxy, use the proxy key as apiKey.
```typescript
// lib/llm-client.ts
import OpenAI from "openai";

const PROXY_URL = "http://localhost:4000";

const client = new OpenAI({
  baseURL: PROXY_URL,
  apiKey: process.env.LITELLM_API_KEY, // Virtual key or master key
});

export { client };
```

```typescript
// usage.ts
import { client } from "./lib/llm-client.js";

const completion = await client.chat.completions.create({
  model: "claude-sonnet", // model_name from config.yaml, NOT provider model ID
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain TypeScript generics." },
  ],
});

console.log(completion.choices[0].message.content);
```
Why good: Standard OpenAI SDK, no custom dependencies; model name matches config.yaml model_name; proxy key keeps provider keys server-side
```typescript
// BAD: Using provider model ID, provider API key
const client = new OpenAI({
  baseURL: "http://localhost:4000",
  apiKey: process.env.ANTHROPIC_API_KEY, // Wrong -- use proxy key
});

const completion = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-20250514", // Wrong -- use model_name alias
  messages: [{ role: "user", content: "Hello" }],
});
```
Why bad: Provider API key bypasses proxy auth and virtual key controls; using provider model ID instead of alias couples client to provider naming and bypasses proxy routing logic
See: examples/core.md for streaming, metadata tagging
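Streaming uses the same client: pass stream: true and iterate the returned chunks. A minimal sketch of accumulating the streamed text, assuming only that each chunk exposes choices[].delta.content as OpenAI-format chunks do; StreamChunk and collectStreamText are hypothetical names, and the structural type stands in for the SDK's own chunk type so the helper has no dependencies:

```typescript
// lib/collect-stream-text.ts
// Minimal structural type for an OpenAI-format streaming chunk (hypothetical);
// the SDK's chunk objects are structurally compatible with it.
interface StreamChunk {
  choices: Array<{ delta: { content?: string | null } }>;
}

// Concatenate the text deltas of a streamed chat completion,
// optionally forwarding each non-empty delta as it arrives.
async function collectStreamText(
  stream: AsyncIterable<StreamChunk>,
  onDelta?: (text: string) => void,
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    // Some chunks carry no content (role-only or final chunks); skip those.
    const delta = chunk.choices[0]?.delta?.content ?? "";
    if (delta) {
      text += delta;
      onDelta?.(delta);
    }
  }
  return text;
}

export { collectStreamText };
export type { StreamChunk };
```

With the client above: const stream = await client.chat.completions.create({ model: "claude-sonnet", messages, stream: true }); then await collectStreamText(stream, (text) => process.stdout.write(text)).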
Configure model fallbacks so requests automatically retry on a different model when the primary fails.
```yaml
# config.yaml
model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  num_retries: 2 # Retries per model before fallback
  fallbacks: [{ "claude-sonnet": ["gpt-4o"] }] # General fallback chain
  context_window_fallbacks: [{ "gpt-4o": ["claude-sonnet"] }] # Context overflow fallback
  default_fallbacks: ["gpt-4o"] # Catch-all for any model failure
```
Why good: Fallbacks use model_name aliases (not provider IDs), ordered chains tried sequentially, separate chains for context overflow vs general errors
See: examples/routing.md for content policy fallbacks, combining with load balancing
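Since fallback chains reference model_name aliases, a chain that names a provider route or a misspelled alias does not point at any configured model group. A minimal sketch of a pre-deploy check, assuming config.yaml has already been parsed into plain objects (e.g., with a YAML library); FallbackMap and findUnknownFallbackModels are hypothetical names:

```typescript
// Hypothetical shape of one entry in litellm_settings.fallbacks:
// { "claude-sonnet": ["gpt-4o"] } maps a primary alias to its ordered fallbacks.
type FallbackMap = Record<string, string[]>;

// Return every name used in fallback chains that is not a configured model_name.
function findUnknownFallbackModels(
  modelNames: readonly string[],
  fallbacks: readonly FallbackMap[],
): string[] {
  const known = new Set(modelNames);
  const unknown = new Set<string>();
  for (const entry of fallbacks) {
    for (const [primary, chain] of Object.entries(entry)) {
      // Check the primary alias and every fallback in its chain.
      for (const name of [primary, ...chain]) {
        if (!known.has(name)) unknown.add(name);
      }
    }
  }
  return Array.from(unknown);
}

export { findUnknownFallbackModels };
export type { FallbackMap };
```

context_window_fallbacks uses the same list-of-maps shape, so it can be checked with the same function.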
Multiple entries with the same model_name create a load-balanced group. The proxy distributes requests using the configured strategy.
```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-eastus
      api_base: https://eastus.openai.azure.com/
      api_key: os.environ/AZURE_EASTUS_KEY
      rpm: 100 # Requests per minute for this deployment
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-westus
      api_base: https://westus.openai.azure.com/
      api_key: os.environ/AZURE_WESTUS_KEY
      rpm: 100

router_settings:
  routing_strategy: usage-based-routing # Route to deployment with lowest RPM/TPM usage
  num_retries: 2
  timeout: 30
```
Why good: Same model_name across entries creates automatic load balancing, rpm/tpm limits per deployment enable usage-aware routing
See: examples/routing.md for all five routing strategies, priority routing with order
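Because rpm limits apply per deployment, a model group's total throughput is the sum across its deployments: the two Azure deployments at rpm: 100 above give the gpt-4o group 200 RPM. A minimal sketch that computes per-group totals and lists deployments with no limits, assuming model_list has been parsed into plain objects; ModelListEntry, groupRpmTotals, and deploymentsWithoutLimits are hypothetical names, and a missing rpm is counted as 0 here rather than as unlimited:

```typescript
// Hypothetical parsed shape of a model_list entry (only the fields used here).
interface ModelListEntry {
  model_name: string;
  litellm_params: { model: string; rpm?: number; tpm?: number };
}

// Sum per-deployment rpm limits for each model_name group.
// Entries without rpm contribute 0, which under-reports an unlimited deployment.
function groupRpmTotals(entries: readonly ModelListEntry[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const entry of entries) {
    const current = totals.get(entry.model_name) ?? 0;
    totals.set(entry.model_name, current + (entry.litellm_params.rpm ?? 0));
  }
  return totals;
}

// Provider routes that set neither rpm nor tpm; usage-based-routing
// has no limit data for these deployments.
function deploymentsWithoutLimits(entries: readonly ModelListEntry[]): string[] {
  return entries
    .filter((e) => e.litellm_params.rpm === undefined && e.litellm_params.tpm === undefined)
    .map((e) => e.litellm_params.model);
}

export { groupRpmTotals, deploymentsWithoutLimits };
export type { ModelListEntry };
```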
Virtual keys let you distribute access with per-key budgets, rate limits, and model restrictions. Requires PostgreSQL.
```yaml
# config.yaml
general_settings:
  master_key: sk-litellm-master-key-change-me # Must start with sk-
  database_url: os.environ/DATABASE_URL # PostgreSQL required
```

```bash
# Generate a virtual key via API
curl 'http://localhost:4000/key/generate' \
  -H 'Authorization: Bearer sk-litellm-master-key-change-me' \
  -H 'Content-Type: application/json' \
  -d '{
    "models": ["claude-sonnet", "gpt-4o"],
    "max_budget": 50.0,
    "duration": "30d",
    "metadata": {"team": "backend", "project": "search"}
  }'
# Returns: { "key": "sk-generated-key-abc123", ... }
```
Why good: Per-key model restrictions, budget caps, and expiry; metadata enables tag-based spend tracking; master key authentication protects key generation
See: examples/keys-and-spend.md for team management, spend queries, rate limit tiers
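The same /key/generate call can be made from TypeScript with the built-in fetch (Node 18+). A minimal sketch that only builds the request, assuming the fields from the curl example above; KeyGenerateOptions, KeyGenerateRequest, and buildKeyGenerateRequest are hypothetical names. The master key belongs in trusted server-side code only:

```typescript
// Hypothetical options type covering the fields used in the curl example.
interface KeyGenerateOptions {
  models: string[]; // model_name aliases the key may call
  max_budget?: number; // spend cap for the key
  duration?: string; // e.g., "30d"
  metadata?: Record<string, unknown>; // e.g., team/project tags for spend tracking
}

// Plain shape that can be passed straight to fetch(url, init).
interface KeyGenerateRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the URL and fetch() init for POST /key/generate, authenticated with the master key.
function buildKeyGenerateRequest(
  proxyUrl: string,
  masterKey: string,
  options: KeyGenerateOptions,
): KeyGenerateRequest {
  const base = proxyUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/key/generate`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${masterKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(options),
    },
  };
}

export { buildKeyGenerateRequest };
export type { KeyGenerateOptions, KeyGenerateRequest };
```

Sending it is then const { url, init } = buildKeyGenerateRequest(PROXY_URL, masterKey, { models: ["claude-sonnet"], max_budget: 50, duration: "30d" }); const res = await fetch(url, init); and the JSON response carries the key field shown in the curl output.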
Attach metadata tags to requests for granular cost attribution. The proxy tracks spend automatically per key, user, team, and tag.
```typescript
// Tag requests for cost attribution
const completion = await client.chat.completions.create({
  model: "claude-sonnet",
  messages: [{ role: "user", content: "Summarize this document." }],
  // LiteLLM-specific: pass metadata for spend tracking
  metadata: {
    tags: ["project:search", "team:backend"],
    trace_user_id: "user-123",
  },
} as any); // LiteLLM's metadata shape (e.g., a tags array) does not match the OpenAI SDK types
```
Why good: Tags enable cost attribution by project, team, or feature without changing model routing; cost appears in x-litellm-response-cost response header
When to use: When you need cost visibility across teams, projects, or features
See: examples/keys-and-spend.md for querying spend by tag, user, and team
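The per-request cost is also exposed in the x-litellm-response-cost response header. With the OpenAI Node SDK, calling .withResponse() on a request returns both the parsed completion and the raw Response, so the header can be read directly. A minimal sketch of the parsing step; HeaderReader and parseResponseCost are hypothetical names, and returning undefined for a missing or non-numeric header is a design choice:

```typescript
// Header LiteLLM sets with the computed cost of the request.
const RESPONSE_COST_HEADER = "x-litellm-response-cost";

// Minimal structural type so the helper accepts fetch Headers or a test stub.
interface HeaderReader {
  get(name: string): string | null;
}

// Parse the cost header; undefined when it is absent or not a finite number.
function parseResponseCost(headers: HeaderReader): number | undefined {
  const raw = headers.get(RESPONSE_COST_HEADER);
  if (raw === null || raw.trim() === "") return undefined;
  const cost = Number(raw);
  return Number.isFinite(cost) ? cost : undefined;
}

export { parseResponseCost, RESPONSE_COST_HEADER };
export type { HeaderReader };
```

In practice: const { data, response } = await client.chat.completions.create({ ... }).withResponse(); const cost = parseResponseCost(response.headers);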
<decision_framework>
```
Do you call multiple LLM providers?
+-- YES -> LiteLLM Proxy adds value (unified API, routing, fallbacks)
+-- NO -> Do you need budgets, rate limits, or virtual keys?
    +-- YES -> LiteLLM Proxy (governance layer)
    +-- NO -> Do you need fallbacks or load balancing?
        +-- YES -> LiteLLM Proxy (reliability layer)
        +-- NO -> Use the provider SDK directly (simpler)
```

```
Which routing strategy? What is your priority?
+-- Even distribution -> simple-shuffle (default)
+-- Minimize latency -> latency-based-routing
+-- Respect rate limits -> usage-based-routing
+-- Minimize cost -> cost-based-routing
+-- Handle concurrent load -> least-busy
```

```
Do you have multiple teams or users?
+-- YES -> Virtual keys (per-team budgets, model restrictions)
|          Requires: PostgreSQL database
+-- NO -> Do you need spend tracking?
    +-- YES -> Virtual keys (even for a single user; enables spend logs)
    |          Requires: PostgreSQL database
    +-- NO -> Master key only (simplest setup, no database needed)
```
</decision_framework>
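The routing-strategy tree maps one priority to one router_settings.routing_strategy value. A minimal sketch that keeps the five strategy names as typed constants so a misspelled value is caught at compile time; RoutingPriority, RoutingStrategy, ROUTING_STRATEGY_BY_PRIORITY, and chooseRoutingStrategy are hypothetical names, and the mapping simply restates the tree:

```typescript
// Priorities from the decision tree (hypothetical labels).
type RoutingPriority =
  | "even-distribution"
  | "minimize-latency"
  | "respect-rate-limits"
  | "minimize-cost"
  | "handle-concurrent-load";

// routing_strategy values named in this guide.
type RoutingStrategy =
  | "simple-shuffle"
  | "latency-based-routing"
  | "usage-based-routing"
  | "cost-based-routing"
  | "least-busy";

// Exhaustive mapping: adding a priority without a strategy is a compile error.
const ROUTING_STRATEGY_BY_PRIORITY: Readonly<Record<RoutingPriority, RoutingStrategy>> = {
  "even-distribution": "simple-shuffle", // the default strategy
  "minimize-latency": "latency-based-routing",
  "respect-rate-limits": "usage-based-routing", // needs rpm/tpm per deployment
  "minimize-cost": "cost-based-routing",
  "handle-concurrent-load": "least-busy",
};

function chooseRoutingStrategy(priority: RoutingPriority): RoutingStrategy {
  return ROUTING_STRATEGY_BY_PRIORITY[priority];
}

export { chooseRoutingStrategy, ROUTING_STRATEGY_BY_PRIORITY };
export type { RoutingPriority, RoutingStrategy };
```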
<red_flags>
High Priority Issues:
- Missing provider prefix in litellm_params.model (e.g., claude-sonnet-4-20250514 instead of anthropic/claude-sonnet-4-20250514) -- the proxy cannot route without the prefix
- Hardcoding API keys in config.yaml instead of using os.environ/VAR_NAME -- security breach risk
- Using provider model IDs as model_name -- couples all clients to provider naming and breaks when you switch providers
- A master key that does not start with sk- -- LiteLLM rejects it
- Using virtual keys without database_url -- key generation fails

Medium Priority Issues:
- Not setting num_retries in litellm_settings -- defaults to 0, so transient failures are not retried
- Confusing model_name (client-facing alias) with litellm_params.model (provider route) -- the most common config mistake
- Omitting rpm/tpm on deployments when using usage-based-routing -- the routing strategy has no data to work with
- Not setting LITELLM_SALT_KEY in production -- it is used to encrypt provider credentials stored in the database and should be set before any are added

Common Mistakes:
- Passing a provider route such as anthropic/claude-sonnet-4-20250514 as the model parameter in TypeScript client code -- use the model_name alias instead
- Expecting LiteLLM's metadata shape (e.g., a tags array) to type-check in the OpenAI SDK -- it is a LiteLLM extension and needs a cast such as as any
- Writing fallback chains with provider routes instead of model_name aliases -- fallbacks reference model names, not provider routes
- Forgetting that config.yaml changes require a proxy restart (or use the /config/update API endpoint)

Gotchas & Edge Cases:
- The os.environ/ syntax in config.yaml (no $ prefix) is LiteLLM-specific -- it is not standard YAML environment variable substitution
- model_name matching is exact -- "claude-sonnet" and "Claude-Sonnet" are different models
- General fallbacks, including default_fallbacks, do NOT apply to ContentPolicyViolationError or ContextWindowExceededError -- use the specialized fallback types (content_policy_fallbacks, context_window_fallbacks) for those
- rpm/tpm limits in config are per-deployment, not per-model-group -- a model group with 3 deployments at rpm: 100 each gets 300 RPM total
- The spend field on a key may lag a few seconds behind actual usage
- The /v1/ prefix on endpoints is optional -- both http://localhost:4000/chat/completions and http://localhost:4000/v1/chat/completions work
- The admin UI is available at http://localhost:4000/ui when the proxy is running
</red_flags>
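Several of the high-priority issues above can be caught mechanically before the proxy starts. A minimal sketch of a lint pass over an already-parsed config, assuming a YAML parser has produced plain objects; ProxyConfig, lintProxyConfig, and the message strings are hypothetical, and the provider-prefix test only checks for a "/" rather than validating provider names:

```typescript
// Hypothetical parsed shape of the config.yaml fields this lint inspects.
interface ProxyConfig {
  model_list: Array<{
    model_name: string;
    litellm_params: { model: string; api_key?: string };
  }>;
  general_settings?: { master_key?: string; database_url?: string };
}

const ENV_REF_PREFIX = "os.environ/";
const MASTER_KEY_PREFIX = "sk-";

// Return human-readable problems; an empty array means no issues were found.
function lintProxyConfig(config: ProxyConfig): string[] {
  const problems: string[] = [];
  for (const { model_name, litellm_params } of config.model_list) {
    // Provider route should look like provider/model-name.
    if (!litellm_params.model.includes("/")) {
      problems.push(`${model_name}: litellm_params.model lacks a provider prefix`);
    }
    // Keys should be read from the environment, not hardcoded.
    const key = litellm_params.api_key;
    if (key !== undefined && !key.startsWith(ENV_REF_PREFIX)) {
      problems.push(`${model_name}: api_key is hardcoded; use os.environ/VAR_NAME`);
    }
    // model_name should be a client-facing alias, not the provider route itself.
    if (model_name.includes("/") || model_name === litellm_params.model) {
      problems.push(`${model_name}: model_name duplicates the provider route; use an alias`);
    }
  }
  // A literal master key must use the sk- prefix (environment references are skipped).
  const masterKey = config.general_settings?.master_key;
  if (
    masterKey !== undefined &&
    !masterKey.startsWith(ENV_REF_PREFIX) &&
    !masterKey.startsWith(MASTER_KEY_PREFIX)
  ) {
    problems.push("general_settings.master_key must start with sk-");
  }
  return problems;
}

export { lintProxyConfig };
export type { ProxyConfig };
```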
<critical_reminders>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)
(You MUST use the provider/model-name format in litellm_params.model -- e.g., anthropic/claude-sonnet-4-20250514, openai/gpt-4o, azure/my-deployment -- the provider prefix is how LiteLLM routes to the correct API)
(You MUST set model_name as the user-facing alias that clients request -- this is NOT the provider model ID, it is the name your TypeScript client passes as model)
(You MUST point the OpenAI SDK baseURL at the proxy URL (e.g., http://localhost:4000) and pass the proxy key as apiKey -- do NOT use provider API keys directly in client code)
(You MUST start master keys with sk- -- LiteLLM rejects master keys that do not follow this prefix convention)
(You MUST configure database_url pointing to PostgreSQL before using virtual keys, spend tracking, or team/user management -- these features require persistent storage)
Failure to follow these rules will produce misconfigured proxies with broken routing, security issues, or missing spend data.
</critical_reminders>