name: llm-gateway
description: LLM Gateway interface for all Anthropic API calls in v2. Use when writing any code that calls an LLM, adding new model providers, or understanding how LLM costs are tracked. Status: BUILT (Phase 1.2).
allowed-tools: [Read, Grep, Bash(python*)]
LLM Gateway
STATUS: BUILT (Phase 1.2, 2026-03-31) — backend/services/llm_gateway.py is live. Interface below is authoritative.
The Rule
NEVER call the Anthropic API directly in new v2 code. All LLM calls go through call_llm(). This is not optional.
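# WRONG: calls the Anthropic API directly, bypassing gateway tracing and cost tracking
client = anthropic.Anthropic(api_key=...)
response = client.messages.create(model="claude-haiku-4-5", ...)

# RIGHT: goes through the gateway
from backend.services.llm_gateway import call_llm
response = await call_llm(prompt=..., model="haiku", user_id=..., surface="chat")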
Gateway Interface
from typing import Optional

async def call_llm(
    prompt: str,
    model: str,                            # alias from MODEL_REGISTRY, e.g. "haiku"
    user_id: str,
    surface: str,                          # e.g. "chat", "classification"
    system_prompt: Optional[str] = None,
    max_tokens: int = 1000,
    trace_metadata: Optional[dict] = None,
) -> LLMResponse:
    """
    Single entry point for all LLM calls.
    - Resolves model alias to provider + model_id
    - Creates Langfuse trace with trace_id
    - Records token usage + cost in token_usage table
    - Returns LLMResponse with text, trace_id, token usage, and cost
    """
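The four steps in the docstring can be sketched end to end as follows. This is a minimal illustration, not the code in backend/services/llm_gateway.py: `call_llm_sketch`, `fake_provider_call`, `SketchResponse`, `REGISTRY`, and `USAGE_LOG` are hypothetical names, the provider is a stub that echoes the prompt, and the trace id is a plain UUID rather than a Langfuse trace.

```python
import asyncio
import uuid
from dataclasses import dataclass

# Hypothetical registry subset: only the fields needed for alias resolution.
REGISTRY = {
    "haiku": {"provider": "anthropic", "model_id": "claude-haiku-4-5-20251001"},
    "sonnet": {"provider": "anthropic", "model_id": "claude-sonnet-4-6"},
}

USAGE_LOG: list[dict] = []  # in-memory stand-in for the token_usage table


@dataclass
class SketchResponse:
    text: str
    model_key: str
    model_id: str
    tokens_in: int
    tokens_out: int
    trace_id: str


async def fake_provider_call(model_id: str, prompt: str, max_tokens: int) -> tuple[str, int, int]:
    """Stub provider: echoes the prompt and counts whitespace-separated words as tokens."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    text = f"echo: {prompt}"
    return text, len(prompt.split()), min(len(text.split()), max_tokens)


async def call_llm_sketch(prompt: str, model: str, user_id: str, surface: str,
                          max_tokens: int = 1000) -> SketchResponse:
    # 1. Resolve the alias; unknown aliases fail before any provider is contacted.
    if model not in REGISTRY:
        raise ValueError(f"unknown model alias: {model!r}")
    model_id = REGISTRY[model]["model_id"]

    # 2. Create a trace id (a real gateway would open a Langfuse trace here).
    trace_id = uuid.uuid4().hex

    # 3. Call the provider.
    text, tokens_in, tokens_out = await fake_provider_call(model_id, prompt, max_tokens)

    # 4. Record usage (a real gateway would insert a token_usage row here).
    USAGE_LOG.append({
        "user_id": user_id, "surface": surface, "model": model, "model_id": model_id,
        "tokens_in": tokens_in, "tokens_out": tokens_out, "trace_id": trace_id,
    })

    # 5. Return one response object that carries everything callers need.
    return SketchResponse(text, model, model_id, tokens_in, tokens_out, trace_id)
```

Callers would await it the same way as the real entry point, for example `asyncio.run(call_llm_sketch("hello world", "haiku", "u1", "chat"))`.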
LLMResponse Shape
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMResponse:
    text: str
    model_key: str
    model_id: str
    tokens_in: int
    tokens_out: int
    cost_usd: float
    trace_id: Optional[str]
    latency_ms: int
    parsed: Optional[dict]
    confidence: Optional[float]
    canonical_label: Optional[str]
    input_hash: str

    @property
    def is_high_confidence(self) -> bool: ...

    @property
    def cache_eligible(self) -> bool: ...
Why confidence and canonical_label are first-class (not buried in parsed):
- Tier 1 merchant cache needs cache_eligible without parsing JSON
- token_usage training data query: WHERE surface='classification' AND confidence >= 0.85
- Downstream consumers shouldn't need to know the response JSON schema
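The bodies of the two properties are elided above. One plausible reading, sketched here under stated assumptions: `is_high_confidence` reuses the 0.85 cutoff from the training-data query, and `cache_eligible` additionally requires a `canonical_label`. Both rules and the `ClassificationFields` class are assumptions for illustration, not the confirmed implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: same cutoff as the training-data query (confidence >= 0.85).
HIGH_CONFIDENCE_THRESHOLD = 0.85


@dataclass
class ClassificationFields:
    """Hypothetical subset of LLMResponse, limited to the fields the properties read."""
    confidence: Optional[float]
    canonical_label: Optional[str]

    @property
    def is_high_confidence(self) -> bool:
        # A missing confidence is treated as "not confident".
        return self.confidence is not None and self.confidence >= HIGH_CONFIDENCE_THRESHOLD

    @property
    def cache_eligible(self) -> bool:
        # Assumed rule: cache only confident results that carry a label to key on.
        return self.is_high_confidence and self.canonical_label is not None
```

With properties like these, a cache layer can branch on `response.cache_eligible` without touching `parsed` at all.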
Model Registry
MODEL_REGISTRY = {
    "haiku": {
        "provider": "anthropic",
        "model_id": "claude-haiku-4-5-20251001",
        "input_cost_per_m": 0.25,
        "output_cost_per_m": 1.25,
    },
    "sonnet": {
        "provider": "anthropic",
        "model_id": "claude-sonnet-4-6",
        "input_cost_per_m": 3.00,
        "output_cost_per_m": 15.00,
    },
}
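As a worked example of how the cost fields might be used, the sketch below converts token counts to dollars. It assumes `input_cost_per_m` and `output_cost_per_m` mean USD per million tokens; `PRICES` and `compute_cost_usd` are hypothetical names, not the contents of backend/utils/cost_tracker.py.

```python
# Cost fields copied from the registry above; assumed to be USD per million tokens.
PRICES = {
    "haiku": {"input_cost_per_m": 0.25, "output_cost_per_m": 1.25},
    "sonnet": {"input_cost_per_m": 3.00, "output_cost_per_m": 15.00},
}


def compute_cost_usd(model_key: str, tokens_in: int, tokens_out: int) -> float:
    """Return the dollar cost of one call, pricing input and output tokens separately."""
    price = PRICES[model_key]  # raises KeyError for an unknown alias
    total_per_m = tokens_in * price["input_cost_per_m"] + tokens_out * price["output_cost_per_m"]
    return total_per_m / 1_000_000
```

Under that assumption, a sonnet call with 1,000 input tokens and 500 output tokens costs (1000 × 3.00 + 500 × 15.00) / 1,000,000 = 0.0105 USD.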
Surface Routing Guide
| surface | model | rationale |
|---|---|---|
| classification | haiku | High volume, short input/output |
| ambient | haiku | High frequency, 1-sentence output |
| chat (simple) | haiku | Single-topic factual queries |
| chat (complex) | sonnet | Multi-turn reasoning, analysis |
| review | sonnet | Stakes are high, premium experience |
| planning | sonnet | Actions that modify data |
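The routing table can be expressed as a small lookup. This is a sketch only: `pick_model` and `SURFACE_TO_MODEL` are hypothetical, and the guide does not say how a chat request is judged simple or complex, so that decision is left to the caller as a `complex_chat` flag.

```python
# Fixed surface-to-alias mapping taken from the routing table.
SURFACE_TO_MODEL = {
    "classification": "haiku",
    "ambient": "haiku",
    "review": "sonnet",
    "planning": "sonnet",
}


def pick_model(surface: str, complex_chat: bool = False) -> str:
    """Return the model alias for a surface; chat is split by a caller-supplied flag."""
    if surface == "chat":
        return "sonnet" if complex_chat else "haiku"
    try:
        return SURFACE_TO_MODEL[surface]
    except KeyError:
        # Fail loudly rather than silently defaulting to a model.
        raise ValueError(f"unknown surface: {surface!r}") from None
```

Raising on an unknown surface, rather than defaulting, is a design choice made for the sketch: it surfaces typos in `surface` strings before they reach cost tracking.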
Token Usage Table
Every call_llm() creates a row:
CREATE TABLE token_usage (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id TEXT NOT NULL,
    surface TEXT NOT NULL,
    model TEXT NOT NULL,
    model_id TEXT NOT NULL,
    tokens_in INTEGER NOT NULL,
    tokens_out INTEGER NOT NULL,
    cost_usd FLOAT NOT NULL,
    trace_id TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX ON token_usage (user_id, created_at DESC);
CREATE INDEX ON token_usage (user_id, surface, created_at DESC);
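To illustrate the one-row-per-call rule, the sketch below records a usage row with Python's built-in sqlite3 module as a stand-in for the real database. The DDL is adapted for SQLite (the id is generated in Python instead of by gen_random_uuid(), and created_at uses CURRENT_TIMESTAMP); `make_db` and `record_usage` are hypothetical helpers.

```python
import sqlite3
import uuid
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Create an in-memory SQLite approximation of the token_usage table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE token_usage (
            id TEXT PRIMARY KEY,
            user_id TEXT NOT NULL,
            surface TEXT NOT NULL,
            model TEXT NOT NULL,
            model_id TEXT NOT NULL,
            tokens_in INTEGER NOT NULL,
            tokens_out INTEGER NOT NULL,
            cost_usd REAL NOT NULL,
            trace_id TEXT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
    """)
    return conn


def record_usage(conn: sqlite3.Connection, *, user_id: str, surface: str, model: str,
                 model_id: str, tokens_in: int, tokens_out: int, cost_usd: float,
                 trace_id: Optional[str] = None) -> str:
    """Insert one usage row and return its generated id."""
    row_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO token_usage "
        "(id, user_id, surface, model, model_id, tokens_in, tokens_out, cost_usd, trace_id) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",  # parameterized to avoid SQL injection
        (row_id, user_id, surface, model, model_id, tokens_in, tokens_out, cost_usd, trace_id),
    )
    conn.commit()
    return row_id
```

The keyword-only arguments mirror the NOT NULL columns, so a missing field fails at the call site rather than at the database.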
Usage Endpoint
GET /api/usage/me
→ {
    "total_cost_usd": 0.42,
    "by_surface": {
        "classification": {"calls": 312, "cost_usd": 0.08},
        "review": {"calls": 6, "cost_usd": 0.28},
        "chat": {"calls": 14, "cost_usd": 0.06}
    },
    "this_month": {...},
    "today": {...}
}
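A response of this shape could be built by grouping usage rows by surface, as in the sketch below. `summarize_usage` is a hypothetical helper, rows are plain dicts rather than ORM objects, and the `this_month` and `today` keys are omitted because their contents are elided in the example above.

```python
from collections import defaultdict


def summarize_usage(rows: list[dict]) -> dict:
    """Aggregate usage rows into total cost plus per-surface call counts and cost."""
    buckets = defaultdict(lambda: {"calls": 0, "cost_usd": 0.0})
    for row in rows:
        bucket = buckets[row["surface"]]
        bucket["calls"] += 1
        bucket["cost_usd"] += row["cost_usd"]

    # Round only at the end so float noise does not accumulate in the output.
    by_surface = {
        surface: {"calls": b["calls"], "cost_usd": round(b["cost_usd"], 6)}
        for surface, b in buckets.items()
    }
    total = round(sum(row["cost_usd"] for row in rows), 6)
    return {"total_cost_usd": total, "by_surface": by_surface}
```

In a real endpoint the same grouping would more likely be done in SQL with GROUP BY surface; the Python version is shown only to make the output shape concrete.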
Adding a New Provider (Future)
- Add entry to MODEL_REGISTRY with provider, model_id, and cost fields
- Add provider handler in model_registry.py (AnthropicProvider, ModalProvider, etc.)
- call_llm() dispatches to the right handler, so callers don't change
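The dispatch step above might look like the following sketch. Everything here is illustrative: the `Provider` protocol, the `complete` method, `HANDLERS`, `dispatch`, and the two toy providers are assumptions, and the actual handler interface in model_registry.py may differ.

```python
from typing import Protocol


class Provider(Protocol):
    """Assumed handler interface: one method that turns a prompt into text."""
    def complete(self, model_id: str, prompt: str, max_tokens: int) -> str: ...


class EchoProvider:
    """Toy stand-in for a real handler such as AnthropicProvider."""
    def complete(self, model_id: str, prompt: str, max_tokens: int) -> str:
        return f"[{model_id}] {prompt}"


class UppercaseProvider:
    """Toy second provider, standing in for a newly added one."""
    def complete(self, model_id: str, prompt: str, max_tokens: int) -> str:
        return prompt.upper()


# Step 1: registry entries name a provider key and a model_id (cost fields omitted here).
REGISTRY = {
    "echo-small": {"provider": "echo", "model_id": "echo-1"},
    "shout": {"provider": "upper", "model_id": "upper-1"},
}

# Step 2: one handler instance per provider key.
HANDLERS: dict[str, Provider] = {
    "echo": EchoProvider(),
    "upper": UppercaseProvider(),
}


def dispatch(model: str, prompt: str, max_tokens: int = 1000) -> str:
    """Step 3: look up the alias, then route to the handler for its provider."""
    entry = REGISTRY[model]
    handler = HANDLERS[entry["provider"]]
    return handler.complete(entry["model_id"], prompt, max_tokens)
```

Because callers pass only the alias, adding "shout" required a registry entry and a handler but no change to any call site, which is the property the list above describes.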
Files to Build (Phase 1.2)
backend/services/llm_gateway.py — call_llm() entry point
backend/services/model_registry.py — Provider + model definitions
backend/utils/cost_tracker.py — Token → dollar computation
backend/utils/response_envelope.py — Standard API response wrapper
backend/middleware/tracing.py — Langfuse auto-tracing
backend/database/models/token_usage.py — Usage tracking model + migration
backend/api/routes/usage.py — GET /api/usage/me
tests/test_llm_gateway.py — Unit tests (write FIRST)
Acceptance Criteria (Phase 1.2)
Last Updated
2026-03-31 (Phase 0 scaffold)