| name | write-fixtures |
| description | Use when writing test fixtures for @copilotkit/aimock — mock LLM responses, tool call sequences, error injection, multi-turn agent loops, embeddings, structured output, sequential responses, or debugging fixture mismatches |
Writing aimock Test Fixtures
What aimock Is
aimock is a zero-dependency mock infrastructure for AI apps. Fixture-driven. Multi-provider (OpenAI, Anthropic, Gemini, Gemini Interactions, AWS Bedrock, Azure OpenAI, Vertex AI, Ollama, Cohere). Multimedia endpoints (image generation, text-to-speech, audio transcription, video generation). MCP, A2A, AG-UI, and vector DB mocking. Runs a real HTTP server on a real port — works across processes, unlike MSW-style interceptors. WebSocket support for OpenAI Responses/Realtime and Gemini Live APIs. Record-and-replay for all endpoints including multimedia. Chaos testing and Prometheus metrics.
Core Mental Model
- Fixtures = match criteria + response
- First-match-wins — order matters
- All providers share one fixture pool (provider adapters normalize to ChatCompletionRequest)
- Fixtures are live: mutations after start() take effect immediately
- Sequential responses are supported via sequenceIndex (match count tracked per fixture group)
Match Field Reference
| Field | Type | Matches Against |
|---|---|---|
userMessage | string | Substring of last role: "user" message text |
userMessage | RegExp | Pattern test on last role: "user" message text |
systemMessage | string | Substring of the concatenated text of every role: "system" message in the request. Use to gate a fixture on host-supplied context (persona, agent-context entries) so changes to that context cause the fixture to fall through instead of returning a stale baked response |
systemMessage | string[] | Array of substrings — ALL must be present in the joined system text (AND semantics). Use when the gate must combine multiple non-adjacent tokens whose serialisation order isn't stable |
systemMessage | RegExp | Pattern test on the concatenated system-message text |
inputText | string | Substring of embedding input text (concatenated if multiple inputs) |
inputText | RegExp | Pattern test on embedding input text |
toolName | string | Exact match on any tool in request's tools[] array (by function.name) |
toolCallId | string | Exact match on tool_call_id of last role: "tool" message |
model | string | Exact match on req.model |
model | RegExp | Pattern test on req.model |
responseFormat | string | Exact match on req.response_format.type ("json_object", "json_schema") |
sequenceIndex | number | Matches only when this fixture's match count equals the given index (0-based) |
turnIndex | number | Stateless conversation-depth matching. Counts role: "assistant" messages in the request; matches when that count equals the value. turnIndex: 0 = first turn (no prior assistant messages). Use instead of sequenceIndex for shared/deployed instances where stateful counters break under concurrency |
hasToolResult | boolean | Stateless tool-message presence matching. true matches when any role: "tool" message exists in the request; false matches when none exist. Provider-consistent across all aimock handlers (OpenAI, Claude, Gemini, Bedrock, Ollama, Cohere) |
endpoint | string | Restrict to endpoint type: "chat", "image", "speech", "transcription", "video", "embedding" |
predicate | (req: ChatCompletionRequest) => boolean | Custom function — full access to request |
AND logic: all specified fields must match. Empty match {} = catch-all.
Multi-part content (e.g., [{type: "text", text: "hello"}]) is automatically extracted — userMessage matching works regardless of content format.
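The matching rules above can be sketched in a few lines. This is a minimal illustration of the described semantics (last user message, substring or RegExp, AND logic, first-match-wins), not aimock's actual source; the types and the helper names textOf, lastUserText, matchesRequest, and findFixture are hypothetical, and only two match fields are modeled.

```typescript
// Illustrative types only; aimock's real ChatCompletionRequest carries more fields.
type ContentPart = { type: string; text?: string };
type Message = { role: string; content: string | ContentPart[] };
type ChatRequest = { model: string; messages: Message[] };
type Match = { userMessage?: string | RegExp; model?: string | RegExp };
type Fixture = { match: Match; response: { content: string } };

// Flatten plain-string or multi-part content into text.
function textOf(content: string | ContentPart[]): string {
  if (typeof content === "string") return content;
  return content.map((part) => part.text ?? "").join("");
}

// Text of the LAST role: "user" message, or "" when there is none.
function lastUserText(req: ChatRequest): string {
  const lastUser = [...req.messages].reverse().find((m) => m.role === "user");
  return lastUser ? textOf(lastUser.content) : "";
}

// AND logic: every field present in `match` must pass; {} matches everything.
function matchesRequest(req: ChatRequest, match: Match): boolean {
  const { userMessage, model } = match;
  if (userMessage !== undefined) {
    const text = lastUserText(req);
    const ok = typeof userMessage === "string" ? text.includes(userMessage) : userMessage.test(text);
    if (!ok) return false;
  }
  if (model !== undefined) {
    // Strings are exact matches on the model; RegExps are pattern tests.
    const ok = typeof model === "string" ? req.model === model : model.test(req.model);
    if (!ok) return false;
  }
  return true;
}

// First-match-wins: scan in registration order and stop at the first hit.
function findFixture(req: ChatRequest, fixtures: Fixture[]): Fixture | undefined {
  return fixtures.find((f) => matchesRequest(req, f.match));
}
```

Because the scan stops at the first hit, a catch-all registered ahead of a specific fixture would shadow it, which is why ordering matters.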
When to Use Each Multi-turn Matching Approach
| Approach | Stateless? | Best For |
|---|---|---|
turnIndex | Yes | Shared/deployed instances; matches on conversation depth (count of assistant messages in request) |
hasToolResult | Yes | Simplest option for 2-step tool flows — boolean: are there tool results in the request? |
sequenceIndex | No | Single-client unit tests with repeated identical requests (server-side counter, breaks under concurrency) |
toolCallId | Yes | Matching specific tool result IDs in the conversation history |
Prefer stateless approaches (turnIndex, hasToolResult) for shared aimock instances (deployed via Docker, used by multiple test runners). Use sequenceIndex only in isolated single-client unit tests where the counter won't be corrupted by concurrent requests.
Multi-turn fixture examples
{"match": {"userMessage": "trip to mars", "turnIndex": 0}, "response": {"toolCalls": [{"id": "call_001", "name": "generate_steps", "arguments": "{}"}]}}
{"match": {"userMessage": "trip to mars", "turnIndex": 1}, "response": {"content": "Great choices! Proceeding."}}
{"match": {"userMessage": "trip to mars", "hasToolResult": false}, "response": {"toolCalls": [{"id": "call_001", "name": "generate_steps", "arguments": "{}"}]}}
{"match": {"userMessage": "trip to mars", "hasToolResult": true}, "response": {"content": "Great choices!"}}
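To make the two stateless criteria concrete, here is a small sketch of how turnIndex and hasToolResult can be derived from nothing but the request's message list. The helper names turnIndexOf and hasToolResult and the Msg type are illustrative assumptions, not aimock exports.

```typescript
// Simplified message shape for illustration.
type Msg = { role: "system" | "user" | "assistant" | "tool"; content: string };

// turnIndex: number of assistant messages already present in the request.
function turnIndexOf(messages: Msg[]): number {
  return messages.filter((m) => m.role === "assistant").length;
}

// hasToolResult: true when any role: "tool" message is present.
function hasToolResult(messages: Msg[]): boolean {
  return messages.some((m) => m.role === "tool");
}

const firstCall: Msg[] = [{ role: "user", content: "plan a trip to mars" }];
const secondCall: Msg[] = [
  ...firstCall,
  { role: "assistant", content: "" }, // the assistant turn that carried the tool call
  { role: "tool", content: '{"steps":[]}' },
];

console.log(turnIndexOf(firstCall), hasToolResult(firstCall)); // 0 false
console.log(turnIndexOf(secondCall), hasToolResult(secondCall)); // 1 true
```

Both values depend only on the request body, so concurrent clients cannot disturb each other the way a server-side counter can.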
Response Types
Text
{ content: "Hello!" }
Tool Calls
// Object form (preferred; auto-stringified by the loader)
{ toolCalls: [{ name: "get_weather", arguments: { city: "SF" } }] }
// String form (also accepted)
{ toolCalls: [{ name: "get_weather", arguments: '{"city":"SF"}' }] }
Both object and string forms are accepted for arguments. The fixture loader auto-stringifies objects via JSON.stringify(). Object form is preferred for readability.
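The auto-stringify step can be pictured as follows. This is a sketch of the behavior described above, assuming a hypothetical normalizeToolCall helper; the loader's real internals may differ.

```typescript
// Fixture authors may supply arguments as an object or as a JSON string.
type ToolCallInput = { name: string; arguments: string | Record<string, unknown> };
// After normalization, arguments is always a JSON string.
type ToolCall = { name: string; arguments: string };

function normalizeToolCall(call: ToolCallInput): ToolCall {
  const args =
    typeof call.arguments === "string" ? call.arguments : JSON.stringify(call.arguments);
  return { name: call.name, arguments: args };
}

const fromObject = normalizeToolCall({ name: "get_weather", arguments: { city: "SF" } });
const fromString = normalizeToolCall({ name: "get_weather", arguments: '{"city":"SF"}' });
console.log(fromObject.arguments === fromString.arguments); // true
```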
Embedding
{ embedding: [0.1, 0.2, 0.3, -0.5, 0.8] }
The embedding vector is returned for each input in the request. If no embedding fixture matches, deterministic embeddings are auto-generated from the input text hash — you only need fixtures when you want specific vectors.
Image
{
image: {
url: "https://example.com/generated.png"
}
}
{
images: [{ url: "https://example.com/1.png" }, { b64Json: "iVBOR..." }]
}
Use match: { endpoint: "image" } to prevent cross-matching with chat fixtures.
Speech (TTS)
{ audio: "base64-encoded-audio-data" }
{ audio: "base64-data", format: "opus" }
Transcription
{ transcription: { text: "Hello world" } }
{ transcription: { text: "Hello world", language: "en", duration: 2.5, words: [...], segments: [...] } }
Video
{ video: { id: "vid-1", status: "completed", url: "https://example.com/video.mp4" } }
Video uses async polling — POST /v1/videos creates, GET /v1/videos/{id} checks status.
Error
{ error: { message: "Rate limited", type: "rate_limit_error" }, status: 429 }
Chaos (Failure Injection)
The optional chaos field on a fixture enables probabilistic failure injection:
{
chaos?: {
dropRate?: number;
malformedRate?: number;
disconnectRate?: number;
}
}
Rates are evaluated per-request. When triggered, the chaos failure replaces the normal response.
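As a rough mental model of per-request evaluation, the sketch below rolls each rate once per request and returns the failure that fires, if any. The function name rollChaos, the check order (drop, then malformed, then disconnect), and the use of independent rolls are assumptions made for illustration; the source does not specify how aimock combines the rates.

```typescript
type ChaosRates = { dropRate?: number; malformedRate?: number; disconnectRate?: number };
type ChaosOutcome = "drop" | "malformed" | "disconnect" | null;

// `random` is injectable so the behavior can be tested deterministically.
function rollChaos(rates: ChaosRates, random: () => number = Math.random): ChaosOutcome {
  // Assumed order of evaluation; a missing rate counts as 0 (never fires).
  if (random() < (rates.dropRate ?? 0)) return "drop";
  if (random() < (rates.malformedRate ?? 0)) return "malformed";
  if (random() < (rates.disconnectRate ?? 0)) return "disconnect";
  return null; // no chaos: serve the fixture's normal response
}
```

With dropRate: 0.3, a roll of 0.1 produces a drop and a roll of 0.9 serves the normal response.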
Common Patterns
Basic text fixture
mock.onMessage("hello", { content: "Hi there!" });
Tool call → tool result → final response (3-step agent loop)
The most common pattern. Fixture 1 triggers the tool call, fixture 2 handles the tool result.
mock.onMessage("weather", {
toolCalls: [{ name: "get_weather", arguments: { city: "SF" } }],
});
mock.addFixture({
match: { predicate: (req) => req.messages.at(-1)?.role === "tool" },
response: { content: "It's 72°F in San Francisco." },
});
Why predicate, not userMessage? After a tool call, the client replays the same conversation with the tool result appended. The user message hasn't changed — userMessage: "weather" would match the SAME fixture again, creating an infinite loop.
Embedding fixture
mock.onEmbedding("search query", {
embedding: [0.1, 0.2, 0.3, 0.4, 0.5],
});
mock.onEmbedding(/product.*description/, {
embedding: [0.9, -0.1, 0.5, 0.3, 0.2],
});
Structured output / JSON mode
mock.onJsonOutput("extract entities", {
entities: [
{ name: "Acme Corp", type: "company" },
{ name: "Jane Doe", type: "person" },
],
});
mock.addFixture({
match: { userMessage: "extract entities", responseFormat: "json_object" },
response: { content: '{"entities":[...]}' },
});
Sequential responses (same match, different responses)
mock.on(
{ userMessage: "status", sequenceIndex: 0 },
{ toolCalls: [{ name: "check_status", arguments: {} }] },
);
mock.on({ userMessage: "status", sequenceIndex: 1 }, { content: "All systems operational." });
Match counts are tracked per fixture group and reset with reset() or resetMatchCounts().
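The counter behavior can be illustrated with a toy router. This is a sketch under stated assumptions, not aimock's implementation: createSequenceRouter and its types are hypothetical, only userMessage is modeled, and the group key is simply the match without sequenceIndex.

```typescript
type SeqMatch = { userMessage: string; sequenceIndex?: number };
type SeqFixture = { match: SeqMatch; response: string };

function createSequenceRouter(fixtures: SeqFixture[]) {
  // One counter per fixture group (match fields minus sequenceIndex).
  const counts = new Map<string, number>();
  return {
    route(userText: string): string | undefined {
      for (const fixture of fixtures) {
        if (!userText.includes(fixture.match.userMessage)) continue;
        const key = JSON.stringify({ userMessage: fixture.match.userMessage });
        const seen = counts.get(key) ?? 0;
        const wanted = fixture.match.sequenceIndex;
        if (wanted !== undefined && wanted !== seen) continue;
        counts.set(key, seen + 1); // increment after a successful match
        return fixture.response;
      }
      return undefined; // no fixture matched
    },
    resetMatchCounts(): void {
      counts.clear();
    },
  };
}

const router = createSequenceRouter([
  { match: { userMessage: "status", sequenceIndex: 0 }, response: "tool call" },
  { match: { userMessage: "status", sequenceIndex: 1 }, response: "All systems operational." },
]);
const first = router.route("status?"); // "tool call"
const second = router.route("status?"); // "All systems operational."
const third = router.route("status?"); // undefined: no fixture for index 2
```

The third call falling through is the reason a catch-all is useful alongside sequenced fixtures.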
Streaming physics (realistic timing)
mock.onMessage(
"tell me a story",
{ content: "Once upon a time..." },
{
streamingProfile: {
ttft: 200,
tps: 30,
jitter: 0.1,
},
},
);
Predicate-based routing (same user message, different context)
Common in supervisor/orchestrator patterns where the system prompt changes:
mock.addFixture({
match: {
predicate: (req) => {
const sys = req.messages.find((m) => m.role === "system")?.content ?? "";
return typeof sys === "string" && sys.includes("Flights found: false");
},
},
response: { toolCalls: [{ name: "search_flights", arguments: {} }] },
});
Catch-all (always add one)
Prevents unmatched requests from returning 404 and crashing the test:
mock.addFixture({
match: { predicate: () => true },
response: { content: "I understand. How can I help?" },
});
Tool result catch-all with prependFixture
Must go at the front so it matches before substring-based fixtures:
mock.prependFixture({
match: { predicate: (req) => req.messages.at(-1)?.role === "tool" },
response: { content: "Done!" },
});
Stream interruption simulation (v1.3.0+)
mock.onMessage(
"long response",
{ content: "This will be cut short..." },
{
truncateAfterChunks: 3,
disconnectAfterMs: 500,
},
);
Chaos testing (probabilistic failures)
mock.addFixture({
match: { userMessage: "flaky" },
response: { content: "Sometimes works!" },
chaos: { dropRate: 0.3 },
});
30% of requests matching this fixture will get a 500 error instead of the response. Can also use malformedRate (garbled JSON) or disconnectRate (connection dropped mid-stream).
Server-level chaos applies to ALL requests:
mock.setChaos({ dropRate: 0.1 });
mock.clearChaos();
Error injection (one-shot)
mock.nextRequestError(429, { message: "Rate limited", type: "rate_limit_error" });
JSON fixture files
{
"fixtures": [
{
"match": { "userMessage": "hello" },
"response": { "content": "Hi!" }
},
{
"match": { "userMessage": "weather" },
"response": {
"toolCalls": [
{
"name": "get_weather",
"arguments": { "city": "SF", "units": "fahrenheit" }
}
]
}
},
{
"match": { "inputText": "search query" },
"response": { "embedding": [0.1, 0.2, 0.3] }
},
{
"match": { "userMessage": "status", "sequenceIndex": 0 },
"response": { "content": "First response" }
}
]
}
JSON auto-stringify: In JSON fixture files, arguments and content can be objects — the loader auto-stringifies them with JSON.stringify(). The escaped-string form ("{\"city\":\"SF\"}") still works but objects are preferred for readability.
JSON files cannot use RegExp or predicate — those are code-only features. streamingProfile is supported in JSON fixture files.
Load with mock.loadFixtureFile("./fixtures/greetings.json") or mock.loadFixtureDir("./fixtures/").
API Endpoints
All providers share the same fixture pool — write fixtures once, they work for any endpoint.
| Endpoint | Provider | Protocol |
|---|---|---|
POST /v1/chat/completions | OpenAI | HTTP |
POST /v1/responses | OpenAI | HTTP + WS |
POST /v1/messages | Anthropic | HTTP |
POST /v1/embeddings | OpenAI | HTTP |
POST /v1beta/models/{model}:{method} | Google Gemini | HTTP |
POST /model/{modelId}/invoke | AWS Bedrock | HTTP |
POST /openai/deployments/{id}/chat/completions | Azure OpenAI | HTTP |
POST /openai/deployments/{id}/embeddings | Azure OpenAI | HTTP |
GET /health | — | HTTP |
GET /ready | — | HTTP |
POST /model/{modelId}/invoke-with-response-stream | AWS Bedrock | HTTP |
POST /model/{modelId}/converse | AWS Bedrock | HTTP |
POST /model/{modelId}/converse-stream | AWS Bedrock | HTTP |
POST /v1/projects/{p}/locations/{l}/publishers/google/models/{m}:generateContent | Vertex AI | HTTP |
POST /v1/projects/{p}/locations/{l}/publishers/google/models/{m}:streamGenerateContent | Vertex AI | HTTP |
POST /api/chat | Ollama | HTTP |
POST /api/generate | Ollama | HTTP |
GET /api/tags | Ollama | HTTP |
POST /v2/chat | Cohere | HTTP |
GET /metrics | — | HTTP |
GET /v1/models | OpenAI-compat | HTTP |
WS /v1/responses | OpenAI | WebSocket |
WS /v1/realtime | OpenAI | WebSocket |
WS /ws/google.ai...BidiGenerateContent | Gemini Live | WebSocket |
POST /v1/images/generations | OpenAI | HTTP |
POST /v1beta/models/{model}:predict | Gemini Imagen | HTTP |
POST /v1/audio/speech | OpenAI | HTTP |
POST /v1/audio/transcriptions | OpenAI | HTTP |
POST /v1/videos | OpenAI | HTTP |
GET /v1/videos/{id} | OpenAI | HTTP |
Response Template Overrides
Fixture responses can include optional override fields to control auto-generated envelope values. These are merged into the provider-specific response format (OpenAI, Claude, Gemini, Responses API).
| Field | Type | Default | Description |
|---|---|---|---|
id | string | auto-generated | Override response ID (e.g., chatcmpl-custom) |
created | number | Date.now()/1000 | Override Unix timestamp |
model | string | echoes request | Override model name in response |
usage | object | zeroed | Override token counts: { prompt_tokens, completion_tokens, total_tokens }. OpenAI Chat includes usage in response body; Responses API uses response.usage. When omitted, auto-computed from content length |
finishReason | string | "stop" / "tool_calls" | Override finish reason. Mappings: stop -> end_turn (Claude), STOP (Gemini); tool_calls -> tool_use (Claude), FUNCTION_CALL (Gemini); length -> max_tokens (Claude), MAX_TOKENS (Gemini); content_filter -> SAFETY (Gemini), failed (Responses API) |
role | string | "assistant" | Override message role |
systemFingerprint | string | (omitted) | Add system_fingerprint to response |
Example
mock.onMessage("hello", {
content: "Hi!",
model: "gpt-4-turbo-2024-04-09",
usage: { prompt_tokens: 10, completion_tokens: 5, total_tokens: 15 },
systemFingerprint: "fp_abc123",
});
In JSON fixtures
{
"match": { "userMessage": "hello" },
"response": {
"content": "Hi!",
"model": "gpt-4-turbo-2024-04-09",
"usage": { "prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15 },
"systemFingerprint": "fp_abc123"
}
}
These fields map correctly across all provider formats — for example, finishReason: "stop" becomes finish_reason: "stop" in OpenAI, stop_reason: "end_turn" in Claude, and finishReason: "STOP" in Gemini.
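The finishReason mappings listed in the table can be written down as a lookup. The sketch below only encodes the pairs stated in this document; the names FINISH_REASON_MAP and mapFinishReason are illustrative, the OpenAI row assumes pass-through of the fixture value, and combinations the table does not list return undefined rather than a guess.

```typescript
type FinishReason = "stop" | "tool_calls" | "length" | "content_filter";
type Provider = "openai" | "claude" | "gemini";

const FINISH_REASON_MAP: Record<Provider, Partial<Record<FinishReason, string>>> = {
  // Assumed pass-through: these are already OpenAI's own values.
  openai: { stop: "stop", tool_calls: "tool_calls", length: "length", content_filter: "content_filter" },
  // No content_filter mapping for Claude is listed in the table.
  claude: { stop: "end_turn", tool_calls: "tool_use", length: "max_tokens" },
  gemini: { stop: "STOP", tool_calls: "FUNCTION_CALL", length: "MAX_TOKENS", content_filter: "SAFETY" },
};

// Returns the provider-specific value, or undefined when the table lists none.
function mapFinishReason(provider: Provider, reason: FinishReason): string | undefined {
  return FINISH_REASON_MAP[provider][reason];
}

console.log(mapFinishReason("claude", "stop")); // end_turn
console.log(mapFinishReason("gemini", "tool_calls")); // FUNCTION_CALL
```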
Provider Support Matrix
| Feature | OpenAI Chat | OpenAI Responses | Claude | Gemini | Gemini Int. | Bedrock | Azure | Ollama | Cohere |
|---|---|---|---|---|---|---|---|---|---|
| Text | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Tool Calls | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Content + Tool Calls | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Streaming | SSE | SSE | SSE | SSE | SSE | Binary | SSE | NDJSON | SSE |
| Reasoning | Yes | Yes | Yes | Yes | -- | Yes | Yes | -- | -- |
| Web Searches | -- | Yes | -- | -- | -- | -- | -- | -- | -- |
| Response Overrides | Yes | Yes | Yes | Yes | Yes | -- | Yes | -- | -- |
Critical Gotchas
- Order matters: first match wins. Put specific fixtures before general ones. Use prependFixture() to force priority.
- arguments accepts both objects and strings: "arguments": {"key":"value"} (preferred, auto-stringified) or "arguments": "{\"key\":\"value\"}" (legacy). The same applies to content fields that contain JSON. The fixture loader detects typeof === "object" and calls JSON.stringify() automatically.
- Latency is per-chunk, not total: latency: 100 means 100ms between each SSE chunk, not 100ms total response time. Separately, truncateAfterChunks and disconnectAfterMs simulate stream interruptions (added in v1.3.0).
- streamingProfile takes precedence over latency: when both are set on a fixture, streamingProfile controls timing. Use one or the other.
- Tool result messages don't change the user message: after a tool call, the client sends the same conversation plus the tool result. Matching on userMessage alone will hit the SAME fixture again → infinite loop. Gate the follow-up fixture on the tool result instead: hasToolResult: true, toolCallId, or a predicate checking role === "tool".
- clearFixtures() preserves the array reference: it uses .length = 0, not reassignment. The running server reads the same array object.
- Journal records everything, including 404 "no match" responses. Use mock.getLastRequest() to debug mismatches.
- All providers share fixtures: a fixture matching "hello" works whether the request comes via /v1/chat/completions (OpenAI), /v1/messages (Anthropic), Gemini, Bedrock, or Azure endpoints.
- WebSocket uses the same fixture pool: no special setup is needed for WebSocket-based APIs (OpenAI Responses WS, Realtime, Gemini Live).
- Embeddings auto-generate if no fixture matches: deterministic vectors are generated from the input text hash. You don't need a catch-all for embedding requests.
- Sequential response counts are tracked per fixture group (all fixtures sharing the same non-sequenceIndex match fields): the count increments after each match within the group and resets with reset() or resetMatchCounts().
- Bedrock uses Anthropic Messages format internally: the adapter normalizes Bedrock requests to ChatCompletionRequest, so the same fixtures work. Bedrock supports both non-streaming (/invoke, /converse) and streaming (/invoke-with-response-stream, /converse-stream) endpoints.
- Azure OpenAI routes through the same handlers: /openai/deployments/{id}/chat/completions maps to the completions handler, and /openai/deployments/{id}/embeddings maps to the embeddings handler. Fixtures work unchanged.
- Ollama defaults to streaming, the opposite of OpenAI. Set stream: false explicitly in the request for non-streaming responses.
- Ollama tool call arguments is an object, not a JSON string: unlike OpenAI, where arguments is a JSON string, Ollama sends and expects a plain object.
- Bedrock streaming uses binary Event Stream format, not SSE. The invoke-with-response-stream and converse-stream endpoints use AWS Event Stream binary encoding.
- Vertex AI routes to the same handler as consumer Gemini: the same fixtures work for both Vertex AI (/v1/projects/.../models/{m}:generateContent) and consumer Gemini (/v1beta/models/{model}:generateContent).
- Cohere requires the model field: it returns 400 if model is missing from the request body.
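The clearFixtures() gotcha above comes down to plain JavaScript reference semantics. The snippet below is a standalone illustration with made-up variable names; it does not touch aimock itself.

```typescript
type StoredFixture = { match: object; response: object };

const fixtures: StoredFixture[] = [{ match: {}, response: { content: "old" } }];
const serverView = fixtures; // the running server holds this same reference

// In-place clear: every holder of the reference sees the change.
fixtures.length = 0;
fixtures.push({ match: {}, response: { content: "new" } });
console.log(serverView.length); // 1 (the new fixture is visible to the server)

// Reassignment, by contrast, leaves other holders on the stale array.
let reassigned: StoredFixture[] = [{ match: {}, response: { content: "old" } }];
const staleView = reassigned;
reassigned = [];
console.log(staleView.length); // 1 (still the old fixture)
```

This is why fixtures added after clearFixtures() are picked up by a server that is already running.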
Mount & Composition
mount() API
Mount additional mock services onto a running LLMock server. All services share one port, one health endpoint, and one request journal.
const llm = new LLMock({ port: 5555 });
llm.mount("/mcp", mcpMock);
llm.mount("/a2a", a2aMock);
llm.mount("/vector", vectorMock);
await llm.start();
Any object implementing the Mountable interface (a handleRequest method that returns boolean) can be mounted. Path prefixes are stripped before the service sees the request — /mcp/tools/list arrives as /tools/list.
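Prefix stripping can be sketched as below. The MountableSketch interface is a deliberately simplified stand-in that receives only the path (the real Mountable presumably receives HTTP request and response objects), and createDispatcher is a hypothetical name used for illustration.

```typescript
// Simplified stand-in for the Mountable interface: path in, boolean out.
interface MountableSketch {
  handleRequest(path: string): boolean;
}

function createDispatcher() {
  const mounts: { prefix: string; service: MountableSketch }[] = [];
  return {
    mount(prefix: string, service: MountableSketch): void {
      mounts.push({ prefix, service });
    },
    // Returns true when a mounted service handled the request.
    dispatch(path: string): boolean {
      for (const { prefix, service } of mounts) {
        // Match the prefix itself or anything below it, but not "/mcpx".
        if (path === prefix || path.startsWith(prefix + "/")) {
          const stripped = path.slice(prefix.length) || "/";
          return service.handleRequest(stripped);
        }
      }
      return false;
    },
  };
}

const seenPaths: string[] = [];
const dispatcher = createDispatcher();
dispatcher.mount("/mcp", {
  handleRequest(path) {
    seenPaths.push(path);
    return true;
  },
});
dispatcher.dispatch("/mcp/tools/list"); // service sees "/tools/list"
```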
createMockSuite()
Unified lifecycle for LLMock + mounted services:
import { createMockSuite } from "@copilotkit/aimock";
const suite = createMockSuite({
port: 0,
fixtures: "./fixtures",
services: { "/mcp": mcpMock, "/a2a": a2aMock },
});
await suite.start();
afterEach(() => suite.reset());
afterAll(() => suite.stop());
aimock CLI config file
The aimock CLI reads a JSON config and serves all services on one port:
aimock --config aimock.json --port 4010
Config format:
{
"llm": {
"fixtures": "./fixtures",
"latency": 0,
"metrics": true
},
"services": {
"/mcp": { "type": "mcp", "tools": "./mcp-tools.json" },
"/a2a": { "type": "a2a", "agents": "./a2a-agents.json" }
}
}
VectorMock
Mock vector database server for testing RAG pipelines. Supports Pinecone, Qdrant, and ChromaDB API formats.
import { VectorMock } from "@copilotkit/aimock";
const vector = new VectorMock();
vector.addCollection("docs", { dimension: 1536 });
vector.onQuery("docs", [
{ id: "doc-1", score: 0.95, metadata: { title: "Getting Started" } },
{ id: "doc-2", score: 0.87, metadata: { title: "API Reference" } },
]);
vector.upsert("docs", [
{ id: "v1", values: [0.1, 0.2, ...], metadata: { title: "Intro" } },
]);
vector.onQuery("docs", (query) => {
return [{ id: "result", score: 1.0, metadata: { topK: query.topK } }];
});
const url = await vector.start();
VectorMock endpoints
| Provider | Endpoints |
|---|---|
| Pinecone | POST /query, POST /vectors/upsert, POST /vectors/delete, GET /describe-index-stats |
| Qdrant | POST /collections/{name}/points/search, PUT /collections/{name}/points, POST /collections/{name}/points/delete |
| ChromaDB | POST /api/v1/collections/{id}/query, POST /api/v1/collections/{id}/add, GET /api/v1/collections, DELETE /api/v1/collections/{id} |
Service Mocks (Search / Rerank / Moderation)
Built-in mocks for common AI-adjacent services. Registered on the LLMock instance directly — no separate server needed.
Search (Tavily-compatible)
mock.onSearch("weather", [
{ title: "Weather Report", url: "https://example.com", content: "Sunny today" },
]);
mock.onSearch(/stock\s+price/i, [
{ title: "ACME Stock", url: "https://example.com", content: "$42", score: 0.95 },
]);
Rerank (Cohere-compatible)
mock.onRerank("machine learning", [
{ index: 0, relevance_score: 0.99 },
{ index: 2, relevance_score: 0.85 },
]);
Moderation (OpenAI-compatible)
mock.onModerate("violent", {
flagged: true,
categories: { violence: true, hate: false },
category_scores: { violence: 0.95, hate: 0.01 },
});
mock.onModerate(/.*/, { flagged: false, categories: {} });
Pattern matching
All three services use the same matching logic:
- String patterns — case-insensitive substring match
- RegExp patterns — full regex test
- First match wins — register specific patterns before catch-alls
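The three rules above fit in one small function. This is a sketch of the described matching logic, with a hypothetical Rule type and matchQuery helper rather than aimock's internals.

```typescript
type Rule<T> = { pattern: string | RegExp; result: T };

// Strings: case-insensitive substring. RegExps: test(). First match wins.
function matchQuery<T>(rules: Rule<T>[], query: string): T | undefined {
  const lower = query.toLowerCase();
  const hit = rules.find((rule) =>
    typeof rule.pattern === "string"
      ? lower.includes(rule.pattern.toLowerCase())
      : rule.pattern.test(query),
  );
  return hit?.result;
}

const moderationRules: Rule<string>[] = [
  { pattern: "violent", result: "flagged" },
  { pattern: /.*/, result: "clean" }, // catch-all, registered last
];
console.log(matchQuery(moderationRules, "A VIOLENT scene")); // flagged
console.log(matchQuery(moderationRules, "hello")); // clean
```

Swapping the two rules would make every input come back "clean", which is why the catch-all goes last.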
Debugging Fixture Mismatches
When a fixture doesn't match:
- Inspect what the server received: call mock.getLastRequest() and check the body.messages array.
- Check fixture order: mock.getFixtures() returns fixtures in registration order.
- For userMessage, remember that the match is against the LAST role: "user" message only, and it is a substring match (not exact).
- Check the journal: mock.getRequests() shows all requests, including which fixture matched (or null for a 404).
E2E Test Setup Pattern
import { LLMock } from "@copilotkit/aimock";
const mock = new LLMock({ port: 0 });
mock.loadFixtureDir("./fixtures");
await mock.start();
process.env.OPENAI_BASE_URL = `${mock.url}/v1`;
afterEach(() => mock.reset());
afterAll(async () => await mock.stop());
Static factory shorthand
const mock = await LLMock.create({ port: 0 });
API Quick Reference
| Method | Purpose |
|---|---|
addFixture(f) | Append fixture (last priority) |
addFixtures(f[]) | Append multiple |
prependFixture(f) | Insert at front (highest priority) |
clearFixtures() | Remove all fixtures |
getFixtures() | Read current fixture list |
on(match, response, opts?) | Shorthand for addFixture |
onMessage(pattern, response, opts?) | Match by user message |
onEmbedding(pattern, response, opts?) | Match by embedding input text |
onJsonOutput(pattern, json, opts?) | Match by user message with responseFormat |
onToolCall(name, response, opts?) | Match by tool name in tools[] |
onToolResult(id, response, opts?) | Match by tool_call_id |
onTurn(turn, pattern, response, opts?) | Match by turn index + user message |
nextRequestError(status, body?) | One-shot error, auto-removes |
loadFixtureFile(path) | Load JSON fixture file |
loadFixtureDir(path) | Load all JSON files in directory |
start() | Start server, returns URL |
stop() | Stop server |
reset() | Clear fixtures + journal + match counts |
resetMatchCounts() | Clear sequence match counts only |
getRequests() | All journal entries |
getLastRequest() | Most recent journal entry |
clearRequests() | Clear journal only |
setChaos(opts) | Set server-level chaos rates |
clearChaos() | Remove server-level chaos |
onSearch(pattern, results) | Match search requests by query |
onRerank(pattern, results) | Match rerank requests by query |
onModerate(pattern, result) | Match moderation requests by input |
onImage(pattern, response) | Match image generation by prompt |
onSpeech(pattern, response) | Match TTS by input text |
onTranscription(response) | Match audio transcription |
onVideo(pattern, response) | Match video generation by prompt |
mount(path, handler) | Mount a Mountable (VectorMock, etc.) |
url / baseUrl | Server URL (throws if not started) |
port | Server port number |
Sequential responses use on() with sequenceIndex in the match — there is no dedicated convenience method.
Record-and-Replay (VCR Mode)
aimock supports a VCR-style record-and-replay workflow for ALL endpoints including multimedia (image, TTS, transcription, video): unmatched requests are proxied to real provider APIs, and the responses are saved as standard aimock fixture files for deterministic replay. Binary TTS responses are base64-encoded with format derived from Content-Type. Multimedia fixtures automatically include endpoint in their match criteria for correct routing on replay.
CLI usage
aimock --record \
--provider-openai https://api.openai.com \
--provider-anthropic https://api.anthropic.com \
-f ./fixtures
aimock --strict -f ./fixtures
- --record enables proxy-on-miss. Requires at least one --provider-* flag.
- --strict returns a 503 error when no fixture matches AND no proxy is configured (or the proxy attempt fails), instead of silently returning a 404. The proxy is still tried first when --record is set. Use this in CI so unmatched requests cannot slip through as silent 404s.
- Provider flags: --provider-openai, --provider-anthropic, --provider-gemini, --provider-vertexai, --provider-bedrock, --provider-azure, --provider-ollama, --provider-cohere.
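How the two flags interact on a fixture miss can be summarized as a small decision function. This is a sketch of the behavior described above, with hypothetical names (resolveMiss, MissOptions); it assumes that a failed proxy attempt without --strict falls back to the ordinary 404, which the text implies but does not state outright.

```typescript
type MissOptions = {
  record: boolean; // --record set, with at least one provider URL
  strict: boolean; // --strict set
  proxyOk: boolean; // whether the upstream proxy attempt succeeded
};
type MissOutcome = "proxied" | 503 | 404;

// Called only after no fixture matched the request.
function resolveMiss(opts: MissOptions): MissOutcome {
  // The proxy is tried first whenever recording is enabled.
  if (opts.record && opts.proxyOk) return "proxied";
  // Otherwise strict mode turns the silent 404 into a loud 503.
  return opts.strict ? 503 : 404;
}

console.log(resolveMiss({ record: false, strict: true, proxyOk: false })); // 503
console.log(resolveMiss({ record: false, strict: false, proxyOk: false })); // 404
```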
How it works
- Existing fixtures are served first: the router checks all loaded fixtures before considering the proxy.
- Misses are proxied: if no fixture matches and recording is enabled, the request is forwarded to the real provider API. Upstream URL path prefixes are preserved (e.g., https://gateway.company.com/llm/v1 correctly proxies to /llm/v1/chat/completions).
- All request headers are forwarded (auth headers NOT saved): all client request headers are passed through to the upstream provider, except hop-by-hop headers and host/content-length/cookie/accept-encoding. Auth headers (Authorization, x-api-key, api-key) are forwarded but stripped from the recorded fixture.
- Responses are saved as standard fixtures: recorded files land in {fixturePath}/recorded/ and use the same JSON format as hand-written fixtures. Nothing special about them.
- Streaming responses are collapsed: SSE streams are collapsed into a single text or tool-call response for the fixture. The original streaming format is preserved in the live proxy response.
- Base64 embedding decoding: when the upstream returns base64-encoded embeddings (the default encoding_format in Python's openai SDK), the recorder decodes them into float arrays so fixtures contain readable numeric data instead of opaque base64 strings.
- Loud logging: every proxy hit logs at warn level so you can see exactly which requests are being forwarded.
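The header handling described in the list can be sketched as two filters: one applied before forwarding, one applied before anything is written to disk. The helper names are hypothetical, and the hop-by-hop list is the conventional HTTP/1.1 set, assumed here rather than taken from aimock's source.

```typescript
type HeaderMap = Record<string, string>;

// Conventional hop-by-hop headers (assumed list).
const HOP_BY_HOP = [
  "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
  "te", "trailer", "transfer-encoding", "upgrade",
];
const NEVER_FORWARDED = [...HOP_BY_HOP, "host", "content-length", "cookie", "accept-encoding"];
const AUTH_HEADERS = ["authorization", "x-api-key", "api-key"];

// Copy headers (lower-casing names) and drop any whose name is in `blocked`.
function withoutHeaders(headers: HeaderMap, blocked: string[]): HeaderMap {
  const kept: HeaderMap = {};
  for (const name of Object.keys(headers)) {
    const value = headers[name];
    const lower = name.toLowerCase();
    if (value !== undefined && blocked.indexOf(lower) === -1) kept[lower] = value;
  }
  return kept;
}

// What goes upstream: everything except hop-by-hop and the excluded headers.
function forwardHeaders(incoming: HeaderMap): HeaderMap {
  return withoutHeaders(incoming, NEVER_FORWARDED);
}

// What may be persisted: forwarded headers minus credentials.
function recordableHeaders(forwarded: HeaderMap): HeaderMap {
  return withoutHeaders(forwarded, AUTH_HEADERS);
}
```

Keeping the two steps separate makes it easy to see that credentials reach the provider but never reach the fixture file.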
Programmatic API
const mock = new LLMock({ port: 0 });
await mock.start();
mock.enableRecording({
providers: {
openai: "https://api.openai.com",
anthropic: "https://api.anthropic.com",
},
fixturePath: "./fixtures/recorded",
});
mock.disableRecording();
Workflow
- Bootstrap: Run your test suite with --record and provider URLs. All requests that don't match existing fixtures are proxied and recorded.
- Review: Check the recorded fixtures in {fixturePath}/recorded/. Edit or reorganize as needed.
- Lock down: Run your test suite with --strict to ensure every request hits a fixture. No network calls escape.
- Maintain: When APIs change, delete stale fixtures and re-record.