---
name: context-chef-middleware
description: Helps developers integrate @context-chef/ai-sdk-middleware into their Vercel AI SDK (v6+) projects. Use this skill when the user wants to add transparent context management to an AI SDK app, wrap a model with automatic history compression, truncate large tool results, manage token budgets, or inject dynamic state into AI SDK prompts. Also trigger when the user mentions 'context-chef middleware', 'AI SDK middleware', 'ai-sdk context', or asks about compressing history / truncating tool results / managing tokens in a Vercel AI SDK project.
argument-hint: [feature-focus]
allowed-tools: Read, Grep, Glob, Bash, Write, Edit
---
# Integrate @context-chef/ai-sdk-middleware
Help the developer add @context-chef/ai-sdk-middleware — a drop-in AI SDK middleware for transparent history compression, tool result truncation, and token budget management — into their existing Vercel AI SDK project.
The key selling point: zero code changes to existing generateText / streamText calls. Just wrap the model once.
## Step 1: Analyze the developer's project
Before asking questions, silently inspect the project:
1. package.json → confirm they use `ai` (v6+) and an AI SDK provider (@ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, etc.)
2. Lock file → detect package manager (pnpm-lock.yaml / yarn.lock / package-lock.json / bun.lockb)
3. tsconfig.json → TypeScript or JavaScript?
4. Existing AI SDK usage → look for patterns like:
   - `generateText({ model, messages, ... })`
   - `streamText({ model, messages, ... })`
   - `openai('gpt-4o')`, `anthropic('claude-sonnet-4-20250514')`, `google('gemini-2.0-flash')`
   - `wrapLanguageModel` / middleware patterns
Use Glob + Grep to find these. This context shapes everything you generate.
## Step 2: Ask about their needs
Based on what you found, present a brief summary of their setup and ask which features they need:
| Pain point | Middleware feature | Option |
|---|---|---|
| Conversations get too long, context window fills up | History compression via cheap model | compress |
| Tool outputs (terminal, API responses) are huge | Auto-truncation with head/tail preservation | truncate |
| Want cheaper compression before LLM summarization | Mechanical compaction (strip tool results/thinking) | compact |
| Need to inject task state for LLM attention | Dynamic state as XML in prompt | dynamicState |
| Want custom prompt manipulation (RAG, metadata) | Post-compression transform hook | transformContext |
| Want to know when compression happens | Compression callback | onCompress |
| Need control over what happens at budget limit | Budget exceeded hook | onBudgetExceeded |
If the developer is unsure, recommend starting with: compression + truncation — these solve the most common problems with minimal setup (see the starter sketch below).
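A minimal starter for that recommendation, assembled from the option shapes documented in Step 4 (the model names and the `128_000` window are placeholders; match them to the developer's actual provider and models):

```ts
import { withContextChef } from '@context-chef/ai-sdk-middleware';
import { openai } from '@ai-sdk/openai';

// Starter config: compression + truncation.
const model = withContextChef(openai('gpt-4o'), {
  contextWindow: 128_000,                      // the main model's context window
  compress: { model: openai('gpt-4o-mini') },  // cheap model summarizes old history
  truncate: { threshold: 5000 },               // clip tool results over 5000 chars
});
```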
## Step 3: Install
Generate the install command using their detected package manager:

```bash
npm install @context-chef/ai-sdk-middleware
pnpm add @context-chef/ai-sdk-middleware
yarn add @context-chef/ai-sdk-middleware
bun add @context-chef/ai-sdk-middleware
```
The middleware depends on `ai` (v6+) and `@ai-sdk/provider` (v3+) as peer dependencies — the developer should already have these.
## Step 4: Generate integration code
The core pattern is always:
```ts
import { withContextChef } from '@context-chef/ai-sdk-middleware';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const model = withContextChef(openai('gpt-4o'), {
  contextWindow: 128_000,
  compress: { model: openai('gpt-4o-mini') },
  truncate: { threshold: 5000, headChars: 500, tailChars: 1000 },
});

const result = await generateText({ model, messages, tools });
```
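The same wrapped model drops into `streamText` with no call-site changes. A quick sketch reusing `model` from above, assuming the AI SDK's standard streaming API:

```ts
import { streamText } from 'ai';

// No middleware-specific code here: context management happens inside the model.
const result = streamText({ model, messages, tools });
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```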
### Configuration generation rules
Build the options object based on which features the developer selected:
**History compression (`compress`)**

Without a compression model — old messages are simply discarded:

```ts
const model = withContextChef(openai('gpt-4o'), {
  contextWindow: 128_000,
});
```
With a compression model — old messages are summarized by a cheap model:

```ts
const model = withContextChef(openai('gpt-4o'), {
  contextWindow: 128_000,
  compress: {
    model: openai('gpt-4o-mini'),
    preserveRatio: 0.8,
  },
});
```
**Tool result truncation (`truncate`)**

```ts
truncate: {
  threshold: 5000,
  headChars: 500,
  tailChars: 1000,
}
```
Optionally persist original content via a storage adapter (so the LLM can retrieve it later via a context://vfs/ URI):

```ts
import { FileSystemAdapter } from '@context-chef/core';

truncate: {
  threshold: 5000,
  headChars: 500,
  tailChars: 1000,
  storage: new FileSystemAdapter('.context_vfs'),
}
```
**Mechanical compaction (`compact`)**

Zero LLM cost — mechanically strips content before compression:

```ts
compact: {
  clear: ['tool-result'],
}
```
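`compact` pairs naturally with `compress`: the free mechanical pass shrinks what the summarizer model later has to read. A combined sketch, using only option shapes shown elsewhere in this file:

```ts
const model = withContextChef(openai('gpt-4o'), {
  contextWindow: 128_000,
  // Stage 1 (free): mechanically strip old tool results.
  compact: { clear: ['tool-result'] },
  // Stage 2 (cheap): summarize whatever old history remains.
  compress: { model: openai('gpt-4o-mini') },
});
```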
**Dynamic state injection (`dynamicState`)**

```ts
dynamicState: {
  // `agent`, `completed`, and `total` are the app's own state, shown for illustration.
  getState: () => ({
    currentStep: agent.step,
    availableTools: agent.tools.map(t => t.name),
    progress: `${completed}/${total}`,
  }),
  placement: 'last_user',  // inject as XML into the last user message
}
```
**Transform hook (`transformContext`)**

```ts
transformContext: (prompt) => {
  // `ragContext` is app-supplied content (e.g. retrieved documents).
  return [{ role: 'system', content: ragContext }, ...prompt];
}
```
**Budget exceeded hook (`onBudgetExceeded`)**

```ts
onBudgetExceeded: (history, { currentTokens, limit }) => {
  console.log(`Budget exceeded: ${currentTokens}/${limit} tokens`);
  return null;
}
```
**Compression callback (`onCompress`)**

```ts
onCompress: (summary, truncatedCount) => {
  console.log(`Compressed ${truncatedCount} messages into summary`);
}
```
**Custom tokenizer (`tokenizer`)**

```ts
tokenizer: (messages) => {
  // `encode` comes from the developer's tokenizer of choice (e.g. gpt-tokenizer, tiktoken);
  // note that message content may be multi-part rather than a plain string.
  return messages.reduce((sum, m) => sum + encode(m.content).length, 0);
}
```
### Integration patterns
Find the developer's existing AI SDK code and show exactly where to add the wrapper:
**Pattern A: Simple generateText / streamText**

Before:

```ts
const result = await generateText({
  model: openai('gpt-4o'),
  messages,
});
```

After:

```ts
const model = withContextChef(openai('gpt-4o'), {
  contextWindow: 128_000,
  // ...other options
});
const result = await generateText({ model, messages });
```
**Pattern B: Agent loop with tools**

```ts
const model = withContextChef(openai('gpt-4o'), {
  contextWindow: 128_000,
  compress: { model: openai('gpt-4o-mini') },
  truncate: { threshold: 5000 },
});

// Reuse the same wrapped model on every iteration — it tracks token usage across calls.
while (true) {
  const result = await generateText({ model, messages, tools });
  // ...append the result to messages, execute tool calls, and break when the task is done
}
```
**Pattern C: Using createMiddleware directly**

For developers who want more control or already use `wrapLanguageModel`:

```ts
import { createMiddleware } from '@context-chef/ai-sdk-middleware';
import { wrapLanguageModel } from 'ai';

const middleware = createMiddleware({
  contextWindow: 128_000,
  // ...other options
});
const model = wrapLanguageModel({ model: openai('gpt-4o'), middleware });
```
**Pattern D: Using format converters directly**

For developers who want to use @context-chef/core modules with AI SDK message formats:

```ts
import { fromAISDK, toAISDK } from '@context-chef/ai-sdk-middleware';

// Convert an AI SDK prompt to the intermediate representation (IR) and back.
const irMessages = fromAISDK(aiSdkPrompt);
const backToAISDK = toAISDK(irMessages);
```
## Step 5: Verify and explain
After generating the code:
- Verify `ai` (v6+) is in their dependencies — the middleware requires AI SDK v6
- Verify they have at least one `@ai-sdk/*` provider installed
- Explain the processing pipeline briefly (stages run in this order):
  1. Truncate large tool results (if configured)
  2. Mechanical compaction (if configured)
  3. History compression (if over token budget)
  4. Dynamic state injection (if configured)
  5. Custom transform (if configured)
- Emphasize: the wrapped model is stateful — it tracks token usage across calls to know when compression is needed. Create one wrapped model per conversation/session, not per call (see the sketch below).
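A sketch of per-session reuse, assuming a simple in-memory `Map` keyed by a session ID — the caching scheme is illustrative, not part of the library:

```ts
import { withContextChef } from '@context-chef/ai-sdk-middleware';
import { openai } from '@ai-sdk/openai';

type WrappedModel = ReturnType<typeof withContextChef>;

// One wrapped model per conversation. The wrapper is stateful (it tracks
// token usage), so sharing one instance across sessions would mix token
// accounting between unrelated conversations.
const modelsBySession = new Map<string, WrappedModel>();

function getModelForSession(sessionId: string): WrappedModel {
  let model = modelsBySession.get(sessionId);
  if (!model) {
    model = withContextChef(openai('gpt-4o'), {
      contextWindow: 128_000,
      compress: { model: openai('gpt-4o-mini') },
    });
    modelsBySession.set(sessionId, model);
  }
  return model;
}
```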
## Common mistakes to prevent
- Don't create a new wrapped model per LLM call — reuse it across the conversation (it tracks token usage)
- Don't manually call `reportTokenUsage()` — the middleware extracts it automatically from generateText / streamText responses
- The `compress.model` should be a cheap, fast model (e.g. gpt-4o-mini, claude-haiku) — it's used for summarization, not the main task
- `withContextChef()` returns a standard LanguageModelV3 — it works anywhere the original model works
- If using `compact` with `truncate` together, truncate runs first (on the raw AI SDK prompt), then compact runs on the IR (after conversion)
## When to recommend @context-chef/core instead
The middleware covers the most common use case: transparent compression and truncation. Recommend @context-chef/core directly when the developer needs:
- Tool pruning / namespace architecture (20+ tools)
- Persistent KV memory (cross-session)
- Snapshot & restore
- Multi-provider compilation (OpenAI / Anthropic / Gemini format switching)
- Manual control over the compilation pipeline
For these advanced use cases, refer the developer to the context-chef-core skill.
## Reference files