| name | tool-chain-test |
| description | Skill for writing and running integration tests that chain multiple MCP tools together, simulating real user workflows from the PRD. Use when implementing or testing any of the 10 use cases (UC-1 through UC-10), verifying that MCP servers compose correctly, or building end-to-end test scenarios. MANDATORY TRIGGERS: "integration test", "use case test", "UC-1" through "UC-10", "tool chain", "end-to-end test", "workflow test", "compose tools", or any mention of testing a specific use case by name (receipt reconciliation, contract copilot, download triage, meeting pipeline, etc.).
|
Tool Chain Integration Testing Skill
What These Tests Do
Integration tests simulate the model's tool-calling behavior by directly invoking
MCP server tools in the sequence defined by each use case in the PRD. They verify
that the tool chain produces correct results end-to-end, but they do NOT test the
model's ability to select the right tools (that's the model-behavior test suite).
Think of it this way:
- Unit tests verify each tool works alone
- Integration tests (this skill) verify tools compose correctly
- Model behavior tests verify the LLM picks the right tools
Before You Start
Read these to understand the test you're writing:
- docs/PRD.md Section 6: the specific use case flow (UC-1 through UC-10)
- docs/mcp-tool-registry.yaml: tool signatures for the servers involved
- .claude/commands/test-usecase.md: the slash command reference for which servers each UC needs
Test Structure
Every UC integration test follows this structure:
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import { TestHarness } from '../helpers/test-harness';

describe('UC-<N>: <Use Case Name>', () => {
  let harness: TestHarness;

  beforeAll(async () => {
    // Start only the servers this use case needs
    harness = await TestHarness.create({
      servers: ['filesystem', 'ocr', 'data'],
      fixtures: 'uc<N>',
    });
  });

  afterAll(async () => {
    // Stop the servers and remove temp files
    await harness.teardown();
  });

  it('should complete the full workflow', async () => {
    // Call each tool in the order the PRD defines, asserting on every intermediate result
  });

  it('should handle step 3 when OCR confidence is low', async () => {
    // Cover one edge case per test
  });
});
Test Harness
The test harness (tests/helpers/test-harness.ts) manages:
- Starting/stopping MCP servers as child processes
- Loading test fixtures from tests/fixtures/
- Providing typed wrappers around tool calls
- Cleaning up temp files after tests
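// Public surface of the harness (signatures only; method bodies omitted)
export class TestHarness {
  private servers: Map<string, MCPServerProcess>;
  private fixtureDir: string;
  private tempDir: string;

  // Starts the listed servers and selects the fixture set for one use case
  static async create(config: { servers: string[], fixtures: string }): Promise<TestHarness>;

  // Invokes one tool on one server and returns its result typed as T
  async callTool<T>(serverName: string, toolName: string, params: object): Promise<T>;

  // Resolves a path inside the selected fixture directory
  fixturePath(filename: string): string;

  // Resolves a path inside the per-run temp directory
  tempPath(filename: string): string;

  // Stops the servers and deletes temp files
  async teardown(): Promise<void>;
}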
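The path and cleanup responsibilities listed above can be sketched in isolation. This is a minimal sketch rather than the project's actual harness: `HarnessPaths` is a hypothetical name, the `uc-test-` temp prefix is an assumption, and server startup and tool calls are left out entirely.

```typescript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: covers only fixture/temp path resolution and cleanup.
class HarnessPaths {
  private readonly fixtureDir: string;
  private readonly tempDir: string;

  constructor(fixtureRoot: string, fixtures: string) {
    // e.g. tests/fixtures + uc1 -> tests/fixtures/uc1
    this.fixtureDir = join(fixtureRoot, fixtures);
    // A fresh directory per harness keeps runs from seeing each other's output.
    this.tempDir = mkdtempSync(join(tmpdir(), 'uc-test-'));
  }

  fixturePath(filename: string): string {
    return join(this.fixtureDir, filename);
  }

  tempPath(filename: string): string {
    return join(this.tempDir, filename);
  }

  teardown(): void {
    // force: true makes teardown safe to call even if the directory is already gone.
    rmSync(this.tempDir, { recursive: true, force: true });
  }
}

const paths = new HarnessPaths('tests/fixtures', 'uc1');
const outFile = paths.tempPath('receipts_output.csv');
writeFileSync(outFile, 'vendor,amount\n');
paths.teardown();
console.log(existsSync(outFile)); // false: teardown removed the temp directory
```

Keeping all generated files under one temp directory means teardown needs a single recursive delete and never touches the fixture tree.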
Fixture Organization
tests/fixtures/
├── uc1/                          # Receipt Reconciliation
│   ├── receipts/
│   │   ├── receipt_coffee.jpg    # Sample receipt image
│   │   ├── receipt_office.pdf    # Sample receipt PDF
│   │   └── invoice_vendor.pdf    # Sample invoice
│   └── expected/
│       └── receipts.csv          # Expected output CSV
│
├── uc2/                          # Contract Copilot
│   ├── original_nda.pdf
│   ├── revised_nda.docx
│   └── expected/
│       └── diff_report.json
│
├── uc3/                          # Security Steward
│   ├── sample_files/
│   │   ├── has_ssn.txt           # Contains fake SSN for testing
│   │   ├── has_api_key.env       # Contains fake API key
│   │   └── clean_file.txt        # No sensitive data
│   └── expected/
│       └── findings.json
│
├── uc4/                          # Download Triage
│   ├── downloads/
│   │   ├── quarterly_report.pdf
│   │   ├── photo_vacation.jpg
│   │   ├── node-v20.pkg
│   │   └── receipt_amazon.pdf
│   └── expected/
│       └── classification.json
│
├── uc6/                          # Meeting Pipeline
│   ├── meeting_audio.wav         # Short sample audio (30 sec)
│   └── expected/
│       ├── transcript.json
│       └── action_items.json
│
└── shared/                       # Fixtures used across multiple UCs
    ├── sample.pdf
    └── sample.docx
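Each use case keeps its reference output under expected/, so a test can compare what the chain produced against that file. The sketch below shows one way to compare CSV output without depending on row or column order. `parseSimpleCsv` and `sameRows` are hypothetical helpers, the parser assumes no quoted fields or embedded commas, and the sample rows are made-up values, not the contents of receipts.csv.

```typescript
// Parse a simple CSV (no quoted fields or embedded commas) into row objects keyed by header.
function parseSimpleCsv(content: string): Record<string, string>[] {
  const lines = content.split(/\r?\n/).filter((line) => line.trim() !== '');
  const [headerLine, ...dataLines] = lines;
  if (headerLine === undefined) return [];
  const headers = headerLine.split(',').map((h) => h.trim());
  return dataLines.map((line) => {
    const cells = line.split(',').map((c) => c.trim());
    const row: Record<string, string> = {};
    headers.forEach((header, i) => {
      row[header] = cells[i] ?? '';
    });
    return row;
  });
}

// Order-insensitive comparison: serialize each row with sorted keys, then sort the rows.
function sameRows(
  actual: Record<string, string>[],
  expected: Record<string, string>[],
): boolean {
  const canon = (rows: Record<string, string>[]): string[] =>
    rows
      .map((row) => JSON.stringify(Object.keys(row).sort().map((k) => [k, row[k]])))
      .sort();
  const a = canon(actual);
  const e = canon(expected);
  return a.length === e.length && a.every((value, i) => value === e[i]);
}

// Illustrative data only.
const expectedRows = parseSimpleCsv('vendor,amount\nExample Cafe,4.50\nExample Supplies,23.10\n');
const actualRows = parseSimpleCsv('amount,vendor\n23.10,Example Supplies\n4.50,Example Cafe\n');
console.log(sameRows(actualRows, expectedRows)); // true
```

In a real test, the two strings would come from reading the generated file and `harness.fixturePath('expected/receipts.csv')`.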
Example: UC-1 Receipt Reconciliation
Following the PRD Section 6, UC-1 flow step by step:
describe('UC-1: Receipt Reconciliation', () => {
  let harness: TestHarness;

  beforeAll(async () => {
    harness = await TestHarness.create({
      servers: ['filesystem', 'ocr', 'data', 'document'],
      fixtures: 'uc1',
    });
  });

  afterAll(async () => {
    // Stop the servers and remove temp files
    await harness.teardown();
  });

  it('should process a receipt folder into structured CSV', async () => {
    // List the receipt files
    const receiptDir = harness.fixturePath('receipts');
    const files = await harness.callTool<FileInfo[]>(
      'filesystem', 'list_dir', { path: receiptDir }
    );
    expect(files.length).toBe(3);

    // OCR each file, then extract structured fields from the text
    const records = [];
    for (const file of files) {
      let ocrResult;
      if (file.name.endsWith('.jpg')) {
        ocrResult = await harness.callTool(
          'ocr', 'extract_text_from_image', { path: file.path }
        );
      } else {
        ocrResult = await harness.callTool(
          'ocr', 'extract_text_from_pdf', { path: file.path }
        );
      }
      expect(ocrResult.text.length).toBeGreaterThan(0);

      const structured = await harness.callTool(
        'ocr', 'extract_structured_data', {
          text: ocrResult.text,
          schema: {
            type: 'object',
            properties: {
              vendor: { type: 'string' },
              date: { type: 'string' },
              amount: { type: 'number' },
              category: { type: 'string' },
            },
            required: ['vendor', 'amount'],
          },
        }
      );
      expect(structured.data).toHaveProperty('vendor');
      expect(structured.data).toHaveProperty('amount');
      records.push(structured.data);
    }

    // Deduplicate the extracted records
    const deduped = await harness.callTool(
      'data', 'deduplicate_records', {
        data: records,
        match_fields: ['vendor', 'amount', 'date'],
        threshold: 0.85,
      }
    );
    expect(deduped.unique.length).toBeGreaterThanOrEqual(1);

    // Write the unique records to CSV
    const csvPath = harness.tempPath('receipts_output.csv');
    const csv = await harness.callTool(
      'data', 'write_csv', {
        data: deduped.unique,
        output_path: csvPath,
      }
    );
    expect(csv.rows).toBe(deduped.unique.length);

    // Read the file back and check the header columns
    const outputContent = await harness.callTool(
      'filesystem', 'read_file', { path: csvPath }
    );
    expect(outputContent.content).toContain('vendor');
    expect(outputContent.content).toContain('amount');
  });
});
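Most calls in the UC-1 example omit the type parameter of `callTool<T>`, so their results are not type-checked. The sketch below shows how result shapes could be named and passed explicitly. The interfaces are assumptions inferred from the fields the example reads (`text`, `data`); docs/mcp-tool-registry.yaml remains the authority on the real signatures. `ToolCaller`, `readReceipt`, and `stubCaller` are hypothetical names, and the canned values are made up.

```typescript
// Result shapes inferred from the fields the UC-1 example reads.
// These are assumptions, not the registry's definitions.
interface OcrTextResult {
  text: string;
}

interface StructuredDataResult<T> {
  data: T;
}

interface Receipt {
  vendor: string;
  amount: number;
  date?: string;
  category?: string;
}

// Structural type for anything exposing callTool, so the wrapper accepts
// the real harness or a stub.
interface ToolCaller {
  callTool<T>(serverName: string, toolName: string, params: object): Promise<T>;
}

// Hypothetical wrapper: OCR an image, then extract receipt fields from its text.
async function readReceipt(caller: ToolCaller, path: string): Promise<Receipt> {
  const ocr = await caller.callTool<OcrTextResult>(
    'ocr', 'extract_text_from_image', { path }
  );
  const structured = await caller.callTool<StructuredDataResult<Receipt>>(
    'ocr', 'extract_structured_data', {
      text: ocr.text,
      schema: {
        type: 'object',
        properties: { vendor: { type: 'string' }, amount: { type: 'number' } },
        required: ['vendor', 'amount'],
      },
    }
  );
  return structured.data;
}

// Stub with canned responses so the sketch runs without any servers.
const stubCaller: ToolCaller = {
  async callTool<T>(_serverName: string, toolName: string, _params: object): Promise<T> {
    const canned: Record<string, unknown> = {
      extract_text_from_image: { text: 'Example Cafe 4.50' },
      extract_structured_data: { data: { vendor: 'Example Cafe', amount: 4.5 } },
    };
    return canned[toolName] as T;
  },
};

readReceipt(stubCaller, 'receipt_coffee.jpg').then((receipt) => {
  console.log(receipt.vendor, receipt.amount); // Example Cafe 4.5
});
```

Typing against a small `ToolCaller` interface rather than the harness class also lets the wrapper be exercised with a stub before the servers are available.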
Example: UC-3 Security Steward
describe('UC-3: Security & Privacy Steward', () => {
  let harness: TestHarness;

  beforeAll(async () => {
    // Servers and fixture set match the tool calls below
    harness = await TestHarness.create({
      servers: ['filesystem', 'security'],
      fixtures: 'uc3',
    });
  });

  afterAll(async () => {
    await harness.teardown();
  });

  it('should detect PII and secrets in sample files', async () => {
    // Collect every file in the sample directory
    const scanDir = harness.fixturePath('sample_files');
    const files = await harness.callTool(
      'filesystem', 'search_files', { path: scanDir, pattern: '*' }
    );

    // Run both scanners on each file and pool the findings
    const allFindings = [];
    for (const file of files) {
      const pii = await harness.callTool(
        'security', 'scan_for_pii', { path: file.path }
      );
      const secrets = await harness.callTool(
        'security', 'scan_for_secrets', { path: file.path }
      );
      allFindings.push(...pii.findings, ...secrets.findings);
    }
    expect(allFindings.some(f => f.type === 'ssn')).toBe(true);
    expect(allFindings.some(f => f.type === 'api_key')).toBe(true);

    // Ask for cleanup proposals based on the pooled findings
    const proposals = await harness.callTool(
      'security', 'propose_cleanup', { findings: allFindings }
    );
    expect(proposals.actions.length).toBeGreaterThan(0);
  });
});
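The UC-3 example confirms that an SSN and an API key are found somewhere, but it cannot tell whether clean_file.txt was wrongly flagged, because the pooled findings lose track of their source file. One possible approach is to tag each finding with its file as it is collected. This is a sketch under assumptions: only the `type` field comes from the example above, while the `file` field, `tagFindings`, and `filesWithFindings` are hypothetical.

```typescript
// Only `type` is taken from the UC-3 example; the rest is assumed.
interface Finding {
  type: string;
}

interface TaggedFinding extends Finding {
  file: string; // hypothetical field added by the test, not by the scanner
}

// Attach the source file name to every finding from one scan.
function tagFindings(file: string, findings: Finding[]): TaggedFinding[] {
  return findings.map((finding) => ({ ...finding, file }));
}

// The set of files that produced at least one finding.
function filesWithFindings(all: TaggedFinding[]): Set<string> {
  return new Set(all.map((finding) => finding.file));
}

// Canned scan results standing in for the real tool responses.
const tagged: TaggedFinding[] = [
  ...tagFindings('has_ssn.txt', [{ type: 'ssn' }]),
  ...tagFindings('has_api_key.env', [{ type: 'api_key' }]),
  ...tagFindings('clean_file.txt', []),
];

const flagged = filesWithFindings(tagged);
console.log(flagged.has('has_ssn.txt'));    // true
console.log(flagged.has('clean_file.txt')); // false
```

Inside the test loop, `allFindings.push(...tagFindings(file.name, pii.findings))` would replace the plain spread, and a final assertion could check that the clean file is absent from the flagged set.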
Running Tests
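Run a single use case through the slash command:
/test-usecase UC-1
Run every integration test directly with Vitest, optionally with verbose output:
npx vitest run tests/integration/
npx vitest run tests/integration/ --reporter=verbose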
Creating New UC Tests
When a new use case is ready for integration testing:
- Read the UC flow in docs/PRD.md Section 6
- Identify all MCP servers involved (see the /test-usecase command for the mapping)
- Create test fixtures in tests/fixtures/uc<N>/
- Write the test following the step-by-step flow from the PRD
- Verify each intermediate result, not just the final output
- Include edge cases (e.g., what happens if OCR confidence is low?)
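For the low-confidence edge case, it helps to pin down what the workflow is expected to do before writing the assertion. The sketch below illustrates the shape of such a check. It assumes the OCR result carries a `confidence` field between 0 and 1, which the source does not confirm. `decideOnOcr`, the 0.8 default threshold, and the accept/review outcomes are all hypothetical, and the PRD defines the real behavior.

```typescript
// `confidence` is a hypothetical field; check the tool registry for the real shape.
interface OcrResult {
  text: string;
  confidence: number; // assumed range: 0 to 1
}

type OcrDecision =
  | { action: 'accept'; text: string }
  | { action: 'review'; reason: string };

// Decide whether an OCR result is usable or should be routed for manual review.
function decideOnOcr(result: OcrResult, minConfidence = 0.8): OcrDecision {
  if (result.text.trim() === '') {
    return { action: 'review', reason: 'empty text' };
  }
  if (result.confidence < minConfidence) {
    return { action: 'review', reason: 'low confidence' };
  }
  return { action: 'accept', text: result.text };
}

console.log(decideOnOcr({ text: 'Example Cafe 4.50', confidence: 0.95 }).action); // accept
console.log(decideOnOcr({ text: 'Ex@mp1e C?fe', confidence: 0.42 }).action);      // review
```

An edge-case test would then feed a deliberately degraded fixture through the chain and assert that the record is held for review instead of being written to the CSV.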