integration-test
// [Testing] Use when you need to generate or review integration tests.
| Field | Value |
|---|---|
| name | integration-test |
| version | 2.0.0 |
| description | [Testing] Use when you need to generate or review integration tests. |
| execution-mode | subagent |
| context-budget | high |
[BLOCKING] Execute skill steps in declared order. NEVER skip, reorder, or merge steps without explicit user approval. [BLOCKING] Before each step or sub-skill call, update task tracking: set `in_progress` when the step starts, set `completed` when the step ends. [BLOCKING] Every completed/skipped step MUST include brief evidence or an explicit skip reason. [BLOCKING] If Task tools are unavailable, create and maintain an equivalent step-by-step plan tracker with the same status transitions.
Goal: Generate/review integration test files using real DI (no mocks). 5 modes: (1) from-changes · (2) from-prompt · (3) review · (4) diagnose · (5) verify-traceability.
Workflow: Detect mode → Find targets → Gather context → Execute → Report
Key Rules:
- Read `references/integration-test-patterns.md` before writing any test
- NEVER create `Queries/` or `Commands/` folders
- `/integration-test-verify` runs without DB reset

Prerequisites – MUST ATTENTION READ before executing:
- `references/integration-test-patterns.md` – canonical test templates: collection attributes, base class usage, TC annotation format, async polling helpers, unique name generators, DB assertion patterns. Read before writing ANY test.
- `docs/project-reference/domain-entities-reference.md` – domain entity catalog, relationships, cross-service sync (read directly when relevant; do not rely on hook-injected conversation text)
- `docs/specs/` – existing TCs by module: read to verify test-to-spec traceability and get TC IDs before generating (read directly when relevant; do not rely on hook-injected conversation text)

CRITICAL: Search existing patterns FIRST. Before generating ANY test, grep existing integration test files in the same service. Read ≥1 existing test file to match conventions (namespace, usings, collection name, base class, helper usage). NEVER generate tests contradicting established codebase patterns.
CRITICAL: NO Smoke/Fake/Useless Tests. Every test MUST execute actual commands/handlers and verify DB data state. NO DI-resolution-only tests. NO exception-check-only tests. Before writing assertions: READ handler/entity/event source → understand WHAT fields change, WHAT entities are created/updated/deleted, WHAT event handlers fire. Assert specific field values.
CRITICAL: Async Polling for ALL Data Assertions. ALWAYS wrap data state assertions in an async polling/retry helper. This is the DEFAULT for ALL data verification – not just async handlers. Data persistence may be delayed by event handlers, message bus consumers, background jobs, or DB write latency. Rule: If asserting data in the DB → use async polling. No exceptions.
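The polling rule above can be sketched as a condition-based retry helper. This is a minimal illustration only – the helper name, timeout, and interval defaults are assumptions; the project's canonical helper in `references/integration-test-patterns.md` takes precedence:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of a condition-based polling helper.
// Prefer the project's real helper from integration-test-patterns.md.
public static class Polling
{
    public static async Task WaitUntilAsync(
        Func<Task<bool>> condition,
        TimeSpan? timeout = null,
        TimeSpan? interval = null)
    {
        var deadline = DateTime.UtcNow + (timeout ?? TimeSpan.FromSeconds(30));
        var delay = interval ?? TimeSpan.FromMilliseconds(200);

        while (DateTime.UtcNow < deadline)
        {
            if (await condition())
                return; // condition met: data has landed in the DB
            await Task.Delay(delay);
        }
        throw new TimeoutException("Condition not met before timeout.");
    }
}
```

The key design point: the condition re-queries the DB on every iteration instead of sleeping a fixed time, so the assertion succeeds as soon as the data lands rather than failing when a fixed delay is too short.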
For test specifications and test case generation from PBIs, use the `/tdd-spec` skill instead.
External Memory: For complex/lengthy work, write findings to `plans/reports/` – prevents context loss.
Evidence Gate: MANDATORY IMPORTANT MUST ATTENTION – every claim requires `file:line` proof or traced evidence with a confidence percentage (>80% act, <80% verify first).
Before implementation, search the codebase for patterns: `IntegrationTest`, `TestFixture`, `TestUserContext`, `IntegrationTestBase`.

MANDATORY IMPORTANT MUST ATTENTION: plan a task to READ `integration-test-reference.md` for project-specific patterns and code examples. If not found, continue with search-based discovery.
Workflow:
Key Rules:
- Read `references/integration-test-patterns.md` before writing any test
- Organize by domain feature (e.g., `Orders/OrderCommandIntegrationTests.*`); NEVER create a `Queries/` or `Commands/` folder
- `// TC-{FEATURE}-{NNN}: Description` comment + test-spec annotation → placed before the method, outside the body
- If TCs are missing, run `/tdd-spec` first

ALWAYS create and execute tasks in this exact order:
FIRST: Verify/upsert test specs in feature docs
- Read feature docs (`docs/business-features/{App}/detailed-features/`) for the target domain
- Check the specs dashboard (`docs/specs/{App}/README.md`) if it exists
- Confirm `TC-{FEATURE}-{NNN}` exists

MIDDLE: Implement integration tests
```csharp
// TC-OM-001: Create valid order → happy path
[Trait("TestSpec", "TC-OM-001")]
[Fact]
public async Task CreateOrder_WhenValidData_ShouldCreateSuccessfully()
```
FINAL: Verify bidirectional traceability
- `TC-{FEATURE}-{NNN}` exists in feature doc Section 15 / specs doc
- `IntegrationTest` field in feature doc TCs updated with `{File}::{MethodName}`

| Module | Abbreviation | Test Folder |
|---|---|---|
| Order Management | OM | Orders/ |
| Inventory | INV | Inventory/ |
| User Profiles | UP | UserProfiles/ |
| Notification Management | NM | Notifications/ |
| Report Generation | RG | Reports/ |
| Feedback | FB | Feedback/ |
| Background Jobs | BJ | – |
Creating new TC-{FEATURE}-{NNN} codes:
Check `docs/business-features/{App}/detailed-features/` for existing codes. New codes must not collide.

Args = command/query name (e.g., "/integration-test CreateOrderCommand")
→ FROM-PROMPT mode: generate tests for the specified command/query
No args (e.g., "/integration-test")
→ FROM-CHANGES mode: detect changed command/query files from git
Args = "review" (e.g., "/integration-test review Orders")
→ REVIEW mode: audit existing test quality, find flaky patterns, check best practices
Args = "diagnose" (e.g., "/integration-test diagnose OrderCommandIntegrationTests")
→ DIAGNOSE mode: analyze why tests fail – determine test bug vs code bug
Args = "verify" (e.g., "/integration-test verify {Service}")
→ VERIFY-TRACEABILITY mode: check test code matches specs and feature docs
Run via Bash tool:
```shell
git diff --name-only; git diff --cached --name-only
```
Filter for command/query files using project naming conventions (e.g., `*Command.*`, `*Query.*`). Path patterns come from `docs/project-config.json` → modules or backendServices. Extract the service from the path:
| Path pattern | Service | Test project |
|---|---|---|
Per docs/project-config.json service path pattern | {Service} | {Service}.IntegrationTests (or project equivalent) |
Search codebase for existing *.IntegrationTests.* projects to find correct mapping.
If no test project exists: inform user "No integration test project for {service}. See CLAUDE.md Integration Testing section to create one."
If test file already exists: ask user overwrite or skip.
User specifies command/query name. Use Grep tool (NOT bash grep):
Grep pattern="class {CommandName}" path="." glob="*.cs"
For each target, read in parallel:
- Existing tests: `{Service}.IntegrationTests/**/*IntegrationTests.*` – read ≥1 for conventions (collection name, trait, namespace, usings, base class)
- Base class: grep `class .*ServiceIntegrationTestBase`
- `references/integration-test-patterns.md` – canonical templates (adapt `{Service}` placeholders)

For each target domain, read:
- `docs/business-features/{App}/detailed-features/` Section 15 (primary source)
- `docs/specs/{App}/README.md` (secondary reference)

Build mapping: test case description → TC code (e.g., "create valid order" → TC-OM-001).
If no TCs exist, run `/tdd-spec` first.

File path: `{project-test-dir}/{Service}.IntegrationTests/{Domain}/{CommandName}IntegrationTests{ext}` (adapt path/extension per `docs/project-config.json` → integrationTestVerify.testProjectPattern)
Folder = domain feature.
`{Domain}` = business domain (Orders, Inventory, Notifications, UserProfiles), NOT CQRS type. Command and query tests for the same domain live in the same folder.
Structure (C#/xUnit – adapt to your framework):

```csharp
#region
using FluentAssertions;
// ... service-specific usings (copy from existing tests)
#endregion

namespace {Service}.IntegrationTests.{Domain};

[Collection({Service}IntegrationTestCollection.Name)]
[Trait("Category", "Command")] // or "Query"
public class {CommandName}IntegrationTests : {Service}ServiceIntegrationTestBase
{
    // Minimum 3 tests: happy path, validation failure, DB state verification
}
```
Test method naming: {CommandName}_When{Condition}_Should{Expectation}
Required patterns per command type:
| Command type | Required tests |
|---|---|
| Save/Create | Happy path + validation failure + DB state |
| Update | Create-then-update + verify updated fields in DB |
| Delete | Create-then-delete + AssertEntityDeletedAsync |
| Query | Filter returns results + pagination + empty result |
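As an illustration of the Update row above, a create-then-update test might look like the following sketch. All names here are hypothetical placeholders (`CreateOrderCommand`, `UpdateOrderCommand`, `SendAsync`, `FindAsync`, `UniqueName`, `WaitUntilAsync`) – copy the real command, base-class, and helper names from existing tests and `references/integration-test-patterns.md`:

```csharp
// Hypothetical create-then-update test; adapt names to the real codebase.
// TC-OM-002: Update order name → persisted field changes
[Trait("TestSpec", "TC-OM-002")]
[Fact]
public async Task UpdateOrder_WhenNameChanged_ShouldPersistNewName()
{
    // Arrange: create through the real command path (no repository hacks)
    var name = UniqueName();
    var orderId = await SendAsync(new CreateOrderCommand { Name = name });

    // Act: update through the real command path
    var newName = UniqueName();
    await SendAsync(new UpdateOrderCommand { Id = orderId, Name = newName });

    // Assert: poll the DB and verify the specific field value
    await WaitUntilAsync(async () =>
        (await FindAsync<Order>(orderId))?.Name == newName);
}
```

Note how the test satisfies all three rules at once: real command execution, a specific field assertion, and async polling around the DB read.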
Build test project via project's build tool (see /integration-test-verify for config-driven build).
MUST ATTENTION verify ALL of the following:
- `// TC-{FEATURE}-{NNN}: Description` comment + test-spec annotation present on every test

Search the codebase for existing integration test files:
```shell
find . -name "*IntegrationTests.*" -type f
find . -name "*IntegrationTestBase.*" -type f
find . -name "*IntegrationTestFixture.*" -type f
```
| Pattern | Shows |
|---|---|
{Service}.IntegrationTests/{Domain}/*CommandIntegrationTests.* | Create + update + validation |
{Service}.IntegrationTests/{Domain}/*QueryIntegrationTests.* | Query with create-then-query |
{Service}.IntegrationTests/{Domain}/Delete*IntegrationTests.* | Delete + cascade |
{Service}.IntegrationTests/{Service}ServiceIntegrationTestBase.* | Service base class pattern |
Case: Generate tests from existing test specs (feature docs Section 15)
/integration-test CreateOrderCommand
→ Reads Section 15 TCs, generates test file with TC annotations
Case: Generate tests from git changes (default)
/integration-test
→ Detects changed command/query files, checks Section 15 for matching TCs, generates tests
Case: Generate tests after /tdd-spec created new TCs
/tdd-spec → /integration-test
→ tdd-spec writes TCs to Section 15, then integration-test generates tests from those TCs
Case: Review existing tests for quality
/integration-test review Orders
→ Audits test quality, finds flaky patterns, checks best practices
Case: Diagnose test failures
/integration-test diagnose OrderCommandIntegrationTests
→ Analyzes failures, determines test bug vs code bug
Case: Verify test-spec traceability
/integration-test verify {Service}
→ Checks test code matches specs and feature docs bidirectionally
Mode = REVIEW: audit existing integration tests for quality, flaky patterns, best practices.
| Input type | Sub-agent | Why |
|---|---|---|
| Test file quality audit | integration-tester | Purpose-built for spec generation, TC traceability, and test patterns – catches integration-specific issues code-reviewer misses |
| Security-sensitive test data (PII, auth fixtures) | security-auditor | Detects PII leakage in test fixtures |
MANDATORY: Integration test REVIEW mode spawns the `integration-tester` sub-agent (subagent_type: "integration-tester"), NOT `code-reviewer`. Rationale: `integration-tester` specializes in test spec generation, TC traceability, CQRS test patterns, `WaitUntilAsync` correctness, and microservices integration context – areas `code-reviewer` does not cover at depth.
Fresh Eyes Protocol: Run Round 1 inline. If findings are LOW confidence or contradictory → spawn a fresh integration-tester sub-agent (zero memory of Round 1) for Round 2. Main agent reads the report, NEVER filters findings. Max 2 rounds, then escalate.
Scan: `{Service}.IntegrationTests/{Domain}/**/*IntegrationTests.*`

Dimension 1: Reliability – Think: What causes intermittent failures?
- DB assertion without `WaitUntilAsync()` or equivalent → WILL flake
- `Thread.Sleep()` / `Task.Delay()` instead of condition-based polling
- `DateTime.Now` without time abstraction

Dimension 2: Assertion Value – Think: Does the test actually verify anything?
- `exception.Should().BeNull()` alone → HIGH severity

Dimension 3: Conventions – Think: Does the test follow project patterns?
- `[Trait("Category", "Command")]` or equivalent present

Dimension 4: Code Quality – Think: Maintainability and isolation?
- Naming: `{Action}_When{Condition}_Should{Expectation}`

# Integration Test Quality Report – {Domain}
## Summary
- Tests scanned: {N}
- Issues found: {N} (HIGH: {n}, MEDIUM: {n}, LOW: {n})
- Overall quality: {GOOD|NEEDS_WORK|CRITICAL}
## HIGH Severity Issues (Flaky Risk)
| Test | Issue | Fix |
| ------------ | ------------------------------------------------ | -------------------------------------- |
| {MethodName} | DB assertion without polling after async handler | Wrap in project's async polling helper |
## MEDIUM Severity Issues (Best Practice)
| Test | Issue | Fix |
| ---- | ----- | --- |
## LOW Severity Issues (Style)
| Test | Issue | Fix |
| ---- | ----- | --- |
## Recommendations
1. {Prioritized fix suggestions}
Mode = DIAGNOSE: analyze failing tests to determine test bug vs application code bug.
Test fails
├── Compilation error?
│   ├── Missing type/method → Code changed, test not updated → TEST BUG
│   └── Wrong import/namespace → TEST BUG
├── Timeout/hang?
│   ├── Missing async/await → TEST BUG
│   ├── Deadlock in handler → CODE BUG
│   └── Infrastructure down → INFRA ISSUE
├── Assertion failure?
│   ├── Expected value wrong?
│   │   ├── Test hardcoded old behavior → TEST BUG
│   │   └── Business logic changed → CODE BUG (if unintended) or TEST BUG (if intended change)
│   ├── Null/empty result?
│   │   ├── Entity not found → Check if create step succeeded → TEST BUG (setup) or CODE BUG (handler)
│   │   └── Query returns empty → Check filters/predicates → CODE BUG
│   ├── Intermittent (passes sometimes)?
│   │   ├── Async assertion without polling → TEST BUG (add async polling/retry)
│   │   ├── Non-unique test data collision → TEST BUG (use unique name generator)
│   │   └── Race condition in handler → CODE BUG
│   └── Wrong count/order?
│       ├── Test data leak from other tests → TEST BUG (isolation)
│       └── Logic error in query → CODE BUG
├── Validation error (expected success)?
│   ├── Test sends invalid data → TEST BUG
│   └── Validation rule too strict → CODE BUG
└── Exception thrown?
    ├── Known exception type in handler → CODE BUG
    └── DI/config error → INFRA ISSUE
# Test Failure Diagnosis – {TestClass}
## Failing Tests
| Test Method | Error Type | Root Cause | Classification |
| ----------- | ----------------- | ------------- | --------------------------- |
| {Method} | {AssertionFailed} | {Description} | TEST BUG / CODE BUG / INFRA |
## Detailed Analysis
### {MethodName}
**Error:** {error message}
**Expected:** {what test expected}
**Actual:** {what happened}
**Root Cause:** {explanation with code evidence}
**Classification:** TEST BUG | CODE BUG | INFRA ISSUE
**Evidence:** `{file}:{line}` โ {what the code does}
**Recommended Fix:** {specific fix with code location}
## Summary
- Test bugs: {N} โ fix in test code
- Code bugs: {N} โ fix in application code
- Infra issues: {N} โ fix in configuration/environment
Mode = VERIFY: bidirectional traceability check between test code, test specs, feature docs.
| Scenario | Likely Correct Source | Action |
|---|---|---|
| Test passes, spec describes different behavior | Test (reflects current code) | Update spec to match test |
| Test fails, spec describes expected behavior | Spec (test is stale) | Update test to match spec |
| Test exists, no spec | Test (spec was never written) | Create spec from test |
| Spec exists, no test | Spec (test was never written) | Generate test from spec |
| Test and spec agree, but code behaves differently | Spec (code has regression) | Fix code or update spec+test |
MUST ATTENTION verify ALL of the following:
- Every TC without a test is explicitly marked (`Status: Untested`)
- `docs/specs/` dashboard is in sync with feature doc Section 15

# Traceability Report – {Service}
## Summary
- TCs in feature docs: {N}
- Test methods with TC annotations: {N}
- Fully traced (both directions): {N}
- Orphaned tests (no matching TC): {N}
- Orphaned TCs (no matching test): {N}
- Mismatched behavior: {N}
## Traceability Matrix
| TC ID | Feature Doc? | Test Code? | Dashboard? | Status |
| --------- | ------------ | ---------- | ---------- | ------------ |
| TC-OM-001 | ✅ | ✅ | ✅ | Traced |
| TC-OM-005 | ✅ | ❌ | ✅ | Missing test |
| TC-OM-010 | ❌ | ✅ | ❌ | Missing spec |
## Orphaned Tests (no matching TC in docs)
| Test File | Method | Annotation | Action |
| --------- | -------- | ---------- | ------------------------ |
| {file} | {method} | TC-OM-010 | Create TC in feature doc |
## Orphaned TCs (no matching test)
| TC ID | Doc Location | Priority | Action |
| --------- | ------------ | -------- | ----------------------------------- |
| TC-OM-005 | Section 15 | P0 | Generate test via /integration-test |
## Behavior Mismatches
| TC ID | Doc Says | Test Does | Correct Source | Action |
| ----- | -------- | --------- | -------------- | ------ |
## Recommendations
1. {Prioritized actions}
| Pattern | When to Use | Example |
|---|---|---|
| Per-test inline | Simple tests, unique data | var order = new CreateOrderCommand { Name = UniqueName() } |
| Factory methods | Repeated entity creation | TestDataFactory.CreateValidOrder() |
| Builder pattern | Complex entities with many fields | new OrderBuilder().WithStatus(Active).WithItems(3).Build() |
| Shared fixture | Reference data needed by all tests | CollectionFixture.SeedReferenceData() |
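A minimal builder along the lines of the table above. The entity, enum, and DTO names (`CreateOrderCommand`, `OrderStatus`, `OrderItemDto`, `UniqueName`) are illustrative assumptions – mirror the real domain types when adapting:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical builder sketch; adapt fields to the real Order entity.
public class OrderBuilder
{
    private readonly string _name = UniqueName();
    private OrderStatus _status = OrderStatus.Draft;
    private int _itemCount;

    public OrderBuilder WithStatus(OrderStatus status) { _status = status; return this; }
    public OrderBuilder WithItems(int count) { _itemCount = count; return this; }

    public CreateOrderCommand Build() => new()
    {
        Name = _name,
        Status = _status,
        // Each item gets a unique name derived from the builder's unique root
        Items = Enumerable.Range(0, _itemCount)
                          .Select(i => new OrderItemDto { Name = $"{_name}-item-{i}" })
                          .ToList(),
    };
}
```

The builder defaults every field to a valid value, so each test only overrides what it is actually testing – keeping arrange sections short and intent obvious.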
Rules:
MANDATORY IMPORTANT MUST ATTENTION – NO EXCEPTIONS: NOT in a workflow? Use `AskUserQuestion` – do NOT decide complexity yourself. User decides:
- `test-to-integration` workflow (Recommended) → scout → integration-test → integration-test-review → integration-test-verify → test → docs-update → watzup → workflow-end
- Run `/integration-test` directly → standalone
IMPORTANT MUST ATTENTION: After generating/modifying integration tests, MUST:
- Run tests: `/integration-test-verify` (reads `quickRunCommand` from `docs/project-config.json`)
- If tests fail: diagnose root cause → (a) wrong test setup/assertions → fix test, or (b) service bug → report as finding
- NEVER mark done until tests pass. Unrun tests have zero value.
- Iterate: Fix → rerun → verify until all pass or failures are confirmed as service bugs
MANDATORY IMPORTANT MUST ATTENTION โ NO EXCEPTIONS after completing, use AskUserQuestion to present:
| Skill | Relationship | When to Call |
|---|---|---|
| /tdd-spec | Producer – TCs in feature doc Section 15 are the source for test generation | Must run tdd-spec before integration-test (CREATE or UPDATE mode). TCs must exist before generating tests. |
| /tdd-spec-review | Upstream reviewer – validates TC quality before test generation | Run before integration-test to ensure TCs have real assertion value |
| /tdd-spec [direction=sync] | Dashboard – syncs QA dashboard after TCs are linked to test files | Run after integration-test to update IntegrationTest: fields in the dashboard |
| /feature-docs | TC host – Section 15 of the feature doc is where TCs live | If the feature doc is missing or Section 15 is empty → run /feature-docs first |
| /spec-discovery | Upstream spec – engineering spec is the source of truth for what tests should assert | If tests diverge from spec → check spec-discovery output for correct behavior |
| /integration-test-review | Reviewer – 6-gate quality audit of generated tests | Always call after generating integration tests |
| /integration-test-verify | Runner – executes tests and reports pass/fail | Always call after integration-test-review clears |
| /docs-update | Orchestrator – calls tdd-spec sync (Phase 4) with test traceability | Run for full doc sync after integration test files are updated |
When called outside a workflow, follow this chain to complete the integration test authoring cycle.
integration-test (you are here)
│
├─ PREREQUISITE: TCs must exist in feature doc Section 15
│    [REQUIRED] Verify: docs/business-features/{Module}/README.md Section 15 has TC-{FEATURE}-{NNN} entries
│    If empty → run /tdd-spec [CREATE mode] first
│
├─ [REQUIRED] → /integration-test-review
│    6-gate quality audit: assertion value, data state, repeatability, domain logic, traceability, three-way sync.
│    Never skip – Gate 6 (three-way sync) is the only place where spec/code/test conflicts surface.
│
├─ [REQUIRED] → /integration-test-verify
│    Runs tests and reports pass/fail counts. Never mark complete without real runner output.
│
├─ [REQUIRED] → /tdd-spec [direction=sync]
│    Updates QA dashboard with IntegrationTest: file::method traceability links.
│
├─ [RECOMMENDED] → /docs-update
│    Updates feature doc evidence fields and version history if test coverage changed materially.
│
└─ [RECOMMENDED] → /tdd-spec-review
     Re-run if integration-test-review (Gate 6) flagged TC issues requiring TC edits.
### Mode-Specific Chains
| Mode | Pre-step | Post-step |
|------|---------|-----------|
| from-changes | verify TCs updated (run /tdd-spec UPDATE first) | /integration-test-review → /verify → /sync |
| from-prompt | confirm TC exists for target feature | /integration-test-review → /verify → /sync |
| review | N/A (read-only) | report findings → /tdd-spec UPDATE if TCs need fixes |
| diagnose | run /test to see failures first | fix identified issue → re-run /integration-test-verify |
| verify-traceability | N/A (read-only) | if orphaned TCs: /tdd-spec UPDATE → /integration-test [from-prompt] |
[IMPORTANT] `TaskCreate` – break ALL work into small tasks BEFORE starting. NEVER skip task creation.
AI Mistake Prevention โ Failure modes to avoid on every task:
- Check downstream references before deleting. Deleting components causes documentation and code staleness cascades. Map all referencing files before removal.
- Verify AI-generated content against actual code. AI hallucinates APIs, class names, and method signatures. Always grep to confirm existence before documenting or referencing.
- Trace the full dependency chain after edits. Changing a definition misses downstream variables and consumers derived from it. Always trace the full chain.
- Trace ALL code paths when verifying correctness. Confirming code exists is not confirming it executes. Always trace early exits, error branches, and conditional skips – not just the happy path.
- When debugging, ask "whose responsibility?" before fixing. Trace whether the bug is in the caller (wrong data) or the callee (wrong handling). Fix at the responsible layer – never patch the symptom site.
- Assume existing values are intentional – ask WHY before changing. Before changing any constant, limit, flag, or pattern: read comments, check git blame, examine surrounding code.
- Verify ALL affected outputs, not just the first. Changes touching multiple stacks require verifying EVERY output. One green check is not all green checks.
- Holistic-first debugging – resist the nearest-attention trap. When investigating any failure, list EVERY precondition first (config, env vars, DB names, endpoints, DI registrations, data preconditions), then verify each against evidence before forming any code-layer hypothesis.
- Surgical changes – apply the diff test. Bug fix: every changed line must trace directly to the bug. Don't restyle or improve adjacent code. Enhancement task: implement improvements AND announce them explicitly.
- Surface ambiguity before coding – don't pick silently. If a request has multiple interpretations, present each with an effort estimate and ask. Never assume the all-records, file-based, or more complex path.
Critical Thinking Mindset – Apply critical thinking and sequential thinking. Every claim needs traced proof, confidence >80% to act. Anti-hallucination: never present a guess as fact – cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, and stay skeptical of your own confidence – certainty without evidence is the root of all hallucination.
Understand Code First – HARD-GATE: Do NOT write, plan, or fix until you READ existing code.
- Search 3+ similar patterns (grep/glob) → cite `file:line` evidence
- Read existing files in the target area → understand structure, base classes, conventions
- Run `python .claude/scripts/code_graph trace <file> --direction both --json` when `.code-graph/graph.db` exists
- Map dependencies via `connections` or `callers_of` → know what depends on your target
- Write investigation to `.ai/workspace/analysis/` for non-trivial tasks (3+ files)
- Re-read the analysis file before implementing – never work from memory alone
- NEVER invent new patterns when existing ones work – match exactly or document deviation
BLOCKED until:
- [ ] Read target files
- [ ] Grep 3+ patterns
- [ ] Graph trace (if graph.db exists)
- [ ] Assumptions verified with evidence
Graph Impact Analysis – When `.code-graph/graph.db` exists, run `blast-radius --json` to detect ALL files affected by changes (7 edge types: CALLS, MESSAGE_BUS, API_ENDPOINT, TRIGGERS_EVENT, PRODUCES_EVENT, TRIGGERS_COMMAND_EVENT, INHERITS). Compute gap: impacted_files - changed_files = potentially stale files. Risk: <5 Low, 5-20 Medium, >20 High. Use `trace --direction downstream` for deep chains on high-impact files.
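The gap computation above is simple set arithmetic. A sketch of the logic (the actual `blast-radius` output format is an assumption here):

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch: impacted_files - changed_files = potentially stale files,
// bucketed into the risk levels from the rule above.
static class BlastRadius
{
    public static (string[] Stale, string Risk) ComputeGap(
        IEnumerable<string> impactedFiles, IEnumerable<string> changedFiles)
    {
        var stale = impactedFiles.Except(changedFiles).ToArray();
        var risk = stale.Length < 5 ? "Low"
                 : stale.Length <= 20 ? "Medium"
                 : "High";
        return (stale, risk);
    }
}
```

For example, 30 impacted files against a 12-file changeset where all 12 are impacted leaves 18 potentially stale files – Medium risk, warranting a downstream trace on the high-impact ones.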
Infinitely Repeatable Tests – Tests MUST run N times without failure. Like manual QC – run 100 times, each run adds data. Verification is only PASS after the relevant suite/project passes 3 consecutive runs without a DB reset.
- Unique data per run: Use project's unique ID generator for ALL entity IDs. NEVER hardcode IDs.
- Additive only: Tests create data, never delete/reset. Prior runs MUST NOT interfere.
- No schema rollback dependency: Tests work with current schema only. Never rely on rollback.
- Idempotent seeders: Fixture-level seeders use create-if-missing (check existence before insert). Test-level data uses unique IDs per execution.
- No cleanup required: No teardown, no DB reset between runs. Isolation by unique seed data, not cleanup.
- Unique names/codes: Entities requiring unique names/codes → append a unique suffix via the project's ID generator.
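The unique-data and idempotent-seeder rules above can be sketched as follows. Both helpers are hypothetical – prefer the project's real ID generator and fixture conventions if they exist:

```csharp
using System;

// Hypothetical unique-name helper: additive-only tests never collide
// across runs because every entity gets a fresh timestamp+GUID suffix.
public static class TestIds
{
    public static string UniqueName(string prefix = "test") =>
        $"{prefix}-{DateTime.UtcNow:yyyyMMddHHmmssfff}-{Guid.NewGuid():N}";
}

// Hypothetical idempotent seeder: create-if-missing, so fixture-level
// reference data survives repeated runs without any teardown.
public partial class TestFixture
{
    public async Task EnsureReferenceDataAsync()
    {
        if (!await Db.Currencies.AnyAsync(c => c.Code == "USD"))
            Db.Currencies.Add(new Currency { Code = "USD" });
        await Db.SaveChangesAsync();
    }
}
```

Together these make cleanup unnecessary: test-level data is isolated by unique names, and shared reference data is seeded exactly once no matter how many runs accumulate.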
Red Flag Stop Conditions – STOP and escalate via AskUserQuestion when:
- Confidence drops below 60% on any critical decision
- Changes affect >20 files
- Cross-service boundary crossed
- Security-sensitive code (auth, crypto, PII)
- Breaking change detected (interface, API contract, DB schema)
- Test coverage would decrease
- Approach requires technology/pattern not in project
NEVER proceed past a red flag without explicit user approval.
Rationalization Prevention โ AI skips steps via these evasions. Recognize and reject:
| Evasion | Rebuttal |
|---|---|
| "Too simple for a plan" | Simple + wrong assumptions = wasted time. Plan anyway. |
| "I'll test after" | RED before GREEN. Write/verify test first. |
| "Already searched" | Show grep evidence with file:line. No proof = no search. |
| "Just do it" | Still need TaskCreate. Skip depth, never skip tracking. |
| "Just a small fix" | Small fix in wrong location cascades. Verify file:line first. |
| "Code is self-explanatory" | Future readers need evidence trail. Document anyway. |
| "Combine steps to save time" | Combined steps dilute focus. Each step has distinct purpose. |
Incremental Result Persistence – MANDATORY for all sub-agents or heavy inline steps processing >3 files.
- Before starting: Create report file `plans/reports/{skill}-{date}-{slug}.md`
- After each file/section reviewed: Append findings to the report immediately – never hold in memory
- Return to main agent: Summary only (per SYNC:subagent-return-contract) with `Full report:` path
- Main agent: Reads the report file only when resolving specific blockers

Why: Context cutoff mid-execution loses ALL in-memory findings. Each disk write survives compaction.

Report naming: `plans/reports/{skill-name}-{YYMMDD}-{HHmm}-{slug}.md`
Sub-Agent Return Contract – When this skill spawns a sub-agent, the sub-agent MUST return ONLY this structure. The main agent reads only this summary – it NEVER requests full sub-agent output inline.

```
## Sub-Agent Result: [skill-name]
Status: ✅ PASS | ⚠️ PARTIAL | ❌ FAIL
Confidence: [0-100]%
### Findings (Critical/High only – max 10 bullets)
- [severity] [file:line] [finding]
### Actions Taken
- [file changed] [what changed]
### Blockers (if any)
- [blocker description]
Full report: plans/reports/[skill-name]-[date]-[slug].md
```

The main agent reads the `Full report` ONLY when: (a) resolving a specific blocker, or (b) building a fix plan. The sub-agent writes the full report incrementally (per SYNC:incremental-persistence) – not held in memory.
Sub-Agent Selection – Full routing contract: `.claude/skills/shared/sub-agent-selection-guide.md`. Rule: NEVER use `code-reviewer` for specialized domains (architecture, security, performance, DB, E2E, integration-test, git).
Nested Task Expansion Contract – For workflow-step invocation, the `[Workflow] ...` row is only a parent container; the child skill still creates visible phase tasks.
- Call `TaskList` first. If a matching active parent workflow row exists, set `nested=true` and record `parentTaskId`; otherwise run standalone.
- Create one task per declared phase before phase work. When nested, prefix subjects `[N.M] $skill-name – phase`.
- When nested, link the parent with `TaskUpdate(parentTaskId, addBlockedBy: [childIds])`.
- Orchestrators must pre-expand a child skill's phase list and link the workflow row before invoking that child skill or sub-agent.
- Mark exactly one child `in_progress` before work and `completed` immediately after evidence is written.
- Complete the parent only after all child tasks are completed or explicitly cancelled with a reason.

Blocked until: `TaskList` done, child phases created, parent linked when nested, first child marked `in_progress`.
Project Reference Docs Gate – Run after task-tracking bootstrap and before target/source file reads, grep, edits, or analysis. Project docs override generic framework assumptions.
- Identify scope: file types, domain area, and operation.
- Required docs by trigger: always `docs/project-reference/lessons.md`; doc lookup → `docs-index-reference.md`; review → `code-review-rules.md`; backend/CQRS/API → `backend-patterns-reference.md`; domain/entity → `domain-entities-reference.md`; frontend/UI → `frontend-patterns-reference.md`; styles/design → `scss-styling-guide.md` + `design-system/README.md`; integration tests → `integration-test-reference.md`; E2E → `e2e-test-reference.md`; feature docs/specs → `feature-docs-reference.md`; architecture/new area → `project-structure-reference.md`.
- Read every required doc that exists; skip absent docs as not applicable. Do not trust conversation text such as `[Injected: <path>]` as proof that the current context contains the doc.
- Before target work, state: `Reference docs read: ... | Missing/not applicable: ...`.

Blocked until: scope evaluated, required docs checked/read, `lessons.md` confirmed, citation emitted.
Task Tracking & External Report Persistence – Bootstrap this before execution; then run project-reference doc prefetch before target/source work.
- Create a small task breakdown before target file reads, grep, edits, or analysis. On context loss, inspect the current task list first.
- Mark one task `in_progress` before work and `completed` immediately after evidence; never batch transitions.
- For plan/review work, create `plans/reports/{skill}-{YYMMDD}-{HHmm}-{slug}.md` before the first finding.
- Append findings after each file/section/decision and synthesize from the report file at the end.
- Final output cites `Full report: plans/reports/(unknown)`.

Blocked until: task breakdown exists, report path declared for plan/review work, first finding persisted before the next finding.
- Every claim cites `file:line`.
- Run `blast-radius` when graph.db exists. Flag impacted files NOT in the changeset as potentially stale.
- MUST ATTENTION apply critical thinking – every claim needs traced proof, confidence >80% to act. Anti-hallucination: never present a guess as fact.
- MUST ATTENTION apply AI mistake prevention – holistic-first debugging, fix at the responsible layer, surface ambiguity before coding, re-read files after compaction.
- Persist findings to `plans/reports/` incrementally and synthesize from disk.
- Emit `Reference docs read: ...`; read `lessons.md`; project conventions override generic defaults.
- Use `[N.M] $skill-name – phase` prefixes and one-`in_progress` discipline.
- IMPORTANT MUST ATTENTION follow declared step order for this skill; NEVER skip, reorder, or merge steps without explicit user approval
IMPORTANT MUST ATTENTION for every step/sub-skill call: set in_progress before execution, set completed after execution
IMPORTANT MUST ATTENTION every skipped step MUST include explicit reason; every completed step MUST include concise evidence
IMPORTANT MUST ATTENTION if Task tools unavailable, maintain an equivalent step-by-step plan tracker with synchronized statuses
- `TaskCreate` → break ALL work into small tasks BEFORE starting
- `AskUserQuestion` → validate decisions with user. NEVER auto-decide.
- Read `references/integration-test-patterns.md` BEFORE writing any test
- NO `Queries/` or `Commands/` folders → organize by domain feature
- Run `/integration-test-verify` after changes
- Verify `file:line` before modifying anything

Anti-Rationalization:
| Evasion | Rebuttal |
|---|---|
| "Test is simple, skip TC lookup" | TC traceability = test value. Skip = untraceable test. |
| "Async polling not needed here" | ALL DB assertions need polling. Handler type irrelevant. |
| "Already searched patterns" | Show file:line evidence. No proof = no search. |
| "Smoke test is fine for now" | Smoke-only FORBIDDEN. Assert specific field values. |
| "Repo setup is faster" | Direct repository data hacks create invalid state. Use real use-case paths or valid seeded fixtures. |
| "One green run is enough" | Verification requires 3 consecutive passing runs without DB reset. |
| "REVIEW: one pass is enough" | Low confidence โ spawn fresh sub-agent. Never declare PASS after Round 1. |
| "Skip task creation, it's obvious" | TaskCreate is non-negotiable. Tracking prevents context loss. |