deep-crawl
// Exhaustive LLM-powered codebase investigation for optimal AI agent onboarding. Builds on X-Ray signals with unlimited investigation budget to produce the highest-quality onboarding document possible.
| name | deep-crawl |
| description | Exhaustive LLM-powered codebase investigation for optimal AI agent onboarding. Builds on X-Ray signals with unlimited investigation budget to produce the highest-quality onboarding document possible. |
Systematic codebase investigation producing comprehensive onboarding documents optimized for AI agent consumption via CLAUDE.md delivery with prompt caching. Depth over brevity — include everything that saves a downstream agent from opening files.
Core metric: File-reads saved per onboarding token. Every token in your output should reduce the number of files a downstream agent needs to open before it can confidently make changes.
- Use /deep-crawl full when generation cost is not a constraint and you want the highest-quality onboarding document for many future agent sessions
- Use @deep_crawl full as a sequential fallback if /deep-crawl is unavailable

| Command | Mode | What It Does |
|---|---|---|
| /deep-crawl full | Orchestrated (parallel sub-agents) | Plan + parallel investigation + assemble + cross-reference + validate + deliver |
| @deep_crawl full | Sequential (single-agent fallback) | Same pipeline, sequential investigation |
| @deep_crawl plan | Sequential | Generate investigation plan only |
| @deep_crawl resume | Sequential | Continue from last checkpoint |
| @deep_crawl validate | Sequential | QA an existing DEEP_ONBOARD.md |
| @deep_crawl refresh | Sequential | Update for code changes |
| @deep_crawl focus ./path | Sequential | Deep crawl a specific subsystem |
| /deep-crawl full <github-url> | Orchestrated | Clone remote repo, auto-run xray, then full crawl pipeline |
When a GitHub URL (or gh:owner/repo shorthand) is passed as an argument to /deep-crawl full,
the skill clones the repository to a local temp directory and runs the full pipeline against it.
Supported URL formats:
- https://github.com/owner/repo
- https://github.com/owner/repo.git
- gh:owner/repo
- git@github.com:owner/repo.git

Key differences from local crawl:
- Clone location: .deep_crawl/repo/
- DEEP_CRAWL_ROOT points to the clone directory instead of $(pwd)
- {ROOT_PATH} is set to the clone directory
- Output location: output/{repo-name}/ (same structure as local crawls)
- Cleanup: rm -rf .deep_crawl/repo/ when done

What stays the same: All six phases, quality gates, investigation protocols, evidence standards, and validation. The entire pipeline is identical.
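The remote-argument handling can be sketched in a few lines. This is an illustration only, assuming the four URL formats listed above; `normalize_remote` and `repo_name` are hypothetical helper names, not part of the skill.

```python
import re
from typing import Optional

# Prefixes that mark an argument as a remote repository (the supported formats above).
_REMOTE_PREFIX = re.compile(r"^(https://github\.com/|git@github\.com:|gh:)")


def normalize_remote(arg: str) -> Optional[str]:
    """Return a cloneable URL for a remote argument, or None for a local crawl."""
    if not _REMOTE_PREFIX.match(arg):
        return None
    if arg.startswith("gh:"):
        # Expand the gh:owner/repo shorthand to a full HTTPS URL.
        return "https://github.com/" + arg[len("gh:"):]
    return arg


def repo_name(url: str) -> str:
    """Last path segment of the URL with any trailing .git removed."""
    tail = re.split(r"[/:]", url.rstrip("/"))[-1]
    return tail[:-4] if tail.endswith(".git") else tail
```

Anything that does not match a remote prefix falls through to a local crawl, which mirrors the default of DEEP_CRAWL_ROOT being the current directory.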
| Tag | Standard | Example |
|---|---|---|
| [FACT] | Read specific code, cite file:line | "3x retry with backoff [FACT] (payments.ts:89 / stripe.py:89)" |
| [PATTERN] | Observed in >=3 examples, state count | "DI via init [PATTERN: 12/14 services]" |
| [ABSENCE] | Searched and confirmed non-existence | "No rate limiting [ABSENCE: grep — 0 hits]" |
No inferences or unverified signals in the output document.
Citation density floor: Density is measured in citations per 100 words. Assembled sections have tiered requirements: high-evidence sections (impact, gotchas, contracts) >= 3.0, medium (module index, interfaces, playbooks, error handling) >= 2.0, narrative (critical paths, conventions) >= 1.0, structural (glossary, reading order) >= 0. Each playbook must individually reach >= 3.0. Raw investigation findings should target >= 5.0.
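A minimal sketch of the density arithmetic, for illustration. It assumes that any [FACT], [PATTERN] or [ABSENCE] tag counts as one citation (the gate scripts in this document count only [FACT]); the function names and the tier keys are hypothetical.

```python
import re

# Tier floors from the paragraph above, in citations per 100 words.
TIER_FLOORS = {"high": 3.0, "medium": 2.0, "narrative": 1.0, "structural": 0.0}

# Assumption: every evidence tag counts as one citation.
_TAG = re.compile(r"\[(FACT|PATTERN|ABSENCE)\b")


def citation_density(text: str) -> float:
    """Citations per 100 whitespace-separated words."""
    words = len(text.split())
    if words == 0:
        return 0.0
    return len(_TAG.findall(text)) * 100 / words


def meets_floor(text: str, tier: str) -> bool:
    """Compare a section's density against the floor for its tier."""
    return citation_density(text) >= TIER_FLOORS[tier]
```

A ten-word snippet carrying two tags scores 20.0, well above every tier; a section with no tags only passes the structural tier.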
| File | Purpose | Location |
|---|---|---|
| DEEP_ONBOARD.md | Onboarding document (unrestricted, value-driven) | output/{repo-name}/ |
| xray.md | Structural analysis companion | output/{repo-name}/ |
| CLAUDE.md update | Auto-delivery to all sessions (local only) | project root |
| crawl.log | Full pipeline debug log (gates, spawns, re-spawns, stats) | output/{repo-name}/data/ |
| All intermediate data | Findings, sections, plans, validation | output/{repo-name}/data/ |
Before starting, verify:
- output/$REPO_NAME/data/xray.json and output/$REPO_NAME/xray.md exist
- If they are missing: "Run python xray.py . --output both first."

Goal: Establish working environment and context management strategy.
All intermediate state lives on disk, not in conversation context.
# === REMOTE REPOSITORY DETECTION ===
# If the user passed a GitHub URL, clone it and set DEEP_CRAWL_ROOT.
# Otherwise, DEEP_CRAWL_ROOT defaults to the current working directory.
DEEP_CRAWL_REMOTE="${1:-}" # first argument, if any
DEEP_CRAWL_ROOT="$(pwd)"
if [[ "$DEEP_CRAWL_REMOTE" =~ ^(https://github\.com/|git@github\.com:|gh:) ]]; then
REPO_DIR=".deep_crawl/repo"
# Clean previous clone
[ -d "$REPO_DIR" ] && rm -rf "$REPO_DIR"
# Normalize gh:owner/repo shorthand to full URL
if [[ "$DEEP_CRAWL_REMOTE" == gh:* ]]; then
DEEP_CRAWL_REMOTE="https://github.com/${DEEP_CRAWL_REMOTE#gh:}"
fi
echo "Cloning remote repository: $DEEP_CRAWL_REMOTE"
gh repo clone "$DEEP_CRAWL_REMOTE" "$REPO_DIR" 2>/dev/null \
|| git clone "$DEEP_CRAWL_REMOTE" "$REPO_DIR"
if [ ! -d "$REPO_DIR/.git" ]; then
echo "HALT: Clone failed. Check the URL and your access permissions."
exit 1
fi
DEEP_CRAWL_ROOT="$REPO_DIR"
echo "Remote repo cloned to: $DEEP_CRAWL_ROOT"
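# Assign the variable (not just echo it) so the crawl log below records the real mode
DEEP_CRAWL_MODE=remote
echo "DEEP_CRAWL_MODE=$DEEP_CRAWL_MODE"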
# Auto-run xray on cloned repo
# Locate xray.py: check skill source directory, then cwd, then PATH
XRAY_PY=""
SKILL_SOURCE=$(readlink -f ~/.claude/skills/deep-crawl 2>/dev/null || echo "")
if [ -n "$SKILL_SOURCE" ] && [ -f "$(dirname "$SKILL_SOURCE")/../../xray.py" ]; then
XRAY_PY="$(dirname "$SKILL_SOURCE")/../../xray.py"
elif [ -f "xray.py" ]; then
XRAY_PY="./xray.py"
elif command -v xray.py >/dev/null 2>&1; then
XRAY_PY="xray.py"
fi
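if [ -n "$XRAY_PY" ]; then
# REPO_NAME is not assigned until later in this script, so derive it from the URL before passing it to xray
REPO_NAME=$(basename "${DEEP_CRAWL_REMOTE%.git}")
echo "Running xray: python $XRAY_PY $DEEP_CRAWL_ROOT --output both --repo-name $REPO_NAME"
python "$XRAY_PY" "$DEEP_CRAWL_ROOT" --output both --repo-name "$REPO_NAME"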
else
echo "HALT: Cannot find xray.py. Provide its path or run manually:"
echo " python /path/to/xray.py $DEEP_CRAWL_ROOT --output both"
fi
else
DEEP_CRAWL_MODE=local
echo "DEEP_CRAWL_MODE=$DEEP_CRAWL_MODE"
fi
# Determine repo name and output directory
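REPO_NAME=$(git -C "$DEEP_CRAWL_ROOT" remote get-url origin 2>/dev/null | sed 's|.*/||; s|\.git$||')
# The pipeline's exit status is sed's, so test for empty output instead of relying on ||
[ -n "$REPO_NAME" ] || REPO_NAME=$(basename "$(cd "$DEEP_CRAWL_ROOT" && pwd)")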
OUTPUT_DIR="output/$REPO_NAME"
echo "REPO_NAME=$REPO_NAME"
echo "OUTPUT_DIR=$OUTPUT_DIR"
# Create working directory structure
mkdir -p .deep_crawl/findings/{traces,modules,cross_cutting,conventions,impact,playbooks,calibration} \
.deep_crawl/batch_status \
.deep_crawl/sections \
.deep_crawl/agent_logs/prompts \
.deep_crawl/agent_logs/results \
"$OUTPUT_DIR/data"
# Initialize crawl log
CRAWL_LOG="$OUTPUT_DIR/data/crawl.log"
echo "=== Deep Crawl Log ===" > "$CRAWL_LOG"
echo "Started: $(date -Iseconds)" >> "$CRAWL_LOG"
echo "Repo: $REPO_NAME" >> "$CRAWL_LOG"
echo "Root: $DEEP_CRAWL_ROOT" >> "$CRAWL_LOG"
echo "Mode: ${DEEP_CRAWL_MODE:-local}" >> "$CRAWL_LOG"
echo "" >> "$CRAWL_LOG"
# Initialize orchestrator event log
ORCH_LOG=".deep_crawl/agent_logs/orchestrator.log"
FILE_COUNT=$(find "$DEEP_CRAWL_ROOT" -type f -not -path "*/.git/*" -not -path "*/node_modules/*" -not -path "*/dist/*" 2>/dev/null | wc -l)
echo "$(date -Iseconds) PHASE 0 SETUP repo=$REPO_NAME files=$FILE_COUNT" > "$ORCH_LOG"
# Verify xray output exists
test -f "$OUTPUT_DIR/data/xray.json" && echo "READY" || echo "Run: python xray.py $DEEP_CRAWL_ROOT --output both"
# Check for existing crawl state: keep it when it matches HEAD (resumable), otherwise clean it
if [ -f .deep_crawl/CRAWL_PLAN.md ]; then
  PREV_HASH=$(head -1 .deep_crawl/CRAWL_PLAN.md | grep -oE '[a-f0-9]{7,}' | head -1)
  CUR_HASH=$(git -C "$DEEP_CRAWL_ROOT" rev-parse HEAD 2>/dev/null || echo "no-git")
  echo "PREVIOUS CRAWL FOUND (hash: ${PREV_HASH:-unknown})"
  head -5 .deep_crawl/CRAWL_PLAN.md
  git -C "$DEEP_CRAWL_ROOT" log --oneline -1
  if [ -n "$PREV_HASH" ] && [[ "$CUR_HASH" == "$PREV_HASH"* ]]; then
    echo "Hashes match. Run @deep_crawl resume to continue from the last checkpoint."
  else
    echo "Stale crawl. Cleaning data for a fresh crawl..."
    rm -rf .deep_crawl/findings/* .deep_crawl/batch_status/* .deep_crawl/sections/*
    rm -f .deep_crawl/CRAWL_PLAN.md .deep_crawl/SYNTHESIS_INPUT.md
    rm -f .deep_crawl/DRAFT_ONBOARD.md .deep_crawl/DEEP_ONBOARD.md
    rm -f .deep_crawl/VALIDATION_REPORT.md .deep_crawl/REFINE_LOG.md
    # The cleanup above also removed the findings subdirectories, so recreate them
    mkdir -p .deep_crawl/findings/{traces,modules,cross_cutting,conventions,impact,playbooks,calibration}
  fi
fi
DEEP_CRAWL_ROOT usage: All subsequent phases use $DEEP_CRAWL_ROOT wherever the codebase
root is referenced. For local crawls this equals $(pwd) (backward compatible). For remote
crawls it points to .deep_crawl/repo/. Sub-agent prompts must set {ROOT_PATH} to
the value of $DEEP_CRAWL_ROOT.
Context management rules:
- @deep_crawl resume

Agent logging protocol (apply at every spawn point):

Before spawning any sub-agent:
- Write the prompt to .deep_crawl/agent_logs/prompts/{agent_id}.md
- echo "$(date -Iseconds) SPAWN {agent_id} prompt=agent_logs/prompts/{agent_id}.md" >> "$ORCH_LOG"

After agent completion:
- Write the return text to .deep_crawl/agent_logs/results/{agent_id}.md
- echo "$(date -Iseconds) DONE {agent_id} files={count} words={total} facts={total}" >> "$ORCH_LOG"

After any quality gate:
- echo "$(date -Iseconds) GATE {agent_id} {PASS|FAIL} {metrics}" >> "$ORCH_LOG"

After any re-spawn:
- echo "$(date -Iseconds) RESPAWN {agent_id} reason=\"{reason}\"" >> "$ORCH_LOG"

Agent IDs follow this naming convention:
- Calibration: cal_a, cal_b, cal_c
- Investigation: the crawl plan task ID (e.g., P1.1, P2.3, P4.2)
- Assembly: S1, S2, S3a, S3b, S4, S5, S6
- Cross-reference: phase4_crossref
- Validation: phase5_validate
- Gap investigation: gap_{name}

Goal: Catch stale inputs and repo characteristics before investing tokens in investigation.
# === PRE-FLIGHT DIAGNOSTICS ===
echo "=== Pre-Flight ==="
# Xray freshness — robust hash comparison
XRAY_HASH=$(python3 -c "
import json
d = json.load(open('$OUTPUT_DIR/data/xray.json'))
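print(d.get('git_commit', d.get('commit_hash', '?')))
" 2>/dev/null)
# A missing or unreadable xray.json yields an empty string, which would prefix-match any HEAD; treat it as "no hash"
XRAY_HASH="${XRAY_HASH:-?}"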
HEAD=$(git -C "$DEEP_CRAWL_ROOT" rev-parse HEAD 2>/dev/null || echo "no-git")
if [ "$XRAY_HASH" != "?" ] && [ "$HEAD" != "no-git" ]; then
# Prefix match handles short-vs-full hash
if [[ ! "$HEAD" == "$XRAY_HASH"* ]] && [[ ! "$XRAY_HASH" == "$HEAD"* ]]; then
echo "HALT: Xray is stale (xray: ${XRAY_HASH:0:7}, HEAD: ${HEAD:0:7}). Re-run xray."
else
echo "Xray matches HEAD: ${HEAD:0:7}"
fi
else
[ "$XRAY_HASH" = "?" ] && echo "WARNING: Xray has no git_commit field. Proceeding without freshness check."
fi
# Repo characteristics — detect language
PY_COUNT=$(find "$DEEP_CRAWL_ROOT" -name "*.py" -not -path "*/.git/*" -not -path "*/node_modules/*" 2>/dev/null | wc -l)
TS_COUNT=$(find "$DEEP_CRAWL_ROOT" \( -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" \) -not -path "*/.git/*" -not -path "*/node_modules/*" -not -path "*/dist/*" 2>/dev/null | wc -l)
if [ "$TS_COUNT" -gt "$PY_COUNT" ]; then
CODE_LANG="typescript"
FILE_COUNT=$TS_COUNT
TEST_COUNT=$(find "$DEEP_CRAWL_ROOT" \( -name "*.test.ts" -o -name "*.spec.ts" -o -name "*.test.js" -o -name "*.test.tsx" \) -not -path "*/node_modules/*" 2>/dev/null | wc -l)
echo "Language: TypeScript/JavaScript | Files: $FILE_COUNT | Test files: $TEST_COUNT"
else
CODE_LANG="python"
FILE_COUNT=$PY_COUNT
TEST_COUNT=$(find "$DEEP_CRAWL_ROOT" \( -name "test_*.py" -o -name "*_test.py" \) -not -path "*/.git/*" -not -path "*/node_modules/*" 2>/dev/null | wc -l)
echo "Language: Python | Files: $FILE_COUNT | Test files: $TEST_COUNT"
fi
echo "CODE_LANG=$CODE_LANG"
[ "$FILE_COUNT" -gt 5000 ] && echo "WARNING: Very large repo. Consider focused crawl."
[ "$TEST_COUNT" -eq 0 ] && echo "WARNING: No test files detected. Testing section may be thin. (Note: in-source testing via import.meta.vitest won't be detected by file naming.)"
# Framework detection
if [ "$CODE_LANG" = "typescript" ]; then
for fw in express next nestjs fastify koa hapi commander yargs oclif react vue angular jest vitest mocha prisma typeorm sequelize mongoose graphql trpc zod valibot; do
# grep -q takes the exit status from actual matches; "head -1" always succeeds and would report every framework
grep -rl "\"$fw\"" --include="*.json" "$DEEP_CRAWL_ROOT" 2>/dev/null | grep -v node_modules | grep -q . && echo "Framework: $fw"
done
else
for fw in fastapi flask django click typer torch tensorflow airflow dagster numpy scipy asyncio aiohttp paramiko boto3 pluggy stevedore; do
# grep -q takes the exit status from actual matches; "head -1" always succeeds and would report every framework
grep -rq "$fw" --include="*.py" "$DEEP_CRAWL_ROOT" 2>/dev/null && echo "Framework: $fw"
done
fi
Halt condition: If xray hash doesn't match HEAD, stop and tell user to re-run xray. Other warnings are logged to .deep_crawl/PREFLIGHT.md for context during investigation planning.
Save all pre-flight output to .deep_crawl/PREFLIGHT.md for Phase 1 consumption. Also append pre-flight output to $CRAWL_LOG under a ## Pre-flight header.
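The freshness rule used in the pre-flight check (one hash must be a prefix of the other, so short and full SHAs compare equal) can be sketched as follows. The function names are hypothetical; the `git_commit` and `commit_hash` keys are the two field names the script above reads.

```python
def recorded_commit(xray: dict) -> str:
    """Commit recorded by xray, or '' when neither field is present."""
    return xray.get("git_commit", xray.get("commit_hash", ""))


def xray_is_fresh(xray_hash: str, head: str) -> bool:
    """True when one hash is a prefix of the other (short vs. full SHA)."""
    if not xray_hash or not head:
        # An empty string is a prefix of everything, so reject it explicitly.
        return False
    return head.startswith(xray_hash) or xray_hash.startswith(head)
```

The explicit empty-string check matters: without it, a missing hash would be reported as matching HEAD.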
echo "## Phase 1: PLAN — $(date -Iseconds)" >> "$CRAWL_LOG"
echo "$(date -Iseconds) PHASE 1 PLAN" >> "$ORCH_LOG"
Input: X-Ray JSON output including investigation_targets.
1. Read output/$REPO_NAME/xray.md for orientation
2. Read investigation_targets from output/$REPO_NAME/data/xray.json
   - Also read git.function_churn and git.velocity from output/$REPO_NAME/data/xray.json. Files with accelerating velocity or high function-level churn should be prioritized for investigation.
3. Match domain facets using .claude/skills/deep-crawl/configs/domain_profiles.json. A repo can match multiple facets (e.g., a Django app with Celery workers and a CLI gets web_api + async_service + cli_tool). Union their additional_investigation tasks into the crawl plan. If no facet matches, use library as default. Read .deep_crawl/PREFLIGHT.md for framework detection results to guide facet matching.

   For each matched facet:
   - Add its additional_investigation tasks to the crawl plan at P4 priority (cross-cutting concerns)
   - Add its additional_output_sections to the assembly plan for the appropriate sub-agent (S3 or S4)
   - Add its grep_patterns to Protocol C investigation prompts for related cross-cutting agents
   - Record the matched facets in the plan header, e.g. Domain Facets: web_api, async_service, cli_tool

   If facet investigation tasks push any batch beyond its max agent limit, split into sub-batches.
Use the facet's primary_entity to determine the adversarial simulation scenario in Phase 5.
If multiple facets match, the adversarial simulation should use the primary facet's entity.
4. Produce a prioritized crawl plan using the template at .claude/skills/deep-crawl/templates/CRAWL_PLAN.md.template
5. Save to .deep_crawl/CRAWL_PLAN.md
6. Module coverage pre-check (mandatory). After saving the plan, verify P2+P3 task count meets the coverage target:
# === MODULE COVERAGE PRE-CHECK ===
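# grep -c already prints 0 when nothing matches; "|| echo 0" would append a second 0 and break the numeric comparison
P23_COUNT=$(grep -c '^\- \[ \] P[23]\.' .deep_crawl/CRAWL_PLAN.md 2>/dev/null || true)
P23_COUNT=${P23_COUNT:-0}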
TARGET=$(python3 -c "
import json
d = json.load(open('$OUTPUT_DIR/data/xray.json'))
file_count = len(d.get('files', d.get('file_list', [])))
print(max(10, file_count // 40))
" 2>/dev/null)
echo "P2+P3 tasks: $P23_COUNT, coverage target: $TARGET"
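TOTAL_TASKS=$(grep -c '^\- \[ \]' .deep_crawl/CRAWL_PLAN.md 2>/dev/null || true)
TOTAL_TASKS=${TOTAL_TASKS:-0}
# The pipeline's exit status is sed's, so default on empty output instead of relying on ||
FACETS=$(head -20 .deep_crawl/CRAWL_PLAN.md 2>/dev/null | grep -oP 'Domain Facets: .*' | sed 's/Domain Facets: //')
FACETS=${FACETS:-none}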
echo "$(date -Iseconds) PHASE 1 PLAN tasks=$TOTAL_TASKS facets=$FACETS" >> "$ORCH_LOG"
If P23_COUNT < TARGET:
a. Read xray.json import graph: identify all modules with imported_by count >= 3 that are NOT already P2 or P3 tasks
b. Sort by imported_by count descending
c. Add the top (TARGET - P23_COUNT) modules as new P3 tasks to CRAWL_PLAN.md
d. Hop promotion: Any module that appears as an intermediate hop in a P1 trace task description AND does not already have a P2/P3 task gets added as a P3 task (these are modules the traces pass through — they deserve deep-reads)
e. Update the Progress table's P3 total
f. Log: Coverage pre-check: {P23_COUNT} tasks → {new_count} (added {added} modules, target {TARGET})
g. Barrel file analysis (TypeScript only): If CODE_LANG is typescript, identify index.ts files that are mostly re-exports (high export count, low logic). These are architectural chokepoints — if they create circular imports, many modules break. Add the top 3 barrel files as P3 tasks if not already present.
   - Files with high as any / @ts-ignore density should be prioritized (type safety gaps indicate under-documented behavior)
   - Modules in ts_specific.module_augmentations are coupling risks; add them as P3 tasks
   - Check ts_specific.any_density in xray.json: if explicit_any + as_any_assertions > 20, add a P4 cross-cutting task: "Audit type safety escape hatches and document which are intentional vs technical debt"

Prioritization logic (by information density):
| Priority | Task Type | Rationale |
|---|---|---|
| 1 | Request traces | Highest file-reads-saved per output token |
| 2 | High-uncertainty module deep reads | Name and signature tell you nothing |
| 3 | Pillar behavioral summaries | Most-depended-on modules |
| 4 | Cross-cutting concerns | Learn once, apply everywhere |
| 5 | Conventions and patterns | Prevents style violations |
| 6 | Gap investigation | Catches what xray missed |
Within each priority level, order tasks by information density: request traces by estimated hop count descending (longer traces reveal more cross-module behavior). Module deep reads by uncertainty score descending (highest-uncertainty modules first).
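The ordering rule reads naturally as a two-key sort. The sketch below is illustrative: `Task` and its `density_key` field are hypothetical, standing in for estimated hop count on request traces and for uncertainty score on module deep reads.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    priority: int        # 1-6, as in the table above
    density_key: float   # hypothetical: hop count for traces, uncertainty score for modules


def order_tasks(tasks):
    """Lower priority number first; within a level, higher density key first."""
    return sorted(tasks, key=lambda t: (t.priority, -t.density_key))
```

Because `sorted` is stable, tasks that tie on both keys keep the order in which the plan listed them.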
Use extended thinking here to reason about investigation priorities for this specific codebase.
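For the module coverage pre-check in step 6, the target and the top-up selection (steps a to c) reduce to a little arithmetic. This is a sketch under stated assumptions: `imported_by` is a hypothetical mapping from module name to its imported_by count, and `planned` is the set of modules that already have P2 or P3 tasks.

```python
def coverage_target(file_count: int) -> int:
    """Same formula as the pre-check script: at least 10, else one task per 40 files."""
    return max(10, file_count // 40)


def modules_to_add(imported_by: dict, planned: set, p23_count: int, file_count: int) -> list:
    """Pick the most-imported unplanned modules needed to reach the target."""
    shortfall = coverage_target(file_count) - p23_count
    if shortfall <= 0:
        return []
    candidates = [m for m, n in imported_by.items() if n >= 3 and m not in planned]
    # Highest imported_by count first; module name breaks ties deterministically.
    candidates.sort(key=lambda m: (-imported_by[m], m))
    return candidates[:shortfall]
```

For a 1,000-file repo the target is 25 tasks; for anything under 440 files it stays at the floor of 10.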
Goal: Before bulk investigation, produce repo-specific quality exemplars by investigating a small number of high-value targets at elevated depth. Their output becomes the quality reference for all Phase 2 sub-agents, replacing static exemplars.
Scope by repo size (from Phase 0b FILE_COUNT):
| Files | Calibration |
|---|---|
| <= 10 | Skip — use structural templates only |
| 11-30 | 1 target (CAL_B only) |
| >= 31 | Full 3 targets |
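The size thresholds in this table map directly onto a small lookup. A sketch, using the cal_a / cal_b / cal_c agent IDs; the function name is hypothetical.

```python
def calibration_targets(file_count: int) -> list:
    """Which calibration agents to run for a repo of the given size."""
    if file_count <= 10:
        return []                      # skip: structural templates only
    if file_count <= 30:
        return ["cal_b"]               # single target
    return ["cal_a", "cal_b", "cal_c"]  # full calibration
```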
Target selection (deterministic, no LLM judgment):
Elevated quality floor (calibration must exceed normal investigation floor):
- Minimum 400 words
- Minimum 10 [FACT] citations
- Density >= 6.0 [FACT] per 100 words
Thresholds are defined in .claude/skills/deep-crawl/configs/quality_gates.json under calibration_findings.
Procedure:
echo "$(date -Iseconds) PHASE 1b CALIBRATE" >> "$ORCH_LOG"
- Quality floor: calibration_findings thresholds (400w, 10 FACT, 6.0/100w)
- Format guidance: .claude/skills/deep-crawl/configs/exemplar_templates.md only (calibration findings don't exist yet)
- Output: .deep_crawl/findings/calibration/cal_{a|b|c}.md
- Sentinel: touch .deep_crawl/batch_status/cal_{a|b|c}.done
- Before spawning: write the prompt to .deep_crawl/agent_logs/prompts/cal_{a|b|c}.md and log the SPAWN event.
- After completion: write the return text to .deep_crawl/agent_logs/results/cal_{a|b|c}.md and log the DONE event.
# === CALIBRATION QUALITY GATE ===
CAL_GATE=true
for f in .deep_crawl/findings/calibration/cal_*.md; do
[ -f "$f" ] || continue
WORDS=$(wc -w < "$f")
# grep -c already prints 0 when nothing matches; "|| echo 0" would append a second 0
FACTS=$(grep -c '\[FACT' "$f" 2>/dev/null || true)
[ "$WORDS" -eq 0 ] && continue
DENSITY=$((FACTS * 100 / WORDS))
if [ "$WORDS" -lt 400 ] || [ "$FACTS" -lt 10 ] || [ "$DENSITY" -lt 6 ]; then
echo "CAL FAIL: $(basename $f) — ${WORDS}w, ${FACTS} FACT, ${DENSITY}/100w density"
echo " (elevated floor: 400w, 10 FACT, 6.0/100w)"
CAL_GATE=false
fi
done
# Log calibration gate results
echo "## Calibration Gate: $([ "$CAL_GATE" = true ] && echo PASS || echo FAIL)" >> "$CRAWL_LOG"
for f in .deep_crawl/findings/calibration/cal_*.md; do
[ -f "$f" ] || continue
AGENT_ID=$(basename "$f" .md)
WORDS=$(wc -w < "$f"); FACTS=$(grep -c '\[FACT' "$f" 2>/dev/null || true)
echo " $(basename $f): ${WORDS}w, ${FACTS} FACT" >> "$CRAWL_LOG"
[ "$WORDS" -eq 0 ] && continue
DENSITY=$((FACTS * 100 / WORDS))
if [ "$WORDS" -ge 400 ] && [ "$FACTS" -ge 10 ] && [ "$DENSITY" -ge 6 ]; then
echo "$(date -Iseconds) GATE $AGENT_ID PASS words=$WORDS facts=$FACTS density=${DENSITY}/100w" >> "$ORCH_LOG"
else
echo "$(date -Iseconds) GATE $AGENT_ID FAIL words=$WORDS facts=$FACTS density=${DENSITY}/100w" >> "$ORCH_LOG"
fi
done
echo "" >> "$CRAWL_LOG"
Failure handling:
Integration:
[ -f .deep_crawl/findings/calibration/cal_a.md ] && \
cp .deep_crawl/findings/calibration/cal_a.md .deep_crawl/findings/traces/00_calibration.md
[ -f .deep_crawl/findings/calibration/cal_b.md ] && \
cp .deep_crawl/findings/calibration/cal_b.md .deep_crawl/findings/modules/00_calibration.md
[ -f .deep_crawl/findings/calibration/cal_c.md ] && \
cp .deep_crawl/findings/calibration/cal_c.md .deep_crawl/findings/cross_cutting/00_calibration.md
- Mark the calibration tasks [x] in CRAWL_PLAN.md so Phase 2 doesn't re-investigate them
- Phase 2 sub-agents use .deep_crawl/findings/calibration/cal_{type}.md as quality exemplars instead of static exemplars
echo "## Phase 2: CRAWL — $(date -Iseconds)" >> "$CRAWL_LOG"
echo "$(date -Iseconds) PHASE 2 CRAWL" >> "$ORCH_LOG"
Phase 2 uses parallel sub-agents to investigate the codebase. You act as the orchestrator — spawn investigation agents, monitor completion, and verify coverage. Do NOT perform investigation yourself (except as fallback).
Step 1: Parse and batch. Read CRAWL_PLAN.md. Group tasks into batches:
| Batch | Tasks | Protocol | Max Agents | Dependencies |
|---|---|---|---|---|
| 1 | All P1 (request traces) | A | 5 | None |
| 2 | All P2 + P3 (modules + pillars) | B | one per task | None |
| 3 | All P4 (cross-cutting concerns incl. async boundaries) | C | 6 | None |
| 4 | All P5 + P6 (conventions + gaps) | D + mixed | 4 | Batches 1-3 |
| 5 | P7 (change impact) + Coverage gaps | E + Mixed | 8 | Batches 1-4 |
| 6 | Change scenarios | F | 3 | Batches 1-5 |
Batches 1-3 are independent — launch them concurrently (all in a single message with multiple Agent tool calls). Batch 4 waits for 1-3 because convention detection and gap investigation benefit from earlier findings being on disk. Batch 5 (P7 impact + coverage gaps) waits for Batches 1-4 because impact analysis reads module findings. Batch 6 (change scenarios) waits for Batches 1-5 because playbooks reference impact findings.
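The launch order implied by the Dependencies column can be derived mechanically. The sketch below hard-codes the table as a mapping and groups batches into waves that may start together; `launch_waves` is a hypothetical name.

```python
# Batch number -> batches it waits for (from the Dependencies column above).
DEPENDS_ON = {
    1: set(), 2: set(), 3: set(),
    4: {1, 2, 3},
    5: {1, 2, 3, 4},
    6: {1, 2, 3, 4, 5},
}


def launch_waves(depends_on: dict) -> list:
    """Group batches into waves; every batch in a wave can launch concurrently."""
    done, waves = set(), []
    while len(done) < len(depends_on):
        ready = sorted(b for b, deps in depends_on.items()
                       if b not in done and deps <= done)
        if not ready:
            raise ValueError("dependency cycle in batch table")
        waves.append(ready)
        done.update(ready)
    return waves
```

With the table as given, the result is one concurrent wave of Batches 1-3 followed by Batches 4, 5 and 6 in sequence.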
If a batch has more tasks than its max agents, split into sequential sub-batches of max agents each. All sub-batches within a batch are independent.
Batch 2 unbatching rule: Spawn one agent per P2/P3 task — never batch multiple modules into one agent. Each module gets its own investigation context for richer, more distinct findings that map cleanly to individual ### subsections during assembly. If total P2+P3 tasks exceed 15, use sequential sub-batches of 15.
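Splitting a batch into sequential sub-batches is plain chunking. A minimal sketch, with a hypothetical function name:

```python
def sub_batches(tasks: list, max_agents: int) -> list:
    """Split tasks into consecutive groups of at most max_agents each."""
    if max_agents < 1:
        raise ValueError("max_agents must be at least 1")
    return [tasks[i:i + max_agents] for i in range(0, len(tasks), max_agents)]
```

For example, 32 P2+P3 tasks with the cap of 15 give sub-batches of 15, 15 and 2 agents, each agent still handling exactly one module.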
Batch 2 elevated quality floor: Because each unbatched agent investigates a single module with full context, findings must be deeper than the default floor. Batch 2 sub-agent prompts MUST use these thresholds instead of the default:
The findings gate for findings/modules/ (excluding 00_calibration.md) uses the elevated floor.

P7 relocation rationale: Change impact analysis (Protocol E) moved from Batch 2 to Batch 5 because impact analysis reads reverse dependency data enriched by module deep-reads. Running P7 after Batches 1-4 ensures impact agents have full module findings on disk.
Multi-facet batch adjustment: If domain facets added investigation tasks to P4, Batch 3 may exceed its max of 6 agents. Split Batch 3 into sub-batches of 6 each. Facet investigation tasks have no dependencies on Batch 1-2 and can run in any Batch 3 sub-batch. The orchestrator must include the facet's grep_patterns in the Protocol C prompt for facet-specific cross-cutting investigations.
Step 2: Spawn sub-agents. For each task in a batch, spawn a sub-agent using the Agent tool. Before spawning each agent, apply the agent logging protocol: write the prompt to .deep_crawl/agent_logs/prompts/{task_id}.md and log the SPAWN event to orchestrator.log. After each agent completes, write the return text to .deep_crawl/agent_logs/results/{task_id}.md and log the DONE event. Each sub-agent prompt must be self-contained with these sections:
You are investigating [CODEBASE] at [ROOT_PATH] for an onboarding document.
## Your Task
[Specific task from crawl plan — e.g., "Trace the primary CLI entry point from invocation to terminal side effect"]
## Investigation Protocol
[Full text of the relevant protocol (A through F, per the batch table) copied verbatim from below]
## Evidence Standards
- [FACT]: Read specific code, cite file:line. Example: "retries 3x (payments.ts:89)" or "retries 3x (stripe.py:89)"
- [PATTERN]: Observed in >=3 examples, state count. Example: "DI via __init__ (12/14 services)"
- [ABSENCE]: Searched and confirmed non-existence. Example: "No rate limiting (grep — 0 hits)"
- Gotchas must be [FACT] claims with file:line.
- Never include inferences or unverified signals.
- ACTIVELY SEARCH for [ABSENCE] evidence. At each investigation step, ask: "what should exist here but doesn't?" Missing error handling, missing validation, missing tests, missing docs — these are high-value findings. Target: at least 1 [ABSENCE] per 5 [FACT] citations.
## Quality Floor (mechanically checked after you finish)
- Minimum 200 words
- Minimum 5 [FACT] citations
- Target density: >= 5.0 [FACT] per 100 words
If your file fails the check, you will be re-spawned to investigate deeper.
For format guidance, see .claude/skills/deep-crawl/configs/exemplar_templates.md.
For quality reference from this repo, see .deep_crawl/findings/calibration/cal_{type}.md.
## Output
Write findings to: [EXACT PATH — e.g., .deep_crawl/findings/traces/01_cli_run.md]
When done, write a sentinel: touch .deep_crawl/batch_status/[TASK_ID].done
## Constraints
- Read-only: never modify source code
- Do NOT spawn sub-agents yourself
- X-Ray output available at output/$REPO_NAME/data/xray.json and output/$REPO_NAME/xray.md for reference
Use run_in_background: true for all sub-agents within a batch to maximize parallelism. Note: If sub-agents run sequentially despite run_in_background: true, each sub-agent still gets a full context window for its investigation task, which is strictly better than sequential investigation in a single context.
Step 3: Monitor completion. After launching a batch, check for sentinel files:
ls .deep_crawl/batch_status/*.done 2>/dev/null | wc -l
When all expected sentinels exist, the batch is complete.
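Counting sentinels only tells you how many agents finished, not which ones. A sketch that compares the sentinel directory against the expected task IDs; `pending_tasks` is a hypothetical helper.

```python
from pathlib import Path


def pending_tasks(status_dir: str, expected_ids: list) -> list:
    """Task IDs from expected_ids that have no <id>.done sentinel yet."""
    # Path.stem strips only the final suffix, so "P1.1.done" yields "P1.1".
    done = {p.stem for p in Path(status_dir).glob("*.done")}
    return sorted(set(expected_ids) - done)
```

The batch is complete when `pending_tasks(".deep_crawl/batch_status", ids)` returns an empty list for that batch's IDs.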
Step 3a: SNAPSHOT (before spawning batch). Record existing findings files so the quality gate only checks new ones. Log the batch spawn event:
# === FINDINGS SNAPSHOT (before spawning batch) ===
ls .deep_crawl/findings/{traces,modules,cross_cutting,conventions,impact,playbooks}/*.md \
2>/dev/null | sort > .deep_crawl/_pre_batch_files.txt
# Log batch spawn
BATCH_AGENT_COUNT=$(echo "$BATCH_TASKS" | wc -w) # set BATCH_TASKS to the task list for this batch
echo "## Batch $BATCH_NUM: Spawning $BATCH_AGENT_COUNT agents" >> "$CRAWL_LOG"
echo " Tasks: $BATCH_TASKS" >> "$CRAWL_LOG"
echo " Time: $(date -Iseconds)" >> "$CRAWL_LOG"
Step 3b: FINDINGS QUALITY GATE (batch-scoped, mandatory after each batch). After confirming all sentinel files exist, check only files produced in this batch:
# === FINDINGS QUALITY GATE (batch-scoped) ===
ls .deep_crawl/findings/{traces,modules,cross_cutting,conventions,impact,playbooks}/*.md \
2>/dev/null | sort > .deep_crawl/_post_batch_files.txt
GATE_PASS=true
while IFS= read -r f; do
WORDS=$(wc -w < "$f")
# grep -c already prints 0 when nothing matches; "|| echo 0" would append a second 0
FACTS=$(grep -c '\[FACT' "$f" 2>/dev/null || true)
if [ "$WORDS" -lt 200 ] || [ "$FACTS" -lt 5 ]; then
echo "FAIL: $(basename $f) — ${WORDS}w, ${FACTS} FACT (min: 200w, 5 FACT)"
GATE_PASS=false
fi
done < <(comm -13 .deep_crawl/_pre_batch_files.txt .deep_crawl/_post_batch_files.txt)
# Log batch completion and gate result
echo " Completed: $(date -Iseconds)" >> "$CRAWL_LOG"
echo "## Findings Gate (Batch $BATCH_NUM): $([ "$GATE_PASS" = true ] && echo PASS || echo FAIL)" >> "$CRAWL_LOG"
BATCH_PASSED=0; BATCH_TOTAL=0
while IFS= read -r f; do
AGENT_ID=$(basename "$f" .md)
WORDS=$(wc -w < "$f"); FACTS=$(grep -c '\[FACT' "$f" 2>/dev/null || true)
echo " $(basename $f): ${WORDS}w, ${FACTS} FACT" >> "$CRAWL_LOG"
BATCH_TOTAL=$((BATCH_TOTAL + 1))
if [ "$WORDS" -ge 200 ] && [ "$FACTS" -ge 5 ]; then
echo "$(date -Iseconds) GATE $AGENT_ID PASS words=$WORDS facts=$FACTS" >> "$ORCH_LOG"
BATCH_PASSED=$((BATCH_PASSED + 1))
else
echo "$(date -Iseconds) GATE $AGENT_ID FAIL words=$WORDS facts=$FACTS" >> "$ORCH_LOG"
fi
done < <(comm -13 .deep_crawl/_pre_batch_files.txt .deep_crawl/_post_batch_files.txt)
echo "$(date -Iseconds) BATCH_DONE batch${BATCH_NUM} passed=${BATCH_PASSED}/${BATCH_TOTAL}" >> "$ORCH_LOG"
echo "" >> "$CRAWL_LOG"
# If any file fails: re-spawn the failing sub-agent with corrective instructions.
# Do NOT proceed to next batch until all findings files pass.
Thresholds are defined in .claude/skills/deep-crawl/configs/quality_gates.json under investigation_findings. If a file fails, log the re-spawn event and re-spawn the sub-agent:
echo " RE-SPAWN: $(basename $f .md) — reason: findings gate failed (${WORDS}w < 200w min, ${FACTS} < 5 FACT min)" >> "$CRAWL_LOG"
echo "$(date -Iseconds) RESPAWN $(basename $f .md) reason=\"density ${WORDS}w/${FACTS}FACT below floor\"" >> "$ORCH_LOG"
Re-spawn with: "Your output had {WORDS}w and {FACTS} [FACT] citations. Minimum is 200w and 5 [FACT]. Investigate deeper — read more source files, trace more hops, grep for more patterns."
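The gate and the corrective prompt can be sketched together. Note one assumption that differs from the shell version: this sketch counts every [FACT occurrence, whereas grep -c counts matching lines. The function names are hypothetical; the default thresholds are the 200-word, 5-[FACT] floor stated above.

```python
import re


def check_findings(text: str, min_words: int = 200, min_facts: int = 5):
    """Return (passed, words, facts) for one findings file."""
    words = len(text.split())
    facts = len(re.findall(r"\[FACT", text))
    return words >= min_words and facts >= min_facts, words, facts


def respawn_prompt(words: int, facts: int, min_words: int = 200, min_facts: int = 5) -> str:
    """Corrective instruction for a sub-agent whose output failed the gate."""
    return (
        f"Your output had {words}w and {facts} [FACT] citations. "
        f"Minimum is {min_words}w and {min_facts} [FACT]. Investigate deeper."
    )
```

Passing the Batch 2 or calibration thresholds as `min_words` and `min_facts` would reuse the same check for the elevated floors.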
Step 4: Checkpoint (mandatory). After each batch completes and passes the findings quality gate, update CRAWL_PLAN.md to mark completed tasks with [x]. Sub-agents never write to CRAWL_PLAN.md (concurrent writes would corrupt it).
# Mark completed tasks for this batch. The orchestrator does this, never sub-agents.
for sentinel in .deep_crawl/batch_status/*.done; do
[ -e "$sentinel" ] || continue
TASK_ID=$(basename "$sentinel" .done)
# Escape dots and require a non-digit after the ID so P1.1 does not also mark P1.10
TASK_RE=$(printf '%s' "$TASK_ID" | sed 's/\./\\./g')
sed -i -E "s/^- \[ \] ${TASK_RE}([^0-9]|\$)/- [x] ${TASK_ID}\1/" .deep_crawl/CRAWL_PLAN.md
done
COMPLETED=$(grep -c '^\- \[x\]' .deep_crawl/CRAWL_PLAN.md)
echo "Checkpoint: $COMPLETED tasks completed"
echo "## Checkpoint (Batch $BATCH_NUM): $COMPLETED total completed" >> "$CRAWL_LOG"
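A sketch of the checkpoint update with exact task-ID matching, so that completing P1.1 does not also tick P1.10. It operates on the plan text in memory; `mark_done` is a hypothetical helper, and the `- [ ] <task id>` line shape is the one the checkpoint script above assumes.

```python
import re


def mark_done(plan_text: str, task_ids: list) -> str:
    """Flip '- [ ] <id>' to '- [x] <id>' for each completed task ID."""
    for task_id in task_ids:
        # re.escape treats the dot literally; the lookahead rejects longer numeric IDs.
        pattern = re.compile(
            r"^- \[ \] " + re.escape(task_id) + r"(?![0-9])", re.MULTILINE
        )
        plan_text = pattern.sub(lambda m, tid=task_id: "- [x] " + tid, plan_text)
    return plan_text
```

Keeping this as a pure function on the text makes it easy to have a single writer, which is the reason sub-agents never touch CRAWL_PLAN.md.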
Step 5: Handle failures. After a batch completes:
Step 6: Coverage check. After all batches complete, verify the stopping criteria and coverage checks below. If coverage is insufficient, spawn Batch 5 with targeted gap-filling tasks.
If the Agent tool is unavailable or all Batch 1 spawns fail, execute the crawl plan sequentially using the protocols below directly. This produces shallower results but ensures the pipeline completes. Log a warning: "Running in sequential fallback mode — investigation depth will be reduced."
Sub-agents receive these protocol instructions verbatim:
1. Read the entry point function (full source)
2. Identify the first call to another module
3. Read that function (full source)
4. Repeat until terminal side effect (no hop limit — follow the full chain)
5. Record in .deep_crawl/findings/traces/{NN}_{name}.md:
entry_function (file:line)
→ called_function (file:line) — what it does in 1 sentence
→ next_function (file:line) — what it does in 1 sentence
→ [SIDE EFFECT: db.commit] (file:line)
6. Note branching (error paths, conditional logic) at each hop
7. Note data transformations between hops
8. Note gotchas discovered during tracing
9. [ABSENCE] check: at each hop, note expected-but-missing elements — missing error handling, missing validation, missing logging, missing auth checks, missing null checks. Tag each as [ABSENCE] with grep confirmation.
**TypeScript-specific trace guidance:**
When tracing through TypeScript/JavaScript codebases:
- **Async boundaries:** Note where sync→async transitions occur (missing `await`, `.then()` chains). Flag any function that starts a Promise chain without error handling.
- **Middleware chains:** In Express/Fastify/Koa, trace through the middleware stack in order. Note where `next()` is called, where it's skipped (early return), and where error middleware breaks the chain.
- **DI resolution:** In NestJS/Angular, trace from `@Injectable()` declaration through `@Inject()` sites. Note lifecycle hooks (`onModuleInit`, `onApplicationBootstrap`) that run at startup.
- **Barrel file hops:** When a trace passes through an `index.ts` that only re-exports, note this but don't count it as a meaningful hop. The real target is the source module.
- **Type narrowing at branches:** Note where `if (x instanceof Foo)` or discriminated union checks (`if (x.type === 'bar')`) gate code paths — these are the real branching points.
- **Event-driven flows:** When the trace reaches `emit()`, `on()`, or `subscribe()`, follow the event to ALL listeners. Note if multiple handlers exist for the same event.
1. Read the entire module
2. Write to .deep_crawl/findings/modules/{module_name}.md:
   - What this module does (1-2 sentences of BEHAVIOR)
   - What's non-obvious about it
   - What breaks if you change it (blast radius)
   - What it depends on at runtime (not just imports)
   - For each public function/method:
     - 1-sentence behavioral description
     - Preconditions
     - Side effects
     - Error behavior
2b. [ABSENCE] check: for each public function, note expected-but-missing elements: missing docstring on complex method, missing type hints on public API, missing validation on inputs from external boundaries, missing error handling on I/O operations. Tag each as [ABSENCE].
2c. Check for corresponding test files:
   - Python: Look for tests/test_{module_name}.py, tests/unit/**/test_{module_name}.py,
     tests/{module_name}_test.py, and any test file importing from this module
   - TypeScript/JavaScript: Look for {module_name}.test.ts, {module_name}.spec.ts,
     __tests__/{module_name}.ts, and in-source tests via `import.meta.vitest` blocks
     within the module itself (colocated testing pattern)
   - If found, scan test function names and docstrings/descriptions
   - Note which public functions have test coverage and which don't
   - Add to findings:
       Test coverage: {tested_count}/{total_public} public functions tested.
       Tested: {list}. Untested: {list}.
       Test file: {path} ({N} test functions)
3. Record any gotchas
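The Python half of the test-file lookup in step 2c can be sketched as a single function. This is an illustrative sketch only: `find_python_tests` is a hypothetical helper name, it assumes tests live under a top-level `tests/` directory, and the import match is a rough line-based heuristic that relies on GNU grep.

```shell
# find_python_tests ROOT MODULE_NAME
# Hypothetical helper: prints candidate test files for a module, one per line.
find_python_tests() {
  root=$1
  mod=$2
  {
    # Name-based candidates: test_{module}.py at any depth under tests/, or {module}_test.py
    find "$root/tests" -type f \( -name "test_${mod}.py" -o -name "${mod}_test.py" \) 2>/dev/null
    # Import-based candidates: test files with an import line that mentions the module
    grep -rlE "^(from|import) .*\b${mod}\b" \
      --include="test_*.py" --include="*_test.py" "$root" 2>/dev/null
  } | sort -u
}
```

The resulting list feeds the "Test file: {path}" line in the findings; deciding which public functions are actually covered still requires reading the test names.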
**TypeScript-specific module investigation:**
When deep-reading a TypeScript module, also document:
- **Exported types:** List all `export interface`, `export type`, `export enum` — these are the module's contract. Note which are used externally (via xray import graph).
- **Decorator metadata:** For classes with decorators (`@Entity`, `@Injectable`, `@Controller`), extract the decorator arguments — they encode runtime configuration that isn't visible from the type signature alone.
- **Generic constraints:** Note generic type parameters with constraints (`T extends SomeBase`) — these restrict what callers can pass and are a common source of type errors when changed.
- **`any` escape hatches:** Count `as any`, `@ts-ignore`, `@ts-expect-error` in the module. Each is a place where the type system has been deliberately bypassed — potential bug hiding spots.
- **Re-export surface:** If the module is a barrel file (mostly re-exports), note what it re-exports and whether it adds any transformation.
1. Grep for relevant patterns across the entire codebase
2. Categorize the results
3. Read representative examples in full — at least max(3, result_count / 5) examples, sampling from different subsystems
4. Identify the dominant strategy
5. Flag deviations from the dominant strategy
6. [ABSENCE] check: for the concern being investigated, grep for expected-but-missing implementations. E.g., for error handling: modules with no try/except around I/O; for config: env vars read without defaults; for auth: routes without auth middleware. Tag each as [ABSENCE] with grep evidence.
7. Write to .deep_crawl/findings/cross_cutting/{concern_name}.md
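The sampling rule in step 3, max(3, result_count / 5), is simple integer arithmetic. A small sketch follows, assuming the division rounds down (the protocol does not say) and using a hypothetical helper name, `min_examples`, that takes the floor and divisor as parameters.

```shell
# min_examples FLOOR DIVISOR COUNT
# Hypothetical helper: prints max(FLOOR, COUNT / DIVISOR).
# Integer division rounds down, which is an assumption about the intended rule.
min_examples() {
  floor=$1
  divisor=$2
  count=$3
  scaled=$((count / divisor))
  if [ "$scaled" -gt "$floor" ]; then
    echo "$scaled"
  else
    echo "$floor"
  fi
}
```

For 40 grep hits, `min_examples 3 5 40` prints 8; for 7 hits it prints the floor of 3.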
Grep patterns for common concerns (language-adaptive):
Use $CODE_LANG from Phase 0b pre-flight to select patterns. All grep commands should target $DEEP_CRAWL_ROOT.
Python patterns (use when CODE_LANG=python):
grep -rn "except " --include="*.py" "$DEEP_CRAWL_ROOT" | head -40
grep -rn "except.*pass" --include="*.py" "$DEEP_CRAWL_ROOT"
grep -rn "retry\|backoff\|fallback" --include="*.py" "$DEEP_CRAWL_ROOT"
grep -rn "os.getenv\|os.environ" --include="*.py" "$DEEP_CRAWL_ROOT"
grep -rn "config\[" --include="*.py" "$DEEP_CRAWL_ROOT"
grep -rn "^[a-z_].*= " --include="*.py" "$DEEP_CRAWL_ROOT" | grep -v "def \|class \|#\|import " | head -30
grep -rn "_instance\|_cache\|_registry\|_pool" --include="*.py" "$DEEP_CRAWL_ROOT"
grep -rn "global " --include="*.py" "$DEEP_CRAWL_ROOT"
grep -rn "asyncio\.run\|loop\.run_until_complete" --include="*.py" "$DEEP_CRAWL_ROOT"
grep -rn "async def " --include="*.py" "$DEEP_CRAWL_ROOT" | wc -l
grep -rn "pickle\.loads\|pickle\.load\|yaml\.load\|marshal\.loads" --include="*.py" "$DEEP_CRAWL_ROOT"
grep -rn "class.*Exception\|class.*Error" --include="*.py" "$DEEP_CRAWL_ROOT"
grep -rn "except.*pass\|except:$" --include="*.py" "$DEEP_CRAWL_ROOT"
grep -rn "raise " --include="*.py" "$DEEP_CRAWL_ROOT" | head -40
TypeScript/JavaScript patterns (use when CODE_LANG=typescript):
grep -rn "catch" --include="*.ts" --include="*.tsx" "$DEEP_CRAWL_ROOT" | grep -v node_modules | head -40
grep -rnE "catch\s*(\([^)]*\))?\s*\{\s*\}" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules # empty catch, with or without a binding: high value
grep -rn "retry\|backoff\|fallback" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules
grep -rn "process\.env" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules | head -30
grep -rn "config\.\|getConfig\|loadConfig" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules | head -20
grep -rn "^let \|^var " --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules | head -30
grep -rn "cache\|singleton\|_instance\|getInstance" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules | head -20
grep -rn "async function\|async (" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules | wc -l
grep -rn "new Promise\|Promise\.all" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules | head -20
grep -rn "eval(\|new Function(" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules
grep -rn "JSON\.parse" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules | head -20
grep -rn "as any\|@ts-ignore\|@ts-expect-error" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules | head -20
grep -rn "extends Error" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules
grep -rn "throw " --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules | head -40
grep -rn "process\.exit" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules
grep -rn "readFile\|readdir\|createReadStream" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules | head -20
TypeScript deep patterns (run after the basic patterns above when time allows):
# Promise error handling gaps
grep -rn "\.then(" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules | grep -v "\.catch" | head -20 # .then() without .catch()
grep -rn "Promise\.all\b" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules # bulk operations — check error handling
grep -rn "new Promise" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules | head -15 # manual Promise construction — often buggy
# Barrel file analysis (circular import risk)
for f in $(find "$DEEP_CRAWL_ROOT" -name "index.ts" -not -path "*/node_modules/*" 2>/dev/null); do
  # grep -c prints 0 and exits non-zero on no match, so assign the fallback separately
  EXPORTS=$(grep -c "^export" "$f" 2>/dev/null) || EXPORTS=0
  LOGIC=$(grep -cvE "^export|^import|^$|^//" "$f" 2>/dev/null) || LOGIC=0
  if [ "$EXPORTS" -gt 3 ] && [ "$LOGIC" -lt 5 ]; then echo "BARREL: $f ($EXPORTS re-exports)"; fi
done
# Type safety escape hatches (aggregate)
echo "=== Type safety signals ==="
echo "as any: $(grep -rn 'as any' --include='*.ts' "$DEEP_CRAWL_ROOT" | grep -v node_modules | wc -l)"
echo "@ts-ignore: $(grep -rn '@ts-ignore' --include='*.ts' "$DEEP_CRAWL_ROOT" | grep -v node_modules | wc -l)"
echo "@ts-expect-error: $(grep -rn '@ts-expect-error' --include='*.ts' "$DEEP_CRAWL_ROOT" | grep -v node_modules | wc -l)"
echo "non-null assertion (!.): $(grep -rn '!\.' --include='*.ts' "$DEEP_CRAWL_ROOT" | grep -v node_modules | wc -l)"
# AbortController / cancellation patterns
grep -rn "AbortController\|AbortSignal\|signal:" --include="*.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules | head -10
# Module augmentation (global type patching — coupling risk)
grep -rn "declare module\|declare global" --include="*.ts" --include="*.d.ts" "$DEEP_CRAWL_ROOT" | grep -v node_modules
# Workspace/monorepo structure
ls "$DEEP_CRAWL_ROOT"/packages/*/package.json 2>/dev/null && echo "MONOREPO: packages/* detected"
ls "$DEEP_CRAWL_ROOT"/apps/*/package.json 2>/dev/null && echo "MONOREPO: apps/* detected"
grep -A5 '"workspaces"' "$DEEP_CRAWL_ROOT/package.json" 2>/dev/null | head -8
Exception taxonomy investigations (Protocol C):
When the crawl plan includes an exception taxonomy task, Protocol C agents should:
- Silent failure patterns: `except: pass` (Python), empty `catch {}` (TypeScript), log-and-swallow. These are high-value gotcha candidates.
1. Read examples of the pattern from xray pillar/hotspot list: at least max(5, pillar_count / 3) examples, covering different architectural layers
2. Identify the common structure
3. State the convention as a directive ("always X", "never Y")
4. Read each flagged deviation
5. Assess: intentional variation or oversight?
6. [ABSENCE] check: for each convention identified, grep for expected adherence in untested modules. Note modules that should follow the convention but don't. Tag as [ABSENCE].
7. Write to .deep_crawl/findings/conventions/patterns.md
**TypeScript-specific conventions to document:**
When documenting conventions for TypeScript codebases, check for these patterns:
- **Import organization:** Are imports grouped (external → internal → types)? Are path aliases used consistently? Are barrel files the standard import target?
- **Null safety style:** Optional chaining (`?.`) vs explicit null checks? Non-null assertions (`!`) — banned or accepted?
- **Error class hierarchy:** One base `AppError` or ad-hoc `throw new Error()`? Error codes or just messages?
- **Async patterns:** Always `async/await` or mixed with `.then()`? Raw Promises or wrapped (e.g., `Result<T, E>` monads)?
- **Type annotation style:** Return types always explicit or inferred? `interface` vs `type` preference? `unknown` vs `any` for untyped data?
- **Component patterns (if React/Vue):** Function components vs class? Props interface naming (`FooProps`)? Hook extraction conventions?
- **Decorator usage (if NestJS/Angular):** Custom decorator conventions? Metadata patterns?
1. Read xray reverse dependency data for the assigned hub module cluster:
   - .imports.graph[module]["imported_by"]: list of importer modules
   - .imports.distances.hub_modules: sorted by connection count
   - .calls.reverse_lookup[function]: {callers, caller_count, module_count, impact_rating}
   - .calls.high_impact: pre-filtered high-impact functions
2. For each hub module in the cluster, read 2-3 representative callers (full source)
3. Identify:
   - Which callers depend on specific function signatures
   - Which callers depend on specific return value shapes
   - Which callers depend on specific side effects or state mutations
4. Classify changes as safe vs dangerous:
   - Safe: internal refactoring, adding optional parameters, performance changes
   - Dangerous: signature changes, return type changes, side effect changes, exception changes
5. [ABSENCE] check: for each hub module, note missing safeguards: callers that don't handle the hub's error cases, importers with no tests covering their usage of the hub, type contracts that aren't enforced at the boundary. Tag as [ABSENCE].
6. Write to .deep_crawl/findings/impact/{cluster_name}.md:
   For each hub module:
   - Module name and importer count
   - High-impact functions with caller counts and impact ratings
   - Signature-change consequences (which callers break and how)
   - Behavior-change consequences (which callers produce wrong results)
   - Safe changes list
   - Dangerous changes list with specific blast radius
1. Read Protocol E impact findings for context on hub modules and blast radii
2. Read Protocol B module findings for behavioral detail
3. Read conventions findings for coding patterns to follow
4. Derive scenarios from domain profile's primary_entity + Extension Points findings:
   - "Add new {primary_entity}" (always include this scenario)
   - "Modify existing {primary_entity} behavior" (always include; traces the change path through existing wiring, identifies what to update vs what to leave alone)
   - "Change a data model field" (always include if >=3 domain entities detected; shows the full propagation: model → serialization → consumers → tests → migrations)
   - "Modify {top hub module} behavior" (top 3 hub modules by connection count)
   - "Add new external dependency" (if >3 external deps detected)
5. For each scenario, build a step-by-step checklist:
   - Ordered steps with file:line targets
   - What to create/modify at each step
   - Validation commands (test commands, grep checks)
   - Common mistakes (derived from gotchas and impact analysis)
   - [ABSENCE] check: for each step, note missing safeguards: steps where there's no test to verify correctness, no migration script, no rollback path. Tag as [ABSENCE].
6. Write to .deep_crawl/findings/playbooks/{scenario_name}.md
Checkpoint discipline: After each batch completes, update CRAWL_PLAN.md to mark completed tasks. In sequential fallback mode, update after every 5 completed tasks.
When to stop crawling — ALL must be true:
Coverage check — ALL must be true: (Note: This check verifies the Phase 1 coverage pre-check was effective. If gaps remain, the pre-check heuristic needs improvement.)
echo "## Phase 3: ASSEMBLE — $(date -Iseconds)" >> "$CRAWL_LOG"
echo "$(date -Iseconds) PHASE 3 ASSEMBLE" >> "$ORCH_LOG"
Phase 3 delegates assembly to parallel sub-agents, each producing specific template sections from their assigned subset of findings. The orchestrator's role is: spawn agents, monitor sentinels, concatenate section files. Do NOT read findings content yourself.
Phase 4 may merge Tier 3-4 entries that describe the same fact, but must not drop distinct findings. When in doubt, keep both. Phase 4 may NEVER cut Tier 1-2 content.
cat .deep_crawl/findings/calibration/*.md \
.deep_crawl/findings/traces/*.md \
.deep_crawl/findings/modules/*.md \
.deep_crawl/findings/cross_cutting/*.md \
.deep_crawl/findings/conventions/*.md \
.deep_crawl/findings/impact/*.md \
.deep_crawl/findings/playbooks/*.md \
> .deep_crawl/SYNTHESIS_INPUT.md 2>/dev/null
wc -w .deep_crawl/SYNTHESIS_INPUT.md
Record the total word count — you will use it to verify retention in Step 8.
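The retention comparison that this word count feeds can be sketched as a ratio of two `wc -w` results. The helper name `retention_pct` is hypothetical, and which output file Step 8 compares against is not shown in this section, so both paths are left as parameters.

```shell
# retention_pct INPUT_FILE OUTPUT_FILE
# Hypothetical helper: prints output words as an integer percentage of input words.
# Returns 1 (and prints 0) when the input file has no words.
retention_pct() {
  in_words=$(($(wc -w < "$1")))    # arithmetic expansion strips any padding from wc
  out_words=$(($(wc -w < "$2")))
  if [ "$in_words" -le 0 ]; then
    echo 0
    return 1
  fi
  echo $((out_words * 100 / in_words))
}
```

Called with `.deep_crawl/SYNTHESIS_INPUT.md` as the first argument, the printed percentage can be compared against whatever floor the retention step prescribes.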
No sub-agent needed. Extract formal gotcha sections from all findings so S5 has them consolidated:
for f in .deep_crawl/findings/{traces,modules,cross_cutting,conventions,impact,playbooks}/*.md; do
  [ -f "$f" ] || continue  # skip unmatched globs when a category has no findings
  gotcha_line=$(grep -n "^## Gotcha\|^### Gotcha" "$f" | head -1 | cut -d: -f1)
  if [ -n "$gotcha_line" ]; then
    echo "### From: $(basename "$f" .md)"
    tail -n +"$gotcha_line" "$f"
    echo ""
  fi
done > .deep_crawl/sections/gotcha_extracts.md
No sub-agent needed. Mechanically grep trace findings for state transition patterns and generate mermaid state diagrams if found:
# Look for state transition patterns in trace findings
STATE_PATTERNS=$(grep -rn "state\|status\|phase\|stage\|transition\|→.*→" \
.deep_crawl/findings/traces/*.md 2>/dev/null | head -20)
if [ -n "$STATE_PATTERNS" ]; then
echo "## State Diagrams" > .deep_crawl/sections/state_diagrams.md
echo "" >> .deep_crawl/sections/state_diagrams.md
echo "State transitions extracted from request traces:" >> .deep_crawl/sections/state_diagrams.md
echo "" >> .deep_crawl/sections/state_diagrams.md
echo '```mermaid' >> .deep_crawl/sections/state_diagrams.md
echo "stateDiagram-v2" >> .deep_crawl/sections/state_diagrams.md
# Assembly agent S1 will populate the actual transitions
echo " [*] --> TODO_POPULATE_FROM_TRACES" >> .deep_crawl/sections/state_diagrams.md
echo '```' >> .deep_crawl/sections/state_diagrams.md
echo "State patterns found — S1 assembly agent will include in Critical Paths section header."
else
echo "No state transition patterns found in traces."
touch .deep_crawl/sections/state_diagrams.md # empty file
fi
If state_diagrams.md has content, assembly agent S1 includes it at the top of the Critical Paths section.
Sub-agents spawn with fresh context and read CLAUDE.md files automatically. If the project's CLAUDE.md references "compressed intelligence" or similar framing, sub-agents absorb "my job is compression" before reading their task prompt. Temporarily rename CLAUDE.md files to prevent this:
# Rename project CLAUDE.md (and parent directory's if it exists)
ROOT_PATH="${DEEP_CRAWL_ROOT:-$(pwd)}"
[ -f "$ROOT_PATH/CLAUDE.md" ] && mv "$ROOT_PATH/CLAUDE.md" "$ROOT_PATH/CLAUDE.md.assembly_save"
PARENT_PATH=$(dirname "$ROOT_PATH")
[ -f "$PARENT_PATH/CLAUDE.md" ] && mv "$PARENT_PATH/CLAUDE.md" "$PARENT_PATH/CLAUDE.md.assembly_save"
IMPORTANT: Restore these files after all sub-agents complete (after Step 9). See Step 9 completion.
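The restore is the mirror image of the rename. The actual Step 9 commands are not shown in this section, so the following is only a sketch under that assumption, with a hypothetical function name `restore_claude_md`.

```shell
# restore_claude_md ROOT_PATH
# Hypothetical helper: moves CLAUDE.md.assembly_save back to CLAUDE.md
# in ROOT_PATH and in its parent directory, wherever a saved copy exists.
restore_claude_md() {
  root=$1
  for dir in "$root" "$(dirname "$root")"; do
    if [ -f "$dir/CLAUDE.md.assembly_save" ]; then
      mv "$dir/CLAUDE.md.assembly_save" "$dir/CLAUDE.md"
    fi
  done
}
```

Calling it as `restore_claude_md "${DEEP_CRAWL_ROOT:-$(pwd)}"` covers the same two directories the rename touched. It is safe to run more than once, because a directory without a saved copy is skipped.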
Before spawning assembly agents, mechanically generate a skeleton of prescribed ### headers from investigation findings. This skeleton locks section structure — assembly agents fill content under prescribed headers but cannot merge or remove them.
# === STRUCTURAL SKELETON GENERATION ===
SKEL=".deep_crawl/sections/_skeleton.md"
echo "# Assembly Skeleton — Prescribed Section Structure" > $SKEL
echo "# Assembly agents MUST produce every ### listed below for their section." >> $SKEL
echo "" >> $SKEL
# S1: Critical Paths — one ### per trace finding
echo "## S1: Critical Paths" >> $SKEL
N=1
for f in .deep_crawl/findings/traces/*.md; do
[ -f "$f" ] || continue
[[ "$(basename "$f")" == "00_calibration.md" ]] && continue
TITLE=$(head -1 "$f" | sed 's/^#* *//' | cut -c1-80)
echo "### Path $N: $TITLE" >> $SKEL
N=$((N+1))
done
echo "" >> $SKEL
# S2: Module Behavioral Index — one ### per module finding
echo "## S2: Module Behavioral Index" >> $SKEL
for f in .deep_crawl/findings/modules/*.md; do
[ -f "$f" ] || continue
[[ "$(basename "$f")" == "00_calibration.md" ]] && continue
# Extract module path from first line or filename
MOD=$(head -3 "$f" | grep -oP '`[^`]+\.py`' | head -1)
[ -z "$MOD" ] && MOD="\`$(basename "$f" .md).py\`"
echo "### $MOD -- Detailed Behavioral Analysis" >> $SKEL
done
echo "" >> $SKEL
# S3b: Error Handling — one ### per strategy/pattern in cross-cutting findings
echo "## S3b: Error Handling" >> $SKEL
if [ -f .deep_crawl/findings/cross_cutting/error_handling.md ]; then
grep '^## \|^### ' .deep_crawl/findings/cross_cutting/error_handling.md \
| sed 's/^## /### /' | sed 's/^### ### /### /' >> $SKEL
fi
echo "" >> $SKEL
# S5: Gotchas — domain cluster headers from module directory groupings
echo "## S5: Gotchas" >> $SKEL
ls .deep_crawl/findings/modules/*.md 2>/dev/null \
| xargs -I{} head -3 {} \
| grep -oP '`[^`/]+/' \
| sort -u | tr -d '`' \
| while read -r pkg; do
# Convert package dir to readable cluster name
CLUSTER=$(echo "$pkg" | sed 's|/$||' | sed 's|_| |g' | sed 's|\b\(.\)|\u\1|g')
echo "### ${CLUSTER} Gotchas" >> $SKEL
done
# Always include a cross-cutting cluster
echo "### Cross-Cutting Gotchas" >> $SKEL
echo "" >> $SKEL
echo "Skeleton generated: $(grep -c '^### ' $SKEL) subsections prescribed"
The skeleton is not equally binding for every section. S1 and S2 MUST follow it exactly (one ### per finding file). S3b and S5 use it as guidance: they may adjust ### headers based on content but should maintain similar granularity.
test -f .deep_crawl/sections/_skeleton.md || { echo "HALT: Skeleton not generated. Run Step 1c first."; exit 1; }
SKEL_H3=$(grep -c "^### " .deep_crawl/sections/_skeleton.md)
echo "Skeleton has $SKEL_H3 prescribed subsections (min: 25)"
[ "$SKEL_H3" -lt 25 ] && echo "WARN: Skeleton has fewer than 25 subsections — investigation may be shallow"
Do NOT proceed to Step 2 if the skeleton is missing. This prevents assembly agents from producing unstructured output.
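Once an assembly agent has written its section, the exact-match rule for S1 and S2 can be checked mechanically against the skeleton. The sketch below assumes the skeleton layout generated above (a `## ` line per section followed by its `### ` lines); `missing_headers` is a hypothetical helper name.

```shell
# missing_headers SKELETON SECTION_TITLE OUTPUT_FILE
# Hypothetical helper: prints every "### " header prescribed under
# "## SECTION_TITLE" in the skeleton that is not a verbatim line in OUTPUT_FILE.
missing_headers() {
  skel=$1
  section=$2
  out=$3
  awk -v want="## $section" '
    /^## /                { in_section = ($0 == want) }   # track which section we are in
    in_section && /^### / { print }                       # emit its prescribed headers
  ' "$skel" | while IFS= read -r header; do
    grep -qxF -e "$header" "$out" || echo "$header"
  done
}
```

An empty result from `missing_headers .deep_crawl/sections/_skeleton.md "S1: Critical Paths" .deep_crawl/sections/critical_paths.md` means every prescribed path header is present.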
Before spawning each assembly agent, apply the agent logging protocol: write the prompt to .deep_crawl/agent_logs/prompts/S{N}.md and log the SPAWN event. After each assembly agent completes, write the return text to .deep_crawl/agent_logs/results/S{N}.md and log the DONE event with output file stats.
Launch 5 assembly sub-agents simultaneously (all with run_in_background: true):
| Agent | Sections | Input Files | Est. Input |
|---|---|---|---|
| S1 | Critical Paths | findings/traces/*.md | ~11K words |
| S2 | Module Behavioral Index, Domain Glossary | findings/modules/*.md | ~25K words (use chunked writes — see protocol below) |
| S3a | Key Interfaces | findings/modules/*.md (public API extraction) + findings/cross_cutting/agent_communication.md | ~12K words |
| S3b | Error Handling, Shared State | findings/cross_cutting/{error_handling,initialization,shared_state,database_storage,async_boundaries,exception_taxonomy}.md + findings/modules/*.md (grep for error/exception/retry patterns) | ~16K words |
| S4 | Configuration Surface, Conventions | findings/cross_cutting/{configuration,env_dependencies,llm_providers}.md + findings/modules/config.md + findings/conventions/*.md | ~16K words |
Note: The file lists above are illustrative examples. Adjust filenames to match actual findings on disk. Use ls .deep_crawl/findings/{category}/ to discover actual filenames before constructing prompts.
Each sub-agent prompt must be self-contained:
You are assembling investigation findings into onboarding document sections. Include every finding — do not summarize, condense, or compress.
## Your Task
Produce these template sections: [SECTION NAMES]
## Input
Read ONLY these files: [FILE PATHS]
## Template Format
[Copy the relevant section templates verbatim from .claude/skills/deep-crawl/templates/DEEP_ONBOARD.md.template]
## Structural Contract
Read .deep_crawl/sections/_skeleton.md for your section's prescribed ### headers.
- For Module Behavioral Index and Critical Paths: each prescribed ### MUST appear in your output exactly as listed. Each findings file maps to one ###. Do not merge.
- For Error Handling and Gotchas: use the skeleton's ### headers as guidance. Maintain similar granularity but adjust header wording based on actual content.
- For other sections: no skeleton constraint — derive structure from content.
- You MAY add additional ### subsections beyond the skeleton's prescriptions.
## Section Hierarchy Rule
Your output MUST use exactly one `## ` header per template section assigned to you (e.g., `## Error Handling Strategy`, `## Module Behavioral Index`). All subdivisions within a section MUST use `### ` or deeper. Never promote subsection content to `## ` level — a flat document with many `## ` headers destroys navigability. If your template section has subsections, they are `### `. If those subsections have further divisions, they are `#### `.
## S2 Historical Risk Annotation
If you are S2 (Module Behavioral Index): for each module ### being assembled, check if it appears in `git.risk`, `git.function_churn`, or `git.velocity` from `output/$REPO_NAME/data/xray.json`. If it does, append a brief **Historical Risk** note at the end of that module's subsection with: risk score, most-volatile functions, and trend direction. Format: `> **Git risk:** 0.88 — volatile functions: load_config (8 commits, 2 hotfixes). Trend: stable.`
## S3b Depth Directive
If you are S3b (Error Handling, Shared State): the Error Handling Strategy section must document the dominant error pattern, per-subsystem deviations, retry strategies, exception hierarchies, and recovery paths. Target: >= 3,500 words for Error Handling alone. Read module findings for error/exception/retry patterns in addition to cross-cutting findings — every module's error handling deviations contribute to this section. If `findings/cross_cutting/exception_taxonomy.md` exists, include it as a subsection under Error Handling: inheritance tree of custom exceptions, raise/catch mapping, uncaught paths, and silent failure patterns (`except: pass`, bare except, log-and-swallow). Silent failures are high-value gotcha candidates — flag them in your gotchas output file.
## Rules
- INCLUDE EVERY FINDING. The output context window is 1M tokens. Your section will consume less than 5% of available context. There is no reason to drop, summarize, or condense any finding.
- The ONLY reason to exclude a finding is if it is literally stated elsewhere in your section or is derivable from the module's file name alone.
- Place each finding at full fidelity with its original [FACT], [PATTERN], or [ABSENCE] tag and file:line citation.
- When multiple findings describe the same code from different angles, include ALL of them — each angle provides context the others miss.
- Write conventions as directives — "Always X", "Never Y".
- Your output should be 80-100% of your input word count. If you produce less than 80%, you have dropped findings. Re-read your input and find what you missed.
## Quality Floor (mechanically checked after you finish)
- Citation density floor varies by section type — see quality_gates.json density_tiers.
High-evidence sections (impact, gotchas, contracts): 3.0/100w.
Medium (module index, interfaces, playbooks): 2.0/100w.
Narrative (critical paths, conventions): 1.0/100w.
Structural (glossary, reading order): no floor.
- Word count: >= 80% of your input findings word count
If your section fails, you will be re-spawned.
For format guidance, see .claude/skills/deep-crawl/configs/exemplar_templates.md.
For quality reference from this repo, see .deep_crawl/findings/calibration/cal_{type}.md.
## Secondary Output
Collect every gotcha/warning/danger note encountered → write to:
.deep_crawl/sections/gotchas_from_S{N}.md
## Output
Write each section to its own file: .deep_crawl/sections/{section_name}.md
(e.g., critical_paths.md, module_index.md, domain_glossary.md)
Sentinel: touch .deep_crawl/sections/S{N}.done
## Large Output — Chunked Write Protocol
If your assembled section exceeds ~25,000 words, you MUST write it in chunks to avoid
hitting output token limits:
1. Write the first ~20,000 words to `.deep_crawl/sections/{section_name}.md` (creates the file)
2. Write each subsequent ~20,000 word chunk by reading back the current file content's
last few lines, then appending via a Bash command:
`cat >> .deep_crawl/sections/{section_name}.md << 'CHUNK_EOF'`
followed by the chunk content and `CHUNK_EOF`
3. After all chunks are written, touch the sentinel file
This is critical for S2 (Module Behavioral Index) which often exceeds 25K words on large repos.
Do NOT attempt to write the entire section in a single Write call if it exceeds 25K words.
## Execution Log (assembly agents only)
Before your FINAL CHECK, write a brief execution log to `.deep_crawl/agent_logs/results/S{N}.md`:
- Input: {count} files read, {total_words} words of input
- Output: {list of files written with word counts}
- Sections produced: {list}
- Key structural decisions (e.g., "merged 3 small modules into one ### subsection")
- Issues encountered (e.g., "finding file X was empty", "skeleton prescribed ### for module not in findings")
## FINAL CHECK (do this before writing sentinel)
Before touching your sentinel file, verify your output:
1. Run: `grep -c "^## " <your_output_file>` — MUST be exactly 1 per section you wrote
2. Run: `grep -c "^# " <your_output_file>` — MUST be exactly 0
If either check fails, fix your headers before writing the sentinel.
A section with multiple `## ` headers or any `# ` headers will be rejected by the quality gate.
## Constraints
- Read-only: never modify source code
- Do NOT spawn sub-agents yourself
Check for sentinel files. All 5 must complete before proceeding:
ls .deep_crawl/sections/S{1,2,3a,3b,4}.done 2>/dev/null | wc -l # expect 5
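The one-shot count above can be wrapped in a polling loop so the orchestrator blocks until every sentinel exists or a deadline passes. This is a sketch only: the function name `wait_for_sentinels`, the 5-second polling interval, and the timeout parameter are assumptions, not values prescribed by the skill.

```shell
# wait_for_sentinels DIR TIMEOUT_SECONDS NAME...
# Hypothetical helper: returns 0 once DIR/NAME.done exists for every NAME,
# or 1 after TIMEOUT_SECONDS, listing the agents still pending on stderr.
wait_for_sentinels() {
  dir=$1
  timeout=$2
  shift 2
  waited=0
  while :; do
    pending=""
    for name in "$@"; do
      [ -f "$dir/$name.done" ] || pending="$pending $name"
    done
    [ -z "$pending" ] && return 0
    if [ "$waited" -ge "$timeout" ]; then
      echo "TIMEOUT waiting for:$pending" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```

For the first assembly wave the call would be `wait_for_sentinels .deep_crawl/sections 1800 S1 S2 S3a S3b S4`, where 1800 is an arbitrary example timeout. The list of pending names on timeout shows which agent needs recovery.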
S2 output-limit recovery: If S2 fails with "exceeded output token maximum" or produces no sentinel after other agents complete, the module index is too large for a single agent. Split into S2a + S2b:
- S2a: first half of the findings/modules/*.md files (alphabetically) → module_index.md
- S2b: second half, appended to module_index.md via cat >> .deep_crawl/sections/module_index.md
- domain_glossary.md (small, derived from all module findings)
- The appending agent's prompt includes: "Append your output to the existing module_index.md file using: cat >> .deep_crawl/sections/module_index.md"
Re-spawn with run_in_background: true. Wait for both S2a.done and S2b.done.
Before spawning S5 and S6, apply the agent logging protocol: write prompts to .deep_crawl/agent_logs/prompts/S5.md and .deep_crawl/agent_logs/prompts/S6.md, log SPAWN events. After completion, write return text to results and log DONE events.
Launch S5 and S6 simultaneously (both with run_in_background: true):
| Agent | Sections | Input Files | Est. Input |
|---|---|---|---|
| S5 | Gotchas, Hazards, Extension Points, Reading Order | sections/gotcha_extracts.md + sections/gotchas_from_S{1,2,3a,3b,4}.md + CRAWL_PLAN.md (for structure) | ~14K words |
| S6 | Change Impact Index, Data Contracts, Change Playbooks | findings/impact/*.md + findings/playbooks/*.md + findings/modules/*.md (grep for Pydantic/dataclass) + sections/state_diagrams.md | ~15K words (use chunked writes if output exceeds 25K words) |
S5's prompt follows the same template as S1-S4, with these additions for the Gotchas section:
Organize gotchas into domain-cluster ### subsections derived from the investigation's subsystem structure (e.g., "Agent System", "Data Models", "Execution Engine", "LLM Integration", "Configuration"). Derive cluster names from the module findings directory — group findings/modules/*.md by top-level package directory. Within each cluster, order entries by severity (critical first). Prefix each entry with severity tag: [CRITICAL], [HIGH], or [MEDIUM]. Target: one ### per 5-15 gotchas. Never put more than 15 gotchas under a single ### heading.
Severity calibration (mandatory): Read .claude/skills/deep-crawl/configs/quality_gates.json field gotcha_severity_criteria before assigning severity tags. Apply these rules strictly:
After tagging all gotchas, run a self-check: count CRITICAL gotchas. If CRITICAL count > total_gotchas * 0.05, review each CRITICAL and demote any that are "potential problem" rather than "active problem" to HIGH. Log: Severity calibration: {N} CRITICAL, {M} HIGH, {K} MEDIUM ({N/total}% CRITICAL).
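The self-check reduces to counting tags and comparing against the 5% ceiling. A minimal sketch follows, assuming each gotcha entry sits on its own line with exactly one severity tag; the helper name `severity_report` is hypothetical.

```shell
# severity_report GOTCHAS_FILE
# Hypothetical helper: prints the calibration log line and returns 1 when
# CRITICAL entries exceed 5% of all tagged entries.
# Assumes one tagged gotcha per line, so counting matching lines counts gotchas.
severity_report() {
  file=$1
  crit=$(grep -c '\[CRITICAL\]' "$file" || true)
  high=$(grep -c '\[HIGH\]' "$file" || true)
  med=$(grep -c '\[MEDIUM\]' "$file" || true)
  total=$((crit + high + med))
  if [ "$total" -eq 0 ]; then
    echo "Severity calibration: no tagged gotchas"
    return 0
  fi
  echo "Severity calibration: $crit CRITICAL, $high HIGH, $med MEDIUM ($((crit * 100 / total))% CRITICAL)"
  # crit > total * 0.05 is the same as crit * 100 > total * 5 in integer arithmetic
  [ $((crit * 100)) -le $((total * 5)) ]
}
```

A non-zero return is the cue to revisit each CRITICAL entry and demote the ones that describe a potential rather than an active problem.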
Root-cause clustering (mandatory): After severity tagging, identify gotchas that share a common root cause — same design decision, same pattern, same underlying assumption. For each cluster of >=2 related gotchas:
- Add *Root cause: {1-sentence description}* after the cluster's ### heading
- Cross-reference each related entry with (related: #{N} — same root cause)
This surfaces patterns like "these 5 gotchas all stem from the dual Pydantic/SQLAlchemy model pattern" rather than treating each as independent.
S5's template sections remain: Gotchas, Hazards — Do Not Read, Extension Points, and Reading Order from DEEP_ONBOARD.md.template.
TypeScript-specific gotcha categories (ensure these are checked during gotcha clustering):
When organizing gotchas for TypeScript projects, ensure these categories are represented if evidence exists:
- `as any` casts, `@ts-ignore` directives, non-null assertions (`!`) hiding potential null access
- Missing `.catch()` on Promise chains, unhandled rejection paths, `try/catch` not wrapping `await`
- Barrel files (`index.ts`) creating import cycles, runtime `undefined` from circular dependencies
- `const enum` inlining differences, path alias resolution mismatches between IDE and runtime
- Module-level `let` used as cache without invalidation, singleton services with mutable state, React `useState` without cleanup
- `JSON.parse()` returning `any`, API response types asserted but not validated at runtime, `process.env` values used without undefined checks
S6's prompt for Data Contracts:
investigation_targets.domain_entities.ts_interfaces and ts_type_aliases data.
Field-level deep trace (mandatory for top contracts): For the top 3-5 contracts by importer count (from xray import graph), produce a Cross-Boundary Flow Analysis subsection with field-level specificity:
to_dict() emits 8)
[FACT] citations at each boundary: created at {file}:{line}, transformed at {file}:{line}, consumed at {file}:{line}
This depth is what allows a downstream agent to implement a new consumer without reading source. Summary-level tables ("Model X has fields a, b, c") are insufficient; the value is in the boundary transformations.
S6's prompt for Change Impact Index: organize impact findings by hub module cluster. Each cluster gets a table showing importers, high-impact functions, signature-change consequences, behavior-change consequences, safe vs dangerous changes. Also read git.coupling_clusters from output/$REPO_NAME/data/xray.json. For each cluster, add a Hidden Coupling row to the relevant hub module's impact table. Format: When modifying {file_a}, also verify {file_b} — {count} historical co-changes with no import relationship.
S6's prompt for Change Playbooks: organize playbook findings into step-by-step checklists with validation commands and common mistakes. Protocol F now produces both "add new" and "modify existing" playbooks — ensure both types are assembled. Modification playbooks are especially valuable because they trace through existing wiring (what to change, what to leave alone, what breaks silently) rather than creating from scratch.
S6's prompt must include this Playbook Quality Floor (mechanically checked after you finish):
## Playbook Quality Floor (each playbook individually)
- Minimum 800 words
- Minimum 30 [FACT] citations
- Minimum 8 common mistakes with behavioral explanations
- Citation density: >= 3.0 [FACT] per 100 words
If any playbook fails, you will be re-spawned.
For format guidance, see .claude/skills/deep-crawl/configs/exemplar_templates.md.
For quality reference from this repo, see .deep_crawl/findings/calibration/cal_{type}.md.
Thresholds are defined in .claude/skills/deep-crawl/configs/quality_gates.json under playbooks.
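The density floor is the one playbook threshold that needs fractional arithmetic, which shell integer math cannot express. A minimal sketch of how it could be checked, assuming one decimal of precision is enough and that [FACT] is counted per matching line (as grep -c does). The function names `playbook_density` and `density_meets_floor` are hypothetical and not part of the skill:

```shell
# Hypothetical helper (not part of the skill): citation density of one
# playbook file, in [FACT] lines per 100 words, printed with one decimal.
playbook_density() {
  pd_words=$(( $(wc -w < "$1") ))
  # grep -c prints 0 and exits 1 when nothing matches, so default explicitly
  pd_facts=$(grep -c '\[FACT' "$1") || pd_facts=0
  awk -v f="$pd_facts" -v w="$pd_words" \
    'BEGIN { if (w + 0 == 0) print "0.0"; else printf "%.1f\n", f * 100 / w }'
}

# Exit 0 when the density of file $1 meets the floor given as $2 (e.g. 3.0).
density_meets_floor() {
  awk -v d="$(playbook_density "$1")" -v min="$2" \
    'BEGIN { exit !(d + 0 >= min + 0) }'
}
```

A call such as `density_meets_floor playbook.md 3.0 || echo "density below floor"` could sit next to the word, citation, and mistake counts in a per-playbook check.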
S6's prompt for Change Playbooks quality checks:
ls .deep_crawl/sections/S{5,6}.done 2>/dev/null | wc -l # expect 2
After all assembly sub-agents (S1-S6) complete, run this mechanical check. The orchestrator MUST NOT write section content itself — if a section is missing or below quality, re-spawn the appropriate assembly sub-agent. The only sections the orchestrator writes directly are the small metadata sections listed in Step 6.
# === SECTION QUALITY GATE (mandatory before concatenation) ===
# (a) Existence check
MISSING=""
for section in critical_paths module_index change_impact_index \
key_interfaces data_contracts error_handling shared_state \
config_surface conventions gotchas hazards extension_points; do
[ -f ".deep_crawl/sections/${section}.md" ] || MISSING="$MISSING $section"
done
[ -n "$MISSING" ] && echo "GATE FAIL — missing sections:$MISSING"
# (b) Citation density check — tiered by section type
get_density_floor() {
case "$1" in
change_impact_index|gotchas|data_contracts) echo 3 ;;
module_index|key_interfaces|shared_state|change_playbooks|error_handling|hazards) echo 2 ;;
critical_paths|config_surface|conventions|extension_points) echo 1 ;;
*) echo 0 ;; # domain_glossary, reading_order, environment_bootstrap, etc.
esac
}
for f in .deep_crawl/sections/*.md; do
[ -f "$f" ] || continue
SECTION=$(basename "$f" .md)
WORDS=$(wc -w < "$f")
# grep -c prints 0 itself when nothing matches; "|| echo 0" would append a second 0
FACTS=$(grep -c '\[FACT' "$f" 2>/dev/null) || FACTS=0
[ "$WORDS" -eq 0 ] && continue
FLOOR=$(get_density_floor "$SECTION")
[ "$FLOOR" -eq 0 ] && continue # no density requirement for structural sections
DENSITY=$((FACTS * 100 / WORDS))
[ "$DENSITY" -lt "$FLOOR" ] && echo "FAIL: $SECTION — density ${DENSITY}/100w (floor: ${FLOOR})"
done
# Log section quality gate results
echo "## Section Quality Gate" >> "$CRAWL_LOG"
[ -n "$MISSING" ] && echo " Missing sections:$MISSING" >> "$CRAWL_LOG"
for f in .deep_crawl/sections/*.md; do
[ -f "$f" ] || continue
SECTION=$(basename "$f" .md)
[[ "$SECTION" == _* || "$SECTION" == S* || "$SECTION" == gotchas_from_* || "$SECTION" == header || "$SECTION" == footer || "$SECTION" == document_map ]] && continue
WORDS=$(wc -w < "$f"); FACTS=$(grep -c '\[FACT' "$f" 2>/dev/null) || FACTS=0
echo " $SECTION: ${WORDS}w, ${FACTS} FACT" >> "$CRAWL_LOG"
FLOOR=$(get_density_floor "$SECTION")
if [ "$FLOOR" -gt 0 ] && [ "$WORDS" -gt 0 ]; then
DENSITY=$((FACTS * 100 / WORDS))
if [ "$DENSITY" -ge "$FLOOR" ]; then
echo "$(date -Iseconds) GATE $SECTION PASS words=$WORDS facts=$FACTS density=${DENSITY}/100w" >> "$ORCH_LOG"
else
echo "$(date -Iseconds) GATE $SECTION FAIL words=$WORDS facts=$FACTS density=${DENSITY}/100w floor=${FLOOR}" >> "$ORCH_LOG"
fi
fi
done
echo "" >> "$CRAWL_LOG"
Thresholds are defined in .claude/skills/deep-crawl/configs/quality_gates.json under assembly_sections.density_tiers.
If missing sections found: spawn the appropriate assembly sub-agent (S1-S6 based on which section is missing).
If density fails: re-spawn the section agent with "Your section has {DENSITY} [FACT]/100w. Floor for {TIER} sections is {FLOOR}. Re-assemble with more citations from your findings input."
# === PLAYBOOK QUALITY GATE (mandatory after S6 — per playbook, not aggregate) ===
if [ -f .deep_crawl/sections/change_playbooks.md ]; then
# Split by ### headings into individual playbook files
mkdir -p .deep_crawl/_pb_check
csplit -z .deep_crawl/sections/change_playbooks.md \
'/^### /' '{*}' \
--prefix=.deep_crawl/_pb_check/pb_ \
--suffix-format='%03d.md' 2>/dev/null
PB_GATE=true
for pb in .deep_crawl/_pb_check/pb_*.md; do
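[ -s "$pb" ] || continue
# Skip any preamble chunk before the first ### heading (e.g. the section's own ## header)
head -1 "$pb" | grep -q '^### ' || continue
TITLE=$(head -1 "$pb" | sed 's/^### //')
WORDS=$(wc -w < "$pb")
# grep -c prints 0 itself when nothing matches; "|| echo 0" would append a second 0
FACTS=$(grep -c '\[FACT' "$pb" 2>/dev/null) || FACTS=0
MISTAKES=$(grep -ci 'common mistake\|^\*\*[0-9]' "$pb" 2>/dev/null) || MISTAKES=0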
if [ "$WORDS" -lt 800 ] || [ "$FACTS" -lt 30 ] || [ "$MISTAKES" -lt 8 ]; then
echo "PLAYBOOK FAIL: '$TITLE' — ${WORDS}w, ${FACTS} FACT, ${MISTAKES} mistakes"
echo " (floor: 800w, 30 FACT, 8 mistakes)"
PB_GATE=false
fi
done
# Log playbook quality gate results
echo "## Playbook Quality Gate: $([ "$PB_GATE" = true ] && echo PASS || echo FAIL)" >> "$CRAWL_LOG"
for pb in .deep_crawl/_pb_check/pb_*.md; do
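[ -s "$pb" ] || continue
# Skip any preamble chunk before the first ### heading
head -1 "$pb" | grep -q '^### ' || continue
TITLE=$(head -1 "$pb" | sed 's/^### //')
WORDS=$(wc -w < "$pb"); FACTS=$(grep -c '\[FACT' "$pb" 2>/dev/null) || FACTS=0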
echo " $TITLE: ${WORDS}w, ${FACTS} FACT" >> "$CRAWL_LOG"
done
echo "" >> "$CRAWL_LOG"
rm -rf .deep_crawl/_pb_check
if [ "$PB_GATE" = false ]; then
echo "Re-spawn S6 with corrective instructions for failing playbooks."
echo " RE-SPAWN: S6 — reason: playbook quality gate failed" >> "$CRAWL_LOG"
fi
fi
Playbook thresholds are defined in .claude/skills/deep-crawl/configs/quality_gates.json under playbooks.
# === SEVERITY CALIBRATION GATE (mandatory after S5 — prevents CRITICAL inflation) ===
if [ -f .deep_crawl/sections/gotchas.md ]; then
# grep -c prints 0 itself when nothing matches; "|| echo 0" would append a second 0
TOTAL_GOTCHAS=$(grep -c '^\([0-9]\+\.\|[0-9]\+)\) \[' .deep_crawl/sections/gotchas.md 2>/dev/null) || TOTAL_GOTCHAS=0
CRITICAL_COUNT=$(grep -c '\[CRITICAL\]' .deep_crawl/sections/gotchas.md 2>/dev/null) || CRITICAL_COUNT=0
HIGH_COUNT=$(grep -c '\[HIGH\]' .deep_crawl/sections/gotchas.md 2>/dev/null) || HIGH_COUNT=0
MEDIUM_COUNT=$(grep -c '\[MEDIUM\]' .deep_crawl/sections/gotchas.md 2>/dev/null) || MEDIUM_COUNT=0
echo "Severity distribution: $CRITICAL_COUNT CRITICAL, $HIGH_COUNT HIGH, $MEDIUM_COUNT MEDIUM ($TOTAL_GOTCHAS total)"
# Log severity gate results
echo "## Severity Gate" >> "$CRAWL_LOG"
echo " Distribution: $CRITICAL_COUNT CRITICAL, $HIGH_COUNT HIGH, $MEDIUM_COUNT MEDIUM ($TOTAL_GOTCHAS total)" >> "$CRAWL_LOG"
if [ "$TOTAL_GOTCHAS" -gt 0 ]; then
CRIT_PCT=$((CRITICAL_COUNT * 100 / TOTAL_GOTCHAS))
if [ "$CRIT_PCT" -gt 5 ]; then
echo "SEVERITY WARN: ${CRIT_PCT}% CRITICAL exceeds 5% ceiling. Re-spawn S5 with:"
echo " 'Review each [CRITICAL] against quality_gates.json gotcha_severity_criteria."
echo " CRITICAL = active production problem NOW. Demote potential-but-dormant issues to HIGH.'"
echo " RE-SPAWN: S5 — reason: severity gate failed (${CRIT_PCT}% CRITICAL > 5% ceiling)" >> "$CRAWL_LOG"
fi
fi
echo "" >> "$CRAWL_LOG"
fi
Severity thresholds are defined in .claude/skills/deep-crawl/configs/quality_gates.json under gotcha_severity_criteria.
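The gate above computes the CRITICAL share with integer division, so a share such as 5.9% truncates to 5 and passes the 5% ceiling. A minimal sketch of an exact comparison by cross-multiplication, assuming the 5% ceiling from the gate. The function name `critical_exceeds_ceiling` is hypothetical:

```shell
# Hypothetical helper: exit 0 when CRITICAL items exceed the ceiling.
# Compares critical/total > ceiling/100 by cross-multiplying, so no
# fractional percentage is lost to integer truncation.
# Usage: critical_exceeds_ceiling CRITICAL_COUNT TOTAL_GOTCHAS [CEILING_PCT]
critical_exceeds_ceiling() {
  cec_critical=$1 cec_total=$2 cec_ceiling=${3:-5}
  [ "$cec_total" -gt 0 ] || return 1   # no gotchas, nothing to flag
  [ $((cec_critical * 100)) -gt $((cec_total * cec_ceiling)) ]
}
```

With 3 CRITICAL entries out of 51 gotchas, the integer gate computes 300 / 51 = 5 and stays silent, while this comparison (300 > 255) flags the distribution.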
Mechanically verify every completed investigation task has content in at least one assembled section file:
# === TRACEABILITY GATE ===
TRACE_FAILS=0
while IFS= read -r task; do
KEY=$(echo "$task" | grep -oP '`[^`]+`' | head -1 | tr -d '`')
[ -z "$KEY" ] && continue
# -F matches the key literally (task keys often contain regex metacharacters such as ".")
if ! grep -qF -- "$KEY" .deep_crawl/sections/*.md 2>/dev/null; then
echo "TRACE FAIL: $KEY not found in assembled sections"
# Identify the source finding
FINDING=$(grep -rlF -- "$KEY" .deep_crawl/findings/*/*.md 2>/dev/null | head -1)
[ -n "$FINDING" ] && echo " Source: $FINDING"
TRACE_FAILS=$((TRACE_FAILS + 1))
fi
done < <(grep '^- \[x\]' .deep_crawl/CRAWL_PLAN.md)
echo "Traceability: $TRACE_FAILS tasks missing from output"
# Log traceability gate results
echo "## Traceability Gate" >> "$CRAWL_LOG"
echo " Failures: $TRACE_FAILS" >> "$CRAWL_LOG"
echo "" >> "$CRAWL_LOG"
If TRACE_FAILS > 0:
- findings/traces/ → S1, findings/modules/ → S2, findings/cross_cutting/agent_communication* → S3a, findings/cross_cutting/{error,init,shared,database,async}* → S3b, findings/conventions/ or findings/cross_cutting/{config,env,llm}* → S4
Log: Step 5c: {total} tasks traced, {TRACE_FAILS} gaps, {recovered} recovered
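The directory-to-agent routing listed above can be sketched as a case statement. This is an illustration only: the function name `route_finding` is hypothetical, and printing "unmapped" for paths that match no rule is an assumption, since the routing list does not say what to do with them.

```shell
# Sketch of the findings-to-agent routing. Takes the path of a finding file
# and prints the assembly sub-agent that should be re-spawned for it.
route_finding() {
  case "$1" in
    *findings/traces/*)      echo S1 ;;
    *findings/modules/*)     echo S2 ;;
    *findings/conventions/*) echo S4 ;;
    *findings/cross_cutting/*)
      # Cross-cutting findings are routed by file name prefix
      case "${1##*/}" in
        agent_communication*)                  echo S3a ;;
        error*|init*|shared*|database*|async*) echo S3b ;;
        config*|env*|llm*)                     echo S4 ;;
        *)                                     echo unmapped ;;  # assumption
      esac ;;
    *) echo unmapped ;;                                          # assumption
  esac
}
```

Inside the traceability loop, `route_finding "$FINDING"` would give the agent to re-spawn for each TRACE FAIL.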
The orchestrator writes these directly (no sub-agent needed — they are small, <500 words total):
config_surface.md + cross-cutting findings for external systems. Produce bootstrap checklist with:
pytest, make test, or the project's health check).
sections/environment_bootstrap.md.
Write to .deep_crawl/sections/header.md, .deep_crawl/sections/identity.md, .deep_crawl/sections/gaps.md, .deep_crawl/sections/environment_bootstrap.md, .deep_crawl/sections/footer.md.
Mechanically normalize all section files to enforce the hierarchy rule: each section file gets exactly one ## header. This prevents raw investigation headers from leaking into the final document (the exact failure mode that caused n8n's 68 ## headers).
# === HEADER NORMALIZATION ===
for f in .deep_crawl/sections/*.md; do
[ -f "$f" ] || continue
SECTION=$(basename "$f" .md)
# Skip non-content files
[[ "$SECTION" == _* || "$SECTION" == S* || "$SECTION" == gotchas_from_* || "$SECTION" == header || "$SECTION" == footer ]] && continue
# grep -c prints 0 itself when nothing matches; "|| echo 0" would append a second 0
H2_COUNT=$(grep -c "^## " "$f" 2>/dev/null) || H2_COUNT=0
H1_COUNT=$(grep -c "^# " "$f" 2>/dev/null) || H1_COUNT=0
if [ "$H1_COUNT" -gt 0 ]; then
echo "NORMALIZE: $SECTION has $H1_COUNT h1 headers — demoting all # to ##, ## to ###, ### to ####"
sed -i 's/^#### /##### /g; s/^### /#### /g; s/^## /### /g; s/^# /## /g' "$f"
elif [ "$H2_COUNT" -gt 1 ]; then
echo "WARN: $SECTION has $H2_COUNT h2 headers (expected 1) — demoting extra ## to ###"
# Keep first ## as-is, demote the rest
awk 'BEGIN{seen=0} /^## /{seen++; if(seen>1){sub(/^## /,"### ")}} {print}' "$f" > "${f}.tmp" && mv "${f}.tmp" "$f"
fi
done
# Verify: each section file should now have exactly 1 ## header
VIOLATIONS=0
for f in .deep_crawl/sections/*.md; do
[ -f "$f" ] || continue
SECTION=$(basename "$f" .md)
[[ "$SECTION" == _* || "$SECTION" == S* || "$SECTION" == gotchas_from_* || "$SECTION" == header || "$SECTION" == footer ]] && continue
H2=$(grep -c "^## " "$f" 2>/dev/null) || H2=0
[ "$H2" -ne 1 ] && echo "HEADER VIOLATION: $SECTION has $H2 ## headers (expected 1)" && VIOLATIONS=$((VIOLATIONS + 1))
done
echo "Header normalization: $VIOLATIONS violations remaining"
[ "$VIOLATIONS" -gt 0 ] && echo "FIX REQUIRED: Manually inspect and correct violating section files before proceeding."
# Log header normalization to orchestrator.log
for f in .deep_crawl/sections/*.md; do
[ -f "$f" ] || continue
SECTION=$(basename "$f" .md)
[[ "$SECTION" == _* || "$SECTION" == S* || "$SECTION" == gotchas_from_* || "$SECTION" == header || "$SECTION" == footer ]] && continue
H1=$(grep -c "^# " "$f" 2>/dev/null) || H1=0
H2=$(grep -c "^## " "$f" 2>/dev/null) || H2=0
RESULT="PASS"
[ "$H2" -ne 1 ] && RESULT="FAIL"
echo "$(date -Iseconds) HEADER_NORM $SECTION h1=$H1 h2=$H2 $RESULT" >> "$ORCH_LOG"
done
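The header counts above use plain grep, so a shell comment such as `# install deps` inside a fenced code block is counted as an h1. A minimal sketch of a fence-aware count, under the simplifying assumption that any line beginning with three backticks or three tildes toggles the fence state. The function name `count_headers_outside_fences` is hypothetical:

```shell
# Hypothetical helper: count lines in file $1 that start with the header
# prefix $2 (e.g. "## "), ignoring lines inside fenced code blocks.
count_headers_outside_fences() {
  chf_bt=$(printf '\140\140\140')   # three backticks, built from octal escapes
  awk -v p="$2" -v bt="$chf_bt" '
    substr($0, 1, 3) == bt || substr($0, 1, 3) == "~~~" { in_fence = !in_fence; next }
    !in_fence && index($0, p) == 1 { n++ }
    END { print n + 0 }
  ' "$1"
}
```

Swapping this in for `grep -c "^# "` would stop section files that embed shell snippets from being treated as having h1 headers.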
Generate a navigational Document Map section from the actual sections on disk:
echo "## Document Map" > .deep_crawl/sections/document_map.md
echo "" >> .deep_crawl/sections/document_map.md
echo "| Section | What It Covers |" >> .deep_crawl/sections/document_map.md
echo "|---------|---------------|" >> .deep_crawl/sections/document_map.md
for f in critical_paths module_index change_impact_index key_interfaces \
data_contracts error_handling shared_state domain_glossary config_surface \
conventions gotchas hazards extension_points change_playbooks reading_order \
environment_bootstrap; do
[ -f ".deep_crawl/sections/${f}.md" ] && \
echo "| $(head -1 .deep_crawl/sections/${f}.md | sed 's/^## //; s/ (see.*//') | $(wc -w < .deep_crawl/sections/${f}.md) words |"
done >> .deep_crawl/sections/document_map.md
Concatenate all section files in template order:
cat .deep_crawl/sections/header.md \
.deep_crawl/sections/document_map.md \
.deep_crawl/sections/identity.md \
.deep_crawl/sections/critical_paths.md \
.deep_crawl/sections/module_index.md \
.deep_crawl/sections/change_impact_index.md \
.deep_crawl/sections/key_interfaces.md \
.deep_crawl/sections/data_contracts.md \
.deep_crawl/sections/error_handling.md \
.deep_crawl/sections/shared_state.md \
.deep_crawl/sections/domain_glossary.md \
.deep_crawl/sections/config_surface.md \
.deep_crawl/sections/conventions.md \
.deep_crawl/sections/gotchas.md \
.deep_crawl/sections/hazards.md \
.deep_crawl/sections/extension_points.md \
.deep_crawl/sections/change_playbooks.md \
.deep_crawl/sections/reading_order.md \
.deep_crawl/sections/environment_bootstrap.md \
.deep_crawl/sections/gaps.md \
.deep_crawl/sections/footer.md \
> .deep_crawl/DRAFT_ONBOARD.md
wc -w .deep_crawl/SYNTHESIS_INPUT.md .deep_crawl/DRAFT_ONBOARD.md
If the draft is under 80% of SYNTHESIS_INPUT.md word count, an assembly agent dropped findings. Identify which section agent under-produced by comparing each section's word count against its input word count, and re-spawn that agent with: "Your previous output was N words from M words of input. Include every finding — you dropped content. The context window is 1M tokens and there is no reason to exclude findings."
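The per-section comparison described above can be sketched as follows. It assumes each section has one input file and one output file, passed as `name:input:output` triples. That pairing, and the function names, are illustrations and not part of the skill:

```shell
# Hypothetical helper: output size as a percentage of input size, by words.
retention_pct() {
  rp_in=$(( $(wc -w < "$1") ))
  rp_out=$(( $(wc -w < "$2") ))
  [ "$rp_in" -gt 0 ] || { echo 100; return; }   # empty input: nothing to drop
  echo $((rp_out * 100 / rp_in))
}

# Print every section whose output is under 80% of its input.
# Arguments: one or more "name:input_file:output_file" triples (an assumption).
under_produced() {
  up_floor=80
  for up_triple in "$@"; do
    up_name=${up_triple%%:*}
    up_rest=${up_triple#*:}
    up_pct=$(retention_pct "${up_rest%%:*}" "${up_rest#*:}")
    [ "$up_pct" -lt "$up_floor" ] && echo "$up_name ${up_pct}%"
  done
  return 0
}
```

The names it prints are the candidates for the re-spawn message, with N and M taken from the same word counts.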
This is a mechanical step — no LLM needed. Extract all file:line citations that appear with [FACT] tags in SYNTHESIS_INPUT.md and verify each appears in DRAFT_ONBOARD.md.
# Extract unique file:line citations from [FACT]-tagged lines in findings
grep '\[FACT' .deep_crawl/SYNTHESIS_INPUT.md \
| grep -oP '[\w/.-]+\.(py|ts|tsx|js|jsx):\d+' | sort -u \
> .deep_crawl/_synthesis_fact_citations.txt
# Extract unique file:line citations from assembled draft
grep -oP '[\w/.-]+\.(py|ts|tsx|js|jsx):\d+' .deep_crawl/DRAFT_ONBOARD.md \
| sort -u > .deep_crawl/_draft_citations.txt
# Find dropped citations
comm -23 .deep_crawl/_synthesis_fact_citations.txt \
.deep_crawl/_draft_citations.txt \
> .deep_crawl/_dropped_citations.txt
TOTAL=$(wc -l < .deep_crawl/_synthesis_fact_citations.txt)
RETAINED=$(comm -12 .deep_crawl/_synthesis_fact_citations.txt .deep_crawl/_draft_citations.txt | wc -l)
DROPPED=$(wc -l < .deep_crawl/_dropped_citations.txt)
echo "Fact citations: $RETAINED/$TOTAL retained, $DROPPED dropped"
If DROPPED > 0, recover the dropped facts:
# For each dropped citation, extract the full [FACT] line from SYNTHESIS_INPUT.md
while IFS= read -r cite; do
# -F matches the citation literally (the "." in a file name is otherwise a regex wildcard)
grep -F -- "$cite" .deep_crawl/SYNTHESIS_INPUT.md | grep '\[FACT'
done < .deep_crawl/_dropped_citations.txt > .deep_crawl/_dropped_facts.txt
Classify each dropped fact by the section it belongs to (using the finding file it came from):
- findings/traces/ → append to Critical Paths section
- findings/modules/ → append to Module Behavioral Index section
- findings/cross_cutting/ → append to the corresponding section (Error Handling, Shared State, etc.)
- findings/conventions/ → append to Conventions section
Append dropped facts to the appropriate section files in .deep_crawl/sections/, then re-concatenate DRAFT_ONBOARD.md using the Step 7 concatenation command.
Log: Step 8b: {RETAINED}/{TOTAL} fact citations retained ({PCT}%). Recovered {N} dropped facts.
If DROPPED == 0: Step 8b: {TOTAL}/{TOTAL} fact citations retained (100%). No recovery needed.
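The two log formats above differ only in whether anything was dropped. A small sketch that produces either line from the counts, assuming the percentage is rounded down. The function name `step8b_log_line` is hypothetical:

```shell
# Hypothetical formatter for the Step 8b log line.
# Usage: step8b_log_line RETAINED TOTAL [RECOVERED]
step8b_log_line() {
  s8_retained=$1 s8_total=$2 s8_recovered=${3:-0}
  if [ "$s8_retained" -ge "$s8_total" ]; then
    echo "Step 8b: ${s8_total}/${s8_total} fact citations retained (100%). No recovery needed."
  else
    # total > retained >= 0 here, so the division is safe
    s8_pct=$((s8_retained * 100 / s8_total))
    echo "Step 8b: ${s8_retained}/${s8_total} fact citations retained (${s8_pct}%). Recovered ${s8_recovered} dropped facts."
  fi
}
```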
Log fact-level retention to .deep_crawl/REFINE_LOG.md:
## Fact-Level Retention
Citations in findings: {N}
Citations in draft: {M}
Retained: {R}/{N} ({PCT}%)
Dropped and recovered: {D}
Phase 4 and Phase 5 are delegated to separate sub-agents to prevent self-validation bias. The cross-referencer never sees the validator's output, and the validator cannot see the raw findings — only the cross-referenced document and the codebase.
Sub-agent 1: Cross-referencer
Before spawning, apply the agent logging protocol: write the prompt to .deep_crawl/agent_logs/prompts/phase4_crossref.md and log the SPAWN event. After completion, write the return text to .deep_crawl/agent_logs/results/phase4_crossref.md and log the DONE event.
Spawn a sub-agent for Phase 4 only:
You are cross-referencing the draft of DEEP_ONBOARD.md, an onboarding document for AI agents. This is an additive-only process: you may NOT delete, merge, summarize, or compress any content.
## Input Files
- .deep_crawl/DRAFT_ONBOARD.md — the assembled draft to cross-reference
- .deep_crawl/CRAWL_PLAN.md — the investigation plan (for coverage context)
- .claude/skills/deep-crawl/configs/compression_targets.json — for min_tokens floor check
## Your Task
Execute Phase 4 (CROSS-REFERENCE) — additive only, no deletions.
Write the cross-referenced document to .deep_crawl/DEEP_ONBOARD.md.
Write the log to .deep_crawl/REFINE_LOG.md.
When done: touch .deep_crawl/REFINE.done
[Copy Phase 4 instructions verbatim from SKILL.md]
Use run_in_background: true.
Cross-referencer completion gate: After REFINE.done appears, verify:
# Artifacts must exist
test -f .deep_crawl/DEEP_ONBOARD.md && test -f .deep_crawl/REFINE_LOG.md && echo "PASS" || echo "FAIL"
# Cross-referencing is strictly additive — final must be >= draft
wc -w .deep_crawl/DRAFT_ONBOARD.md .deep_crawl/DEEP_ONBOARD.md
# If DEEP_ONBOARD word count < DRAFT_ONBOARD, re-spawn with: "Your output lost words. Cross-referencing is additive only — you may NOT delete content. Re-execute all 4 steps."
Sub-agent 2: Validator
Before spawning, apply the agent logging protocol: write the prompt to .deep_crawl/agent_logs/prompts/phase5_validate.md and log the SPAWN event. After completion, write the return text to .deep_crawl/agent_logs/results/phase5_validate.md and log the DONE event.
After the cross-referencer completes and passes its gate, spawn a separate sub-agent for Phase 5:
You are validating DEEP_ONBOARD.md, an onboarding document for AI agents.
## Input Files
- .deep_crawl/DEEP_ONBOARD.md — the refined document to validate
- .deep_crawl/CRAWL_PLAN.md — the investigation plan (for coverage targets only)
- Codebase at {ROOT_PATH} — read source files to verify claims
## NOT Available to You
You do NOT have access to SYNTHESIS_INPUT.md, findings/, or xray.md.
You validate the document as a standalone artifact — if information is missing, it's a gap.
## Your Task
Execute Phase 5 (VALIDATE) from the deep-crawl skill instructions below. Phase 5 ONLY.
You may read codebase source files to verify claims and check coverage.
You may NOT modify DEEP_ONBOARD.md — report gaps in VALIDATION_REPORT.md only.
Write the validation report to .deep_crawl/VALIDATION_REPORT.md.
When done: touch .deep_crawl/VALIDATE.done
[Copy Phase 5 instructions verbatim from SKILL.md]
Use run_in_background: true.
Validator completion gate: After VALIDATE.done appears, verify:
# Report must exist with required sections
test -f .deep_crawl/VALIDATION_REPORT.md && echo "PASS" || echo "FAIL"
grep -c "^### Q[0-9]" .deep_crawl/VALIDATION_REPORT.md # expect 12
grep -c "^### Spot Check" .deep_crawl/VALIDATION_REPORT.md # expect 10
grep -c "Adversarial Simulation" .deep_crawl/VALIDATION_REPORT.md # expect >= 1
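The three counts above still have to be compared against their expected values by whoever reads them. A minimal sketch that turns them into a single pass/fail result, using the expected values from the comments (12, 10, at least 1). The function name `validator_gate` is hypothetical:

```shell
# Hypothetical wrapper: exit 0 only when the report has 12 question headings,
# 10 spot checks, and at least one mention of the adversarial simulation.
validator_gate() {
  vg_report=$1
  [ -f "$vg_report" ] || { echo "FAIL: $vg_report missing"; return 1; }
  # grep -c prints 0 and exits 1 on no match, so default each count explicitly
  vg_q=$(grep -c '^### Q[0-9]' "$vg_report") || vg_q=0
  vg_s=$(grep -c '^### Spot Check' "$vg_report") || vg_s=0
  vg_a=$(grep -c 'Adversarial Simulation' "$vg_report") || vg_a=0
  if [ "$vg_q" -eq 12 ] && [ "$vg_s" -eq 10 ] && [ "$vg_a" -ge 1 ]; then
    echo "PASS"
  else
    echo "FAIL: questions=$vg_q/12 spot_checks=$vg_s/10 adversarial=$vg_a"
    return 1
  fi
}
```

Called as `validator_gate .deep_crawl/VALIDATION_REPORT.md`, it reports which of the three counts is off.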
Orchestrator spot-check (before remediation): The orchestrator reads 5 [FACT] citations from DEEP_ONBOARD.md and verifies them by reading the actual source file. Priority: new sections > modified sections > unchanged sections. Log:
# For each spot-check:
# Claim: "{quoted claim}" [FACT] ({file}:{line})
# Source: {what the actual file says}
# Verdict: CONFIRMED | WRONG_LINE | WRONG_CONTENT | FILE_MISSING
If >= 2/5 spot-checks fail, re-spawn S6 with corrective instructions before proceeding.
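Looking up what a citation points at is the mechanical half of a spot-check. A minimal sketch, assuming citations have the form `path:line` relative to the repository root. The function name `cited_line` is hypothetical, and treating a blank or out-of-range line as WRONG_LINE is an assumption:

```shell
# Hypothetical helper: print the source line a "path:line" citation points at,
# or one of the verdict keywords when the line cannot be read.
# Usage: cited_line ROOT_PATH path/to/file.py:89
cited_line() {
  cl_root=$1 cl_cite=$2
  cl_file=${cl_cite%:*}
  cl_line=${cl_cite##*:}
  case "$cl_line" in
    ''|*[!0-9]*) echo "WRONG_LINE"; return 1 ;;   # not a line number
  esac
  [ -f "$cl_root/$cl_file" ] || { echo "FILE_MISSING"; return 1; }
  cl_text=$(sed -n "${cl_line}p" "$cl_root/$cl_file")
  [ -n "$cl_text" ] || { echo "WRONG_LINE"; return 1; }
  printf '%s\n' "$cl_text"
}
```

Judging whether the printed line supports the claim (CONFIRMED versus WRONG_CONTENT) still requires reading it against the quoted claim.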
Remediation loop: After the validator completes, parse VALIDATION_REPORT.md for gaps:
# Check for NO or PARTIAL standard questions
grep -E "^\*\*Rating:\*\* (NO|PARTIAL)" .deep_crawl/VALIDATION_REPORT.md
If any standard question is NO or PARTIAL, or if the adversarial simulation is PARTIAL/FAIL:
Step R1: Map gaps to investigation tasks. For each gap, determine the investigation protocol and target:
| Gap Type | Protocol | Target | Finding Output |
|---|---|---|---|
| Q1 PURPOSE unanswerable | B | Entry point modules | findings/modules/gap_purpose.md |
| Q2 ENTRY missing entry points | A | Untraced entry points | findings/traces/gap_entry.md |
| Q3 FLOW incomplete path | A | Untraceable critical paths | findings/traces/gap_flow.md |
| Q4 HAZARDS not documented | B | Large/generated files from xray | findings/modules/gap_hazards.md |
| Q5 ERRORS incomplete | C | Error handling patterns | findings/cross_cutting/gap_errors.md |
| Q6 EXTERNAL missing systems | C | External system patterns (grep: requests, httpx, boto, redis) | findings/cross_cutting/gap_external.md |
| Q7 STATE incomplete | C | Shared state patterns (grep: global, _instance, _cache) | findings/cross_cutting/gap_state.md |
| Q8 TESTING thin | D | Testing conventions (Python: conftest.py, pytest.ini; TypeScript: jest.config, vitest.config, in-source import.meta.vitest) | findings/conventions/gap_testing.md |
| Q9 GOTCHAS insufficient | (review) | Re-scan existing findings for uncollected gotchas | findings/cross_cutting/gap_gotchas.md |
| Q10 EXTENSION missing | B | Primary entity base class + extension patterns | findings/modules/gap_extension.md |
| Q11 IMPACT missing | E | Hub modules from xray | findings/impact/gap_impact.md |
| Q12 BOOTSTRAP missing | C | Setup patterns (Python: requirements.txt, Dockerfile, Makefile; TypeScript: package.json, tsconfig.json, Dockerfile) | findings/cross_cutting/gap_bootstrap.md |
| Adversarial step N failed | B | Module referenced in failing step | findings/modules/gap_adversarial_{N}.md |
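The table can be applied mechanically once the failing question IDs are known. A minimal sketch, assuming the report follows the `### Qn. ...` heading and `**Rating:** ...` line layout used by the validator. The function names `failed_questions` and `gap_task` are hypothetical, and the adversarial row is left out because its target depends on the failing step:

```shell
# Print the IDs (Q1..Q12) of questions rated NO or PARTIAL in report $1.
failed_questions() {
  awk '
    /^### Q[0-9]+\./ { q = $2; sub(/\..*/, "", q) }
    /^\*\*Rating:\*\* (NO|PARTIAL)/ && q != "" { print q; q = "" }
  ' "$1"
}

# Map a question ID to "protocol finding-path", following the table above.
gap_task() {
  case "$1" in
    Q1)  echo "B findings/modules/gap_purpose.md" ;;
    Q2)  echo "A findings/traces/gap_entry.md" ;;
    Q3)  echo "A findings/traces/gap_flow.md" ;;
    Q4)  echo "B findings/modules/gap_hazards.md" ;;
    Q5)  echo "C findings/cross_cutting/gap_errors.md" ;;
    Q6)  echo "C findings/cross_cutting/gap_external.md" ;;
    Q7)  echo "C findings/cross_cutting/gap_state.md" ;;
    Q8)  echo "D findings/conventions/gap_testing.md" ;;
    Q9)  echo "review findings/cross_cutting/gap_gotchas.md" ;;
    Q10) echo "B findings/modules/gap_extension.md" ;;
    Q11) echo "E findings/impact/gap_impact.md" ;;
    Q12) echo "C findings/cross_cutting/gap_bootstrap.md" ;;
    *)   return 1 ;;
  esac
}
```

A loop such as `failed_questions report.md | while read -r q; do gap_task "$q"; done` would list the protocol and output file for each gap to investigate.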
Step R2: Spawn targeted investigation agents. For each gap, spawn one sub-agent with the appropriate protocol. Use run_in_background: true for parallel execution. Apply the agent logging protocol: write prompts to .deep_crawl/agent_logs/prompts/gap_{name}.md, log SPAWN events, and after completion write return text to .deep_crawl/agent_logs/results/gap_{name}.md and log DONE events. Sub-agent prompts follow the standard Phase 2 format:
You are investigating {CODEBASE} at {ROOT_PATH} to fill a specific gap in the onboarding document.
## Gap Being Filled
{Question text} was rated {NO/PARTIAL} because: {validator's explanation}
## Investigation Protocol
{Protocol A/B/C/D/E text, copied verbatim}
## Specific Target
{What to investigate — e.g.:
Python: "Read conftest.py, 3 representative test files, pytest.ini. Document fixture patterns, mocking strategies, coverage configuration, marker usage."
TypeScript: "Read jest.config.ts/vitest.config.ts, 3 representative .test.ts files, test setup files. Document test runner config, mocking patterns, fixture strategies."}
## Evidence Standards
- [FACT]: Read specific code, cite file:line. Example: "retries 3x (payments.ts:89)" or "retries 3x (stripe.py:89)"
- [PATTERN]: Observed in >=3 examples, state count. Example: "DI via __init__ (12/14 services)"
- [ABSENCE]: Searched and confirmed non-existence. Example: "No rate limiting (grep — 0 hits)"
## Output
Write findings to: .deep_crawl/findings/{category}/gap_{name}.md
When done: touch .deep_crawl/batch_status/gap_{name}.done
Step R3: Wait for completion. Monitor sentinel files:
ls .deep_crawl/batch_status/gap_*.done 2>/dev/null | wc -l
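The count above is a one-shot check. If the orchestrator needs to block until the gap agents finish, a polling loop with a timeout is one option. This is a sketch only: the function name `wait_for_gap_sentinels`, the one-second poll interval, and the 600-second default timeout are all assumptions.

```shell
# Hypothetical polling loop: wait until at least $2 gap_*.done sentinels
# exist in directory $1, giving up after $3 seconds (default 600).
wait_for_gap_sentinels() {
  wg_dir=$1 wg_expected=$2 wg_timeout=${3:-600}
  wg_waited=0
  while :; do
    wg_done=$(( $(ls "$wg_dir"/gap_*.done 2>/dev/null | wc -l) ))
    [ "$wg_done" -ge "$wg_expected" ] && return 0
    [ "$wg_waited" -ge "$wg_timeout" ] && return 1   # timed out
    sleep 1
    wg_waited=$((wg_waited + 1))
  done
}
```

For example, `wait_for_gap_sentinels .deep_crawl/batch_status 3 || echo "gap agents timed out"` waits for three gap investigations.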
Step R4: Patch DEEP_ONBOARD.md. For each new finding:
Log to REFINE_LOG.md: Gap closure: Investigated {N} gaps, added {M} words to {K} sections
Step R5: Re-validate. Re-spawn the validator to check ONLY the previously-failed questions:
You are re-validating specific questions that previously failed.
## Questions to Re-check
{List of previously NO/PARTIAL questions}
## Input
- .deep_crawl/DEEP_ONBOARD.md (updated with gap investigation results)
- Codebase at {ROOT_PATH}
Append results to .deep_crawl/VALIDATION_REPORT.md under "## Gap Closure Re-validation"
When done: touch .deep_crawl/REVALIDATE.done
Step R6: Accept or deliver. If all re-checked questions are now YES, proceed to delivery. If any remain NO/PARTIAL after one investigation cycle, note in Gaps section and deliver. Maximum 1 gap-closure cycle to prevent infinite loops.
When re-investigating specific sections (rather than a full crawl), the same quality standards apply. The orchestrator MUST NOT assemble or validate manually.
Rules:
- >= 60% of the corresponding findings word count. If below 60%, the orchestrator must direct S6 to include more detail.
Partial re-investigation follows this sequence:
- wc -w > 500
Step 10: Restore CLAUDE.md files.
After all sub-agents (S1-S6, cross-referencer, validator) have completed:
ROOT_PATH="${DEEP_CRAWL_ROOT:-$(pwd)}"
[ -f "$ROOT_PATH/CLAUDE.md.assembly_save" ] && mv "$ROOT_PATH/CLAUDE.md.assembly_save" "$ROOT_PATH/CLAUDE.md"
PARENT_PATH=$(dirname "$ROOT_PATH")
[ -f "$PARENT_PATH/CLAUDE.md.assembly_save" ] && mv "$PARENT_PATH/CLAUDE.md.assembly_save" "$PARENT_PATH/CLAUDE.md"
If the Agent tool is unavailable, execute Phase 4 then Phase 5 inline (sequential fallback). Never interleave — complete Phase 4 fully before starting Phase 5.
If the Agent tool is unavailable, process findings in category groups sequentially:
- findings/traces/*.md → write critical_paths.md
- findings/modules/*.md → write module_index.md + domain_glossary.md
Write each section to disk before reading the next group. Do NOT read SYNTHESIS_INPUT.md monolithically; that is the failure mode this design prevents.
echo "## Phase 4: CROSS-REFERENCE — $(date -Iseconds)" >> "$CRAWL_LOG"
echo "$(date -Iseconds) PHASE 4 CROSSREF" >> "$ORCH_LOG"
This phase adds cross-references between independently-assembled sections. It may NOT delete, merge, summarize, or compress any content. Every operation is strictly additive.
Step 1: Measure.
wc -w .deep_crawl/DRAFT_ONBOARD.md | awk '{printf "~%d tokens\n", $1 * 1.3}'
Read min_tokens from .claude/skills/deep-crawl/configs/compression_targets.json. If below floor, the assembly agents dropped findings — re-spawn the undersized section agents with explicit "include every finding" instructions. Do NOT proceed until the floor is met.
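The floor comparison can be sketched with the same words-to-tokens ratio (1.3) as the measurement above. The floor is passed as an argument because the layout of compression_targets.json is not shown here. The function names are hypothetical:

```shell
# Estimate tokens in file $1 as words * 1.3, rounded down.
estimate_tokens() {
  et_words=$(( $(wc -w < "$1") ))
  echo $(( et_words * 13 / 10 ))
}

# Exit 0 when the estimate for file $1 meets the floor $2 (min_tokens).
meets_token_floor() {
  [ "$(estimate_tokens "$1")" -ge "$2" ]
}
```

A caller would read min_tokens however the config is stored, then run `meets_token_floor .deep_crawl/DRAFT_ONBOARD.md "$MIN_TOKENS"` before proceeding to Step 2.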
Log to .deep_crawl/REFINE_LOG.md:
## Phase 4 Cross-Reference Log
Step 1: PASS — draft is ~{N} tokens (floor: {min_tokens})
Step 2: Add cross-references between sections. The draft was assembled by 5-7 independent agents that could not reference each other's output. Add links:
- (see Module Index).
- (see Critical Path N).
- (see Module Index: {module}).
- (see Module Index) links.
- (see Gotcha #N) links.
- (see Data Contracts) links.
- (see Configuration Surface) links.
Step 2b: Normalize voice across sections. The draft was assembled by 5-7 independent agents with different writing styles. Scan for these inconsistencies and fix them (additive rewording only; do not remove content):
- [FACT] ({file}:{line}) format; some agents may use [FACT]: {file}:{line} or [FACT] {file}:{line}.
Log: Step 2: Added {N} cross-references ({M} gotcha clusters linked)
Step 3: Verify completeness. Two checks:
Log: Step 3: {N}/12 questions answerable, {gaps noted}
Step 4: Verify caching structure. Confirm stable sections before volatile sections.
Completion gate: REFINE_LOG.md must have entries for Steps 1-3. Word count of DEEP_ONBOARD.md must be >= word count of DRAFT_ONBOARD.md (cross-references only add words, never remove).
Write final version to .deep_crawl/DEEP_ONBOARD.md.
wc -w .deep_crawl/DRAFT_ONBOARD.md .deep_crawl/DEEP_ONBOARD.md
# DEEP_ONBOARD >= DRAFT_ONBOARD (strictly additive)
echo "## Phase 5: VALIDATE — $(date -Iseconds)" >> "$CRAWL_LOG"
echo "$(date -Iseconds) PHASE 5 VALIDATE" >> "$ORCH_LOG"
For each of the 12 standard questions, attempt to answer using ONLY DEEP_ONBOARD.md. Write each answer to VALIDATION_REPORT.md in this format:
### Q1. PURPOSE: What does this codebase do?
**Rating:** YES / NO / PARTIAL
**Answer:** [1-2 sentence answer derived from the document — proving you read it, not just grepped]
**Source section:** [which section answered this]
If rating is NO or PARTIAL, also include:
**Gap:** [1-sentence description of what's missing]
**Investigation needed:** [Protocol A/B/C/D/E] targeting [specific files or patterns]
**Expected output:** [what the investigation should produce]
Example:
### Q8. TESTING: What are the testing conventions?
**Rating:** PARTIAL
**Answer:** Conventions 14-20 cover test structure, organization, markers, fixtures, mocking, helpers, pytest config. But the document self-identifies testing as a gap.
**Source section:** Conventions sections 14-20, Gaps section
**Gap:** Fixture patterns and mocking strategies are surface-level only. No examples of how conftest.py layers fixtures or how tests mock LLM calls.
**Investigation needed:** Protocol D targeting conftest.py, tests/unit/agents/test_research_director.py, tests/unit/core/test_llm.py
**Expected output:** Detailed testing conventions with fixture hierarchy, LLM mocking patterns, database isolation strategies
TypeScript example:
### Q8. TESTING: What are the testing conventions?
**Rating:** PARTIAL
**Answer:** Testing section covers test runner, file patterns, and basic structure, but lacks fixture/mock detail.
**Source section:** Testing Conventions
**Gap:** Test setup patterns and mocking strategies are surface-level only. No examples of vitest/jest configuration or how tests mock external services.
**Investigation needed:** Protocol D targeting vitest.config.ts or jest.config.ts, 3 representative *.test.ts files, test setup files
**Expected output:** Detailed testing conventions with mock patterns, fixture strategies, test isolation approach
The 12 standard questions:
- Q1. PURPOSE: What does this codebase do?
- Q2. ENTRY: Where does a request/command enter the system?
- Q3. FLOW: What's the critical path from input to output?
- Q4. HAZARDS: What files should I never read?
- Q5. ERRORS: What happens when the main operation fails?
- Q6. EXTERNAL: What external systems does this talk to?
- Q7. STATE: What shared state exists that could cause bugs?
- Q8. TESTING: What are the testing conventions?
- Q9. GOTCHAS: What are the 3 most counterintuitive things?
- Q10. EXTENSION: If I need to add a new [primary entity], where do I start?
- Q11. IMPACT: If I change the most-connected module, what files are affected?
- Q12. BOOTSTRAP: How do I set up a dev environment and run tests from scratch?
If any question is NO or PARTIAL, fix the gap in DEEP_ONBOARD.md before continuing.
| Metric | Target | Actual |
|---|---|---|
| Subsystems with >= 1 documented module | 100% | {N}/{T} |
| Xray pillars in Module Index | 100% | {N}/{T} |
| Entry points with traces | 100% | {N}/{T} |
| Cross-cutting concerns from crawl plan | 100% | {N}/{T} |
| Module Index entries vs core files | >= 25% | {N}/{T} |
If any metric is below target, return to Phase 3 findings and add content.
Select 10 [FACT] claims from DEEP_ONBOARD.md. Priority: Gotchas first, then Critical Paths. For each claim, read the referenced file:line in the actual codebase and verify accuracy.
Write each check to VALIDATION_REPORT.md:
### Spot Check {N}
**Claim:** "{quoted claim}" ({file}:{line})
**Actual code:** {what the code actually says}
**Verdict:** CONFIRMED / INACCURATE / STALE
**Action:** {none / corrected in document / noted as gap}
Identity verification (mandatory, not counted toward 10 spot checks): Read the Identity section. For each technology/framework mentioned in the stack description, verify it is actively used (not just imported) by checking for: mounted routes (web frameworks), registered commands (CLI frameworks), configured connections (databases), active service endpoints. If a framework is mentioned but not actively used, flag as INACCURATE.
For each section: "Is this content literally duplicated from another section in this document?" If yes, cut the duplicate. Do NOT cut synthesized information just because the raw data is grepable — the synthesis (cross-module patterns, behavioral descriptions, contextual annotations) is the value.
WARNING: Do NOT flag content as "redundant" because it could be derived from source code. The purpose of this document is to save agents from reading source. Content is redundant ONLY if literally duplicated within this document.
Step 1: Determine the most common modification task for this domain:
Step 2: Using ONLY DEEP_ONBOARD.md, write a concrete 5-step implementation plan.
Write the plan to VALIDATION_REPORT.md under ### Adversarial Simulation.
Step 3: Read actual codebase files to verify each step would produce correct code. For each step, note whether the document gave correct, incorrect, or missing guidance. If incorrect or missing, include:
**Missing info:** [what the document should have said]
**Source module:** [which module contains the correct information]
**Investigation needed:** Protocol B targeting {module path}
Step 4: Score: PASS (5/5 correct), PARTIAL (3-4/5), FAIL (<=2/5). If PARTIAL or FAIL, identify what's missing from DEEP_ONBOARD.md and add it.
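The scoring rule in Step 4 maps directly onto a small function. This is only a sketch; `score_adversarial` is a hypothetical name and takes the number of plan steps (out of 5) for which the document gave correct guidance.

```shell
# score_adversarial CORRECT_STEPS  (hypothetical helper)
# Maps the number of correct steps (0-5) to PASS / PARTIAL / FAIL.
score_adversarial() {
  local correct="$1"
  if [ "$correct" -ge 5 ]; then
    echo "PASS"
  elif [ "$correct" -ge 3 ]; then
    echo "PARTIAL"
  else
    echo "FAIL"
  fi
}

score_adversarial 4   # prints PARTIAL
```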
Confirm stable sections (Identity, Critical Paths, Module Index) come before volatile sections (Gotchas, Gaps). This maximizes prompt cache prefix hits.
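The ordering check can be automated along these lines. The sketch assumes each section is introduced by a heading that reads exactly `## Identity`, `## Critical Paths`, and so on; the real template may number or name its headings differently, and both function names are hypothetical.

```shell
# heading_line HEADING FILE  (hypothetical helper)
# Prints the line number of the first line that is exactly "## HEADING", or nothing.
heading_line() {
  awk -v h="## $1" '$0 == h { print NR; exit }' "$2"
}

# check_section_order FILE  (hypothetical helper)
# Succeeds when every stable section exists and no volatile section precedes one.
check_section_order() {
  local doc="$1" last_stable=0 line name
  for name in "Identity" "Critical Paths" "Module Index"; do
    line=$(heading_line "$name" "$doc")
    if [ -z "$line" ]; then
      echo "missing section: $name" >&2
      return 1
    fi
    # Track the position of the last stable heading seen.
    if [ "$line" -gt "$last_stable" ]; then
      last_stable="$line"
    fi
  done
  for name in "Gotchas" "Gaps"; do
    line=$(heading_line "$name" "$doc")
    # Assumption: a volatile section may legitimately be absent.
    [ -n "$line" ] || continue
    if [ "$line" -lt "$last_stable" ]; then
      echo "volatile section '$name' precedes a stable section" >&2
      return 1
    fi
  done
  return 0
}
```

Running `check_section_order .deep_crawl/DEEP_ONBOARD.md` before delivery would catch a volatile section that has drifted into the cacheable prefix.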
Phase 5 completion gate: VALIDATION_REPORT.md must contain all of:
# Verify report completeness
grep -c "^### Q[0-9]" .deep_crawl/VALIDATION_REPORT.md # expect 12
grep -c "^### Spot Check" .deep_crawl/VALIDATION_REPORT.md # expect 10
grep -c "Adversarial Simulation" .deep_crawl/VALIDATION_REPORT.md # expect >= 1
Write validation results to .deep_crawl/VALIDATION_REPORT.md.
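The three counts in the completion gate only print numbers; they do not stop the pipeline. One way to turn them into a hard gate is sketched below, using the same patterns and expected values. `check_phase5_gate` is a hypothetical name, not something the skill defines.

```shell
# check_phase5_gate REPORT  (hypothetical helper)
# Fails with a message on stderr unless the report has 12 question entries,
# 10 spot checks, and at least one Adversarial Simulation mention.
check_phase5_gate() {
  local report="$1" questions spot_checks adversarial
  [ -f "$report" ] || { echo "gate: $report not found" >&2; return 1; }
  # grep -c prints 0 but exits non-zero on zero matches, hence "|| true".
  questions=$(grep -c '^### Q[0-9]' "$report" || true)
  spot_checks=$(grep -c '^### Spot Check' "$report" || true)
  adversarial=$(grep -c 'Adversarial Simulation' "$report" || true)
  [ "$questions" -eq 12 ] || { echo "gate: $questions/12 question entries" >&2; return 1; }
  [ "$spot_checks" -eq 10 ] || { echo "gate: $spot_checks/10 spot checks" >&2; return 1; }
  [ "$adversarial" -ge 1 ] || { echo "gate: no Adversarial Simulation section" >&2; return 1; }
}
```

A call such as `check_phase5_gate .deep_crawl/VALIDATION_REPORT.md || exit 1` placed before Phase 6 would keep an incomplete report from being delivered.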
echo "## Phase 6: DELIVER — $(date -Iseconds)" >> "$CRAWL_LOG"
echo "$(date -Iseconds) PHASE 6 DELIVER output=$OUTPUT_DIR/" >> "$ORCH_LOG"
Step 1: Deliver to unified output directory (output/{repo-name}/):
# OUTPUT_DIR was set in Phase 0: output/{repo-name}
mkdir -p "$OUTPUT_DIR/data"
# Final reports at output root
cp .deep_crawl/DEEP_ONBOARD.md "$OUTPUT_DIR/deep_onboard.md"
# xray.md and xray.json are already in OUTPUT_DIR from the xray run
# All intermediate data under data/
for f in CRAWL_PLAN.md SYNTHESIS_INPUT.md DRAFT_ONBOARD.md VALIDATION_REPORT.md \
         PREFLIGHT.md REFINE_LOG.md; do
  cp ".deep_crawl/$f" "$OUTPUT_DIR/data/" 2>/dev/null
done
cp -r .deep_crawl/findings "$OUTPUT_DIR/data/" 2>/dev/null
cp -r .deep_crawl/sections "$OUTPUT_DIR/data/" 2>/dev/null
cp -r .deep_crawl/batch_status "$OUTPUT_DIR/data/" 2>/dev/null
cp -r .deep_crawl/agent_logs "$OUTPUT_DIR/data/" 2>/dev/null
echo "Output delivered to: $OUTPUT_DIR/"
echo " $OUTPUT_DIR/deep_onboard.md"
echo " $OUTPUT_DIR/xray.md"
echo " $OUTPUT_DIR/data/agent_logs/ (agent prompts, results, orchestrator.log)"
echo " $OUTPUT_DIR/data/crawl.log (full pipeline debug log)"
echo " $OUTPUT_DIR/data/ (all intermediate artifacts)"
# Log final delivery stats
echo "## Delivery" >> "$CRAWL_LOG"
echo " Final document: $(wc -w < .deep_crawl/DEEP_ONBOARD.md 2>/dev/null || echo 0) words" >> "$CRAWL_LOG"
echo " Total findings: $(ls .deep_crawl/findings/*/*.md 2>/dev/null | wc -l) files" >> "$CRAWL_LOG"
echo " Total sections: $(ls .deep_crawl/sections/*.md 2>/dev/null | wc -l) files" >> "$CRAWL_LOG"
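# grep -c prints 0 and exits non-zero on zero matches, so "|| echo 0" would log the count twice; count occurrences instead
echo " FACT citations: $(grep -o '\[FACT' .deep_crawl/DEEP_ONBOARD.md 2>/dev/null | wc -l)" >> "$CRAWL_LOG"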
echo "Completed: $(date -Iseconds)" >> "$CRAWL_LOG"
Step 2: Configure CLAUDE.md for automatic delivery (local repos only — skip for remote):
if [ "${DEEP_CRAWL_MODE:-local}" = "remote" ]; then
echo "Skipping CLAUDE.md update (remote repo — read-only analysis)"
else
ONBOARD_PATH="$OUTPUT_DIR/deep_onboard.md"
if [ -f CLAUDE.md ]; then
grep -q "DEEP_ONBOARD" CLAUDE.md || cat >> CLAUDE.md << ONBOARD_EOF
# Codebase Onboarding
Read $ONBOARD_PATH before starting any task. It contains verified behavioral documentation, critical paths, gotchas, and conventions for this codebase.
## Onboarding Document Change Tracking
If you modify code that may affect claims in $ONBOARD_PATH, append to $OUTPUT_DIR/data/.onboard_changes.log:
{ISO_TIMESTAMP} | {FILE:LINE} | {SECTION_PATH} | {BRIEF_DESCRIPTION}
Section path uses document headings: \`{## Section}\` or \`{## Section} / {### Subsection}\`.
Do not manually edit DEEP_ONBOARD.md — it is a generated artifact.
ONBOARD_EOF
else
cat > CLAUDE.md << CLEOF
# Project Instructions
## Codebase Onboarding
Read $ONBOARD_PATH before starting any task. It contains verified behavioral documentation, critical paths, gotchas, and conventions for this codebase.
## Onboarding Document Change Tracking
If you modify code that may affect claims in $ONBOARD_PATH, append to $OUTPUT_DIR/data/.onboard_changes.log:
{ISO_TIMESTAMP} | {FILE:LINE} | {SECTION_PATH} | {BRIEF_DESCRIPTION}
Section path uses document headings: \`{## Section}\` or \`{## Section} / {### Subsection}\`.
Do not manually edit DEEP_ONBOARD.md — it is a generated artifact.
CLEOF
fi
fi
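The change-tracking line that the CLAUDE.md text asks downstream agents to append can be produced with a helper like the one below. This is a sketch under assumptions: `log_onboard_change` is a hypothetical name, and the timestamp uses a portable UTC format rather than `date -Iseconds`, which is not available in every `date` implementation.

```shell
# log_onboard_change LOG FILE:LINE SECTION_PATH DESCRIPTION  (hypothetical helper)
# Appends one entry in the form:
#   {ISO_TIMESTAMP} | {FILE:LINE} | {SECTION_PATH} | {BRIEF_DESCRIPTION}
log_onboard_change() {
  local log="$1" location="$2" section="$3" description="$4"
  # Create the log directory on first use.
  mkdir -p "$(dirname "$log")"
  printf '%s | %s | %s | %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$location" "$section" "$description" >> "$log"
}
```

An illustrative call, with made-up file and section values: `log_onboard_change "$OUTPUT_DIR/data/.onboard_changes.log" "src/app.py:42" "## Gotchas" "retry count changed"`.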
Step 3: Validation report is already in $OUTPUT_DIR/data/VALIDATION_REPORT.md from Step 1.
Step 4: Report to user:
Deep Crawl Complete
===================
Codebase: {files} files, ~{tokens} tokens
Document: ~{doc_tokens} tokens covering {tokens} token codebase
Crawl: {tasks_completed}/{tasks_planned} tasks
Questions answerable: {score}/12
Claims verified: {verified}/10
Adversarial test: {PASS/PARTIAL/FAIL}
Gotchas documented: {count}
Request traces: {count}
Delivered to: output/{repo-name}/deep_onboard.md
CLAUDE.md: {UPDATED/CREATED/SKIPPED (remote)} — document will auto-load in all sessions
Prompt caching: Active — subsequent sessions read at ~90% reduced cost
Clone: {remote only: .deep_crawl/repo/ — rm -rf when done}
Not all claims have the same epistemological status. Use these tags:
| Level | Tag | Standard | Example |
|---|---|---|---|
| Verified Fact | [FACT] | Read the specific code, confirmed at cited file:line | "payment_service retries 3x with exponential backoff (providers/stripe.py:89 or payments.ts:89)" |
| Verified Pattern | [PATTERN] | Observed in >=3 independent examples, state the count | "All service classes use DI via `__init__` (observed in 12/14 services)" |
| Verified Absence | [ABSENCE] | Searched for something expected, confirmed it doesn't exist | "No rate limiting found (grepped for rate_limit, throttle, slowapi — zero hits)" |
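A quick way to see how the three evidence tags are distributed in a finished document is to count them. The sketch below matches the opening of each tag (for example `[FACT`), mirroring the pattern used in the delivery log; `count_evidence_tags` is a hypothetical helper.

```shell
# count_evidence_tags DOC  (hypothetical helper)
# Prints one "TAG count" line for each evidence tag.
count_evidence_tags() {
  local doc="$1" tag
  for tag in FACT PATTERN ABSENCE; do
    # grep -o emits one line per occurrence, so several tags on one line are all counted.
    # -F treats "[TAG" as a fixed string rather than a bracket expression.
    printf '%s %s\n' "$tag" "$(grep -o -F "[$tag" "$doc" | wc -l | tr -d ' ')"
  done
}
```

Running it against `.deep_crawl/DEEP_ONBOARD.md` shows at a glance whether the document leans on verified facts or on patterns and absences.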
Rules:
No tests in the codebase:
No git history:
Remote repository:
- DEEP_CRAWL_ROOT points to .deep_crawl/repo/ (the clone directory)
- Output goes to output/{repo-name}/ (same structure as local crawls)
- gh repo clone handles auth via the user's gh credentials
- Run rm -rf .deep_crawl/repo/ when done

Monorepo (>2000 files):
Interrupted crawl:
Missing investigation_targets in xray output:
Sub-agent spawning fails:
Assembly sub-agent produces undersized section:
DEEP_ONBOARD.md is included in CLAUDE.md so that:
No token ceilings. Include everything that's not redundant with information derivable from file names and signatures. Let the content determine the size.
Downstream agents log section references to docs/.onboard_feedback.log. During refresh, this data informs section prioritization.
.claude/
├── agents/
│   ├── deep_crawl.md (sequential fallback agent)
│   └── deep_onboard_validator.md
└── skills/
    └── deep-crawl/
        ├── SKILL.md (this file: orchestrator instructions)
        ├── COMMANDS.md
        ├── configs/
        │   ├── generic_names.json
        │   ├── domain_profiles.json
        │   ├── compression_targets.json
        │   ├── quality_gates.json
        │   └── exemplar_templates.md
        └── templates/
            ├── DEEP_ONBOARD.md.template
            ├── CRAWL_PLAN.md.template
            └── VALIDATION_REPORT.md.template