| name | agent-inspect |
| description | Use this skill whenever the user asks to inspect, audit, review, assess, or evaluate a codebase/project for maintainability, extensibility, architecture, reliability, security, testing, engineering quality, or AI-agent/prompt/tooling risks. Also use this when invoked by `/inspect`. It performs a read-only, evidence-backed, multi-agent project inspection with complete file coverage, per-dimension scoring, blunt technical findings, and explicit degraded-mode disclosure. |
Agent Inspect
Run a read-only, multi-agent, multi-dimensional code inspection for the current project.
Core Requirements
- Inspect the current worktree and recent commit range first to build context.
- Use multiple subagents in parallel for read-only evaluation across different dimensions when the host platform supports it.
- Cover at least these dimensions:
- Architecture and module boundaries
- Maintainability
- Extensibility
- Reliability and error handling
- Security
- Testing and engineering quality
- AI-specific project risks (prompt/agent/tool/model/config/eval)
- The main thread must verify key evidence directly instead of only restating subagent conclusions.
- Do not modify code unless the user explicitly asks for changes afterward.
- Match the output language to the language used by the user in the current conversation.
Dimension Boundaries
Use these boundaries to reduce duplicate findings and make scorecard rationale specific:
| Dimension | Owns | May reference | Must hand off |
|---|---|---|---|
| Architecture and module boundaries | Module boundaries, dependency direction, layering, ownership, coupling across subsystems. | Maintainability symptoms caused by structural coupling; extensibility limits caused by boundary choices. | Local readability or style-only issues to Maintainability; runtime failure modes to Reliability; auth/data exposure to Security. |
| Maintainability | Readability, local complexity, naming, cohesion, reviewability, change isolation inside a module. | Architectural coupling when it explains why code is hard to change. | Public interface design to Architecture; missing regression coverage to Testing; AI prompt/tool-specific risks to AI-specific project risks. |
| Extensibility | Plugin/provider seams, configuration extension points, migration paths, ability to add variants without core rewrites. | Architecture and maintainability evidence when it affects extension cost. | Current runtime breakage to Reliability; security policy expansion to Security; test harness capability to Testing. |
| Reliability and error handling | Failure modes, fallback boundaries, retries, timeouts, observability, data-loss and partial-failure behavior. | Security or AI-tool failures when their user impact is reliability degradation. | Exploitability and trust boundaries to Security; prompt/model/eval ownership to AI-specific project risks; test adequacy to Testing. |
| Security | Authentication, authorization, secret handling, injection surfaces, data exposure, unsafe defaults, supply-chain risk. | Reliability evidence when a failure mode creates a security impact. | Non-exploitable resilience issues to Reliability; generic dependency hygiene to Engineering quality unless it creates a concrete security risk. |
| Testing and engineering quality | Test strategy, CI gates, reproducibility, lint/type/build checks, release hygiene, fixtures, coverage quality. | Any dimension's finding when tests would catch or prevent it. | Root product/design failure to the owning dimension; prompt/model eval risks to AI-specific project risks unless the issue is general CI wiring. |
| AI-specific project risks | Prompt/tool routing/model selection/eval/agent config/memory/context risks that are specific to AI-assisted systems. | Reliability, Security, and Testing findings when AI behavior is the cause or amplifier. | Non-AI fallback bugs to Reliability; non-AI auth/data exposure to Security; generic test gaps to Testing. Use cross_dimension_links instead of duplicate scoring when a finding spans dimensions. |
Agent Coordination
Prefer the host platform's native subagent mechanism launched from the primary command context for parallel inspection. Do not run /inspect itself as a subtask; /inspect is the coordinator.
- Do not interpret "multiple subagents" as an external collaboration system, external CLI orchestrator, or extra team protocol.
- Unless the user explicitly asks for squad or another external multi-agent tool, do not use them.
- If repository-local rules mention squad, collaboration agents, or another multi-agent framework, do not automatically switch to it; this skill means host-platform native subagents by default.
- If native host-platform subagents are unavailable, enter degraded mode instead of using squad as a silent substitute.
Use 7 parallel subagents by default, one per dimension. This keeps each subagent focused on a single concern and produces independent per-dimension scores for the Dimension Scorecard.
- Subagent-architecture: architecture and module boundaries
- Subagent-maintainability: maintainability
- Subagent-extensibility: extensibility
- Subagent-reliability: reliability and error handling
- Subagent-security: security
- Subagent-testing: testing and engineering quality
- Subagent-ai-risks: AI-specific project risks (prompt/agent/tool/model/config/eval)
Degradation Ladder
If the host platform cannot sustain 7 concurrent subagents reliably, degrade in ordered steps rather than dropping any dimension. Use these trigger conditions; do not invent a looser fallback:
- Start in 7-agent mode. If two or more subagents fail to launch, are rejected by quota, or remain queued / silent for more than 5 minutes, cancel the incomplete fan-out and retry in 5-agent mode.
- In 5-agent mode, if any subagent fails to launch, is rejected by quota, or remains queued / silent for more than 5 minutes, cancel the incomplete fan-out and retry in 3-agent mode.
- In 3-agent mode, if any subagent fails to launch, is rejected by quota, or remains queued / silent for more than 5 minutes, stop using subagents and enter single-thread degraded mode.
- If the host runtime has no reliable native subagent API at all, skip retries and enter single-thread degraded mode immediately.
- 5-agent mode: merge architecture, maintainability, and extensibility into one structural subagent. Keep reliability, security, testing, and AI-risks separate.
- 3-agent mode: fall back to a structural / reliability+security+AI-risks / testing split. Use only when 5-agent mode still fails.
- single-thread degraded mode: run the inspection in the main thread only. The Executive Summary must state `subagent count: 0, single-thread degraded`; every Dimension Scorecard entry must use `Scored by: main-thread-only degraded`; Residual Risks must explain that conclusions came from main-thread inspection instead of independent subagent review.
Never silently merge security, reliability, or AI-risks into another dimension. If platform limits force such a merge, state it in the Executive Summary and mark the affected Dimension Scorecard entries with the merge note. The current subagent count must be stated explicitly in the Executive Summary; do not pretend the default 7-agent inspection ran when it did not.
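The ladder above behaves like a small state machine. The following is a minimal sketch of it, not part of the skill itself: the names `GROUPS`, `FAILURE_THRESHOLD`, and `next_mode` are hypothetical, and "failed or stalled" is assumed to already include subagents that stayed queued or silent for more than 5 minutes.

```python
# Dimension groupings per mode, taken from the ladder description.
# The key is the subagent count; every mode still covers all 7 dimensions.
GROUPS = {
    7: [["architecture"], ["maintainability"], ["extensibility"],
        ["reliability"], ["security"], ["testing"], ["ai-risks"]],
    5: [["architecture", "maintainability", "extensibility"],
        ["reliability"], ["security"], ["testing"], ["ai-risks"]],
    3: [["architecture", "maintainability", "extensibility"],
        ["reliability", "security", "ai-risks"], ["testing"]],
}

LADDER = [7, 5, 3, 0]  # 0 means single-thread degraded mode

# Hypothetical table: number of failed/stalled subagents that forces a step down.
# 7-agent mode tolerates one failure; 5- and 3-agent modes tolerate none.
FAILURE_THRESHOLD = {7: 2, 5: 1, 3: 1}


def next_mode(current: int, failed_or_stalled: int,
              has_native_subagents: bool = True) -> int:
    """Return the subagent count to use for the next attempt."""
    if not has_native_subagents or current == 0:
        return 0  # no reliable subagent API, or already at the bottom rung
    if failed_or_stalled >= FAILURE_THRESHOLD[current]:
        return LADDER[LADDER.index(current) + 1]
    return current
```

Whatever mode the run ends in is the count that must appear in the Executive Summary.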
Main-thread responsibilities:
- Review the current worktree and recent commits first. Scan commit trailers for Co-authored-by: or Assisted-by: markers that indicate AI-authored regions, and tell the relevant subagents to apply extra correctness scrutiny in those files. Do not invent heuristics (comment density, emoji, style) to guess AI authorship; trust commit trailers only. Commit trailers identify candidates for extra AI-attribution scrutiny only; they do NOT restrict AI Project Focus. All files matching prompts/, agents/, skills/, commands/, .opencode/, .claude/, .codex/, or .agents/ paths must still receive AI-risk inspection regardless of trailer presence.
- Determine the inspection scope. Apply the default exclusions from Coverage Rules §3 (dependency trees, lock files, build output, binary assets) and list every excluded path or pattern at the top of the report. Every file in the remaining scope must be fully read by at least one coverage owner. Dimension subagents may reread any file needed for their dimension; do not block cross-dimension review just because another subagent already read the file.
- Maintain a coverage ledger during fan-out and fan-in: file -> reader(s) -> dimension(s). The ledger must show which subagent or main-thread pass fully read each in-scope file and which dimension used that file as direct evidence. Dimension Scorecard entries may cite only files directly read by that dimension subagent or explicitly verified by the main thread.
- If the repository is too large to cover completely within the host platform's context or turn budget, enter over-budget degraded mode:
  - identify core files first and inspect those completely
  - list every file that was NOT fully read in Residual Risks, with absolute paths, not summaries
  - do not silently sample or extrapolate
  - do not claim "complete evaluation" in the Executive Summary or Verdict while in this mode
- After subagents return, deduplicate findings across subagents on the triple (file, line range, category). When two subagents flag the same region from different angles (for example, the architecture subagent calls out structural debt in a file that the security subagent also flags for an auth gap), merge them into a single finding whose Recommendation covers both concerns and record the other subagent's view in cross_dimension_links, instead of emitting duplicate entries. Dimension-split subagents will overlap; an unmerged report dilutes signal.
- Assemble the Dimension Scorecard using the scores proposed by each subagent. After direct source verification, adjust any score that the evidence does not support. When the main-thread-verified score differs from the subagent's proposal, show both scores in the scorecard entry with a one-sentence reason; do not silently reconcile.
- After dedup and scorecard assembly, directly verify at least 3 classes of key evidence:
  - The source location of one high-severity finding
  - One configuration or script location related to testing or engineering quality
  - One finding you consider most debatable or most likely to be a false positive
- Critical and High findings in Top Findings must be main-thread-verified or cross-subagent-corroborated. They cannot be subagent-only.
- Medium findings reported by only one subagent may enter Top Findings only with Confidence: low and an explicit explanation of what was not main-thread verified. Prefer moving unverified Medium findings to Residual Risks. Low findings may be subagent-only, but their confidence must not exceed low.
- If main-thread verification conflicts with a subagent conclusion or score, explicitly report the conflict instead of silently resolving it.
- If some subagents are queued, fail, time out, or never return, do not pretend that a full parallel inspection completed. Enter subagent degraded mode and explicitly report:
  - Which subagents returned successfully
  - Which subagents did not return
  - Which conclusions mainly come from main-thread verification
  - Which dimensions were merged via the degradation ladder due to concurrency limits
- If the runtime does not support reliable native parallel subagents at all, continue with a read-only inspection only in single-thread degraded mode, and say so explicitly in the Executive Summary and Residual Risks.
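The dedup step in the responsibilities above can be sketched as a keyed merge. This is a minimal illustration under stated assumptions: the `Finding` shape and the `dedupe` helper are hypothetical, and the sketch merges only on an exact (file, line range, category) match. Deciding that two findings with overlapping ranges or different categories describe the same region remains a main-thread judgment call that this code does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    # Hypothetical minimal shape; the real report carries more fields.
    id: str
    file: str
    line_range: tuple[int, int]
    category: str
    recommendation: str
    cross_dimension_links: list[str] = field(default_factory=list)


def dedupe(findings: list[Finding]) -> list[Finding]:
    """Merge findings that share the exact (file, line range, category) triple.

    The first finding seen for a triple is kept (and mutated in place); later
    ones are folded into its recommendation and recorded as links.
    """
    merged: dict[tuple[str, tuple[int, int], str], Finding] = {}
    for f in findings:
        key = (f.file, f.line_range, f.category)
        kept = merged.get(key)
        if kept is None:
            merged[key] = f
            continue
        kept.recommendation = f"{kept.recommendation} Also: {f.recommendation}"
        kept.cross_dimension_links.append(f.id)
    return list(merged.values())
```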
Coverage Rules
The default is complete coverage, not sampling. Sampling silently omits code; this skill treats that as a failure mode, not a shortcut.
- Every file in the inspection scope must be fully read by at least one coverage owner. A coverage owner may be a dimension subagent or the main thread in degraded mode. Dimension subagents may reread overlapping files when their dimension requires direct evidence. No file is skipped because it is "too long". Files that do not fit in a single pass are split into overlapping ranges handled by the same coverage owner for continuity; do not summarize-then-infer over a range that was not directly read.
- A single pass should comfortably fit in the subagent's context without relying on long-context retrieval tricks. The precise token budget depends on the host platform and model; the rule is "read in full or split deliberately", not a fixed line count.
- Scope exclusions (list these at the top of the report; do not drop them silently):
  - dependency trees: node_modules/, vendor/, .venv/, target/, Pods/, third_party/
  - lock files: package-lock.json, yarn.lock, pnpm-lock.yaml, poetry.lock, Cargo.lock, Gemfile.lock, go.sum
  - build and generated output: dist/, build/, .next/, out/, .turbo/, coverage reports
  - binary assets, data fixtures, and media files
  - If the user passes an explicit include/exclude argument to /inspect, honor that over these defaults, and state the override at the top of the report.
- If the repository is too large to cover completely within the host platform's context or turn budget, enter over-budget degraded mode (see Main-thread responsibilities §4). In this mode, every file that was NOT fully read is listed with its absolute path in Residual Risks; the Executive Summary and Verdict must not claim "complete evaluation".
- Dimension Scorecard entries must be based on files actually read in full. Do not infer a dimension score from directory structure or file names alone; if a dimension has no directly-inspected evidence, say so in its scorecard entry.
- The final report must include a concise coverage ledger summary. Use the shape file -> reader(s) -> dimension(s). If the full ledger is too large for the report body, include a summary by directory and list the complete ledger as an artifact; do not omit ledger coverage entirely.
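One possible in-memory shape for the file -> reader(s) -> dimension(s) ledger is sketched below. The class and method names are hypothetical, and the sketch assumes repository-relative POSIX paths; it only illustrates how unread files and a per-directory summary fall out of the same structure.

```python
from collections import defaultdict
from posixpath import dirname


class CoverageLedger:
    """Tracks file -> reader(s) -> dimension(s) for one inspection run."""

    def __init__(self, in_scope):
        self.in_scope = set(in_scope)
        self.readers = defaultdict(set)     # file -> owners that fully read it
        self.dimensions = defaultdict(set)  # file -> dimensions citing it

    def record_full_read(self, path, reader):
        self.readers[path].add(reader)

    def record_evidence(self, path, dimension, reader):
        # A dimension may cite a file only if the citing reader fully read it.
        if reader not in self.readers.get(path, set()):
            raise ValueError(f"{reader} has not fully read {path}")
        self.dimensions[path].add(dimension)

    def unread(self):
        """In-scope files with no coverage owner; these go to Residual Risks."""
        return sorted(p for p in self.in_scope if not self.readers.get(p))

    def summary_by_directory(self):
        """Map each directory to (files fully read, files in scope)."""
        totals = defaultdict(lambda: [0, 0])
        for path in self.in_scope:
            bucket = totals[dirname(path) or "."]
            bucket[1] += 1
            if self.readers.get(path):
                bucket[0] += 1
        return {d: tuple(counts) for d, counts in totals.items()}
```

A non-empty `unread()` result means the run is in over-budget degraded mode and must not claim complete evaluation.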
AI Project Focus
When inspecting AI-assisted software projects, prioritize these common locations:
- prompts/, agents/, skills/, commands/, .opencode/, .claude/, .codex/, .agents/
- Files whose names include model, tool, provider, prompt, eval, memory, or context
- Modules responsible for agent orchestration, tool routing, fallback, retry, or config loading
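The first two location rules above are path-based and can be approximated mechanically. The sketch below is an illustration only: `is_ai_focus_path` is a hypothetical helper, and its substring match on file names is deliberately loose, so it will over-match names such as "toolbar". The third rule (orchestration, routing, fallback, retry, config loading) depends on what a module does and cannot be decided from the path.

```python
from pathlib import PurePosixPath

# Directory names and file-name hints taken from the list above.
AI_DIRS = {"prompts", "agents", "skills", "commands",
           ".opencode", ".claude", ".codex", ".agents"}
AI_NAME_HINTS = ("model", "tool", "provider", "prompt", "eval", "memory", "context")


def is_ai_focus_path(path: str) -> bool:
    """Return True if a repository-relative path is a candidate for AI-risk review."""
    p = PurePosixPath(path)
    # Any ancestor directory named like an AI config/prompt location qualifies.
    if any(part in AI_DIRS for part in p.parts[:-1]):
        return True
    # Otherwise fall back to a loose substring match on the file name.
    name = p.name.lower()
    return any(hint in name for hint in AI_NAME_HINTS)
```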
Explicitly check:
- Whether prompt / agent logic is maintainable
- Whether tool invocation and model selection are coupled too tightly
- Whether error recovery, fallback, and retry boundaries are clear
- Whether configs, rules, skills, and commands are easy to extend and migrate
- Whether there is context pollution, implicit state, or hard-to-verify behavior
- Whether verification paths are strong enough to support reliable delivery after AI-assisted changes
Review Tone
Use a blunt, kernel-style technical review voice: direct, specific, evidence-backed, and intolerant of vague reasoning.
- Criticize code, architecture, tests, and failure modes directly.
- Never attack the author, their ability, intent, or character.
- Avoid vague praise and vague criticism.
- Every strong claim must include concrete evidence.
- Prefer "this fails because..." over "this could be improved".
- If a change mixes unrelated work, say that it is not honestly reviewable and should be split.
- If tests only prove mocks, happy paths, or implementation details, call that out directly.
- If error handling hides bugs, say that it hides bugs.
- Do not soften real problems with filler like "maybe", "possibly", or "nice work overall" unless the uncertainty is real and explicitly justified.
- Keep the tone professional enough for an open-source review: blunt about the patch, not rude to people.
Visual Markers
Use lightweight visual markers to make the report scannable, without replacing evidence.
Marker conventions:
- ✅ Confirmed strength or verified positive signal
- ❌ Confirmed issue or broken behavior
- ⚠️ Risk, caveat, or incomplete evidence
- 🟢 Low risk / healthy area
- 🟡 Medium risk / needs attention
- 🔴 High risk / urgent or structurally dangerous
Rules:
- Markers are scannability aids, not required data fields, and they never replace required finding fields or evidence.
- The Severity field already contains the authoritative severity value: Critical / High / Medium / Low / Info. Emoji markers are optional additions.
- Marker absence is not a format failure.
- When markers are used, put them in section headers, finding bullets, or short status labels.
- Do not put emoji in every sentence.
- Do not let emoji replace file paths, line numbers, or technical explanation.
- Severity markers must match the finding severity. Do not label a minor nit as 🔴.
- Keep the report readable in plain text terminals.
Output Format
The final report must use this structure:
Executive Summary
Summarize the overall project quality in 3-6 sentences. State the subagent count (default 7; if degraded, say so here), whether complete coverage was achieved or over-budget degraded mode was entered, and reference the lowest-scored dimension(s) from the Dimension Scorecard.
Dimension Scorecard
Every dimension in Core Requirements §3 must appear here with a score. The scale is five named levels; do not use decimals and do not use arbitrary numbers outside this scale.
- 🔴 1 - Critical debt: this dimension is actively harming the project; new work on top of it is difficult or unsafe.
- 🟠 2 - Significant debt: multiple structural issues; delivery speed is measurably slowed.
- 🟡 3 - Functional but fragile: works today, but one bad change away from real trouble.
- 🟢 4 - Healthy: solid practice; minor refinements only.
- ✅ 5 - Exemplary: reference-quality; worth copying elsewhere.
Each scorecard entry must contain:
- Dimension name
- Score (1-5, named level)
- Rationale: one or two sentences citing the finding ID or positive evidence ID that triggered the score (e.g. H-1, M-2, P-1). Do not rely on broad adjectives such as "healthy" or "fragile" without the triggering evidence ID.
- Scored by: the responsible subagent, followed by main-thread-verified or main-thread-override. When the main thread overrode the subagent's proposal, show both scores (e.g. Subagent-reliability proposed 🟡 3; main-thread override to 🟠 2) with a one-sentence reason. Do not silently reconcile.
Scoring rules:
- A score of 🔴 1 or ✅ 5 in any dimension requires at least two distinct findings or source locations as evidence, to guard against over-reaction in either direction.
- Do not assign ✅ 5 to a dimension that has any open finding at High or Critical severity within it.
- Do not assign 🔴 1 to a dimension that has no open finding at High or Critical severity cited in its rationale. If you claim critical debt, show the bug; this is symmetric with rule 2.
- If a dimension was merged into another due to the Degradation Ladder, the merged entry must say so (e.g. `merged into Subagent-structural; see Executive Summary`).
- If over-budget degraded mode limited coverage in a dimension, the entry must state which files were inspected in full and mark the score as based on partial coverage.
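The first three scoring rules above are hard constraints and can be checked before the scorecard is emitted. The following is a rough sketch under assumptions: `score_violations` is a hypothetical helper, `open_severities` is assumed to list the severities of open findings cited for the dimension, and `evidence_count` is assumed to count distinct findings or source locations.

```python
def score_violations(score: int, open_severities: list[str],
                     evidence_count: int) -> list[str]:
    """Return the scoring rules a proposed dimension score would break."""
    # The scale has five integer levels; decimals and other numbers are invalid.
    if not isinstance(score, int) or not 1 <= score <= 5:
        return ["score must be an integer from 1 to 5"]

    problems = []
    has_high_or_critical = any(s in ("High", "Critical") for s in open_severities)

    # Extreme scores need at least two distinct findings or source locations.
    if score in (1, 5) and evidence_count < 2:
        problems.append("a score of 1 or 5 needs at least two pieces of evidence")
    # A top score is incompatible with any open High or Critical finding.
    if score == 5 and has_high_or_critical:
        problems.append("5 is not allowed with an open High or Critical finding")
    # A bottom score must be backed by a cited High or Critical finding.
    if score == 1 and not has_high_or_critical:
        problems.append("1 requires an open High or Critical finding")
    return problems
```

An empty list means only that the score passes these three checks; it does not replace the rationale or the main-thread verification.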
Score Determination Matrix:
- ✅ 5 - Exemplary: complete coverage; no open finding in the dimension; at least two reproducible positive evidence IDs (for example P-1, P-2) showing mechanisms worth copying.
- 🟢 4 - Healthy: complete coverage; no High or Critical finding in the dimension; every Medium finding affecting the score is main-thread-verified or cross-subagent-corroborated.
- 🟡 3 - Functional but fragile: complete coverage with at least one unverified score-affecting Medium finding, or partial coverage that is explicitly disclosed in Executive Summary / Residual Risks and does not hide High or Critical evidence.
- 🟠 2 - Significant debt: two or more High findings in the dimension, one High finding crossing a core path, or the same anti-pattern repeated across multiple files.
- 🔴 1 - Critical debt: any blocking Critical finding, or coverage severely missing and undisclosed for the dimension.
Cross-impact rule: the same finding may affect multiple dimensions, but each affected scorecard rationale must explain a different failure mode. Do not copy the same rationale across dimensions; use cross_dimension_links to connect the shared evidence.
Top Findings
Order findings by severity:
- Critical
- High
- Medium
- Low
- Info
Each finding must include the following fields:
- ID: a short identifier such as H-1 or M-2, so the Verdict, Residual Risks, or other findings can reference it.
- Title: one-line problem statement.
- Severity: Critical / High / Medium / Low / Info.
- Blocking: `yes` if the problem should block merge or release as-is, `no` otherwise. Severity and blocking are related but not identical: a High finding can be non-blocking if well scoped, and a Medium can be blocking if it masks data loss or hides real bugs behind broad fallback.
- Location: file path + concrete line range.
- Evidence: quote 3-15 lines of the actual source that triggers the finding, with the file path and line range repeated above the snippet. If the finding is about a missing behavior, quote the closest relevant source and explain what is not there. Prose summaries alone are not evidence. This is the single strongest defense against hallucinated findings: the reader must be able to verify the claim against real code without opening the repository.
- Why it matters: the concrete risk or impact. Do not soften with "this could be improved"; state what fails and under what condition.
- Recommendation: the specific fix direction. If multiple fixes are viable, list them in order of preference. A finding without a recommendation is an observation, not an audit result.
- Confidence: high / medium / low. Use low for inferential findings that could not be directly verified; do not promote inference to certainty. Medium findings that remain subagent-only must use low.
- Verification: one of main-thread-verified (the main thread opened the source and confirmed), cross-subagent-corroborated (two or more subagents independently reported the same issue), or subagent-only (a single subagent reported it and the main thread did not re-check). Critical and High findings cannot use subagent-only. Medium subagent-only findings should usually move to Residual Risks; if kept in Top Findings, they must use Confidence: low and name the missing verification. This pushes the SKILL's existing "main thread verifies ≥3 classes" discipline down to per-finding transparency. In the default 7-agent mode (one subagent per dimension), cross-subagent-corroborated specifically means the issue genuinely spans dimensions, which is a stronger signal than in 3-agent mode, where overlapping dimensions make corroboration routine.
- CWE (optional, security findings only): a CWE ID when the finding maps cleanly to CWE Top 25. Skip rather than guess; incorrect CWE IDs are worse than none.
- cross_dimension_links (optional): IDs of related findings from other subagents that were merged into or overlap with this one.
Do not emit CVSS vectors. LLM-generated CVSS without runtime, exploit, and deployment context produces plausible-looking but incorrect scores, which is the specific failure mode this SKILL is written to avoid.
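The interaction between the Severity, Confidence, and Verification fields can be expressed as a small validator. This is a minimal sketch, not a mandated implementation: the `Finding` dataclass and `verification_violations` are hypothetical names, only three of the required fields are modeled, and the Low-severity rule is read here as applying to subagent-only findings.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical subset of the required fields, enough for the checks below.
    id: str
    severity: str      # Critical / High / Medium / Low / Info
    confidence: str    # high / medium / low
    verification: str  # main-thread-verified / cross-subagent-corroborated / subagent-only


def verification_violations(f: Finding) -> list[str]:
    """Return the provenance rules a Top Findings entry would break."""
    if f.verification != "subagent-only":
        return []  # verified or corroborated findings pass these checks
    if f.severity in ("Critical", "High"):
        return [f"{f.id}: {f.severity} findings cannot be subagent-only"]
    if f.severity in ("Medium", "Low") and f.confidence != "low":
        return [f"{f.id}: subagent-only {f.severity} findings must use Confidence: low"]
    return []
```

A Medium finding that passes this check may still be better placed in Residual Risks, as the Verification field description says.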
Strengths
List what the project is doing well. Do not only look for negatives.
Residual Risks
List risks that are not fully proven yet but still worth further inspection.
If the inspection ran in degraded mode, explicitly describe evidence-coverage limitations here.
Verdict
Give a concise final judgment:
- Whether the project is worth continuing on its current path
- The top 1-3 priorities to address next
Execution Notes
- Default mode is complete coverage. Only enter over-budget degraded mode when the repository genuinely cannot fit within the host platform's budget, and disclose it per Coverage Rules §4. Do not "prioritize" by silently dropping files.
- Within each finding, evidence comes before broad interpretation. Avoid generic evaluation.
- If no serious issues are found, say so explicitly instead of inventing problems. A Dimension Scorecard full of 🟢 4 and ✅ 5 is a valid outcome when the evidence supports it; see the two-finding rule for ✅ 5.
- If the user passes extra context, treat it as an additional focus area without narrowing the default audit dimensions.
- Keep the output language aligned with the user's language unless the user explicitly requests another language.
- If over-budget degraded mode was entered, say so in the Executive Summary and list un-read files in Residual Risks. The per-finding Verification field already records evidence provenance; the Executive Summary note is about coverage, not per-finding trust.
- If conclusions mainly come from main-thread verification instead of multiple successful subagent returns, say so explicitly. Subagent degraded mode and over-budget degraded mode are independent: you may be in one, both, or neither.