quality

Stars: 10

Forks: 1

Updated: May 27, 2026 at 02:02

Code quality analysis through cleanup analysis, multi-perspective reviews, and explicit finding decisions.

Source

Yuvasee

Yuvasee/samocode

SKILL.md

Quality

Analyze code quality through different lenses: cleanup analysis for technical debt and multi-perspective reviews for holistic assessment.

Important Finding Policy

Documenting a blocking or important finding does not complete it. Each one must receive an explicit decision:

| Decision | Required evidence |
|---|---|
| Fix now | Link or describe the code/test change that will resolve it |
| Defer | Ticket/link or concrete follow-up owner/reason |
| Reject | Evidence that the finding is false, inapplicable, or intentionally accepted |

Do not mark quality work done while blocking or important findings remain undecided.
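The policy can be sketched as a small completion gate. This is an illustrative sketch only: the `DecisionRow` type, the `unresolved` helper, and the ticket name are hypothetical and are not part of the skill itself.

```python
from dataclasses import dataclass

# Decisions that can close a finding, per the policy table.
CLOSING_DECISIONS = {"fix now", "defer", "reject"}


@dataclass
class DecisionRow:
    id: str
    severity: str            # "blocking", "important", or "suggestion"
    decision: str = "undecided"
    evidence: str = ""       # link, ticket, owner/reason, or rejection evidence


def unresolved(rows):
    """Return IDs of blocking/important findings that still block completion."""
    pending = []
    for row in rows:
        if row.severity not in ("blocking", "important"):
            continue  # suggestions never block completion
        decided = row.decision.lower() in CLOSING_DECISIONS
        # A decision label alone is not enough; it needs supporting evidence.
        if not decided or not row.evidence.strip():
            pending.append(row.id)
    return pending


rows = [
    DecisionRow("Q-001", "important", "defer", "TICKET-123 (hypothetical)"),
    DecisionRow("Q-002", "blocking"),              # never decided
    DecisionRow("Q-003", "important", "reject"),   # decided, but no evidence
    DecisionRow("Q-004", "suggestion"),            # outside the policy
]
print(unresolved(rows))  # ['Q-002', 'Q-003']
```

Quality work would only be reported as done when `unresolved` returns an empty list.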

Review Debt Ledger

Maintain _review_debt.md for important review debt. Use the active session path when known; otherwise use the original invocation directory. Never write the ledger into a temporary review worktree that will be removed.

Ledger rows:

| ID | Source | Severity | Finding | Location | Decision | Evidence / Ticket | Status |
|---|---|---|---|---|---|---|---|
| Q-001 | quality/multi-review | important | [title] | path:line | undecided |  | open |

Rules:

- Add every validated blocking or important finding unless it is fixed during the same quality loop.
- Keep Status as open until the finding is fixed, deferred with ticket/reason, or rejected with evidence.
- Preserve existing rows and update them in place when a decision is made.
- Suggestions and observations do not go into _review_debt.md unless the human explicitly asks to track them.
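As a rough sketch of the "preserve existing rows and update them in place" rule, the helper below rewrites one row of the ledger table by ID and appends it when the ID is new. The function names are hypothetical, the commit reference is a placeholder, and the parsing assumes no cell contains a literal `|`.

```python
HEADER = "| ID | Source | Severity | Finding | Location | Decision | Evidence / Ticket | Status |"
DIVIDER = "|---|---|---|---|---|---|---|---|"


def format_row(cells):
    """Render a list of cell strings as one markdown table row."""
    return "| " + " | ".join(cells) + " |"


def upsert_row(ledger_text, cells):
    """Update the row whose ID equals cells[0] in place, or append a new row."""
    lines = ledger_text.splitlines() if ledger_text.strip() else [HEADER, DIVIDER]
    new_line = format_row(cells)
    for i, line in enumerate(lines):
        parts = [p.strip() for p in line.strip().strip("|").split("|")]
        if parts and parts[0] == cells[0]:
            lines[i] = new_line  # keep the row's position; only this row changes
            return "\n".join(lines) + "\n"
    lines.append(new_line)
    return "\n".join(lines) + "\n"


# Start an empty ledger, then record a decision on the same finding.
ledger = upsert_row("", ["Q-001", "quality/multi-review", "important",
                         "[title]", "path:line", "undecided", "", "open"])
ledger = upsert_row(ledger, ["Q-001", "quality/multi-review", "important",
                             "[title]", "path:line", "fix now",
                             "commit abc123 (hypothetical)", "fixed"])
print(ledger)
```

The second call replaces the existing `Q-001` row rather than adding a duplicate, so the ledger keeps one row per finding.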

Requirements

None (both actions work independently)

Actions

cleanup - Code Quality Analysis

Analyze changed code for quality issues and technical debt.

Scope: $ARGUMENTS (or current branch vs main if not specified)

Steps

Determine scope:
- If $ARGUMENTS specifies files/scope, use that
- Otherwise: git diff origin/main...HEAD from current directory
Analyze for issues:
- Dead code - exported but never used functions/classes/constants
- Duplicate code - similar logic in multiple places
- Unclear patterns - confusing imports, magic numbers, missing comments
- Inconsistencies - same thing done differently in different files
- Type safety - missing or incorrect type hints
- Complexity - overly complex patterns that could be simplified
- Documentation - missing docstrings, unclear parameter purposes
- TODOs/FIXMEs - should they be tracked issues instead?
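One of these checks, the TODO/FIXME scan, lends itself to a mechanical first pass. The sketch below is one possible approach, not the skill's own implementation: it walks a unified diff (such as the output of `git diff origin/main...HEAD`) and reports markers on added lines only. The `find_markers` name and the sample diff are hypothetical, and the parser is deliberately simple, so added or removed content that itself begins with `+++ ` or `--- ` would be misread as a file header.

```python
import re

MARKER = re.compile(r"\b(TODO|FIXME)\b")
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def find_markers(diff_text):
    """Return (file, new_line_number, text) for TODO/FIXME on added lines."""
    results = []
    current_file, new_line = None, 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/path" names the post-change file.
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
            continue
        if line.startswith("--- "):
            continue
        hunk = HUNK.match(line)
        if hunk:
            new_line = int(hunk.group(1))  # first new-file line of this hunk
            continue
        if line.startswith("+"):
            if MARKER.search(line):
                results.append((current_file, new_line, line[1:].strip()))
            new_line += 1
        elif line.startswith(" "):
            new_line += 1  # context lines also advance the new-file counter
        # Removed lines ("-") do not exist in the new file, so no increment.
    return results


sample = "\n".join([
    "diff --git a/app/util.py b/app/util.py",
    "--- a/app/util.py",
    "+++ b/app/util.py",
    "@@ -10,2 +10,3 @@ def load():",
    "     data = read()",
    "+    # TODO: handle empty input",
    "     return data",
])
print(find_markers(sample))  # [('app/util.py', 11, '# TODO: handle empty input')]
```

Each hit still needs judgment about whether it belongs in a tracked issue; the scan only locates candidates.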

Create cleanup report:

File: [TIMESTAMP_FILE]-cleanup.md in current directory

# Cleanup Analysis
Date: [TIMESTAMP_LOG]
Scope: [what was analyzed]

## Issues Found

### 1. [Issue title]
**Type:** [Dead code/Duplicate/Unclear/etc.]
**Location:** [file:line]
**Priority:** [High/Medium/Low]
**Breaking:** [Yes/No]

**Problem:**
[Description]

**Options:**
1. [Option with pros/cons]
2. [Option with pros/cons]

**Recommendation:** [Which option and why]

[Repeat for each issue]

## Summary

| Issue | Recommendation | Priority | Breaking? |
|-------|---------------|----------|-----------|
| ...   | ...           | ...      | ...       |

## Required Decisions

| ID | Priority | Issue | Recommendation | Decision | Evidence / Ticket | Status |
|----|----------|-------|----------------|----------|-------------------|--------|
| C-001 | High | ... | ... | undecided |  | open |

## Implementation Phases

### Phase 1: High Priority
- [ ] [Action item]

### Phase 2: Medium Priority
- [ ] [Action item]

### Phase 3: Low Priority
- [ ] [Action item]

Update review debt ledger: Add high/medium cleanup issues to _review_debt.md when they remain open after the report.
Report back: Summary of issues found by priority and any required decisions. Do not call cleanup complete while high/medium issues remain undecided.

multi-review - Multi-Perspective Review

Review changes using five specialized perspectives: Future Maintainer, System Architect, Product/User Advocate, one opposite-provider external reviewer (Security & Correctness), and Gemini (Performance & Testing).

Target branch: $ARGUMENTS (defaults to current branch if not specified)

Setup

Before running reviewer passes, set up the review environment:

If $ARGUMENTS is empty or not provided:
- Use current directory, no setup needed
- Working directory for reviewers: current directory

If $ARGUMENTS is specified:

# Fetch to ensure remote branches are available
git fetch origin

# Clean up any existing worktree from previous reviews
WORKTREE_PATH="../review-$ARGUMENTS"
git worktree remove --force "$WORKTREE_PATH" 2>/dev/null || true

# Try local branch first, then remote
if git show-ref --verify --quiet "refs/heads/$ARGUMENTS"; then
  git worktree add "$WORKTREE_PATH" "$ARGUMENTS"
elif git show-ref --verify --quiet "refs/remotes/origin/$ARGUMENTS"; then
  git worktree add "$WORKTREE_PATH" "origin/$ARGUMENTS"
else
  echo "Error: Branch '$ARGUMENTS' not found locally or on origin" >&2
  # Stop here - do not run reviewers
  exit 1
fi

Working directory for reviewers: the worktree path

Prepare change context: Read the commit messages yourself: `cd <REVIEW_DIRECTORY> && git log origin/main..HEAD --oneline`

IMPORTANT: Summarize the diff before review. If the Task tool is available, spawn a haiku sub-agent (model: haiku) to summarize the diff. If the Task tool is not available, run this as a separate, bounded pass yourself and avoid reviewing findings during the summary pass:
```
Run `cd <REVIEW_DIRECTORY> && git diff origin/main...HEAD` and write a concise
summary of the actual code changes -- what was added, modified, removed, and
the key patterns/logic involved. Keep it to 3-5 sentences. Return ONLY the
summary, nothing else.
```
Combine the commit messages with the haiku agent's diff summary into a change context block. Include it in every reviewer agent's instructions so each understands the purpose and substance of the change. Do not do a full subjective review pass yourself before agents; keep your own pre-synthesis inspection limited to the deterministic checks below.
Run deterministic architecture invariant checks: Run these from the review directory yourself before synthesis. They are not a replacement for reviewer judgment; they catch repeatable structural risks that reviewers often miss.
- New avoncore symbols/importers/name collisions:
  - Identify changed avoncore exports and public symbols: git diff origin/main...HEAD -- avoncore
  - For each new or promoted symbol, count current Python service importers with rg "from avoncore|import avoncore" .
  - Promotion into avoncore requires 2+ current Python service consumers. A frontend mirror does not count. A future consumer means ticket/TODO, not premature promotion.
  - Check for confusing name collisions with existing domain/service symbols using rg "<symbol_name>" .
- Controller/service/repository boundaries:
  - Controllers/routes should not own persistence details or business rules.
  - Services should not reach through repositories into raw database clients unless that is the established local pattern.
  - Repositories should not perform user-facing orchestration, queue publishing, or controller response shaping.
- Queue + DB source-of-truth/write ordering:
  - Identify every path that writes DB state and publishes/consumes queue messages.
  - Verify the source of truth is explicit.
  - Check failure ordering: queue succeeds/DB fails, DB succeeds/queue fails, duplicate delivery, retry after partial failure.
- Background fan-out/concurrency/retry assumptions:
  - Look for gather, task creation, worker pools, batch loops, queue consumers, and webhook fan-out.
  - Verify concurrency is bounded, cancellation/error propagation is intentional, and important work has durable retry or a documented reason it does not need one.
- Duplicate config/default sources:
  - Search for new env vars/defaults in settings, service constructors, frontend config, tests, and docs.
  - Flag duplicate default values or independent parsing paths that can drift.
Record any invariant failure as a blocking or important finding in synthesis, even if no agent independently reports it.
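The avoncore promotion rule (2+ current Python service consumers, frontend mirrors excluded) can be approximated in code. The sketch below is illustrative only: it treats the first path component as the service name, which is an assumption about repository layout, and the service names, file paths, and `normalize_id` symbol are hypothetical. It matches single-line `from avoncore... import ...` statements and will miss parenthesized multi-line imports.

```python
import re
from pathlib import PurePosixPath


def importing_services(sources, symbol):
    """Return services whose Python files import `symbol` from avoncore.

    `sources` maps repo-relative paths to file contents.
    """
    pattern = re.compile(
        r"^\s*from\s+avoncore(?:\.\w+)*\s+import\s+.*\b" + re.escape(symbol) + r"\b",
        re.MULTILINE,
    )
    services = set()
    for path, text in sources.items():
        parts = PurePosixPath(path).parts
        if not path.endswith(".py") or parts[0] == "avoncore":
            continue  # frontend mirrors and avoncore itself do not count
        if pattern.search(text):
            services.add(parts[0])
    return services


def may_promote(sources, symbol, minimum=2):
    """True when at least `minimum` distinct services import the symbol."""
    return len(importing_services(sources, symbol)) >= minimum


# Hypothetical repository snapshot.
sources = {
    "billing/app/handlers.py": "from avoncore.ids import normalize_id\n",
    "search/indexer.py": "from avoncore import normalize_id, other\n",
    "frontend/src/ids.ts": "import { normalize_id } from 'avoncore'\n",
    "avoncore/ids.py": "def normalize_id(x): ...\n",
}
print(sorted(importing_services(sources, "normalize_id")))  # ['billing', 'search']
print(may_promote(sources, "other"))                        # False
```

A `False` result would be recorded as a blocking or important finding in synthesis, per the rule above.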

Shared Output Schema

All five agents MUST use this structured format for every finding. Define this once and include it in each agent's instructions:

For each finding, provide ALL of these fields:
- file: [exact file path from the diff]
- lines: [line range, e.g. "42-58"]
- category: [one of: naming, documentation, complexity, coupling, security,
             error-handling, performance, testing, logic, ux]
- severity: [blocking | important | suggestion]
- confidence: [high | medium | low]
- title: [one-line summary, max 80 chars]
- finding: [detailed description, 2-5 sentences]
- evidence: [specific code snippet or reference supporting this finding]
- recommendation: [concrete action -- what should be done instead]

After all findings, include a summary line:
"Total: X findings (Y blocking, Z important, W suggestions)"
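If findings are collected programmatically, the schema can be mirrored as a small record type with validation. The sketch below is one possible encoding; the `Finding` class, its `problems` method, and `summary_line` are hypothetical helpers, and the example values are invented for illustration.

```python
from dataclasses import dataclass

CATEGORIES = {"naming", "documentation", "complexity", "coupling", "security",
              "error-handling", "performance", "testing", "logic", "ux"}
SEVERITIES = ("blocking", "important", "suggestion")
CONFIDENCES = {"high", "medium", "low"}


@dataclass(frozen=True)
class Finding:
    file: str
    lines: str
    category: str
    severity: str
    confidence: str
    title: str
    finding: str
    evidence: str
    recommendation: str

    def problems(self):
        """Return schema violations; an empty list means the finding is valid."""
        issues = []
        if self.category not in CATEGORIES:
            issues.append(f"unknown category: {self.category}")
        if self.severity not in SEVERITIES:
            issues.append(f"unknown severity: {self.severity}")
        if self.confidence not in CONFIDENCES:
            issues.append(f"unknown confidence: {self.confidence}")
        if len(self.title) > 80:
            issues.append("title exceeds 80 characters")
        for name in ("file", "lines", "finding", "evidence", "recommendation"):
            if not getattr(self, name).strip():
                issues.append(f"missing field: {name}")
        return issues


def summary_line(findings):
    """Build the closing "Total: ..." line from a list of findings."""
    counts = {s: sum(f.severity == s for f in findings) for s in SEVERITIES}
    return (f"Total: {len(findings)} findings ({counts['blocking']} blocking, "
            f"{counts['important']} important, {counts['suggestion']} suggestions)")


example = Finding("app/util.py", "42-58", "logic", "blocking", "high",
                  "Retry loop never terminates", "Description here.",
                  "while True: retry()", "Add a bounded retry count.")
print(example.problems())       # []
print(summary_line([example]))  # Total: 1 findings (1 blocking, 0 important, 0 suggestions)
```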

Shared Reasoning Protocol

Prepend these instructions to every agent's review task, before their persona-specific instructions:

Before listing findings, follow this reasoning process:
1. Read the diff and understand what this change accomplishes
2. Identify the key operations, data flows, and integration points
3. Evaluate each against your review checklist below
4. Walk the feature edge-case prompts that apply to this diff
5. For each potential concern, verify it is genuine by checking surrounding context
6. Only report findings you can support with specific evidence from the diff

Reviewer Execution Instructions

Run three internal reviewer perspectives, plus one opposite-provider external review and one Gemini review. If the Task tool is available, spawn the three internal sub-agents in parallel. If the Task tool is not available, run the three internal perspectives yourself as separate passes before synthesis. Give each reviewer:

- The change context summary from Setup step 3
- The review directory path (current directory or worktree path)
- The shared reasoning protocol above
- The shared output schema above
- Their persona-specific instructions below
- Instructions to run the git diff command from that directory using `cd <path> && git diff origin/main...HEAD`

Critical: When a worktree is used, agents MUST cd into the worktree directory before running git commands.
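The per-reviewer package described above could be assembled along these lines. This is a sketch under assumptions: the section titles and the `build_reviewer_prompt` name are hypothetical, and only the ordering (reasoning protocol before persona instructions) and the diff command come from the text.

```python
import shlex


def build_reviewer_prompt(context, review_dir, protocol, persona, schema):
    """Assemble one reviewer's instructions, protocol first, persona after."""
    # Quote the directory so paths with spaces survive the shell.
    diff_cmd = f"cd {shlex.quote(review_dir)} && git diff origin/main...HEAD"
    sections = [
        ("CHANGE CONTEXT", context),
        ("REVIEW DIRECTORY", review_dir),
        ("REASONING PROTOCOL", protocol),
        ("PERSONA", persona),
        ("OUTPUT SCHEMA", schema),
        ("DIFF COMMAND", diff_cmd),
    ]
    return "\n\n".join(f"{title}:\n{body.strip()}" for title, body in sections)


prompt = build_reviewer_prompt(
    context="Adds retry handling to the upload worker.",  # hypothetical summary
    review_dir="../review-feature",                       # hypothetical worktree
    protocol="Before listing findings, follow this reasoning process: ...",
    persona="ROLE: Senior developer who has just inherited this codebase ...",
    schema="For each finding, provide ALL of these fields: ...",
)
print(prompt)
```

Building the command from the review directory keeps every reviewer running `git diff` inside the worktree rather than the original checkout.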

Agent 1: Future Maintainer

ROLE: Senior developer who has just inherited this codebase with zero prior context.

GOAL: Ensure any competent developer can understand, debug, and safely modify
this code six months from now without access to the original author.

BACKSTORY: You have spent 15 years maintaining systems you did not write. You
have been burned by magic constants, undocumented side effects, and functions
that silently depend on global state. You value explicitness over cleverness
and believe the best code needs the fewest comments because its intent is
already obvious from structure and naming.

VOCABULARY: cognitive load, discoverability, naming semantics, implicit
coupling, self-documenting, entry point, call chain readability.

CHECKLIST:
- Unclear or misleading names (variables, functions, files, modules)
- Missing or outdated comments and documentation
- Complex logic that requires mental simulation to follow
- Magic numbers or unexplained constants
- Implicit assumptions not documented at point of use
- Functions doing too many things (violating single responsibility)
- Cognitive load -- how much context must a reader hold simultaneously?
- "Why was this done this way?" moments with no answer in code or comments
- Additionally, flag any other maintainability concern you notice

Agent 2: System Architect

ROLE: Principal engineer responsible for system-wide design coherence and
long-term technical health of the entire codebase.

GOAL: Ensure every change strengthens (or at minimum does not degrade) the
system's structural integrity, consistency, and capacity to evolve.

BACKSTORY: You have designed and evolved distributed systems serving millions
of users. You have seen promising projects collapse under accumulated coupling
and boundary violations. You think in terms of module boundaries, dependency
graphs, and change propagation -- you evaluate each modification by asking
"what does this force me to change next time?"

VOCABULARY: coupling, cohesion, module boundary, dependency direction,
abstraction layer, separation of concerns, extension point, invariant,
contract, change propagation radius.

CHECKLIST:
- Coupling issues -- does this create unwanted dependencies between modules?
- Boundary violations -- is code in the right layer or module?
- Pattern consistency -- does this follow or break established conventions?
- Scalability concerns -- will this approach hold under growth?
- Abstraction quality -- too much, too little, or at the wrong level?
- Single responsibility -- are concerns properly separated?
- Extensibility -- how difficult will it be to modify this path later?
- Error handling strategy -- consistent with the rest of the system?
- New shared package exports -- especially avoncore symbols -- are justified by current consumers, not speculative future use
- Controller/service/repository boundaries are preserved: controllers orchestrate requests, services own business logic, repositories own persistence
- Queue + DB workflows have one explicit source of truth and a defensible write/order/retry strategy
- Background fan-out is bounded, observable, and backed by durable retry assumptions where failure matters
- Config defaults come from one source; no duplicate env/default definitions drift across modules
- Additionally, flag any other architectural concern you notice

Agent 3: Product/User Advocate

ROLE: Staff engineer with deep product sense who represents the end user in
every technical decision.

GOAL: Ensure technical choices serve actual user needs, handle real-world
usage gracefully, and never expose users to preventable confusion or data loss.

BACKSTORY: You spent years as a full-stack engineer building consumer products
before moving into a technical leadership role. You have shipped features that
looked correct in code review but failed in production because nobody
considered what happens when the network drops mid-operation, or what error
message the user sees when a background job silently fails. You evaluate code
by mentally walking through user journeys.

VOCABULARY: user flow, edge case, failure mode, graceful degradation, error
surface, input validation, data integrity, user-facing, silent failure,
recovery path.

CHECKLIST:
- Does this actually solve the intended user problem?
- Edge cases in user flows that are not handled
- Error messages -- are they helpful to users or developer-speak?
- UX implications of technical decisions
- Accessibility concerns
- Data handling -- does it respect user expectations and prevent data loss?
- Failure modes -- what does the user experience when things go wrong?
- Missing validation that could let users reach bad states
- Additionally, flag any other product or user-impact concern you notice

FEATURE EDGE-CASE PROMPTS:
- All validators enabled, a partial validator set enabled, and all validators disabled
- Queue succeeds but DB write fails; DB write succeeds but queue publish fails
- Two users revalidate the same KI or shared entity concurrently
- Large uploads, empty uploads, and unsupported file/content types
- API-key user missing email/name or other human-profile fields
- Retry after partial failure, refresh after background completion, and stale UI state

Agent 4: External Reviewer (Opposite Provider) -- Security & Correctness

This agent runs via Bash tool (not Task tool) in parallel with the other four.

Instructions (for root agent):

Detect current host and choose opposite reviewer:

if [ -n "${CODEX_THREAD_ID:-}" ]; then
  REVIEWER_KIND="claude"
else
  REVIEWER_KIND="codex"
fi

Check reviewer availability:

if [ "$REVIEWER_KIND" = "codex" ]; then
  which codex >/dev/null 2>&1 || echo "CODEX_NOT_INSTALLED"
else
  which claude >/dev/null 2>&1 || echo "CLAUDE_NOT_INSTALLED"
fi

If unavailable, skip and note in synthesis:
- if codex: "Codex review skipped - not installed"
- if claude: "Claude review skipped - not installed"
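For hosts that orchestrate from Python rather than the shell, the same detection and availability check might look like the sketch below. The `choose_reviewer` function is hypothetical; it mirrors the shell logic above, with `shutil.which` standing in for `which`, and takes the environment and lookup function as parameters so it can be exercised without either CLI installed.

```python
import os
import shutil


def choose_reviewer(env=None, which=shutil.which):
    """Return (reviewer_kind, skip_note); skip_note is None when the CLI exists."""
    env = os.environ if env is None else env
    # A non-empty CODEX_THREAD_ID means the host is Codex, so Claude reviews.
    kind = "claude" if env.get("CODEX_THREAD_ID") else "codex"
    if which(kind) is None:
        return kind, f"{kind.capitalize()} review skipped - not installed"
    return kind, None


# Simulated environments; the thread ID value is a placeholder.
print(choose_reviewer({"CODEX_THREAD_ID": "t1"}, which=lambda name: None))
# ('claude', 'Claude review skipped - not installed')
print(choose_reviewer({}, which=lambda name: "/usr/bin/" + name))
# ('codex', None)
```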

If available, run (adjust based on input type):

IMPORTANT: Always use 2>&1 (not 2>/dev/null) to capture both stdout and stderr. Always cd into a git repo directory before running the external reviewer so gh commands work.

If $ARGUMENTS is a GitHub PR URL:

PROMPT="You are a security engineer and correctness specialist reviewing a pull request.

CHANGE CONTEXT: <SUMMARY_FROM_SETUP_STEP_3>

PR: <PR_URL>

Use gh CLI to fetch the PR diff, then review with this reasoning process:
1. Read the diff and understand what this change accomplishes
2. Identify security-sensitive operations and correctness-critical logic
3. Evaluate each against the checklist below
4. For each potential concern, verify it is genuine by checking context
5. Only report findings with specific evidence

FOCUS AREAS:

SECURITY:
- Input validation and sanitization gaps
- Authentication and authorization bypass risks
- Data exposure or leakage (logs, errors, API responses)
- Injection vectors (SQL, command, path traversal)
- Secrets or credentials in code
- Unsafe deserialization or file handling
- Think like an attacker first, then like a defender

CORRECTNESS:
- Logic errors, off-by-one, incorrect conditions
- Race conditions or concurrency issues
- Null/undefined handling gaps
- State mutation side effects
- Contract violations between caller and callee
- Boundary conditions and overflow potential
- Queue/DB consistency when one write succeeds and the other fails
- Concurrent revalidation or background fan-out races
- Missing API-key user profile fields such as email/name

For each finding provide ALL fields:
- file: [exact path]
- lines: [line range]
- category: [security | logic | error-handling]
- severity: [blocking | important | suggestion]
- confidence: [high | medium | low]
- title: [one-line summary, max 80 chars]
- finding: [2-5 sentence description]
- evidence: [specific code reference]
- recommendation: [concrete fix]

End with: Total: X findings (Y blocking, Z important, W suggestions)"

cd <REVIEW_DIRECTORY> && \
if [ "$REVIEWER_KIND" = "codex" ]; then
  timeout 900 codex exec --skip-git-repo-check --dangerously-bypass-approvals-and-sandbox "$PROMPT" 2>&1
else
  timeout 900 claude --dangerously-skip-permissions -p "$PROMPT" 2>&1
fi

If reviewing local branch (git diff):

PROMPT="You are a security engineer and correctness specialist reviewing code changes.

CHANGE CONTEXT: <SUMMARY_FROM_SETUP_STEP_3>

Run 'git diff origin/main...HEAD' to get the diff, then review with this reasoning process:
1. Read the diff and understand what this change accomplishes
2. Identify security-sensitive operations and correctness-critical logic
3. Evaluate each against the checklist below
4. For each potential concern, verify it is genuine by checking context
5. Only report findings with specific evidence

FOCUS AREAS:

SECURITY:
- Input validation and sanitization gaps
- Authentication and authorization bypass risks
- Data exposure or leakage (logs, errors, API responses)
- Injection vectors (SQL, command, path traversal)
- Secrets or credentials in code
- Unsafe deserialization or file handling
- Think like an attacker first, then like a defender

CORRECTNESS:
- Logic errors, off-by-one, incorrect conditions
- Race conditions or concurrency issues
- Null/undefined handling gaps
- State mutation side effects
- Contract violations between caller and callee
- Boundary conditions and overflow potential
- Queue/DB consistency when one write succeeds and the other fails
- Concurrent revalidation or background fan-out races
- Missing API-key user profile fields such as email/name

For each finding provide ALL fields:
- file: [exact path]
- lines: [line range]
- category: [security | logic | error-handling]
- severity: [blocking | important | suggestion]
- confidence: [high | medium | low]
- title: [one-line summary, max 80 chars]
- finding: [2-5 sentence description]
- evidence: [specific code reference]
- recommendation: [concrete fix]

End with: Total: X findings (Y blocking, Z important, W suggestions)"

cd <REVIEW_DIRECTORY> && \
if [ "$REVIEWER_KIND" = "codex" ]; then
  timeout 900 codex exec --skip-git-repo-check --dangerously-bypass-approvals-and-sandbox "$PROMPT" 2>&1
else
  timeout 900 claude --dangerously-skip-permissions -p "$PROMPT" 2>&1
fi

Include this external reviewer output in synthesis alongside other reviews

Agent 5: External Reviewer (Gemini) -- Performance & Testing

This agent runs via Bash tool (not Task tool) in parallel with all other agents.

Instructions (for root agent):

Check if Gemini CLI is available:

which gemini >/dev/null 2>&1 || echo "GEMINI_NOT_INSTALLED"

If not installed, skip and note in synthesis: "Gemini review skipped - not installed"

If available, run (adjust based on input type):

IMPORTANT: Always use 2>&1 to capture both stdout and stderr. Always cd into a git repo directory before running gemini so git commands work.

If $ARGUMENTS is a GitHub PR URL:

cd <REVIEW_DIRECTORY> && \
GEMINI_API_KEY=$(grep '^GEMINI_API_KEY=' .env 2>/dev/null | cut -d= -f2-) timeout 900 gemini -p "You are a performance engineer and testing specialist reviewing a pull request.

CHANGE CONTEXT: <SUMMARY_FROM_SETUP_STEP_3>

PR: <PR_URL>

Use gh CLI to fetch the PR diff, then review with this reasoning process:
1. Read the diff and understand what this change accomplishes
2. Identify performance-sensitive paths and testing gaps
3. Evaluate each against the checklist below
4. For each potential concern, verify it is genuine by checking context
5. Only report findings with specific evidence

FOCUS AREAS:

PERFORMANCE:
- Unnecessary allocations, copies, or repeated computation
- O(n^2) or worse algorithms where better alternatives exist
- Missing caching for expensive or repeated operations
- Unbounded growth (lists, queues, caches without limits)
- Blocking operations on hot paths
- I/O in loops, N+1 query patterns
- Resource leaks (unclosed handles, connections, files)

TESTING:
- Are the tests adequate for the scope of these changes?
- Missing test coverage for new code paths
- Missing edge case tests (empty input, boundary values, error paths)
- Missing tests for all/partial/no validators enabled
- Missing tests for queue succeeds/DB fails and DB succeeds/queue fails
- Missing tests for concurrent revalidation of the same KI/shared entity
- Missing tests for large uploads and API-key users missing email/name
- Test quality -- do tests verify behavior or just exercise code?
- Brittle tests that will break on unrelated changes
- Missing integration or contract tests for new interfaces
- Untested error handling paths

For each finding provide ALL fields:
- file: [exact path]
- lines: [line range]
- category: [performance | testing]
- severity: [blocking | important | suggestion]
- confidence: [high | medium | low]
- title: [one-line summary, max 80 chars]
- finding: [2-5 sentence description]
- evidence: [specific code reference]
- recommendation: [concrete fix]

End with: Total: X findings (Y blocking, Z important, W suggestions)" --yolo 2>&1

If reviewing local branch (git diff):

cd <REVIEW_DIRECTORY> && \
GEMINI_API_KEY=$(grep '^GEMINI_API_KEY=' .env 2>/dev/null | cut -d= -f2-) timeout 900 gemini -p "You are a performance engineer and testing specialist reviewing code changes.

CHANGE CONTEXT: <SUMMARY_FROM_SETUP_STEP_3>

Run 'git diff origin/main...HEAD' to get the diff, then review with this reasoning process:
1. Read the diff and understand what this change accomplishes
2. Identify performance-sensitive paths and testing gaps
3. Evaluate each against the checklist below
4. For each potential concern, verify it is genuine by checking context
5. Only report findings with specific evidence

FOCUS AREAS:

PERFORMANCE:
- Unnecessary allocations, copies, or repeated computation
- O(n^2) or worse algorithms where better alternatives exist
- Missing caching for expensive or repeated operations
- Unbounded growth (lists, queues, caches without limits)
- Blocking operations on hot paths
- I/O in loops, N+1 query patterns
- Resource leaks (unclosed handles, connections, files)

TESTING:
- Are the tests adequate for the scope of these changes?
- Missing test coverage for new code paths
- Missing edge case tests (empty input, boundary values, error paths)
- Missing tests for all/partial/no validators enabled
- Missing tests for queue succeeds/DB fails and DB succeeds/queue fails
- Missing tests for concurrent revalidation of the same KI/shared entity
- Missing tests for large uploads and API-key users missing email/name
- Test quality -- do tests verify behavior or just exercise code?
- Brittle tests that will break on unrelated changes
- Missing integration or contract tests for new interfaces
- Untested error handling paths

For each finding provide ALL fields:
- file: [exact path]
- lines: [line range]
- category: [performance | testing]
- severity: [blocking | important | suggestion]
- confidence: [high | medium | low]
- title: [one-line summary, max 80 chars]
- finding: [2-5 sentence description]
- evidence: [specific code reference]
- recommendation: [concrete fix]

End with: Total: X findings (Y blocking, Z important, W suggestions)" --yolo 2>&1

Include Gemini output in synthesis alongside other reviews

Synthesis

After receiving all five reviews (three internal perspectives + opposite-provider external reviewer + Gemini):

Step 1: Agreement Classification

Using the structured output fields (file + lines + category), classify every finding by agreement level:

| Agreement Level | Criteria | Action |
|---|---|---|
| Strong consensus | 3+ agents flagged the same issue | Accept without challenge -- reliability >95% |
| Corroborated | 2 agents flagged the same issue | Accept with standard challenge (Step 2) |
| Individual | 1 agent only, high confidence | Include with mandatory challenge (Step 2) |
| Observation | 1 agent only, medium or low confidence | Include as observation only -- do not promote to action items |
| Contested | Agents explicitly disagree | Present both sides -- human decides |
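The classification table reduces to a short decision function. The sketch below is illustrative; the `agreement_level` name and its parameters are hypothetical, and it assumes the caller has already counted how many distinct agents reported the merged finding.

```python
def agreement_level(agent_count, confidence="high", contested=False):
    """Map a merged finding to its agreement level from the table."""
    if agent_count < 1:
        raise ValueError("a finding needs at least one reporting agent")
    if contested:
        return "contested"          # agents explicitly disagree; human decides
    if agent_count >= 3:
        return "strong consensus"   # accepted without challenge
    if agent_count == 2:
        return "corroborated"       # standard challenge in Step 2
    # Single-agent findings split on the reporting agent's confidence.
    return "individual" if confidence == "high" else "observation"


print(agreement_level(3))                  # strong consensus
print(agreement_level(2))                  # corroborated
print(agreement_level(1, "high"))          # individual
print(agreement_level(1, "medium"))        # observation
print(agreement_level(2, contested=True))  # contested
```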

When matching findings across agents: two findings refer to the same issue if they reference the same file AND overlapping line ranges AND the same or closely related category. Err toward merging when in doubt.
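The matching rule can be written down directly. In this sketch, findings are plain dictionaries with the schema's `file`, `lines`, and `category` fields. Which categories count as "closely related" is not defined above, so the `RELATED` grouping is an illustrative assumption, as are the function names.

```python
def parse_range(lines):
    """Parse "42-58" or "42" into an inclusive (start, end) tuple."""
    start, _, end = lines.partition("-")
    return int(start), int(end or start)


def overlaps(a, b):
    """True when two inclusive line ranges share at least one line."""
    (a_start, a_end), (b_start, b_end) = parse_range(a), parse_range(b)
    return a_start <= b_end and b_start <= a_end


# Assumed groupings of closely related categories (not defined by the skill).
RELATED = [{"logic", "error-handling"}, {"coupling", "complexity"}]


def related(cat_a, cat_b):
    return cat_a == cat_b or any(cat_a in g and cat_b in g for g in RELATED)


def same_issue(f1, f2):
    """Apply the three-part rule: same file, overlapping lines, related category."""
    return (f1["file"] == f2["file"]
            and overlaps(f1["lines"], f2["lines"])
            and related(f1["category"], f2["category"]))


a = {"file": "app/worker.py", "lines": "42-58", "category": "logic"}
b = {"file": "app/worker.py", "lines": "55-60", "category": "error-handling"}
c = {"file": "app/worker.py", "lines": "59-70", "category": "logic"}
print(same_issue(a, b))  # True
print(same_issue(a, c))  # False
```

A stricter or looser `RELATED` table changes how aggressively findings merge, which is where "err toward merging when in doubt" would be tuned.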

Step 2: Adversarial Validation

For every finding rated blocking or important after Step 1, apply a challenge protocol scaled by agreement level:

Strong consensus (3+ agents) -- no challenge needed: Accept as-is. Multiple independent perspectives agreeing is strong evidence.

Corroborated (2 agents) -- standard challenge:

1. Steel-man: "The strongest case for this being a real issue is..."
2. Challenge: "However, context that might make this acceptable includes..."
3. Verdict: Keep severity, downgrade, or remove with brief justification.

Individual (1 agent) -- mandatory challenge with evidence requirement:

1. Steel-man: "The strongest case for this being a real issue is..."
2. Challenge: "However, context that might make this acceptable includes..."
3. Evidence check: Does the finding cite a specific code snippet? Is the concern verifiable from the diff?
4. Verdict: Keep (with evidence), downgrade to suggestion, or remove as false positive.

Step 3: Categorize and Organize

Group validated findings into:

- Blocking -- must fix before merge
- Important -- should fix, real risk or significant improvement
- Suggestions -- minor improvements, style, polish
- Observations -- low-confidence individual findings preserved for awareness

Note: preserve any explicit strengths or positive observations from agents -- things that are done well and should NOT be changed.

Step 4: Create Action Plan

Number each finding and order by priority (blocking first, then important). For each, state the concrete fix required. This becomes the implementation checklist if the human approves fixes.

Step 5: Required Decisions

Create a Required Decisions table for every blocking or important finding:

| ID | Severity | Finding | Recommended fix | Decision | Evidence / Ticket | Status |
|---|---|---|---|---|---|---|
| Q-001 | important | [title] | [concrete action] | undecided |  | open |

Decision values are only:

- fix now
- defer
- reject
- undecided

Rules:

- Default every blocking/important finding to undecided unless the human has already provided an explicit decision.
- defer requires a ticket/link or concrete follow-up reason/owner.
- reject requires evidence from the codebase, product requirement, or explicit human direction.
- If any row remains undecided, quality is not done. Report that decisions are required and stop.
- Update _review_debt.md with every row whose status is still open. If a row is fixed in the same loop, mark it fixed with the commit/test evidence instead of leaving it open.

Step 6: Human Review Guide

Create a short guide for the human reviewer: what this PR is about, what sequence to review files in, and any context needed. Keep it concise.

Final Output Format

Present the synthesized review as:

1. Summary (2-3 sentences on overall change quality)
2. Human review guide
3. Blocking issues (if any)
4. Important issues
5. Required Decisions table
6. _review_debt.md updates made
7. Suggestions
8. Observations (low-confidence, for awareness)
9. What's good / don't change

Each finding in sections 3-4 and 7-8 should include: agreement level tag, file, lines, title, description, and recommended fix.

Cleanup

After all reviewers complete and synthesis is done:

# Only if a worktree was created
git worktree remove --force "../review-$ARGUMENTS" 2>/dev/null || true

Important

Do NOT start fixing anything. After presenting the review, STOP and wait for explicit confirmation on which blocking/important items to fix, defer, or reject. If any blocking or important item remains undecided, say clearly that quality is pending and must not be marked done.


multi-review - Multi-Perspective Review

Target branch: $ARGUMENTS (defaults to current branch if not specified)

Setup

Before running reviewer passes, set up the review environment:

If $ARGUMENTS is empty or not provided:
- Use current directory, no setup needed
- Working directory for reviewers: current directory

If $ARGUMENTS is specified:

# Fetch to ensure remote branches are available
git fetch origin

# Clean up any existing worktree from previous reviews
WORKTREE_PATH="../review-$ARGUMENTS"
git worktree remove --force "$WORKTREE_PATH" 2>/dev/null || true

# Try local branch first, then remote
if git show-ref --verify --quiet "refs/heads/$ARGUMENTS"; then
  git worktree add "$WORKTREE_PATH" "$ARGUMENTS"
elif git show-ref --verify --quiet "refs/remotes/origin/$ARGUMENTS"; then
  git worktree add "$WORKTREE_PATH" "origin/$ARGUMENTS"
else
  echo "Error: Branch '$ARGUMENTS' not found locally or on origin"
  # Stop here - do not run reviewers
fi

Working directory for reviewers: the worktree path

Prepare change context: Read the commit messages yourself: cd <REVIEW_DIRECTORY> && git log origin/main..HEAD --oneline

IMPORTANT: Summarize the diff before review. If the Task tool is available, spawn a haiku sub-agent (model: haiku) to summarize the diff. If the Task tool is not available, run this as a separate, bounded pass yourself and avoid reviewing findings during the summary pass:
```
Run `cd <REVIEW_DIRECTORY> && git diff origin/main...HEAD` and write a concise
summary of the actual code changes -- what was added, modified, removed, and
the key patterns/logic involved. Keep it to 3-5 sentences. Return ONLY the
summary, nothing else.
```
Combine the commit messages with the haiku agent's diff summary into a change context block. Include it in every reviewer agent's instructions so each understands the purpose and substance of the change. Do not do a full subjective review pass yourself before agents; keep your own pre-synthesis inspection limited to the deterministic checks below.
Run deterministic architecture invariant checks: Run these from the review directory yourself before synthesis. They are not a replacement for reviewer judgment; they catch repeatable structural risks that reviewers often miss.
- New avoncore symbols/importers/name collisions:
  - Identify changed avoncore exports and public symbols: git diff origin/main...HEAD -- avoncore
  - For each new or promoted symbol, count current Python service importers with rg "from avoncore|import avoncore" .
  - Promotion into avoncore requires 2+ current Python service consumers. A frontend mirror does not count. A future consumer means ticket/TODO, not premature promotion.
  - Check for confusing name collisions with existing domain/service symbols using rg "<symbol_name>" .
- Controller/service/repository boundaries:
  - Controllers/routes should not own persistence details or business rules.
  - Services should not reach through repositories into raw database clients unless that is the established local pattern.
  - Repositories should not perform user-facing orchestration, queue publishing, or controller response shaping.
- Queue + DB source-of-truth/write ordering:
  - Identify every path that writes DB state and publishes/consumes queue messages.
  - Verify the source of truth is explicit.
  - Check failure ordering: queue succeeds/DB fails, DB succeeds/queue fails, duplicate delivery, retry after partial failure.
- Background fan-out/concurrency/retry assumptions:
  - Look for gather, task creation, worker pools, batch loops, queue consumers, and webhook fan-out.
  - Verify concurrency is bounded, cancellation/error propagation is intentional, and important work has durable retry or a documented reason it does not need one.
- Duplicate config/default sources:
  - Search for new env vars/defaults in settings, service constructors, frontend config, tests, and docs.
  - Flag duplicate default values or independent parsing paths that can drift.
Record any invariant failure as a blocking or important finding in synthesis, even if no agent independently reports it.
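
The "2+ current Python service consumers" rule for avoncore promotion can be approximated mechanically. The sketch below is one possible way to count consumers, assuming each service lives in its own top-level directory and imports symbols with single-line `from avoncore... import ...` statements; the function name `count_service_consumers` is hypothetical, and plain `import avoncore` or parenthesized multi-line imports are not detected.

```python
import re
from pathlib import PurePosixPath


def count_service_consumers(symbol: str, sources: dict[str, str]) -> int:
    """Count distinct top-level service directories that import `symbol` from avoncore."""
    pattern = re.compile(
        rf"^\s*from\s+avoncore(?:\.\w+)*\s+import\s+.*\b{re.escape(symbol)}\b",
        re.MULTILINE,
    )
    services = set()
    for path, text in sources.items():
        top_level = PurePosixPath(path).parts[0]
        if top_level == "avoncore":
            continue  # the shared package itself is not a consumer
        if pattern.search(text):
            services.add(top_level)
    return len(services)
```

Here `sources` maps repository-relative file paths to file contents. A result below 2 for a newly promoted symbol would be recorded as an invariant failure; the `rg` commands above remain the source of truth for the actual check.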

Shared Output Schema

All five agents MUST use this structured format for every finding. Define this once and include it in each agent's instructions:

For each finding, provide ALL of these fields:
- file: [exact file path from the diff]
- lines: [line range, e.g. "42-58"]
- category: [one of: naming, documentation, complexity, coupling, security,
             error-handling, performance, testing, logic, ux]
- severity: [blocking | important | suggestion]
- confidence: [high | medium | low]
- title: [one-line summary, max 80 chars]
- finding: [detailed description, 2-5 sentences]
- evidence: [specific code snippet or reference supporting this finding]
- recommendation: [concrete action -- what should be done instead]

After all findings, include a summary line:
"Total: X findings (Y blocking, Z important, W suggestions)"

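For synthesis it can help to hold findings in a typed structure that mirrors the schema above. The following Python sketch is illustrative only: the `Finding` class, its `problems` method, and `summary_line` are hypothetical names, and the checks cover just the enumerated fields and the title length.

```python
from collections import Counter
from dataclasses import dataclass

CATEGORIES = {"naming", "documentation", "complexity", "coupling", "security",
              "error-handling", "performance", "testing", "logic", "ux"}
SEVERITIES = ("blocking", "important", "suggestion")
CONFIDENCES = {"high", "medium", "low"}


@dataclass(frozen=True)
class Finding:
    file: str
    lines: str
    category: str
    severity: str
    confidence: str
    title: str
    finding: str
    evidence: str
    recommendation: str

    def problems(self) -> list[str]:
        """List schema violations; an empty list means the finding is well-formed."""
        issues = []
        if self.category not in CATEGORIES:
            issues.append(f"unknown category: {self.category}")
        if self.severity not in SEVERITIES:
            issues.append(f"unknown severity: {self.severity}")
        if self.confidence not in CONFIDENCES:
            issues.append(f"unknown confidence: {self.confidence}")
        if len(self.title) > 80:
            issues.append("title longer than 80 chars")
        return issues


def summary_line(findings: list[Finding]) -> str:
    """Build the closing summary line in the format the schema requires."""
    counts = Counter(f.severity for f in findings)
    return (f"Total: {len(findings)} findings ({counts['blocking']} blocking, "
            f"{counts['important']} important, {counts['suggestion']} suggestions)")
```

Agent 4 and Agent 5 restrict `category` to a narrower set in their prompts, so a stricter per-agent check would pass a smaller category set.
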
Shared Reasoning Protocol

Prepend these instructions to every agent's review task, before their persona-specific instructions:

Before listing findings, follow this reasoning process:
1. Read the diff and understand what this change accomplishes
2. Identify the key operations, data flows, and integration points
3. Evaluate each against your review checklist below
4. Walk the feature edge-case prompts that apply to this diff
5. For each potential concern, verify it is genuine by checking surrounding context
6. Only report findings you can support with specific evidence from the diff

Reviewer Execution Instructions

Each reviewer agent's instructions must include:

- The change context summary from Setup step 3
- The review directory path (current directory or worktree path)
- The shared reasoning protocol above
- The shared output schema above
- Their persona-specific instructions below
- Instructions to run the git diff command from that directory using cd <path> && git diff origin/main...HEAD

Critical: When a worktree is used, agents MUST cd into the worktree directory before running git commands.

Agent 1: Future Maintainer

ROLE: Senior developer who has just inherited this codebase with zero prior context.

GOAL: Ensure any competent developer can understand, debug, and safely modify
this code six months from now without access to the original author.

BACKSTORY: You have spent 15 years maintaining systems you did not write. You
have been burned by magic constants, undocumented side effects, and functions
that silently depend on global state. You value explicitness over cleverness
and believe the best code needs the fewest comments because its intent is
already obvious from structure and naming.

VOCABULARY: cognitive load, discoverability, naming semantics, implicit
coupling, self-documenting, entry point, call chain readability.

CHECKLIST:
- Unclear or misleading names (variables, functions, files, modules)
- Missing or outdated comments and documentation
- Complex logic that requires mental simulation to follow
- Magic numbers or unexplained constants
- Implicit assumptions not documented at point of use
- Functions doing too many things (violating single responsibility)
- Cognitive load -- how much context must a reader hold simultaneously?
- "Why was this done this way?" moments with no answer in code or comments
- Additionally, flag any other maintainability concern you notice

Agent 2: System Architect

ROLE: Principal engineer responsible for system-wide design coherence and
long-term technical health of the entire codebase.

GOAL: Ensure every change strengthens (or at minimum does not degrade) the
system's structural integrity, consistency, and capacity to evolve.

BACKSTORY: You have designed and evolved distributed systems serving millions
of users. You have seen promising projects collapse under accumulated coupling
and boundary violations. You think in terms of module boundaries, dependency
graphs, and change propagation -- you evaluate each modification by asking
"what does this force me to change next time?"

VOCABULARY: coupling, cohesion, module boundary, dependency direction,
abstraction layer, separation of concerns, extension point, invariant,
contract, change propagation radius.

CHECKLIST:
- Coupling issues -- does this create unwanted dependencies between modules?
- Boundary violations -- is code in the right layer or module?
- Pattern consistency -- does this follow or break established conventions?
- Scalability concerns -- will this approach hold under growth?
- Abstraction quality -- too much, too little, or at the wrong level?
- Single responsibility -- are concerns properly separated?
- Extensibility -- how difficult will it be to modify this path later?
- Error handling strategy -- consistent with the rest of the system?
- New shared package exports -- especially avoncore symbols -- are justified by current consumers, not speculative future use
- Controller/service/repository boundaries are preserved: controllers orchestrate requests, services own business logic, repositories own persistence
- Queue + DB workflows have one explicit source of truth and a defensible write/order/retry strategy
- Background fan-out is bounded, observable, and backed by durable retry assumptions where failure matters
- Config defaults come from one source; no duplicate env/default definitions drift across modules
- Additionally, flag any other architectural concern you notice

Agent 3: Product/User Advocate

ROLE: Staff engineer with deep product sense who represents the end user in
every technical decision.

GOAL: Ensure technical choices serve actual user needs, handle real-world
usage gracefully, and never expose users to preventable confusion or data loss.

BACKSTORY: You spent years as a full-stack engineer building consumer products
before moving into a technical leadership role. You have shipped features that
looked correct in code review but failed in production because nobody
considered what happens when the network drops mid-operation, or what error
message the user sees when a background job silently fails. You evaluate code
by mentally walking through user journeys.

VOCABULARY: user flow, edge case, failure mode, graceful degradation, error
surface, input validation, data integrity, user-facing, silent failure,
recovery path.

CHECKLIST:
- Does this actually solve the intended user problem?
- Edge cases in user flows that are not handled
- Error messages -- are they helpful to users or developer-speak?
- UX implications of technical decisions
- Accessibility concerns
- Data handling -- does it respect user expectations and prevent data loss?
- Failure modes -- what does the user experience when things go wrong?
- Missing validation that could let users reach bad states
- Additionally, flag any other product or user-impact concern you notice

FEATURE EDGE-CASE PROMPTS:
- All validators enabled, a partial validator set enabled, and all validators disabled
- Queue succeeds but DB write fails; DB write succeeds but queue publish fails
- Two users revalidate the same KI or shared entity concurrently
- Large uploads, empty uploads, and unsupported file/content types
- API-key user missing email/name or other human-profile fields
- Retry after partial failure, refresh after background completion, and stale UI state

Agent 4: External Reviewer (Opposite Provider) -- Security & Correctness

This agent runs via Bash tool (not Task tool) in parallel with the other four.

Instructions (for root agent):

Detect current host and choose opposite reviewer:

if [ -n "${CODEX_THREAD_ID:-}" ]; then
  REVIEWER_KIND="claude"
else
  REVIEWER_KIND="codex"
fi

Check reviewer availability:

if [ "$REVIEWER_KIND" = "codex" ]; then
  which codex >/dev/null 2>&1 || echo "CODEX_NOT_INSTALLED"
else
  which claude >/dev/null 2>&1 || echo "CLAUDE_NOT_INSTALLED"
fi

If unavailable, skip and note in synthesis:
- if codex: "Codex review skipped - not installed"
- if claude: "Claude review skipped - not installed"

If available, run (adjust based on input type):

IMPORTANT: Always use 2>&1 (not 2>/dev/null) to capture both stdout and stderr. Always cd into a git repo directory before running the external reviewer so gh commands work.

If $ARGUMENTS is a GitHub PR URL:

PROMPT="You are a security engineer and correctness specialist reviewing a pull request.

CHANGE CONTEXT: <SUMMARY_FROM_SETUP_STEP_3>

PR: <PR_URL>

Use gh CLI to fetch the PR diff, then review with this reasoning process:
1. Read the diff and understand what this change accomplishes
2. Identify security-sensitive operations and correctness-critical logic
3. Evaluate each against the checklist below
4. For each potential concern, verify it is genuine by checking context
5. Only report findings with specific evidence

FOCUS AREAS:

SECURITY:
- Input validation and sanitization gaps
- Authentication and authorization bypass risks
- Data exposure or leakage (logs, errors, API responses)
- Injection vectors (SQL, command, path traversal)
- Secrets or credentials in code
- Unsafe deserialization or file handling
- Think like an attacker first, then like a defender

CORRECTNESS:
- Logic errors, off-by-one, incorrect conditions
- Race conditions or concurrency issues
- Null/undefined handling gaps
- State mutation side effects
- Contract violations between caller and callee
- Boundary conditions and overflow potential
- Queue/DB consistency when one write succeeds and the other fails
- Concurrent revalidation or background fan-out races
- Missing API-key user profile fields such as email/name

For each finding provide ALL fields:
- file: [exact path]
- lines: [line range]
- category: [security | logic | error-handling]
- severity: [blocking | important | suggestion]
- confidence: [high | medium | low]
- title: [one-line summary, max 80 chars]
- finding: [2-5 sentence description]
- evidence: [specific code reference]
- recommendation: [concrete fix]

End with: Total: X findings (Y blocking, Z important, W suggestions)"

cd <REVIEW_DIRECTORY> && \
if [ "$REVIEWER_KIND" = "codex" ]; then
  timeout 900 codex exec --skip-git-repo-check --dangerously-bypass-approvals-and-sandbox "$PROMPT" 2>&1
else
  timeout 900 claude --dangerously-skip-permissions -p "$PROMPT" 2>&1
fi

If reviewing local branch (git diff):

PROMPT="You are a security engineer and correctness specialist reviewing code changes.

CHANGE CONTEXT: <SUMMARY_FROM_SETUP_STEP_3>

Run 'git diff origin/main...HEAD' to get the diff, then review with this reasoning process:
1. Read the diff and understand what this change accomplishes
2. Identify security-sensitive operations and correctness-critical logic
3. Evaluate each against the checklist below
4. For each potential concern, verify it is genuine by checking context
5. Only report findings with specific evidence

FOCUS AREAS:

SECURITY:
- Input validation and sanitization gaps
- Authentication and authorization bypass risks
- Data exposure or leakage (logs, errors, API responses)
- Injection vectors (SQL, command, path traversal)
- Secrets or credentials in code
- Unsafe deserialization or file handling
- Think like an attacker first, then like a defender

CORRECTNESS:
- Logic errors, off-by-one, incorrect conditions
- Race conditions or concurrency issues
- Null/undefined handling gaps
- State mutation side effects
- Contract violations between caller and callee
- Boundary conditions and overflow potential
- Queue/DB consistency when one write succeeds and the other fails
- Concurrent revalidation or background fan-out races
- Missing API-key user profile fields such as email/name

For each finding provide ALL fields:
- file: [exact path]
- lines: [line range]
- category: [security | logic | error-handling]
- severity: [blocking | important | suggestion]
- confidence: [high | medium | low]
- title: [one-line summary, max 80 chars]
- finding: [2-5 sentence description]
- evidence: [specific code reference]
- recommendation: [concrete fix]

End with: Total: X findings (Y blocking, Z important, W suggestions)"

cd <REVIEW_DIRECTORY> && \
if [ "$REVIEWER_KIND" = "codex" ]; then
  timeout 900 codex exec --skip-git-repo-check --dangerously-bypass-approvals-and-sandbox "$PROMPT" 2>&1
else
  timeout 900 claude --dangerously-skip-permissions -p "$PROMPT" 2>&1
fi

Include this external reviewer's output in synthesis alongside the other reviews.

Agent 5: External Reviewer (Gemini) -- Performance & Testing

This agent runs via Bash tool (not Task tool) in parallel with all other agents.

Instructions (for root agent):

Check if Gemini CLI is available:

which gemini >/dev/null 2>&1 || echo "GEMINI_NOT_INSTALLED"

If not installed, skip and note in synthesis: "Gemini review skipped - not installed"

If available, run (adjust based on input type):

IMPORTANT: Always use 2>&1 to capture both stdout and stderr. Always cd into a git repo directory before running gemini so git commands work.

If $ARGUMENTS is a GitHub PR URL:

cd <REVIEW_DIRECTORY> && \
GEMINI_API_KEY=$(grep '^GEMINI_API_KEY=' .env 2>/dev/null | cut -d= -f2-) timeout 900 gemini -p "You are a performance engineer and testing specialist reviewing a pull request.

CHANGE CONTEXT: <SUMMARY_FROM_SETUP_STEP_3>

PR: <PR_URL>

Use gh CLI to fetch the PR diff, then review with this reasoning process:
1. Read the diff and understand what this change accomplishes
2. Identify performance-sensitive paths and testing gaps
3. Evaluate each against the checklist below
4. For each potential concern, verify it is genuine by checking context
5. Only report findings with specific evidence

FOCUS AREAS:

PERFORMANCE:
- Unnecessary allocations, copies, or repeated computation
- O(n^2) or worse algorithms where better alternatives exist
- Missing caching for expensive or repeated operations
- Unbounded growth (lists, queues, caches without limits)
- Blocking operations on hot paths
- I/O in loops, N+1 query patterns
- Resource leaks (unclosed handles, connections, files)

TESTING:
- Are the tests adequate for the scope of these changes?
- Missing test coverage for new code paths
- Missing edge case tests (empty input, boundary values, error paths)
- Missing tests for all/partial/no validators enabled
- Missing tests for queue succeeds/DB fails and DB succeeds/queue fails
- Missing tests for concurrent revalidation of the same KI/shared entity
- Missing tests for large uploads and API-key users missing email/name
- Test quality -- do tests verify behavior or just exercise code?
- Brittle tests that will break on unrelated changes
- Missing integration or contract tests for new interfaces
- Untested error handling paths

For each finding provide ALL fields:
- file: [exact path]
- lines: [line range]
- category: [performance | testing]
- severity: [blocking | important | suggestion]
- confidence: [high | medium | low]
- title: [one-line summary, max 80 chars]
- finding: [2-5 sentence description]
- evidence: [specific code reference]
- recommendation: [concrete fix]

End with: Total: X findings (Y blocking, Z important, W suggestions)" --yolo 2>&1

If reviewing local branch (git diff):

cd <REVIEW_DIRECTORY> && \
GEMINI_API_KEY=$(grep '^GEMINI_API_KEY=' .env 2>/dev/null | cut -d= -f2-) timeout 900 gemini -p "You are a performance engineer and testing specialist reviewing code changes.

CHANGE CONTEXT: <SUMMARY_FROM_SETUP_STEP_3>

Run 'git diff origin/main...HEAD' to get the diff, then review with this reasoning process:
1. Read the diff and understand what this change accomplishes
2. Identify performance-sensitive paths and testing gaps
3. Evaluate each against the checklist below
4. For each potential concern, verify it is genuine by checking context
5. Only report findings with specific evidence

FOCUS AREAS:

PERFORMANCE:
- Unnecessary allocations, copies, or repeated computation
- O(n^2) or worse algorithms where better alternatives exist
- Missing caching for expensive or repeated operations
- Unbounded growth (lists, queues, caches without limits)
- Blocking operations on hot paths
- I/O in loops, N+1 query patterns
- Resource leaks (unclosed handles, connections, files)

TESTING:
- Are the tests adequate for the scope of these changes?
- Missing test coverage for new code paths
- Missing edge case tests (empty input, boundary values, error paths)
- Missing tests for all/partial/no validators enabled
- Missing tests for queue succeeds/DB fails and DB succeeds/queue fails
- Missing tests for concurrent revalidation of the same KI/shared entity
- Missing tests for large uploads and API-key users missing email/name
- Test quality -- do tests verify behavior or just exercise code?
- Brittle tests that will break on unrelated changes
- Missing integration or contract tests for new interfaces
- Untested error handling paths

For each finding provide ALL fields:
- file: [exact path]
- lines: [line range]
- category: [performance | testing]
- severity: [blocking | important | suggestion]
- confidence: [high | medium | low]
- title: [one-line summary, max 80 chars]
- finding: [2-5 sentence description]
- evidence: [specific code reference]
- recommendation: [concrete fix]

End with: Total: X findings (Y blocking, Z important, W suggestions)" --yolo 2>&1

Include the Gemini output in synthesis alongside the other reviews.

Synthesis

After receiving all five reviews (three internal perspectives + opposite-provider external reviewer + Gemini):

Step 1: Agreement Classification

Using the structured output fields (file + lines + category), classify every finding by agreement level:

| Agreement Level | Criteria | Action |
|---|---|---|
| Strong consensus | 3+ agents flagged the same issue | Accept without challenge -- reliability >95% |
| Corroborated | 2 agents flagged the same issue | Accept with standard challenge (Step 2) |
| Individual | 1 agent only, high confidence | Include with mandatory challenge (Step 2) |
| Observation | 1 agent only, medium or low confidence | Include as observation only -- do not promote to action items |
| Contested | Agents explicitly disagree | Present both sides -- human decides |
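
The counting part of this classification can be sketched in Python as follows. This is a simplified illustration, not the skill's implementation: `Report` and `classify_agreement` are hypothetical names, findings are matched only on an exact (file, lines, category) key, and the Contested level is left out because explicit disagreement has to be judged from the finding text.

```python
from collections import defaultdict
from typing import NamedTuple


class Report(NamedTuple):
    agent: str
    file: str
    lines: str
    category: str
    confidence: str  # "high" | "medium" | "low"


def classify_agreement(reports: list[Report]) -> dict[tuple[str, str, str], str]:
    """Map each (file, lines, category) key to an agreement level."""
    agents_by_key = defaultdict(set)
    high_confidence_keys = set()
    for report in reports:
        key = (report.file, report.lines, report.category)
        agents_by_key[key].add(report.agent)
        if report.confidence == "high":
            high_confidence_keys.add(key)

    levels = {}
    for key, agents in agents_by_key.items():
        if len(agents) >= 3:
            levels[key] = "strong consensus"
        elif len(agents) == 2:
            levels[key] = "corroborated"
        elif key in high_confidence_keys:
            levels[key] = "individual"
        else:
            levels[key] = "observation"
    return levels
```

In practice two agents rarely report identical line ranges, so overlapping ranges and near-duplicate titles would need to be merged by judgment before a count like this is meaningful.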

Step 2: Adversarial Validation

For every finding rated blocking or important after Step 1, apply a challenge protocol scaled by agreement level:

Strong consensus (3+ agents) -- no challenge needed: Accept as-is. Multiple independent perspectives agreeing is strong evidence.

Corroborated (2 agents) -- standard challenge:

- Steel-man: "The strongest case for this being a real issue is..."
- Challenge: "However, context that might make this acceptable includes..."
- Verdict: Keep severity, downgrade, or remove with brief justification.

Individual (1 agent) -- mandatory challenge with evidence requirement:

- Steel-man: "The strongest case for this being a real issue is..."
- Challenge: "However, context that might make this acceptable includes..."
- Evidence check: Does the finding cite a specific code snippet? Is the concern verifiable from the diff?
- Verdict: Keep (with evidence), downgrade to suggestion, or remove as false positive.
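
Which challenge applies to which finding can be summarized as a lookup. The Python sketch below is only an illustration of that mapping; `CHALLENGE_STEPS` and `challenge_plan` are hypothetical names, and the step labels are shorthand for the prompts described above.

```python
# Hypothetical lookup: challenge steps required per agreement level.
CHALLENGE_STEPS = {
    "strong consensus": [],
    "corroborated": ["steel-man", "challenge", "verdict"],
    "individual": ["steel-man", "challenge", "evidence check", "verdict"],
}


def challenge_plan(severity: str, agreement: str) -> list[str]:
    """Return the challenge steps to apply, or [] when none are required."""
    if severity not in ("blocking", "important"):
        return []  # only blocking/important findings are challenged in this step
    return list(CHALLENGE_STEPS.get(agreement, []))
```

For example, an important finding raised by a single agent would go through all four steps, while a blocking finding with strong consensus is accepted as-is.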

Step 3: Categorize and Organize

Group validated findings into:

- Blocking -- must fix before merge
- Important -- should fix, real risk or significant improvement
- Suggestions -- minor improvements, style, polish
- Observations -- low-confidence individual findings preserved for awareness

Note: preserve any explicit strengths or positive observations from agents -- things that are done well and should NOT be changed.

Step 4: Create Action Plan

Number each finding and order by priority (blocking first, then important). For each, state the concrete fix required. This becomes the implementation checklist if the human approves fixes.
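
A minimal Python sketch of this ordering step, assuming each validated finding is a dict with `severity`, `title`, and `fix` keys (hypothetical field names chosen for the example):

```python
SEVERITY_ORDER = {"blocking": 0, "important": 1, "suggestion": 2}


def action_plan(findings: list[dict]) -> list[str]:
    """Number findings with blocking first; ties keep their original order."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])
    return [
        f"{number}. [{f['severity']}] {f['title']}: {f['fix']}"
        for number, f in enumerate(ordered, start=1)
    ]
```

Because Python's `sorted` is stable, findings of equal severity stay in the order synthesis produced them.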

Step 5: Required Decisions

Create a Required Decisions table for every blocking or important finding:

| ID | Severity | Finding | Recommended fix | Decision | Evidence / Ticket | Status |
|---|---|---|---|---|---|---|
| Q-001 | important | [title] | [concrete action] | undecided |  | open |

Decision values are only:

- fix now
- defer
- reject
- undecided

Rules:

- Default every blocking/important finding to undecided unless the human has already provided an explicit decision.
- defer requires a ticket/link or concrete follow-up reason/owner.
- reject requires evidence from the codebase, product requirement, or explicit human direction.
- If any row remains undecided, quality is not done. Report that decisions are required and stop.
- Update _review_debt.md with every row whose status is still open. If a row is fixed in the same loop, mark it fixed with the commit/test evidence instead of leaving it open.
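
The default-to-undecided rule and the "not done" gate might look like this in Python. It is a sketch under assumptions: `required_decisions` and `quality_done` are hypothetical names, findings are dicts with `severity`, `title`, and `fix` keys, human decisions are keyed by the generated ID, and the Status column is omitted for brevity.

```python
def required_decisions(findings: list[dict],
                       human_decisions: dict[str, tuple[str, str]]) -> list[dict]:
    """Build decision rows; a finding stays undecided unless the human decided it."""
    rows = []
    for number, finding in enumerate(findings, start=1):
        finding_id = f"Q-{number:03d}"
        decision, evidence = human_decisions.get(finding_id, ("undecided", ""))
        rows.append({
            "id": finding_id,
            "severity": finding["severity"],
            "finding": finding["title"],
            "fix": finding["fix"],
            "decision": decision,
            "evidence": evidence,
        })
    return rows


def quality_done(rows: list[dict]) -> bool:
    """Quality is done only when no row is left undecided."""
    return all(row["decision"] != "undecided" for row in rows)
```

When `quality_done` returns False, the reviewer reports that decisions are required and stops rather than proceeding to fixes.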

Step 6: Human Review Guide

Create a short guide for the human reviewer: what this PR is about, what sequence to review files in, and any context needed. Keep it concise.

Final Output Format

Present the synthesized review as:

1. Summary (2-3 sentences on overall change quality)
2. Human review guide
3. Blocking issues (if any)
4. Important issues
5. Required Decisions table
6. _review_debt.md updates made
7. Suggestions
8. Observations (low-confidence, for awareness)
9. What's good / don't change

Each finding in sections 3-4 and 7-8 should include: agreement level tag, file, lines, title, description, and recommended fix.

Cleanup

After all reviewers complete and synthesis is done:

# Only if a worktree was created
git worktree remove --force "../review-$ARGUMENTS" 2>/dev/null || true
