with one click
vg-review
// Post-build review — code scan + browser discovery + fix loop + goal comparison → RUNTIME-MAP
// Post-build review — code scan + browser discovery + fix loop + goal comparison → RUNTIME-MAP
[HINT] Download the complete skill directory including SKILL.md and all related files
| name | vg-review |
| description | Post-build review — code scan + browser discovery + fix loop + goal comparison → RUNTIME-MAP |
| metadata | {"short-description":"Post-build review — code scan + browser discovery + fix loop + goal comparison → RUNTIME-MAP"} |
<codex_skill_adapter>
This skill body is generated from VGFlow's canonical source. Claude Code and Codex use the same workflow contracts, but their orchestration primitives differ.
When this skill is running inside Codex, DO NOT switch to Claude CLI to execute
the workflow entrypoint. Keep the current Codex runtime, export
VG_RUNTIME=codex, use Codex update_plan for the compact visible task
window, and bind it with vg-orchestrator tasklist-projected --adapter codex.
.claude/scripts/* and .claude/commands/* are canonical VGFlow source
paths shared by both adapters; those paths do not mean the runtime changed to
Claude. References below to "Claude CLI", TodoWrite, or Haiku describe the
Claude adapter only. Codex must map them through this adapter contract instead
of aborting the current run and relaunching Claude.
| Claude Code concept | Codex-compatible pattern | Notes |
|---|---|---|
| AskUserQuestion | Ask concise questions in the main Codex thread | Codex does not expose the same structured prompt tool inside generated skills. Persist answers where the skill requires it; prefer Codex-native options such as codex-inline when the source prompt distinguishes providers. |
| Agent(...) / Task | Prefer commands/vg/_shared/lib/codex-spawn.sh or native Codex subagents | Use codex exec when exact model, timeout, output file, or schema control matters. |
| TaskCreate / TaskUpdate / TodoWrite | Compact Codex plan window + orchestrator step markers | Use tasklist-contract.json as source of truth. Do not paste the full hierarchy into Codex update_plan. Show at most 6 rows: active group/step first, next 2-3 pending steps, completed groups collapsed, and +N pending. After projecting, emit vg-orchestrator tasklist-projected --adapter codex. |
| Playwright MCP | Main Codex orchestrator MCP tools, or smoke-tested subagents | If an MCP-using subagent cannot access tools in a target environment, fall back to orchestrator-driven/inline scanner flow. |
| Graphify MCP | Python/CLI graphify calls | VGFlow's build/review paths already use deterministic scripts where possible. |
<codex_runtime_contract>
This generated skill must preserve the source command's artifacts, gates, telemetry events, and step ordering on both Claude and Codex. Do not remove, skip, or weaken a source workflow step because a Claude-only primitive appears in the body below.
| Source pattern | Claude path | Codex path |
|---|---|---|
| Planner/research/checker Agent | Use the source Agent(...) call and configured model tier | Use native Codex subagents only if the local Codex version has been smoke-tested; otherwise write the child prompt to a temp file and call commands/vg/_shared/lib/codex-spawn.sh --tier planner |
| Build executor Agent | Use the source executor Agent(...) call | Use codex-spawn.sh --tier executor --sandbox workspace-write with explicit file ownership and expected artifact output |
| Adversarial/CrossAI reviewer | Use configured external CLIs and consensus validators | Use configured codex exec/Gemini/Claude commands from .claude/vg.config.md; fail if required CLI output is missing or unparsable |
| Haiku scanner / Playwright / Maestro / MCP-heavy work | Use Claude subagents where the source command requires them | Keep MCP-heavy work in the main Codex orchestrator unless child MCP access was smoke-tested; scanner work may run inline/sequential instead of parallel, but must write the same scan artifacts and events |
| Reflection / learning | Use vg-reflector workflow | Use the Codex vg-reflector adapter or codex-spawn.sh --tier scanner; candidates still require the same user gate |
Claude Code has a project-local hook substrate; Codex skills do not receive
Claude UserPromptSubmit, Stop, or PostToolUse hooks automatically.
Therefore Codex must execute the lifecycle explicitly through the same
orchestrator that writes .vg/events.db:
| Claude hook | What it does on Claude | Codex obligation |
|---|---|---|
UserPromptSubmit -> vg-entry-hook.py | Pre-seeds vg-orchestrator run-start and .vg/.session-context.json before the skill loads | Treat the command body's explicit vg-orchestrator run-start as mandatory; if missing or failing, BLOCK before doing work |
Stop -> vg-verify-claim.py | Runs vg-orchestrator run-complete and blocks false done claims | Run the command body's terminal vg-orchestrator run-complete before claiming completion; if it returns non-zero, fix evidence and retry |
PostToolUse edit -> vg-edit-warn.py | Warns that command/skill edits require session reload | After editing VG workflow files on Codex, tell the user the current session may still use cached skill text |
PostToolUse Bash -> vg-step-tracker.py | Tracks marker commands and emits hook.step_active telemetry | Do not rely on the hook; call explicit vg-orchestrator mark-step lines in the skill and preserve marker/telemetry events |
Codex hook parity is evidence-based: .vg/events.db, step markers,
must_emit_telemetry, and run-complete output are authoritative. A Codex
run is not complete just because the model says it is complete.
After each STEP's primary action completes, run:
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review <marker>
Required HARD markers for /vg:review (v2.65.0 A9):
| STEP | Marker |
|---|---|
| Pre-STEP 0 (integrity precheck) | 00_gate_integrity_precheck |
| STEP 0 (parse + validate) | 0_parse_and_validate |
| STEP 0b (goal coverage gate) | 0b_goal_coverage_gate |
| Final close | complete |
The remaining markers in must_touch_markers: (phase1_, phase2_, phaseP_*,
crossai_review, write_artifacts, bootstrap_reflection, env-mode-gate, etc.)
are advisory (severity: warn) or flag-gated; emit them when the matching
profile branch executes.
v2.67.0 #158 — lens telemetry parity: the body below explicitly calls
mark-step review 2b3_lens_dispatch_complete and
mark-step review 2b3_lens_matrix_rendered after the matching steps so
Codex matches the Claude PostToolUse hook's marker coverage on the
LENS-DISPATCH-PLAN.json + LENS-COVERAGE-MATRIX.md must_write artifacts.
Before executing command bash blocks from a Codex skill, export
VG_RUNTIME=codex. This is an adapter signal, not a source replacement:
Claude/unknown runtime keeps the canonical AskUserQuestion + Haiku path,
while Codex maps only the incompatible orchestration primitives to
Codex-native choices such as codex-inline.
When the source workflow below says Agent(...) or "spawn", Codex MUST
apply this table instead of treating the Claude syntax as executable:
| Source spawn site | Codex action | Tier/model env | Sandbox | Required evidence |
|---|---|---|---|---|
/vg:build wave executor, model="${MODEL_EXECUTOR}" | Write one prompt file per task, run codex-spawn.sh --tier executor; parallelize independent tasks with background processes and wait, serialize dependency groups | VG_CODEX_MODEL_EXECUTOR; leave unset to use Codex config default. Set this to the user's strongest coding model when they want Sonnet-class build quality. | workspace-write | child output, stdout/stderr logs, changed files, verification commands, task-fidelity prompt evidence |
/vg:blueprint, /vg:scope, planner/checker agents | Run codex-spawn.sh --tier planner or inline in the main orchestrator if the step needs interactive user answers | VG_CODEX_MODEL_PLANNER | workspace-write for artifact-writing planners, read-only for pure checks | requested artifacts or JSON verdict |
/vg:review navigator/scanner, Agent(model="haiku") | Use --scanner=codex-inline by default. Do NOT ask to spawn Haiku or blindly spawn codex exec for Playwright/Maestro work. Main Codex orchestrator owns MCP/browser/device actions. Use codex-spawn.sh --tier scanner --sandbox read-only only for non-MCP classification over captured snapshots/artifacts. | VG_CODEX_MODEL_SCANNER; set this to a cheap/fast model for review map/scanner work | read-only unless explicitly generating scan files from supplied evidence | same scan-*.json, RUNTIME-MAP.json, GOAL-COVERAGE-MATRIX.md, and review.haiku_scanner_spawned telemetry event semantics |
/vg:review fix agents and /vg:test codegen agents | Use codex-spawn.sh --tier executor because they edit code/tests | VG_CODEX_MODEL_EXECUTOR or explicit --model if the command selected a configured fix model | workspace-write | changed files, tests run, unresolved risks |
| Rationalization guard, reflector, gap hunters | Use codex-spawn.sh --tier scanner for read-only classification, or --tier adversarial for independent challenge/review | VG_CODEX_MODEL_SCANNER or VG_CODEX_MODEL_ADVERSARIAL | read-only by default | compact JSON/markdown verdict; fail closed on empty/unparseable output |
If a source sentence says "MUST spawn Haiku" and the step needs MCP/browser tools, Codex interprets that as "MUST run the scanner protocol and emit the same artifacts/events"; it does not require a child process unless child MCP access was smoke-tested in the current environment.
Model mapping is tier-based, not vendor-name-based.
VGFlow keeps tier names in .claude/vg.config.md; Codex subprocesses use
the user's Codex config model by default. Pin a tier only after smoke-testing
that model in the target account, via VG_CODEX_MODEL_PLANNER,
VG_CODEX_MODEL_EXECUTOR, VG_CODEX_MODEL_SCANNER, or
VG_CODEX_MODEL_ADVERSARIAL:
| VG tier | Claude-style role | Codex default | Fallback |
|---|---|---|---|
| planner | Opus-class planning/reasoning | Codex config default | Set VG_CODEX_MODEL_PLANNER only after smoke-testing |
| executor | Sonnet-class coding/review | Codex config default | Set VG_CODEX_MODEL_EXECUTOR only after smoke-testing |
| scanner | Haiku-class scan/classify | Codex config default | Set VG_CODEX_MODEL_SCANNER only after smoke-testing |
| adversarial | independent reviewer | Codex config default | Set VG_CODEX_MODEL_ADVERSARIAL only after smoke-testing |
For subprocess-based children, use:
bash .claude/commands/vg/_shared/lib/codex-spawn.sh \
--tier executor \
--prompt-file "$PROMPT_FILE" \
--out "$OUT_FILE" \
--timeout 900 \
--sandbox workspace-write
The helper wraps codex exec, writes the final message to --out, captures
stdout/stderr beside it, and fails loudly on timeout or empty output.
codex exec --model.--output-schema with MCP-heavy runs until the target Codex version is smoke-tested. Prefer plain text + post-parse for MCP flows.codex exec runs inherit sandbox constraints. Use the least sandbox that still allows the child to write expected artifacts.Pattern A: INLINE ORCHESTRATOR. For MCP-heavy support skills such as
vg-haiku-scanner, Codex keeps Playwright/Maestro actions in the main
orchestrator and only delegates read-only classification after snapshots are
captured. This preserves MCP access and avoids false confidence from a child
process that cannot see browser tools.
Invoke this skill as $vg-review. Treat all user text after the skill name as arguments.
</codex_skill_adapter>
<LANGUAGE_POLICY>
You MUST follow _shared/language-policy.md. NON-NEGOTIABLE.
Mặc định trả lời bằng tiếng Việt (config: language.primary trong
.claude/vg.config.md, fallback vi nếu chưa set). Dùng ngôn ngữ con
người, không technical jargon. Mỗi thuật ngữ tiếng Anh xuất hiện lần đầu
trong narration: thêm giải thích VN trong dấu ngoặc (per
_shared/term-glossary.md).
Ví dụ:
[path]. Mình sẽ sửa rồi chạy lại."File paths, code identifiers (G-04, Wave 9, getUserById), commit messages, CLI commands stay English. AskUserQuestion title + options + question prose: ngôn ngữ config. </LANGUAGE_POLICY>
Read _shared/lib/tasklist-projection-instruction.md and follow it
verbatim. The PreToolUse-bash hook will BLOCK every step-active call
in this slim entry until .vg/runs/${RUN_ID}/.tasklist-projected.evidence.json
exists.
Claude TodoWrite MUST include sub-items (↳ prefix) for each group header;
flat projection (group-headers only) is rejected by PostToolUse depth
check (Task 44b Rule V2).
Codex MUST keep the visible plan compact. Do not paste the full hierarchy
into Codex update_plan; use codex_plan_window from the contract and show
at most 6 rows: active group/step first, next 2-3 pending steps, completed
groups collapsed, and +N pending.
<TASKLIST_POLICY> Native task UI projection is REQUIRED.
Source of truth:
.vg/runs/{run_id}/tasklist-contract.json — canonical checklist for this run..vg/events.db — review.tasklist_shown, review.native_tasklist_projected, step.active, step.marked.${PHASE_DIR}/.step-markers/... — durable completion markers.Provider adapters:
TodoWrite
with the full two-layer hierarchy from projection_items[]; each todo
content MUST start with the contract checklist/step id or title. If this
Claude runtime exposes TaskCreate/TaskUpdate, that adapter is also
acceptable. Do not create ad-hoc todos outside tasklist-contract.json.codex_plan_window;
preserve current active group/step identity, but do not create one visible
item per projection_items[] row. Update the compact window before/after
each step and keep it at 6 visible rows or fewer.vg-orchestrator run-status --pretty before and after each step and record adapter fallback.Lifecycle:
replace-on-start: the first native projection MUST replace any stale task
list from a previous workflow. Never append current review items onto a
previous workflow's list.close-on-complete: before reporting success, mark all review checklist
items completed. Then clear the native list if supported; otherwise replace
it with one completed sentinel item: vg:review phase ${PHASE_NUMBER} complete.Mandatory binding:
emit-tasklist.py prints the taskboard and Tasklist contract: ..., read that contract."${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator tasklist-projected --adapter auto
# auto locks to claude, codex, or fallback from runtime env
vg-orchestrator step-active <step_name>.vg-orchestrator mark-step review <step_name>.Do not improvise a separate checklist. The native UI is a projection of tasklist-contract.json; the harness contract remains authoritative.
Long-running work still needs visible narration: run Bash jobs over 30s in background and poll with BashOutput; summarize Task subagent progress before and after spawning.
Dynamic sub-task append (RULE) — projection từ emit-tasklist là baseline,
KHÔNG cứng. Khi AI execute group/step phức tạp (e.g. phase2_browser_discovery
với nhiều view, phase2_5_recursive_lens_probe với nhiều lens), AI PHẢI append
child todos vào group đó để user thấy real-time progress.
Pattern for Claude native task UI (tolerant hook B11.6+):
↳ <id>: <one-line desc> (status: pending → in_progress → completed) ↳ View /campaigns: 12 actions captured ↳ Lens lens-modal-state: 3 modals probed (1 BLOCKED — focus trap) ↳ phase2c G-04: enriched with success criteriaCho operator visibility "AI sẽ làm gì tiếp / tiến độ tới đâu" mà không phải đọc Bash log dài.
Codex exception: keep these dynamic details folded into the active compact
plan row or the next row. Do not exceed the 6-row codex_plan_window budget.
Translate English terms (RULE) — output có thuật ngữ tiếng Anh PHẢI thêm giải thích VN trong dấu ngoặc tại lần đầu xuất hiện. Tham khảo _shared/term-glossary.md. Ví dụ: BLOCK (chặn), Foundation (nền tảng) drift detected (phát hiện lệch hướng), legacy-v1 (định dạng cũ v1), UNREACHABLE (không tiếp cận được). Không áp dụng: file path, code identifier (D-XX, git, pnpm), config tag values, lần lặp lại trong cùng message.
</TASKLIST_POLICY>
Pipeline: specs → scope → blueprint → build → review → test → accept
4 Phases:
Config: Read .claude/commands/vg/_shared/config-loader.md first.
Bug detection (v1.11.2 R6 — MANDATORY): Read .claude/commands/vg/_shared/bug-detection-guide.md BEFORE starting. Apply 6 detection patterns throughout: schema_violation, helper_error, user_pushback, ai_inconsistency, gate_loop, self_discovery. When detected: NARRATE intent + CALL report_bug via bash + CONTINUE workflow (non-blocking).
<CRITICAL_MCP_RULE> BEFORE any browser interaction, you MUST run the Playwright lock claim:
SESSION_ID="vg-${PHASE}-review-$$"
PLAYWRIGHT_SERVER=$(bash "${HOME}/.claude/playwright-locks/playwright-lock.sh" claim "$SESSION_ID")
# Auto-release lock on exit (normal/error/interrupt). Prevents leak if process dies mid-scan.
trap "bash '${HOME}/.claude/playwright-locks/playwright-lock.sh' release \"$SESSION_ID\" 2>/dev/null" EXIT INT TERM
Then use mcp__${PLAYWRIGHT_SERVER}__ as prefix for ALL browser tool calls.
NEVER call plugin:playwright:playwright directly. Other sessions (Codex, other tabs) may be using it.
If claim returns playwright3, your tools are mcp__playwright3__browser_navigate, mcp__playwright3__browser_snapshot, etc.
If ALL 5 servers locked → BLOCK. The lock manager auto-sweeps stale locks (TTL 1800s + dead-PID check)
on every claim — if still no slot free, it's genuinely contended. Do NOT manually cleanup other sessions' locks.
</CRITICAL_MCP_RULE>
If ${PLANNING_DIR}/vgflow-patches/gate-conflicts.md exists, a prior /vg:update detected that the 3-way merge (gộp) altered one or more HARD gate blocks. BLOCK (chặn) until resolved via /vg:reapply-patches --verify-gates.
# Harness v2.6.1 (2026-04-26): inject rule cards at skill entry — gives AI
# a 5-30 line digest of skill rules instead of skimming 1500-line body.
# Cards generated by extract-rule-cards.py. Per AUDIT.md D4 finding
# (inject_rule_cards 0/44 invocation = memory mechanism dead).
[ -f "${REPO_ROOT:-.}/.claude/commands/vg/_shared/lib/inject-rule-cards.sh" ] && \
source "${REPO_ROOT:-.}/.claude/commands/vg/_shared/lib/inject-rule-cards.sh" && \
inject_rule_cards "vg-review" "00_gate_integrity_precheck" 2>&1 || true
# v2.2 — T8 gate now routes through block_resolve. L1 auto-clears stale
# file when all entries carry resolution markers. Only genuine conflicts BLOCK.
if [ -f "${REPO_ROOT}/.claude/commands/vg/_shared/lib/t8-gate-check.sh" ]; then
[ -f "${REPO_ROOT}/.claude/commands/vg/_shared/lib/block-resolver.sh" ] && \
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/block-resolver.sh"
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/t8-gate-check.sh"
t8_gate_check "${PLANNING_DIR}" "review"
T8_RC=$?
[ "$T8_RC" -eq 2 ] && exit 2
[ "$T8_RC" -eq 1 ] && exit 1
elif [ -f "${PLANNING_DIR}/vgflow-patches/gate-conflicts.md" ]; then
echo "⛔ Gate integrity conflicts unresolved — run /vg:reapply-patches --verify-gates first."
exit 1
fi
# v2.2 — register run with orchestrator (idempotent with UserPromptSubmit hook)
# OHOK-8 round-4 Codex fix: parse PHASE_NUMBER BEFORE run-start so the run
# doesn't register against an empty phase (telemetry + runtime-contract
# evidence attaches to "" instead of the actual phase).
[ -z "${PHASE_NUMBER:-}" ] && PHASE_NUMBER=$(echo "${ARGUMENTS}" | awk '{print $1}')
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator run-start vg:review "${PHASE_NUMBER}" "${ARGUMENTS}" || { echo "⛔ vg-orchestrator run-start failed — cannot proceed" >&2; exit 1; }
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review 00_gate_integrity_precheck 2>/dev/null || true
**Session lifecycle (tightened 2026-04-17) — clean tail UI across runs.**
Follow .claude/commands/vg/_shared/session-lifecycle.md helper.
PHASE_NUMBER=$(echo "$ARGUMENTS" | awk '{print $1}')
# v1.9.2.2 — handle zero-padding (`7.12` → `07.12-*`) via shared resolver
source "${REPO_ROOT:-.}/.claude/commands/vg/_shared/lib/phase-resolver.sh" 2>/dev/null || true
if type -t resolve_phase_dir >/dev/null 2>&1; then
PHASE_DIR_CANDIDATE=$(resolve_phase_dir "$PHASE_NUMBER" 2>/dev/null || echo "")
else
PHASE_DIR_CANDIDATE=$(ls -d ${PLANNING_DIR}/phases/${PHASE_NUMBER}* 2>/dev/null | head -1)
fi
TASKLIST_PROFILE="${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}"
TASKLIST_MODE=""
source "${REPO_ROOT:-.}/.claude/commands/vg/_shared/lib/phase-profile.sh" 2>/dev/null || true
if [ -n "$PHASE_DIR_CANDIDATE" ] && type -t detect_phase_profile >/dev/null 2>&1; then
TASKLIST_PHASE_PROFILE=$(detect_phase_profile "$PHASE_DIR_CANDIDATE" 2>/dev/null || echo "feature")
TASKLIST_MODE=$(phase_profile_review_mode "$TASKLIST_PHASE_PROFILE" 2>/dev/null || echo "full")
fi
if [ -n "$PHASE_DIR_CANDIDATE" ] && type -t detect_phase_platform_profile >/dev/null 2>&1; then
TASKLIST_PROFILE=$(detect_phase_platform_profile "$PHASE_DIR_CANDIDATE" "$TASKLIST_PROFILE" 2>/dev/null || echo "$TASKLIST_PROFILE")
fi
if [[ "$ARGUMENTS" =~ --mode=([a-z-]+) ]]; then
TASKLIST_MODE="${BASH_REMATCH[1]}"
fi
: "${TASKLIST_MODE:=full}"
# Emit session-start banner → distinct separator for Claude Code tail UI
session_start "review" "${PHASE_NUMBER:-unknown}"
${PYTHON_BIN:-python3} .claude/scripts/emit-tasklist.py \
--command "vg:review" \
--profile "${TASKLIST_PROFILE}" \
--mode "${TASKLIST_MODE}" \
--phase "${PHASE_NUMBER:-unknown}" 2>&1 | head -40 || true
# Register EXIT trap emitting "━━━ review Phase X EXITED at step=Y ━━━" on any exit path
# Sweep stale state from previous interrupted runs (>config.session.stale_hours old)
[ -n "$PHASE_DIR_CANDIDATE" ] && stale_state_sweep "review" "$PHASE_DIR_CANDIDATE"
# Kill orphan dev servers on declared ports before pre-flight
[ "${CONFIG_SESSION_PORT_SWEEP_ON_START:-true}" = "true" ] && session_port_sweep "pre-flight"
session_mark_step "0-parse-args"
Immediately after this block returns, execute the TASKLIST_POLICY binding:
project .vg/runs/{run_id}/tasklist-contract.json to the native task UI and
call vg-orchestrator tasklist-projected --adapter auto.
Flags:
--skip-scan — skip Phase 1 (code scan), go directly to browser discovery. Gated: must pair with --override-reason="<text>" (logged to override-debt).--skip-discovery — skip Phase 2 (browser discovery), use existing RUNTIME-MAP for Phase 4. Gated: must pair with --override-reason="<text>" (logged to override-debt).--fix-only — skip to Phase 3 (requires RUNTIME-MAP with errors). Gated: listed in forbidden_without_override (line 34) — must combine with --override-reason="<text>" to run, otherwise hard BLOCK. Entry logged to override-debt register.--skip-crossai — skip CrossAI review at end--evaluate-only — skip Phase 1 + 2 (discovery already done by Codex/Gemini), read existing scan JSONs from ${PHASE_DIR}, go directly to Phase 2b-3 (collect + merge) → Phase 3 (fix) → Phase 4 (goal comparison). Requires: nav-discovery.json + scan-*.json already exist.--retry-failed — skip Phase 1 + Phase 2 navigator, re-scan ONLY views mapped to failed/blocked/SUSPECTED goals in GOAL-COVERAGE-MATRIX.md. Requires: GOAL-COVERAGE-MATRIX.md + RUNTIME-MAP.json already exist. Use when: review already ran but goals < 100%, code was fixed, need targeted re-scan without full re-discovery. v2.46-wave3.2: SUSPECTED status (matrix=READY but no submit evidence) is now included in retry set automatically — closes Phase 3.2 staleness gap where matrix lied.--re-scan-goals=G-XX,G-YY,G-ZZ — bypass matrix entirely; re-scan only the listed goal IDs. Resolves to start_views via RUNTIME-MAP.json goal_sequences. Use when: matrix-staleness validator surfaced specific suspected goals, OR you know exactly which goals need re-evidence. Mutually exclusive with --retry-failed (more precise overrides broader).--dogfood — re-scan ALL mutation goals (any goal with non-empty mutation_evidence in TEST-GOALS.md) regardless of matrix status. Use when: you suspect systemic submit-evidence gaps. Slower than --retry-failed but catches the full surface.--force — full rerun from scratch for an already-reviewed phase. Clears prior review markers/artifacts before any staleness/reuse logic runs, so API precheck + discovery + matrix must be regenerated in the current run. Use when: you explicitly want fresh evidence, not re-validation on existing artifacts.--full-scan — disable sidebar suppression. Haiku agents see full page (sidebar/header/footer) in every snapshot. Use when: app has non-standard layout, geometry detection fails, or debugging suppression issues.--with-probes — enable mutation probe variations (edit/boundary/repeat) in step 2b-3 step 9. Adds 1 Haiku per mutation goal. Default OFF — let /vg:test handle variations via Playwright codegen (deterministic, cheaper).--allow-no-crud-surface — last-resort waiver for legacy phases missing CRUD-SURFACES.md. Logs debt via validator output; do not use for new CRUD work.Flag parsing (v2.46-wave3.2 — explicit env vars used downstream):
# Parse boolean + value flags from $ARGUMENTS into env vars consumed by later steps.
# (Pre-wave3.2 the skill relied on AI interpretation; making it explicit closes
# the gap where staleness check + retry-failed referenced unset variables.)
ARGS_RAW="${ARGUMENTS:-}"
RETRY_FAILED=""
RE_SCAN_GOALS=""
DOGFOOD=""
SKIP_SCAN=""
SKIP_DISCOVERY=""
FIX_ONLY=""
EVALUATE_ONLY=""
FULL_SCAN=""
WITH_PROBES=""
SKIP_CROSSAI=""
ALLOW_NO_CRUD_SURFACE=""
NON_INTERACTIVE=""
FORCE_RERUN=""
for tok in $ARGS_RAW; do
case "$tok" in
--retry-failed) RETRY_FAILED=1 ;;
--re-scan-goals=*) RE_SCAN_GOALS="${tok#--re-scan-goals=}" ;;
--dogfood) DOGFOOD=1 ;;
--force) FORCE_RERUN=1 ;;
--skip-scan) SKIP_SCAN=1 ;;
--skip-discovery) SKIP_DISCOVERY=1 ;;
--fix-only) FIX_ONLY=1 ;;
--evaluate-only) EVALUATE_ONLY=1 ;;
--full-scan) FULL_SCAN=1 ;;
--with-probes) WITH_PROBES=1 ;;
--skip-crossai) SKIP_CROSSAI=1 ;;
--allow-no-crud-surface) ALLOW_NO_CRUD_SURFACE=1 ;;
--non-interactive) NON_INTERACTIVE=1 ;;
esac
done
export RETRY_FAILED RE_SCAN_GOALS DOGFOOD SKIP_SCAN SKIP_DISCOVERY FIX_ONLY \
EVALUATE_ONLY FULL_SCAN WITH_PROBES SKIP_CROSSAI ALLOW_NO_CRUD_SURFACE \
NON_INTERACTIVE FORCE_RERUN
Phase profile detection (P5, v1.9.2) — FIRST ACTION before any blanket check:
# Source phase-profile.sh — pure function, no side effects
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/phase-profile.sh" 2>/dev/null || true
if type -t detect_phase_profile >/dev/null 2>&1; then
PHASE_PROFILE=$(detect_phase_profile "$PHASE_DIR")
REVIEW_PLATFORM_PROFILE=$(detect_phase_platform_profile "$PHASE_DIR" "${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}" 2>/dev/null || echo "${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}")
PROFILE="$REVIEW_PLATFORM_PROFILE"
export PHASE_PROFILE
export REVIEW_PLATFORM_PROFILE PROFILE
REVIEW_MODE=$(phase_profile_review_mode "$PHASE_PROFILE")
REQUIRED_ARTIFACTS=$(phase_profile_required_artifacts "$PHASE_PROFILE")
SKIP_ARTIFACTS=$(phase_profile_skip_artifacts "$PHASE_PROFILE")
GOAL_COVERAGE_SRC=$(phase_profile_goal_coverage_source "$PHASE_PROFILE")
export REVIEW_MODE REQUIRED_ARTIFACTS SKIP_ARTIFACTS GOAL_COVERAGE_SRC
# Narrate detected profile (Vietnamese) — stderr so user sees reasoning.
phase_profile_summarize "$PHASE_DIR" "$PHASE_PROFILE"
else
# Graceful fallback for legacy workflows where helper not yet installed
PHASE_PROFILE="feature"
REVIEW_MODE="full"
REQUIRED_ARTIFACTS="SPECS.md CONTEXT.md PLAN.md API-CONTRACTS.md API-DOCS.md TEST-GOALS.md SUMMARY.md"
SKIP_ARTIFACTS=""
GOAL_COVERAGE_SRC="TEST-GOALS"
echo "⚠ phase-profile.sh missing — defaulting to profile=feature" >&2
fi
Profile-aware prerequisite gate (replaces hardcoded SUMMARY+API-CONTRACTS check):
MISSING=""
for artifact in $REQUIRED_ARTIFACTS; do
# SUMMARY.md check is glob-aware — SUMMARY*.md counts
if [ "$artifact" = "SUMMARY.md" ]; then
ls "${PHASE_DIR}"/SUMMARY*.md >/dev/null 2>&1 || MISSING="${MISSING} ${artifact}"
else
[ -f "${PHASE_DIR}/${artifact}" ] || MISSING="${MISSING} ${artifact}"
fi
done
MISSING=$(echo "$MISSING" | xargs)
if [ -n "$MISSING" ]; then
echo "⛔ Review prerequisites missing for profile='${PHASE_PROFILE}': ${MISSING}" >&2
# v1.9.1 R2+R4 + v1.9.2 P4: block-resolver — spawn architect proposal
# instead of anti-pattern "list 3 options, stop, wait".
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/block-resolver.sh" 2>/dev/null || true
if type -t block_resolve >/dev/null 2>&1; then
export VG_CURRENT_PHASE="$PHASE_NUMBER" VG_CURRENT_STEP="review.0-prerequisites"
BR_GATE_CONTEXT="Review prerequisites missing for profile='${PHASE_PROFILE}'. Required: ${REQUIRED_ARTIFACTS}. Missing: ${MISSING}. Profile detected from SPECS.md (parent_phase/issue_id/success_criteria bash commands/migration keywords)."
BR_EVIDENCE=$(printf '{"phase_profile":"%s","required":"%s","missing":"%s","skip":"%s"}' \
"$PHASE_PROFILE" "$REQUIRED_ARTIFACTS" "$MISSING" "$SKIP_ARTIFACTS")
# L1 fix candidates — try to generate missing artifact inline.
# Only safe auto-fixes (SUMMARY backfill from build-state, never TEST-GOALS which needs decisions).
BR_CANDIDATES='[
{"id":"summary-backfill","cmd":"[ -f \"'"$PHASE_DIR"'/build-state.log\" ] && echo \"SUMMARY could be backfilled from build-state.log — user must review\" && exit 1","confidence":0.4,"rationale":"SUMMARY missing but build-state.log exists → narrate user can backfill, but not auto-generate without human eye"},
{"id":"profile-retry-detect","cmd":"source \"'"$REPO_ROOT"'/.claude/commands/vg/_shared/lib/phase-profile.sh\"; P=$(detect_phase_profile \"'"$PHASE_DIR"'\"); test \"$P\" != \"'"$PHASE_PROFILE"'\" && echo \"profile changed to $P, re-check needed\" || exit 1","confidence":0.3,"rationale":"Re-detect in case SPECS was updated between runs"}
]'
BR_RESULT=$(block_resolve "review-prereq-missing" "$BR_GATE_CONTEXT" "$BR_EVIDENCE" "$PHASE_DIR" "$BR_CANDIDATES")
BR_LEVEL=$(echo "$BR_RESULT" | ${PYTHON_BIN} -c "import json,sys; print(json.loads(sys.stdin.read()).get('level',''))" 2>/dev/null)
if [ "$BR_LEVEL" = "L1" ]; then
echo "✓ Block resolver L1 self-resolved — prerequisites now satisfied, re-check below" >&2
# Re-check MISSING after L1 — fall through if still missing
MISSING=""
for artifact in $REQUIRED_ARTIFACTS; do
if [ "$artifact" = "SUMMARY.md" ]; then
ls "${PHASE_DIR}"/SUMMARY*.md >/dev/null 2>&1 || MISSING="${MISSING} ${artifact}"
else
[ -f "${PHASE_DIR}/${artifact}" ] || MISSING="${MISSING} ${artifact}"
fi
done
MISSING=$(echo "$MISSING" | xargs)
[ -z "$MISSING" ] || {
echo "⚠ L1 did not fully resolve — proceeding to L2 architect" >&2
}
fi
if [ -n "$MISSING" ] && [ "$BR_LEVEL" = "L2" ]; then
block_resolve_l2_handoff "review-prereq-missing" "$BR_RESULT" "$PHASE_DIR"
exit 2
elif [ -n "$MISSING" ]; then
# L4 — genuinely stuck (resolver disabled or architect unavailable)
echo "Fix paths by profile:" >&2
echo " feature → /vg:blueprint ${PHASE_NUMBER} (generates PLAN + API-CONTRACTS + TEST-GOALS)" >&2
echo " infra → add '## Success criteria' bash checklist to SPECS, commit PLAN + SUMMARY" >&2
echo " hotfix → ensure SPECS has 'Parent phase:' field + PLAN + SUMMARY" >&2
echo " bugfix → add 'issue_id:' or 'bug_ref:' to SPECS + PLAN + SUMMARY" >&2
echo " migration → add ROLLBACK.md with down-migration steps" >&2
echo " docs → only SPECS.md required" >&2
exit 1
fi
else
# Resolver unavailable → classic hard block (still better than 3-option anti-pattern)
echo "Required for profile='${PHASE_PROFILE}': ${REQUIRED_ARTIFACTS}" >&2
echo "Run /vg:blueprint or equivalent to produce missing artifacts, then retry." >&2
exit 1
fi
fi
Update PIPELINE-STATE.json:
# VG-native state update (no GSD dependency)
PIPELINE_STATE="${PHASE_DIR}/PIPELINE-STATE.json"
${PYTHON_BIN} -c "
import json; from pathlib import Path
p = Path('${PIPELINE_STATE}')
s = json.loads(p.read_text(encoding='utf-8')) if p.exists() else {}
s['status'] = 'reviewing'; s['pipeline_step'] = 'review'
s['updated_at'] = __import__('datetime').datetime.now().isoformat()
p.write_text(json.dumps(s, indent=2))
" 2>/dev/null
Force rerun reset (--force) — clear prior review state before any reuse/staleness branch can fire.
This is the explicit escape hatch for "phase already reviewed" situations.
--force means: do a real full rerun, not matrix-only re-validation.
if [ -n "$FORCE_RERUN" ]; then
if [ -n "$RETRY_FAILED" ] || [ -n "$RE_SCAN_GOALS" ] || [ -n "$EVALUATE_ONLY" ] || \
[ -n "$DOGFOOD" ] || [ -n "$SKIP_DISCOVERY" ] || [ -n "$FIX_ONLY" ]; then
echo "⛔ --force is incompatible with --retry-failed, --re-scan-goals, --evaluate-only, --dogfood, --skip-discovery, and --fix-only." >&2
echo " Use plain full review + --force when you need fresh API/browser evidence." >&2
exit 1
fi
# If operator explicitly pinned a non-full mode on CLI, reject. Force is
# for full rerun semantics, not profile-short-circuit modes.
if [[ "${ARGS_RAW}" =~ --mode=([a-z-]+) ]] && [ "${BASH_REMATCH[1]}" != "full" ]; then
echo "⛔ --force currently supports only --mode=full. Got --mode=${BASH_REMATCH[1]}." >&2
exit 1
fi
VG_SCRIPT_ROOT="${REPO_ROOT:-.}/.claude/scripts"
[ -d "$VG_SCRIPT_ROOT" ] || VG_SCRIPT_ROOT="${REPO_ROOT:-.}/scripts"
FORCE_RESET_SCRIPT="${VG_SCRIPT_ROOT}/reset-review-state.py"
if [ ! -f "$FORCE_RESET_SCRIPT" ]; then
echo "⛔ --force requested but reset helper missing: $FORCE_RESET_SCRIPT" >&2
exit 1
fi
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "↻ Force rerun requested"
echo " Clearing prior review markers/artifacts"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
FORCE_RESET_JSON=$("${PYTHON_BIN:-python3}" "$FORCE_RESET_SCRIPT" --phase-dir "${PHASE_DIR}")
FORCE_RESET_RC=$?
if [ "$FORCE_RESET_RC" -ne 0 ]; then
echo "⛔ reset-review-state failed — cannot guarantee a fresh rerun." >&2
exit 1
fi
echo "$FORCE_RESET_JSON" | "${PYTHON_BIN:-python3}" -c "
import json, sys
data = json.load(sys.stdin)
print(f\" removed artifacts: {data.get('removed_count', 0)}\")
for path in data.get('removed', [])[:12]:
print(f\" - {path}\")
extra = max(0, len(data.get('removed', [])) - 12)
if extra:
print(f\" ... +{extra} more\")
"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event \
"review.force_rerun_requested" \
--step "0_parse_and_validate" \
--actor "llm-claimed" \
--outcome "INFO" \
--payload "{\"phase\":\"${PHASE_NUMBER}\",\"mode\":\"full\"}" >/dev/null 2>&1 || true
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event \
"review.force_rerun_workspace_reset" \
--step "0_parse_and_validate" \
--actor "llm-claimed" \
--outcome "INFO" \
--payload "$FORCE_RESET_JSON" >/dev/null 2>&1 || true
fi
Matrix staleness check (v2.46-wave3.2) — closes "matrix says READY but button still errors" gap.
Runs BEFORE --retry-failed short-circuit so SUSPECTED goals get folded into the retry set automatically. Cross-checks goal_sequences[].steps[] against each goal's mutation_evidence declaration:
mutation_evidence non-empty → goal expects submit click + 2xx mutation networkgoal_sequences[gid].result == 'passed' AND no submit step → SUSPECTEDgoal_sequences[gid].result == 'passed' AND no 2xx POST/PUT/PATCH/DELETE → SUSPECTEDgoal_sequences[gid].result == 'passed' AND only cancel-only steps → SUSPECTED# Only run if both artifacts exist (skip on first review where matrix not yet built)
if [ -f "${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md" ] && [ -f "${PHASE_DIR}/RUNTIME-MAP.json" ]; then
STALENESS_VALIDATOR="${REPO_ROOT}/.claude/scripts/validators/verify-matrix-staleness.py"
if [ -f "$STALENESS_VALIDATOR" ]; then
echo "━━ Checking matrix staleness (matrix-vs-runtime evidence) ━━"
# --apply-status-update: rewrite matrix in place so retry-failed picks up SUSPECTED goals
STALENESS_OUT=$("${PYTHON_BIN:-python3}" "$STALENESS_VALIDATOR" \
--phase "$PHASE_NUMBER" \
--severity block \
--apply-status-update 2>&1)
STALENESS_RC=$?
SUSPECTED_COUNT=0
if [ -f "${PHASE_DIR}/.matrix-staleness.json" ]; then
SUSPECTED_COUNT=$("${PYTHON_BIN:-python3}" -c "
import json
try: print(json.load(open('${PHASE_DIR}/.matrix-staleness.json'))['suspected_count'])
except: print(0)
")
fi
if [ "$STALENESS_RC" -ne 0 ] && [ "$SUSPECTED_COUNT" -gt 0 ]; then
echo "⚠ ${SUSPECTED_COUNT} goal(s) marked SUSPECTED — matrix=READY but no submit evidence." >&2
# Auto-promote to --retry-failed if not already in a retry/scan-goals mode
if [ -z "$RETRY_FAILED" ] && [ -z "$RE_SCAN_GOALS" ] && [ -z "$DOGFOOD" ]; then
echo " Auto-promoting to --retry-failed (SUSPECTED goals folded into retry set)." >&2
RETRY_FAILED=1
export RETRY_FAILED
fi
elif [ "$SUSPECTED_COUNT" -eq 0 ]; then
echo "✓ Matrix staleness OK — all READY goals have submit + 2xx mutation evidence."
fi
else
echo "⚠ verify-matrix-staleness.py not found — skipping staleness check (re-sync vgflow)." >&2
fi
fi
Validate --re-scan-goals flag (mutually exclusive with --retry-failed):
if [ -n "$RE_SCAN_GOALS" ] && [ -n "$RETRY_FAILED" ]; then
echo "⛔ --re-scan-goals and --retry-failed are mutually exclusive (--re-scan-goals is more precise)." >&2
exit 1
fi
if [ -n "$RE_SCAN_GOALS" ]; then
# Validate each goal ID exists in RUNTIME-MAP.json goal_sequences
if [ ! -f "${PHASE_DIR}/RUNTIME-MAP.json" ]; then
echo "⛔ --re-scan-goals requires RUNTIME-MAP.json (run /vg:review {phase} first)." >&2
exit 1
fi
INVALID_GOALS=$("${PYTHON_BIN:-python3}" -c "
import json
rt = json.load(open('${PHASE_DIR}/RUNTIME-MAP.json'))
seqs = rt.get('goal_sequences') or {}
asked = '${RE_SCAN_GOALS}'.split(',')
missing = [g.strip() for g in asked if g.strip() and g.strip() not in seqs]
print(','.join(missing))
" 2>/dev/null)
if [ -n "$INVALID_GOALS" ]; then
echo "⛔ --re-scan-goals: unknown goal IDs: ${INVALID_GOALS}" >&2
echo " Valid IDs are keys of goal_sequences in RUNTIME-MAP.json." >&2
exit 1
fi
fi
## Step 0a — Confirm review env + mode + scanner (v2.42.1+ — HARD gate)
Background: Pre-v2.42, review used config.step_env.verify silently. User had no visibility. Phases 3.3/3.4a/3.4b needed re-runs because env wasn't pinned. v2.42 added prompt with severity: warn — AI skipped it. v2.42.1 makes this a HARD block (severity: block default + telemetry required).
This step makes env+mode+scanner user-visible + recorded BEFORE any other work happens.
STOP. The very first thing you do in step 0a_env_mode_gate is run the provider-native user prompt with the payload below — no exceptions other than the documented waivers.
Provider-native prompt means:
AskUserQuestion with the structured payload below.AskUserQuestion; Codex does not expose that Claude primitive inside generated skills.Runtime rule: Claude remains the canonical source path. If runtime is Claude
or unknown, keep AskUserQuestion + haiku-only. Only when
VG_RUNTIME=codex (set by the Codex adapter) or Codex runtime is detected
does this step map to main-thread prompt + codex-inline.
Skip the provider-native prompt ONLY when:
$ARGUMENTS contains --non-interactive flag, ORVG_NON_INTERACTIVE=1 env var is set, OR$ARGUMENTS contains ALL THREE: --target-env=<v> (or --sandbox/--local/--staging/--prod), --mode=<v>, and --scanner=<v>If ANY of the 3 axes is missing on CLI and not waived → provider-native prompt is REQUIRED. Do NOT silently default. Do NOT run the bash block below before the prompt returns.
Why HARD gate (v2.42.1): AI agents have a strong pull to silent-default in warn severity contracts. Phases 3.3/3.4a/3.4b confirmed this. Block severity + telemetry-required closes the gap.
questions:
- question: "Review environment — chạy review trên môi trường nào? (môi trường = environment, môi trường thử nghiệm)"
header: "Env"
multiSelect: false
options:
- label: "local — máy của bạn (port 3001-3010, fastest)"
description: "Browser MCP local, DB seed local, không cần SSH. Tốt khi iterate nhanh."
- label: "sandbox — VPS Hetzner (printway.work subdomain)"
description: "Production-like, ssh deploy. Mặc định cho phase ship-ready."
- label: "staging — staging server (CHỈ nếu config có)"
description: "Hiện chưa cấu hình ở project này — chọn sẽ fail."
- label: "prod — production (CẢNH BÁO: read-only debug)"
description: "CHỈ dùng debug khẩn cấp. Workflow sẽ block mutations."
- question: "Review mode — chạy theo profile (hồ sơ phase) nào?"
header: "Mode"
multiSelect: false
options:
- label: "full — discovery đầy đủ (feature profile)"
description: "Phase 1 code scan + Phase 2 browser scan + fix loop. Mặc định cho feature mới."
- label: "delta — chỉ scan vùng đã sửa (hotfix profile)"
description: "Diff-aware. Tốt cho hotfix nhỏ, không cần full sweep."
- label: "regression — sweep sau bugfix (bugfix profile)"
description: "Re-verify parent goals + new bug fix area."
- label: "schema-verify — round-trip migration (migration profile)"
description: "Up/down migration check, không discovery UI."
- label: "link-check — markdown links (docs profile)"
description: "Validate links + cross-refs only. Skip UI."
- label: "infra-smoke — chạy success_criteria bash (infra profile)"
description: "Parse SPECS bash bullets, run từng cái, ghi kết quả."
- question: "Scanner — model/runtime nào chạy code-scan + view-scan (deepscan = quét sâu)"
header: "Scanner"
multiSelect: false
options:
- label: "codex-inline — Codex main-orchestrator scan (Codex Recommended)"
description: "Codex giữ MCP/browser trong main orchestrator, chạy scanner protocol inline/sequential và vẫn ghi cùng scan artifacts/events. Không spawn Haiku."
- label: "haiku-only — Claude Haiku scanner (Claude Code fastest)"
description: "Claude Code path: Phase 1 + Phase 2b-2 dùng Haiku agents qua Task tool. Chỉ chọn trên Codex nếu child MCP/subagent path đã smoke-test."
- label: "codex-supplement — Haiku + Codex CLI deepscan trên surfaces trọng yếu"
description: "Claude: Haiku + Codex CLI supplement. Codex: inline scan + optional codex exec read-only classifier over captured snapshots if smoke-tested."
- label: "gemini-supplement — Haiku + Gemini CLI deepscan"
description: "Gemini Pro 3.1 cross-scan, focus on UI consistency + a11y. +cost +time."
- label: "council-all — Haiku + Codex + Gemini + Claude (full council deepscan)"
description: "Triple cross-AI review. CHỈ dùng khi phase ship-critical (e.g., payment, auth)."
- question: "Method — cách chạy scanner: spawn auto subprocess hay manual paste prompt? (chỉ áp dụng khi scanner cần external CLI)"
header: "Method"
multiSelect: false
options:
- label: "spawn — VG tự subprocess CLI scanner (Recommended)"
description: "Hands-off, tự chạy + tự gom log. Cần CLI authenticated trên máy này (codex / gemini). Với codex-inline thì main orchestrator sở hữu MCP/browser."
- label: "manual — VG sinh prompt files cho user paste"
description: "Generates per-tool prompts vào `.vg/phases/{phase}/review/prompts/{codex,gemini}/` cho user paste sang CLI desktop / Cursor / web ChatGPT. User tự chạy, drop scan results vào `runs/{tool}/`, VG verify khi user signal continue."
- label: "hybrid — auto cho high-confidence lenses, manual cho human-judgment"
description: "Routing per `vg.config review.scanner.hybrid_routing`. Phù hợp khi muốn tốc độ + control selective."
Export BEFORE running bash:
export VG_ENV="<chosen env>" # e.g., "local"
export VG_REVIEW_MODE="<chosen>" # e.g., "full"
export VG_SCANNER="<chosen scanner>" # e.g., "haiku-only" on Claude, "codex-inline" on Codex
export VG_METHOD="<chosen method>" # e.g., "spawn" / "manual" / "hybrid"
If non-interactive path: echo chosen values to user (Auto-pinned: env=X, mode=Y, scanner=Z, method=W) but do NOT prompt.
# 1. Defaults (if AI did not export — non-interactive or skipped paths)
detect_vg_runtime() {
case "${VG_RUNTIME:-${VG_PROVIDER:-}}" in
claude|claude-*) echo "claude"; return ;;
codex|codex-*) echo "codex"; return ;;
esac
if [ -n "${CLAUDE_SESSION_ID:-}" ] || [ -n "${CLAUDE_CODE_SESSION_ID:-}" ] || [ -n "${CLAUDE_PROJECT_DIR:-}" ]; then
echo "claude"
return
fi
if [ -n "${CODEX_SANDBOX:-}" ] || [ -n "${CODEX_CLI_SANDBOX:-}" ] || [ -n "${CODEX_HOME:-}" ]; then
echo "codex"
return
fi
echo "claude" # Preserve Claude/Haiku path unless Codex is explicit.
}
VG_RUNTIME="$(detect_vg_runtime)"
export VG_RUNTIME
: "${VG_ENV:=${CONFIG_STEP_ENV_VERIFY:-local}}"
: "${VG_REVIEW_MODE:=${REVIEW_MODE:-full}}"
if [ -z "${VG_SCANNER+x}" ]; then
case "$VG_RUNTIME" in
codex|codex-*) VG_SCANNER="codex-inline" ;;
*) VG_SCANNER="haiku-only" ;;
esac
fi
: "${VG_METHOD:=spawn}"
# 2. CLI flag override (still applies even if AI exported — explicit beats prompt)
if [[ "$ARGUMENTS" =~ --target-env=([a-z]+) ]]; then VG_ENV="${BASH_REMATCH[1]}"; fi
if [[ "$ARGUMENTS" =~ --sandbox ]]; then VG_ENV="sandbox"; fi
if [[ "$ARGUMENTS" =~ --local ]]; then VG_ENV="local"; fi
if [[ "$ARGUMENTS" =~ --staging ]]; then VG_ENV="staging"; fi
if [[ "$ARGUMENTS" =~ --prod ]]; then VG_ENV="prod"; fi
if [[ "$ARGUMENTS" =~ --mode=([a-z-]+) ]]; then VG_REVIEW_MODE="${BASH_REMATCH[1]}"; fi
if [[ "$ARGUMENTS" =~ --scanner=([a-z-]+) ]]; then VG_SCANNER="${BASH_REMATCH[1]}"; fi
if [[ "$ARGUMENTS" =~ --method=([a-z]+) ]]; then VG_METHOD="${BASH_REMATCH[1]}"; fi
# 2b. v2.43.4 — coerce method when scanner=haiku-only (Haiku spawns via Task,
# manual/hybrid don't apply). Echo correction so user sees what happened.
if [ "$VG_SCANNER" = "haiku-only" ] && [ "$VG_METHOD" != "spawn" ]; then
echo "ℹ Method '${VG_METHOD}' không áp dụng cho scanner=haiku-only (Haiku qua Task tool internal). Coerce method=spawn."
VG_METHOD="spawn"
fi
if [ "$VG_SCANNER" = "codex-inline" ] && [ "$VG_METHOD" != "spawn" ]; then
echo "ℹ Method '${VG_METHOD}' không áp dụng cho scanner=codex-inline (Codex main orchestrator owns MCP/browser). Coerce method=spawn."
VG_METHOD="spawn"
fi
export VG_ENV VG_REVIEW_MODE VG_SCANNER VG_METHOD
# 3. Backward-compat: existing code reads ENV_NAME / REVIEW_MODE
ENV_NAME="$VG_ENV"
REVIEW_MODE="$VG_REVIEW_MODE"
export ENV_NAME REVIEW_MODE
# 4. Validate env exists in config (warn if not configured)
if ! grep -qE "^[[:space:]]*${VG_ENV}:" .claude/vg.config.md 2>/dev/null; then
echo "⚠ Env '${VG_ENV}' không có trong vg.config.md — có thể fail ở deploy/auth steps." >&2
echo " Available envs (môi trường khả dụng): $(grep -oE '^ (local|sandbox|staging|prod):' .claude/vg.config.md | tr -d ' :' | tr '\n' ' ')" >&2
fi
# 5. Validate mode is recognized
case "$VG_REVIEW_MODE" in
full|delta|regression|schema-verify|link-check|infra-smoke) ;;
*)
echo "⚠ Mode '${VG_REVIEW_MODE}' không hợp lệ — fall back về 'full'." >&2
VG_REVIEW_MODE="full"
REVIEW_MODE="full"
export VG_REVIEW_MODE REVIEW_MODE
;;
esac
# 5b. Validate scanner is recognized
case "$VG_SCANNER" in
haiku-only|codex-inline|codex-supplement|gemini-supplement|council-all) ;;
*)
echo "⚠ Scanner '${VG_SCANNER}' không hợp lệ — fall back theo runtime." >&2
case "$VG_RUNTIME" in
codex|codex-*) VG_SCANNER="codex-inline" ;;
*) VG_SCANNER="haiku-only" ;;
esac
export VG_SCANNER
;;
esac
# 5c. v2.43.4 — validate method
case "$VG_METHOD" in
spawn|manual|hybrid) ;;
*)
echo "⚠ Method '${VG_METHOD}' không hợp lệ — fall back về 'spawn'." >&2
VG_METHOD="spawn"
export VG_METHOD
;;
esac
if [ -n "${FORCE_RERUN:-}" ] && [ "$VG_REVIEW_MODE" != "full" ]; then
echo "⛔ --force requires full review mode after env gate. Current mode=${VG_REVIEW_MODE}." >&2
echo " Re-run với --mode=full --force nếu muốn ép scan lại từ đầu." >&2
exit 1
fi
# 6. Display banner — user-visible confirmation of what's about to run
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo " /vg:review configuration"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo " Phase: ${PHASE_NUMBER} (profile=${PHASE_PROFILE})"
echo " Env: ${VG_ENV}"
echo " Mode: ${VG_REVIEW_MODE}"
echo " Scanner: ${VG_SCANNER}"
echo " Method: ${VG_METHOD} # spawn=auto subprocess / manual=paste prompt / hybrid"
echo " Runtime: ${VG_RUNTIME}"
echo " Args: ${ARGUMENTS}"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
# 7. Persist to PIPELINE-STATE.json — audit trail
${PYTHON_BIN} -c "
import json
from pathlib import Path
import datetime
p = Path('${PHASE_DIR}/PIPELINE-STATE.json')
s = json.loads(p.read_text(encoding='utf-8')) if p.exists() else {}
steps = s.setdefault('steps', {})
review = steps.setdefault('review', {})
review['env'] = '${VG_ENV}'
review['mode'] = '${VG_REVIEW_MODE}'
review['scanner'] = '${VG_SCANNER}'
review['method'] = '${VG_METHOD}'
review['runtime'] = '${VG_RUNTIME}'
review['profile'] = '${PHASE_PROFILE}'
review['last_invoked_at'] = datetime.datetime.now(datetime.timezone.utc).isoformat()
review['last_args'] = '''${ARGUMENTS}'''[:500]
p.write_text(json.dumps(s, indent=2))
" 2>/dev/null
# 8. Emit telemetry — orchestrator gates on this event.
# v2.47.1 (Issue #83) — emit-event signature: event_type is positional;
# --phase/--command are NOT valid args (orchestrator infers phase from
# active run; command is recorded by run-start). Valid --actor enum:
# orchestrator | hook | validator | llm-claimed | user. The literal
# "skill" is rejected by argparse, silently failing the call when stderr
# is redirected via 2>&1 || true. Pre-fix: review.env_mode_confirmed
# events never recorded. Phase + command moved into payload JSON.
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event \
"review.env_mode_confirmed" \
--step "0a_env_mode_gate" \
--actor "llm-claimed" \
--outcome "INFO" \
--payload "{\"phase\":\"${PHASE_NUMBER}\",\"command\":\"vg:review\",\"env\":\"${VG_ENV}\",\"mode\":\"${VG_REVIEW_MODE}\",\"scanner\":\"${VG_SCANNER}\",\"method\":\"${VG_METHOD}\",\"runtime\":\"${VG_RUNTIME}\",\"profile\":\"${PHASE_PROFILE}\",\"interactive\":$([[ \"$ARGUMENTS\" =~ --non-interactive ]] && echo false || echo true)}" \
2>/dev/null || true
# 9. Mark step
(type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "0a_env_mode_gate" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/0a_env_mode_gate.done" 2>/dev/null || true
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review 0a_env_mode_gate 2>/dev/null || true
Downstream impact:
$ENV_NAME and $REVIEW_MODE — no further changes needed; they pick up user choice automatically.phaseP_* profile branches (line 528+) gate on $REVIEW_MODE — user can override auto-detected profile mode here.$VG_SCANNER is recorded into PIPELINE-STATE.json + telemetry. Banner echoes the choice at start of phase1_code_scan so user sees the value was honored. codex-inline is the Codex-native path: main orchestrator keeps MCP/browser control and writes the same scan artifacts/events without Haiku. Supplemental CrossAI CLI scan (codex-supplement / gemini-supplement / council-all) is wired in v2.42.2 — a follow-up patch lands the actual codex exec / gemini / Claude CLI dispatch after primary scan completes, plus merge into RUNTIME-MAP under crossai_scanner_findings. v2.42.1 captures the user choice so the data path is in place./vg:review <phase> re-prompts (audit trail accumulates last_invoked_at history).
Rationale: tests land in /vg:test (creates .spec.ts with TS-XX markers). Review runs BEFORE /vg:test → first-pass review always fails goal coverage → pipeline deadlock on backend-only phases.
Fix (v2.2+): at review stage = WARN only. At /vg:test + /vg:accept stages = BLOCK (those are the right enforcement points).
echo ""
echo "━━━ Goal coverage gate (advisory at review) ━━━"
${PYTHON_BIN} .claude/scripts/verify-goal-coverage-phase.py \
--phase-dir "${PHASE_DIR}" \
--repo-root "${REPO_ROOT}"
GOAL_RC=$?
if [ "$GOAL_RC" -eq 2 ]; then
echo ""
echo "⚠ Goal coverage gap (advisory at review stage — will enforce at /vg:test):"
echo " Some automated goals have no TS-XX binding. This is expected if /vg:test"
echo " hasn't run yet. Tests will be added there."
echo ""
echo "To hard-enforce at review: /vg:review ${PHASE_NUMBER} --strict-goal-coverage"
if [[ "${ARGUMENTS}" =~ --strict-goal-coverage ]]; then
echo "⛔ --strict-goal-coverage set — BLOCK at review (legacy v1.14.4 behavior)."
exit 1
fi
# Log advisory to debt register for /vg:test + /vg:accept to enforce
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/override-debt.sh" 2>/dev/null || true
if type -t log_override_debt >/dev/null 2>&1; then
log_override_debt "review-goal-coverage-advisory" "${PHASE_NUMBER}" "unbound goals expected before /vg:test" "${PHASE_DIR}"
fi
fi
(type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "0b_goal_coverage_gate" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/0b_goal_coverage_gate.done" 2>/dev/null || true
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review 0b_goal_coverage_gate 2>/dev/null || true
## Step 0c — Reactive Telemetry Suggestions (v2.5 Phase E)
Read telemetry-generated suggestions (always-pass skip candidates / expensive reorder / override abuse) so orchestrator can surface to user BEFORE running full review pipeline. Purely advisory; never auto-applied. UNQUARANTINABLE validators (security/wave-verify/etc.) are never suggested for skip — closes AI-gaming surface.
TELEMETRY_ENABLED=$(${PYTHON_BIN:-python3} -c "
import re
in_t=False
for line in open('.claude/vg.config.md', encoding='utf-8'):
s=line.strip()
if s.startswith('telemetry:'): in_t=True; continue
if in_t:
m=re.match(r'^\s*enabled:\s*(true|false)', line, re.IGNORECASE)
if m: print(m.group(1).lower()); break
if line and not line[0].isspace() and ':' in s: break
print('true')
" 2>/dev/null | head -1)
if [ "$TELEMETRY_ENABLED" = "true" ]; then
SUGGESTIONS=$(${PYTHON_BIN:-python3} .claude/scripts/telemetry-suggest.py \
--command vg:review 2>/dev/null || echo "")
if [ -n "$SUGGESTIONS" ]; then
COUNT=$(echo "$SUGGESTIONS" | grep -c '^{' || echo 0)
if [ "${COUNT:-0}" -gt 0 ]; then
echo "▸ Telemetry suggestions (${COUNT}, advisory only — skip-security NEVER suggested):"
echo "$SUGGESTIONS" | head -5 | ${PYTHON_BIN:-python3} -c "
import json, sys
for line in sys.stdin:
line=line.strip()
if not line: continue
try:
d=json.loads(line)
t=d.get('type','?')
if t=='skip':
print(f\" [skip] {d.get('validator','?')} — {d.get('pass_rate',0):.0%} pass ({d.get('samples',0)} samples)\")
elif t=='reorder':
print(f\" [reorder-late] {d.get('validator','?')} — p95={d.get('p95_ms',0)}ms\")
elif t=='override_abuse':
print(f\" [override-abuse] {d.get('flag','?')} used {d.get('count_30d',0)}x/30d — gate may need tuning\")
except Exception: pass
" 2>/dev/null
echo " (apply: /vg:telemetry --apply <id>; full list: /vg:telemetry --suggest)"
fi
fi
fi
touch "${PHASE_DIR}/.step-markers/0c_telemetry_suggestions.done"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review 0c_telemetry_suggestions 2>/dev/null || true
**Project the harness tasklist contract into native task UI.**
Per TASKLIST_POLICY, the tasklist shown by emit-tasklist.py is a binding contract, not advisory text. Before running Phase 1, do all of this:
.vg/runs/{run_id}/tasklist-contract.json from the current run.TodoWrite replaces the whole tasklist with one item per
projection_items[] row, then updates the same list as steps move
active/completed. TaskCreate/TaskUpdate is acceptable only when that
runtime exposes those native task tools.codex_plan_window; do not
mirror every contract item. Keep at most 6 rows visible."${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator tasklist-projected --adapter auto
# auto locks to claude, codex, or fallback from runtime env
## ━━━ /vg:review step plan ━━━
1a: Contract verify (grep BE routes vs contracts)
1b: Element inventory (count UI elements per file)
1.5: Graphify ripple analysis (cross-module callers)
2a: Deploy + preflight (to {ENV}, health check)
2a.5: API Docs + API contract precheck (BE -> FE reference + live route probe)
2b-1: Navigator discovers views (Haiku scanning sidebar)
2b-2: Haiku scanners per view (N parallel agents)
2b-3: Merge + evaluate scan results
2c: Runtime lens dispatch (plugins: enrich, CRUD/RCRURD, filter, paging, sort, search, URL-state, visual)
2d: API error-message runtime lens (API body message -> visible toast/form error)
2e: Findings pipeline (merge, adversarial challenge, auto-fix routing)
3: Fix loop (max 3 iterations)
4a: Load goals + filter infra deps
4b: Map goals to RUNTIME-MAP
4c: Weighted gate evaluation
4d: Write GOAL-COVERAGE-MATRIX
post: Unreachable triage → CrossAI review → artifacts → reflection
vg-orchestrator step-active <step_name>, then narrate: ## ━━━ Running 2b-2: Scanning /conversions as advertiser (3/7 views) ━━━.vg-orchestrator mark-step review <step_name>, then update the native task item to completed.Dynamic header examples (concrete values in headers, not in stale task items):
## ━━━ 2b-2: Scanning /conversions as advertiser (3/7 views) ━━━## ━━━ 3: Fixing Bug #2: S2SSecretSection crash (iter 1/3) ━━━## ━━━ 4a: 38 goals loaded, 16 INFRA_PENDING (ClickHouse, pixel_server) ━━━
If REVIEW_MODE ≠ full, short-circuit before code scan + browser discovery.
Each non-full review mode has a dedicated handler. After handler completes,
jump straight to write_goal_coverage_matrix (step 4d equivalent) and exit.
The phase_profile_branch step is a router — see dedicated phaseP_* steps below.
case "$REVIEW_MODE" in
full)
# Classic path — Phase 1 code scan → Phase 2 browser → Phase 3 fix → Phase 4 goal compare
echo "▸ Review mode: full (feature profile) — running classic discovery pipeline"
;;
infra-smoke)
echo "▸ Review mode: infra-smoke (${PHASE_PROFILE} profile) — parsing SPECS success_criteria"
# Handled by `phaseP_infra_smoke` below. Jumps to goal-coverage-matrix write + exit.
;;
delta)
echo "▸ Review mode: delta (hotfix profile) — focus on delta + parent goals re-verify"
# Handled by `phaseP_delta` below.
;;
regression)
echo "▸ Review mode: regression (bugfix profile) — regression sweep around issue"
# Handled by `phaseP_regression` below.
;;
schema-verify)
echo "▸ Review mode: schema-verify (migration profile) — schema round-trip check"
# Handled by `phaseP_schema_verify` below.
;;
link-check)
echo "▸ Review mode: link-check (docs profile) — markdown link validation"
# Handled by `phaseP_link_check` below.
;;
*)
echo "⚠ Unknown REVIEW_MODE='${REVIEW_MODE}' — falling back to full pipeline" >&2
REVIEW_MODE="full"
;;
esac
# Materialize the profile-aware checklist/plugin contract early. Full web runs
# refresh it after RUNTIME-MAP discovery, but backend-only/CLI/library and
# non-full modes still need a REVIEW-LENS-PLAN artifact and task contract.
LENS_PLAN_SCRIPT="${REPO_ROOT}/.claude/scripts/review-lens-plan.py"
if [ -f "$LENS_PLAN_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$LENS_PLAN_SCRIPT" \
--phase-dir "$PHASE_DIR" \
--profile "${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}" \
--mode "${REVIEW_MODE:-full}" \
--write >/dev/null 2>&1 || {
echo "⛔ Review checklist/lens plan generation failed for profile=${PROFILE:-${CONFIG_PROFILE:-web-fullstack}} mode=${REVIEW_MODE:-full}" >&2
exit 1
}
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event \
"review.lens_plan_generated" \
--payload "{\"phase\":\"${PHASE_NUMBER}\",\"platform\":\"${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}\",\"phase_profile\":\"${PHASE_PROFILE:-unknown}\",\"mode\":\"${REVIEW_MODE:-full}\",\"artifact\":\"REVIEW-LENS-PLAN.json\"}" \
>/dev/null 2>&1 || true
else
echo "⛔ Missing review lens planner: $LENS_PLAN_SCRIPT" >&2
exit 1
fi
Dispatcher rule: Orchestrator runs EXACTLY ONE of: phaseP_infra_smoke | phaseP_delta | phaseP_regression | phaseP_schema_verify | phaseP_link_check | classic phase1_code_scan → phase4_goal_comparison. Infra-smoke etc. write GOAL-COVERAGE-MATRIX.md directly (implicit goals from SPECS), skip browser + RUNTIME-MAP entirely.
Runs when REVIEW_MODE=infra-smoke (infra profile).
Logic: parse SPECS ## Success criteria checklist → run each bash command on target env → map to implicit goals S-01..S-NN → gate on all READY.
if [ "$REVIEW_MODE" != "infra-smoke" ]; then
echo "↷ Skipping phaseP_infra_smoke (REVIEW_MODE=$REVIEW_MODE)"
else
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/phase-profile.sh" 2>/dev/null
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/block-resolver.sh" 2>/dev/null || true
# 1. Parse SPECS success_criteria
SMOKE_JSON=$(parse_success_criteria "$PHASE_DIR")
SMOKE_COUNT=$(${PYTHON_BIN} -c "import json,sys; print(len(json.loads(sys.argv[1])))" "$SMOKE_JSON" 2>/dev/null || echo 0)
if [ "$SMOKE_COUNT" -eq 0 ]; then
echo "⛔ SPECS has no '## Success criteria' checklist bullets — infra-smoke needs implicit goals." >&2
echo " Fix: add markdown checklist ('- [ ] `cmd` → expected') to SPECS.md" >&2
exit 1
fi
echo "▸ Infra-smoke: phát hiện ${SMOKE_COUNT} implicit goals từ success_criteria"
echo "$SMOKE_JSON" > "${PHASE_DIR}/.success-criteria.json"
# 2. Determine run_prefix from env (--sandbox flag or config.step_env.verify)
RUN_PREFIX=""
ENV_NAME="${VG_ENV:-}"
if [ -z "$ENV_NAME" ]; then
if [[ "$ARGUMENTS" =~ --sandbox ]]; then ENV_NAME="sandbox"
elif [[ "$ARGUMENTS" =~ --local ]]; then ENV_NAME="local"
else ENV_NAME="${CONFIG_STEP_ENV_VERIFY:-local}"
fi
fi
# NOTE: commands in SPECS typically already include `ssh vollx`; don't double-prefix
# when command already has the run_prefix. phase-profile keeps this simple — run as-is.
# 3. Run each bullet, record status
RESULTS_JSON="${PHASE_DIR}/.infra-smoke-results.json"
${PYTHON_BIN} - "$SMOKE_JSON" "$RESULTS_JSON" "$ENV_NAME" <<'PY'
import json, sys, subprocess, shlex, time
smoke = json.loads(sys.argv[1])
out_path = sys.argv[2]
env_name = sys.argv[3]
results = []
for entry in smoke:
sid = entry['id']
cmd = entry.get('cmd','').strip()
expected = entry.get('expected','').strip()
raw = entry.get('raw','')
if not cmd:
results.append({"id":sid,"status":"UNREACHABLE","reason":"no bash command in bullet","raw":raw})
continue
if cmd.startswith('/vg:') or cmd.startswith('/gsd:'):
# Slash commands — not directly runnable here. Mark DEFERRED.
results.append({"id":sid,"status":"DEFERRED","reason":f"slash command requires orchestrator: {cmd}","raw":raw})
continue
try:
t0 = time.time()
p = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=60)
dur = round(time.time() - t0, 2)
ok = p.returncode == 0
if ok and expected:
# Expected substring must appear in combined output
combined = (p.stdout or '') + (p.stderr or '')
ok = expected.split()[0] in combined or expected in combined
status = "READY" if ok else "BLOCKED"
tail = ((p.stdout or '')[-300:] + (p.stderr or '')[-200:]).replace('\n',' | ')
results.append({"id":sid,"status":status,"exit":p.returncode,"dur":dur,"expected":expected,"evidence":tail[:600],"raw":raw})
except subprocess.TimeoutExpired:
results.append({"id":sid,"status":"BLOCKED","reason":"timeout 60s","raw":raw})
except Exception as e:
results.append({"id":sid,"status":"FAILED","reason":str(e),"raw":raw})
with open(out_path,'w',encoding='utf-8') as f:
json.dump({"env":env_name,"results":results,"generated_at":time.strftime('%Y-%m-%dT%H:%M:%SZ',time.gmtime())}, f, ensure_ascii=False, indent=2)
PY
# 4. Display human-readable summary
${PYTHON_BIN} - "$RESULTS_JSON" <<'PY'
import json, sys
d = json.load(open(sys.argv[1], encoding='utf-8'))
r = d['results']
ready = sum(1 for x in r if x['status']=='READY')
blocked = sum(1 for x in r if x['status']=='BLOCKED')
failed = sum(1 for x in r if x['status']=='FAILED')
deferred = sum(1 for x in r if x['status']=='DEFERRED')
unreach = sum(1 for x in r if x['status']=='UNREACHABLE')
print(f"\n┌─ Infra-smoke results (env={d['env']}) ─────────────────")
for x in r:
icon = {'READY':'✓','BLOCKED':'⛔','FAILED':'✗','DEFERRED':'⟳','UNREACHABLE':'⚠'}.get(x['status'],'?')
print(f"│ {icon} {x['id']} [{x['status']}] {x.get('raw','')[:70]}")
if x['status'] in ('BLOCKED','FAILED'):
print(f"│ └─ {x.get('reason') or x.get('evidence','')[:160]}")
print(f"├─ Summary: READY={ready} BLOCKED={blocked} FAILED={failed} DEFERRED={deferred} UNREACHABLE={unreach} (total={len(r)})")
print("└──────────────────────────────────────────────────────────")
PY
# 5. Write GOAL-COVERAGE-MATRIX.md with implicit goals
${PYTHON_BIN} - "$RESULTS_JSON" "${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md" "$PHASE_NUMBER" "$PHASE_PROFILE" <<'PY'
import json, sys
from datetime import datetime, timezone
results = json.load(open(sys.argv[1], encoding='utf-8'))['results']
out_path = sys.argv[2]
phase = sys.argv[3]
profile = sys.argv[4]
lines = [
f"# Goal Coverage Matrix — Phase {phase}",
"",
f"**Profile:** {profile} ",
f"**Source:** SPECS.success_criteria (implicit goals) ",
f"**Generated:** {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')} ",
f"**Review mode:** infra-smoke",
"",
"## Implicit Goals (from SPECS `## Success criteria`)",
"",
"| Goal | Status | Command | Evidence |",
"|------|--------|---------|----------|",
]
for r in results:
raw = r.get('raw','').replace('|',r'\|')[:120]
ev = (r.get('evidence') or r.get('reason') or '').replace('|',r'\|')[:120]
lines.append(f"| {r['id']} | {r['status']} | {raw} | {ev} |")
ready = sum(1 for r in results if r['status']=='READY')
total = len(results)
pct = round(100*ready/total, 1) if total else 0
lines += ["", f"## Gate", "", f"**Pass rate:** {ready}/{total} ({pct}%) READY ",
f"**Verdict:** {'PASS' if ready == total else 'BLOCK'}", ""]
open(out_path,'w',encoding='utf-8').write('\n'.join(lines) + '\n')
print(f"✓ GOAL-COVERAGE-MATRIX.md written with {total} implicit goals ({ready} READY)")
PY
# 6. Gate check + block_resolve fallback
READY_COUNT=$(${PYTHON_BIN} -c "import json; d=json.load(open('$RESULTS_JSON')); print(sum(1 for r in d['results'] if r['status']=='READY'))")
TOTAL=$(${PYTHON_BIN} -c "import json; d=json.load(open('$RESULTS_JSON')); print(len(d['results']))")
if [ "$READY_COUNT" -ne "$TOTAL" ]; then
echo "⛔ Infra-smoke gate: ${READY_COUNT}/${TOTAL} goals READY — phase NOT yet provisioned."
if type -t block_resolve >/dev/null 2>&1; then
export VG_CURRENT_PHASE="$PHASE_NUMBER" VG_CURRENT_STEP="review.infra-smoke"
BR_GATE_CONTEXT="Infra-smoke review: ${TOTAL} SPECS success_criteria checked, only ${READY_COUNT} READY. Remaining BLOCKED/FAILED/DEFERRED imply provisioning incomplete on env='${ENV_NAME}'."
BR_EVIDENCE=$(cat "$RESULTS_JSON")
BR_CANDIDATES='[{"id":"re-run-ansible","cmd":"echo would re-run ansible-playbook (user must chạy explicitly)","confidence":0.3,"rationale":"re-run provisioning may fix BLOCKED infra"}]'
BR_RESULT=$(block_resolve "infra-smoke-not-ready" "$BR_GATE_CONTEXT" "$BR_EVIDENCE" "$PHASE_DIR" "$BR_CANDIDATES")
BR_LEVEL=$(echo "$BR_RESULT" | ${PYTHON_BIN} -c "import json,sys; print(json.loads(sys.stdin.read()).get('level',''))" 2>/dev/null)
[ "$BR_LEVEL" = "L2" ] && { block_resolve_l2_handoff "infra-smoke-not-ready" "$BR_RESULT" "$PHASE_DIR"; exit 2; }
fi
exit 1
fi
echo "✓ Infra-smoke PASS (${READY_COUNT}/${TOTAL}) — phase provisioned as specified."
(type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phaseP_infra_smoke" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phaseP_infra_smoke.done"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phaseP_infra_smoke 2>/dev/null || true
mkdir -p "${PHASE_DIR}/.tmp" 2>/dev/null
if [ -f "${REPO_ROOT}/.claude/scripts/review-lens-plan.py" ]; then
"${PYTHON_BIN:-python3}" "${REPO_ROOT}/.claude/scripts/review-lens-plan.py" \
--phase-dir "$PHASE_DIR" --profile "${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}" --mode "$REVIEW_MODE" \
--write --validate-only --json > "${PHASE_DIR}/.tmp/review-lens-plan-validation.json" 2>&1 || {
echo "⛔ Infra-smoke checklist evidence missing — see ${PHASE_DIR}/.tmp/review-lens-plan-validation.json" >&2
DIAG_SCRIPT="${REPO_ROOT}/.claude/scripts/review-block-diagnostic.py"
if [ -f "$DIAG_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$DIAG_SCRIPT" \
--gate-id "review.phaseP_infra_smoke_lens_plan" \
--phase-dir "$PHASE_DIR" \
--input "${PHASE_DIR}/.tmp/review-lens-plan-validation.json" \
--out-md "${PHASE_DIR}/.tmp/review-lens-plan-diagnostic.md" \
>/dev/null 2>&1 || true
cat "${PHASE_DIR}/.tmp/review-lens-plan-diagnostic.md" 2>/dev/null || true
fi
exit 1
}
fi
# Exit review early — subsequent steps (browser, goal comparison) N/A for infra profile.
exit 0
fi
## Review mode: delta (P5, v1.9.2 — OHOK Batch 2 B5: real verification)
Runs when REVIEW_MODE=delta (hotfix profile).
Previous behavior (performative): wrote "Verdict: PASS" stub without actually verifying hotfix touches parent failures. Hotfix could ship completely untested. OHOK Batch 2 B5 replaces stub with per-goal verification loop.
Logic: hotfix patches a parent phase. Review MUST:
if [ "$REVIEW_MODE" != "delta" ]; then
echo "↷ Skipping phaseP_delta (REVIEW_MODE=$REVIEW_MODE)"
else
# 1. Resolve parent phase
PARENT_REF=$(grep -E '^\*\*Parent phase:\*\*|^parent_phase:' "$PHASE_DIR/SPECS.md" 2>/dev/null | \
sed -E 's/.*(Parent phase:\*\*|parent_phase:)\s*//' | awk '{print $1}' | head -1)
if [ -z "$PARENT_REF" ]; then
echo "⛔ Hotfix profile but no parent_phase in SPECS.md — cannot derive delta context" >&2
exit 1
fi
PARENT_DIR=$(ls -d "${PHASES_DIR}/${PARENT_REF}"* 2>/dev/null | head -1)
if [ -z "$PARENT_DIR" ]; then
echo "⛔ Parent phase dir not found for ref '${PARENT_REF}'" >&2
exit 1
fi
PARENT_MATRIX="${PARENT_DIR}/GOAL-COVERAGE-MATRIX.md"
echo "▸ Delta review: parent=${PARENT_REF} → ${PARENT_DIR}"
# 2. Extract parent failed/blocked goals (actionable subset — UNREACHABLE/INFRA_PENDING
# are parent-domain issues hotfix can't resolve, exclude from coverage gate)
FAILED_GOALS=""
if [ -f "$PARENT_MATRIX" ]; then
FAILED_GOALS=$(grep -E '\|[[:space:]]*(BLOCKED|FAILED)[[:space:]]*\|' "$PARENT_MATRIX" | \
grep -oE 'G-[0-9]+' | sort -u)
FAILED_COUNT=$([ -z "$FAILED_GOALS" ] && echo 0 || echo "$FAILED_GOALS" | wc -l | tr -d ' ')
echo "▸ Parent BLOCKED/FAILED goals (${FAILED_COUNT}): $(echo $FAILED_GOALS | tr '\n' ' ')"
else
echo "⚠ Parent has no GOAL-COVERAGE-MATRIX — cannot verify delta coverage"
FAILED_COUNT=0
fi
# 3. Extract delta files (changes made in THIS phase — current commits only)
PHASE_COMMITS=$(git log --format=%H -- "${PHASE_DIR}" 2>/dev/null | head -1)
BASELINE_SHA=$(git rev-parse HEAD~1 2>/dev/null || git rev-parse HEAD 2>/dev/null)
DELTA_FILES=$(git diff --name-only "${BASELINE_SHA}" HEAD -- \
'apps/**/src/**' 'packages/**/src/**' 'infra/**' 2>/dev/null | sort -u)
DELTA_COUNT=$([ -z "$DELTA_FILES" ] && echo 0 || echo "$DELTA_FILES" | wc -l | tr -d ' ')
if [ "$DELTA_COUNT" -eq 0 ]; then
echo "⛔ Hotfix phase has 0 code files changed (apps/**/src|packages/**/src|infra/**)" >&2
echo " Hotfix must modify at least 1 code file to be meaningful." >&2
echo " Override: --allow-empty-hotfix (log to override-debt)" >&2
if [[ ! "${ARGUMENTS}" =~ --allow-empty-hotfix ]]; then
exit 1
fi
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/override-debt.sh" 2>/dev/null || true
type -t log_override_debt >/dev/null 2>&1 && \
log_override_debt "phaseP-delta-empty-hotfix" "${PHASE_NUMBER}" "hotfix has no code delta" "${PHASE_DIR}"
fi
# 4. For each failed goal, check if delta files overlap with files parent modified
# when the goal was recorded as failing. Proxy: grep parent commits mentioning G-XX
# for file paths, check intersection with DELTA_FILES.
COVERAGE_JSON="${PHASE_DIR}/.delta-coverage.json"
${PYTHON_BIN} - "$PARENT_DIR" "$PHASE_DIR" "$COVERAGE_JSON" "$PARENT_REF" <<'PY' || true
import json, re, subprocess, sys
from pathlib import Path
parent_dir, phase_dir, out_path, parent_ref = sys.argv[1:5]
matrix = Path(parent_dir) / "GOAL-COVERAGE-MATRIX.md"
failed = []
if matrix.exists():
for line in matrix.read_text(encoding="utf-8", errors="replace").splitlines():
m = re.search(r'\|\s*(G-\d+)\s*\|.*\|\s*(BLOCKED|FAILED)\s*\|', line)
if m:
failed.append(m.group(1))
# Delta files (current hotfix phase)
try:
r = subprocess.run(
["git", "diff", "--name-only", "HEAD~1", "HEAD",
"--", "apps/**/src/**", "packages/**/src/**", "infra/**"],
capture_output=True, text=True, timeout=10,
)
delta_files = set(f.strip() for f in r.stdout.splitlines() if f.strip())
except Exception:
delta_files = set()
# Per-goal overlap (CrossAI R6 fix).
# Previously: one global parent file set → any touched parent file false-PASSes
# every failed goal. Now: for each failed goal, find files in parent commits
# that cite that goal_id, compute overlap per-goal.
def _files_for_goal(goal_id: str) -> set[str]:
"""Files changed in parent commits whose message cites `goal_id`.
Proxy for "files involved in this goal's implementation/failure". Limits
by default to code paths. Falls back to empty set if grep yields nothing
(goal may have no associated commit — e.g., goal added but never worked on).
"""
try:
r = subprocess.run(
["git", "log", f"--grep={goal_id}", "--name-only", "--format=",
"--", str(parent_dir), "apps", "packages", "infra"],
capture_output=True, text=True, timeout=10,
)
return {
ln.strip() for ln in r.stdout.splitlines()
if ln.strip() and any(
ln.startswith(p) for p in ("apps/", "packages/", "infra/")
)
}
except Exception:
return set()
per_goal: dict[str, dict] = {}
parent_files: set[str] = set()
goals_covered = 0
goals_orthogonal = 0
for g in failed:
gf = _files_for_goal(g)
parent_files |= gf
ovl = sorted(gf & delta_files)
covered = bool(ovl)
if covered:
goals_covered += 1
elif gf:
# Goal has known files but none overlap with delta — orthogonal
goals_orthogonal += 1
per_goal[g] = {
"parent_files": sorted(gf),
"parent_files_count": len(gf),
"overlap_files": ovl,
"overlap_count": len(ovl),
"covered": covered,
"has_goal_commits": bool(gf),
}
overlap = sorted(parent_files & delta_files)
coverage_pct = (len(overlap) / len(parent_files) * 100) if parent_files else 0
out = {
"parent_ref": parent_ref,
"failed_goals": failed,
"parent_files_count": len(parent_files),
"delta_files_count": len(delta_files),
"overlap_files": overlap,
"overlap_count": len(overlap),
"coverage_pct_of_parent": round(coverage_pct, 1),
"per_goal": per_goal,
"goals_covered": goals_covered,
"goals_orthogonal": goals_orthogonal,
"goals_no_commits": sum(1 for d in per_goal.values() if not d["has_goal_commits"]),
}
Path(out_path).write_text(json.dumps(out, indent=2))
print(f"✓ delta coverage: {goals_covered}/{len(failed)} failed goals have file overlap "
f"({goals_orthogonal} orthogonal, {out['goals_no_commits']} unmapped); "
f"total overlap {len(overlap)}/{len(parent_files)} files")
PY
# 5. Gate: per-goal coverage (CrossAI R6 fix).
# Previously: one global overlap → any touched parent file false-PASSed every
# goal. Now: each failed goal evaluated independently. Block if ANY failed
# goal with known commits has zero overlap with delta.
GOALS_COVERED=$("${PYTHON_BIN}" -c "import json; print(json.load(open('${COVERAGE_JSON}')).get('goals_covered',0))" 2>/dev/null || echo 0)
GOALS_ORTHOGONAL=$("${PYTHON_BIN}" -c "import json; print(json.load(open('${COVERAGE_JSON}')).get('goals_orthogonal',0))" 2>/dev/null || echo 0)
GOALS_UNMAPPED=$("${PYTHON_BIN}" -c "import json; print(json.load(open('${COVERAGE_JSON}')).get('goals_no_commits',0))" 2>/dev/null || echo 0)
if [ "${FAILED_COUNT:-0}" -gt 0 ] && [ "${GOALS_ORTHOGONAL:-0}" -gt 0 ]; then
echo "⛔ ${GOALS_ORTHOGONAL} of ${FAILED_COUNT} failed parent goal(s) have known commits but delta touches NONE of their files." >&2
echo " Covered: ${GOALS_COVERED}/${FAILED_COUNT}" >&2
echo " Orthogonal: ${GOALS_ORTHOGONAL}/${FAILED_COUNT} ← hotfix does not address these" >&2
echo " Unmapped: ${GOALS_UNMAPPED}/${FAILED_COUNT} (no parent commits cite these goals)" >&2
echo " Delta files: ${DELTA_COUNT}" >&2
echo " Options:" >&2
echo " (a) Ensure hotfix edits files involved in each failed goal" >&2
echo " (b) /vg:scope ${PHASE_NUMBER} — re-scope if truly unrelated" >&2
echo " (c) --allow-orthogonal-hotfix override (debt logged, hotfix ships without per-goal coverage)" >&2
if [[ ! "${ARGUMENTS}" =~ --allow-orthogonal-hotfix ]]; then
exit 1
fi
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/override-debt.sh" 2>/dev/null || true
# v2.6.1 (2026-04-26): fix arg-ordering bug — was passing flag-as-step,
# phase-dir-as-reason, missing gate_id. Function signature is:
# log_override_debt FLAG PHASE STEP REASON [GATE_ID]
# gate_id="review-goal-coverage" enables auto-resolve when next phase
# review goal-coverage validator passes clean.
type -t log_override_debt >/dev/null 2>&1 && \
log_override_debt "--allow-orthogonal-hotfix" "${PHASE_NUMBER}" "review.goal-coverage" \
"${GOALS_ORTHOGONAL}/${FAILED_COUNT} failed goals uncovered per-goal — hotfix delta orthogonal to failed goals" \
"review-goal-coverage"
fi
# Preserve legacy OVERLAP_COUNT var for downstream consumers
OVERLAP_COUNT=$("${PYTHON_BIN}" -c "import json; print(json.load(open('${COVERAGE_JSON}'))['overlap_count'])" 2>/dev/null || echo 0)
# 6. Write matrix with actual per-goal delta-overlap status
${PYTHON_BIN} - "${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md" "$PHASE_NUMBER" "$PARENT_REF" "$COVERAGE_JSON" <<'PY'
import json, sys
from datetime import datetime, timezone
from pathlib import Path
out_path, phase, parent_ref, cov_json = sys.argv[1:5]
cov = json.loads(Path(cov_json).read_text(encoding="utf-8")) if Path(cov_json).exists() else {}
failed = cov.get("failed_goals", [])
overlap = cov.get("overlap_files", [])
delta_count = cov.get("delta_files_count", 0)
parent_files_count = cov.get("parent_files_count", 0)
coverage_pct = cov.get("coverage_pct_of_parent", 0)
per_goal = cov.get("per_goal", {})
goals_covered = cov.get("goals_covered", 0)
goals_orthogonal = cov.get("goals_orthogonal", 0)
goals_no_commits = cov.get("goals_no_commits", 0)
# Decide verdict using PER-GOAL coverage (CrossAI R6 fix).
# Previously: any global overlap = PASS for all goals. Now: goals tracked
# individually. BLOCK if any failed goal with known commits has no per-goal
# overlap with delta.
if failed and goals_orthogonal > 0:
verdict = (f"BLOCK ({goals_orthogonal}/{len(failed)} failed goals orthogonal — "
f"hotfix doesn't touch their files)")
elif failed and goals_covered == 0 and goals_no_commits == len(failed):
verdict = (f"FLAG (all {len(failed)} failed goals have no parent commits — "
f"cannot verify per-goal coverage; /vg:test must re-verify each)")
elif failed and goals_covered > 0:
verdict = (f"PASS ({goals_covered}/{len(failed)} failed goals have file overlap; "
f"{goals_no_commits} unmapped deferred to /vg:test)")
elif not failed:
verdict = "PASS (parent had no failed goals; delta review defers full goal re-check to /vg:test)"
else:
verdict = "PASS (no parent matrix found; /vg:test will handle regression)"
lines = [
f"# Goal Coverage Matrix — Phase {phase} (hotfix delta)",
"",
f"**Profile:** hotfix ",
f"**Parent phase:** {parent_ref} ",
f"**Source:** parent GOAL-COVERAGE-MATRIX + git delta vs HEAD~1 ",
f"**Generated:** {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')} ",
"",
"## Parent failed goals (per-goal overlap)",
"",
]
if failed:
lines.append("| Goal | Status | Parent files | Overlap | Verdict |")
lines.append("|------|--------|--------------|---------|---------|")
for g in failed:
pg = per_goal.get(g, {})
pf_count = pg.get("parent_files_count", 0)
ov_count = pg.get("overlap_count", 0)
if pg.get("covered"):
mark = f"COVERED ({ov_count}/{pf_count})"
elif pg.get("has_goal_commits"):
mark = f"ORTHOGONAL (0/{pf_count})"
else:
mark = "UNMAPPED (no parent commits cite goal)"
lines.append(f"| {g} | BLOCKED/FAILED (parent) | {pf_count} | {ov_count} | {mark} |")
else:
lines.append("_no parent failed/blocked goals found_")
lines += [
"",
"## Delta changes",
"",
f"**Files changed (code paths):** {delta_count}",
f"**Overlap with parent files:** {len(overlap)}/{parent_files_count} ({coverage_pct:.1f}%)",
"",
]
if overlap:
lines.append("Sample overlapping files:")
for f in overlap[:10]:
lines.append(f"- `{f}`")
lines += [
"",
"## Gate",
"",
f"**Verdict:** {verdict}",
"",
]
Path(out_path).write_text('\n'.join(lines) + '\n', encoding='utf-8')
print(f"✓ GOAL-COVERAGE-MATRIX.md written — verdict: {verdict.split(' (')[0]}")
PY
mkdir -p "${PHASE_DIR}/.step-markers" 2>/dev/null
(type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phaseP_delta" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phaseP_delta.done"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phaseP_delta 2>/dev/null || true
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.phaseP_delta_verified" \
--payload "{\"phase\":\"${PHASE_NUMBER}\",\"parent\":\"${PARENT_REF}\",\"overlap_count\":${OVERLAP_COUNT:-0},\"failed_count\":${FAILED_COUNT:-0}}" >/dev/null 2>&1 || true
mkdir -p "${PHASE_DIR}/.tmp" 2>/dev/null
if [ -f "${REPO_ROOT}/.claude/scripts/review-lens-plan.py" ]; then
"${PYTHON_BIN:-python3}" "${REPO_ROOT}/.claude/scripts/review-lens-plan.py" \
--phase-dir "$PHASE_DIR" --profile "${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}" --mode "$REVIEW_MODE" \
--write --validate-only --json > "${PHASE_DIR}/.tmp/review-lens-plan-validation.json" 2>&1 || {
echo "⛔ Delta checklist evidence missing — see ${PHASE_DIR}/.tmp/review-lens-plan-validation.json" >&2
DIAG_SCRIPT="${REPO_ROOT}/.claude/scripts/review-block-diagnostic.py"
if [ -f "$DIAG_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$DIAG_SCRIPT" \
--gate-id "review.phaseP_delta_lens_plan" \
--phase-dir "$PHASE_DIR" \
--input "${PHASE_DIR}/.tmp/review-lens-plan-validation.json" \
--out-md "${PHASE_DIR}/.tmp/review-lens-plan-diagnostic.md" \
>/dev/null 2>&1 || true
cat "${PHASE_DIR}/.tmp/review-lens-plan-diagnostic.md" 2>/dev/null || true
fi
exit 1
}
fi
exit 0
fi
## Review mode: regression (P5, v1.9.2 — bugfix profile, OHOK Batch 2 B5: real verification)
Runs when REVIEW_MODE=regression.
Previous behavior (performative): wrote "Verdict: PASS (regression handled at /vg:test)" stub without verifying (a) issue is referenced, (b) code was actually changed, or (c) regression test exists. Bugfix could ship with empty changeset.
OHOK Batch 2 B5 enforces 3 real checks:
if [ "$REVIEW_MODE" != "regression" ]; then
echo "↷ Skipping phaseP_regression (REVIEW_MODE=$REVIEW_MODE)"
else
# 1. Extract bug reference — MUST exist
BUG_REF=$(grep -E '^\*\*issue_id\*\*:|^issue_id:|^\*\*bug_ref\*\*:|^bug_ref:|^\*\*Fixes bug\*\*:' \
"$PHASE_DIR/SPECS.md" 2>/dev/null | sed -E 's/.*://; s/^\s*//' | head -1)
if [ -z "$BUG_REF" ]; then
echo "⛔ Bugfix profile requires issue_id/bug_ref in SPECS.md — no reference found" >&2
echo " Add to SPECS frontmatter: issue_id: JIRA-123" >&2
echo " or body: **Fixes bug**: GH#456" >&2
if [[ ! "${ARGUMENTS}" =~ --allow-no-bugref ]]; then
exit 1
fi
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/override-debt.sh" 2>/dev/null || true
# v2.6.1 (2026-04-26): correct API call (FLAG PHASE STEP REASON GATE_ID)
# gate_id="bugfix-bugref-required" enables auto-resolve when next review
# finds the bugref later added.
type -t log_override_debt >/dev/null 2>&1 && \
log_override_debt "--allow-no-bugref" "${PHASE_NUMBER}" "review.bugref-check" \
"bugfix without issue_id — SPECS frontmatter missing issue_id/bug_ref/Fixes bug" \
"bugfix-bugref-required"
BUG_REF="<unspecified>"
fi
echo "▸ Regression review (bugfix): issue_ref=${BUG_REF}"
# 2. Phase must have ≥1 code commit — empty bugfix is meaningless
BASELINE_SHA=$(git rev-parse HEAD~1 2>/dev/null || git rev-parse HEAD 2>/dev/null)
CODE_FILES=$(git diff --name-only "${BASELINE_SHA}" HEAD -- \
'apps/**/src/**' 'packages/**/src/**' 'infra/**' 2>/dev/null | sort -u)
CODE_COUNT=$([ -z "$CODE_FILES" ] && echo 0 || echo "$CODE_FILES" | wc -l | tr -d ' ')
if [ "$CODE_COUNT" -eq 0 ]; then
echo "⛔ Bugfix phase has 0 code files changed in apps|packages|infra" >&2
echo " Bugfix must modify at least 1 production file." >&2
if [[ ! "${ARGUMENTS}" =~ --allow-empty-bugfix ]]; then
exit 1
fi
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/override-debt.sh" 2>/dev/null || true
# v2.6.1 (2026-04-26): correct API call (FLAG PHASE STEP REASON GATE_ID)
# gate_id="bugfix-code-delta-required" enables auto-resolve when next
# review finds non-empty code delta in apps|packages|infra.
type -t log_override_debt >/dev/null 2>&1 && \
log_override_debt "--allow-empty-bugfix" "${PHASE_NUMBER}" "review.code-delta-check" \
"bugfix has 0 code files changed in apps|packages|infra — no production delta" \
"bugfix-code-delta-required"
fi
# 3. Scan for regression test — WARN if missing (don't block; /vg:test catches real gaps)
TEST_FILES=$(git diff --name-only "${BASELINE_SHA}" HEAD -- \
'**/e2e/**/*.spec.ts' '**/__tests__/**' '**/*.test.ts' '**/*.test.js' \
'**/tests/**/*.py' 2>/dev/null | sort -u)
TEST_COUNT=$([ -z "$TEST_FILES" ] && echo 0 || echo "$TEST_FILES" | wc -l | tr -d ' ')
BUG_ID_SAFE=$(echo "$BUG_REF" | grep -oE '[A-Za-z0-9_-]+' | head -1)
# Look for test file mentioning the bug ID (by name or content)
TEST_MENTIONS_BUG=0
if [ -n "$BUG_ID_SAFE" ] && [ "$TEST_COUNT" -gt 0 ]; then
for f in $TEST_FILES; do
if [ -f "$f" ]; then
if grep -qiE "(${BUG_ID_SAFE}|regression|bugfix)" "$f" 2>/dev/null; then
TEST_MENTIONS_BUG=1
break
fi
fi
done
fi
if [ "$TEST_COUNT" -eq 0 ]; then
echo "⚠ Bugfix introduces no test files — /vg:test will attempt to generate regression coverage" >&2
REGRESSION_NOTE="no-test-added (WARN — /vg:test to generate)"
elif [ "$TEST_MENTIONS_BUG" -eq 0 ]; then
echo "⚠ Bugfix has ${TEST_COUNT} test files but none reference bug ID '${BUG_ID_SAFE}'" >&2
REGRESSION_NOTE="test-files-unlinked (WARN — consider adding bug ID comment to test)"
else
echo "✓ Bugfix has ${TEST_COUNT} test files, at least one references bug ID" >&2
REGRESSION_NOTE="test-linked (OK)"
fi
# 4. Write matrix with actual verification results
${PYTHON_BIN} - "${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md" "$PHASE_NUMBER" "$BUG_REF" \
"$CODE_COUNT" "$TEST_COUNT" "$TEST_MENTIONS_BUG" "$REGRESSION_NOTE" <<'PY'
import sys
from datetime import datetime, timezone
out, phase, bug, code_count, test_count, test_mentions, note = sys.argv[1:8]
code_count = int(code_count); test_count = int(test_count); test_mentions = int(test_mentions)
if code_count == 0:
verdict = "BLOCK (empty bugfix — no code changes)"
elif test_count > 0 and test_mentions:
verdict = f"PASS (bugfix has {code_count} code files + linked test; /vg:test re-verifies)"
elif test_count > 0:
verdict = f"PASS-WARN (bugfix has {code_count} code files + {test_count} tests unlinked to bug)"
else:
verdict = f"PASS-WARN (bugfix has {code_count} code files but 0 tests; /vg:test must generate)"
lines = [
f"# Goal Coverage Matrix — Phase {phase} (bugfix regression)",
"",
f"**Profile:** bugfix ",
f"**Bug reference:** {bug} ",
f"**Source:** SPECS.md issue_id + git delta vs HEAD~1 ",
f"**Generated:** {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
"",
"## Verification checks",
"",
"| Check | Status |",
"|-------|--------|",
f"| Bug reference present | {'✓' if bug != '<unspecified>' else '⛔ missing'} |",
f"| Code files changed | {code_count} |",
f"| Test files changed | {test_count} |",
f"| Tests reference bug ID | {'✓' if test_mentions else 'no'} |",
f"| Regression note | {note} |",
"",
"## Gate",
"",
f"**Verdict:** {verdict}",
"",
"**Next:** /vg:test runs issue-specific runner to re-verify bug is actually fixed.",
"",
]
open(out, 'w', encoding='utf-8').write('\n'.join(lines) + '\n')
print(f"✓ GOAL-COVERAGE-MATRIX.md written — {verdict.split(' (')[0]}")
PY
mkdir -p "${PHASE_DIR}/.step-markers" 2>/dev/null
(type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phaseP_regression" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phaseP_regression.done"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phaseP_regression 2>/dev/null || true
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.phaseP_regression_verified" \
--payload "{\"phase\":\"${PHASE_NUMBER}\",\"bug_ref\":\"${BUG_REF}\",\"code_count\":${CODE_COUNT},\"test_count\":${TEST_COUNT},\"test_linked\":${TEST_MENTIONS_BUG}}" >/dev/null 2>&1 || true
mkdir -p "${PHASE_DIR}/.tmp" 2>/dev/null
if [ -f "${REPO_ROOT}/.claude/scripts/review-lens-plan.py" ]; then
"${PYTHON_BIN:-python3}" "${REPO_ROOT}/.claude/scripts/review-lens-plan.py" \
--phase-dir "$PHASE_DIR" --profile "${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}" --mode "$REVIEW_MODE" \
--write --validate-only --json > "${PHASE_DIR}/.tmp/review-lens-plan-validation.json" 2>&1 || {
echo "⛔ Regression checklist evidence missing — see ${PHASE_DIR}/.tmp/review-lens-plan-validation.json" >&2
DIAG_SCRIPT="${REPO_ROOT}/.claude/scripts/review-block-diagnostic.py"
if [ -f "$DIAG_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$DIAG_SCRIPT" \
--gate-id "review.phaseP_regression_lens_plan" \
--phase-dir "$PHASE_DIR" \
--input "${PHASE_DIR}/.tmp/review-lens-plan-validation.json" \
--out-md "${PHASE_DIR}/.tmp/review-lens-plan-diagnostic.md" \
>/dev/null 2>&1 || true
cat "${PHASE_DIR}/.tmp/review-lens-plan-diagnostic.md" 2>/dev/null || true
fi
exit 1
}
fi
exit 0
fi
## Review mode: schema-verify (P5, v1.9.2 — migration profile)
if [ "$REVIEW_MODE" != "schema-verify" ]; then
echo "↷ Skipping phaseP_schema_verify (REVIEW_MODE=$REVIEW_MODE)"
else
echo "▸ Schema-verify review (migration): checking ROLLBACK.md + migration files"
# Minimum verification: ROLLBACK.md present (already checked in prereq),
# migration files referenced in PLAN exist.
MISSING_MIG=""
for f in $(grep -oE '<file-path>[^<]*migrations[^<]*\.sql[^<]*</file-path>' "${PHASE_DIR}/PLAN.md" 2>/dev/null | \
sed -E 's/<\/?file-path>//g'); do
[ -f "$f" ] || MISSING_MIG="${MISSING_MIG} $f"
done
if [ -n "$MISSING_MIG" ]; then
echo "⛔ Migration files referenced in PLAN but missing:$MISSING_MIG"
exit 1
fi
${PYTHON_BIN} - "${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md" "$PHASE_NUMBER" <<'PY'
import sys
from datetime import datetime, timezone
out, phase = sys.argv[1], sys.argv[2]
lines = [
f"# Goal Coverage Matrix — Phase {phase} (migration schema-verify)",
"",
"**Profile:** migration ",
"**Source:** SPECS.migration_plan + ROLLBACK.md ",
f"**Generated:** {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
"",
"## Schema verification",
"",
"- ROLLBACK.md present",
"- Migration files referenced in PLAN exist on disk",
"- Schema round-trip validation deferred to /vg:test schema-roundtrip runner",
"",
"**Verdict:** PASS",
"",
]
open(out, 'w', encoding='utf-8').write('\n'.join(lines) + '\n')
print("✓ GOAL-COVERAGE-MATRIX.md written (migration schema-verify)")
PY
(type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phaseP_schema_verify" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phaseP_schema_verify.done"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phaseP_schema_verify 2>/dev/null || true
mkdir -p "${PHASE_DIR}/.tmp" 2>/dev/null
if [ -f "${REPO_ROOT}/.claude/scripts/review-lens-plan.py" ]; then
"${PYTHON_BIN:-python3}" "${REPO_ROOT}/.claude/scripts/review-lens-plan.py" \
--phase-dir "$PHASE_DIR" --profile "${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}" --mode "$REVIEW_MODE" \
--write --validate-only --json > "${PHASE_DIR}/.tmp/review-lens-plan-validation.json" 2>&1 || {
echo "⛔ Schema-verify checklist evidence missing — see ${PHASE_DIR}/.tmp/review-lens-plan-validation.json" >&2
DIAG_SCRIPT="${REPO_ROOT}/.claude/scripts/review-block-diagnostic.py"
if [ -f "$DIAG_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$DIAG_SCRIPT" \
--gate-id "review.phaseP_schema_verify_lens_plan" \
--phase-dir "$PHASE_DIR" \
--input "${PHASE_DIR}/.tmp/review-lens-plan-validation.json" \
--out-md "${PHASE_DIR}/.tmp/review-lens-plan-diagnostic.md" \
>/dev/null 2>&1 || true
cat "${PHASE_DIR}/.tmp/review-lens-plan-diagnostic.md" 2>/dev/null || true
fi
exit 1
}
fi
exit 0
fi
## Review mode: link-check (P5, v1.9.2 — docs profile)
if [ "$REVIEW_MODE" != "link-check" ]; then
echo "↷ Skipping phaseP_link_check (REVIEW_MODE=$REVIEW_MODE)"
else
echo "▸ Link-check review (docs): scanning markdown files for broken relative links"
DOC_FILES=$(grep -oE '<file-path>[^<]+\.md</file-path>' "${PHASE_DIR}/PLAN.md" 2>/dev/null | \
sed -E 's/<\/?file-path>//g')
BROKEN=""
for f in $DOC_FILES; do
[ -f "$f" ] || continue
for link in $(grep -oE '\]\([^)]+\)' "$f" | sed -E 's/\]\(//; s/\)$//' | grep -vE '^https?://|^#'); do
target=$(dirname "$f")/"$link"
[ -e "$target" ] || BROKEN="${BROKEN}\n$f → $link"
done
done
if [ -n "$BROKEN" ]; then
echo -e "⚠ Broken relative links:$BROKEN"
fi
${PYTHON_BIN} - "${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md" "$PHASE_NUMBER" <<'PY'
import sys
from datetime import datetime, timezone
out, phase = sys.argv[1], sys.argv[2]
lines = [
f"# Goal Coverage Matrix — Phase {phase} (docs link-check)",
"",
"**Profile:** docs ",
f"**Generated:** {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
"",
"Docs-only phase — link-check performed; content fidelity deferred to /vg:test markdown-lint.",
"",
"**Verdict:** PASS",
"",
]
open(out, 'w', encoding='utf-8').write('\n'.join(lines) + '\n')
print("✓ GOAL-COVERAGE-MATRIX.md written (docs link-check)")
PY
(type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phaseP_link_check" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phaseP_link_check.done"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phaseP_link_check 2>/dev/null || true
mkdir -p "${PHASE_DIR}/.tmp" 2>/dev/null
if [ -f "${REPO_ROOT}/.claude/scripts/review-lens-plan.py" ]; then
"${PYTHON_BIN:-python3}" "${REPO_ROOT}/.claude/scripts/review-lens-plan.py" \
--phase-dir "$PHASE_DIR" --profile "${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}" --mode "$REVIEW_MODE" \
--write --validate-only --json > "${PHASE_DIR}/.tmp/review-lens-plan-validation.json" 2>&1 || {
echo "⛔ Link-check checklist evidence missing — see ${PHASE_DIR}/.tmp/review-lens-plan-validation.json" >&2
DIAG_SCRIPT="${REPO_ROOT}/.claude/scripts/review-block-diagnostic.py"
if [ -f "$DIAG_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$DIAG_SCRIPT" \
--gate-id "review.phaseP_link_check_lens_plan" \
--phase-dir "$PHASE_DIR" \
--input "${PHASE_DIR}/.tmp/review-lens-plan-validation.json" \
--out-md "${PHASE_DIR}/.tmp/review-lens-plan-diagnostic.md" \
>/dev/null 2>&1 || true
cat "${PHASE_DIR}/.tmp/review-lens-plan-diagnostic.md" 2>/dev/null || true
fi
exit 1
}
fi
exit 0
fi
## Phase 0.5: RFC v9 preflight (data invariants + RCRURD + cache hygiene)
RFC v9 PR-D1/D2/F integration. Runs deterministic gates BEFORE the scanner so we fail fast on broken sandbox state instead of burning Haiku tokens on a doomed scan.
# RFC v9 preflight — Codex-HIGH-5-ter fix: outer guard checks scripts/runtime
# only. The inner gates check their own artifacts (ENV-CONTRACT.md for
# invariants, FIXTURES/ for RCRURD). Previously the outer guard required
# BOTH scripts/runtime AND ENV-CONTRACT, so a phase with FIXTURES+lifecycle
# but no ENV-CONTRACT skipped RCRURD entirely.
PRE_OK=1
VG_SCRIPT_ROOT="${REPO_ROOT}/.claude/scripts"
[ -d "${VG_SCRIPT_ROOT}/runtime" ] || VG_SCRIPT_ROOT="${REPO_ROOT}/scripts"
if [ -d "${VG_SCRIPT_ROOT}/runtime" ]; then
RFC_V9_GATE_RAN=0
# Only echo the banner once when at least one gate has work to do
HAS_INVARIANTS_FILE=0
HAS_FIXTURES_DIR=0
[ -f "${PHASE_DIR}/ENV-CONTRACT.md" ] && HAS_INVARIANTS_FILE=1
[ -d "${PHASE_DIR}/FIXTURES" ] && HAS_FIXTURES_DIR=1
if [ "$HAS_INVARIANTS_FILE" = "1" ] || [ "$HAS_FIXTURES_DIR" = "1" ]; then
echo ""
echo "━━━ Phase 0.5 — RFC v9 preflight ━━━"
# 1. Reap expired leases + orphans (PR-F) — only if FIXTURES exist
if [ "$HAS_FIXTURES_DIR" = "1" ] && [ -f "${VG_SCRIPT_ROOT}/fixture-prune.py" ]; then
"${PYTHON_BIN:-python3}" "${VG_SCRIPT_ROOT}/fixture-prune.py" \
--phase "$PHASE_NUMBER" --apply --skip-orphans 2>&1 | sed 's/^/ prune: /'
fi
# Codex-HIGH-1-ter fix: delete stale snapshot at entry. Otherwise post
# mode at run-complete may load a snapshot from a PRIOR run.
rm -f "${PHASE_DIR}/.rcrurd-pre-snapshot.json" 2>/dev/null || true
fi # end OR guard (banner + prune + snapshot reset)
# 2. data_invariants N-consumer check (PR-C, live HTTP wiring stub-1 fix)
# Codex-HIGH-5-ter: guarded by ENV-CONTRACT.md only — independent of FIXTURES.
if [ -f "${VG_SCRIPT_ROOT}/preflight-invariants.py" ] && \
[ -f "${PHASE_DIR}/ENV-CONTRACT.md" ]; then
# Codex-R4-HIGH-2 fix: PREVIOUSLY parsed step_env.sandbox_test from
# vg.config.md, but that's an ENV NAME like "local", not a URL. The
# actual URL lives in ENV-CONTRACT.md under `target.base_url`. Use
# that as canonical source; fall back to VG_BASE_URL env override.
PRE_BASE=$("${PYTHON_BIN:-python3}" -c "
import re, sys
text = open('${PHASE_DIR}/ENV-CONTRACT.md', encoding='utf-8').read()
# Match \`target:\n base_url: \"...\"\` block (handles single+double quotes)
m = re.search(r'^target:\s*\n((?:[ \t].*\n)+)', text, re.MULTILINE)
if m:
body = m.group(1)
bm = re.search(r'^\s*base_url:\s*[\"\\']?([^\"\\'\s#]+)', body, re.MULTILINE)
if bm: print(bm.group(1))
" 2>/dev/null)
[ -z "$PRE_BASE" ] && PRE_BASE="${VG_BASE_URL:-}"
if [ -n "$PRE_BASE" ]; then
PRE_OUT=$("${PYTHON_BIN:-python3}" "${VG_SCRIPT_ROOT}/preflight-invariants.py" \
--phase "$PHASE_NUMBER" --base-url "$PRE_BASE" \
--severity "${VG_PREFLIGHT_SEVERITY:-block}" 2>&1)
PRE_RC=$?
echo " preflight invariants: $(echo "$PRE_OUT" | "${PYTHON_BIN:-python3}" -c '
import json, sys
try:
d = json.loads(sys.stdin.read())
v = d.get("verdict","?")
if v == "BLOCK":
print(f"BLOCK ({len(d.get(\"gaps\",[]))} gap(s))")
elif v == "WARN":
print(f"WARN ({len(d.get(\"gaps\",[]))} gap(s) — VG_PREFLIGHT_SEVERITY=warn override active)")
elif v == "PASS":
print(f"PASS (checked {d.get(\"invariants_checked\",\"?\")} invariants)")
elif v == "DRY_RUN":
print("DRY_RUN")
else:
print(f"ERROR: {d.get(\"error\",\"unknown\")[:200]}")
except: print("parse-error")
')"
# Codex-HIGH-5 fix: setup errors (RC=2) ALWAYS block, regardless of
# severity gate. Missing api_index, bad creds, or missing base_url
# mean we cannot proceed safely — silent skip would let scan run on
# broken sandbox state.
if [ "$PRE_RC" -eq 2 ]; then
echo "⛔ Phase 0.5 preflight setup error — cannot proceed (RFC v9 PR-C):"
echo "$PRE_OUT" | "${PYTHON_BIN:-python3}" -c '
import json, sys
try:
d = json.loads(sys.stdin.read())
print(f" {d.get(\"error\",\"unknown setup error\")[:300]}")
except: print(" (could not parse error)")
'
echo " Fix path: ENV-CONTRACT.md must declare api_index for every"
echo " resource referenced in data_invariants. Verify vg.config.md"
echo " credentials_map covers all count_role values."
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.preflight_setup_error" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
exit 1
fi
if [ "$PRE_RC" -eq 1 ] && [ "${VG_PREFLIGHT_SEVERITY:-block}" = "block" ]; then
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.preflight_invariants_blocked" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
source scripts/lib/blocking-gate-prompt.sh
EVIDENCE_PATH="${PHASE_DIR}/.vg/preflight-invariants-evidence.json"
mkdir -p "$(dirname "$EVIDENCE_PATH")"
cat > "$EVIDENCE_PATH" <<JSON
{
"gate": "preflight_invariants",
"summary": "Phase 0.5 preflight invariants gate BLOCK",
"fix_hint": "Fix the data invariant gaps reported by the preflight validator. Check fix_hint in each gap entry."
}
JSON
blocking_gate_prompt_emit "preflight_invariants" "$EVIDENCE_PATH" "error"
# AI controller calls AskUserQuestion → resolve via Leg 2.
# Leg 2 exit codes: 0=continue, 1=continue-with-debt, 2=route-amend (exit 0), 3=abort (exit 1), 4=re-prompt.
fi
else
# Codex-HIGH-5-bis fix: missing base_url WAS silent skip — that's
# the original "RFC v9 silently bypassed" failure mode. If
# ENV-CONTRACT.md declares data_invariants, missing base_url IS a
# setup error → block. If no invariants declared, this is a no-op
# phase and skip is fine.
HAS_INVARIANTS=$(grep -cE '^\s*data_invariants:' "${PHASE_DIR}/ENV-CONTRACT.md" 2>/dev/null || echo 0)
if [ "$HAS_INVARIANTS" -gt 0 ] && [ "${VG_PREFLIGHT_SEVERITY:-block}" = "block" ]; then
echo "⛔ Phase 0.5 preflight setup error — ENV-CONTRACT.md declares"
echo " data_invariants but no sandbox base_url found."
echo " Fix path: set step_env.sandbox_test in vg.config.md OR"
echo " export VG_BASE_URL=https://your-sandbox/."
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.preflight_no_base_url" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
exit 1
fi
echo " preflight invariants: SKIPPED (no base_url + no data_invariants — no-op)"
fi
fi
# 3. RCRURD pre_state gate (PR-D2, live wiring stub-2 fix)
# Calls scripts/rcrurd-preflight.py: walks FIXTURES/*.yaml, runs pre_state
# GET + assert_jsonpath for each lifecycle block. Fail-fast before scan.
if [ -f "${VG_SCRIPT_ROOT}/rcrurd-preflight.py" ] && \
[ -d "${PHASE_DIR}/FIXTURES" ]; then
PRE_BASE_RC="${PRE_BASE:-${VG_BASE_URL:-}}"
if [ -n "$PRE_BASE_RC" ]; then
# Codex-HIGH-1-ter fix: capture snapshot in the SAME pre-mode call
# (not a follow-up). The snapshot is read by post-mode at run-complete
# to compute increased_by_at_least deltas against real pre-action state.
RCRURD_SNAP="${PHASE_DIR}/.rcrurd-pre-snapshot.json"
RCRURD_OUT=$("${PYTHON_BIN:-python3}" "${VG_SCRIPT_ROOT}/rcrurd-preflight.py" \
--phase "$PHASE_NUMBER" --base-url "$PRE_BASE_RC" \
--severity "${VG_RCRURD_SEVERITY:-block}" \
--capture-snapshot "$RCRURD_SNAP" 2>&1)
RCRURD_RC=$?
echo " RCRURD pre_state: $(echo "$RCRURD_OUT" | "${PYTHON_BIN:-python3}" -c '
import json, sys
try:
d = json.loads(sys.stdin.read())
v = d.get("verdict","?")
if v == "BLOCK":
print(f"BLOCK ({d.get(\"failed\",0)}/{d.get(\"checked\",0)} failed)")
elif v == "WARN":
print(f"WARN ({d.get(\"failed\",0)}/{d.get(\"checked\",0)} failed — VG_RCRURD_SEVERITY=warn override active)")
elif v == "PASS":
print(f"PASS (checked {d.get(\"checked\",0)} fixtures)")
else:
print(f"ERROR: {d.get(\"error\",\"unknown\")[:200]}")
except: print("parse-error")
')"
# Codex-HIGH-5 fix: RCRURD setup errors (RC=2) ALWAYS block.
if [ "$RCRURD_RC" -eq 2 ]; then
echo "⛔ Phase 0.5 RCRURD setup error — cannot proceed:"
echo "$RCRURD_OUT" | "${PYTHON_BIN:-python3}" -c '
import json, sys
try:
d = json.loads(sys.stdin.read())
print(f" {d.get(\"error\",\"unknown setup error\")[:300]}")
except: print(" (could not parse error)")
'
echo " Fix path: vg.config.md credentials_map must cover every role"
echo " referenced in FIXTURES/{G-XX}.yaml lifecycle.pre_state."
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.rcrurd_setup_error" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
exit 1
fi
if [ "$RCRURD_RC" -eq 1 ] && [ "${VG_RCRURD_SEVERITY:-block}" = "block" ]; then
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.rcrurd_preflight_blocked" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
source scripts/lib/blocking-gate-prompt.sh
EVIDENCE_PATH="${PHASE_DIR}/.vg/rcrurd-preflight-evidence.json"
mkdir -p "$(dirname "$EVIDENCE_PATH")"
cat > "$EVIDENCE_PATH" <<JSON
{
"gate": "rcrurd_preflight",
"summary": "Phase 0.5 RCRURD gate BLOCK — fixture pre_state assertions failed",
"fix_hint": "Ensure sandbox seed data matches each fixture's lifecycle.pre_state assertions"
}
JSON
blocking_gate_prompt_emit "rcrurd_preflight" "$EVIDENCE_PATH" "error"
# AI controller calls AskUserQuestion → resolve via Leg 2.
# Leg 2 exit codes: 0=continue, 1=continue-with-debt, 2=route-amend (exit 0), 3=abort (exit 1), 4=re-prompt.
fi
# Codex-HIGH-1-ter fix: validate snapshot was captured. If pre-mode
# passed assertions but snapshot file missing, post-mode delta
# assertions will be wrong. Block if any fixture declares an
# increased_by_at_least assertion (those NEED snapshot).
if [ "$RCRURD_RC" -eq 0 ] && [ -d "${PHASE_DIR}/FIXTURES" ]; then
NEEDS_SNAPSHOT=$(grep -lE 'increased_by_at_least|decreased_by_at_least' \
"${PHASE_DIR}/FIXTURES"/*.yaml 2>/dev/null | wc -l | tr -d ' ')
if [ "$NEEDS_SNAPSHOT" -gt 0 ] && [ ! -f "$RCRURD_SNAP" ]; then
echo "⛔ Phase 0.5 RCRURD setup error — pre-state snapshot not"
echo " captured but ${NEEDS_SNAPSHOT} fixture(s) declare delta"
echo " assertions (increased_by_at_least / decreased_by_at_least)."
echo " Without snapshot, post-mode would compare post-action to"
echo " post-action → delta=0 false-fail."
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.rcrurd_snapshot_missing" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
exit 1
fi
fi
else
# Codex-HIGH-5-bis: missing base_url + FIXTURES with lifecycle = setup error
HAS_LIFECYCLE=$(grep -lE '^lifecycle:' "${PHASE_DIR}/FIXTURES"/*.yaml 2>/dev/null | wc -l | tr -d ' ')
if [ "$HAS_LIFECYCLE" -gt 0 ] && [ "${VG_RCRURD_SEVERITY:-block}" = "block" ]; then
echo "⛔ Phase 0.5 RCRURD setup error — FIXTURES declare ${HAS_LIFECYCLE}"
echo " lifecycle blocks but no sandbox base_url found."
echo " Fix path: set step_env.sandbox_test in vg.config.md OR"
echo " export VG_BASE_URL=https://your-sandbox/."
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.rcrurd_no_base_url" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
exit 1
fi
echo " RCRURD pre_state: SKIPPED (no base_url + no lifecycle — no-op)"
fi
fi
fi
If --skip-scan, skip this phase.
# Echo scanner choice from step 0a so user sees their --scanner choice was honored
echo ""
echo "━━━ Phase 1 — Code scan (scanner=${VG_SCANNER:-haiku-only}) ━━━"
case "${VG_SCANNER:-haiku-only}" in
haiku-only)
echo " Mode: Haiku-only — fastest path, default depth."
;;
codex-inline)
echo " Mode: Codex inline scanner — main orchestrator owns MCP/browser; no Haiku spawn."
;;
codex-supplement)
echo " Mode: Haiku + Codex CLI supplement (queued for v2.42.2 wiring)."
echo " Note: v2.42.1 records the choice; supplemental Codex scan invocation lands in next iter."
;;
gemini-supplement)
echo " Mode: Haiku + Gemini CLI supplement (queued for v2.42.2 wiring)."
echo " Note: v2.42.1 records the choice; supplemental Gemini scan invocation lands in next iter."
;;
council-all)
echo " Mode: Haiku + Codex + Gemini + Claude council (queued for v2.42.2 wiring)."
echo " Note: v2.42.1 records the choice; full council scan invocation lands in next iter."
;;
esac
echo ""
Read .claude/skills/api-contract/SKILL.md — Mode: Verify-Grep.
Read .claude/commands/vg/_shared/env-commands.md — contract_verify_grep(phase_dir, "both").
Run contract_verify_grep against $SCAN_PATTERNS paths from config:
Result:
Count UI elements using $SCAN_PATTERNS from config:
For each source file matching config.code_patterns.web_pages:
Run element_count(file) from env-commands.md
→ uses SCAN_PATTERNS keys (modals, tables, forms, actions, etc.)
Write ${PHASE_DIR}/element-counts.json — reference data for discovery (not a gate).
Skip conditions:
config.i18n.enabled is false or absent → skip entirelyconfig.i18n.locale_dir is empty → skipPurpose: Verify every i18n key used in phase-changed FE files actually resolves to a
translation string. Missing keys = user sees raw key like dashboard.title instead of text.
I18N_ENABLED="${config.i18n.enabled:-false}"
if [ "$I18N_ENABLED" = "true" ]; then
LOCALE_DIR="${config.i18n.locale_dir}"
DEFAULT_LOCALE="${config.i18n.default_locale:-en}"
KEY_FN="${config.i18n.key_function:-t}"
# Get FE files changed in this phase
CHANGED_FE=$(git diff --name-only HEAD~${COMMIT_COUNT:-5} HEAD -- "${config.code_patterns.web_pages}" 2>/dev/null)
if [ -n "$CHANGED_FE" ] && [ -d "$LOCALE_DIR" ]; then
# Extract all i18n keys from changed files
I18N_KEYS=$(echo "$CHANGED_FE" | xargs grep -ohE "${KEY_FN}\(['\"]([^'\"]+)['\"]\)" 2>/dev/null | \
grep -oE "['\"][^'\"]+['\"]" | tr -d "'" | tr -d '"' | sort -u)
# Check each key resolves in default locale file
LOCALE_FILE=$(find "$LOCALE_DIR" -path "*/${DEFAULT_LOCALE}*" -name "*.json" 2>/dev/null | head -1)
MISSING_KEYS=0
if [ -n "$LOCALE_FILE" ] && [ -n "$I18N_KEYS" ]; then
while IFS= read -r KEY; do
[ -z "$KEY" ] && continue
# Check key exists in JSON (dot-path → nested lookup)
EXISTS=$(${PYTHON_BIN} -c "
import json, sys
from pathlib import Path
data = json.loads(Path('$LOCALE_FILE').read_text())
keys = '$KEY'.split('.')
ref = data
for k in keys:
if isinstance(ref, dict) and k in ref:
ref = ref[k]
else:
print('MISSING')
sys.exit(0)
print('OK')
" 2>/dev/null)
if [ "$EXISTS" = "MISSING" ]; then
echo " WARN: i18n key '$KEY' not found in ${LOCALE_FILE}"
MISSING_KEYS=$((MISSING_KEYS + 1))
fi
done <<< "$I18N_KEYS"
fi
echo "i18n check: $(echo "$I18N_KEYS" | wc -l) keys, ${MISSING_KEYS} missing"
fi
fi
Result routing: MISSING_KEYS > 0 → GAPS_FOUND (not block — may be added in later commit).
Display:
Phase 1 Code Scan:
Contract verify: {PASS|WARNING — N mismatches}
Element inventory: {N} files, ~{M} interactive elements
i18n key check: {N keys checked, M missing|skipped (disabled)}
(Reference data for Phase 2 — not a gate)
When phase1_code_scan completes with no scan-driven regression (contract verify PASS or WARNING-only, i18n missing-keys treated as non-blocking), the 5 Phase-M gate_ids on the supported list can auto-resolve any matching prior debt entries from earlier phases.
The skip-when-current-phase-also-uses-flag guard mirrors the v2.6.1 accept.md pattern: never resolve a gate_id whose flag is being used right now.
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/override-debt.sh" 2>/dev/null || true
if type -t override_auto_resolve_clean_run >/dev/null 2>&1; then
RESOLUTION_EVENT_ID="review-clean-${PHASE_NUMBER}-$(date -u +%s)"
if [[ ! "${ARGUMENTS}" =~ --allow-orthogonal-hotfix ]]; then
override_auto_resolve_clean_run "allow-orthogonal-hotfix" "${PHASE_NUMBER}" \
"${RESOLUTION_EVENT_ID}" 2>&1 | sed 's/^/ /'
fi
if [[ ! "${ARGUMENTS}" =~ --allow-no-bugref ]]; then
override_auto_resolve_clean_run "allow-no-bugref" "${PHASE_NUMBER}" \
"${RESOLUTION_EVENT_ID}" 2>&1 | sed 's/^/ /'
fi
if [[ ! "${ARGUMENTS}" =~ --allow-empty-hotfix ]]; then
override_auto_resolve_clean_run "allow-empty-hotfix" "${PHASE_NUMBER}" \
"${RESOLUTION_EVENT_ID}" 2>&1 | sed 's/^/ /'
fi
if [[ ! "${ARGUMENTS}" =~ --allow-empty-bugfix ]]; then
override_auto_resolve_clean_run "allow-empty-bugfix" "${PHASE_NUMBER}" \
"${RESOLUTION_EVENT_ID}" 2>&1 | sed 's/^/ /'
fi
if [[ ! "${ARGUMENTS}" =~ --allow-unresolved-overrides ]]; then
override_auto_resolve_clean_run "allow-unresolved-overrides" "${PHASE_NUMBER}" \
"${RESOLUTION_EVENT_ID}" 2>&1 | sed 's/^/ /'
fi
fi
The helper emits one override.auto_resolved audit event per gate_id that
matched at least one OPEN debt entry from a prior phase (R9: gate_id +
timestamp + git_sha). No-op when there are no matching entries.
Purpose: retroactive safety net for changes that affect callers outside the phase's changed-files list. Complement to /vg:build's proactive caller graph.
Prereq: _shared/config-loader.md already resolved $GRAPHIFY_ACTIVE, $GRAPHIFY_GRAPH_PATH, $PYTHON_BIN, $REPO_ROOT, $VG_TMP at command start.
if [ "$GRAPHIFY_ACTIVE" != "true" ]; then
echo "ℹ Graphify not available — skipping Phase 1.5"
echo "RIPPLE_SKIPPED=true" > "${PHASE_DIR}/uat-ripples.txt"
echo "RIPPLE_SKIP_REASON=graphify-inactive" >> "${PHASE_DIR}/uat-ripples.txt"
(type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phase1_5_ripple_and_god_node" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phase1_5_ripple_and_god_node.done"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phase1_5_ripple_and_god_node 2>/dev/null || true
# skip to Phase 2
fi
⛔ BUG #3 fix (2026-04-18): Stale graphify check + auto-rebuild before ripple analysis.
Without this, ripple analysis runs against stale graph → reports "0 callers affected" because graph doesn't know about new callers added since last build. Falsely safe verdict.
if [ "$GRAPHIFY_ACTIVE" = "true" ]; then
GRAPH_BUILD_EPOCH=$(stat -c %Y "$GRAPHIFY_GRAPH_PATH" 2>/dev/null || stat -f %m "$GRAPHIFY_GRAPH_PATH" 2>/dev/null)
COMMITS_SINCE=$(git log --since="@${GRAPH_BUILD_EPOCH}" --oneline 2>/dev/null | wc -l | tr -d ' ')
STALE_THRESHOLD="${GRAPHIFY_STALE_WARN:-50}"
echo "Review Phase 1.5: graphify ${COMMITS_SINCE} commits since last build"
# Always rebuild before ripple — review is the SAFETY NET, must be accurate
if [ "${COMMITS_SINCE:-0}" -gt 0 ]; then
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/graphify-safe.sh"
vg_graphify_rebuild_safe "$GRAPHIFY_GRAPH_PATH" "review-phase1_5-${PHASE_NUMBER}" || {
echo "⛔ Review cannot trust ripple analysis with stale graph"
echo " Fix manually: ${PYTHON_BIN} -m graphify update ."
}
fi
fi
If graphify active, proceed:
# Prefer phase commit range if available (git tag from /vg:build step 8b: "vg-build-{phase}-wave-{N}-start")
PHASE_START_TAG=$(git tag --list "vg-build-${PHASE_NUMBER}-wave-*-start" | sort -t'-' -k5,5n | head -1)
if [ -n "$PHASE_START_TAG" ]; then
CHANGED_FILES=$(git diff --name-only "$PHASE_START_TAG" HEAD | sort -u)
else
# Fallback: diff against merge-base with main
CHANGED_FILES=$(git diff --name-only $(git merge-base HEAD main) HEAD | sort -u)
fi
# Filter to source files only (exclude ${PLANNING_DIR}/, .claude/, node_modules, etc)
CHANGED_SRC=$(echo "$CHANGED_FILES" | grep -vE '^\.(planning|claude|codex)/|/node_modules/|/dist/|/build/|/target/|^graphify-out/' || true)
echo "Phase changed $(echo "$CHANGED_SRC" | wc -l) source files"
echo "$CHANGED_SRC" > "${PHASE_DIR}/.ripple-input.txt"
Why script not MCP: graphify TS extractor doesn't resolve path aliases (e.g., @/hooks/X → src/hooks/X). Pure MCP queries miss alias-imported callers. The hybrid script uses graphify + git grep, catches both.
${PYTHON_BIN} .claude/scripts/build-caller-graph.py \
--changed-files-input "${PHASE_DIR}/.ripple-input.txt" \
--config .claude/vg.config.md \
--graphify-graph "$GRAPHIFY_GRAPH_PATH" \
--output "${PHASE_DIR}/.ripple.json"
Output (.ripple.json):
{
"mode": "ripple",
"tools_used": ["grep(rg|git)", "graphify"],
"changed_files_count": N,
"ripples": [
{
"changed_file": "<path>",
"exports_at_risk": ["SymbolA", "SymbolB"],
"callers": [
{"file": "<caller>", "line": N, "symbol": "SymbolA", "source": ["grep(...)"]}
]
}
],
"affected_callers": ["<unique caller paths>"]
}
Script extracts exports via stack-agnostic regex (TS/JS/Rust/Python/Go), then searches scope_apps for each symbol using grep + graphify enrichment. Every caller NOT in the changed list = at-risk.
${PYTHON_BIN} - <<'PY' > "${PHASE_DIR}/.god-nodes.json"
import json
from graphify.analyze import god_nodes
from graphify.build import build_from_json
from networkx.readwrite import json_graph
from pathlib import Path
data = json.loads(Path("${GRAPHIFY_GRAPH_PATH}").read_text(encoding="utf-8"))
G = json_graph.node_link_graph(data, edges="links")
gods = god_nodes(G)[:20] # top-20 highest-degree nodes
print(json.dumps([{"label": g.get("label"), "source_file": g.get("source_file"), "degree": g.get("degree")} for g in gods], indent=2))
PY
Then for each god node, check if git diff $PHASE_START_TAG HEAD includes lines adding an import pointing to god_node's source_file — flag as coupling warning (language-aware via config.scan_patterns).
Script returns callers list per changed file. Orchestrator classifies:
symbol match is a function/class/schema name (likely direct usage)Default LOW for ambiguous — reverse of earlier design. Rationale: too many HIGH = noise → users ignore. Start LOW, escalate via evidence.
Write ${PHASE_DIR}/RIPPLE-ANALYSIS.md:
# Phase {N} — Ripple Analysis (Graphify)
**Generated**: {ISO timestamp}
**Changed files in phase**: {N}
**Graph**: `graphify-out/graph.json` ({node_count} nodes)
## High-Severity Ripples (REVIEW REQUIRED)
Callers of changed code that were NOT updated in this phase. Verify these callers still work with the new symbol shapes.
| Caller File | Calls Changed Symbol | Changed In | Severity |
|---|---|---|---|
| {caller.file} | {symbol} | {changed.file} | HIGH |
| ... | ... | ... | ... |
## Low-Severity Ripples (likely safe — scan for regressions)
| Caller File | Import Type | Changed In |
|---|---|---|
| {caller.file} | barrel re-export | {changed.file} |
## God Node Coupling Warnings
| God Node | Degree | New Edge From | Recommendation |
|---|---|---|---|
| {god.label} | {N} | {changed.file} | Refactor consideration |
## Summary
- HIGH ripples: {N} (review these callers manually or via browser)
- LOW ripples: {N}
- God node warnings: {N}
- Action: Phase 2 browser discovery will prioritize checking HIGH-ripple caller paths first
Phase 2 priority hint: if ripple affects a specific view, browser discovery should navigate there first (higher priority in scan queue). Save .ripple-browser-priorities.json:
{ "priority_urls": ["route1", "route2"], "reason": "high-ripple callers live here" }
Phase 4 goal comparison input: include RIPPLE-ANALYSIS.md as evidence. If a goal says "Feature X works" and Feature X uses a HIGH-ripple caller that wasn't verified → flag as UNVERIFIED instead of READY.
Skip Phase 1.5 with warning:
ℹ Phase 1.5 skipped — graphify not active. Cross-module ripple bugs may
only be caught at Phase 2 browser discovery or Phase 5 test. To enable:
set graphify.enabled=true in .claude/vg.config.md + graphify update .
Still write empty RIPPLE-ANALYSIS.md stub so Phase 4 doesn't error on missing file:
# Phase {N} — Ripple Analysis (SKIPPED)
Graphify inactive. Enable for cross-module impact detection.
Final action: (type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phase1_5_ripple_and_god_node" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phase1_5_ripple_and_god_node.done"
Mandatory before browser discovery for web feature phases.
Purpose:
Scope: low-cost readiness gate only. This is NOT the full /vg:test runtime contract verification and NOT a project-specific mutation batch. Mutating endpoints are probed safely (OPTIONS / existence check), not executed for side effects.
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "🔎 Phase 2a.5 — API contract probe"
echo " Curl API contracts trước browser discovery"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator step-active phase2a_api_contract_probe >/dev/null 2>&1 || true
API_PROBE_OUT="${PHASE_DIR}/api-contract-precheck.txt"
API_DOCS_CHECK_OUT="${PHASE_DIR}/api-docs-check.txt"
VG_SCRIPT_ROOT="${REPO_ROOT:-.}/.claude/scripts"
[ -d "$VG_SCRIPT_ROOT" ] || VG_SCRIPT_ROOT="${REPO_ROOT:-.}/scripts"
PROBE_SCRIPT="${VG_SCRIPT_ROOT}/review-api-contract-probe.py"
INTERFACE_CHECK_OUT="${PHASE_DIR}/.tmp/interface-standards-review.json"
if [ ! -f "$PROBE_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.api_precheck_blocked" \
--payload "{\"phase\":\"${PHASE_NUMBER}\",\"reason\":\"missing_helper\"}" >/dev/null 2>&1 || true
source scripts/lib/blocking-gate-prompt.sh
EVIDENCE_PATH="${PHASE_DIR}/.vg/api-precheck-evidence.json"
mkdir -p "$(dirname "$EVIDENCE_PATH")"
cat > "$EVIDENCE_PATH" <<JSON
{
"gate": "api_precheck",
"summary": "API contract probe setup error — missing helper: $PROBE_SCRIPT",
"fix_hint": "Ensure review-api-contract-probe.py exists in .claude/scripts/ or scripts/"
}
JSON
blocking_gate_prompt_emit "api_precheck" "$EVIDENCE_PATH" "error"
# AI controller calls AskUserQuestion → resolve via Leg 2.
# Leg 2 exit codes: 0=continue, 1=continue-with-debt, 2=route-amend (exit 0), 3=abort (exit 1), 4=re-prompt.
fi
mkdir -p "${PHASE_DIR}/.tmp" 2>/dev/null
INTERFACE_VAL="${VG_SCRIPT_ROOT}/validators/verify-interface-standards.py"
if [ -f "$INTERFACE_VAL" ]; then
"${PYTHON_BIN:-python3}" "$INTERFACE_VAL" \
--phase-dir "$PHASE_DIR" \
--profile "${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}" \
> "$INTERFACE_CHECK_OUT" 2>&1
INTERFACE_RC=$?
cat "$INTERFACE_CHECK_OUT"
if [ "$INTERFACE_RC" -ne 0 ]; then
echo "⛔ Interface standards gate failed — review cannot continue with undefined API/FE error semantics." >&2
DIAG_SCRIPT="${REPO_ROOT}/.claude/scripts/review-block-diagnostic.py"
if [ -f "$DIAG_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$DIAG_SCRIPT" \
--gate-id "review.interface_standards" \
--phase-dir "$PHASE_DIR" \
--input "$INTERFACE_CHECK_OUT" \
--out-md "${PHASE_DIR}/.tmp/interface-standards-diagnostic.md" \
>/dev/null 2>&1 || true
cat "${PHASE_DIR}/.tmp/interface-standards-diagnostic.md" 2>/dev/null || true
fi
exit 1
fi
else
echo "⛔ Interface standards validator missing: $INTERFACE_VAL" >&2
exit 1
fi
"${PYTHON_BIN:-python3}" .claude/scripts/validators/verify-api-docs-coverage.py \
--phase "${PHASE_NUMBER}" \
> "${API_DOCS_CHECK_OUT}" 2>&1
API_DOCS_RC=$?
cat "${API_DOCS_CHECK_OUT}"
if [ "$API_DOCS_RC" -ne 0 ]; then
echo "⛔ API docs coverage failed — browser discovery is not allowed to continue with incomplete API-DOCS.md." >&2
DIAG_SCRIPT="${REPO_ROOT}/.claude/scripts/review-block-diagnostic.py"
if [ -f "$DIAG_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$DIAG_SCRIPT" \
--gate-id "review.api_docs_contract_coverage" \
--phase-dir "$PHASE_DIR" \
--input "$API_DOCS_CHECK_OUT" \
--out-md "${PHASE_DIR}/.tmp/api-docs-diagnostic.md" \
>/dev/null 2>&1 || true
cat "${PHASE_DIR}/.tmp/api-docs-diagnostic.md" 2>/dev/null || true
fi
exit 1
fi
"${PYTHON_BIN:-python3}" "${VG_SCRIPT_ROOT}/emit-evidence-manifest.py" \
--path "${PHASE_DIR}/api-docs-check.txt" \
--source-inputs "${PHASE_DIR}/API-CONTRACTS.md,${PHASE_DIR}/API-DOCS.md,.claude/vg.config.md" \
--producer "vg:review/phase2a_api_contract_probe"
API_DOCS_MANIFEST_RC=$?
if [ "$API_DOCS_MANIFEST_RC" -ne 0 ]; then
echo "⛔ API docs check wrote report but failed to bind evidence to current run." >&2
exit 1
fi
# Resolve base URL from the same canonical source used by Phase 0.5 preflight.
API_PROBE_BASE=$("${PYTHON_BIN:-python3}" -c "
import re, sys
path = '${PHASE_DIR}/ENV-CONTRACT.md'
try:
text = open(path, encoding='utf-8').read()
except OSError:
sys.exit(0)
m = re.search(r'^target:\\s*\\n((?:[ \\t].*\\n)+)', text, re.MULTILINE)
if m:
body = m.group(1)
bm = re.search(r'^\\s*base_url:\\s*[\"\\']?([^\"\\'\\s#]+)', body, re.MULTILINE)
if bm:
print(bm.group(1))
" 2>/dev/null)
[ -z "$API_PROBE_BASE" ] && API_PROBE_BASE="${VG_BASE_URL:-}"
if [ -z "$API_PROBE_BASE" ]; then
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.api_precheck_blocked" \
--payload "{\"phase\":\"${PHASE_NUMBER}\",\"reason\":\"missing_base_url\"}" >/dev/null 2>&1 || true
source scripts/lib/blocking-gate-prompt.sh
EVIDENCE_PATH="${PHASE_DIR}/.vg/api-precheck-evidence.json"
mkdir -p "$(dirname "$EVIDENCE_PATH")"
cat > "$EVIDENCE_PATH" <<JSON
{
"gate": "api_precheck",
"summary": "API contract probe setup error — no base_url found in ENV-CONTRACT.md and VG_BASE_URL is empty",
"fix_hint": "Set target.base_url in ENV-CONTRACT.md or export VG_BASE_URL"
}
JSON
blocking_gate_prompt_emit "api_precheck" "$EVIDENCE_PATH" "error"
# AI controller calls AskUserQuestion → resolve via Leg 2.
# Leg 2 exit codes: 0=continue, 1=continue-with-debt, 2=route-amend (exit 0), 3=abort (exit 1), 4=re-prompt.
fi
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.api_precheck_started" \
--payload "$(printf '{"phase":"%s","base_url":"%s"}' "${PHASE_NUMBER}" "${API_PROBE_BASE}")" >/dev/null 2>&1 || true
PROBE_CMD=("${PYTHON_BIN:-python3}" "$PROBE_SCRIPT"
--contracts "${PHASE_DIR}/API-CONTRACTS.md"
--base-url "$API_PROBE_BASE"
--out "$API_PROBE_OUT")
# Optional auth token from deploy/auth bootstrap. If absent, 401/403 still count
# as route-exists evidence for auth-protected endpoints.
if [ -n "${AUTH_TOKEN:-}" ]; then
PROBE_CMD+=(--header "Authorization: Bearer ${AUTH_TOKEN}")
fi
"${PROBE_CMD[@]}"
API_PROBE_RC=$?
cat "$API_PROBE_OUT"
if [ "$API_PROBE_RC" -ne 0 ]; then
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.api_precheck_blocked" \
--payload "$(printf '{"phase":"%s","base_url":"%s","rc":%s}' "${PHASE_NUMBER}" "${API_PROBE_BASE}" "${API_PROBE_RC}")" >/dev/null 2>&1 || true
source scripts/lib/blocking-gate-prompt.sh
EVIDENCE_PATH="${PHASE_DIR}/.vg/api-precheck-evidence.json"
mkdir -p "$(dirname "$EVIDENCE_PATH")"
cat > "$EVIDENCE_PATH" <<JSON
{
"gate": "api_precheck",
"summary": "API contract probe failed — browser discovery is not allowed to start on stale/broken API surface",
"fix_hint": "Fix the API surface issues found in api-contract-precheck.txt before continuing review"
}
JSON
blocking_gate_prompt_emit "api_precheck" "$EVIDENCE_PATH" "error"
# AI controller calls AskUserQuestion → resolve via Leg 2.
# Leg 2 exit codes: 0=continue, 1=continue-with-debt, 2=route-amend (exit 0), 3=abort (exit 1), 4=re-prompt.
fi
"${PYTHON_BIN:-python3}" "${VG_SCRIPT_ROOT}/emit-evidence-manifest.py" \
--path "${PHASE_DIR}/api-contract-precheck.txt" \
--source-inputs "${PHASE_DIR}/API-CONTRACTS.md,.claude/vg.config.md" \
--producer "vg:review/phase2a_api_contract_probe"
MANIFEST_RC=$?
if [ "$MANIFEST_RC" -ne 0 ]; then
echo "⛔ API contract probe wrote report but failed to bind evidence to current run." >&2
exit 1
fi
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.api_precheck_completed" \
--payload "$(printf '{"phase":"%s","base_url":"%s","artifact":"%s"}' "${PHASE_NUMBER}" "${API_PROBE_BASE}" "api-contract-precheck.txt")" >/dev/null 2>&1 || true
(type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phase2a_api_contract_probe" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phase2a_api_contract_probe.done"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phase2a_api_contract_probe 2>/dev/null || true
## Phase 2: BROWSER DISCOVERY (MCP Playwright — organic)
🎬 Live narration protocol (tightened 2026-04-17 — user theo dõi flow):
Orchestrator PHẢI in dòng tiếng người BEFORE mỗi sub-phase + BEFORE mỗi view/goal đang xử lý. Khác test.md: review chạy parallel nhiều Haiku, narration ở orchestrator level không cần per-step.
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator step-active phase2_browser_discovery >/dev/null 2>&1 || true
narrate_phase() {
# $1=phase_name, $2=intent tiếng Việt
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "🔎 $1"
echo " $2"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
}
narrate_view_scan() {
# $1=view_url, $2=idx, $3=total, $4=roles, $5=element_count
echo " [${2}/${3}] 📄 Đang scan: ${1} (role: ${4}, ~${5} elements)"
}
narrate_view_done() {
# $1=view_url, $2=status, $3=issues_count, $4=duration_s
case "$2" in
ok) echo " ✓ Scan xong — ${3} issues phát hiện (${4}s)" ;;
partial) echo " ⚠ Scan 1 phần — ${3} issues (${4}s)" ;;
fail) echo " ❌ Scan lỗi — xem ${PHASE_DIR}/scan-*.json (per-view atomic artifacts)" ;;
esac
}
narrate_goal_flow() {
# $1=gid, $2=title, $3=idx, $4=total
echo ""
echo " ▶ Flow [${3}/${4}] ${1}: ${2}"
}
narrate_goal_flow_step() {
# $1=n, $2=total, $3=action_vn, $4=target
echo " ${1}/${2} → ${3} ${4}"
}
narrate_goal_flow_end() {
# $1=gid, $2=status (passed|failed|blocked), $3=steps_captured, $4=reason
case "$2" in
passed) echo " ✅ Flow ${1} ghi ${3} bước, ready for /vg:test" ;;
failed) echo " ❌ Flow ${1} fail — ${4}" ;;
blocked) echo " ⚠ Flow ${1} blocked — ${4}" ;;
esac
}
Ví dụ user thấy khi /vg:review chạy:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔎 Phase 2a — Deploy + preflight
Triển khai code lên sandbox, kiểm tra health + seed data
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ Deploy OK (sha abc1234)
✓ Health: https://sandbox.example.com/health → 200
✓ Seed: 12 sites, 48 campaigns loaded
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔎 Phase 2b-1 — Navigator (Haiku)
Login, đọc sidebar, liệt kê tất cả views
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Phát hiện 14 views: /sites, /campaigns, /reports, /settings, ...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔎 Phase 2b-2 — Parallel scanners (8 Haiku agents)
Mỗi agent scan 1 view: modals, forms, interactions
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[1/14] 📄 Đang scan: /sites (role: publisher, ~32 elements)
✓ Scan xong — 2 issues phát hiện (12s)
[2/14] 📄 Đang scan: /campaigns (role: advertiser, ~48 elements)
✓ Scan xong — 0 issues (8s)
[3/14] 📄 Đang scan: /reports (role: admin, ~15 elements)
⚠ Scan 1 phần — 3 issues (14s)
...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔎 Phase 2b-3 — Goal sequence recording
Ghi lại chuỗi thao tác cho từng business goal
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▶ Flow [1/8] G-01: Tạo site mới với domain + brand safety
1/5 → 📍 Mở /sites
2/5 → 👆 Bấm "New Site"
3/5 → ⌨️ Điền domain
4/5 → 🔽 Chọn category
5/5 → ✓ Xác nhận toast "Site created"
✅ Flow G-01 ghi 5 bước, ready for /vg:test
▶ Flow [2/8] G-02: Edit site floor price
1/4 → ...
❌ Flow G-02 fail — button "Edit" không tìm thấy trên row
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔎 Phase 3 — Fix loop (iteration 1/3)
Sửa các bug MINOR, re-verify affected views
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ Fixed: /reports missing empty-state (1 file changed)
✓ Re-scan /reports: 0 issues
⚠ 2 MAJOR issues escalated to REVIEW-FEEDBACK.md
Rule: narrator gọi ở các điểm sau trong phase 2:
narrate_phase "Phase 2a — Deploy" "Triển khai + health"narrate_phase "Phase 2b-1 — Navigator" "Login, đọc sidebar..."Phát hiện N views: ...narrate_phase "Phase 2b-2 — Parallel scanners" + Spawning N Haiku agents...narrate_view_scan + narrate_view_donenarrate_phase "Phase 2b-3 — Goal flows" "Ghi chuỗi thao tác..."narrate_goal_flow + step loop + narrate_goal_flow_endnarrate_phase "Phase 3 — Fix loop" "Iteration {i}/3..."If --skip-discovery, skip to Phase 4.
If --evaluate-only, skip to Phase 2b-3 (collect + merge scan results) → Phase 3 → Phase 4.
Validate: ${PHASE_DIR}/nav-discovery.json AND at least 1 scan-*.json must exist.
Missing → BLOCK: "Run discovery first: $vg-review {phase} --discovery-only in Codex/Gemini."
If --retry-failed:
Validate: ${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md AND ${PHASE_DIR}/RUNTIME-MAP.json exist.
Missing → BLOCK: "Run /vg:review {phase} first to generate initial artifacts."
Parse GOAL-COVERAGE-MATRIX.md → collect all goals where status NOT IN (READY, INFRA_PENDING, DEFERRED, MANUAL).
This includes: BLOCKED, UNREACHABLE, FAILED, PARTIAL, NOT_SCANNED, and SUSPECTED (v2.46-wave3.2 — matrix=READY but no submit/2xx evidence; flagged by verify-matrix-staleness.py at step 0_parse_and_validate and folded in here).
If none found → print "All goals already READY. Nothing to retry." → skip to Phase 4.
Parse RUNTIME-MAP.json → for each failed goal_id: start_view = goal_sequences[goal_id].start_view RETRY_VIEWS[] = unique(all start_views), with roles from RUNTIME-MAP views[start_view].role
Print: "Retry mode: {N} failed/suspected goals → {M} views to re-scan: {RETRY_VIEWS[]}"
Skip Phase 1 (code scan). Skip 2b-0 (seed). Skip 2b-1 (navigator — reuse existing nav-discovery.json). Go directly to 2b-2 using RETRY_VIEWS[] as view_assignments (NOT view-assignments.json).
If --re-scan-goals=G-XX,G-YY,G-ZZ (v2.46-wave3.2):
Validate: ${PHASE_DIR}/RUNTIME-MAP.json exists (matrix not required — bypasses status filter).
Missing → BLOCK: "Run /vg:review {phase} first to generate RUNTIME-MAP.json."
Each goal ID validated already at step 0_parse_and_validate (unknown IDs → exit 1 there).
Parse RUNTIME-MAP.json → for each goal_id in $RE_SCAN_GOALS: start_view = goal_sequences[goal_id].start_view RETRY_VIEWS[] = unique(all start_views).
Print: "Re-scan mode: {N} explicit goals → {M} views: {RETRY_VIEWS[]}"
Skip Phase 1 + 2b-0 + 2b-1. Go directly to 2b-2 using RETRY_VIEWS[].
Marker: write ${PHASE_DIR}/.re-scan-goals.txt with the list (consumed by 2b-3 to scope sequence recording to just these goals).
If --dogfood (v2.46-wave3.2): Validate: ${PHASE_DIR}/TEST-GOALS.md AND ${PHASE_DIR}/RUNTIME-MAP.json exist.
Parse TEST-GOALS.md → all goals with non-empty **Mutation evidence:** field (see verify-matrix-staleness.py parse_goals for parser).
RE_SCAN_GOALS := comma-join(those goal_ids), then proceed exactly as --re-scan-goals branch above.
Print: "Dogfood mode: re-scanning ALL {N} mutation goals (regardless of matrix status)."
Deploy to target environment:
1. Record SHAs (local + target)
2. Build + restart on target
3. Health check → if fail → PRE-FLIGHT BLOCK (see below)
4. (v1.14.0+) Infra auto-start — nếu review.try_infra_start=true AND config có infra_start declared → chạy
5. DB seed (if configured): run_on_target "${config.environments[ENV].seed_command}"
(skip if seed_command not in config — portable)
6. Auth bootstrap (if configured):
For each role in config.credentials[ENV]:
Run config.environments[ENV].auth_command with role credentials
Save response token for API checks below
(skip if auth_command not in config — MCP login handles auth instead)
Read .claude/commands/vg/_shared/env-commands.md — deploy(env) + preflight(env).
Triết lý: Review hiện skip INFRA_PENDING goals (ClickHouse/Kafka/Pixel không chạy). Cổng 100% không cho phép skip — review phải tự khởi động hạ tầng để goals verify được.
# Gate 1: config knob enabled?
TRY_INFRA_START=$(yq '.review.try_infra_start // true' .claude/vg.config.md 2>/dev/null)
if [ "$TRY_INFRA_START" != "true" ]; then
echo "ℹ review.try_infra_start=false — bỏ qua bước khởi động hạ tầng"
else
# Gate 2: env có declare infra_start không?
INFRA_START=$(yq ".environments.${ENV}.infra_start // \"\"" .claude/vg.config.md 2>/dev/null)
INFRA_STOP=$(yq ".environments.${ENV}.infra_stop // \"\"" .claude/vg.config.md 2>/dev/null)
INFRA_STATUS=$(yq ".environments.${ENV}.infra_status // \"\"" .claude/vg.config.md 2>/dev/null)
if [ -z "$INFRA_START" ]; then
echo "ℹ Env '${ENV}' không declare infra_start — bỏ qua (infra không do review quản lý)"
else
# Gate 3: hạ tầng đã chạy sẵn chưa? (idempotent check)
ALREADY_RUNNING=false
if [ -n "$INFRA_STATUS" ]; then
if eval "$INFRA_STATUS" 2>/dev/null | grep -qiE "running|up|ok|online"; then
ALREADY_RUNNING=true
echo "✓ Hạ tầng đã chạy sẵn (infra_status detect)"
fi
fi
if [ "$ALREADY_RUNNING" = "false" ]; then
# Gate 4: khởi động hạ tầng + trap EXIT để dọn
narrate_phase "Phase 2a-infra — Khởi động hạ tầng" "Chạy infra_start + trap cleanup"
echo " Command: $INFRA_START"
# Chạy, capture exit code
eval "$INFRA_START"
INFRA_START_RC=$?
if [ $INFRA_START_RC -ne 0 ]; then
# Hard block — không skip theo cổng 100%
echo "⛔ infra_start THẤT BẠI (exit $INFRA_START_RC) — review không tiếp tục."
echo " Nguyên nhân khả dĩ: port conflict, resource thiếu, config sai."
echo " Debug: chạy '${INFRA_START}' thủ công xem stderr."
echo " Override: /vg:review ${PHASE_NUMBER} --legacy-mode (DEPRECATED, expire 2 milestones)"
exit 1
fi
echo " ✓ infra_start OK — trap infra_stop đã cài"
# Trap: auto dọn khi review thoát (normal/error/interrupt)
if [ -n "$INFRA_STOP" ]; then
trap "echo ' ♻ Dọn hạ tầng (infra_stop)...'; eval \"$INFRA_STOP\" 2>/dev/null || true" EXIT INT TERM
fi
# Chờ hạ tầng ready (retry health 30s)
for i in {1..30}; do
if eval "$INFRA_STATUS" 2>/dev/null | grep -qiE "running|up|ok|online"; then
echo " ✓ Hạ tầng ready sau ${i}s"
break
fi
sleep 1
done
# Emit telemetry
if type -t telemetry_emit >/dev/null 2>&1; then
telemetry_emit "review_infra_start_success" "{\"env\":\"${ENV}\",\"duration_s\":${i}}"
fi
fi
fi
fi
Tại sao không có AskUserQuestion:
Đây là autonomous action — config đã khai try_infra_start: true nghĩa là user OK. Nếu user không muốn auto-start → set false trong config. Giữa đêm chạy review mà lại hỏi user = anti-pattern.
Cleanup guarantee:
Trap EXIT INT TERM bắt mọi đường thoát (normal / error / Ctrl+C). Hạ tầng sẽ stop khi review kết thúc dù success hay fail. Ngoại lệ: SIGKILL (process killed) → trap không chạy → user phải thủ công infra_stop.
Cổng cứng:
infra_start fail → BLOCK. Không có "try again later" hay "skip INFRA_PENDING". Đây là điểm khác biệt cốt lõi với v1.13 — không cho phép defer hạ tầng.
Review fix loop can only fix CODE bugs. Infra failures (missing config, app down, domain unreachable) must be fixed BEFORE review can work.
Before entering Phase 2 browser discovery, verify:
PRE-FLIGHT CHECKLIST:
[ ] Build succeeded (exit 0, no TS/Rust compile errors)
[ ] Restart succeeded (pm2/systemd/dev_command exited 0, service running)
[ ] Health endpoint(s) return 200 — all entries in config.services[ENV]
[ ] All role domains from config.credentials[ENV] resolve + return any response (not ERR_CONNECTION)
[ ] At least 1 role can login successfully (curl auth endpoint, or MCP smoke login)
If ANY pre-flight fails → BLOCK review with DIAGNOSTIC + FIX GUIDANCE:
⛔ PRE-FLIGHT FAILED — review cannot proceed.
The review step fixes code bugs, not infrastructure. Fix the infra issue below, then re-run.
Issues detected:
[1] {category}: {specific error}
Example: "Build: ecosystem.config.js missing at apps/api/"
Example: "Health: api.{domain}/health returned 502"
Example: "Domain: advertiser.{domain} ERR_CONNECTION_REFUSED"
Example: "Login: admin@{domain} POST /auth/login returned 500"
┌─ What to fix (by category) ─────────────────────────────────┐
│ Build failure → Check compile errors, missing files, │
│ dependency conflicts. Fix then retry. │
│ Common: missing ecosystem.config.js, │
│ .env, turbo task, tsconfig paths. │
│ │
│ Health endpoint → Service didn't start or crashed. │
│ Check logs: pm2 logs / journalctl / │
│ dev server output. Usually missing │
│ env var, DB down, port conflict. │
│ │
│ Domain unreachable → Hostname not resolving or not served. │
│ Local: check /etc/hosts + dev proxy. │
│ Sandbox: check DNS + HAProxy/nginx. │
│ │
│ Login failure → Auth broken server-side (not code bug │
│ review can catch later). Check DB │
│ seed ran, user exists, JWT secret set. │
└─────────────────────────────────────────────────────────────┘
Next actions — choose scenario that matches your error, follow the exact commands:
First: read deploy log to identify exact error
`cat ${PLANNING_DIR}/phases/{phase}/deploy-review.json`
┌─────────────────────────────────────────────────────────────────────────┐
│ Scenario A — Deploy command WRONG in config │
│ (e.g., pm2 but no ecosystem.config.js, dev_command points to missing │
│ script, services[ENV] lists non-existent health endpoint) │
│ │
│ Fix: edit `.claude/vg.config.md` → environments.{ENV}.deploy.* │
│ or run: /vg:init (interactive config wizard) │
│ Then: /vg:review {phase} │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Scenario B — Service crashed / code error │
│ (logs show stack trace, 500 errors, module not found, port in use) │
│ │
│ Fix: inspect logs (pm2 logs / journalctl / dev output), fix code │
│ Then: /vg:review {phase} --retry-failed │
│ (--retry-failed only re-scans failed views → 5-10× faster) │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Scenario C — Feature genuinely NOT BUILT (status UNREACHABLE) │
│ Verify first: grep code for expected page file / route / handler. │
│ If grep returns NOTHING → truly not built. │
│ Symptoms: route missing, page file doesn't exist, sidebar link absent │
│ │
│ Fix: /vg:build {phase} --gaps-only (builds missing plans) │
│ Then: /vg:review {phase} --retry-failed │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Scenario C2 — Code BUILT but review didn't replay (status NOT_SCANNED) │
│ Verify first: grep confirms page file/route/handler EXIST. │
│ Common causes: │
│ • Multi-step wizard / mutation flow needs dedicated browser session │
│ • Orphan route not linked from sidebar → discovery missed it │
│ • Haiku scan timed out / hit max_actions for that view │
│ • --retry-failed was run but goal wasn't in the retry scope │
│ │
│ Fix: pick by cause: │
│ (a) Complex flow → /vg:test {phase} │
│ (codegen + Playwright auto-walks wizard, fills all steps) │
│ (b) Orphan route → add sidebar link or update nav-discovery seed, │
│ then /vg:review {phase} --retry-failed │
│ (c) Timeout/scope → /vg:review {phase} --retry-failed │
│ (fresh re-scan of only failed views, bypass cache) │
│ │
│ DO NOT run /vg:build --gaps-only — it'll regenerate plans for code │
│ that already exists and waste tokens. │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Scenario D — Auth/DB setup missing │
│ (login 500, seed user not found, JWT signature invalid) │
│ │
│ Fix: run project seed (e.g., pnpm db:seed), verify .env has secrets │
│ Then: /vg:review {phase} --retry-failed │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Scenario E — Cross-CLI (reduce token cost by splitting work) │
│ │
│ Discovery (cheap, any CLI with browser): │
│ $vg-review {phase} --retry-failed --discovery-only (Codex) │
│ /vg-review {phase} --retry-failed --discovery-only (Gemini) │
│ Evaluate + fix (Claude only): │
│ /vg:review {phase} --evaluate-only │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Scenario F — External infra unavailable (ClickHouse, Kafka, pixel srv) │
│ Some goals need services not running on current ENV. │
│ Symptoms: 500 on events/stats endpoints, 502 on postback test, │
│ ClickHouse table not found, Kafka ECONNREFUSED. │
│ │
│ This is NOT a code bug — code is correct but infra missing. │
│ │
│ ⚠ ANTI-PATTERN WARNING (v1.9.1 R2 + v1.9.2 P4): │
│ Do NOT fall back to "list 3 options (A/B/C) and wait". │
│ Use `block_resolve` helper — L1 auto-try `--skip`, L2 architect │
│ proposal for cross-env retry, L3 provider-native prompt if needed. │
│ │
│ Block-resolver handler: │
└─────────────────────────────────────────────────────────────────────────┘
```bash
# v1.9.2 P4 — Scenario F resolver (replaces legacy A/B/C prompt)
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/block-resolver.sh" 2>/dev/null || true
if type -t block_resolve >/dev/null 2>&1; then
export VG_CURRENT_PHASE="$PHASE_NUMBER" VG_CURRENT_STEP="review.infra-unavailable"
BR_GATE_CONTEXT="External infra (${UNAVAILABLE_SERVICES:-unknown}) not reachable on env='${ENV}'. ${INFRA_PENDING_GOALS:-?} goals blocked. User must choose: continue local with skip, switch to sandbox, or partial (local + sandbox retry)."
BR_EVIDENCE=$(printf '{"env":"%s","unavailable":"%s","pending_goals":"%s"}' "$ENV" "${UNAVAILABLE_SERVICES:-unknown}" "${INFRA_PENDING_GOALS:-0}")
BR_CANDIDATES='[
{"id":"skip-infra-goals","cmd":"echo \"Setting infra_deps.unmet_behavior=skip for this run\" && export CONFIG_INFRA_DEPS_UNMET_BEHAVIOR=skip","confidence":0.75,"rationale":"Skip infra-dependent goals = valid strategy for code-only review passes"}
]'
BR_RESULT=$(block_resolve "infra-unavailable" "$BR_GATE_CONTEXT" "$BR_EVIDENCE" "$PHASE_DIR" "$BR_CANDIDATES")
BR_LEVEL=$(echo "$BR_RESULT" | ${PYTHON_BIN} -c "import json,sys; print(json.loads(sys.stdin.read()).get('level',''))" 2>/dev/null)
case "$BR_LEVEL" in
L1) echo "✓ L1 resolved — continuing local review with infra goals skipped" >&2 ;;
L2) block_resolve_l2_handoff "infra-unavailable" "$BR_RESULT" "$PHASE_DIR"; exit 2 ;;
*) block_resolve_l4_stuck "infra-unavailable" "All candidates failed + no architect proposal"; exit 1 ;;
esac
fi
Semantic fallback (if resolver unavailable — provider-native prompt):
Verify smoke test before any re-run: curl {config.services[ENV][0].health} # must return 200 curl -I https://{config.credentials[ENV][0].domain} # NOT ERR_CONNECTION
**Only when ALL pre-flight checks pass** → proceed to Phase 2b Browser Discovery.
API integration check — curl each endpoint in API-CONTRACTS.md:
For each endpoint parsed from API-CONTRACTS.md: If endpoint requires auth → include auth token header curl endpoint on target → record status code + response shape
### 2b: Discovery — 2-Tier Deep Scan (Opus + Haiku)
**Architecture: Opus discovers views (minimal browser), Haiku agents scan exhaustively (1 per view).**
- **Opus**: list views (1 sidebar snapshot + read SPECS), spawn Haiku, merge results, evaluate
- **Haiku**: fixed workflow scanner — click ALL elements, fill ALL forms, recurse into ALL modals. Context tiny → no lazy behavior.
**Why Haiku, not Sonnet**: AI laziness correlates with context length. Haiku agents receive a short prompt + 1 URL = near-zero context = maximum depth. Each Haiku scans 1 view exhaustively rather than skimming many views.
**MCP Server Selection:** Each Haiku agent auto-claims its own Playwright server via lock manager.
Up to 5 parallel browser sessions (5 Playwright slots configured).
#### 2b-0: Seed Data (if configured)
Read vg.config.md → check if seed_command exists for current ENV IF seed_command exists: Run: {RUN_PREFIX} "{seed_command}" Wait for completion → log output Purpose: ensure diverse data (multiple statuses, types) so Haiku can sample representative rows IF seed_command missing: skip silently (not a blocker)
#### 2b-1: Discover Views (Haiku navigator — Opus does NOT touch browser)
Opus reads files only (no browser):
Read SPECS.md → extract "In Scope" → grep route patterns Read PLAN.md → extract task descriptions → grep URL patterns Read SUMMARY.md → extract "files changed" → map to routes → expected_views = ["/sites", "/sites/:id", "/ad-units", ...]
⛔ REGISTERED ROUTES scan (tightened 2026-04-17 — fix critical miss):
Sidebar DOM chỉ show top-level nav. Sub-routes đăng ký trong router config
(ví dụ React Router <Route path="...">, Next.js app/pages dir, Vue Router,
Flutter GoRouter) thường KHÔNG hiện trong sidebar → scanner miss → mark UNREACHABLE.
Trước khi spawn navigator, đọc route registrations từ code — pure config-driven, no defaults:
REGISTERED_ROUTES=""
# Source 1 (preferred): graphify query — chỉ chạy khi có cả graph + predicate
ROUTE_PRED="${config.graphify.route_predicate:-}"
if [ "$GRAPHIFY_ACTIVE" = "true" ] && [ -n "$ROUTE_PRED" ]; then
REGISTERED_ROUTES=$(ROUTE_PRED="$ROUTE_PRED" \
ROUTE_EXTRACT="${config.graphify.route_path_extract:-}" \
${PYTHON_BIN} -c "
import json, os, re, sys pred = os.environ.get('ROUTE_PRED', '') extract = os.environ.get('ROUTE_EXTRACT', '') if not pred or not extract: sys.exit(0) # config incomplete → skip graph_path = os.environ.get('GRAPHIFY_GRAPH_PATH') if not graph_path or not os.path.exists(graph_path): sys.exit(0) graph = json.load(open(graph_path, encoding='utf-8')) hits = set() for n in graph.get('nodes', []): blob = ' '.join(str(n.get(k,'')) for k in ('label','type','file')) if not re.search(pred, blob): continue m = re.search(extract, blob) if m: hits.add(m.group(1) if m.groups() else m.group(0)) for h in sorted(hits): print(h) " 2>/dev/null) fi
ROUTE_GLOB="${config.code_patterns.frontend_routes:-}" ROUTE_REGEX="${config.code_patterns.route_path_regex:-}" if [ -z "$REGISTERED_ROUTES" ] && [ -n "$ROUTE_GLOB" ] && [ -n "$ROUTE_REGEX" ]; then REGISTERED_ROUTES=$(grep -rhoE "$ROUTE_REGEX" $ROUTE_GLOB 2>/dev/null | sort -u) fi
if [ -n "$REGISTERED_ROUTES" ]; then COUNT=$(echo "$REGISTERED_ROUTES" | wc -l | tr -d ' ') echo "✓ Found ${COUNT} route registrations từ code (source: $([ "$GRAPHIFY_ACTIVE" = true ] && [ -n "$ROUTE_PRED" ] && echo graphify || echo grep))" elif [ -z "$ROUTE_PRED" ] && [ -z "$ROUTE_GLOB" ]; then echo "⚠ Route discovery KHÔNG được cấu hình (neither config.graphify.route_predicate" echo " nor config.code_patterns.frontend_routes + route_path_regex set)." echo " Review sẽ CHỈ dựa sidebar DOM → CÓ THỂ miss routes không trên menu." echo " Add vào vg.config.md (pick 1 source, ví dụ theo stack của bạn — workflow không đoán hộ):" echo "" echo " # Via grep (universal, cần regex ngôn ngữ):" echo " code_patterns:" echo " frontend_routes: '<glob tới route config files>'" echo " route_path_regex: '<regex extract path với capture group>'" echo "" echo " # HOẶC via graphify knowledge graph:" echo " graphify:" echo " route_predicate: '<regex match node.label/type/file>'" echo " route_path_extract: ''" else echo "⚠ Route config partial (need BOTH pattern+extract hoặc predicate+extract) → skip code scan." fi
**Config keys (pure config-driven, workflow KHÔNG có stack defaults):**
- `code_patterns.frontend_routes` — glob tới file chứa route declarations
- `code_patterns.route_path_regex` — regex với capture group trả về route path
- `graphify.route_predicate` — regex matching graphify node (label/type/file) identify route
- `graphify.route_path_extract` — regex với capture group extract path từ matched node
**Nguyên tắc:** Thiếu cả 2 source → warn + sidebar-only. Project quyết định stack, workflow chỉ là engine.
(Examples per stack để tham khảo user-side; KHÔNG fallback trong code workflow.)
3. Load KNOWN-ISSUES.json (if exists):
Filter: issues where suggested_phase == current phase OR status == "open"
4. Create GOAL-COVERAGE-MATRIX.md (all ⬜ UNTESTED)
5. Spawn 1 Haiku navigator agent (Agent tool, model="haiku"):
prompt = """
You are a navigator agent. Login and extract all navigation URLs.
## CONNECTION
SESSION_ID="haiku-nav-{phase}-$$"
MCP_PREFIX=$(bash "${HOME}/.claude/playwright-locks/playwright-lock.sh" claim "$SESSION_ID")
trap "bash '${HOME}/.claude/playwright-locks/playwright-lock.sh' release \"$SESSION_ID\" 2>/dev/null" EXIT INT TERM
Use returned $MCP_PREFIX as server for all browser tool calls.
## TASK
1. Login: {domain}/login | {email} | {password} (use first role from config)
2. browser_snapshot → read sidebar/nav menu (top-level visible links)
3. Extract ALL visible navigation URLs
4. **⛔ HARD RULE (tightened 2026-04-17): REGISTERED_ROUTES list được inject vào prompt.**
Agent PHẢI visit EVERY route trong REGISTERED_ROUTES list, KHÔNG CHỈ sidebar.
Route không có trong sidebar = "hidden_but_registered" → truy cập qua direct URL.
Nếu visit route bị redirect (ví dụ → /login, → /403), ghi lại reason.
5. For each URL with :id params:
Navigate to list page → snapshot → pick first row → extract real URL
6. Write ${PHASE_DIR}/nav-discovery.json với schema mở rộng:
{
"sidebar_views": ["/sites", "/campaigns"],
"registered_routes_visited": ["/sites", "/audit-log", "/settings/roles", ...],
"hidden_but_registered": ["/audit-log", "/settings/roles"],
"redirected": {"/settings/billing": "/403"},
"actual_views": ["/sites", "/campaigns", "/audit-log", "/settings/roles", ...]
}
7. browser_close
8. bash "${HOME}/.claude/playwright-locks/playwright-lock.sh" release "haiku-nav-{phase}-$$"
## INJECTED DATA
REGISTERED_ROUTES = [{from step 2 above — list from code scan}]
SIDEBAR_ONLY_HINT = false # default: visit all registered routes
"""
6. Wait for Haiku navigator → Read nav-discovery.json
actual_views = parsed JSON .actual_views[] (already union of sidebar + registered)
7. Merge: union(expected_views, actual_views), deduplicated, within phase scope
Flag `hidden_but_registered` routes explicitly trong view-assignments.json
(Haiku scanner phase 2b-2 thấy flag này → biết access qua direct URL, không click sidebar)
8. **IMMEDIATELY write view-assignments.json** — do NOT hold in context:
Write ${PHASE_DIR}/view-assignments.json:
{
"phase": "{phase}",
"generated_at": "{ISO timestamp}",
"views": [
{ "url": "/sites", "roles": ["admin", "publisher"], "param_example": null, "source": "sidebar" },
{ "url": "/sites/123", "roles": ["publisher"], "param_example": "123", "source": "sidebar" },
{ "url": "/audit-log", "roles": ["admin"], "param_example": null, "source": "registered_hidden", "access_via": "direct_url" },
{ "url": "/settings/roles", "roles": ["admin"], "param_example": null, "source": "registered_hidden", "access_via": "direct_url" }
]
}
Trường `source` giúp Haiku scanner biết cách navigate:
- `sidebar` → click từ menu
- `registered_hidden` → `browser_navigate` direct URL (không có menu entry)
After writing: DISCARD view list from context. Read from file when needed.
Output: view-assignments.json written to disk. Context cleared.
<FLUSH_RULE> After step 8 writes view-assignments.json, you MUST NOT keep the view list in your response text. Do NOT summarize the views found. Do NOT repeat the list. Simply write: "view-assignments.json written — {N} views × {M} roles = {K} scan jobs." Then immediately proceed to 2b-2 (spawn Haiku). </FLUSH_RULE>
<DEEPSCAN_OPT_IN_GATE_v2.42.4> v2.42.4+ refactor — Phase 2b-2 default OFF.
Per the 3-tier review/test/roam refactor (see .vg/research/ROAM-RFC-v1.md
and PLAN-vgflow-2026-05-01.md Part D), exhaustive UI exploration moves
to /vg:roam. /vg:review keeps light browser smoke (Phase 2b-1 navigator
Skip 2b-2 entirely UNLESS one of these holds:
$ARGUMENTS contains --with-deepscan, OR$ARGUMENTS contains --full-scan (legacy alias, kept for backward compat), ORCONFIG_REVIEW_DEEPSCAN_DEFAULT is set to on in vg.config.md (per-project opt-back-in)Skip narration:
echo "▸ Phase 2b-2 (Haiku per-view exhaustive scan) skipped — v2.42.4 default off."
echo " Lens-driven exhaustive exploration now lives in /vg:roam (post-test janitor)."
echo " Pass --with-deepscan to opt back in for this run, or set"
echo " config.review.deepscan_default=on in vg.config.md for project-wide opt-in."
After echoing skip narration, jump directly to phase2b-3 (goal sequence recording). The MANDATORY_GATE block below applies ONLY when 2b-2 is gated ON via the conditions above. </DEEPSCAN_OPT_IN_GATE_v2.42.4>
<MANDATORY_GATE> Applies only when 2b-2 is gated ON (--with-deepscan, --full-scan, mobile-*, or config opt-in).
You MUST run the provider-native scanner protocol in step 2b-2 (unless spawn_mode=none for cli-tool/library profiles). This is NOT optional.
codex-inline; optionally use codex-spawn.sh --tier scanner --sandbox read-only only for non-MCP classification over captured snapshots. Do not ask to spawn Haiku on Codex.scan-*.json, RUNTIME-MAP merge, GOAL-COVERAGE-MATRIX impact, and review.haiku_scanner_spawned telemetry semantics.
</MANDATORY_GATE><SPAWN_MODE_RESOLUTION> v1.9.4 R3.3 — Scanner spawn mode (mobile sequential constraint):
Mobile apps (iOS simulator, Android emulator, physical device) can typically run only ONE instance at a time. Spawning 5 parallel Haiku agents on a single emulator causes conflicts / crashes / app state corruption. CLI/library projects have no UI to scan at all.
# Resolve scanner spawn mode BEFORE entering spawn loop
resolve_scanner_spawn_mode() {
local mode="${CONFIG_REVIEW_SCANNER_SPAWN_MODE:-auto}"
if [ "$mode" != "auto" ]; then
echo "$mode"
return
fi
# Auto-derive from config.profile
case "${CONFIG_PROFILE:-web-fullstack}" in
mobile-rn|mobile-flutter|mobile-native-ios|mobile-native-android|mobile-hybrid)
echo "sequential" # Single emulator/simulator/device
;;
cli-tool|library)
echo "none" # No UI to scan
;;
web-fullstack|web-frontend-only|web-backend-only|*)
echo "parallel" # Default — multiple browser contexts supported
;;
esac
}
SPAWN_MODE=$(resolve_scanner_spawn_mode)
echo ""
echo "▸ Scanner spawn mode: ${SPAWN_MODE} (profile: ${CONFIG_PROFILE:-web-fullstack})"
case "$SPAWN_MODE" in
sequential)
echo "📱 Sequential mode — 1 Haiku agent at a time (mobile/single-window constraint)"
echo " Tổng ${TOTAL} view sẽ scan tuần tự; thời gian ~${TOTAL}×5min (1 agent/view)"
;;
parallel)
echo "🌐 Parallel mode — up to 5 Haiku agents concurrent (Playwright lock caps)"
echo " Tổng ${TOTAL} view; thời gian ~${TOTAL}/5 × 5min"
;;
none)
echo "⏭ Spawn mode=none — skipping Phase 2b-2 entirely (profile has no UI scan)"
echo " Backend goals resolved via surface probes in Phase 4a instead."
;;
*)
echo "⚠ Unknown spawn_mode=${SPAWN_MODE} — falling back to parallel" >&2
SPAWN_MODE="parallel"
;;
esac
Behavior branch by mode:
parallel (web default): All Agent(model="haiku", ...) calls in ONE tool_use block → Claude Code harness runs them concurrently. Playwright lock manager caps effective concurrency at 5 slots.
sequential (mobile default): Each Agent(model="haiku", ...) call in SEPARATE messages, awaiting completion before spawning next. Guarantees single emulator/device state. User sees 1/N → 2/N → ... progression serially.
none (cli-tool/library): Skip 2b-2 entirely. Jump to 2b-3 collect phase (will merge 0 scans). Phase 4 goal coverage relies 100% on surface probes (api/data/integration/time-driven) from Phase 4a.
Override via config: Set review.scanner_spawn_mode: "sequential" in vg.config.md to force sequential even for web projects (e.g., if CI has constrained browser resources).
</SPAWN_MODE_RESOLUTION>
<REREAD_REQUIRED>
Before spawning Haiku agents, you MUST re-read view-assignments.json via the Read tool
(fixes I5). The <FLUSH_RULE> in step 2b-1 required discarding the view list from context
to save tokens. That means right now you don't have it — do NOT guess view URLs or roles
from memory. Call Read on ${PHASE_DIR}/view-assignments.json FIRST, then iterate the
parsed .views[] to spawn one Haiku per (view × role) pair.
If --retry-failed mode, read view-assignments-retry.json instead. Both files share
the same schema; iteration logic is identical.
</REREAD_REQUIRED>
Spawn 1 Haiku agent per view using Agent tool with model="haiku".
Each agent scans 1 view exhaustively with a FIXED workflow — no discretion to skip.
Bootstrap rules injection (v1.15.0+): Before spawning each Haiku scanner, render + inject promoted project rules so scanners see project-specific checks (e.g. "verify data persists after mutation" rule L-050 will fire here):
source "${REPO_ROOT:-.}/.claude/commands/vg/_shared/lib/bootstrap-inject.sh"
BOOTSTRAP_RULES_BLOCK=$(vg_bootstrap_render_block "${BOOTSTRAP_PAYLOAD_FILE:-}" "review")
vg_bootstrap_emit_fired "${BOOTSTRAP_PAYLOAD_FILE:-}" "review" "${PHASE_NUMBER}"
Then in each Haiku prompt body, include:
<bootstrap_rules>
${BOOTSTRAP_RULES_BLOCK}
</bootstrap_rules>
Position: after static <scanner_workflow> block, before <view_assignment>.
Scanner skill treats rules as additional per-element checks on top of fixed protocol.
IF --retry-failed: Normalize RETRY_VIEWS[] → view-assignments-retry.json (same schema as view-assignments.json): { "phase": "{phase}", "generated_at": "{ISO}", "mode": "retry-failed", "views": [{"url": "/sites", "roles": ["publisher"], "param_example": null}, ...] } READ view-assignments-retry.json ELSE: READ ${PHASE_DIR}/view-assignments.json (both paths → same schema → downstream code identical)
view_assignments = parsed .views[]
🎬 Pre-spawn briefing (tightened 2026-04-17 — user biết agent sẽ làm gì):
Trước mỗi spawn, orchestrator phải:
start_view == view.url HOẶC flow references viewdescription của Agent tool theo format structured, không freeformbriefing_for_view() {
local VIEW_URL="$1" ROLE="$2" IDX="$3" TOTAL="$4"
# Parse TEST-GOALS.md → collect goals whose start_view or flow touches this view
local BRIEFING=$(${PYTHON_BIN} - <<PY 2>/dev/null
import re, os, sys
view_url = "$VIEW_URL"
phase_dir = os.environ.get("PHASE_DIR", ".")
import glob
tg_files = glob.glob(f"{phase_dir}/*TEST-GOALS*.md")
if not tg_files:
sys.exit(0)
tg = open(tg_files[0], encoding="utf-8").read()
# Parse goal blocks: "## Goal G-XX: title\n...**Start view:** /path\n**Success criteria:** ...\n**Mutation evidence:** ..."
blocks = re.split(r'^##\s*Goal\s+', tg, flags=re.M)
hits = []
for blk in blocks[1:]:
m = re.match(r'(G-\d+)[:\s]+(.+?)\n', blk)
if not m: continue
gid, title = m.group(1), m.group(2).strip()
# Match by start_view OR mention in flow
start = re.search(r'\*\*Start view:\*\*\s*(\S+)', blk)
touches = (start and start.group(1) == view_url) or (view_url in blk)
if not touches: continue
crit = re.search(r'\*\*Success criteria:\*\*\s*(.+?)(?:\n\*\*|\n##|\Z)', blk, re.S)
mut = re.search(r'\*\*Mutation evidence:\*\*\s*(.+?)(?:\n\*\*|\n##|\Z)', blk, re.S)
prio = re.search(r'\*\*Priority:\*\*\s*(\w+)', blk)
hits.append({
"gid": gid, "title": title[:80],
"priority": (prio.group(1) if prio else "important").lower(),
"criteria": (crit.group(1).strip()[:120] if crit else ""),
"mutation": (mut.group(1).strip()[:100] if mut else ""),
})
for h in hits:
print(f"{h['gid']}|{h['priority']}|{h['title']}|{h['criteria']}|{h['mutation']}")
PY
)
echo ""
echo "┌─────────────────────────────────────────────────────────────"
echo "│ [${IDX}/${TOTAL}] Haiku scanner briefing"
echo "├─────────────────────────────────────────────────────────────"
echo "│ 📄 View: ${VIEW_URL}"
echo "│ 👤 Role: ${ROLE}"
if [ -z "$BRIEFING" ]; then
echo "│ 🎯 Goals: (none mapped — exploratory scan, fill gaps)"
else
echo "│ 🎯 Goals sẽ cover:"
while IFS='|' read -r gid prio title crit mut; do
[ -z "$gid" ] && continue
echo "│ • ${gid} [${prio}] ${title}"
[ -n "$crit" ] && echo "│ ✓ Expect: ${crit}"
[ -n "$mut" ] && echo "│ Δ Mutation: ${mut}"
done <<< "$BRIEFING"
fi
echo "│ 🔎 Agent sẽ:"
echo "│ - Login as ${ROLE} → navigate to ${VIEW_URL}"
echo "│ - Snapshot + enumerate all modals/forms/interactive elements"
echo "│ - For each goal above: replay interaction flow, capture evidence"
echo "│ - Log console.error + network 4xx/5xx per step"
echo "│ - Output: scan-${VIEW_URL//\//_}-${ROLE}.json"
echo "└─────────────────────────────────────────────────────────────"
}
Then spawn with structured description (thay vì freeform).
⚠ SPAWN_MODE enforcement (v1.9.4 R3.3) — orchestrator branching:
| SPAWN_MODE | Tool-use pattern | Use case |
|---|---|---|
none | Skip spawn loop entirely, write empty scan-manifest, jump to 2b-3 | cli-tool, library (no UI to scan) |
sequential | Each Agent() call in SEPARATE message, await each complete before next | mobile-* (single emulator/device) |
parallel | All Agent() calls in ONE tool_use block, harness runs concurrent ≤5 | web-* (default, multi-browser contexts) |
When SPAWN_MODE=none: orchestrator writes empty scan-manifest.json then skips to 2b-3:
${PYTHON_BIN} -c "
import json; from pathlib import Path
Path('${PHASE_DIR}/scans').mkdir(exist_ok=True)
Path('${PHASE_DIR}/scan-manifest.json').write_text(json.dumps({
'mode': 'skipped_no_ui',
'profile': '${CONFIG_PROFILE}',
'scans': []
}, indent=2))"
# → proceed to 2b-3 collect (which handles empty scans gracefully)
When SPAWN_MODE=sequential (mobile): iterate view_assignments ONE AT A TIME. Each Agent() call in a separate message — DO NOT batch them into one tool_use block. Narrate [idx/total] spawning <view>@<role>... before each, [idx/total] done (<N goals, <M regressions>) after. User sees serial progression.
When SPAWN_MODE=parallel (web): batch ALL Agent() calls in ONE tool_use block so Claude harness dispatches them concurrently (Playwright lock manager caps at 5 slots).
Common spawn pattern (applies to both sequential and parallel):
For each view in view_assignments:
For each role that can access this view (from config.credentials):
IDX=$((IDX + 1))
briefing_for_view "{view.url}" "{role}" "$IDX" "$TOTAL"
# ─── Phase 15 D-17 telemetry (BEFORE spawn, not after) ──────────────
# Emit `review.haiku_scanner_spawned` IMMEDIATELY before Agent() so the
# event survives Agent failure / run abort. Validator
# verify-haiku-spawn-fired.py (Phase 15 T3.11) reads this in events.db
# to confirm step 2b-2 actually fired for UI-profile phases. Without
# this, a non-deterministic spawn failure could leave the validator
# unable to distinguish "spawn never attempted" from "spawn attempted
# but Agent crashed". Order matters: emit BEFORE Agent call.
#
# Parallel mode: emit per-spawn in a serial bash loop, THEN batch all
# Agent() calls in one tool_use block. Sequential mode: emit
# immediately before each Agent() call individually.
Bash:
${PYTHON_BIN} .claude/scripts/vg-orchestrator emit-event \
"review.haiku_scanner_spawned" \
--step "2b-2" --actor "orchestrator" --outcome "INFO" \
--payload "$(printf '{"view":"%s","role":"%s","idx":%d,"total":%d,"spawn_mode":"%s"}' \
"{view.url}" "{role}" "$IDX" "$TOTAL" "$SPAWN_MODE")" \
2>/dev/null || true
Agent(
model="haiku",
description="[{IDX}/{TOTAL}] {ROLE}@{view.url} — verify {N} goals: {G-XX,G-YY,...}"
):
prompt = """
Read skill: vg-haiku-scanner (at .claude/skills/vg-haiku-scanner/SKILL.md)
Follow it exactly. Inject these args into the workflow:
PHASE = "{phase}"
VIEW_URL = "{view.url — substitute param_example if :id pattern}"
VIEW_SLUG = "{filesystem-safe slug from VIEW_URL}"
ROLE = "{role}"
BOUNDARY = "{allowed URL pattern for this view}"
DOMAIN = "{role.domain from config.credentials[ENV]}"
EMAIL = "{role.email}"
PASSWORD = "{role.password}"
PHASE_DIR = "{absolute ${PHASE_DIR}}"
SCREENSHOTS_DIR= "{absolute ${SCREENSHOTS_DIR}}"
FULL_SCAN = {true if --full-scan flag set else false}
GOALS_COVERED = [{G-XX, G-YY, ...} — from briefing_for_view parse]
GOAL_BRIEFS = {gid: {title, criteria, mutation, priority} — full context for prompts}
The skill contains the full workflow (login, sidebar suppression, STEP 1-5,
element interaction rules, output JSON schema, hard rules, cleanup).
Do NOT invent variations. Execute skill verbatim.
Report progress back in description updates (Agent tool surfaces `description`
in main terminal — update per goal processed so user sees progress).
"""
# Inline prompt collapsed — full workflow lives in skill file to keep context small.
Description format (structured, parseable):
[{idx}/{total}] {role}@{view} — verify {N} goals: {G-list} — lúc spawn[{idx}/{total}] {role}@{view} — G-03/5 filling form... — trong lúc chạy (Haiku update)[{idx}/{total}] {role}@{view} — ✓ 4/5 goals, 1 regression — khi xongUser sẽ thấy banner đầy đủ BEFORE spawn + structured description trong/sau spawn.
**Limits (per Haiku agent):**
- Max 200 actions per view (prevents runaway on huge pages)
- Max 10 min wall time per agent
- Stagnation: same state 3x = stuck, move on
- **Concurrency (v1.9.4 R3.3 SPAWN_MODE aware):**
- `parallel` mode: up to 5 Haiku agents concurrent (Playwright slot cap)
- `sequential` mode: exactly 1 Haiku agent at a time (mobile safety)
- `none` mode: no Haiku agents spawned (cli-tool/library)
</step>
<step name="phase2_5_recursive_lens_probe" profile="web-fullstack,web-frontend-only" mode="full">
#### 2b-2.5: Recursive Lens Probe (v2.40, manager dispatcher)
**Purpose:** After parallel Haiku scanners (2b-2) complete, run the recursive lens probe layer to deep-dive each interesting clickable through bug-class lenses (authz-negative, csrf, idor, ssrf, ...). Manager dispatcher reads scan-*.json, classifies clickables into element classes, picks lenses per class, spawns workers in parallel (auto), generates prompt files (manual), or both (hybrid). Goals discovered by lens probes are merged single-writer into TEST-GOALS-DISCOVERED.md.
**Task 36b dispatch chain (wires Task 26 infrastructure):**
Phase 2b-2.5 now runs a 5-step chain:
1. `emit-dispatch-plan.py` — emit LENS-DISPATCH-PLAN.json (trust anchor, declares all APPLICABLE dispatches before any spawn)
2. `spawn_recursive_probe.py --dispatch-plan` — iterate per dispatch with `lens_tier_dispatcher.select_tier()` per-lens model selection + `plan_hash` anti-reuse stamp
3. `verify-lens-runs-coverage.py` — assert every APPLICABLE dispatch has a matching artifact
4. `lens-coverage-matrix.py` — render LENS-COVERAGE-MATRIX.md (always, even on failure)
5. Coverage failure → `blocking_gate_prompt_emit` (Task 33 wrapper, NOT `exit 1`)
**Eligibility (6 rules — all must pass unless `--skip-recursive-probe` is set):**
1. `.phase-profile` declares `phase_profile ∈ {feature, feature-legacy, hotfix}`
2. `.phase-profile` declares `surface ∈ {ui, ui-mobile}` (NOT visual-only)
3. `CRUD-SURFACES.md` declares ≥1 resource
4. `SUMMARY.md` / `RIPPLE-ANALYSIS.md` lists ≥1 `touched_resources` intersecting CRUD
5. `surface != 'visual'`
6. `ENV-CONTRACT.md` present, `disposable_seed_data: true`, all `third_party_stubs` stubbed
If eligibility fails → write `.recursive-probe-skipped.yaml` and continue to 2b-3 (no error).
<MANDATORY_GATE>
**You MUST run the provider-native user prompt below BEFORE invoking the bash block** — unless `--non-interactive` / `VG_NON_INTERACTIVE=1` is set, OR all three axes (`--recursion`, `--probe-mode`, `--target-env`) were already passed on the `/vg:review` command line.
- Do NOT skip the pre-flight because "defaults look fine" — the operator must explicitly choose recursion depth, probe execution mode, and target environment per run.
- Do NOT delegate the prompt to `spawn_recursive_probe.py` stdin — Claude Code's bash sandbox makes `sys.stdin.isatty()` return False, so script-side prompts silently fall back to defaults.
- The bash block at the end of this section will refuse to launch (loud abort + telemetry) if it detects an interactive run with no env vars set, which means the pre-flight was skipped.
- Claude Code path: use `AskUserQuestion`. Codex path: ask the same concise questions in the main Codex thread or closest available Codex input UI.
- After the prompt answers, emit telemetry event `review.recursive_probe.preflight_asked` (logs the chosen axes for audit).
</MANDATORY_GATE>
**Pre-flight (v2.41.1) — operator config via provider-native prompt:**
> ⚠ Why this lives in the command layer (not script stdin):
> Claude Code wraps bash in a sandbox where `sys.stdin.isatty()` returns `False`,
> so the script-side `input()` prompts in `spawn_recursive_probe.py` silently fall
> back to defaults (`light` / `auto` / `sandbox`) without the operator ever
> seeing them. To deliver an actual interactive UX under Claude Code, the
> command layer asks **before** invoking bash, then exports the answers as
> env vars that bash forwards via flags.
Phase 2b-2.5 has three operator-controlled axes. The orchestrator MUST resolve
all three before invoking bash:
| Env var | Source priority | Default |
|---|---|---|
| `RECURSION_MODE` | (1) `--recursion` CLI flag → (2) provider-native prompt → (3) `light` | `light` |
| `PROBE_MODE` | (1) `--probe-mode` CLI flag → (2) provider-native prompt → (3) `auto` | `auto` |
| `TARGET_ENV` | (1) `--target-env` CLI flag → (2) `vg.config review.target_env` → (3) provider-native prompt → (4) `sandbox` | `sandbox` |
**Resolution procedure (the orchestrator runs these BEFORE the bash block):**
1. **Parse `/vg:review` CLI args.** For each of `--recursion`, `--probe-mode`,
`--target-env` that the operator passed, set the matching env var
(`RECURSION_MODE` / `PROBE_MODE` / `TARGET_ENV`) and skip its prompt.
2. **Skip prompts entirely if `VG_NON_INTERACTIVE=1`** (CI / piped runs) —
downstream defaults apply.
3. **For each axis still unset, run the provider-native prompt** with the spec below.
Ask in this order, ONE call per axis (so operator answers can short-circuit
the next prompt — e.g. picking `skip` for probe-mode means we skip the
target-env question because no probes will fire).
**Question 1 — `RECURSION_MODE` (depth/coverage envelope):**
- `light` *(recommended)* — ~15 workers, depth 2, goal cap 50. Quick coverage on touched resources only.
- `deep` — ~40 workers, depth 3, goal cap 150. Typical dogfood pass.
- `exhaustive` — ~100 workers, depth 4, goal cap 400. Pre-release sweep; expect ≥30min wall-clock.
**Question 2 — `PROBE_MODE` (execution strategy):**
- `auto` *(recommended)* — VG spawns Gemini Flash subprocess workers end-to-end.
- `manual` — VG generates per-tool prompt files (`recursive-prompts/{codex,gemini}/`) for paste; operator runs CLI session, drops artifacts in `runs/<tool>/`, VG verifies. Pick when subprocess sandboxing isn't available.
- `hybrid` — auto for high-confidence lenses (authz-negative, idor, csrf, ...), manual for human-judgment ones (business-logic, ssrf, auth-jwt). Routing comes from `vg.config review.recursive_probe.hybrid_routing`.
- `skip` — emit `.recursive-probe-skipped.yaml` and continue to 2b-3. Logs OVERRIDE-DEBT critical with reason `"interactive: operator chose skip"`. Use when the recursive layer would be redundant (e.g. follow-up review of a phase that already passed 2b-2.5).
**Question 3 — `TARGET_ENV` (deploy environment policy):** *only ask if probe-mode ≠ skip.*
- `local` — full mutations OK, unlimited budget. Pick for local dev runs.
- `sandbox` *(recommended)* — full mutations OK, 50-mutation/phase budget, disposable seed data assumed.
- `staging` — mutations OK, `lens-input-injection` blocked, 25-mutation budget, shared-env hygiene.
- `prod` — **READ-ONLY** (no POST/PUT/PATCH/DELETE), only safe lenses fire. Requires the operator to also pass `--i-know-this-is-prod=<reason>` on the next invocation (hard gate, logs OVERRIDE-DEBT critical).
4. **Export the resolved values** so the bash block sees them:
```bash
export RECURSION_MODE PROBE_MODE TARGET_ENV
skip for probe-mode, also set
SKIP_RECURSIVE_PROBE="interactive: operator chose skip" before bash.Bash invocation:
# v2.41.1 — env vars resolved by the provider-native pre-flight above.
# Bash forwards each axis ONLY if set; the script's argparse defaults apply
# otherwise (matches CI / VG_NON_INTERACTIVE=1 contract).
SKIP_REASON="${SKIP_RECURSIVE_PROBE:-}"
# v2.41.2 — anti-forge guard: if the orchestrator skipped the provider-native prompt
# pre-flight (no env vars set + not in CI), refuse to launch with bare defaults.
# This catches the regression where Phase 2b-2.5 silently ran with light/auto/
# sandbox because the markdown narrative pre-flight was lazy-skipped by the LLM.
if [[ -z "${RECURSION_MODE:-}" && -z "${PROBE_MODE:-}" && -z "${TARGET_ENV:-}" \
&& "${VG_NON_INTERACTIVE:-0}" != "1" ]]; then
echo "" >&2
echo "⛔ Phase 2b-2.5 pre-flight skipped." >&2
echo " The MANDATORY_GATE above requires provider-native prompt to run BEFORE this bash block" >&2
echo " so the operator can choose recursion depth / probe-mode / target-env." >&2
echo " None of the three env vars (RECURSION_MODE / PROBE_MODE / TARGET_ENV) are set." >&2
echo "" >&2
echo " Fix one of the following:" >&2
echo " 1. Run the provider-native prompt to ask the operator (recommended for interactive runs)" >&2
echo " 2. Pass --recursion / --probe-mode / --target-env on the /vg:review CLI" >&2
echo " 3. Set VG_NON_INTERACTIVE=1 to accept defaults (CI / scripted runs only)" >&2
echo " 4. Pass --skip-recursive-probe=<reason> to skip Phase 2b-2.5 entirely" >&2
echo "" >&2
emit_telemetry_v2 "review.recursive_probe.preflight_skipped" "${PHASE_NUMBER}" \
--tag "severity=block" 2>/dev/null || true
exit 2
fi
ARGS=( --phase-dir "$PHASE_DIR" )
if [[ -n "${RECURSION_MODE:-}" ]]; then
ARGS+=( --mode "$RECURSION_MODE" )
fi
if [[ -n "${PROBE_MODE:-}" ]]; then
ARGS+=( --probe-mode "$PROBE_MODE" )
fi
if [[ -n "${TARGET_ENV:-}" ]]; then
ARGS+=( --target-env "$TARGET_ENV" )
fi
if [[ -n "$SKIP_REASON" ]]; then
ARGS+=( --skip-recursive-probe "$SKIP_REASON" )
fi
if [[ "${VG_NON_INTERACTIVE:-0}" == "1" ]]; then
ARGS+=( --non-interactive )
fi
# v2.41.2 — pre-flight succeeded; emit telemetry so audit can confirm prompts ran.
emit_telemetry_v2 "review.recursive_probe.preflight_asked" "${PHASE_NUMBER}" \
--tag "recursion=${RECURSION_MODE:-default}" \
--tag "probe_mode=${PROBE_MODE:-default}" \
--tag "target_env=${TARGET_ENV:-default}" 2>/dev/null || true
# Task 36b — Lens dispatch enforcement (wires Task 26 infrastructure).
# Skip-mode escape (existing user decision — skip probe means skip coverage gate too)
if [ -f "${PHASE_DIR}/.recursive-probe-skipped.yaml" ]; then
echo "▸ Phase 2b-2.5 skipped per .recursive-probe-skipped.yaml — coverage gate bypassed"
else
# 1. Emit dispatch plan FIRST (trust anchor — declares all APPLICABLE dispatches)
"${PYTHON_BIN:-python3}" .claude/scripts/lens-dispatch/emit-dispatch-plan.py \
--phase-dir "${PHASE_DIR}" \
--phase "${PHASE_NUMBER}" \
--profile "$(python3 -c "import yaml,sys; d=yaml.safe_load(open('${PHASE_DIR}/.phase-profile').read()); print(d.get('phase_profile','web-fullstack'))" 2>/dev/null || echo "web-fullstack")" \
--review-run-id "${REVIEW_RUN_ID:-$(date +%s)}" \
--output "${PHASE_DIR}/LENS-DISPATCH-PLAN.json" || {
echo "⛔ Phase 2b-2.5: emit-dispatch-plan.py failed — cannot enforce lens coverage" >&2
exit 1
}
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event \
"review.lens_dispatch_emitted" \
--payload "{\"phase\":\"${PHASE_NUMBER}\",\"plan_path\":\"${PHASE_DIR}/LENS-DISPATCH-PLAN.json\"}" \
>/dev/null 2>&1 || true
# v2.67.0 #158 — Codex telemetry parity (A9 pattern): Claude gets this
# marker via PostToolUse hook on the bash above; Codex MUST emit it
# explicitly so the contract validator sees the same step coverage.
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step \
review 2b3_lens_dispatch_complete >/dev/null 2>&1 || true
# 2. Add --dispatch-plan flag so spawn_recursive_probe uses Task 26 tier dispatcher
ARGS+=( --dispatch-plan "${PHASE_DIR}/LENS-DISPATCH-PLAN.json" )
fi
python scripts/spawn_recursive_probe.py "${ARGS[@]}"
# Post-spawn: coverage gate + matrix (only when probe actually ran)
if [ ! -f "${PHASE_DIR}/.recursive-probe-skipped.yaml" ]; then
# 3. Coverage gate — assert every APPLICABLE dispatch has matching artifact
"${PYTHON_BIN:-python3}" .claude/scripts/validators/verify-lens-runs-coverage.py \
--dispatch-plan "${PHASE_DIR}/LENS-DISPATCH-PLAN.json" \
--runs-dir "${PHASE_DIR}/runs" \
--phase "${PHASE_NUMBER}" \
--evidence-out "${PHASE_DIR}/.lens-coverage-evidence.json"
COVERAGE_RC=$?
# 4. Render coverage matrix (always — gives user the picture even on failure)
"${PYTHON_BIN:-python3}" .claude/scripts/aggregators/lens-coverage-matrix.py \
--dispatch-plan "${PHASE_DIR}/LENS-DISPATCH-PLAN.json" \
--runs-dir "${PHASE_DIR}/runs" \
--output "${PHASE_DIR}/LENS-COVERAGE-MATRIX.md" || true
# v2.67.0 #158 — Codex telemetry parity (A9 pattern): mark matrix
# rendered so the contract sees the same coverage Claude gets via hook.
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step \
review 2b3_lens_matrix_rendered >/dev/null 2>&1 || true
# 5. Coverage failure → Task 33 wrapper (NOT exit 1 — user gets 4 options)
if [ "$COVERAGE_RC" -ne 0 ]; then
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event \
"review.lens_coverage_blocked" \
--payload "{\"phase\":\"${PHASE_NUMBER}\",\"evidence\":\"${PHASE_DIR}/.lens-coverage-evidence.json\"}" \
>/dev/null 2>&1 || true
# Task 33 wrapper: present 4 options
# [a] auto-fix-spawn-missing-lenses / [s] skip-with-override / [r] amend / [x] abort
source scripts/lib/blocking-gate-prompt.sh
blocking_gate_prompt_emit "lens_coverage_blocked" \
"${PHASE_DIR}/.lens-coverage-evidence.json" \
"error" \
"${PHASE_DIR}/LENS-COVERAGE-MATRIX.md"
# AI controller calls AskUserQuestion → re-invokes Leg 2
# Branch on Leg 2 exit code per blocking-gate-prompt-contract.md
fi
fi
Argparse forwarding (entry point of /vg:review):
# /vg:review accepts these flags. The orchestrator parses them BEFORE the
# Provider-native pre-flight runs and exports the matching env var so the
# operator only gets prompted for axes they didn't pre-supply:
# --recursion={light,deep,exhaustive} → export RECURSION_MODE=$value
# --probe-mode={auto,manual,hybrid} → export PROBE_MODE=$value
# --target-env={local,sandbox,staging,prod} → export TARGET_ENV=$value
# --skip-recursive-probe="<reason>" → export SKIP_RECURSIVE_PROBE=$value
# --non-interactive → export VG_NON_INTERACTIVE=1 (suppress provider prompts + stdin prompts)
# --i-know-this-is-prod="<reason>" → forwarded as-is (prod-safety opt-in)
Manual mode (PROBE_MODE=manual):
The dispatcher writes prompt files to ${PHASE_DIR}/recursive-prompts/MANIFEST.md and pauses. Operator runs each prompt against their preferred CLI agent (gemini/codex/claude), drops artifacts back into ${PHASE_DIR}/runs/<tool>/, then resumes the pipeline. The verifier runs automatically when the user signals completion:
if [[ "$PROBE_MODE" == "manual" ]]; then
echo "Manual prompts written. Follow ${PHASE_DIR}/recursive-prompts/MANIFEST.md, drop artifacts in runs/, then press Enter."
if [[ "${VG_NON_INTERACTIVE:-0}" != "1" ]]; then
read -r _
fi
python scripts/verify_manual_run_artifacts.py --phase-dir "$PHASE_DIR" || exit 1
fi
Hybrid mode: dispatcher routes per-lens to auto vs manual based on vg.config.md → review.recursive_probe.hybrid_routing. See [vg:_shared:config-loader] for resolution.
Aggregation (single-writer, end of 2b-2.5):
python scripts/aggregate_recursive_goals.py --phase-dir "$PHASE_DIR" --mode "$RECURSION_MODE"
# Writes TEST-GOALS-DISCOVERED.md (G-RECURSE-* level-3 entries) + recursive-goals-overflow.json.
Idempotency: Re-running 2b-2.5 reuses existing runs/ artifacts; canonical-key dedup in aggregator prevents duplicate goal stubs.
Failure semantics: Eligibility fail → skip block (continue). Worker fail → recorded in runs/INDEX.json, does not abort pipeline. Manual mode timeout → operator re-runs; no automatic retry.
1. Wait for all Haiku agents to complete
2. Read SUMMARIES ONLY (not full JSON):
For each scan-{view}-{role}.json:
Read only the top-level fields: view, role, elements_total, elements_visited,
elements_stuck, errors[] count, forms[] count, sub_views_discovered[]
→ Build slim overview: { view, visited_pct, error_count, stuck_count }
IF a view has error_count > 0 OR stuck_count > 3 OR visited_pct < 90%:
THEN read that view's full scan-{view}-{role}.json for detail
ELSE: discard full JSON content — do NOT load into context
3. Cross-check coverage vs SPECS:
- SPECS says phase has payments feature → Haiku found /payments? ✓
- PLAN says 3 modals built → Haiku found 3 modals? ✓
- Haiku discovered sub-views not in original list? → note for gap-filling
4. Gaps detected:
- View listed but Haiku couldn't reach → Opus investigates (wrong URL? auth?)
- Haiku found sub-views (e.g., /sites/123/settings) → spawn more Haiku
- Elements marked "stuck" (file upload, complex wizard) → Opus handles or defers
5. Spawn additional Haiku agents if gaps found → collect → merge
6. MERGE all scan results into coverage-map:
views = all Haiku view results
errors = concatenate + deduplicate
stuck = concatenate
forms = concatenate
7. QUALITY CHECK (Opus judgment on Haiku results):
Flag suspicious results:
- elements_visited < elements_total without stuck explanation → mark INCOMPLETE
- Form submitted but no network request recorded → mark SUSPICIOUS
- Console errors present but Haiku didn't report them → mark NEEDS_REVIEW
- elements_total very low for a complex page → mark SHALLOW (Haiku may have missed scroll/lazy-load)
8. UPDATE GOAL-COVERAGE-MATRIX:
For each TEST-GOALS goal, check if Haiku scan results cover it:
- Form submitted matching goal's mutation → ⬜ → 🔍 SCAN-COVERED
- View explored but goal-specific action not triggered → ⬜ → ⚠️ SCAN-PARTIAL
- View not scanned → ⬜ → ❌ NOT-COVERED
Note: Haiku scanners don't pursue goals — they scan exhaustively.
Goal coverage mapping is done by Opus reading scan results.
9. PROBE VARIATIONS (OPT-IN — only runs if --with-probes flag set):
Default OFF: /vg:test generates deterministic Playwright probes via codegen — cheaper,
more reliable than LLM-driven probes, and already covers edit/boundary/repeat patterns.
Only set --with-probes when: test codegen can't cover the mutation (e.g., complex data
setup, external service stubs), or debugging a goal that passed scan but failed probes.
IF NOT --with-probes: skip to step 10.
For each goal marked SCAN-COVERED that involves mutations (create/edit/delete):
Spawn Haiku probe agent (model="haiku"):
"""
You are a probe agent. Test mutation variations for goal: {goal_id}.
URL: {view_url} | Login: {credentials}
Primary action: {what Haiku scan already did — from scan JSON}
Run 3 probes:
Probe 1 — EDIT: Navigate to the record just created/modified.
Open edit form → change 1-2 fields (different valid data) → submit
→ Record: {changed_fields, result, network[], console_errors[]}
Probe 2 — BOUNDARY: Open same form again.
Fill with edge values: empty optional fields, max-length "A"×255,
special chars "O'Brien <script>", zero for numbers, past dates
→ Submit → Record: {values_description, result, validation_errors[]}
Probe 3 — REPEAT: Open same form again.
Fill with EXACT same data as primary scan → submit
→ Expect: success OR proper duplicate error — NOT crash/500
→ Record: {result, is_duplicate_handled}
Write to: {PHASE_DIR}/probe-{goal_id}.json
"""
Collect all probe JSONs → merge into goal_sequences[goal_id].probes[]
Update matrix: SCAN-COVERED + probes passed → 🔍 PROBE-VERIFIED
10. For NOT-COVERED or SHALLOW items:
Opus does targeted investigation using its own MCP Playwright:
- Claim 1 server
- Navigate to specific view/element
- Investigate why Haiku missed it
- Release server
<CHECKPOINT_RULE>
**Atomic artifact per major step — no separate state file (v1.14.4+):**
- Step 2b-1 → writes `${PHASE_DIR}/nav-discovery.json` (atomic)
- Step 2b-2 → writes `${PHASE_DIR}/scan-{view-slug}.json` per Haiku agent (atomic per view)
- Step 2b-3 → writes `${PHASE_DIR}/RUNTIME-MAP.json` + `${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md`
- Steps 8/9/10 → extend RUNTIME-MAP.json + GOAL-COVERAGE-MATRIX.md
If session dies mid-2b-2: re-run `/vg:review {phase}` — nav-discovery.json + partial scan-*.json stay, orchestrator redoes only missing views. Per-view scan is cheap (~30s Haiku call), no need for global state file. Step-level idempotency handled by `.step-markers/*.done`.
</CHECKPOINT_RULE>
Session model (from config):
$SESSION_MODEL = "multi-context": each Haiku agent uses own browser context (natural fit)config.credentials[ENV] — NOT hardcoded3-layer schema: navigation graph + interactive elements + goal action sequences.
No component-type classification (no "modal", "table", "card" types). Elements are binary: interactive or not. State changes are observed via fingerprint diff (URL + element count + DOM hash), not classified.
Write ${PHASE_DIR}/RUNTIME-MAP.json:
{
"phase": "{phase}",
"build_sha": "{sha}",
"discovered_at": "{ISO timestamp}",
"views": {
"{view_path}": {
"role": "{role from config.credentials}",
"arrive_via": "{click sequence to get here — e.g. sidebar > menu item}",
"snapshot_summary": "{free text — AI describes what it sees, chooses best format}",
"fingerprint": { "url": "{url}", "element_count": 0, "dom_hash": "{sha256[:16]}" },
"elements": [
{ "selector": "{from snapshot}", "label": "{visible text}", "visited": false }
],
"issues": [],
"screenshots": ["{phase}-{view}-{state}.png"]
}
},
"goal_sequences": {
"{goal_id}": {
"start_view": "{view_path}",
"result": "passed|failed",
"steps": [
{ "do": "click", "selector": "{from snapshot}", "label": "{text}" },
{ "do": "fill", "selector": "{from snapshot}", "value": "{test data}" },
{ "do": "select", "selector": "{from snapshot}", "value": "{option}" },
{ "do": "wait", "for": "{condition — state_changed|network_idle|element_visible}" },
{ "observe": "{what_changed}", "network": [{"method": "POST", "url": "{observed}", "status": 201}], "console_errors": [] },
{ "assert": "{criterion from TEST-GOALS}", "passed": true }
],
"probes": [
{ "type": "edit", "changed_fields": ["{field}"], "result": "passed|failed", "network": [], "console_errors": [] },
{ "type": "boundary", "values_description": "{what AI tried}", "result": "passed|failed", "network": [], "console_errors": [] },
{ "type": "repeat", "result": "passed|failed", "network": [], "console_errors": [] }
],
"evidence": ["{screenshot paths}"]
}
},
"free_exploration": [
{ "view": "{view_path}", "element_selector": "{selector}", "element_label": "{text}", "result": "{free text}", "issue": null }
],
"errors": [],
"coverage": {
"views": 0,
"goals_attempted": 0,
"goals_passed": 0,
"elements_visited": 0,
"elements_total": 0,
"pass_1_time": "{duration}",
"pass_2_time": "{duration}"
}
}
Schema design principles (from research):
{ selector, label, visited }. AI doesn't classify "button" vs "link" vs "row action". Binary: interactive or not. (browser-use approach)observe steps. (browser-use PageFingerprint approach)do (action) or observe (observation) or assert (verification). Test step replays these 1:1. Codegen converts to .spec.ts nearly 1:1. (Playwright codegen approach)Derive ${PHASE_DIR}/RUNTIME-MAP.md from JSON (human-readable summary):
# Runtime Map — Phase {phase}
Generated from: RUNTIME-MAP.json | Build: {sha}
## Views ({N} discovered)
### {view_path} ({role})
{snapshot_summary}
Elements: {N} interactive ({visited}/{total} visited)
## Goal Sequences ({passed}/{total} passed)
### {goal_id}: {description}
1. {do}: {label} → {observe}
2. {do}: {label} → {observe}
...
Result: {passed|failed}
## Free Exploration ({N} elements, {issues} issues found)
## Errors ({N})
JSON is the source of truth. Markdown is derived. Downstream steps (test, codegen) read JSON.
Phase 15 D-17 — phantom-aware Haiku spawn audit (NEW, 2026-04-27):
Confirm the review.haiku_scanner_spawned event emitted by step 2b-2 is
actually present in events.db for every (view × role) we expected to scan.
The validator (verify-haiku-spawn-fired.py) is phantom-aware: it ignores
events from runs whose signature matches args:"" + 0 step.marked + abort
within 60s (the D-17 hook-triggered noise pattern), so manual /vg:learn
invocations don't show up as false positives.
PHANTOM_VALIDATOR="${REPO_ROOT}/.claude/scripts/validators/verify-haiku-spawn-fired.py"
if [ -x "$PHANTOM_VALIDATOR" ] && [ -f "${REPO_ROOT}/.vg/events.db" ]; then
${PYTHON_BIN} "$PHANTOM_VALIDATOR" --phase "${PHASE_NUMBER}" \
> "${VG_TMP}/haiku-spawn-audit.json" 2>&1 || true
HSV=$(${PYTHON_BIN} -c "import json,sys; print(json.load(open('${VG_TMP}/haiku-spawn-audit.json')).get('verdict','SKIP'))" 2>/dev/null)
case "$HSV" in
PASS) echo "✓ D-17 Haiku-spawn audit: PASS — telemetry confirms scanner fired per view/role" ;;
WARN) echo "⚠ D-17 Haiku-spawn audit: WARN — see ${VG_TMP}/haiku-spawn-audit.json (informational only)" ;;
BLOCK)
echo "⛔ D-17 Haiku-spawn audit: BLOCK — expected scanner spawns missing from events.db." >&2
echo " Inspect ${VG_TMP}/haiku-spawn-audit.json for the per-(view,role) breakdown." >&2
echo " Common cause: orchestrator ran briefing_for_view but Agent() spawn was skipped." >&2
echo " Override: --skip-haiku-audit (logs override-debt as kind=haiku-spawn-audit-skipped)." >&2
if [[ ! "$ARGUMENTS" =~ --skip-haiku-audit ]]; then
exit 1
fi
;;
SKIP|*) echo "ℹ D-17 Haiku-spawn audit: ${HSV} — likely no UI-profile views in this phase" ;;
esac
fi
After RUNTIME-MAP exists, materialize the plugin contract that the remaining review steps must execute. This is the harness-level binding between the visible step list and the smaller checks/lenses.
LENS_PLAN_SCRIPT="${REPO_ROOT}/.claude/scripts/review-lens-plan.py"
if [ -f "$LENS_PLAN_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$LENS_PLAN_SCRIPT" \
--phase-dir "$PHASE_DIR" \
--profile "${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}" \
--mode "${REVIEW_MODE:-full}" \
--write
LENS_PLAN_RC=$?
if [ "$LENS_PLAN_RC" -ne 0 ] || [ ! -f "${PHASE_DIR}/REVIEW-LENS-PLAN.json" ]; then
echo "⛔ Review lens plan generation failed — cannot prove plugin checklist coverage." >&2
exit 1
fi
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event \
"review.lens_plan_generated" \
--payload "{\"phase\":\"${PHASE_NUMBER}\",\"artifact\":\"REVIEW-LENS-PLAN.json\"}" \
>/dev/null 2>&1 || true
else
echo "⛔ Missing review lens planner: $LENS_PLAN_SCRIPT" >&2
exit 1
fi
## Phase 2c — Enrich TEST-GOALS from runtime discovery (v2.34.0+, closes #52)
Bridges the design gap between Step 3 (click many components) and Step 4 (rich goals for test layer) of the original 4-step review architecture. Without this step, every Haiku-discovered button/form/modal/tab/row-action sits dead in views[X].elements[] and the downstream test layer never tests it.
enrich-test-goals.py reads every scan-*.json, classifies elements (modal triggers, mutations, forms, table row actions, paging, tabs), dedupes against existing TEST-GOALS.md interactive_controls, and emits ${PHASE_DIR}/TEST-GOALS-DISCOVERED.md with G-AUTO-* goal stubs. /vg:test codegen (step 5d) reads both files; auto-emitted specs land as auto-{goal-id}.spec.ts for visual distinction.
echo ""
echo "━━━ Phase 2c — Enrich TEST-GOALS from runtime discovery ━━━"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator step-active phase2c_enrich_test_goals >/dev/null 2>&1 || true
ENRICH_THRESHOLD=$(vg_config_get "review.enrich_min_elements" "3" 2>/dev/null || echo "3")
${PYTHON_BIN:-python3} .claude/scripts/enrich-test-goals.py \
--phase-dir "$PHASE_DIR" \
--threshold "$ENRICH_THRESHOLD"
ENRICH_RC=$?
case "$ENRICH_RC" in
0)
AUTO_COUNT=$(grep -c "^id: G-AUTO-" "$PHASE_DIR/TEST-GOALS-DISCOVERED.md" 2>/dev/null || echo 0)
echo " ✓ Phase 2c: ${AUTO_COUNT} auto-emitted goals → ${PHASE_DIR}/TEST-GOALS-DISCOVERED.md"
emit_telemetry_v2 "review_phase2c_enriched" "${PHASE_NUMBER}" \
"review.2c-enrich" "test_goals_enrichment" "PASS" \
"{\"auto_goals\":${AUTO_COUNT}}" 2>/dev/null || true
;;
*)
echo " ⚠ Phase 2c enrichment failed (rc=${ENRICH_RC}) — TEST-GOALS-DISCOVERED.md not written."
echo " Test layer codegen will fall back to TEST-GOALS.md only (legacy behavior)."
emit_telemetry_v2 "review_phase2c_failed" "${PHASE_NUMBER}" \
"review.2c-enrich" "test_goals_enrichment" "WARN" \
"{\"rc\":${ENRICH_RC}}" 2>/dev/null || true
;;
esac
# Coverage validator: BLOCK if any view had elements scanned but no goals derived.
# This catches the failure mode where Haiku ran but classification missed everything
# (e.g. element schema drift, parser bug). Per-phase override via --skip-enrich-validate.
if [[ ! "$ARGUMENTS" =~ --skip-enrich-validate ]]; then
${PYTHON_BIN:-python3} .claude/scripts/enrich-test-goals.py \
--phase-dir "$PHASE_DIR" \
--threshold "$ENRICH_THRESHOLD" \
--validate-only
VALIDATE_RC=$?
if [ "$VALIDATE_RC" -ne 0 ]; then
echo " ⛔ Phase 2c enrichment validation FAILED."
echo " Either re-run /vg:review {phase} so scanners visit those views,"
echo " or pass --skip-enrich-validate=\"<reason>\" to log OVERRIDE-DEBT."
emit_telemetry_v2 "review_phase2c_coverage_gap" "${PHASE_NUMBER}" \
"review.2c-enrich" "test_goals_enrichment_coverage" "FAIL" \
"{\"rc\":${VALIDATE_RC}}" 2>/dev/null || true
exit 1
fi
fi
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phase2c_enrich_test_goals 2>/dev/null || true
## Phase 2c-pre — Contract completeness + env preflight (v2.39.0+)
Two pre-dispatch gates close Codex critiques #1 (contract validity not gated) + #6 (env state implicit):
verify-contract-completeness.py diffs runtime/code inventory against CRUD-SURFACES.md declared resources. Flags hidden routes, undeclared resources, background jobs, webhooks.verify-env-contract.py reads ENV-CONTRACT.md preflight_checks and verifies each (app reachable, seed data present, login works).If contract incomplete OR env preflight fails → review aborts BEFORE spawning expensive workers (Gemini Flash workers can run $0.30-1.00 per phase; aborting pre-spawn saves token cost when env is broken).
echo ""
echo "━━━ Phase 2c-pre — Contract completeness + env preflight ━━━"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator step-active phase2c_pre_dispatch_gates >/dev/null 2>&1 || true
# Contract completeness gate (severity warn first release for dogfood)
COMPLETE_SEV=$(vg_config_get "review.contract_completeness.severity" "warn" 2>/dev/null || echo "warn")
${PYTHON_BIN:-python3} .claude/scripts/verify-contract-completeness.py \
--phase-dir "$PHASE_DIR" \
--code-root "${REPO_ROOT}" \
--severity "$COMPLETE_SEV"
COMPLETE_RC=$?
if [ "$COMPLETE_RC" -ne 0 ] && [ "$COMPLETE_SEV" = "block" ]; then
echo "⛔ Contract completeness BLOCK — see CONTRACT-COMPLETENESS.json"
exit 1
fi
# Env contract preflight (mandatory if any kit:crud-roundtrip declared, optional for kit:static-sast)
if grep -q '"kit"\s*:\s*"crud-roundtrip"\|"kit"\s*:\s*"approval-flow"\|"kit"\s*:\s*"bulk-action"' "${PHASE_DIR}/CRUD-SURFACES.md" 2>/dev/null; then
ENV_SEV=$(vg_config_get "review.env_contract.severity" "block" 2>/dev/null || echo "block")
if [[ "$ARGUMENTS" =~ --skip-env-contract=\"([^\"]*)\" ]]; then
ENV_REASON="${BASH_REMATCH[1]}"
echo " ⚠ ENV-CONTRACT skipped: $ENV_REASON (logged to OVERRIDE-DEBT)"
else
${PYTHON_BIN:-python3} .claude/scripts/verify-env-contract.py \
--phase-dir "$PHASE_DIR" \
> "${PHASE_DIR}/.tmp/env-contract-review.txt" 2>&1
ENV_RC=$?
if [ "$ENV_RC" -ne 0 ] && [ "$ENV_SEV" = "block" ]; then
echo "⛔ ENV-CONTRACT preflight FAIL — fix env or pass --skip-env-contract=\"<reason>\""
cat "${PHASE_DIR}/.tmp/env-contract-review.txt" 2>/dev/null || true
DIAG_SCRIPT="${REPO_ROOT}/.claude/scripts/review-block-diagnostic.py"
if [ -f "$DIAG_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$DIAG_SCRIPT" \
--gate-id "review.env_contract" \
--phase-dir "$PHASE_DIR" \
--input "${PHASE_DIR}/.tmp/env-contract-review.txt" \
--out-md "${PHASE_DIR}/.tmp/env-contract-diagnostic.md" \
>/dev/null 2>&1 || true
cat "${PHASE_DIR}/.tmp/env-contract-diagnostic.md" 2>/dev/null || true
fi
exit 1
fi
fi
fi
emit_telemetry_v2 "review_phase2c_pre_gates" "${PHASE_NUMBER}" \
"review.2c-pre" "pre_dispatch_gates" "PASS" \
"{\"contract_complete_rc\":${COMPLETE_RC:-0}}" 2>/dev/null || true
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phase2c_pre_dispatch_gates 2>/dev/null || true
## Phase 2d — CRUD round-trip lens dispatch (v2.35.0+, closes #51)
Dispatches Gemini Flash workers per (resource × role) declared with kit: crud-roundtrip in CRUD-SURFACES.md. Each worker runs the 8-step Read→Create→Read→Update→Read→Delete→Read round-trip per commands/vg/_shared/transition-kits/crud-roundtrip.md.
Why Gemini Flash (not Claude Haiku): $0.075/M input vs $1.00/M = 13× cheaper. Already MCP-configured (5 Playwright servers in ~/.gemini/settings.json). Already in cross-CLI plumbing.
Pre-flight: auth fixture must exist. If not, run scripts/review-fixture-bootstrap.py first.
echo ""
echo "━━━ Phase 2d — CRUD round-trip lens dispatch ━━━"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator step-active phase2d_crud_roundtrip_dispatch >/dev/null 2>&1 || true
# Skip if no CRUD-SURFACES or no resources declare this kit
if [ ! -f "${PHASE_DIR}/CRUD-SURFACES.md" ]; then
echo " (no CRUD-SURFACES.md — skipping Phase 2d)"
elif ! grep -q '"kit"\s*:\s*"crud-roundtrip"' "${PHASE_DIR}/CRUD-SURFACES.md"; then
echo " (no resources with kit: crud-roundtrip — skipping Phase 2d)"
else
# Bootstrap auth tokens if missing
TOKENS_PATH="${PHASE_DIR}/.review-fixtures/tokens.local.yaml"
REPO_TOKENS_PATH="${REPO_ROOT}/.review-fixtures/tokens.local.yaml"
if [ ! -f "$TOKENS_PATH" ] && [ ! -f "$REPO_TOKENS_PATH" ]; then
echo " Bootstrapping auth tokens..."
${PYTHON_BIN:-python3} .claude/scripts/review-fixture-bootstrap.py \
--phase-dir "$PHASE_DIR" || {
echo " ⚠ Auth fixture bootstrap failed — Phase 2d skipped (workers cannot authenticate)"
}
fi
if [ -f "$TOKENS_PATH" ] || [ -f "$REPO_TOKENS_PATH" ]; then
COST_CAP=$(vg_config_get "review.crud_roundtrip.cost_cap_usd" "1.50" 2>/dev/null || echo "1.50")
CONCURRENCY=$(vg_config_get "review.crud_roundtrip.concurrency" "2" 2>/dev/null || echo "2")
${PYTHON_BIN:-python3} .claude/scripts/spawn-crud-roundtrip.py \
--phase-dir "$PHASE_DIR" \
--concurrency "$CONCURRENCY" \
--cost-cap "$COST_CAP"
DISPATCH_RC=$?
if [ "$DISPATCH_RC" -eq 0 ]; then
ARTIFACTS=$(${PYTHON_BIN:-python3} -c "import json; d=json.load(open('${PHASE_DIR}/runs/INDEX.json')); print(d.get('artifacts_present', 0))" 2>/dev/null || echo "0")
echo " ✓ CRUD round-trip dispatch complete: ${ARTIFACTS} run artifact(s)"
emit_telemetry_v2 "review_phase2d_dispatched" "${PHASE_NUMBER}" \
"review.2d-crud-dispatch" "crud_roundtrip" "PASS" \
"{\"artifacts\":${ARTIFACTS}}" 2>/dev/null || true
else
echo " ⚠ CRUD round-trip dispatch failed (rc=${DISPATCH_RC})"
emit_telemetry_v2 "review_phase2d_failed" "${PHASE_NUMBER}" \
"review.2d-crud-dispatch" "crud_roundtrip" "FAIL" \
"{\"rc\":${DISPATCH_RC}}" 2>/dev/null || true
fi
fi
fi
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phase2d_crud_roundtrip_dispatch 2>/dev/null || true
## Phase 2e — Findings derivation (v2.35.0+)
Reads run artifacts from Phase 2d and derives REVIEW-FINDINGS.json (machine-readable, deduped) + REVIEW-BUGS.md (Strix-style human-readable triage doc).
No auto-route to /vg:build in v2.35.0 — manual triage during dogfood per Codex review feedback. Auto-route candidate for v2.37.0 after schema confidence/dedupe quality validated on real findings.
echo ""
echo "━━━ Phase 2e — Findings derivation ━━━"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator step-active phase2e_findings_merge >/dev/null 2>&1 || true
if [ -d "${PHASE_DIR}/runs" ] && [ -n "$(ls -A ${PHASE_DIR}/runs/*.json 2>/dev/null | grep -v INDEX.json)" ]; then
${PYTHON_BIN:-python3} .claude/scripts/derive-findings.py \
--phase-dir "$PHASE_DIR"
DERIVE_RC=$?
if [ "$DERIVE_RC" -eq 0 ] && [ -f "${PHASE_DIR}/REVIEW-FINDINGS.json" ]; then
FINDING_COUNT=$(${PYTHON_BIN:-python3} -c "import json; d=json.load(open('${PHASE_DIR}/REVIEW-FINDINGS.json')); print(d.get('findings_total', 0))" 2>/dev/null || echo "0")
echo " ✓ ${FINDING_COUNT} finding(s) derived → ${PHASE_DIR}/REVIEW-BUGS.md"
emit_telemetry_v2 "review_phase2e_findings" "${PHASE_NUMBER}" \
"review.2e-findings" "findings_derive" "PASS" \
"{\"findings\":${FINDING_COUNT}}" 2>/dev/null || true
fi
else
echo " (no run artifacts to derive — skipping)"
fi
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phase2e_findings_merge 2>/dev/null || true
## Phase 2e-post — Manager adversarial challenge (v2.39.0+, closes Codex critique #7)
Workers report coverage.passed. This step asks: "do these passes actually imply coverage?". Heuristic adversarial reducer samples N% of run artifacts and challenges each pass step:
pass with empty evidence_ref → downgrade to weak-passpass with empty observed block → downgrade to weak-passpass with observed status mismatching expected → flagged false-pass (severity DEGRADED)Output: ${PHASE_DIR}/COVERAGE-CHALLENGE.json with downgrades + warnings. v2.40 may add LLM-driven challenge for ambiguous claims.
echo ""
echo "━━━ Phase 2e-post — Manager adversarial challenge ━━━"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator step-active phase2e_post_challenge >/dev/null 2>&1 || true
if [ -d "${PHASE_DIR}/runs" ] && [ -n "$(ls -A ${PHASE_DIR}/runs/*.json 2>/dev/null | grep -v INDEX.json)" ]; then
CHALLENGE_RATE=$(vg_config_get "review.challenge.sample_rate" "25" 2>/dev/null || echo "25")
CHALLENGE_SEV=$(vg_config_get "review.challenge.severity" "warn" 2>/dev/null || echo "warn")
${PYTHON_BIN:-python3} .claude/scripts/challenge-coverage.py \
--phase-dir "$PHASE_DIR" \
--sample-rate "$CHALLENGE_RATE" \
--severity "$CHALLENGE_SEV"
CHALLENGE_RC=$?
if [ "$CHALLENGE_RC" -ne 0 ] && [ "$CHALLENGE_SEV" = "block" ]; then
echo "⛔ Coverage challenge: false-pass steps detected. See COVERAGE-CHALLENGE.json"
emit_telemetry_v2 "review_phase2e_post_challenge_failed" "${PHASE_NUMBER}" \
"review.2e-post" "coverage_challenge" "BLOCK" "{}" 2>/dev/null || true
exit 1
fi
emit_telemetry_v2 "review_phase2e_post_challenge" "${PHASE_NUMBER}" \
"review.2e-post" "coverage_challenge" "PASS" \
"{\"sample_rate\":${CHALLENGE_RATE}}" 2>/dev/null || true
fi
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phase2e_post_challenge 2>/dev/null || true
## Phase 2f — Route findings to /vg:build (v2.37.0+, opt-in)
Reads REVIEW-FINDINGS.json and emits AUTO-FIX-TASKS.md for findings meeting the conservative gate (severity ≥ high, confidence == high, cleanup_status == completed). /vg:build consumes via --include-auto-fix flag (opt-in v2.37, may default-on v2.38 after dogfood).
echo ""
echo "━━━ Phase 2f — Route findings to /vg:build (auto-fix loop) ━━━"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator step-active phase2f_route_auto_fix >/dev/null 2>&1 || true
if [ -f "${PHASE_DIR}/REVIEW-FINDINGS.json" ]; then
${PYTHON_BIN:-python3} .claude/scripts/route-findings-to-build.py \
--phase-dir "$PHASE_DIR"
ROUTE_RC=$?
if [ "$ROUTE_RC" -eq 0 ] && [ -f "${PHASE_DIR}/AUTO-FIX-TASKS.md" ]; then
TASK_COUNT=$(grep -c "^### Task AF-" "${PHASE_DIR}/AUTO-FIX-TASKS.md" 2>/dev/null || echo 0)
echo " ✓ ${TASK_COUNT} auto-fix task group(s) → AUTO-FIX-TASKS.md"
echo " Run /vg:build ${PHASE_NUMBER} --include-auto-fix to consume"
emit_telemetry_v2 "review_phase2f_routed" "${PHASE_NUMBER}" \
"review.2f-route" "auto_fix_routing" "PASS" \
"{\"task_groups\":${TASK_COUNT}}" 2>/dev/null || true
else
echo " (no qualifying findings to route)"
fi
else
echo " (no REVIEW-FINDINGS.json — skipping)"
fi
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phase2f_route_auto_fix 2>/dev/null || true
## Phase 2-limit: EXPLORATION LIMIT CHECK (R8 enforcement — v1.14.4+)
Counts actions + views + wall-time sau Phase 2 để phát hiện runaway discovery (phát hiện quét vô kiểm soát). WARN (cảnh báo) only — không block (không chặn) vì discovery đã xong. Kết quả ghi vào PIPELINE-STATE.json metrics để test/accept biết RUNTIME-MAP có thể noisy (nhiễu).
Thresholds (ngưỡng):
config.review.max_actions_per_view — default 50config.review.max_actions_total — default 200config.review.max_wall_minutes — default 30RUNTIME_MAP="${PHASE_DIR}/RUNTIME-MAP.json"
if [ ! -f "$RUNTIME_MAP" ]; then
echo "⚠ RUNTIME-MAP.json chưa tồn tại — bỏ qua limit check (Phase 2 có thể skipped hoặc failed)."
(type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phase2_exploration_limits" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phase2_exploration_limits.done"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phase2_exploration_limits 2>/dev/null || true
else
MAX_VIEW="${CONFIG_REVIEW_MAX_ACTIONS_PER_VIEW:-50}"
MAX_TOTAL="${CONFIG_REVIEW_MAX_ACTIONS_TOTAL:-200}"
MAX_WALL="${CONFIG_REVIEW_MAX_WALL_MINUTES:-30}"
PYTHONIOENCODING=utf-8 ${PYTHON_BIN} - "$RUNTIME_MAP" "$MAX_VIEW" "$MAX_TOTAL" "$MAX_WALL" "${PHASE_DIR}" <<'PY'
import json, sys, time
from pathlib import Path
from datetime import datetime, timezone
rm_path = Path(sys.argv[1])
max_view = int(sys.argv[2])
max_total = int(sys.argv[3])
max_wall_min = int(sys.argv[4])
phase_dir = Path(sys.argv[5])
rm = json.loads(rm_path.read_text(encoding="utf-8"))
views = rm.get("views", {}) or {}
seqs = rm.get("goal_sequences", {}) or {}
per_view_actions = {}
total_actions = 0
# Count goal_sequences steps grouped by start_view
for gid, seq in seqs.items():
start = seq.get("start_view") or "<unknown>"
n = len(seq.get("steps", []) or [])
per_view_actions[start] = per_view_actions.get(start, 0) + n
total_actions += n
# Add free_exploration actions if tracked per view
for v_url, v in views.items():
fe = (v.get("free_exploration") or {}).get("actions_count", 0) or 0
per_view_actions[v_url] = per_view_actions.get(v_url, 0) + fe
total_actions += fe
# Wall time — use session-start marker mtime as proxy for discovery start
marker = phase_dir / ".step-markers" / "00_session_lifecycle.done"
wall_min = None
if marker.exists():
wall_min = (time.time() - marker.stat().st_mtime) / 60.0
# Evaluate
warnings = []
for v, n in per_view_actions.items():
if n > max_view:
warnings.append({"type": "view_overflow", "view": v, "count": n, "limit": max_view})
if total_actions > max_total:
warnings.append({"type": "total_overflow", "count": total_actions, "limit": max_total})
if wall_min is not None and wall_min > max_wall_min:
warnings.append({"type": "wall_overflow", "minutes": round(wall_min, 1), "limit": max_wall_min})
# Report
if warnings:
print(f"⚠ R8 exploration limits exceeded ({len(warnings)} signal):")
for w in warnings:
if w["type"] == "view_overflow":
print(f" - view '{w['view']}' → {w['count']} actions vượt limit {w['limit']}")
elif w["type"] == "total_overflow":
print(f" - total → {w['count']} actions vượt limit {w['limit']}")
elif w["type"] == "wall_overflow":
print(f" - wall time (thời gian) → {w['minutes']} min vượt limit {w['limit']}")
print("")
print("Khuyến nghị (recommendation):")
print(" - Review RUNTIME-MAP.json: có action lặp/vô ích không")
print(" - Giảm views scanned hoặc tắt --full-scan (sidebar suppression giúp giảm action)")
print(" - Nếu phase lớn thật, tăng config.review.max_actions_total")
else:
wall_txt = f", {wall_min:.1f} min" if wall_min else ""
print(f"✓ Exploration within limits: {total_actions} actions, {len(per_view_actions)} views{wall_txt}")
# Log to PIPELINE-STATE.json regardless
state_path = phase_dir / "PIPELINE-STATE.json"
state = {}
if state_path.exists():
try:
state = json.loads(state_path.read_text(encoding="utf-8"))
except Exception:
state = {}
state.setdefault("metrics", {})["review_exploration"] = {
"total_actions": total_actions,
"views_scanned": len(per_view_actions),
"wall_minutes": round(wall_min, 1) if wall_min is not None else None,
"thresholds": {"per_view": max_view, "total": max_total, "wall_min": max_wall_min},
"warnings": warnings,
"recorded_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
}
state_path.write_text(json.dumps(state, indent=2, ensure_ascii=False), encoding="utf-8")
PY
(type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phase2_exploration_limits" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phase2_exploration_limits.done"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phase2_exploration_limits 2>/dev/null || true
fi
Hành vi downstream: nếu có warnings, step crossai_review cuối pipeline sẽ include "exploration noisy" flag vào context để CrossAI xem xét kỹ goals liên quan views overflow.
Fires when profile ∈ {mobile-rn, mobile-flutter, mobile-native-ios, mobile-native-android, mobile-hybrid}. Web projects skip this step
because filter-steps.py resolves mobile-* to the 5 mobile profiles.
⛔ Preflight gate. Before any maestro call:
# 1. Verify wrapper present
WRAPPER="${REPO_ROOT}/.claude/scripts/maestro-mcp.py"
if [ ! -f "$WRAPPER" ]; then
echo "⛔ maestro-mcp.py missing. Run vgflow installer."
exit 1
fi
# 2. Check tool availability per host
PREREQ=$(${PYTHON_BIN} "$WRAPPER" --json check-prereqs)
echo "$PREREQ" | jq . >/dev/null 2>&1 || { echo "$PREREQ"; echo "⛔ prereqs JSON malformed"; exit 1; }
CAN_ANDROID=$(echo "$PREREQ" | ${PYTHON_BIN} -c "import json,sys;print(json.load(sys.stdin)['capabilities']['android_flows'])")
CAN_IOS=$(echo "$PREREQ" | ${PYTHON_BIN} -c "import json,sys;print(json.load(sys.stdin)['capabilities']['ios_flows'])")
HOST_OS=$(echo "$PREREQ" | ${PYTHON_BIN} -c "import json,sys;print(json.load(sys.stdin)['host_os'])")
echo "Mobile discovery prereqs: host=${HOST_OS}, android=${CAN_ANDROID}, ios=${CAN_IOS}"
Platform gating vs target_platforms:
Config mobile.target_platforms is the user's intent (what the app
ships to). Host OS caps what this machine can actually discover on.
Combine:
TARGETS=$(${PYTHON_BIN} -c "
import re,pathlib
t = pathlib.Path('.claude/vg.config.md').read_text(encoding='utf-8')
m = re.search(r'^target_platforms:\s*\[([^\]]*)\]', t, re.MULTILINE)
print(m.group(1) if m else '')")
DISCOVERY_PLATFORMS=()
for plat in $(echo "$TARGETS" | tr ',' ' ' | tr -d '"' | tr -d "'"); do
plat=$(echo "$plat" | xargs)
case "$plat" in
ios)
if [ "$CAN_IOS" = "True" ]; then
DISCOVERY_PLATFORMS+=("ios")
else
echo "⚠ target=ios but host cannot run iOS simulator — skipping iOS discovery"
echo " Use /vg:test --sandbox (cloud EAS) for iOS verification."
fi ;;
android)
if [ "$CAN_ANDROID" = "True" ]; then
DISCOVERY_PLATFORMS+=("android")
else
echo "⚠ target=android but adb/maestro missing — skipping Android discovery"
fi ;;
*)
echo " target '${plat}' not exercised by mobile discovery (web/macos defer to other phases)"
;;
esac
done
if [ ${#DISCOVERY_PLATFORMS[@]} -eq 0 ]; then
echo "⛔ No discoverable platforms on this host. Options:"
echo " (a) Install Android SDK platform-tools + Maestro (universal Linux/Mac/Win)"
echo " (b) Run /vg:review on a macOS host for iOS discovery"
echo " (c) Run /vg:test --sandbox to use cloud provider (skips local discovery)"
exit 1
fi
echo "Will discover on: ${DISCOVERY_PLATFORMS[*]}"
Discovery loop — per (platform × role):
For each platform in $DISCOVERY_PLATFORMS and each role in
config.credentials.{ENV} (same role model as web):
# a) Launch app on the target device (name from config.mobile.devices)
if [ "$PLATFORM" = "ios" ]; then
DEVICE=$(awk '/^\s+ios:/{f=1;next} /^\s+[a-z]+:/{f=0} f && /simulator_name:/{gsub(/["'"'"']/,"");print $2;exit}' .claude/vg.config.md)
elif [ "$PLATFORM" = "android" ]; then
DEVICE=$(awk '/^\s+android:/{f=1;next} /^\s+[a-z]+:/{f=0} f && /emulator_name:/{gsub(/["'"'"']/,"");print $2;exit}' .claude/vg.config.md)
fi
if [ -z "$DEVICE" ]; then
echo "⚠ Device name empty for $PLATFORM in config.mobile.devices — skipping"
continue
fi
BUNDLE_ID=$(node -e "console.log(require('./app.json').expo?.ios?.bundleIdentifier || require('./app.json').expo?.android?.package || '')" 2>/dev/null)
[ -z "$BUNDLE_ID" ] && {
echo "⚠ bundle_id not detectable from app.json — user must provide via MAESTRO_APP_ID env"
BUNDLE_ID="${MAESTRO_APP_ID:-}"
}
${PYTHON_BIN} "$WRAPPER" --json launch-app --bundle-id "$BUNDLE_ID" --device "$DEVICE" > "${PHASE_DIR}/launch-${PLATFORM}.json"
# b) Discovery snapshot per goal from TEST-GOALS.md
for GOAL_ID in $(grep -oE 'G-[0-9]+' "${PHASE_DIR}/TEST-GOALS.md" | sort -u); do
narrate_view_scan "${GOAL_ID}@${PLATFORM}" "" "" "${ROLE}" ""
${PYTHON_BIN} "$WRAPPER" --json discover \
--flow "${GOAL_ID}-${PLATFORM}" \
--device "$DEVICE" \
--out-dir "${PHASE_DIR}/discover" \
> "${PHASE_DIR}/discover/${GOAL_ID}-${PLATFORM}.json"
# Output gets: { artifacts: { screenshot, hierarchy } }
# Pass both to Haiku scanner (see step phase2_haiku_scan_mobile below)
done
Haiku scanner — mobile variant:
The scanner skill (vg-haiku-scanner) accepts either browser snapshot
(web path) or Maestro screenshot+hierarchy (mobile path). When mobile
artifacts are present, skill reads hierarchy.json (Maestro's view
hierarchy export) as element tree instead of DOM snapshot. See
vgflow/skills/vg-haiku-scanner/SKILL.md section "Mobile input mode".
Per goal, spawn a Haiku agent with prompt:
Context:
Goal: {G-XX title + success criteria from TEST-GOALS.md}
Platform: {ios|android}
Screenshot: {PHASE_DIR}/discover/{G-XX}-{PLATFORM}.png
Hierarchy: {PHASE_DIR}/discover/{G-XX}-{PLATFORM}.hierarchy.json
Mode: mobile
Output: scan-{G-XX}-{PLATFORM}.json with findings per same schema as web
(view_found, elements_count, issues[], goal_status).
Bounded parallelism:
Same as web — cap at 5 concurrent Haiku agents to avoid rate-limit. Device concurrency is 1 per physical/simulator instance (maestro holds exclusive connection), so platforms run sequentially per device but multiple devices (iOS sim + Android emu) can run parallel.
Artifact contract (MUST match web schema):
Every mobile scan writes scan-{G-XX}-{PLATFORM}.json identical in
shape to web scan-{G-XX}.json. Downstream steps (phase3_fix_loop,
phase4_goal_comparison, /vg:test codegen) do not branch on profile
at artifact-read level — they read scan-*.json agnostic of source.
This keeps Phase 3/4 code zero-touch in the mobile rollout.
## Phase 2.5: VISUAL INTEGRITY CHECKConfig gate: Read visual_checks from vg.config.md. If visual_checks.enabled != true → skip.
Prereq: Phase 2 must have produced RUNTIME-MAP.json with at least 1 view. Missing → skip.
MCP Server: Reuse same $PLAYWRIGHT_SERVER from Phase 2. Do NOT claim new lock.
VISUAL_ISSUES=()
VISUAL_SCREENSHOTS_DIR="${PHASE_DIR}/visual-checks"
mkdir -p "$VISUAL_SCREENSHOTS_DIR"
For each view in RUNTIME-MAP.json:
browser_evaluate:
JavaScript: |
await document.fonts.ready;
const failed = [...document.fonts].filter(f => f.status !== 'loaded');
return failed.map(f => ({ family: f.family, weight: f.weight, style: f.style, status: f.status }));
browser_evaluate:
JavaScript: |
const overflowed = [];
document.querySelectorAll('*').forEach(el => {
const style = getComputedStyle(el);
if (['scroll','auto'].includes(style.overflowY) || ['scroll','auto'].includes(style.overflowX)) return;
if (style.display === 'none' || style.visibility === 'hidden') return;
const vO = el.scrollHeight > el.clientHeight + 2 && style.overflowY === 'hidden';
const hO = el.scrollWidth > el.clientWidth + 2 && style.overflowX === 'hidden';
if (vO || hO) {
const rect = el.getBoundingClientRect();
if (rect.width === 0 || rect.height === 0) return;
overflowed.push({
selector: el.tagName.toLowerCase() + (el.id ? '#'+el.id : '') +
(el.className && typeof el.className === 'string' ? '.'+el.className.trim().split(/\s+/).join('.') : ''),
type: vO ? 'vertical' : 'horizontal',
rect: { top: rect.top, left: rect.left, width: rect.width, height: rect.height }
});
}
});
return overflowed;
browser_resize: { width: viewport_width, height: 900 }
browser_evaluate: "await new Promise(r => setTimeout(r, 500)); return null;"
browser_take_screenshot: { path: "${VISUAL_SCREENSHOTS_DIR}/${view_slug}-${viewport_width}w.png" }
browser_evaluate:
JavaScript: |
return {
hasHorizontalScroll: document.body.scrollWidth > window.innerWidth,
clippedElements: [...document.querySelectorAll('*')]
.filter(el => { const r = el.getBoundingClientRect(); return r.right > window.innerWidth + 5 && r.width > 0 && r.height > 0; })
.slice(0, 10)
.map(el => ({ selector: el.tagName + (el.id ? '#'+el.id : ''), overflow: Math.round(el.getBoundingClientRect().right - window.innerWidth) }))
};
After all viewports: browser_resize: { width: 1920, height: 900 }
For each modal: trigger open → check topmost via document.elementFromPoint at center + corners → screenshot → close.
[{"view":"...","check_type":"font_load_failure","severity":"MAJOR","element":"Inter","viewport":null}]
Issues feed into Phase 3 fix loop: MAJOR = priority fix, MINOR = logged.
Phase 2.5 Visual Integrity:
Views: {N}, Font: {pass}/{total}, Overflow: {pass}/{total}
Responsive: {viewports} x {views} ({issues} issues)
Z-index: {modals} modals ({issues} issues)
MAJOR: {N} → Phase 3 fix loop | MINOR: {N} → logged
After the legacy visual checks (font/overflow/responsive/z-index), run the
Phase 15 visual-fidelity gates. Threshold comes from .fidelity-profile.lock
written by /vg:blueprint step 2_fidelity_profile_lock (D-08).
6a. D-12c — UI flag presence (cheap precondition, runs first):
if [ -x "${REPO_ROOT}/.claude/scripts/validators/verify-phase-ui-flag.py" ]; then
${PYTHON_BIN} "${REPO_ROOT}/.claude/scripts/validators/verify-phase-ui-flag.py" \
--phase "${PHASE_NUMBER}" > "${VG_TMP}/ui-flag.json" 2>&1
UIF=$(${PYTHON_BIN} -c "import json,sys; print(json.load(open('${VG_TMP}/ui-flag.json')).get('verdict','SKIP'))" 2>/dev/null)
case "$UIF" in
PASS|WARN) echo "✓ D-12c UI-flag check: $UIF" ;;
BLOCK) echo "⛔ D-12c UI-flag check: BLOCK — phase declared UI work but UI-MAP.md/design assets missing" >&2; exit 1 ;;
*) echo "ℹ D-12c UI-flag check: $UIF — phase has no UI declaration" ;;
esac
fi
6b. D-12b — Wave-scoped structural drift (per wave that has owned UI subtree):
if [ -f "${PHASE_DIR}/UI-MAP.md" ] \
&& [ -x "${REPO_ROOT}/.claude/scripts/verify-ui-structure.py" ]; then
THRESHOLD=$(${PYTHON_BIN} "${REPO_ROOT}/.claude/scripts/lib/threshold-resolver.py" \
--phase "${PHASE_NUMBER}" 2>/dev/null || echo "0.85")
# Discover waves with owned subtrees by inspecting planner UI-MAP for owner_wave_id values.
WAVES=$(${PYTHON_BIN} -c "
import json, re
text = open('${PHASE_DIR}/UI-MAP.md', encoding='utf-8').read()
m = re.search(r'\`\`\`json\s*\n([\s\S]*?)\n\`\`\`', text)
if not m:
raise SystemExit
data = json.loads(m.group(1))
seen = set()
def walk(n):
if isinstance(n, dict):
if n.get('owner_wave_id'):
seen.add(n['owner_wave_id'])
for c in n.get('children', []) or []:
walk(c)
walk(data.get('root', data))
print(' '.join(sorted(seen)))
" 2>/dev/null)
WAVE_BLOCK=0
for WAVE_ID in $WAVES; do
${PYTHON_BIN} "${REPO_ROOT}/.claude/scripts/verify-ui-structure.py" \
--phase "${PHASE_NUMBER}" \
--scope "owner-wave-id=${WAVE_ID}" \
--threshold "${THRESHOLD}" \
> "${VG_TMP}/drift-${WAVE_ID}.json" 2>&1 || true
V=$(${PYTHON_BIN} -c "import json,sys; print(json.load(open('${VG_TMP}/drift-${WAVE_ID}.json')).get('verdict','SKIP'))" 2>/dev/null)
case "$V" in
PASS|WARN) echo "✓ D-12b drift ${WAVE_ID}: $V (threshold=${THRESHOLD})" ;;
BLOCK)
echo "⛔ D-12b drift ${WAVE_ID}: BLOCK — see ${VG_TMP}/drift-${WAVE_ID}.json" >&2
WAVE_BLOCK=1
;;
*) echo "ℹ D-12b drift ${WAVE_ID}: $V" ;;
esac
done
if [ "$WAVE_BLOCK" = "1" ] && [[ ! "$ARGUMENTS" =~ --allow-wave-drift ]]; then
echo " Override: --allow-wave-drift (logs override-debt as kind=wave-drift-relaxed)" >&2
exit 1
fi
fi
6c. D-12e — Holistic phase-wide drift (runs once at phase end):
if [ -x "${REPO_ROOT}/.claude/scripts/verify-holistic-drift.py" ] \
&& [ -f "${PHASE_DIR}/UI-MAP.md" ]; then
${PYTHON_BIN} "${REPO_ROOT}/.claude/scripts/verify-holistic-drift.py" \
--phase "${PHASE_NUMBER}" \
> "${VG_TMP}/holistic-drift.json" 2>&1 || true
HV=$(${PYTHON_BIN} -c "import json,sys; print(json.load(open('${VG_TMP}/holistic-drift.json')).get('verdict','SKIP'))" 2>/dev/null)
case "$HV" in
PASS|WARN) echo "✓ D-12e holistic drift: $HV" ;;
BLOCK)
echo "⛔ D-12e holistic drift: BLOCK — see ${VG_TMP}/holistic-drift.json" >&2
echo " Wave gates passed but phase-wide composition drifted (e.g., layout fight between waves)." >&2
echo " Override: --allow-holistic-drift" >&2
if [[ ! "$ARGUMENTS" =~ --allow-holistic-drift ]]; then exit 1; fi
;;
*) echo "ℹ D-12e holistic drift: $HV" ;;
esac
fi
6e. L4 — Design-fidelity SSIM gate (NEW, 2026-04-28):
Final safety net for the 4-layer pixel pipeline. L1 (executor reads PNG) +
L2 (LAYOUT-FINGERPRINT) + L3 (build-time render vs baseline) all run
during /vg:build. This gate runs during /vg:review using the live browser
session — if any of the upstream layers were skipped or overridden, this
catches the drift before it leaves the phase. Severity = BLOCK by
design; override --allow-design-drift consumes a rationalization-guard
slot and logs override-debt.
DF_THRESHOLD="$(vg_config_get visual_checks.design_fidelity_threshold_pct 5.0 2>/dev/null || echo 5.0)"
if [ -f "${PHASE_DIR}/RUNTIME-MAP.json" ]; then
DF_PAIRS=$(PYTHONPATH="${REPO_ROOT}/.claude/scripts/lib:${REPO_ROOT}/scripts/lib:${PYTHONPATH:-}" ${PYTHON_BIN} - "${PHASE_DIR}/RUNTIME-MAP.json" "${PHASE_DIR}" "${REPO_ROOT}" "${REPO_ROOT}/.claude/vg.config.md" <<'PY'
import json, sys
from pathlib import Path
from design_ref_resolver import first_screenshot, parse_config_file, resolve_design_assets
rt = json.load(open(sys.argv[1], encoding="utf-8"))
phase_dir = Path(sys.argv[2])
repo_root = Path(sys.argv[3])
config = parse_config_file(Path(sys.argv[4]))
views = rt.get("views") or rt.get("routes") or []
for v in views:
slug = v.get("design_ref") or v.get("design_slug") or v.get("slug")
if not slug:
continue
png = first_screenshot(resolve_design_assets(slug, repo_root=repo_root, phase_dir=phase_dir, config=config))
if not png:
continue
label = v.get("label") or v.get("path") or v.get("url") or slug
url = v.get("url") or v.get("path") or "/"
print(f"{slug}\t{url}\t{png}\t{label}")
PY
)
DF_ISSUES=()
DF_CHECKS=0
if [ -n "$DF_PAIRS" ]; then
mkdir -p "${PHASE_DIR}/visual-fidelity" 2>/dev/null
while IFS=$'\t' read -r DF_SLUG DF_URL DF_BASELINE DF_LABEL; do
[ -z "$DF_SLUG" ] && continue
DF_CHECKS=$((DF_CHECKS + 1))
DF_CURRENT="${PHASE_DIR}/visual-fidelity/${DF_SLUG}.current.png"
DF_DIFF="${PHASE_DIR}/visual-fidelity/${DF_SLUG}.diff.png"
# Reuse the Phase 2 browser session — already navigated + logged in.
# MCP step (orchestrator runs in-context):
# browser_navigate { url: $DF_URL }
# browser_evaluate "await new Promise(r => setTimeout(r, 500))"
# browser_take_screenshot { path: $DF_CURRENT }
# If an MCP step is unavailable, the diff falls back to SKIP and the
# next phase 2.5 sweep will pick the slug up.
if [ ! -f "$DF_CURRENT" ]; then
echo "ℹ L4 fidelity ${DF_SLUG}: SKIP — current screenshot not produced (MCP browser step missing)"
continue
fi
DF_PCT=$(${PYTHON_BIN} - "$DF_CURRENT" "$DF_BASELINE" "$DF_DIFF" <<'PY'
import sys
try:
from PIL import Image
from pixelmatch.contrib.PIL import pixelmatch
except ImportError:
print("-1")
sys.exit(0)
a = Image.open(sys.argv[1]).convert("RGBA")
b = Image.open(sys.argv[2]).convert("RGBA")
if a.size != b.size:
b = b.resize(a.size)
diff = Image.new("RGBA", a.size)
mismatch = pixelmatch(a, b, diff, threshold=0.1)
total = a.size[0] * a.size[1]
pct = (mismatch / total) * 100 if total else 0
diff.save(sys.argv[3])
print(f"{pct:.3f}")
PY
)
if [ "$DF_PCT" = "-1" ]; then
echo "ℹ L4 fidelity ${DF_SLUG}: SKIP — pixelmatch+PIL not installed"
continue
fi
DF_VERDICT=$(${PYTHON_BIN} -c "import sys; print('PASS' if float(sys.argv[1]) <= float(sys.argv[2]) else 'BLOCK')" "$DF_PCT" "$DF_THRESHOLD")
cat > "${PHASE_DIR}/visual-fidelity/${DF_SLUG}.json" <<JSON
{"slug":"${DF_SLUG}","url":"${DF_URL}","label":"${DF_LABEL}","diff_pct":${DF_PCT},"threshold_pct":${DF_THRESHOLD},"verdict":"${DF_VERDICT}","current":"${DF_CURRENT}","baseline":"${DF_BASELINE}","diff":"${DF_DIFF}"}
JSON
if [ "$DF_VERDICT" = "BLOCK" ]; then
DF_ISSUES+=("${DF_SLUG} (${DF_PCT}% > ${DF_THRESHOLD}%)")
echo "⛔ L4 fidelity ${DF_SLUG}: ${DF_PCT}% drift > ${DF_THRESHOLD}% → see ${DF_DIFF}"
else
echo "✓ L4 fidelity ${DF_SLUG}: ${DF_PCT}% (≤ ${DF_THRESHOLD}%)"
fi
done <<< "$DF_PAIRS"
fi
if [ ${#DF_ISSUES[@]} -gt 0 ]; then
echo "⛔ L4 design-fidelity gate: ${#DF_ISSUES[@]} view(s) drift past ${DF_THRESHOLD}%:"
for i in "${DF_ISSUES[@]}"; do echo " - $i"; done
echo " Diffs: ${PHASE_DIR}/visual-fidelity/*.diff.png"
echo " Override: --allow-design-drift (rationalization-guard + override-debt)"
if type -t emit_telemetry_v2 >/dev/null 2>&1; then
emit_telemetry_v2 "review_l4_fidelity" "${PHASE_NUMBER}" "review.phase2_5" \
"design_fidelity" "BLOCK" "{\"count\":${#DF_ISSUES[@]},\"threshold\":${DF_THRESHOLD}}"
fi
if [[ ! "$ARGUMENTS" =~ --allow-design-drift ]]; then exit 1; fi
echo "⚠ --allow-design-drift set — drift accepted; override-debt logged."
elif [ "${DF_CHECKS:-0}" -gt 0 ]; then
echo "✓ L4 design-fidelity gate: ${DF_CHECKS} view(s) within ${DF_THRESHOLD}% of baseline"
fi
fi
Final action: (type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phase2_5_visual_checks" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phase2_5_visual_checks.done"
Config gate:
Read visual_checks.enabled from vg.config.md. If not true → skip with message
and jump to Phase 3.
Prereq: phase2_mobile_discovery produced screenshots in ${PHASE_DIR}/discover/.
Missing → skip + warn: "No mobile discovery artifacts — visual checks require device snapshot first."
Why this step differs from web: mobile devices have fixed viewports per
model (an iPhone 15 Pro IS its viewport). There's no browser resize loop.
Instead we capture multi-device if user listed multiple emulators/simulators
in config.mobile.devices.<plat>[], or re-check the already-captured
screenshots against per-platform sanity rules.
VISUAL_ISSUES=()
VIS_DIR="${PHASE_DIR}/visual-checks"
mkdir -p "$VIS_DIR"
WRAPPER="${REPO_ROOT}/.claude/scripts/maestro-mcp.py"
Parse each ${PHASE_DIR}/discover/*.hierarchy.json. For every text node
with non-empty text, verify corresponding element has frame.height > 0
(i.e. rendered, not invisible font). Missing → MINOR (font not loaded or
style override hiding text).
for HIER in "${PHASE_DIR}"/discover/*.hierarchy.json; do
[ -f "$HIER" ] || continue
MISSING=$(${PYTHON_BIN} - "$HIER" <<'PY'
import json, sys
from pathlib import Path
h = json.loads(Path(sys.argv[1]).read_text(encoding='utf-8'))
def walk(node, out):
if isinstance(node, dict):
text = (node.get('text') or node.get('attributes', {}).get('text') or '').strip()
frame = node.get('frame') or node.get('bounds') or {}
hgt = frame.get('height') if isinstance(frame, dict) else None
if text and isinstance(hgt, (int, float)) and hgt <= 0:
out.append({'text': text[:40], 'height': hgt})
for c in (node.get('children') or []):
walk(c, out)
elif isinstance(node, list):
for c in node:
walk(c, out)
out = []
walk(h, out)
print(json.dumps(out))
PY
)
echo "$MISSING" > "$VIS_DIR/font-missing-$(basename "$HIER" .hierarchy.json).json"
done
Severity: any text-with-zero-height = MINOR (log in VISUAL_ISSUES).
Parse frame coordinates. For each element with frame.y + frame.height > device_height
or frame.x + frame.width > device_width, flag as MAJOR if it's in main
content area, MINOR if near navigation bar.
Device dimensions come from screenshot metadata (PIL image size) — no hardcoded per-device map needed.
If config.mobile.devices.android.emulator_name lists N values (as array
rather than single string), capture a screenshot on each and compare:
... or ellipsis heuristic)?Single-device setups skip this check.
If any hierarchy shows a node with role=Modal or accessibilityTrait=modal,
verify its frame covers the center of the screen AND no sibling has higher
z-order. Maestro hierarchy exposes sibling order via array index; elements
later in array are on top.
cat > "${PHASE_DIR}/visual-issues.json" <<EOF
{
"platform_coverage": $(ls "${PHASE_DIR}"/discover/*.hierarchy.json 2>/dev/null | wc -l),
"issues": [ /* MINOR/MAJOR items collected */ ],
"summary": {"major": N, "minor": N}
}
EOF
MAJOR → handled in Phase 3 fix loop. MINOR → logged only.
Final action: (type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phase2_5_mobile_visual_checks" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phase2_5_mobile_visual_checks.done"
→ narrate_phase "Phase 2.7 — URL state sync" "Kiểm tra interactive_controls trong TEST-GOALS"
Purpose: validate every list/table/grid view goal in TEST-GOALS.md
declares interactive_controls block (filter/sort/pagination/search +
URL sync assertion). This is the static-side complement to runtime
browser probing — declaration must exist before runtime can verify.
CRUD surface precheck (v2.12): before URL-state checks, validate
${PHASE_DIR}/CRUD-SURFACES.md. Review compares runtime observations against
the resource/platform contract first, then uses interactive_controls as the
web-list extension pack. Missing CRUD contract means the reviewer has no
authoritative list of expected headings, filters, columns, states, row actions,
delete confirmations, or security/abuse expectations.
CRUD_FLAGS=""
[[ "${ARGUMENTS:-}" =~ --allow-no-crud-surface ]] && CRUD_FLAGS="--allow-missing"
CRUD_VAL="${REPO_ROOT}/.claude/scripts/validators/verify-crud-surface-contract.py"
if [ -x "$CRUD_VAL" ]; then
mkdir -p "${PHASE_DIR}/.tmp"
"${PYTHON_BIN:-python3}" "$CRUD_VAL" --phase "${PHASE_NUMBER}" \
--config "${REPO_ROOT}/.claude/vg.config.md" ${CRUD_FLAGS} \
> "${PHASE_DIR}/.tmp/crud-surface-review.json" 2>&1
CRUD_RC=$?
if [ "$CRUD_RC" != "0" ]; then
echo "⛔ CRUD surface contract missing/incomplete — see ${PHASE_DIR}/.tmp/crud-surface-review.json"
echo " Fix blueprint artifact CRUD-SURFACES.md or rerun /vg:blueprint."
exit 2
fi
fi
Why: modern dashboard UX baseline (executor R7) requires list view state synced to URL search params. Without declaration, AI executors build local-state-only filters and ship apps that lose state on refresh. This validator catches the gap at /vg:review time, before user sees it.
Severity: config-driven via vg.config.md → ui_state_conventions.severity_phase_cutover
(default 14). Phase number < cutover → WARN (grandfather). Phase ≥ cutover
→ BLOCK (mandatory). Override with --allow-no-url-sync to log soft OD
debt entry.
PYTHON_BIN="${PYTHON_BIN:-python3}"
"${PYTHON_BIN}" .claude/scripts/validators/verify-url-state-sync.py \
--phase "${PHASE_NUMBER}" \
--enforce-required-lenses \
> "${PHASE_DIR}/.tmp/url-state-sync.json" 2>&1
URL_SYNC_RC=$?
if [ "${URL_SYNC_RC}" != "0" ]; then
if [[ "${RUN_ARGS:-}" == *"--allow-no-url-sync"* ]]; then
"${PYTHON_BIN}" .claude/scripts/vg-orchestrator override \
--flag skip-url-state-sync \
--reason "URL state sync waived for ${PHASE_NUMBER} via --allow-no-url-sync (soft debt logged)"
echo "⚠ URL state sync gate waived via --allow-no-url-sync"
else
echo "⛔ URL state sync declarations missing — see ${PHASE_DIR}/.tmp/url-state-sync.json"
cat "${PHASE_DIR}/.tmp/url-state-sync.json"
DIAG_SCRIPT="${REPO_ROOT}/.claude/scripts/review-block-diagnostic.py"
if [ -f "$DIAG_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$DIAG_SCRIPT" \
--gate-id "review.url_state_sync" \
--phase-dir "$PHASE_DIR" \
--input "${PHASE_DIR}/.tmp/url-state-sync.json" \
--out-md "${PHASE_DIR}/.tmp/url-state-sync-diagnostic.md" \
>/dev/null 2>&1 || true
cat "${PHASE_DIR}/.tmp/url-state-sync-diagnostic.md" 2>/dev/null || true
fi
echo ""
echo "Fix options:"
echo " 1. Add interactive_controls blocks to TEST-GOALS.md per goal."
echo " Schema: .claude/commands/vg/_shared/templates/TEST-GOAL-enriched-template.md (Phase J section)."
echo " 2. If state is genuinely local-only, declare url_sync: false + url_sync_waive_reason."
echo " 3. Override (last resort): re-run with --allow-no-url-sync (logs soft OD debt)."
exit 2
fi
fi
Future runtime probe (deferred to v2.9): once RUNTIME-MAP.json is populated by phase 2 browser discovery, a follow-up validator can click each declared control via MCP Playwright + snapshot URL pre/post + assert reload-survives. Static declaration check is the foundation that makes runtime probe meaningful.
Final action: (type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phase2_7_url_state_sync" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phase2_7_url_state_sync.done"
→ narrate_phase "Phase 2.8 — URL state runtime probe" "Click từng control + snapshot URL để verify declaration vs implementation"
Purpose: verify that the static interactive_controls declarations
(checked at phase 2.7) match actual application behaviour. AI drives MCP
Playwright through every declared control, captures URL params before/after
each interaction, writes the result to
${PHASE_DIR}/url-runtime-probe.json. Validator reads that artifact and
flags coverage gaps (WARN) or declaration drift (BLOCK).
Why: static declarations close ~50% of URL-state bugs; runtime probe
catches the remaining drift class — declaration says ?status=... but
the route handler ships ?state=..., or the filter pretends to sync but
no pushState actually fires.
Skip conditions:
interactive_controls.url_sync: true → skip silently.${RUN_ARGS} contains --skip-runtime → run validator with the same flag (logs OD debt).--skip-runtime.For every goal in ${PHASE_DIR}/TEST-GOALS.md that declares
interactive_controls.url_sync: true:
${PHASE_DIR}/RUNTIME-MAP.json (key
matching the goal id) or, when the goal frontmatter carries an explicit
route: field, prefer that.goal.actor (default admin) using the standard
review-phase auth helper.interactive_controls:
values[0], click the filter
control, snapshot URL, then prove visible rows and/or network response
match the selected value. Example: status=pending must not show flagged,
approved, rejected, or failed rows unless the contract explicitly says
flagged is an orthogonal boolean.infinite-scroll),
snapshot URL, then prove the result window changed without duplicated
first-page rows.debounce_ms + 100ms,
snapshot URL, then prove returned rows contain/match the query.${PHASE_DIR}/CRUD-SURFACES.md
platforms.web.list: heading/description presence, declared table columns,
row actions, empty/loading/error/unauthorized states where reachable, and
delete confirmation if a delete action is declared.url-runtime-probe.json.Artifact schema (${PHASE_DIR}/url-runtime-probe.json):
{
"generated_at": "2026-04-26T10:30:00Z",
"goals": [
{
"goal_id": "G-01",
"url": "/admin/campaigns",
"controls": [
{
"kind": "filter",
"name": "status",
"value": "active",
"url_before": "https://app.local:5173/admin/campaigns",
"url_after": "https://app.local:5173/admin/campaigns?status=active",
"url_params_after": {"status": "active"},
"result_semantics": {
"passed": true,
"rows_checked": 20,
"violations": []
}
}
]
}
]
}
kind is one of filter | sort | pagination | search. name matches the
declared control name (or normalised — page for pagination, search for
search, sort for sort). url_params_after is the parsed search-param
dict. For filters, result_semantics is mandatory; URL-only success is not
enough because it misses the class where a Pending tab still renders Flagged
records.
PYTHON_BIN="${PYTHON_BIN:-python3}"
EXTRA_FLAGS=""
if [[ "${RUN_ARGS:-}" == *"--skip-runtime"* ]] || [[ -z "${VG_BROWSER_AVAILABLE:-1}" ]]; then
EXTRA_FLAGS="--skip-runtime"
fi
"${PYTHON_BIN}" .claude/scripts/validators/verify-url-state-runtime.py \
--phase "${PHASE_NUMBER}" ${EXTRA_FLAGS} \
> "${PHASE_DIR}/.tmp/url-state-runtime.json" 2>&1
URL_RUNTIME_RC=$?
if [ "${URL_RUNTIME_RC}" != "0" ]; then
if [[ "${RUN_ARGS:-}" == *"--allow-runtime-drift"* ]]; then
"${PYTHON_BIN}" .claude/scripts/vg-orchestrator override \
--flag skip-url-state-runtime \
--reason "URL state runtime drift waived for ${PHASE_NUMBER} via --allow-runtime-drift (soft debt logged)"
echo "⚠ URL state runtime drift waived via --allow-runtime-drift"
else
echo "⛔ URL state runtime drift detected — see ${PHASE_DIR}/.tmp/url-state-runtime.json"
cat "${PHASE_DIR}/.tmp/url-state-runtime.json"
DIAG_SCRIPT="${REPO_ROOT}/.claude/scripts/review-block-diagnostic.py"
if [ -f "$DIAG_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$DIAG_SCRIPT" \
--gate-id "review.url_state_runtime" \
--phase-dir "$PHASE_DIR" \
--input "${PHASE_DIR}/.tmp/url-state-runtime.json" \
--out-md "${PHASE_DIR}/.tmp/url-state-runtime-diagnostic.md" \
>/dev/null 2>&1 || true
cat "${PHASE_DIR}/.tmp/url-state-runtime-diagnostic.md" 2>/dev/null || true
fi
echo ""
echo "Fix options:"
echo " 1. Implementation drift — fix the route handler / UI so declared url_param actually appears in URL after interaction."
echo " 2. Declaration drift — declared url_param is wrong; update TEST-GOALS.md interactive_controls block."
echo " 3. Override (last resort): re-run with --allow-runtime-drift (logs soft OD debt)."
exit 2
fi
fi
Final action: (type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phase2_8_url_state_runtime" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phase2_8_url_state_runtime.done"
→ narrate_phase "Phase 2.9 — API error-message runtime lens" "Trigger API error paths and prove toast/form errors show API body messages, not HTTP transport text"
Purpose: catch the P3.2 class of bug where the backend returns a useful
domain/validation message but the frontend toast shows Request failed with status 403, statusText, or another generic transport message.
This is a plugin/lens inside review, not a second full browser discovery pass. Reuse the authenticated browser session and routes already discovered by Phase 2. For each API+UI mutation or protected action that can safely fail, drive one negative path and record API body + visible UI message.
For API+UI phases:
${PHASE_DIR}/INTERFACE-STANDARDS.md, ${PHASE_DIR}/API-DOCS.md,
${PHASE_DIR}/API-CONTRACTS.md, and ${PHASE_DIR}/RUNTIME-MAP.json.error.user_message -> error.message -> message -> network_fallback.${PHASE_DIR}/error-message-probe.json.Artifact schema:
{
"generated_at": "2026-05-02T10:30:00Z",
"checks": [
{
"goal_id": "G-01",
"route": "/admin/billing/topup-queue",
"action": "submit invalid filter or mutation",
"request": {"method": "POST", "path": "/api/example"},
"status": 400,
"api_error": {
"code": "VALIDATION_ERROR",
"message": "Amount is required",
"user_message": "Amount is required"
},
"api_user_message": "Amount is required",
"visible_message": "Amount is required",
"passed": true
}
]
}
If a phase has API contracts and UI goals but no reachable negative path, write
the artifact with checks: [] plus blocked_reason, then run the diagnostic.
Do not silently skip.
PYTHON_BIN="${PYTHON_BIN:-python3}"
mkdir -p "${PHASE_DIR}/.tmp" 2>/dev/null
"${PYTHON_BIN}" .claude/scripts/validators/verify-error-message-runtime.py \
--phase "${PHASE_NUMBER}" \
> "${PHASE_DIR}/.tmp/error-message-runtime.json" 2>&1
ERROR_MESSAGE_RC=$?
if [ "${ERROR_MESSAGE_RC}" != "0" ]; then
echo "⛔ API error-message runtime lens failed — see ${PHASE_DIR}/.tmp/error-message-runtime.json"
cat "${PHASE_DIR}/.tmp/error-message-runtime.json"
DIAG_SCRIPT="${REPO_ROOT}/.claude/scripts/review-block-diagnostic.py"
if [ -f "$DIAG_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$DIAG_SCRIPT" \
--gate-id "review.error_message_runtime" \
--phase-dir "$PHASE_DIR" \
--input "${PHASE_DIR}/.tmp/error-message-runtime.json" \
--out-md "${PHASE_DIR}/.tmp/error-message-runtime-diagnostic.md" \
>/dev/null 2>&1 || true
cat "${PHASE_DIR}/.tmp/error-message-runtime-diagnostic.md" 2>/dev/null || true
fi
echo ""
echo "Fix options:"
echo " 1. Backend drift — return the standard API error envelope from INTERFACE-STANDARDS.md."
echo " 2. Frontend drift — use shared error adapter: error.user_message || error.message, never statusText/AxiosError.message."
echo " 3. Probe gap — rerun full review with a safe negative path and write error-message-probe.json."
exit 2
fi
(type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "phase2_9_error_message_runtime" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/phase2_9_error_message_runtime.done"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review phase2_9_error_message_runtime 2>/dev/null || true
## Phase 3: FIX LOOP (max 3 iterations)
→ narrate_phase "Phase 3 — Fix loop (iteration ${I}/3)" "Sửa bug MINOR, escalate MODERATE/MAJOR"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator step-active phase3_fix_loop >/dev/null 2>&1 || true
If no errors found in Phase 2 → skip to Phase 4. If --fix-only → load RUNTIME-MAP, find errors, fix them.
Collect errors from ALL sources:
errors[] array + per-view issues[] + failed goal_sequences + free_exploration issues${PHASE_DIR}/REVIEW-FEEDBACK.md (if exists — written by /vg:test when MODERATE/MAJOR issues found):
Parse issues table → add to error list with severity from test classification
These are issues test couldn't fix — review MUST address them in this fix loop${PLANNING_DIR}/KNOWN-ISSUES.json: issues matching current phase/views (already loaded at init)For each error:
${PLANNING_DIR}/KNOWN-ISSUES.json (see below)When ≥3 SPEC_GAP errors accumulate, or any critical-priority goal maps to SPEC_GAP, emit ${PHASE_DIR}/SPEC-GAPS.md and surface to user with a concrete re-plan command:
# Spec Gaps — Phase {phase}
Detected during /vg:review phase 3b. Listed issues trace to missing CONTEXT decisions or un-tasked PLAN items — not code bugs. Review cannot fix these; blueprint must re-plan.
## Gaps
| # | Observed Issue | Related Goal | Likely Missing | Source Evidence |
|---|----------------|--------------|----------------|-----------------|
| 1 | Site delete has no confirmation modal | G-08 (delete site) | D-XX: "delete requires confirmation" decision | screenshot {phase}-sites-delete-error.png |
| 2 | Bulk import UI absent | G-12 (bulk import) | Task for CSV upload handler + FE form | grep "bulk" in code returns 0 matches |
...
## Recommended action
This is NOT a code bug. Re-run blueprint in patch mode to append tasks covering these gaps:
/vg:blueprint {phase} --from=2a
This spawns planner with the gap list as input. Existing tasks preserved; missing ones appended. Then re-run build → review.
Do NOT attempt to fix these in the review fix loop — the fix loop targets code bugs, not missing scope.
Threshold + auto-suggestion:
SPEC_GAP_COUNT=$(count of SPEC_GAP-classified errors)
CRITICAL_SPEC_GAPS=$(count where related goal is priority:critical)
if [ $SPEC_GAP_COUNT -ge 3 ] || [ $CRITICAL_SPEC_GAPS -ge 1 ]; then
echo "⚠ ${SPEC_GAP_COUNT} spec gaps detected (${CRITICAL_SPEC_GAPS} critical)."
echo "See: ${PHASE_DIR}/SPEC-GAPS.md"
echo ""
echo "This is a planning gap, not a code bug. Recommended:"
echo " /vg:blueprint ${PHASE} --from=2a (re-plan with gap feedback)"
echo ""
echo "Review fix loop will continue for code bugs only; spec gaps stay open until blueprint re-run."
fi
Do NOT block review — let fix loop handle code bugs. Just surface spec gaps with the right next command.
Shared file across all phases: ${PLANNING_DIR}/KNOWN-ISSUES.json
Read existing KNOWN-ISSUES.json (create if missing)
For each PRE-EXISTING error:
Check if already recorded (match by view + description)
IF new → append:
{
"id": "KI-{auto_increment}",
"found_in_phase": "{current phase}",
"view": "{view_path where observed}",
"description": "{what's wrong}",
"evidence": { "network": [...], "console_errors": [...], "screenshot": "..." },
"affects_views": ["{list of views where this issue appears}"],
"suggested_phase": "{phase that owns this area — AI infers from code_patterns}",
"severity": "low|medium|high",
"status": "open"
}
Write back KNOWN-ISSUES.json
Future phases auto-consume: At the start of every review (Phase 2, before discovery), read KNOWN-ISSUES.json → filter issues where suggested_phase matches current phase OR affects_views overlaps with views being reviewed → display to AI as "known issues to verify/fix in this phase".
🎯 3-tier fix routing (tightened 2026-04-17 — cost + context isolation):
Sau khi bug classified ở 3a/3b (MINOR/MODERATE/MAJOR + size metadata), route tới model phù hợp theo config. Main model KHÔNG tự fix mọi thứ — MODERATE phải spawn để isolate context và save main-model tokens.
Config (pure user-side, workflow không giả định model vendor/tier):
# vg.config.md
models:
# Existing keys: planner, executor, debugger
review_fix_inline: <model-id> # model cho MINOR inline (thường = main/planner tier)
review_fix_spawn: <model-id> # model cheaper cho MODERATE + MINOR-big-scope
review:
fix_routing:
minor:
inline_when:
max_files: <int>
max_loc_estimate: <int>
else: "spawn" # route to models.review_fix_spawn
moderate:
action: "spawn" # always route to models.review_fix_spawn
parallel: <bool>
max_concurrent: <int>
major:
action: "escalate" # REVIEW-FEEDBACK.md, không auto-fix
tripwire:
minor_bloat_loc: <int>
action: "warn|rollback"
Workflow CHỈ đọc model id từ config.models.review_fix_inline / review_fix_spawn. Không hardcode tên vendor (Claude/GPT/Gemini), tier (Opus/Sonnet/Haiku, o3/gpt-4o), hay capability.
Thiếu config → fallback: inline = main model hiện tại, spawn = cùng model (degraded — không có cost optimization nhưng vẫn có context isolation).
Algorithm per CODE BUG:
1. Load severity từ error classification (step 3b)
2. Estimate fix scope trước khi fix:
- files_to_touch = heuristic từ error location + related callers
- loc_estimate = peek file around error line, count context
3. Route theo severity:
MINOR + small scope → inline (fast path, main model):
If severity == MINOR AND files <= config.review.fix_routing.minor.inline_when.max_files
AND loc_estimate <= config.review.fix_routing.minor.inline_when.max_loc_estimate:
Main model reads file + edits inline (current behavior)
narrate_fix "[inline] MINOR ${bug_title} (${files} files, ~${loc} LOC)"
MINOR big scope OR MODERATE → spawn (config-driven model):
SPAWN_MODEL="${config.models.review_fix_spawn:-${config.models.executor}}"
Agent(
model="$SPAWN_MODEL",
description="[fix ${idx}/${total}] ${severity} ${file}:${line} — ${bug_type}"
):
prompt = """
Fix this reviewed bug. Focused scope — no tangent changes.
## BUG
Severity: ${severity}
Observed: ${error_description}
Expected: ${expected_behavior}
View: ${view_url}
File hint: ${suspected_file}
Evidence: ${console_errors}, ${network_failures}, ${screenshot}
## CONSTRAINTS
- Touch only files related to this bug
- No refactor/rename unless required for fix
- Write test if missing (project convention)
- Commit: fix(${phase}): ${short description}
- Per CONTEXT.md D-XX OR Covers goal: G-XX in commit body
## RETURN
- Files changed (list)
- LOC delta
- One-line summary
"""
narrate_fix "[spawn:sonnet] ${severity} ${bug_title}"
MAJOR → escalate (no auto-fix):
Append to REVIEW-FEEDBACK.md:
| bug_id | view | severity | description | why_escalated |
narrate_fix "[escalated] MAJOR ${bug_title} → REVIEW-FEEDBACK.md"
Parallel spawning:
Nếu config.review.fix_routing.moderate.parallel: true và có >1 MODERATE bugs độc lập (no shared files):
config.review.fix_routing.moderate.max_concurrent at oncePost-fix tripwire (catch misclassification):
TRIPWIRE_LOC="${config.review.fix_routing.tripwire.minor_bloat_loc:-0}"
TRIPWIRE_ACTION="${config.review.fix_routing.tripwire.action:-warn}"
if [ "$TRIPWIRE_LOC" -gt 0 ]; then
# Check each MINOR-routed-inline fix
for commit in $MINOR_INLINE_COMMITS; do
ACTUAL_LOC=$(git show --stat "$commit" | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '^[0-9]+')
if [ "${ACTUAL_LOC:-0}" -gt "$TRIPWIRE_LOC" ]; then
case "$TRIPWIRE_ACTION" in
rollback)
echo "⛔ MINOR inline fix bloated ($ACTUAL_LOC > $TRIPWIRE_LOC LOC) — rolling back, re-route Sonnet"
git reset --hard "${commit}^"
# Re-queue bug với severity upgrade → MODERATE → spawn Sonnet
;;
warn|*)
echo "⚠ MINOR fix ($commit) bloated: $ACTUAL_LOC LOC > $TRIPWIRE_LOC threshold. Consider re-classify."
echo "tripwire: $commit actual_loc=$ACTUAL_LOC severity=MINOR" >> "${PHASE_DIR}/build-state.log"
;;
esac
fi
done
fi
Narration format:
▶ Fix 1/5: [inline] MINOR edit button label mismatch
✓ Fixed 1 file, 2 LOC
▶ Fix 2/5: [spawn] MODERATE form validation missing on /sites/new
✓ Agent completed: 3 files, 24 LOC (model: ${SPAWN_MODEL})
▶ Fix 3/5: [escalated] MAJOR bulk import UI absent
→ REVIEW-FEEDBACK.md
▶ Fix 4/5: [inline] MINOR CSS overflow on mobile
⚠ Tripwire hit: 45 LOC > 15 threshold — flagged for re-classify
Narrator chỉ hiển thị model id user đã config, KHÔNG hardcode "Sonnet"/"GPT-4o"/etc.
Then for each fixed bug (inline OR via Sonnet):
if [ "$GRAPHIFY_ACTIVE" = "true" ]; then
# Get files changed by this fix
FIXED_FILES=$(git diff --name-only HEAD)
echo "$FIXED_FILES" > "${PHASE_DIR}/.fix-ripple-input.txt"
# Run ripple analysis on fixed files
${PYTHON_BIN} .claude/scripts/build-caller-graph.py \
--changed-files-input "${PHASE_DIR}/.fix-ripple-input.txt" \
--config .claude/vg.config.md \
--graphify-graph "$GRAPHIFY_GRAPH_PATH" \
--output "${PHASE_DIR}/.fix-ripple.json"
# Check if fix affects callers outside the fixed file
RIPPLE_COUNT=$(${PYTHON_BIN} -c "
import json
d = json.load(open('${PHASE_DIR}/.fix-ripple.json'))
callers = d.get('affected_callers', [])
print(len(callers))
")
if [ "$RIPPLE_COUNT" -gt 0 ]; then
echo "⚠ Fix ripple: ${RIPPLE_COUNT} callers may be affected by this change"
echo " Adding caller views to re-verify list (step 3d)"
# Map caller files → views for re-verification in step 3d
RIPPLE_VIEWS=$(${PYTHON_BIN} -c "
import json
d = json.load(open('${PHASE_DIR}/.fix-ripple.json'))
for c in d.get('affected_callers', []):
print(c)
")
fi
fi
Without graphify: step 3d re-verifies affected views by git diff only (may miss indirect callers).fix({phase}): {description}After all fixes:
Redeploy using env-commands.md deploy(env)
Health check → if fail → rollback
After fix+redeploy, spawn Sonnet agents to re-verify affected views + ripple zones:
1. Get new SHA: git rev-parse HEAD
2. git diff old_sha..new_sha → list changed files
3. Map changed files to views (using code_patterns from config):
- Changed API routes → views that call those endpoints
- Changed page components → those specific views
- Graphify ripple callers (from step 3c) → views importing those callers
4. Group affected views + ripple views into zones
5. Spawn Sonnet agents (parallel) for affected zones ONLY:
Agent prompt: "Re-verify these fixed actions in {zone}.
Previous errors: {error list from 3a}
Expected: errors should be resolved.
Test each previously-failed action.
Also check: did the fix break anything else on this view?
Report: {action, was_broken, now_works, new_issues}"
6. Wait all → merge results:
- Fixed errors → update matrix: ❌ → 🔍 REVIEW-PASSED
- Still broken → keep ❌, increment iteration
- New errors from fix → add to error list
- Update RUNTIME-MAP with corrected observations
- Log current build SHA in PIPELINE-STATE.json `steps.review.last_fix_sha`
Repeat 3a-3d until:
Display after each iteration:
Fix iteration {N}/3:
Errors fixed: {N}
Errors remaining: {N} (infra: {N}, spec-gap: {N}, pre-existing: {N})
Sonnet agents spawned: {N} (re-verified {M} views)
New errors found: {N}
Matrix coverage: {review_passed}/{total} goals
Map stable: {YES|NO}
When iter 3 exits with errors STILL remaining (loop hit cap without self-resolving), do NOT silent-BLOCK. Spawn diagnostic_l2 single-advisory fallback:
block_family ∈ {schema_drift, validation_bug,
auth_issue, db_constraint, business_logic, integration_failure,
unknown}.L2Proposal.json with confidence + proposed_fix..l2-proposals/{proposal_id}.json + DEFECT-LOG entry referencing
the proposal.if [ "${ITER:-1}" -eq 3 ] && [ -n "${REMAINING_ERRORS}" ] && \
{ [ -f "${REPO_ROOT}/.claude/scripts/spawn-diagnostic-l2.py" ] || [ -f "${REPO_ROOT}/scripts/spawn-diagnostic-l2.py" ]; }; then
echo "━━━ Phase 3e — Diagnostic L2 fallback (iter 3 hit cap) ━━━"
DIAGNOSTIC_L2="${REPO_ROOT}/.claude/scripts/spawn-diagnostic-l2.py"
[ -f "$DIAGNOSTIC_L2" ] || DIAGNOSTIC_L2="${REPO_ROOT}/scripts/spawn-diagnostic-l2.py"
L2_ARGS=(
--phase "${PHASE_NUMBER}"
--gate-id "review.fix_loop"
--evidence-file "${PHASE_DIR}/.fix-loop-evidence.json"
)
L2_OUT=$("${PYTHON_BIN:-python3}" "$DIAGNOSTIC_L2" \
"${L2_ARGS[@]}" 2>&1)
L2_PROPOSAL_ID=$(echo "$L2_OUT" | ${PYTHON_BIN:-python3} -c "
import json, sys
try: print(json.loads(sys.stdin.read()).get('proposal_id',''))
except: print('')
")
if [ -n "$L2_PROPOSAL_ID" ]; then
echo " L2 proposal generated: $L2_PROPOSAL_ID"
# Open DEFECT-LOG entry referencing the proposal
TESTER_PRO_CLI="${REPO_ROOT}/.claude/scripts/tester-pro-cli.py"
[ -f "$TESTER_PRO_CLI" ] || TESTER_PRO_CLI="${REPO_ROOT}/scripts/tester-pro-cli.py"
if [ -f "$TESTER_PRO_CLI" ]; then
"${PYTHON_BIN:-python3}" "$TESTER_PRO_CLI" defect new \
--phase "${PHASE_NUMBER}" \
--title "[ITER-LIMIT] Fix loop hit max=3, L2 proposal $L2_PROPOSAL_ID" \
--severity major --found-in review \
--notes "L2 proposal at .l2-proposals/${L2_PROPOSAL_ID}.json — user decision pending" \
2>&1 | sed 's/^/ /' || true
fi
# User gate is provider-native after spawn-diagnostic-l2.py:
# Claude Code uses AskUserQuestion; Codex asks in the main thread/UI.
# On accept → run-complete sees applied; on reject → BLOCK below.
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event \
"review.diagnostic_l2_spawned" \
--payload "{\"phase\":\"${PHASE_NUMBER}\",\"proposal_id\":\"$L2_PROPOSAL_ID\"}" \
>/dev/null 2>&1 || true
fi
fi
## Phase 4: GOAL COMPARISONTại sao không tự apply L2 fix: L2 đã sai trong dogfood 3.2 (propose fix giả mà có vẻ hợp lý). User gate là single source of truth cho fix correctness. Audit trail (
.l2-proposals/) cho phép trace sau-incident: proposal nào được accept/reject, fix tham chiếu commit nào.
→ narrate_phase "Phase 4 — Goal comparison" "So khớp ${N} goals từ TEST-GOALS với views đã khám phá"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator step-active phase4_goal_comparison >/dev/null 2>&1 || true
For every TEST-GOALS/G-NN.md with goal_type: mutation, run the runtime
gate. BLOCK review on assertion fail (R8 update_did_not_apply, etc).
Action payload comes from per-phase fixture (FIXTURES/G-NN.action.json).
EVIDENCE_DIR="${PHASE_DIR}/.rcrurd-evidence"
mkdir -p "$EVIDENCE_DIR"
RCRURD_FAILED=0
RCRURD_RAN=0
if [ -d "${PHASE_DIR}/TEST-GOALS" ]; then
for goal in "${PHASE_DIR}/TEST-GOALS"/G-*.md; do
[ -f "$goal" ] || continue
grep -qE "goal_type:[[:space:]]*mutation" "$goal" || continue
RCRURD_RAN=$((RCRURD_RAN+1))
ev_out="${EVIDENCE_DIR}/$(basename "$goal" .md).json"
payload="{}"
fixture="${PHASE_DIR}/FIXTURES/$(basename "$goal" .md).action.json"
[ -f "$fixture" ] && payload=$(cat "$fixture")
"${PYTHON_BIN:-python3}" .claude/scripts/validators/verify-rcrurd-runtime.py \
--goal-file "$goal" \
--phase "${PHASE_NUMBER}" \
--action-payload "$payload" \
--auth-header "$(vg_config_get review.rcrurd_auth_header '')" \
--evidence-out "$ev_out" || RCRURD_FAILED=1
done
fi
if [ "$RCRURD_RAN" -gt 0 ]; then
if [ "$RCRURD_FAILED" = "1" ]; then
echo "⛔ Phase 4.0 RCRURD runtime — at least one mutation goal failed (of ${RCRURD_RAN} run)"
echo " Evidence: ${EVIDENCE_DIR}/*.json"
echo " Route through classifier (Task 7) — most are IN_SCOPE for current phase"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event \
"review.rcrurd_runtime_failed" \
--payload "{\"phase\":\"${PHASE_NUMBER}\",\"evidence_dir\":\"${EVIDENCE_DIR}\",\"goals_run\":${RCRURD_RAN}}" \
2>/dev/null || true
exit 1
fi
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event \
"review.rcrurd_runtime_passed" \
--payload "{\"phase\":\"${PHASE_NUMBER}\",\"goals_run\":${RCRURD_RAN}}" \
2>/dev/null || true
fi
Read ${PHASE_DIR}/TEST-GOALS.md (generated by /vg:blueprint).
If missing → generate from CONTEXT.md + API-CONTRACTS.md (fallback).
P1 v2.49+ — Edge case variants: also load ${PHASE_DIR}/EDGE-CASES/G-NN.md
per goal. Status format extended:
G-04: PASS / G-04: FAIL / G-04: NOT_TESTEDG-04: PASS (5/6 variants — G-04-c1 NOT_TESTED [needs concurrency harness])For each variant in EDGE-CASES/G-NN.md:
start_view (per RUNTIME-MAP) with variant's inputSkip variants when:
review.edge_cases_unavailable (severity=warn) + treat goal as 1-variant--skip-low-edge-cases flag setEmit per gate-blocked variant: review.edge_case_variant_blocked with
{goal_id, variant_id, reason} payload.
Parse goals: ID, description, success criteria, mutation evidence, dependencies, priority.
Surface classification (v1.9.1 R1 — lazy migration, runs BEFORE browser discover decisions):
# shellcheck source=_shared/lib/goal-classifier.sh
. .claude/commands/vg/_shared/lib/goal-classifier.sh
set +e
classify_goals_if_needed "${PHASE_DIR}/TEST-GOALS.md" "${PHASE_DIR}"
gc_rc=$?
set -e
# rc=2 → provider-native cheap classifier
# Claude: Haiku Task per row; Codex: read-only scanner adapter over pending TSV
# rc=3 → provider-native prompt (surface list from config), then classify_goals_apply
Parse **Surface:** <name> per goal.
Surface-aware routing (tightened v1.9.1 R1):
For each goal:
surface == "ui" / "ui-mobile" → proceed with existing browser RUNTIME-MAP lookup below.surface ∈ { api, data, time-driven, integration, custom } → skip browser discover for this goal; instead run lightweight surface probe:
api → grep apps/**/src/** for route handler matching contract path → READY if present.data → grep migrations + config.infra_deps for table/collection → READY if present; INFRA_PENDING if service unavailable.time-driven→ grep cron/scheduler registration in apps/workers/**/apps/api/** → READY if handler wired.integration→ check ${PHASE_DIR}/test-runners/fixtures/${gid}.integration.sh exists AND downstream caller found → READY.Result feeds GOAL-COVERAGE-MATRIX with (status, surface, probe_evidence).
Pure-backend fast-path:
UI_GOAL_COUNT=$(grep -c '^\*\*Surface:\*\* ui' "${PHASE_DIR}/TEST-GOALS.md" || echo 0)
if [ "$UI_GOAL_COUNT" -eq 0 ]; then
echo "🧭 Pure-backend phase (không có goal UI) — bỏ qua browser discovery (khám phá trình duyệt), dùng surface probes." >&2
# Emit empty RUNTIME-MAP if not written yet, skip to 4b
[ -f "${PHASE_DIR}/RUNTIME-MAP.json" ] || echo '{"views":{},"goal_sequences":{}}' > "${PHASE_DIR}/RUNTIME-MAP.json"
# Issue #120: runtime_contract still requires one root scan-*.json artifact
# even when backend-only review legitimately skips browser discovery. Emit a
# synthetic backend scan so run-complete does not false-block on must_write.
BACKEND_SCAN_JSON="${PHASE_DIR}/scan-backend-surface-probes.json"
if [ ! -f "$BACKEND_SCAN_JSON" ]; then
"${PYTHON_BIN:-python3}" - "$BACKEND_SCAN_JSON" <<'PY'
import json
import sys
from datetime import datetime, timezone
from pathlib import Path
payload = {
"view": "backend://surface-probes",
"surface": "backend",
"generated_by": "phase4_goal_comparison.pure_backend_fastpath",
"generated_at": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
"results": [],
"forms": [],
"tables": [],
"modal_triggers": [],
"sub_views_discovered": [],
}
Path(sys.argv[1]).write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
PY
fi
fi
Mixed-phase surface probe execution (v1.9.2.3 P3):
For phases có CẢ UI goals (cần browser) VÀ backend goals (api/data/integration/time-driven), browser phase chỉ cover UI goals. Backend goals PHẢI được probe SEPARATELY để avoid rơi vào NOT_SCANNED branch.
# Run surface probes cho goals có surface ≠ ui
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/surface-probe.sh" 2>/dev/null || true
if type -t run_surface_probe >/dev/null 2>&1; then
PROBE_RESULTS_JSON="${PHASE_DIR}/.surface-probe-results.json"
echo '{"probed_at":"'"$(date -u +%FT%TZ)"'","results":{' > "$PROBE_RESULTS_JSON"
FIRST=true
# Extract goal_id + surface pairs from TEST-GOALS.md
${PYTHON_BIN} -c "
import re
tg = open('${PHASE_DIR}/TEST-GOALS.md', encoding='utf-8').read()
for gid, surface in re.findall(r'^## Goal (G-[\w]+):.*?^\*\*Surface:\*\* (\w[\w-]*)', tg, re.M|re.S):
print(f'{gid} {surface}')
" | while read -r gid surface; do
surface="${surface%$'\r'}"
# Skip UI — browser phase handles them
[ "$surface" = "ui" ] || [ "$surface" = "ui-mobile" ] && continue
PROBE=$(run_surface_probe "$gid" "$surface" "$PHASE_DIR" 2>/dev/null)
STATUS=$(echo "$PROBE" | cut -d'|' -f1)
EVIDENCE=$(echo "$PROBE" | cut -d'|' -f2- | sed 's/"/\\"/g')
[ "$FIRST" = "true" ] && FIRST=false || echo "," >> "$PROBE_RESULTS_JSON"
printf '"%s":{"surface":"%s","status":"%s","evidence":"%s"}' \
"$gid" "$surface" "$STATUS" "$EVIDENCE" >> "$PROBE_RESULTS_JSON"
done
echo '}}' >> "$PROBE_RESULTS_JSON"
# Summary narration
PROBED=$(${PYTHON_BIN} -c "
import json
d = json.load(open('$PROBE_RESULTS_JSON'))['results']
from collections import Counter
c = Counter(r['status'] for r in d.values())
print(f'Phase 4a surface probes: {len(d)} backend goals probed → {dict(c)}')")
echo "▸ $PROBED"
# v2.48.1 (Issue #85) — backfill synthetic goal_sequences[gid] for non-UI
# goals from probe results so verify-matrix-evidence-link.py (which only
# inspects RUNTIME-MAP goal_sequences[]) sees backend evidence. Closes the
# surface-probe schema gap that BLOCKed Phase 3.2 dogfood with 32 non-UI
# READY goals flagged matrix_status_without_runtime_sequence.
# Idempotent: re-runs overwrite synthetic entries by gid, never overwrites
# real browser-recorded sequences.
if [ -f "${REPO_ROOT}/.claude/scripts/backfill-surface-probe-runtime.py" ]; then
"${PYTHON_BIN:-python3}" "${REPO_ROOT}/.claude/scripts/backfill-surface-probe-runtime.py" \
--phase-dir "$PHASE_DIR" 2>&1 | sed 's/^/▸ /' || true
fi
fi
Phase 4b integration: Khi check goal_sequences cho backend goals (surface ≠ ui), trước khi mark NOT_SCANNED hãy check .surface-probe-results.json:
Infra dependency filter (config-driven):
If goal has **Infra deps:** field (e.g., [clickhouse, kafka, pixel_server]):
# Check each infra dep against current environment
for dep in goal.infra_deps:
SERVICE_CHECK=$(read config.infra_deps.services[dep].check_${ENV})
if ! eval "$SERVICE_CHECK" 2>/dev/null; then
goal.status = "INFRA_PENDING"
goal.skip_reason = "${dep} not available on ${ENV}"
fi
done
Goals classified as INFRA_PENDING are excluded from gate calculation (when config.infra_deps.unmet_behavior == "skip"). They don't count as BLOCKED or FAIL — they're simply not testable on current environment.
Display: INFRA_PENDING ({dep}) in matrix with distinct icon.
Console noise filter (config-driven):
When evaluating console errors from Phase 2 discovery, filter against config.console_noise.patterns:
if [ "${config_console_noise_enabled}" = "true" ]; then
for pattern in config.console_noise.patterns:
# Remove matching errors from bug list — classify as INFRA_NOISE
REAL_ERRORS=$(echo "$ALL_CONSOLE_ERRORS" | grep -viE "$pattern")
done
NOISE_COUNT=$((TOTAL_ERRORS - REAL_ERROR_COUNT))
echo "Console: ${REAL_ERROR_COUNT} real errors, ${NOISE_COUNT} infra noise (filtered)"
fi
Only REAL_ERRORS (not matching noise patterns) count as view failures.
For each goal, check goal_sequences in RUNTIME-MAP.json:
For each goal:
IF goal_sequences[goal_id] exists AND result == "passed":
→ STATUS: READY (goal was verified during Pass 2a)
IF goal_sequences[goal_id] exists AND result == "failed":
→ STATUS: BLOCKED (with specific failure steps from goal_sequence)
IF goal_sequences[goal_id] does NOT exist:
# Before marking UNREACHABLE, verify code presence to distinguish
# true "not built" from "built but not scanned"
code_exists = check via grep against config.code_patterns:
- Does goal's expected page file exist? (e.g., FloorRulesListPage.tsx)
- Is the route registered? (e.g., /floor-rules in router)
- Do related API endpoints have handlers? (grep API-CONTRACTS vs apps/api/)
IF code_exists == FALSE:
→ STATUS: UNREACHABLE (feature not built — fix with /vg:build --gaps-only)
IF code_exists == TRUE:
→ STATUS: NOT_SCANNED (intermediate only — MUST resolve before review exits)
Root cause likely one of:
- Multi-step wizard/mutation needs dedicated browser session
- Goal path not reachable from discovered sidebar (orphan route)
- Review ran --retry-failed but this goal wasn't in retry set
- Haiku agent timed out or skipped
- Goal has no UI surface but TEST-GOALS didn't mark infra_deps
→ RESOLUTION (tightened 2026-04-17 — NOT_SCANNED không được defer sang /vg:test):
NOT_SCANNED là trạng thái TRUNG GIAN, KHÔNG phải kết luận hợp lệ.
Review PHẢI resolve thành 1 trong 4 status kết luận: READY | BLOCKED | UNREACHABLE | INFRA_PENDING
Cách resolve (pick 1):
a) /vg:review {phase} --retry-failed với deeper probe (nếu timeout/depth issue)
b) Goal không có UI surface → update TEST-GOALS với `**Infra deps:** [<user-defined no-ui tag>]` → re-classify INFRA_PENDING (tag value do user định nghĩa trong config.infra_deps, workflow không hardcode)
c) Orphan/hidden route → verify config.code_patterns.frontend_routes đã cover pattern đó
d) Genuinely unreachable (feature đã build nhưng UX path không exist) → manually mark UNREACHABLE with reason note
Status semantics (tightened 2026-04-17):
4 status kết luận hợp lệ (chỉ 4 status này được write vào GOAL-COVERAGE-MATRIX final):
| Status | Meaning | Fix command |
|---|---|---|
| READY | Goal verified, evidence in goal_sequences | none |
| BLOCKED | View found, scan ran, criteria failed | fix code → --retry-failed |
| UNREACHABLE | Code not in repo / UX path không exist | /vg:build --gaps-only |
| INFRA_PENDING | Goal needs service/infra not available on ENV | deploy infra or sandbox |
2 status trung gian (PHẢI resolve trước khi exit Phase 4):
| Status | Meaning | Action BẮT BUỘC |
|---|---|---|
| NOT_SCANNED | Code exists, review didn't replay | --retry-failed HOẶC re-classify thành 1 trong 4 status trên |
| FAILED | Scan timeout/exception | check logs → --retry-failed |
⛔ GLOBAL RULE: KHÔNG được defer NOT_SCANNED sang /vg:test.
Lý do: /vg:test codegen LẤY steps từ goal_sequences[] mà review ghi. NOT_SCANNED = review không ghi sequence = codegen không có input. Test không phải fallback cho review miss.
Goals không có UI surface đúng ra phải mark infra_deps: [<no-ui tag>] trong TEST-GOALS (tag value do project config quy ước) → skip ở review (INFRA_PENDING) → test qua integration/unit layer ở build phase, KHÔNG qua /vg:test E2E.
Trước khi chạy weighted gate, PHẢI resolve mọi NOT_SCANNED + FAILED thành 1 trong 4 kết luận.
# OHOK-8 round-4 Codex fix: replace pseudocode with real bash grep.
# Previously `count goals where status == "NOT_SCANNED"` was not executable
# → gate couldn't run → NOT_SCANNED goals slipped through unresolved.
MATRIX="${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md"
NOT_SCANNED_COUNT=$(grep -cE '^\| G-[0-9]+.*\|[[:space:]]*NOT_SCANNED[[:space:]]*\|' "$MATRIX" 2>/dev/null || echo 0)
FAILED_COUNT=$(grep -cE '^\| G-[0-9]+.*\|[[:space:]]*FAILED[[:space:]]*\|' "$MATRIX" 2>/dev/null || echo 0)
INTERMEDIATE=$((NOT_SCANNED_COUNT + FAILED_COUNT))
# Build the list of intermediate goal IDs (used later in override auto-convert)
INTERMEDIATE_GOALS=$(grep -oE '^\| (G-[0-9]+)[^|]*\|[^|]*\|[^|]*\|[[:space:]]*(NOT_SCANNED|FAILED)[[:space:]]*\|' "$MATRIX" 2>/dev/null \
| grep -oE 'G-[0-9]+' | sort -u | tr '\n' ' ')
if [ "$INTERMEDIATE" -gt 0 ]; then
echo "⛔ Review cannot exit Phase 4 — ${INTERMEDIATE} intermediate goals:"
echo " NOT_SCANNED: ${NOT_SCANNED_COUNT}"
echo " FAILED: ${FAILED_COUNT}"
echo ""
echo "Intermediate ≠ conclusion. Resolve before exit:"
echo " a) /vg:review ${PHASE_NUMBER} --retry-failed (deeper probe)"
echo " b) Update TEST-GOALS with 'Infra deps: [backend_only]' nếu goal không có UI"
echo " → re-classify INFRA_PENDING"
echo " c) Fix config.code_patterns.frontend_routes nếu route ẩn khỏi sidebar"
echo " d) Manual re-classify UNREACHABLE (feature không tồn tại) với reason note"
echo ""
echo "⛔ KHÔNG ĐƯỢC defer sang /vg:test để 'cover' NOT_SCANNED goals."
echo " Test codegen lấy input từ goal_sequences review ghi. NOT_SCANNED = no input."
echo ""
echo "Override (NOT RECOMMENDED — creates debt):"
echo " /vg:review ${PHASE_NUMBER} --allow-intermediate"
echo " → Auto-convert remaining NOT_SCANNED → UNREACHABLE với reason='review-skip'"
echo " → Logged to GOAL-COVERAGE-MATRIX.md 'Debt' section"
if [[ ! "$ARGUMENTS" =~ --allow-intermediate ]]; then
# v1.9.1 R2+R4: block-resolver — try L1 auto-fix (re-scan failed goals) before demanding user override.
# If L1 fails, L2 architect proposal is presented through provider-native L3 prompt.
# L4 only when L2 proposal rejected AND no user direction.
# See _shared/lib/block-resolver.sh
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/block-resolver.sh" 2>/dev/null || true
if type -t block_resolve >/dev/null 2>&1; then
export VG_CURRENT_PHASE="$PHASE_NUMBER" VG_CURRENT_STEP="review.4c-pre"
BR_GATE_CONTEXT="NOT_SCANNED/FAILED goals block review exit. ${INTERMEDIATE} intermediate goals need conclusion (READY/BLOCKED/UNREACHABLE/INFRA_PENDING)."
BR_EVIDENCE=$(printf '{"not_scanned":%d,"failed":%d,"total_intermediate":%d}' "$NOT_SCANNED_COUNT" "$FAILED_COUNT" "$INTERMEDIATE")
BR_CANDIDATES='[{"id":"retry-failed-scan","cmd":"echo retry-failed auto-fix would re-trigger scanner for FAILED goals only; skipping in safe mode","confidence":0.5,"rationale":"retry-failed probe may reclassify goals without human override"}]'
BR_RESULT=$(block_resolve "not-scanned-defer" "$BR_GATE_CONTEXT" "$BR_EVIDENCE" "$PHASE_DIR" "$BR_CANDIDATES")
BR_LEVEL=$(echo "$BR_RESULT" | ${PYTHON_BIN} -c "import json,sys; print(json.loads(sys.stdin.read()).get('level',''))" 2>/dev/null)
if [ "$BR_LEVEL" = "L1" ]; then
echo "✓ Block resolver L1 self-resolved — intermediate goals auto-fixed"
elif [ "$BR_LEVEL" = "L2" ]; then
block_resolve_l2_handoff "not-scanned-defer" "$BR_RESULT" "$PHASE_DIR"
echo " Để proceed sau khi user chấp nhận proposal: re-run /vg:review ${PHASE_NUMBER} --allow-intermediate --reason='<applied proposal>'" >&2
exit 2
else
# L4 truly stuck — print human-direction message
block_resolve_l4_stuck "not-scanned-defer" "L1 failed + L2 produced no actionable proposal"
exit 1
fi
else
exit 1
fi
else
# v1.9.0 T1: rationalization guard — NOT_SCANNED defer is a classic rationalization surface.
RATGUARD_RESULT=$(rationalization_guard_check "not-scanned-defer" \
"NOT_SCANNED = review didn't replay the goal sequence. Deferring = test codegen has no input. Auto-UNREACHABLE hides coverage debt." \
"intermediate_goals=${INTERMEDIATE_GOALS} not_scanned=${NOT_SCANNED_COUNT} failed=${FAILED_COUNT}")
if ! rationalization_guard_dispatch "$RATGUARD_RESULT" "not-scanned-defer" "--allow-intermediate" "$PHASE_NUMBER" "review.4c-pre" "${INTERMEDIATE} intermediate goals"; then
exit 1
fi
# OHOK-8 round-4 Codex fix: update_goal_status was undefined function.
# Replaced with real bash sed that rewrites matrix row in-place.
# Auto-convert intermediate → UNREACHABLE với audit trail.
TS=$(date -u +%FT%TZ)
for gid in $INTERMEDIATE_GOALS; do
# Match row `| G-XX |...|...|...| (NOT_SCANNED|FAILED) |`, replace
# status column only. Preserve other columns. Use | delimiter in sed
# to avoid conflicts with pipe chars in evidence.
sed -i -E "s|^(\| ${gid} \|[^|]+\|[^|]+\|[^|]+\|)[[:space:]]*(NOT_SCANNED\|FAILED)[[:space:]]*\|(.*)$|\1 UNREACHABLE |review-skip-\2 @${TS}\3|" \
"$MATRIX" 2>/dev/null || true
done
echo "intermediate-override: ${INTERMEDIATE} goals auto-converted UNREACHABLE ts=$(date -u +%FT%TZ)" \
>> "${PHASE_DIR}/build-state.log"
fi
fi
# Call matrix-merger.sh helper — reads RUNTIME-MAP + probe-results + TEST-GOALS,
# computes per-goal status with priority precedence (browser > probe > code_exists),
# writes canonical GOAL-COVERAGE-MATRIX.md with summary + by-priority + details + gate.
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/matrix-merger.sh" 2>/dev/null || true
if type -t merge_and_write_matrix >/dev/null 2>&1; then
MERGE_OUTPUT=$(merge_and_write_matrix "$PHASE_DIR" \
"${PHASE_DIR}/TEST-GOALS.md" \
"${PHASE_DIR}/RUNTIME-MAP.json" \
"${PHASE_DIR}/.surface-probe-results.json" \
"${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md" 2>&1)
# Extract machine-readable counts + verdict
VERDICT=$(echo "$MERGE_OUTPUT" | grep '^VERDICT=' | cut -d= -f2)
READY=$(echo "$MERGE_OUTPUT" | grep '^READY=' | cut -d= -f2)
BLOCKED=$(echo "$MERGE_OUTPUT" | grep '^BLOCKED=' | cut -d= -f2)
NOT_SCANNED=$(echo "$MERGE_OUTPUT" | grep '^NOT_SCANNED=' | cut -d= -f2)
INTERMEDIATE=$(echo "$MERGE_OUTPUT" | grep '^INTERMEDIATE=' | cut -d= -f2)
export VERDICT READY BLOCKED NOT_SCANNED INTERMEDIATE
echo "✓ GOAL-COVERAGE-MATRIX.md: VERDICT=$VERDICT (ready=$READY blocked=$BLOCKED not_scanned=$NOT_SCANNED)"
else
echo "⚠ matrix-merger.sh missing — falling back to manual matrix write (legacy path)"
# Legacy path: orchestrator writes matrix directly using template below
fi
# Defense-in-depth: matrix-merger now downgrades shallow mutation sequences, but
# keep an explicit validator so legacy/hand-written RUNTIME-MAP files cannot
# mark create/update/delete goals READY from list-only evidence.
CRUD_DEPTH_VAL="${REPO_ROOT}/.claude/scripts/validators/verify-runtime-map-crud-depth.py"
if [ -f "$CRUD_DEPTH_VAL" ]; then
mkdir -p "${PHASE_DIR}/.tmp"
"${PYTHON_BIN:-python3}" "$CRUD_DEPTH_VAL" --phase "${PHASE_NUMBER}" \
> "${PHASE_DIR}/.tmp/runtime-map-crud-depth-review.json" 2>&1
CRUD_DEPTH_RC=$?
if [ "$CRUD_DEPTH_RC" != "0" ]; then
echo "⛔ Runtime map CRUD depth gate failed — see ${PHASE_DIR}/.tmp/runtime-map-crud-depth-review.json"
echo " Mutation goals require observed POST/PUT/PATCH/DELETE + persistence proof."
echo " Re-run /vg:review ${PHASE_NUMBER} with deeper CRUD interaction before /vg:test."
exit 1
fi
fi
# v2.35.0 verdict gate hardening (closes #51) — 3 invariants replacing path-existence checks
# Override per-phase: --skip-content-invariants=<reason> logs OVERRIDE-DEBT
if [[ ! "$ARGUMENTS" =~ --skip-content-invariants ]]; then
for VALIDATOR in verify-interface-standards verify-goal-security verify-goal-perf verify-security-baseline verify-haiku-scan-completeness verify-runtime-map-coverage verify-crud-runs-coverage verify-error-message-runtime; do
VAL_PATH="${REPO_ROOT}/.claude/scripts/validators/${VALIDATOR}.py"
if [ -f "$VAL_PATH" ]; then
mkdir -p "${PHASE_DIR}/.tmp"
VAL_OUT="${PHASE_DIR}/.tmp/${VALIDATOR}-diagnostic-input.txt"
case "$VALIDATOR" in
verify-interface-standards)
${PYTHON_BIN:-python3} "$VAL_PATH" --phase "${PHASE_NUMBER}" --profile "${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}" > "$VAL_OUT" 2>&1
;;
verify-error-message-runtime)
${PYTHON_BIN:-python3} "$VAL_PATH" --phase "${PHASE_NUMBER}" > "$VAL_OUT" 2>&1
;;
verify-goal-security|verify-goal-perf)
${PYTHON_BIN:-python3} "$VAL_PATH" --phase "${PHASE_NUMBER}" > "$VAL_OUT" 2>&1
;;
verify-security-baseline)
${PYTHON_BIN:-python3} "$VAL_PATH" --phase "${PHASE_NUMBER}" --scope all > "$VAL_OUT" 2>&1
;;
*)
${PYTHON_BIN:-python3} "$VAL_PATH" --phase-dir "$PHASE_DIR" > "$VAL_OUT" 2>&1
;;
esac
VAL_RC=$?
cat "$VAL_OUT"
if [ "$VAL_RC" -ne 0 ]; then
echo ""
echo "⛔ Verdict gate invariant FAILED: ${VALIDATOR}"
echo " v2.35.0 hardened gate: review cannot PASS with empty/incomplete artifacts."
echo " Either re-run /vg:review ${PHASE_NUMBER} with proper scanner/dispatch coverage,"
echo " or pass --skip-content-invariants=\"<reason>\" to log OVERRIDE-DEBT."
DIAG_SCRIPT="${REPO_ROOT}/.claude/scripts/review-block-diagnostic.py"
if [ -f "$DIAG_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$DIAG_SCRIPT" \
--gate-id "review.${VALIDATOR}" \
--phase-dir "$PHASE_DIR" \
--input "$VAL_OUT" \
--out-md "${PHASE_DIR}/.tmp/${VALIDATOR}-diagnostic.md" \
>/dev/null 2>&1 || true
cat "${PHASE_DIR}/.tmp/${VALIDATOR}-diagnostic.md" 2>/dev/null || true
fi
emit_telemetry_v2 "review_verdict_invariant_failed" "${PHASE_NUMBER}" \
"review.4-verdict" "${VALIDATOR}" "BLOCK" "{}" 2>/dev/null || true
exit 1
fi
fi
done
fi
LENS_PLAN_SCRIPT="${REPO_ROOT}/.claude/scripts/review-lens-plan.py"
if [ -f "$LENS_PLAN_SCRIPT" ] && [[ ! "$ARGUMENTS" =~ --skip-lens-plan-gate ]]; then
mkdir -p "${PHASE_DIR}/.tmp" 2>/dev/null
"${PYTHON_BIN:-python3}" "$LENS_PLAN_SCRIPT" \
--phase-dir "$PHASE_DIR" \
--profile "${PROFILE:-${CONFIG_PROFILE:-web-fullstack}}" \
--mode "${REVIEW_MODE:-full}" \
--validate-only \
--json \
> "${PHASE_DIR}/.tmp/review-lens-plan-validation.json" 2>&1
LENS_GATE_RC=$?
if [ "$LENS_GATE_RC" -ne 0 ]; then
echo ""
echo "⛔ Review lens plan gate FAILED — required checklist plugins lack evidence."
echo " See ${PHASE_DIR}/.tmp/review-lens-plan-validation.json"
DIAG_SCRIPT="${REPO_ROOT}/.claude/scripts/review-block-diagnostic.py"
if [ -f "$DIAG_SCRIPT" ]; then
"${PYTHON_BIN:-python3}" "$DIAG_SCRIPT" \
--gate-id "review.lens_plan_gate" \
--phase-dir "$PHASE_DIR" \
--input "${PHASE_DIR}/.tmp/review-lens-plan-validation.json" \
--out-md "${PHASE_DIR}/.tmp/review-lens-plan-diagnostic.md" \
>/dev/null 2>&1 || true
cat "${PHASE_DIR}/.tmp/review-lens-plan-diagnostic.md" 2>/dev/null || true
fi
echo " Re-run /vg:review ${PHASE_NUMBER} --mode=full --force so API docs, browser inventory, URL-state/filter/paging, error-message, visual, and findings lenses execute."
exit 1
fi
fi
Generated matrix format (canonical, from matrix-merger):
# Goal Coverage Matrix — Phase {phase}
**Generated:** {ISO-timestamp}
**Source:** RUNTIME-MAP.json + .surface-probe-results.json
**Merger:** _shared/lib/matrix-merger.sh v1.9.2.4
## Summary
- Total goals: {N}
- READY: {N}
- BLOCKED: {N}
- NOT_SCANNED: {N} (intermediate)
- UNREACHABLE: {N}
- INFRA_PENDING: {N}
- FAILED: {N} (intermediate)
## By Priority
| Priority | Ready | Blocked | Other | Total | Threshold | Pass % | Status |
|----------|-------|---------|-------|-------|-----------|--------|--------|
| critical | {N} | {N} | {N} | {N} | 100% | {X%} | ✅ PASS/⛔ BLOCK |
| important | {N} | {N} | {N} | {N} | 80% | {X%} | ... |
| nice-to-have | {N} | {N} | {N} | {N} | 50% | {X%} | ... |
## Goal Details
| Goal | Priority | Surface | Status | Evidence |
|------|----------|---------|--------|----------|
| G-01 | critical | api | READY | handler=apps/api/src/... |
## Gate: ✅/⛔/⚠️ {VERDICT}
{PASS|BLOCK|INTERMEDIATE message with next-action hints}
Triage chạy inline ngay sau matrix ghi, TRƯỚC 100% gate. Mục đích: đọc scope tag (depends_on_phase, verification_strategy) từ CONTEXT.md, phân loại mỗi UNREACHABLE thành verdict + action_required, rồi áp dụng action nào autonomous được (mark_deferred/mark_manual). Các action cần người quyết định (spawn_fix_agent, draft_amendment_ask, prompt_scope_tag) sẽ ghi vào hàng đợi nhưng vẫn BLOCK gate — không có đường thoát ngụỵ trang.
session_mark_step "4d-inline-triage"
echo ""
echo "🔍 Triage + áp dụng action cho UNREACHABLE goals (v1.14.0+)..."
# v1.14.3 H3 — pre-scan test source for @deferred markers so triage sees them
# alongside scope tags. Fixes gap where executor-written it.skip('@deferred X')
# was ignored (tests were skipped but matrix still BLOCKED).
DEFER_SCANNER=".claude/scripts/scan-deferred-tests.py"
if [ -f "$DEFER_SCANNER" ]; then
echo "▸ Pre-scan: @deferred markers in test source..."
${PYTHON_BIN:-python3} "$DEFER_SCANNER" \
--phase-dir "${PHASE_DIR}" --repo-root "${REPO_ROOT:-.}" 2>&1 | tail -12 || true
# Writes .deferred-tests.json — consumed by unreachable-triage below
fi
# Chạy triage (sinh .unreachable-triage.json + UNREACHABLE-TRIAGE.md)
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/unreachable-triage.sh" 2>/dev/null || true
if type -t triage_unreachable_goals >/dev/null 2>&1; then
triage_unreachable_goals "$PHASE_DIR" "$PHASE_NUMBER"
else
echo "⚠ unreachable-triage.sh missing — triage bị bỏ qua, 100% gate sẽ hard-block mọi UNREACHABLE." >&2
fi
TRIAGE_JSON="${PHASE_DIR}/.unreachable-triage.json"
MATRIX="${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md"
if [ -f "$TRIAGE_JSON" ] && [ -f "$MATRIX" ]; then
# Áp dụng action_required autonomous: mark_deferred, mark_manual
# Những action còn lại (spawn_fix_agent, draft_amendment_ask, prompt_scope_tag) ghi log + để BLOCK.
PYTHONIOENCODING=utf-8 ${PYTHON_BIN} - "$TRIAGE_JSON" "$MATRIX" "$PHASE_DIR" "$PHASE_NUMBER" <<'PY'
import json, sys, re
from pathlib import Path
from datetime import datetime, timezone
triage_path = Path(sys.argv[1])
matrix_path = Path(sys.argv[2])
phase_dir = Path(sys.argv[3])
phase_num = sys.argv[4]
try:
triage = json.loads(triage_path.read_text(encoding="utf-8"))
except Exception as e:
print(f"⚠ Không đọc được triage JSON: {e}")
sys.exit(0)
verdicts = triage.get("verdicts", {})
# v1.14.3 H3 — merge .deferred-tests.json as additional deferral source
# (test files with it.skip('@deferred X') markers that aren't in CONTEXT.md scope tags).
deferred_tests_path = phase_dir / ".deferred-tests.json"
test_deferrals = {} # ts_id → {reason, kind}
if deferred_tests_path.exists():
try:
dt = json.loads(deferred_tests_path.read_text(encoding="utf-8"))
for entry in dt.get("deferred_tests", []):
ts = entry.get("ts_id")
if ts:
test_deferrals[ts] = entry
except Exception as e:
print(f"⚠ Không đọc được .deferred-tests.json: {e}")
if not verdicts and not test_deferrals:
print("ℹ Không có UNREACHABLE cần triage — skip.")
sys.exit(0)
matrix_text = matrix_path.read_text(encoding="utf-8")
pending_queue = [] # cho các action chờ user / destructive
applied = {"mark_deferred": [], "mark_manual": [], "pending": []}
def update_status_in_matrix(text, gid, new_status, note=""):
# Tìm dòng trong ## Goal Details có `| G-XX | ... | UNREACHABLE | ... |`
# Thay UNREACHABLE → new_status; append note vào evidence nếu có.
pat = re.compile(r'^(\| *' + re.escape(gid) + r' *\|[^|]*\|[^|]*\|) *UNREACHABLE *(\|[^\n]*)', re.M)
def _repl(m):
prefix = m.group(1)
suffix = m.group(2)
if note:
suffix = suffix.rstrip("|").rstrip() + f" ({note})|"
return f"{prefix} {new_status} {suffix}"
return pat.sub(_repl, text, count=1)
for gid, v in verdicts.items():
action = v.get("action_required")
params = v.get("action_params", {})
verdict = v.get("verdict", "")
if action == "mark_deferred":
target = params.get("target_phase", "?")
matrix_text = update_status_in_matrix(matrix_text, gid, "DEFERRED", f"depends_on_phase: {target}")
applied["mark_deferred"].append((gid, target))
elif action == "mark_manual":
strat = params.get("strategy", "manual")
matrix_text = update_status_in_matrix(matrix_text, gid, "MANUAL", f"verification: {strat}")
applied["mark_manual"].append((gid, strat))
elif action in ("spawn_fix_agent", "draft_amendment_ask", "prompt_scope_tag"):
# Giữ UNREACHABLE — gate sẽ block; ghi vào pending queue
pending_queue.append({
"phase": phase_num,
"goal_id": gid,
"verdict": verdict,
"action": action,
"params": params,
"title": v.get("title", "(no title)"),
"queued_at": datetime.now(timezone.utc).isoformat(),
})
applied["pending"].append((gid, action))
# v1.14.3 H3 — apply test-level deferrals (fills gap where scope tags missing
# but test source has it.skip('@deferred X')). Status depends on defer_kind:
# depends_on_phase + test-codegen → DEFERRED
# manual + faketime → MANUAL
# unknown → log as pending, leave as UNREACHABLE
for ts_id, entry in test_deferrals.items():
# ts_id is "TS-XX"; matrix may have goal_ids like "TS-16" or "G-XX" — try TS- first
gid = ts_id
kind = entry.get("defer_kind", "unknown")
reason = entry.get("defer_reason", "")
if kind in ("depends_on_phase", "test-codegen"):
matrix_text = update_status_in_matrix(
matrix_text, gid, "DEFERRED",
f"test.skip @deferred: {reason[:50]}",
)
applied["mark_deferred"].append((gid, f"test-marker:{kind}"))
elif kind in ("manual", "faketime"):
matrix_text = update_status_in_matrix(
matrix_text, gid, "MANUAL",
f"test.skip @deferred: {reason[:50]}",
)
applied["mark_manual"].append((gid, kind))
else:
pending_queue.append({
"phase": phase_num,
"goal_id": gid,
"verdict": "test-level @deferred with unknown kind",
"action": "review_test_defer_reason",
"params": {"reason": reason, "source": entry.get("source_file", "")},
"title": entry.get("test_title", ""),
"queued_at": datetime.now(timezone.utc).isoformat(),
})
applied["pending"].append((gid, "test-defer-unknown"))
# Re-sync header counts trong "## Summary" block nếu có thay đổi
def recount_summary(text):
details = re.search(r'^## Goal Details\s*\n(.*?)(?=^\s*## |\Z)', text, re.M|re.S)
if not details:
return text
body = details.group(1)
counts = {"READY":0, "BLOCKED":0, "UNREACHABLE":0, "INFRA_PENDING":0,
"DEFERRED":0, "MANUAL":0, "NOT_SCANNED":0, "FAILED":0}
for line in body.splitlines():
for k in counts:
# Status cell đứng giữa 2 dấu | — tránh match keyword trong evidence
if re.search(r'\|\s*' + k + r'\s*\|', line):
counts[k] += 1
break
def _rewrite_summary(m):
total = sum(counts.values())
new = [
"## Summary",
f"- Total goals: {total}",
f"- READY: {counts['READY']}",
f"- DEFERRED: {counts['DEFERRED']} (tagged depends_on_phase)",
f"- MANUAL: {counts['MANUAL']} (tagged verification_strategy)",
f"- BLOCKED: {counts['BLOCKED']}",
f"- UNREACHABLE: {counts['UNREACHABLE']}",
f"- INFRA_PENDING: {counts['INFRA_PENDING']}",
f"- NOT_SCANNED: {counts['NOT_SCANNED']} (intermediate)",
f"- FAILED: {counts['FAILED']} (intermediate)",
""
]
return "\n".join(new)
new_text = re.sub(r'^## Summary\n(?:[-*].*\n)+', _rewrite_summary, text, count=1, flags=re.M)
return new_text
matrix_text = recount_summary(matrix_text)
matrix_path.write_text(matrix_text, encoding="utf-8")
# Ghi pending queue vào .vg/PENDING-USER-REVIEW.md (append-only)
if pending_queue:
pending_file = Path(".vg/PENDING-USER-REVIEW.md")
pending_file.parent.mkdir(parents=True, exist_ok=True)
header_needed = not pending_file.exists()
with pending_file.open("a", encoding="utf-8") as f:
if header_needed:
f.write("# Pending user review — hàng đợi quyết định đang chờ\n\n")
f.write("Mỗi mục là một goal cần quyết định (scope tag / fix / amendment). ")
f.write("User review xong → duyệt queue thay vì bị hỏi từng cái.\n\n")
f.write("| Phase | Goal | Verdict | Action cần | Tiêu đề | Queued at |\n")
f.write("|---|---|---|---|---|---|\n")
for p in pending_queue:
f.write(f"| {p['phase']} | {p['goal_id']} | {p['verdict']} | {p['action']} | "
f"{p['title'][:60]} | {p['queued_at']} |\n")
# Narration
print(f"▸ Triage applied: "
f"{len(applied['mark_deferred'])} → DEFERRED, "
f"{len(applied['mark_manual'])} → MANUAL, "
f"{len(applied['pending'])} → chờ người duyệt")
for gid, tgt in applied["mark_deferred"]:
print(f" 🔁 {gid} → DEFERRED (depends_on_phase: {tgt})")
for gid, strat in applied["mark_manual"]:
print(f" ✋ {gid} → MANUAL ({strat})")
for gid, act in applied["pending"]:
print(f" ⏳ {gid} → {act} (giữ UNREACHABLE, gate sẽ BLOCK đến khi giải quyết)")
PY
else
[ -f "$TRIAGE_JSON" ] || echo "ℹ Không có triage JSON (không UNREACHABLE goal nào) — skip apply."
fi
Thay gate trọng số (critical/important/nice-to-have) cũ bằng quy tắc đơn giản:
BLOCKED == 0 VÀ UNREACHABLE == 0 (goals ở trạng thái kết thúc: READY + DEFERRED + MANUAL + INFRA_PENDING).BLOCKED hoặc UNREACHABLE.Không còn grey zone — DEFERRED và MANUAL là hai đường thoát hợp lệ nhưng phải declare ở /vg:scope, không phải review tự gắn.
session_mark_step "4e-100pct-gate"
MATRIX="${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md"
# Đọc config gate threshold (default 100, legacy-mode fallback 80)
GATE_THRESHOLD=$(${PYTHON_BIN} -c "
import re, sys
try:
with open('.claude/vg.config.md', encoding='utf-8') as f:
c = f.read()
m = re.search(r'gate_threshold\s*:\s*(\d+)', c)
print(m.group(1) if m else '100')
except Exception:
print('100')
")
# Legacy-mode override: --legacy-mode flag hoặc config review.gate_threshold_legacy
if [[ "$ARGUMENTS" =~ --legacy-mode ]]; then
GATE_THRESHOLD_LEGACY=$(${PYTHON_BIN} -c "
import re
try:
with open('.claude/vg.config.md', encoding='utf-8') as f:
c = f.read()
m = re.search(r'gate_threshold_legacy\s*:\s*(\d+)', c)
print(m.group(1) if m else '80')
except Exception:
print('80')
")
echo "⚠ --legacy-mode: dùng ngưỡng ${GATE_THRESHOLD_LEGACY}% (pre-v1.14). Flag này sẽ hết hạn sau 2 milestones."
GATE_THRESHOLD="$GATE_THRESHOLD_LEGACY"
fi
# Count statuses từ matrix
if [ -f "$MATRIX" ]; then
GATE_COUNTS=$(PYTHONIOENCODING=utf-8 ${PYTHON_BIN} - "$MATRIX" <<'PY'
import re, sys, json
text = open(sys.argv[1], encoding='utf-8').read()
m = re.search(r'^## Goal Details\s*\n(.*?)(?=^\s*## |\Z)', text, re.M|re.S)
body = m.group(1) if m else ""
buckets = ["READY","DEFERRED","MANUAL","BLOCKED","UNREACHABLE","INFRA_PENDING","NOT_SCANNED","FAILED"]
counts = {b:0 for b in buckets}
for line in body.splitlines():
for b in buckets:
if re.search(r'\|\s*' + b + r'\s*\|', line):
counts[b] += 1
break
print(json.dumps(counts))
PY
)
READY=$(echo "$GATE_COUNTS" | ${PYTHON_BIN} -c "import json,sys;print(json.loads(sys.stdin.read())['READY'])")
DEFERRED=$(echo "$GATE_COUNTS" | ${PYTHON_BIN} -c "import json,sys;print(json.loads(sys.stdin.read())['DEFERRED'])")
MANUAL=$(echo "$GATE_COUNTS" | ${PYTHON_BIN} -c "import json,sys;print(json.loads(sys.stdin.read())['MANUAL'])")
BLOCKED=$(echo "$GATE_COUNTS" | ${PYTHON_BIN} -c "import json,sys;print(json.loads(sys.stdin.read())['BLOCKED'])")
UNREACHABLE=$(echo "$GATE_COUNTS" | ${PYTHON_BIN} -c "import json,sys;print(json.loads(sys.stdin.read())['UNREACHABLE'])")
INFRA_PENDING=$(echo "$GATE_COUNTS"| ${PYTHON_BIN} -c "import json,sys;print(json.loads(sys.stdin.read())['INFRA_PENDING'])")
TOTAL=$((READY + DEFERRED + MANUAL + BLOCKED + UNREACHABLE + INFRA_PENDING))
# Goals được tính là "kết thúc": READY + DEFERRED + MANUAL + INFRA_PENDING
PASS_COUNT=$((READY + DEFERRED + MANUAL + INFRA_PENDING))
FAIL_COUNT=$((BLOCKED + UNREACHABLE))
if [ "$TOTAL" -gt 0 ]; then
PASS_PCT=$(( PASS_COUNT * 100 / TOTAL ))
else
PASS_PCT=0
fi
echo ""
echo "━━━ Cổng kiểm tra (${GATE_THRESHOLD}%) ━━━"
echo " Tổng goals: $TOTAL"
echo " ✅ READY: $READY"
echo " 🔁 DEFERRED: $DEFERRED (tagged depends_on_phase)"
echo " ✋ MANUAL: $MANUAL (tagged verification_strategy)"
echo " ♻ INFRA: $INFRA_PENDING (ngoài ENV hiện tại)"
echo " ⛔ BLOCKED: $BLOCKED"
echo " ❓ UNREACHABLE:$UNREACHABLE"
echo " Tỉ lệ đạt: ${PASS_PCT}% (yêu cầu ≥${GATE_THRESHOLD}%)"
echo ""
export GATE_THRESHOLD PASS_COUNT FAIL_COUNT PASS_PCT TOTAL
export READY DEFERRED MANUAL BLOCKED UNREACHABLE INFRA_PENDING
else
echo "⚠ GOAL-COVERAGE-MATRIX.md không tồn tại — không tính được gate."
export FAIL_COUNT=999 PASS_PCT=0 GATE_THRESHOLD=100
fi
session_mark_step "4f-gate-decision"
# Quy tắc:
# GATE_THRESHOLD == 100 (default) → PASS iff FAIL_COUNT == 0
# GATE_THRESHOLD < 100 (legacy) → PASS iff PASS_PCT >= threshold
if [ "$GATE_THRESHOLD" = "100" ]; then
GATE_PASS=$([ "$FAIL_COUNT" -eq 0 ] && echo "true" || echo "false")
else
GATE_PASS=$([ "$PASS_PCT" -ge "$GATE_THRESHOLD" ] && echo "true" || echo "false")
fi
if [ "$GATE_PASS" = "true" ]; then
echo "✅ Cổng ĐẠT — phase sẵn sàng cho /vg:test ${PHASE_NUMBER}"
echo ""
# v1.14.0+ A.4 — write trigger cho CROSS-PHASE-DEPS aggregator
# Nếu có goal DEFERRED → append vào .vg/CROSS-PHASE-DEPS.md (idempotent)
if [ "$DEFERRED" -gt 0 ]; then
CPD_SCRIPT="${REPO_ROOT}/.claude/scripts/vg_cross_phase_deps.py"
if [ -f "$CPD_SCRIPT" ]; then
PYTHONIOENCODING=utf-8 ${PYTHON_BIN} "$CPD_SCRIPT" append "$PHASE_NUMBER" 2>&1 | sed 's/^/ /'
else
echo "⚠ vg_cross_phase_deps.py missing — DEFERRED entries không được aggregate." >&2
fi
echo "ℹ Có $DEFERRED goal DEFERRED (chờ phase phụ thuộc). Xem .vg/CROSS-PHASE-DEPS.md"
fi
if [ "$MANUAL" -gt 0 ]; then
echo "ℹ Có $MANUAL goal MANUAL. /vg:accept sẽ prompt checklist người dùng."
fi
if [ "$INFRA_PENDING" -gt 0 ]; then
echo "ℹ Có $INFRA_PENDING goal chờ infra (ngoài ENV). Re-run với --sandbox nếu cần."
fi
else
echo "⛔ Cổng BỊ CHẶN — còn $FAIL_COUNT goal chưa kết thúc."
echo ""
# Gợi ý hành động theo loại fail
if [ "$BLOCKED" -gt 0 ]; then
echo " 🛠 $BLOCKED goal BLOCKED (scan chạy nhưng criteria fail):"
echo " → Sửa code → re-run /vg:review ${PHASE_NUMBER} --fix-only"
fi
if [ "$UNREACHABLE" -gt 0 ]; then
echo " ❓ $UNREACHABLE goal UNREACHABLE (không reach được UI hoặc chưa build):"
echo " → Đọc ${PHASE_DIR}/UNREACHABLE-TRIAGE.md — mỗi goal có verdict + action gợi ý"
echo " → cross-phase-pending → /vg:amend ${PHASE_NUMBER} thêm depends_on_phase tag"
echo " → bug-this-phase → /vg:build ${PHASE_NUMBER} --gaps-only"
echo " → scope-amend destructive→ user confirm amendment rồi re-run review"
# Nếu có pending queue, nhắc
if [ -f ".vg/PENDING-USER-REVIEW.md" ]; then
PENDING_CNT=$(grep -c "^| ${PHASE_NUMBER} " ".vg/PENDING-USER-REVIEW.md" 2>/dev/null || echo 0)
[ "$PENDING_CNT" -gt 0 ] && echo " → $PENDING_CNT mục đang chờ người duyệt (.vg/PENDING-USER-REVIEW.md)"
fi
fi
echo ""
echo "Không còn đường thoát tự động — scope tag phải declare ở /vg:scope (không phải review tự gán)."
# Exit với mã lỗi để caller biết gate fail
exit 1
fi
## UNREACHABLE Triage — legacy guard (v1.14.0+)
Từ v1.14.0, triage chạy INLINE trong Phase 4d (ngay trước cổng 100%). Step này chỉ còn là guard cho trường hợp legacy flow đi vòng (ví dụ --skip-discovery + --fix-only nhảy qua 4d). Nếu .unreachable-triage.json đã tồn tại từ 4d → skip; nếu chưa → chạy fallback.
TRIAGE_JSON="${PHASE_DIR}/.unreachable-triage.json"
if [ -f "$TRIAGE_JSON" ]; then
echo "ℹ Triage đã chạy inline ở Phase 4d — skip legacy guard."
else
session_mark_step "4f-unreachable-triage-legacy"
echo ""
echo "🔍 Legacy path: UNREACHABLE triage fallback (4d bị bỏ qua)..."
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/unreachable-triage.sh" 2>/dev/null || true
if type -t triage_unreachable_goals >/dev/null 2>&1; then
triage_unreachable_goals "$PHASE_DIR" "$PHASE_NUMBER"
else
echo "⚠ unreachable-triage.sh missing — triage skipped" >&2
fi
fi
Lưu ý v1.14.0+: Triage không còn là "report-only cho accept gate". Triage SINH action_required, review 4d ÁP DỤNG autonomous action (mark_deferred/mark_manual) và BLOCK gate cho action cần người duyệt (spawn_fix_agent, draft_amendment_ask, prompt_scope_tag). Xem spec section A.2.
## CrossAI Review (mandatory when CLIs are configured)If config.crossai_clis is empty, emit an explicit skip note and continue. If --skip-crossai is present, it must have override-debt evidence because objective review is otherwise a silent quality downgrade.
Prepare context with RUNTIME-MAP + GOAL-COVERAGE-MATRIX + TEST-GOALS.
Set $LABEL="review-check". Follow crossai-invoke.md exactly: child CLIs run
through the isolated CrossAI runner and the gate consumes normalized
${PHASE_DIR}/crossai/review-check.xml, not raw child XML.
Required evidence when not skipped:
${PHASE_DIR}/crossai/review-check.xmlcrossai.verdict telemetry event
Write order: JSON first, then derive MD from it.
1. ${PHASE_DIR}/RUNTIME-MAP.json — canonical JSON (source of truth). MUST be written FIRST.
2. ${PHASE_DIR}/RUNTIME-MAP.md — derived from JSON (human-readable). Written AFTER JSON.
3. ${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md — from Phase 4
4. ${PHASE_DIR}/element-counts.json — from Phase 1b
After writing all files, verify they exist before committing:
Required files — BLOCK commit if ANY missing:
✓ ${PHASE_DIR}/RUNTIME-MAP.json ← downstream /vg:test reads this, NOT .md
✓ ${PHASE_DIR}/RUNTIME-MAP.md
✓ ${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md
Use Glob to confirm each file exists. If RUNTIME-MAP.json is missing,
you MUST create it before proceeding. The .md alone is NOT sufficient.
Commit:
git add ${PHASE_DIR}/RUNTIME-MAP.json ${PHASE_DIR}/RUNTIME-MAP.md \
${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md ${PHASE_DIR}/element-counts.json \
${SCREENSHOTS_DIR}/
# UNREACHABLE-TRIAGE artifacts (only exist if triage ran — i.e., any UNREACHABLE goal)
[ -f "${PHASE_DIR}/UNREACHABLE-TRIAGE.md" ] && git add "${PHASE_DIR}/UNREACHABLE-TRIAGE.md"
[ -f "${PHASE_DIR}/.unreachable-triage.json" ] && git add "${PHASE_DIR}/.unreachable-triage.json"
git commit -m "review({phase}): RUNTIME-MAP — {views} views, {actions} actions, gate {PASS|BLOCK}"
## End-of-Step Reflection (v1.15.0 Bootstrap Overlay)
Before closing review, spawn the reflector subagent to analyze this step's artifacts + user messages + telemetry and draft learning candidates for user review. Primary path for project self-adaptation.
Skip conditions (reflection does nothing, exit 0):
.vg/bootstrap/ directory absent (project hasn't opted in)config.bootstrap.reflection_enabled == false (user disabled)FAIL with fatal errors (reflect when next review succeeds)BOOTSTRAP_DIR=".vg/bootstrap"
if [ ! -d "$BOOTSTRAP_DIR" ]; then
# Bootstrap not opted in — skip silently
:
else
REFLECT_TS=$(date -u +%Y%m%dT%H%M%SZ)
REFLECT_OUT="${PHASE_DIR}/reflection-review-${REFLECT_TS}.yaml"
USER_MSG_FILE="${VG_TMP}/reflect-user-msgs-${REFLECT_TS}.txt"
# Extract user messages sent during this step from Claude transcript (if accessible).
# If no transcript API, reflector uses artifacts + telemetry + git log only.
# Orchestrator may populate USER_MSG_FILE from session context.
: > "$USER_MSG_FILE"
# Filter telemetry entries to this phase+step within last 4 hours
TELEMETRY_SLICE="${VG_TMP}/reflect-telemetry-${REFLECT_TS}.jsonl"
grep -E "\"phase\":\"${PHASE}\".*\"command\":\"vg:review\"" "${PLANNING_DIR}/telemetry.jsonl" 2>/dev/null \
| tail -200 > "$TELEMETRY_SLICE" || true
# Collect override-debt entries created in this step
OVERRIDE_SLICE="${VG_TMP}/reflect-overrides-${REFLECT_TS}.md"
grep -E "\"step\":\"review\"" "${PLANNING_DIR}/OVERRIDE-DEBT.md" 2>/dev/null > "$OVERRIDE_SLICE" || true
echo "📝 Running end-of-step reflection (Haiku, isolated context)..."
fi
Use Agent tool with skill vg-reflector, model haiku, fresh context:
Agent(
description="End-of-step reflection for review phase {PHASE}",
subagent_type="general-purpose",
prompt="""
Use skill: vg-reflector
Arguments:
STEP = "review"
PHASE = "{PHASE}"
PHASE_DIR = "{PHASE_DIR absolute path}"
USER_MSG_FILE = "{USER_MSG_FILE}"
TELEMETRY_FILE = "{TELEMETRY_SLICE}"
OVERRIDE_FILE = "{OVERRIDE_SLICE}"
ACCEPTED_MD = ".vg/bootstrap/ACCEPTED.md"
REJECTED_MD = ".vg/bootstrap/REJECTED.md"
OUT_FILE = "{REFLECT_OUT}"
Read .claude/skills/vg-reflector/SKILL.md and follow workflow exactly.
Do NOT read parent conversation transcript — echo chamber forbidden.
Output max 3 candidates with evidence to OUT_FILE.
"""
)
After reflector exits, parse OUT_FILE. If candidates found, show to user:
📝 Reflection — review phase {PHASE} found {N} learning(s):
[1] {title}
Type: {type}
Scope: {scope}
Evidence: {count} items — {sample}
Confidence: {confidence}
→ Proposed: {target summary}
[y] ghi sổ tay [n] reject [e] edit inline [s] skip lần này
[2] ...
User gõ: y/n/e/s cho từng item, hoặc "all-defer" để bỏ qua toàn bộ.
For y → delegate to /vg:learn --promote L-{id} internally (validates schema,
dry-run preview, git commit).
For n → append to REJECTED.md with user reason.
For e → interactive field-by-field edit loop (not external editor):
Editing [1]:
(1) title: "{current}"
(2) scope: {current}
(3) prose: "{current}"
(4) target_step: {current}
Field to edit? [1-4/done]: _
For s → leave candidate in .vg/bootstrap/CANDIDATES.md, user reviews later via /vg:learn --review.
emit_telemetry "bootstrap.reflection_ran" PASS \
"{\"step\":\"review\",\"phase\":\"${PHASE}\",\"candidates\":${CANDIDATE_COUNT:-0}}"
Reflector crash or timeout → log warning, continue to complete step. Never block review completion.
⚠ Reflection failed — review completes normally. Check .vg/bootstrap/logs/
**Update PIPELINE-STATE.json:**
```bash
# VG-native state update (no GSD dependency)
PIPELINE_STATE="${PHASE_DIR}/PIPELINE-STATE.json"
${PYTHON_BIN} -c "
import json; from pathlib import Path
p = Path('${PIPELINE_STATE}')
s = json.loads(p.read_text(encoding='utf-8')) if p.exists() else {}
s['status'] = 'reviewed'; s['pipeline_step'] = 'review-complete'
s['updated_at'] = __import__('datetime').datetime.now().isoformat()
p.write_text(json.dumps(s, indent=2))
" 2>/dev/null
```
v2.6.1 (2026-04-26): Auto-resolve hotfix debt entries from prior phases.
If THIS phase's review ran clean (no --allow-orthogonal-hotfix /
--allow-no-bugref / --allow-empty-bugfix overrides hit), prior phases'
OPEN debt entries with matching gate_id auto-resolve. Closes AUDIT.md D2 H4
(hotfix overrides had no natural resolution path → debt piled up forever).
Each gate_id maps to a specific re-run condition that the current clean review proves: if review passed without orthogonal-hotfix override, the "goal-coverage" condition is satisfied for prior phases too.
source "${REPO_ROOT}/.claude/commands/vg/_shared/lib/override-debt.sh" 2>/dev/null || true
if type -t override_auto_resolve_clean_run >/dev/null 2>&1; then
# Only resolve if THIS phase didn't fall back to the override itself
if [[ ! "${ARGUMENTS}" =~ --allow-orthogonal-hotfix ]]; then
override_auto_resolve_clean_run "review-goal-coverage" "${PHASE_NUMBER}" \
"review-clean-${PHASE_NUMBER}-$(date -u +%s)" 2>&1 | sed 's/^/ /'
fi
if [[ ! "${ARGUMENTS}" =~ --allow-no-bugref ]]; then
override_auto_resolve_clean_run "bugfix-bugref-required" "${PHASE_NUMBER}" \
"review-clean-${PHASE_NUMBER}-$(date -u +%s)" 2>&1 | sed 's/^/ /'
fi
if [[ ! "${ARGUMENTS}" =~ --allow-empty-bugfix ]]; then
override_auto_resolve_clean_run "bugfix-code-delta-required" "${PHASE_NUMBER}" \
"review-clean-${PHASE_NUMBER}-$(date -u +%s)" 2>&1 | sed 's/^/ /'
fi
fi
Display — VERDICT-AWARE next steps (MANDATORY format).
The closing message MUST follow this structure regardless of orchestrator (Claude / Codex / Gemini). Every finding section MUST end with a concrete actionable command, not just a description.
Review complete for Phase {N} — PASS.
Goals: {ready}/{total} READY ({pct}%)
Gate: PASS (critical {C}/{C} 100%, important {I}/{I_total} ≥80%)
Artifacts: RUNTIME-MAP.json + GOAL-COVERAGE-MATRIX.md{REVIEW_FEEDBACK_SUFFIX}
Next:
/vg:test {phase} # codegen + run regression suite
Review complete for Phase {N} — FLAG ({N} non-blocking findings).
Goals: {ready}/{total} READY
Gate: PASS-WITH-FLAGS
Findings (improvements — non-blocking):
- [Med] {one-line summary} → fix at {file:line}, then commit
- [Low] {one-line summary} → defer or fix at {file:line}
... (full detail in REVIEW-FEEDBACK.md)
Next (pick one):
/vg:test {phase} # proceed — flags are advisory
edit {file:line}; git commit; /vg:next # fix flags first, then continue
Review complete for Phase {N} — BLOCK.
Goals: {ready}/{total} READY ({blocked} BLOCKED, {failed} FAILED, {unreach} UNREACHABLE)
Gate: BLOCK ({reason — e.g., "critical goal G-03 FAILED" or "infra success_criteria 1/8 READY"})
Findings (severity-grouped — full detail in REVIEW-FEEDBACK.md):
⛔ Critical/Nghiêm trọng ({N}):
1. {one-line summary}
↳ Fix: {concrete action — file:line, command, or workflow}
↳ Verify: {how to confirm — curl, test, diff}
2. ...
⚠ High/Cao ({N}):
... (same format)
ⓘ Medium/Trung bình ({N}):
... (same format)
Next steps (pick the matching path — DO NOT just re-run /vg:review blindly):
A. Fix code bugs found → re-review:
# Edit affected files (paths above), then stage + commit as SEPARATE
# steps (v2.5.2.7: don't chain staging with commit — if commit-msg
# hook BLOCKs on missing citation, prior `git add` success gets
# masked by the red "Exit 1" UI label):
git add path/to/fixed-file.ts # stage intentional files
git commit -m "fix({phase}-XX): {summary}
Per CONTEXT.md D-XX OR Per API-CONTRACTS.md" # body must cite
/vg:review {phase} --retry-failed # only re-scan failed goals (faster)
# OR /vg:review {phase} # full re-scan if many fixes
B. If findings need scope discussion (architectural, decision change):
/vg:amend {phase} # mid-phase change request workflow
# then re-blueprint + re-build before re-review
C. If findings are infra/env (services down, config missing):
/vg:doctor # diagnose env + service health
# fix infra → /vg:review {phase}
D. If finding is BUG in /vg:review tooling itself (not phase code):
/vg:bug-report # surface to vietdev99/vgflow
E. If you DISAGREE with verdict (false positive):
# Open REVIEW-FEEDBACK.md, dispute specific finding with evidence
/vg:review {phase} --override-reason "..." --allow-failed=G-XX
# Will register in OVERRIDE-DEBT — re-evaluated at /vg:accept
apps/api/src/plugins/health.ts:23), NOT absolute (/D/Workspace/...). Absolute paths waste 60% of terminal width on repeated prefixes.{N}. [Severity] {ONE LINE root-cause}
↳ Fix: {file:line edit OR shell command OR workflow}
↳ Verify: {1-line check command OR test ID}
↳ Refs: {file:line, file:line} (only if 2+ refs needed)
# v2.2 — complete step marker + terminal emit + run-complete
mkdir -p "${PHASE_DIR}/.step-markers" 2>/dev/null
(type -t mark_step >/dev/null 2>&1 && mark_step "${PHASE_NUMBER:-unknown}" "complete" "${PHASE_DIR}") || touch "${PHASE_DIR}/.step-markers/complete.done"
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review complete 2>/dev/null || true
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator mark-step review 0_parse_and_validate 2>/dev/null || true
READY_COUNT=$(grep -c "READY" "${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md" 2>/dev/null || echo 0)
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.completed" --payload "{\"phase\":\"${PHASE_NUMBER}\",\"goals_ready\":${READY_COUNT}}" >/dev/null
# v2.38.0 — Flow compliance audit
if [[ "$ARGUMENTS" =~ --skip-compliance=\"([^\"]*)\" ]]; then
COMP_REASON="${BASH_REMATCH[1]}"
else
COMP_REASON=""
fi
COMP_SEV=$(vg_config_get "flow_compliance.severity" "warn" 2>/dev/null || echo "warn")
COMP_ARGS=( "--phase-dir" "$PHASE_DIR" "--command" "review" "--severity" "$COMP_SEV" )
[ -n "$COMP_REASON" ] && COMP_ARGS+=( "--skip-compliance=$COMP_REASON" )
${PYTHON_BIN:-python3} .claude/scripts/verify-flow-compliance.py "${COMP_ARGS[@]}"
COMP_RC=$?
if [ "$COMP_RC" -ne 0 ] && [ "$COMP_SEV" = "block" ]; then
echo "⛔ Review flow compliance failed. See .flow-compliance-review.yaml or pass --skip-compliance=\"<reason>\"."
exit 1
fi
# v2.45 fail-closed-validators PR: matrix↔runtime evidence cross-check.
# Phase 3.2 dogfood found GOAL-COVERAGE-MATRIX.md fabricating READY status
# even when goal_sequences[].result == "blocked" or sequence missing entirely.
# This validator catches the fabrication BEFORE review exits, so /vg:test
# never sees a lying matrix.
MATRIX_LINK_VAL=".claude/scripts/validators/verify-matrix-evidence-link.py"
if [ -f "$MATRIX_LINK_VAL" ]; then
${PYTHON_BIN:-python3} "$MATRIX_LINK_VAL" --phase-dir "$PHASE_DIR" --severity block
MATRIX_LINK_RC=$?
if [ "$MATRIX_LINK_RC" -ne 0 ]; then
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.matrix_evidence_link_blocked" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
source scripts/lib/blocking-gate-prompt.sh
EVIDENCE_PATH="${PHASE_DIR}/.vg/matrix-evidence-link-evidence.json"
mkdir -p "$(dirname "$EVIDENCE_PATH")"
cat > "$EVIDENCE_PATH" <<JSON
{
"gate": "matrix_evidence_link",
"summary": "Review matrix-evidence-link gate failed — GOAL-COVERAGE-MATRIX.md asserts goal status that runtime evidence does not support",
"fix_hint": "1. Re-run /vg:review ${PHASE_NUMBER} --retry-failed (record real sequences); 2. OR reclassify goals to UNREACHABLE/INFRA_PENDING/DEFERRED with justification"
}
JSON
blocking_gate_prompt_emit "matrix_evidence_link" "$EVIDENCE_PATH" "warn"
# AI controller calls AskUserQuestion → resolve via Leg 2.
# Leg 2 exit codes: 0=continue, 1=continue-with-debt, 2=route-amend (exit 0), 3=abort (exit 1), 4=re-prompt.
fi
fi
# RFC v9 PR-D2 Codex-HIGH-1 fix: post_state lifecycle gate AFTER scanner ran.
# Pre_state checked at Phase 0.5 (pre-scanner); post_state must verify the
# action actually mutated state correctly. Without this leg, RCRURD is half-
# wired (Codex review found pre-only).
# Re-resolve base URL from config (PRE_BASE local to Phase 0.5; not in scope).
# Codex-R4-HIGH-2 fix: read base_url from ENV-CONTRACT.md (canonical),
# not from vg.config.md step_env (env NAME, not URL).
POST_BASE_RC=$("${PYTHON_BIN:-python3}" -c "
import re
try:
text = open('${PHASE_DIR}/ENV-CONTRACT.md', encoding='utf-8').read()
m = re.search(r'^target:\s*\n((?:[ \t].*\n)+)', text, re.MULTILINE)
if m:
bm = re.search(r'^\s*base_url:\s*[\"\\']?([^\"\\'\s#]+)', m.group(1), re.MULTILINE)
if bm: print(bm.group(1))
except FileNotFoundError: pass
" 2>/dev/null)
[ -z "$POST_BASE_RC" ] && POST_BASE_RC="${VG_BASE_URL:-}"
HAS_POST_LIFECYCLE=$(grep -lE '^\s*post_state:' "${PHASE_DIR}/FIXTURES"/*.yaml 2>/dev/null | wc -l | tr -d ' ')
# Codex-HIGH-5-bis fix: post_state setup error blocks even without runner exec
if [ "$HAS_POST_LIFECYCLE" -gt 0 ] && [ -z "$POST_BASE_RC" ] && \
[ "${VG_RCRURD_POST_SEVERITY:-block}" = "block" ]; then
echo "⛔ RCRURD post_state setup error — FIXTURES declare post_state"
echo " blocks but no sandbox base_url available at run-complete."
exit 1
fi
VG_SCRIPT_ROOT="${REPO_ROOT}/.claude/scripts"
[ -f "${VG_SCRIPT_ROOT}/rcrurd-preflight.py" ] || VG_SCRIPT_ROOT="${REPO_ROOT}/scripts"
if [ -f "${VG_SCRIPT_ROOT}/rcrurd-preflight.py" ] && \
[ -d "${PHASE_DIR}/FIXTURES" ] && [ -n "$POST_BASE_RC" ]; then
# Codex-HIGH-1-ter fix: snapshot must exist when delta assertions present
POST_NEEDS_SNAP=$(grep -lE 'increased_by_at_least|decreased_by_at_least' \
"${PHASE_DIR}/FIXTURES"/*.yaml 2>/dev/null | wc -l | tr -d ' ')
if [ "$POST_NEEDS_SNAP" -gt 0 ] && [ ! -f "${PHASE_DIR}/.rcrurd-pre-snapshot.json" ] && \
[ "${VG_RCRURD_POST_SEVERITY:-block}" = "block" ]; then
echo "⛔ RCRURD post_state — pre-snapshot missing but ${POST_NEEDS_SNAP}"
echo " fixture(s) declare delta assertions. Pre-mode at Phase 0.5"
echo " should have captured it. Re-run /vg:review from scratch."
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.rcrurd_post_snapshot_missing" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
exit 1
fi
POST_OUT=$("${PYTHON_BIN:-python3}" "${VG_SCRIPT_ROOT}/rcrurd-preflight.py" \
--phase "$PHASE_NUMBER" --base-url "$POST_BASE_RC" \
--mode post --severity "${VG_RCRURD_POST_SEVERITY:-block}" \
--pre-snapshot "${PHASE_DIR}/.rcrurd-pre-snapshot.json" 2>&1)
POST_RC=$?
echo "▸ RCRURD post_state: $(echo "$POST_OUT" | "${PYTHON_BIN:-python3}" -c '
import json, sys
try:
d = json.loads(sys.stdin.read())
v = d.get("verdict","?")
if v == "BLOCK":
print(f"BLOCK ({d.get(\"failed\",0)}/{d.get(\"checked\",0)} post-state assertions failed)")
elif v == "WARN":
print(f"WARN ({d.get(\"failed\",0)}/{d.get(\"checked\",0)} failed)")
elif v == "PASS":
print(f"PASS ({d.get(\"checked\",0)} fixtures)")
else:
print(f"ERROR: {d.get(\"error\",\"unknown\")[:200]}")
except: print("parse-error")
')"
if [ "$POST_RC" -eq 2 ]; then
echo "⛔ RCRURD post_state setup error — cannot proceed:"
echo "$POST_OUT" | "${PYTHON_BIN:-python3}" -c '
import json, sys
try:
d = json.loads(sys.stdin.read())
print(f" {d.get(\"error\",\"unknown\")[:300]}")
except: print(" (could not parse error)")
'
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.rcrurd_post_setup_error" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
exit 1
fi
if [ "$POST_RC" -eq 1 ] && [ "${VG_RCRURD_POST_SEVERITY:-block}" = "block" ]; then
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.rcrurd_post_state_blocked" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
source scripts/lib/blocking-gate-prompt.sh
EVIDENCE_PATH="${PHASE_DIR}/.vg/rcrurd-post-state-evidence.json"
mkdir -p "$(dirname "$EVIDENCE_PATH")"
cat > "$EVIDENCE_PATH" <<JSON
{
"gate": "rcrurd_post_state",
"summary": "RCRURD post_state gate BLOCK — fixture lifecycle assertions failed after the scanner action. State did not transition as expected.",
"fix_hint": "Re-run scanner if action genuinely succeeded; or fix the action's expected_network/post_state assertion drift."
}
JSON
blocking_gate_prompt_emit "rcrurd_post_state" "$EVIDENCE_PATH" "error"
# AI controller calls AskUserQuestion → resolve via Leg 2.
# Leg 2 exit codes: 0=continue, 1=continue-with-debt, 2=route-amend (exit 0), 3=abort (exit 1), 4=re-prompt.
fi
fi
# v2.46-wave3.2 matrix-staleness final gate: after matrix is BUILT (Phase 4),
# re-run staleness validator at review-complete time. Step 0 entry pass catches
# stale entries from PRIOR runs; this catches stale entries created by THIS run
# (e.g. scanner recorded sequence but skipped submit, then matrix wrote READY).
# Phase 3.2 dogfood found 36/39 mutation goals stale despite verdict=PASS.
MATRIX_STALE_VAL=".claude/scripts/validators/verify-matrix-staleness.py"
if [ -f "$MATRIX_STALE_VAL" ]; then
STALE_SEV="block"
[[ "${ARGUMENTS}" =~ --allow-stale-matrix ]] && STALE_SEV="warn"
${PYTHON_BIN:-python3} "$MATRIX_STALE_VAL" --phase "${PHASE_NUMBER}" --severity "$STALE_SEV"
STALE_RC=$?
if [ "$STALE_RC" -ne 0 ] && [ "$STALE_SEV" = "block" ]; then
SUSPECTED_N=$(${PYTHON_BIN:-python3} -c "
import json
try: print(json.load(open('${PHASE_DIR}/.matrix-staleness.json'))['suspected_count'])
except: print('?')
")
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.matrix_staleness_blocked" --payload "{\"phase\":\"${PHASE_NUMBER}\",\"suspected\":${SUSPECTED_N}}" >/dev/null 2>&1 || true
source scripts/lib/blocking-gate-prompt.sh
EVIDENCE_PATH="${PHASE_DIR}/.vg/matrix-staleness-evidence.json"
mkdir -p "$(dirname "$EVIDENCE_PATH")"
cat > "$EVIDENCE_PATH" <<JSON
{
"gate": "matrix_staleness",
"summary": "Matrix-staleness gate failed — ${SUSPECTED_N} mutation goal(s) marked READY without submit/2xx evidence",
"fix_hint": "1. /vg:review ${PHASE_NUMBER} --retry-failed; 2. /vg:review ${PHASE_NUMBER} --re-scan-goals=G-XX,G-YY; 3. /vg:review ${PHASE_NUMBER} --dogfood; 4. /vg:review ${PHASE_NUMBER} --allow-stale-matrix --override-reason=... (debt)"
}
JSON
blocking_gate_prompt_emit "matrix_staleness" "$EVIDENCE_PATH" "warn"
# AI controller calls AskUserQuestion → resolve via Leg 2.
# Leg 2 exit codes: 0=continue, 1=continue-with-debt, 2=route-amend (exit 0), 3=abort (exit 1), 4=re-prompt.
fi
fi
# v2.46-wave3.2.3 (RFC v9 D10) — evidence provenance gate. Mutation steps
# claiming success (action + 2xx network) MUST carry structured evidence:
# {source, artifact_hash, captured_at, schema_version, scanner_run_id |
# layer2_proposal_id}. Closes the trust hole where executor agents could
# fabricate evidence to flip matrix-staleness bidirectional sync.
#
# Codex-HIGH-4 fix: default to BLOCK (was warn). Migration grace via
# explicit `review.provenance.enforcement: warn` in vg.config.md OR
# --allow-legacy-provenance flag for phases pre-dating RFC v9.
PROV_VAL=".claude/scripts/validators/verify-evidence-provenance.py"
if [ -f "$PROV_VAL" ]; then
# Resolve enforcement from config — env var wins, then grep config, default block
PROV_MODE="${VG_PROVENANCE_ENFORCEMENT:-}"
if [ -z "$PROV_MODE" ] && [ -n "${CONFIG_RAW:-}" ]; then
PROV_MODE=$(echo "$CONFIG_RAW" | grep -A2 '^review:' | grep -E '^\s*provenance:' -A2 | \
grep -E '^\s*enforcement:' | head -1 | sed 's/.*enforcement:\s*//;s/[\"'\'']//g' | tr -d ' ')
fi
[ -z "$PROV_MODE" ] && PROV_MODE="block"
PROV_FLAGS="--severity ${PROV_MODE}"
# During migration window, allow legacy phases without provenance
[[ "${ARGUMENTS}" =~ --allow-legacy-provenance ]] && PROV_FLAGS="$PROV_FLAGS --allow-legacy"
${PYTHON_BIN:-python3} "$PROV_VAL" --phase "${PHASE_NUMBER}" $PROV_FLAGS
PROV_RC=$?
if [ "$PROV_RC" -ne 0 ] && [ "$PROV_MODE" = "block" ]; then
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.evidence_provenance_blocked" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
source scripts/lib/blocking-gate-prompt.sh
EVIDENCE_PATH="${PHASE_DIR}/.vg/evidence-provenance-evidence.json"
mkdir -p "$(dirname "$EVIDENCE_PATH")"
cat > "$EVIDENCE_PATH" <<JSON
{
"gate": "evidence_provenance",
"summary": "Evidence provenance gate failed — mutation steps claim success without structured provenance object (RFC v9 D10). Possible fabricated evidence.",
"fix_hint": "1. Re-run scanner: /vg:review ${PHASE_NUMBER} --retry-failed; 2. For legacy phases: --allow-legacy-provenance; 3. Set review.provenance.enforcement: warn in vg.config.md to defer enforcement"
}
JSON
blocking_gate_prompt_emit "evidence_provenance" "$EVIDENCE_PATH" "error"
# AI controller calls AskUserQuestion → resolve via Leg 2.
# Leg 2 exit codes: 0=continue, 1=continue-with-debt, 2=route-amend (exit 0), 3=abort (exit 1), 4=re-prompt.
fi
fi
# v2.46 anti-performative-review: ép scanner phải submit mutation goals,
# không được Cancel modal rồi mark passed. Phase 3.2 dogfood (2026-05-01) tìm
# 5 false-pass goals (G-31/G-34/G-35/G-44/G-52) modal opened nhưng chưa bao giờ
# submit. Validator này check goal_sequences.steps[] có submit click + 2xx
# network entry trước khi cho phép run-complete.
MUT_SUBMIT_VAL=".claude/scripts/validators/verify-mutation-actually-submitted.py"
if [ -f "$MUT_SUBMIT_VAL" ]; then
MUT_FLAGS="--severity block"
if [[ "${ARGUMENTS}" =~ --allow-cancel-only-mutations ]]; then
MUT_FLAGS="--severity block --allow-cancel-only-mutations"
fi
${PYTHON_BIN:-python3} "$MUT_SUBMIT_VAL" --phase "${PHASE_NUMBER}" $MUT_FLAGS
MUT_RC=$?
if [ "$MUT_RC" -ne 0 ]; then
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.mutation_submit_blocked" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
source scripts/lib/blocking-gate-prompt.sh
EVIDENCE_PATH="${PHASE_DIR}/.vg/mutation-submit-evidence.json"
mkdir -p "$(dirname "$EVIDENCE_PATH")"
cat > "$EVIDENCE_PATH" <<JSON
{
"gate": "mutation_submit",
"summary": "Review mutation-actually-submitted gate failed — mutation goals marked passed without actual submit click + 2xx network",
"fix_hint": "Re-run /vg:review ${PHASE_NUMBER} with scanner prompt requiring SUBMIT, or use --allow-cancel-only-mutations override (logs OVERRIDE-DEBT)"
}
JSON
blocking_gate_prompt_emit "mutation_submit" "$EVIDENCE_PATH" "error"
# AI controller calls AskUserQuestion → resolve via Leg 2.
# Leg 2 exit codes: 0=continue, 1=continue-with-debt, 2=route-amend (exit 0), 3=abort (exit 1), 4=re-prompt.
fi
fi
# v2.46 Phase 6 enrichment: traceability + RCRURD enforcement.
# Closes "AI bịa goal/decision/business-rule" + "scanner stops too early".
# Migration: VG_TRACEABILITY_MODE=warn for pre-2026-05-01 phases (set in
# vg.config.md). New phases default to block.
TRACE_MODE="${VG_TRACEABILITY_MODE:-block}"
# v2.46 L4 — RCRURD step depth (per goal_class threshold)
RCRURD_VAL=".claude/scripts/validators/verify-rcrurd-depth.py"
if [ -f "$RCRURD_VAL" ]; then
RCRURD_FLAGS="--severity ${TRACE_MODE}"
[[ "${ARGUMENTS}" =~ --allow-shallow-scans ]] && RCRURD_FLAGS="$RCRURD_FLAGS --allow-shallow-scans"
${PYTHON_BIN:-python3} "$RCRURD_VAL" --phase "${PHASE_NUMBER}" $RCRURD_FLAGS
RCRURD_RC=$?
if [ "$RCRURD_RC" -ne 0 ] && [ "$TRACE_MODE" = "block" ]; then
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.rcrurd_depth_blocked" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
source scripts/lib/blocking-gate-prompt.sh
EVIDENCE_PATH="${PHASE_DIR}/.vg/rcrurd-depth-evidence.json"
mkdir -p "$(dirname "$EVIDENCE_PATH")"
cat > "$EVIDENCE_PATH" <<JSON
{
"gate": "rcrurd_depth",
"summary": "RCRURD depth gate failed — scanner stopped too early on mutation goals",
"fix_hint": "See scanner-report-contract.md 'RCRURD Lifecycle Protocol'. Goal class drives min steps."
}
JSON
blocking_gate_prompt_emit "rcrurd_depth" "$EVIDENCE_PATH" "warn"
# AI controller calls AskUserQuestion → resolve via Leg 2.
# Leg 2 exit codes: 0=continue, 1=continue-with-debt, 2=route-amend (exit 0), 3=abort (exit 1), 4=re-prompt.
fi
fi
# v2.46 L4 — asserted_quote vs business rule similarity
ASSERTED_VAL=".claude/scripts/validators/verify-asserted-rule-match.py"
if [ -f "$ASSERTED_VAL" ]; then
ASSERTED_FLAGS="--severity ${TRACE_MODE}"
[[ "${ARGUMENTS}" =~ --allow-asserted-drift ]] && ASSERTED_FLAGS="$ASSERTED_FLAGS --allow-asserted-drift"
${PYTHON_BIN:-python3} "$ASSERTED_VAL" --phase "${PHASE_NUMBER}" $ASSERTED_FLAGS
ASSERTED_RC=$?
if [ "$ASSERTED_RC" -ne 0 ] && [ "$TRACE_MODE" = "block" ]; then
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator emit-event "review.asserted_drift_blocked" --payload "{\"phase\":\"${PHASE_NUMBER}\"}" >/dev/null 2>&1 || true
source scripts/lib/blocking-gate-prompt.sh
EVIDENCE_PATH="${PHASE_DIR}/.vg/asserted-drift-evidence.json"
mkdir -p "$(dirname "$EVIDENCE_PATH")"
cat > "$EVIDENCE_PATH" <<JSON
{
"gate": "asserted_drift",
"summary": "Asserted-rule-match gate failed — scanner asserted_quote drifts from BR-NN text",
"fix_hint": "Align scanner asserted_quote fields with the business rule text in BUSINESS-RULES.md or use --allow-asserted-drift override"
}
JSON
blocking_gate_prompt_emit "asserted_drift" "$EVIDENCE_PATH" "error"
# AI controller calls AskUserQuestion → resolve via Leg 2.
# Leg 2 exit codes: 0=continue, 1=continue-with-debt, 2=route-amend (exit 0), 3=abort (exit 1), 4=re-prompt.
fi
fi
# v2.46 L4 — replay-evidence (structural + optional curl replay)
REPLAY_VAL=".claude/scripts/validators/verify-replay-evidence.py"
if [ -f "$REPLAY_VAL" ]; then
REPLAY_FLAGS="--severity warn" # default warn — auth fixture not always available
[[ "${ARGUMENTS}" =~ --enable-replay ]] && REPLAY_FLAGS="--severity ${TRACE_MODE} --enable-replay"
${PYTHON_BIN:-python3} "$REPLAY_VAL" --phase "${PHASE_NUMBER}" $REPLAY_FLAGS
REPLAY_RC=$?
if [ "$REPLAY_RC" -ne 0 ] && [ "$TRACE_MODE" = "block" ] && [[ "${ARGUMENTS}" =~ --enable-replay ]]; then
echo "⛔ Replay-evidence gate failed — scanner network claims can't be verified."
exit 1
fi
fi
# v2.46 L4 — cross-phase decision validity (D-XX from earlier phase still active)
CROSS_VAL=".claude/scripts/validators/verify-cross-phase-decision-validity.py"
if [ -f "$CROSS_VAL" ]; then
CROSS_FLAGS="--severity ${TRACE_MODE}"
[[ "${ARGUMENTS}" =~ --allow-stale-decisions ]] && CROSS_FLAGS="$CROSS_FLAGS --allow-stale-decisions"
${PYTHON_BIN:-python3} "$CROSS_VAL" --phase "${PHASE_NUMBER}" $CROSS_FLAGS
CROSS_RC=$?
if [ "$CROSS_RC" -ne 0 ] && [ "$TRACE_MODE" = "block" ]; then
echo "⛔ Cross-phase decision validity failed — goal cites revoked/missing D-XX."
exit 1
fi
fi
# v2.46 L6 — adversarial scanner-business-alignment verifier
# Two-phase: emit prompts → orchestrator spawns Haiku verifier per prompt →
# re-run validator with --verifier-results to gate.
ALIGN_VAL=".claude/scripts/validators/verify-scanner-business-alignment.py"
if [ -f "$ALIGN_VAL" ]; then
PROMPTS_FILE="${PHASE_DIR}/.tmp/business-alignment-prompts.jsonl"
RESULTS_FILE="${PHASE_DIR}/.tmp/business-alignment-results.jsonl"
mkdir -p "$(dirname "$PROMPTS_FILE")" 2>/dev/null
${PYTHON_BIN:-python3} "$ALIGN_VAL" --phase "${PHASE_NUMBER}" --prompts-out "$PROMPTS_FILE" 2>&1 | head -3
PROMPT_COUNT=$(wc -l < "$PROMPTS_FILE" 2>/dev/null | tr -d ' ' || echo 0)
if [ "$PROMPT_COUNT" -gt 0 ]; then
echo ""
echo "📋 Business alignment verifier needs ${PROMPT_COUNT} adversarial check(s)."
echo " Orchestrator should spawn isolated Haiku per prompt + write JSONL results to:"
echo " ${RESULTS_FILE}"
echo " Then re-run review with --verifier-results=${RESULTS_FILE}"
echo ""
# If results file exists from prior orchestrator pass, gate now
if [ -f "$RESULTS_FILE" ]; then
ALIGN_FLAGS="--severity ${TRACE_MODE} --verifier-results ${RESULTS_FILE}"
[[ "${ARGUMENTS}" =~ --allow-business-drift ]] && ALIGN_FLAGS="$ALIGN_FLAGS --allow-business-drift"
${PYTHON_BIN:-python3} "$ALIGN_VAL" --phase "${PHASE_NUMBER}" $ALIGN_FLAGS
ALIGN_RC=$?
if [ "$ALIGN_RC" -ne 0 ] && [ "$TRACE_MODE" = "block" ]; then
echo "⛔ Business alignment gate failed — adversarial verifier flagged drift."
exit 1
fi
fi
fi
fi
# RFC v9 D21 — DEFECT-LOG.md generation (tester pro).
# After GOAL-COVERAGE-MATRIX is final, parse the matrix and create one
# Defect entry per goal with status ∈ {BLOCKED, UNREACHABLE, FAILED, SUSPECTED}
# that does NOT already have an open defect in .tester-pro/defects.json.
# Severity inferred from priority + block_family heuristics.
TESTER_PRO_CLI="${REPO_ROOT}/.claude/scripts/tester-pro-cli.py"
[ -f "$TESTER_PRO_CLI" ] || TESTER_PRO_CLI="${REPO_ROOT}/scripts/tester-pro-cli.py"
if [ -f "$TESTER_PRO_CLI" ] && [ -f "${PHASE_DIR}/GOAL-COVERAGE-MATRIX.md" ]; then
echo "━━━ D21 — Defect log generation ━━━"
TESTER_PRO_CLI="$TESTER_PRO_CLI" ${PYTHON_BIN:-python3} - <<'PYDEFECT' 2>&1 | sed 's/^/ D21: /' || true
import json, os, re, subprocess, sys
phase_dir = os.environ['PHASE_DIR']
phase_no = os.environ['PHASE_NUMBER']
matrix = open(os.path.join(phase_dir, 'GOAL-COVERAGE-MATRIX.md'),
encoding='utf-8').read()
cli = os.environ.get(
'TESTER_PRO_CLI',
os.path.join(os.environ['REPO_ROOT'], 'scripts', 'tester-pro-cli.py'),
)
# Parse rows `| G-XX | priority | surface | STATUS | evidence |`
row_re = re.compile(
r"^\|\s*(G-[\w.-]+)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|"
r"\s*([A-Z_]+)\s*\|\s*(.+?)\s*\|", re.MULTILINE,
)
fail_states = {"BLOCKED", "UNREACHABLE", "FAILED", "SUSPECTED"}
# Load existing open defects to skip duplicates by (goal, title-prefix)
defects_path = os.path.join(phase_dir, '.tester-pro', 'defects.json')
existing = []
if os.path.exists(defects_path):
try: existing = json.load(open(defects_path, encoding='utf-8'))
except: pass
def is_open_for(gid, title_prefix):
return any(
d.get('related_goals') and gid in d['related_goals']
and d.get('title','').startswith(title_prefix)
and not d.get('closed_at')
for d in existing
)
opened = 0
for m in row_re.finditer(matrix):
gid, prio, surf, status, ev = m.groups()
if status not in fail_states:
continue
title_prefix = f"[{status}]"
if is_open_for(gid, title_prefix):
continue
# Severity heuristic: critical priority → critical; backend mutation → major;
# else minor.
prio_l = prio.strip().lower()
surf_l = surf.strip().lower()
if prio_l == 'critical':
sev = 'critical'
elif any(s in surf_l for s in ('api', 'data', 'integration')):
sev = 'major'
else:
sev = 'minor'
title = f"[{status}] {gid} — {ev[:80]}"
cmd = [
sys.executable, cli, 'defect', 'new',
'--phase', phase_no, '--title', title,
'--severity', sev, '--found-in', 'review',
'--goals', gid,
'--notes', f"surface={surf} priority={prio} status={status}. Auto-opened from GOAL-COVERAGE-MATRIX.",
]
r = subprocess.run(cmd, capture_output=True, text=True)
if r.returncode == 0:
opened += 1
print(f"opened {opened} new defect(s) from matrix")
PYDEFECT
fi
"${PYTHON_BIN:-python3}" .claude/scripts/vg-orchestrator run-complete
RUN_RC=$?
if [ $RUN_RC -ne 0 ]; then
echo "⛔ review run-complete BLOCK — review orchestrator output + fix before /vg:test" >&2
exit $RUN_RC
fi
<success_criteria>