verify-changes

Verify all recent changes: review diffs, check that unit/e2e tests cover the changes, run all tests, manually verify UI changes with playwright-cli, fix problems, re-verify until clean, then report with recommendations.

Overview

Install command

npx skills add https://github.com/zeveck/zskills-dev --skill verify-changes

Copy and paste this command into Claude Code to install the skill

Source

zeveck/zskills-dev

UpdatedJune 3, 2026 at 23:29

SKILL.md

/verify-changes [worktree | branch | last [N]]: Verify, Test & Fix Changes

Thoroughly verify all recent changes in the working tree (or a specified scope). Reviews diffs, checks test coverage, runs tests, manually verifies UI changes, fixes any problems found, re-verifies recursively until clean, then reports results with next-step recommendations.

Ultrathink throughout. Use careful, thorough reasoning at every step. Read the code, understand what changed and why, verify correctness — don't just skim.

NEVER verify from memory. Read actual diffs, run actual tests. You may have just written the code being verified — that's exactly why YOU should not be the verifier without checking the artifacts. Dispatch agents to do the actual verification work. Fresh agents have no memory of the implementation and will catch things you'd miss because you "know" what the code does.

Dispatch protocol

Dispatch shape (top-level invocation). When /verify-changes is invoked at the top level AND the orchestrator has the Agent tool, dispatched sub-agents use subagent_type: "verifier"; pipe each agent's response through bash "$CLAUDE_PROJECT_DIR/.claude/hooks/verify-response-validate.sh", and on exit 1 STOP the sub-flow. When invoked from within a verifier subagent (the typical case), the Agent tool is categorically unavailable, so fall through to the inline path described under "Check your tool list first" below. Document the freshness mode in the verification report ("multi-agent" / "single-context fresh-subagent" / "inline self-review").

Check your tool list first. Whether you can dispatch fresh sub-agents for verification depends on whether you have the Agent (or Task) tool:

If Agent is in your tool list (you are running at the top level — the user invoked you directly, or a parent skill cron-fired you in chunked mode), dispatch fresh sub-agents per the protocol below using subagent_type: "verifier" at every dispatch site. Each sub-agent is a sibling of any other sub-agent dispatched by you, and they have independent contexts. This is the multi-agent verification mode and gives the strongest fresh-eyes guarantees.

Layer 3 — verifier response validation. After each verifier sub-agent dispatch returns:
```
printf '%s' "$VERIFIER_RESPONSE" | bash "$CLAUDE_PROJECT_DIR/.claude/hooks/verify-response-validate.sh"
VALIDATE_EXIT=$?
```
On VALIDATE_EXIT=1 — STOP this sub-flow. Do NOT roll up the sub-agent's output into the verification report as a PASS. Surface the failure to the dispatcher (or to the user if you ARE the top-level orchestrator). Do not auto-retry — re-dispatching with the same agent type hits the same wall.
If Agent is NOT in your tool list (you are running as a dispatched subagent yourself — Claude Code subagents do not have the Agent/Task tool, by Anthropic's design at https://code.claude.com/docs/en/sub-agents), execute the verification workflow inline in your current context. Be explicit about freshness:
- You ARE fresh relative to your dispatcher IF your dispatcher is a top-level orchestrator and you were dispatched as a separate subagent (typical case after chunked /run-plan dispatches you for verification — you have no memory of the implementation work because it happened in a different subagent).
- You are NOT fresh relative to your dispatcher if you were Skill-loaded inline (your dispatcher's context IS your context, including any implementation work).
- Document the freshness mode in your verification report so the user knows what kind of verification they got: "multi-agent" (you had dispatch and used it), "single-context fresh-subagent" (you didn't have dispatch but you were a fresh subagent), or "inline self-review" (you were Skill-loaded into the implementer's context — limited assurance, flag for the user).
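
The three freshness modes above reduce to two yes/no questions. The following is a minimal sketch of that decision; the function name and its arguments are hypothetical, and only the three returned strings come from this document.

```shell
# freshness_mode <has_agent_tool: yes|no> <fresh_subagent: yes|no>
# Hypothetical helper: prints the freshness mode to record in the report.
freshness_mode() {
  has_agent_tool="$1"   # "yes" if Agent (or Task) is in the tool list
  fresh_subagent="$2"   # "yes" if dispatched as a separate subagent
  if [ "$has_agent_tool" = "yes" ]; then
    echo "multi-agent"
  elif [ "$fresh_subagent" = "yes" ]; then
    echo "single-context fresh-subagent"
  else
    echo "inline self-review"
  fi
}
```

For example, `freshness_mode no yes` prints `single-context fresh-subagent`, the case where a chunked parent dispatched this skill without the Agent tool.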

At minimum, dispatch an agent (with subagent_type: "verifier") for Phase 1-2 (read diffs, audit coverage) and run tests yourself (Phase 3) when dispatch is available. Run everything inline when it isn't. For complex changes, dispatch separate agents for different verification concerns (diff review, test coverage, manual testing) — each with subagent_type: "verifier" and each piped through Layer 3.

Do not produce a verification report based on what you remember doing. Context compaction means your memory of what you changed may be incomplete or wrong. The ENTIRE POINT of /verify-changes is fresh eyes on the actual state of the code, not a rubber stamp of what you think you did.

Past failure: a verification was done entirely from memory without reading a single diff or dispatching any agents — it just confirmed "looks good" based on session recall.

Test-command resolution (BEFORE Phase 3)

Resolve $FULL_TEST_CMD explicitly — do NOT search the repo for test scripts, and do NOT default to npm run test:all, scripts/test-all.sh, or any template file. Three-case decision tree:

MAIN_ROOT=$(cd "$(git rev-parse --git-common-dir 2>/dev/null)/.." 2>/dev/null && pwd)
CONFIG_FILE="$MAIN_ROOT/.claude/zskills-config.json"

FULL_TEST_CMD=""
if [ -f "$CONFIG_FILE" ] \
   && [[ "$(cat "$CONFIG_FILE")" =~ \"full_cmd\"[[:space:]]*:[[:space:]]*\"([^\"]*)\" ]]; then
  FULL_TEST_CMD="${BASH_REMATCH[1]}"
fi

if [ -n "$FULL_TEST_CMD" ]; then
  # Case 1 — config sets a test command. Use it verbatim.
  TEST_MODE="config"
else
  # Config is empty or missing. Check whether the project has test infra
  # at all (same detection list as run-plan's preflight hook-placeholder gate):
  # package.json with a "test" script, vitest.config.*, jest.config.*,
  # pytest.ini, .mocharc.*, Makefile, tests/*.sh, tests/*.py, tests/*.js.
  TEST_INFRA_DETECTED=0
  [ -f "$MAIN_ROOT/package.json" ] && grep -q '"test"[[:space:]]*:' "$MAIN_ROOT/package.json" && TEST_INFRA_DETECTED=1
  ls "$MAIN_ROOT"/vitest.config.* "$MAIN_ROOT"/jest.config.* "$MAIN_ROOT"/pytest.ini \
     "$MAIN_ROOT"/.mocharc.* "$MAIN_ROOT"/Makefile 2>/dev/null | grep -q . && TEST_INFRA_DETECTED=1
  ls "$MAIN_ROOT/tests"/*.sh "$MAIN_ROOT/tests"/*.py "$MAIN_ROOT/tests"/*.js 2>/dev/null | grep -q . && TEST_INFRA_DETECTED=1

  if [ "$TEST_INFRA_DETECTED" -eq 1 ]; then
    # Case 2 — tests exist but no command configured. Misconfigured
    # project. Refuse to claim verification; direct caller to fix config.
    echo "ERROR: /verify-changes: test infra detected (tests/ files, package.json" >&2
    echo "  test script, vitest/jest/pytest config, or Makefile present) but" >&2
    echo "  testing.full_cmd is empty in .claude/zskills-config.json. Tests must" >&2
    echo "  not be silently skipped. Run /update-zskills (or edit the config" >&2
    echo "  directly) to set testing.full_cmd, then re-run /verify-changes." >&2
    exit 1
  else
    # Case 3 — no test infra and no configured command. Legitimate state
    # for a docs-only, greenfield, or all-manual-testing project. Skip
    # the test gate and RECORD THE SKIP EXPLICITLY in the verification
    # report so a reviewer can see that tests were not run by design.
    TEST_MODE="skipped"
    echo "/verify-changes: no test infra detected and no testing.full_cmd" >&2
    echo "  configured — skipping Phase 3 tests. This MUST be recorded in" >&2
    echo "  the verification report under 'Tests: skipped — no test infra'." >&2
  fi
fi

Summary of the three cases (report format uses exactly these strings):

TEST_MODE=config — use $FULL_TEST_CMD verbatim in Phase 3 (resolved by the three-case decision tree above; helper source: zskills-resolve-config.sh)
TEST_MODE=skipped — do NOT run tests; explicitly note "Tests: skipped — no test infra" in the final report and tracking marker
Error exit — misconfiguration; no verification attempted, caller must fix

The silent-fallback path (guess a command, search the repo, use a template file) is removed. Silence is acceptable ONLY when TEST_MODE=skipped with an explicit report entry — never as a stand-in for "couldn't figure out what to run."
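
As a compact restatement of the three cases, the decision can be sketched as a pure function of two inputs: the configured command and whether test infra was detected. The helper name `resolve_test_mode` is an assumption for illustration and is not part of zskills.

```shell
# resolve_test_mode <full_cmd> <infra_detected: 0|1>
# Hypothetical helper: prints "config" or "skipped", or returns 1 for the
# misconfiguration case (tests exist but no command is configured).
resolve_test_mode() {
  full_cmd="$1"
  infra_detected="$2"
  if [ -n "$full_cmd" ]; then
    echo "config"
  elif [ "$infra_detected" -eq 1 ]; then
    echo "ERROR: test infra detected but testing.full_cmd is empty" >&2
    return 1
  else
    echo "skipped"
  fi
}
```

Note that there is no fourth branch: an empty command with detected test infra is an error, never a guess.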

Arguments

/verify-changes [scope]

scope (optional) — what changes to verify:
- (omit) — all uncommitted changes in the working tree (default)
- worktree — changes in the current worktree vs its base branch
- branch — all commits on the current branch vs main
- last — only the last commit (same as last 1)
- last N — the last N commits

Examples: /verify-changes, /verify-changes worktree, /verify-changes last 3

Cron-fired top-level example (final cross-branch verification at the end of a /research-and-go pipeline):

"Run /verify-changes branch tracking-id=meta-add-dark-mode"

Parses as SCOPE=branch, TRACKING_ID=meta-add-dark-mode, and on successful completion writes .zskills/tracking/fulfilled.verify-changes.final.meta-add-dark-mode, matching the requires.verify-changes.final.meta-add-dark-mode lockdown marker created by /research-and-go Step 0.

Parsing $ARGUMENTS

SCOPE=""
TRACKING_ID=""
for tok in $ARGUMENTS; do
  case "$tok" in
    tracking-id=*) TRACKING_ID="${tok#tracking-id=}" ;;
    worktree|branch|last) SCOPE="$tok" ;;
    [0-9]*) [ "$SCOPE" = "last" ] && SCOPE="last $tok" ;;
  esac
done

Lets /verify-changes branch tracking-id=X parse correctly when fired as a cron-fired top-level turn.
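
To see how the token rules behave, the same loop can be wrapped in a function and run against sample argument strings. The wrapper name `parse_verify_args` is hypothetical; the case patterns mirror the loop above.

```shell
# parse_verify_args <arguments-string>
# Sets SCOPE and TRACKING_ID using the same token rules as the loop above.
parse_verify_args() {
  SCOPE=""
  TRACKING_ID=""
  for tok in $1; do            # unquoted on purpose: split on whitespace
    case "$tok" in
      tracking-id=*) TRACKING_ID="${tok#tracking-id=}" ;;
      worktree|branch|last) SCOPE="$tok" ;;
      [0-9]*) if [ "$SCOPE" = "last" ]; then SCOPE="last $tok"; fi ;;
    esac
  done
}

parse_verify_args "branch tracking-id=meta-add-dark-mode"
echo "SCOPE=$SCOPE TRACKING_ID=$TRACKING_ID"
# prints: SCOPE=branch TRACKING_ID=meta-add-dark-mode

parse_verify_args "last 3"
echo "SCOPE=$SCOPE TRACKING_ID=$TRACKING_ID"
# prints: SCOPE=last 3 TRACKING_ID=
```

A number is only honored when it follows `last`; unrecognized tokens are ignored, which is what lets the surrounding "Run ..." phrasing of a cron-fired prompt pass through harmlessly.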

Tracking Fulfillment

On entry, if a tracking ID was passed by the parent skill, create the fulfillment marker in the MAIN repo. verify-changes is always invoked as a delegatee (or as a cron-fired top-level turn from a pipeline); the fulfillment marker must land in the parent's subdir so the parent's requires.* and our fulfilled.* meet at the same path.

PIPELINE_ID resolution — 3-tier priority (same block applies at every PIPELINE_ID assignment below):

1. $ZSKILLS_PIPELINE_ID env: set by the parent in shell-inheritance cases (rare under Claude Code subagent dispatch since env does not inherit across agents, but covers cron-fired top-level turns and tests that export it explicitly).
2. .zskills-tracked in the worktree (cwd): the parent skill (run-plan, research-and-go, etc.) writes its own PIPELINE_ID into the worktree's .zskills-tracked when it sets up the worktree. This is the primary delegation channel under Claude Code.
3. Fallback verify-changes.$TRACKING_ID (or verify-changes.final.<id> when SCOPE=branch): only for TRULY standalone invocations with no parent pipeline. verify-changes becomes its own pipeline owner.

if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh"
else
  . "$CLAUDE_PROJECT_DIR/.claude/skills/update-zskills/scripts/zskills-resolve-config.sh"
fi
MAIN_ROOT=$(cd "$(git rev-parse --git-common-dir)/.." && pwd)
MARKER_STEM="verify-changes"
[ "$SCOPE" = "branch" ] && MARKER_STEM="verify-changes.final"
# 3-tier PIPELINE_ID resolution: env, then worktree .zskills-tracked
# (parent's PIPELINE_ID inherited via the worktree file), then fallback
# to own-skill standalone identity.
PIPELINE_ID="${ZSKILLS_PIPELINE_ID:-}"
if [ -z "$PIPELINE_ID" ] && [ -f ".zskills-tracked" ]; then
  PIPELINE_ID=$(tr -d '[:space:]' < ".zskills-tracked")
fi
: "${PIPELINE_ID:=$MARKER_STEM.$TRACKING_ID}"
PIPELINE_ID=$(bash "$ZSKILLS_SKILLS_ROOT/create-worktree/scripts/sanitize-pipeline-id.sh" "$PIPELINE_ID")
[ -n "$PIPELINE_ID" ] || { echo "tracking: empty PIPELINE_ID — refusing flat write" >&2; exit 1; }
mkdir -p "$MAIN_ROOT/.zskills/tracking/$PIPELINE_ID"
printf 'skill: verify-changes\nid: %s\nscope: %s\nstatus: started\ndate: %s\n' \
  "$TRACKING_ID" "$SCOPE" "$(TZ="${TIMEZONE:-UTC}" date -Iseconds)" \
  > "$MAIN_ROOT/.zskills/tracking/$PIPELINE_ID/fulfilled.$MARKER_STEM.$TRACKING_ID"

If no tracking ID was passed (standalone invocation), skip tracking.

When $SCOPE = "branch" this writes to fulfilled.verify-changes.final.<id> — matching the requires.verify-changes.final.<id> lockdown marker created by /research-and-go Step 0.
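
The 3-tier PIPELINE_ID priority can be exercised in isolation with a small wrapper. This is a sketch under stated assumptions: the function name and its optional third argument are hypothetical, and it deliberately omits the sanitize-pipeline-id.sh step that the real block applies afterwards.

```shell
# resolve_pipeline_id <marker_stem> <tracking_id> [tracked_file]
# Hypothetical wrapper around the three tiers described above.
resolve_pipeline_id() {
  marker_stem="$1"
  tracking_id="$2"
  tracked_file="${3:-.zskills-tracked}"
  pipeline_id="${ZSKILLS_PIPELINE_ID:-}"                # tier 1: environment
  if [ -z "$pipeline_id" ] && [ -f "$tracked_file" ]; then
    pipeline_id=$(tr -d '[:space:]' < "$tracked_file")  # tier 2: worktree file
  fi
  if [ -z "$pipeline_id" ]; then
    pipeline_id="$marker_stem.$tracking_id"             # tier 3: standalone
  fi
  echo "$pipeline_id"
}
```

With no environment variable and no tracked file, `resolve_pipeline_id verify-changes.final meta-add-dark-mode` prints `verify-changes.final.meta-add-dark-mode`, which is the standalone identity used for branch scope.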

Phase 1 — Inventory Changes

Determine the diff scope based on the argument:
- Default: git diff + git diff --cached + git status -s
- worktree: git diff $(git merge-base HEAD main)..HEAD + git diff + git diff --cached
- branch: git log $(git merge-base origin/main HEAD)..HEAD --oneline + git diff $(git merge-base origin/main HEAD)..HEAD
- last: git diff HEAD~1..HEAD
- last N: git diff HEAD~N..HEAD
List all changed files with their change type (added/modified/deleted):
```
git diff --name-status [scope]
```
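
The `[scope]` placeholder in the command above stands for a commit range. One way to derive it is sketched below; `scope_to_range` is a hypothetical helper, and for `worktree` and `branch` it assumes the caller has already resolved the merge-base commit with `git merge-base`.

```shell
# scope_to_range <scope> [base-commit]
# Hypothetical helper: prints the committed-range argument for `git diff`.
scope_to_range() {
  scope="$1"
  base="${2:-}"
  case "$scope" in
    "")               echo "" ;;               # default: working tree, no range
    worktree|branch)  echo "$base..HEAD" ;;
    last)             echo "HEAD~1..HEAD" ;;
    "last "*)         echo "HEAD~${scope#last }..HEAD" ;;
    *)                echo "unknown scope: $scope" >&2; return 1 ;;
  esac
}
```

Usage might look like `git diff --name-status $(scope_to_range "last 3")`. The range covers committed changes only; the default and `worktree` scopes still need the uncommitted `git diff` and `git diff --cached` output listed above.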
Read and understand every changed file. For each file:
- What was the change? (new feature, bug fix, refactor, test, config)
- What is the intent? (read commit messages, issue references, comments)
- Does the change look correct? (logic errors, edge cases, off-by-ones)
- Any security concerns? (XSS, injection, unsanitized input)
- Any CLAUDE.md rule violations? (weakened tests, external deps, etc.)
- Scope vs plan: does this change stay within the plan's stated goal? Flag any file touched that is not mentioned by the plan's Work Items or Acceptance Criteria, AND any deletion/rewrite of features unrelated to the plan's purpose. (Regression guard: commit faab84b silently deleted unrelated features because no reviewer asked this question.)
Produce a change inventory table:

| File | Change Type | Description | Tests Needed |
|------|-------------|-------------|--------------|

Phase 2 — Test Coverage Audit

For each changed file, verify appropriate tests exist:

Source code changes (src/**):
- Find the corresponding test file(s) in tests/
- Verify test cases exist that exercise the changed code paths
- Check that tests are meaningful — not just smoke tests, but actually testing the specific behavior that changed
- If a bug fix: does a regression test exist that would have caught the bug?
- If a new feature: do tests cover the happy path AND edge cases?
Component/module changes — find corresponding test files and verify output computation, parameter handling, edge cases are tested.
Engine/solver changes — check analytical verification tests where applicable. Verify numerical accuracy assertions (tolerances, convergence).
Codegen/build changes — verify compile tests exist. Check behavioral parity between source and generated code.
UI/editor changes — check for E2E tests. Flag for manual verification in Phase 4.
Produce a coverage assessment:

| File | Test File(s) | Coverage | Gaps |
|------|--------------|----------|------|

Coverage ratings: Good (meaningful tests exist), Partial (some paths untested), Missing (no tests), N/A (config, docs, etc.)

Phase 3 — Run Tests

If TEST_MODE=skipped (from "Test-command resolution" above), skip this entire phase and write one bullet in the Phase 7 report under Tests: Tests: skipped — no test infra detected; no testing.full_cmd configured. Then proceed to Phase 4. Do not search for an alternate test command.

Otherwise (TEST_MODE=config, $FULL_TEST_CMD set):

Bash-tool timeout for the full suite (CRITICAL). When invoking the test command via the Bash tool, pass timeout: 600000 (10 min). The default 120000ms is shorter than the suite's actual runtime (~3-4 min in zskills). If the Bash call times out, do NOT recover by retrying with run_in_background: true + Monitor / BashOutput polling — wake events for background processes do not reliably deliver to subagents, so the wait never returns and the dispatch hangs at "Tests are running. Let me wait for the monitor." Past failure: 6+ subagent crashes with exactly that phrase across 2026-04-29 and 2026-04-30 sessions. Always foreground-Bash with explicit long timeout; capture to file as below; read the file when the call returns.

Run the full test suite with output captured to a file (canonical form — maintainers: see references/canonical-config-prelude.md §1 in the zskills source):

if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh"
else
  . "$CLAUDE_PROJECT_DIR/.claude/skills/update-zskills/scripts/zskills-resolve-config.sh"
fi
TEST_OUT="/tmp/zskills-tests/$(basename "$(pwd)")"
mkdir -p "$TEST_OUT"
$FULL_TEST_CMD > "$TEST_OUT/${TEST_OUTPUT_FILE:-.test-results.txt}" 2>&1

Never pipe through | tail, | head, | grep — it loses output and forces re-runs. Capture once, then read "$TEST_OUT/${TEST_OUTPUT_FILE:-.test-results.txt}" to find failures. This runs unit tests (~4,000), then auto-detects whether the dev server and cargo are available for E2E and codegen tests.
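
The post-tests tracking block later reads `$TEST_RESULT`, which the capture command above does not set. A minimal sketch of one way to capture output and record the outcome in a single foreground call follows; the wrapper name and the `pass`/`fail` values are assumptions, not a documented zskills format.

```shell
# run_and_capture <output-file> <command...>
# Hypothetical wrapper: runs the command in the foreground, captures stdout
# and stderr to the file, and sets TEST_RESULT from the exit status.
run_and_capture() {
  out_file="$1"
  shift
  mkdir -p "$(dirname "$out_file")"
  if "$@" > "$out_file" 2>&1; then
    TEST_RESULT="pass"   # assumed value
  else
    TEST_RESULT="fail"   # assumed value
  fi
}
```

A call such as `run_and_capture "$TEST_OUT/${TEST_OUTPUT_FILE:-.test-results.txt}" $FULL_TEST_CMD` keeps the command unquoted so it word-splits the same way as the canonical form; read the output file afterwards to find failures.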

If tests fail, diagnose with targeted runs. Read "$TEST_OUT/${TEST_OUTPUT_FILE:-.test-results.txt}" to identify the failing test file, then run ONLY that file (the exact single-file invocation depends on the project's test framework — infer from the failing file's extension: node --test tests/x.test.js, pytest tests/x_test.py, bash tests/x.sh, etc.). Do NOT re-run $FULL_TEST_CMD to diagnose — that wastes minutes when the single file takes seconds. Use the full command only as the final gate after fixes.
Pre-existing failure protocol. If a test fails in code you didn't touch, it may be pre-existing — not caused by the changes you're verifying. This also applies when you've hit the max 2 fix attempts on a failure and suspect the root cause predates your changes.

a. Verify it's pre-existing: check git log --oneline -5 -- <test-file> and git log --oneline -5 -- <source-file>. If neither was modified by the changes under review, the failure predates your work.

b. Research and file a GitHub issue (gh issue create) with:
- Title: Test failure: <test name>
- Verbatim error output from "$TEST_OUT/${TEST_OUTPUT_FILE:-.test-results.txt}"
- The exact assert.* line from the test source (read the test code)
- Reproduction command: node --test tests/<file>.test.js
- git log evidence that the failure predates current changes
- Label: test-restore
c. Mark the test skipped referencing the issue:
- Change it('name', to it.skip('name // #NNN',
- Commit the skip and the issue number together
d. Re-run the failing test file to confirm the skip works, then run the final gate:
```
if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh"
else
  . "$CLAUDE_PROJECT_DIR/.claude/skills/update-zskills/scripts/zskills-resolve-config.sh"
fi
TEST_OUT="/tmp/zskills-tests/$(basename "$(pwd)")"
mkdir -p "$TEST_OUT"
$FULL_TEST_CMD > "$TEST_OUT/${TEST_OUTPUT_FILE:-.test-results.txt}" 2>&1
```
e. Guardrails:
- Never skip a test in a file you modified or that tests code you modified
- Never skip a test you wrote
- If you're about to skip a 3rd test in one session, STOP and report to the user — 3+ pre-existing failures suggests a systemic problem
Record results:

| Suite | Result | Failures |
|-------|--------|----------|
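
For the targeted-run step earlier in this phase, the extension-based inference can be sketched as a small lookup. `single_test_cmd` is a hypothetical helper that covers only the three runners named in this document; other frameworks would need their own entries.

```shell
# single_test_cmd <failing-test-file>
# Hypothetical helper: maps a file extension to a single-file test command.
single_test_cmd() {
  case "$1" in
    *.js) echo "node --test $1" ;;
    *.py) echo "pytest $1" ;;
    *.sh) echo "bash $1" ;;
    *)    echo "no known single-file runner for: $1" >&2; return 1 ;;
  esac
}
```

The non-zero return for unknown extensions is deliberate: it is a prompt to work out the right command for the project rather than fall back to the full suite.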

Post-tests tracking

After recording test results (pass or fail), create the tests-run step marker if a tracking ID is present:

if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh"
else
  . "$CLAUDE_PROJECT_DIR/.claude/skills/update-zskills/scripts/zskills-resolve-config.sh"
fi
MAIN_ROOT=$(cd "$(git rev-parse --git-common-dir)/.." && pwd)
MARKER_STEM="verify-changes"
[ "$SCOPE" = "branch" ] && MARKER_STEM="verify-changes.final"
# 3-tier PIPELINE_ID resolution (see "Tracking Fulfillment" above).
PIPELINE_ID="${ZSKILLS_PIPELINE_ID:-}"
if [ -z "$PIPELINE_ID" ] && [ -f ".zskills-tracked" ]; then
  PIPELINE_ID=$(tr -d '[:space:]' < ".zskills-tracked")
fi
: "${PIPELINE_ID:=$MARKER_STEM.$TRACKING_ID}"
PIPELINE_ID=$(bash "$ZSKILLS_SKILLS_ROOT/create-worktree/scripts/sanitize-pipeline-id.sh" "$PIPELINE_ID")
[ -n "$PIPELINE_ID" ] || { echo "tracking: empty PIPELINE_ID — refusing flat write" >&2; exit 1; }
printf 'result: %s\ncompleted: %s\n' "$TEST_RESULT" "$(TZ="${TIMEZONE:-UTC}" date -Iseconds)" \
  > "$MAIN_ROOT/.zskills/tracking/$PIPELINE_ID/step.verify-changes.$TRACKING_ID.tests-run"

Phase 4 — Agent Verification + User Verification Classification

Two types of verification — both are mandatory for UI changes:

Agent verification (MANUAL): The agent tests the change via playwright-cli. This is YOUR job. You MUST do this for any change that touches UI files. The pre-commit hook will block commits if playwright-cli wasn't used in the session when UI files are staged. This is not optional.

User verification (USER): Some changes need the HUMAN to see them — judgment calls about animation quality, visual layout, UX feel. The agent flags these but cannot close them. Mechanically classified: if UI/editor/styles files changed → User Verify: NEEDED. /fix-report Step 2 presents these to the user before closing.
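
The mechanical classification can be sketched as a path match over the changed files. The patterns below are placeholders chosen for illustration only; this document does not list the project's actual UI/editor/styles paths, so substitute the real ones.

```shell
# classify_user_verify <changed-file>...
# Prints NEEDED if any path matches a UI pattern, otherwise N/A.
# ASSUMPTION: src/ui/, src/editor/, *.css and *.html are hypothetical patterns.
classify_user_verify() {
  for f in "$@"; do
    case "$f" in
      src/ui/*|src/editor/*|*.css|*.html)
        echo "NEEDED"
        return 0
        ;;
    esac
  done
  echo "N/A"
}
```

Because the result depends only on file paths, the classification stays the same no matter how confident the agent is in its own playwright-cli run.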

Agent verification steps

Use the /manual-testing skill for recipes, selectors, and setup instructions.

"No dev server" is not an excuse to skip. If UI files changed and no dev server is running, START ONE:

if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh"
else
  . "$CLAUDE_PROJECT_DIR/.claude/skills/update-zskills/scripts/zskills-resolve-config.sh"
fi
if [ -z "$DEV_SERVER_CMD" ]; then
  echo "ERROR: dev_server.cmd not configured. Run /update-zskills." >&2
  exit 1
fi
$DEV_SERVER_CMD &

Wait a few seconds, then proceed. The dev server is a static file server — it takes 2 seconds to start. Reporting "N/A (no dev server)" when you could have started one is skipping, not verifying.
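
As an alternative to a fixed sleep, readiness can be polled with a bounded retry loop. This is a swapped-in technique, not what the skill prescribes; `wait_until` is a hypothetical helper that retries any probe command.

```shell
# wait_until <attempts> <delay-seconds> <probe-command...>
# Hypothetical helper: retries the probe until it succeeds or attempts run out.
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0            # probe succeeded: server is reachable
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1                # gave up after the configured number of attempts
}
```

Assuming curl is installed and `$PORT` holds the port reported by port.sh, a probe could be `wait_until 10 1 curl -fsS "http://localhost:$PORT/"`. The loop is bounded, so a server that never starts produces a clear failure instead of a hang.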

For each UI/interaction change:
- Start a dev server if one isn't running
- Follow /manual-testing setup (auth bypass, browser open)
- Reproduce the scenario that exercises the change using real events
- Take screenshots as evidence
- Verify the expected behavior occurs
- Test edge cases (undo/redo, rapid clicks, empty inputs, etc.)
For non-UI changes, verify manually where possible:
- Solver changes: build a small test model, run simulation, check output
- Block changes: place the block, configure params, run sim, verify scope output
- Import/export: load a test file, verify round-trip
Record verification results with both columns:

| Change | Agent Verify | User Verify |
|--------|--------------|-------------|
| Button offset fix | PASS (screenshot) | NEEDED |
| Solver tolerance | PASS (tests) | N/A |

Skip agent verification ONLY if no changes are UI-related or manually verifiable. User Verify is always classified (NEEDED or N/A) based on file paths.
For each User Verify: NEEDED item, include verification instructions:
- What to look at (specific UI element, interaction, visual behavior)
- How to reproduce (steps: open app, navigate to X, click Y, observe Z)
- What "correct" looks like (expected appearance, behavior, output)
- URL: http://localhost:$(bash "$ZSKILLS_SKILLS_ROOT/update-zskills/scripts/port.sh")/
The user may be verifying hours later in a different context. "NEEDED" without instructions is useless — the user won't know what to check.

Post-manual-verification tracking

After completing agent verification (Phase 4), if UI changes were verified and a tracking ID is present, create the manual-verified step marker:

if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh"
else
  . "$CLAUDE_PROJECT_DIR/.claude/skills/update-zskills/scripts/zskills-resolve-config.sh"
fi
MAIN_ROOT=$(cd "$(git rev-parse --git-common-dir)/.." && pwd)
MARKER_STEM="verify-changes"
[ "$SCOPE" = "branch" ] && MARKER_STEM="verify-changes.final"
# 3-tier PIPELINE_ID resolution (see "Tracking Fulfillment" above).
PIPELINE_ID="${ZSKILLS_PIPELINE_ID:-}"
if [ -z "$PIPELINE_ID" ] && [ -f ".zskills-tracked" ]; then
  PIPELINE_ID=$(tr -d '[:space:]' < ".zskills-tracked")
fi
: "${PIPELINE_ID:=$MARKER_STEM.$TRACKING_ID}"
PIPELINE_ID=$(bash "$ZSKILLS_SKILLS_ROOT/create-worktree/scripts/sanitize-pipeline-id.sh" "$PIPELINE_ID")
[ -n "$PIPELINE_ID" ] || { echo "tracking: empty PIPELINE_ID — refusing flat write" >&2; exit 1; }
printf 'ui_changes: true\ncompleted: %s\n' "$(TZ="${TIMEZONE:-UTC}" date -Iseconds)" \
  > "$MAIN_ROOT/.zskills/tracking/$PIPELINE_ID/step.verify-changes.$TRACKING_ID.manual-verified"

Only create this marker if UI files were actually verified in Phase 4. Skip for non-UI changes.

Phase 5 — Fix Problems

If any issues were found in Phases 1-4:

Test gaps — write the missing tests:
- Unit tests for uncovered code paths
- Regression tests for bug fixes
- E2E tests for UI changes (if appropriate)
Test failures — fix the root cause:
- Never weaken tests to make them pass. Fix the code, not the test.
- If the test itself is genuinely wrong (testing the wrong expected value), fix the test and document why.
- Never modify the working tree to check if a failure is pre-existing. No git stash && npm test && git stash pop, no checkout of old commits, no comparison worktrees. If you touched code and tests fail, assume your changes caused it and fix them. See CLAUDE.md for the full rule and the past failure that motivated it.
Code issues — fix logic errors, edge cases, security concerns found in Phase 1.
Manual verification failures — fix the behavior, then re-verify.
If working in a worktree, commit fixes so they are preserved for cherry-pick. Use descriptive commit messages referencing what was fixed.

Phase 6 — Re-verify (max 2 rounds)

After fixing any issues in Phase 5:

Run $FULL_TEST_CMD again (canonical form — maintainers: see references/canonical-config-prelude.md §1 in the zskills source) — all suites must pass, including new tests
Re-check manual verifications if fixes touched UI code
If new problems are found, go back to Phase 5

Maximum 2 fix+verify rounds. If the same error recurs after two fix attempts, STOP. Report what was tried, what failed both times, and why you think it's failing. Do not keep guessing — let the user decide. (See CLAUDE.md: "NEVER thrash on a failing fix.")

Phase 7 — Report

Always output the report inline. Additionally, write the report FILE to $ZSKILLS_REPORTS_DIR/verify-{scope-slug}.md and regenerate the index — but ONLY if there are User Verify: NEEDED items that require human sign-off. If all items are clean (no [ ] checkboxes), say so inline and skip the file:

Verification clean — no user sign-off needed. [summary of what passed]

The report file exists for the user to review LATER. If there's nothing to review later, the inline output is sufficient.

Worktree write path: Reports are ALWAYS written to the main repo's audit directory, not the worktree's. When running inside a worktree, resolve via the helper with ZSKILLS_PATHS_ROOT pointing at the main repo:

MAIN_ROOT=$(cd "$(git rev-parse --git-common-dir)/.." && pwd)
# Export on its own line: an assignment prefix cannot precede `if`,
# because bash would then treat `if` as a command name (syntax error).
export ZSKILLS_PATHS_ROOT="$MAIN_ROOT"
if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-paths.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-paths.sh"
else
  . "$MAIN_ROOT/.claude/skills/update-zskills/scripts/zskills-paths.sh"
fi

Then write to $ZSKILLS_REPORTS_DIR/verify-{scope-slug}.md and $ZSKILLS_AUDIT_DIR/VERIFICATION_REPORT.md. This prevents reports from being lost when the worktree is cleaned up.

Scope slug derivation

| Scope argument | Report file |
|----------------|-------------|
| (default/omit) | `$ZSKILLS_REPORTS_DIR/verify-working-tree.md` |
| `worktree` | `$ZSKILLS_REPORTS_DIR/verify-worktree-{name}.md` (name = `basename` of worktree path) |
| `branch` | `$ZSKILLS_REPORTS_DIR/verify-branch-{branch-name}.md` |
| `last N` | `$ZSKILLS_REPORTS_DIR/verify-last-{N}.md` |
| `last` | `$ZSKILLS_REPORTS_DIR/verify-last-1.md` |
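
The slug mapping above could be sketched as a single function. `report_file_for_scope` is a hypothetical name, and the sketch assumes `ZSKILLS_REPORTS_DIR` has already been set by zskills-paths.sh.

```shell
# report_file_for_scope <scope> [name]
# <name> is the worktree basename or the branch name, where relevant.
# Hypothetical helper; assumes ZSKILLS_REPORTS_DIR is already set.
report_file_for_scope() {
  scope="$1"
  name="${2:-}"
  case "$scope" in
    "")        slug="working-tree" ;;
    worktree)  slug="worktree-$name" ;;
    branch)    slug="branch-$name" ;;
    last)      slug="last-1" ;;
    "last "*)  slug="last-${scope#last }" ;;
    *)         echo "unknown scope: $scope" >&2; return 1 ;;
  esac
  echo "$ZSKILLS_REPORTS_DIR/verify-$slug.md"
}
```

The table does not say how branch names containing `/` should be handled, so this sketch passes the name through unchanged.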

Report structure

Header:

# Verification Report — YYYY-MM-DD HH:MM
Scope: [default | worktree | branch | last N]
{One-line summary.} **Check the User column and sign off.**
Legend: ✅ verified, ⚠️ partial, ❌ failed, ➖ not applicable, [ ] not yet checked

Changes Reviewed — inventory table at the top:

## Changes Reviewed
| File | Change | Verdict |
|------|--------|---------|

Scope Assessment — mandatory in branch scope (whole-pipeline cumulative diff), recommended in other scopes. Insert immediately after "Changes Reviewed":

## Scope Assessment

**Plan goal:** {one-line quote from plan's Goal section}

| File | In-scope? | Rationale |
|------|-----------|-----------|
| src/foo.js | Yes | Listed in Work Item #2 |
| src/bar.js | ⚠️ Flag | Deletes `bar()` — not mentioned in plan |

If any row is flagged, verification is **NOT** clean — fix or
justify before signing off.

Regression guard: commit faab84b silently deleted features unrelated to its stated plan because no reviewer asked this question. /run-plan Phase 6 greps for ⚠️ Flag in this section and halts landing if found.
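
A check of this kind can be sketched as follows. How /run-plan actually performs its grep is not shown in this document, so `scope_flagged` is a hypothetical helper that restricts the search to the Scope Assessment section.

```shell
# scope_flagged <report-file>
# Returns 0 if the "## Scope Assessment" section contains a flagged row.
# Hypothetical helper; not the actual /run-plan implementation.
scope_flagged() {
  awk '
    /^## / { in_section = ($0 == "## Scope Assessment") }   # track current H2
    in_section && index($0, "⚠️ Flag") > 0 { found = 1 }
    END { exit(found ? 0 : 1) }
  ' "$1"
}
```

Limiting the match to the section avoids false positives when the marker text appears elsewhere in the report, for example in a legend or a quoted rule.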

Domain-grouped sections — group by concern (UI/UX, Codegen, etc.), NOT by workflow state. Each section uses a single-checkbox checklist (no summary table + detail card dual-checkbox pattern):

## UI / UX Changes

- [ ] **#358** — Block Rotation
  1. Right-click a block and select Rotate
  2. Ports should move to the correct sides
  3. Block label should stay horizontal
  ![rotation](.playwright/output/358-rotation-90deg.png)

- [ ] **#401** — Tooltip positioning
  1. Hover near canvas edge
  2. Verify tooltip doesn't clip off-screen

One checkbox per verifiable item. Include verification steps and screenshots directly under each checkbox. One item per distinct thing to verify — not "3 blocks in explorer" but one per block.

Outcome sections (include only non-empty):

Skipped Verification — per-item reason
Pre-existing Bugs Discovered — found during verification
Test Suite Status — command + per-suite counts

Recommendations:

Next steps (commit, push, review)
Open concerns needing human judgment

Index regeneration

After writing the scope-specific report, regenerate $ZSKILLS_AUDIT_DIR/VERIFICATION_REPORT.md as an index of all verification reports:

Scan $ZSKILLS_REPORTS_DIR/verify-*.md files
For each file: extract the H1 line (date, scope) and count [ ] checkboxes (action items remaining)
Write $ZSKILLS_AUDIT_DIR/VERIFICATION_REPORT.md with this structure:

# Verification Reports Index

Legend: ✅ all signed off, [ ] action items remain

## Needs Sign-off

{Extract ALL `[ ]` items from all reports. For each, show the checkbox,
the item title, and link to the source report file.}

- [ ] Block Rotation sign-off — [verify-working-tree.md]($ZSKILLS_REPORTS_DIR/verify-working-tree.md)
- [ ] Solver tolerance sign-off — [verify-last-3.md]($ZSKILLS_REPORTS_DIR/verify-last-3.md)

{If no `[ ]` items remain across any report, write: "All items signed off."}

**Staleness rule:** Reports older than 7 days with unchecked `[ ]` items are
flagged as **STALE** in the index (append ` ⚠️ STALE` to the action items
column). Stale items are never auto-removed — the user decides whether to
re-verify or dismiss them.

---

## Reports

| Report | Date | Scope | Action Items |
|--------|------|-------|:------------:|
| [verify-working-tree.md]($ZSKILLS_REPORTS_DIR/verify-working-tree.md) | 2026-03-18 14:30 | working tree | 2 [ ] |
| [verify-last-3.md]($ZSKILLS_REPORTS_DIR/verify-last-3.md) | 2026-03-17 09:15 | last 3 | ✅ |

If there are no $ZSKILLS_REPORTS_DIR/verify-*.md files (e.g., all were cleaned up), write just the header and "No verification reports found."
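
The scan step of index regeneration (extract the H1, count unchecked boxes) can be sketched with two small helpers. Both names are hypothetical, the row layout is simplified relative to the template above, and the 7-day staleness rule is not covered.

```shell
# count_open_items <report-file>
# Prints the number of unchecked "- [ ]" checklist lines in a report.
count_open_items() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the count anyway.
  grep -c -e '^[[:space:]]*- \[ ]' "$1" || true
}

# index_row <report-file>
# Prints one simplified "| file | H1 | action items |" row for the index.
index_row() {
  h1=$(sed -n '/^# /{s/^# //p;q;}' "$1")   # first H1 line, marker stripped
  open=$(count_open_items "$1")
  if [ "$open" -eq 0 ]; then
    status="✅"
  else
    status="$open [ ]"
  fi
  echo "| $(basename "$1") | $h1 | $status |"
}
```

Running `index_row` over each `verify-*.md` file in `$ZSKILLS_REPORTS_DIR` yields the rows for the Reports table; checked items (`- [x]`) are not counted.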

Post-report tracking

After writing the report (or confirming verification is clean), create the complete step marker and update the fulfillment file if a tracking ID is present:

if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh"
else
  . "$CLAUDE_PROJECT_DIR/.claude/skills/update-zskills/scripts/zskills-resolve-config.sh"
fi
MAIN_ROOT=$(cd "$(git rev-parse --git-common-dir)/.." && pwd)
MARKER_STEM="verify-changes"
[ "$SCOPE" = "branch" ] && MARKER_STEM="verify-changes.final"
# 3-tier PIPELINE_ID resolution (see "Tracking Fulfillment" above).
PIPELINE_ID="${ZSKILLS_PIPELINE_ID:-}"
if [ -z "$PIPELINE_ID" ] && [ -f ".zskills-tracked" ]; then
  PIPELINE_ID=$(tr -d '[:space:]' < ".zskills-tracked")
fi
: "${PIPELINE_ID:=$MARKER_STEM.$TRACKING_ID}"
PIPELINE_ID=$(bash "$ZSKILLS_SKILLS_ROOT/create-worktree/scripts/sanitize-pipeline-id.sh" "$PIPELINE_ID")
[ -n "$PIPELINE_ID" ] || { echo "tracking: empty PIPELINE_ID — refusing flat write" >&2; exit 1; }
printf 'completed: %s\n' "$(TZ="${TIMEZONE:-UTC}" date -Iseconds)" \
  > "$MAIN_ROOT/.zskills/tracking/$PIPELINE_ID/step.verify-changes.$TRACKING_ID.complete"

printf 'skill: verify-changes\nid: %s\nscope: %s\nstatus: complete\ndate: %s\n' \
  "$TRACKING_ID" "$SCOPE" "$(TZ="${TIMEZONE:-UTC}" date -Iseconds)" \
  > "$MAIN_ROOT/.zskills/tracking/$PIPELINE_ID/fulfilled.$MARKER_STEM.$TRACKING_ID"

Key Rules

Never commit, merge, or push without explicit user permission — unless working in a worktree where committing fixes is part of the workflow (Phase 5). Even then, never merge the worktree into main or push.
Never weaken tests. Fix the code, not the test. Do not loosen tolerances, skip assertions, or remove test cases.
Ultrathink. Use careful, thorough reasoning throughout. Read code carefully. Understand what changed and why before judging correctness.
Never verify from memory. Read actual diffs, actual files, run actual tests. Even if you just wrote the code — read it again. Memory is not verification.
Real events for manual testing. Never use eval or page.evaluate() to simulate user actions. Use real click/drag/type/press events.
Fix, don't just report. When problems are found, fix them — then re-verify. The goal is a clean verification, not a list of issues left for the user.
Start a dev server if needed. "No dev server" is not an excuse to skip manual verification. Run $DEV_SERVER_CMD & (canonical form — maintainers: see references/canonical-config-prelude.md §1 in the zskills source) — it takes 2 seconds. Only report "cannot verify" for genuinely unavailable tooling (e.g., no cargo for codegen).
Be thorough but honest. If something genuinely can't be verified, say so explicitly. Don't skip silently.
Respect existing changes. Never discard, revert, or overwrite uncommitted work that isn't part of the verification scope.

name	verify-changes
disable-model-invocation	false
argument-hint	[<scope> | last [N]]
description	Verify all recent changes: review diffs, check that unit/e2e tests cover the changes, run all tests, manually verify UI changes with playwright-cli, fix problems, re-verify until clean, then report with recommendations.
metadata	{"version":"2026.06.03+05fb9c"}

/verify-changes [<scope> | last [N]] — Verify, Test & Fix Changes

Ultrathink throughout. Use careful, thorough reasoning at every step. Read the code, understand what changed and why, verify correctness — don't just skim.

Dispatch protocol

Check your tool list first. Whether you can dispatch fresh sub-agents for verification depends on whether you have the Agent (or Task) tool:

If Agent is in your tool list (you are running at the top level — the user invoked you directly, or a parent skill cron-fired you in chunked mode), dispatch fresh sub-agents per the protocol below using subagent_type: "verifier" at every dispatch site. Each sub-agent is a sibling of any other sub-agent dispatched by you, and they have independent contexts. This is the multi-agent verification mode and gives the strongest fresh-eyes guarantees.

Layer 3 — verifier response validation. After each verifier sub-agent dispatch returns:
```
printf '%s' "$VERIFIER_RESPONSE" | bash "$CLAUDE_PROJECT_DIR/.claude/hooks/verify-response-validate.sh"
VALIDATE_EXIT=$?
```
On VALIDATE_EXIT=1 — STOP this sub-flow. Do NOT roll up the sub-agent's output into the verification report as a PASS. Surface the failure to the dispatcher (or to the user if you ARE the top-level orchestrator). Do not auto-retry — re-dispatching with the same agent type hits the same wall.
If Agent is NOT in your tool list (you are running as a dispatched subagent yourself — Claude Code subagents do not have the Agent/Task tool, by Anthropic's design at https://code.claude.com/docs/en/sub-agents), execute the verification workflow inline in your current context. Be explicit about freshness:
- You ARE fresh relative to your dispatcher IF your dispatcher is a top-level orchestrator and you were dispatched as a separate subagent (typical case after chunked /run-plan dispatches you for verification — you have no memory of the implementation work because it happened in a different subagent).
- You are NOT fresh relative to your dispatcher if you were Skill-loaded inline (your dispatcher's context IS your context, including any implementation work).
- Document the freshness mode in your verification report so the user knows what kind of verification they got: "multi-agent" (you had dispatch and used it), "single-context fresh-subagent" (you didn't have dispatch but you were a fresh subagent), or "inline self-review" (you were Skill-loaded into the implementer's context — limited assurance, flag for the user).
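
The three freshness modes reduce to a two-input decision. A minimal sketch, assuming the caller has already worked out both facts by inspection (neither is detectable from the shell); `freshness_mode` and its 0/1 flags are hypothetical names used only for illustration:

```shell
# freshness_mode HAS_AGENT LOADED_INLINE
#   HAS_AGENT:     1 if the Agent (or Task) tool is in the tool list, else 0
#   LOADED_INLINE: 1 if the skill was Skill-loaded into the dispatcher's
#                  context, else 0 (dispatched as a separate subagent)
# Prints the string to record in the verification report.
freshness_mode() {
  if [ "$1" -eq 1 ]; then
    echo "multi-agent"
  elif [ "$2" -eq 1 ]; then
    echo "inline self-review"
  else
    echo "single-context fresh-subagent"
  fi
}

freshness_mode 0 1   # prints: inline self-review
```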

Past failure: a verification was done entirely from memory without reading a single diff or dispatching any agents — it just confirmed "looks good" based on session recall.

Test-command resolution (BEFORE Phase 3)

Resolve $FULL_TEST_CMD explicitly — do NOT search the repo for test scripts, and do NOT default to npm run test:all, scripts/test-all.sh, or any template file. Three-case decision tree:

MAIN_ROOT=$(cd "$(git rev-parse --git-common-dir 2>/dev/null)/.." 2>/dev/null && pwd)
CONFIG_FILE="$MAIN_ROOT/.claude/zskills-config.json"

FULL_TEST_CMD=""
if [ -f "$CONFIG_FILE" ] \
   && [[ "$(cat "$CONFIG_FILE")" =~ \"full_cmd\"[[:space:]]*:[[:space:]]*\"([^\"]*)\" ]]; then
  FULL_TEST_CMD="${BASH_REMATCH[1]}"
fi

if [ -n "$FULL_TEST_CMD" ]; then
  # Case 1 — config sets a test command. Use it verbatim.
  TEST_MODE="config"
else
  # Config is empty or missing. Check whether the project has test infra
  # at all (same detection list as run-plan's preflight hook-placeholder gate):
  # package.json with a "test" script, vitest.config.*, jest.config.*,
  # pytest.ini, .mocharc.*, Makefile, tests/*.sh, tests/*.py, tests/*.js.
  TEST_INFRA_DETECTED=0
  [ -f "$MAIN_ROOT/package.json" ] && grep -q '"test"[[:space:]]*:' "$MAIN_ROOT/package.json" && TEST_INFRA_DETECTED=1
  ls "$MAIN_ROOT"/vitest.config.* "$MAIN_ROOT"/jest.config.* "$MAIN_ROOT"/pytest.ini \
     "$MAIN_ROOT"/.mocharc.* "$MAIN_ROOT"/Makefile 2>/dev/null | grep -q . && TEST_INFRA_DETECTED=1
  ls "$MAIN_ROOT/tests"/*.sh "$MAIN_ROOT/tests"/*.py "$MAIN_ROOT/tests"/*.js 2>/dev/null | grep -q . && TEST_INFRA_DETECTED=1

  if [ "$TEST_INFRA_DETECTED" -eq 1 ]; then
    # Case 2 — tests exist but no command configured. Misconfigured
    # project. Refuse to claim verification; direct caller to fix config.
    echo "ERROR: /verify-changes: test infra detected (tests/ files, package.json" >&2
    echo "  test script, vitest/jest/pytest config, or Makefile present) but" >&2
    echo "  testing.full_cmd is empty in .claude/zskills-config.json. Tests must" >&2
    echo "  not be silently skipped. Run /update-zskills (or edit the config" >&2
    echo "  directly) to set testing.full_cmd, then re-run /verify-changes." >&2
    exit 1
  else
    # Case 3 — no test infra and no configured command. Legitimate state
    # for a docs-only, greenfield, or all-manual-testing project. Skip
    # the test gate and RECORD THE SKIP EXPLICITLY in the verification
    # report so a reviewer can see that tests were not run by design.
    TEST_MODE="skipped"
    echo "/verify-changes: no test infra detected and no testing.full_cmd" >&2
    echo "  configured — skipping Phase 3 tests. This MUST be recorded in" >&2
    echo "  the verification report under 'Tests: skipped — no test infra'." >&2
  fi
fi

Summary of the three cases (report format uses exactly these strings):

TEST_MODE=config — use $FULL_TEST_CMD verbatim in Phase 3 (resolved by the three-case decision tree above; helper source: zskills-resolve-config.sh)
TEST_MODE=skipped — do NOT run tests; explicitly note "Tests: skipped — no test infra" in the final report and tracking marker
Error exit — misconfiguration; no verification attempted, caller must fix

Arguments

/verify-changes [scope]

scope (optional) — what changes to verify:
- (omit) — all uncommitted changes in the working tree (default)
- worktree — changes in the current worktree vs its base branch
- branch — all commits on the current branch vs main
- last — only the last commit (same as last 1)
- last N — the last N commits

Examples: /verify-changes, /verify-changes worktree, /verify-changes last 3

Cron-fired top-level example (final cross-branch verification at the end of a /research-and-go pipeline):

"Run /verify-changes branch tracking-id=meta-add-dark-mode"

Parsing $ARGUMENTS

SCOPE=""
TRACKING_ID=""
for tok in $ARGUMENTS; do
  case "$tok" in
    tracking-id=*) TRACKING_ID="${tok#tracking-id=}" ;;
    worktree|branch|last) SCOPE="$tok" ;;
    [0-9]*) [ "$SCOPE" = "last" ] && SCOPE="last $tok" ;;
  esac
done

This lets /verify-changes branch tracking-id=X parse correctly when it runs as a cron-fired top-level turn.
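
To see how the loop behaves on sample input, it can be wrapped in a function and called with a few argument strings. This is only a worked example: `parse_args` is a hypothetical wrapper name, and the loop body is copied from the parsing block in this section.

```shell
# parse_args "ARG STRING": run the parsing loop on one argument string
# and print the resulting SCOPE and TRACKING_ID.
parse_args() {
  ARGUMENTS="$1"
  SCOPE=""
  TRACKING_ID=""
  for tok in $ARGUMENTS; do   # unquoted on purpose: split on whitespace
    case "$tok" in
      tracking-id=*) TRACKING_ID="${tok#tracking-id=}" ;;
      worktree|branch|last) SCOPE="$tok" ;;
      [0-9]*) [ "$SCOPE" = "last" ] && SCOPE="last $tok" ;;
    esac
  done
  printf 'SCOPE=[%s] TRACKING_ID=[%s]\n' "$SCOPE" "$TRACKING_ID"
}

parse_args "branch tracking-id=meta-add-dark-mode"
# prints: SCOPE=[branch] TRACKING_ID=[meta-add-dark-mode]
parse_args "last 3"
# prints: SCOPE=[last 3] TRACKING_ID=[]
parse_args ""
# prints: SCOPE=[] TRACKING_ID=[]
```

A number only counts when it follows `last`: `3 last` parses as plain `last`, because the digit is seen before SCOPE has been set.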

Tracking Fulfillment

PIPELINE_ID resolution — 3-tier priority (same block applies at every PIPELINE_ID assignment below):

$ZSKILLS_PIPELINE_ID env — set by the parent in shell-inheritance cases (rare under Claude Code subagent dispatch since env does not inherit across agents, but covers cron-fired top-level turns and tests that export it explicitly).
.zskills-tracked in the worktree (cwd) — the parent skill (run-plan, research-and-go, etc.) writes its own PIPELINE_ID into the worktree's .zskills-tracked when it sets up the worktree. This is the primary delegation channel under Claude Code.
Fallback verify-changes.$TRACKING_ID (or verify-changes.final.<id> when SCOPE=branch) — only for TRULY standalone invocations with no parent pipeline. verify-changes becomes its own pipeline owner.

if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh"
else
  . "$CLAUDE_PROJECT_DIR/.claude/skills/update-zskills/scripts/zskills-resolve-config.sh"
fi
MAIN_ROOT=$(cd "$(git rev-parse --git-common-dir)/.." && pwd)
MARKER_STEM="verify-changes"
[ "$SCOPE" = "branch" ] && MARKER_STEM="verify-changes.final"
# 3-tier PIPELINE_ID resolution: env, then worktree .zskills-tracked
# (parent's PIPELINE_ID inherited via the worktree file), then fallback
# to own-skill standalone identity.
PIPELINE_ID="${ZSKILLS_PIPELINE_ID:-}"
if [ -z "$PIPELINE_ID" ] && [ -f ".zskills-tracked" ]; then
  PIPELINE_ID=$(tr -d '[:space:]' < ".zskills-tracked")
fi
: "${PIPELINE_ID:=$MARKER_STEM.$TRACKING_ID}"
PIPELINE_ID=$(bash "$ZSKILLS_SKILLS_ROOT/create-worktree/scripts/sanitize-pipeline-id.sh" "$PIPELINE_ID")
[ -n "$PIPELINE_ID" ] || { echo "tracking: empty PIPELINE_ID — refusing flat write" >&2; exit 1; }
mkdir -p "$MAIN_ROOT/.zskills/tracking/$PIPELINE_ID"
printf 'skill: verify-changes\nid: %s\nscope: %s\nstatus: started\ndate: %s\n' \
  "$TRACKING_ID" "$SCOPE" "$(TZ="${TIMEZONE:-UTC}" date -Iseconds)" \
  > "$MAIN_ROOT/.zskills/tracking/$PIPELINE_ID/fulfilled.$MARKER_STEM.$TRACKING_ID"

If no tracking ID was passed (standalone invocation), skip tracking.

When $SCOPE = "branch" this writes to fulfilled.verify-changes.final.<id> — matching the requires.verify-changes.final.<id> lockdown marker created by /research-and-go Step 0.
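
The 3-tier priority on its own, without the config prelude or the sanitizer, can be tried in isolation. A minimal sketch: `resolve_pipeline_id` is a hypothetical function name, and it skips the `sanitize-pipeline-id.sh` pass that the real block applies afterwards.

```shell
# resolve_pipeline_id MARKER_STEM TRACKING_ID
# Tier 1: $ZSKILLS_PIPELINE_ID from the environment.
# Tier 2: contents of ./.zskills-tracked, with all whitespace stripped.
# Tier 3: fallback MARKER_STEM.TRACKING_ID (standalone invocation).
# The subshell body keeps the working variable out of the caller's scope.
resolve_pipeline_id() (
  id="${ZSKILLS_PIPELINE_ID:-}"
  if [ -z "$id" ] && [ -f ".zskills-tracked" ]; then
    id=$(tr -d '[:space:]' < ".zskills-tracked")
  fi
  : "${id:=$1.$2}"
  printf '%s\n' "$id"
)
```

With no environment variable and no `.zskills-tracked` file in the current directory, `resolve_pipeline_id verify-changes.final X` prints `verify-changes.final.X`, the standalone identity described in the fallback tier.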

Phase 1 — Inventory Changes

Determine the diff scope based on the argument:
- Default: git diff + git diff --cached + git status -s
- worktree: git diff $(git merge-base HEAD main)..HEAD + git diff + git diff --cached
- branch: git log $(git merge-base origin/main HEAD)..HEAD --oneline + git diff $(git merge-base origin/main HEAD)..HEAD
- last: git diff HEAD~1..HEAD
- last N: git diff HEAD~N..HEAD
List all changed files with their change type (added/modified/deleted):
```
git diff --name-status [scope]
```
Read and understand every changed file. For each file:
- What was the change? (new feature, bug fix, refactor, test, config)
- What is the intent? (read commit messages, issue references, comments)
- Does the change look correct? (logic errors, edge cases, off-by-ones)
- Any security concerns? (XSS, injection, unsanitized input)
- Any CLAUDE.md rule violations? (weakened tests, external deps, etc.)
- Scope vs plan: does this change stay within the plan's stated goal? Flag any file touched that is not mentioned by the plan's Work Items or Acceptance Criteria, AND any deletion/rewrite of features unrelated to the plan's purpose. (Regression guard: commit faab84b silently deleted unrelated features because no reviewer asked this question.)
Produce a change inventory table:

| File | Change Type | Description | Tests Needed |
|------|-------------|-------------|--------------|
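
The scope-to-range mapping at the top of this phase can be sketched as a small helper. This is an illustration under assumptions: `diff_range` is a hypothetical name, and the merge-base is passed in by the caller so the function itself never runs git.

```shell
# diff_range SCOPE [BASE]: print the commit range for git diff.
# BASE is the merge-base commit, required for worktree and branch.
# Prints nothing for the default scope, which covers uncommitted changes only.
diff_range() {
  scope="$1"
  base="${2:-}"
  case "$scope" in
    "")            : ;;
    last)          echo "HEAD~1..HEAD" ;;
    "last "[0-9]*) echo "HEAD~${scope#last }..HEAD" ;;
    worktree|branch)
      [ -n "$base" ] || { echo "diff_range: $scope needs a merge-base" >&2; return 1; }
      echo "$base..HEAD" ;;
    *)             echo "diff_range: unknown scope: $scope" >&2; return 1 ;;
  esac
}
```

For example, `git diff --name-status $(diff_range "last 3")` lists the files changed by the last three commits, and `diff_range branch "$(git merge-base origin/main HEAD)"` yields the branch range. Note that the list above computes the worktree base against `main` and the branch base against `origin/main`.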

Phase 2 — Test Coverage Audit

For each changed file, verify appropriate tests exist:

Source code changes (src/**):
- Find the corresponding test file(s) in tests/
- Verify test cases exist that exercise the changed code paths
- Check that tests are meaningful — not just smoke tests, but actually testing the specific behavior that changed
- If a bug fix: does a regression test exist that would have caught the bug?
- If a new feature: do tests cover the happy path AND edge cases?
Component/module changes — find corresponding test files and verify output computation, parameter handling, edge cases are tested.
Engine/solver changes — check analytical verification tests where applicable. Verify numerical accuracy assertions (tolerances, convergence).
Codegen/build changes — verify compile tests exist. Check behavioral parity between source and generated code.
UI/editor changes — check for E2E tests. Flag for manual verification in Phase 4.
Produce a coverage assessment:

| File | Test File(s) | Coverage | Gaps |
|------|--------------|----------|------|

Coverage ratings: Good (meaningful tests exist), Partial (some paths untested), Missing (no tests), N/A (config, docs, etc.)

Phase 3 — Run Tests

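If TEST_MODE=skipped: do not run tests. Note "Tests: skipped — no test infra" in the final report and tracking marker.

Otherwise (TEST_MODE=config, $FULL_TEST_CMD set):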

Run the full test suite with output captured to a file (canonical form — maintainers: see references/canonical-config-prelude.md §1 in the zskills source):

if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh"
else
  . "$CLAUDE_PROJECT_DIR/.claude/skills/update-zskills/scripts/zskills-resolve-config.sh"
fi
TEST_OUT="/tmp/zskills-tests/$(basename "$(pwd)")"
mkdir -p "$TEST_OUT"
$FULL_TEST_CMD > "$TEST_OUT/${TEST_OUTPUT_FILE:-.test-results.txt}" 2>&1

If tests fail, diagnose with targeted runs. Read "$TEST_OUT/${TEST_OUTPUT_FILE:-.test-results.txt}" to identify the failing test file, then run ONLY that file (the exact single-file invocation depends on the project's test framework — infer from the failing file's extension: node --test tests/x.test.js, pytest tests/x_test.py, bash tests/x.sh, etc.). Do NOT re-run $FULL_TEST_CMD to diagnose — that wastes minutes when the single file takes seconds. Use the full command only as the final gate after fixes.
Pre-existing failure protocol. If a test fails in code you didn't touch, it may be pre-existing — not caused by the changes you're verifying. This also applies when you've hit the max 2 fix attempts on a failure and suspect the root cause predates your changes.

a. Verify it's pre-existing: check git log --oneline -5 -- <test-file> and git log --oneline -5 -- <source-file>. If neither was modified by the changes under review, the failure predates your work.

b. Research and file a GitHub issue (gh issue create) with:
- Title: Test failure: <test name>
- Verbatim error output from "$TEST_OUT/${TEST_OUTPUT_FILE:-.test-results.txt}"
- The exact assert.* line from the test source (read the test code)
- Reproduction command: node --test tests/<file>.test.js
- git log evidence that the failure predates current changes
- Label: test-restore
c. Mark the test skipped referencing the issue:
- Change it('name', to it.skip('name // #NNN',
- Commit the skip and the issue number together
d. Re-run the failing test file to confirm the skip works, then run the final gate:
```
if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh"
else
  . "$CLAUDE_PROJECT_DIR/.claude/skills/update-zskills/scripts/zskills-resolve-config.sh"
fi
TEST_OUT="/tmp/zskills-tests/$(basename "$(pwd)")"
mkdir -p "$TEST_OUT"
$FULL_TEST_CMD > "$TEST_OUT/${TEST_OUTPUT_FILE:-.test-results.txt}" 2>&1
```
e. Guardrails:
- Never skip a test in a file you modified or that tests code you modified
- Never skip a test you wrote
- If you're about to skip a 3rd test in one session, STOP and report to the user — 3+ pre-existing failures suggests a systemic problem
Record results:

| Suite | Result | Failures |
|-------|--------|----------|
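
The first guardrail (never skip a test in a file you modified, or one that tests code you modified) lends itself to a mechanical pre-check. The sketch below is not the `git log` inspection from step a; it is a simpler list-membership check offered as a complement. `untouched_by_review` is a hypothetical name, and the changed-file list is assumed to come from something like `git diff --name-only` over the scope under review.

```shell
# untouched_by_review CHANGED_LIST FILE...
#   CHANGED_LIST: newline-separated paths changed in the scope under review
#   FILE...:      the failing test file and the source file(s) it exercises
# Succeeds only if none of the files appear in the changed list.
untouched_by_review() {
  changed="$1"
  shift
  for f in "$@"; do
    # -x: whole-line match, -F: literal string, -q: quiet
    if printf '%s\n' "$changed" | grep -qxF -- "$f"; then
      return 1
    fi
  done
  return 0
}
```

If this check fails for either the test file or its source file, the failure should be treated as caused by the changes under review and fixed rather than skipped.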

Post-tests tracking

After recording test results (pass or fail), create the tests-run step marker if a tracking ID is present:

if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh"
else
  . "$CLAUDE_PROJECT_DIR/.claude/skills/update-zskills/scripts/zskills-resolve-config.sh"
fi
MAIN_ROOT=$(cd "$(git rev-parse --git-common-dir)/.." && pwd)
MARKER_STEM="verify-changes"
[ "$SCOPE" = "branch" ] && MARKER_STEM="verify-changes.final"
# 3-tier PIPELINE_ID resolution (see "Tracking Fulfillment" above).
PIPELINE_ID="${ZSKILLS_PIPELINE_ID:-}"
if [ -z "$PIPELINE_ID" ] && [ -f ".zskills-tracked" ]; then
  PIPELINE_ID=$(tr -d '[:space:]' < ".zskills-tracked")
fi
: "${PIPELINE_ID:=$MARKER_STEM.$TRACKING_ID}"
PIPELINE_ID=$(bash "$ZSKILLS_SKILLS_ROOT/create-worktree/scripts/sanitize-pipeline-id.sh" "$PIPELINE_ID")
[ -n "$PIPELINE_ID" ] || { echo "tracking: empty PIPELINE_ID — refusing flat write" >&2; exit 1; }
printf 'result: %s\ncompleted: %s\n' "$TEST_RESULT" "$(TZ="${TIMEZONE:-UTC}" date -Iseconds)" \
  > "$MAIN_ROOT/.zskills/tracking/$PIPELINE_ID/step.verify-changes.$TRACKING_ID.tests-run"
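
The marker above records `$TEST_RESULT`, which none of the blocks shown here assign. One way to populate it is to set it from the exit status of the final gate. This is a sketch: the values `pass` and `fail` and the name `run_gate` are assumptions, not strings defined by the skill.

```shell
# run_gate CMD OUT_FILE: run CMD with stdout and stderr captured to
# OUT_FILE, then set TEST_RESULT from the exit status.
# CMD is run through sh -c so that quoting inside the command string is
# honored, which differs from expanding $FULL_TEST_CMD unquoted.
run_gate() {
  if sh -c "$1" > "$2" 2>&1; then
    TEST_RESULT="pass"
  else
    TEST_RESULT="fail"
  fi
}
```

A call such as `run_gate "$FULL_TEST_CMD" "$TEST_OUT/${TEST_OUTPUT_FILE:-.test-results.txt}"` would replace the bare test invocation and leave `TEST_RESULT` set for the marker.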

Phase 4 — Agent Verification + User Verification Classification

Two types of verification — both are mandatory for UI changes:

Agent verification steps

Use the /manual-testing skill for recipes, selectors, and setup instructions.

"No dev server" is not an excuse to skip. If UI files changed and no dev server is running, START ONE:

if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh"
else
  . "$CLAUDE_PROJECT_DIR/.claude/skills/update-zskills/scripts/zskills-resolve-config.sh"
fi
if [ -z "$DEV_SERVER_CMD" ]; then
  echo "ERROR: dev_server.cmd not configured. Run /update-zskills." >&2
  exit 1
fi
$DEV_SERVER_CMD &

Wait a few seconds, then proceed. The dev server is a static file server — it takes 2 seconds to start. Reporting "N/A (no dev server)" when you could have started one is skipping, not verifying.

For each UI/interaction change:
- Start a dev server if one isn't running
- Follow /manual-testing setup (auth bypass, browser open)
- Reproduce the scenario that exercises the change using real events
- Take screenshots as evidence
- Verify the expected behavior occurs
- Test edge cases (undo/redo, rapid clicks, empty inputs, etc.)
For non-UI changes, verify manually where possible:
- Solver changes: build a small test model, run simulation, check output
- Block changes: place the block, configure params, run sim, verify scope output
- Import/export: load a test file, verify round-trip
Record verification results with both columns:

| Change | Agent Verify | User Verify |
|--------|--------------|-------------|
| Button offset fix | PASS (screenshot) | NEEDED |
| Solver tolerance | PASS (tests) | N/A |

Skip agent verification ONLY if no changes are UI-related or manually verifiable. User Verify is always classified (NEEDED or N/A) based on file paths.
For each User Verify: NEEDED item, include verification instructions:
- What to look at (specific UI element, interaction, visual behavior)
- How to reproduce (steps: open app, navigate to X, click Y, observe Z)
- What "correct" looks like (expected appearance, behavior, output)
- URL: http://localhost:$(bash "$ZSKILLS_SKILLS_ROOT/update-zskills/scripts/port.sh")/
The user may be verifying hours later in a different context. "NEEDED" without instructions is useless — the user won't know what to check.
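
Classifying User Verify "based on file paths" could look like the sketch below. The path patterns are placeholders invented for illustration, since this document does not list which paths count as UI; a real project would substitute its own UI directories and file types. `user_verify` is a hypothetical name.

```shell
# user_verify PATH: print NEEDED for paths that look UI-related, else N/A.
# The patterns below are examples only, not rules taken from the skill.
user_verify() {
  case "$1" in
    src/ui/*|*.css|*.html) echo "NEEDED" ;;
    *)                     echo "N/A" ;;
  esac
}

user_verify "src/ui/toolbar.js"     # prints: NEEDED
user_verify "src/solver/step.js"    # prints: N/A
```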

Post-manual-verification tracking

After completing agent verification (Phase 4), if UI changes were verified and a tracking ID is present, create the manual-verified step marker:

if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-resolve-config.sh"
else
  . "$CLAUDE_PROJECT_DIR/.claude/skills/update-zskills/scripts/zskills-resolve-config.sh"
fi
MAIN_ROOT=$(cd "$(git rev-parse --git-common-dir)/.." && pwd)
MARKER_STEM="verify-changes"
[ "$SCOPE" = "branch" ] && MARKER_STEM="verify-changes.final"
# 3-tier PIPELINE_ID resolution (see "Tracking Fulfillment" above).
PIPELINE_ID="${ZSKILLS_PIPELINE_ID:-}"
if [ -z "$PIPELINE_ID" ] && [ -f ".zskills-tracked" ]; then
  PIPELINE_ID=$(tr -d '[:space:]' < ".zskills-tracked")
fi
: "${PIPELINE_ID:=$MARKER_STEM.$TRACKING_ID}"
PIPELINE_ID=$(bash "$ZSKILLS_SKILLS_ROOT/create-worktree/scripts/sanitize-pipeline-id.sh" "$PIPELINE_ID")
[ -n "$PIPELINE_ID" ] || { echo "tracking: empty PIPELINE_ID — refusing flat write" >&2; exit 1; }
printf 'ui_changes: true\ncompleted: %s\n' "$(TZ="${TIMEZONE:-UTC}" date -Iseconds)" \
  > "$MAIN_ROOT/.zskills/tracking/$PIPELINE_ID/step.verify-changes.$TRACKING_ID.manual-verified"

Only create this marker if UI files were actually verified in Phase 4. Skip for non-UI changes.

Phase 5 — Fix Problems

If any issues were found in Phases 2-4:

Test gaps — write the missing tests:
- Unit tests for uncovered code paths
- Regression tests for bug fixes
- E2E tests for UI changes (if appropriate)
Test failures — fix the root cause:
- Never weaken tests to make them pass. Fix the code, not the test.
- If the test itself is genuinely wrong (testing the wrong expected value), fix the test and document why.
- Never modify the working tree to check if a failure is pre-existing. No git stash && npm test && git stash pop, no checkout of old commits, no comparison worktrees. If you touched code and tests fail, assume your changes caused it and fix them. See CLAUDE.md for the full rule and the past failure that motivated it.
Code issues — fix logic errors, edge cases, security concerns found in Phase 1.
Manual verification failures — fix the behavior, then re-verify.
If working in a worktree, commit fixes so they are preserved for cherry-pick. Use descriptive commit messages referencing what was fixed.

Phase 6 — Re-verify (max 2 rounds)

After fixing any issues in Phase 5:

Run $FULL_TEST_CMD again (canonical form — maintainers: see references/canonical-config-prelude.md §1 in the zskills source) — all suites must pass, including new tests
Re-check manual verifications if fixes touched UI code
If new problems are found, go back to Phase 5, up to the 2-round limit
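
The bounded loop between Phase 5 and Phase 6 can be sketched as follows. This is one reading of "max 2 rounds" (two re-verify runs, with a single fix pass between them), and both command strings are stand-ins: `reverify` is a hypothetical name, and the fix command represents the Phase 5 work, which is not a single command in practice.

```shell
# reverify TEST_CMD FIX_CMD: run the test gate at most twice,
# running FIX_CMD between the first and second attempt.
reverify() {
  test_cmd="$1"
  fix_cmd="$2"
  round=1
  while [ "$round" -le 2 ]; do
    if sh -c "$test_cmd"; then
      echo "clean in re-verify round $round"
      return 0
    fi
    # Only fix when another round remains to check the fix.
    [ "$round" -lt 2 ] && sh -c "$fix_cmd"
    round=$((round + 1))
  done
  echo "still failing after 2 re-verify rounds" >&2
  return 1
}
```

A non-zero return is the point at which remaining problems get reported explicitly instead of looping further.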

Phase 7 — Report

If nothing needs user sign-off, skip the report file and report inline:

Verification clean — no user sign-off needed. [summary of what passed]

The report file exists for the user to review LATER. If there's nothing to review later, the inline output is sufficient.

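When there are items for the user to sign off, resolve the report directories from the main repository root first:

MAIN_ROOT=$(cd "$(git rev-parse --git-common-dir)/.." && pwd)
# Set ZSKILLS_PATHS_ROOT before sourcing; a VAR=value prefix is not valid
# in front of an if compound command.
export ZSKILLS_PATHS_ROOT="$MAIN_ROOT"
if [ -f "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-paths.sh" ]; then
  export CLAUDE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
  . "${CLAUDE_PLUGIN_ROOT}/skills/update-zskills/scripts/zskills-paths.sh"
else
  . "$MAIN_ROOT/.claude/skills/update-zskills/scripts/zskills-paths.sh"
fi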

Then write to $ZSKILLS_REPORTS_DIR/verify-{scope-slug}.md and $ZSKILLS_AUDIT_DIR/VERIFICATION_REPORT.md. This prevents reports from being lost when the worktree is cleaned up.

Scope slug derivation

| Scope argument | Report file |
|----------------|-------------|
| (default/omit) | `$ZSKILLS_REPORTS_DIR/verify-working-tree.md` |
| `worktree` | `$ZSKILLS_REPORTS_DIR/verify-worktree-{name}.md` (name = `basename` of worktree path) |
| `branch` | `$ZSKILLS_REPORTS_DIR/verify-branch-{branch-name}.md` |
| `last N` | `$ZSKILLS_REPORTS_DIR/verify-last-{N}.md` |
| `last` | `$ZSKILLS_REPORTS_DIR/verify-last-1.md` |
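
The table translates directly into a `case` statement. A minimal sketch: `report_file` is a hypothetical helper that prints only the file name, leaving the `$ZSKILLS_REPORTS_DIR/` prefix to the caller.

```shell
# report_file SCOPE [NAME]: print the report file name for a scope.
# NAME is the worktree basename (worktree) or the branch name (branch).
report_file() {
  scope="$1"
  name="${2:-}"
  case "$scope" in
    "")            echo "verify-working-tree.md" ;;
    worktree)      echo "verify-worktree-$name.md" ;;
    branch)        echo "verify-branch-$name.md" ;;
    last)          echo "verify-last-1.md" ;;
    "last "[0-9]*) echo "verify-last-${scope#last }.md" ;;
    *)             echo "report_file: unknown scope: $scope" >&2; return 1 ;;
  esac
}

report_file "last 3"                 # prints: verify-last-3.md
report_file worktree "fix-rotation"  # prints: verify-worktree-fix-rotation.md
```

A branch name containing `/` would produce a nested path here. The table does not say how such names are handled, so any sanitizing is left to the caller.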

Report structure

Header:

# Verification Report — YYYY-MM-DD HH:MM
Scope: [default | worktree | branch | last N]
{One-line summary.} **Check the User column and sign off.**
Legend: ✅ verified, ⚠️ partial, ❌ failed, ➖ not applicable, [ ] not yet checked

Changes Reviewed — inventory table at the top:

## Changes Reviewed
| File | Change | Verdict |
|------|--------|---------|

Scope Assessment — mandatory in branch scope (whole-pipeline cumulative diff), recommended in other scopes. Insert immediately after "Changes Reviewed":

## Scope Assessment

**Plan goal:** {one-line quote from plan's Goal section}

| File | In-scope? | Rationale |
|------|-----------|-----------|
| src/foo.js | Yes | Listed in Work Item #2 |
| src/bar.js | ⚠️ Flag | Deletes `bar()` — not mentioned in plan |

If any row is flagged, verification is **NOT** clean — fix or
justify before signing off.

Domain-grouped sections — group by concern (UI/UX, Codegen, etc.), NOT by workflow state. Each section uses a single-checkbox checklist (no summary table + detail card dual-checkbox pattern):

## UI / UX Changes

- [ ] **#358** — Block Rotation
  1. Right-click a block and select Rotate
  2. Ports should move to the correct sides
  3. Block label should stay horizontal
  ![rotation](.playwright/output/358-rotation-90deg.png)

- [ ] **#401** — Tooltip positioning
  1. Hover near canvas edge
  2. Verify tooltip doesn't clip off-screen

One checkbox per verifiable item. Include verification steps and screenshots directly under each checkbox. One item per distinct thing to verify — not "3 blocks in explorer" but one per block.

