con un clic
more-loop-verify
// Create a verification file (shell script or markdown checklist) for use with the more-loop iterative development script
// Create a verification file (shell script or markdown checklist) for use with the more-loop iterative development script
| name | more-loop-verify |
| description | Create a verification file (shell script or markdown checklist) for use with the more-loop iterative development script |
| disable-model-invocation | true |
| argument-hint | [run-name] |
| allowed-tools | Read, Write, Glob, Grep, AskUserQuestion, Bash |
You are helping the user create a verification file for use with more-loop, an iterative development script that wraps the claude CLI.
Verification runs at two points with different roles:
.sh): Executed with bash in a separate process group. Exit code 0 = PASS, non-zero = FAIL. Stdout/stderr is captured as feedback..md): Fed to a fresh claude -p process that evaluates the checklist against the current codebase state and outputs PASS or FAIL with reasoning.Since .sh verify files are the final acceptance gate, they should be comprehensive and test the complete deliverable.
Ask the user which verification type they want:
.sh) — Best for: tests passing, files existing, builds succeeding, linting, concrete measurable checks. Fast and deterministic..md) — Best for: code quality, architecture decisions, subjective criteria, checking things that are hard to script. Uses an LLM call so slower and costs tokens.If $ARGUMENTS is provided and ends in .sh or .md, infer the type from the extension.
Ask what should be verified — what does "done" look like? Ask about:
Ask at most 2 rounds of clarifying questions.
Scan the codebase if relevant — find existing test commands, build scripts, linters, CI config to build on what's already there.
Write the verification file using the appropriate format below.
.sh)Write a bash script that:
#!/usr/bin/env bash and set -euo pipefailTemplate:
#!/usr/bin/env bash
set -euo pipefail
echo "=== Checking build ==="
<build command>
echo "=== Checking tests ==="
<test command>
echo "=== Checking lint ==="
<lint command>
echo "=== All checks passed ==="
Guidelines:
npm test, make check, pytest, etc.).md file for those.md)Write a markdown file with - [ ] checklist items that Claude can evaluate by examining the codebase.
Template:
# Verification Checklist
## Functionality
- [ ] <Specific, evaluable criterion>
- [ ] <Another criterion>
## Code Quality
- [ ] <Quality criterion>
## Tests
- [ ] <Test criterion>
Guidelines:
Files are organized under .more-loop/runs/<run-name>/:
$ARGUMENTS if provided (as a slug, e.g., web-dashboard). Otherwise, derive a short kebab-case slug from what the user described, or ask the user. If a .more-loop/runs/<run-name>/prompt.md already exists for a matching topic, use the same <run-name> directory so the prompt and verify files live together..more-loop/runs/<run-name>/.more-loop/runs/<run-name>/verify.sh (or verify.md for markdown type)If $ARGUMENTS looks like an explicit file path (contains / or ends in .sh/.md), respect it as-is instead of using the directory convention.
If writing a .sh file, make it executable (chmod +x).
Write tests at the highest behavioral level possible. Avoid keyword-grep-only tests.
Level 1 — Behavioral (preferred): Execute the actual program, observe runtime behavior.
# Good: Actually start the server and test it
python3 server.py "$RUN_DIR" --port "$PORT" &
curl -sf http://127.0.0.1:$PORT/state.json | python3 -c "import sys,json; d=json.load(sys.stdin); assert 'phase' in d"
Level 2 — Structural: Verify code structure (function exists, is called, correct arguments).
# Good: Verify function is defined AND called
grep -q '^write_state_json()' script.sh
grep -c 'write_state_json ' script.sh # count call sites
Level 3 — Keyword grep (last resort): Only when behavioral/structural tests are impractical.
# Bad: grep "approve" script.sh (matches comments, strings, anything)
# Better: grep -q 'APPROVE_SIGNAL' script.sh (specific constant name)
Never assume how a program is invoked based on the spec. Always check the real interface first:
argparse block or run python3 script.py --helpusage() function or run bash script.sh --help# Bad — assumes flags from spec wording:
python3 server.py --run-dir "$DIR" --dashboard "$HTML"
# Good — matches actual argparse definition:
python3 server.py "$DIR" --port "$PORT"
Include a coverage comment block at the top of each verification file that maps spec requirements to test status:
# Spec coverage:
# [TESTED] Web server serves dashboard HTML — Section 6
# [TESTED] POST /approve creates signal file — Section 6
# [GREP] State JSON has required fields — Section 5 (structural check only)
# [GAP] Approval timeout countdown display — not tested
Use [TESTED] for behavioral tests, [GREP] for keyword/structural checks, [GAP] for untested requirements.
Always include tests for opt-in features being inactive by default and error paths:
# Opt-in feature disabled when flag absent:
check "no state.json without --web" bash -c "! test -f \"\$RUN_DIR/state.json\""
# Error message on bad input:
check "error on missing file" bash -c "bash script.sh --web nonexistent 2>&1 | grep -qi 'error\\|not found'"
# Edge case handling:
check "timeout 0 means infinite wait" grep -q 'timeout.*0\|infinite\|forever' script.sh
.sh vs .mdShell script (.sh) — things that can be automated:
Markdown checklist (.md) — things that need human/LLM judgment:
Do NOT put in .md:
.sh.shcurl, grep, or python3 -c — use .shCross-referencing: When both files exist, the .md file should note which items are already verified by .sh and focus on what .sh cannot check. Use phrasing like "Verified by shell script; this item checks the design rationale."
In addition to the guidelines above:
Name the target: Each item should specify the file, function, or code region being checked.
server.py, the DashboardHandler._serve_state method returns a JSON error response (not a bare 500) when state.json is missing"Use "code contains" phrasing: Write items as falsifiable statements about code existence, not runtime behavior claims.
server.py, HTTPServer is instantiated with '127.0.0.1' as the bind address (not '' or '0.0.0.0')"Must be falsifiable: Every item must be answerable as true/false by reading the code. If an item requires running the program to evaluate, move it to .sh.