---
name: feedback-loop
description: >-
  Construct a deterministic, agent-runnable pass/fail signal for a bug,
  regression, or feature — fast, sharp, and reproducible without a human in
  the loop. Use when the user says "build a test harness", "make this
  reproducible", "what's the cheapest reproducer for this bug", "the dev
  loop is too slow", or "we need a deterministic check for this". This is
  the reference skill the rest of the Valesco engineering chain leans on:
  `/diagnose` Phase 1 invokes it to find a bug's cause, `/tdd` invokes it
  to drive new code red→green. Distinct from `/diagnose` (which *uses*
  loops to locate causes) and `/tdd` (which *uses* loops to drive
  incremental implementation) — this skill is about *constructing* the
  loop itself. Whenever you find yourself debugging by re-reading code or
  running ad-hoc commands without a deterministic pass/fail, stop and run
  this.
---
# /feedback-loop
Construct a fast, sharp, deterministic, agent-runnable pass/fail signal for
the behavior under question. Pick from a catalog of ten patterns, harden
the loop against flakes, and hand it off to whatever skill called this one.
This skill is advisory. It writes test files, scripts, harnesses, and
fixtures — never tracker labels.
## Why a deterministic loop is load-bearing
Two things in the Valesco chain depend on the loop existing and behaving
predictably:
- `/diagnose` Phase 1. Without a loop, hypothesis-testing degrades to
  hand-waving. Phase 1 is the entire skill — the rest is mechanical once
  the signal exists.
- `/tdd` red→green. A test that takes 30 seconds to fail breaks the
  tracer-bullet rhythm; the agent stops listening to its own feedback.
The loop also tends to graduate into a regression test that lives in the
repo permanently, so the next person hitting the same bug — including
Claude Code running inside runway — has a deterministic check available.
If you don't have a loop, you don't have engineering — you have
storytelling. Build the loop.
## When to run
- Inside `/diagnose` Phase 1, when you've reproduced the bug by hand but
  need an automatable signal.
- Inside `/tdd`, when picking the seam for the first failing test in a new
  feature slice.
- Standalone, when the user says "the dev loop is too slow" — the
  iteration ladder applies even when no specific bug is on the table.
## When NOT to run
- The loop already exists and runs in CI — use it; don't rebuild.
- The "loop" the user wants is a manual checklist for stakeholders. That's
a runbook, not a feedback loop.
- The bug only matters in production with real customer data. First run
/triage to determine whether you have legitimate access to a
reproducer; this skill cannot manufacture one out of nothing.
## Three properties of a good loop
Every loop, regardless of pattern, is judged on three axes. A 30-second
flaky loop is barely better than no loop at all; a 2-second deterministic
loop is a debugging superpower.
### Fast
- Target: under 5 seconds wall-clock per iteration.
- Superpower threshold: under 2 seconds. At this speed the agent (or
human) iterates without losing context between cycles.
- Disqualifying: over 30 seconds per iteration — the iteration rhythm
collapses.
How to get fast: cache setup steps (don't re-bootstrap the DB on every
iteration), narrow the scope (run one test, not the whole file), skip
unrelated init, swap heavy deps for thin fakes only at the outer
boundary.
### Sharp
A sharp loop asserts on the specific symptom, not "didn't crash". If
the bug is "wrong total when cart contains a 0-priced item", the
assertion is `assert total == 0`, not `assert order.completed_ok`.
Sharp loops survive refactors because they describe behavior, not
structure. A loop that fails when you rename an internal function was
testing implementation, not behavior — see `/tdd`.
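As a concrete sketch, here is the cart example as a vitest test.
`computeTotal` and the item shape are hypothetical stand-ins for whatever
seam the project actually exposes:

```ts
import { expect, it } from "vitest";
// Hypothetical module and function; substitute the project's real seam.
import { computeTotal } from "../src/cart";

it("returns 0 when the cart contains only a 0-priced item", () => {
  const total = computeTotal([{ sku: "FREE-SAMPLE", price: 0, qty: 1 }]);
  // Sharp: assert the specific symptom (the wrong total), not "didn't crash".
  expect(total).toBe(0);
});
```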
### Deterministic
Same input, same output, every run, on every machine. The four sources of
non-determinism to neutralize:
- Time — pin the clock (`vi.useFakeTimers()`, freezegun, or inject
  a clock dependency).
- Randomness — seed the RNG (`Math.random` is rarely the issue; UUIDs and
  test data factories are).
- Filesystem — isolate per test (`tmp.mkdtempSync()`, ephemeral
  Supabase branch via `mcp__a80053c7…__create_branch`).
- Network — freeze with a recorded fixture (MSW, VCR, nock) or run
  inside a container with no egress.
If you cannot neutralize one of the four, document which and why in a
comment alongside the loop.
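A minimal vitest sketch that neutralizes three of the four sources in one
test; `makeInvoice` and its options are hypothetical:

```ts
import { afterEach, beforeEach, expect, it, vi } from "vitest";
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
// Hypothetical code under test; assumed to accept a seed and an output dir.
import { makeInvoice } from "../src/invoice";

let dir: string;

beforeEach(() => {
  vi.useFakeTimers();
  vi.setSystemTime(new Date("2024-01-01T00:00:00Z")); // time: pin the clock
  dir = mkdtempSync(join(tmpdir(), "loop-")); // filesystem: isolate per test
});

afterEach(() => {
  vi.useRealTimers();
  rmSync(dir, { recursive: true, force: true });
});

it("produces the same invoice on every run", () => {
  const invoice = makeInvoice({ seed: 42, outDir: dir }); // randomness: seed it
  expect(invoice.issuedAt).toBe("2024-01-01T00:00:00.000Z");
});
```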
## Pattern catalog
Pick the cheapest pattern that produces a sharp signal. The patterns are
ordered by general preference — when in doubt, try them in this order.
### 1. Failing test
A test at the seam closest to the bug — unit when behavior is local,
integration when the bug only appears across module boundaries, e2e when
the bug is in wiring.
### 2. Curl / HTTP harness
A shell script that hits a running dev server with a recorded request and
asserts on the response shape, status, or specific field.
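The pattern calls for a shell script; sketched here as a Node script
instead, to keep one language across this page. Port, route, payload, and
the expected field are all assumptions:

```ts
// scripts/debug/check-cart-total.ts: run with `npx tsx`; exit code is the signal.
const res = await fetch("http://localhost:3000/api/cart/total", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ items: [{ sku: "FREE-SAMPLE", price: 0, qty: 1 }] }),
});
const body = await res.json();

// Assert on status and the specific field, not just "got a 200".
if (res.status !== 200 || body.total !== 0) {
  console.error(`FAIL: status=${res.status} total=${body.total}`);
  process.exit(1);
}
console.log("PASS");
```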
### 3. CLI fixture diff
Invoke the CLI with a fixture input file and diff stdout against a
recorded snapshot.
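A sketch of the diff script; the CLI entrypoint and fixture paths are
assumptions:

```ts
// scripts/debug/check-cli.ts: diff CLI stdout against a recorded snapshot.
import { execFileSync } from "node:child_process";
import { readFileSync } from "node:fs";

const actual = execFileSync("node", ["dist/cli.js", "fixtures/input.txt"], {
  encoding: "utf8",
});
const expected = readFileSync("fixtures/expected-output.txt", "utf8");

if (actual !== expected) {
  console.error("FAIL: stdout diverged from fixtures/expected-output.txt");
  process.exit(1);
}
console.log("PASS");
```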
### 4. Headless browser script
Playwright or Puppeteer drives the real UI; the script asserts on DOM
state, console messages, or network calls.
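A minimal Playwright sketch; the URL, fixture query param, and test id are
hypothetical:

```ts
import { expect, test } from "@playwright/test";

test("cart total renders $0.00 for a free item", async ({ page }) => {
  // Assumes a dev server that can load a named fixture via query param.
  await page.goto("http://localhost:3000/cart?fixture=free-item");
  // Assert on DOM state: the symptom as the user sees it.
  await expect(page.getByTestId("cart-total")).toHaveText("$0.00");
});
```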
### 5. Replay a captured trace
Save a real network request / payload / event log to disk; replay it
through the code path in isolation.
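A sketch of trace replay, assuming a captured webhook payload on disk and a
hypothetical `handleWebhook` entry point:

```ts
// scripts/debug/replay-webhook.ts
import { readFileSync } from "node:fs";
// Hypothetical handler; the point is calling the real code path in isolation.
import { handleWebhook } from "../src/webhooks";

const trace = JSON.parse(readFileSync("traces/failing-webhook.json", "utf8"));
const result = await handleWebhook(trace);

if (result.status !== "processed") {
  console.error(`FAIL: status=${result.status}`);
  process.exit(1);
}
console.log("PASS");
```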
### 6. Throwaway harness
A minimal subset of the system — one service, faked deps — that exercises
the bug code path with a single function call. Lives in `scripts/debug/`,
not in the test suite.
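A sketch of such a harness; `PricingService` and the repo interface are
invented for illustration:

```ts
// scripts/debug/pricing-harness.ts: one service, faked deps, one call.
import { PricingService } from "../src/pricing/service"; // hypothetical

const fakeRepo = {
  // Thin fake at the outer boundary: no DB bootstrap on each iteration.
  async getDiscounts() {
    return [{ code: "FREE", percent: 100 }];
  },
};

const svc = new PricingService(fakeRepo);
const total = await svc.total([{ sku: "FREE-SAMPLE", price: 0, qty: 1 }]);

console.log(total === 0 ? "PASS" : `FAIL: total=${total}`);
process.exit(total === 0 ? 0 : 1);
```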
### 7. Property / fuzz loop
Run 1000+ random inputs through the system and look for the failure mode.
- When it fits: "sometimes wrong output" bugs; serializers, parsers,
  state machines, anything with a large input space; no obvious
  triggering input.
- Pitfalls: unbounded test time (cap with `numRuns`); shrinking
  failures the framework can't simplify (write a custom shrinker); flakes
  from network or time inside the property (don't — keep the property
  pure).
- Skeleton (fast-check; `mySort` and `isAscending` are the code under test):

```ts
import fc from "fast-check";

fc.assert(
  fc.property(fc.array(fc.integer()), (arr) => {
    const sorted = mySort(arr);
    return isAscending(sorted);
  }),
  { numRuns: 1000, seed: 42 },
);
```
- Iteration ladder: seed the PRNG (`seed: 42`); save the smallest
  counterexample as a regression test; raise `numRuns` only after the
  shrinker is reliable.
### 8. Bisection harness
`git bisect run` over a script that boots state X, checks the bug, and
exits 0 (good) or 1 (bad).
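A sketch of the script `git bisect run` executes; the build command and the
probe are assumptions:

```ts
// scripts/debug/bisect-check.ts
// Usage: git bisect run npx tsx scripts/debug/bisect-check.ts
import { execSync } from "node:child_process";

try {
  execSync("npm run -s build", { stdio: "ignore" }); // boot state X
  const out = execSync("node dist/cli.js fixtures/input.txt", {
    encoding: "utf8",
  });
  // Exit 0 (good) or 1 (bad): the contract `git bisect run` expects.
  process.exit(out.includes("total: 0") ? 0 : 1);
} catch {
  process.exit(125); // 125 tells bisect to skip an unbuildable commit
}
```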
### 9. Differential loop
Run the same input through two configurations — old version vs new, two
deployments, two implementations — and diff outputs.
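A sketch comparing two implementations on one input; `sortV1` and `sortV2`
are hypothetical stand-ins for the two versions:

```ts
// scripts/debug/diff-sort.ts: differential loop over two implementations.
import { sortV1 } from "../src/sort-v1"; // hypothetical old version
import { sortV2 } from "../src/sort-v2"; // hypothetical new version

const input = [3, 1, 2, 0, -5];
const a = JSON.stringify(sortV1(input));
const b = JSON.stringify(sortV2(input));

if (a !== b) {
  console.error(`FAIL: outputs diverged\n old: ${a}\n new: ${b}`);
  process.exit(1);
}
console.log("PASS");
```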
### 10. HITL bash
A bash script that prompts a human for the action, captures the result,
and feeds it back to the agent. Last resort — humans aren't part of an
autonomous run, so this disqualifies the loop as a regression test that
runway / CI can re-run later.
## Which pattern do I pick?
Walk this tree top-to-bottom. Stop at the first "yes."
- Is there an existing test seam that exercises the behavior end-to-end?
→ Pattern 1 (failing test).
- Is the surface an HTTP API and you have a dev server? → Pattern 2
(curl harness).
- Is the surface a CLI or text transformer? → Pattern 3 (CLI fixture
diff).
- Is the bug only visible in the rendered UI? → Pattern 4 (headless
browser).
- Can you replay a captured production trace? → Pattern 5 (replay
trace).
- Is the system too tangled or mid-refactor to test in place?
→ Pattern 6 (throwaway harness).
- Is the bug "sometimes" — non-deterministic on input? → Pattern 7
(property / fuzz).
- Did the bug appear between two known commits / versions? → Pattern 8
(bisection).
- Are there two configurations or versions you can compare? → Pattern 9
(differential).
- Final fallback — and only after honestly trying 1–9: Pattern 10
  (HITL bash). The loop will not be replayable by an autonomous run, so
  the issue probably needs `needs-human` rather than `Todo`.
If you fall off the bottom of the tree without a fit, the bug isn't
loopable yet — return to `/triage` and request reproducer access,
trace capture, or production instrumentation as `needs-info`.
## Iteration ladder
Apply across every pattern. Climb until the loop is fast, sharp, and
deterministic enough to defend.
| Step | Cheap | Expensive |
|---|---|---|
| Cache setup | Memoize fixture loads; keep the dev server warm. | Snapshot the entire DB and restore between runs. |
| Narrow scope | Filter to one test (`-t name`). | Compile a per-test bundle that excludes unrelated code. |
| Pin time | Inject a clock; freeze it in the test. | Run inside a container with the system clock pinned. |
| Seed RNG | Pass an explicit seed to test data factories. | Replace the global RNG with a deterministic-by-default wrapper. |
| Isolate filesystem | `tmp.mkdtempSync()` per test. | Per-test ephemeral Supabase branch. |
| Freeze network | MSW / VCR / nock at the route level. | Container with no egress; recorded HAR replay. |
| Raise repro rate | Loop the trigger 100×; tighten timing windows. | Inject targeted sleeps to widen the race window; run under TSAN. |
For non-deterministic bugs the goal is not a clean repro but a higher
reproduction rate. A 50%-flake bug is debuggable; 1% is not. Climb the
ladder until the rate is workable, then build the loop on top of that.
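A sketch of measuring that rate by looping the trigger; `flakyOp` is a
hypothetical stand-in for the triggering action:

```ts
// scripts/debug/repro-rate.ts: loop the trigger, report the failure rate.
import { flakyOp } from "../src/flaky"; // hypothetical trigger

let failures = 0;
const runs = 100;
for (let i = 0; i < runs; i++) {
  try {
    await flakyOp();
  } catch {
    failures++;
  }
}
// 50% is debuggable; 1% is not. Climb the ladder until this is workable.
console.log(`repro rate: ${failures}/${runs}`);
process.exit(failures > 0 ? 1 : 0);
```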
## Hand-off — how the loop becomes downstream input
The loop you build here usually doesn't stay in `scripts/debug/`. It
graduates into the project's permanent infrastructure:
| Caller | What the loop becomes |
|---|---|
| `/diagnose` Phase 1 | The active feedback loop for hypothesis testing. The Phase 5 regression test typically condenses it into a permanent test. |
| `/tdd` | The first failing test in the red→green cycle. Each subsequent slice extends or copies it. |
| Issue body for runway pickup | Once a loop graduates into a permanent test, reference it by path in the issue body so Claude Code (running inside runway) sees both the spec and a deterministic check. |
## Self-validation before declaring done
Before handing off, check the loop against the three axes: run it twice
back-to-back and confirm identical output (deterministic); time one
iteration against the 5-second target (fast); confirm the assertion names
the specific symptom, not "didn't crash" (sharp). If any check fails, fix
the loop before returning. A loop that "mostly works" is the same as no
loop — it pollutes downstream skills with noise.
## Non-goals
- No test framework prescription. Use what the project uses. If the
project doesn't have one, surface that to the user and let them choose
before continuing — picking a framework is a project-level decision,
not a per-loop one.
- No performance budget setting. "Loop should be fast" is the
discipline here. "API should respond in < 100ms" is acceptance-criteria
territory in the issue body, not a property of this skill.
- No fix authoring. This skill ends when the loop reliably observes
  the bug. The fix is `/diagnose` Phase 5 or `/tdd` green.
- No production instrumentation. Loops live in test code or
  `scripts/debug/`, never in `src/`. Production probes need explicit user
  approval.
- No CI integration. Hooking the loop into CI is a project-level
decision after the loop is proven locally.
## References
- `../diagnose/SKILL.md` — the primary caller; Phase 1 is "build a loop
  using this skill."
- Matt Pocock's `/tdd` — the red→green caller; this skill is the "build
  the red" half.
- Matt Pocock's `/diagnose` — Phase 1 of his version is the doctrinal
  source for the ten-pattern catalog; this skill is the expanded reference.