| name | qe-audit |
| disable-model-invocation | true |
| argument-hint | [bash [area]] [every SCHEDULE] [now] | stop | next |
| description | QE audit: check recent commits for test coverage gaps, or bash/stress-test features to find bugs. Supports scheduling with every/now/next/stop. Usage: /qe-audit [bash [area]] [every SCHEDULE] [now] | stop | next. |
/qe-audit [bash [area]] [every SCHEDULE] [now] | stop | next -- Quality Engineering Audit
Two modes of quality assurance:
- Commit audit (default) -- review recent commits for test coverage gaps,
missing tests, and bugs. Files GitHub issues for findings.
- Bash -- adversarial stress-testing of features. Pick a specific area
or let the agent choose under-tested areas. Try to break things with edge
cases, unusual inputs, and unexpected workflows.
Both modes file GitHub issues and update plans/QE_ISSUES.md. Both are
schedulable. Together they form the quality feedback loop: audit finds gaps ->
/fix-issues fixes them -> audit validates the fixes.
Ultrathink throughout.
Arguments
/qe-audit [bash [area]] [every SCHEDULE] [now]
/qe-audit stop | next
- bash (optional) -- switch to bash/stress-test mode instead of commit audit
- area (optional, with bash) -- specific feature or area to bash. If
omitted, the agent picks under-tested areas based on coverage data and
recent changes. Examples:
"fish behavior", "genetics/breeding",
"rendering", "live share", "UI interactions"
- every SCHEDULE (optional) -- self-schedule recurring runs via cron:
- Accepts intervals:
4h, 2h, 30m, 12h
- Accepts time-of-day:
day at 9am, day at 14:00, weekday at 9am
- Without
now: schedules only, does NOT run immediately
- With
now: schedules AND runs immediately
- Each run re-registers the cron (self-perpetuating)
- Cron is session-scoped -- dies when the session dies
- now (optional) -- run immediately. When combined with
every, runs
immediately AND schedules. Without every, now is the default behavior
(bare invocation always runs immediately).
- stop -- cancel any existing
/qe-audit cron and exit. Takes
precedence over all other arguments.
- next -- check when the next scheduled run will fire. Takes precedence
over all other arguments except
stop.
Detection: scan $ARGUMENTS for:
stop (case-insensitive) -- cancel cron and exit (highest precedence)
next (case-insensitive) -- check schedule and exit
bash (case-insensitive) -- bash mode (everything after bash until
every/now/stop/next is the area description)
now (case-insensitive) -- run immediately
every followed by a schedule expression -- scheduling mode
Examples:
/qe-audit -- audit recent commits now
/qe-audit bash -- bash random under-tested features now
/qe-audit bash "fish behavior" -- bash a specific feature now
/qe-audit bash "genetics" every 6h -- schedule genetics bashing every 6h
/qe-audit every day at 9am -- schedule daily commit audit (first run at 9am)
/qe-audit every day at 9am now -- schedule daily + run now
/qe-audit every weekday at 9am -- weekday mornings only
/qe-audit bash every 12h now -- bash random features every 12h, start now
/qe-audit next -- when's the next audit?
/qe-audit stop -- cancel scheduled audits
Now (standalone -- just now with no mode or schedule)
If $ARGUMENTS is just now (no bash, no every, no area):
- Use
CronList to list all cron jobs
- Find any whose prompt starts with
Run /qe-audit
- If found: extract the cron's prompt to get mode, area, and schedule.
Run immediately -- do NOT ask for confirmation. The cron stays active.
- If none found: report
No active /qe-audit cron to trigger. Use /qe-audit to run a commit audit manually. and exit.
Next (if next is present)
If $ARGUMENTS contains next (case-insensitive):
- Use
CronList to list all cron jobs
- Find any whose prompt starts with
Run /qe-audit
- Report:
- If found: parse the cron expression and compute the next fire time.
Use
date +%Z for the timezone. Show both relative and absolute:
Next QE audit in ~14h 47m (~9:03 AM ET tomorrow, cron XXXX).
Mode: commit audit
Prompt: Run /qe-audit every day at 9am now
- If none found:
No active /qe-audit cron in this session.
- Exit. Do not run anything.
Stop (if stop is present)
If $ARGUMENTS contains stop (case-insensitive):
- Use
CronList to list all cron jobs
- Delete ALL whose prompt starts with
Run /qe-audit using CronDelete
- Report what was cancelled:
- If one found:
QE audit cron stopped (was job ID XXXX).
- If multiple found:
Stopped N QE audit crons (IDs: XXXX, YYYY).
- If none found:
No active /qe-audit cron found.
- Exit. Do not run anything.
Phase 0 -- Schedule (if every is present)
If $ARGUMENTS contains every <schedule>:
-
Parse the schedule -- convert to a cron expression.
For interval-based schedules (4h, 2h, 30m): use the CURRENT
minute as the offset so the first fire is a full interval from now.
Check the current minute with date +%M:
4h at minute 9 -> 9 */4 * * *
12h at minute 9 -> 9 */12 * * *
30m -> */30 * * * * (no offset needed for sub-hour)
For time-of-day schedules: offset round minutes by a few:
day at 9am -> 3 9 * * *
day at 14:00 -> 3 14 * * *
weekday at 9am -> 3 9 * * 1-5
-
Deduplicate -- use CronList + CronDelete to remove any whose
prompt starts with Run /qe-audit.
-
Construct the cron prompt. Always include now in the cron prompt
so each cron fire runs immediately AND re-registers itself. Note: this
now is for the CRON's invocation, not the current invocation:
Run /qe-audit [bash [area]] every <schedule> now
-
Create the cron -- use CronCreate:
cron: the cron expression from step 1
recurring: true
prompt: the constructed command from step 3
-
Confirm with wall-clock time. Always show times in America/New_York
(ET) -- use TZ=America/New_York date for conversion:
If now is present:
QE audit scheduled every day at 9am. Running now.
Next audit after this one: ~9:03 AM ET tomorrow (cron ID XXXX).
If now is NOT present:
QE audit scheduled every day at 9am.
First run: ~9:03 AM ET tomorrow (cron ID XXXX).
Use /qe-audit next to check, /qe-audit stop to cancel.
-
If now is present: proceed to the audit/bash.
If now is NOT present: Exit. The cron fires later.
If every is NOT present, skip this phase and proceed to the audit/bash
(bare invocation always runs immediately).
Mode: Commit Audit (default)
Run when bash is NOT present in arguments.
-
Find the last audit checkpoint -- read the bottom of plans/QE_ISSUES.md
for the last audited commit range and date (format: *Last audited: YYYY-MM-DD -- commits <hash> through <hash>*). If the file doesn't exist
or has no checkpoint, fall back to git log --oneline -20.
-
List new commits -- git log --oneline <last_commit>..HEAD. Skip
QE-generated commits (messages matching fix: N QE issues, fix: QE batch,
docs: QE audit, or test: QE batch).
-
If no new commits -- report "no new commits since last audit" and stop.
-
Audit each commit -- For each commit with code changes, dispatch
parallel Explore agents using the Agent tool (group 4-5 commits per
agent). Do not audit all commits yourself -- dispatch agents for fresh
eyes on each batch:
- Read the diff (
git show <hash>)
- Read related test files
- Assess: Are tests good (testing real behavior, not no-ops)? Are there
coverage gaps? Are there bugs?
- Rate severity: Critical / High / Medium / Low / Very Low
-
Create GitHub issues -- For actionable findings (Medium+ severity, or
Low with clear fix), create issues via gh issue create. Include: summary,
root cause, suggested fix/test, severity, and which commit introduced it.
-
Update tracker -- Edit plans/QE_ISSUES.md:
- Add new issues to "Open Issues" section
- Move any resolved issues to "Resolved Issues"
- Update the audit date and commit range at the bottom
-
Report -- Summarize findings: issues filed, notable positives, and
overall assessment. If a cron is active, include the next run time:
Audit complete. Filed N issues. Next audit in ~23h 55m (~9:03 AM ET
tomorrow, cron XXXX).
Tips for commit audit
- Skip docs-only, config-only, and log-only commits
- For fish behavior and genetics commits, pay extra attention to simulation correctness
- Check that test files in
tests/unit/ and tests/e2e/ cover new functionality
Mode: Bash (stress-test)
Run when bash IS present in arguments.
-
Select target area:
- If area specified (e.g.,
bash "fish behavior"): use that area
- If no area specified: pick under-tested areas based on:
- Files with low test-to-code ratio
- Features with recent bug fixes (fragile areas)
- Complex code paths (fish AI, genetics, rendering)
- Areas not recently audited
-
Research the target area:
- Read the source code for the selected feature
- Read existing tests -- what's already covered?
- Identify edge cases, boundary conditions, unusual inputs
- Think adversarially: what could break? What assumptions are fragile?
- Identify what types of testing apply (see step 3)
-
Test the area thoroughly -- use ALL applicable methods, not just
unit tests. The goal is to exercise the feature the way a user would,
plus adversarial edge cases:
a. Manual UI testing (for interactive, UI, visual features):
- Use Playwright or manual browser testing
- Exercise real workflows: interact with fish, tap/click, resize window,
navigate tabs, share links
- Test edge cases: rapid clicks, empty inputs, overlapping
elements, browser resize during interaction
- Take screenshots as evidence of bugs found
b. Adversarial unit tests (for all areas):
- Edge cases (empty inputs, zero values, NaN, Infinity, negative numbers)
- Boundary conditions (max array size, deeply nested structures)
- Race conditions (rapid interactions, concurrent operations)
- Invalid state (corrupted data, missing references)
- Unusual workflows (unexpected user actions, rapid state changes)
c. Integration testing (for cross-cutting features):
- Full workflows end-to-end
- Cross-feature interactions
-
Run automated tests:
npm test
- Tests that PASS: the feature handles the edge case correctly. Good.
- Tests that FAIL: found a bug. File a GitHub issue.
- Tests that CRASH: found a serious bug. File a high-severity issue.
-
File GitHub issues for each failure (from any testing method):
- Include: what was tested, expected vs actual behavior, test code,
severity rating, suggested fix
- Tag with appropriate labels
-
Update plans/QE_ISSUES.md with new findings.
-
Clean up test files:
- Keep passing adversarial tests (they're valuable regression tests)
- Keep failing tests too -- do NOT remove or comment them out. CLAUDE.md
says tests must pass. A failing bash test is evidence of a real
bug. Mark them with
{ todo: 'Bug found by QE bash -- see #NNN' } so
they're skipped but preserved, and file a GitHub issue for each.
- ONLY use
todo for bugs you just DISCOVERED during this bash
session. NEVER use todo to skip a test that was passing before
and now fails due to your changes -- that's weakening, not discovery.
- Run
npm test before committing -- all suites must pass
(todo-skipped tests are acceptable)
- Commit all tests (passing + skipped) with descriptive message
-
Report -- Summarize:
- Area bashed
- Testing methods used (manual UI, adversarial unit tests, integration)
- Scenarios tested (count per method)
- Bugs found (count, severity, method that found them)
- Issues filed (numbers)
- Passing tests committed
- Screenshots taken (if manual testing)
- If a cron is active, include the next run time
Key Rules
- Never weaken tests -- if a bash test reveals a real bug, file an issue.
Don't make the test pass by loosening assertions.
every implies autonomous operation -- scheduled audits run without
user approval.
- Deduplicate crons -- always remove existing
/qe-audit crons before
creating a new one.
- Crons are session-scoped -- they expire when the session dies.
- File issues, don't fix inline -- QE audit finds problems.
/fix-issues
fixes them. Keep the separation clean.
- Ultrathink -- use careful, thorough reasoning. Read code, understand
what changed and why, verify correctness.