abadge-e2e-sweep

Name: Abadge E2e Sweep
Author: punitarani

Description: Use when the user wants to run, resume, monitor, or stop a long-running end-to-end test sweep of the abadge codebase (web, API, CLI, MCP, daemon, crypto, DB, SDK), including phrases like "sweep abadge", "run the e2e audit", "continue the test campaign", "resume the sweep", "what's the sweep finding", "stop the sweep", or any request to methodically test every surface of abadge in a loop with subagents and durable issue tracking.

stars:1

forks:0

updated: 2026-05-29 03:37

package.json

"author": "punitarani"

"repository": "punitarani/abadge"

abadge E2E Sweep

A methodical, resumable, subagent-driven test campaign across every abadge surface. It runs in a ralph-style loop that survives compaction and session restarts because all state lives in files. It bakes in lessons from the prior 224-iteration campaign, which produced 184 catalogued issues, including the saturation gate that the earlier campaign never had.

Announce on use: "Using abadge-e2e-sweep to <start|resume|status|report|cancel> the sweep."

Mental model

The sweep is a 2-D matrix:

Surface	happy path	adversarial	edge	regression
API	…	…	…	…
CLI	…	…	…	…
Web	…	…	…	…
MCP	…	…	…	…
Daemon	…	…	…	…
Crypto	…	…	…	…
DB	…	…	…	…
SDK	…	…	…	…
Docs	…	…	…	…

Each cell is a test bundle: a focused subagent prompt with explicit scope and a JSON report contract.

The controller (the main session) walks the matrix BFS by default (one cell per surface, then rotate), then DFS into surfaces with open findings. Every iteration it:

1. Reads state/plan.yaml to pick the next undone cell
2. Dispatches one or more subagents (in parallel when independent)
3. Aggregates JSON reports
4. Runs the bug-triager subagent against state/issues.md to dedupe
5. Updates progress + saturation counters
6. Calls advisor() at planned checkpoints

The outer loop is ralph-loop — the same prompt re-fires every iteration; files carry the memory.
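
The BFS walk described above (one cell per surface, then rotate) can be sketched as a nested loop. This is a minimal illustration, not the skill's planner: the function name `bfs_cells` and the `surface/column` cell-id format are assumptions made up for this example, and the real ordering comes from state/plan.yaml.

```shell
# Sketch only: emit matrix cells in BFS order, i.e. one cell per surface for
# the first column, then rotate to the next column.
# Assumption: the "surface/column" cell-id format is hypothetical.
bfs_cells() {
  local column surface
  for column in happy-path adversarial edge regression; do
    for surface in api cli web mcp daemon crypto db sdk docs; do
      printf '%s/%s\n' "$surface" "$column"
    done
  done
}

# Show the first three cells the controller would visit.
bfs_cells | sed -n '1,3p'
```

A DFS pass would instead fix one surface and iterate over its columns; swapping the two loops gives that order.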

Operations

The user invokes the skill with one of: start, resume, status, doctor, recover, report, cancel, reset. If no operation is given, default to status (it's the safest read-only op).

`start [--surfaces api,cli,...] [--max-iterations N] [--mode bfs|dfs|hybrid]`

1. Refuse if state/active.yaml already exists; direct the user to resume or cancel first.
2. Capture the controller's Claude Code session id (ask the user, or read $CLAUDE_CODE_SESSION_ID from their terminal). You will need it in step 3.
3. Run scripts/sweep-init.sh <surfaces> <mode> --session-id <id> to create the state/ directory under docs/superpowers/sweeps/<run-id>/ and seed plan.yaml from assets/plan-seed.yaml. Passing the session id explicitly is important: without it, the stop-hook's session-isolation guard degrades to "any session drives the loop," which is how a dead controller leaves a zombie driver that hijacks unrelated sessions.
4. Verify the dev stack is alive (bun run dev, ports 3000 + 8787/8788). If it is not, prompt the user to start it before continuing. Do not start it for them; that's a foreground process they need to own.
5. Hand off to the loop engine by running:

   /ralph-loop:ralph-loop "$(cat scripts/sweep-iteration-prompt.md)" --max-iterations <N> --completion-promise "SWEEP_COMPLETE"

6. Print the run-id and state-dir path so the user can tail it externally.
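
The two guards in the start steps (the active.yaml lock and the dev-stack liveness check) could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, `localhost` as the host is assumed, and the real checks live in scripts/sweep-init.sh, whose contents are not shown here.

```shell
# Sketch only: start-time guards. Function names are hypothetical.

# Refuse to start while a run is in flight: active.yaml acts as the lock.
sweep_can_start() {
  local state_dir="$1"
  if [ -e "$state_dir/active.yaml" ]; then
    echo "refusing: $state_dir/active.yaml exists; run resume or cancel first" >&2
    return 1
  fi
  return 0
}

# Probe the dev-stack ports named in the text. This only verifies; it never
# starts the stack. Assumption: the stack listens on localhost.
dev_stack_alive() {
  local port
  for port in 3000 8787 8788; do
    curl --silent --max-time 2 --output /dev/null "http://localhost:$port/" || return 1
  done
  return 0
}
```

Because curl is called without a fail-on-error flag, any HTTP response (even a 404) counts as "alive"; only a refused or timed-out connection fails the probe.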

`resume`

Resume continues an existing run in place. It never creates a new run, never overwrites sweep memory, and never rewrites active.yaml's run-config fields. Its only job is to re-attach the ralph-loop driver to state that already exists.

1. Refuse if state/active.yaml is missing and direct the user to start. (A missing active.yaml means there is nothing to resume.)
2. Read state/active.yaml for run-id, surface list, mode, and max iters. Do not modify those fields; they belong to the original run.
3. If status: cancelled or status: completed, refuse and surface the current status. An explicit recover uncancel --apply is required to revive a cancelled run; completed runs should not be revived at all.
4. Ensure the ralph-loop state exists and names this session. Rather than calling /ralph-loop:ralph-loop from scratch (which would stamp a fresh started_at and potentially reset the iteration counter), invoke:
   ```
   scripts/sweep-recover.sh reseed-ralph --apply
   scripts/sweep-recover.sh set-session <your-session-id> --apply
   ```
   reseed-ralph reads the existing progress.yaml.iteration and writes the ralph state with that iteration as the starting point, so the very next stop-hook fires iteration N+1 of the same run, not iteration 1 of a new one. It refuses to clobber a ralph state younger than 60 s without --force, so running it when a live loop exists is safe.
5. Tell the user the run-id the loop re-attached to, the current iteration, and that findings/plan/progress are preserved untouched.

Resume is a boundary, not a restart. The sweep's memory — plan.yaml, progress.yaml, issues.md, iteration-log.md, repros/ — is owned by the original run and must not be rewritten, truncated, or re-seeded during resume. If any of those files are corrupt, that is a reset (destructive) conversation, not resume.
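
The refusal rules for resume can be sketched as a small read-only gate. This assumes active.yaml carries a top-level, unquoted `status:` line; the helper names `yaml_scalar` and `resume_gate` are hypothetical, and the full schema is in references/state-files.md.

```shell
# Sketch only: decide whether resume may proceed. Never writes to state.

# Read a top-level "key: value" scalar. Assumption: no nesting, no quoting,
# no trailing comments on the line.
yaml_scalar() {
  awk -v k="$2" 'index($0, k ":") == 1 { sub(/^[^:]*:[ \t]*/, ""); print; exit }' "$1"
}

resume_gate() {
  local active="$1/active.yaml" status
  if [ ! -f "$active" ]; then
    echo "nothing to resume; use start" >&2
    return 1
  fi
  status="$(yaml_scalar "$active" status)"
  case "$status" in
    active)    return 0 ;;
    cancelled) echo "run is cancelled; recover uncancel --apply is required first" >&2; return 1 ;;
    completed) echo "run is completed; it should not be revived" >&2; return 1 ;;
    *)         echo "unexpected status: '$status'" >&2; return 1 ;;
  esac
}
```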

`status`

Read state/progress.yaml and print:

- run-id, started_at, current iteration / max
- per-surface counts: tested / total, open_issues / closed
- top 3 most-recent findings (last issues.md entries)
- saturation flag (consecutive zero-finding iters)
- next planned cells

This is a pure read; never mutates state.

`doctor`

Run scripts/sweep-doctor.sh to diagnose why the loop looks stuck. Read-only. It checks:

- Is the sweep state dir discoverable from the current cwd? (This matters because worktrees each have their own docs/superpowers/sweeps/ tree.)
- Does active.yaml exist and have status: active? Flag it if cancelled or completed.
- Does .claude/ralph-loop.local.md exist? If it is missing while the sweep is active, the loop driver is gone and nothing will re-fire.
- Is ralph's iteration numeric and below max_iterations? If the hook sees a non-numeric iteration, or iteration >= max_iterations, it terminates the loop on the next stop.
- Does ralph's session_id match the current $CLAUDE_CODE_SESSION_ID? If it is non-empty and mismatched, this session's stop-hook exits without re-firing.
- Is there drift between the ralph iteration and the progress.yaml iteration? (A drift above 2 suggests a previous iteration crashed.)
- Is the iteration-prompt file readable at the path the ralph state references?
- Are the dev-stack URLs reachable (:8787, :8788, :3000)? Non-fatal, but unreachable stacks block most cells.

Prints [info] / [warn] / [PROB] lines, and alongside each problem the exact sweep-recover.sh invocation that fixes it. Exits 0 on no problems, 1 otherwise — handy for CI-style babysitter loops.
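
Two of the checks above (numeric iteration below the maximum, and iteration drift) are simple enough to sketch, along with the exit-code convention. The function names are hypothetical, and treating drift as a [warn] rather than a [PROB] is an assumption; scripts/sweep-doctor.sh is the real implementation.

```shell
# Sketch only: two doctor-style checks with [info]/[warn]/[PROB] output.
problems=0

report() {  # args: level message
  printf '[%s] %s\n' "$1" "$2"
  if [ "$1" = "PROB" ]; then problems=$((problems + 1)); fi
  return 0
}

check_iteration() {  # args: ralph_iteration max_iterations
  case "$1" in
    ''|*[!0-9]*) report PROB "ralph iteration '$1' is not numeric"; return 0 ;;
  esac
  if [ "$1" -ge "$2" ]; then
    report PROB "iteration $1 >= max_iterations $2"
  else
    report info "iteration $1 of $2"
  fi
}

check_drift() {  # args: ralph_iteration progress_iteration
  local drift=$(( $1 - $2 ))
  if [ "$drift" -lt 0 ]; then drift=$(( 0 - drift )); fi
  if [ "$drift" -gt 2 ]; then
    report warn "iteration drift of $drift (a previous iteration may have crashed)"
  else
    report info "iteration drift $drift"
  fi
}

# Exit status convention from the text: 0 on no problems, 1 otherwise.
doctor_status() { [ "$problems" -eq 0 ]; }
```

A babysitter loop could call the checks and then branch on `doctor_status` to decide whether to print recover suggestions.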

`recover`

Run scripts/sweep-recover.sh <subcommand> [args] [--apply] to fix specific blockers. Every invocation is dry-run by default — the user must pass --apply to actually mutate files. Every mutation writes a .bak.<epoch> sidecar before replacing the target. Subcommands:

- reseed-ralph [--max N] [--force]: rebuild .claude/ralph-loop.local.md from active.yaml + progress.yaml when the ralph state is missing, corrupt, or stale. Refuses to overwrite a state file younger than 60 s without --force (avoids racing a live stop-hook).
- bump-max [N=100]: increase max_iterations in the ralph state when the iteration budget is about to be exhausted.
- set-session [ID]: write $CLAUDE_CODE_SESSION_ID (or a supplied id) into both the ralph state and active.yaml. Needed when the stop-hook in the current session is exiting without re-firing because the state names a different session.
- uncancel: flip active.yaml's status: cancelled back to active so resume will pick the run up. Refuses if status is completed.
- all: run the four above, each gated on its own precondition.

Invariant: recover never touches issues.md, plan.yaml, progress.yaml, or repros/. Those are the sweep's memory of what was tested and found; recover only fixes the drivers around them so the loop can resume. If sweep memory itself is corrupt, that's a reset (destructive) conversation, not recover.
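
The "dry-run by default, sidecar before replace" behaviour can be sketched as one helper. This is an illustration of the pattern only: `safe_write` is a hypothetical name, and the write-to-temp-then-rename step is an added assumption, not something the text says sweep-recover.sh does.

```shell
# Sketch only: dry-run by default; on --apply, back up then replace.
# Usage: safe_write TARGET NEW_CONTENT [--apply]
safe_write() {
  local target="$1" content="$2" mode="${3:-}"
  if [ "$mode" != "--apply" ]; then
    echo "dry-run: would replace $target (pass --apply to write)"
    return 0
  fi
  # Keep a .bak.<epoch> sidecar of the previous contents, if any.
  if [ -e "$target" ]; then
    cp -p "$target" "$target.bak.$(date +%s)"
  fi
  # Assumption: write to a temp file and rename, so readers never see a
  # half-written file.
  printf '%s\n' "$content" > "$target.tmp.$$" && mv "$target.tmp.$$" "$target"
}
```

Calling it without `--apply` prints what would change and leaves the file alone, which makes it safe to run while deciding whether a fix is the right one.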

`report`

Run scripts/sweep-report.sh to render state/REPORT.md from plan.yaml + progress.yaml + issues.md. Format mirrors the prior TESTING.md (Final counts, Headline regression chain if applicable, Fix Priority roadmap, Surface assessment, Iteration log). Print the path.

`cancel`

1. Run scripts/sweep-cancel.sh, which removes .claude/ralph-loop.local.md (if it belongs to this run's session) and writes cancelled_at into state/active.yaml (preserved, not deleted).
2. Print the final progress snapshot.
3. Tell the user state/ is intact and report will still work.
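
The session check that cancel performs before removing the ralph state might look like the sketch below. It assumes the ralph state file contains an unquoted line starting with `session_id:`; the function name is hypothetical, and leaving a file with an empty session id in place is a conservative assumption rather than documented behaviour.

```shell
# Sketch only: remove the ralph state file only when it names this session.
cancel_ralph_state() {  # args: ralph_state_file current_session_id
  local file="$1" current="$2" owner
  [ -f "$file" ] || return 0
  owner="$(awk 'index($0, "session_id:") == 1 { sub(/^[^:]*:[ \t]*/, ""); print; exit }' "$file")"
  # Assumption: an empty or missing session_id is treated as "not ours".
  if [ -n "$owner" ] && [ "$owner" = "$current" ]; then
    rm -- "$file"
    echo "removed $file"
  else
    echo "left $file in place: it names session '$owner'"
  fi
}
```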

`reset`

Destructive — refuses without --confirm. Removes the entire state/ directory and any associated ralph state. Use only when starting a brand-new campaign.

Per-iteration loop body (what ralph re-fires)

This is the prompt template at scripts/sweep-iteration-prompt.md. The controller (you, in the loop) executes it every iteration:

You are running iteration N of the abadge E2E sweep.

1. READ state files:
   - state/active.yaml          → run config, mode, surface filter
   - state/plan.yaml            → matrix; find next K undone cells
   - state/progress.yaml        → consecutive_zero_finding_iters, saturation_threshold
   - state/issues.md (last 50 entries) → dedup context

2. SATURATION CHECK:
   If consecutive_zero_finding_iters >= saturation_threshold (default 5),
   call advisor() with the current state and ask: "Is this campaign saturated?".
   If advisor says yes, write SATURATED to state/active.yaml and output
   <promise>SWEEP_COMPLETE</promise>. Do not lie to exit — advisor must agree.

3. PLAN-COMPLETE CHECK:
   If every plan.yaml cell has tested_at AND there are zero open P0 issues,
   output <promise>SWEEP_COMPLETE</promise>.

4. DISPATCH (parallel):
   Pick K = min(parallel_limit, undone_cells). For each cell, dispatch a
   tester subagent using subagents/<surface>-prompt.md as the template.
   Pass the cell's scope, the dev URL, and a fresh repro skeleton path.

5. AGGREGATE:
   Each subagent returns a JSON report (schema in references/subagent-contract.md).
   Parse all reports.

6. TRIAGE:
   Dispatch bug-triager subagent with: {new_findings, existing_issues_md_tail}.
   Triager returns: per-finding {dup_of, severity, dedup_action}.

7. WRITE state:
   - Append non-duplicate findings to state/issues.md
   - Update plan.yaml cells with tested_at, finding_ids
   - Update progress.yaml counters + saturation
   - Drop any new repro into state/repros/

8. CHECKPOINT:
   Every CHECKPOINT_INTERVAL iterations (default 10), call advisor() with a
   short status and ask for strategic guidance. Save advisor verdict into
   state/checkpoints.md.

9. CONTINUE:
   Output exactly: "iter N: <K cells tested>, <new bugs>, <dups merged>,
   saturation <consecutive_zero>/<threshold>" and let the ralph hook re-fire.
   Do NOT output the completion promise unless step 2 or 3 said to.
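
The counter arithmetic behind steps 2 and 4 of the template is small enough to sketch. The function names are hypothetical, and "zero-finding" is assumed here to mean zero new non-duplicate findings; references/saturation-detection.md holds the actual definition.

```shell
# Sketch only: saturation streak and dispatch width.

# New value of consecutive_zero_finding_iters after an iteration.
next_zero_streak() {  # args: current_streak new_findings
  if [ "$2" -eq 0 ]; then
    echo $(( $1 + 1 ))
  else
    echo 0
  fi
}

# True when the streak has reached the threshold (default 5), meaning the
# controller should ask advisor() whether the campaign is saturated.
should_ask_advisor() {  # args: streak [threshold]
  [ "$1" -ge "${2:-5}" ]
}

# K = min(parallel_limit, undone_cells)
pick_k() {  # args: parallel_limit undone_cells
  if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}
```

Reaching the threshold only triggers the question to advisor(); per the template, the completion promise is emitted only if the advisor agrees.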

File contract (the durable state)

Lives at docs/superpowers/sweeps/<run-id>/state/. See references/state-files.md for full schemas.

state/
├── active.yaml         # one-of-a-kind: declares this run is in flight
├── plan.yaml           # the test matrix; mutated as cells complete
├── progress.yaml       # counters, saturation, checkpoints
├── issues.md           # bug log (dedup'd, severity-tagged)
├── checkpoints.md      # advisor strategic notes
├── iteration-log.md    # one short line per iter (audit trail)
└── repros/             # one .ts or .md per confirmed bug

These files are the only place memory persists across iterations and across sessions. The session can be killed and resumed; the worktree can move; only the state dir matters.
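
To make the layout concrete, the sketch below creates the directory tree with empty placeholder files. It is not how the skill seeds state: scripts/sweep-init.sh does that (including real contents for active.yaml and plan.yaml), and `init_state_layout` is a hypothetical name used only here.

```shell
# Sketch only: create the state/ layout with empty placeholders.
# Idempotent: existing files are left untouched, never truncated.
init_state_layout() {  # arg: run directory, e.g. docs/superpowers/sweeps/<run-id>
  local state="$1/state" f
  mkdir -p "$state/repros"
  for f in active.yaml plan.yaml progress.yaml issues.md checkpoints.md iteration-log.md; do
    [ -e "$state/$f" ] || : > "$state/$f"
  done
}
```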

What lives in references/

- references/surface-map.md: full inventory of abadge surfaces with file:line pointers (lifted from AGENTS.md + the prior TESTING.md). The seed for plan.yaml.
- references/state-files.md: exact YAML/MD schemas for every state file.
- references/subagent-contract.md: JSON return schema, error handling, escalation rules for tester subagents.
- references/dedup-protocol.md: how the triager merges duplicate findings; severity rubric.
- references/saturation-detection.md: what counts as a zero-finding iter, when to involve advisor, how to honestly write SWEEP_COMPLETE.
- references/loop-mechanics.md: exactly how this skill cooperates with the ralph-loop plugin (state file ownership, session-id isolation, cancel semantics).

When you (the controller) need any of those, read the file directly with the Read tool. They are not auto-loaded.

What lives in subagents/

One prompt template per surface. Each is a self-contained dispatchable prompt that takes {cell, dev_urls, prior_findings_summary} and returns a JSON report per the contract.

- api-prompt.md, cli-prompt.md, web-prompt.md, mcp-prompt.md, daemon-prompt.md, crypto-prompt.md, db-prompt.md, sdk-prompt.md, docs-prompt.md
- triager-prompt.md (dedup + severity)
- reporter-prompt.md (used by the report op to render REPORT.md)

What lives in scripts/

Plain bash, idempotent, shellcheck-clean.

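- sweep-init.sh <surfaces> <mode> --session-id <id>: creates the state dir, seeds the plan, writes active.yaml
- sweep-status.sh: prints the status table; pure read
- sweep-doctor.sh: diagnoses why the loop looks stuck; pure read
- sweep-recover.sh <subcommand> [args] [--apply]: targeted fixes for the loop drivers; dry-run by default
- sweep-report.sh: renders REPORT.md
- sweep-cancel.sh: removes the ralph state file (after a session-id check), marks active.yaml cancelled
- sweep-iteration-prompt.md: the prompt template fed to /ralph-loop on start (not a script; on resume the loop is re-attached through sweep-recover.sh instead of a fresh /ralph-loop call)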

Invariants (do not violate)

1. Files are the only memory. Never assume the next iteration sees your reasoning. Write what you learned to iteration-log.md (one line) and issues.md (if there is a new finding) before yielding.
2. One sweep per project at a time. state/active.yaml is the lock. Refuse to start while one exists.
3. Don't fabricate completion. SWEEP_COMPLETE is only honest when (a) every plan cell has tested_at and there are zero open P0 issues, OR (b) advisor() agreed the work is saturated. Lying to exit the loop is the failure mode that wasted 110 iterations last time.
4. Subagents are isolated. Never paste the controller's full conversation into a subagent prompt. Pass only: scope cell, dev URLs, last N issue titles for dedup, and the contract schema.
5. Parallel safety. Two subagents may not write to the same file. The controller owns all state writes; subagents only return JSON.
6. No destructive actions without --confirm. This covers reset, removing repros, and dropping the dev DB.
7. Respect the dev stack. The user owns bun run dev. Refuse to (re)start it; only verify it's alive.
8. Cap subagent concurrency. The default is 4 parallel testers per iter. A higher limit risks rate limits and context churn.
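
As a loose shell analogy for the concurrency cap and the single-writer rule, the sketch below runs at most N workers at once and lets only the caller collect their output. Real tester subagents are dispatched by the controller, not by xargs; `run_capped` and the `tested <cell>` output line are hypothetical, and `-P` is a common xargs extension (GNU and BSD) rather than a POSIX option.

```shell
# Sketch only: run one worker per input line, at most LIMIT at a time.
# Workers only print a result; the caller is the single writer of any state.
run_capped() {  # arg: limit; reads one cell id per line on stdin
  xargs -P "$1" -n 1 sh -c 'echo "tested $1"' _
}

# Example: four cells, at most two workers in flight. Output order can vary
# between runs, so it is sorted before use.
printf '%s\n' api cli web mcp | run_capped 2 | sort
```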

Common failure modes (and the counters)

Failure	Counter
Loop runs forever after value runs out	Saturation gate at step 2 + advisor checkpoint at step 8
Same bug reported 30 times	Triager step 6 with `references/dedup-protocol.md`
Subagent context bleed	Strict prompt template; the controller curates context
State files contradict each other	Single owner: the controller writes, never the subagents
User loses progress on session crash	All progress in `state/`; `resume` re-attaches
Worktree pollution from test scripts	All scratch goes under `state/repros/`; no top-level files
ralph state belongs to another session	`sweep-cancel.sh` checks `session_id:` before removing

Integration with other skills

- superpowers:dispatching-parallel-agents: this skill IS that pattern, applied to a long-running surface sweep.
- superpowers:subagent-driven-development: same controller/subagent split; here the "tasks" are sweep cells, not plan tasks.
- superpowers:test-driven-development: when a sweep finding produces a fix-it task, hand off to TDD per fix.
- ralph-loop:ralph-loop: the loop engine. This skill cooperates with it and does not reimplement it.

Quick reference

User says	Op	Effect
"start the sweep"	`start`	Init state, seed plan, hand off to ralph
"continue testing"	`resume`	Re-attach to the last run-id
"what's the sweep doing"	`status`	Print progress (read-only)
"the loop looks stuck" / "why isn't it iterating"	`doctor`	Diagnose loop + state (read-only); prints recover commands
"unblock the loop" / "get it running again"	`recover <subcmd> --apply`	Targeted fix (dry-run by default): reseed-ralph / bump-max / set-session / uncancel / all
"give me the bug list"	`report`	Render REPORT.md
"stop the sweep"	`cancel`	Remove ralph state, preserve findings
"reset everything"	`reset`	Destructive; requires `--confirm`
