agent-work-adversarial-review
Adversarially review the last 24h of multi-agent work by combining git history, GitHub issue state, generated analysis artifacts, governance tests, and duplicate-checked follow-up issue creation.
| name | agent-work-adversarial-review |
| description | Adversarially review the last 24h of multi-agent work by combining git history, GitHub issue state, generated analysis artifacts, governance tests, and duplicate-checked follow-up issue creation. |
| version | 1.0.0 |
| category | coordination |
| tags | ["audit","adversarial-review","github","governance","artifacts","issues"] |
Use when asked to review recent work done by multiple agents across the ecosystem, especially the last 24h. This is not a normal progress summary — the goal is to find regressions, contradictions, stale claims, enforcement gaps, and missing follow-through.
Produce an evidence-backed review of recent agent work and create high-value follow-up GitHub issues without spamming duplicates.
Run:
pwd
git rev-parse --show-toplevel
git remote get-url origin
date -u '+%Y-%m-%d %H:%M:%S UTC'
gh auth status
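The checks above can be wrapped so the review stops at the first one that fails. This is a minimal sketch, not part of the skill itself: `preflight` is a hypothetical helper name, and it assumes each check is passed as a single quoted command string.

```shell
# Hypothetical helper: run each check in order and stop at the first failure.
# Each argument is one command string, executed via `sh -c`.
preflight() {
  for cmd in "$@"; do
    if ! sh -c "$cmd" >/dev/null 2>&1; then
      echo "preflight failed: $cmd" >&2
      return 1
    fi
  done
}

# Example call (run from inside the repository under review):
#   preflight 'git rev-parse --show-toplevel' 'git remote get-url origin' 'gh auth status'
```

Output from the checks is discarded here, so run the commands directly when you need to read the repository root, the remote URL, or the UTC timestamp.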
Do not rely on only git log or only session logs.
Use:
git log --since='24 hours ago' --date=iso --pretty=format:'%h%x09%ad%x09%an%x09%s' --stat --no-merges
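The tab-separated header lines that this format produces are straightforward to post-process. As a rough sketch, assuming the log output is piped in on stdin, the hypothetical helper `tally_authors` (not part of the skill) counts commits per author so that unusually active agents stand out:

```shell
# Hypothetical helper: count commits per author from the log format above.
# Header lines carry four tab-separated fields (hash, date, author, subject);
# --stat lines normally contain no tabs, so the NF test skips them.
tally_authors() {
  awk -F'\t' 'NF >= 4 { count[$3]++ }
              END { for (a in count) printf "%d\t%s\n", count[a], a }' |
    sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Example call:
#   git log --since='24 hours ago' --date=iso \
#     --pretty=format:'%h%x09%ad%x09%an%x09%s' --stat --no-merges | tally_authors
```

The tally is only a starting point for choosing what to inspect; it says nothing about whether the commits are correct.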
Also inspect:
.claude/state/session-signals/YYYY-MM-DD.jsonl
logs/orchestrator/claude/session_*.jsonl
gh issue list --state all --limit 30 --json number,title,state,createdAt,updatedAt,labels,author,url
Treat generated result docs as first-class review targets.
Check recent files under patterns like:
docs/plans/*/results/*.md
docs/handoffs/*.md
Look for:
Do not stop at document review. Re-run focused tests or scripts for the changed area.
When reviewing Deckhand/customer-channel behavior, include an interaction inconsistency pass using references/deckhand-interaction-inconsistency-audit.md. This pass must compare channel logs, scope/routing config, audit rows, and Claude/session claims across five axes: channel fit, domain scope, result-delivery state, engineering credibility, and live-readiness/canary evidence.
Good pattern for governance/runtime work:
uv run pytest <focused test subset> -q
Also exercise both human-facing and machine-facing entrypoints when a tool claims automation support:
--json / structured-output mode separately
Adversarial check for governance/checker work:
Adversarial check for scheduled governance/cron wrappers:
fail vs FAIL, etc.)
conformance or registry-health are defined
If one file fails in a combined run but passes alone, record it as a possible invocation-context/import-path problem rather than claiming a stable failure.
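The combined-versus-isolated rule can be made mechanical by comparing two exit codes. The sketch below is illustrative only: `classify_run` and its output labels are hypothetical names, and it assumes you have already captured the exit code of the combined run and of the same file run on its own.

```shell
# Hypothetical helper: classify a test file from two exit codes.
#   $1 = exit code of the combined run
#   $2 = exit code of the same file run alone
classify_run() {
  combined_rc=$1
  isolated_rc=$2
  if [ "$combined_rc" -eq 0 ] && [ "$isolated_rc" -eq 0 ]; then
    echo "pass"
  elif [ "$combined_rc" -ne 0 ] && [ "$isolated_rc" -eq 0 ]; then
    # Fails only in the combined run: suspect invocation context or import path.
    echo "possible-invocation-context"
  elif [ "$combined_rc" -ne 0 ] && [ "$isolated_rc" -ne 0 ]; then
    echo "stable-failure"
  else
    # Passes in the combined run but fails alone: may depend on other tests' setup.
    echo "isolated-only-failure"
  fi
}
```

Record the label together with the exact commands used for both runs, so the finding can be reproduced later.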
Delegate independent subreviews for parallel adversarial pressure, for example:
Ask subreviewers for:
Always search GitHub before opening follow-up items.
Use targeted searches such as:
gh issue list --state open --search '<keywords>' --limit 20
Important: distinguish exact duplicates from umbrella issues. If an umbrella exists, reference it in the new issue instead of skipping automatically.
If a previously closed issue is directly contradicted by a reproduced live failure, prefer reopening the original issue instead of creating a duplicate regression ticket.
Use this when:
Pattern:
gh issue reopen <number>
gh issue comment <number> --body-file /tmp/repro.md
Your comment should include:
Create issues for systemic gaps, not every symptom.
High-value categories:
Avoid filing noise issues unless the evidence is concrete and reproducible.
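The duplicate-handling rules above (exact duplicate, umbrella issue, closed issue contradicted by a live failure) can be summarized as a small decision helper. This is a sketch under assumptions: `followup_action`, the match-kind labels, and the action names are all hypothetical, and deciding which kind applies still requires reading the search results by hand.

```shell
# Hypothetical helper: map the best existing match to a follow-up action.
#   $1 = kind of match found by the duplicate search (labels are illustrative)
followup_action() {
  case "$1" in
    exact-open)          echo "skip" ;;                           # already tracked
    umbrella-open)       echo "create-and-reference-umbrella" ;;  # file it, link the umbrella
    closed-contradicted) echo "reopen-original" ;;                # reproduced live failure
    none)                echo "create" ;;
    *)                   echo "unknown match kind: $1" >&2; return 2 ;;
  esac
}
```

For example, `followup_action umbrella-open` prints `create-and-reference-umbrella`, matching the rule that an umbrella issue should be referenced rather than treated as a reason to skip.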
Return a concise summary with:
Good output is short but evidence-backed. Keep the detailed proof in issue bodies or internal notes; keep the user summary compact.