aspect-panel

Run a panel of 5 specialized verifiers (correctness, edge-case, security, regression, style) in parallel against a change, with confidence-weighted voting. Disagreement (40-60% pass share) escalates to lead. Replaces single-critic for high-stakes review. Triggers on /aspect-panel, "panel review", "multi-aspect verify", "review with multiple critics".

Updated: May 10, 2026, 15:17

Source: sethdford/claude-code-teams

SKILL.md

name	aspect-panel
description	Run a panel of 5 specialized verifiers (correctness, edge-case, security, regression, style) in parallel against a change, with confidence-weighted voting. Disagreement (40-60% pass share) escalates to lead. Replaces single-critic for high-stakes review. Triggers on /aspect-panel, "panel review", "multi-aspect verify", "review with multiple critics".
when_to_use	Before merging release-blocking changes. Before committing security-sensitive code. After /verify says PASS but you want orthogonal coverage. NOT for trivial diffs (use /verify alone).
allowed-tools	Bash, Read
arguments	["target"]

/aspect-panel — Heterogeneous Verifier Panel

/aspect-panel <target> spawns 5 aspect-specialized verifiers in parallel, each examining one dimension of the change. A confidence-weighted vote produces the final verdict; disagreement is preserved as a signal rather than averaged away.

Built on:

Lifshitz et al. 2025 — Multi-Agent Verification (MAV)
arXiv 2510.01499 — "Beyond Majority Voting"
arXiv 2308.07201 — ChatEval (heterogeneous personas matter)

How

python3 ~/.claude/rl/aspect_panel.py --target "<files-or-diff-or-description>"

The script:

1. Spawns 5 aspect verifiers in parallel via claude -p:
   - correctness-verifier: does the change implement the contract?
   - edge-case-verifier: are NULL/empty/overflow/concurrent inputs handled?
   - security-verifier: paranoid OWASP/secrets/injection sweep
   - regression-verifier: what existing behavior could this break?
   - style-verifier: maintainability, hates clever code
2. Each emits VERDICT: pass|fail, CONFIDENCE: 0.0-1.0, RATIONALE, and RESULT_<aspect>-verifier=PASS|FAIL.
3. Aggregator computes a confidence-weighted vote:
   - pass_share = sum(conf where pass) / sum(conf)
   - pass_share > 0.6 → PASS
   - pass_share < 0.4 → FAIL
   - 0.4 ≤ pass_share ≤ 0.6 → ESCALATE (real disagreement; surface to lead)
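
The voting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the actual aspect_panel.py: the `AspectVote` type and `aggregate` function are hypothetical names, the confidences in the demo are invented, and two behaviors are assumptions the source does not specify (escalating when the confidence total is zero, and setting `split` only when the verdict is ESCALATE).

```python
from dataclasses import dataclass


@dataclass
class AspectVote:
    """One verifier's result (hypothetical type, for illustration only)."""
    aspect: str
    verdict: str        # "pass" or "fail"
    confidence: float   # self-reported, 0.0-1.0


def aggregate(votes: list[AspectVote]) -> dict:
    """Confidence-weighted vote using the thresholds listed above."""
    total = sum(v.confidence for v in votes)
    if total == 0:
        # Assumption: no usable confidence means no consensus, so escalate.
        return {"verdict": "ESCALATE", "pass_share": None, "split": True}

    passed = sum(v.confidence for v in votes if v.verdict == "pass")
    pass_share = passed / total

    if pass_share > 0.6:
        verdict = "PASS"
    elif pass_share < 0.4:
        verdict = "FAIL"
    else:
        verdict = "ESCALATE"  # 0.4 <= pass_share <= 0.6: surface to lead

    return {
        "verdict": verdict,
        "pass_share": round(pass_share, 2),
        # Assumption: "split" simply mirrors the ESCALATE verdict.
        "split": verdict == "ESCALATE",
    }


# Invented confidences: three passes, two fails.
votes = [
    AspectVote("correctness", "pass", 0.85),
    AspectVote("edge-case", "fail", 0.72),
    AspectVote("security", "fail", 0.90),
    AspectVote("regression", "pass", 0.60),
    AspectVote("style", "pass", 0.50),
]
print(aggregate(votes))
```

In this demo the passing verifiers contribute 1.95 of a total confidence of 3.57, so pass_share is about 0.55 and the panel escalates instead of reporting a pass.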

Output

{
  "ts": "...",
  "target": "...",
  "aspects": [
    {"aspect": "correctness", "verdict": "pass", "confidence": 0.85, "rationale": "..."},
    {"aspect": "edge-case", "verdict": "fail", "confidence": 0.72, "rationale": "no handling for None input at src/x.py:42"},
    ...
  ],
  "aggregate": {
    "verdict": "ESCALATE",
    "pass_share": 0.55,
    "split": true
  }
}
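
Each entry in "aspects" corresponds to one verifier's emitted fields. The sketch below shows one way such an entry could be built from raw verifier text. It assumes one `KEY: value` field per line, which the source does not confirm, and `parse_verifier_output` is a hypothetical helper rather than a function from aspect_panel.py.

```python
import re


def parse_verifier_output(aspect: str, text: str) -> dict:
    """Turn one verifier's raw text into an `aspects` entry.

    Assumes one `KEY: value` field per line; the real layout may differ.
    """
    verdict = re.search(
        r"^VERDICT:\s*(pass|fail)\s*$", text, re.MULTILINE | re.IGNORECASE
    )
    confidence = re.search(
        r"^CONFIDENCE:\s*([0-9]*\.?[0-9]+)\s*$", text, re.MULTILINE
    )
    rationale = re.search(r"^RATIONALE:\s*(.+)$", text, re.MULTILINE)

    if verdict is None or confidence is None:
        # Refuse to guess: a reply without both fields cannot be weighted.
        raise ValueError(f"{aspect}: missing VERDICT or CONFIDENCE")

    return {
        "aspect": aspect,
        "verdict": verdict.group(1).lower(),
        # Clamp to the documented 0.0-1.0 range.
        "confidence": min(1.0, max(0.0, float(confidence.group(1)))),
        "rationale": rationale.group(1).strip() if rationale else "",
    }


sample = """VERDICT: fail
CONFIDENCE: 0.72
RATIONALE: no handling for None input at src/x.py:42
RESULT_edge-case-verifier=FAIL"""

print(parse_verifier_output("edge-case", sample))
```

Raising on a malformed reply, rather than defaulting to a verdict, keeps a broken verifier from silently shifting the vote.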

A reward event is emitted to ~/.claude/rl/rewards.jsonl recording the panel's decision.
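
The reward event's schema is not documented here, so the following is only a sketch of what appending such an event could look like. Every field name in `event` and the `emit_reward_event` helper are hypothetical; only the file path comes from the text above. The demo writes to a temporary file so the real log is left untouched.

```python
import json
import tempfile
import time
from pathlib import Path

# Path taken from the text above.
REWARDS_PATH = Path.home() / ".claude" / "rl" / "rewards.jsonl"


def emit_reward_event(report: dict, path: Path = REWARDS_PATH) -> dict:
    """Append one JSON line describing the panel's decision.

    Field names are hypothetical; the real schema may differ.
    """
    event = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "source": "aspect-panel",
        "target": report["target"],
        "verdict": report["aggregate"]["verdict"],
        "pass_share": report["aggregate"]["pass_share"],
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier events; one JSON object per line.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event


# Demo against a temporary file, not the real rewards log.
report = {
    "target": "src/x.py",
    "aggregate": {"verdict": "ESCALATE", "pass_share": 0.55, "split": True},
}
demo_path = Path(tempfile.mkdtemp()) / "rewards.jsonl"
emit_reward_event(report, demo_path)
print(demo_path.read_text(encoding="utf-8"))
```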

When to use which

| Stake | Use |
| --- | --- |
| Trivial change (rename, format, comment) | nothing, or `/verify` |
| Routine task closure | `/verify` (single verifier) |
| Pre-commit on a change that touches shared code | `/verify` + `/aspect-panel` |
| Release-blocking, security-sensitive, migration | `/aspect-panel` + `/best-of-n verifier --n 3` |

The panel costs ~5× a single /verify (5 parallel verifiers), so don't run it as if it were free.

Why heterogeneous personas matter

ChatEval (2308.07201) found that homogeneous personas hurt: running 5 identical critics adds noise without signal. Each verifier in our panel has a deliberately different priority, for example paranoid security vs. regression risk vs. maintainability, so they catch different bug classes.

The ESCALATE state is itself the signal: when the weighted vote splits close to 50/50, you don't have consensus, and pretending you do is worse than admitting it.

Anti-patterns to refuse

- Running on a 5-line diff (overkill; the cost dominates the benefit)
- Auto-resolving ESCALATE without surfacing to lead (defeats the design)
- Using only one or two aspects (the value is in coverage breadth; run all 5)
- Treating confidence values as ground truth (they're self-reported; use as a weight, not as truth)
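
A wrapper around the panel could check for the first three anti-patterns before and after a run. The sketch below is purely illustrative: `preflight`, `next_step`, and the `MIN_DIFF_LINES` threshold are hypothetical, and the threshold is an assumption, since the text only says that a 5-line diff is overkill.

```python
REQUIRED_ASPECTS = {"correctness", "edge-case", "security", "regression", "style"}
MIN_DIFF_LINES = 6  # assumption: the text only says a 5-line diff is overkill


def preflight(diff_lines: int, aspects: set[str]) -> list[str]:
    """Return reasons to refuse a panel run; an empty list means go ahead."""
    reasons = []
    if diff_lines < MIN_DIFF_LINES:
        reasons.append(f"diff has {diff_lines} lines; use /verify instead")
    missing = REQUIRED_ASPECTS - aspects
    if missing:
        reasons.append("missing aspects: " + ", ".join(sorted(missing)))
    return reasons


def next_step(verdict: str) -> str:
    """ESCALATE is never auto-resolved; it always goes to the lead."""
    return "surface to lead" if verdict == "ESCALATE" else "report " + verdict


print(preflight(5, {"security", "style"}))
print(preflight(120, REQUIRED_ASPECTS))
print(next_step("ESCALATE"))
```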
