بنقرة واحدة
pony-code-review
// Ensemble code review with specialized reviewer personas. Has full (8-persona) and lightweight (3-persona) modes. Load when conducting a code review of a PR, branch, or local changes.
// Ensemble code review with specialized reviewer personas. Has full (8-persona) and lightweight (3-persona) modes. Load when conducting a code review of a PR, branch, or local changes.
Load the Pony language reference (capabilities, PonyCheck, stdlib pitfalls, mort pattern). Load it before Pony coding sessions.
Ensemble documentation review with specialized reviewer personas. Has full (8-persona) and lightweight (3-persona) modes. Load when reviewing documentation-only changes where code-focused personas don't apply.
Ensemble workflow for producing higher-confidence outputs through decorrelated reasoning paths. Load when the human explicitly requests the ensemble approach.
Property-based and generative testing patterns. Load when writing property-based tests, generators, or generative test suites.
Disciplines for software design work. Load when designing APIs, type systems, features, or system boundaries. Counters the tendency to retrieve familiar patterns instead of discovering what the problem actually needs. Has full (8-persona) and lightweight (5-persona) modes.
Two-stage ensemble for planning meaningful tests. Load when writing tests for new features or reviewing test quality. Counters the tendency to write tests that exercise the stdlib instead of your code. Has full (8-persona) and lightweight (5-persona) modes.
| name | pony-code-review |
| description | Ensemble code review with specialized reviewer personas. Has full (8-persona) and lightweight (3-persona) modes. Load when conducting a code review of a PR, branch, or local changes. |
| disable-model-invocation | false |
Ensemble code review that synthesizes findings from specialized reviewer personas. Has two modes: full (8 personas, iterative re-review) and lightweight (3 personas, single pass). Not for one-line config changes or typo fixes — ask permission to skip review for those.
The skill has two modes: full and lightweight. The orchestrator selects the appropriate mode based on the criteria below and proceeds. Report the mode choice when presenting results — the human can request full mode if lightweight was used and they want deeper coverage.
Full mode is the default. Use it when:
Lightweight mode is for small, bounded changes within established patterns:
When in doubt, use full mode. Lightweight is appropriate when the change is clearly bounded — the orchestrator should be able to state what makes it bounded and why fewer personas are sufficient.
Note: the pony-software-design skill's lightweight mode follows a similar pattern (mode selection, single pass, escalation) but reduces differently — it has two stages and shrinks only the evaluation stage (5→2 personas). Code-review has a single stage, so the reduction is in persona count (8→3). The shared pattern is the mode selection and escalation structure.
These modes apply to both full and lightweight:
Integrated (pre-PR pipeline): The implementer runs pony-code-review as part of their pre-PR workflow. In full mode, findings are triaged, unambiguous ones are fixed, and the loop repeats until clean. In lightweight mode, findings are triaged and fixed in a single pass. See the relevant process section below.
Standalone: Invoked directly on an existing PR, branch, or local changes for a one-shot review. The process section for the selected mode applies as-is.
Use the ensemble workflow with review-specific customizations. Load pony-ensemble for the mechanical process. This skill replaces the generic attention focuses with review personas (8 for full mode, 3 for lightweight) and replaces the generic agent output format with the review-specific format defined below.
The Adversarial persona satisfies the ensemble's "fix reviews require an adversarial focus" requirement — when the review target is a fix, the Adversarial persona naturally focuses on showing the fix is incomplete per its identity statement.
Persona agents write detailed evidence to files and return structured summaries to the orchestrator. The synthesizer works from summaries and digs into evidence files only when it needs to examine a finding more closely. This prevents context overload during synthesis.
The pony-synthesize skill's output format (Integrated Result, Synthesis Rationale, etc.) is not a natural fit for code review. The orchestrator instructs the synthesizer to produce output in the review-specific final format (below) rather than the generic synthesizer format. No changes to pony-synthesize itself.
When principles conflict, these values set the priority. We value the left side over the right — but the right side still matters when the left isn't at stake.
API safety over API minimality — an error-prone API should be fixed even if the fix adds surface. Prefer solutions that don't expand the API, but never leave a footgun to preserve minimality.
Correctness over performance — never sacrifice correctness for speed. Get it right first, then optimize. A faster wrong answer is still wrong.
Correctness over concision — correct but verbose beats concise but wrong. Don't simplify code or APIs at the cost of correct behavior.
Security over performance — never skip validation at trust boundaries for speed. Optimize how you validate, not whether you validate. Security is correctness.
Interface simplicity over implementation simplicity — accept a harder implementation to give users a clean interface. The consumer's experience matters more than the implementer's convenience.
Performance over interface simplicity — runtime speed matters more than programmer convenience. It's acceptable to make things harder on the user to improve performance, but never at the cost of correctness.
Simplicity over consistency — don't force artificial consistency when it makes things harder to use. If two similar things genuinely need different interfaces, let them be different.
Explicitness over implicitness — when the language allows something to work by magic (implicit conversions, convention-based wiring, unnamed dependencies), prefer the version that states what's happening. The cost of a few extra characters is less than the cost of reconstructing hidden knowledge.
Type safety over convenience — use the type system to encode constraints even when it's more work. Distinct types for distinct semantics, validated wrappers over raw primitives, explicit error vocabularies over generic errors. "We could just use a String here" is almost always wrong.
Changeability over predictive design — make designs modular and replaceable so future needs can be accommodated, but don't add abstractions, extension points, or features for changes that haven't happened yet. Easy to modify beats designed for a specific predicted modification.
Identify the review target. PR URL, branch name, or local changes. If not specified, ask. Resolve the target into concrete instructions for agents: the base branch to diff against, the git diff command or gh pr diff command to run, and any design doc or issue URLs for context.
Build and run tests once. Capture the build output and test results. These results are provided to persona agents that need them — no agent runs tests itself. If the build fails or tests fail, proceed with the review anyway — provide the failure output to all personas and note whether the failure is pre-existing or introduced by the change.
Create a temporary directory for evidence files. Use ~/tmp/code-review-<timestamp>/. Each persona will write its detailed evidence to a file in this directory. Generate the file path for each persona (e.g., correctness-evidence.md) and pass it in the prompt.
Spawn 8 persona agents in parallel, each as a fresh-context sub-agent using your most capable model. Each agent's prompt includes:
personas/. When a persona's context loading references an external skill (e.g., pony-test-design, pony-pbt-patterns, pony-ref), read that skill's content and include it in the agent prompt.references/principles.md (alongside this skill) and include its content in the agent prompt, the same way referenced skills are injected above. Also instruct the agent to read the project AGENTS.md if one exists (not all projects have one; if absent, note it and proceed) and to follow its conventions, including loading any skills it references.For the Wildcard persona specifically: include the identity statement (first paragraph) from each of the other 7 personas so the wildcard knows what territory is already covered.
Triage agent outputs per ensemble protocol — check that each persona addressed the actual code and stayed coherent.
Pass triaged persona summaries to a synthesis agent loaded with pony-synthesize, plus the review-specific synthesis focus (below). Provide the paths to each persona's evidence file so the synthesizer can dig in when needed. Instruct the synthesizer to produce its output in the final review format (below) rather than the generic synthesizer format. If this is a re-review (iterative mode), include the full review history: for each prior round, what was found, what was fixed, what was parked. The synthesizer uses this to verify fixes, avoid re-flagging parked items, and detect convergence failures.
Reviewer loop on the synthesis — the reviewer verifies that no persona findings were dropped, severity changes from individual findings are justified, and cross-persona patterns were correctly identified.
Present the consolidated review.
When pony-code-review runs as part of the pre-PR pipeline, findings go back to the implementer for triage and fixing. The loop repeats until clean.
The implementer categorizes each finding. Every finding must be categorized — no finding is silently dropped, regardless of severity.
After fixing the "fix" items, pony-code-review runs again with two key rules:
The synthesizer monitors the review history for signs that point fixes aren't converging — that the code area has a structural problem no amount of individual fixes will resolve. Signals:
When the synthesizer detects a convergence failure, it escalates a structural question: "This area has produced findings across N review rounds. The fixes are adding complexity rather than removing it. Is the underlying data structure / abstraction / design right for this problem?" This is always parked — it's a design decision, not something the implementer resolves autonomously.
The structural question should be specific: name the data structure or abstraction, describe the pattern of bugs it's producing, and suggest what kind of replacement might eliminate the bug class. Don't just say "consider a redesign" — say "the radix tree is producing segment-boundary bugs because it operates at the character level; a segment-level trie would eliminate this class structurally."
The loop ends when no findings remain except parked and out-of-scope items. At that point, open the PR with the parked items listed in the PR description or as a PR comment so the human can weigh in — never in commit messages. Commit messages are for change rationale only; parked items are transient review artifacts that don't belong in git history. If the human's direction on parked items requires changes, make them and run a final pony-code-review pass to confirm.
Lightweight mode runs 3 personas in a single pass with no iterative re-review. Load pony-ensemble for the mechanical process.
Core pair (always run):
| File | Focus |
|---|---|
correctness.md | Logic, edge cases, completeness for valid inputs |
adversarial.md | Concrete break scenarios, backward from failure |
Context-dependent slot (pick 1):
| When | Pick |
|---|---|
| Tests changed or should have been changed | tests.md |
| API surface changed | api-design.md |
| Change touches trust boundaries or external input | security.md |
| Change is on a hot path or introduces coordination points | performance.md |
Pick whichever is most relevant to the change. If multiple conditions apply, pick the most relevant one. If the reason for picking a particular persona is a change characteristic that also appears in the full-mode selection criteria, that's a signal the change warrants full mode — don't use the persona pick to compensate for a wrong mode selection.
Not included in lightweight:
Identify the review target. Same as full mode — PR URL, branch name, or local changes. Resolve into concrete diff instructions for agents.
Build and run tests once. Same as full mode — capture output, proceed with review even if build or tests fail.
Create a temporary directory for evidence files. Use ~/tmp/code-review-<timestamp>/. Same convention as full mode.
Spawn 3 persona agents in parallel, each as a fresh-context sub-agent using your most capable model. Each agent's prompt includes:
personas/. When a persona's context loading references an external skill (e.g., pony-test-design, pony-pbt-patterns, pony-ref), read that skill's content and include it in the agent prompt.references/principles.md (alongside this skill) and include its content in the agent prompt, the same way referenced skills are injected above. Also instruct the agent to read the project AGENTS.md if one exists (not all projects have one; if absent, note it and proceed) and to follow its conventions, including loading any skills it references.When the review target is a fix, the Adversarial persona's prompt still includes the fix-specific focus from the ensemble protocol: construct a concrete scenario where the bug still occurs despite the fix.
The Wildcard-specific instruction (providing other personas' identity statements) does not apply — Wildcard is not part of lightweight mode.
Triage agent outputs per ensemble protocol — check that each persona addressed the actual code and stayed coherent.
Pass triaged persona summaries to a synthesis agent loaded with pony-synthesize, plus the lightweight synthesis focus (below). Provide the paths to each persona's evidence file so the synthesizer can dig in when needed. Instruct the synthesizer to produce its output in the final review format (below) rather than the generic synthesizer format — same override as full mode.
Reviewer loop on the synthesis — same checks as full mode: verify no persona findings were dropped, severity changes from individual findings are justified, and cross-persona patterns were correctly identified.
Present the consolidated review in the same output format as full mode.
Same categories as full mode:
No re-review loop. Fix the findings and proceed to opening the PR.
If the review produces an unexpectedly high density of findings relative to the change size, if a finding reveals the approach is fundamentally wrong, or if a finding reveals the change touches more subsystems or has more complex interactions than the mode selection assumed, the orchestrator presents this to the human. The human decides what to do — run full mode from scratch, fix directly, rethink the approach, or something else. Lightweight doesn't prescribe the response; it presents the information.
The synthesizer should pay special attention to:
The synthesizer should pay special attention to:
Findings grouped by severity, then by location:
Critical (must fix before merge) High (should fix — real risk if left) Medium (would improve the code, not blocking) Low (suggestions, style)
Each finding:
file:lineFollowed by:
Include these instructions in every persona agent's prompt.
Each persona produces two artifacts:
Written to the file path provided by the orchestrator. Contains the full detailed analysis: every finding with complete evidence, full code excerpts, detailed reasoning, complete pass/fail evaluations. This is the authoritative record.
A structured summary for the synthesizer to work from:
Findings — ordered by severity (Critical > High > Medium > Low). Each:
file:lineConfidence calibration: High = verified by reading code or running tests. Medium = strong inference from code structure, not directly verified. Low = inferred from patterns or conventions, may not apply.
Passes — key things checked that look correct. Brief.
Uncertainties — things the persona couldn't determine, and why.
The persona documents are in personas/. Full mode uses all 8; lightweight uses Correctness + Adversarial + one context-dependent persona (see "Process: Lightweight Mode" for selection criteria).
| File | Focus |
|---|---|
correctness.md | Logic, edge cases, completeness for valid inputs |
adversarial.md | Concrete break scenarios, backward from failure |
api-design.md | Consumer experience, naming, footguns, pattern conformance |
security.md | Trust boundaries, injection, auth, resource bounds |
performance.md | Architectural bottlenecks, then local waste |
tests.md | Test quality, missing tests, counterfactual reasoning |
principles.md | Systematic principles audit with evidence |
wildcard.md | Chaos agent — finds what the others miss |