con un clic
pony-docs-review
// Ensemble documentation review with specialized reviewer personas. Has full (8-persona) and lightweight (3-persona) modes. Load when reviewing documentation-only changes where code-focused personas don't apply.
// Ensemble documentation review with specialized reviewer personas. Has full (8-persona) and lightweight (3-persona) modes. Load when reviewing documentation-only changes where code-focused personas don't apply.
Load the Pony language reference (capabilities, PonyCheck, stdlib pitfalls, mort pattern). Load it before Pony coding sessions.
Ensemble code review with specialized reviewer personas. Has full (8-persona) and lightweight (3-persona) modes. Load when conducting a code review of a PR, branch, or local changes.
Ensemble workflow for producing higher-confidence outputs through decorrelated reasoning paths. Load when the human explicitly requests the ensemble approach.
Property-based and generative testing patterns. Load when writing property-based tests, generators, or generative test suites.
Disciplines for software design work. Load when designing APIs, type systems, features, or system boundaries. Counters the tendency to retrieve familiar patterns instead of discovering what the problem actually needs. Has full (8-persona) and lightweight (5-persona) modes.
Two-stage ensemble for planning meaningful tests. Load when writing tests for new features or reviewing test quality. Counters the tendency to write tests that exercise the stdlib instead of your code. Has full (8-persona) and lightweight (5-persona) modes.
| name | pony-docs-review |
| description | Ensemble documentation review with specialized reviewer personas. Has full (8-persona) and lightweight (3-persona) modes. Load when reviewing documentation-only changes where code-focused personas don't apply. |
| disable-model-invocation | false |
Ensemble documentation review that synthesizes findings from specialized reviewer personas. Has two modes: full (8 personas, iterative re-review) and lightweight (3 personas, single pass). Not for one-line typo fixes or trivial formatting changes — for those, a lightweight self-review is sufficient.
The skill has two modes: full and lightweight. The orchestrator selects the appropriate mode based on the criteria below and proceeds. Report the mode choice when presenting results — the human can request full mode if lightweight was used and they want deeper coverage.
Full mode is the default. Use it when:
Lightweight mode is for small, bounded documentation changes:
When in doubt, use full mode. Lightweight is appropriate when the change is clearly bounded — the orchestrator should be able to state what makes it bounded and why fewer personas are sufficient.
Note: the pony-code-review skill follows a similar mode-selection pattern but with code-specific criteria and a different persona roster. Pony-docs-review and pony-code-review are separate skills for separate concerns — documentation review and code review have different failure modes, different severity calibration, and different reviewer lenses.
These modes apply to both full and lightweight:
Integrated (pre-PR pipeline): The implementer runs pony-docs-review as part of their pre-PR workflow for documentation-only changes, in place of pony-code-review. In full mode, findings are triaged, unambiguous ones are fixed, and the loop repeats until clean. In lightweight mode, findings are triaged and fixed in a single pass. See the relevant process section below.
Standalone: Invoked directly on an existing PR, branch, or local changes for a one-shot documentation review. The process section for the selected mode applies as-is.
Use the ensemble workflow with documentation-review-specific customizations. Load pony-ensemble for the mechanical process. This skill replaces the generic attention focuses with documentation review personas (8 for full mode, 3 for lightweight) and replaces the generic agent output format with the review-specific format defined below.
The ensemble protocol requires an adversarial focus when reviewing fixes. Pony-docs-review does not have a dedicated adversarial persona because documentation "fixes" (correcting factual errors, filling gaps) don't have the same failure mode as code fixes — there's no analog to "the bug still occurs despite the fix." When a pony-docs-review targets a documentation fix, the Accuracy persona naturally covers the adversarial concern: verifying that the corrected information is actually correct and that the fix doesn't introduce new inaccuracies. This satisfies the ensemble protocol's "fix reviews require an adversarial focus" requirement — no additional adversarial agent is needed. The orchestrator should note this in the Accuracy persona's prompt when the review target is a fix.
Persona agents write detailed evidence to files and return structured summaries to the orchestrator. The synthesizer works from summaries and digs into evidence files only when it needs to examine a finding more closely. This prevents context overload during synthesis.
The pony-synthesize skill's output format is not a natural fit for documentation review. The orchestrator instructs the synthesizer to produce output in the review-specific final format (below) rather than the generic synthesizer format. No changes to pony-synthesize itself.
Identify the review target. PR URL, branch name, or local changes. If not specified, ask. Resolve the target into concrete instructions for agents: the base branch to diff against, the git diff command or gh pr diff command to run, and any design doc or issue URLs for context.
Gather context. Read the documentation being changed in full. Identify:
At minimum, gather and distribute: Accuracy gets source code, Consistency gets related docs, Principles gets style guides. Each persona's context loading section specifies additional context needs — check them when building prompts. All personas get the full changed documentation.
Create a temporary directory for evidence files. Use ~/tmp/pony-docs-review-<timestamp>/. Each persona will write its detailed evidence to a file in this directory. Generate the file path for each persona (e.g., accuracy-evidence.md) and pass it in the prompt.
Spawn 8 persona agents in parallel, each as a fresh-context sub-agent using your most capable model. Each agent's prompt includes:
personas/. When a persona's context loading references an external skill (e.g., pony-ref), read that skill's content and include it in the agent prompt.references/principles.md (alongside this skill) and include its content in the agent prompt, the same way referenced skills are injected above. Also instruct the agent to read the project AGENTS.md if one exists (not all projects have one; if absent, note it and proceed) and to follow its conventions, including loading any skills it references.For the Wildcard persona specifically: include the identity statement (first paragraph) from each of the other 7 personas so the wildcard knows what territory is already covered.
Triage agent outputs per ensemble protocol — check that each persona addressed the actual documentation and stayed coherent.
Pass triaged persona summaries to a synthesis agent loaded with pony-synthesize, plus the pony-docs-review-specific synthesis focus (below). Provide the paths to each persona's evidence file so the synthesizer can dig in when needed. Instruct the synthesizer to produce its output in the final review format (below) rather than the generic synthesizer format. If this is a re-review (iterative mode), include the full review history: for each prior round, what was found, what was fixed, what was parked. The synthesizer uses this to verify fixes were addressed, avoid re-flagging parked items, and detect convergence failures.
Reviewer loop on the synthesis — the reviewer verifies that no persona findings were dropped, severity changes from individual findings are justified, and cross-persona patterns were correctly identified.
Present the consolidated review.
When pony-docs-review runs as part of the pre-PR pipeline, findings go back to the implementer for triage and fixing. The loop repeats until clean.
The implementer categorizes each finding. Every finding must be categorized — no finding is silently dropped, regardless of severity.
After fixing the "fix" items, pony-docs-review runs again with two key rules:
The synthesizer monitors the review history for signs that point fixes aren't converging — that the documentation area has a structural problem no amount of individual fixes will resolve. Signals:
When the synthesizer detects a convergence failure, it escalates a structural question: "This section has produced findings across N review rounds. The fixes are adding qualifications rather than clarifying. Is the underlying organization / audience targeting / conceptual framework right for this content?" This is always parked — it's a structural decision, not something the implementer resolves autonomously.
The structural question should be specific: name the section, describe the pattern of findings it's producing, and suggest what kind of restructuring might eliminate the finding class. Don't just say "consider rewriting" — say "this section is trying to teach concept X before the reader has concept Y; moving section 3 before section 2 would eliminate this class of completeness findings."
The loop ends when no findings remain except parked and out-of-scope items. At that point, open the PR with the parked items listed in the PR description or as a PR comment so the maintainer can weigh in — never in commit messages. Commit messages are for change rationale only; parked items are transient review artifacts that don't belong in git history. If the maintainer's direction on parked items requires changes, make them and run a final pony-docs-review pass to confirm.
Lightweight mode runs 3 personas in a single pass with no iterative re-review. Load pony-ensemble for the mechanical process.
Core pair (always run):
| File | Focus |
|---|---|
accuracy.md | Technical correctness — cross-references claims against source code |
completeness.md | Coverage gaps — missing prerequisites, skipped steps, unstated assumptions |
Context-dependent slot (pick 1):
| When | Pick |
|---|---|
| Tutorial, getting-started, or guide content | reader-experience.md |
| Reference documentation or API docs | consistency.md |
| Large document or multi-page change | structure.md |
| Established style guide applies | principles.md |
Pick whichever is most relevant to the change. If multiple conditions apply, pick the most relevant one. If the reason for picking a particular persona is a change characteristic that also appears in the full-mode selection criteria, that's a signal the change warrants full mode — don't use the persona pick to compensate for a wrong mode selection. If none of the conditions clearly apply, the change may be simple enough to skip pony-docs-review and rely on a lightweight self-review alone.
Not included in lightweight:
Identify the review target. Same as full mode — PR URL, branch name, or local changes. Resolve into concrete diff instructions for agents.
Gather context. Same as full mode — read changed documentation in full, identify referenced source code, related docs, and style guides.
Create a temporary directory for evidence files. Use ~/tmp/pony-docs-review-<timestamp>/. Same convention as full mode.
Spawn 3 persona agents in parallel, each as a fresh-context sub-agent using your most capable model. Each agent's prompt includes:
personas/. When a persona's context loading references an external skill, read that skill's content and include it in the agent prompt.references/principles.md (alongside this skill) and include its content in the agent prompt, the same way referenced skills are injected above. Also instruct the agent to read the project AGENTS.md if one exists (not all projects have one; if absent, note it and proceed) and to follow its conventions, including loading any skills it references.When the review target is a fix, the Accuracy persona's prompt includes the fix-specific focus: verify that the corrected information is actually correct and that the fix doesn't introduce new inaccuracies.
The Wildcard-specific instruction (providing other personas' identity statements) does not apply — Wildcard is not part of lightweight mode.
Triage agent outputs per ensemble protocol — check that each persona addressed the actual documentation and stayed coherent.
Pass triaged persona summaries to a synthesis agent loaded with pony-synthesize, plus the lightweight synthesis focus (below). Provide the paths to each persona's evidence file so the synthesizer can dig in when needed. Instruct the synthesizer to produce its output in the final review format (below) rather than the generic synthesizer format — same override as full mode.
Reviewer loop on the synthesis — same checks as full mode: verify no persona findings were dropped, severity changes from individual findings are justified, and cross-persona patterns were correctly identified.
Present the consolidated review in the same output format as full mode.
Same categories as full mode:
No re-review loop. Fix the findings and proceed to opening the PR.
If the review produces an unexpectedly high density of findings relative to the change size, if a finding reveals the content is fundamentally wrong, or if a finding reveals the change touches more documentation areas or has more complex interactions than the mode selection assumed, the orchestrator presents this to the human. The human decides what to do — run full mode from scratch, fix directly, rethink the approach, or something else. Lightweight doesn't prescribe the response; it presents the information.
The synthesizer should pay special attention to:
The synthesizer should pay special attention to:
Findings grouped by severity, then by location:
Critical (must fix before merge) High (should fix — real risk if left) Medium (would improve the docs, not blocking) Low (suggestions, style)
Each finding:
file:lineFollowed by:
Include these instructions in every persona agent's prompt.
Each persona produces two artifacts:
Written to the file path provided by the orchestrator. Contains the full detailed analysis: every finding with complete evidence, full text excerpts, detailed reasoning, complete pass/fail evaluations. This is the authoritative record.
A structured summary for the synthesizer to work from:
Findings — ordered by severity (Critical > High > Medium > Low). Each:
file:lineConfidence calibration: High = verified by reading source code, checking cross-references, or demonstrating a specific misreading (e.g., constructing an alternative interpretation of an ambiguous sentence). Medium = strong inference from document structure or content, not directly verified. Low = inferred from patterns or conventions, may not apply.
Passes — key things checked that look correct. Brief.
Uncertainties — things the persona couldn't determine, and why.
The persona documents are in personas/. Full mode uses all 8; lightweight uses Accuracy + Completeness + one context-dependent persona (see "Process: Lightweight Mode" for selection criteria).
| File | Focus |
|---|---|
accuracy.md | Technical correctness — cross-references claims against source code |
completeness.md | Coverage gaps — missing prerequisites, skipped steps, unstated assumptions |
clarity.md | Language quality — ambiguity, jargon, unclear antecedents |
structure.md | Organization and flow — information architecture, concept ordering, findability |
consistency.md | Internal and external consistency — terminology, formatting, cross-references |
reader-experience.md | End-to-end journey — can the audience achieve their goal? |
principles.md | Systematic check against documentation conventions and principles |
wildcard.md | Chaos agent — finds what the others miss |