| name | hardwood-review |
| description | Review a Hardwood PR, local branch, or staged diff against the project's CLAUDE.md conventions and recurring failure modes. Use whenever the user asks to "review", "look over", "check", "audit", or "assess" a PR number, a branch, or pending changes in the hardwood-hq/hardwood repo. Also use when the user pastes a PR URL or says they want feedback before merging. Produces a checkbox markdown file with findings grouped by priority. |
Hardwood code review
Project-specific code review. Encodes the conventions in /workspace/CLAUDE.md and the recurring failure modes the maintainer has surfaced in past reviews, so the review covers them by construction instead of by memory.
Priority frame: the Code Review Pyramid
Review effort is weighted by how expensive the issue is to fix after merge. Width = weight: the wide base is the load-bearing tier, the narrow apex is the least important.
┌────────────────────┐
│ Code style │ ← apex. Automate it; don't burn review cycles here.
┌┴────────────────────┴┐
│ Tests │
┌┴──────────────────────┴┐
│ Documentation │ ← focus your review effort
┌┴────────────────────────┴┐
│ Implementation │ ← focus your review effort
┌┴──────────────────────────┴┐
│ API semantics │ ← base. Hardest to change later — spend the most thought here.
└────────────────────────────┘
When writing the findings file, sort by pyramid tier, not by checklist section. A small API-shape concern outranks a big style nit. Style items (no var, JavaDoc syntax, formatting) belong in a "Nits" footer — flag them so the author can fix in passing, but don't elevate them to "Blockers".
Anchor reference: https://www.morling.dev/images/code_review_pyramid.svg
Workflow
1. Identify the target
Parse the user's request for one of:
- PR number (e.g. "418") →
gh pr view <n> and gh pr diff <n>
- PR URL → extract number, same as above
- Branch name →
git diff main...<branch> (or whatever the main branch is)
- No argument →
git diff (working tree) plus git diff --staged; if both empty, gh pr list and ask which one
For PR diffs, persist large output to a file and read it in chunks rather than letting it land in the conversation as one blob. The diff for a non-trivial PR can run 5k+ lines.
2. Read the design context
If the PR touches a new design area, look in _designs/ for a matching markdown file. Hardwood requires non-trivial changes to land a design doc; if one is expected but missing, that's a finding. If one exists, skim it before reading code so the review can flag drift between intent and implementation.
3. Run the checklist
Walk every item in references/checklist.md. Each item has a brief rationale and the failure mode it catches. Skip items that don't apply to the diff (e.g. Dive TUI rules on a core-only change). Do not skip items just because they look unlikely — the checklist exists because each one has been missed at least once.
4. Cross-check tests
For every new behaviour or matcher, check that test coverage is breadth-first across the type/op axes, not just the one the author exercised. Common gap: an oracle test that proves parity for two types when the implementation supports six. List explicitly which (type, op) pairs are not covered.
5. Check doc/code drift
When a PR adds a design doc, JavaDoc, or user-facing docs/content/*.md, read at least the load-bearing claims (gate descriptions, eligibility rules, supported types) and grep the code to confirm the wording matches. Mismatches between "≥ 2 distinct columns" in the doc and leaves.size() < 2 in the code are the dominant doc-bug class.
6. Prune to signal
Before writing the file, walk every candidate finding and apply the inclusion bar:
A finding describes something that is wrong: code that breaks, will regress under a plausible future change, violates a stated CLAUDE.md / design rule, or has a missing safety property the project relies on elsewhere.
Cut anything that fails this bar. Common cuts:
- Non-findings. "No public API change here, theme usage looks fine." If a pyramid tier has no findings, omit the section entirely — do not write a section header followed by a reassurance bullet.
- Taste calls without a stated rule. "Consider renaming
kv to kvLines", "prefer Optional over null here." If CLAUDE.md or a design doc doesn't mandate it and the existing code is correct, drop it.
- Micro-optimisations on cold paths. Render-tier UI code, one-shot init, test fixtures. Flag allocations / branches only on paths that run thousands of times per second or are documented hot.
- Test-style polish when the test is correct. A brittle-but-correct assertion is not a finding; a missing test for a real edge case is. Worked cut: a test that asserts exact substrings depending on a width / indent constant. The test passes today. "It will break if anyone touches the constant" is not a defect, it's a property of every assertion. Don't flag. (A missing test that would catch the height-overflow bug at a different breakpoint is a defect — that survives.)
- PR-process / description hygiene. "The 'docs updated' checkbox is unchecked, please confirm." That belongs in a PR comment, not the review file.
Self-edit prompt for each candidate: "If the author shipped this exact line tomorrow, what specifically breaks, regresses, or fails a CI check / convention?" If you can't answer concretely in one sentence, delete the finding.
Lift decisions out of the findings. Some surviving items aren't pure fix-it work — they're forks the maintainer needs to answer. Move these to the ## Decisions section instead of leaving them mixed in with defects. The tell-tale shapes:
- "Either X or Y" phrasing ("drop the dead branch or move the helper", "document the
-1 or remove it").
- Trade-offs the maintainer owns ("keep the per-class duplication for JIT specialisation or extract a shared helper" — project-wide pattern, not a local fix).
- Sub-pattern changes ("constructor-inject the filter or keep the setter" — both work; the choice is policy).
Decisions should be selectable — the maintainer can tick the option they pick and the file becomes a record of the answer. Format each as:
- **Q:** <question, one line>
- [ ] **A.** <option, one clause — say what *changes* if this is picked>
- [ ] **B.** <option>
- [ ] **C.** <"keep as-is" — include this when the question implies a change, so disagreement has an explicit checkbox>
- **Rec:** <A/B/C> — <one-line justification>
Always include a Rec: line, even if it's "no strong opinion — A reads cleaner". Posing a decision without doing the analysis homework just defers the work. Two options are typical; three when "keep as-is" is plausible; rarely more.
Worked example:
- **Q:** The `split("\n", -1)` branch in `wrapValue` is unreachable — no description contains a newline. Drop it or move the helper somewhere multi-line is exercised?
- [ ] **A.** Drop the branch; `wrapValue` stays in `HelpOverlay`.
- [ ] **B.** Move `wrapValue` to a shared `Strings` helper and exercise the multi-line path there (also resolves the duplication with `DataPreviewScreen`).
- [ ] **C.** Keep as-is.
- **Rec:** B — collapses two findings into one fix.
Items that are unambiguous fixes (one right answer, just hasn't been done) stay in their tier section, not in Decisions.
A decision must have at least two defensible branches. Some findings carry "either X or Y" phrasing but only one branch is a real fix — the other is just "document the magic / explain the constant", which is the lazy non-fix. These are not decisions. They stay under their tier (usually Implementation) as a defect.
Worked example:
wrapValue(description, Math.max(1, descBudget - 1)) — the -1 is unexplained. Either drop it or add a one-liner why.
The two branches are fix it vs paper over it with a comment. Only the first is a real answer. Stays under Implementation semantics as a defect, not a decision.
Compare to a real decision:
Drop the dead split("\n") branch, or move wrapValue to a shared helper where the multi-line case is actually exercised?
Both branches are defensible end-states — one shrinks the surface, the other widens reuse. Goes to Decisions.
If lifting a finding to Decisions would make the underlying defect disappear (because the question is "should we fix this or not"), keep the defect under its tier and only raise a Decision for genuine forks in how to fix.
Better to ship a short review with 5 real defects than a long one where the maintainer has to filter the signal out.
7. Write the findings file
Write to _reviews/pr-<N>-review.md (or _reviews/branch-<name>-review.md for non-PR runs). The _reviews/ directory already exists; do not write findings to the repo root. Format:
# PR #<N> — <title> — Review actions
<https://github.com/hardwood-hq/hardwood/pull/<N>>
## Summary
**What:** <1–2 sentences. The actual change, in your own words. Not a copy of the PR description — what the diff does, structurally.>
**Why:** <1 sentence. The motivator, from the PR description / linked issue / design doc. If you can't tell, say "Motivator not stated in the PR or linked issue." — don't invent one.>
**Assessment:** <1–2 sentences. Headline verdict: ready to merge / ready with nits / needs work / not ready, and the single most load-bearing reason. The detail lives in the sections below; this is the elevator pitch.>
## Decisions
- **Q:** <question or fork — one line>
- [ ] **A.** <option 1 — what changes if you pick this>
- [ ] **B.** <option 2>
- [ ] **C.** <option 3 — typically "keep as-is" if a change is on the table>
- **Rec:** <A/B/C> — <one-line justification>
## Blockers
- [ ] <item> — one-sentence why
## API semantics
- [ ] ...
## Implementation semantics
- [ ] ...
## Documentation
- [ ] ...
## Tests
- [ ] ...
## Nits
- [ ] ...
Emit only sections that have at least one entry. A pyramid tier (or Decisions / Blockers) with nothing in it gets no section header — silence is the signal. The template above lists all seven possible sections; a real review usually has two or three.
Use [ ] not [x] — the maintainer checks items off as they're addressed (per CLAUDE.md "Code Reviews" section).
8. Hand back a short summary
After writing the file, give the user a 3–5 sentence summary: what the PR does, the highest-priority finding(s), and the path to the findings file. Do not repeat the whole list inline — they can read the file.
When NOT to use this skill
- Reviewing a PR in a different repo. The checklist is hardwood-specific.
- Asked to apply a fix or implement a feature. This skill produces findings, not edits.
- Asked for a one-line opinion ("does this look OK?"). Just answer.
Output discipline
- Findings file uses
[ ] checkboxes per CLAUDE.md.
- Group by pyramid tier (see template). Sections with no findings are omitted entirely — do not write a section header followed by "looks fine" or a non-finding bullet.
- Each item is one sentence stating the issue and one sentence (or clause) on why it matters / what fails if ignored. No multi-paragraph justifications.
- Reference specific files / line ranges where useful. Don't fabricate line numbers from memory — if you're not sure, drop the number.
- If a finding is judgment-dependent (e.g. "consider A/B-ing this duplication"), say so. Don't dress up an opinion as a defect.
- A short review with five real defects beats a long one where the maintainer has to filter the signal out. When you find yourself reaching for noise to "look thorough", stop.