code-review
---
name: code-review
description: Principal engineer code review of changed/new code before presenting to user
allowed-tools: Read, Grep, Glob, Bash
---
Review the code that was just written or modified. Act as a principal engineer reviewing a junior engineer's work. Be thorough but not pedantic.
Before reviewing, determine which files were changed (from context, git diff, or the conversation). Classify each changed file into one or more domains:
- Infrastructure: .github/, *.tf, Dockerfile, docker-compose*, CI/CD configs
- Database: **/migrations/**, *.sql (outside warehouse-model paths), schema definitions, CDC / change-stream config, ETL/ELT pipeline code, warehouse ingestion connector configs
- Warehouse: models/**/*.sql, models/**/*.yml, dbt_project.yml, macros/**, tests/** (dbt-style), seeds/**, scheduled-query / view definitions in warehouse-config repos, semantic-layer files
- Frontend: *.tsx, *.jsx, *.css, src/components/**, src/pages/**
- Backend: *.go, *.py, NoSQL access-pattern code, schema design files (Prisma, ORM models, NoSQL ODM models)
- Claude config: .claude/**
- Lovable: .lovable/**

Apply the Base checklist always. Apply each Domain checklist only when at least one changed file matches that domain.
If a project-specific layer exists for this skill, invoke it now and merge its checklist into the items below. Glob for .claude/skills/code-review-*/SKILL.md from the repo root (resolved via git rev-parse --show-toplevel); if exactly one matches, invoke it via the Skill tool. If multiple match, list them and stop — that's a config error in the project, not a review you can resolve. If none match, proceed without a layer.
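The one/none/many branching of the layer-discovery step can be sketched in shell. This is an illustrative helper, not part of the skill's tooling; the actual invocation happens via the Skill tool, which the sketch only names in its output:

```shell
# find_review_layer: locate project-specific code-review layers and report
# which of the three cases applies (none / exactly one / config error).
find_review_layer() {
  root=$(git rev-parse --show-toplevel 2>/dev/null) || root=.
  # With no match, the glob stays literal, so the -e test catches it.
  set -- "$root"/.claude/skills/code-review-*/SKILL.md
  if [ ! -e "$1" ]; then
    echo "none"             # proceed without a layer
  elif [ $# -eq 1 ]; then
    echo "invoke $1"        # exactly one match: invoke via the Skill tool
  else
    echo "error:multiple"   # config error: list the matches and stop
  fi
}
```

The `set --` trick turns glob expansion into positional parameters, which makes the zero/one/many distinction a plain `$#` check.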
Before reviewing for gaps, answer: is the implementation appropriately sized for what the change needed to accomplish? Gap-finding on an over-elaborate change elaborates it further, and the checklist won't surface "the whole implementation is the wrong shape."
Markers of over-elaboration:
If over-elaborated: stop. Surface the simpler implementation as the primary review output before any checklist item. Otherwise proceed to the checklist.
Question implementation choices, not feature scope. Feature scope is fixed by the ticket; implementation choice is reviewer-tunable.
Evaluate the code against each item. Only flag items where there is a concrete issue — do not flag items just to show you checked them.
Read every changed file fully, including generated ones. Generated files (Supabase types, OpenAPI clients, GraphQL codegen) need the same scrutiny — runners like Vitest+esbuild strip types without validating, so npm warnings or build noise can leak into the file head undetected. Check head -5 of generated files even when tests are green.
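The head check can be mechanized; a minimal sketch (the noise patterns are an assumption tuned to npm-style warnings, not an exhaustive list):

```shell
# check_generated_head: flag generated files whose first lines contain
# build noise (npm warnings, deprecation banners) that a type-stripping
# runner like Vitest+esbuild would silently tolerate.
check_generated_head() {
  for f in "$@"; do
    if head -5 "$f" | grep -qiE '^npm (warn|err)|deprecated'; then
      echo "suspect header: $f"
    fi
  done
}
```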
API misuse — Are libraries, frameworks, and language APIs used as designed? Flag any reliance on accidental or undocumented behavior (passing invalid arguments that happen to work, using internal methods, relying on side effects of unrelated calls).
Error handling changes — Are there catch blocks, fallback defaults, or error handlers that hide failures the caller would want to know about? Empty catches, catch-and-return-null, and catch-and-log-only are all suspects. If this change modifies error/fallback behavior, trace the error paths — most production incidents come from changed catch blocks and silent fallbacks, not happy-path logic.
Race conditions — Is shared mutable state accessed concurrently without synchronization? Check module-level variables, singletons, caches, and lazy-init patterns.
Silent defaults for unexpected values — Does the code silently substitute a default for an unexpected value (unknown enum variant, unrecognized config key)? In infrastructure and test code, prefer throwing over guessing.
Feature flag coverage — If the change adds or modifies feature flags or conditional rollout logic, are all flag states tested? Check for stale flags that are always-on/always-off and should be removed; verify the default-off state doesn't break existing behavior.
Dead exports — Are there exported types, functions, or constants not imported by any other file? Check with grep before flagging.
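The grep-before-flagging step might look like this for a TypeScript-style tree; the export syntax, *.ts include, and basename-based self-exclusion are simplifying assumptions:

```shell
# dead_exports <file> <tree>: list exported names in <file> that no
# other *.ts file under <tree> references. Excluding by basename is a
# simplification; a same-named file elsewhere would also be skipped.
dead_exports() {
  file=$1 tree=$2
  grep -oE 'export (const|function|type|interface|class) [A-Za-z0-9_]+' "$file" |
    awk '{print $3}' |
    while read -r name; do
      if ! grep -rq --include='*.ts' --exclude="$(basename "$file")" -e "$name" "$tree" 2>/dev/null; then
        echo "possibly dead: $name"
      fi
    done
}
```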
Unnecessary wrappers — Are there functions that simply delegate to another function without adding logic, type narrowing, or meaningful naming?
Inline business logic where a library method exists — Is there hand-rolled logic (regex parsing, string manipulation, date math, data structure ops) where the project's existing dependencies already provide a tested function for the same thing?
Repeated in-house logic that should be extracted — Is the same non-trivial logic block (≥~5 lines, or any block with semantic identity beyond formatting) repeated in three or more sites in this codebase, where a single shared helper would carry the same behavior? The default threshold is three instances, but it's a default, not a rule: a 2-instance extraction can be right when the logic is obviously a single domain concept (Stripe error shaping, retry with backoff, auth-error mapping) and a 5-instance repetition can be wrong to extract when the bodies are coincidentally similar but semantically distinct (each one will diverge under its own pressure). Suggest the extraction site (_shared/<helper>.ts, lib/<helper>.ts, or the codebase's established shared-utility location) and name what the helper should encapsulate, not just "DRY this up." Carve-out: production code is DRY; test code is DAMP — repeated arrange/assert blocks in tests often aid readability over abstraction, so do not flag test-file duplication unless the repeated block is large enough to obscure the test's intent.
Undocumented limitations — Does the code make assumptions or have known constraints invisible to future readers (only handling the first element, assuming single-tenant, ignoring edge cases by design)?
Misleading names — Do function or variable names promise more or less than they deliver? A validateUser that only checks one field, an allItems that holds a filtered subset.
Concurrency and parallelism scoping — Do concurrency groups, mutex locks, or job dependencies match their intended scope? A workflow-level concurrency group affects all jobs, including no-op or unrelated ones — check that cancel-in-progress won't kill an important job due to an unrelated trigger.
Secret exposure — Are secrets used in contexts that could log them? Check for secrets in run: commands that echo or pipe output, in env: blocks visible to steps that don't need them, and in artifact uploads. Ensure secrets are not passed as command-line arguments (visible in process lists).
Permissions least privilege — Are workflow permissions, IAM roles, or service accounts scoped to what's actually needed? Flag contents: write when only read is required, admin when write suffices, or wildcard permissions.
Idempotency — Is the workflow/script safe to re-run? Check for unconditional creates (without "if not exists"), non-atomic operations that leave partial state on failure, and missing cleanup on retry.
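The safe-to-re-run shape for appends is a grep-check before the write, so a second run is a no-op rather than a duplicate:

```shell
# append_once <line> <file>: append <line> only if it is not already
# present as a whole line, leaving exactly one copy after any number
# of re-runs (-x: whole line, -F: fixed string, not a regex).
append_once() {
  line=$1 file=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```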
Trigger-condition alignment — Do trigger filters (branch, path, actor, event type) match the job's purpose? A job intended only for bot commits but triggered on all pushes is a mismatch even if individual steps have if guards.
Migration reversibility — Destructive operations (DROP COLUMN, type narrowing, table drops) need a backup or reversal path.
Index coverage — New WHERE/JOIN/ORDER BY columns and foreign keys need supporting indexes, especially on growing tables.
Lock safety — Flag long-locking DDL on large tables — ALTER with defaults, CREATE INDEX without CONCURRENTLY, in-migration backfills.
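A reviewer-side sweep for the index case can be a two-stage grep; the *.sql include is an assumption about where migrations live:

```shell
# unsafe_indexes <dir>: report CREATE INDEX statements in migration SQL
# that lack CONCURRENTLY and would take a blocking lock on large tables.
unsafe_indexes() {
  grep -rniE 'create +(unique +)?index' "$1" --include='*.sql' 2>/dev/null |
    grep -viE 'concurrently' || true
}
```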
RLS and access control on new tables — New tables exposed via auto-generated APIs (PostgREST, Hasura, generated resolvers) need row-security enabled with policies.
Accessibility — Do interactive elements have accessible names (aria-label, visible label, alt text)? Are click handlers on non-button elements keyboard-accessible? Check for missing focus management in modals and drawers.
Render performance — Are there new inline object/array/function literals in JSX props that would cause child re-renders on every parent render? Check for missing key props on list items and expensive computations not wrapped in useMemo/useCallback where the component re-renders frequently.
Bundle impact — Does the change add a large new dependency where a smaller alternative or existing utility exists? Flag full-library imports (e.g., all of lodash) when only one function is used.
State-dependent rendering coverage — Does the change modify which UI state a component enters (conditional branches, state machines, context-driven rendering)? If so, check whether tests verify the affected states render correctly. For each new or changed condition, is there a test that the component renders the expected output for each branch?
Auth boundary coverage — Every new endpoint or RPC needs both authentication (who) and authorization (can they) — and the check must not be bypassable by hitting the endpoint outside the UI flow.
Input validation at system boundaries — Validate/sanitize user input before SQL, shell, file paths, or outbound API calls — framework parameterization counts, string concatenation does not.
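At the shell boundary, "parameterization counts, concatenation does not" means passing user input as a discrete argument after `--`, never interpolating it into a command string. A hypothetical helper for contrast:

```shell
# search_logs_safe <pattern> <file>: user input is one argument; the
# leading -- ends option parsing, so "-v" or "; rm -rf /" is data.
search_logs_safe() {
  pattern=$1 file=$2
  grep -F -- "$pattern" "$file"
}
# UNSAFE equivalent (do not use): the pattern is re-parsed by a shell.
#   sh -c "grep $pattern $file"
```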
Error response leakage — Error responses must not expose internal details (stack traces, internal IDs, DB error text, file paths) — log server-side, return generic to the client.
Dependency upgrades — Does the change upgrade a runtime or build dependency? Read the changelog for breaking changes. Check for peer dependency conflicts and, for frontend deps, bundle size regression.
Third-party API integration — Does the change add or modify a third-party API call? Verify retry/timeout behavior, credential scoping (least-privilege keys), and failure modes (API down, unexpected data).
Sensitive data in logs — Does the change add or modify logging? Verify logs do not include tokens, credentials, PII, or full API responses from auth/OAuth endpoints. Extract only the fields needed for debugging.
Performance-sensitive code paths — Does the change modify a hot path (queries in loops, N+1, cache read/write, large list ops)? Verify with representative data volumes, not just test fixtures.
For .claude/skills/**/SKILL.md review (frontmatter, trigger design, voice, length, behavior test, cross-reference vs duplication, and behavioral-equivalence audit on compressions), invoke the skill-review skill — do not assert behavioral equivalence on prose compressions yourself; that audit is skill-review's job.
Do permissions.allow rules in settings.json follow least privilege? Flag blanket allows ("Bash") where scoped ("Bash(git:*)") would suffice. If permissions.allow rules were added or modified, invoke /review-permissions for deep security analysis.
For hook reviews (claude/.claude/hooks/*.sh, hook entries in settings.json), invoke the claude-hook-review skill.
Apply when changed files match .lovable/**. Invoke the lovable-knowledge skill for the review checklist (perspective, specificity, scope split between project/workspace knowledge, char budget, sync status).
Start with a one-line summary of which domains were detected (e.g., "Domains: Infrastructure, Backend").
If ripple-effect subagents were spawned, follow with a Consulted: line listing every persona by name, comma-separated and case-exact (e.g., "Consulted: ciso-reviewer, staff-backend-engineer"). Omit the line entirely when no subagents were spawned. The fixed prefix Consulted: makes the spawn grep-able post hoc.
For each finding, state:
If no issues are found, say: "No issues found" — do not pad with praise or generic observations.
After the implementation-fitness gate and the checklist review, consider whether the change crosses system boundaries in a way that exceeds your own judgment depth.
You are the principal-engineer-generalist running this skill; specialists below have narrower lane depth, not broader context. Form your own ripple judgment first using the table as a reference for what boundaries exist in the change.
This is the last gate before ship — a specialist miss here ships as a regression. Lean conservative; misses are not recoverable downstream the way they are at plan-review. Spawn a specialist when at least one applies:
Always spawn ciso-reviewer when the change touches auth/authz, secrets, tokens, data exposure, sensitive-data logging, third-party data sharing, or infra permissions — the high-stakes-boundary case is non-optional. The only exception is a change that crosses no privilege boundary: a dev-only flow with no production reachability, or an internal-only path where engineers themselves are the only callers. When skipping on those grounds, name the surface in the review output — never silently skip. Always spawn staff-product-engineer when the change alters user-facing behavior.
Spawn per question (not per file-path domain) — "change touches .github/" isn't enough; the question needs a specific shape.
When you spawn: spawn on the CODE, not on this review's output (each subagent reads the diff fresh); pick the specialist that serves the question (table is reference, not roster); pass diff scope, specific question, AND — for re-review — prior findings + what's been applied. Reviewers without prior context re-discover; that's wasted spawn.
The Change type column keys on what the change does for an operator or consumer, not on file types — a markdown-only diff can cross a runtime-config or CI/CD boundary if it restructures the taxonomy operators use to provision secrets, identify deploy targets, or reason about config layering. Use the table as a checklist of boundaries to consider, not as default-fire triggers.
| Change type | Spawn |
|---|---|
| Restricts DB access (RLS, GRANT, triggers) | ciso-reviewer + staff-backend-engineer — trace restrictions against caller code and check for privilege escalation |
| Changes API response shape | staff-product-engineer + staff-backend-engineer — verify all consumers handle new shape |
| Adds/modifies security controls | staff-sdet + ciso-reviewer — verify test pyramid, coverage, and threat model |
| Changes auth model (JWT, roles, permissions) | ciso-reviewer + staff-backend-engineer — trace all auth paths including token refresh, session expiry, and error fallbacks |
| Modifies shared utilities (helpers, hooks, contexts) | staff-backend-engineer + staff-frontend-engineer — verify all call sites and check for behavioral assumptions |
| Changes data model (columns, types, defaults, migrations) | Route by change type: new nullable column, index, or view → staff-backend-engineer only; new table → staff-backend-engineer + staff-analytics-engineer; rename, drop, type change, NOT NULL constraint added, partition key, or RLS policy → staff-backend-engineer + staff-data-engineer + staff-analytics-engineer. Add staff-product-engineer if user-visible. |
| Adds or changes warehouse models / dbt transformations / semantic-layer files | staff-analytics-engineer (modeling, transformation correctness, materialization, test coverage) |
| Adds or changes CDC / change-stream / ETL/ELT pipeline / warehouse ingestion connector | staff-data-engineer (transport, schema-drift, observability) + staff-platform-engineer (operational footprint) |
| Modifies CI/CD pipelines or deploy config | staff-platform-engineer + staff-backend-engineer — verify pipelines and environment consistency |
| Changes runtime config (env vars, secrets, feature flags) | staff-platform-engineer + ciso-reviewer — verify config is consistent across environments, check for leaked secrets |
| Reshapes reviewer ownership (substantive edits to plan-review/code-review skill routing tables, or scope language in agents/*.md) | Spawn every persona named in the pre- or post-edit table — each evaluates whether their row (or its removal) is accurate, scoped, and not bleeding into another lane. The pre/post union ensures a row deletion still spawns the affected persona. For an agents/*.md edit, spawn the edited persona plus their Item-ownership co-owners. Skip whitespace / typo / copy-edit-only diffs. |
Output: state which boundaries you considered and what your generalist judgment was on each. If you escalated, list which specialists and why; if you didn't, name the boundary and your read. Finding nothing on a boundary you considered is a valid result — write the read down, don't omit it.
When you do spawn a specialist, be specific. "Spawn ciso-reviewer" is useless; "Spawn ciso-reviewer and ask it to verify the checkout flow in CheckoutPage.tsx still enforces ownership after the new validation" is actionable.
When spawning staff-backend-engineer, pass findings_path: agent-reviews/staff-backend-engineer-<epoch>-<slug>.md in the prompt, where epoch = $(date +%s) and slug = $(git rev-parse --abbrev-ref HEAD | tr '/' '-' | cut -c1-20). The agent writes structured Markdown findings to that path and returns inline only: path, one-sentence summary, finding count. After it returns, Read the findings file — Recommendations section first, full file when count > 0. Before the first spawn, add agent-reviews/ to $(git rev-parse --git-dir)/info/exclude idempotently (grep-check before appending) so findings files can't be accidentally staged; cleanup is automatic with the worktree.
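The path construction and exclude-file bookkeeping above, as a shell sketch (same commands the text specifies, wrapped in a hypothetical helper name):

```shell
# findings_path_for <agent>: build the per-spawn findings path
# (epoch + branch-derived slug) and idempotently ensure agent-reviews/
# is listed in the repo's info/exclude so findings can't be staged.
findings_path_for() {
  agent=$1
  epoch=$(date +%s)
  slug=$(git rev-parse --abbrev-ref HEAD | tr '/' '-' | cut -c1-20)
  exclude="$(git rev-parse --git-dir)/info/exclude"
  # grep-check before appending keeps the write idempotent.
  grep -qxF 'agent-reviews/' "$exclude" 2>/dev/null ||
    echo 'agent-reviews/' >> "$exclude"
  echo "agent-reviews/${agent}-${epoch}-${slug}.md"
}
```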
Other spawned specialists must return ≤2K tokens of structured findings (checklist-item-keyed bullets), not narrative prose; if they exceed the budget, prioritize by severity and explicitly note that lower-severity items were omitted. Include this constraint in each agent's prompt.
After spawned reviewers return findings, pause if findings concentrate on a single surface — the same feature, implementation detail, or design choice attracting multiple gaps. If two specialists flag the same file:line with the same root cause, present the finding once with both reviewer attributions rather than as duplicate findings. Two readings:
- Implementation-wrong-shape: the convergent findings are symptoms of a mis-designed surface, not a set of independent gaps.
- Prompt-overlap: the specialists' prompts overlapped, so they independently rediscovered the same finding.
You judge which applies. Don't treat convergence as automatic authority for "patch each gap." If implementation-wrong-shape, replace the surface and re-run Step 1. If prompt-overlap, apply the underlying finding once, skip duplicates, and note the overlap so the next spawn uses tighter prompts.
Routes each checklist item to the reviewer subagent(s) that file findings on it. Bold shorthands match titles above; numbers are the dispatcher's primary key. Primary owner files findings; co-owners are spawned where the item touches their turf. When in doubt, this table wins over inline mentions.
The dispatcher fires reviewers per file-path domain detection. Each agent self-scopes against the diff and returns early ("No X concerns") when out of lane.
| Item | Primary owner | Co-owners |
|---|---|---|
| 1. API misuse | staff-backend-engineer (server-side library / SDK use) | staff-platform-engineer (build / CI tools), staff-frontend-engineer (client-side library use) |
| 2. Error handling changes | staff-backend-engineer (server error paths), staff-frontend-engineer (client UX) | ciso-reviewer (sensitive-data leak), staff-sdet (catch-branch coverage) |
| 3. Race conditions | staff-backend-engineer | — |
| 4. Silent defaults | staff-backend-engineer, staff-platform-engineer (infra / test code) | — |
| 5. Feature flag coverage | staff-product-engineer (default-off semantics) | staff-platform-engineer (rollout) |
| 6–9. Hygiene (dead exports, unnecessary wrappers, inline business logic, repeated in-house logic) | judgment (any reviewer) | — |
| 10. Undocumented limitations | staff-product-engineer (user-visible limitations) | judgment (others) |
| 11. Misleading names | staff-product-engineer (API / copy facing) | staff-frontend-engineer (component / hook), staff-backend-engineer (server) |
| 12. Test adequacy for security controls | ciso-reviewer (designated writer) | staff-sdet (second-reader) |
| 13. Pre-existing issues in unchanged code | judgment (any reviewer) | — |
| 14–18. Infrastructure (concurrency scoping, secret exposure, least-privilege, idempotency, trigger alignment) | staff-platform-engineer | ciso-reviewer (15 secret exposure, 16 least-privilege) |
| 19. Migration reversibility | staff-data-engineer (rollback safety, pipeline impact) | staff-backend-engineer |
| 20. Index coverage | staff-backend-engineer (app-query coverage) | staff-data-engineer (DDL risk and bloat) |
| 21. Lock safety | staff-data-engineer (DDL execution shape, pipeline impact) | staff-platform-engineer (deploy-window, lock-budget) |
| 22. RLS / access control on new tables | staff-data-engineer (enforceability) | ciso-reviewer (threat framing) |
| 23. Accessibility | staff-frontend-engineer (technical a11y) | staff-product-engineer (a11y as spec fidelity) |
| 24. Render performance | staff-frontend-engineer | — |
| 25. Bundle impact | staff-frontend-engineer | staff-platform-engineer (build tooling) |
| 26. State-dependent rendering coverage | staff-frontend-engineer (branch implementation) | staff-sdet (test coverage), staff-product-engineer (right branches for spec) |
| 27. Auth boundary coverage | staff-backend-engineer | ciso-reviewer |
| 28. Input validation at boundaries | staff-backend-engineer | ciso-reviewer |
| 29. Error response leakage | staff-backend-engineer | ciso-reviewer |
| 30. Dependency upgrades | staff-backend-engineer (runtime deps) | staff-platform-engineer (CI / build deps) |
| 31. Third-party API integration | staff-backend-engineer | ciso-reviewer (credential scoping) |
| 32. Sensitive data in logs | staff-backend-engineer | ciso-reviewer |
| 33. Performance-sensitive paths | staff-backend-engineer (app-level query patterns) | staff-data-engineer (DDL / index / read-path) |
| 34. Permission scope | ciso-reviewer | — |
| 35. Hook correctness — reviewed via claude-hook-review skill | staff-platform-engineer | ciso-reviewer |
If the review is clean (no blockers, no unresolved critical findings, and you reviewed the currently staged changes), record it by running this command exactly once:
~/.claude/scripts/marker.sh write code-review
This writes the hash of the currently staged diff into a per-session marker keyed by <repo-hash>.<session-id>. The pre-commit hook reads the same session-id from its JSON payload and compares the staged-diff hash against THIS session's marker — match means the commit is allowed through. Per-session keying prevents two parallel sessions in the same worktree from overwriting each other's markers. Re-staging any change invalidates the marker automatically.
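marker.sh's internals are not shown in this skill; the following is a hypothetical sketch of the scheme the paragraph describes. The hash command, key layout, and truncation are assumptions, not the real script:

```shell
# Hypothetical sketch of the marker scheme: hash the staged diff, and
# key the marker by repo + session so two parallel sessions in the
# same worktree cannot overwrite each other's markers.
staged_hash() {
  git diff --cached | sha256sum | cut -d' ' -f1
}
marker_key() {
  repo_hash=$(git rev-parse --show-toplevel | sha256sum | cut -c1-12)
  # :? aborts loudly on an empty SESSION_ID, mirroring the chain-failure rule.
  echo "${repo_hash}.${SESSION_ID:?empty SESSION_ID}"
}
```

Re-staging any change alters the `git diff --cached` output and therefore the hash, which is why the marker invalidates automatically.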
If the chain fails (empty SESSION_ID, etc.), the capture-session-id.sh SessionStart hook didn't run — abort and report; do not proceed without the marker, since git commit will be blocked by the gate.
Do NOT write the marker if:
- Blockers or unresolved critical findings remain.
- The staged changes differ from what you actually reviewed (anything re-staged after the review invalidates it).
If you skip it, say so explicitly so the user knows the commit gate will block until issues are fixed and the review is re-run on the final staged state.
This marker must be written only by this skill. Never compute and write it manually — that satisfies the hook's shell check but forges the gate. If the hook blocks a commit and you cannot invoke this skill, spawn a subagent that can; do not hand-write the marker path.