with one click
test-review
// Review test quality across multiple dimensions (behavior, completeness, structure, concurrency, crash-safety) using specialized agents with triage-based selection.
// Review test quality across multiple dimensions (behavior, completeness, structure, concurrency, crash-safety) using specialized agents with triage-based selection.
[HINT] Download the complete skill directory including SKILL.md and all related files
| name | test-review |
| description | Review test quality across multiple dimensions (behavior, completeness, structure, concurrency, crash-safety) using specialized agents with triage-based selection. |
| argument-hint | [branch | commit-range | uncommitted | PR-number] |
| user-invocable | true |
Review test quality across multiple dimensions by dispatching to specialized test review agents and synthesizing their findings.
Use $ARGUMENTS as the review target if provided (branch name, commit range, or "uncommitted").
Determine the review context using the following priority:
$ARGUMENTS is provided:ytdb-605-unified-edges): Review all changes on that branch that are absent from the base branch.abc123..def456): Review that specific range.last 3 commits): Review HEAD~N...HEAD.git diff HEAD).#42 or https://github.com/.../pull/42): Fetch PR details and review its diff.$ARGUMENTS is empty:gh pr list --head $(git branch --show-current) --json number,title,body,baseRefName,url --limit 1
develop:
git log develop..HEAD --oneline
develop, review those.develop or has no commits ahead, ask the user what to review.baseRefName (fetched via gh pr view).develop.Based on the review mode, collect:
git diff {base}...HEAD --name-only
git diff {base}...HEAD
git log {base}..HEAD --oneline
git diff {start}..{end} --name-only
git diff {start}..{end}
git log {start}..{end} --oneline
git diff HEAD --name-only
git diff HEAD
gh pr view {number} --json body --jq '.body'
Store the collected context:
DIFF โ the full diff outputCHANGED_FILES โ the list of changed file pathsCOMMIT_LOG โ the commit historyPR_DESCRIPTION โ the PR body text (empty string if no PR)REVIEW_SCOPE โ human-readable description of what's being reviewedNote files that should be skipped:
core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/parser/generated-sources/ or generated-test-sources/Before dispatching agents, perform a quick triage pass over the entire diff (both production and test code) to determine which test review dimensions are actually relevant. This avoids wasting time on agents that have nothing meaningful to review.
Scan the diff and assign one or more categories to every changed file โ production code, test code, and other files alike:
| Category | Signals |
|---|---|
| storage-engine | Files in storage/, cache/, wal/, StorageComponent subclasses, page read/write logic, DiskStorage, WriteCache, ReadCache, LogSequenceNumber, double-write log |
| concurrency | synchronized, Lock, Atomic*, volatile, StampedLock, ReentrantLock, thread pools, ConcurrentHashMap, CompletableFuture, shared mutable state, @GuardedBy, ConcurrentTestHelper, CountDownLatch, CyclicBarrier |
| crash-durability | WAL operations, crash simulation, durable StorageComponent recovery, page corruption handling, transaction atomicity under failure, LogSequenceNumber manipulation, double-write log, Java assert statements in production code |
| index-data-structures | Files in index/, B-tree, hash index, SBTree, CellBTree, histogram, IndexEngine |
| serialization | Record serializers, binary format, property map encoding/decoding |
| sql-query | Files in sql/ (excluding parser/), query execution, command handlers |
| gremlin | Files in gremlin/, traversal steps, YTDBGraph* classes, TinkerPop integration |
| network-server | Files in server/, driver/, Gremlin Server, protocol handling, authentication |
| public-api | Files in com.jetbrains.youtrackdb.api, YourTracks, YouTrackDB interface |
| configuration | GlobalConfiguration, config parameters, system properties |
| build-config | pom.xml, CI workflows, Maven profiles, Docker configs |
| docs-only | Markdown, documentation, comments-only changes |
A file can belong to multiple categories (e.g., a concurrent index test is both concurrency and index-data-structures). Production and test files in the same domain should share the same categories.
Two agents always run because they catch general gaps regardless of domain. The remaining three are specialized and only launch when their domain is relevant. Categories from both production and test code count โ for example, if production code adds a new synchronized block but tests don't exercise threading, review-test-concurrency should still launch to flag the gap.
| Agent | When to launch |
|---|---|
| review-test-behavior | Always (unless docs-only or build-config are the ONLY categories) |
| review-test-completeness | Always (unless docs-only or build-config are the ONLY categories) |
| review-test-structure | Any test files are changed (reviews isolation, readability, setup/teardown of test code itself) |
| review-test-concurrency | concurrency, OR production code touches shared mutable state / threading primitives even if no concurrency tests exist yet |
| review-test-crash-safety | crash-durability |
Before launching agents, output a brief triage summary so the user can see the reasoning:
### Triage Summary
- **Categories detected**: storage-engine, index-data-structures
- **Agents selected**: review-test-behavior, review-test-completeness, review-test-structure, review-test-crash-safety
- **Agents skipped**: review-test-concurrency (no threading primitives or shared mutable state in changes)
docs-only or build-config with no production or test code changes: Skip all agents. Report that no reviewable code was changed.Launch the selected agents in parallel using the Agent tool. Each agent receives the same context but reviews from its own dimension.
For each agent, use this prompt template:
Review the following code changes from your specialized test quality perspective.
## Review Scope
{REVIEW_SCOPE}
## PR Description
{PR_DESCRIPTION or "No PR associated with these changes."}
## Commit Log
{COMMIT_LOG}
## Changed Files
{CHANGED_FILES}
## Skip These Files (generated code)
- core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/parser/*
- Any files under generated-sources/ or generated-test-sources/
- Generated Gremlin DSL classes
## Tooling
Use **mcp-steroid PSI find-usages / find-implementations / type-hierarchy
via `steroid_execute_code`, not grep**, for any reference-accuracy
question while reviewing test quality (which production methods this
test exercises and where else they're called; whether an assertion's
expected value matches the contract observed at every override of an
SPI; whether a test fixture's helper has additional consumers that
constrain its shape). Grep is acceptable for filename globs, unique
string literals, and orientation reads, but the load-bearing answer
behind a finding must be PSI-backed when the mcp-steroid MCP server is
reachable per the SessionStart hook (`steroid_list_projects` once at
the start confirms the open project matches the working tree). Fall
back to grep with an explicit reference-accuracy caveat in the finding
only when mcp-steroid is unreachable. See `CLAUDE.md` ยง MCP Steroid โ "Grep vs PSI โ when to switch" for the full routing rule.
## Diff
{DIFF}
The 5 possible agents (launch only those selected in Step 5):
Set subagent_type to the agent name and model to opus for each.
After all selected agents complete, produce a unified review report. Do NOT simply concatenate the outputs. Instead:
Deduplicate: If multiple agents flagged the same issue (e.g., a missing test for a concurrent crash scenario flagged by both concurrency and crash-safety), merge into one finding and note which dimensions it affects.
Prioritize: Order findings by severity:
Attribute: For each finding, indicate which review dimension(s) identified it.
Summarize: Write a brief overall assessment (2-3 sentences).
## Test Quality Review: {REVIEW_SCOPE}
### Overall Assessment
[2-3 sentences: are tests meaningful and thorough? What are the main gaps?]
### Blockers
[Tests that give false confidence or missing tests for dangerous code paths]
1. **[Dimension]** `path/to/file.ext` (line X-Y)
- **Issue**: ...
- **Suggestion**: ...
### Should-Fix
[Missing corner cases, weak assertions hiding real bugs]
1. **[Dimension]** `path/to/file.ext` (line X-Y)
- **Issue**: ...
- **Suggestion**: ...
### Suggestions
[Recommended improvements, test data quality, naming, optional edge cases]
1. **[Dimension]** `path/to/file.ext` (line X-Y)
- **Issue**: ...
- **Suggestion**: ...
### Questions for the Author
[Clarifying questions aggregated from all reviewers]
If a priority level has no findings, omit it entirely.
gh CLI for GitHub API calls, not WebFetch..claude/workflow/review-iteration.md).