fix-ci-failure
// Diagnose and fix a CI test failure from a GitHub PR or workflow run. Includes root cause analysis, dimensional code review, and PR creation.
| name | fix-ci-failure |
| description | Diagnose and fix a CI test failure from a GitHub PR or workflow run. Includes root cause analysis, dimensional code review, and PR creation. |
| argument-hint | [github-url] |
| user-invocable | true |
Fix a CI test failure from a GitHub PR or workflow run.
The user will provide a URL to a failing check run, PR checks page, or workflow run.
Use $ARGUMENTS as the URL if provided.
Before any investigation, create tasks for every workflow step using TaskCreate.
These tasks are your checklist: mark each in_progress when starting and
completed when done. Do NOT skip tasks. Complete them in order.
Create these tasks:
Use the gh CLI to fetch PR and check run details. Prefer gh api and gh pr over WebFetch for GitHub URLs.
gh pr list --head {branch} --json number,title,url,statusCheckRollup
gh pr view {number} --json statusCheckRollup --jq '.statusCheckRollup[] | select(.conclusion != "SUCCESS")'
gh api "repos/{owner}/{repo}/check-runs/{check_run_id}/annotations"
gh run view --job {job_id} --log-failed
gh api repos/{owner}/{repo}/issues/{pr_number}/comments --jq '.[] | select(.body | contains("gate")) | .body'
Not all CI failures are test failures. Identify the category before diving into code:
CI gates enforce policies on PRs. When a gate fails:
- Do not bypass with [no-test-number-check] unless the user explicitly confirms the drop is intentional.
- <excludes> blocks in surefire/failsafe that were previously overridden by a profile the PR removed
- combine.self="override" patterns that clear exclusions
- Profile flags on the CI workflow mvn command line
- Find the test class with Glob (e.g., **/{TestClassName}.java).
- Did the PR remove or change something the module's pom.xml depended on? Read the module's pom.xml fully to find exclusions, profile overrides, and plugin configurations.
- Add Java assert statements generously in production code touched by the fix or exercised by new/changed tests. These must have zero production overhead (assertions are disabled by default in production JVMs). Good candidates:
  - assert index >= 0 : "negative index"
  - assert size == backingArray.length
  - assert !closed
Do NOT add assertions that duplicate existing validation, have side effects, or are tautological.
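The assertion guidance above can be illustrated with a minimal sketch. The class `SketchIntList` and all of its members are hypothetical and exist only for this example; they are not taken from the YouTrackDB codebase. The sketch shows a state precondition, a postcondition, and a multi-part condition moved into a package-private helper, while real input validation stays as an ordinary check.

```java
import java.util.Arrays;

// Hypothetical class for illustration only; not part of the project.
final class SketchIntList {
  private int[] backingArray = new int[4];
  private int size;
  private boolean closed;

  void add(int value) {
    assert !closed : "add() called after close()"; // state precondition
    if (size == backingArray.length) {
      backingArray = Arrays.copyOf(backingArray, size * 2);
    }
    backingArray[size++] = value;
    assert isConsistent() : "size out of range after add"; // postcondition
  }

  int get(int index) {
    // Real validation stays a normal check; no assert duplicates it.
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index " + index + ", size " + size);
    }
    assert !closed : "get() called after close()";
    return backingArray[index];
  }

  void close() {
    closed = true;
  }

  int size() {
    return size;
  }

  // Package-private helper: the multi-part condition lives in a regular
  // method without side effects, so a test can call it directly.
  boolean isConsistent() {
    return size >= 0 && size <= backingArray.length;
  }
}
```

The asserts only execute when the JVM is started with `-ea`; without that flag the conditions are never evaluated, which is why they must not carry side effects.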
If the assertion condition is complex, extract it to a package-private helper method to get full JaCoCo coverage (see .claude/docs/architecture.md § Codebase-specific tips → "JaCoCo and Java assert statements").
- Code under com.jetbrains.youtrackdb.internal may be modified to improve testability (e.g., extract methods, increase visibility from private to package-private, add package-private accessors for state verification), but never modify the public API under com.jetbrains.youtrackdb.api. Any testability change must not alter the class's external behavior.
- Run ./mvnw -pl {module} spotless:apply to fix formatting.
./mvnw -pl {module} clean test -Dtest={TestClass}
./mvnw -pl {module} clean test
- Run ./mvnw -pl {module} spotless:check to ensure formatting compliance.
BLOCKING: Do NOT proceed to Step 8 (summarize) until this step is fully complete. Skipping this step is never acceptable: it exists to catch bugs in the fix itself before presenting to the user.
If you changed any code or added/fixed/changed any tests, run a dimensional code review and test quality review. This step is mandatory whenever code or tests were modified.
Triage: Categorize changes and select relevant agents
Before launching agents, perform a quick triage pass over the diff to determine which review dimensions are actually relevant. This avoids wasting time on agents that have nothing meaningful to review.
1a: Categorize each changed file
Scan the diff and assign one or more categories to every changed file (production, test, and other):
| Category | Signals |
|---|---|
| storage-engine | Files in storage/, cache/, wal/, StorageComponent subclasses, page read/write logic, DiskStorage, WriteCache, ReadCache, LogSequenceNumber, double-write log |
| concurrency | synchronized, Lock, Atomic*, volatile, StampedLock, ReentrantLock, thread pools, ConcurrentHashMap, CompletableFuture, shared mutable state, @GuardedBy, ConcurrentTestHelper, CountDownLatch, CyclicBarrier |
| crash-durability | WAL operations, crash simulation, durable StorageComponent recovery, page corruption handling, transaction atomicity under failure, LogSequenceNumber manipulation, double-write log, Java assert statements in production code |
| index-data-structures | Files in index/, B-tree, hash index, SBTree, CellBTree, histogram, IndexEngine |
| network-server | Files in server/, driver/, Gremlin Server, protocol handling, TLS/SSL, authentication, session management |
| sql-query | Files in sql/ (excluding parser/), query execution, command handlers |
| gremlin | Files in gremlin/, traversal steps, YTDBGraph* classes, TinkerPop integration |
| public-api | Files in com.jetbrains.youtrackdb.api, YourTracks, YouTrackDB interface |
| serialization | Record serializers, binary format, property map encoding/decoding |
| configuration | GlobalConfiguration, config parameters, system properties |
| tests-only | Changes exclusively in test files with no production code changes |
| build-config | pom.xml, CI workflows, Maven profiles, Docker configs |
| docs-only | Markdown, documentation, comments-only changes |
A file can belong to multiple categories.
1b: Select code review agents based on categories
| Agent | Launch when ANY of these categories are present |
|---|---|
| review-code-quality | Always launched (unless docs-only is the ONLY category) |
| review-bugs-concurrency | concurrency, storage-engine, index-data-structures, network-server, serialization, gremlin, sql-query |
| review-crash-safety | crash-durability |
| review-security | network-server, public-api, sql-query, serialization, configuration, OR when new dependencies are added in pom.xml |
| review-performance | storage-engine, index-data-structures, concurrency, serialization, sql-query, gremlin |
1c: Select test quality agents based on categories
| Agent | When to launch |
|---|---|
| review-test-behavior | Always (unless docs-only or build-config are the ONLY categories) |
| review-test-completeness | Always (unless docs-only or build-config are the ONLY categories) |
| review-test-structure | Any test files are changed |
| review-test-concurrency | concurrency, OR production code touches shared mutable state / threading primitives even if no concurrency tests exist yet |
| review-test-crash-safety | crash-durability |
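The code review agent table in 1b can be read as a lookup from detected categories to agents. The following is a minimal sketch of that lookup; the class `AgentTriage` and its method are hypothetical names introduced for illustration. It covers only the code review agents, and it assumes the `build-config` and `tests-only` edge cases apply when that category is the only one detected.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper mirroring the code review agent table (1b).
final class AgentTriage {
  // agent -> categories that trigger it, in table order
  private static final Map<String, Set<String>> TRIGGERS = new LinkedHashMap<>();

  static {
    TRIGGERS.put("review-bugs-concurrency", Set.of("concurrency", "storage-engine",
        "index-data-structures", "network-server", "serialization", "gremlin", "sql-query"));
    TRIGGERS.put("review-crash-safety", Set.of("crash-durability"));
    TRIGGERS.put("review-security", Set.of("network-server", "public-api", "sql-query",
        "serialization", "configuration"));
    TRIGGERS.put("review-performance", Set.of("storage-engine", "index-data-structures",
        "concurrency", "serialization", "sql-query", "gremlin"));
  }

  static Set<String> selectCodeReviewAgents(Set<String> categories, boolean newDependencies) {
    Set<String> agents = new LinkedHashSet<>();
    if (categories.equals(Set.of("docs-only"))) {
      return agents; // documentation only: no agents
    }
    agents.add("review-code-quality"); // always launched otherwise
    for (Map.Entry<String, Set<String>> entry : TRIGGERS.entrySet()) {
      if (!Collections.disjoint(entry.getValue(), categories)) {
        agents.add(entry.getKey());
      }
    }
    // Assumption: these edge cases apply when the category is the only one present.
    if (newDependencies || categories.equals(Set.of("build-config"))) {
      agents.add("review-security");
    }
    if (categories.equals(Set.of("tests-only"))) {
      agents.add("review-bugs-concurrency");
    }
    return agents;
  }
}
```

For example, passing the categories `storage-engine` and `concurrency` with no new dependencies yields `review-code-quality`, `review-bugs-concurrency`, and `review-performance`.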
1d: Log your triage decision
Before launching agents, output a brief triage summary:
### Triage Summary
- **Categories detected**: storage-engine, concurrency
- **Code review agents selected**: review-code-quality, review-bugs-concurrency, review-performance
- **Code review agents skipped**: review-crash-safety (no crash-durability category), review-security (no network/API/SQL/config/dependency changes)
- **Test quality agents selected**: review-test-behavior, review-test-completeness, review-test-structure, review-test-concurrency
- **Test quality agents skipped**: review-test-crash-safety (no crash-durability category)
1e: Edge cases
- docs-only: Skip all agents. Report that only documentation changed and no code review is needed.
- build-config: Launch review-code-quality (to check for misconfigurations) and review-security (to check for dependency changes).
- tests-only: Launch review-code-quality and review-bugs-concurrency as code agents, and review-test-behavior, review-test-completeness, and review-test-structure as test agents.
Launch selected review agents in parallel (fresh sub-agents):
Each agent receives the same context:
## Review Target
CI fix: {brief description of the fix}
Reviewing: changes on current branch vs develop
## Changed Files
{git diff develop...HEAD --name-only}
## Skip These Files (generated code)
- core/.../sql/parser/*, generated-sources/*, Gremlin DSL
## Tooling
Use **mcp-steroid PSI find-usages / find-implementations / type-
hierarchy via `steroid_execute_code`, not grep**, for any reference-
accuracy question about a Java symbol in this fix (callers/overrides
/usages of a method, field, class, or annotation; whether the fix
leaves stale references; whether a new helper duplicates an existing
one; whether a test exercises the same code path the production fix
touches). Grep is acceptable for filename globs, unique string
literals, and orientation reads, but the load-bearing answer behind
a finding must be PSI-backed when the mcp-steroid MCP server is
reachable per the SessionStart hook (`steroid_list_projects` once at
the start confirms the open project matches the working tree). Fall
back to grep with an explicit reference-accuracy caveat in the
finding only when mcp-steroid is unreachable. See
`CLAUDE.md` § MCP Steroid → "Grep vs PSI - when to switch"
for the full routing rule.
## Diff
{git diff develop...HEAD}
Set subagent_type to the agent name and model to opus for each.
Synthesize findings: After all selected agents complete, deduplicate across dimensions. Prioritize: blocker > should-fix > suggestion.
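The synthesis step can be sketched as a deduplicate-then-sort pass. The format in which agents report findings is not specified here, so the `Finding` and `Severity` types below are hypothetical, and treating two findings at the same file and line as duplicates is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Declared in priority order: blocker > should-fix > suggestion.
enum Severity { BLOCKER, SHOULD_FIX, SUGGESTION }

// Hypothetical finding model; the agents' real output format is not specified.
final class Finding {
  final String file;
  final int line;
  final String message;
  final Severity severity;

  Finding(String file, int line, String message, Severity severity) {
    this.file = file;
    this.line = line;
    this.message = message;
    this.severity = severity;
  }
}

final class FindingSynthesizer {
  // Merge findings that point at the same location (assumed duplicate key:
  // file + line), keep the most severe copy, then sort by severity.
  static List<Finding> synthesize(List<Finding> all) {
    Map<String, Finding> byLocation = new LinkedHashMap<>();
    for (Finding f : all) {
      String key = f.file + ":" + f.line;
      byLocation.merge(key, f,
          (a, b) -> a.severity.compareTo(b.severity) <= 0 ? a : b);
    }
    List<Finding> result = new ArrayList<>(byLocation.values());
    result.sort(Comparator.comparing((Finding f) -> f.severity)); // stable sort
    return result;
  }
}
```

A location-based key is deliberately coarse: it can merge two distinct problems on the same line, so a real pass would likely also compare the messages before discarding one.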
Address all blocker and should-fix findings:
- assert statements in production code (zero-overhead)
After applying fixes, run ./mvnw -pl {module} spotless:apply and
re-run the affected tests to confirm they pass.
Re-run only the agent(s) with open findings on the updated changes.
Repeat steps 4-6 until no blocker/should-fix findings remain.
Present to the user:
Do NOT commit or push until the user approves.
develop with:
- Use the gh CLI for GitHub API calls, not WebFetch.
- Do not reach for [no-test-number-check] or similar bypasses. The gate may be catching a real problem (tests silently excluded by build config changes).
- Check <excludes> with profile overrides, failsafe configurations gated by profiles, and CI workflow mvn command-line profile flags.
- Wait for each ./mvnw test or ./mvnw verify invocation to finish before starting another. Running tests in parallel across separate Maven processes causes classloading errors, database file locking conflicts, and false test failures. This includes any combination of unit tests, integration tests, and coverage runs.
- Add Java assert statements generously: When fixing or testing production code, add Java assert statements for invariants, preconditions, postconditions, and consistency checks. These cost nothing in production (assertions are disabled by default) but catch bugs during development and testing. Do not add assertions that duplicate existing checks or have side effects.
- Code under com.jetbrains.youtrackdb.internal can be modified to improve testability (e.g., extract methods, widen visibility to package-private, add state-inspection accessors). Never modify the public API under com.jetbrains.youtrackdb.api.
- Test count gate (refs/notes/test-counts): Fails if any module's test count drops by more than 5%. The PR comment contains the per-module comparison table. Bypass with [no-test-number-check] in the PR title (only for intentional restructuring).
- Coverage gate (coverage-gate.py): The PR comment has per-file tables.
- The ci-integration-tests profile uses <excludes combine.self="override"/> to include tests that are excluded by default. If CI stops activating that profile, those tests silently disappear.
- EmbeddedGraphFeatureTest runs ~1900 TinkerPop Gremlin scenarios. Requires 4GB heap. Previously gated by the ci-integration-tests profile, now runs by default.
- Unit tests use surefire (test phase), integration tests use failsafe (verify phase with -P ci-integration-tests). The test count gate counts surefire results.
# Check PR gate comments
gh api repos/JetBrains/youtrackdb/issues/{pr}/comments --jq '.[] | select(.body | contains("gate")) | .body'
# Run single module tests
./mvnw -pl {module} clean test -Dtest={TestClass}
# Run with disk storage (as CI does)
./mvnw -pl {module} clean test -Dyoutrackdb.test.env=ci
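The test count gate's rule (fail if any module drops by more than 5%) can be sketched as follows. This is not the gate's actual implementation, which is not shown here; `TestCountGate`, the map-based inputs, and the choice to count a module missing from the current run as zero tests are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical re-statement of the 5% rule; not the project's gate script.
final class TestCountGate {
  static final int MAX_DROP_PERCENT = 5;

  // Returns the modules (sorted by name) whose test count dropped by more
  // than MAX_DROP_PERCENT relative to the baseline.
  static List<String> failingModules(Map<String, Integer> baseline,
                                     Map<String, Integer> current) {
    List<String> failing = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : new TreeMap<>(baseline).entrySet()) {
      long before = entry.getValue();
      // Assumption: a module absent from the current run counts as zero tests.
      long after = current.getOrDefault(entry.getKey(), 0);
      // Integer arithmetic avoids floating-point rounding at the threshold.
      if (before > 0 && (before - after) * 100 > before * MAX_DROP_PERCENT) {
        failing.add(entry.getKey());
      }
    }
    return failing;
  }
}
```

With a baseline of 1000 tests, a current count of 950 passes (exactly 5%) while 940 fails, which is the kind of silent drop a removed profile can cause.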