fix-ci
// Check CI status, analyze test failures, auto-fix obvious issues or discuss with user
| name | fix-ci |
| description | Check CI status, analyze test failures, auto-fix obvious issues or discuss with user |
| argument-hint | [PR number or URL] |
| allowed-tools | Bash, Read, Edit, Grep, Glob, Agent, AskUserQuestion |
Analyze CI failures for PR $ARGUMENTS (or the current branch's PR if no argument given).
# If $ARGUMENTS is a number or URL, use it directly.
# Otherwise detect from current branch:
gh pr view --json number,title,headRefName --jq '{number, title, headRefName}'
Parse the buildId from any check URL first (see Step 1c), then hit the test summary endpoint:
curl -s "https://dev.azure.com/{org}/{project_guid}/_apis/test/resultsummarybybuild?buildId={buildId}&api-version=7.0-preview"
No auth needed. Returns aggregate counts instantly (~1KB):
{
"aggregatedResultsAnalysis": {
"totalTests": 279518,
"resultsByOutcome": {
"Passed": {"count": 276837},
"Failed": {"count": 24},
"NotExecuted": {"count": 2657}
},
"runSummaryByOutcome": {
"Failed": {"runsCount": 14},
"Passed": {"runsCount": 23}
}
}
}
If Failed.count == 0 and all checks passed — report success, stop.
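The early-exit check can be sketched as a small filter over the summary response. This assumes jq is installed; the function name `failed_test_count` is hypothetical, and a missing Failed entry is treated as zero failures.

```bash
# Print the number of failed tests from a resultsummarybybuild response on stdin.
# Assumption: a response without a "Failed" outcome means zero failures.
failed_test_count() {
  jq -r '.aggregatedResultsAnalysis.resultsByOutcome.Failed.count // 0'
}

# Usage (URL placeholders as in the endpoint above):
#   curl -s "https://dev.azure.com/{org}/{project_guid}/_apis/test/resultsummarybybuild?buildId={buildId}&api-version=7.0-preview" \
#     | failed_test_count
```

A result of 0 only means no test failed; the check list still has to be all green before reporting success.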
Use gh pr checks text format (tab-separated) — it reliably includes Azure URLs unlike the JSON statusCheckRollup where detailsUrl can be null.
gh pr checks $PR 2>&1
Output format (tab-separated):
CheckName\tstatus\tduration\tURL
Parse and classify each check:
- pass: succeeded
- fail: failed
- pending: still running

Group checks by source:

- dev.azure.com: these are the test jobs. Extract buildId from the first Azure URL.
- github.com/actions: these include build (compilation), gitleaks (secret scanning), and Danger (PR linting). Failed GitHub Actions checks are relevant; report them.
- CodeRabbit (no URL or review-only): skip.

Extract buildId from the first Azure URL (all checks in one pipeline run share the same buildId).
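A minimal sketch of the classification, assuming the four tab-separated columns described above. The function names `classify_checks` and `first_build_id` and the "other" bucket are hypothetical; the GitHub Actions match assumes URLs of the form github.com/{owner}/{repo}/actions/.

```bash
# Read `gh pr checks` rows (name, status, duration, URL) on stdin and
# print "source<TAB>status<TAB>name" for each check.
classify_checks() {
  awk -F'\t' '
    {
      source = "other"                       # e.g. CodeRabbit rows with no URL
      if ($4 ~ /dev\.azure\.com/)                  source = "azure"
      else if ($4 ~ /github\.com\/.*\/actions\//)  source = "github-actions"
      printf "%s\t%s\t%s\n", source, $2, $1
    }'
}

# Print the buildId found in the first Azure URL on stdin.
first_build_id() {
  grep -o 'dev\.azure\.com[^[:space:]]*buildId=[0-9]*' \
    | head -n 1 \
    | sed -E 's/.*buildId=([0-9]+).*/\1/'
}

# Usage:
#   gh pr checks "$PR" 2>&1 | classify_checks
#   gh pr checks "$PR" 2>&1 | first_build_id
```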
For each failed GitHub Actions check, fetch its details:
# Extract run_id and job_id from URL: https://github.com/{owner}/{repo}/actions/runs/{run_id}/job/{job_id}
gh run view {run_id} --json jobs --jq '.jobs[] | select(.conclusion == "failure") | {name, conclusion}'
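Pulling run_id and job_id out of the check URL could look like this. It is a sketch that assumes the URL shape given in the comment above; `parse_actions_url` is a hypothetical helper name.

```bash
# Print "run_id job_id" for a GitHub Actions job URL; print nothing if the URL
# does not match https://github.com/{owner}/{repo}/actions/runs/{run_id}/job/{job_id}.
parse_actions_url() {
  printf '%s\n' "$1" \
    | sed -n -E 's#^https://github\.com/[^/]+/[^/]+/actions/runs/([0-9]+)/job/([0-9]+).*$#\1 \2#p'
}

# Usage:
#   set -- $(parse_actions_url "$check_url")   # $1 = run_id, $2 = job_id
```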
Report these as Category E (non-test failures) in the final output. Common cases:
gh pr view $PR --json comments --jq '.comments[-1].body'
/loop 2m /fix-ci $PR to monitor."
Use the Azure Test Results microservice at vstmr.dev.azure.com. This is a different hostname from the build APIs and serves test result data publicly for public projects.
curl -s "https://vstmr.dev.azure.com/questdb/questdb/_apis/testresults/resultsbybuild?buildId={buildId}&publishContext=CI&outcomes=Failed&\$top=200&api-version=5.2-preview.1"
No authentication or special headers needed. Returns all failed test results (~1-5KB):
[
{
"automatedTestName": "test[/sql/sample_by_fill.test]",
"automatedTestStorage": "io.questdb.test.sqllogictest.SqlTest",
"outcome": "Failed",
"runId": 795795,
"durationInMs": 1067.0,
"id": 100001,
"testCaseTitle": "test[/sql/sample_by_fill.test]"
}
]
Key fields:
- automatedTestName / testCaseTitle: the test name
- automatedTestStorage: the test class (e.g., io.questdb.test.sqllogictest.SqlTest)
- runId: which CI job run this failure came from
- outcome: always "Failed" given the filter

Deduplicate by test name: the same test can fail across multiple platforms (mac, windows, linux). Group by automatedTestName and collect the runIds to know which platforms failed.
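One way to do the grouping, assuming jq is installed and the response is the JSON array shown above. `dedupe_failed_tests` is a hypothetical name, and the tab-separated output layout is just a convenient choice.

```bash
# Read the resultsbybuild array on stdin and print one line per distinct test:
#   automatedTestName<TAB>automatedTestStorage<TAB>comma-separated runIds
dedupe_failed_tests() {
  jq -r '
    group_by(.automatedTestName)
    | .[]
    | [ .[0].automatedTestName,
        .[0].automatedTestStorage,
        (map(.runId) | unique | map(tostring) | join(",")) ]
    | @tsv'
}

# Usage:
#   curl -s "https://vstmr.dev.azure.com/questdb/questdb/_apis/testresults/resultsbybuild?buildId={buildId}&publishContext=CI&outcomes=Failed&\$top=200&api-version=5.2-preview.1" \
#     | dedupe_failed_tests
```

The runIds still need to be mapped back to platform names; this sketch only collects them per test.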
If this returns 0 results but Step 1b showed failures, fall back to Step 2b (log tail parsing). This can happen if the pipeline doesn't publish JUnit test results.
The vstmr endpoint returns test names but NOT errorMessage, stackTrace, or computerName. If AZURE_DEVOPS_PAT is set, enrich each failed test with full details.
Step 2 gives us runId per failed test. Use the authenticated test/runs/{runId}/results endpoint to get error details:
# For each unique runId from Step 2:
curl -s -u ":$AZURE_DEVOPS_PAT" \
-H "X-TFS-FedAuthRedirect: Suppress" \
"https://dev.azure.com/questdb/questdb/_apis/test/runs/{runId}/results?outcomes=Failed&api-version=7.0"
This returns full details per failed test:
- errorMessage: the assertion failure or exception message
- stackTrace: full Java stack trace
- automatedTestName, automatedTestStorage: test identity
- failingSince: when this test started failing
- failureType: type of failure

With this data, skip directly to Step 4 (classification).
Note: test/runs?buildId=... (listing runs by build) requires Build: Read scope and returns 403 with Test Management: Read alone. But test/runs/{runId}/results works with just Test Management: Read — and we already have runIds from the unauthenticated Step 2.
If no PAT is set, suggest the user create one for richer failure data:
To get error messages and stack traces without downloading logs, set AZURE_DEVOPS_PAT:
- Go to https://dev.azure.com/questdb/_usersSettings/tokens
- Click "New Token"
- Set scope: Test Management → Read (the only scope needed)
- Add to ~/.zshenv: export AZURE_DEVOPS_PAT=<token>

Without it, I can still see which tests failed but need to download log tails for error details.
Only suggest this once per session, and only if log tail parsing is actually needed (i.e., the test names alone aren't enough for classification).
Use this when Step 2 returns 0 results or when error messages are needed and no PAT is available.
Extract org, project GUID, buildId, jobId from each failed check's URL:
https://dev.azure.com/{org}/{project_guid}/_build/results?buildId={buildId}&view=logs&jobId={jobId}
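The four fields can be pulled out with a single sed expression. This sketch assumes buildId appears before jobId in the query string, as in the template above; `parse_azure_check_url` is a hypothetical helper name.

```bash
# Print "org project_guid buildId jobId" for an Azure check URL, or nothing
# if the URL does not match the expected shape.
parse_azure_check_url() {
  printf '%s\n' "$1" \
    | sed -n -E 's#^https://dev\.azure\.com/([^/]+)/([^/]+)/_build/results\?buildId=([0-9]+).*[?&]jobId=([^&]+).*$#\1 \2 \3 \4#p'
}

# Usage:
#   set -- $(parse_azure_check_url "$check_url")
#   org=$1 project_guid=$2 build_id=$3 job_id=$4
```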
All failed checks share the same buildId. Deduplicate: fetch the timeline only once per buildId.
curl -s "https://dev.azure.com/{org}/{project_guid}/_apis/build/builds/{buildId}/timeline?api-version=7.0"
No authentication needed (public project).
Parse the JSON response. Records form a tree: Stage -> Job -> Task. For each failed check's jobId:
- Find records where type == "Task" AND parentId == jobId AND result == "failed"
- Extract name and log.id from each failed task

curl -s "https://dev.azure.com/{org}/{project_guid}/_apis/build/builds/{buildId}/logs?api-version=7.0"
Response: {"value": [{"id": N, "lineCount": M, ...}, ...]}. Extract lineCount for each failed step's logId.
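Both lookups can be sketched with jq, assuming it is installed and that the responses have the `records` and `value` shapes described here. The function names `failed_task_logs` and `log_line_count` are hypothetical.

```bash
# Read the timeline JSON on stdin and print "logId<TAB>taskName" for each
# failed Task whose parentId is the given job id.
failed_task_logs() {
  local job_id=$1
  jq -r --arg job "$job_id" '
    .records[]
    | select(.type == "Task" and .parentId == $job and .result == "failed")
    | select(.log != null)            # skip tasks that produced no log
    | "\(.log.id)\t\(.name)"'
}

# Read the logs listing JSON on stdin and print lineCount for one log id.
log_line_count() {
  local log_id=$1
  jq -r --argjson id "$log_id" '.value[] | select(.id == $id) | .lineCount'
}
```

Fetch the timeline and the logs listing once per buildId, save them to temp files, and run these filters for every failed jobId.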
Maven Surefire writes the error summary at the END of its output. Use the line-range API to fetch only the last ~500 lines:
curl -s "https://dev.azure.com/{org}/{project_guid}/_apis/build/builds/{buildId}/logs/{logId}?api-version=7.0&startLine={lineCount - 500}&endLine={lineCount}"
This fetches ~50KB instead of ~200MB. Save to a temp file for parsing.
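The line range needs a guard for logs shorter than the tail size. A small sketch follows; `tail_range` is a hypothetical helper, and clamping the start to line 1 is an assumption about how the API treats out-of-range values.

```bash
# Print "startLine endLine" covering the last N lines of a log (default 500).
# Assumption: startLine should never drop below 1.
tail_range() {
  local line_count=$1
  local tail_size=${2:-500}
  local start=$(( line_count - tail_size ))
  if [ "$start" -lt 1 ]; then
    start=1
  fi
  echo "$start $line_count"
}

# Usage (URL placeholders as in the endpoint above):
#   set -- $(tail_range "$line_count")
#   curl -s "https://dev.azure.com/{org}/{project_guid}/_apis/build/builds/{buildId}/logs/{logId}?api-version=7.0&startLine=$1&endLine=$2" \
#     -o /tmp/ci-tail.txt
```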
Look for these patterns in the downloaded tail:
Test error summary (exceptions during test execution):
[ERROR] Errors:
[ERROR] ClassName.testMethod:lineNum->...chain... >> ExceptionType message
Test failure summary (assertion mismatches):
[ERROR] Failures:
[ERROR] ClassName.testMethod:lineNum expected:<X> but was:<Y>
Totals line:
[ERROR] Tests run: N, Failures: M, Errors: K, Skipped: L
Compilation error (different pattern entirely):
[ERROR] COMPILATION ERROR
[ERROR] /path/to/File.java:[line,col] error: ...
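The Errors: and Failures: summaries can be scanned with awk. This is a rough sketch under a few assumptions: each summary entry starts with ClassName.testMethod followed by a colon, a "Tests run:" line ends the section, and any prefix before "[ERROR] " (such as a log timestamp) can be ignored. `surefire_failed_tests` is a hypothetical name.

```bash
# Read a log tail on stdin and print "kind<TAB>ClassName.testMethod",
# where kind is "error" (Errors: section) or "failure" (Failures: section).
surefire_failed_tests() {
  awk '
    {
      idx = index($0, "[ERROR] ")
      if (idx == 0) next                 # only [ERROR] lines are of interest
      line = substr($0, idx + 8)         # text after the "[ERROR] " marker
    }
    line ~ /^Errors:/    { kind = "error";   next }
    line ~ /^Failures:/  { kind = "failure"; next }
    line ~ /^Tests run:/ { kind = "";        next }
    kind != "" && line ~ /^ *[A-Za-z_][A-Za-z0-9_$]*\.[A-Za-z_][A-Za-z0-9_]*:/ {
      sub(/^ */, "", line)               # drop indentation
      sub(/:.*/, "", line)               # keep ClassName.testMethod only
      print kind "\t" line
    }'
}

# Usage:
#   surefire_failed_tests < /tmp/ci-tail.txt | sort -u
```

Parameterized test names and compilation errors are not covered; those need their own patterns.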
For each failed test, extract:
- Class name (e.g., SqlParserTest, WindowFunctionTest)
- Line number (:lineNum in the chain)

The same test might fail on multiple platforms (mac-griffin, windows-griffin, linux-griffin). Group failures by ClassName.testMethod: if the error message is the same across platforms, it's one logical failure. Note which platforms are affected.
gh pr diff $PR
Parse to understand:
For each failed test group, determine the category:
Category A — Auto-fixable (ALL must hold):
- Failures: section, not Errors: section)
- guardAgainst..., throw SqlException blocks, early-return checks). If the PR removed a guard WITHOUT replacing the underlying functionality, the test likely exposes an unhandled code path, so escalate to Category B/D. Only auto-fix if the PR replaced the guarded code with a new implementation that handles the case.

The heuristic: "Did the PR replace the functionality, or just remove the gate?" Replacing → auto-fix. Removing gate without replacement → discuss.
Category B — Behavior precision:
Category C — Potential regression:
Category D — Potential incompleteness:
Category E — Non-test failure:
Find the test source file:
- io.questdb.test.griffin.FooTest -> search in core/src/test/java/**/FooTest.java

Get the expected vs actual values:

- assertEquals): use directly
- assertQueryNoLeakCheck): need to search the full log
- <<< FAILURE! line for this test method:
# Search in chunks of 50K lines from the end, looking for the test method name + FAILURE
curl -s "...logs/{logId}?startLine={lineCount-50000}&endLine={lineCount}" -o /tmp/ci-chunk.txt
grep -n "testMethodName" /tmp/ci-chunk.txt | grep -i "FAILURE\|ERROR\|expected\|but had"
Read the test method in the source file. Find the assertion call and its expected value.
Update the expected value to match the actual output. Use the Edit tool.
Report what was changed:
Auto-fixed: FooTest#testBar
- Updated expected output: [brief description of what changed]
- Reason: PR changed [behavior X], test expected old output
- Platforms affected: mac-griffin, windows-griffin, linux-griffin
Present a structured report. Group by category, within each category group by similarity.
## CI Failures: PR #NNN — [PR title]
Analyzed N failed jobs across M platforms.
### Auto-fixed (if any)
- `FooTest#testBar`: updated expected output — [description]
### Needs Discussion
#### Potential Regression (Category C)
Tests in areas NOT touched by this PR:
- `BarTest#testQux`: NullPointerException at SomeClass.java:42
Platforms: linux-griffin, mac-griffin
[Stack trace summary]
#### Potential Incompleteness (Category D)
New logic may not handle these cases:
- `WindowFunctionTest#testWindowAsArg`: SqlException "Window function is not allowed in context of aggregation"
The PR added [feature X] but these tests show queries that combine window functions with aggregation.
Platforms: all
#### Behavior Precision (Category B)
Connected to PR changes but need review:
- `SqlParserTest#testWindowFuncOrder`: expected query model differs from actual
The PR changed [parser behavior X]; this test may need updating or may reveal an unintended side effect.
#### Non-test Failures (Category E)
- Job `windows-cairo-2`: "Compile with Maven" step failed — compilation error in FooBar.java:123
After presenting, ask the user how to proceed with each group.
The test results microservice lives on a separate hostname. No authentication or special headers needed.
| Endpoint | Returns |
|---|---|
| .../resultsbybuild?buildId={id}&publishContext=CI&outcomes=Failed&$top=200&api-version=5.2-preview.1 | Array of failed test results: automatedTestName, automatedTestStorage, outcome, runId, durationInMs |
| .../resultdetailsbybuild?buildId={id}&publishContext=CI&groupBy=TestRun&$filter=Outcome eq Failed&shouldIncludeResults=true&queryRunSummaryForInProgress=false&api-version=5.2-preview.1 | Failed results grouped by test run, with counts per outcome |
Base: https://vstmr.dev.azure.com/questdb/questdb/_apis/testresults
Test Management: Read

| Endpoint | Returns |
|---|---|
| https://dev.azure.com/questdb/questdb/_apis/test/runs/{runId}/results?outcomes=Failed&api-version=7.0 | Full details: automatedTestName, errorMessage, stackTrace, failingSince, failureType |
Auth: curl -u ":$AZURE_DEVOPS_PAT". Note: test/runs?buildId=... (listing runs) needs Build: Read scope, but test/runs/{runId}/results works with just Test Management: Read since we get runIds from the unauthenticated vstmr call.
| Endpoint | Returns |
|---|---|
| https://dev.azure.com/questdb/{project_guid}/_apis/test/resultsummarybybuild?buildId={id}&api-version=7.0-preview | Aggregate counts: total, passed, failed, not executed |
Base: https://dev.azure.com/questdb/{project_guid}/_apis/build
The project GUID is embedded in check URLs. Parse it from there rather than hardcoding.
| Endpoint | Returns |
|---|---|
| /builds/{buildId}/timeline?api-version=7.0 | {records: [{id, parentId, type, name, result, state, log: {id}, order}]} |
| /builds/{buildId}/logs?api-version=7.0 | {value: [{id, lineCount, createdOn}]} |
| /builds/{buildId}/logs/{logId}?api-version=7.0&startLine=N&endLine=M | Plain text, lines N through M |
- Stage: pipeline stage (parent of Jobs)
- Job: a CI job (parent of Tasks), maps to a GitHub check
- Task: a step within a job, has log.id for log download

- result="succeeded" → success
- result="failed" → failure
- result="skipped" → skipped
- result="canceled" or "cancelled" → cancelled
- state="completed" with no result → success
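The result mapping fits in a single case statement. In this sketch `record_status` is a hypothetical helper, and the "pending" and "unknown" labels are assumptions added for inputs the mapping above does not cover.

```bash
# Map a timeline record's result and state to a status label.
# Usage: record_status "$result" "$state"
record_status() {
  local result=$1 state=$2
  case "$result" in
    succeeded)          echo success ;;
    failed)             echo failure ;;
    skipped)            echo skipped ;;
    canceled|cancelled) echo cancelled ;;
    "")
      # No result yet: completed counts as success, anything else is
      # treated as still running (assumption).
      if [ "$state" = "completed" ]; then echo success; else echo pending; fi ;;
    *)                  echo unknown ;;   # assumption: unlisted result values
  esac
}
```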