| name | ci-diagnostics |
| description | Diagnose Proton CI failures and performance comparison results from GitHub checks and uploaded reports. Make sure to use this skill whenever CI checks are mentioned, a PR has red or failing checks, or the user pastes a CI URL or asks about test failures in the pipeline, even if they just ask 'why is CI failing'. |
CI Diagnostics
Inputs
- PR number or URL
- optional check name substring
- optional direct report URL
First step: collect check status
For a PR:
gh pr view "$PR" --json title,body,url
gh pr checks "$PR"
REPO=$(gh repo view --json nameWithOwner --jq .nameWithOwner)
SHA=$(gh pr view "$PR" --json commits --jq '.commits[-1].oid')
gh api "repos/$REPO/commits/$SHA/status"
Use the commit status payload to find:
- failing or pending contexts
- target_url links for uploaded HTML reports, raw logs, or performance artifacts
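The filtering step above can be sketched as a small helper that reads the status payload on stdin and prints one tab-separated line per non-successful context. This is only a sketch: it assumes jq is installed and that the payload follows GitHub's combined status shape (a statuses array whose entries carry state, context, and target_url). The function name non_success_contexts is made up for illustration.

```shell
# Hypothetical helper: list non-successful contexts from a combined-status payload.
# Assumes jq is installed and the payload has a .statuses[] array with
# state, context, and target_url fields.
non_success_contexts() {
  jq -r '.statuses[]
         | select(.state != "success")
         | [.state, .context, (.target_url // "")]
         | @tsv'
}

# Usage (needs an authenticated gh session):
#   gh api "repos/$REPO/commits/$SHA/status" | non_success_contexts
```

Each output line has the form state, context, target_url, so the last column can be opened directly as the report or log link.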
Proton report layout
CI uploads reports under:
<pr-number>/<commit-sha>/<normalized-check-name>...
The normalization logic is defined in:
To normalize a check name, lowercase it, then apply the replacements for spaces, (, ), and , (comma).
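A minimal sketch of the normalization and of the resulting report prefix follows. The replacement character is an assumption: this document does not say what spaces, (, ), and , are replaced with, so the sketch uses an underscore for each. Confirm against the real normalization logic before relying on it. Both function names are hypothetical.

```shell
# Sketch of check-name normalization.
# ASSUMPTION: each space, "(", ")" and "," becomes "_"; the real
# replacement character is not stated in this document.
normalize_check_name() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr ' (),' '____'
}

# Build the relative prefix <pr-number>/<commit-sha>/<normalized-check-name>.
# Arguments: PR number, commit SHA, raw check name.
report_prefix() {
  printf '%s/%s/%s' "$1" "$2" "$(normalize_check_name "$3")"
}

# Usage:
#   report_prefix "$PR" "$SHA" "Unit Tests"
```

The prefix is relative on purpose, since the host that serves the reports is not given here; take it from an existing target_url.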
Failure triage workflow
- List failing contexts from gh pr checks or commit statuses.
- Open each target_url report first.
- If the report is sparse, inspect the linked raw log.
- For test reports, summarize:
  - failing test names
  - first common error signature
  - whether the failure looks deterministic, flaky, infra, or environment-specific
- Map failures back to touched areas in the diff.
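When a report is sparse and the raw log is long, one way to find the first common error signature is to rank error-looking lines by frequency. The helper below is a rough sketch: the match pattern (Exception, ERROR, FAILED) and the timestamp format it strips are assumptions about what the logs look like, and the function name is hypothetical.

```shell
# Hypothetical helper: rank the most frequent error-looking lines in a raw log.
# Reads the log on stdin; optional $1 is how many signatures to print (default 5).
# ASSUMPTIONS: errors contain "Exception", "ERROR" or "FAILED", and lines start
# with a timestamp like "2024-01-02 10:00:00.123", which is stripped so that
# repeated errors collapse into a single signature.
top_error_signatures() {
  grep -E 'Exception|ERROR|FAILED' \
    | sed -E 's/^[0-9]{4}[-.][0-9]{2}[-.][0-9]{2}[ T][0-9:.]+ *//' \
    | sort | uniq -c | sort -k1,1nr | head -n "${1:-5}"
}

# Usage, with the raw log saved locally as raw.log:
#   top_error_signatures 3 < raw.log
```

A signature that repeats across many tests points toward an infra or environment cause, while one confined to a single test is more likely a code bug or a flaky test.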
Performance comparison workflow
Performance comparison runs upload these artifacts:
- report.html
- all-queries.html
- all-query-metrics.tsv
- queries.rep
- images/flamegraphs
If you have a report.html URL, inspect sibling artifacts by replacing the filename in the same prefix.
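The filename swap can be done with plain shell parameter expansion. This is a small sketch with a hypothetical function name; it assumes the URL ends in the filename, with no query string containing a slash.

```shell
# Swap the filename at the end of a report URL for a sibling artifact name.
# Arguments: report URL, sibling filename.
sibling_artifact_url() {
  # ${1%/*} drops the last path segment (for example report.html).
  printf '%s/%s' "${1%/*}" "$2"
}

# Usage, with REPORT_URL holding a report.html link taken from a target_url:
#   sibling_artifact_url "$REPORT_URL" all-query-metrics.tsv
```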
When reviewing perf results:
- start from the summary in report.html
- inspect all-query-metrics.tsv for the biggest client_time regressions
- distinguish broad regressions from a few outlier queries
- correlate with touched execution paths, joins, windows, aggregations, or storage reads
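To pull the biggest client_time regressions out of all-query-metrics.tsv, something like the following may help. The column layout is an assumption, since this document does not describe the file: the sketch treats column 1 as the metric name and column 4 as the relative difference, with positive values meaning slower. Check a few real rows and adjust the column numbers before trusting the ranking.

```shell
# Hypothetical helper: show the largest client_time regressions in the TSV.
# Arguments: path to the TSV, optional number of rows (default 10).
# ASSUMED layout: column 1 = metric name, column 4 = relative diff
# (positive = slower). Verify against real rows first.
top_client_time_regressions() {
  tab=$(printf '\t')
  awk -F'\t' '$1 == "client_time"' "$1" \
    | LC_ALL=C sort -t "$tab" -k4,4nr \
    | head -n "${2:-10}"
}

# Usage:
#   top_client_time_regressions all-query-metrics.tsv 20
```

If the top rows share one test or query family, the regression is likely narrow; if large diffs are spread across many unrelated tests, treat it as broad.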
Output expectations
Always report:
- failing checks
- best report/log URL for each failing check
- likely failure class: code bug, flaky test, infra, dependency, or timeout/resource limit
- smallest next debugging action
For performance changes also report:
- whether the regression is broad or narrow
- the most affected workload family
- whether more local benchmarking is needed before code changes