---
name: build-health
description: 'Analyze VS Code rolling build health on Azure DevOps. Use when: the rolling build is red right now, you need a report for the last 100 builds, you need to identify the commit range that broke the build, you are on build champ duty, or you need a build health report.'
---
# Build Health
Quickly diagnose the VS Code rolling build (Pipeline 111) on Azure DevOps. This skill has two modes:
- Fix the build if it is red right now by finding where the red streak started and identifying the compare range from last green to first red.
- Analyze and present the last 100 builds in a predictable report file.
The report file is the primary artifact. Generate it first, summarize what it shows, and only then ask whether the user wants heuristic culprit analysis.
## When to Use
- The rolling build on `main` is red and you need to find where it broke
- You need a stable report for the last 100 builds
- You're the build champ and want a quick health overview
- You need to separate recurring infra failures from likely code regressions
- You want to trace a build break back to a narrow compare range
## Output Contract
Always produce a markdown report file before presenting conclusions.
- Default output directory: `/tmp/build-health`
- Default report path: `/tmp/build-health/build-health-report.md`
- Report format: markdown generated by `analyze-builds.mjs --format markdown --report ...`
The report must contain these sections:
- Current Status
- Build Table (sorted newest → oldest)
- Incidents (sorted newest → oldest, with Incident #1 being the most recent)
- Top Failure Reasons
- Suggested Next Step
Use the chat reply to summarize the report, not to replace it.
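Before summarizing, you can confirm that the report meets this contract with a quick check. This is a minimal sketch. It assumes each required section appears as a markdown heading (for example `## Incidents`). The helper name `check_report_sections` is hypothetical and is not part of the skill's scripts.

```bash
# Hypothetical helper: verify that a generated report contains every required
# section. Assumption: sections are markdown headings such as "## Incidents".
check_report_sections() {
  local report="$1" missing=0 section
  if [ ! -f "$report" ]; then
    echo "report not found: $report" >&2
    return 1
  fi
  for section in "Current Status" "Build Table" "Incidents" \
                 "Top Failure Reasons" "Suggested Next Step"; do
    # Match a heading line of any level that starts with the section name.
    if ! grep -Eq "^#+ +${section}" "$report"; then
      echo "missing section: $section" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_report_sections "$REPORT_FILE" || echo "regenerate the report before summarizing"`.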
## Prerequisites
- Azure CLI (`az`) installed and authenticated (`az login`)
- Node.js on `PATH`
- Network access to `dev.azure.com` (the fetch script calls Azure DevOps REST APIs)
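Here is a minimal preflight sketch for these prerequisites. `preflight` is a hypothetical helper. It checks that `az` and `node` are on `PATH` and that `az account show` succeeds, which it does not when no login is active. It does not probe network access to `dev.azure.com`.

```bash
# Hypothetical preflight check for the prerequisites listed above.
# Returns non-zero and prints the reason to stderr if something is missing.
preflight() {
  local ok=0 tool
  for tool in az node; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing on PATH: $tool" >&2
      ok=1
    fi
  done
  # Only query the login state once we know the az binary exists.
  if [ "$ok" -eq 0 ] && ! az account show >/dev/null 2>&1; then
    echo "az is installed but not logged in; run: az login" >&2
    ok=1
  fi
  return "$ok"
}
```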
## Shared Setup
The scripts live inside this skill directory at `<skill-dir>/scripts/`. Always invoke them by absolute path. Derive `<skill-dir>` from the absolute path of this `SKILL.md`.
Use these defaults unless the user asks for something else:
```bash
OUT_DIR=/tmp/build-health
REPORT_FILE="$OUT_DIR/build-health-report.md"
```
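The sketch below shows one way to derive `<skill-dir>` while keeping the defaults overridable by the caller. `skill_dir_from` and the `SKILL_MD` variable in the usage line are hypothetical names. The sketch only assumes that you already know the absolute path of this file.

```bash
# Hypothetical helper: the skill directory is the parent of SKILL.md.
# Rejects relative paths, since the scripts must be invoked by absolute path.
skill_dir_from() {
  local skill_md="$1"
  case "$skill_md" in
    /*) dirname "$skill_md" ;;
    *)  echo "expected an absolute path, got: $skill_md" >&2; return 1 ;;
  esac
}

# Keep any values the caller already set; otherwise use the documented defaults.
OUT_DIR="${OUT_DIR:-/tmp/build-health}"
REPORT_FILE="${REPORT_FILE:-$OUT_DIR/build-health-report.md}"
```

Usage: `SKILL_DIR="$(skill_dir_from "$SKILL_MD")" && bash "$SKILL_DIR/scripts/fetch-builds.sh" --count 100 --out "$OUT_DIR"`.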
## 1. Fetch Build Data
Run the fetch script from this skill directory. It downloads builds, timelines for failed builds, and log tails for failing test/compile tasks — all in parallel batches.
```bash
bash <skill-dir>/scripts/fetch-builds.sh --count 100 --out "$OUT_DIR"
```
Options:
- `--count N`: Number of recent builds to fetch (default: 100)
- `--out DIR`: Output directory (default: `./build-data`)
- `--pipeline ID`: Pipeline definition ID (default: 111)
- `--branch NAME`: Branch to filter (default: `main`)
Run this in a terminal with `mode=sync` and a generous timeout (e.g. 300000 ms, which is 5 minutes). The script needs network access, so request unsandboxed execution if sandboxing is enabled.
## 2. Analyze the Data
Once the data is downloaded, always generate the markdown report file first:
```bash
node <skill-dir>/scripts/analyze-builds.mjs "$OUT_DIR" --format markdown --report "$REPORT_FILE"
```
This runs entirely offline against the downloaded data and produces a predictable artifact that the user can open and consume directly.
If you need a terminal-friendly version for yourself while working, optionally run:
```bash
node <skill-dir>/scripts/analyze-builds.mjs "$OUT_DIR" --format text
```
The markdown report includes:
- Per-build status — Each build with pass/fail, failure reasons, error excerpts, and commit links
- Break/fix transitions — When the build went red, when it recovered, how long each incident lasted
- Error details — Actual error messages from test logs (not just "exited with code 1")
- Commit links — GitHub compare URLs between the last green and first red build
- Summary — Overall success rate, top failure reasons, current build status
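The compare links use GitHub's `compare/BASE...HEAD` URL form. You may need to rebuild one by hand, for example after fetching a larger window. A sketch like the one below works for that. It assumes the rolling build tracks `microsoft/vscode`, and `compare_url` is a hypothetical helper.

```bash
# Hypothetical helper: GitHub compare URL from the last green commit (base)
# to the first red commit (head). Optional third argument overrides owner/repo.
compare_url() {
  local last_green="$1" first_red="$2" repo="${3:-microsoft/vscode}"
  if [ -z "$last_green" ] || [ -z "$first_red" ]; then
    echo "need both the last green and first red commit SHAs" >&2
    return 1
  fi
  # The three-dot form lists the commits on first_red that are not on last_green.
  echo "https://github.com/${repo}/compare/${last_green}...${first_red}"
}
```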
## Workflow 1: Build Is Red Right Now
- Fetch the last 100 builds and generate the markdown report file.
- Read the report first. Do not guess the culprit yet.
- Summarize these points in chat:
  - Whether the latest build is still red
  - Which build was the first red build in the current incident
  - The dominant failure pattern from the Incidents section
  - The compare range from last green to first red, if available
  - The report path
- If the oldest build in the current report is already red, fetch a larger window (a higher `--count`) before attempting commit-range analysis.
- Only after the summary, ask the user whether they want culprit analysis across the compare range.
Use language like:

> I generated `/tmp/build-health/build-health-report.md`. The current red incident starts at build X, the dominant failure pattern is Y, and the compare range is Z. Do you want me to continue with heuristic culprit analysis across that compare range?
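The sketch below cross-checks the incident boundaries from the steps above. It assumes a hypothetical input with one completed build per line, newest first, written as `<buildId> <result> <commitSha>`, using the Azure DevOps result values (`succeeded`, `partiallySucceeded`, `failed`, `canceled`). It skips `canceled` runs and treats every other non-`succeeded` result as red. Those rules are assumptions and may not match the rules `analyze-builds.mjs` uses.

```bash
# Hypothetical input, newest first:  <buildId> <result> <commitSha>
# Exit codes: 0 = streak found (or latest build is green), 1 = no usable input,
#             2 = every build in the window is red, so fetch a larger window.
find_red_streak() {
  local id result sha first_red="" first_red_sha=""
  while read -r id result sha; do
    [ -n "$id" ] || continue            # ignore blank lines
    case "$result" in
      canceled)
        continue ;;                     # assumption: canceled runs neither break nor fix
      succeeded)
        if [ -z "$first_red" ]; then
          echo "green: latest completed build $id succeeded"
        else
          echo "first_red=$first_red last_green=$id range=${sha}...${first_red_sha}"
        fi
        return 0 ;;
      *)
        # Walking backwards in time, the most recent red seen here is the
        # earliest red build of the current incident so far.
        first_red="$id"
        first_red_sha="$sha" ;;
    esac
  done
  if [ -n "$first_red" ]; then
    echo "unbounded: oldest build in window ($first_red) is red" >&2
    return 2
  fi
  echo "no completed builds in input" >&2
  return 1
}
```

Exit code 2 corresponds to the "fetch a larger window" step. Widen `--count` before trusting any compare range.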
### If the user says yes to culprit analysis
Treat culprit analysis as reasoned triage, not as fact.
- Start from the compare range in the report.
- Cross-check the first error and dominant failure pattern.
- Distinguish likely code regressions from likely infra failures.
- If the evidence points to code, rank the most likely suspect commits and explain why each one matches the failure pattern.
- If the evidence points to infra, say that clearly and avoid inventing a culprit commit.
Keep the output explicit that this is heuristic reasoning.
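For likely code regressions, one mechanical starting point is to list only the commits in the range that touched files related to the failure, and reason about those first. This sketch uses plain `git log` with a revision range and a pathspec. `suspects_touching` is a hypothetical helper, and it assumes a local clone that contains both SHAs. It narrows the candidate list but does not rank commits or prove anything.

```bash
# Hypothetical helper: commits after last_green up to and including first_red
# that modified the given paths.
# Usage: suspects_touching <repo-dir> <last-green-sha> <first-red-sha> <path>...
suspects_touching() {
  local repo="$1" last_green="$2" first_red="$3"
  shift 3
  if [ "$#" -eq 0 ]; then
    echo "pass at least one path related to the failure" >&2
    return 1
  fi
  # A..B selects commits reachable from B but not from A; the pathspec
  # after "--" keeps only commits that changed those paths.
  git -C "$repo" log --format='%h %an %s' "${last_green}..${first_red}" -- "$@"
}
```

Pass the failing test file, or its directory, as the pathspec.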
## Workflow 2: Analyze the Last 100 Builds
- Fetch the last 100 builds and generate the markdown report file.
- Use the report to summarize:
  - Current build status
  - Total incidents and ongoing incidents
  - Top recurring failure reasons
  - Long or noisy incidents
  - Whether failures look like infra churn or specific product regressions
- Point the user to the report path and call out the most useful sections.
In this mode, do not jump into culprit analysis unless the user asks for it.
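To cross-check the report's headline numbers, the sketch below counts completed builds, green builds, the success rate, and the number of separate red streaks. It assumes a hypothetical input with one completed build per line, newest first, written as `<buildId> <result> <commitSha>`. It skips `canceled` runs and counts each consecutive run of non-`succeeded` builds as one streak. The report's own incident definition may differ.

```bash
# Hypothetical input, newest first:  <buildId> <result> <commitSha>
# Prints completed/green counts, integer success rate, and red streak count.
summarize_window() {
  local id result sha total=0 green=0 streaks=0 in_red=0
  while read -r id result sha; do
    [ -n "$id" ] || continue
    [ "$result" = canceled ] && continue   # assumption: canceled runs are ignored
    total=$((total + 1))
    if [ "$result" = succeeded ]; then
      green=$((green + 1))
      in_red=0
    elif [ "$in_red" -eq 0 ]; then
      # Entering a new run of red builds (reading newest first).
      streaks=$((streaks + 1))
      in_red=1
    fi
  done
  if [ "$total" -eq 0 ]; then
    echo "no completed builds in input" >&2
    return 1
  fi
  echo "completed=$total green=$green success_rate=$((100 * green / total))% red_streaks=$streaks"
}
```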
## 3. Common Failure Patterns
| Pattern | Typical Error | Action |
|---|---|---|
| Electron Tests failing on one platform | Test assertion or timeout | Check if the failing test was touched in recent commits |
| Electron Tests failing on ALL platforms | `Could not fetch releases from update server` | Update server issue; usually self-resolves |
| Linux Alpine (ARM64) | Install dependencies timeout | Agent pool saturation; wait or escalate to infra |
| Remote Tests timing out | The task has timed out after Data Loss tests | Remote test infra issue |
| Copilot sanity tests | `AssertionError: ok(provider)` | Copilot extension registration issue; check recent Copilot extension changes |
| Publish Build | Retry failures | Artifact upload infra issue |
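The table can also serve as a first-pass triage rule. The sketch below is a hypothetical `classify_failure` helper that matches a failing job name and error excerpt against the rows above. The error strings come from the table. The job-name substrings (`Alpine`, `ARM64`, `Remote`, `Publish`, `Electron`) are assumptions about how jobs are named. Anything the helper cannot place still needs a human look at the log.

```bash
# Hypothetical first-pass triage mirroring the Common Failure Patterns table.
# Usage: classify_failure "<job name>" "<error excerpt>"
classify_failure() {
  local job="$1" err="$2"
  if [[ "$err" == *"Could not fetch releases from update server"* ]]; then
    echo "update-server: infra, usually self-resolves"
  elif [[ "$err" == *"AssertionError: ok(provider)"* ]]; then
    echo "copilot-registration: check recent Copilot extension changes"
  elif [[ "$job" == *Alpine* && "$job" == *ARM64* && "$err" == *[Tt]ime*[Oo]ut* ]]; then
    echo "agent-pool: wait or escalate to infra"
  elif [[ "$job" == *Remote* && "$err" == *"timed out"* ]]; then
    echo "remote-test-infra"
  elif [[ "$job" == *Publish* ]]; then
    echo "artifact-upload-infra"
  elif [[ "$job" == *Electron* ]]; then
    # A single-platform Electron failure may be a real regression.
    echo "possible-regression: check whether the failing test changed recently"
  else
    echo "unclassified: read the log excerpt"
  fi
}
```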
## Notes
- The fetch script is incremental: re-running it skips already-downloaded timelines and logs
- Timelines are only downloaded for failed/partial builds (not green ones) to save time
- Log files contain the last 100 lines of the failing task — usually sufficient to see the error
- The analysis script groups failures by job name, so you can quickly see if one job is responsible for many incidents
- If the report window starts in the middle of an incident, expand the fetch range before doing commit-range analysis
- The markdown report is the stable handoff artifact; the chat summary should stay short and decision-oriented
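The first two notes describe a simple rule: fetch a timeline only for failed or partially succeeded builds, and skip files that are already on disk. Below is a minimal sketch of that decision. It assumes one timeline file per build at a path you choose. The helper name is hypothetical, and this is not the fetch script's actual code.

```bash
# Hypothetical helper: exit 0 means "download this build's timeline".
# Usage: should_fetch_timeline <result> <timeline-file>
should_fetch_timeline() {
  local result="$1" timeline_file="$2"
  case "$result" in
    failed|partiallySucceeded) ;;
    *) return 1 ;;  # succeeded, canceled, etc.: assumed not worth fetching
  esac
  # -s is true for an existing, non-empty file, so a re-run skips it.
  [ ! -s "$timeline_file" ]
}
```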