name	microshift-ci:doctor
argument-hint	<release1,release2,...>
description	Analyze CI for multiple MicroShift releases and produce an HTML summary
user-invocable	true
allowed-tools	Skill, Bash, Read, Write, Glob, Grep, Agent

microshift-ci:doctor

Synopsis

/microshift-ci:doctor <release1,release2,...>

Description

Accepts a comma-separated list of MicroShift release versions, runs analysis for each release and for open rebase PRs, and produces a single HTML summary file consolidating all results. Uses deterministic scripts for data collection, artifact download, aggregation, and HTML generation. LLM agents are used only for per-job root cause analysis and Jira bug correlation.

Arguments

<ARGUMENTS> (required): Comma-separated list of release versions (e.g., 4.19,4.20,4.21,4.22)

Work Directory

Compute once at the start by running date +%y%m%d and substituting into the path below. In all commands, replace <WORKDIR> with the computed path — do not use shell variables.

/tmp/microshift-ci-claude-workdir.<YYMMDD>

Implementation Steps

Step 1: Prepare — Collect and Download All Artifacts

Goal: Deterministically collect all failed jobs and download their artifacts before any LLM analysis.

Actions:

Determine today's <WORKDIR> by running date +%y%m%d and substituting into /tmp/microshift-ci-claude-workdir.<YYMMDD>. Use this value in all subsequent commands.

Run the prepare script:

bash plugins/microshift-ci/scripts/doctor.sh prepare --component microshift --workdir <WORKDIR> <ARGUMENTS> --rebase

The script deterministically:
- For each release: fetches failed periodic jobs, downloads artifacts, writes <WORKDIR>/analyze-ci-release-<version>-jobs.json
- For rebase PRs: fetches PRs with failures, downloads artifacts, writes <WORKDIR>/analyze-ci-prs-jobs.json and <WORKDIR>/analyze-ci-prs-status.json
- Outputs a JSON summary listing all releases, job counts, and file paths
Read the JSON output to know which releases have jobs to analyze and how many

Job JSON field names (use these exactly — do NOT guess alternatives like job_name):

job — full job name
build_id — unique build identifier
artifacts_dir — local path to downloaded artifacts
url — Prow job URL
status — job result (failure, FAILURE, SUCCESS, PENDING)
pr_number — PR number (PR jobs only)

Error Handling:

If <ARGUMENTS> is empty, show usage and stop
If a release has no failed jobs, its jobs JSON will be an empty array — skip analysis for that release
If a release has an "error" field in the JSON summary, data collection failed for that release — report the error to the user but continue with other releases

Step 1b: Generate PCP Performance Graphs

Goal: Generate performance graphs from PCP archives for all jobs that have pmlogs.

Actions:

Run the graphs script (this is deterministic, no LLM needed):

bash plugins/microshift-ci/scripts/doctor.sh graphs --component microshift --workdir <WORKDIR>

The script finds PCP archives in downloaded artifacts and generates PNG graphs at <WORKDIR>/graphs/<build_id>/:
- 1_cpu_usage.png — CPU usage (user, system, I/O wait)
- 2_mem_usage.png — Memory usage (used, cached)
- 3_disk_io.png — Disk I/O (read/write OPS, await)
- 4_disk_usage.png — Disk usage by partition (% fill)
If prerequisites are missing (pcp2json, matplotlib), the script errors and stops.

Step 2: Analyze Each Job Using /microshift-ci:prow-job

Goal: Get detailed root cause analysis for each failed job using pre-downloaded artifacts.

Actions:

Use the JSON summary output from Step 1 to build agent prompts. Do NOT read the job JSON files into the main conversation — the prepare script already printed all job details (artifacts_dir, build_id, job name) and agents receive artifacts_dir directly in their prompt.

For every failed job across all releases and PRs, launch a separate Agent (using the Agent tool, NOT the Skill tool). For PR jobs, only launch agents for jobs with FAILURE status.

For release jobs:

Agent: subagent_type=general_purpose, prompt="Analyze this Prow job and save the report:
1. Run /microshift-ci:prow-job <ARTIFACTS_DIR>
2. After the analysis completes, save the FULL report output (including the --- STRUCTURED SUMMARY --- block) to:
   <WORKDIR>/analyze-ci-release-<RELEASE>-job-<N>-<JOB_ID>.txt
   Use the Write tool to save the file. The file must contain the complete analysis report."

For PR jobs:

Agent: subagent_type=general_purpose, prompt="Analyze this Prow job and save the report:
1. Run /microshift-ci:prow-job <ARTIFACTS_DIR>
2. After the analysis completes, save the FULL report output (including the --- STRUCTURED SUMMARY --- block) to:
   <WORKDIR>/analyze-ci-prs-job-<N>-pr<PR>-<JOB_NAME_SUFFIX>.txt
   Use the Write tool to save the file. The file must contain the complete analysis report."

Launch ALL agents (all releases + PRs) in a single message as foreground agents (do NOT use run_in_background). Foreground agents in the same message run concurrently — this is just as fast as background agents but keeps your turn active until all complete.
Say "Analyzing N jobs in parallel..." in your message text alongside the Agent tool calls.
When all agents return, immediately proceed to Step 3 in the same turn. Do NOT stop or end your turn between Step 2 and Step 3.

Step 3: Run Bug Correlation (Dry-Run)

Goal: Search Jira for existing bugs matching each failure. Results are embedded in the HTML report.

Actions:

For each release version, launch microshift-ci:create-bugs in dry-run mode as an Agent:

Agent: subagent_type=general_purpose, prompt="Run /microshift-ci:create-bugs <version>"

If rebase PR analysis produced job files, also launch microshift-ci:create-bugs for rebase PRs (check the PR jobs JSON to identify rebase PR source identifiers like rebase-release-4.22):
```
Agent: subagent_type=general_purpose, prompt="Run /microshift-ci:create-bugs rebase-release-<version>"
```
Launch all create-bugs agents in a single message as foreground agents (do NOT use run_in_background). They run concurrently.
Each agent produces <WORKDIR>/analyze-ci-bugs-<source>.json (including open bugs data for the Bugs tab)
When all agents return, immediately proceed to Step 4 in the same turn. Do NOT stop or end your turn between Step 3 and Step 4.

Error Handling:

If create-bugs fails for a release, note the failure but do not block other releases or HTML generation

Step 4: Finalize — Aggregate and Generate HTML Report

IMPORTANT: This step is MANDATORY. The task is incomplete without it. You MUST run this even if previous steps produced errors.

Goal: Deterministically aggregate results and generate the HTML report.

Actions:

Run the finalize script:

bash plugins/microshift-ci/scripts/doctor.sh finalize --component microshift --workdir <WORKDIR> <ARGUMENTS>

The script deterministically:
- Runs aggregate.py for each release and for PRs → summary.json files
- Runs create-report.py → microshift-ci-doctor-report.html
Report the script's output to the user

Step 5: Report Completion

Actions:

Display the path to the generated HTML file
Summarize: failed job counts per release, rebase PR status, bug correlation results

Example Output:

Summary:
  Periodics:
    Release 4.19: 3 failed periodic jobs
    Release 4.20: ERROR - data collection failed
    Release 4.21: 0 failed periodic jobs
    Release 4.22: 12 failed periodic jobs
  Pull Requests:
    2 rebase PRs with 5 total failed jobs

HTML report generated: <WORKDIR>/microshift-ci-doctor-report.html

Examples

Example 1: Analyze Multiple Releases

/microshift-ci:doctor 4.19,4.20,4.21,4.22

Example 2: Analyze Two Releases

/microshift-ci:doctor 4.21,4.22

Example 3: Single Release (still produces HTML)

/microshift-ci:doctor 4.22

Prerequisites

gsutil CLI must be installed for GCS access (uses anonymous access on public buckets)
gh CLI must be authenticated with access to openshift/microshift
MCP Jira server must be configured (for bug correlation)
Internet access to fetch job data from Prow/GCS
Bash shell, Python 3
pcp-export-pcp2json — for PCP graph generation
matplotlib Python package — for PCP graph plotting

Related Skills

microshift-ci:prow-job: Single job analysis (used by Step 2 agents)
microshift-ci:create-bugs: Bug correlation and creation (used in Step 3; can also be run with --create after this command)
microshift-ci:doctor-refresh: Regenerate the HTML report from existing data (e.g., after /microshift-ci:create-bugs --create)

Notes

Deterministic scripts handle: data collection, artifact download, aggregation, HTML generation
LLM agents handle: per-job root cause analysis (Step 2), Jira bug search and open bugs query (Step 3)
/microshift-ci:doctor-refresh regenerates the HTML report from existing data. Use it after /microshift-ci:create-bugs --create to include newly created bugs
All agents (all releases + PRs) are launched in a single parallel wave — no per-release agents
The prepare script downloads all artifacts upfront so prow-job agents use local paths (no redundant downloads)
The finalize script runs aggregation and HTML generation in one call
All intermediate files use prescribed filenames in <WORKDIR> — no improvised names
The HTML report is self-contained (no external CSS/JS dependencies)
If a release analysis fails, it is noted in the report but does not block other releases
If no rebase PRs are open, the Pull Requests tab shows "No open rebase pull requests found"

microshift-ci-doctor

المزيد من هذا المستودع

المزيد من هذا المستودع