---
name: analyze-ci-failures
description: Analyze CircleCI/CI shell test failures for CUBRID PRs. Reads the failed TC list, fetches CI results, reads test scripts and answer files, categorizes failures by root cause, and generates a structured report. Use when CI tests fail and the user wants to understand why.
---
# CUBRID CI Test Failure Analyzer
Analyze failed shell test cases from CircleCI (or other CI) for CUBRID PRs. Produces a categorized report with root cause analysis and fix proposals.
## When to Use
- User says "analyze ci failures", "CI 실패 분석" (analyze CI failures), "왜 TC 실패했어" (why did the TC fail), or "failed tc 분석" (analyze failed TCs)
- User shares a CircleCI URL with test failures
- User has a `failed_tc.txt` or similar list of failed test cases
- User wants to understand why shell tests failed on a PR
## Arguments
- `/analyze-ci-failures` — Interactive: look for `failed_tc.txt` in the cwd
- `/analyze-ci-failures <circleci-url>` — Fetch failures from CircleCI
- `/analyze-ci-failures <file>` — Read the failure list from the specified file
## Inputs
The skill needs:
- Failed TC list: A file listing failed test case paths (e.g., `failed_tc.txt`), or a CircleCI URL to fetch from
- Test case directory: A directory containing the actual test scripts and answer files (e.g., `~/cubrid-testcases-private-ex`)
- Feature context: The branch/PR being tested (to understand what changes might cause failures)
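Before the steps below, it helps to know the expected shape of the failed TC list: plain text, one test case path per line, relative to the test case base directory. A minimal sketch, with invented paths:

```shell
# Hypothetical failed_tc.txt -- these TC paths are invented for
# illustration; real entries come from the CI run.
cat > failed_tc.txt <<'EOF'
shell/_01_utility/unloaddb/tc_unload_large_varchar
shell/_03_object/diagdb/tc_diag_heap_header
EOF
wc -l < failed_tc.txt   # 2 entries
```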
## Execution Steps
### Step 1: Gather Inputs
- Locate the failed TC list:
  - Check arguments for a file path or CircleCI URL
  - Check the cwd for `failed_tc.txt`
  - Ask the user if not found
- Identify the test case base directory:
  - Check if `~/cubrid-testcases-private-ex` exists
  - Check additional working directories
  - Ask the user if not found
- Identify the feature branch context:
  - `git branch --show-current`
  - `git log --oneline HEAD --not develop | head -20` to understand the feature changes
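The lookup order above can be sketched as a small helper; the candidate names and fallback behavior are this skill's defaults, not a fixed API:

```shell
# Sketch: resolve the failed TC list from an explicit argument first,
# then the cwd default, before falling back to asking the user.
locate_failed_list() {
  for candidate in "$@"; do
    if [ -n "$candidate" ] && [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # nothing found: ask the user
}

# Usage: locate_failed_list "$1" failed_tc.txt || echo "ask user for the list"
```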
### Step 2: Fetch CI Failure Details
If a CircleCI URL is provided, use an agent to fetch the actual test failure messages (diffs, error outputs). This provides the actual vs expected output, which is critical for root cause analysis.
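A minimal sketch of the fetch, assuming the v1.1 endpoint noted in the Tips section; the build number is hypothetical, and the `jq` filter reflects the v1.1 response shape, so verify it against real output once:

```shell
# Sketch: CircleCI API v1.1 works without credentials (v2 needs a token).
# BUILD_NUM is hypothetical; take it from the URL the user shared.
BUILD_NUM=12345
URL="https://circleci.com/api/v1.1/project/github/CUBRID/cubrid/${BUILD_NUM}/tests"
echo "$URL"
# Then fetch and list failures, e.g.:
#   curl -s "$URL" | jq -r '.tests[] | select(.result == "failure") | .name'
```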
### Step 3: Read All Failed Test Cases (Parallel)
For each failed TC:
- Read the test script (the `.sh` file in the `cases/` directory)
- Read the answer file (the `.answer` file — expected output)
- Read supporting SQL files (the `.sql` files used by the test)
- Note what the test does: data types involved, operations tested (CRUD, unload/load, copydb, diagdb, etc.)
Use parallel reads — launch all file reads at once since they're independent.
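One way to sketch the parallel read, assuming the layout described above (each TC directory holding `cases/*.sh` and `cases/*.answer`); the helper name is invented:

```shell
# Sketch: list every script and answer file for the failed TCs, then
# read them concurrently -- the files are independent, so order is free.
collect_tc_files() {
  base="$1"; list="$2"
  while IFS= read -r tc; do
    ls "$base/$tc"/cases/*.sh "$base/$tc"/cases/*.answer 2>/dev/null
  done < "$list"
}

# Usage:
#   collect_tc_files "$HOME/cubrid-testcases-private-ex" failed_tc.txt \
#     | xargs -P 8 -n 1 cat
```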
### Step 4: Analyze Feature Changes
Understand what the feature branch changes that could cause failures:
- Read key modified source files (use `git diff develop...HEAD --stat`)
- Identify behavioral changes:
- New file types or storage mechanisms
- Changed output formats (diagdb, show heap header, etc.)
- Disabled or stubbed functions
- New error codes or changed error messages
- Use explore agents in parallel for deeper code analysis if needed
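The source-file pass can be narrowed with a small filter; the extension list is an assumption about which files matter most here:

```shell
# Sketch: keep only C/C++ sources from the diff's file list, capped at 20.
key_sources() {
  grep -E '\.(c|cpp|h|hpp)$' | head -20
}

# Typical use against the feature branch (base branch develop, as above):
#   git diff develop...HEAD --name-only | key_sources
printf 'src/storage/heap_file.c\ndocs/notes.md\n' | key_sources
# prints: src/storage/heap_file.c
```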
### Step 5: Categorize and Analyze
For each failed TC, determine:
- Is it related to the feature? — Match test operations against feature changes
- Root cause hypothesis — Why the test fails given the feature changes
- Category — Group TCs by shared root cause
Common categories:
- Output format mismatch (answer file needs update)
- Disabled/stubbed functionality
- New storage path not handled by existing tool (unloaddb, copydb, etc.)
- Error code changes
- Timeout / CI flakiness
- Unrelated regression
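A first-pass categorizer can be sketched from the command patterns above; the category names and grep patterns are illustrative heuristics, not a fixed taxonomy, and real categorization still needs the CI diff and answer files:

```shell
# Sketch: guess a category from the commands a test script invokes.
categorize_tc() {
  script="$1"
  if grep -qE 'diagdb|spacedb|heap header' "$script"; then
    echo "output-format-sensitive"
  elif grep -qE 'unloaddb|loaddb|copydb' "$script"; then
    echo "full-record-resolution"
  else
    echo "needs-manual-review"
  fi
}

# Demo on a synthetic script:
demo=$(mktemp)
echo 'cubrid unloaddb -u dba testdb' > "$demo"
categorize_tc "$demo"   # prints: full-record-resolution
```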
### Step 6: Generate Report
Write a structured markdown report with:
# Failed TC Analysis Report: <branch> (<PR link>)
## Background
(Feature description, key behavioral changes)
## Category N: <Root Cause> (X TCs) — <OOS-related? / Feature-related?>
| # | TC | What it tests | Failure | Related? |
|---|-----|--------------|---------|----------|
| ... | ... | ... | ... | ... |
**Root cause analysis**: ...
**Proposed fix**: ...
## Summary
| Category | Count | Related? | Root Cause |
|----------|-------|----------|------------|
| ... | ... | ... | ... |
## Priority Actions
1. P0: ...
2. P1: ...
### Step 7: Save and Present
- Save the report to `failed_tc_report.md` in the project root (or a user-specified location)
- Print a concise summary with counts: X related, Y unrelated, Z total
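The closing summary can be as simple as one formatted line; the counts here are placeholders:

```shell
# Sketch: concise summary line printed after saving the report.
related=4; unrelated=2
printf '%d related, %d unrelated, %d total -- see failed_tc_report.md\n' \
  "$related" "$unrelated" "$((related + unrelated))"
```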
## Output Conventions
### Language
- Section headers (`##`): English
- Table content and analysis: English (technical report for a broad audience)
- Code, paths, function names: Keep as-is
### Style
- Tables for TC listings within each category
- Code blocks for call flow diagrams showing broken vs expected paths
- Bold for key findings and root causes
- Backticks for all code references
- Horizontal rules between major sections
## Tips
- When in doubt about source code behavior, use LSP (clangd) to analyze CUBRID C/C++ code. Use `lsp_hover` to check types, `lsp_goto_definition` to trace function implementations, `lsp_find_references` to understand call sites, and `lsp_diagnostics` to catch issues. This is especially useful when tracing how a changed function affects downstream callers.
- When fetching CircleCI results, use API v1.1 (not v2), since v2 requires authentication. For example, `https://circleci.com/api/v1.1/project/github/CUBRID/cubrid/<build_num>/tests` works without credentials.
- Always read the actual test script AND the answer file — the script tells you which operations are tested; the answer tells you what output is expected
- Look for data types that exceed storage thresholds (e.g., `varchar(20000)`, large JSON, CLOB/BLOB)
- Check for `diagdb`, `show heap header`, `cubrid spacedb` in test scripts — these are sensitive to storage format changes
- Check for `unloaddb`/`loaddb`/`copydb` — these require full record resolution
- TCs with no diff details from CI may need local reproduction to diagnose
- Group by root cause, not by symptom — multiple TCs often share a single underlying issue
## Mandatory: Iterate with Grill-and-Revise
Every CI failure report must go through `/grill-and-revise` before it is shared. Do not deliver single-pass triage: it drifts toward weak root-cause hypotheses, mis-categorized TCs, and unsupported "Related?" calls, and CI reports often drive merge or release decisions where mis-attribution is expensive.
This step is required, not optional. It applies to every report. No agent-side judgment — including size, scope, perceived triviality, or perceived risk — is a valid skip criterion. The only legitimate skip is when the user, in the message that triggered this skill, explicitly says "skip grill" or "don't grill this" (or unambiguous equivalent: "no grill", "skip the grill loop", "just push it"). If in doubt, do the grill loop.
How to hand off:
After saving the initial report to `failed_tc_report.md`, invoke `/grill-and-revise` with:
- Topic & purpose: CI failure analysis for `<branch>` / `<PR link>`; the audience is the PR author, QA, and CUBRID maintainers
- Output path: the same report file (the loop revises in place)
- Source material: the failed TC list, CircleCI output, the actual test scripts and answer files, the `git diff develop...HEAD` summary, and key modified source files
- Review angle: every "Related?" call is justified by a concrete behavioral change; root-cause hypotheses are testable (not hand-wavy); proposed fixes are concrete (file/function-level); categorization groups by underlying cause, not symptom; Priority Actions are actionable
- Round cap: default 5