// Autonomously run quality checks (detekt, Gradle tests) to establish baseline or detect regressions. Supports three modes (capture, compare, auto) for comprehensive regression detection. Triggers before starting development work, after completing changes, when checking code quality, or when explicitly requested. Compares current metrics against previous baseline to identify improvements or regressions in code quality and test coverage.
| name | baseline-check |
| description | Autonomously run quality checks (detekt, Gradle tests) to establish baseline or detect regressions. Supports three modes (capture, compare, auto) for comprehensive regression detection. Triggers before starting development work, after completing changes, when checking code quality, or when explicitly requested. Compares current metrics against previous baseline to identify improvements or regressions in code quality and test coverage. |
| allowed-tools | Bash, Read, Write, Glob |
This skill maintains code quality throughout development by:
Invoke this skill:
The skill supports three execution modes for different scenarios:
Purpose: Establish a quality baseline
When to use:
Command:
./.claude/skills/baseline-check/baseline-check.sh --capture [--issue SPI-XXX]
What it does:
./gradlew clean check --rerun-tasks (comprehensive, cache-proof execution).claude/baseline/Output:
.claude/baseline/{branch}-{timestamp}.json.claude/baseline/{branch}-{timestamp}.log.claude/baseline/latest-{branch}.jsonPurpose: Compare current state against a specific baseline
When to use:
Command:
# Auto-detect latest baseline for current branch
./.claude/skills/baseline-check/baseline-check.sh --compare
# Compare against specific baseline file
./.claude/skills/baseline-check/baseline-check.sh --compare .claude/baseline/main-20241120-100000.json
What it does:
Output:
═══════════════════════════════════════════════════════════
BASELINE vs CURRENT COMPARISON
═══════════════════════════════════════════════════════════
BASELINE:
• Total: 861 | Passed: 861 | Failed: 0 | Skipped: 0
CURRENT:
• Total: 873 | Passed: 868 | Failed: 5 | Skipped: 0
DELTA (Current - Baseline):
• Total: +12 | Passed: +7 | Failed: +5 | Skipped: 0
───────────────────────────────────────────────────────────
INTERPRETATION
───────────────────────────────────────────────────────────
✅ 12 new tests added
❌ 5 NEW test failures introduced
📋 Analyze if failures are:
- Expected (testing edge cases, deprecated endpoints)
- Bugs requiring fixes
- Pre-existing issues incorrectly attributed
═══════════════════════════════════════════════════════════
Purpose: Capture baseline and automatically compare if previous exists
When to use:
Command:
./.claude/skills/baseline-check/baseline-check.sh --auto [--issue SPI-XXX]
What it does:
Decision logic:
# Capture baseline (default mode)
./.claude/skills/baseline-check/baseline-check.sh
./.claude/skills/baseline-check/baseline-check.sh --capture
./.claude/skills/baseline-check/baseline-check.sh --capture --issue SPI-866
# Compare against baseline
./.claude/skills/baseline-check/baseline-check.sh --compare
./.claude/skills/baseline-check/baseline-check.sh --compare .claude/baseline/main-20241120-100000.json
# Autonomous mode (capture + optional compare)
./.claude/skills/baseline-check/baseline-check.sh --auto
./.claude/skills/baseline-check/baseline-check.sh --auto --issue SPI-866
# Adjust timeout (default: 300 seconds)
./.claude/skills/baseline-check/baseline-check.sh --capture --timeout 600
# Show help
./.claude/skills/baseline-check/baseline-check.sh --help
The skill generates structured JSON for programmatic access:
{
"timestamp": "2025-11-20T14:30:22Z",
"branch": "feat/spi-866-baseline-enhancement",
"commit": "abc123d",
"issue_id": "SPI-866",
"checks": {
"detekt": {
"available": true,
"status": "passed",
"total_offenses": 5,
"offenses_by_severity": {
"convention": 2,
"warning": 3,
"error": 0
},
"execution_time": 8
},
"tests": {
"available": true,
"status": "passed",
"total": 873,
"passed": 873,
"failures": 0,
"skipped": 0,
"execution_time": 145
}
},
"overall_status": "passed",
"summary": "All quality checks passed"
}
.claude/baseline/
├── main-20241120-100000.json # Metrics JSON
├── main-20241120-100000.log # Full console log
├── main-20241120-110000.json # Later baseline
├── main-20241120-110000.log # Later log
├── latest-main.json -> main-20241120-110000.json # Symlink to latest
├── feat-spi-866-20241120-120000.json # Feature branch baseline
├── feat-spi-866-20241120-120000.log # Feature branch log
└── latest-feat-spi-866.json # Feature branch latest symlink
Key features:
--issue flag for Linear integrationThe skill uses these indicators in comparison reports:
New Tests Added (Positive):
Delta Total > 0 indicates new test coverageTests Fixed (Positive):
Delta Failures < 0 indicates previously failing tests now passRegressions (Critical):
Delta Failures > 0 indicates NEW test failures introducedTests Removed (Warning):
Delta Total < 0 may indicate deleted tests (review needed)When regressions are detected, the report provides:
# Agent invokes before starting work
./.claude/skills/baseline-check/baseline-check.sh --capture --issue SPI-866
# Output shows baseline established
# Agent proceeds with implementation
# After implementation, agent runs compare mode
Agent prompt:
Before starting SPI-866, I'll establish a quality baseline to track regressions.
Running baseline check...
✓ Baseline captured: 861/861 tests passing, 5 detekt offenses
✓ Saved to: .claude/baseline/main-20241120-140000.json
Proceeding with implementation...
# Agent invokes after completing work
./.claude/skills/baseline-check/baseline-check.sh --compare
# Output shows delta analysis
# Agent reports findings to user
Agent prompt:
Implementation complete. Running regression check against baseline...
═══════════════════════════════════════════════════════════
BASELINE vs CURRENT COMPARISON
═══════════════════════════════════════════════════════════
BASELINE:
• Total: 861 | Passed: 861 | Failed: 0 | Skipped: 0
CURRENT:
• Total: 873 | Passed: 873 | Failed: 0 | Skipped: 0
DELTA (Current - Baseline):
• Total: +12 | Passed: +12 | Failed: 0 | Skipped: 0
───────────────────────────────────────────────────────────
INTERPRETATION
───────────────────────────────────────────────────────────
✅ 12 new tests added
═══════════════════════════════════════════════════════════
✓ No regressions detected
✓ Quality improved: 12 new tests for new functionality
✓ Ready for code review
# Agent uses auto mode for continuous monitoring
./.claude/skills/baseline-check/baseline-check.sh --auto
# First run: only captures
# Subsequent runs: captures + compares
Agent prompt (first run):
Running autonomous quality check...
✓ Baseline captured: 861/861 tests passing
ℹ️ No previous baseline for comparison (first run on this branch)
Baseline established for future comparisons.
Agent prompt (subsequent run):
Running autonomous quality check...
✓ Current: 873/873 tests passing
✓ Previous baseline detected, performing comparison...
✅ 12 new tests added
✅ No regressions detected
Quality maintained!
# After changes, comparison detects regressions
./.claude/skills/baseline-check/baseline-check.sh --compare
Agent prompt:
Running regression check...
═══════════════════════════════════════════════════════════
BASELINE vs CURRENT COMPARISON
═══════════════════════════════════════════════════════════
BASELINE:
• Total: 873 | Passed: 873 | Failed: 0 | Skipped: 0
CURRENT:
• Total: 885 | Passed: 880 | Failed: 5 | Skipped: 0
DELTA (Current - Baseline):
• Total: +12 | Passed: +7 | Failed: +5 | Skipped: 0
───────────────────────────────────────────────────────────
INTERPRETATION
───────────────────────────────────────────────────────────
✅ 12 new tests added
❌ 5 NEW test failures introduced
📋 Analyze if failures are:
- Expected (testing edge cases, deprecated endpoints)
- Bugs requiring fixes
- Pre-existing issues incorrectly attributed
═══════════════════════════════════════════════════════════
⚠️ REGRESSIONS DETECTED
I've detected 5 new test failures. Let me analyze the failure logs...
[Agent reviews .claude/baseline/*.log file]
The failures are in SessionManagerTest. These appear to be legitimate bugs
introduced by the recent changes. I recommend:
1. Review the failing tests in the log file
2. Fix the SessionManager implementation
3. Re-run baseline check to verify fixes
Should I help investigate the failures?
baseline-check plays a critical role in TDD workflow:
During GREEN Phase:
During REFACTOR Phase:
Quality Gate:
--issue SPI-XXXlatest-{branch}.json./gradlew clean check --rerun-tasks--rerun-tasks forces fresh execution, no FROM-CACHE false positives--timeout.claude/baseline/ (should be in .gitignore)The skill succeeds when it:
The skill fails when:
Cause: Running --compare without a previous baseline
Solution: Run --capture first, or use --auto mode
Cause: timeout command not available on macOS
Solution: Install coreutils (brew install coreutils) or script falls back to no timeout
Cause: Gradle output format changed or tests didn't run
Solution: Script automatically falls back to XML parsing
Cause: Intentional breaking changes or deprecated endpoint tests
Solution: Document expectations in PR, consider test-specific skips if appropriate