| name | bisect |
| description | Use this skill to find which commit introduced a bug or regression. Uses git bisect with automated build and test. Trigger when a bug appeared recently, a query started failing, performance regressed, or the user wants to compare behavior between two commits. |
| argument-hint | [good-commit] [bad-commit] [test-command-or-sql] |
| disable-model-invocation | true |
Commit Comparison / Bisect Tool
Find which commit introduced a bug using git bisect with an automated build and test at each step. For SQL tests, compare each commit's query output against a DuckDB CPU baseline.
Reference: See .claude/skills/_shared/build-and-query.md for shared infrastructure (build modes, query execution, result comparison, change tracking).
Workflow
Mode 1: Automated Bisect (commit range)
-
Parse arguments:
$ARGUMENTS[0] = good commit (SHA, tag, or "N commits ago" e.g., "10 commits ago")
$ARGUMENTS[1] = bad commit (default: HEAD)
$ARGUMENTS[2] = test command or SQL query
- If the "N commits ago" syntax is used, resolve it to a commit with:
git rev-parse HEAD~N
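The argument handling above could be sketched as follows. The helper names `normalize_rev` and `resolve_commit` are hypothetical and not part of the skill. The sketch only assumes that "N commits ago" maps to `HEAD~N` and that any other value is a revision git already understands.

```shell
#!/bin/bash
# Hypothetical helper: turn "10 commits ago" into HEAD~10.
# Anything else (SHA, tag, branch name) is passed through unchanged.
normalize_rev() {
    n=${1%% *}        # text before the first space
    rest=${1#"$n"}    # remainder, including the leading space
    case "$n" in
        ''|*[!0-9]*) printf '%s\n' "$1"; return ;;
    esac
    case "$rest" in
        " commits ago"|" commit ago") printf 'HEAD~%s\n' "$n" ;;
        *) printf '%s\n' "$1" ;;
    esac
}

# Hypothetical helper: resolve the argument to a full commit SHA.
# Fails with a non-zero status if the revision does not exist.
resolve_commit() {
    git rev-parse --verify "$(normalize_rev "$1")^{commit}"
}
```

For example, `good=$(resolve_commit "10 commits ago")` would give the SHA to pass to `git bisect start`.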
-
Pre-flight checks:
- Warn about uncommitted changes:
git status --porcelain
- If dirty, ask user to stash or commit first
- Show commit range:
git log --oneline <good>..<bad>
- Estimate bisect steps: approximately log2(N), where N is the number of commits in the range
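A minimal sketch of these pre-flight checks, assuming it runs inside the repository. The function names `tree_is_clean` and `estimate_steps` are illustrative. The estimate is ceil(log2(N)) computed with integer arithmetic.

```shell
#!/bin/bash
# Succeeds when there are no uncommitted or untracked changes.
tree_is_clean() {
    [ -z "$(git status --porcelain)" ]
}

# Rough number of bisect steps for N commits: ceil(log2(N)).
estimate_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))   # each step halves the remaining range
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# Example use (not executed here):
#   tree_is_clean || echo "Working tree is dirty: stash or commit first"
#   count=$(git rev-list --count "$good..$bad")
#   echo "$count commits, about $(estimate_steps "$count") bisect steps"
```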
-
Establish CPU baseline (if test is a SQL query):
Run the query via DuckDB CPU to get the expected correct result. Save to a temp file.
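One way to capture the baseline is a small wrapper that runs any command and stores its output sorted, so that row order does not affect later comparisons. This is a sketch: `capture_sorted` is a hypothetical name, and the example invocation assumes a stock `duckdb` CLI on the PATH, which may differ from the invocation described in the shared build-and-query reference.

```shell
#!/bin/bash
# Run a command, then write its stdout, sorted bytewise, to the given file.
# Returns the command's own status if it fails, so errors are not hidden.
capture_sorted() {
    out_file=$1
    shift
    raw_file=$(mktemp) || return 1
    if "$@" > "$raw_file"; then
        LC_ALL=C sort "$raw_file" > "$out_file"
        status=$?
    else
        status=$?
    fi
    rm -f "$raw_file"
    return "$status"
}

# Example use (assumes a plain DuckDB CLI; not executed here):
#   baseline=$(mktemp)
#   capture_sorted "$baseline" duckdb -csv -c "$sql"
```

Sorting under `LC_ALL=C` keeps the order independent of the machine's locale, provided the GPU output is later sorted the same way.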
-
Ask the user which build preset to use: release (fastest), relwithdebinfo (with debug symbols), or clang-debug (full debug).
Then create the bisect test script at /tmp/claude-1000/sirius_bisect_test.sh:
#!/bin/bash
set -e
<test_command>
For SQL queries, the script also:
- Captures GPU output
- Compares against the saved CPU baseline
- Exits 0 if match (good), 1 if mismatch (bad), 125 if build fails (skip)
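The exit-code mapping for SQL tests could look roughly like the sketch below. Note that under `set -e` alone a failing build would exit with the build tool's own status, which git bisect would read as "bad", so the skip code 125 has to be returned explicitly. The function name `bisect_step` and its build and query arguments are placeholders rather than the project's real commands. Treating a query that errors out as "bad" is an assumption, and the baseline is assumed to have been sorted under `LC_ALL=C`.

```shell
#!/bin/bash
# One bisect step: build, run the query, compare with the baseline.
#   $1 = build command (placeholder), $2 = query command (placeholder),
#   $3 = path to the sorted CPU baseline
# Returns 0 (good), 1 (bad) or 125 (skip this commit).
bisect_step() {
    build_cmd=$1
    query_cmd=$2
    baseline=$3
    raw=$(mktemp) || return 125
    sorted=$(mktemp) || return 125

    if ! sh -c "$build_cmd" >/dev/null 2>&1; then
        verdict=125                      # commit does not build: skip it
    elif ! sh -c "$query_cmd" > "$raw" 2>/dev/null; then
        verdict=1                        # assumption: a failing query counts as bad
    else
        LC_ALL=C sort "$raw" > "$sorted"
        if cmp -s "$sorted" "$baseline"; then
            verdict=0                    # output matches the CPU baseline
        else
            verdict=1                    # output differs from the baseline
        fi
    fi
    rm -f "$raw" "$sorted"
    return "$verdict"
}
```

Mapping failures to 1 rather than passing through the raw status also avoids exit codes above 127, which make `git bisect run` abort instead of continuing.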
-
Execute automated bisect:
git bisect start <bad> <good>
git bisect run /tmp/claude-1000/sirius_bisect_test.sh
Show progress updates during bisect (current step / total estimated steps).
-
Report the first bad commit:
- Show commit message, author, date
- Show the diff:
git show <first-bad-commit>
- Analyze the changes and explain what likely caused the regression
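The reporting step might be sketched as below. `report_commit` is a hypothetical helper that prints the metadata followed by a per-file change summary. The full diff is still available from `git show`.

```shell
#!/bin/bash
# Print hash, author, date and subject of a commit, then the files it touched.
report_commit() {
    git log -1 --format='commit:  %H%nauthor:  %an <%ae>%ndate:    %ad%nsubject: %s' "$1" &&
    git show --stat --format= "$1"
}

# Example use after bisect has finished and before "git bisect reset":
#   report_commit refs/bisect/bad
```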
-
Pipeline-level comparison (optional):
Run tools/parse_pipeline_log.py on logs from the last good commit and first bad commit to compare per-operator row counts.
-
Cleanup:
git bisect reset
Mode 2: Manual Comparison (two specific commits)
If the user provides just two specific commits (not a range for bisect):
- Check out commit A, build, run the query, and capture output + logs
- Check out commit B, build, run the query, and capture output + logs
- Diff both the query outputs and the logs, highlighting:
- Result differences (wrong values, missing/extra rows)
- Code path differences (different operators used, different pipeline stages)
- Performance differences (timing, memory usage from logs)
- Return to original branch:
git checkout <original-branch>
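A sketch of the manual comparison, assuming output files live outside the working tree so that checkouts do not touch them. The helper names are hypothetical, and the command passed in stands for whatever build and query invocation the shared infrastructure provides.

```shell
#!/bin/bash
# Name of the checked-out branch, or the commit SHA when HEAD is detached.
current_ref() {
    git symbolic-ref --quiet --short HEAD || git rev-parse HEAD
}

# Check out a commit, run a command there, save its output,
# then return to whatever was checked out before.
#   $1 = commit, $2 = output file, remaining arguments = command to run
run_at_commit() {
    commit=$1
    out_file=$2
    shift 2
    original=$(current_ref) || return 1
    git checkout -q "$commit" || return 1
    "$@" > "$out_file" 2>&1
    status=$?
    git checkout -q "$original" || return 1
    return "$status"
}

# Example use (build_and_query is a placeholder command; not executed here):
#   run_at_commit "$commit_a" /tmp/a.out sh -c "$build_and_query"
#   run_at_commit "$commit_b" /tmp/b.out sh -c "$build_and_query"
#   diff -u /tmp/a.out /tmp/b.out
```

Restoring the original ref inside the helper means an interrupted comparison is less likely to leave the repository on a detached HEAD.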
Key Design Decisions
- Exit code 125 skips commits that don't build (common in CUDA projects where intermediate commits may break)
- Support both SQL query tests and unit test invocations
- Warn about uncommitted changes before starting bisect (bisect changes HEAD)
- CPU baseline captured once before bisect starts, reused for all steps
- For SQL queries, results are sorted before comparison to handle ordering differences
Important Notes
- git bisect changes HEAD -- the user should not have uncommitted work
- Each bisect step requires a full rebuild, which can be slow for large codebases
- If many commits don't build, bisect may take longer than expected due to skips
- The user can interrupt bisect at any time with
git bisect reset