test-confidence
// AI-driven test execution. Opus decides what to run and how confident to be, based on your diff.
| name | test-confidence |
| description | AI-driven test execution. Opus decides what to run and how confident to be, based on your diff. |
| argument-hint | --full to run to 100% | --strict to halt on pre-existing failures |
| allowed-tools | Bash(git *), Bash(bundle exec rspec *), Bash(cat *), Bash(find *), Bash(wc *), Bash(head *), Bash(tail *), Bash(grep *), Bash(bin/test-confidence *) |
Run bin/test-confidence to have Opus 4.7 analyze your diff, decide the risk level, plan which tests to run and in what order, and set confidence milestones. The AI decides the shape of the curve based on this specific diff.
bin/test-confidence # Run to 99%, stop. Skips past pre-existing failures.
bin/test-confidence --full # Run to 100%
bin/test-confidence --strict # Halt on any failure, including pre-existing
ANTHROPIC_API_KEY is auto-sourced from .env if not exported.
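The fallback described above could look roughly like the following. This is a minimal sketch, not the script's actual code: the function name `load_api_key` and the `.env` parsing rules (optional `export ` prefix, optional surrounding quotes, last matching line wins) are assumptions made for illustration.

```shell
# Hypothetical sketch: load ANTHROPIC_API_KEY from a .env file only when it
# is not already set in the environment. Parsing rules are assumptions.
load_api_key() {
  env_file="${1:-.env}"

  # A value already present in the environment always wins over the file.
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    return 0
  fi

  [ -f "$env_file" ] || return 1

  # Take the last matching line, drop the key name, then strip one layer
  # of surrounding single or double quotes.
  value=$(grep -E '^(export )?ANTHROPIC_API_KEY=' "$env_file" | tail -n 1 \
    | sed -E 's/^(export )?ANTHROPIC_API_KEY=//' \
    | sed -E "s/^[\"']//; s/[\"']\$//")

  [ -n "$value" ] || return 1
  export ANTHROPIC_API_KEY="$value"
}
```

Calling `load_api_key` with no argument reads `.env` in the current directory. An exported key is never overwritten, which matches the "if not exported" wording.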
If $ARGUMENTS is provided, pass it through: bin/test-confidence $ARGUMENTS
tmp/test-confidence/ for a cached plan; reuses if found
--full, continues running remaining tests toward 100%
When a spec fails, the script applies a two-step check:
app/models/user.rb for spec/models/user_spec.rb) appear in the diff?
main.
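The first check, whether a failing spec's source file appears in the diff, can be sketched as below. The helper names `spec_to_source` and `source_in_diff` are hypothetical, and the mapping assumes the conventional Rails layout (`spec/<dir>/<name>_spec.rb` tests `app/<dir>/<name>.rb`), generalized from the `user.rb` example. The real script may map paths differently.

```shell
# Hypothetical helper: map a spec path to the source file it conventionally
# tests. Assumes spec/<dir>/<name>_spec.rb -> app/<dir>/<name>.rb.
spec_to_source() {
  printf '%s\n' "$1" | sed -E 's#^spec/#app/#; s#_spec\.rb$#.rb#'
}

# Succeeds (exit 0) when the spec's source file is among the changed files.
#   $1: path of the failing spec
#   $2: changed files, one path per line
source_in_diff() {
  source_file=$(spec_to_source "$1")
  # -F fixed string, -x whole-line match, -q no output.
  printf '%s\n' "$2" | grep -Fxq -- "$source_file"
}
```

A caller might collect the changed files with `git diff --name-only main...HEAD` and pass the result as the second argument. Specs with no direct counterpart under `app/` (request or feature specs, for example) never match under this simple mapping.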
The verify step is skipped, and the heuristic verdict trusted, when the diff includes a db/migrate/*.rb file (DB schema drift) or Gemfile.lock (bundler drift), or when no merge-base is available. --strict skips both checks and halts on every failure.
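The skip conditions can be expressed as a small predicate. This is a sketch under stated assumptions: `should_skip_verify` is a hypothetical name, and how the script actually gathers the changed files and the merge-base is not shown in this document.

```shell
# Hypothetical sketch of the skip rule: succeeds (exit 0) when the verify
# step should be skipped and the heuristic verdict trusted.
#   $1: changed files, one path per line
#   $2: merge-base commit, empty when none is available
should_skip_verify() {
  changed_files="$1"
  merge_base="$2"

  # No merge-base available.
  [ -n "$merge_base" ] || return 0

  # The diff includes a migration (DB schema drift).
  if printf '%s\n' "$changed_files" | grep -Eq '^db/migrate/[^/]+\.rb$'; then
    return 0
  fi

  # The diff includes Gemfile.lock (bundler drift).
  if printf '%s\n' "$changed_files" | grep -Fxq 'Gemfile.lock'; then
    return 0
  fi

  return 1
}
```

For example, `should_skip_verify "$(git diff --name-only main...HEAD)" "$(git merge-base main HEAD)"` would succeed for a branch that adds a migration and fail for one that only touches `app/models/user.rb`.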
Cost profile: zero overhead in the common no-failure case; verify runs only on failures the heuristic flags as "pre-existing". This catches the dangerous direction (a real regression silently skipped) while keeping the cost of investigating false alarms bounded.
The key insight: Opus decides ad hoc how many tests are needed for each confidence level. A comment-only change might need 2 tests for 99%. A payment model refactor might need 100.
Run this before every commit. It replaces manually picking which specs to run.
$ARGUMENTS