| name | ship |
| description | Run the full ship flow — verify quality, ensure test coverage, update artifacts, smoke test, push, create PR, and merge when CI is green. Trigger when user says "ship", "ship it", "fix and ship", or asks to push and merge a branch. |
| user_invocable | true |
| metadata | {"internal":true} |
Run the full ship flow: verify quality, ensure test coverage, update artifacts, smoke test, then push, create PR, and merge when CI is green.
This skill implements the complete "Shipping" definition and Pre-PR Checklist from AGENTS.md. When the user says "ship" or "fix and ship", execute ALL phases below — not just the push/merge steps.
Arguments
$ARGUMENTS - Optional: description of what is being shipped (used for PR title/body context and to scope the quality checks)
Instructions
Phase 1: Pre-flight
- Confirm we're NOT on main or master
- If HEAD is detached and the current task has local changes ready to ship, create a branch first instead of stopping
- Confirm whether uncommitted changes belong to the task being shipped
- If the worktree is dirty only because of the current task, keep going: validate, commit, and ship those changes
- If unrelated uncommitted changes exist, stop and tell the user
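The pre-flight decisions above can be sketched as two small classifiers. This is a minimal illustration, not part of the project's tooling: the function names `classify_branch` and `classify_worktree` are hypothetical, and the sketch assumes the caller passes in the output of `git symbolic-ref --short -q HEAD` (empty when HEAD is detached) and `git status --porcelain` (empty when the worktree is clean).

```shell
# Classify the current branch for the pre-flight check.
# $1: output of `git symbolic-ref --short -q HEAD` (empty when HEAD is detached).
classify_branch() {
  case "$1" in
    "")          echo "detached" ;;   # create a branch first, then continue
    main|master) echo "protected" ;;  # never ship directly from here
    *)           echo "ok" ;;
  esac
}

# Classify the worktree state.
# $1: output of `git status --porcelain` (empty when nothing is uncommitted).
classify_worktree() {
  if [ -z "$1" ]; then
    echo "clean"
  else
    echo "dirty"   # still requires a judgment: do the changes belong to this task?
  fi
}
```

A caller might run `classify_branch "$(git symbolic-ref --short -q HEAD)"` and stop on `protected`. Whether a dirty worktree belongs to the current task cannot be decided mechanically, so the sketch only reports the state.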
Phase 2: Test Coverage
Review the changes on this branch (use git diff origin/main...HEAD and git log origin/main..HEAD) and ensure comprehensive test coverage:
- Identify all changed code paths — every new/modified function, module, builtin, tool
- Verify existing tests cover the changes: run cargo test --all-features and check for failures
- Write missing tests for any uncovered code paths:
  - Positive tests: happy path, valid inputs, expected state transitions
  - Negative tests: invalid inputs, error conditions, boundary cases, permission failures, missing resources
  - Security tests: if the change touches parser, interpreter, VFS, network, git, or user input, add tests per specs/005-security-testing.md
  - Compatibility tests: if the change affects Bash behavior parity, add differential tests comparing against real Bash
- Run all tests to confirm green: just test
- If any test fails, fix the code or test until green
Phase 3: Artifact Updates
Review the changes and update project artifacts where applicable. Skip items that aren't affected.
- Specs (specs/): if the change adds/modifies behavior covered by a spec, update the relevant spec file to stay in sync
- Threat model (specs/006-threat-model.md): if the change introduces new attack surfaces, external inputs, authentication/authorization changes, or data handling, add or update threat entries using the TM-<CATEGORY>-<NNN> format and add // THREAT[TM-XXX-NNN] code comments at mitigation points
- AGENTS.md: if the change adds new specs, commands, or modifies development workflows — update the relevant section
- Implementation status (specs/009-implementation-status.md): if feature status changed, update the status table
- Documentation (crates/bashkit/docs/): if the change affects public APIs, tools, or features, update the relevant guide markdown files
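To keep threat IDs and their code comments in sync, a small check along these lines may help. It is a sketch under stated assumptions: the source only gives the shape TM-<CATEGORY>-<NNN>, so the pattern below assumes CATEGORY is uppercase letters and NNN is exactly three digits. The helper names and the example ID `TM-INJ-001` are hypothetical.

```shell
# Return 0 when the argument matches the assumed TM-<CATEGORY>-<NNN> shape.
# Assumption: CATEGORY is uppercase letters, NNN is exactly three digits.
is_threat_id() {
  printf '%s\n' "$1" | grep -Eq '^TM-[A-Z]+-[0-9]{3}$'
}

# Read source text on stdin and print each ID found in a THREAT[...] marker,
# one per line, so the list can be compared against the threat model.
extract_threat_ids() {
  grep -Eo 'THREAT\[TM-[A-Z]+-[0-9]{3}\]' | sed -e 's/^THREAT\[//' -e 's/]$//'
}
```

For example, piping a changed source file into `extract_threat_ids` yields the IDs referenced at mitigation points, which can then be looked up in specs/006-threat-model.md.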
Phase 3b: Code Simplification
Review all changed code for opportunities to simplify:
- Identify duplication — look for repeated patterns that could share a helper or be consolidated
- Reduce complexity — simplify nested logic, long match arms, deeply indented blocks
- Remove dead code — unused functions, unreachable branches, commented-out code
- Check naming — ensure functions, variables, and types have clear, descriptive names
- Verify no over-engineering — remove unnecessary abstractions, feature flags, or indirection that don't serve the current change
If simplification changes are made, loop back to Phase 2 to verify tests still pass.
Phase 3c: Security Review
Analyze all changed code for security vulnerabilities:
- Input validation — check that user-supplied data (script input, file paths, environment variables, command arguments) is validated before use
- Injection risks — look for command injection, path traversal, environment variable injection, or shell metacharacter issues
- Sandbox escapes: if changes touch VFS, builtins, or process execution, verify they cannot escape the sandbox (see specs/006-threat-model.md)
- Resource exhaustion — check for unbounded loops, unbounded allocations, or missing limits on user-controlled sizes
- Error handling — ensure errors don't leak internal state, file paths, or sensitive information
- Unsafe code: review any unsafe blocks for soundness; prefer safe alternatives
If security issues are found, fix them, add regression tests, and update specs/006-threat-model.md if a new threat category is identified.
Phase 3d: Design Quality Review
Review all changed code for shortcuts, lazy abstractions, and premature compromises. This is a greenfield project — correctness and clean design matter more than compatibility or speed of delivery. Take the time to find better abstractions.
- No shortcut abstractions — reject copy-paste patterns disguised as "good enough". If two things look similar, determine whether they are actually the same concept. If yes, unify properly. If no, keep them separate with clear names — don't force a bad shared interface.
- No lazy wrappers — every abstraction must earn its place. A wrapper that just forwards calls adds indirection without value. An enum variant that exists "just in case" is dead weight. If a layer doesn't add meaning, remove it.
- Right abstraction level: check that traits, types, and module boundaries model the actual domain, not implementation accidents. A StringOrList enum is a parser leak; a Pattern type is a domain concept. Prefer the latter.
- No stringly-typed interfaces — look for magic strings, string matching on variant names, ad-hoc parsing of structured data. Replace with enums, newtypes, or proper typed APIs.
- No premature generics — a function generic over three trait bounds used in one call site is harder to read than a concrete function. Generalize only when there are (or will immediately be) multiple real callers.
- No compatibility shims — this is greenfield. If an interface is wrong, change it. Don't add adapters, conversion layers, or deprecated alternatives. Fix call sites instead.
- Error types are first-class: check that error enums are specific and actionable, not catch-all Other(String) buckets. Each variant should guide the caller's recovery logic.
- Module boundaries enforce invariants: if a pub field or function lets outside code break a module's assumptions, tighten visibility. Constructors and accessors exist to protect invariants, not to be "nice".
If design issues are found, refactor, update tests (loop back to Phase 2), and update specs if the change alters documented behavior.
Phase 4: Smoke Testing
Smoke test impacted functionality to verify it works end-to-end:
- CLI changes: run just run with relevant commands, verify output
- Builtin/interpreter changes: run example scripts via just run-script <file> to verify behavior
- Tool changes: if LLM tool interface changed, run a quick tool invocation test
- Python bindings: if Python code changed, run ruff check crates/bashkit-python && ruff format --check crates/bashkit-python
If smoke testing reveals issues, fix them and loop back to Phase 2 (tests must still pass).
Phase 5: Quality Gates
git fetch origin main && git rebase origin/main
- If rebase fails with conflicts, abort and tell the user to resolve manually
just pre-pr
- If it fails, run just fmt to auto-fix, then retry once
- If still failing, stop and report
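The "auto-fix, then retry once" rule for the quality gate can be sketched as a small helper. The names `run_gate_with_one_retry`, `pre_pr`, and `auto_fix` are hypothetical wrappers introduced for illustration; only `just pre-pr` and `just fmt` come from the steps above.

```shell
# Thin wrappers around the commands named in this phase.
pre_pr()   { just pre-pr; }
auto_fix() { just fmt; }

# Run a gate; on failure run a fixer once and retry the gate exactly once.
# $1: name of the gate function, $2: name of the fixer function.
# Returns 0 if the gate passes on the first or second attempt, 1 otherwise.
run_gate_with_one_retry() {
  gate="$1"
  fixer="$2"
  if "$gate"; then
    return 0
  fi
  echo "gate failed; running fixer and retrying once" >&2
  "$fixer" || echo "fixer reported an error; retrying the gate anyway" >&2
  if "$gate"; then
    return 0
  fi
  echo "gate still failing; stop and report" >&2
  return 1
}
```

Calling `run_gate_with_one_retry pre_pr auto_fix` applies the rule. The helper deliberately never loops more than once, so a persistent failure is surfaced to the user rather than retried indefinitely.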
Phase 6: Push and PR
git push -u origin <current-branch>
Check for existing PR:
gh pr view --json url 2>/dev/null
If no PR exists, create one:
- Title: conventional commit style from the branch commits
- Body: summary of What, Why, How, and what tests were added/verified
- Use gh pr create
If a PR already exists, update it if needed and report its URL.
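The check-then-create flow above could be wrapped roughly as follows. This is a sketch rather than the project's script: `ensure_pr` is a hypothetical name, and the title and body are assumed to be prepared by the caller from the branch commits.

```shell
# Print the URL of the PR for the current branch, creating the PR if none exists.
# $1: PR title (conventional commit style), $2: PR body (What, Why, How, tests).
ensure_pr() {
  # `gh pr view` exits non-zero when the branch has no PR; treat that as "none".
  url="$(gh pr view --json url --jq .url 2>/dev/null)" || url=""
  if [ -n "$url" ]; then
    echo "$url"
    return 0
  fi
  # `gh pr create` prints the new PR's URL on success.
  gh pr create --title "$1" --body "$2"
}
```

Either way the function prints a URL, so the caller can report it without caring whether the PR was just created or already existed.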
Phase 7: Wait for CI and Merge
- Check CI status with gh pr checks (poll every 30s, up to 15 minutes)
- If CI is green, merge with gh pr merge --squash --auto
- If CI fails, report the failing checks and stop
- NEVER merge when CI is red
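The polling rule (every 30s, up to 15 minutes, which is 30 polls) can be sketched as a bounded loop. `wait_for_ci` and `pr_checks` are hypothetical names. The sketch assumes `gh pr checks` exits 0 when all checks pass and 8 while checks are pending, which matches the GitHub CLI manual as far as I know but should be confirmed against the installed version.

```shell
# Wrapper around the status command named in this phase.
pr_checks() { gh pr checks; }

# Poll a status function until it reports green, red, or the poll budget runs out.
# $1: status function (assumed: 0 = green, 8 = pending, anything else = red)
# $2: seconds to sleep between polls, $3: maximum number of polls
# Returns 0 when green, 1 when red, 2 on timeout.
wait_for_ci() {
  check="$1"
  interval="$2"
  max_polls="$3"
  i=0
  while [ "$i" -lt "$max_polls" ]; do
    code=0
    "$check" >/dev/null 2>&1 || code=$?
    case "$code" in
      0) return 0 ;;   # green: safe to merge
      8) ;;            # pending: keep polling
      *) return 1 ;;   # red: report and stop, never merge
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  return 2
}
```

A caller might run `wait_for_ci pr_checks 30 30 && gh pr merge --squash --auto`, so the merge command is only reached on a green result. Timeouts are kept distinct from failures so the report to the user can say which one happened.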
Phase 8: Post-merge
After successful merge:
- Report the merged PR URL
- Done
Rules
- Phases 2-4 (tests, artifacts, simplification, security review, design quality review, smoke testing) are the quality core; do NOT skip them.
- The $ARGUMENTS context helps scope which tests, specs, and smoke tests are relevant.
- For "fix and ship" requests: implement the fix first, then run /ship to validate and merge.
- Never close a half-done issue. If the PR only covers a subset of the issue's tasks/checkboxes, use Part of #N instead of Closes #N or Fixes #N. Only use closing keywords when every task in the issue is complete. Premature closure hides remaining work.
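The closing-keyword rule reduces to a single comparison, sketched below. `issue_reference` is a hypothetical helper, and the task counts are assumed to be tallied by the caller from the issue's checkboxes.

```shell
# Choose the issue reference line for a PR body.
# $1: issue number
# $2: number of the issue's tasks complete once this PR merges
# $3: total number of tasks in the issue
issue_reference() {
  if [ "$2" -ge "$3" ]; then
    echo "Closes #$1"    # every task is done: a closing keyword is allowed
  else
    echo "Part of #$1"   # work remains: keep the issue open
  fi
}
```

For instance, `issue_reference 42 2 5` prints `Part of #42`, while `issue_reference 42 5 5` prints `Closes #42`.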