| name | check-python-script |
| description | Audits a standalone Python 3 script against 25 deterministic checks (shebang, `__main__` guard, argparse shape, declared dependencies, ruff-backed AST lints, line count, secret patterns) plus nine judgment dimensions (output discipline, input validation, dependency posture, performance intent, naming, function design, module-scope discipline, literal intent, commenting intent) and a Tier-3 cross-script collision check. Use when the user wants to "audit a python script", "lint a python script", or "review this script". Not for general-purpose shell scripts — route to `/build:check-bash-script`.
| allowed-tools | Read, Write, Edit, Bash, Grep, Glob |
| argument-hint | [path] |
| user-invocable | true |
| references | ["../../_shared/references/python-script-best-practices.md","references/check-collision.md","references/check-commenting-intent.md","references/check-dependency-posture.md","references/check-function-design.md","references/check-input-validation.md","references/check-literal-intent.md","references/check-module-scope-discipline.md","references/check-naming.md","references/check-output-discipline.md","references/check-performance-intent.md"] |
| license | MIT |
Check Python Script
Audit a standalone Python 3 script for structural soundness, dependency posture, ruff-backed lint cleanliness, and adherence to the project's Python conventions. The rubric — what makes a Python script load-bearing, the anatomy template, the patterns that work — lives in python-script-best-practices.md.
This skill follows the check-skill pattern. Tier-1 detection is in 6 base scripts plus 2 profile-specific scripts emitting JSON envelopes via _common.py. Tier-2 has 9 base judgment dimensions plus up to 3 profile-specific dimensions read inline by the primary agent. Tier-3 is collision (cross-script duplication).
Profiles. This skill audits three python-script shapes: cli (default), library, skill-helper. The applicable Tier-1 scripts and Tier-2 dimensions vary per profile. See python-script-profiles.md for the complete applicability matrix.
When to use
Also fires when the user phrases the request as:
- "check my python script"
- "is this script safe"
- "what's wrong with my script"
- "why is my script failing"
Workflow
1. Scope
Read `$ARGUMENTS`. Resolve it to a single `.py` file, or to a directory whose top level is walked for Python scripts. Confirm the scope aloud.
Resolve profile. If $ARGUMENTS carries --profile=<name> (one of cli, library, skill-helper), use that. Else run the detector and use its result:
```bash
PROFILE=$(python3 "${SHARED_SCRIPTS}/detect_python_profile.py" "$TARGET")
```
The detector is heuristic and best-effort; ambiguous cases default to cli. Print the resolved profile to the user before continuing — if it looks wrong, they can re-run with --profile=<name>.
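The resolution order above (explicit flag first, then the detector, with `cli` as the fallback) can be sketched in Python. This is a minimal illustration, not the skill's own code: the helper names `profile_from_args` and `resolve_profile` are hypothetical, and rejecting an unknown profile name with an error is an assumption the source does not state.

```python
import subprocess
import sys

VALID_PROFILES = ("cli", "library", "skill-helper")


def profile_from_args(args):
    """Return the value of --profile=<name> if present, else None."""
    for arg in args:
        if arg.startswith("--profile="):
            name = arg.split("=", 1)[1]
            # Assumption: an unknown profile name is an error rather than ignored.
            if name not in VALID_PROFILES:
                raise ValueError(f"unknown profile: {name!r}")
            return name
    return None


def resolve_profile(args, target, detector_path):
    """Prefer the explicit flag; otherwise run the detector, defaulting to cli."""
    explicit = profile_from_args(args)
    if explicit is not None:
        return explicit
    result = subprocess.run(
        [sys.executable, detector_path, target],
        capture_output=True,
        text=True,
        check=False,
    )
    detected = result.stdout.strip()
    # Assumption: any unexpected detector output is treated as ambiguous.
    return detected if detected in VALID_PROFILES else "cli"
```

Whatever `resolve_profile` returns is the value to print to the user before continuing.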
2. Tier-1 Deterministic Checks
Per-profile invocation list (full matrix in python-script-profiles.md):
```bash
SCRIPTS="${SKILL_DIR}/scripts"
TARGETS="$ARGUMENTS"

# Base checks: run for every profile.
# $TARGETS is left unquoted so space-separated paths expand to separate arguments.
bash "$SCRIPTS/check_secrets.sh" $TARGETS
bash "$SCRIPTS/check_deps.sh" $TARGETS
bash "$SCRIPTS/check_ruff.sh" $TARGETS
bash "$SCRIPTS/check_size.sh" $TARGETS

# Structure and argparse checks: skipped for the library profile.
if [[ "$PROFILE" != "library" ]]; then
  bash "$SCRIPTS/check_structure.sh" $TARGETS
  bash "$SCRIPTS/check_argparse.sh" $TARGETS
fi

# Profile-specific checks.
case "$PROFILE" in
  library)
    python3 "$SCRIPTS/check_library_discipline.py" $TARGETS
    ;;
  skill-helper)
    python3 "$SCRIPTS/check_skill_helper_contract.py" $TARGETS
    ;;
esac
```
Each script emits a JSON array of envelopes. recommended_changes is canonical — copy through verbatim.
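As a rough sketch of consuming that output, the snippet below parses one script's stdout and builds the report's `Recommendation:` lines without altering the text. The envelope shape is an assumption: only the `rule_id`, `overall_status`, and `recommended_changes` names appear in this document, and treating `recommended_changes` as a plain string is a guess. Both function names are hypothetical.

```python
import json


def load_envelopes(stdout_text):
    """Parse a Tier-1 script's stdout into a list of envelope dicts."""
    envelopes = json.loads(stdout_text)
    if not isinstance(envelopes, list):
        raise ValueError("expected a JSON array of envelopes")
    return envelopes


def recommendation_lines(envelopes):
    """Copy recommended_changes through verbatim, one line per envelope."""
    lines = []
    for envelope in envelopes:
        # Assumption: recommended_changes is a string; empty means nothing to do.
        changes = envelope.get("recommended_changes")
        if changes:
            lines.append(f"Recommendation: {changes}")
    return lines
```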
Script-to-rules map (Tier-1 rule_ids per profile):
| Script | rule_ids | Severity | Profiles |
|---|---|---|---|
| check_secrets.sh | secret | fail | cli, library, skill-helper |
| check_structure.sh | shebang, guard-missing, guard-shape, syntax | fail | cli, skill-helper |
| check_structure.sh | main-returns, keyboard-interrupt | warn | cli, skill-helper |
| check_argparse.sh | argparse-when-argv, add-argument-help, subprocess-check | warn | cli, skill-helper |
| check_deps.sh | declared-deps | warn | cli, library, skill-helper |
| check_ruff.sh | ruff-D100, ruff-SIM115, ruff-PLW1514, ruff-PTH, ruff-F401, ruff-ANN, ruff-format, ruff-fstring-modernize | warn | cli, library, skill-helper |
| check_ruff.sh | ruff-E722, ruff-shell-true, ruff-S307, ruff-F403, ruff-S108 | fail | cli, library, skill-helper |
| check_size.sh | size | warn | cli, library, skill-helper |
| check_library_discipline.py | library-no-side-effects | fail | library |
| check_library_discipline.py | library-public-api-declared | warn | library |
| check_skill_helper_contract.py | skill-helper-stdin-json | fail | skill-helper |
| check_skill_helper_contract.py | skill-helper-atomic-write, skill-helper-distinct-error-codes | warn | skill-helper |
The previously-INFO exec-bit rule is dropped: the check-skill pattern has no INFO severity. The executable-bit check still runs in `_ast_checks.py` for parity but emits no finding.
`ruff-S602` and `ruff-S604` consolidate into the single rule_id `ruff-shell-true` (both concern `shell=True` in subprocess calls). `ruff-UP031` and `ruff-UP032` consolidate into `ruff-fstring-modernize` (both concern printf-style → f-string conversion).
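The consolidation can be pictured as a lookup from a ruff code to a rule_id. The sketch below is illustrative only. The function name `rule_id_for` is hypothetical, and treating `PTH` and `ANN` as prefix families is an assumption based on the `ruff-PTH` and `ruff-ANN` rule_ids in the table above, not on the actual `check_ruff.sh` logic.

```python
# Codes that share a rule_id, per the consolidation described above.
CONSOLIDATED = {
    "S602": "ruff-shell-true",
    "S604": "ruff-shell-true",
    "UP031": "ruff-fstring-modernize",
    "UP032": "ruff-fstring-modernize",
}

# Assumption: every code in these families reports under one family rule_id.
FAMILY_PREFIXES = ("PTH", "ANN")


def rule_id_for(code):
    """Map a ruff diagnostic code to the rule_id used in the envelopes."""
    if code in CONSOLIDATED:
        return CONSOLIDATED[code]
    for prefix in FAMILY_PREFIXES:
        if code.startswith(prefix):
            return f"ruff-{prefix}"
    return f"ruff-{code}"
```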
Tier-2 exclusion list. Any FAIL in secret, shebang, guard-missing, guard-shape, syntax, ruff-E722, ruff-shell-true, ruff-S307, ruff-F403, or ruff-S108 excludes the script from Tier-2.
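A minimal sketch of that gate, assuming each envelope is a dict carrying `rule_id` and `overall_status` keys (the function name `excluded_from_tier2` is hypothetical):

```python
# The ten rule_ids whose FAIL excludes a script from Tier-2.
TIER2_BLOCKING_RULES = frozenset({
    "secret",
    "shebang",
    "guard-missing",
    "guard-shape",
    "syntax",
    "ruff-E722",
    "ruff-shell-true",
    "ruff-S307",
    "ruff-F403",
    "ruff-S108",
})


def excluded_from_tier2(envelopes):
    """Return True if any blocking rule failed for this script."""
    return any(
        envelope.get("overall_status") == "fail"
        and envelope.get("rule_id") in TIER2_BLOCKING_RULES
        for envelope in envelopes
    )
```

A `warn` on a blocking rule, or a `fail` on a rule outside the list (for example `library-no-side-effects`), does not exclude the script under this reading.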
Missing-tool degradation. check_ruff.sh emits all 13 envelopes with overall_status: inapplicable and exits 0 when ruff is absent. Other scripts continue.
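The degradation behaviour might look like the following in Python. This is a sketch under assumptions: the real check is a shell script, the `reason` field is hypothetical, and only `rule_id` and `overall_status` are names taken from this document.

```python
import shutil

# The 13 rule_ids that check_ruff.sh reports, per the table above.
RUFF_RULE_IDS = [
    "ruff-D100", "ruff-SIM115", "ruff-PLW1514", "ruff-PTH", "ruff-F401",
    "ruff-ANN", "ruff-format", "ruff-fstring-modernize",
    "ruff-E722", "ruff-shell-true", "ruff-S307", "ruff-F403", "ruff-S108",
]


def inapplicable_envelopes(reason):
    """Build one inapplicable envelope per ruff rule_id."""
    return [
        # "reason" is a hypothetical field added for illustration.
        {"rule_id": rule_id, "overall_status": "inapplicable", "reason": reason}
        for rule_id in RUFF_RULE_IDS
    ]


def degrade_if_ruff_missing():
    """Return the inapplicable envelopes when ruff is absent, else None."""
    if shutil.which("ruff") is None:
        return inapplicable_envelopes("ruff not found on PATH")
    return None
```

Emitting a full set of envelopes instead of failing keeps the report shape stable, so the missing tool stays visible in the final output.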
3. Tier-2 Judgment Dimensions
For each script that passed the Tier-2 exclusion gate, evaluate against the 9 base judgment rules at `references/check-*.md` (all profiles), plus the profile-specific dimensions for the resolved profile.
Profile-specific dimensions: listed per profile in the applicability matrix in python-script-profiles.md.
Evaluator policy: see check-skill-pattern.md §Evaluator policy. Read all 9 rule files first, then evaluate the script in one LLM call.
4. Tier-3 Cross-Script Collision
Evaluate against check-collision.md. Surface duplicate logic across scripts (e.g., copy-pasted helper functions, duplicated argparse setups) as warn. Single-script scope returns inapplicable.
5. Report
Merge findings from all 3 tiers. Sort fail before warn before inapplicable; Tier-1 before Tier-2 before Tier-3. Each Recommendation: line copies through recommended_changes verbatim.
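One way to read that ordering is severity as the primary key and tier as the tie-breaker. The sketch below assumes that reading. It also assumes each finding is a dict with an `overall_status` key and a hypothetical integer `tier` key.

```python
SEVERITY_ORDER = {"fail": 0, "warn": 1, "inapplicable": 2}


def sort_findings(findings):
    """Order findings by severity first, then by tier (1, 2, 3)."""
    return sorted(
        findings,
        key=lambda finding: (
            SEVERITY_ORDER[finding["overall_status"]],
            finding["tier"],
        ),
    )
```

Because `sorted` is stable, findings that share both severity and tier keep the order in which the scripts emitted them.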
6. Opt-In Repair Loop
Ask once: "Apply fixes? Enter y (all), n (skip), or comma-separated numbers."
For each selected finding:
- Direct edit: shebang, `__main__` guard, argparse help text, `pathlib.Path` over `os.path`, etc. Show diff; write on confirmation.
- Routed to another skill: substantial rewrites → `/build:build-python-script`.
- Tier-2/3 judgment: ask the user; rewrite the section; show diff.
After each fix, re-run the relevant Tier-1 script.
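The answer to the "Apply fixes?" prompt could be parsed along these lines. The function name `parse_selection` is hypothetical, and treating an empty answer as `n` and rejecting out-of-range numbers are assumptions that go beyond what the prompt text specifies.

```python
def parse_selection(answer, finding_count):
    """Turn 'y', 'n', or '1,3' into a list of 1-based finding numbers."""
    text = answer.strip().lower()
    if text == "y":
        return list(range(1, finding_count + 1))
    # Assumption: an empty answer means skip, the same as "n".
    if text in ("n", ""):
        return []
    chosen = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        number = int(part)  # raises ValueError on non-numeric input
        if not 1 <= number <= finding_count:
            raise ValueError(f"finding {number} is out of range")
        if number not in chosen:
            chosen.append(number)
    return chosen
```

The returned numbers only select findings; each selected fix still needs its own confirmation before anything is written.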
Anti-Pattern Guards
- Per-dimension LLM call. Collapse into one locked-rubric call per script.
- LLM-evaluating format compliance. Shebang, guard, argparse shape — handle deterministically.
- Ambiguous compliance reported as PASS. Surface as WARN.
- Bulk-applying fixes. Per-finding confirmation required.
- Re-evaluating scripted rules in Tier-2. Trust the `pass` envelope.
- Suppressing the inapplicable envelope. When ruff is absent, the 13 ruff envelopes emit `inapplicable`; surface them.
- Embellishing scripts' `recommended_changes`. Copy through; do not paraphrase.
Key Instructions
- Run Tier-1 first; gate LLM evaluation on structural validity.
- `check_ruff.sh` consolidates 15 ruff codes into 13 rule_ids (`ruff-shell-true` covers S602+S604; `ruff-fstring-modernize` covers UP031+UP032).
- Recovery: read-only outside the Repair Loop.
Handoff
Chainable to: /build:build-python-script (rebuild non-compliant scripts from scratch).