| name | desloppify |
| description | Multi-language codebase health scanner. Use when the user explicitly asks to run desloppify, scan for technical debt, get a health score, or create a cleanup plan. Do NOT trigger for general code review, renaming, or fixing individual bugs. |
Desloppify
1. Your Job
Maximise the strict score honestly. Your main cycle: scan → plan → execute → rescan. Follow the scan output's INSTRUCTIONS FOR AGENTS — don't substitute your own analysis.
Don't be lazy. Do large refactors and small detailed fixes with equal energy. If it takes touching 20 files, touch 20 files. If it's a one-line change, make it. No task is too big or too small — fix things properly, not minimally.
2. The Workflow
Three phases, repeated as a cycle.
Monorepos and multi-project directories
If the workspace contains multiple programs (e.g., frontend + backend in sibling folders), scan each one separately — do not scan the parent directory:
desloppify --lang typescript scan --path ./frontend
desloppify --lang python scan --path ./backend
Each --path target should be a single coherent project. Scanning a parent that contains multiple programs mixes state and path context, producing unreliable results.
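A small Python helper can keep per-project scans consistent across a workspace. This is a sketch only. The `PROJECTS` mapping and the helper names are hypothetical, and the helper relies solely on the `--lang`, `scan` and `--path` arguments shown above.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical workspace layout: sibling project directory -> value for --lang.
# These entries are assumptions for illustration; edit them for your repository.
PROJECTS = {
    "frontend": "typescript",
    "backend": "python",
}


def scan_commands(root: Path, projects: dict[str, str]) -> list[list[str]]:
    """Build one `desloppify --lang <lang> scan --path <dir>` argv per project.

    The parent `root` itself is never scanned; only its listed sub-projects are.
    """
    return [
        ["desloppify", "--lang", lang, "scan", "--path", str(root / subdir)]
        for subdir, lang in projects.items()
    ]


def run_scans(root: Path, projects: dict[str, str], dry_run: bool = True) -> None:
    """Print each scan command; execute them only when dry_run is False."""
    for argv in scan_commands(root, projects):
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing scan instead of continuing silently.
            subprocess.run(argv, check=True)
```

Calling `run_scans(Path("."), PROJECTS)` only prints the commands. Pass `dry_run=False` to run them one after another. The scans run sequentially so each project's output stays readable on its own.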
Phase 1: Scan and review — understand the codebase
desloppify scan --path .
desloppify status
After scanning, always run desloppify next — it tells you exactly what to do, in order. Don't interpret the scan output yourself or ask the user what to do. Just run next and follow its instructions.
The scan will tell you if subjective dimensions need review. Follow its instructions. To trigger a review manually:
desloppify review --prepare
Phase 2: Plan — decide what to work on
After reviews, triage stages and plan creation appear in the execution queue surfaced by next. Complete them in order — next tells you what each stage expects in the --report:
desloppify next
desloppify plan triage --stage observe --report "themes and root causes..."
desloppify plan triage --stage reflect --report "comparison against completed work..."
desloppify plan triage --stage organize --report "summary of priorities..."
desloppify plan triage --complete --strategy "execution plan..."
For automated triage: desloppify plan triage --run-stages --runner codex (Codex), --runner claude (Claude), or --runner rovodev (Rovo Dev). Options: --only-stages, --dry-run, --stage-timeout-seconds.
Then shape the queue. The plan shapes everything next gives you — next is the execution queue, not the full backlog. Don't skip this step.
desloppify plan
desloppify plan queue
desloppify plan reorder <pat> top
desloppify plan cluster create <name>
desloppify plan focus <cluster>
desloppify plan skip <pat>
Phase 3: Execute — grind the queue to completion
Trust the plan and execute. Don't rescan mid-queue — finish the queue first.
Branch first. Create a dedicated branch (never commit health work directly to main), then link the pull request that tracks the work; 42 below is a placeholder PR number:
git checkout -b desloppify/code-health
desloppify config set commit_pr 42
The loop:
desloppify next
git add <files> && git commit -m "desloppify: fix 3 deferred_import findings"
desloppify plan commit-log record
git push -u origin desloppify/code-health
Score may temporarily drop after fixes — cascade effects are normal, keep going.
If next suggests an auto-fixer, preview it with desloppify autofix <fixer> --dry-run, then rerun the command without --dry-run to apply it.
When the queue is clear, go back to Phase 1. New issues will surface, cascades will have resolved, priorities will have shifted. This is the cycle.
3. Reference
Key concepts
- Tiers: T1 auto-fix → T2 quick manual → T3 judgment call → T4 major refactor.
- Auto-clusters: related findings are auto-grouped in next. Drill in with next --cluster <name>.
- Zones: production/script (scored), test/config/generated/vendor (not scored). Fix a misclassified path with zone set.
- Wontfix cost: widens the lenient↔strict gap. Challenge past decisions when the gap grows.
Scoring
Overall score = 25% mechanical + 75% subjective.
- Mechanical (25%): auto-detected issues — duplication, dead code, smells, unused imports, security. Fixed by changing code and rescanning.
- Subjective (75%): design quality review — naming, error handling, abstractions, clarity. Starts at 0% until reviewed. The scan will prompt you when a review is needed.
- Strict score is the north star: wontfix items count as open. The gap between overall and strict is your wontfix debt.
- Score types: overall (lenient), strict (wontfix counts), objective (mechanical only), verified (confirmed fixes only).
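The 25/75 weighting explains why reviews matter so much. The sketch below assumes the overall score is a plain weighted average of two components, each on a 0-100 scale. It does not model how desloppify derives each component internally, and the function name is hypothetical.

```python
def overall_score(mechanical: float, subjective: float) -> float:
    """Blend component scores (each 0-100) with the documented 25%/75% weighting.

    Assumption: a simple weighted average; desloppify's internals may differ.
    """
    for name, value in (("mechanical", mechanical), ("subjective", subjective)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be between 0 and 100, got {value}")
    return 0.25 * mechanical + 0.75 * subjective


# An unreviewed codebase: subjective starts at 0, so even a strong mechanical score caps low.
print(overall_score(90, 0))   # 22.5
# A moderate review score (70) lifts the same codebase substantially.
print(overall_score(90, 70))  # 75.0
```

Under this assumption, a perfect mechanical score on its own can never push the overall above 25. That is why even moderate subjective scores move the overall number so much.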
Reviews
Five paths to get subjective scores:
- Local runner (Codex):
desloppify review --run-batches --runner codex --parallel --scan-after-import — automated end-to-end.
- Local runner (Claude):
desloppify review --prepare → launch parallel subagents → desloppify review --import merged.json (see the skill doc overlay for details).
- Local runner (Rovo Dev):
desloppify review --run-batches --runner rovodev --parallel --scan-after-import — automated end-to-end via acli rovodev run subprocesses.
- Cloud/external:
desloppify review --external-start --external-runner claude → follow session template → --external-submit.
- Manual path:
desloppify review --prepare → review per dimension → desloppify review --import file.json.
Batch output vs import filenames: Name each subagent's batch output batch-N.raw.txt, even when its content is JSON. The .json names in --import merged.json or --import findings.json refer to the final merged import file, not to individual batch outputs, so never give a batch output a .json extension.
Subagent parallelism limit: Do not launch every review batch at once. Run subagents in small waves, usually 3-5 concurrent agents, and wait for a wave to finish before starting the next. If agents return empty, partial, or rate-limit-shaped results, reduce the wave size and retry only failed batches. Launching 20+ subagents at once can exhaust API quota and produce no usable review output.
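The wave pattern and the batch-N.raw.txt naming rule can be sketched in Python, under several assumptions. `run_batch` is a hypothetical callable you provide, for example one that drives a single subagent. An empty result or an exception is treated as a failed batch. Halving the wave size on retry is one reasonable back-off policy, not documented desloppify behaviour.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical: runs one review subagent for batch `n`, returns its raw output or None.
RunBatch = Callable[[int], Optional[str]]


def batch_output_path(out_dir: Path, n: int) -> Path:
    """Batch outputs are named batch-N.raw.txt, never .json."""
    return out_dir / f"batch-{n}.raw.txt"


def _attempt(run_batch: RunBatch, n: int) -> Optional[str]:
    """Treat exceptions (e.g. rate-limit errors) the same as empty output."""
    try:
        return run_batch(n)
    except Exception:
        return None


def run_in_waves(batches: List[int], run_batch: RunBatch, out_dir: Path,
                 wave_size: int = 4, max_rounds: int = 3) -> List[int]:
    """Run batches a few at a time; retry only failures, in smaller waves.

    Returns the batch numbers still failing after `max_rounds` rounds.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    pending = list(batches)
    for _ in range(max_rounds):
        if not pending:
            break
        failed: List[int] = []
        for start in range(0, len(pending), wave_size):
            wave = pending[start:start + wave_size]
            # Leaving the `with` block waits for the whole wave to finish.
            with ThreadPoolExecutor(max_workers=len(wave)) as pool:
                outputs = list(pool.map(lambda n: _attempt(run_batch, n), wave))
            for n, text in zip(wave, outputs):
                if text and text.strip():
                    batch_output_path(out_dir, n).write_text(text, encoding="utf-8")
                else:
                    failed.append(n)  # empty output or an exception: retry next round
        pending = failed
        wave_size = max(1, wave_size // 2)  # shrink waves for the next round
    return pending
```

The default `wave_size=4` sits inside the 3-5 range recommended above. Only the batches that failed are rerun, so finished output files are never overwritten.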
- Import first, fix after — import creates tracked state entries for correlation.
- Target-matching scores trigger auto-reset to prevent gaming. Use the blind-review workflow described in your agent overlay doc (e.g. docs/CLAUDE.md, docs/HERMES.md).
- Even moderate scores (60-80) dramatically improve overall health.
- Stale dimensions auto-surface in next, so just follow the queue.
Integrity rules: Score from evidence only — no prior chat context, score history, or target-threshold anchoring. When evidence is mixed, score lower and explain uncertainty. Assess every requested dimension; never drop one.
Review output format
Return machine-readable JSON for review imports. For --external-submit, also include the session object from the generated template:
{
  "session": {
    "id": "<session_id_from_template>",
    "token": "<session_hmac_from_template>"
  },
  "assessments": {
    "<dimension_from_query>": 0
  },
  "findings": [
    {
      "dimension": "<dimension_from_query>",
      "identifier": "short_id",
      "summary": "one-line defect summary",
      "related_files": ["relative/path/to/file.py"],
      "evidence": ["specific code observation"],
      "suggestion": "concrete fix recommendation",
      "confidence": "high|medium|low"
    }
  ]
}
Each finding MUST match the schema in query.system_prompt exactly, including related_files, evidence, and suggestion. Use "findings": [] when no defects are found. Import is fail-closed: invalid findings abort the import unless --allow-partial is passed. Assessment scores are auto-applied from trusted internal or cloud session imports. The legacy --attested-external flag remains supported.
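Because import is fail-closed, a pre-flight check can catch obvious shape errors before you run --import or --external-submit. This Python sketch checks only the template shown above:

- required finding keys
- confidence levels
- 0-100 assessment scores
- the session block for external submits

The authoritative schema is still query.system_prompt, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, List, Optional

# Keys every finding carries in the template above.
REQUIRED_FINDING_KEYS = {
    "dimension", "identifier", "summary", "related_files",
    "evidence", "suggestion", "confidence",
}
CONFIDENCE_LEVELS = {"high", "medium", "low"}


def validate_review(payload: dict, requested: Optional[Iterable[str]] = None,
                    external: bool = False) -> List[str]:
    """Return human-readable problems; an empty list means the shape looks valid."""
    problems: List[str] = []

    if external:
        session = payload.get("session")
        if not isinstance(session, dict) or not session.get("id") or not session.get("token"):
            problems.append("external submits need session.id and session.token")

    assessments = payload.get("assessments")
    if not isinstance(assessments, dict):
        problems.append("assessments must be an object")
        assessments = {}
    for dim, score in assessments.items():
        if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0 <= score <= 100:
            problems.append(f"assessment {dim!r} must be a number from 0 to 100")
    # Integrity rule: every requested dimension gets a score.
    for dim in sorted(set(requested or ()) - set(assessments)):
        problems.append(f"missing assessment for requested dimension {dim!r}")

    findings = payload.get("findings")
    if not isinstance(findings, list):
        problems.append('findings must be a list (use "findings": [] when there are none)')
        return problems
    for i, finding in enumerate(findings):
        if not isinstance(finding, dict):
            problems.append(f"findings[{i}] must be an object")
            continue
        missing = REQUIRED_FINDING_KEYS - set(finding)
        if missing:
            problems.append(f"findings[{i}] is missing {sorted(missing)}")
        if finding.get("confidence") not in CONFIDENCE_LEVELS:
            problems.append(f"findings[{i}].confidence must be high, medium, or low")
        for key in ("related_files", "evidence"):
            if key in finding and not isinstance(finding[key], list):
                problems.append(f"findings[{i}].{key} must be a list")
    return problems


def validate_file(path: Path, **kwargs) -> List[str]:
    """Parse a JSON file and validate it; JSON syntax errors are reported, not raised."""
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path}: invalid JSON ({exc})"]
    if not isinstance(payload, dict):
        return [f"{path}: top level must be a JSON object"]
    return validate_review(payload, **kwargs)
```

For example, run `validate_file(Path("merged.json"), requested={...})` with the dimensions from your query before desloppify review --import merged.json. A non-empty result is a cue to fix the file before importing.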
Import paths
- Robust session flow (recommended):
desloppify review --external-start --external-runner claude → use generated prompt/template → run printed --external-submit command.
- Durable scored import (legacy):
desloppify review --import findings.json --attested-external --attest "I validated this review was completed without awareness of overall score and is unbiased."
- Findings-only fallback:
desloppify review --import findings.json
Reviewer agent prompt
In runners that support agent definitions (Cursor, Copilot, Gemini), you can create a dedicated reviewer agent. Use this system prompt:
You are a code quality reviewer. You will be given a codebase path, a set of
dimensions to score, and what each dimension means. Read the code, score each
dimension 0-100 from evidence only, and return JSON in the required format.
Do not anchor to target thresholds. When evidence is mixed, score lower and
explain uncertainty.
See your editor's overlay section below for the agent config format.
Plan commands
desloppify plan reorder <cluster> top
desloppify plan reorder <a> <b> top
desloppify plan reorder <pat> before -t X
desloppify plan cluster reorder a,b top
desloppify plan resolve <pat>
desloppify plan reopen <pat>
desloppify backlog
Commit tracking
desloppify plan commit-log
desloppify plan commit-log record
desloppify plan commit-log record --note "why"
desloppify plan commit-log record --only "smells::*"
desloppify plan commit-log history
desloppify plan commit-log pr
desloppify config set commit_tracking_enabled false
After resolving findings as fixed, the tool shows uncommitted work, committed history, and a suggested commit message. After committing externally, run record to move findings from uncommitted to committed and auto-update the linked PR description.
Agent directives
Directives are messages shown to agents at lifecycle phase transitions — use them to switch models, set constraints, or give context-specific instructions.
desloppify directives
desloppify directives set execute "Switch to claude-sonnet-4-6. Focus on speed."
desloppify directives set triage "Switch to claude-opus-4-6. Read carefully."
desloppify directives set review "Use blind packet. Do not anchor on previous scores."
desloppify directives unset execute
Available phases: execute, review, triage, workflow, scan (and fine-grained variants like review_initial, triage_postflight, etc.).
Quick reference
desloppify next --count 5
desloppify next --cluster <name>
desloppify backlog --count 5
desloppify show <pattern>
desloppify show --status open
desloppify plan skip --permanent "<id>" --note "reason" --attest "..."
desloppify exclude <path>
desloppify config show
desloppify scan --path . --reset-subjective
4. Fix Tool Issues Upstream
When desloppify itself appears wrong or inconsistent — a bug, a bad detection, a crash, confusing output — fix it and open a PR. If you can't confidently fix it, file an issue instead.
Fix and PR (preferred)
Clone the tool repo to a temp directory, make the fix there, and verify it works against the project you're scanning before pushing.
git clone https://github.com/peteromallet/desloppify.git /tmp/desloppify-fix
cd /tmp/desloppify-fix
git checkout -b fix/<short-description>
Make your changes, then run the test suite and verify the fix against the original project:
python -m pytest desloppify/tests/ -q
python -m desloppify scan --path <project-root>
Once it looks good, push and open a PR:
git add <files> && git commit -m "fix: <what and why>"
git push -u origin fix/<short-description>
gh pr create --title "fix: <short description>" --body "$(cat <<'EOF'
## Problem
<what went wrong — include the command and output>
## Fix
<what you changed and why>
EOF
)"
Clean up after: rm -rf /tmp/desloppify-fix
File an issue (fallback)
If the fix is unclear or the change needs discussion, open an issue at https://github.com/peteromallet/desloppify/issues with a minimal repro: command, path, expected output, actual output.
Prerequisite
command -v desloppify >/dev/null 2>&1 && echo "desloppify: installed" || echo "NOT INSTALLED — run: uvx --from git+https://github.com/peteromallet/desloppify.git desloppify"
If uvx is not available: pip install "desloppify[full]" && desloppify setup