| name | refactor-audit |
| description | Audit a codebase scope for tech debt, generate a prioritized refactoring plan. Covers dead code, complexity hotspots, DRY violations, dependency health, type safety gaps, test coverage, architecture/coupling issues, security smells, and naming/readability problems. Outputs an actionable plan at the requested detail level — from high-level subtasks for a ralph loop to precise file-and-function changes for immediate execution.
|
Refactor Audit
Systematic codebase audit and refactoring plan generator. Run periodically to
identify and reduce tech debt.
When to use
- Periodic tech debt sweeps
- Before a major feature push (clean the area you'll be working in)
- After a big feature lands (clean up what it left behind)
- When onboarding to an unfamiliar area of the codebase
- When something "feels wrong" but you can't pinpoint it
Phase 0 — Scope & Configuration
Before doing any analysis, ask the user for these inputs. Present them as
questions, don't assume defaults silently.
0.1 Target scope
Ask: "What should I audit?"
Accept any of:
- A directory path (e.g.
packages/server/modules/auth)
- A package name in a monorepo (e.g.
@speckle/viewer)
- A glob pattern (e.g.
src/components/**)
- "everything" (single-package repo only — refuse for monorepos, ask them to pick a package)
If the user gives a vague answer like "the auth stuff", use grep/find to locate
the relevant directories and confirm with them before proceeding.
0.2 Plan detail level
Ask: "How detailed should the refactoring plan be?"
Two modes:
| Mode | When to use | What you produce |
|---|
| Subtask | Plan will be split into subtasks for a ralph loop or multiple agent sessions | High-level task descriptions with context, goals, and acceptance criteria. Subtask agents will do their own file-level research. |
| Execution | You or the user will implement changes immediately in this session | Precise file paths, function names, line ranges, exact changes to make, and execution order. Like plan mode output. |
0.3 Audit categories
Ask: "Run all checks or focus on specific areas?"
Default is all. If the user wants to focus, let them pick from:
- Dead code & unused exports
- Complexity hotspots
- DRY violations & duplication
- Dependency health
- Type safety gaps
- Test coverage gaps
- Architecture & coupling
- Security smells
- Naming & readability
Phase 1 — Discovery
Gather context about the target scope before analyzing anything.
1.1 Structural survey
find <scope> -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.vue" -o -name "*.js" -o -name "*.jsx" -o -name "*.py" \) | head -200
find <scope> -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.vue" \) -exec wc -l {} \; | sort -rn | head -30
find <scope> -type d -maxdepth 3 | head -50
1.2 Tooling detection
Detect what's available in the project. Check for:
- Package manager: look for
pnpm-lock.yaml, yarn.lock, package-lock.json, poetry.lock
- Linter/formatter: check for
.eslintrc*, eslint.config.*, .prettierrc*, biome.json, ruff.toml
- Test framework: check for
vitest.config.*, jest.config.*, playwright.config.*, pytest.ini, pyproject.toml
- Type checker:
tsconfig.json, pyrightconfig.json, mypy.ini
- Build system:
vite.config.*, nuxt.config.*, webpack.config.*, rollup.config.*
- Monorepo root:
pnpm-workspace.yaml, lerna.json, nx.json, turbo.json
Record what's present — you'll use these tools in Phase 2.
1.3 Existing standards
Check for:
<% if (target === 'claude') { -%>
CLAUDE.md / .claude/rules/ — project-level AI instructions
.claude/skills/ — reusable AI workflows and slash commands
<% } else if (target === 'copilot') { -%>
.github/copilot-instructions.md / .github/instructions/ — project-level AI instructions
.github/skills/ — reusable AI workflows and slash commands
<% } else if (target === 'cursor') { -%>
.cursor/rules/ — project-level AI instructions
.cursor/skills/ — reusable AI workflows and slash commands
<% } else { -%>
- Project-level AI configuration files and skill directories
<% } -%>
CONTRIBUTING.md — team conventions
.editorconfig
- AI instructions/skills
- Custom lint rules or eslint plugins
Note any project-specific patterns or conventions so your recommendations
don't fight the existing codebase style.
1.4 Recent churn (optional but valuable)
If git is available:
git log --since="3 months ago" --name-only --pretty=format: -- <scope> | sort | uniq -c | sort -rn | head -20
git log --since="6 months ago" --pretty=format:%an -- <scope> | sort | uniq -c | sort -rn | head -10
Phase 2 — Analysis
Run each enabled audit category. For each finding, record:
- What: the specific issue
- Where: file path and line range (or function/component name)
- Severity: CRITICAL / HIGH / MEDIUM / LOW
- Effort: S (< 30 min) / M (30 min – 2 hrs) / L (2+ hrs)
- Why it matters: one sentence on the concrete risk or cost
Category 1: Dead code & unused exports
Goal: Find code that can be deleted with zero behavior change.
Techniques:
Severity guide:
- CRITICAL: Dead files (entire modules with no importers)
- HIGH: Unused exported functions/components
- MEDIUM: Unused local variables, unreachable branches
- LOW: Commented-out code, unused type definitions
Category 2: Complexity hotspots
Goal: Find functions and components that are too complex to maintain safely.
Techniques:
- File size scan: flag files over 300 lines (WARNING), over 500 lines (CRITICAL)
- Function length: flag functions over 50 lines (WARNING), over 100 lines (CRITICAL)
- Nesting depth: flag anything nested >3 levels deep
- Cyclomatic complexity: count branches (if/else/switch/ternary/&&/||/catch)
- 1-5: fine
- 6-10: WARNING — consider splitting
- 11+: CRITICAL — must split
- Vue components: flag components with >5
ref()/reactive() declarations (extract composable)
- Look for functions with >4 parameters (use options object)
Prioritization formula:
(lines / 100) + (branch_count × 0.5) + (nesting_depth × 2)
Score >5 = HIGH priority for refactoring.
Category 3: DRY violations & duplication
Goal: Find duplicated logic that should be consolidated.
Techniques:
Important distinction: Structural similarity is NOT always a DRY violation.
Two functions that look alike but serve different domains should stay separate.
Only flag duplication where the logic is genuinely shared knowledge.
Severity guide:
- HIGH: Same logic duplicated 3+ times, or duplicated logic that has diverged (bug in one copy, not others)
- MEDIUM: Same logic in 2 places
- LOW: Similar patterns that could share a utility but work fine as-is
Category 4: Dependency health
Goal: Find outdated, vulnerable, or unnecessary dependencies.
Techniques:
- If npm/pnpm:
pnpm outdated or check package.json for pinned ancient versions
- Look for dependencies that are only used in 1-2 files (could be replaced with a small utility)
- Check for multiple packages doing the same thing (e.g., both
axios and fetch wrappers)
- Look for deprecated packages (check for deprecation notices in package.json or README)
- Check for packages that pull in huge transitive dependency trees for minor functionality
- Look for dependencies that should be devDependencies (or vice versa)
Severity guide:
- CRITICAL: Known security vulnerabilities (if
npm audit / pnpm audit available, run it)
- HIGH: Deprecated packages, major version behind
- MEDIUM: Multiple packages for same purpose, unnecessary large dependencies
- LOW: Minor version behind, could-be-devDependency misplacements
Category 5: Type safety gaps
Goal: Find places where TypeScript's type system is being bypassed.
Techniques:
grep -rn "as any\|: any\|<any>" <scope> --include="*.ts" --include="*.tsx" --include="*.vue" | wc -l
grep -rn "as any" <scope> --include="*.ts" --include="*.vue"
grep -rn ": any" <scope> --include="*.ts" --include="*.vue"
grep -rn "@ts-ignore\|@ts-expect-error\|@ts-nocheck" <scope> --include="*.ts" --include="*.vue"
grep -rn "eslint-disable.*@typescript" <scope> --include="*.ts" --include="*.vue"
grep -rn "!\." <scope> --include="*.ts" --include="*.vue" | grep -v "!=\|!=="
- Also look for: untyped function parameters,
Object or {} as types, excessive use of type assertions
- Check
tsconfig.json for permissive settings (strict: false, noImplicitAny: false)
Severity guide:
- HIGH:
any in function signatures (spreads through the call chain), @ts-nocheck on entire files
- MEDIUM:
as any type assertions, non-null assertions in complex logic
- LOW:
@ts-expect-error with explanation comments (at least they're documented)
Category 6: Test coverage gaps
Goal: Find critical code paths that lack test coverage.
Techniques:
Severity guide:
- CRITICAL: Business logic with no tests AND high complexity
- HIGH: Public API functions/components with no tests
- MEDIUM: Utility functions with no tests, test files with very few assertions
- LOW: Type-only files, pure config files without tests
Category 7: Architecture & coupling
Goal: Find structural problems that make the codebase hard to change.
Techniques:
- Circular dependencies: trace import chains looking for cycles
grep -rn "^import" <scope> --include="*.ts" | grep -v node_modules
(Or use madge --circular if available)
- Layer violations: if the project has a layered architecture (e.g., components shouldn't import from server modules), check for cross-layer imports
- God modules: files that are imported by >15 other files (high fan-in = risky to change)
grep -rn "from.*['\"]" <scope> --include="*.ts" --include="*.vue" | \
grep -oP "from ['\"]([^'\"]+)['\"]" | sort | uniq -c | sort -rn | head -20
- Coupling indicators: files that always change together in git history
- Barrel file bloat:
index.ts files that re-export everything, causing import cycle risks and bundle bloat
- Misplaced logic: UI components containing business logic, utility files containing domain logic
Severity guide:
- CRITICAL: Circular dependency cycles
- HIGH: God modules (>20 importers), layer violations
- MEDIUM: Barrel file bloat, high coupling between unrelated modules
- LOW: Minor structural inconsistencies
Category 8: Security smells
Goal: Find patterns that could lead to security issues.
Techniques:
grep -rn "password\|secret\|api_key\|apikey\|token\|private_key" <scope> --include="*.ts" --include="*.vue" | grep -v "test\|spec\|mock\|\.d\.ts\|type\|interface"
grep -rn "query.*\`\|execute.*\`\|raw.*\`" <scope> --include="*.ts"
grep -rn "innerHTML\|dangerouslySetInnerHTML\|v-html" <scope> --include="*.ts" --include="*.vue" --include="*.tsx"
grep -rn "eval(\|new Function(\|setTimeout.*['\"]" <scope> --include="*.ts" --include="*.js"
grep -rn "exec(\|execSync(\|spawn(" <scope> --include="*.ts"
- Check for missing input validation on API endpoints
- Look for CORS wildcards (
Access-Control-Allow-Origin: *)
- Check for disabled security headers
Severity guide:
- CRITICAL: Hardcoded secrets, SQL injection patterns, eval with user input
- HIGH: XSS vectors, missing input validation on public endpoints
- MEDIUM: Overly permissive CORS, missing rate limiting patterns
- LOW: Using
v-html with sanitized content (still worth noting)
Category 9: Naming & readability
Goal: Find names that mislead or confuse.
Techniques:
- Single-letter variables outside of loop indices and arrow function shorthands
- Boolean variables/functions not starting with
is/has/can/should/will
- Functions >30 chars (probably doing too much if you need that many words)
- Inconsistent naming conventions within the scope (mixing camelCase and snake_case)
- Misleading names: function named
getX that has side effects, or isX that returns non-boolean
- Abbreviated names that aren't universally known (
mgr, proc, val, tmp, cb)
- Generic names:
data, info, result, item, stuff, thing, handler, manager, service, utils, helpers
Severity guide:
- HIGH: Misleading names (function behavior contradicts its name)
- MEDIUM: Generic names on important domain concepts
- LOW: Minor inconsistencies, slightly too-long names
Phase 3 — Prioritization
After all checks complete, sort all findings into a prioritized list.
Scoring formula
For each finding, compute:
priority_score = severity_weight + effort_bonus + churn_bonus + coupling_bonus
severity_weight:
CRITICAL = 10
HIGH = 7
MEDIUM = 4
LOW = 1
effort_bonus (favor quick wins):
S (< 30 min) = +3
M (30 min–2 hr) = +1
L (2+ hr) = +0
churn_bonus (if git data available):
File changed >10 times in 3 months = +3
File changed 5–10 times = +1
File changed <5 times = +0
coupling_bonus (more importers = higher risk):
>15 importers = +3
5–15 importers = +1
<5 importers = +0
Sort descending by priority_score. Group into tiers:
- Tier 1 (Do first): score ≥ 12
- Tier 2 (Do soon): score 8–11
- Tier 3 (Do eventually): score 4–7
- Tier 4 (Nice to have): score < 4
Phase 4 — Plan Generation
Generate the refactoring plan in the requested detail level.
Subtask mode output
For each tier (starting from Tier 1), generate task descriptions like:
## Task: [short descriptive title]
**Category:** [which audit category]
**Tier:** [1-4]
**Estimated effort:** [S/M/L]
**Findings addressed:** [count]
### Context
[2-3 sentences explaining what the problem is and why it matters.
Include enough info for a fresh agent session to understand the situation.]
### Goal
[1-2 sentences on what "done" looks like]
### Scope
[List the files/directories involved — just names, the executing agent will read them]
### Acceptance criteria
- [ ] [Specific, verifiable criterion]
- [ ] [Another criterion]
- [ ] All existing tests still pass
- [ ] No new lint errors introduced
Group related findings into single tasks where they touch the same files.
Aim for tasks that take 15-60 minutes each. Split anything larger.
Execution mode output
For each tier, generate precise instructions:
## Refactoring: [short title]
**Priority score:** [X] | **Severity:** [X] | **Effort:** [S/M/L]
### Changes
1. **`path/to/file.ts` (lines 45-78):**
- Extract the nested if/else block in `processPayment()` into a separate
`validatePaymentMethod(method: PaymentMethod): ValidationResult` function
- Move it to `path/to/payment-validation.ts`
- Update imports in `path/to/file.ts` and `path/to/other-consumer.ts`
2. **`path/to/another-file.ts` (lines 12-15):**
- Replace `as any` with proper generic: `Record<string, PaymentConfig>`
- The type definition exists in `path/to/types.ts` line 34
### Verification
- Run: `pnpm test --filter @scope/package`
- Run: `pnpm typecheck`
- Confirm no new lint errors: `pnpm lint`
Phase 5 — Summary
After the plan is generated, present a brief summary:
## Audit Summary
**Scope:** [what was analyzed]
**Files scanned:** [count]
**Total findings:** [count]
**Breakdown:**
- Tier 1 (critical): [count] findings → [count] tasks
- Tier 2 (high): [count] findings → [count] tasks
- Tier 3 (medium): [count] findings → [count] tasks
- Tier 4 (low): [count] findings → [count] tasks
**Top 3 systemic issues:**
1. [Pattern that appears across multiple findings]
2. [Another pattern]
3. [Another pattern]
**Quick wins (< 30 min, high impact):**
- [Task name] — [one-line description]
- [Task name] — [one-line description]
**Estimated total effort:** [rough range in hours]
Principles
These principles guide the analysis. When in conflict, earlier items take
precedence:
- Don't break things. Every recommended change must preserve existing behavior unless explicitly flagged as a behavior change.
- Tests first. If a refactoring target has no tests, the first task is always "write characterization tests" before changing anything.
- Small, safe changes. Prefer many small refactorings over fewer large ones. Each task should be independently shippable.
- Duplication is better than the wrong abstraction. Don't recommend DRY consolidation unless the duplicated logic is genuinely shared knowledge, not just structurally similar.
- Existing conventions win. Recommendations should follow the project's existing patterns. Don't suggest a complete architectural overhaul when the issue is localized.
- Delete before refactor. If code can be deleted instead of refactored, prefer deletion.
- Tooling over judgment. When a linter, type checker, or test runner can verify a finding, use it. LLM judgment is the fallback, not the primary detector.