| name | skill-optimizer |
| description | SkillOpt-flavored offline training loop for any SKILL.md. Treats accumulated learn-rule corrections as training trajectories, proposes bounded patches via an optimizer LLM, gates each candidate against a held-out validation set built from the user's own past corrections, and ships only candidates that demonstrably improve the score. Inspired by Microsoft SkillOpt's ReflACT pipeline (rollout → reflect → aggregate → select → update → evaluate) adapted to pro-workflow's SQLite store. Use when a skill has accumulated 8+ learn-rule rows and the user wants the skill itself to get better, not just longer. |
Skill Optimizer
Train an existing SKILL.md the way a deep-learning optimizer trains weights: via rollouts, gradient-like reflections, validation-gated acceptance. No model retraining; only the skill markdown changes.
When to use
Use this skill when:
- A pro-workflow skill has accumulated 8+ learn-rule rows for it
- The user reports the skill is "getting bloated" or "rules keep being repeated"
- The user wants offline, budget-capped improvement over multiple sessions
Do not use when:
- Skill has fewer than 8 trajectories (nothing to learn from)
- The user wants real-time edits (this is offline, single-shot)
- No
ANTHROPIC_API_KEY (or equivalent provider key) is available
Architecture (mirrors SkillOpt's six-stage loop)
rollout pull recent learnings from SQLite (existing learn-rule rows)
reflect optimizer LLM analyzes a minibatch, proposes add/delete/replace patches
aggregate vote-merge patches across minibatches
select clip by LR budget (default: 3 adds, 2 deletes, 3 replaces per step)
update apply selected patches to a candidate skill content
evaluate evaluator LLM scores candidate against held-out validation items
gate accept candidate only if weighted score >= current + acceptThreshold
slow update at epoch boundary, consolidate accepted edits into a coherent rewrite
Failed candidates are stored in a rejection buffer and fed back to the next reflect step so the optimizer doesn't propose the same patch twice.
Run it
/skill-optimize <slug> [options]
Options (all optional; sensible defaults shown):
| Flag | Default | Notes |
|---|
--epochs N | 3 | Outer loop count |
--batch-size N | 8 | Trajectories per minibatch |
--minibatches N | 2 | Minibatches per epoch |
--holdout N | 6 | Validation items reserved (max ~25% of trajectories) |
--budget-usd X | 0.50 | Hard cap; loop aborts when spent |
--optimizer-model M | claude-sonnet-4-6 | Reflect + slow-update model |
--evaluator-model M | claude-haiku-4-5-20251001 | Gate model (cheaper) |
--max-adds N | 3 | LR budget per step |
--max-deletes N | 2 | |
--max-replaces N | 3 | |
--accept-threshold X | 0.0 | Minimum score delta to accept candidate |
--max-skill-tokens N | 2000 | Hard cap on candidate length |
--slow-every N | 2 | Epochs between consolidation passes |
--json | off | Machine-readable output |
Kill switch: touch ~/.pro-workflow/STOP aborts the loop between steps.
Output
- Candidate accepted → SKILL.md overwritten, hash stamp appended in HTML comment
- Run details persist in
optimization_runs, optimization_candidates, optimization_patches, optimization_rejections
- Validation set persists in
optimization_validation (reusable across runs)
Inspect after:
sqlite3 ~/.pro-workflow/data.db "SELECT id, skill_slug, initial_score, best_score, accepted_steps, rejected_steps, spent_usd FROM optimization_runs ORDER BY id DESC LIMIT 5"
Rules
- Validation set is frozen at run start. Never re-derive from new corrections mid-run.
- One candidate per step. No parallel branches.
- Slow-update output is itself a candidate; it must pass the gate to replace the best.
- The optimizer LLM and evaluator LLM may be different models. Mixing a strong optimizer with a cheap evaluator is the SkillOpt-recommended config.
- If
spent_usd >= budget_usd at any step boundary, the loop ends with stopped_reason="budget exhausted".
- Patches whose anchor is no longer present in the skill (because a prior patch in the same step removed it) are recorded as rejected with reason
anchor_missing.
Provenance
Inspired by Microsoft SkillOpt (arXiv:2605.23904). The six-stage rollout/reflect/aggregate/select/update/evaluate pipeline, LR budget, rejection buffer, and slow / meta update mechanics are adapted to pro-workflow's existing SQLite + learn-rule data plane. No SkillOpt code is reused. "ReflACT" is not a SkillOpt term and is not used here; the loop is referred to by stage names only.