Ejecuta cualquier Skill en Manus
con un clic

Ejecuta cualquier Skill en Manus con un clic

$pwd:

optimizing-skills

Name: Optimizing Skills
Author: oaustegard

// Disciplined, validation-gated revision of an EXISTING skill so each edit is a measured improvement rather than a guess. Use when editing, revising, or tuning a skill that already exists and there is evidence it underperforms (observed failures, drift, complaints) — invoke by name, or have versioning-skills / creating-skill defer to it before applying edits. Not for authoring a brand-new skill from scratch (use creating-skill) or one-off prose.

Ejecutar en Manus

$ git log --oneline --stat

stars:124

forks:5

updated:29 de mayo de 2026, 12:28

Explorador de archivos

4 archivos

SKILL.md

readonly

name	optimizing-skills
description	Disciplined, validation-gated revision of an EXISTING skill so each edit is a measured improvement rather than a guess. Use when editing, revising, or tuning a skill that already exists and there is evidence it underperforms (observed failures, drift, complaints) — invoke by name, or have versioning-skills / creating-skill defer to it before applying edits. Not for authoring a brand-new skill from scratch (use creating-skill) or one-off prose.
metadata	{"version":"0.2.0"}

Optimizing Skills

Treat the skill document as the parameter under optimization: change it only when the change demonstrably beats the version you already ship. This is the discipline distilled from SkillOpt (microsoft/SkillOpt, arXiv:2605.23904) — its training apparatus dropped, its reproducibility discipline kept. The point is to stop editing skills on intuition and start editing them on evidence.

Core principle

A skill edit is only worth shipping if it strictly improves measured behavior on a held-out check. Most edits that feel like improvements don't move the needle, and some quietly regress. The gate below is what separates a real improvement from a confident guess.

The gate — run it every revision

Assemble a held-out check set. 3–8 representative tasks/prompts the skill should handle well, and it must include the failure(s) that prompted this revision. Keep the set fixed across the revision so before/after scores are comparable.
Hold two versions. best = what you currently ship (never let it silently degrade). candidate = best + your proposed edits.
Score both on the check set. "Run" here = dispatch each check task to the Agent tool (subagent_type=general-purpose) with the skill version in context, or evaluate by hand for small sets. Score per criterion, not one collapsed pass/fail. When a task carries several criteria, the criterion that decides accept/reject is the failure that prompted this revision; the others are regression guards that must not get worse. Collapsing criteria masks the win: in the down-skilling-v1.2.0 retro, the edit drove architectural hallucination 60%→0% while an unrelated length criterion stayed 0/5 in both arms — a single combined pass/fail scored that as a 0–0 tie and would have rejected a large, real improvement.
Accept only if candidate strictly beats best on the triggering-failure criterion, with no regression guard worse. Ties → reject, keep best. An edit that doesn't move the needle does not ship.

When the skill's own output is compiled by an Agent (down-skilling and creating-skill produce a prompt an author writes from the SKILL), score ≥2 author samples per version, or fix one author across both arms. A single author sample per arm lets author capability dominate the edit effect: the same down-skilling edit measured 95%→0% with one author pair and 60%→0% with another — real either way, but n=1 cannot tell a real edit from a lucky author.

This two-tier best/candidate split is the heart of it: a working revision can explore, but the shipped skill only ever ratchets upward.

Bounded edits (the "textual learning rate")

Cap edits per revision — default ~4 distinct add/replace/delete operations, fewer as the skill matures. Large speculative rewrites drift and destroy your ability to attribute a regression to a cause. If a revision wants more edits than the budget, rank and keep the top ones (below) and let the rest wait.

Reflect: failures first, then successes

Separate the evidence before proposing edits:

Failure reflection. Across the failing cases, find the common, systematic pattern — not a one-off edge case. Propose edits that fix the pattern. Failures take priority in any merge.
Success reflection. Across cases that already work, find generalizable patterns worth encoding so they survive future edits. Reinforce; don't duplicate.

For both: edits must generalize (never hardcode task-specific values), and must not duplicate content already in the skill — patch genuine gaps only.

Rank when over budget

When candidate edits exceed the budget, keep them in this priority order:

Systematic impact — fixes a recurring failure across many cases, not one.
Complementarity — fills a real gap rather than restating existing content.
Generality — phrased as a durable principle, not tied to one task/entity.
Actionability — concrete, followable guidance over vague advice.

Drop the rest. They can return next revision if still warranted.

Protect the hard-won core

If a skill has a battle-tested core that routine edits keep eroding, fence it off and treat it as off-limits to fast edits. Revisit it only on a deliberate longitudinal review: compare the same check tasks across several versions to catch slow drift and regressions that single-edit review misses. (SkillOpt fences this region with HTML-comment markers and only rewrites it at epoch boundaries — the same idea, manual cadence.)

Carry memory across revisions

After a revision, record what you learned about editing this skill — which kinds of edits helped, which were brittle, redundant, or harmful — via remember() tagged with the skill name. Before the next revision, recall() it. This is the compounding part: each revision starts smarter than the last, the way SkillOpt's optimizer-side meta-skill conditions its future edits.

Edit mechanics

Edits are literal string operations (the Edit tool): the target text must match exactly or the edit is a silent no-op. Keep targets unique and verbatim. Prefer append / insert-after-heading / replace-exact / delete-exact, and verify each edit landed before scoring.

When NOT to use this

Authoring a brand-new skill from scratch → creating-skill.
Tracking/rolling back versions during development → versioning-skills.
One-off prose with no reuse → just write it.

Checklist

Held-out check set assembled (includes the triggering failure), fixed for the revision
Edits bounded (~4 max), each generalizable and non-duplicative
Failure patterns addressed before success reinforcement
Candidate scored per-criterion against best; accept decided by the triggering-failure criterion, others as regression guards; shipped only if strictly better
For Agent-compiled artifacts (down-skilling, creating-skill): ≥2 author samples per version, or a fixed author across arms
Hard-won core left untouched unless doing a deliberate longitudinal review
Lesson about editing this skill recorded via remember()

For the deeper "dispatch reflection/scoring to the Agent tool" recipe and the adapted reflection/ranking prompt templates, see references/skillopt-provenance.md.

related-skills.json

mismo repositorio

invoking-gemini.md

from "oaustegard/claude-skills"

Invokes Google Gemini models for structured outputs, image generation, multi-modal tasks, and Google-specific features. Use when users request Gemini, image generation, structured JSON output, Google API integration, or cost-effective parallel processing.

2026-05-28124

convening-experts.md

from "oaustegard/claude-skills"

Convenes expert panels for problem-solving. Use when user mentions panel, experts, multiple perspectives, MECE, DMAIC, RAPID, Six Sigma, root cause analysis, strategic decisions, or process improvement.

2026-05-28124

flowing.md

from "oaustegard/claude-skills"

DAG workflow runner that encodes control flow in code, not prose. Use when a procedure has 3+ steps with branching, retries, or validation that must be enforced — gates as `when=`, edge contracts as `validate=`, predicate loops as `retry_until=`. The runner owns the graph; the LLM provides leaves. Also covers parallel execution, checkpoint resume, detached side-effects.

2026-05-28124

orchestrating-agents.md

from "oaustegard/claude-skills"

Orchestrates parallel API instances, delegated sub-tasks, and multi-agent workflows with streaming and tool-enabled delegation patterns. Use for parallel analysis, multi-perspective reviews, or complex task decomposition.

2026-05-28124

orchestrating-skills.md

from "oaustegard/claude-skills"

Skill-aware orchestration with context routing. Decomposes complex tasks into skill-typed subtasks, extracts targeted context subsets, executes subagents in parallel, and synthesizes results. Self-answers trivial lookups inline. No SDK dependency — uses raw HTTP via httpx. Use when tasks require multiple analytical perspectives, when context is large and subtasks only need portions, or when orchestrating-agents spawns too many redundant subagents.

2026-05-28124

down-skilling.md

from "oaustegard/claude-skills"

Distill Opus-level reasoning into optimized instructions for Haiku 4.5 (and Sonnet). Generates explicit, procedural prompts with n-shot examples that maximize smaller model performance on a given task. Use when user says "down-skill", "distill for Haiku", "optimize for Haiku", "make this work on Haiku", "generate Haiku instructions", or needs to delegate a task to a smaller model with high reliability.

2026-05-26124

package.json

"author": "oaustegard"

"repository": "oaustegard/claude-skills"

Abrir repositorio de GitHub Ver repositorios del creador

$ install --global

$ download --local

Ejecutar en Manus

$ useful --forSOC

Desarrolladores de softwareOcupaciones informáticas y matemáticas15-1252L4

name	optimizing-skills
description	Disciplined, validation-gated revision of an EXISTING skill so each edit is a measured improvement rather than a guess. Use when editing, revising, or tuning a skill that already exists and there is evidence it underperforms (observed failures, drift, complaints) — invoke by name, or have versioning-skills / creating-skill defer to it before applying edits. Not for authoring a brand-new skill from scratch (use creating-skill) or one-off prose.
metadata	{"version":"0.2.0"}

Optimizing Skills

Core principle

The gate — run it every revision

Assemble a held-out check set. 3–8 representative tasks/prompts the skill should handle well, and it must include the failure(s) that prompted this revision. Keep the set fixed across the revision so before/after scores are comparable.
Hold two versions. best = what you currently ship (never let it silently degrade). candidate = best + your proposed edits.
Score both on the check set. "Run" here = dispatch each check task to the Agent tool (subagent_type=general-purpose) with the skill version in context, or evaluate by hand for small sets. Score per criterion, not one collapsed pass/fail. When a task carries several criteria, the criterion that decides accept/reject is the failure that prompted this revision; the others are regression guards that must not get worse. Collapsing criteria masks the win: in the down-skilling-v1.2.0 retro, the edit drove architectural hallucination 60%→0% while an unrelated length criterion stayed 0/5 in both arms — a single combined pass/fail scored that as a 0–0 tie and would have rejected a large, real improvement.
Accept only if candidate strictly beats best on the triggering-failure criterion, with no regression guard worse. Ties → reject, keep best. An edit that doesn't move the needle does not ship.

This two-tier best/candidate split is the heart of it: a working revision can explore, but the shipped skill only ever ratchets upward.

Bounded edits (the "textual learning rate")

Reflect: failures first, then successes

Separate the evidence before proposing edits:

Failure reflection. Across the failing cases, find the common, systematic pattern — not a one-off edge case. Propose edits that fix the pattern. Failures take priority in any merge.
Success reflection. Across cases that already work, find generalizable patterns worth encoding so they survive future edits. Reinforce; don't duplicate.

For both: edits must generalize (never hardcode task-specific values), and must not duplicate content already in the skill — patch genuine gaps only.

Rank when over budget

When candidate edits exceed the budget, keep them in this priority order:

Systematic impact — fixes a recurring failure across many cases, not one.
Complementarity — fills a real gap rather than restating existing content.
Generality — phrased as a durable principle, not tied to one task/entity.
Actionability — concrete, followable guidance over vague advice.

Drop the rest. They can return next revision if still warranted.

Protect the hard-won core

Carry memory across revisions

Edit mechanics

When NOT to use this

Authoring a brand-new skill from scratch → creating-skill.
Tracking/rolling back versions during development → versioning-skills.
One-off prose with no reuse → just write it.

Checklist

Held-out check set assembled (includes the triggering failure), fixed for the revision
Edits bounded (~4 max), each generalizable and non-duplicative
Failure patterns addressed before success reinforcement
Candidate scored per-criterion against best; accept decided by the triggering-failure criterion, others as regression guards; shipped only if strictly better
For Agent-compiled artifacts (down-skilling, creating-skill): ≥2 author samples per version, or a fixed author across arms
Hard-won core left untouched unless doing a deliberate longitudinal review
Lesson about editing this skill recorded via remember()

For the deeper "dispatch reflection/scoring to the Agent tool" recipe and the adapted reflection/ranking prompt templates, see references/skillopt-provenance.md.

optimizing-skills

Optimizing Skills

Core principle

The gate — run it every revision

Bounded edits (the "textual learning rate")

Reflect: failures first, then successes

Rank when over budget

Protect the hard-won core

Carry memory across revisions

Edit mechanics

When NOT to use this

Checklist

Más de este repositorio

Más de este repositorio

Optimizing Skills

Core principle

The gate — run it every revision

Bounded edits (the "textual learning rate")

Reflect: failures first, then successes

Rank when over budget

Protect the hard-won core

Carry memory across revisions

Edit mechanics

When NOT to use this

Checklist