---
name: skill-engineering
description: >-
  Design and iterate Claude Code skills: SKILL.md structure, description
  formulas, content architecture, and quality evaluation. Invoke whenever
  task involves any interaction with Claude Code skills — creating,
  reviewing, evaluating, debugging, or improving skills.
---
# Skill Engineering
Skills are prompt templates that extend Claude with domain expertise. A skill lives in `skill-name/SKILL.md` with an optional `references/` directory for deepening material. SKILL.md must be behaviorally self-sufficient — an agent reading only SKILL.md, without loading any references, must be able to do the job correctly. References provide depth, not breadth. The description triggers activation; the instructions shape behavior. Claude sees only the name and description at startup, then loads the full SKILL.md content when the skill triggers.
**Skills are prompts.** Before writing or improving a skill, invoke `prompt-engineering` to load instruction design techniques:

```
Skill(ai-helpers:prompt-engineering)
```

Skip only for trivial edits (typos, formatting).
## Route to Reference
- **SKILL.md format, frontmatter rules, directory layout** — [${CLAUDE_SKILL_DIR}/references/spec.md] — All frontmatter fields (open standard + Claude Code extensions), name rules, string substitutions, progressive disclosure mechanics (1%/8K budget, 250-char cap), discovery/precedence, instruction budget, SDK behavior differences
- **Creating a skill from scratch** — [${CLAUDE_SKILL_DIR}/references/creation.md] — Step-by-step creation workflow, scope sizing, description writing formula with examples, evaluation-driven development, archetype deep dives with structural patterns, pre-delivery checklist
- **Evaluating skill quality (review, audit)** — [${CLAUDE_SKILL_DIR}/references/evaluation.md] — Scoring rubric (5 dimensions × 20 pts), activation rate benchmarks, trigger keyword density, testing protocol, common issues by score range
- **Skill not triggering, wrong output, refinement** — [${CLAUDE_SKILL_DIR}/references/iteration.md] — Activation reliability research (20%→100%), commitment mechanisms, description optimization tiers, output fixes, restructuring and splitting guidance
- **Multi-file skills, scripts, subagents, hooks** — [${CLAUDE_SKILL_DIR}/references/advanced-patterns.md] — Fork pattern, workflow skills, composable skills, dynamic context injection, permission scoping, plugin packaging, Agent SDK integration, agent teams, skill-scoped hooks, model/effort overrides
- **Debugging activation failures, script errors** — [${CLAUDE_SKILL_DIR}/references/troubleshooting.md] — Diagnostic steps for structure, activation, output, script, reference, token budget, and plugin cache issues; YAML multiline bug documentation
- **Writing persuasive instructions, reasoning** — [${CLAUDE_SKILL_DIR}/references/prompt-techniques.md] — Instruction budget research, prompt interference detection (Arbiter), CoT trade-offs in persistent context, instruction strengthening escalation, format control, debugging instruction failures
- **Skill security — authoring and vetting** — [${CLAUDE_SKILL_DIR}/references/security.md] — Vulnerability taxonomy (26.1% of skills affected), consent gap, enterprise vetting checklist, secure authoring rules, red flags quick reference
Read the relevant reference before proceeding.
## Description Formula
The description determines when Claude activates your skill. It's the highest-leverage field — poor descriptions cause
missed activations.
```
[What it does] + [When to invoke — broad domain claim with trigger examples]
```
- **What it does** — functional description of the skill's purpose. State what the skill covers concretely, not a slogan or tagline.
- **When to invoke** — "Invoke whenever task involves any interaction with X." Claims the domain broadly, then lists specific triggers as examples under the broad claim.
**Good** — functional description + broad claim:

```yaml
description: >-
  Go language conventions, idioms, and toolchain. Invoke when task
  involves any interaction with Go code — writing, reviewing,
  refactoring, debugging, or understanding Go projects.
```
**Bad** — vague, no trigger surface:

```yaml
description: Helps with documents
```
### Principles
- **Lead with function, not slogans.** The first sentence must describe what the skill covers concretely. Slogans waste description tokens on zero activation value.
- **Claim broadly, then list specifics.** "Invoke whenever task involves any interaction with X — creating, editing, debugging" beats "Invoke when creating, editing, or debugging X."
- **Aggressive triggering, graceful de-escalation.** Better to trigger and de-escalate inside the skill than to miss activations. Native activation is unreliable (20-50% without enforcement hooks). Design skills to be useful when loaded, not to depend on perfect auto-activation.
- **Skill dependencies belong in SKILL.md body, not descriptions.** Prerequisites like "load prompt-engineering first" are handled by the skill body — putting them in descriptions wastes trigger space.
- **Philosophy belongs in SKILL.md body.** Guiding principles shape behavior inside the skill — not in the description, where they have zero activation value.
## Activation Reliability
Native skill activation is unreliable. Measured rates across 250+ sandboxed evaluations:
- No hook / simple instruction hook: ~20-50% activation
- LLM pre-eval hook (API pre-screening): ~80% (can fail on multi-skill prompts)
- Forced-eval hook (explicit YES/NO per skill): ~84-100% (most consistent, zero false positives)
- Manual `/skill-name` invocation: 100%
The forced-eval hook works via a commitment mechanism: Claude must evaluate each skill, state YES/NO with reason,
then follow through. Simple passive suggestions are ignored. Description optimization alone cannot break the ~50%
systemic ceiling — hooks are the path to consistent activation.
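As an illustration, a forced-eval hook can be sketched as a `UserPromptSubmit` hook script whose stdout is injected into context. The skill registry, wording, and layout below are hypothetical, a sketch of the commitment mechanism rather than a prescribed implementation:

```python
#!/usr/bin/env python3
# Hypothetical forced-eval hook for the UserPromptSubmit event.
# Assumes the documented hook behavior: stdout from a UserPromptSubmit
# hook command is appended to the model's context for that turn.
# Skill names and trigger text are illustrative.

# Illustrative skill registry: name -> trigger summary.
SKILLS = {
    "skill-engineering": "creating, reviewing, or debugging Claude Code skills",
    "prompt-engineering": "designing or strengthening prompt instructions",
}


def build_instruction(skills: dict) -> str:
    """Build the commitment-mechanism prompt: one YES/NO verdict per skill."""
    lines = [
        "Before responding, evaluate EVERY skill below.",
        "For each, state YES or NO with a one-line reason,",
        "then invoke each skill you marked YES.",
    ]
    lines += [f"- {name}: relevant when {triggers}" for name, triggers in skills.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    # Claude Code runs the hook on every user prompt; the printed
    # text becomes part of the context before the model responds.
    print(build_instruction(SKILLS))
```

The point is the explicit per-skill verdict: the model commits to YES/NO before acting, which is what lifts activation past passive suggestions.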
Description optimization tiers, hook implementation details: see [${CLAUDE_SKILL_DIR}/references/iteration.md].
## Content Architecture
SKILL.md must be behaviorally self-sufficient. An agent reading only SKILL.md — without loading any references —
must be able to do the job correctly. References provide depth, not breadth.
### What Goes Where
- **Behavioral rules → SKILL.md body** — agent must follow these during work; can't afford a missed reference read
- **Catalog/lookup content → references/** — agent reads on-demand for specific lookups
- **Situational content → references/** — only needed in specific phases
- **Voluminous structural content → references/** — too large to inline, inherently lookup-oriented
If an agent skipping a reference would produce wrong output, that content is behavioral and belongs in SKILL.md.
### Working-Resolution / High-Resolution
When a reference contains both rules and depth, use a two-resolution split:
- **SKILL.md** — working resolution: the thesis, core rules, a summary that enables correct behavior
- **references/** — high resolution: detailed rubrics, extended examples, full catalogs, edge-case coverage
The agent works correctly at working resolution. References let it zoom in.
### Structured Data Formats
Format choice affects LLM accuracy by up to 16pp on identical content. Apply these rules when creating or reviewing any
skill content:
- **Use KV lists** for independent-entry data — route tables, tool references, scoring rubrics, configuration mappings, hook events, permission modes, model aliases. Each entry stands alone; no cross-row scanning is needed.
- **Use markdown tables** only for genuinely 2D comparisons where cross-criteria scanning IS the point — decision matrices, feature comparisons across two or more alternatives.
- **Use numbered lists** only for sequential steps where order matters — "1. Read input → 2. Validate → 3. Output."
- **Use bullet lists** for rules, directives, and conventions with no ordering — if items can be reordered without changing meaning, use bullets.

**The table test:** if removing a column would lose comparative meaning → table. Otherwise → KV list.
Audit format choices when creating or reviewing a skill. Converting misused tables to KV lists is a mechanical
improvement with measurable accuracy gain (+8.8pp on lookup tasks).
**Before** (wrong — independent entries in a table):

```markdown
| Hook Event       | When It Fires         |
| ---------------- | --------------------- |
| PreToolUse       | Before each tool call |
| PostToolUse      | After each tool call  |
| UserPromptSubmit | On user message       |
```
**After** (correct — KV list for independent entries):

```markdown
- **PreToolUse** — fires before each tool call
- **PostToolUse** — fires after each tool call
- **UserPromptSubmit** — fires on user message submission
```
**Correct** — genuine 2D comparison (removing either column loses meaning):

```markdown
| Frontmatter                      | User invoke | Claude invoke |
| :------------------------------- | :---------- | :------------ |
| (default)                        | Yes         | Yes           |
| `disable-model-invocation: true` | Yes         | No            |
| `user-invocable: false`          | No          | Yes           |
```
Full format benchmarks and selection rules: see [${CLAUDE_SKILL_DIR}/references/spec.md] and the prompt-engineering
skill's structured-data-formats reference.
### Route-to-Reference Lists
When a skill has references, include a route list describing what depth each reference provides. Use `${CLAUDE_SKILL_DIR}` for all reference paths — it resolves to the skill's absolute directory at load time.
Each entry names the topic, provides the path, and describes the contents — enabling informed read decisions. Without
content descriptions, agents either over-read (wasting context) or skip (missing depth).
## Writing Instructions
Skills are prompts. Apply prompt engineering fundamentals.
### Degrees of Freedom
Match instruction specificity to task fragility:
- **High freedom** — multiple approaches are valid, decisions depend on context
- **Medium freedom** — a preferred pattern exists but some variation is acceptable
- **Low freedom** — operations are fragile, consistency is critical, exact sequence required
### Declarative vs Procedural
- **Declarative (bullet-list rules, constraints)** — for behavioral boundaries, conventions, safety guardrails. Models apply factual constraints more reliably across varied inputs. Use for the majority of skill content.
- **Procedural (numbered steps)** — for tasks with strict ordering. Cap at ~10-15 steps; decompose beyond that into sub-procedures (Hierarchical Task Networks).
Default to declarative. Research shows declarative knowledge provides greater performance benefits than procedural
in the majority of tasks.
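A minimal contrast between the two modes, using illustrative Go-flavored rule content (the rules themselves are hypothetical examples, not prescriptions):

```markdown
<!-- Declarative: order-free behavioral boundaries -->
- Return errors; never panic in library code
- Wrap errors with %w to preserve the chain

<!-- Procedural: only where strict ordering exists -->
1. Read input
2. Validate against the schema
3. Emit output
```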
### Instruction Placement
Models follow a U-shaped attention curve: instructions at the beginning and end are followed most reliably; middle
content suffers from attention decay.
- **Top 20% (primacy zone)** — identity, philosophy, critical constraints
- **Middle** — detailed rules by topic, route list, examples
- **Bottom 20% (recency zone)** — reinforced critical rules, quality checks
Dual-placement strategy: For rules that absolutely must be followed, state them near the top AND reinforce at the
end. Use different phrasing — frame as a principle at the top, as a checklist item at the bottom.
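A sketch of dual placement inside a skill body (the section names and the rule itself are illustrative):

```markdown
# My Skill

Generated files are never committed.   <!-- top: framed as a principle -->

## Rules

[detailed rules by topic]

## Before Finishing

- [ ] No generated files staged        <!-- bottom: restated as a checklist item -->
```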
### Every Instruction Must Earn Its Place
Research shows unnecessary requirements reduce task success even when the model can follow them. Every instruction
competes for attention. Before adding a rule, verify the model's default behavior is insufficient — if deleting the rule
doesn't change output quality, remove it.
### State Rules as Positive Directives
State rules as positive directives in the body section where they're contextually relevant: "Use `pipeline()` for stream composition" — not "Don't use `.pipe()`" in a separate anti-pattern table. Keep an anti-pattern table only when the "don't" side is genuinely non-obvious from the positive rule.
Instruction strengthening patterns, interference detection, CoT trade-offs: see
[${CLAUDE_SKILL_DIR}/references/prompt-techniques.md].
## Skill Archetypes
### Workflow Skill
Sequential phases with clear inputs/outputs and checkpoints. SKILL.md contains the complete workflow; references provide detailed rubrics, templates, or extended checklists. Use `disable-model-invocation: true` for skills with side effects.
### Knowledge Skill
Complete specification for a tool, format, or API. Everything inline — the agent needs the full spec to do the work.
References are rare; when present, they hold example collections. Knowledge skills often exceed 500 lines — acceptable
when all content is behavioral.
### Coding Discipline Skill
Conventions and rules for a language, framework, or platform. Key patterns: philosophy bookends, declarative rules as
bullet lists (8-17 rules typical), Application section (writing mode vs reviewing mode), Integration section
(relationship to other skills, precedence rules).
Extended structural patterns for all archetypes: see [${CLAUDE_SKILL_DIR}/references/creation.md].
## Quick Templates
**Simple skill (no references):**

```markdown
---
name: my-skill
description: >-
  [What it does]. Invoke whenever task involves any interaction
  with [domain] — [specific triggers].
---

# My Skill

## Instructions

[Clear, imperative steps or declarative rules]

## Examples

**Input:** [request]
**Output:** [expected result]
```
**Skill with references:**

```markdown
---
name: my-skill
description: >-
  [What it does]. Invoke whenever task involves any interaction
  with [domain] — [specific triggers].
---

# My Skill

[Philosophy or purpose statement]

## References

- **[topic]** — `${CLAUDE_SKILL_DIR}/references/[file].md` —
  [type of depth: tables, examples, patterns]

## [Topic Sections]

[Working-resolution rules — complete behavioral spec]
[Pointers to references for extended examples, lookup tables]
```
## Security
Skills execute with system-prompt authority — Claude treats SKILL.md as trusted instructions. Research shows 26.1% of
skills in the wild contain vulnerabilities (14 patterns across prompt injection, data exfiltration, privilege
escalation, and supply chain categories).
When authoring skills:

- Never embed secrets in SKILL.md, references, or scripts
- Set `allowed-tools` to the minimum required — don't claim Bash if Read suffices
- Sanitize user-supplied content before interpolating it into generated prompts or commands
- Pin all dependencies with exact version constraints
- Prefer instruction-only skills over script-bundled ones when possible (scripts are 2.12× more likely to contain vulnerabilities)
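For instance, a read-only review skill can scope `allowed-tools` down to exactly what it uses. The skill name, description, and tool list here are illustrative:

```yaml
---
name: config-reviewer
description: >-
  Reviews configuration files for common mistakes. Invoke whenever
  task involves any interaction with reviewing config files.
allowed-tools: Read, Grep, Glob  # no Bash: the skill only reads files
---
```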
When vetting third-party skills: read all content including scripts, verify behavior in sandbox, check for
adversarial instructions and exfiltration patterns.
Full vulnerability taxonomy and enterprise vetting checklist: see [${CLAUDE_SKILL_DIR}/references/security.md].
## Critical Rules
- **Deletion test before adding.** Every rule competes for attention. Before adding a rule to a skill, verify the model's default behavior is insufficient. If removing the rule doesn't change output quality, it shouldn't exist.
- **References must not duplicate SKILL.md.** References provide genuinely different depth: detailed rubrics, extended examples, full catalogs, comparison tables, edge-case coverage. Not restated rules.
- **Description is activation, not documentation.** Every token in the description must increase activation probability. Slogans, philosophy, cross-skill dependencies, and filler verbs have zero activation value.
- **Declarative by default.** Use numbered steps only for workflows with strict ordering. Bullet-list rules for everything else.
- **One skill, one purpose.** If scope creeps, split. Broad skills produce mediocre results because instructions compete for attention.
- **Check for interference.** New rules can conflict with existing harness directives or other loaded skills. Test in a fresh context; the executing model smooths over contradictions silently (Observer's Paradox).
## Quick Checks

Before deploying:

- [ ] Description leads with function and claims the domain broadly
- [ ] Every rule passes the deletion test
- [ ] SKILL.md is behaviorally self-sufficient without any reference
- [ ] References add depth (rubrics, catalogs, examples), not restated rules
- [ ] Format audit done: KV lists for independent entries, tables only for genuine 2D comparisons
- [ ] All reference paths use `${CLAUDE_SKILL_DIR}`
- [ ] Activation tested in a fresh context
## Related Skills
- **prompt-engineering** — load first for instruction design techniques (skills are prompts)
- **subagent-engineering** — skills and subagents complement each other; skills run inline, subagents run in isolation
- **output-style-engineering** — output styles replace the system prompt; skills extend it
- **claude-code-sdk** — consult for SKILL.md frontmatter fields, plugin layout, and invocation control details