skill-creator

Create, edit, audit, and evaluate golem-powers skills. Use for new skills, structural skill edits, workflow/adapter changes, pre-deploy validation, skill evals, benchmarks, live A/B tests, session JSONL mining, batch miners, and handoff digests. Triggers: create skill, edit skill, audit skill, validate skill, skill eval, live eval, mine session. NOT for invoking existing skills or convergence weaving.

Install command

npx skills add https://github.com/EtanHey/golems --skill skill-creator

Copy and paste this command into Claude Code to install the skill

Source

EtanHey/golems

Stars: 3

Forks: 2

Updated: June 3, 2026 at 21:38

Skill Creator

Take skills from "drafted" to "proven" through rigorous multi-layer evaluation.

Installation

bash ~/Gits/golems/skills/golem-powers/skill-creator/scripts/install.sh

The installer is idempotent — safe to re-run. It:

Symlinks the skill into ~/.claude/commands/skill-creator (golem-powers convention)
Ensures ~/Gits/skill-creator/.claude/agents/ and scripts/ exist as the project-scope home
Symlinks the session-miner agent definition into the project-scope agents dir (gates the sub-agent so only skillCreator-family sessions can spawn it)
Symlinks the parser into the project-scope scripts dir

Run bash scripts/install.sh --dry-run to preview the changes without applying them, or bash scripts/install.sh --help for usage.
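To illustrate what "idempotent" means for the symlink steps above, here is a minimal Python sketch. It is not the actual install.sh (which is a bash script whose contents are not shown here); the function name ensure_symlink and its return values are hypothetical.

```python
import os
from pathlib import Path


def ensure_symlink(source: Path, link: Path, dry_run: bool = False) -> str:
    """Point `link` at `source` and report what was (or would be) done.

    Hypothetical helper: a sketch of an idempotent link step, not the
    real installer logic.
    """
    # Already correct: a re-run changes nothing.
    if link.is_symlink() and os.readlink(link) == str(source):
        return "unchanged"
    # Refuse to clobber a real file or directory.
    if link.exists() and not link.is_symlink():
        raise FileExistsError(f"{link} exists and is not a symlink")
    # Preview mode: report the action but touch nothing.
    if dry_run:
        return "would-link"
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # replace a stale or dangling link
    link.symlink_to(source)
    return "linked"
```

Calling the function twice with the same arguments returns "linked" the first time and "unchanged" the second, which is the property that makes a re-run safe.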

Core Loop: DRAFT → EVAL → RED → GREEN → SMOKE → SHIP

Every skill change goes through this pipeline. No exceptions.

1. DRAFT — Understand the Skill

Read the SKILL.md or MCP tool definition completely
brain_search for: prior evals, user complaints, known issues, related skills
Identify: what does this skill claim to do? What's the baseline without it?
Document intent, inputs, outputs, edge cases
Classify: capability uplift (may become obsolete) vs encoded preference (durable)

2. EVAL DESIGN — Create Structured Test Cases

Design eval cases as structured JSON in evals/evals.json:

{
  "id": 1,
  "name": "descriptive-kebab-name",
  "category": "compliance|structure|quality",
  "description": "What this eval tests",
  "prompt": "The input scenario given to the agent",
  "assertions": [
    {
      "type": "tool_usage|content|negative",
      "name": "assertion-name",
      "description": "What correct behavior looks like"
    }
  ]
}

Cover: happy path, edge cases, failure modes, interaction with other skills. See references/scoring-rubric.md for scoring methodology.
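A quick structural check can catch malformed cases before any eval is run. The sketch below assumes the pipe-separated values in the template are the allowed alternatives for category and type, and that every case needs at least one assertion; the validator and the example case are hypothetical and not part of the skill.

```python
import json

# Assumed to be the allowed values, read from the template's "a|b|c" placeholders.
ALLOWED_CATEGORIES = {"compliance", "structure", "quality"}
ALLOWED_ASSERTION_TYPES = {"tool_usage", "content", "negative"}
REQUIRED_FIELDS = ("id", "name", "category", "description", "prompt", "assertions")


def validate_eval_case(case: dict) -> list[str]:
    """Return a list of problems; an empty list means the case looks well-formed."""
    problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in case]
    if case.get("category") not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {case.get('category')!r}")
    assertions = case.get("assertions") or []
    if not assertions:
        problems.append("at least one assertion is required")
    for assertion in assertions:
        if assertion.get("type") not in ALLOWED_ASSERTION_TYPES:
            problems.append(f"unknown assertion type: {assertion.get('type')!r}")
    return problems


# Hypothetical eval case, for illustration only.
example = json.loads("""
{
  "id": 1,
  "name": "reads-skill-before-editing",
  "category": "compliance",
  "description": "Agent reads SKILL.md before proposing edits",
  "prompt": "Edit an existing skill to add a new trigger.",
  "assertions": [
    {
      "type": "tool_usage",
      "name": "reads-skill-file",
      "description": "Reads SKILL.md before writing any change"
    }
  ]
}
""")

print(validate_eval_case(example))  # prints []
```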

3. RED — Baseline Measurement (without skill)

Run each eval WITHOUT the skill loaded
Score baseline performance across all eval cases
Document baseline scores
Gate: If baseline >70%, the skill may not be adding value — flag it

4. GREEN — With-Skill Measurement

Run each eval WITH the skill loaded
Score delta between with_skill and without_skill
Gate: Delta <10% = marginal, consider retiring. Delta >30% = clearly valuable.
Maximum 3 iterations before flagging for human review
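The iteration cap can be sketched as a bounded loop. This assumes scores are percentages from 0 to 100, that the delta is with_skill minus without_skill in percentage points, and that iteration stops once the delta clears the 30% gate; the stopping condition and all names here are assumptions, since the text only fixes the limit of 3.

```python
from typing import Callable

MAX_ITERATIONS = 3  # from the GREEN step: flag for human review after 3 tries


def iterate_green(
    run_with_skill: Callable[[int], float],
    baseline: float,
    target_delta: float = 30.0,
) -> dict:
    """Re-measure the with-skill score up to MAX_ITERATIONS times.

    `run_with_skill(attempt)` is a hypothetical callback that revises the
    skill, re-runs the evals, and returns the with_skill score.
    """
    delta = 0.0
    for attempt in range(1, MAX_ITERATIONS + 1):
        delta = run_with_skill(attempt) - baseline
        if delta > target_delta:
            return {"status": "valuable", "attempts": attempt, "delta": delta}
    # The cap was reached without clearing the gate: hand off to a human.
    return {"status": "needs-human-review", "attempts": MAX_ITERATIONS, "delta": delta}
```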

5. SMOKE — Live Agent Testing

Two tiers based on skill importance:

Tier A: Static Smoke (all skills)

Review evals.json assertions manually
Verify no contradictions between skill instructions and assertions
Check that strongest discriminators are tested

Tier B: Live Agent Smoke (flagship skills only)

Run live-eval-runner.sh for top 3 discriminator evals
Agent routing: Claude skills → Sonnet, code skills → Codex, audit skills → Cursor
Compare captured output against assertions
Gate: Live delta must be within 15% of static delta
Results are stored in evals/results/live-{date}.json and saved with brain_store

See workflows/live-eval.md for the full live eval workflow.
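The Tier B gate and results path can be expressed in a few lines. This sketch reads "within 15%" as an absolute difference in percentage points and assumes {date} is an ISO date (YYYY-MM-DD); neither is confirmed by the text above, and the function names are hypothetical.

```python
from datetime import date
from pathlib import Path

LIVE_TOLERANCE = 15.0  # assumed to mean percentage points, not a relative ratio


def live_gate_passes(static_delta: float, live_delta: float,
                     tolerance: float = LIVE_TOLERANCE) -> bool:
    """True when the live delta is within `tolerance` points of the static delta."""
    return abs(live_delta - static_delta) <= tolerance


def live_results_path(run_date: date, root: Path = Path("evals/results")) -> Path:
    """Build the results file path; the ISO date format is an assumption."""
    return root / f"live-{run_date.isoformat()}.json"
```

For example, a static delta of 35 and a live delta of 22 differ by 13 points and pass, while a live delta of 15 differs by 20 points and fails.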

6. SHIP — Package and Document

Ensure every skill ships with its evals (no eval = no ship)
brain_store eval results, delta scores, issues found
Update skill metadata (version, last-eval-date, compliance-score)
Follow workflows/create-skill.md and workflows/audit-skill.md for SKILL.md structure compliance before shipping structural changes.
REGISTER a NEW skill: committed ≠ installed. A new golem-powers skill is invisible to every agent until it is symlinked into both ~/.claude/skills/<name> and ~/.claude/commands/<name>. Either run golem-install, which auto-discovers and links every golem-powers/*/ dir, or symlink the one new skill by hand. Then VERIFY that it appears in the available-skills list; committing or merging does NOT register it. (2026-05-30: /weave was committed and merged but unusable by any agent until the symlinks existed. See workflows/create-skill.md "Final Step: REGISTER".)
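The symlink half of the registration check can be automated. The sketch below only inspects the two link locations named above; the function names are hypothetical, and it does not replace confirming that the skill shows up in the agent's available-skills list.

```python
from pathlib import Path


def registration_status(name: str, home: Path) -> dict[str, bool]:
    """Report whether each expected symlink for skill `name` exists and resolves."""
    status = {}
    for kind in ("skills", "commands"):
        link = home / ".claude" / kind / name
        # is_symlink() is true even for a dangling link, so also require exists().
        status[kind] = link.is_symlink() and link.exists()
    return status


def is_registered(name: str, home: Path = Path.home()) -> bool:
    """True only when both symlinks are present and point at something real."""
    return all(registration_status(name, home).values())
```

A result such as {"skills": True, "commands": False} pinpoints which link is missing, which is more useful than a bare pass or fail when diagnosing a skill that was merged but never linked.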

Eval Methodology

with_skill vs without_skill comparison is MANDATORY. No exceptions.

Behavioral Compliance Scoring

Weight	Dimension	What it measures
70%	Compliance	Does the agent follow the skill's instructions?
20%	Structure	Does the output match expected format?
10%	Quality	Is the output actually good/useful?
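As a worked example of the weighting, the sketch below combines three per-dimension scores into one number. It assumes each dimension is scored from 0 to 100; the function name is hypothetical, and references/scoring-rubric.md remains the authority on how each dimension is actually scored.

```python
# Weights from the table above, as whole percentages.
WEIGHTS = {"compliance": 70, "structure": 20, "quality": 10}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (assumed 0-100) into one weighted score."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS) / 100


# 0.70 * 80 + 0.20 * 60 + 0.10 * 50 = 56 + 12 + 5
print(weighted_score({"compliance": 80, "structure": 60, "quality": 50}))  # prints 73.0
```

Because compliance carries 70% of the weight, a well-formatted, high-quality answer that ignores the skill's instructions can score at most 30.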

Scoring Thresholds

Condition	Action
Baseline >70%	Skill may not add value — flag and explain
Delta <10%	Marginal — consider if complexity is worth it
Delta >30%	Clearly valuable — ship with confidence
Compliance <50% with skill	Instructions unclear — rewrite before shipping
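The threshold table maps directly onto a small decision function. This sketch assumes all scores are percentages and that the delta is with_skill minus baseline in percentage points; the table does not say what to do for deltas between 10% and 30%, so the function returns no delta action in that range. All names and action strings are hypothetical.

```python
def threshold_actions(baseline: float, with_skill: float,
                      compliance_with_skill: float) -> list[str]:
    """Return the actions triggered by the scoring thresholds."""
    delta = with_skill - baseline
    actions = []
    if baseline > 70:
        actions.append("flag: skill may not add value")
    if compliance_with_skill < 50:
        actions.append("rewrite: instructions unclear")
    if delta < 10:
        actions.append("marginal: weigh complexity against benefit")
    elif delta > 30:
        actions.append("ship: clearly valuable")
    # Deltas from 10 to 30 are not covered by the table, so nothing is added.
    return actions
```

Several conditions can fire at once: a baseline of 75 with a with-skill score of 80 both flags the skill as possibly redundant and marks the delta as marginal.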

Agent Routing for Live Evals

Skill type	Test with	Why
Claude behavior skills	Sonnet (default)	Tests actual Claude compliance
Code implementation skills	Codex (default, no model flag)	Tests code quality
Audit/review skills	Cursor (default)	Tests review thoroughness

Workflows

Workflow	When to use
create-skill.md	Creating a new skill from scratch
audit-skill.md	Auditing existing skill structure, scripts, workflows, and registration before deploy
live-eval.md	Running live A/B tests with real agents
ab-compare.md	Comparing skill versions or platforms
mine-session.md	Mining a Claude Code session JSONL into a 10-section markdown digest (handoff docs, EOD waves, claim verification)

References

Reference	Content
scoring-rubric.md	Full scoring methodology and rubric
subagent-vs-skill.md	Classification reference — READ BEFORE SCAFFOLDING any new capability. Distinguishes skill (slash-triggered, parent-context) from sub-agent (name-spawned, isolated-context). Documents the misroute pattern (GitHub openai/codex#18823) that even Codex itself routinely makes.

Integration with Other Skills

/cmux-agents — spawning live eval agents in cmux panes
/never-fabricate — Read() every file before reporting results
/pr-loop — shipping skill changes through the full PR lifecycle

Packaged Sub-Agents (skill-creator family only)

These sub-agents ship in BOTH Claude and Codex formats. They are invokable only from sessions with cwd inside ~/Gits/skill-creator/ (i.e., skillCreatorClaude / skillCreatorCodex / skillCreatorRepoGolem). Sessions in other repos cannot spawn them directly — they must dispatch a skillCreator first.

Canonical source-of-truth lives in this skill's agents/ dir. scripts/install.sh symlinks them into the project repo's project-scope dirs (.claude/agents/ for Claude, .codex/agents/ for Codex). The dual-format packaging means the same sub-agent is available regardless of whether the parent is Claude Code or Codex CLI.

Sub-agent	Claude format	Codex format	Workflow
`session-miner`	`agents/session-miner.md` (model: inherit)	`agents/session-miner.toml` (model: gpt-5.3-codex-spark)	mine-session.md

Both formats invoke the same deterministic parser at scripts/session-miner.py. The Claude format uses subagent_type="session-miner" via the Agent tool; the Codex format is referenced by name in natural-language prompts (session_miner, mine X to Y).

For when to choose sub-agent vs skill: see references/subagent-vs-skill.md.

Critical Rules

Never claim "tests pass" without running them. Read actual output.
Every skill ships with evals. No eval = no ship.
brain_search BEFORE starting work — someone may have already evaluated it.
brain_store AFTER completing work — results, delta scores, decisions.
Max 3 iterations per skill before flagging for human review.
Read() every file you cite. No citing from compacted summaries.
Sequential skill compounding — build skills one at a time, each learning from the last.