Jeden Skill in Manus ausführen
mit einem Klick

Jeden Skill in Manus mit einem Klick ausführen

skill-scorer

Rates any SKILL.md on a 0-100 scale across 10 dimensions with a SHIP / REWORK / SCRAP verdict. Evaluates trigger precision, instruction clarity, output predictability, edge case coverage, anti-hallucination guardrails, developer experience, composability, open-source readiness, wow factor, and real-world utility. No flattery — calibrated against Anthropic's own skill-creator rubric. Use this skill whenever someone says "score this skill", "rate my skill", "is this skill good", "skill review", "skill audit", "roast my SKILL.md", "grade this", "will this skill work", "evaluate my skill", "how good is this", or pastes a SKILL.md and asks for feedback. Also trigger when comparing two skills, benchmarking a skill collection, or asking "what's wrong with this skill" — even without the word "score".

In Manus ausführen

Sterne64

Forks0

Aktualisiert26. Mai 2026 um 16:23

Quelle

mturac

mturac/simulacra

GitHub-Repository öffnen Creator-Repositorys ansehen

Installationsbefehl

Download

In Manus ausführen

Nützlich fürSOC

ProjektmanagementspezialistenWirtschafts- und Finanzberufe13-1082L4

SKILL.md

readonly

Mehr aus diesem Repository

gleiches Repository

simulacra

mturac/simulacra

Interactive code simulation engine that spawns ghost personas to review your code before it ships. Each ghost walks through your code line-by-line from a different failure perspective: The Adversary (race conditions, double-submit, idempotency), The Enterprise Edge (business logic gaps, multi-currency, B2B tax exemptions, locale), Chaos Monkey (timeouts, partial failures, network partitions, OOM). Produces a chaos timeline, structured bug report with severity ratings, and three resolution tiers (quick fix, robust architecture, generated test suite). Use this skill whenever the user mentions code review, PR review, "find bugs", "what could go wrong", "will this break", "is this safe", "stress test", "edge cases", design review, pre-merge check, dry run, chaos test, or wants a second opinion on any code — even if they don't explicitly ask for "ghost personas" or "simulation". Also trigger when someone pastes a code snippet with any question about its correctness or safety.

2026-05-2664

name

skill-scorer

description

Skill Scorer — 0-100 Rating for Any SKILL.md

Calibrated against Anthropic's own skill-creator guidelines and the patterns of 50k+ star repos (Superpowers, Caveman, mattpocock/skills). Average skill scores 45-55. If yours scores 80+, ship it.

Dependencies

None. Standalone. Reads any SKILL.md content.

CRITICAL: Auto-start

SKILL.md content in the message or just created in conversation → skip to Step 2. No preamble.

Step 1. Get the skill

Content in context? Use it. Otherwise:

Paste your SKILL.md or tell me which skill to score.

If the user names a skill in the current environment, read it.

Step 2. Score 10 Dimensions (each 0-10)

Be precise. 6.5 is 6.5, not 7. Every score needs one sentence of evidence referencing the actual skill content.

D1: Trigger Precision (0-10)

Will it fire when it should? Will it stay quiet when it shouldn't?

Evaluate the YAML description field:

Specific trigger phrases listed? (+2)
Natural language variations covered? (+2)
"Pushy" quality per Anthropic guidance? (+2)
Edge-case triggers (bare paste, implicit intent)? (+2)
False positive risk controlled? (+2)

Range	Calibration
0-3	Vague description, undertriggers or fires on everything
4-6	Some triggers but gaps in natural phrasing
7-8	Solid coverage, most natural phrasings caught
9-10	Comprehensive — includes implicit triggers and edge cases

D2: Instruction Clarity (0-10)

Can Claude follow this without improvising?

Unambiguous step sequence? (+2)
Decision points with explicit branches? (+2)
Deterministic enough that two instances produce similar output? (+3)
No contradictory instructions? (+1.5)
Clear scope boundaries (what it does AND doesn't do)? (+1.5)

D3: Output Predictability (0-10)

Does the user know what they'll get?

Output format explicitly defined? (+3)
Templates or examples provided? (+3)
Length/scope appropriate and specified? (+2)
Structured data output (JSON, table) available? (+2)

D4: Edge Case Coverage (0-10)

What happens with weird input?

Empty/missing input handled? (+2)
Extremely long input addressed? (+2)
Invalid/unexpected input types? (+2)
Multi-language or mixed input? (+2)
Graceful degradation path? (+2)

D5: Anti-Hallucination Guardrails (0-10)

Does it prevent Claude from fabricating?

Explicit "don't fabricate" rules? (+2.5)
"If unsure, say so" instruction? (+2.5)
Failure modes section (what skill does NOT do)? (+2.5)
Separation of known vs. needs-verification? (+2.5)

D6: Developer Experience (0-10)

Is it pleasant to use?

Engaging interaction style? (+2.5)
Personality without being annoying? (+2.5)
Appropriate for target audience? (+2.5)
Memorable enough to recommend? (+2.5)

D7: Composability (0-10)

Does it play well with other tools?

Output format consumable by other skills/tools? (+3)
Explicit handoff points? (+3)
Modular (not monolithic)? (+2)
Avoids duplicating existing skills? (+2)

D8: Open-Source Readiness (0-10)

Would a stranger install this from a README?

Self-contained (no private dependencies)? (+2)
Documented well enough for strangers? (+2)
Solves a real, recognized problem? (+2)
Clear naming and discoverability? (+2)
Cross-agent compatibility mentioned? (+2)

D9: Wow Factor (0-10)

Does it make people go "oh damn"?

Novel approach or angle? (+3)
Demo-able in 30 seconds? (+3)
Would someone tweet/share this? (+2)
Solves a problem in a way no one else has? (+2)

D10: Real-World Utility (0-10)

Would someone use this weekly?

Solves a recurring problem? (+3)
Problem is painful enough people seek solutions? (+3)
Faster/better than manual alternative? (+2)
Would someone miss it if removed? (+2)

Step 3. Calculate Composite Score

Formula

RAW = (D1 × 1.2) + (D2 × 1.2) + (D3 × 1.0) + (D4 × 0.8) +
      (D5 × 1.0) + (D6 × 1.0) + (D7 × 0.6) + (D8 × 1.0) +
      (D9 × 1.2) + (D10 × 1.0)

SCORE = RAW  // max possible = 100

Weight rationale: Trigger precision (D1) and wow factor (D9) weighted highest — a skill that never fires or that nobody cares about is dead on arrival. Composability (D7) weighted lowest — standalone value matters more for v1.

Verdict

Score	Verdict	Meaning
80-100	🟢 SHIP IT	Publish. Polish the README and go.
60-79	🟡 REWORK	Good bones. Fix the weak dimensions first.
40-59	🟠 MAJOR REWORK	Concept may be sound. Execution needs rewrite.
0-39	🔴 SCRAP	Start over or reconsider if this should be a skill.

Calibration Anchors

Anthropic's docx skill = ~72 (solid execution, low wow factor)
Superpowers = ~88 (methodology is the product, cross-agent, massive adoption)
Caveman = ~85 (extreme wow, simple concept, measurable benchmark)
A typical first-draft skill = ~35-45

These are reference points, not dogma. Score based on the rubric, not by anchoring to these numbers.

Step 4. Output Format

## 🎯 SKILL SCORE: [Name]

| Dim | Dimension | Score | Evidence |
|-----|-----------|-------|----------|
| D1 | Trigger Precision | X/10 | [one line referencing the skill] |
| D2 | Instruction Clarity | X/10 | ... |
| D3 | Output Predictability | X/10 | ... |
| D4 | Edge Case Coverage | X/10 | ... |
| D5 | Anti-Hallucination | X/10 | ... |
| D6 | Developer Experience | X/10 | ... |
| D7 | Composability | X/10 | ... |
| D8 | Open-Source Readiness | X/10 | ... |
| D9 | Wow Factor | X/10 | ... |
| D10 | Real-World Utility | X/10 | ... |

## COMPOSITE: XX/100 — [VERDICT EMOJI] [VERDICT]

### 🔥 Top 3 Strengths
1. [Specific, with evidence]
2. ...
3. ...

### 💀 Top 3 Weaknesses
1. [Specific, with actionable fix]
2. ...
3. ...

### 🛠️ Priority Fixes
1. [Highest-impact change — specific enough to implement now]
2. [Second]
3. [Third]

Structured Output

Always append after the narrative:

{
  "skill_score": {
    "version": "2.0",
    "skill_name": "...",
    "dimensions": {
      "trigger_precision": 0.0,
      "instruction_clarity": 0.0,
      "output_predictability": 0.0,
      "edge_case_coverage": 0.0,
      "anti_hallucination": 0.0,
      "developer_experience": 0.0,
      "composability": 0.0,
      "open_source_readiness": 0.0,
      "wow_factor": 0.0,
      "real_world_utility": 0.0
    },
    "composite": 0,
    "verdict": "SHIP|REWORK|MAJOR_REWORK|SCRAP",
    "top_strengths": ["...", "...", "..."],
    "top_weaknesses": ["...", "...", "..."],
    "priority_fixes": ["...", "...", "..."]
  }
}

Step 5. Interactive Options

What next?
A) Rewrite the weakest dimension
B) Optimize the trigger description (rewrite YAML)
C) Generate a README.md from this skill
D) Compare against another skill (side-by-side)
E) Score another skill
F) Run the Anthropic compliance check (progressive disclosure, token budget, etc.)

Step 6. Anthropic Compliance Check (option F)

When requested, run this secondary checklist against Anthropic's skill-creator guidelines:

Check	Pass/Fail	Notes
YAML frontmatter has `name` + `description`
Description ≤ 1024 chars
SKILL.md body ≤ 500 lines (~5000 tokens)
Progressive disclosure (3 tiers used correctly)
No hardcoded API keys or secrets
References files have guidance on when to load
Cross-agent install commands documented
Failure modes section present
Examples or demo included

Operational Rules

No flattery. "Great start!" is banned. Be specific or be quiet.
No score inflation. Average = 45-55 out of 100. Not 70.
Evidence required. Every score references the actual skill text.
Fixes must be actionable. "Improve the description" → useless. "Add these 5 trigger phrases: [list]" → useful.
Self-aware. This skill can score itself. Be honest when it does.
No fabricated weaknesses. If something is good, say so.
Match user's language.
Structured JSON output is mandatory.

What Skill Scorer Does NOT Do

Guarantee GitHub stars (that's marketing + timing + luck)
Test the skill at runtime (scores the document, not execution)
Evaluate the underlying AI model capability
Rate artifact/UI aesthetic quality

Compatibility

Works with: Claude Code, Claude.ai, Codex CLI, Cursor, Gemini CLI, OpenCode, GitHub Copilot, Factory Droid, Windsurf, Roo, Goose, Kilo.

Install:

/plugin marketplace add <owner>/simulacra
/plugin install skill-scorer@simulacra