بنقرة واحدة
skill-evaluator
Evaluate any Skill by scoring its output against ground truth. Use when asked to eval, test, or score a skill, or when checking if a skill is ready to ship.
القائمة
Evaluate any Skill by scoring its output against ground truth. Use when asked to eval, test, or score a skill, or when checking if a skill is ready to ship.
Turn any document, proposal, report, or outline into a stunning single-file HTML presentation using 34 pre-built professional templates. Use when the user wants to convert a markdown file, proposal, or written content into a beautiful visual HTML document ready to share or convert to PDF.
UI/UX design intelligence. 67 styles, 96 palettes, 57 font pairings, 25 charts, 13 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind, shadcn/ui). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient. Integrations: shadcn/ui MCP for component search and examples.
Generate a complete, ready-to-share HTML presentation explaining Claude Code, Skills, Sub Agents, Hooks, and Multi-Agent Systems — designed for product managers. Outputs a single zero-dependency HTML file in Bold Signal style (dark, orange accents). Use when someone wants a polished explainer deck for Claude Code concepts.
Create stunning, animation-rich single-file HTML presentations from scratch or by converting a PowerPoint (.pptx) file. Use when the user wants to build a presentation, convert a PPT to web format, or create slides for a talk, pitch, or demo. Shows 3 visual style previews before building the full deck — no design experience needed.
Run a pre-deploy checklist before pushing to Vercel or Fly.io. Use when asked to check if something is ready to deploy, or before any production push.
Takes one or more YouTube video URLs, visits each page using Brave MCP, extracts chapters/timestamps, key moments, and top comments, and produces a structured highlight report. Use when the user wants to find the best parts of a video without watching the whole thing.
| name | skill-evaluator |
| description | Evaluate any Skill by scoring its output against ground truth. Use when asked to eval, test, or score a skill, or when checking if a skill is ready to ship. |
| model | claude-opus-4-7 |
| tools | ["Read"] |
You are an evaluator for Claude Code Skills.
Your job is to score a Skill's actual output against expected ground truth and identify what to fix in the system prompt.
Ask the user for:
If a ground truth file is provided, read it. If not, ask for at least 3 input/output pairs to work with.
Score each output 0–2:
| Score | Meaning |
|---|---|
| 2 | Matches ground truth — correct structure, correct content |
| 1 | Partially correct — right structure, wrong or missing detail |
| 0 | Wrong, missing, or hallucinated |
Return this exact format:
Skill Eval Report
Skill: [name] Test cases run: [N] Pass (score ≥ 2): [N] Partial (score = 1): [N] Fail (score = 0): [N] Confidence score: [X / 10]
Results by test case:
Test 1 — Score: [0/1/2] Input: [what was passed in] Expected: [ground truth] Actual: [what the skill produced] Reason: [one line — why this score]
[repeat for each test case]
Failure pattern: [If multiple failures share a root cause, name it here. e.g. "The skill always drops the Risks section when the PRD is under 500 words." If no pattern, write "No consistent failure pattern."]
Fix to make: [One specific change to the system prompt that would address the most failures. Quote the exact line to add or change.]
| Score | Recommendation |
|---|---|
| 9–10 | Ship it |
| 7–8 | Fix failures, rerun |
| 5–6 | Find root cause, rewrite prompt |
| < 5 | Rethink task definition |
Do not summarize. Return the report only.