skill-forge-review

Name: Skill Forge Review
Author: AgriciDaniel

Audit and validate existing Claude Code skills for quality, triggering accuracy, structure compliance, and best practices. Scores skills on a 0-100 scale and provides prioritized improvement recommendations. Use when user says "review skill", "audit skill", "check skill", "validate skill", or "skill quality".


stars: 58

forks: 28

updated: March 6, 2026 at 16:13

SKILL.md


name: skill-forge-review
description: Audit and validate existing Claude Code skills for quality, triggering accuracy, structure compliance, and best practices. Scores skills on a 0-100 scale and provides prioritized improvement recommendations. Use when user says "review skill", "audit skill", "check skill", "validate skill", or "skill quality".

Skill Review & Validation

Process

Step 1: Locate Skill Files

Accept input as:

- Path to a skill directory
- Skill name (search in `~/.claude/skills/`)
- URL to a GitHub repository

Read all `.md` files, scripts, and asset files.
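A minimal sketch of how the three accepted input forms could be told apart. The function name `classify_skill_input` and its return shape are illustrative assumptions, not part of the skill; only the `~/.claude/skills/` search location comes from the list above.

```python
from pathlib import Path
from urllib.parse import urlparse

# Search location named in Step 1; pass another root when testing.
DEFAULT_SKILLS_ROOT = Path.home() / ".claude" / "skills"


def classify_skill_input(value: str, skills_root: Path = DEFAULT_SKILLS_ROOT) -> tuple[str, str]:
    """Return (kind, location), where kind is 'url', 'path', or 'name'."""
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc == "github.com":
        return "url", value
    candidate = Path(value).expanduser()
    if candidate.is_dir():
        return "path", str(candidate)
    # Otherwise treat the value as a skill name under the skills root.
    return "name", str(skills_root / value)
```

Checking for a URL first keeps a GitHub link from being mistaken for a relative path; the name lookup is the fallback because it cannot be verified without touching the skills root.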

Step 2: Structure Validation

Run `python scripts/validate_skill.py <path>` for programmatic checks.

Manual verification:

- `SKILL.md` exists (exact case)
- No `README.md` inside skill folder
- Folder name matches `name` field
- Valid kebab-case naming (1-64 chars)
- No "claude" or "anthropic" in name
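The manual checks above can be sketched as a small function. This is an illustration only and is not the contents of `scripts/validate_skill.py`; the name `check_structure`, the regular expression, and the message wording are assumptions.

```python
import re
from pathlib import Path

# Lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
RESERVED_WORDS = ("claude", "anthropic")


def check_structure(skill_dir: Path, declared_name: str) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    # Compare against the directory listing so the SKILL.md check stays
    # case-sensitive even on case-insensitive filesystems.
    entries = {p.name for p in skill_dir.iterdir()}
    if "SKILL.md" not in entries:
        problems.append("SKILL.md missing (exact case required)")
    if "README.md" in entries:
        problems.append("README.md must not be inside the skill folder")
    folder = skill_dir.resolve().name
    if folder != declared_name:
        problems.append(f"folder name {folder!r} does not match name {declared_name!r}")
    if not (1 <= len(declared_name) <= 64) or not KEBAB_CASE.match(declared_name):
        problems.append("name must be kebab-case, 1-64 characters")
    if any(word in declared_name.lower() for word in RESERVED_WORDS):
        problems.append('name must not contain "claude" or "anthropic"')
    return problems
```

Returning a list of problems rather than a single boolean lets the review report quote each failure directly.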

Step 3: Frontmatter Audit

| Check | Pass Criteria |
|-------|---------------|
| Name format | kebab-case, 1-64 chars, no leading/trailing hyphens |
| Description present | Non-empty, 1-1024 characters |
| Description has WHAT | Explains capabilities |
| Description has WHEN | Includes trigger phrases |
| Description has keywords | Domain-specific terms included |
| No XML tags | No `<` or `>` characters |
| Optional fields valid | `license`, `compatibility` (<500 chars), `metadata` |
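The mechanical rows of this audit (length, XML characters, presence of trigger phrases) could be checked along the following lines. The function `audit_description` and its heuristic for the WHEN check are assumptions made for illustration; the WHAT and keyword rows still need a human or model judgment.

```python
import re


def audit_description(description: str) -> dict[str, bool]:
    """Run the mechanical description checks from the frontmatter audit."""
    text = description.strip()
    return {
        # Non-empty and within the 1-1024 character limit.
        "present": 1 <= len(text) <= 1024,
        # No angle brackets anywhere in the description.
        "no_xml_tags": "<" not in text and ">" not in text,
        # Heuristic only: a "Use when" clause or at least one quoted phrase.
        "has_when": bool(re.search(r"\buse when\b", text, re.IGNORECASE))
        or bool(re.search(r'"[^"]+"', text)),
    }
```

For example, `audit_description('Audit skills. Use when user says "review skill".')` passes all three checks, while an empty string fails `present`.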

Step 4: Triggering Analysis

Assess the description for activation quality:

Under-triggering risks:

- Too generic ("Helps with projects")
- Missing common paraphrases
- No domain keywords
- Missing file type mentions (if relevant)

Over-triggering risks:

- Too broad ("Processes documents")
- Overlaps with built-in Claude capabilities
- Missing negative triggers for disambiguation

Generate test queries:

- 5 queries that SHOULD trigger the skill
- 5 queries that SHOULD NOT trigger
- 3 edge cases (ambiguous queries)
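Once the test queries have been run, under- and over-triggering can be summarized as two rates. The sketch below assumes each result is recorded as a `(should_trigger, did_trigger)` pair; that recording format and the name `trigger_rates` are hypothetical.

```python
def trigger_rates(results: list[tuple[bool, bool]]) -> dict[str, float]:
    """Summarize (should_trigger, did_trigger) pairs, one pair per test query."""
    positives = [did for should, did in results if should]
    negatives = [did for should, did in results if not should]
    return {
        # Share of should-trigger queries that were missed (under-triggering).
        "miss_rate": positives.count(False) / len(positives) if positives else 0.0,
        # Share of should-not-trigger queries that fired anyway (over-triggering).
        "false_trigger_rate": negatives.count(True) / len(negatives) if negatives else 0.0,
    }
```

Edge cases are best left out of both rates, since by definition they have no single expected outcome.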

Step 5: Instruction Quality

| Criterion | Question (score each 0-10) |
|-----------|----------------------------|
| Specificity | Are instructions actionable? (not "validate properly") |
| Completeness | All workflows covered? |
| Error handling | Common failures addressed? |
| Examples | Concrete examples provided? |
| Progressive disclosure | Detailed docs in `references/`, not `SKILL.md`? |
| Length | Under 500 lines / 5000 tokens? |
| Cross-references | Clear links to references/scripts? |

Step 6: Architecture Review (Multi-skill)

For skills with sub-skills:

- Main skill has clear routing table
- Sub-skills have focused responsibilities
- Cross-references are valid (files exist)
- Naming follows parent-child convention
- Shared references in parent, not duplicated
- Agents have clear roles (if Tier 4)
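The "files exist" check for cross-references can be partly automated. The sketch below assumes cross-references are written as Markdown links in `SKILL.md`; references written as plain paths or inline code would need a different pattern. The name `find_broken_references` is hypothetical.

```python
import re
from pathlib import Path

# Captures the target of a Markdown link such as [guide](references/guide.md).
LINK_TARGET = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_references(skill_dir: Path) -> list[str]:
    """Return relative link targets in SKILL.md that do not exist on disk."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    broken = []
    for target in LINK_TARGET.findall(text):
        if "://" in target or target.startswith("#"):
            continue  # skip external URLs and in-page anchors
        path = target.split("#", 1)[0]  # drop any trailing #fragment
        if not (skill_dir / path).exists():
            broken.append(target)
    return broken
```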

Step 7: Script Quality (if present)

- Docstrings with purpose, input, output
- CLI interface (argparse or similar)
- Structured output (JSON)
- Error handling (try/except with clear messages)
- No hardcoded paths or secrets
- Minimal dependencies
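As a point of comparison when reviewing, here is a small script that meets each item on this checklist. The line-counting task is an arbitrary stand-in chosen for illustration; it is not one of the skill's own scripts.

```python
"""Count the lines in a text file.

Input:  path to a text file (positional CLI argument).
Output: one JSON object on stdout, e.g. {"path": "notes.md", "lines": 12}.
"""
import argparse
import json


def count_lines(path: str) -> int:
    """Return the number of lines in the file at `path`."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def main(argv=None) -> int:
    """Parse arguments, print a JSON result, and return an exit code."""
    parser = argparse.ArgumentParser(description="Count the lines in a file.")
    parser.add_argument("path", help="file to inspect")
    args = parser.parse_args(argv)
    try:
        result = {"path": args.path, "lines": count_lines(args.path)}
    except OSError as exc:
        # Report failures as JSON too, so callers can parse either outcome.
        print(json.dumps({"error": f"cannot read {args.path}: {exc.strerror}"}))
        return 1
    print(json.dumps(result))
    return 0

# A real script would end with:
#     if __name__ == "__main__":
#         raise SystemExit(main())
```

Only the standard library is used, the path comes from the command line rather than being hardcoded, and both success and failure produce parseable JSON.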

Step 8: Generate Skill Health Score

Scoring methodology (0-100):

| Category | Weight | Checks |
|----------|--------|--------|
| Frontmatter Quality | 25% | Name, description, format |
| Trigger Accuracy | 20% | WHAT + WHEN + keywords |
| Instruction Quality | 25% | Specificity, completeness, examples |
| Structure Compliance | 15% | File naming, organization, references |
| Script Quality | 10% | If applicable (full marks if no scripts needed) |
| Progressive Disclosure | 5% | Proper use of 3-level system |
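The weighting above amounts to a weighted sum. A minimal sketch, assuming each category has already been judged as a fraction between 0.0 and 1.0 (how that fraction is derived is left to the reviewer; the dictionary keys and the name `health_score` are illustrative):

```python
# Category weights from the scoring table; they sum to 100.
WEIGHTS = {
    "frontmatter": 25,
    "triggering": 20,
    "instructions": 25,
    "structure": 15,
    "scripts": 10,
    "disclosure": 5,
}


def health_score(fractions: dict[str, float]) -> float:
    """Combine per-category fractions (0.0-1.0) into a 0-100 score."""
    total = 0.0
    for category, weight in WEIGHTS.items():
        # A skill that needs no scripts gets full marks for that category;
        # any other missing category counts as zero.
        default = 1.0 if category == "scripts" else 0.0
        fraction = fractions.get(category, default)
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"{category} fraction out of range: {fraction}")
        total += weight * fraction
    return round(total, 1)
```

For example, fractions of 0.8 (frontmatter), 0.5 (triggering), 0.6 (instructions), 1.0 (structure), and 1.0 (disclosure) with no scripts give 20 + 10 + 15 + 15 + 10 + 5 = 75.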

Step 9: Generate Trigger Eval Set

After reviewing, generate a structured trigger eval set for ongoing testing:

1. Run `python scripts/generate_eval_set.py <path>` to auto-generate a starter set
2. Review and refine the generated queries:
   - Ensure 8-10 should-trigger queries cover different phrasings and edge cases
   - Ensure 8-10 should-not-trigger queries are near-misses (not obviously irrelevant)
   - Include casual speech, typos, and uncommon domain uses in should-trigger set
3. Save the eval set to `evals/evals.json` in the skill directory

Good queries are realistic and specific (include file paths, context, domain details). Bad queries are overly generic ("format this data") or obviously irrelevant.

4. Run `python scripts/optimize_description.py <path> --eval-set evals/evals.json` to score the current description and get improvement suggestions
5. Recommend running `/skill-forge eval <path>` for full functional evaluation
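When refining the set by hand, the 8-10 query limits can be enforced before saving. The record layout below (`query`, `should_trigger`) is a hypothetical schema for illustration and may not match what `generate_eval_set.py` or `optimize_description.py` actually read or write; check the generated file before relying on it.

```python
import json
from pathlib import Path


def build_eval_set(should_trigger: list[str], should_not_trigger: list[str]) -> dict:
    """Assemble an eval set, enforcing 8-10 queries per group."""
    for label, queries in (("should_trigger", should_trigger),
                           ("should_not_trigger", should_not_trigger)):
        if not 8 <= len(queries) <= 10:
            raise ValueError(f"{label} needs 8-10 queries, got {len(queries)}")
    # Hypothetical layout: one record per query with its expected outcome.
    return {
        "evals": [{"query": q, "should_trigger": True} for q in should_trigger]
        + [{"query": q, "should_trigger": False} for q in should_not_trigger]
    }


def save_eval_set(eval_set: dict, skill_dir: Path) -> Path:
    """Write the eval set to evals/evals.json inside the skill directory."""
    target = skill_dir / "evals" / "evals.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(eval_set, indent=2), encoding="utf-8")
    return target
```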

Step 10: Generate Report

# Skill Review: [name]

## Health Score: [X]/100

## Summary
[2-3 sentence assessment]

## Scores by Category
| Category | Score | Notes |
|----------|-------|-------|
| Frontmatter | X/25 | [issues] |
| Triggering | X/20 | [issues] |
| Instructions | X/25 | [issues] |
| Structure | X/15 | [issues] |
| Scripts | X/10 | [issues] |
| Disclosure | X/5 | [issues] |

## Critical Issues (fix immediately)
- [issue 1]
- [issue 2]

## High Priority (fix within 1 week)
- [issue 1]

## Recommendations
- [suggestion 1]
- [suggestion 2]

## Suggested Test Queries
### Should Trigger
1. [query]
2. [query]
3. [query]

### Should NOT Trigger
1. [query]
2. [query]
3. [query]

related-skills.json

same repository

skill-forge-benchmark.md

from "AgriciDaniel/skill-forge"

Benchmark Claude Code skill performance with variance analysis, tracking pass rate, execution time, and token usage across iterations. Runs multiple trials per eval for statistical reliability, aggregates results into benchmark.json, and generates comparison reports between skill versions. Use when user says "benchmark skill", "measure skill performance", "skill metrics", "compare skill versions", "skill performance", "track skill improvement", "skill regression test", or "skill A/B test".

2026-03-06 · stars: 58

skill-forge-eval.md

from "AgriciDaniel/skill-forge"

Run evaluation pipelines on Claude Code skills to test triggering accuracy, workflow correctness, and output quality. Spawns executor, grader, comparator, and analyzer sub-agents for parallel evaluation. Generates eval_metadata.json, grading.json, and feedback reports. Use when user says "eval skill", "test skill", "run evals", "evaluate skill", "skill evals", "test skill quality", "run skill tests", or "skill evaluation".

2026-03-06 · stars: 58

skill-forge.md

from "AgriciDaniel/skill-forge"

Ultimate Claude Code skill creator and architect. Designs, scaffolds, builds, reviews, evolves, and publishes production-grade Claude Code skills following the Agent Skills open standard and 3-layer architecture (directive, orchestration, execution). Handles single-file skills, multi-skill orchestrators with sub-skills and subagents, MCP-enhanced workflows, and full skill ecosystems. Industry detection for skill domain. Triggers on: "create skill", "build skill", "new skill", "skill creator", "skill builder", "skill-forge", "design skill", "scaffold skill", "review skill", "improve skill", "publish skill", "skill architecture", "convert skill", "port skill", "multi-platform", "cross-platform", "eval skill", "test skill", "benchmark skill", "skill evals", "measure skill", "skill performance", "skill A/B test".

2026-03-06 · stars: 58

skill-forge-evolve.md

from "AgriciDaniel/skill-forge"

Improve and iterate on existing Claude Code skills based on usage feedback, test results, or changing requirements. Handles under/over-triggering fixes, instruction refinement, new sub-skill addition, and architecture evolution. Use when user says "improve skill", "fix skill", "skill not triggering", "skill triggers too much", "update skill", or "evolve skill".

2026-03-06 · stars: 58

skill-forge-build.md

from "AgriciDaniel/skill-forge"

Scaffold and build Claude Code skills from plans or descriptions. Generates SKILL.md files, sub-skills, scripts, references, agents, and templates following the Agent Skills standard. Use when user says "build skill", "scaffold skill", "generate skill", "create SKILL.md", or "implement skill".

2026-02-16 · stars: 58

skill-forge-convert.md

from "AgriciDaniel/skill-forge"

Convert Claude Code skills to work on OpenAI Codex, Google Gemini CLI, Google Antigravity, and Cursor. Analyzes platform-specific features, generates target files (openai.yaml, AGENTS.md, GEMINI.md, .mdc rules), adapts frontmatter, converts MCP config, and produces compatibility reports. Use when user says "convert skill", "port skill", "multi-platform", "skill for codex", "skill for gemini", "skill for antigravity", "skill for cursor", "cross-platform skill", "convert to codex", "convert to gemini", "convert to antigravity", or "convert to cursor".

2026-02-16 · stars: 58

package.json

"author": "AgriciDaniel"

"repository": "AgriciDaniel/skill-forge"

