
$pwd:

skill-forge-benchmark

Name: Skill Forge Benchmark
Author: AgriciDaniel

// Benchmark Claude Code skill performance with variance analysis, tracking pass rate, execution time, and token usage across iterations. Runs multiple trials per eval for statistical reliability, aggregates results into benchmark.json, and generates comparison reports between skill versions. Use when user says "benchmark skill", "measure skill performance", "skill metrics", "compare skill versions", "skill performance", "track skill improvement", "skill regression test", or "skill A/B test".
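The aggregation described above (several trials per eval, then pass rate plus mean and spread for time and tokens) might look roughly like the sketch below. The record fields, the eval name, and the output layout are assumptions made for illustration; the listing does not show the actual benchmark.json schema.

```python
import json
from statistics import mean, stdev

# Hypothetical per-trial records; field names are illustrative assumptions,
# not the skill's real schema.
trials = [
    {"eval": "trigger-basic", "passed": True, "seconds": 12.0, "tokens": 1500},
    {"eval": "trigger-basic", "passed": True, "seconds": 14.0, "tokens": 1700},
    {"eval": "trigger-basic", "passed": False, "seconds": 16.0, "tokens": 1600},
]


def summarize(values):
    """Return the mean and sample standard deviation (0.0 for a single value)."""
    values = list(values)
    spread = stdev(values) if len(values) > 1 else 0.0
    return {"mean": round(mean(values), 3), "stdev": round(spread, 3)}


def aggregate(trials):
    """Group trials by eval name and compute pass rate, time, and token stats."""
    by_eval = {}
    for trial in trials:
        by_eval.setdefault(trial["eval"], []).append(trial)

    report = {}
    for name, runs in by_eval.items():
        report[name] = {
            "trials": len(runs),
            "pass_rate": round(sum(r["passed"] for r in runs) / len(runs), 3),
            "seconds": summarize(r["seconds"] for r in runs),
            "tokens": summarize(r["tokens"] for r in runs),
        }
    return report


benchmark = aggregate(trials)
print(json.dumps(benchmark, indent=2))
```

Reporting the standard deviation next to the mean is what makes a later version comparison meaningful: a difference smaller than the trial-to-trial spread is probably noise.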


$ git log --oneline --stat

stars:58

forks:28

updated:2026-03-06 16:30


related-skills.json

Same repository

skill-forge-eval.md

from "AgriciDaniel/skill-forge"

Run evaluation pipelines on Claude Code skills to test triggering accuracy, workflow correctness, and output quality. Spawns executor, grader, comparator, and analyzer sub-agents for parallel evaluation. Generates eval_metadata.json, grading.json, and feedback reports. Use when user says "eval skill", "test skill", "run evals", "evaluate skill", "skill evals", "test skill quality", "run skill tests", or "skill evaluation".

updated:2026-03-06 · stars:58

skill-forge.md

from "AgriciDaniel/skill-forge"

Ultimate Claude Code skill creator and architect. Designs, scaffolds, builds, reviews, evolves, and publishes production-grade Claude Code skills following the Agent Skills open standard and 3-layer architecture (directive, orchestration, execution). Handles single-file skills, multi-skill orchestrators with sub-skills and subagents, MCP-enhanced workflows, and full skill ecosystems. Industry detection for skill domain. Triggers on: "create skill", "build skill", "new skill", "skill creator", "skill builder", "skill-forge", "design skill", "scaffold skill", "review skill", "improve skill", "publish skill", "skill architecture", "convert skill", "port skill", "multi-platform", "cross-platform", "eval skill", "test skill", "benchmark skill", "skill evals", "measure skill", "skill performance", "skill A/B test".

updated:2026-03-06 · stars:58

skill-forge-evolve.md

from "AgriciDaniel/skill-forge"

Improve and iterate on existing Claude Code skills based on usage feedback, test results, or changing requirements. Handles under/over-triggering fixes, instruction refinement, new sub-skill addition, and architecture evolution. Use when user says "improve skill", "fix skill", "skill not triggering", "skill triggers too much", "update skill", or "evolve skill".

updated:2026-03-06 · stars:58

skill-forge-review.md

from "AgriciDaniel/skill-forge"

Audit and validate existing Claude Code skills for quality, triggering accuracy, structure compliance, and best practices. Scores skills on a 0-100 scale and provides prioritized improvement recommendations. Use when user says "review skill", "audit skill", "check skill", "validate skill", or "skill quality".

updated:2026-03-06 · stars:58

skill-forge-build.md

from "AgriciDaniel/skill-forge"

Scaffold and build Claude Code skills from plans or descriptions. Generates SKILL.md files, sub-skills, scripts, references, agents, and templates following the Agent Skills standard. Use when user says "build skill", "scaffold skill", "generate skill", "create SKILL.md", or "implement skill".

updated:2026-02-16 · stars:58

skill-forge-convert.md

from "AgriciDaniel/skill-forge"

Convert Claude Code skills to work on OpenAI Codex, Google Gemini CLI, Google Antigravity, and Cursor. Analyzes platform-specific features, generates target files (openai.yaml, AGENTS.md, GEMINI.md, .mdc rules), adapts frontmatter, converts MCP config, and produces compatibility reports. Use when user says "convert skill", "port skill", "multi-platform", "skill for codex", "skill for gemini", "skill for antigravity", "skill for cursor", "cross-platform skill", "convert to codex", "convert to gemini", "convert to antigravity", or "convert to cursor".

updated:2026-02-16 · stars:58

package.json

"author": "AgriciDaniel"

"repository": "AgriciDaniel/skill-forge"


$ useful --forSOC

Software Quality Assurance Analysts and Testers · Computer and Mathematical Occupations · 15-1253 · L4

skill-forge-benchmark

Skill Benchmarking & Performance Tracking

Process

Step 1: Define Benchmark Configuration

Step 2: Execute Benchmark Runs

Step 3: Aggregate Results

Step 4: Compare with Previous Iterations

Step 5: Generate Benchmark Report

Step 6: Threshold Gating

Error Handling

Integration with Other Sub-Skills
