$pwd:

skill-forge-benchmark

// Benchmark Claude Code skill performance with variance analysis, tracking pass rate, execution time, and token usage across iterations. Runs multiple trials per eval for statistical reliability, aggregates the results into benchmark.json, and generates comparison reports between skill versions. Use when the user says "benchmark skill", "measure skill performance", "skill metrics", "compare skill versions", "skill performance", "track skill improvement", "skill regression test", or "skill A/B test".
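
The aggregation described above could look roughly like the following sketch. It is not taken from the skill itself: the `Trial` record, the `aggregate` and `compare` helpers, and every field name are hypothetical, and the real layout of benchmark.json may differ. It only illustrates reducing several trials of one eval to a pass rate plus mean and standard deviation for time and tokens, then diffing two versions.

```python
import json
import statistics
from dataclasses import dataclass


@dataclass
class Trial:
    """One run of an eval (hypothetical record, not the skill's own schema)."""
    passed: bool
    seconds: float
    tokens: int


def summarize(values):
    """Return the mean and sample standard deviation (0.0 for a single value)."""
    values = list(values)
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return {"mean": statistics.mean(values), "stdev": stdev}


def aggregate(trials):
    """Collapse repeated trials of one eval into summary statistics."""
    if not trials:
        raise ValueError("need at least one trial")
    return {
        "trials": len(trials),
        "pass_rate": sum(t.passed for t in trials) / len(trials),
        "seconds": summarize(t.seconds for t in trials),
        "tokens": summarize(t.tokens for t in trials),
    }


def compare(old, new):
    """Difference in each metric between two aggregated versions (new - old)."""
    return {
        "pass_rate_delta": new["pass_rate"] - old["pass_rate"],
        "seconds_delta": new["seconds"]["mean"] - old["seconds"]["mean"],
        "tokens_delta": new["tokens"]["mean"] - old["tokens"]["mean"],
    }


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the helpers.
    v1 = aggregate([Trial(True, 10.0, 1000), Trial(False, 14.0, 1400)])
    v2 = aggregate([Trial(True, 8.0, 900), Trial(True, 10.0, 1100)])
    print(json.dumps({"v1": v1, "v2": v2, "delta": compare(v1, v2)}, indent=2))
```

Reporting the standard deviation alongside the mean is what makes a comparison between versions meaningful: a difference smaller than the trial-to-trial spread is more likely noise than a real regression or improvement.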

$ git log --oneline --stat
stars:58
forks:28
updated:March 6, 2026 16:30
SKILL.md
readonly