skill-forge-benchmark

// Benchmark Claude Code skill performance with variance analysis, tracking pass rate, execution time, and token usage across iterations. Runs multiple trials per eval for statistical reliability, aggregates the results into benchmark.json, and generates comparison reports between skill versions. Use when the user says "benchmark skill", "measure skill performance", "skill metrics", "compare skill versions", "skill performance", "track skill improvement", "skill regression test", or "skill A/B test".
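
The aggregation described above can be illustrated with a minimal sketch. It is not the skill's actual implementation: the function name `aggregate_trials`, the trial fields (`passed`, `seconds`, `tokens`), and the output keys are all hypothetical, since the listing does not document the schema of benchmark.json. It only shows one plausible way to reduce several trials of one eval to a pass rate plus a mean and sample standard deviation for time and tokens.

```python
import json
import statistics


def aggregate_trials(trials):
    """Summarize trials of one eval.

    Each trial is a dict with hypothetical keys:
    passed (bool), seconds (float), tokens (int).
    """
    def summarize(values):
        # The sample standard deviation needs at least two data points.
        stdev = statistics.stdev(values) if len(values) > 1 else 0.0
        return {"mean": statistics.mean(values), "stdev": stdev}

    return {
        "trials": len(trials),
        "pass_rate": sum(1 for t in trials if t["passed"]) / len(trials),
        "execution_time_s": summarize([t["seconds"] for t in trials]),
        "token_usage": summarize([t["tokens"] for t in trials]),
    }


# Made-up sample data, for illustration only.
trials = [
    {"passed": True, "seconds": 12.0, "tokens": 1500},
    {"passed": True, "seconds": 14.0, "tokens": 1700},
    {"passed": False, "seconds": 16.0, "tokens": 1600},
]
summary = aggregate_trials(trials)
print(json.dumps(summary, indent=2))
```

Reporting the standard deviation next to the mean is what lets a comparison between two skill versions separate a real change from run-to-run noise. The sketch assumes at least one trial; an empty list would raise a `ZeroDivisionError`.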

stars:58
forks:28
updated: March 6, 2026, 16:30
SKILL.md
readonly