skill-forge-benchmark

// Benchmark Claude Code skill performance with variance analysis, tracking pass rate, execution time, and token usage across iterations. Runs multiple trials per eval for statistical reliability, aggregates the results into benchmark.json, and generates comparison reports between skill versions. Use when the user says "benchmark skill", "measure skill performance", "skill metrics", "compare skill versions", "skill performance", "track skill improvement", "skill regression test", or "skill A/B test".
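
The aggregation described above (several trials per eval, summarized as pass rate plus the spread of execution time and token usage) can be sketched roughly as follows. This is a minimal illustration only, not the skill's actual implementation: the `Trial` record, the `summarize` helper, and the field names in the resulting dictionary are hypothetical, and the real benchmark.json schema is not shown in this listing.

```python
import json
import statistics
from dataclasses import dataclass


@dataclass
class Trial:
    """One run of a single eval (hypothetical record shape)."""
    passed: bool
    seconds: float
    tokens: int


def spread(values):
    """Mean and sample standard deviation; stdev is 0.0 for a single value."""
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return {"mean": statistics.mean(values), "stdev": stdev}


def summarize(trials):
    """Aggregate repeated trials of one eval into summary statistics."""
    if not trials:
        raise ValueError("at least one trial is required")
    return {
        "trials": len(trials),
        "pass_rate": sum(t.passed for t in trials) / len(trials),
        "seconds": spread([t.seconds for t in trials]),
        "tokens": spread([t.tokens for t in trials]),
    }


# Made-up numbers, purely for illustration.
trials = [
    Trial(passed=True, seconds=12.0, tokens=1000),
    Trial(passed=False, seconds=14.0, tokens=1200),
    Trial(passed=True, seconds=16.0, tokens=1400),
]
summary = summarize(trials)
print(json.dumps(summary, indent=2))
```

Reporting the standard deviation next to the mean is what makes a comparison between two skill versions meaningful: a difference in mean time or token count that is smaller than the trial-to-trial spread is not good evidence of a regression or an improvement.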

$ git log --oneline --stat
stars: 58
forks: 28
updated: March 6, 2026, 16:30
SKILL.md