
skill-forge-benchmark

// Benchmark Claude Code skill performance with variance analysis, tracking pass rate, execution time, and token usage across iterations. Runs multiple trials per eval for statistical reliability, aggregates the results into benchmark.json, and generates comparison reports between skill versions. Use when the user says "benchmark skill", "measure skill performance", "skill metrics", "compare skill versions", "skill performance", "track skill improvement", "skill regression test", or "skill A/B test".
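
As a rough illustration of the per-eval aggregation described above, the sketch below summarizes several trials into a pass rate plus mean and standard deviation for time and tokens. The field names (`passed`, `seconds`, `tokens`), the `aggregate_trials` helper, and the output layout are assumptions made for this example; the actual benchmark.json schema is not shown here, and the sample numbers are invented.

```python
import json
import statistics


def aggregate_trials(trials):
    """Summarize trial records (hypothetical schema) for one eval."""
    summary = {"trials": len(trials)}
    # Pass rate: fraction of trials whose "passed" flag is true.
    summary["pass_rate"] = sum(1 for t in trials if t["passed"]) / len(trials)
    for metric in ("seconds", "tokens"):
        values = [t[metric] for t in trials]
        summary[metric] = {
            "mean": statistics.mean(values),
            # Sample standard deviation needs at least two data points.
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


# Invented sample data: three trials of a single eval.
trials = [
    {"passed": True, "seconds": 10.0, "tokens": 1000},
    {"passed": True, "seconds": 12.0, "tokens": 1100},
    {"passed": False, "seconds": 14.0, "tokens": 1200},
]
summary = aggregate_trials(trials)
print(json.dumps(summary, indent=2))
```

Reporting the spread alongside the mean is what makes a comparison between two skill versions meaningful: a difference smaller than the trial-to-trial standard deviation is hard to distinguish from noise.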

$ git log --oneline --stat
stars:58
forks:28
updated:March 6, 2026 at 16:30
SKILL.md