
manage-evals

// This skill should be used when the user asks to "trigger an eval", "run evaluation", "run swebench", "run gaia", "run benchmark", "compare eval runs", "compare evaluation results", "check eval regression", "compare benchmark results", "what changed in the eval", "diff eval runs", or mentions triggering, comparing, or reporting on SWE-bench, GAIA, or other benchmark evaluation results. Provides a workflow for triggering evaluations on different benchmarks, finding and comparing runs, and reporting performance differences.

stars:744
forks:261
updated:May 11, 2026 19:08
File explorer
3 files
SKILL.md
readonly