manage-evals

// This skill should be used when the user asks to "trigger an eval", "run evaluation", "run swebench", "run gaia", "run benchmark", "compare eval runs", "compare evaluation results", "check eval regression", "compare benchmark results", "what changed in the eval", "diff eval runs", or mentions triggering, comparing, or reporting on SWE-bench, GAIA, or other benchmark evaluation results. Provides a workflow for triggering evaluations on different benchmarks, finding and comparing runs, and reporting performance differences.

stars: 744
forks: 261
updated: May 11, 2026, 19:08