
run-evaluation

// Run a vision-language-action (VLA) model evaluation against a simulation benchmark. Use this skill whenever the user wants to evaluate, benchmark, test, or run a model in a simulation environment, even when the request is phrased casually, such as 'try OpenVLA on LIBERO' or 'get me CALVIN scores'. Covers the full workflow: serving the model, launching the benchmark, sharding for speed, merging results, and interpreting output.

stars: 331
forks: 28
updated: May 7, 2026, 10:27
SKILL.md