nemo-evaluator-sdk

Stars: 9,996
Forks: 745
Updated: January 15, 2026 at 20:38

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (including MMLU, HumanEval, GSM8K, safety, and VLM benchmarks) with multi-backend execution. Use it when you need scalable evaluation on local Docker, Slurm HPC clusters, or cloud platforms. It is NVIDIA's enterprise-grade platform, with a container-first architecture for reproducible benchmarking.

Installation

Install with Codex or Claude

Copy this prompt, paste it into Codex, Claude, or another assistant, and let it review the skill page and install it for you.

File Explorer
5 files
SKILL.md