evaluation

Name: Evaluation
Author: NVIDIA

// Evaluates accuracy of quantized or unquantized LLMs using NeMo Evaluator Launcher (NEL). Triggers on "evaluate model", "benchmark accuracy", "run MMLU", "evaluate quantized model", "run nel". Handles deployment, config generation, and evaluation execution. Not for quantizing models (use ptq), deploying/serving models (use deployment), or comparing completed baseline-vs-quantized results (use compare-results).


stars:2,749

forks:405

updated:May 22, 2026 at 18:00

related-skills.json

same repository

compare-results.md

from "NVIDIA/Model-Optimizer"

Establish baseline-vs-candidate evaluation plans, delegate missing evaluations, compare validated results, and decide quantization feasibility. Use when the user asks to compare baseline vs quantized runs, explain an accuracy drop/regression, verify whether a quantized checkpoint is acceptable, or compare NEL/MLflow evaluation outputs. Do NOT use for generic single-model evaluation without comparison intent (use evaluation), live NEL status/debugging (use launching-evals), or generic MLflow browsing without a comparison goal (use accessing-mlflow).

2026-05-21 · 2.7k

deployment.md

from "NVIDIA/Model-Optimizer"

Serve a quantized or unquantized LLM checkpoint as an OpenAI-compatible API endpoint using vLLM, SGLang, or TRT-LLM. Use when user says "deploy model", "serve model", "start vLLM server", "launch SGLang", "TRT-LLM deploy", "AutoDeploy", "benchmark throughput", "serve checkpoint", or needs an inference endpoint from a HuggingFace or ModelOpt-quantized checkpoint. Do NOT use for quantizing models (use ptq) or evaluating accuracy (use evaluation).

2026-05-21 · 2.7k

launching-evals.md

from "NVIDIA/Model-Optimizer"

Run, monitor, analyze, and debug LLM evaluations via nemo-evaluator-launcher. Covers running evaluations, checking status and live progress, debugging failed runs, exporting artifacts and logs, and analyzing results. ALWAYS triggers on mentions of running evaluations, checking progress, debugging failed evals, analyzing or analysing runs or results, run directories or artifact paths on clusters, Slurm job issues, invocation IDs, or inspecting logs (client logs, server logs, SSH to cluster, tail logs, grep logs). Do NOT use for creating or modifying evaluation configs.

2026-05-21 · 2.7k

monitor.md

from "NVIDIA/Model-Optimizer"

Monitor submitted jobs (PTQ, evaluation, deployment) on SLURM clusters. Use when the user asks "check job status", "is my job done", "monitor my evaluation", "what's the status of the PTQ", "check on job <slurm_job_id>", or after any skill submits a long-running job. Also triggers on "nel status", "squeue", or any request to check progress of a previously submitted job.

2026-05-21 · 2.7k

ptq.md

from "NVIDIA/Model-Optimizer"

This skill should be used when the user asks to "quantize a model", "run PTQ", "post-training quantization", "NVFP4 quantization", "FP8 quantization", "INT8 quantization", "INT4 AWQ", "quantize LLM", "quantize MoE", "quantize VLM", or needs to produce a quantized HuggingFace or TensorRT-LLM checkpoint from a pretrained model using ModelOpt.

2026-05-21 · 2.7k

release-cherry-pick.md

from "NVIDIA/Model-Optimizer"

Cherry-pick merged PRs labeled for a release branch into that branch, then open a PR and apply the cherry-pick-done label. Use when asked to "cherry-pick PRs for release/X.Y.Z", "pick PRs to release branch", or "cherry-pick labeled PRs".

2026-04-27 · 2.7k

package.json

"author": "NVIDIA"

"repository": "NVIDIA/Model-Optimizer"


$ useful --forSOC

Software Quality Assurance Analysts and Testers · Computer and Mathematical Occupations · 15-1253 · L4

Config Generation Progress:
- [ ] Step 0: Check workspace (if MODELOPT_WORKSPACE_ROOT is set)
- [ ] Step 1: Check if nel is installed and if user has existing config
- [ ] Step 2: Build the base config file
- [ ] Step 3: Configure model path and parameters
- [ ] Step 4: Fill in remaining missing values
- [ ] Step 5: Confirm tasks (iterative)
- [ ] Step 6: Advanced - Multi-node (Data Parallel)
- [ ] Step 7: Advanced - Interceptors
- [ ] Step 7.5: Check container registry auth for private images (SLURM only)
- [ ] Step 8: Run the evaluation
  - [ ] Step 8.1: Dry-run / NEL CLI config validation
  - [ ] Step 8.2: Limited-samples canary
  - [ ] Step 8.3: Full evaluation
- [ ] Step 9: Verify completed evaluation run

| Field in config.json | What to set | Example |
| --- | --- | --- |
| max_position_embeddings | --max-model-len <value> | 131072 → --max-model-len 131072 |
| auto_map exists | --trust-remote-code | Only add if model has custom code |
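The mapping in the table above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the helper name `vllm_flags_from_config` and the sample `auto_map` entry are hypothetical, and the function only covers the two fields listed in the table.

```python
import json


def vllm_flags_from_config(config: dict) -> list[str]:
    """Derive extra vLLM flags from a parsed config.json (hypothetical helper)."""
    flags: list[str] = []

    # max_position_embeddings maps directly to --max-model-len <value>.
    max_len = config.get("max_position_embeddings")
    if max_len is not None:
        flags += ["--max-model-len", str(max_len)]

    # auto_map is present when the checkpoint ships custom code,
    # which is the only case where --trust-remote-code should be added.
    if config.get("auto_map"):
        flags.append("--trust-remote-code")

    return flags


# Illustrative config.json content; the auto_map value is made up.
example = json.loads(
    '{"max_position_embeddings": 131072,'
    ' "auto_map": {"AutoConfig": "configuration_custom.CustomConfig"}}'
)
print(" ".join(vllm_flags_from_config(example)))
# prints: --max-model-len 131072 --trust-remote-code
```

Returning a list of arguments rather than a single string keeps the result easy to append to an existing command without re-quoting.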

| Model card signal | What to set |
| --- | --- |
| Reasoning model (thinking/CoT) | --reasoning-parser and --reasoning-parser-plugin if a custom parser is provided |
| Tool-calling support | --enable-auto-tool-choice --tool-call-parser <parser> |
| Custom vLLM flags documented | Add as specified (e.g., --mamba_ssm_cache_dtype float32) |

| Framework | Default image | Registry |
| --- | --- | --- |
| vLLM | vllm/vllm-openai:latest | DockerHub |
| SGLang | lmsysorg/sglang:latest | DockerHub |
| TRT-LLM | nvcr.io/nvidia/tensorrt-llm/release:... | NGC |
| Evaluation tasks | nvcr.io/nvidia/eval-factory/*:26.03 | NGC |
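For the registry auth check in Step 7.5, it can help to know which registry an image reference points at. The sketch below is one way to classify a reference using the usual Docker convention that a first path component containing a dot or colon (or `localhost`) is a registry host. The function name `registry_for_image` is hypothetical, and whether a given image actually needs credentials is not something this code can tell.

```python
def registry_for_image(image: str) -> str:
    """Classify an image reference by registry (hypothetical helper)."""
    first, sep, _ = image.partition("/")

    # Without a slash, or with a first component that does not look like
    # a host name, Docker resolves the reference against DockerHub.
    has_host = bool(sep) and ("." in first or ":" in first or first == "localhost")
    if not has_host or first == "docker.io":
        return "DockerHub"

    # nvcr.io is the NGC registry; any other host is returned unchanged.
    return "NGC" if first == "nvcr.io" else first


# Image references copied from the table above (the elided tag is kept as-is).
for ref in ("vllm/vllm-openai:latest", "nvcr.io/nvidia/tensorrt-llm/release:..."):
    print(ref, "->", registry_for_image(ref))
```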

# If using pre_cmd or post_cmd (review pre_cmd content before enabling; it runs arbitrary commands):
export NEMO_EVALUATOR_TRUST_PRE_CMD=1
# If using nemo_skills.* tasks with self-deployment:
export DUMMY_API_KEY=dummy

nel status <canary_invocation_id>
nel info <canary_invocation_id> --logs
ssh <user>@<host> "grep -i 'traceback\|exception\|error\|failed\|oom\|killed\|timeout\|unauthorized\|rate limit\|sandbox\|container\|judge\|parse\|scoring' <log_path>/*.log"
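When the canary logs have already been copied locally, the same keyword scan as the grep above can be done in Python. This is a rough equivalent under that assumption: the function name `failure_lines` and the sample log text are made up for illustration, and only the keyword list comes from the command above.

```python
import re

# Same keywords as the grep -i pattern, matched case-insensitively.
FAILURE_PATTERN = re.compile(
    r"traceback|exception|error|failed|oom|killed|timeout|unauthorized|"
    r"rate limit|sandbox|container|judge|parse|scoring",
    re.IGNORECASE,
)


def failure_lines(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines matching any keyword."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if FAILURE_PATTERN.search(line)
    ]


# Made-up log excerpt for illustration only.
sample = (
    "starting server\n"
    "Traceback (most recent call last):\n"
    "request ok\n"
    "CUDA OOM while loading weights\n"
)
for number, line in failure_lines(sample):
    print(f"{number}: {line}")
# prints:
# 2: Traceback (most recent call last):
# 4: CUDA OOM while loading weights
```

As with the grep, broad keywords such as "container" or "parse" will also match harmless lines, so the hits are a starting point for reading the log rather than a verdict.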


name	evaluation
description	Evaluates accuracy of quantized or unquantized LLMs using NeMo Evaluator Launcher (NEL). Triggers on "evaluate model", "benchmark accuracy", "run MMLU", "evaluate quantized model", "run nel". Handles deployment, config generation, and evaluation execution. Not for quantizing models (use ptq), deploying/serving models (use deployment), or comparing completed baseline-vs-quantized results (use compare-results).
license	Apache-2.0


evaluation

NeMo Evaluator Launcher Assistant

Workspace and Pipeline Integration

Workflow
