add-eval-dataset-config

Name: Add Eval Dataset Config
Author: THUDM

Guide for adding and validating evaluation dataset configuration in slime. Use when user wants to configure eval datasets via --eval-config or --eval-prompt-data, add per-dataset overrides, or customize evaluation rollout behavior.

stars:5,863

forks:844

updated:March 2, 2026 at 03:01

SKILL.md

Add Eval Dataset Config

Configure evaluation datasets in slime with explicit dataset-level overrides and predictable runtime behavior.

When to Use

Use this skill when:

User asks to add evaluation datasets for periodic eval
User asks to migrate from --eval-prompt-data to structured --eval-config
User asks for per-dataset eval overrides (sampling params, keys, rm_type, metadata)

Step-by-Step Guide

Step 1: Choose Config Entry Method

Supported inputs:

Structured config file: --eval-config <yaml>
Legacy CLI pairs: --eval-prompt-data <name1> <path1> <name2> <path2> ...

If --eval-interval is set, eval datasets must be configured.
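
The legacy pair format maps directly onto the structured form, which helps when migrating. The following is a minimal sketch, not slime's own parser: `pairs_to_eval_config` is a hypothetical helper, and the output layout assumes the `eval.datasets` structure shown in Step 2.

```python
from typing import Dict, List


def pairs_to_eval_config(pairs: List[str]) -> Dict:
    """Convert [name1, path1, name2, path2, ...] into a structured eval config dict.

    Hypothetical helper for illustration; slime's real parsing may differ.
    """
    if len(pairs) % 2 != 0:
        raise ValueError(
            f"--eval-prompt-data expects name/path pairs, got {len(pairs)} values"
        )
    # Even positions hold dataset names, odd positions hold dataset paths.
    datasets = [
        {"name": name, "path": path}
        for name, path in zip(pairs[0::2], pairs[1::2])
    ]
    return {"eval": {"datasets": datasets}}


config = pairs_to_eval_config(
    ["aime", "/path/to/aime.jsonl", "gsm8k", "/path/to/gsm8k.jsonl"]
)
```

The resulting dict can be dumped to YAML and extended with `defaults` or per-dataset overrides.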

Step 2: Build YAML with Required Fields

Each dataset needs at least:

name
path

Example:

eval:
  defaults:
    n_samples_per_eval_prompt: 1
    temperature: 0.7
    top_p: 1.0
  datasets:
    - name: aime
      path: /path/to/aime.jsonl
      rm_type: math
      input_key: prompt
      label_key: answer
      metadata_overrides:
        split: test
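
Before launching, the required fields can be checked on the parsed config. This is a rough sketch that assumes the YAML has already been loaded into plain dicts; `find_missing_fields` is a hypothetical name, not a slime function.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("name", "path")


def find_missing_fields(datasets: List[Dict[str, Any]]) -> List[str]:
    """Return one message per required field that is absent or empty in an entry."""
    problems = []
    for index, entry in enumerate(datasets):
        for field in REQUIRED_FIELDS:
            if not entry.get(field):
                problems.append(f"datasets[{index}] is missing '{field}'")
    return problems


datasets = [
    {"name": "aime", "path": "/path/to/aime.jsonl", "rm_type": "math"},
    {"path": "/path/to/unnamed.jsonl"},  # invalid on purpose: no name
]
problems = find_missing_fields(datasets)
```

An empty `problems` list means every entry carries both `name` and `path`.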

Step 3: Understand Override Priority

slime/utils/eval_config.py resolves each field in this order, highest priority first:

1. Dataset-level values in eval.datasets[*]
2. eval.defaults
3. CLI args fallback (for example eval_* or rollout_* fields)

Common overridable fields include:

Runtime: n_samples_per_eval_prompt, temperature, top_p, top_k, max_response_len
Sample keys: input_key, label_key, tool_key, metadata_key
Extra: rm_type, custom_generate_function_path, metadata_overrides
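
The priority order above can be sketched as a first-match lookup. This is a simplified illustration, not the code in slime/utils/eval_config.py: `resolve_field` is a hypothetical function, and the CLI layer is modeled as a flat dict rather than the real eval_* and rollout_* argument names.

```python
from typing import Any, Dict, Optional


def resolve_field(
    field: str,
    dataset: Dict[str, Any],
    defaults: Dict[str, Any],
    cli_args: Dict[str, Any],
) -> Optional[Any]:
    """Return the first non-None value: dataset entry, then eval.defaults, then CLI."""
    for source in (dataset, defaults, cli_args):
        # Compare against None so that valid falsy values such as 0.0 still win.
        if source.get(field) is not None:
            return source[field]
    return None


dataset = {"name": "aime", "temperature": 0.0}
defaults = {"temperature": 0.7, "top_p": 1.0}
cli_args = {"temperature": 1.0, "top_p": 0.9, "max_response_len": 1024}

temperature = resolve_field("temperature", dataset, defaults, cli_args)
top_p = resolve_field("top_p", dataset, defaults, cli_args)
max_len = resolve_field("max_response_len", dataset, defaults, cli_args)
```

Here `temperature` comes from the dataset entry, `top_p` from the defaults, and `max_response_len` from the CLI fallback.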

Step 4: Wire Eval Function if Needed

Eval runs the function given by --eval-function-path, which falls back to the rollout function path when it is not set. Provide a separate eval function only when inference or eval behavior must differ from the training rollout.
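
The fallback can be pictured as follows. This sketch assumes function paths are dotted `module.function` strings and uses hypothetical helper names; standard-library functions stand in for real rollout and eval functions.

```python
import importlib
from typing import Callable, Optional


def load_function(dotted_path: str) -> Callable:
    """Import 'package.module.function' and return the function object."""
    module_path, _, attr = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


def resolve_eval_function(eval_path: Optional[str], rollout_path: str) -> Callable:
    """Use the eval path when given, otherwise fall back to the rollout path."""
    return load_function(eval_path or rollout_path)


# Stand-ins only: math.floor plays the rollout function, math.ceil the eval one.
default_fn = resolve_eval_function(None, "math.floor")
custom_fn = resolve_eval_function("math.ceil", "math.floor")
```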

Step 5: Validate Parsing and Runtime

Start with a parsing sanity check: run a short launch command and confirm the config loads without errors.
Confirm dataset entries are loaded into args.eval_datasets.
Verify output keys match eval logging/metrics expectations.
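
A quick check along these lines can be dropped into a debugging session. It is a sketch under stated assumptions: entries in `args.eval_datasets` are treated as plain dicts here, whereas slime may store config objects, and `summarize_eval_datasets` is a hypothetical name.

```python
from argparse import Namespace
from typing import List


def summarize_eval_datasets(args: Namespace) -> List[str]:
    """List the loaded eval datasets, failing if eval is scheduled with none."""
    datasets = getattr(args, "eval_datasets", None) or []
    if getattr(args, "eval_interval", None) is not None and not datasets:
        raise ValueError("eval_interval is set but no eval datasets were loaded")
    return [f"{d['name']} -> {d['path']}" for d in datasets]


args = Namespace(
    eval_interval=20,
    eval_datasets=[{"name": "aime", "path": "/path/to/aime.jsonl"}],
)
summary = summarize_eval_datasets(args)
```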

Common Mistakes

Missing name in dataset entries
Odd-length --eval-prompt-data pairs
Setting --eval-interval without any eval dataset
Returning dict-valued rewards without configuring eval_reward_key to select the value to report
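
The last mistake is easiest to see in code. This sketch assumes a reward is either a number or a dict of named scores; `select_eval_reward` is a hypothetical helper and does not mirror slime's internal handling.

```python
from typing import Dict, Optional, Union

Reward = Union[float, Dict[str, float]]


def select_eval_reward(reward: Reward, eval_reward_key: Optional[str] = None) -> float:
    """Reduce a reward to one scalar, requiring a key when the reward is a dict."""
    if isinstance(reward, dict):
        if eval_reward_key is None:
            raise ValueError("reward is a dict; set eval_reward_key to choose a value")
        return float(reward[eval_reward_key])
    return float(reward)


scalar = select_eval_reward(1.0)
picked = select_eval_reward({"accuracy": 0.5, "format": 1.0}, "accuracy")
```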

Reference Locations

Eval config model: slime/utils/eval_config.py
Eval config resolution: slime/utils/arguments.py
Eval rollout path: slime/rollout/sglang_rollout.py
Customization docs: docs/en/get_started/customization.md

Repository: THUDM/slime
