Name: Add Reward Function
Author: THUDM

Guide for adding a custom reward function in slime and wiring it through --custom-rm-path (and optional reward post-processing). Use when user wants new reward logic, remote/service reward integration, or task-specific reward shaping.

Updated: March 2, 2026 at 03:01

Add Reward Function

Implement custom reward logic and connect it to slime rollout/training safely.

When to Use

Use this skill when:

User asks to add new reward computation logic
User asks to integrate an external reward service
User asks to customize reward normalization/post-processing

Step-by-Step Guide

Step 1: Choose Reward Mode

Pick one of these:

Single-sample mode (--group-rm disabled): the custom function receives one Sample
Group/batch mode (--group-rm enabled): the custom function receives a list[Sample]

The reward dispatch code in slime/rollout/rm_hub/__init__.py calls your function when --custom-rm-path is set.

Step 2: Create Reward Module

Create slime/rollout/rm_hub/<your_rm>.py.

Supported signatures:

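# Single-sample mode: receives one Sample, returns a float or a reward dict
async def custom_rm(args, sample):
    return float_reward_or_reward_dict

# Group mode: receives list[Sample], returns one reward per sample
async def custom_rm(args, samples):
    return list_of_rewards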

If using group mode, return one reward per sample in input order.
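The two signatures above might be filled in as follows. This is a minimal sketch, not slime's own code: the Sample dataclass is a stand-in whose field names (response, label, metadata) are assumptions, and custom_group_rm is a hypothetical name used only so both modes fit in one file.

```python
import asyncio
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class Sample:
    """Stand-in for slime's Sample; the field names here are assumptions."""
    response: str = ""
    label: str = ""
    metadata: dict = field(default_factory=dict)


async def custom_rm(args, sample):
    """Single-sample mode: exact-match reward returned as a float."""
    return 1.0 if sample.response.strip() == sample.label.strip() else 0.0


async def custom_group_rm(args, samples):
    """Group mode: one reward per sample, in the same order as the input."""
    return [await custom_rm(args, s) for s in samples]


if __name__ == "__main__":
    args = SimpleNamespace()  # placeholder for the parsed training arguments
    batch = [Sample("42", "42"), Sample("41", "42")]
    print(asyncio.run(custom_group_rm(args, batch)))
```

In a real run only one of the two functions would be referenced by --custom-rm-path, matching whether --group-rm is enabled.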

Step 3: Keep Reward Type Consistent

Return scalar numeric rewards unless your pipeline explicitly uses keyed rewards.
If using reward dicts, ensure downstream reward_key / eval_reward_key is configured.
Raise an explicit exception when sample metadata is invalid instead of silently returning a zero reward.
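As an illustration of these points, the sketch below returns a keyed reward dict and fails loudly on bad metadata. The metadata key "answer" and the reward keys "accuracy" and "format" are hypothetical, and the sample object is assumed to expose response and metadata attributes.

```python
import asyncio
from types import SimpleNamespace


async def custom_rm(args, sample):
    """Return keyed rewards; raise instead of hiding invalid metadata."""
    metadata = getattr(sample, "metadata", None)
    if not isinstance(metadata, dict) or "answer" not in metadata:
        # A silent 0.0 here would be indistinguishable from a wrong answer.
        raise ValueError(f"sample metadata is missing 'answer': {metadata!r}")

    response = sample.response.strip()
    return {
        "accuracy": float(response == str(metadata["answer"]).strip()),
        "format": float(response != ""),
    }


if __name__ == "__main__":
    sample = SimpleNamespace(response="4", metadata={"answer": 4})
    print(asyncio.run(custom_rm(None, sample)))
```

With a dict like this, reward_key (and eval_reward_key for evaluation) would need to name the entry the pipeline should train on, for example the hypothetical "accuracy" key.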

Step 4: Optional Reward Post-Processing

To customize normalization/shaping before advantage computation, add:

def post_process_rewards(args, samples):
    # return (raw_rewards, processed_rewards)
    ...

Wire with:

--custom-reward-post-process-path <module>.post_process_rewards

This hook is consumed in slime/ray/rollout.py.
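One possible shape for such a hook is per-group mean centering, sketched below. It assumes each sample carries a scalar reward attribute and that args.n_samples_per_prompt gives the group size; both are assumptions for illustration, and the actual normalization you want may differ.

```python
from types import SimpleNamespace


def post_process_rewards(args, samples):
    """Return (raw_rewards, processed_rewards) with per-group mean centering."""
    raw_rewards = [float(s.reward) for s in samples]

    group_size = args.n_samples_per_prompt  # assumed argument name
    if group_size <= 0 or len(raw_rewards) % group_size != 0:
        raise ValueError(
            f"{len(raw_rewards)} rewards cannot be split into groups of {group_size}"
        )

    processed_rewards = []
    for start in range(0, len(raw_rewards), group_size):
        group = raw_rewards[start:start + group_size]
        mean = sum(group) / len(group)
        # Subtract the group mean so rewards are relative within each prompt.
        processed_rewards.extend(r - mean for r in group)

    return raw_rewards, processed_rewards


if __name__ == "__main__":
    args = SimpleNamespace(n_samples_per_prompt=2)
    samples = [SimpleNamespace(reward=r) for r in (1, 0, 1, 1)]
    print(post_process_rewards(args, samples))
```

Returning the raw rewards alongside the processed ones keeps the unshaped values available for logging.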

Step 5: Wire and Validate

Use:

--custom-rm-path slime.rollout.rm_hub.<your_rm>.custom_rm
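Before launching a full run, it can help to check that the dotted path resolves and that the returned rewards have the expected shape. The helpers below are a hypothetical smoke test, not slime's loader: load_function and check_group_rewards are names invented for this sketch.

```python
import importlib
import numbers


def load_function(dotted_path):
    """Resolve 'package.module.function' to the function object."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected '<module>.<function>', got {dotted_path!r}")
    return getattr(importlib.import_module(module_name), attr)


def check_group_rewards(rewards, samples):
    """Raise if group-mode output does not match the input samples."""
    if len(rewards) != len(samples):
        raise ValueError(
            f"got {len(rewards)} rewards for {len(samples)} samples"
        )
    for reward in rewards:
        if not isinstance(reward, (numbers.Real, dict)):
            raise TypeError(f"unsupported reward type: {type(reward).__name__}")


if __name__ == "__main__":
    # A stdlib path is used here only so the sketch runs without slime installed.
    sqrt = load_function("math.sqrt")
    print(sqrt(9.0))
    check_group_rewards([1.0, 0.0], ["sample_a", "sample_b"])
```

In practice the path passed to load_function would be the same value given to --custom-rm-path.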

Common Mistakes

Returning wrong output shape in group mode
Mixing scalar rewards and reward dicts without reward_key config
Doing blocking network calls without async handling
Forgetting to validate reward behavior on truncated/failed samples
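For the blocking-call mistake in particular, one way to keep the event loop free is to push synchronous client calls onto worker threads. The sketch below assumes Python 3.9+ for asyncio.to_thread; blocking_score is a hypothetical stand-in for a synchronous request to a reward service.

```python
import asyncio
import time
from types import SimpleNamespace


def blocking_score(text):
    """Hypothetical stand-in for a synchronous call to a reward service."""
    time.sleep(0.05)  # simulates network latency
    return float(len(text) % 2 == 0)


async def custom_rm(args, samples):
    """Group mode: score all samples concurrently without blocking the loop."""
    tasks = [asyncio.to_thread(blocking_score, s.response) for s in samples]
    # gather returns results in the order the tasks were passed in,
    # so rewards stay aligned with the input samples.
    return list(await asyncio.gather(*tasks))


if __name__ == "__main__":
    batch = [SimpleNamespace(response="ab"), SimpleNamespace(response="abc")]
    print(asyncio.run(custom_rm(None, batch)))
```

If the reward service has a native async client, awaiting it directly is usually simpler than wrapping a blocking call in a thread.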

Reference Locations

Reward dispatch: slime/rollout/rm_hub/__init__.py
Reward post-process hook: slime/ray/rollout.py
Customization docs: docs/en/get_started/customization.md


Repository: THUDM/slime


