slime-user

Guide for using SLIME (LLM post-training framework for RL Scaling). Use when working with SLIME for reinforcement learning training of language models, including setup, configuration, training execution, multi-turn interactions, custom reward models, tool calling scenarios, or troubleshooting SLIME workflows. Covers GRPO, GSPO, PPO, Reinforce++, multi-agent RL, VLM training, FSDP/Megatron backends, SGLang integration, dynamic sampling, and custom generation functions.

Installation command

npx skills add https://github.com/yzlnew/infra-skills --skill slime-user

Copy and paste this command into Claude Code to install the skill.

Source

yzlnew/infra-skills

Updated: January 14, 2026, 05:46

SLIME User Guide

SLIME is an LLM post-training framework for RL Scaling developed by THUDM. It supports various RL algorithms (GRPO, GSPO, PPO, Reinforce++), multiple training backends (Megatron, FSDP), and advanced features like multi-turn interactions, tool calling, and dynamic sampling.

Quick Start Workflow

For First-Time Users

Environment Setup
- Use Docker: docker pull slimerl/slime:latest
- Or build from source: See docs/en/get_started/quick_start.md
- Hardware: Supports H100/H200, B200 series

Download Model and Data

hf download Qwen/Qwen3-4B --local-dir /root/Qwen3-4B
hf download --repo-type dataset zhuzilin/dapo-math-17k --local-dir /root/dapo-math-17k

Convert Weights (Megatron backend only)

source scripts/models/qwen3-4B.sh
PYTHONPATH=/root/Megatron-LM python tools/convert_hf_to_torch_dist.py \
    ${MODEL_ARGS[@]} \
    --hf-checkpoint /root/Qwen3-4B \
    --save /root/Qwen3-4B_torch_dist

Run Training
```
bash scripts/run-qwen3-4B.sh
```

For Experienced Users

When the user needs specific functionality:

Multi-turn/tool calling: Read the Search-R1 section of references/examples_reference.md
Custom reward models: See the custom RM pattern in references/examples_reference.md
FSDP instead of Megatron: Use --train-backend fsdp and skip weight conversion
Large-scale training: See the multi-node examples (GLM-4.5, DeepSeek-R1)
Source code exploration: Check references/source_code_reference.md

Documentation Navigation

SLIME has extensive documentation. Use this guide to find what you need quickly.

Essential Documentation (Read These First)

Quick Start Guide: docs/en/get_started/quick_start.md - Setup and first training run
Usage Guide: docs/en/get_started/usage.md - Comprehensive parameter reference
Example Docs: docs/en/examples/qwen3-4B.md or docs/en/examples/glm4-9B.md

For detailed navigation of all documentation, see references/doc_navigation.md.

Common Tasks → Documentation Mapping

| Task | Documentation |
| --- | --- |
| First-time setup | `docs/en/get_started/quick_start.md` |
| Understanding parameters | `docs/en/get_started/usage.md` |
| Basic training (8 GPUs) | `docs/en/examples/qwen3-4B.md` |
| Multi-turn tool use | `examples/search-r1/` |
| Custom generation logic | `docs/en/get_started/customization.md` |
| Multi-node training | `docs/en/examples/glm4.5-355B-A32B.md` |
| FSDP backend | `docs/en/get_started/usage.md` (FSDP section) |
| VLM training | `examples/geo3k_vlm/` |
| Troubleshooting | `docs/en/get_started/qa.md` |

Core Concepts

Training Loop

SLIME uses a "Rollout → Train" loop:

1. Rollout: Generate responses using SGLang inference
2. Reward: Compute rewards using the reward model
3. Train: Update model weights using Megatron/FSDP
4. Repeat for --num-rollout iterations

Key Constraint

rollout-batch-size × n-samples-per-prompt = global-batch-size × num-steps-per-rollout
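
As a worked check of this constraint, the sketch below verifies that the number of samples generated per rollout equals the number consumed by training. The helper names `check_batch_constraint` and `required_global_batch_size` are hypothetical and not part of SLIME; the example values (32 prompts, 8 samples per prompt) are taken from the rollout settings shown later in this guide.

```python
def check_batch_constraint(rollout_batch_size: int, n_samples_per_prompt: int,
                           global_batch_size: int, num_steps_per_rollout: int = 1) -> bool:
    """Return True when samples generated per rollout equal samples consumed by training."""
    generated = rollout_batch_size * n_samples_per_prompt
    consumed = global_batch_size * num_steps_per_rollout
    return generated == consumed


def required_global_batch_size(rollout_batch_size: int, n_samples_per_prompt: int,
                               num_steps_per_rollout: int = 1) -> int:
    """Solve the constraint for global-batch-size given the other three values."""
    generated = rollout_batch_size * n_samples_per_prompt
    if generated % num_steps_per_rollout != 0:
        raise ValueError("generated samples must divide evenly across optimizer steps")
    return generated // num_steps_per_rollout


print(check_batch_constraint(32, 8, 256, 1))   # True: 32 * 8 == 256 * 1
print(required_global_batch_size(32, 8, 2))    # 128: two optimizer steps of 128 samples
```

With 32 prompts and 8 samples each, one rollout yields 256 samples, so a single optimizer step per rollout needs a global batch size of 256, and two steps need 128.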

Resource Allocation Modes

Colocated (training and inference share GPUs):

--actor-num-nodes 1 \
--actor-num-gpus-per-node 8 \
--colocate \
--sglang-mem-fraction-static 0.7

Disaggregated (separate GPUs for training/inference):

--actor-num-nodes 1 \
--actor-num-gpus-per-node 4 \
--rollout-num-gpus 4
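
To compare the two modes, the following sketch computes how many GPUs each flag combination implies. It assumes the total is simply the training GPUs plus any dedicated rollout GPUs, with no extra GPUs needed when colocated; `total_gpus` is a hypothetical helper, not a SLIME function.

```python
def total_gpus(actor_num_nodes: int, actor_num_gpus_per_node: int,
               rollout_num_gpus: int = 0, colocate: bool = False) -> int:
    """GPU count implied by the flags, under the assumption stated above."""
    training = actor_num_nodes * actor_num_gpus_per_node
    if colocate:
        # Inference shares the training GPUs, so nothing is added.
        return training
    # Disaggregated: rollout engines get their own GPUs.
    return training + rollout_num_gpus


print(total_gpus(1, 8, colocate=True))        # 8
print(total_gpus(1, 4, rollout_num_gpus=4))   # 8
```

Both examples above fit on a single 8-GPU node; they differ in whether the 8 GPUs are shared or split 4/4.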

Parameter Quick Reference

Essential Parameters

Model Loading:

--hf-checkpoint: HuggingFace model path (for SGLang and FSDP)
--ref-load: Megatron reference model checkpoint
--load: Megatron actor checkpoint (resume training)
--save: Save path for checkpoints

Data:

--prompt-data: JSONL dataset path
--input-key: Field name for prompts (default: "prompt")
--label-key: Field name for labels (default: "label")
--metadata-key: Field name for metadata (default: "metadata")
--apply-chat-template: Apply tokenizer chat template
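
For illustration, a minimal sketch of producing a JSONL file whose field names match the default `--input-key` and `--label-key` values listed above. The records, the file name `toy_prompts.jsonl`, and the helpers `write_jsonl` and `read_jsonl` are made up for this example.

```python
import json
import tempfile
from pathlib import Path

# Toy records using the default field names "prompt" and "label".
records = [
    {"prompt": "What is 2 + 3?", "label": "5"},
    {"prompt": "What is 7 * 6?", "label": "42"},
]


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


path = Path(tempfile.mkdtemp()) / "toy_prompts.jsonl"
write_jsonl(path, records)
loaded = read_jsonl(path)
print(loaded[0]["prompt"], "->", loaded[0]["label"])
```

The resulting path is what would be passed to `--prompt-data`; datasets with different field names would need `--input-key` and `--label-key` set accordingly.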

Rollout:

--rollout-batch-size: Prompts per rollout
--n-samples-per-prompt: Responses per prompt
--rollout-max-response-len: Max response length
--rollout-temperature: Sampling temperature

Training:

--num-rollout: Total training iterations
--num-steps-per-rollout: Optimizer steps per rollout (default: 1)
--global-batch-size: Samples per optimizer step
--advantage-estimator: RL algorithm (grpo, gspo, ppo, reinforce_plus_plus)

Reward Model:

--rm-type: Built-in RM type (e.g., "deepscaler")
--custom-rm-path: Custom RM function path

Backends:

--train-backend: Training backend (megatron or fsdp)
--rollout-num-gpus-per-engine: GPUs per SGLang engine (like tp_size)

For complete parameter reference, see docs/en/get_started/usage.md.

Common Workflows

1. Standard Single-Turn Training

Use example scripts as templates:

scripts/run-qwen3-4B.sh: Basic 8xH100 setup
scripts/run-glm4-9B.sh: With dynamic sampling

Key sections in script:

# Load model config
source scripts/models/qwen3-4B.sh

# Configure checkpoints
CKPT_ARGS=(--hf-checkpoint /root/Qwen3-4B ...)

# Configure rollout
ROLLOUT_ARGS=(
  --rollout-batch-size 32
  --n-samples-per-prompt 8
  --rm-type deepscaler
)

# Configure algorithm
GRPO_ARGS=(--advantage-estimator grpo ...)

# Run training
ray job submit ... -- python3 train.py \
  ${MODEL_ARGS[@]} ${CKPT_ARGS[@]} ${ROLLOUT_ARGS[@]} ...

2. Multi-Turn Tool Calling

For multi-turn scenarios (like Search-R1):

Prepare Data with metadata:

{
  "question": "User query",
  "final_answer": "Expected answer",
  "metadata": "{\"session_id\": \"123\", \"tool_code\": \"...\"}"
}
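
Note that `metadata` in the record above is a JSON-encoded string rather than a nested object. A small sketch of building and decoding such a record with the standard `json` module follows; the field values are the placeholders from the example, and how SLIME itself parses the field is not shown here.

```python
import json

# Per-sample metadata, kept as a dict while building the record.
metadata = {"session_id": "123", "tool_code": "..."}

record = {
    "question": "User query",
    "final_answer": "Expected answer",
    "metadata": json.dumps(metadata),  # nested JSON stored as a string
}

# One line of the JSONL dataset.
line = json.dumps(record)

# Decoding requires two passes: the record first, then the metadata string.
decoded = json.loads(json.loads(line)["metadata"])
print(decoded["session_id"])  # 123
```

With field names like these, `--input-key` and `--label-key` would presumably need to point at `question` and `final_answer` instead of the defaults.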

Implement Custom Generation Function:

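from slime.utils.types import Sample

# call_sglang, execute_tool, tokenize, and max_turns are placeholders to implement yourself;
# call_sglang is assumed here to return the parsed action and the generated token ids.
async def generate(args, sample: Sample, sampling_params) -> Sample:
    prompt_tokens = tokenize(sample.prompt)
    response_tokens, loss_mask = [], []

    for turn in range(max_turns):
        # Generate action
        action, model_tokens = await call_sglang(...)
        response_tokens += model_tokens
        loss_mask += [1] * len(model_tokens)  # Train on actions

        if action == "answer":
            break

        # Execute tool
        tool_tokens = await execute_tool(...)
        response_tokens += tool_tokens
        loss_mask += [0] * len(tool_tokens)  # Mask tool outputs

    sample.tokens = prompt_tokens + response_tokens
    sample.response_length = len(response_tokens)
    sample.loss_mask = loss_mask
    return sample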
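
To make the loss-mask bookkeeping concrete, here is a self-contained toy version that runs without SLIME or SGLang. `ToySample`, `fake_model`, and `fake_tool` are hypothetical stand-ins (the real `Sample` type lives in slime/utils/types.py), and the token ids are arbitrary. This sketch stops as soon as the model answers, before calling the tool again.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ToySample:
    """Stand-in for SLIME's Sample; only the fields used in this sketch."""
    prompt_tokens: list
    tokens: list = field(default_factory=list)
    loss_mask: list = field(default_factory=list)
    response_length: int = 0


async def fake_model(turn: int):
    """Hypothetical policy: search on the first turn, then answer."""
    return ("search", [101, 102]) if turn == 0 else ("answer", [201, 202, 203])


async def fake_tool():
    """Hypothetical tool returning observation token ids."""
    return [900, 901, 902, 903]


async def generate(sample: ToySample, max_turns: int = 4) -> ToySample:
    response_tokens, loss_mask = [], []
    for turn in range(max_turns):
        action, model_tokens = await fake_model(turn)
        response_tokens += model_tokens
        loss_mask += [1] * len(model_tokens)   # train on model-generated tokens
        if action == "answer":
            break
        tool_tokens = await fake_tool()
        response_tokens += tool_tokens
        loss_mask += [0] * len(tool_tokens)    # mask tool output out of the loss
    sample.tokens = sample.prompt_tokens + response_tokens
    sample.response_length = len(response_tokens)
    sample.loss_mask = loss_mask
    return sample


sample = asyncio.run(generate(ToySample(prompt_tokens=[1, 2, 3])))
print(sample.loss_mask)  # [1, 1, 0, 0, 0, 0, 1, 1, 1]
```

The mask has exactly one entry per response token, so `len(sample.loss_mask) == sample.response_length` holds after every turn.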

Configure Custom Functions:

--custom-generate-function-path my_module.generate \
--custom-rm-path my_module.reward_func \
--metadata-key metadata

See examples/search-r1/ for a complete example.
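
The function paths above use dotted notation, where the last component is the function name and everything before it is the module. The sketch below shows one common way such a path can be resolved with `importlib`; `resolve_dotted_path` is a hypothetical helper written for this guide, and it is an assumption that SLIME resolves paths in a comparable way.

```python
import importlib


def resolve_dotted_path(path: str):
    """Split 'package.module.func' at the last dot and import the attribute."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Demonstrated with a standard-library function instead of a SLIME module.
dumps = resolve_dotted_path("json.dumps")
print(dumps({"ok": True}))  # {"ok": true}
```

Under this scheme, `my_module.generate` means "the `generate` function in `my_module`", so the module must be importable from `PYTHONPATH` on every node.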

3. Dynamic Sampling (DAPO-style)

Filter low-quality samples during generation:

ROLLOUT_ARGS+=(
  --over-sampling-batch-size 64
  --rollout-batch-size 32
  --dynamic-sampling-filter-path slime.rollout.filter_hub.dynamic_sampling_filters.check_reward_nonzero_std
)

How it works:

Samples 64 prompts (over-sampling)
Filters groups based on reward diversity
Keeps only the 32 prompts × 8 samples that pass the filter
Automatically resamples if too many groups are filtered out
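
The steps above can be sketched as a toy loop. This is an illustration of the over-sample, filter, and resample idea only, not SLIME's implementation: `reward_std_nonzero`, `dynamic_sample`, and `draw_group` are hypothetical names, and the rewards are simulated. Groups whose rewards are all identical are dropped, on the reasoning that group-relative advantages carry no signal when every sample scores the same.

```python
import random
import statistics


def reward_std_nonzero(rewards) -> bool:
    """Keep a group only if its rewards are not all identical."""
    return statistics.pstdev(rewards) > 0


def dynamic_sample(draw_group, target_groups: int, over_sampling_batch_size: int):
    """Over-sample groups, keep those passing the filter, resample until enough remain."""
    kept = []
    while len(kept) < target_groups:
        batch = [draw_group() for _ in range(over_sampling_batch_size)]
        kept.extend(group for group in batch if reward_std_nonzero(group))
    return kept[:target_groups]


rng = random.Random(0)


def draw_group(n_samples: int = 8):
    """Simulated prompt: always wrong, always right, or mixed (0/1 rewards)."""
    p_correct = rng.choice([0.0, 0.5, 1.0])
    return [1.0 if rng.random() < p_correct else 0.0 for _ in range(n_samples)]


groups = dynamic_sample(draw_group, target_groups=32, over_sampling_batch_size=64)
print(len(groups), all(reward_std_nonzero(g) for g in groups))  # 32 True
```

The simulated prompts that are always solved or never solved produce zero-variance groups and are discarded, which is why the batch is over-sampled in the first place.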

4. FSDP Backend (No Weight Conversion)

--train-backend fsdp \
--hf-checkpoint /root/Qwen3-4B \
--gradient-checkpointing \
--context-parallel-size 2

Benefits:

No HF → Megatron weight conversion needed
Directly load HuggingFace checkpoints
Simpler setup for supported models

See examples/geo3k_vlm/ and the FSDP section of docs/en/get_started/usage.md.

5. Multi-Node Training

Start Ray cluster:

# Head node
ray start --head --node-ip-address ${MASTER_ADDR} --num-gpus 8

# Worker nodes
ray start --address=${MASTER_ADDR}:6379 --num-gpus 8

Submit job:

ray job submit --address="http://127.0.0.1:8265" \
  --runtime-env-json='{"env_vars": {"PYTHONPATH": "/root/Megatron-LM/"}}' \
  -- python3 train.py \
  --actor-num-nodes 8 \
  --actor-num-gpus-per-node 8 \
  ...

See docs/en/examples/glm4.5-355B-A32B.md for a large-scale example.
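
Hand-writing the `--runtime-env-json` value is error-prone once more environment variables are added. As a sketch, the snippet below builds the same JSON with the standard library and prints a shell-safe command line; only the flags shown in the example above are included, and the remaining training arguments are left out.

```python
import json
import shlex

# Environment variables forwarded to the Ray job (same value as in the example above).
env_vars = {"PYTHONPATH": "/root/Megatron-LM/"}
runtime_env_json = json.dumps({"env_vars": env_vars})

command = [
    "ray", "job", "submit",
    "--address=http://127.0.0.1:8265",
    f"--runtime-env-json={runtime_env_json}",
    "--", "python3", "train.py",
    "--actor-num-nodes", "8",
    "--actor-num-gpus-per-node", "8",
]

# shlex.join quotes the JSON argument so it survives shell parsing.
print(shlex.join(command))
```

Additional variables can be added to `env_vars` before serializing, which keeps the quoting correct without manual escaping.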

Customization Guide

Custom Reward Model

Implement async function:

from slime.utils.types import Sample

async def my_reward_func(args, sample: Sample, **kwargs) -> float:
    # Fields available on the sample
    prompt = sample.prompt
    response = sample.response
    label = sample.label

    # Compute the reward; compute_score is a placeholder for your own scoring logic
    reward = compute_score(response, label)
    return float(reward)

Use with: --custom-rm-path module.path.my_reward_func
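
As a concrete instance of this pattern, here is a minimal exact-match reward that can be run on its own. `ToySample` is a hypothetical stand-in for SLIME's `Sample`, and `extract_final_answer` is a deliberately naive parser invented for this example; a real reward function would need more robust answer extraction.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToySample:
    """Stand-in for SLIME's Sample with only the fields read below."""
    prompt: str
    response: str
    label: str


def extract_final_answer(text: str) -> str:
    """Naive parser: last whitespace-separated token, minus trailing punctuation."""
    tokens = text.strip().split()
    return tokens[-1].strip(".,!") if tokens else ""


async def exact_match_reward(args, sample, **kwargs) -> float:
    """Return 1.0 when the extracted answer equals the label, else 0.0."""
    return 1.0 if extract_final_answer(sample.response) == sample.label.strip() else 0.0


sample = ToySample(prompt="What is 7 * 6?", response="7 * 6 = 42.", label="42")
print(asyncio.run(exact_match_reward(None, sample)))  # 1.0
```

Calling the function directly with `asyncio.run` like this is a quick way to unit-test reward logic before launching a training job.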

Custom Generation Function

Implement async function:

from slime.utils.http_utils import post
from slime.utils.processing_utils import load_tokenizer
from slime.utils.types import Sample

async def my_generate(args, sample: Sample, sampling_params) -> Sample:
    # Load the tokenizer that matches the rollout model
    tokenizer = load_tokenizer(args.hf_checkpoint, trust_remote_code=True)

    # Generate a response through the SGLang router (or substitute custom logic)
    output = await post(
        f"http://{args.sglang_router_ip}:{args.sglang_router_port}/generate",
        {"text": sample.prompt, "sampling_params": sampling_params},
    )

    # Tokenize prompt and response so the full sequence can be stored
    prompt_tokens = tokenizer(sample.prompt, add_special_tokens=False)["input_ids"]
    response_tokens = tokenizer(output["text"], add_special_tokens=False)["input_ids"]

    # Set sample fields
    sample.tokens = prompt_tokens + response_tokens
    sample.response_length = len(response_tokens)
    sample.response = output["text"]
    # Flag responses that were cut off by the length limit
    sample.truncated = output["meta_info"]["finish_reason"]["type"] == "length"

    return sample

Use with: --custom-generate-function-path module.path.my_generate

Custom Dynamic Filter

Implement filter function:

def my_filter(args, samples: list[Sample], **kwargs) -> bool:
    # Return True to keep samples, False to discard
    return all(sample.reward > 0.5 for sample in samples)

Use with: --dynamic-sampling-filter-path module.path.my_filter

Examples Reference

For detailed examples and patterns, see references/examples_reference.md.

Quick finder:

Basic math training: scripts/run-qwen3-4B.sh
Multi-turn tool use: examples/search-r1/
Vision-language RL: examples/geo3k_vlm/
Large-scale MOE: docs/en/examples/glm4.5-355B-A32B.md
Custom generation: examples/search-r1/search_r1_logic.py
FSDP backend: examples/geo3k_vlm/

Source Code Reference

For source code exploration, see references/source_code_reference.md.

Key files:

Arguments: slime/utils/arguments.py
Rollout: slime/rollout/sglang_rollout.py
Sample type: slime/utils/types.py
Reward models: slime/rollout/rm_hub/
Conversion tools: tools/convert_hf_to_torch_dist.py

Troubleshooting

Common Issues

OOM during colocated training:

Reduce --sglang-mem-fraction-static (try 0.7 or 0.6)
Reduce --max-tokens-per-gpu
Enable gradient checkpointing: --recompute-granularity full

Mismatched batch sizes:

Ensure: rollout-batch-size × n-samples-per-prompt = global-batch-size × num-steps-per-rollout

Weight conversion errors:

Check model config matches exactly (e.g., --rotary-base)
Use FSDP backend to skip conversion: --train-backend fsdp

Multi-node communication issues:

Set environment variables: GLOO_SOCKET_IFNAME, NCCL_SOCKET_IFNAME
See the multi-node section of docs/en/get_started/quick_start.md

SGLang concurrency issues:

Limit concurrency: --sglang-server-concurrency 160
Increase CUDA graphs: --sglang-cuda-graph-bs 1 2 4 8 $(seq 16 8 256)
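
The `$(seq 16 8 256)` shell substitution in the flag above expands to every batch size from 16 to 256 in steps of 8. As a quick check of what ends up on the command line, the sketch below reproduces the expansion; `seq` here is a small Python helper mimicking the shell utility for positive steps only.

```python
def seq(first: int, step: int, last: int) -> list:
    """Mimic `seq first step last` for positive steps (inclusive upper bound)."""
    return list(range(first, last + 1, step))


cuda_graph_bs = [1, 2, 4, 8] + seq(16, 8, 256)
print(len(cuda_graph_bs))                      # 35
print(cuda_graph_bs[:8], cuda_graph_bs[-1])    # [1, 2, 4, 8, 16, 24, 32, 40] 256
```

So the flag receives 35 batch sizes in total: four small ones listed explicitly plus 31 generated by `seq`.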

For more troubleshooting, see docs/en/get_started/qa.md.

Additional Resources

Reference Files

Doc Navigation: references/doc_navigation.md - Find documentation quickly
Examples Reference: references/examples_reference.md - Example scripts and patterns
Source Code Reference: references/source_code_reference.md - Code structure and key functions

External Links

GitHub Repository: https://github.com/THUDM/slime
Docker Image: slimerl/slime:latest
Megatron-LM: https://github.com/NVIDIA/Megatron-LM
SGLang: https://github.com/sgl-project/sglang