training-campaign

Updated: February 12, 2026, 10:01

Execute and monitor long-running RL training campaigns. Progress tracking, checkpoint management, experiment logging, and resume capabilities.

Source: mzqef/MotrixLab

Purpose

Long-running training management for VBot navigation:

Execute multi-day training campaigns
Checkpoint registry and resume
Structured experiment logging
Progress monitoring and alerts

🔴 AutoML-First Policy (MANDATORY): NEVER use train.py for parameter search or reward hypothesis testing. ALWAYS use automl.py for batch search. See .github/copilot-instructions.md for the full policy. train.py is ONLY for: smoke tests (<500K steps), --render visual debug, or final deployment runs.

Operational Guardrails:

The AutoML pipeline is tested and working. Do NOT re-read automl.py, train_one.py, or evaluate.py before launching.

When asked to start/resume training, use the commands below directly.

The pipeline handles import ordering, JSON serialization, and subprocess management internally.

Related Skills:

training-pipeline — Hub with Quick Start commands (start here)

curriculum-learning — Define curriculum plans

hyperparameter-optimization — Search configurations

reward-penalty-engineering — Reward exploration methodology

When to Use

Task	Use This
Start training campaign	✅
Resume interrupted run	✅
Monitor progress	✅
Checkpoint management	✅
Review past experiments	✅ (see Step 0 below)
Design rewards	❌ Use `reward-penalty-engineering`

⚠️ Step 0: Review Before Starting

ALWAYS review existing experiments before starting new training. See training-pipeline skill → "Step 0: Review Experiment History" for the full checklist.

Quick Review Commands

# Quick review: what training exists?
Get-ChildItem starter_kit_log/automl_* -Directory | Select-Object Name
Get-ChildItem runs/<env-name>/ -Directory | Sort-Object Name -Descending | Select-Object -First 5

# Check training progress of latest run
uv run starter_kit_schedule/scripts/check_training.py
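
Where PowerShell is not available, the same review can be approximated in Python. This is a minimal sketch, not part of the repository: the helper names `latest_runs` and `automl_campaigns` are hypothetical, and it assumes run folder names sort chronologically, as the descending name sort in the command above implies.

```python
from pathlib import Path


def latest_runs(env_dir, limit=5):
    """Return the names of the newest run directories under env_dir.

    Assumption: folder names begin with a sortable timestamp, so a
    descending sort by name puts the most recent run first.
    """
    root = Path(env_dir)
    if not root.is_dir():
        return []  # nothing has been trained for this environment yet
    names = sorted((p.name for p in root.iterdir() if p.is_dir()), reverse=True)
    return names[:limit]


def automl_campaigns(log_dir="starter_kit_log"):
    """Return the names of automl_* campaign folders in ascending name order."""
    root = Path(log_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("automl_*") if p.is_dir())
```

Calling `latest_runs("runs/<env-name>")` with the real environment name mirrors the second PowerShell command; both helpers return an empty list when the directory does not exist.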

Commands

Start Training

# === PRIMARY: AutoML pipeline (USE THIS for all parameter exploration) ===
uv run starter_kit_schedule/scripts/automl.py `
    --mode stage `
    --budget-hours 8 `
    --hp-trials 15

# === SMOKE TEST ONLY (<500K steps, verify the code runs end to end) ===
uv run scripts/train.py --env <env-name> --train-backend torch --max-env-steps 200000

# === VISUAL DEBUGGING ONLY ===
uv run scripts/train.py --env <env-name> --render

# === FINAL DEPLOYMENT RUN (after AutoML found best config) ===
uv run scripts/train.py --env <env-name> --train-backend torch

Monitor Progress

# Check AutoML state
Get-Content starter_kit_schedule/progress/automl_state.yaml

# TensorBoard (opens web dashboard)
uv run tensorboard --logdir runs/<env-name>

# List checkpoints
Get-ChildItem runs/<env-name>/ -Recurse -Filter "*.pt"
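
To pick a checkpoint programmatically rather than by eye, the recursive `*.pt` listing above can be reduced to the most recently written file. The sketch below is an illustration only: `newest_checkpoint` is a hypothetical helper, and it assumes file modification time is a good proxy for training progress.

```python
from pathlib import Path


def newest_checkpoint(env_dir):
    """Return the most recently modified *.pt file under env_dir, or None."""
    root = Path(env_dir)
    if not root.is_dir():
        return None
    checkpoints = list(root.rglob("*.pt"))  # same recursive filter as the command above
    if not checkpoints:
        return None
    return max(checkpoints, key=lambda p: p.stat().st_mtime)
```

The returned path can be passed to `scripts/play.py` through `--policy`.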

Evaluate

# Play latest checkpoint (visual)
uv run scripts/play.py --env <env-name>

# Play specific checkpoint
uv run scripts/play.py --env <env-name> `
    --policy runs/<env-name>/<run_dir>/checkpoints/agent.pt

Directory Structure

starter_kit_schedule/
├── templates/                 # All YAML templates & config references
│   ├── automl_config.yaml     # AutoML configuration template
│   ├── config_template.yaml   # Individual training config
│   ├── curriculum_plan_template.yaml
│   ├── plan_template.yaml
│   ├── reward_config_template.yaml
│   └── search_space_template.yaml
├── scripts/
│   ├── automl.py              # AutoML search engine (entry point)
│   ├── train_one.py           # Single trial subprocess
│   ├── evaluate.py            # Read TensorBoard for metrics
│   ├── monitor_training.py    # Training monitor & TB analyzer
│   ├── eval_checkpoint.py     # Checkpoint evaluator & ranker
│   ├── smoke_test.py          # Smoke test & reward budget auditor
│   ├── check_training.py      # Quick training progress checker
│   └── progress_watcher.py    # Generates WAKE_UP.md for agent context
├── progress/
│   ├── automl_state.yaml      # AutoML search state (primary tracking file)
│   └── WAKE_UP.md             # Generated by progress_watcher for agent context
├── checkpoints/
│   └── registry.yaml          # All checkpoints index
└── reward_library/            # Archived reward/penalty components

starter_kit_log/
└── <automl_id>/               # Self-contained per-run folder
    ├── configs/               # HP + reward configs per trial
    ├── experiments/           # Per-experiment summaries
    ├── index.yaml             # Run-level index
    └── state.yaml             # AutoML state snapshot

runs/                          # Training outputs
└── <env-name>/
    └── <timestamp>_PPO/
        ├── checkpoints/       # Policy checkpoints
        ├── events.out.tfevents.*  # TensorBoard logs
        └── experiment_meta.json   # HP config snapshot

AutoML Pipeline Architecture

The AutoML pipeline runs as a single parent process that spawns one subprocess per trial:

run.py (entry point, sets --env <env-name>)
  └── automl.py (HP search engine)
       ├── sample_from_space() → HP config (native Python types)
       ├── _train_and_eval() → spawns subprocess:
       │    └── train_one.py (imports vbot FIRST, then motrix_rl)
       │         └── Trainer(env_name, cfg_override=rl_overrides).train()
       ├── evaluate.py → reads TensorBoard event files
       │    └── Returns: final_reward, max_reward, distance_to_target
       └── Saves state to: starter_kit_schedule/progress/automl_state.yaml
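
The sample, train, evaluate, save loop in the diagram can be sketched as follows. This is not the code in `automl.py`: the search space, the stand-in trial process, and the state layout are all assumptions made for illustration. It shows why sampling native Python types matters (the config must survive `json.dumps`) and how each trial is isolated in its own subprocess.

```python
import json
import random
import subprocess
import sys

# Hypothetical search space; the real one is defined by the project's templates.
SEARCH_SPACE = {
    "learning_rate": ("log_uniform", 1e-5, 1e-3),
    "mini_batches": ("choice", [4, 8, 16]),
}


def sample_from_space(space, rng):
    """Draw one config using only native Python types, so json.dumps works."""
    config = {}
    for name, spec in space.items():
        if spec[0] == "log_uniform":
            low, high = spec[1], spec[2]
            config[name] = float(low * (high / low) ** rng.random())
        elif spec[0] == "choice":
            config[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unknown spec kind: {spec[0]}")
    return config


# Stand-in for train_one.py: a child process that reads a JSON config from
# its first argument and prints a JSON result containing a fake reward.
FAKE_TRIAL = (
    "import json, sys; cfg = json.loads(sys.argv[1]); "
    "print(json.dumps({'final_reward': -abs(cfg['learning_rate'] - 3e-4)}))"
)


def train_and_eval(config):
    """Run one trial in a subprocess and return its parsed metrics."""
    done = subprocess.run(
        [sys.executable, "-c", FAKE_TRIAL, json.dumps(config)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(done.stdout)


def search(n_trials, seed=0):
    """Sample, train, evaluate, and track the best trial seen so far."""
    rng = random.Random(seed)
    state = {"trials": [], "best": None}
    for _ in range(n_trials):
        config = sample_from_space(SEARCH_SPACE, rng)
        metrics = train_and_eval(config)
        trial = {"config": config, "metrics": metrics}
        state["trials"].append(trial)
        best = state["best"]
        if best is None or metrics["final_reward"] > best["metrics"]["final_reward"]:
            state["best"] = trial
    return state
```

Running each trial in a child process means a crash or out-of-memory error in one trial does not take down the search loop, and import order inside the trial cannot be disturbed by modules the parent has already loaded.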

Expected Training Times

Hardware	50M Steps	100M Steps
RTX 3090	~4 hours	~8 hours
RTX 4090	~2.5 hours	~5 hours
A100	~1.5 hours	~3 hours
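
The figures in the table scale linearly with step count (100M steps take twice as long as 50M), so rough estimates for other run lengths follow by proportion. The helper names below are hypothetical, and the linear scaling is an assumption that ignores startup, evaluation, and checkpointing overhead.

```python
# Approximate hours per 50M steps, copied from the table above.
HOURS_PER_50M = {"RTX 3090": 4.0, "RTX 4090": 2.5, "A100": 1.5}


def estimate_hours(gpu, env_steps):
    """Linearly scale the table's 50M-step figure to env_steps."""
    return HOURS_PER_50M[gpu] * env_steps / 50_000_000


def steps_within_budget(gpu, budget_hours):
    """Approximate number of environment steps that fit in budget_hours."""
    return int(budget_hours / HOURS_PER_50M[gpu] * 50_000_000)
```

For example, `steps_within_budget("RTX 3090", 8)` gives roughly 100M steps in total, which is the amount an 8-hour budget has to share across all of its trials on that card.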

Troubleshooting

Issue	Solution
Training stuck	Check GPU memory, reduce `num_envs`
OOM error	Reduce `num_envs` or `mini_batches`
Resume fails	Check `current_run.yaml` for last checkpoint
Metrics missing	Check `metrics.jsonl` write permissions
Lazy robot in long training runs	Anti-laziness mechanisms are disabled or `arrival_bonus` is too small. See the Lazy Robot case study in `reward-penalty-engineering`
Reward looks good but robot is not navigating	Check the distance and reached% metrics, not just reward. High reward can come from `alive_bonus` accumulation
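
The last row can be turned into an automatic check on evaluation results. The sketch below reuses the `final_reward` and `distance_to_target` names that `evaluate.py` is described as returning; the function name and both thresholds are hypothetical and would need tuning per environment.

```python
def reward_without_navigation(metrics, reward_floor, max_distance):
    """Flag a trial whose reward is high while the robot is still far from the target.

    reward_floor and max_distance are illustrative thresholds, not project defaults.
    """
    return (
        metrics["final_reward"] >= reward_floor
        and metrics["distance_to_target"] > max_distance
    )
```

A trial that trips this check is a candidate for the `alive_bonus` accumulation problem and should not be promoted on reward alone.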

Best Practices

Checkpoint every 500-1000 iterations - Training can be interrupted at any time
Use separate log directories - One per experiment
Monitor GPU memory - Set alerts at 90% usage
Version control configs - Store templates in templates/
Back up best checkpoints - Before advancing stages
Use --resume liberally - Don't restart from scratch
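
The 90% GPU memory alert can be approximated by polling `nvidia-smi`. This is a sketch under stated assumptions: the function names are hypothetical, the threshold simply mirrors the guideline above, and `query_gpu_memory` only works on a machine with an NVIDIA driver installed.

```python
import subprocess

# nvidia-smi query that prints one "used, total" line (in MiB) per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_usage(csv_text):
    """Turn 'used, total' lines into a list of usage fractions, one per GPU."""
    fractions = []
    for line in csv_text.strip().splitlines():
        used, total = (float(x) for x in line.split(","))
        fractions.append(used / total)
    return fractions


def gpus_over_threshold(csv_text, threshold=0.90):
    """Return the indices of GPUs whose memory usage is at or above threshold."""
    return [i for i, f in enumerate(parse_memory_usage(csv_text)) if f >= threshold]


def query_gpu_memory():
    """Call nvidia-smi and return its raw CSV output."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
```

Calling `gpus_over_threshold(query_gpu_memory())` on a schedule gives an early warning before an OOM error, at which point reducing `num_envs` is the first remedy listed under Troubleshooting.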
