training-campaign
Execute and monitor long-running RL training campaigns. Progress tracking, checkpoint management, experiment logging, and resume capabilities.
| Field | Value |
|---|---|
| name | training-campaign |
| description | Execute and monitor long-running RL training campaigns. Progress tracking, checkpoint management, experiment logging, and resume capabilities. |
Long-running training management for VBot navigation:
🔴 AutoML-First Policy (MANDATORY): NEVER use `train.py` for parameter search or reward hypothesis testing. ALWAYS use `automl.py` for batch search. See `.github/copilot-instructions.md` for the full policy. `train.py` is ONLY for: smoke tests (<500K steps), `--render` visual debug, or final deployment runs.

Operational Guardrails:
- The AutoML pipeline is tested and working. Do NOT re-read `automl.py`, `train_one.py`, or `evaluate.py` before launching.
- When asked to start/resume training, use the commands below directly.
- The pipeline handles import ordering, JSON serialization, and subprocess management internally.
Related Skills:
- `training-pipeline`: Hub with Quick Start commands (start here)
- `curriculum-learning`: Define curriculum plans
- `hyperparameter-optimization`: Search configurations
- `reward-penalty-engineering`: Reward exploration methodology
| Task | Use This |
|---|---|
| Start training campaign | ✅ |
| Resume interrupted run | ✅ |
| Monitor progress | ✅ |
| Checkpoint management | ✅ |
| Review past experiments | ✅ (see Step 0 below) |
| Design rewards | ❌ Use reward-penalty-engineering |
Step 0: ALWAYS review existing experiments before starting new training. See the `training-pipeline` skill → "Step 0: Review Experiment History" for the full checklist.
# Quick review: what training exists?
Get-ChildItem starter_kit_log/automl_* -Directory | Select-Object Name
Get-ChildItem runs/<env-name>/ -Directory | Sort-Object Name -Descending | Select-Object -First 5
# Check training progress of latest run
uv run starter_kit_schedule/scripts/check_training.py
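For environments without PowerShell, the same quick review can be sketched in Python. This is a minimal sketch, not part of the pipeline: the function names `latest_run_dirs` and `automl_run_ids` are hypothetical, and it assumes run folders are named `<timestamp>_PPO` so that a descending name sort is also newest-first.

```python
from pathlib import Path


def latest_run_dirs(runs_root: Path, env_name: str, limit: int = 5) -> list:
    """Return the newest run folder names under <runs_root>/<env_name>/.

    Assumption: folders are named <timestamp>_PPO, so sorting names in
    descending order lists the most recent runs first.
    """
    env_dir = runs_root / env_name
    if not env_dir.is_dir():
        return []
    names = [p.name for p in env_dir.iterdir() if p.is_dir()]
    return sorted(names, reverse=True)[:limit]


def automl_run_ids(log_root: Path) -> list:
    """List AutoML run folders (automl_*) under the log root, sorted by name."""
    if not log_root.is_dir():
        return []
    return sorted(p.name for p in log_root.glob("automl_*") if p.is_dir())
```

Calling `latest_run_dirs(Path("runs"), "<env-name>")` and `automl_run_ids(Path("starter_kit_log"))` from the repository root mirrors the two `Get-ChildItem` commands above.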
# === PRIMARY: AutoML pipeline (USE THIS for all parameter exploration) ===
uv run starter_kit_schedule/scripts/automl.py `
--mode stage `
--budget-hours 8 `
--hp-trials 15
# === SMOKE TEST ONLY (<500K steps, verify code compiles) ===
uv run scripts/train.py --env <env-name> --train-backend torch --max-env-steps 200000
# === VISUAL DEBUGGING ONLY ===
uv run scripts/train.py --env <env-name> --render
# === FINAL DEPLOYMENT RUN (after AutoML found best config) ===
uv run scripts/train.py --env <env-name> --train-backend torch
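The four launch modes above can be encoded in a small helper that refuses to build a `train.py` command for anything the AutoML-first policy forbids. This is an illustrative sketch only: `build_command` and its `purpose` labels are hypothetical, and the flags are copied from the commands shown above rather than from the scripts themselves.

```python
SMOKE_TEST_MAX_STEPS = 500_000  # smoke tests must stay below this (policy above)

AUTOML = "starter_kit_schedule/scripts/automl.py"
TRAIN = "scripts/train.py"


def build_command(purpose, env_name, max_env_steps=None,
                  budget_hours=8, hp_trials=15):
    """Return the argv list for one of the allowed launch modes.

    purpose is a hypothetical label: "search", "smoke", "render" or "deploy".
    """
    if purpose == "search":
        # All parameter exploration goes through the AutoML pipeline.
        return ["uv", "run", AUTOML, "--mode", "stage",
                "--budget-hours", str(budget_hours),
                "--hp-trials", str(hp_trials)]

    train = ["uv", "run", TRAIN, "--env", env_name]
    if purpose == "smoke":
        steps = 200_000 if max_env_steps is None else max_env_steps
        if steps >= SMOKE_TEST_MAX_STEPS:
            raise ValueError("smoke tests must stay below 500K steps")
        return train + ["--train-backend", "torch",
                        "--max-env-steps", str(steps)]
    if purpose == "render":
        return train + ["--render"]
    if purpose == "deploy":
        return train + ["--train-backend", "torch"]
    raise ValueError(f"unknown purpose: {purpose!r}")
```

The returned list can be passed to `subprocess.run(...)`. Any purpose outside the four listed raises instead of silently falling back to `train.py`.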
# Check AutoML state
Get-Content starter_kit_schedule/progress/automl_state.yaml
# TensorBoard (opens web dashboard)
uv run tensorboard --logdir runs/<env-name>
# List checkpoints
Get-ChildItem runs/<env-name>/ -Recurse -Filter "*.pt"
# Play latest checkpoint (visual)
uv run scripts/play.py --env <env-name>
# Play specific checkpoint
uv run scripts/play.py --env <env-name> `
--policy runs/<env-name>/<run_dir>/checkpoints/agent.pt
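When a specific checkpoint path is needed for `--policy`, one way to pick it is to take the most recently modified `*.pt` file under the environment's run folder. The sketch below assumes file modification time is a good proxy for "latest"; `latest_checkpoint` is a hypothetical helper, not part of the starter kit.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(env_dir: Path) -> Optional[Path]:
    """Return the most recently modified *.pt file under env_dir, or None.

    Assumption: modification time reflects training order. This is not the
    pipeline's own checkpoint registry, just a filesystem scan.
    """
    if not env_dir.is_dir():
        return None
    checkpoints = list(env_dir.rglob("*.pt"))
    if not checkpoints:
        return None
    return max(checkpoints, key=lambda p: p.stat().st_mtime)
```

For example, `latest_checkpoint(Path("runs") / "<env-name>")` returns a path that can be passed to `play.py` via `--policy`.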
starter_kit_schedule/
├── templates/ # All YAML templates & config references
│ ├── automl_config.yaml # AutoML configuration template
│ ├── config_template.yaml # Individual training config
│ ├── curriculum_plan_template.yaml
│ ├── plan_template.yaml
│ ├── reward_config_template.yaml
│ └── search_space_template.yaml
├── scripts/
│ ├── automl.py # AutoML search engine (entry point)
│ ├── train_one.py # Single trial subprocess
│ ├── evaluate.py # Read TensorBoard for metrics
│ ├── monitor_training.py # Training monitor & TB analyzer
│ ├── eval_checkpoint.py # Checkpoint evaluator & ranker
│ ├── smoke_test.py # Smoke test & reward budget auditor
│ ├── check_training.py # Quick training progress checker
│ └── progress_watcher.py # Generates WAKE_UP.md for agent context
├── progress/
│ ├── automl_state.yaml # AutoML search state (primary tracking file)
│ └── WAKE_UP.md # Generated by progress_watcher for agent context
├── checkpoints/
│ └── registry.yaml # All checkpoints index
└── reward_library/ # Archived reward/penalty components
starter_kit_log/
└── <automl_id>/ # Self-contained per-run folder
├── configs/ # HP + reward configs per trial
├── experiments/ # Per-experiment summaries
├── index.yaml # Run-level index
└── state.yaml # AutoML state snapshot
runs/ # Training outputs
└── <env-name>/
└── <timestamp>_PPO/
├── checkpoints/ # Policy checkpoints
├── events.out.tfevents.* # TensorBoard logs
└── experiment_meta.json # HP config snapshot
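Because each run folder in the layout above carries an `experiment_meta.json` snapshot, past configurations can be collected with a short script. This is a minimal sketch: `load_run_metadata` is a hypothetical helper, and it makes no assumption about which keys the JSON contains.

```python
import json
from pathlib import Path


def load_run_metadata(env_dir: Path) -> dict:
    """Map run folder name -> parsed experiment_meta.json.

    Runs without a metadata file are skipped rather than treated as errors.
    """
    meta = {}
    if not env_dir.is_dir():
        return meta
    for run_dir in sorted(env_dir.iterdir()):
        meta_file = run_dir / "experiment_meta.json"
        if run_dir.is_dir() and meta_file.is_file():
            with meta_file.open(encoding="utf-8") as fh:
                meta[run_dir.name] = json.load(fh)
    return meta
```

Calling `load_run_metadata(Path("runs") / "<env-name>")` gives one dictionary per run, which is convenient for comparing configurations side by side before launching a new campaign.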
The AutoML pipeline runs as a single process that spawns subprocesses:
run.py (entry point, sets --env <env-name>)
└── automl.py (HP search engine)
├── sample_from_space() → HP config (native Python types)
├── _train_and_eval() → spawns subprocess:
│ └── train_one.py (imports vbot FIRST, then motrix_rl)
│ └── Trainer(env_name, cfg_override=rl_overrides).train()
├── evaluate.py → reads TensorBoard event files
│ └── Returns: final_reward, max_reward, distance_to_target
└── Saves state to: starter_kit_schedule/progress/automl_state.yaml
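The sample-then-spawn pattern in the diagram can be illustrated with a toy version. This is not the code of `automl.py` or `train_one.py`: the search space values, `sample_config`, `run_trial`, and the inline child program are all assumptions made for illustration. It only shows why sampling native Python types matters, since the config has to survive a JSON round trip into a fresh interpreter.

```python
import json
import random
import subprocess
import sys

# Hypothetical search space; the real one is defined by the pipeline.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "num_envs": [1024, 2048],
}

# Stand-in for the trial subprocess: it parses the JSON config from argv
# and echoes it back, where a real trial would train and report metrics.
CHILD_CODE = "import json, sys; print(json.dumps(json.loads(sys.argv[1])))"


def sample_config(space, rng):
    """Pick one value per key; plain Python types keep it JSON-serializable."""
    return {name: rng.choice(values) for name, values in space.items()}


def run_trial(config):
    """Run one trial in a separate interpreter and return what it reports."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, json.dumps(config)],
        capture_output=True,
        text=True,
        check=True,  # a crashed trial raises instead of being ignored
    )
    return json.loads(result.stdout)
```

Running each trial in its own process means a crash or memory leak in one trial does not take down the search loop, which is presumably why the pipeline is organized this way.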
| Hardware | 50M Steps | 100M Steps |
|---|---|---|
| RTX 3090 | ~4 hours | ~8 hours |
| RTX 4090 | ~2.5 hours | ~5 hours |
| A100 | ~1.5 hours | ~3 hours |
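The table implies a roughly constant throughput per GPU, which makes budget arithmetic straightforward. The sketch below derives steps per hour from the 50M-step column and estimates how many steps each trial gets if a time budget is split evenly across trials. The even split is an assumption for illustration, not a description of how `automl.py` allocates its budget.

```python
# Approximate hours per 50M steps, taken from the table above.
HOURS_PER_50M_STEPS = {"RTX 3090": 4.0, "RTX 4090": 2.5, "A100": 1.5}


def steps_per_hour(gpu):
    """Approximate environment steps per hour for a GPU in the table."""
    return 50_000_000 / HOURS_PER_50M_STEPS[gpu]


def estimated_hours(gpu, env_steps):
    """Estimated wall-clock hours to run env_steps on the given GPU."""
    return env_steps / steps_per_hour(gpu)


def steps_per_trial(gpu, budget_hours, trials):
    """Steps available per trial if the budget is split evenly (assumption)."""
    return int(steps_per_hour(gpu) * budget_hours / trials)
```

For example, on an RTX 3090 an 8-hour budget shared by 15 trials leaves about 6.7M steps per trial under this even-split assumption.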
| Issue | Solution |
|---|---|
| Training stuck | Check GPU memory, reduce `num_envs` |
| OOM error | Reduce `num_envs` or `mini_batches` |
| Resume fails | Check `current_run.yaml` for the last checkpoint |
| Metrics missing | Check write permissions on `metrics.jsonl` |
| Robot turns lazy in long training runs | Anti-laziness mechanisms are disabled or `arrival_bonus` is too small. See the Lazy Robot case study in `reward-penalty-engineering` |
| Reward looks good but robot is not navigating | Check the distance and reached% metrics, not just reward. High reward can come from `alive_bonus` accumulation |
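The last row of the table can be turned into an automatic check: a run only counts as healthy when reward, reached fraction, and distance agree. The sketch below is hypothetical throughout. `EvalSummary`, `diagnose`, and all thresholds are placeholders that would need tuning per environment.

```python
from dataclasses import dataclass


@dataclass
class EvalSummary:
    mean_reward: float
    reached_fraction: float    # share of episodes that reached the target, 0..1
    distance_to_target: float  # mean final distance to the target


def diagnose(summary, min_reward=100.0, min_reached=0.5, max_distance=1.0):
    """Classify a run using navigation metrics, not reward alone.

    All thresholds are illustrative placeholders.
    """
    if summary.mean_reward < min_reward:
        return "low reward"
    if (summary.reached_fraction < min_reached
            or summary.distance_to_target > max_distance):
        # High reward without reaching the target, e.g. accumulated alive_bonus.
        return "reward without navigation"
    return "ok"
```

A run with high reward but a reached fraction near zero is reported as `"reward without navigation"`, which is the signal to revisit the reward weights rather than train longer.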
Use `templates/` and `--resume` liberally; don't restart from scratch.