Source

mzqef

mzqef/MotrixLab

When to Use

| Task | Use This |
|------|----------|
| Design multi-stage curriculum | ✅ |
| Define stage progression criteria | ✅ |
| Configure warm-start transfers | ✅ |
| Single-stage training | ❌ Use training-campaign |

Recommended Progression Pattern

```
CURRICULUM STAGES

STAGE 1: Simplest Terrain
  └── Environment: <easiest-env>
  └── Steps: 50M (use AutoML pipeline)
  └── Goal: Basic locomotion + navigation
          ↓
  [Best Checkpoint]
          ↓
STAGE 2A: Intermediate Terrain
  └── Environment: <mid-difficulty-env>
  └── Steps: 30M
  └── Warm-start: Stage 1 best, LR × 0.5
          ↓
STAGE 2B: Hard Terrain
  └── Environment: <hard-env>
  └── Steps: 40M
  └── Warm-start: Stage 2A best, LR × 0.3
          ↓
FINAL: Full Course
  └── Environment: <full-course-env>
  └── Steps: 50M
  └── Goal: End-to-end navigation
```

Curriculum Plan Schema

```yaml
# starter_kit_schedule/templates/curriculum_full.yaml
plan_id: "curriculum_<task>_YYYYMMDD"
name: "<Task> Full Curriculum"

curriculum:
  - stage_id: "stage1_easy"
    environment: "<env-name>"
    max_env_steps: 50_000_000
    checkpoint_interval: 500
    warm_start: null  # Start fresh
    reward_overrides:
      position_tracking: 2.0
      heading_tracking: 1.0
      termination: -200.0
    promotion_criteria:
      metric: "episode_reward_mean"
      threshold: 30.0
      min_steps: 20_000_000
      success_rate: 0.95

  - stage_id: "stage2_medium"
    environment: "<env-name-2>"
    max_env_steps: 30_000_000
    warm_start:
      from_stage: "stage1_easy"
      strategy: "best_checkpoint"
      reset_optimizer: true
      learning_rate_multiplier: 0.5
    reward_overrides:
      position_tracking: 1.5  # Stage-specific overrides
    promotion_criteria:
      metric: "episode_reward_mean"
      threshold: 25.0
      success_rate: 0.85
```
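A plan like the one above can be sanity-checked before any training budget is spent. The following is a minimal sketch, not part of the starter kit: `validate_curriculum` is a hypothetical helper, and it assumes the `curriculum` list has already been loaded into plain Python dictionaries with the same keys as the YAML.

```python
from typing import Any


def validate_curriculum(stages: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    problems: list[str] = []
    seen: set[str] = set()
    for stage in stages:
        stage_id = stage.get("stage_id")
        if not stage_id:
            problems.append("stage is missing stage_id")
            continue
        if stage_id in seen:
            problems.append(f"{stage_id}: duplicate stage_id")

        # A warm start may only point at a stage defined earlier in the list.
        warm_start = stage.get("warm_start")
        if warm_start is not None:
            source = warm_start.get("from_stage")
            if source not in seen:
                problems.append(f"{stage_id}: from_stage {source!r} is not an earlier stage")
            if warm_start.get("learning_rate_multiplier", 1.0) <= 0:
                problems.append(f"{stage_id}: learning_rate_multiplier must be positive")

        # min_steps can never be satisfied if it exceeds the stage budget.
        criteria = stage.get("promotion_criteria") or {}
        if criteria.get("min_steps", 0) > stage.get("max_env_steps", 0):
            problems.append(f"{stage_id}: min_steps exceeds max_env_steps")

        seen.add(stage_id)
    return problems


# The two stages from the template, reduced to the fields checked above.
PLAN = [
    {
        "stage_id": "stage1_easy",
        "max_env_steps": 50_000_000,
        "warm_start": None,
        "promotion_criteria": {"threshold": 30.0, "min_steps": 20_000_000},
    },
    {
        "stage_id": "stage2_medium",
        "max_env_steps": 30_000_000,
        "warm_start": {"from_stage": "stage1_easy", "learning_rate_multiplier": 0.5},
        "promotion_criteria": {"threshold": 25.0},
    },
]

print(validate_curriculum(PLAN) or "plan looks consistent")
```

The checks are deliberately structural (ordering, budgets, positive multipliers); whether the thresholds themselves are reachable can only be judged from training runs.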

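Stage Configuration Reference

| Stage | Environment | Steps | LR Mult | Key Rewards |
|-------|-------------|-------|---------|-------------|
| 1: Easy | `<env-1>` | 50M | 1.0 | position, heading |
| 2: Medium | `<env-2>` | 30M | 0.5 | + terrain adaptation |
| 3: Hard | `<env-3>` | 40M | 0.3 | + obstacle avoidance |
| Final | `<full-course>` | 50M | 1.0 | all combined |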
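The stage budgets in this reference add up to 170M environment steps. A small sketch of that arithmetic follows, together with the learning rate each stage would train at. `BASE_LR` is a hypothetical value chosen only for illustration, and the sketch assumes each multiplier applies to the base learning rate rather than compounding on the previous stage.

```python
BASE_LR = 3e-4  # hypothetical base learning rate, not taken from the source

# (stage, environment steps, LR multiplier) as listed in the reference table.
STAGES = [
    ("1: Easy", 50_000_000, 1.0),
    ("2: Medium", 30_000_000, 0.5),
    ("3: Hard", 40_000_000, 0.3),
    ("Final", 50_000_000, 1.0),
]


def schedule(stages, base_lr):
    """Return per-stage learning rate and the cumulative step count at stage end."""
    rows = []
    cumulative = 0
    for name, steps, lr_mult in stages:
        cumulative += steps
        rows.append({"stage": name, "lr": base_lr * lr_mult, "ends_at_step": cumulative})
    return rows


rows = schedule(STAGES, BASE_LR)
total_steps = rows[-1]["ends_at_step"]

for row in rows:
    print(f"{row['stage']:<10} lr={row['lr']:.2e} ends at {row['ends_at_step']:,} steps")
print(f"total budget: {total_steps:,} steps")
```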

Warm-Start Strategies

| Strategy | When to Use | Config |
|----------|-------------|--------|
| Best Checkpoint | Default for curriculum | `reset_optimizer: true, lr_mult: 0.5` |
| Final Checkpoint | Keep momentum | `reset_optimizer: false` |
| Frozen Layers | New observation space | `freeze_layers: ["encoder"]` |
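The three warm-start strategies can be written down as data, which makes the differences between them explicit. This is only a sketch under stated assumptions: the `WarmStart` class, the keys `final_checkpoint` and `frozen_layers`, and the `trainable_parameters` helper are hypothetical, and the checkpoint and optimizer settings for the frozen-layers case are guesses, since the source gives only `freeze_layers` for it. Parameter names are assumed to follow the dotted `module.param` convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarmStart:
    checkpoint: str                      # which checkpoint to load: "best" or "final"
    reset_optimizer: bool                # discard optimizer state (e.g. momentum) on transfer
    lr_mult: float = 1.0                 # multiplier applied to the learning rate
    freeze_layers: tuple[str, ...] = ()  # top-level modules excluded from training


STRATEGIES = {
    "best_checkpoint": WarmStart("best", reset_optimizer=True, lr_mult=0.5),
    "final_checkpoint": WarmStart("final", reset_optimizer=False),
    # Assumption: the source specifies only freeze_layers for this strategy.
    "frozen_layers": WarmStart("best", reset_optimizer=True, freeze_layers=("encoder",)),
}


def trainable_parameters(names, freeze_layers):
    """Keep only parameter names that do not belong to a frozen top-level module."""
    frozen_prefixes = tuple(f"{layer}." for layer in freeze_layers)
    return [name for name in names if not name.startswith(frozen_prefixes)]


names = ["encoder.0.weight", "encoder.0.bias", "policy.weight", "value.bias"]
config = STRATEGIES["frozen_layers"]
print(trainable_parameters(names, config.freeze_layers))
```

In a real training loop, only the names returned by `trainable_parameters` would be handed to the optimizer; the rest keep the weights loaded from the checkpoint.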

Promotion Criteria

```yaml
promotion_criteria:
  metric: "episode_reward_mean"  # Primary metric
  threshold: 30.0                # Minimum to pass
  min_steps: 20_000_000          # Train at least this much
  success_rate: 0.95             # Episode success rate
  patience: 5_000_000            # Steps without improvement before abort
```
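One way to read these fields is as a three-way decision made at each evaluation point. The sketch below is an interpretation, not the scheduler's actual logic: it assumes all of `threshold`, `min_steps`, and `success_rate` must be satisfied together, and that `patience` triggers an abort only when promotion has not been reached. The names `Decision`, `PromotionCriteria`, and `decide` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    CONTINUE = "continue"
    PROMOTE = "promote"
    ABORT = "abort"


@dataclass(frozen=True)
class PromotionCriteria:
    threshold: float                # minimum value of the primary metric
    success_rate: float             # minimum episode success rate
    min_steps: int = 0              # train at least this many steps
    patience: Optional[int] = None  # steps without improvement before abort


def decide(criteria, steps, reward_mean, success_rate, steps_since_improvement):
    """Return PROMOTE, ABORT, or CONTINUE for the current training state."""
    passed = (
        steps >= criteria.min_steps
        and reward_mean >= criteria.threshold
        and success_rate >= criteria.success_rate
    )
    if passed:
        return Decision.PROMOTE
    if criteria.patience is not None and steps_since_improvement >= criteria.patience:
        return Decision.ABORT
    return Decision.CONTINUE


criteria = PromotionCriteria(
    threshold=30.0, success_rate=0.95, min_steps=20_000_000, patience=5_000_000
)
print(decide(criteria, 25_000_000, 35.2, 0.97, steps_since_improvement=0))
```

Checking promotion before patience means a run that finally crosses the threshold after a long plateau is still promoted rather than aborted.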

Commands

```powershell
# === PRIMARY: AutoML pipeline for EACH stage (USE THIS) ===
uv run starter_kit_schedule/scripts/automl.py `
    --mode stage `
    --budget-hours 8 `
    --hp-trials 15

# === MONITOR AutoML ===
Get-Content starter_kit_schedule/progress/automl_state.yaml

# === READ AutoML RESULTS ===
Get-Content starter_kit_log/automl_*/report.md

# === FINAL DEPLOYMENT (after AutoML found best config for this stage) ===
uv run scripts/train.py --env <env-name> --train-backend torch

# === EVALUATE a checkpoint ===
uv run scripts/play.py --env <env-name>

# === TENSORBOARD ===
uv run tensorboard --logdir runs/<env-name>
```

Progress State Schema

```yaml
# starter_kit_schedule/progress/curriculum_state.yaml
curriculum_id: "curriculum_vbot_20260206"
current_stage: "stage2_section01"
status: "running"

stages_completed:
  - stage_id: "stage1_flat"
    completed_at: "2026-02-06T18:00:00Z"
    best_checkpoint: "runs/vbot_navigation_section001/<run>/checkpoints/best_agent.pt"
    final_metrics:
      episode_reward_mean: 35.2
      success_rate: 0.97

stages_pending:
  - "stage2_stairs"
  - "stage2_section03"
  - "final_long_course"
```
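The state file suggests a simple transition when a stage is promoted: the current stage moves to `stages_completed` and the first pending stage becomes current. The sketch below illustrates that transition on a plain dictionary. `complete_stage` is a hypothetical helper, the `"completed"` status value is an assumption (the source shows only `"running"`), and the checkpoint path is left as a placeholder.

```python
import copy


def complete_stage(state, completed_at, best_checkpoint, final_metrics):
    """Return a new state with the current stage recorded and the next one started."""
    new_state = copy.deepcopy(state)  # leave the caller's state untouched
    new_state["stages_completed"].append(
        {
            "stage_id": state["current_stage"],
            "completed_at": completed_at,
            "best_checkpoint": best_checkpoint,
            "final_metrics": dict(final_metrics),
        }
    )
    pending = new_state["stages_pending"]
    if pending:
        new_state["current_stage"] = pending.pop(0)
    else:
        new_state["current_stage"] = None
        new_state["status"] = "completed"  # assumed status value
    return new_state


state = {
    "curriculum_id": "curriculum_vbot_20260206",
    "current_stage": "stage2_section01",
    "status": "running",
    "stages_completed": [{"stage_id": "stage1_flat"}],
    "stages_pending": ["stage2_stairs", "stage2_section03", "final_long_course"],
}

next_state = complete_stage(
    state,
    completed_at="2026-02-07T09:00:00Z",  # illustrative timestamp
    best_checkpoint="<path-to-best-checkpoint>",
    final_metrics={"episode_reward_mean": 27.0, "success_rate": 0.9},  # illustrative values
)
print(next_state["current_stage"], next_state["stages_pending"])
```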

Troubleshooting

| Issue | Solution |
|-------|----------|
| Stage transfer fails | Lower LR mult (0.3×), longer adaptation |
| Promotion never triggers | Reduce threshold, check `success_rate` calculation |
| Performance drops after warm-start | Reset optimizer, use smaller LR |

curriculum-learning

Purpose

When to Use

Registered Environments

Curriculum Pipeline

Recommended Progression Pattern

Curriculum Plan Schema

Stage Configuration Reference

Warm-Start Strategies

Promotion Criteria

Commands

Progress State Schema

Troubleshooting

Best Practices

name	curriculum-learning
description	Multi-stage curriculum training for VBot quadruped navigation. Stage progression with warm-starts and promotion criteria.