stars: 1162
forks: 134
updated: April 28, 2026, 13:00
SKILL.md
| name | train |
| description | Launch a training run for a robot environment using PPO |
Parse the user's request from $ARGUMENTS and construct a training command.
RAY_ADDRESS= uv run python run_experiment.py train --env <ENV> --logdir <LOGDIR> [OPTIONS...]
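As a rough illustration of how the template above might be filled in, here is a minimal Python sketch. The helper name `build_train_command` is hypothetical and not part of the repository; it only assembles the string shown in the template and quotes each argument for the shell.

```python
import shlex


def build_train_command(env, logdir, options=None):
    """Assemble the training command as a single shell string (illustrative helper)."""
    # Base invocation copied from the template; extra options are appended as-is.
    args = ["uv", "run", "python", "run_experiment.py", "train",
            "--env", env, "--logdir", logdir]
    args.extend(options or [])
    # RAY_ADDRESS is set to an empty value, exactly as in the template.
    return "RAY_ADDRESS= " + shlex.join(args)


if __name__ == "__main__":
    print(build_train_command("cartpole", "/tmp/training_runs", ["--n-itr", "500"]))
```

Using `shlex.join` keeps paths that contain spaces intact when the string is passed to a shell.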
| Name | Description |
|---|---|
| `cartpole` | Cartpole swing-up (simplest, good for testing) |
| `h1` | Unitree H1 standing task |
| `jvrc_walk` | JVRC humanoid basic walking |
| `jvrc_step` | JVRC humanoid stepping with planned footsteps |
| Flag | Default | Description |
|---|---|---|
| `--n-itr` | 20000 | Training iterations |
| `--lr` | 1e-4 | Learning rate |
| `--gamma` | 0.99 | Discount factor |
| `--std-dev` | 0.223 | Action noise |
| `--learn-std` | off | Learn action noise (flag) |
| `--entropy-coeff` | 0.0 | Entropy regularization |
| `--clip` | 0.2 | PPO clipping |
| `--minibatch-size` | 64 | Minibatch size |
| `--epochs` | 3 | Optimization epochs per update |
| `--num-procs` | 12 | Parallel workers |
| `--num-envs-per-worker` | 1 | Vectorized envs per worker |
| `--max-grad-norm` | 0.05 | Gradient clipping |
| `--max-traj-len` | 400 | Episode horizon |
| `--eval-freq` | 100 | Eval every N iterations |
| `--seed` | None | Random seed |
| `--device` | auto | Training device (auto/cpu/cuda) |
| `--no-mirror` | off | Disable symmetry wrapper (flag) |
| `--recurrent` | off | Use LSTM policy (flag) |
| `--continued` | None | Path to pretrained weights |
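The table mixes two kinds of options: bare flags (`--learn-std`, `--no-mirror`, `--recurrent`) and flags that take a value. A minimal sketch of turning a parsed request into an argument list could look like the following. The function name `options_to_args` and the dict-based input format are assumptions made for illustration, not something the skill defines.

```python
# Flags marked "(flag)" in the table take no value; all others are "--flag value".
BOOLEAN_FLAGS = {"--learn-std", "--no-mirror", "--recurrent"}


def options_to_args(options):
    """Flatten a dict such as {"--lr": "3e-4", "--recurrent": True} (illustrative helper)."""
    args = []
    for flag, value in options.items():
        if flag in BOOLEAN_FLAGS:
            if value:  # emit the bare flag only when it is switched on
                args.append(flag)
        elif value is not None:  # None means "leave the script's default alone"
            args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    print(options_to_args({"--lr": "3e-4", "--recurrent": True, "--seed": None}))
```

Leaving out options the user did not mention lets `run_experiment.py` apply its own defaults instead of duplicating them in the command.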
- Default to `--logdir /tmp/training_runs` unless the user specifies a different path.
- Run the command with `run_in_background: true` on the Bash tool. Set a generous timeout (600000ms).
- Use `TaskOutput` with `block: false` to check the latest output.

For cartpole, these settings are known to work well with the current defaults
(`--lr 3e-4 --max-grad-norm 0.5 --lam 0.95 --gamma 0.99`):

- `--minibatch-size 256`
- `--std-dev 0.15 --learn-std --entropy-coeff 0.01`
- `--max-traj-len 500 --n-itr 500 --num-procs 12`
- `--no-mirror` (cartpole has no body symmetry)

Suggest these defaults when the user trains cartpole, but let them override.
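One way to read "suggest these defaults, but let them override" is as a dictionary merge in which user-supplied values win. The sketch below assumes options are held in a dict keyed by flag name; `CARTPOLE_SUGGESTED` and `resolve_options` are hypothetical names introduced here, and the values are the cartpole settings listed above.

```python
# Suggested cartpole settings from the text above; True marks a bare flag.
CARTPOLE_SUGGESTED = {
    "--minibatch-size": "256",
    "--std-dev": "0.15",
    "--learn-std": True,
    "--entropy-coeff": "0.01",
    "--max-traj-len": "500",
    "--n-itr": "500",
    "--num-procs": "12",
    "--no-mirror": True,
}


def resolve_options(env, user_options):
    """Start from the cartpole suggestions (if applicable) and apply user overrides."""
    suggested = CARTPOLE_SUGGESTED if env == "cartpole" else {}
    # Later keys win in a dict merge, so anything the user specified takes precedence.
    return {**suggested, **user_options}


if __name__ == "__main__":
    print(resolve_options("cartpole", {"--n-itr": "1000"}))
```

Other environments receive only what the user asked for, so the suggestions never leak into, for example, an `h1` run.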