rl-isaaclab-benchmark

Stars: 1 · Forks: 0 · Last updated: 30 March 2026, 19:00

Benchmark and analyze Moleworks IsaacLab RL checkpoints. Use when playing a policy locally, benchmarking excavation checkpoints, ranking multiple checkpoints fairly, plotting TensorBoard progression, or comparing synced run artifacts before recommending a checkpoint.

Source: Idate96/codex_skills (author: Idate96)

SKILL.md

name	rl-isaaclab-benchmark
description	Benchmark and analyze Moleworks IsaacLab RL checkpoints. Use when playing a policy locally, benchmarking excavation checkpoints, ranking multiple checkpoints fairly, plotting TensorBoard progression, or comparing synced run artifacts before recommending a checkpoint.

IsaacLab Benchmark Workflow

Use this skill for checkpoint evaluation, playback, and run-to-run comparison in moleworks_ext.

Source Of Truth

Read only the files needed for the current task:

docs/AI_RESEARCHER_WORKFLOW.md
docs/EXPERIMENTS_ONGOING.md
docs/EXPERIMENTS_RUN.md
scripts/rsl_rl/play.py
scripts/mole_environments/excavation3D/benchmark_excavation.py
scripts/utils/compare_run_configs.py
scripts/plot_tb_scalars.py

Hard Rules

Do not recommend a checkpoint on the basis of a sync alone; benchmark it first.
Keep benchmark settings fixed when comparing checkpoints.
Use temporal plots after pull when benchmark numbers and live W&B summaries disagree.
If the job was only a smoke or diagnostic run, do not pretend there is a production policy artifact to rank.

Main Entry Points

Play: scripts/rsl_rl/play.py
Excavation benchmark: scripts/mole_environments/excavation3D/benchmark_excavation.py
Config diff: scripts/utils/compare_run_configs.py
Temporal plots: scripts/plot_tb_scalars.py

Playback

/workspace/isaaclab/isaaclab.sh -p scripts/rsl_rl/play.py \
  --task <TASK> \
  --checkpoint logs/rsl_rl/<exp>/<run_dir>/model_<N>.pt \
  --num_envs 1

Use --num_envs 1 for visual debugging unless the user explicitly wants a larger headless check.

Benchmark

Excavation3D / w-cabin template:

/workspace/isaaclab/isaaclab.sh -p scripts/mole_environments/excavation3D/benchmark_excavation.py \
  --task Moleworks-Isaac-m445-digging-3D-w-cabin \
  --checkpoint logs/rsl_rl/<exp>/<run_dir>/model_<N>.pt \
  --num_envs 2048 \
  --benchmark_steps 300

For fair cross-run comparison:

keep num_envs fixed
keep benchmark_steps fixed
keep seed fixed
avoid task/config mismatches between checkpoints
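
The fairness rules above amount to treating the benchmark settings as one immutable record and refusing to compare runs whose records differ. A minimal sketch of that idea follows; `BenchmarkContract` and `contract_mismatches` are hypothetical helper names, not part of moleworks_ext. The launcher path, script path, and flags are the ones used in the commands in this document.

```python
from dataclasses import dataclass, asdict

LAUNCHER = "/workspace/isaaclab/isaaclab.sh"
BENCHMARK_SCRIPT = "scripts/mole_environments/excavation3D/benchmark_excavation.py"


@dataclass(frozen=True)
class BenchmarkContract:
    """Settings that must stay identical for every checkpoint being compared.

    Hypothetical helper for illustration; not part of the repository.
    """
    task: str
    num_envs: int
    benchmark_steps: int
    seed: int

    def command(self, checkpoint: str) -> list[str]:
        """Build the benchmark argv for one checkpoint under this contract."""
        return [
            LAUNCHER, "-p", BENCHMARK_SCRIPT,
            "--task", self.task,
            "--checkpoint", checkpoint,
            "--num_envs", str(self.num_envs),
            "--benchmark_steps", str(self.benchmark_steps),
            "--seed", str(self.seed),
        ]


def contract_mismatches(a: BenchmarkContract, b: BenchmarkContract) -> list[str]:
    """Return the names of the settings that differ between two contracts."""
    left, right = asdict(a), asdict(b)
    return [name for name in left if left[name] != right[name]]


if __name__ == "__main__":
    task = "Moleworks-Isaac-m445-digging-3D-w-cabin"
    base = BenchmarkContract(task, num_envs=1024, benchmark_steps=400, seed=0)
    other = BenchmarkContract(task, num_envs=2048, benchmark_steps=300, seed=0)
    # Any non-empty result means the two benchmarks are not directly comparable.
    print(contract_mismatches(base, other))
    print(" ".join(base.command("logs/rsl_rl/<exp>/<run_dir>/model_500.pt")))
```

Freezing the dataclass keeps the settings from drifting mid-sweep, and an empty mismatch list is a cheap precondition to check before putting two reports side by side.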

Multi-Checkpoint Ranking

When sweeping checkpoints, use one fixed benchmark contract:

RUN_DIR=logs/rsl_rl/<exp>/<run_dir>
for m in 500 1000 1500 2000; do
  /workspace/isaaclab/isaaclab.sh -p scripts/mole_environments/excavation3D/benchmark_excavation.py \
    --task Moleworks-Isaac-m445-digging-3D-w-cabin \
    --checkpoint "${RUN_DIR}/model_${m}.pt" \
    --num_envs 1024 \
    --benchmark_steps 400 \
    --seed 0
done

Rank from the generated benchmark reports, not from "latest checkpoint wins".
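
Ranking by reported metric rather than by recency can be sketched as below. This document does not specify the report schema written by benchmark_excavation.py, so the `Result` type, the single higher-is-better `score` field, and the numbers in the example are all assumptions made for illustration; substitute whatever metric the real reports contain.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One benchmarked checkpoint (hypothetical structure)."""
    iteration: int  # the N in model_<N>.pt
    score: float    # assumed higher-is-better metric taken from a benchmark report


def rank_checkpoints(results: list[Result]) -> list[Result]:
    """Sort best score first; on a tie, prefer the earlier checkpoint."""
    return sorted(results, key=lambda r: (-r.score, r.iteration))


if __name__ == "__main__":
    # Placeholder scores for illustration only, not real benchmark output.
    results = [
        Result(500, 0.61),
        Result(1000, 0.78),
        Result(1500, 0.78),
        Result(2000, 0.70),
    ]
    for r in rank_checkpoints(results):
        print(f"model_{r.iteration}.pt  score={r.score:.2f}")
```

With these placeholder values the latest checkpoint, model_2000.pt, lands third, which is exactly the case that a "latest checkpoint wins" shortcut would get wrong.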

Temporal Plots

After pulling or benchmarking a run, generate progression plots:

python3 scripts/plot_tb_scalars.py \
  --run-dir logs/rsl_rl/<exp>/<run_dir> \
  --plot-core \
  --out-dir outputs/analysis/training_curves

Use exact tags or regex when the default core plot is not enough:

python3 scripts/plot_tb_scalars.py --run-dir logs/rsl_rl/<exp>/<run_dir> --list-tags
python3 scripts/plot_tb_scalars.py --run-dir logs/rsl_rl/<exp>/<run_dir> --regex '^Episode_Termination/'
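
To preview which tags a pattern such as `^Episode_Termination/` would select, the filter can be reproduced with Python's `re` module. This is a sketch of the general technique, not the internals of plot_tb_scalars.py; `select_tags` and the tag names in the example are hypothetical, and the real tags for a run come from `--list-tags`.

```python
import re


def select_tags(tags: list[str], pattern: str) -> list[str]:
    """Keep the tags that match the pattern, preserving input order."""
    rx = re.compile(pattern)
    return [tag for tag in tags if rx.search(tag)]


if __name__ == "__main__":
    # Hypothetical tag names; list the real ones with --list-tags.
    tags = [
        "Episode_Termination/time_out",
        "Episode_Reward/total",
        "Loss/value_function",
        "Episode_Termination/example_condition",
    ]
    print(select_tags(tags, r"^Episode_Termination/"))
```

The leading `^` anchors the match to the start of the tag, so a tag that merely contains `Episode_Termination/` further along its path is not selected.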

Comparison Hygiene

Use compare_run_configs.py before manual config diffs.
If the task spans many benchmark reports, TensorBoard runs, or W&B records, also use moleworks-subagent-orchestrator.
If playback or debugging also uses a ROS parity stack, also use newton-ros-parity and ros2-debugging, and clean up the ROS side before the next IsaacLab launch.
