| name | training-monitor |
| description | Monitor and analyze Esperanto training runs. Checks GPU usage, training metrics, W&B logs, and checkpoint status. Use when user asks about training progress, GPU status, or experiment monitoring. |
| allowed-tools | Read, Glob, Grep, Bash |
Help users monitor active or completed Esperanto training experiments.
When monitoring training:
Check GPU Status
nvidia-smi --query-gpu=name,memory.used,memory.total,utilization.gpu --format=csv
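The CSV output of the command above can be turned into structured values before reporting. The following is a minimal sketch, not part of the skill itself: the function names `parse_gpu_csv` and `query_gpus` and the dictionary keys are hypothetical, and it assumes the default `--format=csv` layout, where the first row is a header and values carry unit suffixes such as `MiB` and `%`.

```python
import csv
import io
import subprocess

# Same fields as the nvidia-smi command in this step.
QUERY = "name,memory.used,memory.total,utilization.gpu"


def _leading_int(field):
    """Return the leading integer of a field like '1234 MiB', else None."""
    parts = field.split()
    token = parts[0] if parts else ""
    # Fields such as '[N/A]' are not numeric and map to None.
    return int(token) if token.isdigit() else None


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv` output into one dict per GPU."""
    rows = list(csv.reader(io.StringIO(text.strip()), skipinitialspace=True))
    gpus = []
    for row in rows[1:]:  # rows[0] is the CSV header line
        if len(row) != 4:
            continue  # skip blank or malformed lines
        name, used, total, util = row
        gpus.append({
            "name": name,
            "memory_used_mib": _leading_int(used),
            "memory_total_mib": _leading_int(total),
            "utilization_pct": _leading_int(util),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed rows (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)
```

Keeping the parser separate from the subprocess call means it can be checked against a saved output sample on a machine without a GPU.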
Find Active Experiments
ps aux | grep train.py
Check logs/ for recent output from each run.
Analyze Experiment Results
Check experiments/results/ or checkpoints/training_state.pt for step count and metrics, and wandb/ for W&B logs.
Key Metrics to Report
When reporting, use this structure:
## Training Status
**GPU**: [name] | [memory used/total] | [utilization]%
**Active Runs**: [list or "none"]
**Latest Checkpoint**: [path]
- Step: X
- Success Rate: X%
- Message Norms: X
**Trends**: [improving/stable/declining]
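The report structure above can be filled in programmatically. The sketch below is illustrative only: `classify_trend`, `format_status`, the `window` and `tolerance` defaults, and the `gpu` dictionary keys are all assumptions rather than anything defined by the training code. It also assumes the tracked metric is one where higher is better, such as success rate, and it returns `"unknown"` (a label outside the three in the template) when the history is too short to compare.

```python
from statistics import mean


def classify_trend(values, window=3, tolerance=0.01):
    """Label a higher-is-better metric history as improving/stable/declining.

    Compares the mean of the last `window` values with the mean of the
    `window` values before them. Both defaults are illustrative.
    """
    if len(values) < 2 * window:
        return "unknown"  # not enough history for two full windows
    recent = mean(values[-window:])
    previous = mean(values[-2 * window:-window])
    delta = recent - previous
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "declining"
    return "stable"


def format_status(gpu, active_runs, checkpoint, step,
                  success_rate, message_norm, trend):
    """Render the status report; `gpu` is a dict with hypothetical keys."""
    runs = ", ".join(active_runs) if active_runs else "none"
    memory = f"{gpu['memory_used_mib']}/{gpu['memory_total_mib']} MiB"
    return "\n".join([
        "## Training Status",
        f"**GPU**: {gpu['name']} | {memory} | {gpu['utilization_pct']}%",
        f"**Active Runs**: {runs}",
        f"**Latest Checkpoint**: {checkpoint}",
        f"- Step: {step}",
        f"- Success Rate: {success_rate:.1f}%",
        f"- Message Norms: {message_norm:.2f}",
        f"**Trends**: {trend}",
    ])
```

A comparison of two adjacent windows is deliberately crude; a noisy metric may need a larger `window` or a wider `tolerance` before the label is meaningful.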