Skip to main content
Ejecuta cualquier Skill en Manus
con un clic

debug-training-logs

Debug distributed training failures (NeMo, Megatron, PyTorch) from worker stderr logs and optional AIStore daemon logs. Finds root cause across NCCL timeouts, data loading errors, and storage failures.

Estrellas17.390
Forks3436
Actualizado15 de abril de 2026, 20:53
SKILL.md
readonly