
debug-training-logs

Debug distributed training failures (NeMo, Megatron, PyTorch) from worker stderr logs and optional AIStore daemon logs. Finds root cause across NCCL timeouts, data loading errors, and storage failures.
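As a rough illustration of the kind of triage described above, the following minimal sketch buckets stderr lines into the three failure families named in the description and records where each first appears. It is not the skill's actual implementation: the `PATTERNS` table, the function names, and the regular expressions are all hypothetical and chosen only for illustration. Real NeMo, Megatron, PyTorch, or AIStore log messages may differ.

```python
import re
from collections import Counter

# Hypothetical category -> pattern table. These regexes are illustrative
# assumptions, not patterns taken from the skill or from any library's
# documented log format.
PATTERNS = {
    "nccl_timeout": re.compile(r"nccl.*timeout|timeout.*nccl", re.IGNORECASE),
    "data_loading": re.compile(r"dataloader|data loader", re.IGNORECASE),
    "storage": re.compile(r"connection refused|i/o error|no space left", re.IGNORECASE),
}


def classify_line(line):
    """Return the first matching category name, or None if nothing matches."""
    for category, pattern in PATTERNS.items():
        if pattern.search(line):
            return category
    return None


def summarize(lines):
    """Count matches per category and record the index of each first match."""
    counts = Counter()
    first_seen = {}
    for index, line in enumerate(lines):
        category = classify_line(line)
        if category is None:
            continue
        counts[category] += 1
        # Keep only the earliest index for each category.
        first_seen.setdefault(category, index)
    return counts, first_seen
```

Comparing the `first_seen` positions across categories can hint at ordering, for example whether a storage error shows up before a collective timeout, though the earliest matching line is only a heuristic and not proof of root cause.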

Stars: 17,390
Forks: 3,436
Updated: April 15, 2026 at 20:53
SKILL.md