multi-node-slurm

Convert single-node scripts to multi-node Slurm sbatch jobs and debug common multi-node failures. Covers srun-native vs uv run torch.distributed approaches, container setup, NCCL timeouts, OOM sizing for MoE models, and interactive allocation.

stars: 653
forks: 324
updated: May 18, 2026 at 22:08
SKILL.md