
xla-tuning

// Find the XLA flag / NCCL env-var combination that maximizes steady-state TGS for one (model × parallelism) cell. Produces an evidence-backed leaderboard, a mechanistic explanation of why the winning flag wins, and a deployment recipe. Use when the user asks to tune XLA flags, tune NCCL, find the best collective-permute / all-gather threshold, optimize FSDP/PP/TP, close a throughput gap between two parallelism strategies, or sweep cross-iteration prefetch / overlap-limit / async-stream-priority knobs for a specific model.
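
As a rough illustration of the sweep described above, the sketch below enumerates flag / env-var combinations and ranks them by steady-state throughput. It is not taken from SKILL.md: the grid values, the helper names (`build_configs`, `steady_state_tgs`, `leaderboard`), the warmup handling, and the reading of TGS as tokens per GPU per second are all assumptions made for the example. The flags and NCCL variables shown are only sample knobs, and launching the actual training job is left out.

```python
import itertools
import statistics

# Hypothetical search space: sample knobs and values chosen for illustration,
# not the set the skill actually sweeps.
XLA_FLAG_GRID = {
    "--xla_gpu_enable_latency_hiding_scheduler": ["true", "false"],
    "--xla_gpu_all_gather_combine_threshold_bytes": ["134217728", "1073741824"],
}
NCCL_ENV_GRID = {
    "NCCL_PROTO": ["Simple", "LL128"],
}


def build_configs(flag_grid, env_grid):
    """Return one environment dict per point in the Cartesian product."""
    keys = list(flag_grid) + list(env_grid)
    configs = []
    for values in itertools.product(*flag_grid.values(), *env_grid.values()):
        chosen = dict(zip(keys, values))
        # XLA reads its flags from the space-separated XLA_FLAGS variable.
        env = {"XLA_FLAGS": " ".join(f"{k}={chosen[k]}" for k in flag_grid)}
        env.update({k: chosen[k] for k in env_grid})
        configs.append(env)
    return configs


def steady_state_tgs(step_times_s, tokens_per_step, num_gpus, warmup_steps=5):
    """Tokens per GPU per second, using the median post-warmup step time.

    Assumes TGS means tokens/GPU/second; warmup steps (compilation,
    autotuning) are dropped so they do not skew the estimate.
    """
    steady = step_times_s[warmup_steps:]
    if not steady:
        raise ValueError("need more steps than warmup_steps")
    return tokens_per_step / (statistics.median(steady) * num_gpus)


def leaderboard(results):
    """Sort result dicts (each with a 'tgs' key) from best to worst."""
    return sorted(results, key=lambda r: r["tgs"], reverse=True)
```

Each dict from `build_configs` would be merged into the environment of one training run; the per-step times logged by that run feed `steady_state_tgs`, and the collected results go to `leaderboard`.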

$ git log --oneline --stat
stars:28
forks:3
updated: May 9, 2026 at 18:04
SKILL.md