$pwd:

xla-tuning

// Find the combination of XLA flags and NCCL environment variables that maximizes steady-state TGS for one (model × parallelism) cell. Produces an evidence-backed leaderboard, a mechanistic explanation of why the winning flag wins, and a deployment recipe. Use when the user asks to tune XLA flags, tune NCCL, find the best collective-permute / all-gather threshold, optimize FSDP/PP/TP, close a throughput gap between two parallelism strategies, or sweep cross-iteration prefetch / overlap-limit / async-stream-priority knobs for a specific model.
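
The sweep-and-rank idea in the description can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the grid values are arbitrary placeholders rather than recommendations, and `candidates`, `leaderboard`, and the `measure_tgs` callback are hypothetical names introduced here. `measure_tgs` is assumed to launch one training run with the given environment and return its steady-state TGS.

```python
import itertools

# Knobs to sweep. The names are existing XLA GPU flags / NCCL env vars;
# the candidate values are illustrative assumptions only.
XLA_GRID = {
    "xla_gpu_enable_latency_hiding_scheduler": ["true", "false"],
    "xla_gpu_all_gather_combine_threshold_bytes": ["134217728", "1073741824"],
}
NCCL_GRID = {
    "NCCL_BUFFSIZE": ["4194304", "8388608"],
}


def candidates(xla_grid, nccl_grid):
    """Yield one environment dict per point in the Cartesian product of knobs."""
    keys = list(xla_grid) + list(nccl_grid)
    pools = [xla_grid[k] for k in xla_grid] + [nccl_grid[k] for k in nccl_grid]
    for combo in itertools.product(*pools):
        chosen = dict(zip(keys, combo))
        # XLA reads its flags from the single XLA_FLAGS environment variable.
        xla_flags = " ".join(f"--{k}={chosen[k]}" for k in xla_grid)
        env = {"XLA_FLAGS": xla_flags}
        # NCCL knobs are ordinary environment variables.
        env.update({k: chosen[k] for k in nccl_grid})
        yield env


def leaderboard(envs, measure_tgs):
    """Measure every candidate and return (tgs, env) rows, best first."""
    rows = [(measure_tgs(env), env) for env in envs]
    return sorted(rows, key=lambda row: row[0], reverse=True)
```

In practice `measure_tgs` would likely start the training job in a subprocess with `env` merged into `os.environ`, discard warm-up steps, and average the remaining ones. An exhaustive product grows quickly, so a real sweep would probably vary one or two knobs at a time around a baseline.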

$ git log --oneline --stat
stars:28
forks:3
updated:May 9, 2026 at 18:04
SKILL.md
readonly