xla-tuning

// Find the XLA flag / NCCL env-var combination that maximizes steady-state TGS for one (model × parallelism) cell. Produces an evidence-backed leaderboard, a mechanistic explanation of the winning flag, and a deployment recipe. Use when the user asks to tune XLA flags, tune NCCL, find the best collective-permute / all-gather threshold, optimize FSDP/PP/TP, close a parallelism-vs-parallelism throughput gap, or sweep cross-iteration prefetch / overlap-limit / async-stream-priority knobs for a specific model.
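
The sweep-and-rank idea in the description above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the skill's actual implementation: TGS is assumed here to mean tokens per GPU per second, steady state is approximated by discarding a fixed number of warm-up steps and taking the median step time, and the function names (`steady_state_tgs`, `candidate_configs`, `leaderboard`) and the knob names in the usage example are hypothetical placeholders rather than real XLA flags or NCCL variables.

```python
from itertools import product
from statistics import median


def steady_state_tgs(step_times_s, tokens_per_step, num_gpus, warmup_steps=5):
    """Assumed metric: tokens per GPU per second, using the median
    step time after the first `warmup_steps` steps are discarded."""
    steady = step_times_s[warmup_steps:]
    if not steady:
        raise ValueError("need more measured steps than warmup_steps")
    return tokens_per_step / (median(steady) * num_gpus)


def candidate_configs(grid):
    """Yield one dict per point in the Cartesian product of a knob grid.
    `grid` maps a (hypothetical) knob name to the list of values to try."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def leaderboard(results, tokens_per_step, num_gpus, warmup_steps=5):
    """Rank measured runs by steady-state TGS, best first.
    `results` is a list of (config_dict, step_times_in_seconds) pairs."""
    rows = [
        (steady_state_tgs(times, tokens_per_step, num_gpus, warmup_steps), cfg)
        for cfg, times in results
    ]
    return sorted(rows, key=lambda row: row[0], reverse=True)
```

A possible usage, with placeholder knob names and made-up step times standing in for real measurements: build the grid with `candidate_configs({"knob_a": [0, 1], "knob_b": ["x", "y"]})`, run each configuration for the same number of steps, collect `(config, step_times)` pairs, and pass them to `leaderboard` along with the tokens per step and GPU count of the cell being tuned. Taking the median rather than the mean is a design choice meant to reduce the influence of occasional slow steps.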

$ git log --oneline --stat
stars:28
forks:3
updated:May 9, 2026 18:04
SKILL.md