Skip to main content
تشغيل أي مهارة في Manus
بنقرة واحدة

xla-tuning

// Find the XLA flag / NCCL env-var combination that maximizes steady-state TGS for one (model × parallelism) cell. Produces an evidence-backed leaderboard, mechanistic explanation of the winning flag, and a deployment recipe. Use when the user asks to tune XLA flags, tune NCCL, find best collective-permute / all-gather threshold, optimize FSDP/PP/TP, close a parallelism-vs-parallelism throughput gap, or sweep cross-iteration prefetch / overlap-limit / async-stream-priority knobs for a specific model.

$ git log --oneline --stat
stars:٢٨
forks:٣
updated:٩ مايو ٢٠٢٦ في ١٨:٠٤
SKILL.md
readonly