
hyperpod-performance-debugger

// Diagnose performance issues on Amazon SageMaker HyperPod clusters: uneven NCCL bandwidth across nodes and poor filesystem throughput. Read-only. Surfaces host-side signals (Xid, ECC, NVLink, EFA reachability, FSx saturation) and routes any remediation to the appropriate sibling skill (hyperpod-node-debugger, hyperpod-nccl, hyperpod-version-checker, hyperpod-issue-report). Triggers on: uneven NCCL bandwidth across nodes, straggler node, slow FSx, slow checkpoints, slow dataloader, filesystem bottleneck, FSx throughput, cross-AZ latency, topology mismatch.
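As a rough illustration of what "uneven NCCL bandwidth across nodes" looks like in practice, the sketch below flags straggler nodes by comparing each node's measured bus bandwidth to the cluster median. This is not the skill's implementation. The function name `flag_stragglers`, the 0.8 tolerance, and the sample figures are hypothetical, and it assumes per-node bandwidth numbers have already been collected (for example, from separate nccl-tests `all_reduce_perf` runs).

```python
from statistics import median


def flag_stragglers(busbw_gbps: dict[str, float], tolerance: float = 0.8) -> list[str]:
    """Return node names whose bus bandwidth falls below tolerance * median.

    Hypothetical helper for illustration only; the tolerance is an assumption.
    """
    if not busbw_gbps:
        return []
    # The median is robust to a single slow node dragging the baseline down.
    baseline = median(busbw_gbps.values())
    threshold = tolerance * baseline
    # Sort so the output is deterministic regardless of dict insertion order.
    return sorted(node for node, bw in busbw_gbps.items() if bw < threshold)


# Hypothetical per-node bus bandwidth results in GB/s.
samples = {"node-a": 180.0, "node-b": 178.5, "node-c": 95.2, "node-d": 181.3}
print(flag_stragglers(samples))  # prints ['node-c']
```

A median baseline is used instead of the mean so that one badly degraded node does not pull the threshold low enough to hide itself. A flagged node would then be a candidate for host-side checks such as Xid, ECC, NVLink, and EFA reachability.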

stars: 765
forks: 107
updated: May 16, 2026 at 23:28
File explorer
3 files
SKILL.md
readonly