$pwd:

fp8-gemm-tuning-sglang-aiter

// Use when optimizing end-to-end SGLang performance for FP8 models on AMD HIP/ROCm through GEMM tuning, replacing the default Triton GEMM backend with a tuned Composable Kernel (CK) path through aiter. This skill is the verified playbook for that entire process, using FP8 block-wise GEMM (gemm_a8w8_blockscale) as the primary worked example: GEMM shape/dispatch logging in SGLang, CK kernel tuning, and AITER_CONFIG_GEMM_A8W8_BLOCKSCALE CSV integration. FP8 blockscale and bpreshuffle should also work by switching where the GEMM shapes are dumped and which CK tool is used for tuning.
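
The CSV integration step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the skill's own procedure: it assumes aiter picks up the tuned CSV from the AITER_CONFIG_GEMM_A8W8_BLOCKSCALE environment variable when the server starts, and the helper name `build_launch`, the file name `tuned_gemm.csv`, and the model path are hypothetical placeholders.

```python
import os
import sys

# Environment variable name taken from the skill description.
ENV_VAR = "AITER_CONFIG_GEMM_A8W8_BLOCKSCALE"


def build_launch(tuned_csv, model_path, base_env=None):
    """Return (env, cmd) for starting SGLang with a tuned GEMM config.

    Assumption: aiter reads the CSV path from ENV_VAR at startup.
    Nothing is executed here; the caller decides when to launch.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[ENV_VAR] = os.path.abspath(tuned_csv)
    cmd = [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", model_path,
    ]
    return env, cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    env, cmd = build_launch("tuned_gemm.csv", "path/to/fp8-model")
    print(env[ENV_VAR])
    print(" ".join(cmd))
    # To actually start the server:
    #   import subprocess; subprocess.run(cmd, env=env, check=True)
```

Building the environment and command separately from running them makes it easy to check which CSV a given launch would use before committing GPU time to it.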

$ git log --oneline --stat
stars:102
forks:26
updated:May 18, 2026 at 09:56
SKILL.md