$pwd:

improve-cutile-kernel-perf

// Iteratively optimize cuTile kernel performance through systematic profiling, bottleneck analysis, IR comparison, and targeted tuning. Covers tile sizes, occupancy, autotune configs, TMA, latency hints, persistent scheduling, num_ctas, flush_to_zero, and IR-level debugging. Use when asked to "optimize cutile kernel", "improve kernel perf", "tune cutile performance", "make kernel faster", or iteratively benchmark and refine a cuTile GPU kernel in the TileGym project.

$ git log --oneline --stat
stars: 722
forks: 70
updated: May 6, 2026 at 07:52
7 files
SKILL.md