$pwd:

kernel-cute-writing

// Write and implement GPU kernels using NVIDIA CuTe DSL (CUTLASS 4.x Python API) — NOT for Triton, CUDA C++, or conceptual explanations. Trigger only when the user wants to write or implement a kernel, not when asking questions about CuTe DSL concepts or layouts. CuTe DSL uses cute.jit/cute.kernel decorators and cutlass.cute imports. Covers element-wise kernels, GEMM patterns, reductions, memory hierarchy (global/shared/register/TMA), MMA tensor core operations, software pipelining, and framework integration.

$ git log --oneline --stat
stars: 13,702
forks: 2,406
updated: May 20, 2026 at 07:35
File explorer
20 files
SKILL.md
readonly