
pto-isa-matmul-l2-schedule

Stars: 58
Forks: 38
Updated: May 21, 2026, 01:48

PTO-DSL matmul L2-reuse scheduler for Ascend A2/A3: a persistent-block GEMM with N-group swizzle along the inner M walk and an M-direction zigzag at N-group boundaries. Captures the tile-id math, the CANN platform_config-driven swizzleCountN budget (with the 32 MiB safety-ratio cliff), the DN-B layout note, the runtime wiring, and the verification path against torch_npu. Use it when tuning a matmul-shaped kernel that profiles as L2-bound, porting the swizzle/zigzag schedule to a new persistent-block kernel, choosing swizzleCountN for a new SoC, or deciding between the manual SPMD-static baseline and this persistent + swizzle schedule. Scoped to one schedule recipe; add a separate skill for other PTO-ISA performance patterns (vector reduce, flash-attention scheduling, etc.).
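The tile-id math mentioned above can be illustrated with a small host-side sketch. This is not the skill's PTO-DSL code, and the exact traversal is an assumption: output tiles are grouped into bands of `swizzle_count_n` tile columns, each band is walked row by row along M, and the M direction is reversed in every odd band so that the walk re-enters the next band on the M row it just left. The names `tile_coords` and `persistent_tiles`, and the round-robin assignment of tile ids to persistent blocks, are hypothetical.

```python
def tile_coords(tile_id, m_tiles, n_tiles, swizzle_count_n):
    """Map a linear tile id to (m_idx, n_idx) under an assumed N-group swizzle."""
    tiles_per_group = swizzle_count_n * m_tiles
    group = tile_id // tiles_per_group        # which band of N tile columns
    in_group = tile_id % tiles_per_group      # position inside that band
    n_start = group * swizzle_count_n
    # The last band may be narrower than swizzle_count_n.
    width = min(swizzle_count_n, n_tiles - n_start)
    m_idx = in_group // width                 # M advances once per band row
    n_idx = n_start + in_group % width        # N is the fastest-moving index
    if group % 2 == 1:
        # Zigzag: odd bands walk M in reverse, so the band boundary
        # stays on the same M row.
        m_idx = m_tiles - 1 - m_idx
    return m_idx, n_idx


def persistent_tiles(block_id, num_blocks, m_tiles, n_tiles, swizzle_count_n):
    """Tiles visited by one persistent block (assumed round-robin over tile ids)."""
    total = m_tiles * n_tiles
    return [
        tile_coords(t, m_tiles, n_tiles, swizzle_count_n)
        for t in range(block_id, total, num_blocks)
    ]


if __name__ == "__main__":
    # 3 x 5 tile grid, bands of 2 tile columns, 4 persistent blocks.
    for block in range(4):
        print(block, persistent_tiles(block, 4, 3, 5, 2))
```

Under these assumptions, tiles that are adjacent in id order share an M row inside a band, so blocks running concurrently tend to read the same A row-block and a bounded set of B column-blocks. That is the L2-reuse effect the schedule targets; the actual swizzleCountN value should come from the skill's platform_config-based budget rather than from this sketch.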

Installation

Install with Codex or Claude: copy this prompt, paste it into Codex, Claude, or another assistant, and let it review the skill page and install the skill for you.

File explorer
2 files
SKILL.md