
pto-isa-matmul-l2-schedule

Stars: 58
Forks: 38
Updated: May 21, 2026 at 01:48

PTO-DSL matmul L2-reuse scheduler for Ascend A2/A3: persistent-block GEMM with N-group swizzle along the inner M walk and M-direction zigzag at N-group boundaries. Captures the tile-id math, the CANN platform_config-driven swizzleCountN budget (with the 32 MiB safety-ratio cliff), the DN-B layout note, the runtime wiring, and the verification path against torch_npu. Use when tuning a matmul-shaped kernel that profiles as L2-bound, porting the swizzle/zigzag schedule to a new persistent-block kernel, choosing swizzleCountN for a new SoC, or deciding between the manual SPMD-static baseline and this persistent + swizzle schedule. Scoped to one schedule recipe; add a separate skill for other PTO-ISA performance patterns (vector reduce, flash-attention scheduling, etc.).
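To make the swizzle and zigzag wording concrete, here is a minimal host-side Python sketch of one possible tile-id mapping. It is an illustration under stated assumptions, not the skill's actual kernel math. It assumes that tiles are grouped into N-groups `swizzleCountN` columns wide, that each group is walked row by row along M, and that the M direction flips on every odd group. The names `tile_coords`, `tiles_for_block`, `m_tiles`, `n_tiles`, and `num_blocks` are hypothetical.

```python
def tile_coords(tile_id, m_tiles, n_tiles, swizzle_count_n):
    """Map a linear tile id to (m_idx, n_idx) under the assumed order."""
    group_span = swizzle_count_n * m_tiles      # tiles in one full N-group
    group = tile_id // group_span               # which N-group this tile is in
    in_group = tile_id % group_span             # position inside that group
    n_start = group * swizzle_count_n
    # The last N-group may be narrower than swizzle_count_n.
    width = min(swizzle_count_n, n_tiles - n_start)
    m_idx = in_group // width                   # M walk across the group
    n_idx = n_start + in_group % width          # N varies fastest inside a row
    if group % 2 == 1:
        # Assumed zigzag: odd groups walk M in reverse, so the first tile of a
        # group shares its M row with the last tile of the previous group.
        m_idx = m_tiles - 1 - m_idx
    return m_idx, n_idx


def tiles_for_block(block_id, num_blocks, m_tiles, n_tiles, swizzle_count_n):
    """Persistent-block style loop: each block strides over the tile ids."""
    total = m_tiles * n_tiles
    return [tile_coords(t, m_tiles, n_tiles, swizzle_count_n)
            for t in range(block_id, total, num_blocks)]


# Visit order for a 3 x 5 tile grid with N-groups two columns wide.
order = [tile_coords(t, 3, 5, 2) for t in range(15)]
```

In this sketch every tile is visited exactly once, and consecutive tile ids on either side of an N-group boundary share the same M index. That shared row is the reuse the zigzag is meant to preserve.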

Installation

Install with Codex or Claude: copy this prompt, paste it into Codex, Claude, or another assistant, and let it review the skill page and install it for you.

File Explorer
2 files
SKILL.md