Skip to main content
Jeden Skill in Manus ausführen
mit einem Klick

pto-isa-matmul-l2-schedule

Sterne58
Forks38
Aktualisiert21. Mai 2026 um 01:48

PTO-DSL matmul L2-reuse scheduler for Ascend A2/A3: persistent-block GEMM with N-group swizzle along the inner M walk and M-direction zigzag at N-group boundaries. Captures the tile-id math, the CANN platform_config- driven swizzleCountN budget (with the 32 MiB safety-ratio cliff), the DN-B layout note, the runtime wiring, and the verification path against torch_npu. Use when tuning a matmul-shaped kernel that profiles as L2-bound, porting the swizzle/zigzag schedule to a new persistent-block kernel, choosing swizzleCountN for a new SoC, or deciding between the manual SPMD-static baseline and this persistent + swizzle schedule. Scoped to one schedule recipe — add a separate skill for other PTO-ISA performance patterns (vector reduce, flash-attention scheduling, etc.).

Installation

Mit Codex oder Claude installieren Kopieren Sie diesen Prompt, fügen Sie ihn in Codex, Claude oder einen anderen Assistant ein und lassen Sie die Skill-Seite prüfen und installieren.

Datei-Explorer
2 Dateien
SKILL.md
readonly