cuda-attention-kernel-patterns

Patterns and pitfalls for the CUDA implementation of the ONNX-domain Attention operator (opset 23/24). Use when modifying the dispatch cascade in core/providers/cuda/llm/attention.cc, writing mask/bias CUDA kernels, debugging attention test routing, or adding features to the ONNX Attention op. Not for the contrib-domain MultiHeadAttention/GroupQueryAttention operators.

Stars: 20,780
Forks: 3,970
Updated: May 7, 2026 at 20:42
SKILL.md