triton-ascend-api-rules

Name: Triton Ascend Api Rules
Author: mindspore-ai

Triton Ascend hard API restrictions and forbidden syntax. MUST-follow rules that apply to every kernel: forbidden control flow (return/break/continue/lambda/while), tensor slice/index restrictions, scalar conversion rules, BLOCK_SIZE upper bound. Violating any of these produces a compile or runtime error on Ascend.

stars:254

forks:48

updated: 19 April 2026, 13:47

SKILL.md

name	triton-ascend-api-rules
description	Triton Ascend hard API restrictions and forbidden syntax. MUST-follow rules that apply to every kernel: forbidden control flow (return/break/continue/lambda/while), tensor slice/index restrictions, scalar conversion rules, BLOCK_SIZE upper bound. Violating any of these produces a compile or runtime error on Ascend.
category	fundamental
version	1.0.0
metadata	{"backend":"ascend","dsl":"triton_ascend","hardware":"Atlas A2, Atlas A3"}

Triton Ascend API Hard Rules (MUST follow)

This file lists rules whose violation causes a compile or runtime error on Ascend. They apply to every kernel in this DSL — no exceptions.

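Forbidden syntax

return / break / continue → use mask-based control instead
lambda → use an inline function or tl.where
Chained boolean operations → compute the mask in separate steps
Direct tensor indexing → use tl.load / tl.store
Negative offsets inside if-else → clamp with tl.maximum(offset, 0)
Ascend: complex tl.where → use if-else
Ascend: while loops → replace with for
Ascend: range() whose start/stop mix runtime variables and constexpr → use a range with all-constexpr bounds and skip iterations with a runtime if inside the loop body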
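
The control-flow entries in the list above can be checked mechanically before a kernel is ever compiled. The following is a minimal sketch of such a check using Python's standard ast module. The helper find_forbidden_syntax and the example source are hypothetical and are not part of Triton Ascend; the sketch covers only return, break, continue, lambda, and while, not the indexing or tl.where rules.

```python
import ast

# Node types that the rules above forbid inside a kernel body.
FORBIDDEN_NODES = {
    ast.Return: "return",
    ast.Break: "break",
    ast.Continue: "continue",
    ast.Lambda: "lambda",
    ast.While: "while",
}


def find_forbidden_syntax(kernel_source: str) -> list[tuple[int, str]]:
    """Return sorted (line, construct) pairs for each forbidden construct.

    Hypothetical helper for illustration; it only inspects Python syntax.
    """
    findings = []
    for node in ast.walk(ast.parse(kernel_source)):
        name = FORBIDDEN_NODES.get(type(node))
        if name is not None:
            findings.append((node.lineno, name))
    return sorted(findings)


# Example source that violates three of the rules.
example = '''
def kernel(ptr, n):
    i = 0
    while i < n:
        if i == 3:
            break
        i += 1
    return
'''
print(find_forbidden_syntax(example))
```

A check like this could run in a pre-commit hook or a unit test, so that violations surface on the host rather than as Ascend compile errors.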

Replacing while loops (Ascend)

Static upper bound (compile-time constant): use for i in range(N_ITERS) directly

Dynamic upper bound (runtime argument):

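import triton
import triton.language as tl

@triton.jit
def kernel(ptr, n_iters, TILE: tl.constexpr, MAX_ITERS: tl.constexpr):
    # MAX_ITERS is the compile-time upper bound; n_iters is the runtime trip count.
    for i in range(MAX_ITERS):
        if i < n_iters:  # skip iterations beyond the runtime bound
            offset = i * TILE + tl.arange(0, TILE)
            data = tl.load(ptr + offset)
            tl.store(ptr + offset, data * 2)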
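
To make the rewrite above concrete, here is a plain-Python sketch (no Triton required) comparing a while loop with the bounded for loop plus runtime guard. The function names are hypothetical, and lists stand in for device memory. It also shows one assumption the pattern relies on: the compile-time bound must be at least as large as the runtime trip count, otherwise iterations are silently dropped.

```python
def run_with_while(data, n_iters, tile):
    """Reference behaviour: the while loop that is not allowed in a kernel."""
    out = list(data)
    i = 0
    while i < n_iters:
        for j in range(i * tile, (i + 1) * tile):
            out[j] = out[j] * 2
        i += 1
    return out


def run_with_bounded_for(data, n_iters, tile, max_iters):
    """Rewrite: fixed trip count plus a runtime guard inside the loop body."""
    # If max_iters were smaller, the tail iterations would be skipped silently.
    assert n_iters <= max_iters, "max_iters must cover the runtime trip count"
    out = list(data)
    for i in range(max_iters):
        if i < n_iters:
            for j in range(i * tile, (i + 1) * tile):
                out[j] = out[j] * 2
    return out


data = list(range(8))
# Both versions double the first n_iters * tile elements and leave the rest.
assert run_with_while(data, 3, 2) == run_with_bounded_for(data, 3, 2, max_iters=4)
```

Choosing the bound is a trade-off: a larger value covers more inputs but adds guarded iterations that do no work.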

Slicing operations

Python slicing such as b[0] or b[i:j] is forbidden
Single element: tl.get_element(tensor, (index,))
Slice: tl.extract_slice(tensor, offsets, sizes, strides)
Insert: tl.insert_slice(full, sub, offsets, sizes, strides)
Do not call get_element on a tensor produced by tl.arange
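
As a rough mental model of the offsets/sizes/strides arguments, the following plain-Python sketch emulates a one-dimensional extract and insert on lists. It assumes the semantics follow the usual offset/size/stride convention (element i of the slice maps to position offset + i * stride), which the rules above do not spell out. The helpers extract_slice_1d and insert_slice_1d are hypothetical stand-ins, not the Triton Ascend functions.

```python
def extract_slice_1d(tensor, offset, size, stride):
    """Assumed semantics: result[i] = tensor[offset + i * stride]."""
    return [tensor[offset + i * stride] for i in range(size)]


def insert_slice_1d(full, sub, offset, size, stride):
    """Assumed semantics: a copy of full with full[offset + i * stride] = sub[i]."""
    assert len(sub) == size, "sub must have exactly `size` elements"
    out = list(full)
    for i in range(size):
        out[offset + i * stride] = sub[i]
    return out


b = [10, 11, 12, 13, 14, 15, 16, 17]

# Plays the role that the forbidden Python slice b[2:5] would play.
middle = extract_slice_1d(b, offset=2, size=3, stride=1)

# Writes three zeros back into the same window, leaving b itself unchanged.
patched = insert_slice_1d(b, [0, 0, 0], offset=2, size=3, stride=1)
```

In a real kernel the arguments would be tuples with one entry per dimension, as the signatures above suggest.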

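Other restrictions

tl.constexpr may be used only in kernel parameters; it is not available on the host side
Allocate output tensors with torch.empty / empty_like (avoids the initialization cost of zeros/ones)
Convert scalars only with scalar.to(type); tl.float16(scalar) is forbidden
BLOCK_SIZE must be less than 65536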
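
The BLOCK_SIZE bound can be enforced on the host before a kernel is launched. The sketch below is one possible way to do that in plain Python. The names BLOCK_SIZE_LIMIT, check_block_size, and num_blocks are hypothetical, and only the limit itself comes from the rule above.

```python
BLOCK_SIZE_LIMIT = 65536  # from the rule above: BLOCK_SIZE must be < 65536


def check_block_size(block_size: int) -> int:
    """Fail early on the host instead of at kernel compile time."""
    if not 0 < block_size < BLOCK_SIZE_LIMIT:
        raise ValueError(
            f"BLOCK_SIZE={block_size} must be positive and less than {BLOCK_SIZE_LIMIT}"
        )
    return block_size


def num_blocks(n_elements: int, block_size: int) -> int:
    """Ceiling division: number of blocks needed to cover n_elements."""
    return (n_elements + check_block_size(block_size) - 1) // block_size


# 100000 elements with a block size of 1024 need 98 blocks.
print(num_blocks(100_000, 1024))
```

Validating on the host keeps the error message close to the configuration that caused it.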

related-skills.json

same repository

triton-ascend-case-matmul-large-k.md

from "mindspore-ai/akg"

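Large-K matrix multiplication optimization for A[M, K] @ B[K, N] = C[M, N] where K >> M, N: for cases with small M/N but very large K (e.g. M=N=256, K=131072), Split-K partitions the K dimension for parallelism, and Workspace + Reduce replaces global synchronization, yielding a significant performance improvement.

2026-04-20, stars: 254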

triton-ascend-optimization.md

from "mindspore-ai/akg"

General performance optimization strategies for Triton Ascend: BLOCK_SIZE selection (1024-2048 for elementwise, must be <65536), grid configuration (use VEC_CORE_NUM / CUBE_CORE_NUM, 2D/3D grid for matmul / conv / reduce, 1D grid + inner loop for elementwise / pointwise), 256B alignment for memory transfers, autotune block-size patterns, fp16 / fp32 precision conversion. Bind via keywords like matmul, elementwise, reduce, block_size, grid, autotune, alignment, fp16, fp32, tile, interleaved-loop, cube-core, vec-core.

2026-04-19, stars: 254

search-workflow.md

from "mindspore-ai/akg"

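Generates optimized operators through a search-based workflow using adaptive_search or evolve. Runs in the background in silent mode, with progress monitored by polling.

2026-04-16, stars: 254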

triton-ascend-reduce.md

from "mindspore-ai/akg"

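Optimization guide for reduction operators and for composite operators that contain a reduction sub-step (such as normalization). Typical operators include: sum, mean, max, min, prod, argmax, argmin, cumsum, cumprod, softmax, logsoftmax, layernorm, rmsnorm, groupnorm, instancenorm, batchnorm, l1norm, l2norm, frobeniusnorm, var, std, average_pooling, sum_pooling, and others. Especially important: when the reduction dimension is not the last one (e.g. reducing dim=1 of shape=[B,F,D1,D2]), multi-dimensional indexing and two-stage reduction must be handled correctly. Includes an explanation of PyTorch normalized_shape multi-axis normalization semantics. Not applicable to pure elementwise operations or matrix multiplication. If the operator is a loss function (elementwise computation followed by a global reduction), choose the elementwise-reduce-fused guide instead.

2026-04-16, stars: 254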

cpu-basics.md

from "mindspore-ai/akg"

Core concepts of CPU C++ operators, standard structural patterns, KernelBench code conventions, and inline extension methods

2026-04-13, stars: 254

cpu-optimization-arm.md

from "mindspore-ai/akg"

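Performance optimization techniques for ARM CPU architectures, NEON SIMD vectorization, numerical stability, and debugging strategies

2026-04-13, stars: 254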

package.json

"author": "mindspore-ai"

"repository": "mindspore-ai/akg"
