Triton Ascend hard API restrictions and forbidden syntax. MUST-follow rules that apply to every kernel: forbidden control flow (return/break/continue/lambda/while), tensor slice/index restrictions, scalar conversion rules, BLOCK_SIZE upper bound. Violating any of these produces a compile or runtime error on Ascend.

2026-04-19

triton-ascend-optimization

Software developers

General performance-optimization strategies for Triton Ascend: BLOCK_SIZE selection (1024-2048 for elementwise, must be <65536), grid configuration (use VEC_CORE_NUM / CUBE_CORE_NUM, 2D/3D grid for matmul / conv / reduce, 1D grid + inner loop for elementwise / pointwise), 256B alignment for memory transfers, autotune block-size patterns, fp16 / fp32 precision conversion. Bind via keywords like matmul, elementwise, reduce, block_size, grid, autotune, alignment, fp16, fp32, tile, interleaved-loop, cube-core, vec-core.

2026-04-19

search-workflow

Software developers

Generates optimized operators through a search-based workflow, either adaptive_search or evolve. Runs in the background in silent mode, with progress monitored by polling.

2026-04-16

triton-ascend-reduce

Software developers

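Optimization guide for reduction (reduce) operators and composite operators that contain a reduction sub-step (such as normalization). Typical operators include: sum, mean, max, min, prod, argmax, argmin, cumsum, cumprod, softmax, logsoftmax, layernorm, rmsnorm, groupnorm, instancenorm, batchnorm, l1norm, l2norm, frobeniusnorm, var, std, average_pooling, sum_pooling, and others. Especially important: when the reduction dimension is not the last one (e.g. reducing over dim=1 with shape=[B,F,D1,D2]), multi-dimensional indexing and two-stage reduction must be handled correctly. Includes an explanation of PyTorch normalized_shape multi-axis normalization semantics. Not applicable to purely elementwise operations or matrix multiplication. If the operator is a loss function (elementwise computation followed by a global reduction), choose the elementwise-reduce-fused guide instead.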

2026-04-16

cpu-basics

Software developers

Core concepts of CPU C++ operators, standard structural patterns, KernelBench coding conventions, and inline extension methods.

2026-04-13

cpu-optimization-arm

Software developers

ARM CPU architecture performance-optimization techniques, NEON SIMD vectorization, numerical stability, and debugging strategies.

2026-04-13

cpu-optimization-x64

Software developers

x64 CPU architecture performance-optimization techniques, SIMD/AVX vectorization, numerical stability, and debugging strategies.

2026-04-13

Showing the top 8 skills collected out of 117 in this repository.

#002

mindspore-lite

9 skills 51 updated 2026-07-02

6.8% of the creator

skill

occupation

description

updated

onnx-model-conversion-and-deployment

Software developers

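End-to-end workflow for MindSpore Lite cloud-side inference on the Ascend backend: offline conversion (ONNX → MindIR) and inference deployment. Covers conversion strategies for fixed shapes, tiered dynamic shapes, and fully dynamic shapes, plus notes on MindIR inference validation and deployment.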

2026-07-02

open-source-model-migration

Software developers

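Adapts an open-source model to the MindSpore Lite deployment pipeline: exporting ONNX split by network structure, validating inference with ONNX Runtime, converting ONNX→MindIR, implementing MindSpore Lite inference, and delivering documentation and FAQs. Invoke when the user wants to migrate an open-source model to MSLite deployment.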

2026-07-02

performance-optimization

Software developers

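Overall guide to model performance optimization for MindSpore Lite (Ascend). Invoke for baselining/profiling, fused-operator rewrites, zero-copy inference, PTQ int8 quantization, accuracy alignment, and archiving. This document is an overview and index; detailed strategies are in references/.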

2026-07-02

lite-build

Software developers

Build configuration, CMake options, cross-compilation and packaging. Use when building MindSpore Lite, configuring CMake, cross-compiling for ARM/iOS/MCU, packaging release archives, or troubleshooting build errors.

2026-07-02

lite-converter

Software developers

Model conversion pipeline, parser development, optimization passes and quantization. Use when converting models to .ms, writing parser code, implementing optimizer passes, or configuring quantization.

2026-07-02

lite-debug-test

Software quality assurance analysts and testers

Debugging, unit testing, benchmarking and performance analysis. Use when running gtest, benchmark tools, profiling latency or accuracy, diagnosing operator precision issues, delegate fallback, or memory leaks.

2026-07-02

lite-device-side-infer

Software developers

Device-side inference with LiteRT, NNACL and hardware delegates. Use for mobile/IoT inference, Android/iOS integration, NPU/GPU/CoreML delegates, Micro codegen for MCU, on-device training, or C/C++/Java/Python API usage with .ms models.

2026-07-02

lite-kernel-dev

Software developers

Operator and kernel development, NNACL, delegates, custom kernel registration. Use when adding operators, implementing NNACL kernels, writing delegate adapters (NPU/CoreML/Ascend), registering custom kernels, or modifying operator schema.

2026-07-02

Showing the top 8 skills collected out of 9 in this repository.

#003

hyper-parallel

7 skills 53 updated 2026-06-29

5.3% of the creator

skill

occupation

description

updated

platform-dev

Software developers

HyperParallel platform abstraction layer development. Use when adding new platform APIs, implementing cross-platform features (FSDP/HSDP/Pipeline/Activation Checkpoint), creating DTensorBase extensions, or modifying collective operations. Covers both PyTorch and MindSpore backends.

2026-06-29

autogit

Software developers

GitCode fork workflow automation. Use this skill whenever the user wants to commit code, push, create or append to a Pull Request, view PR status, squash commits, regenerate a PR description, or run lint checks against a GitCode `origin` (fork) + `upstream` repository. Supports both Chinese and English natural-language triggers (e.g. "帮我提交", "create PR", "看下 PR 状态") and slash-command shortcuts (`/commit`, `/create-pr`, etc.). The full trigger → subcommand mapping lives in the "When to Activate" section.

2026-06-22

dist-op-analysis

Software developers

Distributed operator analysis. Analyzes operator interfaces provided by the user and outputs a standardized implementation plan. Requires human confirmation before development begins.

2026-06-21

dist-op-dev

Software developers

Distributed operator development. Reads a confirmed plan file, implements operator code and tests, and runs until all executable tests pass. Goal mode — no step-by-step confirmation needed.

2026-06-21

code-review

Software quality assurance analysts and testers

Review HyperParallel code changes for distributed correctness, stream synchronization, memory safety, cross-platform consistency, and code quality. Use when reviewing PRs, code changes, or when the user mentions "review", "code review", or "check this".

2026-06-02

gate-doctor

Software quality assurance analysts and testers

Drive a red MindSpore-family GitCode PR gate to green end-to-end — read state, post /check-pr → /retest, classify every failure (root-cause fix in production code for PR-INDUCED, /retest only for random flakes, REVERT-BEFORE-MERGE temp comment-out for confirmed unrelated sticky flakes), then loop until both pr-check-pass AND ci-pipeline-passed labels appear. Emits one structured final report on exit. Trigger on: 触发门禁 / 重跑门禁 / 门禁失败 / PR 流水线挂了 / 修一下流水线 / autofix 门禁 / 把 PR 修绿 / 一直跑到通过 / 看下 #N 现在情况 / ut/st/Ascend 用例失败 / Smoke_Ascend 挂了 / 门禁随机挂 / gate is failing / CI is flaky / random retest pass / /retest / /check-pr, or any prompt naming a MindSpore GitCode PR alongside a request to investigate or fix CI. Default PR repo is mindspore/hyper-parallel; pass full URL or `mindspore/mindspore#N` for cross-repo.

2026-06-02

parallel-strategy-analyzer

Software developers

Analyze model architecture and hardware constraints to recommend optimal parallel strategy combinations (DP/FSDP/TP/PP/EP/CP) with memory, communication, compute, and pipeline bubble estimation.

2026-06-02

Showing 3 of 3 repositories

All repositories are shown