Triton Ascend hard API restrictions and forbidden syntax. MUST-follow rules that apply to every kernel: forbidden control flow (return/break/continue/lambda/while), tensor slice/index restrictions, scalar conversion rules, BLOCK_SIZE upper bound. Violating any of these produces a compile or runtime error on Ascend.

2026-04-19

triton-ascend-optimization

Software developer

General performance optimization strategies for Triton Ascend: BLOCK_SIZE selection (1024-2048 for elementwise, must be <65536), grid configuration (use VEC_CORE_NUM / CUBE_CORE_NUM, 2D/3D grid for matmul / conv / reduce, 1D grid + inner loop for elementwise / pointwise), 256B alignment for memory transfers, autotune block-size patterns, fp16 / fp32 precision conversion. Bind via keywords like matmul, elementwise, reduce, block_size, grid, autotune, alignment, fp16, fp32, tile, interleaved-loop, cube-core, vec-core.

2026-04-19

search-workflow

Software developer

Generates optimized operators through a search-based workflow using adaptive_search or evolve. Runs in the background in silent mode and polls to monitor progress.

2026-04-16

triton-ascend-reduce

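Software developer

Optimization guide for reduction (reduce) operators and for composite operators that contain a reduction sub-step (such as normalization). Typical operators include: sum, mean, max, min, prod, argmax, argmin, cumsum, cumprod, softmax, logsoftmax, layernorm, rmsnorm, groupnorm, instancenorm, batchnorm, l1norm, l2norm, frobeniusnorm, var, std, average_pooling, sum_pooling, and others. Especially important: when the reduction dimension is not the last dimension (e.g. reducing over dim=1 with shape=[B,F,D1,D2]), multi-dimensional indexing and two-stage reduction must be handled correctly. Includes an explanation of PyTorch normalized_shape multi-axis normalization semantics. Not applicable to pure elementwise operations or matrix multiplication. If the operator is a loss function (elementwise computation followed by a global reduction), choose the elementwise-reduce-fused guide instead.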

2026-04-16

cpu-basics

Software developer

Core concepts of CPU C++ operators, standard structural patterns, KernelBench code conventions, and inline extension methods

2026-04-13

cpu-optimization-arm

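Software developer

Performance optimization techniques for the ARM CPU architecture, NEON SIMD vectorization, numerical stability, and debugging strategies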

2026-04-13

cpu-optimization-x64

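Software developer

Performance optimization techniques for the x64 CPU architecture, SIMD/AVX vectorization, numerical stability, and debugging strategies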

2026-04-13

Showing the top 8 of 117 skills collected from this repository.

#002

mindspore-lite

9 skills · 51 · Updated 2026-07-02

6.8% of this creator's skills

skill

Job category

Description

Updated

onnx-model-conversion-and-deployment

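Software developer

End-to-end workflow for offline conversion (ONNX → MindIR) and inference deployment on the Ascend backend for MindSpore Lite cloud-side inference. Covers conversion strategies for fixed shapes, tiered dynamic shapes, and fully dynamic shapes, as well as MindIR inference validation and deployment considerations.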

2026-07-02

open-source-model-migration

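Software developer

Adapts an open-source model to the MindSpore Lite deployment pipeline: exporting ONNX split by network structure, validating inference with ONNX Runtime, converting ONNX → MindIR, implementing MindSpore Lite inference, and delivering documentation and an FAQ. Invoke when the user wants to migrate an open-source model to MSLite deployment.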

2026-07-02

performance-optimization

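Software developer

Comprehensive guide to model performance optimization for MindSpore Lite (Ascend). Invoke for baselining/profiling, rewriting with fused operators, zero-copy inference, PTQ int8 quantization, accuracy alignment, and archiving. This document is an overview and index; detailed strategies are in references/.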

2026-07-02

lite-build

Software developer

Build configuration, CMake options, cross-compilation and packaging. Use when building MindSpore Lite, configuring CMake, cross-compiling for ARM/iOS/MCU, packaging release archives, or troubleshooting build errors.

2026-07-02

lite-converter

Software developer

Model conversion pipeline, parser development, optimization passes and quantization. Use when converting models to .ms, writing parser code, implementing optimizer passes, or configuring quantization.

2026-07-02

lite-debug-test

Software quality assurance analyst / tester

Debugging, unit testing, benchmarking and performance analysis. Use when running gtest, benchmark tools, profiling latency or accuracy, diagnosing operator precision issues, delegate fallback, or memory leaks.

2026-07-02

lite-device-side-infer

Software developer

Device-side inference with LiteRT, NNACL and hardware delegates. Use for mobile/IoT inference, Android/iOS integration, NPU/GPU/CoreML delegates, Micro codegen for MCU, on-device training, or C/C++/Java/Python API usage with .ms models.

2026-07-02

lite-kernel-dev

Software developer

Operator and kernel development, NNACL, delegates, custom kernel registration. Use when adding operators, implementing NNACL kernels, writing delegate adapters (NPU/CoreML/Ascend), registering custom kernels, or modifying operator schema.

2026-07-02

Showing the top 8 of 9 skills collected from this repository.

#003

hyper-parallel

7 skills · 53 · Updated 2026-06-29

5.3% of this creator's skills

skill

Job category

Description

Updated

platform-dev

Software developer

HyperParallel platform abstraction layer development. Use when adding new platform APIs, implementing cross-platform features (FSDP/HSDP/Pipeline/Activation Checkpoint), creating DTensorBase extensions, or modifying collective operations. Covers both PyTorch and MindSpore backends.

2026-06-29

autogit

Software developer

GitCode fork workflow automation. Use this skill whenever the user wants to commit code, push, create or append to a Pull Request, view PR status, squash commits, regenerate a PR description, or run lint checks against a GitCode `origin` (fork) + `upstream` repository. Supports both Chinese and English natural-language triggers (e.g. "帮我提交", "create PR", "看下 PR 状态") and slash-command shortcuts (`/commit`, `/create-pr`, etc.). The full trigger → subcommand mapping lives in the "When to Activate" section.

2026-06-22

dist-op-analysis

Software developer

Distributed operator analysis. Analyzes operator interfaces provided by the user and outputs a standardized implementation plan. Requires human confirmation before development begins.

2026-06-21

dist-op-dev

Software developer

Distributed operator development. Reads a confirmed plan file, implements operator code and tests, and runs until all executable tests pass. Goal mode — no step-by-step confirmation needed.

2026-06-21

code-review

Software quality assurance analyst / tester

Review HyperParallel code changes for distributed correctness, stream synchronization, memory safety, cross-platform consistency, and code quality. Use when reviewing PRs, code changes, or when the user mentions "review", "code review", or "check this".

2026-06-02

gate-doctor

Software quality assurance analyst / tester

Drive a red MindSpore-family GitCode PR gate to green end-to-end — read state, post /check-pr → /retest, classify every failure (root-cause fix in production code for PR-INDUCED, /retest only for random flakes, REVERT-BEFORE-MERGE temp comment-out for confirmed unrelated sticky flakes), then loop until both pr-check-pass AND ci-pipeline-passed labels appear. Emits one structured final report on exit. Trigger on: 触发门禁 / 重跑门禁 / 门禁失败 / PR 流水线挂了 / 修一下流水线 / autofix 门禁 / 把 PR 修绿 / 一直跑到通过 / 看下 #N 现在情况 / ut/st/Ascend 用例失败 / Smoke_Ascend 挂了 / 门禁随机挂 / gate is failing / CI is flaky / random retest pass / /retest / /check-pr, or any prompt naming a MindSpore GitCode PR alongside a request to investigate or fix CI. Default PR repo is mindspore/hyper-parallel; pass full URL or `mindspore/mindspore#N` for cross-repo.

2026-06-02

parallel-strategy-analyzer

Software developer

Analyze model architecture and hardware constraints to recommend optimal parallel strategy combinations (DP/FSDP/TP/PP/EP/CP) with memory, communication, compute, and pipeline bubble estimation.

2026-06-02

Showing 3 of 3 repositories

All repositories are shown.