Skip to main content
Exécutez n'importe quel Skill dans Manus
en un clic
$pwd:
mindspore-ai
GitHub creator profile

mindspore-ai

Repository-level view of 131 collected skills across 3 GitHub repositories, including approximate occupation coverage.

skills collected
131
repositories
3
occupation fields
3
updated
2026-04-27
repository explorer

Repositories and representative skills

#001
akg
117 skills25448updated 2026-04-20
89% of creator
triton-ascend-case-matmul-large-k
Développeurs de logiciels

矩阵乘法矩阵乘法 A[M, K] @ B[K, N] = C[M, N]中,大K维度矩阵乘法(K>>M,N)优化:针对M/N较小但K极大(如M=N=256,K=131072)的场景,Split-K切分K维度并行化、Workspace+Reduce替代全局同步,实现显著性能提升

2026-04-20
triton-ascend-api-rules
Développeurs de logiciels

Triton Ascend hard API restrictions and forbidden syntax. MUST-follow rules that apply to every kernel: forbidden control flow (return/break/continue/lambda/while), tensor slice/index restrictions, scalar conversion rules, BLOCK_SIZE upper bound. Violating any of these produces a compile or runtime error on Ascend.

2026-04-19
triton-ascend-optimization
Développeurs de logiciels

Triton Ascend 性能优化通用策略: BLOCK_SIZE 选择 (1024-2048 for elementwise, must be <65536), grid configuration (use VEC_CORE_NUM / CUBE_CORE_NUM, 2D/3D grid for matmul / conv / reduce, 1D grid + inner loop for elementwise / pointwise), 256B alignment for memory transfers, autotune block-size patterns, fp16 / fp32 precision conversion. Bind via keywords like matmul, elementwise, reduce, block_size, grid, autotune, alignment, fp16, fp32, tile, interleaved-loop, cube-core, vec-core.

2026-04-19
search-workflow
Développeurs de logiciels

通过 adaptive_search 或 evolve 搜索式 workflow 生成优化算子。 后台 silent mode 执行,轮询监控进度。

2026-04-16
triton-ascend-reduce
Développeurs de logiciels

适用于归约(reduce)类算子和含归约子步骤的复合算子(如归一化)的优化指南。典型算子包括:sum, mean, max, min, prod, argmax, argmin, cumsum, cumprod, softmax, logsoftmax, layernorm, rmsnorm, groupnorm, instancenorm, batchnorm, l1norm, l2norm, frobeniusnorm, var, std, average_pooling, sum_pooling 等。特别重要:当归约维度不是最后一维(如 dim=1 归约 shape=[B,F,D1,D2]),需要正确处理多维索引和两阶段归约。包含 PyTorch normalized_shape 多轴归一化语义说明。不适用于纯逐元素运算或矩阵乘法。如果算子是损失函数(先逐元素计算再全局归约),应选择 elementwise-reduce-fused 指南。

2026-04-16
cpu-basics
Développeurs de logiciels

CPU C++ 算子核心概念、标准结构模式、KernelBench 代码规范和内嵌扩展方法

2026-04-13
cpu-optimization-arm
Développeurs de logiciels

ARM CPU 架构性能优化技巧、NEON SIMD 向量化、数值稳定性和调试策略

2026-04-13
cpu-optimization-x64
Développeurs de logiciels

x64 CPU 架构性能优化技巧、SIMD/AVX 向量化、数值稳定性和调试策略

2026-04-13
Showing top 8 of 117 collected skills in this repository.
#002
mindspore-lite
8 skills51updated 2026-04-16
6.1% of creator
lite-cloud-side-infer
Développeurs de logiciels

Cloud-side inference with ExtendRT and Ascend backends. Use for server-side inference, Ascend 310/910 deployment, ModelParallelRunner for concurrent serving, ModelGroup for weight sharing, distributed inference, or .mindir format loading.

2026-04-16
lite-converter
Développeurs de logiciels

Model conversion pipeline, parser development, optimization passes and quantization. Use when converting models to .ms, writing parser code, implementing optimizer passes, or configuring quantization.

2026-04-16
open-model-convert-deploy
Développeurs de logiciels

实现开源模型从PyTorch→ONNX→MindIR→MindSpore Lite的端到端导出/验证/部署/性能评测。用户要求模型拆分导出、精度对齐、MindIR转换或部署工具链时调用。

2026-04-14
lite-build
Développeurs de logiciels

Build configuration, CMake options, cross-compilation and packaging. Use when building MindSpore Lite, configuring CMake, cross-compiling for ARM/iOS/MCU, packaging release archives, or troubleshooting build errors.

2026-04-02
lite-code-quality
Analystes en assurance qualité des logiciels et testeurs

Code formatting, naming conventions, security checks and CI verification. Use when running clang-format, checking code style, writing secure code for model parsing, reviewing code quality, or configuring CI/Jenkins pipelines.

2026-04-02
lite-debug-test
Analystes en assurance qualité des logiciels et testeurs

Debugging, unit testing, benchmarking and performance analysis. Use when running gtest, benchmark tools, profiling latency or accuracy, diagnosing operator precision issues, delegate fallback, or memory leaks.

2026-04-02
lite-device-side-infer
Développeurs de logiciels

Device-side inference with LiteRT, NNACL and hardware delegates. Use for mobile/IoT inference, Android/iOS integration, NPU/GPU/CoreML delegates, Micro codegen for MCU, on-device training, or C/C++/Java/Python API usage with .ms models.

2026-04-02
lite-kernel-dev
Développeurs de logiciels

Operator and kernel development, NNACL, delegates, custom kernel registration. Use when adding operators, implementing NNACL kernels, writing delegate adapters (NPU/CoreML/Ascend), registering custom kernels, or modifying operator schema.

2026-04-02
#003
hyper-parallel
6 skills12updated 2026-04-27
4.6% of creator
autogit
Développeurs de logiciels

GitCode fork workflow automation. Use this skill whenever the user wants to commit code, push, create or append to a Pull Request, view PR status, squash commits, regenerate a PR description, or run lint checks against a GitCode `origin` (fork) + `upstream` repository. Supports both Chinese and English natural-language triggers (e.g. "帮我提交", "create PR", "看下 PR 状态") and slash-command shortcuts (`/commit`, `/create-pr`, etc.). The full trigger → subcommand mapping lives in the "When to Activate" section.

2026-04-27
code-review
Analystes en assurance qualité des logiciels et testeurs

Review HyperParallel code changes for distributed correctness, stream synchronization, memory safety, cross-platform consistency, and code quality. Use when reviewing PRs, code changes, or when the user mentions "review", "code review", or "check this".

2026-04-27
platform-dev
Développeurs de logiciels

HyperParallel platform abstraction layer development. Use when adding new platform APIs, implementing cross-platform features (FSDP/HSDP/Pipeline/Activation Checkpoint), creating DTensorBase extensions, or modifying collective operations. Covers both PyTorch and MindSpore backends.

2026-04-27
dist-op-dev
Développeurs de logiciels

Execution-oriented workflow for HyperParallel distributed operator development. Analyzes the operator, implements or updates code and tests.

2026-04-17
dist-op-analysis
Développeurs de logiciels

Internal analysis tool for distributed operator development — provides interface specs, Primitive/ATen mappings and HyperParallel layout derivation logic. Used by dist-op-dev workflow. NOT for direct user calls.

2026-04-01
parallel-strategy-analyzer
Scientifiques des données

Analyze model architecture and hardware constraints to recommend optimal parallel strategy combinations (DP/FSDP/TP/PP/EP/CP) with memory, communication, compute, and pipeline bubble estimation.

2026-03-28
3 sur 3 depots affiches
Tous les depots sont affiches