GitHub creator profile

mindspore-ai

Repository-level view of 131 collected skills across 3 GitHub repositories, including approximate occupation coverage.

skills collected: 131
repositories: 3
occupation fields: 3
updated: 2026-04-27
repository explorer

Repositories and representative skills

#001
akg
117 skills · 25448 · updated 2026-04-20
89% of creator's collected skills
triton-ascend-case-matmul-large-k
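Software Developer

Large-K matrix multiplication A[M, K] @ B[K, N] = C[M, N] (K >> M, N): optimization for cases where M and N are small but K is very large (e.g. M = N = 256, K = 131072). Split-K partitions the K dimension for parallel execution, and a workspace + reduce step replaces global synchronization, yielding a significant performance improvement.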

2026-04-20
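The Split-K idea can be illustrated with a plain-Python sketch. This is not Triton and not the skill's actual kernel; the function name `matmul_split_k` and its `split_k` parameter are hypothetical. K is cut into `split_k` slices, each slice writes a partial M x N product into its own workspace entry (on device these slices would run in parallel), and a separate reduce pass sums the workspace.

```python
def matmul_split_k(a, b, split_k):
    """Split-K sketch: C[M, N] = A[M, K] @ B[K, N] via per-slice partials + reduce."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions of A and B must match")
    n = len(b[0])
    chunk = -(-k // split_k)  # ceil(K / split_k) so every K index lands in some slice

    # Stage 1: each K-slice produces an independent partial M x N result in the workspace.
    workspace = []
    for s in range(split_k):
        k_lo, k_hi = s * chunk, min((s + 1) * chunk, k)
        partial = [[sum(a[i][kk] * b[kk][j] for kk in range(k_lo, k_hi))
                    for j in range(n)] for i in range(m)]
        workspace.append(partial)

    # Stage 2: reduce the partials instead of synchronizing globally inside one kernel.
    return [[sum(p[i][j] for p in workspace) for j in range(n)] for i in range(m)]
```

For example, `matmul_split_k([[1, 2, 3, 4]], [[1], [2], [3], [4]], 2)` computes the partials 5 and 25 and returns `[[30]]`.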
triton-ascend-api-rules
Software Developer

Triton Ascend hard API restrictions and forbidden syntax. MUST-follow rules that apply to every kernel: forbidden control flow (return/break/continue/lambda/while), tensor slice/index restrictions, scalar conversion rules, BLOCK_SIZE upper bound. Violating any of these produces a compile or runtime error on Ascend.

2026-04-19
triton-ascend-optimization
Software Developer

General Triton Ascend performance-optimization strategies: BLOCK_SIZE selection (1024-2048 for elementwise, must be < 65536), grid configuration (use VEC_CORE_NUM / CUBE_CORE_NUM, 2D/3D grid for matmul / conv / reduce, 1D grid + inner loop for elementwise / pointwise), 256B alignment for memory transfers, autotune block-size patterns, fp16 / fp32 precision conversion. Activated by keywords such as matmul, elementwise, reduce, block_size, grid, autotune, alignment, fp16, fp32, tile, interleaved-loop, cube-core, vec-core.

2026-04-19
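A minimal plain-Python sketch of the elementwise launch arithmetic described above. Assumptions: `VEC_CORE_NUM = 40` is a placeholder (not a real Ascend figure), and all helper names are hypothetical. The sketch keeps BLOCK_SIZE below 65536, sizes a 1D grid to at most one program per vector core, computes how many blocks each program's interleaved inner loop visits, and rounds transfer sizes up to 256 B.

```python
VEC_CORE_NUM = 40        # hypothetical placeholder; use the real vector-core count of the target device
MAX_BLOCK_SIZE = 65536   # BLOCK_SIZE must stay strictly below this bound
ALIGN_BYTES = 256        # alignment target for memory transfers


def align_up(nbytes, alignment=ALIGN_BYTES):
    """Round a transfer size up to the next multiple of `alignment` bytes."""
    return -(-nbytes // alignment) * alignment


def elementwise_launch_config(n_elements, block_size=1024, core_num=VEC_CORE_NUM):
    """Return (grid, blocks_per_program) for a 1D grid with an inner loop."""
    if not 0 < block_size < MAX_BLOCK_SIZE:
        raise ValueError("BLOCK_SIZE must be in (0, 65536)")
    num_blocks = -(-n_elements // block_size) if n_elements > 0 else 0  # ceil division
    if num_blocks == 0:
        return 0, 0
    grid = min(core_num, num_blocks)                  # at most one program per vector core
    blocks_per_program = -(-num_blocks // grid)       # inner-loop trip count (upper bound)
    return grid, blocks_per_program


def blocks_for_program(pid, grid, num_blocks):
    """Interleaved inner loop: program `pid` handles blocks pid, pid + grid, pid + 2*grid, ..."""
    return list(range(pid, num_blocks, grid))
```

With one million elements and BLOCK_SIZE 2048 there are 489 blocks, so `elementwise_launch_config(1_000_000, 2048)` returns `(40, 13)`: 40 programs, each looping over at most 13 interleaved blocks.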
search-workflow
Software Developer

Generates optimized operators through the search-based adaptive_search or evolve workflows. Runs in the background in silent mode and monitors progress by polling.

2026-04-16
triton-ascend-reduce
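Software Developer

Optimization guide for reduction (reduce) operators and for composite operators that contain a reduction sub-step (such as normalization). Typical operators include: sum, mean, max, min, prod, argmax, argmin, cumsum, cumprod, softmax, logsoftmax, layernorm, rmsnorm, groupnorm, instancenorm, batchnorm, l1norm, l2norm, frobeniusnorm, var, std, average_pooling, sum_pooling, etc. Especially important: when the reduction dimension is not the last one (e.g. reducing dim=1 of shape=[B,F,D1,D2]), multi-dimensional indexing and two-stage reduction must be handled correctly. Includes an explanation of PyTorch normalized_shape multi-axis normalization semantics. Not applicable to purely elementwise operations or matrix multiplication. If the operator is a loss function (elementwise computation followed by a global reduction), choose the elementwise-reduce-fused guide instead.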

2026-04-16
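The non-last-dimension case comes down to index arithmetic. Below is a plain-Python sketch (a hypothetical helper, not the guide's Triton code) for reducing dim=1 of a contiguous [B, F, D1, D2] tensor. Element (b, f, d1, d2) lives at offset b·F·D1·D2 + f·D1·D2 + (d1·D2 + d2), so the reduced axis has stride D1·D2 rather than 1. Stage 1 sums chunks of F independently (as separate programs might); stage 2 adds the partials.

```python
def sum_dim1_two_stage(x_flat, shape, chunk):
    """Sum a contiguous [B, F, D1, D2] tensor over dim=1; returns a flat [B, D1, D2] list."""
    B, F, D1, D2 = shape
    inner = D1 * D2                  # stride of dim 1 in a contiguous (row-major) layout
    num_chunks = -(-F // chunk)      # ceil(F / chunk)

    # Stage 1: each chunk of F writes a partial sum for every (b, d1, d2) position.
    partials = []
    for c in range(num_chunks):
        f_lo, f_hi = c * chunk, min((c + 1) * chunk, F)
        partials.append([
            sum(x_flat[b * F * inner + f * inner + r] for f in range(f_lo, f_hi))
            for b in range(B) for r in range(inner)
        ])

    # Stage 2: combine the partial sums into the final result.
    return [sum(p[i] for p in partials) for i in range(B * inner)]
```

With `x_flat = list(range(24))` and `shape = (2, 3, 2, 2)`, any chunk size yields `[12, 15, 18, 21, 48, 51, 54, 57]`.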
cpu-basics
Software Developer

Core concepts of CPU C++ operators, standard structural patterns, KernelBench coding conventions, and inline extension methods.

2026-04-13
cpu-optimization-arm
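Software Developer

Performance-optimization techniques for the ARM CPU architecture, NEON SIMD vectorization, numerical stability, and debugging strategies.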

2026-04-13
cpu-optimization-x64
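Software Developer

Performance-optimization techniques for the x64 CPU architecture, SIMD/AVX vectorization, numerical stability, and debugging strategies.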

2026-04-13
Showing top 8 of 117 collected skills in this repository.
#002
mindspore-lite
8 skills · 51 · updated 2026-04-16
6.1% of creator's collected skills
lite-cloud-side-infer
Software Developer

Cloud-side inference with ExtendRT and Ascend backends. Use for server-side inference, Ascend 310/910 deployment, ModelParallelRunner for concurrent serving, ModelGroup for weight sharing, distributed inference, or .mindir format loading.

2026-04-16
lite-converter
Software Developer

Model conversion pipeline, parser development, optimization passes and quantization. Use when converting models to .ms, writing parser code, implementing optimizer passes, or configuring quantization.

2026-04-16
open-model-convert-deploy
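Software Developer

End-to-end export, validation, deployment, and performance benchmarking of open-source models along the PyTorch → ONNX → MindIR → MindSpore Lite path. Invoke when the user asks for split model export, accuracy alignment, MindIR conversion, or the deployment toolchain.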

2026-04-14
lite-build
Software Developer

Build configuration, CMake options, cross-compilation and packaging. Use when building MindSpore Lite, configuring CMake, cross-compiling for ARM/iOS/MCU, packaging release archives, or troubleshooting build errors.

2026-04-02
lite-code-quality
Software Quality Assurance Analyst / Tester

Code formatting, naming conventions, security checks and CI verification. Use when running clang-format, checking code style, writing secure code for model parsing, reviewing code quality, or configuring CI/Jenkins pipelines.

2026-04-02
lite-debug-test
Software Quality Assurance Analyst / Tester

Debugging, unit testing, benchmarking and performance analysis. Use when running gtest, benchmark tools, profiling latency or accuracy, diagnosing operator precision issues, delegate fallback, or memory leaks.

2026-04-02
lite-device-side-infer
Software Developer

Device-side inference with LiteRT, NNACL and hardware delegates. Use for mobile/IoT inference, Android/iOS integration, NPU/GPU/CoreML delegates, Micro codegen for MCU, on-device training, or C/C++/Java/Python API usage with .ms models.

2026-04-02
lite-kernel-dev
Software Developer

Operator and kernel development, NNACL, delegates, custom kernel registration. Use when adding operators, implementing NNACL kernels, writing delegate adapters (NPU/CoreML/Ascend), registering custom kernels, or modifying operator schema.

2026-04-02
#003
hyper-parallel
6 skills · 12 · updated 2026-04-27
4.6% of creator's collected skills
autogit
Software Developer

GitCode fork workflow automation. Use this skill whenever the user wants to commit code, push, create or append to a Pull Request, view PR status, squash commits, regenerate a PR description, or run lint checks against a GitCode `origin` (fork) + `upstream` repository. Supports both Chinese and English natural-language triggers (e.g. "帮我提交" [commit for me], "create PR", "看下 PR 状态" [check the PR status]) and slash-command shortcuts (`/commit`, `/create-pr`, etc.). The full trigger → subcommand mapping lives in the "When to Activate" section.

2026-04-27
code-review
Software Quality Assurance Analyst / Tester

Review HyperParallel code changes for distributed correctness, stream synchronization, memory safety, cross-platform consistency, and code quality. Use when reviewing PRs, code changes, or when the user mentions "review", "code review", or "check this".

2026-04-27
platform-dev
Software Developer

HyperParallel platform abstraction layer development. Use when adding new platform APIs, implementing cross-platform features (FSDP/HSDP/Pipeline/Activation Checkpoint), creating DTensorBase extensions, or modifying collective operations. Covers both PyTorch and MindSpore backends.

2026-04-27
dist-op-dev
Software Developer

Execution-oriented workflow for HyperParallel distributed operator development. Analyzes the operator, implements or updates code and tests.

2026-04-17
dist-op-analysis
Software Developer

Internal analysis tool for distributed operator development — provides interface specs, Primitive/ATen mappings and HyperParallel layout derivation logic. Used by dist-op-dev workflow. NOT for direct user calls.

2026-04-01
parallel-strategy-analyzer
Data Scientist

Analyze model architecture and hardware constraints to recommend optimal parallel strategy combinations (DP/FSDP/TP/PP/EP/CP) with memory, communication, compute, and pipeline bubble estimation.

2026-03-28
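The bubble and memory terms can be estimated with standard back-of-envelope formulas. The sketch below is an illustrative assumption, not the analyzer's actual cost model, and the function names are hypothetical. For a GPipe or 1F1B schedule with p stages and m micro-batches, the idle (bubble) fraction of total time is (p - 1) / (m + p - 1). For model-state memory it assumes TP and PP split the states evenly, plain DP replicates them, and FSDP additionally shards them across the DP group. A common figure is 16 bytes per parameter for mixed-precision Adam (fp16 params and grads, fp32 master weights and two moments).

```python
def pipeline_bubble_fraction(num_stages, num_microbatches):
    """Idle fraction of a GPipe/1F1B schedule: (p - 1) / (m + p - 1)."""
    p, m = num_stages, num_microbatches
    if p < 1 or m < 1:
        raise ValueError("num_stages and num_microbatches must be >= 1")
    return (p - 1) / (m + p - 1)


def model_state_bytes_per_device(num_params, bytes_per_param, dp=1, tp=1, pp=1, fsdp=False):
    """Rough per-device model-state memory (params + grads + optimizer states).

    Assumes TP and PP split the states evenly; plain DP replicates them on every
    data-parallel rank, while FSDP also shards them across the DP group.
    Activations and communication buffers are ignored.
    """
    shards = tp * pp * (dp if fsdp else 1)
    return num_params * bytes_per_param / shards
```

With 4 stages and 8 micro-batches the bubble is 3/11 (about 27%), and 1B parameters at 16 bytes each need 16 GB per device under plain 8-way DP but 2 GB under 8-way FSDP.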
Showing 3 of 3 repositories.