hw-native-sys
GitHub creator profile

Browse 58 collected skills from 5 GitHub repositories, grouped by repository.

Collected skills: 58
Repositories: 5
Updated: 2026-06-26
Browse by repository

Repositories and representative skills

incore-profiling
Uncategorized

Profile PyPTO kernels in-core with the Ascend msprof op-simulator — cycle-accurate per-kernel traces. Use when the user wants to profile a built case, inspect kernel timing or instruction streams, or generate MindStudio Insight traces.

2026-06-24
cube-tile-tuning
Uncategorized

Tune cube/matmul tile sizes (row tile, N fragment, K fragment) for a PyPTO kernel — analytic hints, an on-chip buffer constraint model, and an empirical device sweep. Use when optimizing a matmul/cube's throughput, sizing the row / N / K tiles, resolving Mat (L1) / L0C / UB buffer overflows, or trading one tile dim for another.

2026-06-23
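The on-chip buffer constraint model can be thought of as a per-buffer footprint check. The sketch below is a toy model and does not reproduce the skill's actual formulas. It assumes fp16 inputs, an fp32 accumulator in L0C, an fp16 output staged in UB, and no double buffering. `TileConfig`, `BufferCaps`, and the capacity values are hypothetical placeholders; fill them in from the target SoC.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TileConfig:
    row: int  # row (M) tile
    n: int    # N fragment
    k: int    # K fragment


@dataclass(frozen=True)
class BufferCaps:
    # Capacities in bytes. Supply real per-SoC values; none are assumed here.
    mat_l1: int
    l0c: int
    ub: int


def footprint(cfg: TileConfig, in_bytes: int = 2, acc_bytes: int = 4,
              out_bytes: int = 2) -> Dict[str, int]:
    # Toy model: Mat (L1) holds one A fragment (row x k) and one B fragment
    # (k x n), L0C holds the accumulator (row x n), UB holds the output (row x n).
    return {
        "mat_l1": (cfg.row * cfg.k + cfg.k * cfg.n) * in_bytes,
        "l0c": cfg.row * cfg.n * acc_bytes,
        "ub": cfg.row * cfg.n * out_bytes,
    }


def overflows(cfg: TileConfig, caps: BufferCaps) -> List[str]:
    # Names of the buffers this config would overflow, in a fixed order.
    limits = {"mat_l1": caps.mat_l1, "l0c": caps.l0c, "ub": caps.ub}
    return [name for name, used in footprint(cfg).items() if used > limits[name]]


def feasible(rows, ns, ks, caps: BufferCaps) -> List[TileConfig]:
    # Enumerate candidates and keep those that fit, largest output tile first.
    fits = [TileConfig(r, n, k) for r in rows for n in ns for k in ks
            if not overflows(TileConfig(r, n, k), caps)]
    return sorted(fits, key=lambda c: (c.row * c.n, c.k), reverse=True)
```

An empirical device sweep would then time only the configurations that pass this filter. Trading one tile dimension for another corresponds to moving between entries that fit the same budgets.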
bisect-precision
Software Quality Assurance Analysts and Testers

Locate which pypto commit introduced a precision regression. Only pypto and its corresponding simpler submodule are tracked; ptoas and pto-isa versions are not part of the bisect. If the culprit is a simpler submodule bump, the skill performs a second-level bisect within simpler.

2026-05-21
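A two-level bisect of this kind can be sketched as two nested binary searches. The sketch makes a few assumptions:

- The commit list is linear and ordered oldest to newest.
- The regression is monotonic, meaning every commit after the culprit is also bad. This is the same halving idea `git bisect` applies.
- `is_bad` stands in for building and running the precision check.
- `bumped_range` is a hypothetical helper that lists the simpler commits a submodule bump pulled in.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Optional[str]:
    """Return the earliest commit for which is_bad() is True, or None.

    Assumes commits are ordered oldest -> newest and that the regression is
    monotonic: once a commit is bad, every later commit is bad too.
    """
    lo, hi = 0, len(commits)  # the answer index always lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # culprit is mid or earlier
        else:
            lo = mid + 1      # culprit is strictly after mid
    return commits[lo] if lo < len(commits) else None


def two_level_bisect(
    pypto_commits: Sequence[str],
    pypto_is_bad: Callable[[str], bool],
    bumped_range: Callable[[str], List[str]],
    simpler_is_bad: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Bisect pypto first; if the culprit bumps simpler, bisect inside simpler."""
    culprit = first_bad(pypto_commits, pypto_is_bad)
    if culprit is None:
        return None, None
    inner = bumped_range(culprit)  # simpler commits added by the bump, oldest first
    if not inner:
        return culprit, None       # plain pypto change, no second level needed
    return culprit, first_bad(inner, simpler_is_bad)
```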
create-issue
Software Developers

Reproduce a reported problem, collect dependency versions, and create a GitHub issue. Use when the user wants to file a bug, request a feature, or create any GitHub issue.

2026-05-21
ascendc-docs-search
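Software Developers

Ascend C development resource index (local + online). Provides (1) a local API documentation index and example-code mapping, (2) online documentation search, (3) a resource lookup priority order, and (4) guidance on using the Explore Agent. Prefer local resources; use online search only when nothing is found locally.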

2026-05-15
git-commit
Software Developers

Complete git commit workflow including pre-commit checks, staging, message generation, and verification. Use when creating commits or preparing changes for commit.

2026-05-06
github-pr
Software Developers

Create or update a GitHub pull request after committing and pushing changes. Use when the user asks to create a PR, submit changes for review, or open a pull request.

2026-04-02
ascendc-api-best-practices
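Software Developers

Ascend C API best practices. Describes the correct usage and limitations of APIs for arithmetic, reduction, data movement, buffer management, precision conversion, and more. Triggers: the user asks how to use a specific API (e.g. "How do I use DataCopy?"), hits an API parameter or limit error (e.g. repeatTimes or alignment issues), or wants API best practices or a pitfalls guide.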

2026-03-31
Showing the top 8 of 20 collected skills from this repository.
dfx-analyze
Uncategorized

Analyze an onboard run's performance/scheduling/dependency/dump data using simpler's BUILT-IN DFX tools (simpler_setup.tools.*) instead of hand-rolling instrumentation. Use AFTER an onboard run when you need per-run device timing (Total/Orch/Sched), AICPU scheduler-overhead / Tail-OH breakdown, the task dependency graph, scope ring-fill peaks, or to inspect args dumps. These are simpler's own tools (shipped in the wheel), distinct from any cross-repo workload. Reach for this before writing custom timing/logging into the runtime.

2026-06-26
multi-repo-setup
Uncategorized

Set up a cross-repo investigation when a workload from another repo (pypto, pypto-lib, etc.) needs to be run, especially when you want to swap in simpler-main HEAD or the current worktree's simpler instead of the version that repo pins. Clones-or-updates each external repo every invocation so stale local clones don't lie about CI parity. MUST invoke before chasing "X doesn't work on simpler" reports where X lives outside this repo.

2026-06-26
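The clone-or-update step can be sketched with plain git commands. The function names and defaults (branch `main`, remote `origin`) are assumptions for illustration, and any URL passed in is a placeholder. The real skill may also pin or swap the simpler version after syncing.

```python
import os
import subprocess
from typing import List


def sync_commands(url: str, dest: str, branch: str = "main") -> List[List[str]]:
    """Git commands that clone `dest` if it is not a checkout yet, else refresh it."""
    if not os.path.isdir(os.path.join(dest, ".git")):
        return [["git", "clone", "--branch", branch, url, dest]]
    # Existing clone: fetch and fast-forward so a stale checkout is never reused.
    return [
        ["git", "-C", dest, "fetch", "origin", branch],
        ["git", "-C", dest, "checkout", branch],
        ["git", "-C", dest, "merge", "--ff-only", f"origin/{branch}"],
    ]


def sync(url: str, dest: str, branch: str = "main") -> None:
    for cmd in sync_commands(url, dest, branch):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

Running `sync` on every invocation, rather than only when the directory is missing, is what keeps a local clone from silently lagging behind what CI would see.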
weekly-changelog
Software Developers

Summarize user-facing changes merged in the current Friday-anchored week (most recent Friday up through yesterday) in the simpler repo into a markdown changelog with before/after code examples. Also emits a full all-PR inventory (WEEKLY_ALL_PRS) and Chinese (_zh) translations of both docs. Use when the user asks for a weekly changelog, weekly summary, or weekly external changes report.

2026-06-12
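The "Friday-anchored week" window comes down to simple date arithmetic. This sketch assumes one interpretation: when the skill runs on a Friday, the window starts at the previous week's Friday, because the range must end on yesterday. The function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple


def friday_anchored_window(today: date) -> Tuple[date, date]:
    """Return (start, end): the most recent Friday up through yesterday."""
    yesterday = today - timedelta(days=1)
    # date.weekday(): Monday == 0 ... Friday == 4
    days_since_friday = (yesterday.weekday() - 4) % 7
    start = yesterday - timedelta(days=days_since_friday)
    return start, yesterday
```

For example, running on Friday 2026-06-26 yields the window 2026-06-19 through 2026-06-25.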
onboard-arch-precheck
Software Quality Assurance Analysts and Testers

Detect the host's actual Ascend silicon and refuse mismatched `--platform` onboard hardware test invocations BEFORE any device is locked. MUST invoke this skill before running pytest or task-submit commands that use `--platform a2a3` or `--platform a5` (onboard only — sim variants pass through). Use when invoking onboard hardware tests, repro'ing flaky-test reports, or wrapping pytest in task-submit. Skip for `--platform a2a3sim` / `--platform a5sim` (silicon-agnostic).

2026-06-05
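The precheck is essentially a gate that runs before any device is locked. In this sketch, `detected` is assumed to be a platform-family string produced by a silicon-detection step. That step is not shown, and its mechanism is not described in the listing. The function name and return shape are hypothetical.

```python
from typing import Optional, Tuple

ONBOARD = {"a2a3", "a5"}        # real hardware: must match the host silicon
SIM = {"a2a3sim", "a5sim"}      # simulators: silicon-agnostic, always pass


def check_platform(requested: str, detected: Optional[str]) -> Tuple[bool, str]:
    """Decide whether a `--platform` value may run on this host.

    `detected` is the host's Ascend platform family from a separate detection
    step (None if no device was found).
    """
    if requested in SIM:
        return True, "simulator platform, no silicon check"
    if requested not in ONBOARD:
        return False, f"unknown platform {requested!r}"
    if detected is None:
        return False, "no Ascend device detected"
    if detected != requested:
        return False, f"--platform {requested} but host silicon is {detected}"
    return True, "platform matches host silicon"
```

A wrapper would call this before pytest or task-submit and exit non-zero on `False`, so a mismatched run never reaches the device-locking step.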
review-pr
Software Quality Assurance Analysts and Testers

Review a GitHub PR by analyzing the correct diff (merge-base to HEAD), reconciling stated vs. real goal, and applying type-specific scrutiny. Optionally folds in independent reviews from local `codex` / `gemini` CLIs when the invocation explicitly opts in (`codex`, `gemini`, or `all` in the arguments). Use when the user asks to review a PR, analyze PR changes, or give feedback on a pull request.

2026-06-03
testing
Software Quality Assurance Analysts and Testers

Testing guide and pre-commit testing strategy for PTO Runtime. Use when running tests, adding tests, or deciding what to test before committing.

2026-05-30
insight-trace
Software Developers

Generate a MindStudio Insight trace for any `kernel_entry(args)` style kernel in this repo — SPMD mix, AIC-only single-task (e.g. `aic_pv_matmul`), or AIV-only single-task (e.g. `aiv_softmax_prepare`). Use when the user asks to "produce/generate/run an Insight trace", "trace this kernel under msprof op simulator", or troubleshoot Insight trace collection. AICore-only replay path — bypasses AICPU orchestration. For PTOAS-style kernels, use [PTOAS msprof_op_simulator_usage_zh.md](https://github.com/hw-native-sys/PTOAS/blob/main/.claude/skill/msprof_op_sim_insight_skill.md) instead.

2026-05-25
benchmark
Software Quality Assurance Analysts and Testers

Benchmark runtime performance on hardware. If the current branch has commits ahead of upstream/main or uncommitted changes, compares against the fork point (merge-base). Otherwise benchmarks current state only. Use when the user asks to benchmark, measure performance, or compare latency.

2026-05-21
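The benchmark skill first decides whether there is a baseline to compare against. The sketch below expresses that decision with standard git commands. It assumes the upstream ref is named `upstream/main`, as in the description. `choose_baseline` and `detect_mode` are hypothetical names.

```python
import subprocess
from typing import Optional, Tuple


def _git(*args: str) -> str:
    out = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return out.stdout.strip()


def choose_baseline(ahead: int, dirty: bool) -> str:
    """Compare against the fork point only when there is something to compare."""
    return "merge-base" if ahead > 0 or dirty else "current-only"


def detect_mode(upstream: str = "upstream/main") -> Tuple[str, Optional[str]]:
    ahead = int(_git("rev-list", "--count", f"{upstream}..HEAD"))  # local commits
    dirty = bool(_git("status", "--porcelain"))                    # uncommitted edits
    mode = choose_baseline(ahead, dirty)
    base = _git("merge-base", upstream, "HEAD") if mode == "merge-base" else None
    return mode, base
```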
Showing the top 8 of 14 collected skills from this repository.
incore-profiling
Software Developers

Profile PyPTO kernels in-core with the Ascend msprof op-simulator — cycle-accurate per-kernel traces. Use when the user wants to profile a built case, inspect kernel timing or instruction streams, or generate MindStudio Insight traces.

2026-06-15
weekly-changelog
Software Developers

Generate a weekly changelog markdown file summarizing external API and feature changes from git commits in a date range. Extracts before/after Python examples per commit, groups by theme (DSL / distributed / runtime / IR deprecations), and attributes each change to its author. Use when the user asks for a weekly report, changelog, commit summary, or interface-change digest.

2026-06-09
compare-codegen
Software Developers

Compare codegen output (.pto files and pass dumps) between origin/main and the current branch for a given test case. Runs the test with --save-kernels and --dump-passes on both branches via git worktree, then diffs the results. Use when the user asks to compare codegen output, diff .pto files between branches, or check what changed in generated code.

2026-05-30
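Once the test case has been run with `--save-kernels` and `--dump-passes` in both worktrees, the comparison reduces to a directory diff. This sketch makes two assumptions: each run's output landed in its own directory (`base` for origin/main, `head` for the current branch), and that layout is hypothetical. `diff_dirs` uses only the standard library and reports added, removed, and changed files.

```python
import difflib
from pathlib import Path
from typing import Dict


def diff_dirs(base: Path, head: Path, pattern: str = "*.pto") -> Dict[str, str]:
    """Map each differing relative path to a unified diff or a presence marker."""
    base_files = {p.relative_to(base) for p in base.rglob(pattern)}
    head_files = {p.relative_to(head) for p in head.rglob(pattern)}
    result: Dict[str, str] = {}
    for rel in sorted(base_files | head_files):
        if rel not in head_files:
            result[str(rel)] = "only in base"
        elif rel not in base_files:
            result[str(rel)] = "only in head"
        else:
            a = (base / rel).read_text().splitlines(keepends=True)
            b = (head / rel).read_text().splitlines(keepends=True)
            diff = "".join(difflib.unified_diff(a, b, f"base/{rel}", f"head/{rel}"))
            if diff:  # identical files are left out of the report
                result[str(rel)] = diff
    return result
```

Passing a different `pattern` (for example the pass-dump file extension) reuses the same logic for pass dumps.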
add-op
Software Developers

Add new operator definitions to PyPTO across all layers (C++, Python IR, Python DSL, tests, codegen, docs). Covers tile ops, tensor ops, tensor-to-tile conversion, and codegen registration. Use when the user asks to add a new op, define a new operator, implement a new tile/tensor operation, or extend the operator system.

2026-05-25
auto-pr
Software Developers

Create a GitHub PR then autonomously loop on CI failures and review comments until the PR is fully green. Combines branch prep, PR creation, and a hands-off fix loop. Use when the user wants to ship a PR end-to-end, auto-fix a PR until green, or create-and-fix a PR in one go.

2026-05-23
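The hands-off fix loop can be pictured as a bounded retry loop. In this sketch, `ci_failures`, `open_comments`, and `fix` are hypothetical callables. They stand in for reading CI status and review threads (for example through the `gh` CLI) and for pushing a fix commit. The round cap is an added safeguard that the description does not mention.

```python
from typing import Callable, List


def fix_until_green(
    ci_failures: Callable[[], List[str]],
    open_comments: Callable[[], List[str]],
    fix: Callable[[List[str]], None],
    max_rounds: int = 10,
) -> bool:
    """Alternate between collecting problems and fixing them until none remain."""
    for _ in range(max_rounds):
        problems = ci_failures() + open_comments()
        if not problems:
            return True          # CI green and no unresolved review comments
        fix(problems)            # e.g. edit, commit, push, reply to threads
    # Out of rounds: report whether the last fix happened to clear everything.
    return not (ci_failures() or open_comments())
```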
fix-pr
Executive Secretaries and Executive Administrative Assistants

Fix GitHub PR issues — address review comments and resolve CI failures in a loop until the PR is fully clean. Fetches CI errors online and triages review feedback. Use when fixing PR problems, addressing review comments, or resolving CI failures.

2026-05-02
fix-issue
Software Developers

Fix a GitHub issue by fetching content, creating a branch, planning the fix, and implementing it. Use when the user asks to fix a specific issue number or work on a GitHub issue.

2026-04-10
create-issue
Software Developers

Create a GitHub issue following the project's issue templates. Classifies the issue type, fills required fields per template, creates it via gh CLI, and sets project board fields (Status, Priority, Effort, Sprint). Use when the user wants to file a bug, request a feature, report a pass bug, or create any GitHub issue.

2026-04-09
Showing the top 8 of 13 collected skills from this repository.
pto-isa-cpu-sim-kernel-test
Software Quality Assurance Analysts and Testers

Use when Codex needs to validate a `.pto` program or `ptoas`-generated C++ kernel with the local `pto-isa` CPU simulator in this repository. Covers generating `.pto` files from PTOAS samples, running `ptoas` to emit C++, grafting the emitted kernel into a testcase under `.downloads/pto-isa/tests/cpu/st/testcase/`, updating `main.cpp`, `gen_data.py`, and CMake wiring as needed, then running `tests/run_cpu.py` for functional testing and debug on Windows, WSL, or Linux.

2026-04-25
ptoas-publish-pr
Software Developers

Publish PTOAS changes to GitHub as a pull request. Use when Codex needs to turn intended local PTOAS edits into a branch, commit, push, and PR, especially when the worktree contains unrelated files, the repo uses `origin` as a personal fork and `upstream` as the canonical repository, or GitHub authentication may need to be checked with `gh auth status` and `gh auth login`.

2026-04-25
camodel-isa-verification
Software Quality Assurance Analysts and Testers

Create, run, and analyze PTO-ISA ST tests on the CANN CA model simulator. Use when Codex needs to verify A5/Ascend PTO-ISA instruction behavior, inspect simulator instruction logs, measure vector instruction latency, or compare UB dump hex output against expected values.

2026-04-25
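Comparing a UB dump against expected values means decoding raw bytes and then applying a tolerance check. This sketch assumes the dump is plain whitespace-separated hex bytes in little-endian order, holding fp16 values by default. The real CA model dump layout may differ, so adjust `fmt` and the parsing to match. The function names are hypothetical.

```python
import math
import struct
from typing import List, Sequence, Tuple


def parse_hex_dump(text: str, fmt: str = "<e") -> List[float]:
    """Decode whitespace-separated hex bytes into values of struct format `fmt`."""
    raw = bytes.fromhex("".join(text.split()))
    size = struct.calcsize(fmt)   # 2 bytes for "<e" (little-endian fp16)
    if len(raw) % size:
        raise ValueError(f"{len(raw)} bytes is not a multiple of {size}")
    return [v for (v,) in struct.iter_unpack(fmt, raw)]


def mismatches(actual: Sequence[float], expected: Sequence[float],
               rel_tol: float = 1e-3, abs_tol: float = 1e-5) -> List[Tuple[int, float, float]]:
    """Return (index, actual, expected) for every element outside tolerance."""
    if len(actual) != len(expected):
        raise ValueError(f"length mismatch: {len(actual)} vs {len(expected)}")
    return [(i, a, e) for i, (a, e) in enumerate(zip(actual, expected))
            if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)]
```

For instance, the four bytes `00 3c 00 40` decode to the fp16 values 1.0 and 2.0.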
msprof-op-simulator-insight
Software Quality Assurance Analysts and Testers

Compile and profile PTOAS-generated kernel sources with `msprof op simulator`, then export MindStudio Insight files. Use when Codex needs to build a host runner, run A3 `dav_2201` op simulator collection, resolve mangled kernel symbols, export `trace.json` or `visualize_data.bin`, or troubleshoot simulator dump/export paths.

2026-04-25
ptoas-project-development
Software Developers

Project development guidance for PTOAS. Use when Codex modifies PTOAS source, MLIR ODS dialect definitions, C++ verifiers or transforms, CLI behavior, Python bindings, docs, tests, examples, or any user-visible PTOAS behavior; keeps cross-layer updates, license headers, regression tests, and examples synchronized.

2026-04-25
build-ptoas-wsl
Software Developers

Build PTOAS from source inside WSL using the repository README workflow. Use when Codex is asked to build, configure, install, test, or troubleshoot ptoas/PTOAS in WSL or Ubuntu, including LLVM/MLIR llvmorg-19.1.7 setup, CMake/Ninja out-of-tree builds, pybind11 Python bindings, runtime environment variables, CLI smoke tests, or Python dialect import validation.

2026-04-25
pto-isa
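Software Developers

A complete workflow guide for implementing a specified operator with PTO-ISA, covering ISA instruction selection, dataflow analysis, explanations of instruction semantics, and kernel code generation.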

2026-06-04
pto-isa-flash-atten-a3-pipeline
Software Developers

PTO-DSL Flash Attention four-stage cross-core software pipeline for Ascend A3: compute_qk (Cube) -> compute_p (Vec) -> compute_pv (Cube) -> compute_gu (Vec), staged through a GM software FIFO. Captures the steady-state rhythm (cube-side per-tile emit_qk_pv interleaving, vec-side "drain GU then produce P"), the QK_PRELOAD / EXP_RING / S1_TILE knobs and their invariants, the UB 192 KiB budget with the row_slice working-tile shrink, the empirical S1 >= 16384 -> S1_TILE = 512 recommendation, and the op-pattern PIPE_V barrier removal recipe. Use when tuning the in-tree DSL Flash Attention, porting the four-stage pipeline to a new persistent-block kernel that mixes cube + vec stages through a GM FIFO, choosing QK_PRELOAD / S1_TILE for a new shape mix, or deciding when a PIPE_V barrier in generated C++ is safe to drop. Scoped to A3 non-causal prefill with HEAD=128, S0=128, CUBE_S1=128 -- other Flash Attention flavors (causal mask, GQA/MQA, KV-cache decode, A5 NZ/NZ+1 layout) belong in sibling skills.

2026-05-25
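Two of the quoted knobs can be expressed as small checks: the empirical S1_TILE rule and the 192 KiB UB budget. In this sketch, `default_tile`, `bytes_per_row`, and `fixed_bytes` are hypothetical inputs. Halving the row slice is an assumed shrink policy, not necessarily the one the skill uses.

```python
UB_BUDGET = 192 * 1024  # bytes, the UB budget quoted in the skill description


def pick_s1_tile(s1: int, default_tile: int) -> int:
    # Empirical rule quoted in the skill: S1 >= 16384 -> S1_TILE = 512.
    return 512 if s1 >= 16384 else default_tile


def shrink_row_slice(rows: int, bytes_per_row: int, fixed_bytes: int,
                     budget: int = UB_BUDGET) -> int:
    """Halve the working-tile row slice until fixed + rows * bytes_per_row fits."""
    while rows > 1 and fixed_bytes + rows * bytes_per_row > budget:
        rows //= 2
    if fixed_bytes + rows * bytes_per_row > budget:
        raise ValueError("working set does not fit even with a 1-row slice")
    return rows
```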
pto-isa-matmul-l2-schedule
Software Developers

PTO-DSL matmul L2-reuse scheduler for Ascend A2/A3: persistent-block GEMM with N-group swizzle along the inner M walk and M-direction zigzag at N-group boundaries. Captures the tile-id math, the CANN platform_config-driven swizzleCountN budget (with the 32 MiB safety-ratio cliff), the DN-B layout note, the runtime wiring, and the verification path against torch_npu. Use when tuning a matmul-shaped kernel that profiles as L2-bound, porting the swizzle/zigzag schedule to a new persistent-block kernel, choosing swizzleCountN for a new SoC, or deciding between the manual SPMD-static baseline and this persistent + swizzle schedule. Scoped to one schedule recipe; add a separate skill for other PTO-ISA performance patterns (vector reduce, flash-attention scheduling, etc.).

2026-05-21
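The swizzle and zigzag schedule maps a linear tile id to an (m, n) tile coordinate. The sketch below is one plausible reading of the description, not the skill's actual tile-id math:

- `group_n` stands in for swizzleCountN.
- Within an N-group, consecutive ids walk the group's N tiles for one M row, then step to the next M row.
- Odd-numbered groups walk M in reverse. As a result, the first tile of a new group reuses the M row the previous group just finished.

```python
from typing import Tuple


def tile_coords(t: int, m_tiles: int, n_tiles: int, group_n: int) -> Tuple[int, int]:
    """Map linear tile id t to (m, n) under an N-group swizzle with M zigzag."""
    if not 0 <= t < m_tiles * n_tiles:
        raise ValueError("tile id out of range")
    full = m_tiles * group_n                           # tiles in a full-width N-group
    group = t // full
    width = min(group_n, n_tiles - group * group_n)    # last group may be narrower
    local = t - group * full
    m, dn = divmod(local, width)                       # M outer, N inner within group
    if group % 2 == 1:
        m = m_tiles - 1 - m                            # zigzag: odd groups walk M backwards
    return m, group * group_n + dn
```

In a persistent-block kernel, each block would then loop `for t in range(block_idx, m_tiles * n_tiles, num_blocks)`. The intent is that blocks running at the same time touch nearby tiles whose A and B fragments are still resident in L2.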
pto-comm
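Software Developers

A complete guide to developing communication operators on the PTO-COMM ISA. Covers the host-device architecture, file structure, communication patterns (P2P / collective communication / fused compute-communication), synchronization strategies, signal matrix design, multi-block scheduling, remote address management, build-system configuration, and more. Triggers: developing communication operators with PTO-COMM, designing a communication kernel, writing host-side code, or configuring CMakeLists.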

2026-04-27
pto-isa-dev
Software Developers

Work effectively in PTO-ISA: choose the right backend, run CPU/SIM/NPU flows, trace instruction constraints, understand A2/A3 vs A5 differences, align with PTO-AS, debug failures, and apply review-derived guardrails from recent PRs.

2026-04-27
Showing all 5 of 5 repositories.