dist-op-analysis
// Internal analysis tool for distributed operator development — provides interface specs, Primitive/ATen mappings and HyperParallel layout derivation logic. Used by dist-op-dev workflow. NOT for direct user calls.
| name | dist-op-analysis |
| description | Internal analysis tool for distributed operator development — provides interface specs, Primitive/ATen mappings and HyperParallel layout derivation logic. Used by dist-op-dev workflow. NOT for direct user calls. |
⚠️ Internal SKILL — This SKILL is invoked automatically by
dist-op-dev. Do NOT call it directly.
This SKILL provides read-only analysis for HyperParallel distributed operator development. Given a MindSpore mint or PyTorch op name, it explores framework source code to extract:
This SKILL is automatically called by Workflow 1: Operator Analysis in the dist-op-dev SKILL. It should never be invoked directly by users.
If the user provides a path, use it. Otherwise discover installed packages:
pip show mindspore | grep Location # → installed location
pip show torch | grep Location # → installed location
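The same discovery can be done from Python without importing the (large) packages. This is a minimal sketch; the helper name `find_package_dir` is hypothetical and not part of this SKILL. It returns the package directory itself (for example `.../site-packages/torch`), whereas `pip show` reports the parent `site-packages` directory.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_dir(package: str) -> Optional[Path]:
    """Return the install directory of `package`, or None if it is not installed.

    find_spec only locates a top-level package; it does not execute it.
    """
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    if spec.submodule_search_locations:
        # Regular or namespace package: take its first search location.
        return Path(next(iter(spec.submodule_search_locations)))
    if spec.origin and spec.origin not in ("built-in", "frozen"):
        # Single-file module: report the directory that contains it.
        return Path(spec.origin).parent
    return None


for name in ("mindspore", "torch"):
    location = find_package_dir(name)
    print(f"{name}: {location if location else 'not installed'}")
```

A `None` result means the package is not installed in the current environment, in which case the path has to come from the user.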
Source code paths (preferred for deep analysis): user-provided or ask. Validate:
- mindspore/python/mindspore/mint/ (conda/pip installs lack ccsrc/ sources)
- torch/distributed/tensor/_ops/

MindSpore mint: read the actual source files to extract full signatures.
- <ms_path>/mindspore/python/mindspore/mint/__init__.py: trace each import to its real definition
- mint.nn.functional → <ms_path>/mindspore/python/mindspore/mint/nn/functional.py
- mint.linalg / mint.special → corresponding submodules
- <ms_path>/mindspore/ops/op_def/yaml/{op}_op.yaml or func_op/: extract input/output/attribute parameters

PyTorch: read the actual source files.
- <pt_path>/torch/nn/functional.py: high-level function signatures
- <pt_path>/torch/_refs/: reference implementations showing parameter semantics
- <pt_path>/torch/_C/_VariableFunctions.pyi, or use torch.ops.aten.{op} patterns

MindSpore distributed strategy: conda/pip-installed MindSpore ships compiled binaries with no .cc sources, so derive the sharding strategy from the HyperParallel side:
- DistributedOp mapping for that Primitive in hyper_parallel/core/shard/ops/yaml/*.yaml
- infer_layout() and get_expand_impl() in core/shard/ops/parallel_*.py for sharding logic
- ccsrc/frontend/parallel/ops_info/ (non-conda only)

PyTorch distributed DTensor strategies: read the actual Python source.
- <pt_path>/torch/distributed/tensor/_ops/_matrix_ops.py: mm, bmm, addmm, linear
- <pt_path>/torch/distributed/tensor/_ops/_pointwise_ops.py: element-wise ops with broadcasting
- <pt_path>/torch/distributed/tensor/_ops/_conv_ops.py: conv1d/2d/3d
- <pt_path>/torch/distributed/tensor/_ops/_view_ops.py: reshape, transpose, permute, slice
- <pt_path>/torch/distributed/tensor/_ops/_einsum_strategy.py: einsum strategy generation
- @register_op_strategy(aten.xxx) returning OpStrategy with PlacementStrategy listing valid input/output placement combinations

HyperParallel mapping:
- hyper_parallel/core/shard/ops/yaml/*.yaml: maps Primitive/ATen names to DistributedOp class
- hyper_parallel/core/shard/ops/parallel_*.py: infer_layout() + optional get_expand_impl()

Dispatch mechanism (hyper_parallel/core/shard/_op_dispatch.py):

DTensorBase.__torch_function__ / __fallback__
→ platform.get_op_name() → whitelist? → random op?
→ YAML lookup (layout_infer_ops[op_name])
→ infer_layout_suffix routing: "" | "WithShape" | "Reshape" | "WithTupleExpand" | "Slice"
→ distribute_op.infer_layout(input_layouts, extra_args) → output Layout
→ optional get_expand_impl() → execute on local tensors → DTensor.from_local()
Cache keyed by layout.compact_str + rank_id. 45 YAML files in core/shard/ops/yaml/.
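The lookup, suffix check, layout inference, and caching steps above can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyLayout`, `ToyTransposeOp`, `LAYOUT_INFER_OPS`, and `dispatch_layout` are hypothetical stand-ins, not the HyperParallel classes, and the cache key also includes the op name and extra arguments, which the description above does not confirm.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ToyLayout:
    """Hypothetical stand-in for a layout: one mesh-axis name per tensor dim."""
    tensor_map: Tuple[str, ...]  # "" means the dim is not sharded

    @property
    def compact_str(self) -> str:
        return ",".join(axis or "-" for axis in self.tensor_map)


class ToyTransposeOp:
    """Hypothetical DistributedOp: swapping two dims swaps their sharding."""

    def infer_layout(self, input_layouts, extra_args):
        (layout,) = input_layouts
        dim0, dim1 = extra_args
        axes = list(layout.tensor_map)
        axes[dim0], axes[dim1] = axes[dim1], axes[dim0]
        return ToyLayout(tuple(axes))


# Stand-in for the table built from the YAML files: op name -> (op, suffix).
LAYOUT_INFER_OPS = {"transpose": (ToyTransposeOp(), "")}

_layout_cache: Dict[tuple, ToyLayout] = {}
stats = {"hits": 0, "misses": 0}


def dispatch_layout(op_name, input_layouts, extra_args, rank_id):
    """Look up the op, then infer (or fetch from cache) its output layout."""
    if op_name not in LAYOUT_INFER_OPS:
        raise KeyError(f"no distributed op registered for {op_name!r}")
    dist_op, suffix = LAYOUT_INFER_OPS[op_name]
    if suffix != "":
        raise NotImplementedError(f"suffix {suffix!r} is not modelled here")

    # Key on the compact layout strings plus the rank, as described above.
    key = (op_name, tuple(l.compact_str for l in input_layouts),
           tuple(extra_args), rank_id)
    if key in _layout_cache:
        stats["hits"] += 1
        return _layout_cache[key]

    stats["misses"] += 1
    out = dist_op.infer_layout(input_layouts, extra_args)
    _layout_cache[key] = out
    return out


layout = ToyLayout(("dp", "", "tp"))
first = dispatch_layout("transpose", (layout,), (0, 2), rank_id=0)
second = dispatch_layout("transpose", (layout,), (0, 2), rank_id=0)  # cache hit
print(first.compact_str, stats)
```

Keying on a compact string rather than the layout object keeps the key cheap to hash. The real dispatcher additionally handles the whitelist and random-op branches and the non-default suffixes, none of which are modelled here.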
- core/dtensor/dtensor.py: from_local(), to_local(), redistribute(), is_partial() (method, not property)
- core/dtensor/layout.py: alias_tensor_map maps tensor dims → device axes; set_partial_by_dev_axis(axis, 'sum'); compact_str for cache
- core/dtensor/device_mesh.py: named axes (dp, tp, pp, cp); get_device_num_along_axis(axis)

When consulted about a specific operator, provide:
- … (dim, axis, transpose_a/b, keepdim)
- … Partial('sum'); reduction → sharded dim reduced → Replicate; concat → merge on concat axis; reshape → must not cross sharded boundaries
- get_expand_impl wraps the native op to restore equivalence. Common patterns include bias/normalization scaling (Linear), attention score normalization (FlashAttention), and index/coordinate remapping (index_select, gather)
- infer_layout_suffix to use:
  - "" (default): op only needs input layouts + scalar kwargs (e.g., transpose flags). Most element-wise and matrix ops.
  - "WithShape": layout inference depends on input tensor shapes (e.g., broadcasting ops, expand, repeat).
  - "Reshape": op changes tensor shape in ways requiring specialized shape-to-layout mapping (e.g., reshape, view, flatten, squeeze, unsqueeze).
  - "WithTupleExpand": op takes tuple/list arguments that need flattening before layout extraction (e.g., stack, cat with tuple inputs).
  - "Slice": op takes begin/end/stride coordinates that must be transformed per-shard (e.g., slice, slice_scatter).
- Base class to inherit from:
  - DistributedOp (direct): new op with unique layout logic not shared by existing ops.
  - *DistributedOp subclass: extend an existing class when the new op shares the same layout inference pattern (e.g., new element-wise variant → inherit ElementWiseDistributedOp; new matmul variant → inherit from MatMulDistributedOp or BaseBatchMatMulDistributedOp).
  - (… BaseXxxDistributedOp).

Relevant files:

- hyper_parallel/core/shard/_op_dispatch.py: dispatch mechanism
- hyper_parallel/core/shard/ops/parallel_ops.py: DistributedOp base class
- hyper_parallel/core/shard/ops/parallel_ops_register.py: op registry
- hyper_parallel/core/shard/ops/yaml/*.yaml: operator configs (45 files)
- hyper_parallel/core/shard/ops/parallel_*.py: distributed implementations
- hyper_parallel/core/shard/custom_shard.py: @custom_shard decorator
- hyper_parallel/core/dtensor/{dtensor,layout,device_mesh}.py
- hyper_parallel/platform/torch/dtensor.py: torch interception
- hyper_parallel/platform/platform.py: platform abstraction
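The layout derivation rules mentioned earlier for Partial('sum') and for reductions can be illustrated with a toy model. This is a sketch under assumptions, not the HyperParallel implementation: the `TensorMap` notation and both function names are hypothetical, and the matmul rule assumes the truncated Partial('sum') rule refers to a sharded contracting dimension. Each function returns the output map together with the mesh axes over which local results still have to be summed.

```python
from typing import Optional, Tuple

# Toy notation (not the HyperParallel Layout class): one entry per tensor dim,
# naming the mesh axis that shards it, or None when the dim is not sharded.
TensorMap = Tuple[Optional[str], ...]


def infer_matmul(a: TensorMap, b: TensorMap) -> Tuple[TensorMap, Tuple[str, ...]]:
    """(M, K) @ (K, N): return (output map, mesh axes holding partial sums)."""
    if len(a) != 2 or len(b) != 2:
        raise ValueError("this sketch only handles 2-D operands")
    m_axis, k_axis_a = a
    k_axis_b, n_axis = b
    if k_axis_a != k_axis_b:
        raise ValueError("contracting dim must be sharded the same way on both inputs")
    used = [axis for axis in (m_axis, n_axis, k_axis_a) if axis is not None]
    if len(used) != len(set(used)):
        raise ValueError("a mesh axis may shard only one of M, K, N")
    # A sharded contracting dim leaves each device with a partial sum.
    partial_axes = (k_axis_a,) if k_axis_a is not None else ()
    return (m_axis, n_axis), partial_axes


def infer_reduce(x: TensorMap, dim: int, keepdim: bool = False) -> Tuple[TensorMap, Tuple[str, ...]]:
    """Sum over `dim`: the reduced dim is no longer sharded in the output."""
    dim = dim % len(x)
    reduced_axis = x[dim]
    kept = (None,) if keepdim else ()
    out = x[:dim] + kept + x[dim + 1:]
    # If the reduced dim was sharded, local sums must still be combined
    # across that mesh axis before the result is replicated along it.
    pending = (reduced_axis,) if reduced_axis is not None else ()
    return out, pending


print(infer_matmul(("dp", "tp"), ("tp", None)))
print(infer_reduce(("dp", "tp"), dim=-1, keepdim=True))
```

Returning the pending axes separately keeps the output map simple and makes it explicit which mesh axis a later all-reduce would run over.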