Name: Platform Dev
Author: mindspore-ai

Description: HyperParallel platform abstraction layer development. Use when adding new platform APIs, implementing cross-platform features (FSDP/HSDP/Pipeline/Activation Checkpoint), creating DTensorBase extensions, or modifying collective operations. Covers both PyTorch and MindSpore backends.

Updated: 2026-04-27 09:22

SKILL.md

name: platform-dev
description: HyperParallel platform abstraction layer development. Use when adding new platform APIs, implementing cross-platform features (FSDP/HSDP/Pipeline/Activation Checkpoint), creating DTensorBase extensions, or modifying collective operations. Covers both PyTorch and MindSpore backends.

HyperParallel Platform Development Skill

Guides development of cross-platform features in the platform/ abstraction layer — adding new Platform APIs, implementing backend-specific features (FSDP, HSDP, Pipeline Parallelism, Activation Checkpoint), extending DTensorBase, and managing collective operations across PyTorch and MindSpore backends.

When to Use This Skill

- Adding a new method to the Platform abstraction (platform/platform.py)
- Implementing a new feature in platform/torch/ or platform/mindspore/
- Modifying FSDP, HSDP, Pipeline Parallelism, or Activation Checkpoint platform code
- Extending DTensorBase (torch or mindspore)
- Adding or modifying collective operations (all_gather, all_reduce, reduce_scatter, etc.)
- Implementing stream synchronization or memory lifecycle patterns
- Working on process group management or device/RNG management

Architecture Overview

platform/
├── platform.py                    # Platform base class (100+ abstract methods)
├── torch/                         # PyTorch backend
│   ├── platform.py                # TorchPlatform(Platform)
│   ├── dtensor.py                 # DTensorBase (torch.Tensor subclass)
│   ├── function_override.py       # DTensor backward hooks
│   ├── init_weights.py            # init_on_device context manager
│   ├── group_utils.py             # Process group creation
│   ├── clip_grad.py               # Distributed gradient clipping
│   ├── activation_checkpoint/     # SAC + Activation Swap
│   ├── fully_shard/               # FSDP + HSDP (state, param, scheduler, hooks; core hsdp_*.py)
│   └── pipeline_parallel/         # Pipeline stages + micro-batch
└── mindspore/                     # MindSpore backend
    ├── platform.py                # MindSporePlatform(Platform)
    ├── dtensor.py                 # DTensorBase (ms.Tensor subclass)
    ├── init_weights.py            # init_on_device context manager
    ├── parameter_init.py          # Parameter initialization with slice_index
    ├── platform_graph.py          # Graph construction utilities
    ├── custom_pass/               # Custom graph passes
    ├── fully_shard/               # FSDP + HSDP (state, param, scheduler, hooks; core hsdp_*.py)
    └── pipeline_parallel/         # Pipeline stages + micro-batch

How to Use

Call this skill with your task description:

# Add a new Platform API
/platform-dev Add a new `scatter()` collective operation to the Platform abstraction

# Implement a feature for one backend
/platform-dev Implement activation swap support for MindSpore backend

# Modify FSDP behavior
/platform-dev Fix the unshard scheduling in torch FSDP to support prefetch

# Extend DTensorBase
/platform-dev Add a new property to DTensorBase for tracking communication state

Execution Flow

┌────────────────────┐     ┌────────────────────┐     ┌────────────────────┐
│ 1. Scope           │ ──▶ │ 2. Base Class      │ ──▶ │ 3. Backend         │
│    Analysis        │     │    API Design      │     │    Implementation  │
│ Identify what      │     │ platform.py        │     │ torch/ + ms/       │
│ needs to change    │     │ abstract method    │     │ concrete impl      │
└────────────────────┘     └────────────────────┘     └────────────────────┘
                                                                │
          ┌─────────────────────────────────────────────────────┘
          ▼
┌────────────────────┐     ┌────────────────────┐     ┌────────────────────┐
│ 4. Cross-Platform  │ ──▶ │ 5. Testing         │ ──▶ │ 6. Git Commit      │
│    Verification    │     │ UT + ST            │     │    & PR Creation   │
│ Parity check       │     │ Both backends      │     │ Call autogit       │
└────────────────────┘     └────────────────────┘     └────────────────────┘

Workflow Execution Checklist

Step 1: Scope Analysis
- Goal: Identify affected files, understand existing patterns, determine change scope
- Output: List of files to modify, change strategy
Step 2: Base Class API Design
- Goal: Define/modify abstract methods in platform/platform.py
- Output: Updated Platform base class with new/modified abstract methods
Step 3: Backend Implementation
- Goal: Implement concrete methods in platform/torch/ and platform/mindspore/
- Output: Working implementation for both backends, or for one backend with the other raising NotImplementedError
Step 4: Cross-Platform Verification
- Goal: Ensure feature parity, consistent semantics, no abstraction violations
- Output: Verification report
Step 5: Testing
- Goal: Add UT and ST tests covering both backends
- Output: Test files in tests/torch/ and/or tests/mindspore/
Step 6: Git Commit & PR
- Goal: Commit, push, and optionally create PR via autogit
- Output: Feature branch with clean commit
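
Steps 2 to 4 can be sketched in plain Python. This is a toy under stated assumptions: the class names `TorchLike`, `MindSporeLike`, and `Forgetful`, the list-based `all_reduce()` and `scatter()` signatures, and the `missing_apis()` helper are all hypothetical and do not come from the HyperParallel code base. It shows a new abstract method declared on the base class first, one backend implementing it, the other raising NotImplementedError explicitly, and a small parity check that flags a backend which forgot the method altogether.

```python
from abc import ABC, abstractmethod


class Platform(ABC):
    """Step 2: the new API is declared on the base class first."""

    @abstractmethod
    def all_reduce(self, values):
        """Existing API in this toy: combine one value per rank."""

    @abstractmethod
    def scatter(self, chunks, rank):
        """New API in this toy: return the chunk that belongs to `rank`."""


class TorchLike(Platform):
    """Step 3: a backend with a full implementation."""

    def all_reduce(self, values):
        return sum(values)

    def scatter(self, chunks, rank):
        return chunks[rank]


class MindSporeLike(Platform):
    """Step 3: a backend that defers the new API explicitly."""

    def all_reduce(self, values):
        return sum(values)

    def scatter(self, chunks, rank):
        raise NotImplementedError("scatter is not supported on this backend yet")


class Forgetful(Platform):
    """A backend that silently skipped the new API."""

    def all_reduce(self, values):
        return sum(values)


def missing_apis(backend_cls):
    """Step 4: list abstract methods the backend never overrode."""
    return sorted(backend_cls.__abstractmethods__)
```

In this sketch `missing_apis(Forgetful)` reports `scatter`, while both `TorchLike` and `MindSporeLike` report nothing missing. The check catches omissions only; an explicit NotImplementedError stub still needs a test that documents the gap.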

Key Decision Points

Decision	Criteria	Options	Impact
Change Scope	New API vs modifying existing	New abstract method / Modify existing / Internal only	Files affected, backward compat
Backend Priority	Which backend first	Torch first / MindSpore first / Both together	Development order
Feature Parity	Both backends needed?	Full parity / One backend + NotImplementedError	Test coverage
Stream Sync	Async operations involved?	Sync / Async with handle / Event-based	Correctness risk
Memory Pattern	Buffer management needed?	resize_(0) / Reuse / Allocate new	Memory efficiency
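
For the Stream Sync row, the "Async with handle" option can be illustrated with a toy that needs no framework. Everything here is an assumption made for illustration: `AsyncHandle` and `all_reduce_async()` are hypothetical names, and a worker thread stands in for a real asynchronous collective. Real backends return their own handle objects.

```python
import threading


class AsyncHandle:
    """Hypothetical handle; wait() blocks until the background work is done."""

    def __init__(self, thread: threading.Thread):
        self._thread = thread

    def wait(self) -> None:
        self._thread.join()


def all_reduce_async(per_rank_values, out):
    """Toy async all-reduce: sums the inputs into out[0] on a worker thread."""

    def work():
        out[0] = sum(per_rank_values)

    thread = threading.Thread(target=work)
    thread.start()
    return AsyncHandle(thread)


out = [None]
handle = all_reduce_async([1, 2, 3, 4], out)
handle.wait()  # without this call, out[0] may still be None when read
result = out[0]
```

The correctness risk in the table comes from the line after `handle.wait()`: reading `out[0]` any earlier races with the worker.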

Quick Reference

See references/quick-reference.md for:

- File location guide
- Platform API categories
- Cross-platform type mapping
- Common patterns and anti-patterns

See references/architecture.md for:

- Platform abstraction design
- DTensorBase dispatch mechanism
- FSDP/HSDP state lifecycle
- Stream synchronization patterns
- Memory management patterns

Hard Rules

- Never import torch or mindspore directly in platform-agnostic code; go through get_platform().
- Add new Platform APIs to the base class (platform/platform.py) first.
- Consider both backends: implement the method in each, or raise NotImplementedError in the one that lacks support.
- Mind cross-platform type differences: torch uses torch.device where mindspore uses str, and torch uses ProcessGroup where mindspore uses str group names.
- Use lazy backend imports: in platform/torch/ and platform/mindspore/, import framework modules inside methods and add # pylint: disable=C0415. Non-platform code uses module-top imports (see code-style.md).
- Call handle.wait() before reading the output of an async collective.
- For cross-stream dependencies, use event.record(src) followed by event.wait(dst).
- Use resize_(0) to free device memory, and never access freed storage.
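
The lazy-import rule can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the `zeros()` method and the `backend_available()` helper are hypothetical, and the class is only a stand-in for a backend class under platform/torch/.

```python
import importlib.util


def backend_available(module_name: str) -> bool:
    """Hypothetical helper: check for a framework without importing it."""
    return importlib.util.find_spec(module_name) is not None


class TorchPlatform:
    """Toy stand-in for a backend class under platform/torch/."""

    def zeros(self, shape):
        # Lazy import: torch loads only when this method runs, so importing
        # this module never requires torch to be installed.
        import torch  # pylint: disable=C0415

        return torch.zeros(shape)


platform = TorchPlatform()  # succeeds even when torch is absent
```

With the import inside the method, a MindSpore-only environment can still import the module that defines `TorchPlatform`; the failure is deferred until a torch-specific method is actually called.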

Related Skills

Skill	When to Use
code-review	After implementation, review for distributed correctness
autogit	Commit, push, create PR
dist-op-dev	When implementing distributed operator support (not platform layer)

Related skills in the same repository

autogit.md

from "mindspore-ai/hyper-parallel"

GitCode fork workflow automation. Use this skill whenever the user wants to commit code, push, create or append to a Pull Request, view PR status, squash commits, regenerate a PR description, or run lint checks against a GitCode `origin` (fork) + `upstream` repository. Supports both Chinese and English natural-language triggers (e.g. "帮我提交", "create PR", "看下 PR 状态") and slash-command shortcuts (`/commit`, `/create-pr`, etc.). The full trigger → subcommand mapping lives in the "When to Activate" section.

2026-04-27

code-review.md

from "mindspore-ai/hyper-parallel"

Review HyperParallel code changes for distributed correctness, stream synchronization, memory safety, cross-platform consistency, and code quality. Use when reviewing PRs, code changes, or when the user mentions "review", "code review", or "check this".

2026-04-27

dist-op-dev.md

from "mindspore-ai/hyper-parallel"

Execution-oriented workflow for HyperParallel distributed operator development. Analyzes the operator, implements or updates code and tests.

2026-04-17

dist-op-analysis.md

from "mindspore-ai/hyper-parallel"

Internal analysis tool for distributed operator development — provides interface specs, Primitive/ATen mappings and HyperParallel layout derivation logic. Used by dist-op-dev workflow. NOT for direct user calls.

2026-04-01

parallel-strategy-analyzer.md

from "mindspore-ai/hyper-parallel"

Analyze model architecture and hardware constraints to recommend optimal parallel strategy combinations (DP/FSDP/TP/PP/EP/CP) with memory, communication, compute, and pipeline bubble estimation.

2026-03-28

package.json

"author": "mindspore-ai"

"repository": "mindspore-ai/hyper-parallel"

