| name | platform-dev |
| description | HyperParallel platform abstraction layer development. Use when adding new platform APIs, implementing cross-platform features (FSDP/HSDP/Pipeline/Activation Checkpoint), creating DTensorBase extensions, or modifying collective operations. Covers both PyTorch and MindSpore backends. |
HyperParallel Platform Development Skill
Guides development of cross-platform features in the platform/ abstraction layer — adding new Platform APIs, implementing backend-specific features (FSDP, HSDP, Pipeline Parallelism, Activation Checkpoint), extending DTensorBase, and managing collective operations across PyTorch and MindSpore backends.
When to Use This Skill
- Adding a new method to the Platform abstraction (platform/platform.py)
- Implementing a new feature in platform/torch/ or platform/mindspore/
- Modifying FSDP, HSDP, Pipeline Parallelism, or Activation Checkpoint platform code
- Extending DTensorBase (torch or mindspore)
- Adding or modifying collective operations (all_gather, all_reduce, reduce_scatter, etc.)
- Implementing stream synchronization or memory lifecycle patterns
- Working on process group management or device/RNG management
Architecture Overview
platform/
├── platform.py # Platform base class (~100+ abstract methods)
├── torch/ # PyTorch backend
│ ├── platform.py # TorchPlatform(Platform)
│ ├── dtensor.py # DTensorBase (torch.Tensor subclass)
│ ├── function_override.py # DTensor backward hooks
│ ├── init_weights.py # init_on_device context manager
│ ├── group_utils.py # Process group creation
│ ├── clip_grad.py # Distributed gradient clipping
│ ├── activation_checkpoint/ # SAC + Activation Swap
│ ├── fully_shard/ # FSDP + HSDP (state, param, scheduler, hooks; core hsdp_*.py)
│ └── pipeline_parallel/ # Pipeline stages + micro-batch
└── mindspore/ # MindSpore backend
├── platform.py # MindSporePlatform(Platform)
├── dtensor.py # DTensorBase (ms.Tensor subclass)
├── init_weights.py # init_on_device context manager
├── parameter_init.py # Parameter initialization with slice_index
├── platform_graph.py # Graph construction utilities
├── custom_pass/ # Custom graph passes
├── fully_shard/ # FSDP + HSDP (state, param, scheduler, hooks; core hsdp_*.py)
└── pipeline_parallel/ # Pipeline stages + micro-batch
How to Use
Call this skill with your task description:
/platform-dev Add a new `scatter()` collective operation to the Platform abstraction
/platform-dev Implement activation swap support for MindSpore backend
/platform-dev Fix the unshard scheduling in torch FSDP to support prefetch
/platform-dev Add a new property to DTensorBase for tracking communication state
Execution Flow
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ 1. Scope │ ──▶ │ 2. Base Class │ ──▶ │ 3. Backend │
│ Analysis │ │ API Design │ │ Implementation│
│ Identify what │ │ platform.py │ │ torch/ + ms/ │
│ needs to change │ │ abstract method │ │ concrete impl │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
┌───────────────────────────────────────────────┘
▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ 4. Cross-Platform│ ──▶ │ 5. Testing │ ──▶ │ 6. Git Commit │
│ Verification │ │ UT + ST │ │ & PR Creation │
│ Parity check │ │ Both backends │ │ Call autogit │
└─────────────────┘ └─────────────────┘ └─────────────────┘
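The parity check in step 4 can be partly automated. The sketch below is a minimal illustration and not the project's actual tooling. It assumes the base class is built on Python's abc module. The three-method Platform, the placeholder backend bodies, and the helper name parity_report are all hypothetical.

```python
from abc import ABC, abstractmethod


class Platform(ABC):
    """Hypothetical three-method stand-in for the real base class."""

    @abstractmethod
    def all_gather(self, tensor, group):
        """Gather tensor from every rank in group."""

    @abstractmethod
    def all_reduce(self, tensor, group):
        """Reduce tensor across every rank in group."""

    @abstractmethod
    def reduce_scatter(self, tensor, group):
        """Reduce across group, then scatter the result."""


class TorchPlatform(Platform):
    # Bodies are placeholders; only the presence of each override matters here.
    def all_gather(self, tensor, group):
        return tensor

    def all_reduce(self, tensor, group):
        return tensor

    def reduce_scatter(self, tensor, group):
        return tensor


class MindSporePlatform(Platform):
    def all_gather(self, tensor, group):
        return tensor

    def all_reduce(self, tensor, group):
        return tensor

    # reduce_scatter is deliberately left out to show what the check reports.


def parity_report(backends):
    """Map each backend class name to the abstract methods it has not overridden."""
    return {cls.__name__: sorted(cls.__abstractmethods__) for cls in backends}


report = parity_report([TorchPlatform, MindSporePlatform])
print(report)
```

A method that is overridden only to raise NotImplementedError counts as overridden, so a check like this catches forgotten methods but not intentionally unsupported ones.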
Workflow Execution Checklist
Key Decision Points
| Decision | Criteria | Options | Impact |
|---|---|---|---|
| Change Scope | New API vs modifying existing | New abstract method / Modify existing / Internal only | Files affected, backward compat |
| Backend Priority | Which backend first | Torch first / MindSpore first / Both together | Development order |
| Feature Parity | Both backends needed? | Full parity / One backend + NotImplementedError | Test coverage |
| Stream Sync | Async operations involved? | Sync / Async with handle / Event-based | Correctness risk |
| Memory Pattern | Buffer management needed? | resize_(0) / Reuse / Allocate new | Memory efficiency |
Quick Reference
See references/quick-reference.md for:
- File location guide
- Platform API categories
- Cross-platform type mapping
- Common patterns and anti-patterns
See references/architecture.md for:
- Platform abstraction design
- DTensorBase dispatch mechanism
- FSDP/HSDP state lifecycle
- Stream synchronization patterns
- Memory management patterns
Hard Rules
- Never import torch/mindspore directly in platform-agnostic code; use get_platform()
- New Platform APIs must be added to the base class first (platform/platform.py)
- Both backends must be considered: implement the API or raise NotImplementedError
- Cross-platform type differences: torch uses torch.device where mindspore uses str; torch uses ProcessGroup where mindspore uses str group names
- Lazy backend imports: in platform/torch/ and platform/mindspore/, import framework modules lazily inside methods and add # pylint: disable=C0415. Non-platform code uses module-top imports (see code-style.md)
- Call handle.wait() before reading async collective output
- Use event.record(src) → event.wait(dst) for cross-stream dependencies
- Use resize_(0) to free device memory; never access freed storage
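Several of the rules above can be sketched together: base class first, lazy backend imports, str group names on MindSpore, and NotImplementedError for an unsupported backend. This is an illustration under stated assumptions, not the real platform code. The method names world_group and scatter, the placeholder group name, and the get_platform(name) signature are all hypothetical; the real get_platform() may resolve the backend differently.

```python
from abc import ABC, abstractmethod


class Platform(ABC):
    """Hypothetical, trimmed-down stand-in for platform/platform.py."""

    @abstractmethod
    def world_group(self):
        """Return the backend's handle for the default process group."""

    @abstractmethod
    def scatter(self, tensor, scatter_list=None, src=0, group=None):
        """Scatter scatter_list from rank src into tensor on each rank."""


class TorchPlatform(Platform):
    def world_group(self):
        # Lazy import: torch is loaded only when this method runs.
        import torch.distributed as dist  # pylint: disable=C0415
        return dist.group.WORLD  # a ProcessGroup once torch.distributed is initialized

    def scatter(self, tensor, scatter_list=None, src=0, group=None):
        import torch.distributed as dist  # pylint: disable=C0415
        return dist.scatter(tensor, scatter_list, src=src, group=group)


class MindSporePlatform(Platform):
    def world_group(self):
        # MindSpore addresses groups by name (a str); this value is a placeholder.
        return "placeholder_world_group"

    def scatter(self, tensor, scatter_list=None, src=0, group=None):
        # Considered but not yet supported: fail loudly rather than silently.
        raise NotImplementedError("scatter is not implemented for MindSpore")


_BACKENDS = {"torch": TorchPlatform, "mindspore": MindSporePlatform}


def get_platform(name):
    """Hypothetical selector used by platform-agnostic code."""
    return _BACKENDS[name]()


# Platform-agnostic code sees only the Platform interface.
platform = get_platform("mindspore")
group = platform.world_group()
```

Because the torch import sits inside the method bodies, defining TorchPlatform does not require torch to be installed; the import cost and the dependency apply only when a torch method is actually called.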
Related Skills
| Skill | When to Use |
|---|---|
| code-review | After implementation, review for distributed correctness |
| autogit | Commit, push, create PR |
| dist-op-dev | When implementing distributed operator support (not platform layer) |