Run any Skill in Manus with one click

add-op

Stars86

Forks73

UpdatedMay 25, 2026 at 06:47

Add new operator definitions to PyPTO across all layers (C++, Python IR, Python DSL, tests, codegen, docs). Covers tile ops, tensor ops, tensor-to-tile conversion, and codegen registration. Use when the user asks to add a new op, define a new operator, implement a new tile/tensor operation, or extend the operator system.

Installation

Install with Codex or Claude Copy this prompt, paste it into Codex, Claude, or another assistant, and let it review the skill page and install it for you.

Run Skill in Manus

Source

hw-native-sys

hw-native-sys/pypto

View GitHub Repository View Creator Repositories

Download

Run Skill in Manus

Related occupationsSOC

Based on SOC occupation classification

Software DevelopersComputer and Mathematical Occupations·SOC 15-1252

File Explorer

2 files

SKILL.md

readonly

Add New Operator to PyPTO

Overview

Adding a new op follows a layered workflow with three phases:

Phase A (required): Tile op definition + tests + docs
Phase B (optional): Tensor op + tensor-to-tile conversion + tests + docs
Phase C (optional): Codegen (orchestration, PTO) + system tests

Task Tracking

Copy and track progress:

Phase A — Tile Op (Required):
- [ ] A1: C++ tile op registration
- [ ] A2: Python IR wrapper
- [ ] A3: Python DSL wrapper
- [ ] A4: Unit tests (tile op)
- [ ] A5: Documentation update

Phase B — Tensor Op (Optional):
- [ ] B1: C++ tensor op registration
- [ ] B2: Python IR wrapper
- [ ] B3: Python DSL wrapper
- [ ] B4: Tensor-to-tile conversion registration
- [ ] B5: Unit tests (tensor op + conversion)
- [ ] B6: Documentation update

Phase C — Codegen (Optional):
- [ ] C1: Orchestration codegen (if tensor op exists)
- [ ] C2: PTO codegen registration
- [ ] C3: System tests

Ask the user which phases are needed before starting.

Phase A: Tile Op

A1: C++ Tile Op Registration

File: src/ir/op/tile_ops/<category>.cpp (pick the matching semantic group)

Categories: elementwise.cpp, unary.cpp, reduction.cpp, matmul.cpp, memory.cpp, transform.cpp, broadcast.cpp, cross_core.cpp

If no existing file fits, create a new .cpp and add it to CMakeLists.txt (around line 98–106 where tile_ops/*.cpp are listed).

Pattern — use REGISTER_OP fluent API:

#include "pypto/ir/op_registry.h"

REGISTER_OP("tile.<op_name>")
    .set_op_category("TileOp")
    .set_description("<human-readable description>")
    .add_argument("<arg1>", "<arg1 description>")
    .add_argument("<arg2>", "<arg2 description>")
    // kwargs if needed:
    .set_attr<bool>("some_flag")
    // memory spaces (required for all TileOps):
    .set_input_memory(0, MemorySpace::Vec)
    .set_input_memory(1, MemorySpace::Vec)
    .set_output_memory(MemorySpace::Vec)
    // type deduction:
    .f_deduce_type([](const std::vector<ExprPtr>& args,
                      const std::vector<std::pair<std::string, std::any>>& kwargs) {
      // Validate args, compute output shape/dtype, return TypePtr
      return std::make_shared<TileType>(output_shape, output_dtype);
    });

Key rules:

All TileOps must set memory spaces — ValidateTileOps() checks at load time
Memory spaces: Vec, Left, Right, Acc, Unknown
Use shared deduction helpers when possible (e.g. DeduceTileOpElementwiseBinaryType)

A2: Python IR Wrapper

File: python/pypto/ir/op/tile_ops.py

def <op_name>(arg1: Expr, arg2: Expr, span: Span | None = None) -> Call:
    """<Description>.

    Args:
        arg1: <description>
        arg2: <description>
        span: Optional source span for debugging (auto-captured if not provided)

    Returns:
        Call expression for <op_name>
    """
    actual_span = _get_span_or_capture(span)
    return _ir_core.create_op_call("tile.<op_name>", [arg1, arg2], {}, actual_span)

For ops with kwargs, pass a dict as the third argument to create_op_call.

A3: Python DSL Wrapper

File: python/pypto/language/op/tile_ops.py

def <op_name>(arg1: Tile, arg2: Tile) -> Tile:
    """<Description>.

    Args:
        arg1: <description>
        arg2: <description>

    Returns:
        Tile wrapping the <op_name> operation
    """
    call_expr = _ir_ops.<op_name>(arg1.unwrap(), arg2.unwrap())
    return Tile(expr=call_expr)

Also add "<op_name>" to the __all__ list if one exists in the file.

A4: Unit Tests

Tile op test: tests/ut/ir/operators/test_tile_ops.py

def test_tile_<op_name>(self):
    """Test tile.<op_name> operator."""

    @pl.program
    class Program:
        @pl.function(type=pl.FunctionType.InCore)
        def main(
            self,
            a: pl.Tensor[[M, N], pl.FP32],
            b: pl.Tensor[[M, N], pl.FP32],
            output: pl.Tensor[[M, N], pl.FP32],
        ) -> pl.Tensor[[M, N], pl.FP32]:
            tile_a: pl.Tile[[m, n], pl.FP32] = pl.load(a, [0, 0], [m, n])
            tile_b: pl.Tile[[m, n], pl.FP32] = pl.load(b, [0, 0], [m, n])
            tile_c: pl.Tile[[m, n], pl.FP32] = pl.tile.<op_name>(tile_a, tile_b)
            result: pl.Tensor[[M, N], pl.FP32] = pl.store(tile_c, [0, 0], output)
            return result

    ir_str = str(Program)
    assert "tile.<op_name>" in ir_str

Test edge cases: shape mismatches, dtype combinations, dynamic dims.

A5: Documentation

Update docs/en/dev/ir/05-operators.md — add the new op to the appropriate table. Keep docs/zh-cn/dev/ir/05-operators.md in sync.

For detailed file paths and code templates, see reference.md.

Phase B: Tensor Op + Conversion

B1: C++ Tensor Op Registration

File: src/ir/op/tensor_ops/<category>.cpp

Same REGISTER_OP pattern as tile ops, but:

Use .set_op_category("TensorOp")
No memory spaces (tensors live in DDR)
Type deduction returns TensorType with broadcasting support

B2–B3: Python IR + DSL Wrappers

Same pattern as A2/A3 but in:

IR: python/pypto/ir/op/tensor_ops.py
DSL: python/pypto/language/op/tensor_ops.py

Use Tensor instead of Tile, TensorType instead of TileType.

B4: Tensor-to-Tile Conversion

File: src/ir/transforms/op_conversion_registry.cpp

Simple 1:1 mapping (most common):

RegisterSimple("tensor.<op_name>", "tile.<op_name>");

Custom conversion (when extra logic is needed):

RegisterCustom("tensor.<op_name>",
    [](const std::vector<ExprPtr>& args,
       const std::vector<std::pair<std::string, std::any>>& kwargs,
       const Span& span) -> ConversionResult {
  auto& reg = OpRegistry::GetInstance();
  // Custom logic: broadcast detection, prologue stmts, etc.
  return ConversionResult{reg.Create("tile.<op_name>", args, span)};
});

ConversionResult can include a prologue (vector of statements inserted before).

B5: Unit Tests

Tensor op test: tests/ut/ir/operators/test_tensor_ops.py

def test_tensor_<op_name>():
    call = ir.op.tensor.<op_name>(args...)
    assert isinstance(call, ir.Call)
    assert call.op.name == "tensor.<op_name>"
    result_type = call.type
    assert isinstance(result_type, ir.TensorType)

Conversion test: tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py

Uses Before/Expected pattern with ir.assert_structural_equal:

def test_<op_name>_conversion(self):
    @pl.program
    class Before:
        @pl.function(type=pl.FunctionType.InCore)
        def main_incore_0(self, x: pl.Tensor[...]) -> pl.Tensor[...]:
            y = pl.<op_name>(x, ...)
            return y
        @pl.function
        def main(self, x: pl.Tensor[...]) -> pl.Tensor[...]:
            y = self.main_incore_0(x)
            return y

    @pl.program
    class Expected:
        # tile.load + tile.<op_name> + tile.store pattern
        ...

    After = passes.convert_tensor_to_tile_ops()(Before)
    ir.assert_structural_equal(After, Expected)

B6: Documentation

Update docs/en/dev/ir/05-operators.md with the tensor op entry. Update docs/zh-cn/dev/ir/05-operators.md in sync.

Phase C: Codegen

C1: Orchestration Codegen (Tensor Op on Host)

File: src/codegen/tensor_op_codegen.cpp

Only needed if the tensor op can appear in orchestration (host-side) code.

REGISTER_ORCHESTRATION_OP(tensor_<op_name>, ("tensor.<op_name>")) {
  std::string result_var = codegen.GetCurrentResultTarget();
  std::ostringstream oss;
  // Generate host-side C++ code
  oss << "Tensor " << result_var << " = ...;";
  return oss.str();
}

C2: PTO Codegen (Tile Op on Device)

File: src/backend/common/pto_ops_common.cpp

Simple N-ary op — add to kSimpleOps table:

{"tile.<op_name>", "pto.<pto_instruction>", <arity>},

Custom codegen — register with backend.RegisterOp:

backend.RegisterOp("tile.<op_name>").f_codegen(
    [](const CallPtr& op, codegen::CodegenBase& codegen_base) {
  auto& codegen = dynamic_cast<codegen::PTOCodegen&>(codegen_base);
  // Custom PTO MLIR generation
  codegen.Emit("pto.<instruction> " + GenerateInsOutsClause(op, codegen));
  return "";
});

Also check if 910B backend needs special handling: src/backend/910B/backend_910b_ops.cpp

C3: System Tests

File: tests/st/codegen/test_<op_name>_codegen.py (or add to existing file)

System tests require hardware/environment. Follow patterns in:

tests/st/codegen/dsl/test_add_mul_orch_codegen.py

Codegen unit tests (no hardware needed): tests/ut/codegen/test_pto_codegen_ops.py

def test_<op_name>_pto_codegen(self):
    # Build IR program with tile.<op_name>
    # Run PTOCodegen
    # Assert output contains expected PTO instruction

Quick Reference — File Locations

Layer	Tile Op	Tensor Op
C++ registration	`src/ir/op/tile_ops/*.cpp`	`src/ir/op/tensor_ops/*.cpp`
Python IR	`python/pypto/ir/op/tile_ops.py`	`python/pypto/ir/op/tensor_ops.py`
Python DSL	`python/pypto/language/op/tile_ops.py`	`python/pypto/language/op/tensor_ops.py`
Conversion	—	`src/ir/transforms/op_conversion_registry.cpp`
PTO codegen	`src/backend/common/pto_ops_common.cpp`	—
Orchestration codegen	—	`src/codegen/tensor_op_codegen.cpp`
Tile op UT	`tests/ut/ir/operators/test_tile_ops.py`	—
Tensor op UT	—	`tests/ut/ir/operators/test_tensor_ops.py`
Conversion UT	—	`tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py`
Codegen UT	`tests/ut/codegen/test_pto_codegen_ops.py`	`tests/ut/codegen/test_orchestration_codegen.py`
ST	`tests/st/codegen/`	`tests/st/codegen/`
Docs (en)	`docs/en/dev/ir/05-operators.md`	`docs/en/dev/ir/05-operators.md`
Docs (zh-cn)	`docs/zh-cn/dev/ir/05-operators.md`	`docs/zh-cn/dev/ir/05-operators.md`
Codegen docs	`docs/en/dev/codegen/00-pto_codegen.md`	`docs/en/dev/codegen/02-orchestration_codegen.md`
CMake	`CMakeLists.txt` (line ~98–116)	`CMakeLists.txt` (line ~109–116)

For complete code templates and detailed examples, see reference.md.

name	add-op
description	Add new operator definitions to PyPTO across all layers (C++, Python IR, Python DSL, tests, codegen, docs). Covers tile ops, tensor ops, tensor-to-tile conversion, and codegen registration. Use when the user asks to add a new op, define a new operator, implement a new tile/tensor operation, or extend the operator system.

add-op

More from this repository

Add New Operator to PyPTO

Overview

Task Tracking

Phase A: Tile Op

A1: C++ Tile Op Registration

A2: Python IR Wrapper

A3: Python DSL Wrapper

A4: Unit Tests

A5: Documentation

Phase B: Tensor Op + Conversion

B1: C++ Tensor Op Registration

B2–B3: Python IR + DSL Wrappers

B4: Tensor-to-Tile Conversion

B5: Unit Tests

B6: Documentation

Phase C: Codegen

C1: Orchestration Codegen (Tensor Op on Host)

C2: PTO Codegen (Tile Op on Device)

C3: System Tests

Quick Reference — File Locations

Checklist Before Commit

Add New Operator to PyPTO

Overview

Task Tracking

Phase A: Tile Op

A1: C++ Tile Op Registration

A2: Python IR Wrapper

A3: Python DSL Wrapper

A4: Unit Tests

A5: Documentation

Phase B: Tensor Op + Conversion

B1: C++ Tensor Op Registration

B2–B3: Python IR + DSL Wrappers

B4: Tensor-to-Tile Conversion

B5: Unit Tests

B6: Documentation

Phase C: Codegen

C1: Orchestration Codegen (Tensor Op on Host)

C2: PTO Codegen (Tile Op on Device)

C3: System Tests

Quick Reference — File Locations

Checklist Before Commit

More from this repository