perf-pipeline

Updated April 2, 2026 at 04:22

Use when /bench-compare or /perf-regression identifies a regression needing root cause, when multiple performance dimensions need simultaneous triage, or when optimization work should be dispatched automatically. Two-phase diagnose-then-optimize pipeline.

Source: ahrav/Gossip-rs

name	perf-pipeline
description	Use when /bench-compare or /perf-regression identifies a regression needing root cause, when multiple performance dimensions need simultaneous triage, or when optimization work should be dispatched automatically. Two-phase diagnose-then-optimize pipeline.
user-invocable	true

Performance Pipeline

Two-phase performance team: triage from multiple angles in parallel, then dispatch the right specialist skill for each finding.

When to Use

- After writing or modifying hot-path code
- When benchmarks show unexpected regressions
- Before merging performance-sensitive changes
- When you need deeper analysis than a single /performance-analyzer pass
- For systematic optimization of a module or subsystem

Invocation

/perf-pipeline [<target>]

- No argument: analyze recently changed files in the working tree
- File path or glob: analyze specific files or modules
- --bench <name>: start from Criterion benchmark results
- --crate <name>: analyze an entire crate's hot paths
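
The four invocation forms can be modeled as a small enum. The following is only a sketch: `Target` and `parse_target` are hypothetical names that do not appear in the skill, and the error messages are illustrative.

```rust
/// What the pipeline should analyze, mirroring the four invocation forms.
/// (Hypothetical type; the skill itself does not define it.)
#[derive(Debug, PartialEq)]
enum Target {
    /// No argument: recently changed files in the working tree.
    WorkingTree,
    /// A file path or glob.
    Path(String),
    /// `--bench <name>`: start from Criterion benchmark results.
    Bench(String),
    /// `--crate <name>`: analyze an entire crate's hot paths.
    Crate(String),
}

/// Parses the arguments that follow `/perf-pipeline`.
fn parse_target(args: &[&str]) -> Result<Target, String> {
    match args {
        [] => Ok(Target::WorkingTree),
        ["--bench", name] => Ok(Target::Bench(name.to_string())),
        ["--crate", name] => Ok(Target::Crate(name.to_string())),
        // A lone flag (including --bench or --crate without a name) is an error.
        [flag] if flag.starts_with("--") => Err(format!("{flag} requires a name")),
        [path] => Ok(Target::Path(path.to_string())),
        _ => Err(format!("unexpected arguments: {args:?}")),
    }
}
```

For example, `parse_target(&["--bench", "acquire_restore"])` yields `Target::Bench`, while an empty argument list falls back to the working tree.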

Phase 1: Parallel Triage

Launch three diagnostic agents in parallel using the Agent tool. Each agent approaches performance diagnosis from a different angle.

Agent A — Benchmark Triage

Analyze Criterion benchmark data for regressions, outliers, and trends.

Agent prompt template:

You are a Rust performance analyst specializing in benchmark interpretation.
Analyze the target code and any available Criterion benchmark results.

Look for:
- Statistical regressions (>5% median change)
- High variance indicating measurement instability
- Outlier samples suggesting allocator pressure or system noise
- Benchmark gaps (hot code paths with no benchmarks)
- Comparison opportunities (before/after data available)

For each finding, report:
- Category: benchmark-regression | benchmark-gap | measurement-instability
- Location: file:line or benchmark name
- Evidence: numbers, percentages, statistical significance
- Impact estimate: how much latency/throughput is affected
- Recommended next step: which tool or skill to use

Target: {target_description}

Run `cargo bench --bench <relevant> -- --list` to discover available benchmarks.
Run benchmarks if needed to gather data.
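
Outside the prompt itself, the ">5% median change" rule can be illustrated with a few lines of Rust. This is a simplified sketch, not Criterion's own statistical analysis; the function names are hypothetical and the samples are assumed to be timings, so a positive change means slower.

```rust
/// Returns the median of `samples`, or `None` if the slice is empty.
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    } else {
        Some(sorted[mid])
    }
}

/// Relative change of the median in percent (positive = slower for timings).
/// Returns `None` for empty input or a zero baseline.
fn median_change_pct(baseline: &[f64], current: &[f64]) -> Option<f64> {
    let base = median(baseline)?;
    let cur = median(current)?;
    if base == 0.0 {
        return None;
    }
    Some((cur - base) / base * 100.0)
}

/// Hypothetical helper applying the >5% threshold from the checklist.
fn is_regression(baseline: &[f64], current: &[f64]) -> bool {
    median_change_pct(baseline, current).map_or(false, |pct| pct > 5.0)
}
```

A real triage pass would also weigh variance and sample count before calling a change significant, which is why the prompt asks for statistical evidence alongside the percentage.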

Agent B — Static Analysis

Analyze code patterns for performance anti-patterns without running anything.

Agent prompt template:

You are a Rust performance analyst specializing in static code analysis for
performance issues. Analyze the target code for anti-patterns.

Check for:

Memory & Allocation:
- Unnecessary allocations in loops (Vec, String, Box)
- Missing with_capacity() for known-size collections
- Cloning where borrowing would suffice
- Large structs passed by value

CPU & Cache:
- False sharing in concurrent data structures
- Cache-unfriendly access patterns (strided, random)
- Branch-heavy code amenable to branchless alternatives
- Missing #[inline] on small hot functions

Async & Concurrency:
- Blocking operations in async contexts
- Lock contention patterns
- Oversized futures
- Unnecessary Arc when ownership would work

Project-specific patterns:
- NONE_U32 = u32::MAX sentinels (avoid Option overhead)
- Allocation tier violations (HOT paths must be allocation-silent)
- ByteSlab/InlineVec/RingBuffer usage opportunities

For each finding, report:
- Category: allocation-hotspot | cache-hostile | lock-contention |
  async-blocking | codegen-issue | vectorization-opportunity
- Location: file:line
- Evidence: the specific code pattern
- Severity: Critical (measurable impact) | High (likely impact) |
  Medium (potential impact) | Low (minor)
- Recommended fix: actionable change with code sketch

Target: {target_description}
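
To make two of the checklist items concrete, here is a small sketch of a missing `with_capacity()` and of the sentinel pattern. Only the name `NONE_U32` comes from the checklist; every function below is hypothetical and is not taken from the Gossip-rs codebase.

```rust
use std::mem::size_of;

/// Sentinel meaning "no value", as named in the project-specific checklist.
const NONE_U32: u32 = u32::MAX;

/// Anti-pattern: the vector starts empty and may reallocate as it grows.
fn squares_growing(n: u32) -> Vec<u64> {
    let mut out = Vec::new();
    for i in 0..n {
        out.push(u64::from(i) * u64::from(i));
    }
    out
}

/// Fix: the final length is known, so reserve it once up front.
fn squares_presized(n: u32) -> Vec<u64> {
    let mut out = Vec::with_capacity(n as usize);
    for i in 0..n {
        out.push(u64::from(i) * u64::from(i));
    }
    out
}

/// Hypothetical lookup that returns NONE_U32 instead of Option<u32>.
/// Assumes the slice has fewer than u32::MAX elements.
fn find_index(haystack: &[u32], needle: u32) -> u32 {
    haystack
        .iter()
        .position(|&x| x == needle)
        .map_or(NONE_U32, |i| i as u32)
}

/// Sizes in bytes of Option<u32> and of a bare u32.
fn optional_u32_sizes() -> (usize, usize) {
    (size_of::<Option<u32>>(), size_of::<u32>())
}
```

Because `u32` has no spare bit pattern for the compiler to use as a niche, `Option<u32>` is larger than `u32`, which is the overhead the sentinel avoids in densely packed structures.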

Agent C — Hotspot Detection

Pre-profiling heuristic scan for likely performance bottlenecks.

Agent prompt template:

You are a Rust performance analyst specializing in hotspot detection. Scan
the target code to find functions and code paths most likely to be performance
bottlenecks, without running profilers.

Heuristics:
- Loop nesting depth and iteration counts
- Allocation density (allocs per iteration)
- Call graph depth in hot paths
- Data structure choice vs access pattern mismatch
- Serialization/deserialization in request paths
- Redundant computation (same value computed multiple times)
- Missed opportunities for short-circuit evaluation

For each hotspot, report:
- Risk level: High | Medium | Low
- Location: file:line (function name)
- Why it's likely hot: evidence from code structure
- Impact estimate: order-of-magnitude guess
- Ease of fix: Easy | Medium | Hard
- Recommended Phase 2 skill:
  * /heap-profile — for allocation attribution
  * /simd-optimize — for vectorizable loops
  * /asm-forge — for codegen quality issues
  * /bench-compare — for before/after measurement
  * /perf-topdown — for CPU microarchitecture bottlenecks
  * /pgo-bolt — for binary layout optimization
  * /causal-profile — for critical-path ambiguity
  * /linux-perf-profile — for PMU counter evidence

Target: {target_description}
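
Two of the heuristics, redundant computation and missed short-circuit evaluation, are easiest to see side by side. The sketch below is purely illustrative: the functions are hypothetical, and an optimizer may already hoist the invariant in simple cases like this one.

```rust
/// Deliberately slow primality test standing in for an expensive predicate.
fn is_prime_slow(v: u64) -> bool {
    v >= 2 && (2..v).all(|d| v % d != 0)
}

/// Before: the limit is recomputed on every iteration, and the expensive
/// predicate runs even when the cheap comparison would reject the value.
fn count_before(values: &[u64], weights: &[u64]) -> usize {
    let mut count = 0;
    for &v in values {
        let limit: u64 = weights.iter().sum(); // loop-invariant work
        if is_prime_slow(v) && v < limit {
            count += 1;
        }
    }
    count
}

/// After: the limit is hoisted out of the loop and the cheap comparison
/// comes first, so `&&` skips the expensive call for out-of-range values.
fn count_after(values: &[u64], weights: &[u64]) -> usize {
    let limit: u64 = weights.iter().sum();
    values
        .iter()
        .filter(|&&v| v < limit && is_prime_slow(v))
        .count()
}
```

Both functions return the same count; only the amount of work per element differs, which is the kind of structural evidence this agent is asked to report.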

Synthesis & Classification

After all three agents complete, merge and classify findings:

1. Match by location: Group findings referencing the same function or file:line
2. Score convergence: Findings from multiple agents get elevated priority
3. Classify each finding into one of these categories:

| Category | Phase 2 Skill | Description |
|----------|---------------|-------------|
| `allocation-hotspot` | `/heap-profile` | Excessive heap allocations in hot path |
| `vectorization-opportunity` | `/simd-optimize` | Loop pattern amenable to SIMD |
| `codegen-issue` | `/asm-forge` | Missed optimization visible in assembly |
| `benchmark-regression` | `/bench-compare` | Needs before/after measurement |
| `microarch-bottleneck` | `/perf-topdown` | Cache misses, branch misprediction |
| `pgo-candidate` | `/pgo-bolt` | Binary layout optimization opportunity |
| `critical-path-unclear` | `/causal-profile` | Hot function may not be on critical path |
| `needs-pmu-data` | `/linux-perf-profile` | Need hardware counter evidence |
| `general-optimization` | `/asm-forge` | Default: assembly-guided optimization |

4. Tag Phase 2 type: Mark each finding as:
   - Diagnostic (read-only): heap-profile, perf-topdown, linux-perf-profile, causal-profile
   - Optimization (read-write): asm-forge, simd-optimize, pgo-bolt
   - Measurement (read-only): bench-compare
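
The synthesis steps can be sketched as follows. The category and skill strings come from the table above; the `Finding` struct, the function names, and the choice to route unlisted categories to `/asm-forge` are assumptions made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One triage finding (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Finding {
    agent: char, // 'A', 'B', or 'C'
    location: String,
    category: String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase2Type {
    Diagnostic,
    Optimization,
    Measurement,
}

/// Maps a category to its Phase 2 skill, following the table.
fn phase2_skill(category: &str) -> &'static str {
    match category {
        "allocation-hotspot" => "/heap-profile",
        "vectorization-opportunity" => "/simd-optimize",
        "codegen-issue" => "/asm-forge",
        "benchmark-regression" => "/bench-compare",
        "microarch-bottleneck" => "/perf-topdown",
        "pgo-candidate" => "/pgo-bolt",
        "critical-path-unclear" => "/causal-profile",
        "needs-pmu-data" => "/linux-perf-profile",
        // general-optimization; unlisted categories also land here (assumption).
        _ => "/asm-forge",
    }
}

/// Tags a skill as diagnostic, optimization, or measurement.
fn phase2_type(skill: &str) -> Option<Phase2Type> {
    match skill {
        "/heap-profile" | "/perf-topdown" | "/linux-perf-profile" | "/causal-profile" => {
            Some(Phase2Type::Diagnostic)
        }
        "/asm-forge" | "/simd-optimize" | "/pgo-bolt" => Some(Phase2Type::Optimization),
        "/bench-compare" => Some(Phase2Type::Measurement),
        _ => None,
    }
}

/// Groups findings by location and counts the distinct agents reporting each,
/// returning locations ordered by descending convergence.
fn convergence(findings: &[Finding]) -> Vec<(String, usize)> {
    let mut agents_by_location: BTreeMap<&str, BTreeSet<char>> = BTreeMap::new();
    for f in findings {
        agents_by_location
            .entry(f.location.as_str())
            .or_default()
            .insert(f.agent);
    }
    let mut ranked: Vec<(String, usize)> = agents_by_location
        .into_iter()
        .map(|(loc, agents)| (loc.to_string(), agents.len()))
        .collect();
    // Stable sort: ties keep their alphabetical location order.
    ranked.sort_by(|a, b| b.1.cmp(&a.1));
    ranked
}
```

Matching on exact location strings is the simplest possible grouping; a fuller version would presumably normalize `file:line` references to the enclosing function before comparing.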

Human Gate

Present findings to the user:

## Perf Pipeline — Phase 1 Complete

Found {N} performance findings across {M} files.

### Findings (ranked by impact + convergence)

| #  | Risk   | Location               | Issue                        | Category                  | Phase 2 Skill    | Type         |
|----|--------|------------------------|------------------------------|---------------------------|------------------|--------------|
| 1  | High   | src/engine/core.rs:42  | Vec alloc in per-claim loop  | allocation-hotspot        | /heap-profile    | Diagnostic   |
| 2  | High   | src/shard/split.rs:88  | Branchless opportunity       | codegen-issue             | /asm-forge       | Optimization |
| 3  | Medium | src/stdx/inline_vec.rs | Loop amenable to NEON SIMD   | vectorization-opportunity | /simd-optimize   | Optimization |
| 4  | Medium | bench: acquire_restore | 12% regression vs baseline   | benchmark-regression      | /bench-compare   | Measurement  |

Approve all? Enter numbers to select, or modify skill assignments:

The user can:

- Approve all: "all"
- Select specific findings: "1,2,3"
- Override skill assignment: "3 -> /asm-forge" (change recommended skill)
- Skip: "none"
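
The four reply forms are regular enough to parse mechanically. The following sketch assumes exactly the syntax shown above; `GateReply` and `parse_reply` are hypothetical names, and accepting "ALL" or "None" case-insensitively is an added convenience, not something the skill specifies.

```rust
/// The user's answer at the human gate (hypothetical type).
#[derive(Debug, PartialEq)]
enum GateReply {
    All,
    Skip,
    Select(Vec<usize>),
    Override { finding: usize, skill: String },
}

/// Parses "all", "none", "1,2,3", or "3 -> /asm-forge".
fn parse_reply(input: &str) -> Result<GateReply, String> {
    let input = input.trim();
    if input.eq_ignore_ascii_case("all") {
        return Ok(GateReply::All);
    }
    if input.eq_ignore_ascii_case("none") {
        return Ok(GateReply::Skip);
    }
    // Override form: "<finding number> -> /<skill>".
    if let Some((num, skill)) = input.split_once("->") {
        let finding = num
            .trim()
            .parse::<usize>()
            .map_err(|e| format!("bad finding number: {e}"))?;
        let skill = skill.trim();
        if !skill.starts_with('/') {
            return Err(format!("skill must start with '/': {skill}"));
        }
        return Ok(GateReply::Override {
            finding,
            skill: skill.to_string(),
        });
    }
    // Selection form: comma-separated finding numbers.
    input
        .split(',')
        .map(|part| {
            part.trim()
                .parse::<usize>()
                .map_err(|e| format!("bad finding number {part:?}: {e}"))
        })
        .collect::<Result<Vec<_>, _>>()
        .map(GateReply::Select)
}
```

Anything that fits none of the forms is returned as an error, so the pipeline can ask the user again instead of guessing.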

Phase 2: Targeted Execution

Dispatch Order

1. Diagnostic findings first (read-only): these produce deeper data that may inform optimization decisions
2. Present diagnostic results: brief summary of what profiling found
3. Optimization findings (read-write): these modify code, dispatched with file ownership boundaries
4. Measurement findings (read-only): run after optimizations to validate
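
As a minimal sketch, the ordering amounts to a stable sort on the finding's type. `DispatchStage` and `Approved` are hypothetical names; the variant order in the enum encodes the dispatch order.

```rust
/// Stage of Phase 2; the derived `Ord` follows declaration order,
/// so Diagnostic < Optimization < Measurement.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DispatchStage {
    Diagnostic,
    Optimization,
    Measurement,
}

/// An approved finding awaiting dispatch (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Approved {
    id: usize,
    stage: DispatchStage,
}

/// Orders findings by stage. `sort_by_key` is stable, so findings within
/// the same stage keep the ranking they had at the human gate.
fn dispatch_order(mut approved: Vec<Approved>) -> Vec<Approved> {
    approved.sort_by_key(|f| f.stage);
    approved
}
```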

Agent Dispatch

For each approved finding, launch an Agent whose prompt embeds the relevant skill's methodology:

You are a performance specialist applying {skill_name} methodology.

Finding to address:
- Category: {category}
- Location: {file:line}
- Issue: {description}
- Evidence from triage: {evidence}

{Skill-specific methodology and checklist inlined here}

Files you own (only modify these): {file list}

After any code changes, run:
  cargo fmt --all && cargo check && cargo clippy --all-targets --all-features -- -D warnings

Parallel vs Sequential

- Diagnostic agents (read-only) → all in parallel
- Optimization agents on non-overlapping files → in parallel
- Optimization agents on overlapping files → sequential
- Measurement agents → after optimizations complete
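
One way to honor the overlap rule is to pack optimization agents into waves: agents inside a wave own disjoint files and can run in parallel, while the waves themselves run one after another. The greedy sketch below is an assumption about how this could be done, with hypothetical names throughout; the skill does not prescribe an algorithm.

```rust
use std::collections::BTreeSet;

/// An optimization agent and the files it owns (hypothetical shape).
#[derive(Debug, Clone)]
struct OptimizationJob {
    id: usize,
    files: BTreeSet<String>,
}

/// Greedy wave planning: each job joins the first wave whose already
/// claimed files do not overlap its own, otherwise it opens a new wave.
fn plan_waves(jobs: &[OptimizationJob]) -> Vec<Vec<usize>> {
    // Each wave tracks the union of files claimed so far and its job ids.
    let mut waves: Vec<(BTreeSet<String>, Vec<usize>)> = Vec::new();
    for job in jobs {
        match waves
            .iter_mut()
            .find(|(claimed, _)| claimed.is_disjoint(&job.files))
        {
            Some((claimed, ids)) => {
                claimed.extend(job.files.iter().cloned());
                ids.push(job.id);
            }
            None => waves.push((job.files.clone(), vec![job.id])),
        }
    }
    waves.into_iter().map(|(_, ids)| ids).collect()
}
```

Greedy packing is not guaranteed to produce the fewest waves, but it keeps any two agents that share a file in different waves, which is the property the rule requires.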

Feedback Loop

If diagnostic Phase 2 agents (heap-profile, perf-topdown, etc.) produce findings that change the optimization picture, present an intermediate gate:

## Perf Pipeline — Diagnostic Phase 2 Complete

/heap-profile found: Top allocator is `ShardMap::resize` at 4.2MB/s
/perf-topdown found: 38% of cycles are backend-bound (L3 cache misses)

Updated recommendations:
| # | Location | Original Skill | Updated Skill | Reason |
|---|----------|---------------|---------------|--------|
| 2 | split.rs:88 | /asm-forge | /simd-optimize | Cache-line alignment more impactful |

Proceed with updated plan? Or modify:

Completion

## Perf Pipeline — Complete

### Results

| Finding | Phase 2 Skill | Status    | Result                              |
|---------|---------------|-----------|-------------------------------------|
| #1      | /heap-profile | Diagnosed | ShardMap::resize is top allocator   |
| #2      | /asm-forge    | Optimized | Eliminated branch in split loop     |
| #3      | /simd-optimize| Optimized | NEON vectorized InlineVec scan      |
| #4      | /bench-compare| Measured  | 8% improvement vs baseline          |

### Verification

Run to confirm:
  cargo fmt --all && cargo check && cargo clippy --all-targets --all-features -- -D warnings
  cargo bench --bench <relevant>

Error Handling

- If a Phase 1 agent fails, proceed with the other agents' findings
- If a Phase 2 optimization makes cargo check fail, revert and report
- If bench-compare shows regression after optimization, flag for user review
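
The "revert and report" rule can be sketched as a small wrapper around three steps. How changes are applied and reverted is left to the caller here, since the skill does not say which mechanism is used; `apply_with_rollback`, `Outcome`, and `cargo_check` are hypothetical names.

```rust
use std::process::Command;

/// Result of attempting one optimization (hypothetical type).
#[derive(Debug, PartialEq)]
enum Outcome {
    Kept,
    /// The change was undone; the string carries the reason to report.
    Reverted(String),
}

/// Applies a change, validates it, and reverts it if validation fails.
fn apply_with_rollback<A, C, R>(apply: A, check: C, revert: R) -> Outcome
where
    A: FnOnce(),
    C: FnOnce() -> Result<(), String>,
    R: FnOnce(),
{
    apply();
    match check() {
        Ok(()) => Outcome::Kept,
        Err(reason) => {
            revert();
            Outcome::Reverted(reason)
        }
    }
}

/// One possible `check`: runs `cargo check` in the current directory.
fn cargo_check() -> Result<(), String> {
    let status = Command::new("cargo")
        .arg("check")
        .status()
        .map_err(|e| format!("could not run cargo: {e}"))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("cargo check failed: {status}"))
    }
}
```

Passing `cargo_check` as the `check` argument gives the behavior described above; tests can substitute a closure so nothing is actually compiled.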

Related Skills

/rust-perf-triage — Phase 1 methodology (benchmark data)
/performance-analyzer — Phase 1 methodology (static analysis)
/rust-hotspot-finder — Phase 1 methodology (hotspot heuristics)
/heap-profile /simd-optimize /asm-forge /bench-compare /perf-topdown /pgo-bolt /causal-profile /linux-perf-profile — Phase 2 specialists
/review-pipeline — Code quality team pipeline
/test-pipeline — Testing team pipeline