Name: Perf Optimization
Author: irahardianto

// Profile-driven performance optimization protocol. Use when profiling data (CPU, heap, trace) is available or when the user requests performance analysis. Covers methodology, pattern catalog, safety invariants, and when-to-stop heuristics. Language-specific tooling is in languages/*.md.

stars:146

forks:47

updated: May 25, 2026, 01:40

package.json

"author": "irahardianto"

"repository": "irahardianto/awesome-agv"


name	perf-optimization
description	Profile-driven performance optimization protocol. Use when profiling data (CPU, heap, trace) is available or when the user requests performance analysis. Covers methodology, pattern catalog, safety invariants, and when-to-stop heuristics. Language-specific tooling is in languages/*.md.

Performance Optimization Skill

When to Use

User provides profiling data (pprof, flamegraph, py-spy, Chrome DevTools, Dart DevTools)
User asks to analyze or optimize performance of a specific component
A benchmark regression is detected
After deploying a new feature that touches a hot path

Core Methodology

graph LR
    P1[Profile] --> P2[Analyze]
    P2 --> P3[Prioritize]
    P3 --> P4[Optimize]
    P4 --> P5[Benchmark]
    P5 --> P6{Improvement?}
    P6 -->|Yes| P7[Verify & Ship]
    P6 -->|No| P3

Step 1: Profile

Collect profiling data using the language-appropriate tool. Load the relevant languages/*.md module for exact commands.

Output: Raw profiling data (CPU profile, heap profile, or trace).

Step 2: Analyze

Read the profile. Focus on these principles (universal across all runtimes):

Focus on cum (cumulative): The total resources consumed by a function AND everything it called. This finds the expensive architectural flows.
Contextualize flat: Resources consumed by the function itself only. If a runtime function (GC, malloc, syscall) has high flat time, trace it UP the call chain to find the user-land code that triggered it.
Ignore runtime noise: Scheduler overhead (runtime.mcall, runtime.systemstack, GC workers) will always appear. Note if GC pressure is high, but don't try to "fix" the scheduler.
Separate benchmark artifacts from production cost: Test harness allocations (e.g., httptest.NewRequest, ResponseRecorder) inflate heap profiles but don't exist in production.

Output: Structured analysis document in docs/research_logs/{component}-perf-analysis.md.
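To make the flat/cum distinction concrete, here is a minimal sketch that computes both from a toy call tree. The node shape (`name`, `self`, `children`) and every number are hypothetical; real profilers export their own formats.

```javascript
// Toy call tree. `self` is the time spent in the function body itself (flat).
// The node shape and all numbers are made up for illustration.
const tree = {
  name: "handleRequest", self: 2, children: [
    { name: "parseJSON", self: 5, children: [
      { name: "runtime.mallocgc", self: 30, children: [] },
    ] },
    { name: "writeResponse", self: 3, children: [] },
  ],
};

// cum = flat of this function + cum of everything it called.
function cum(node) {
  return node.self + node.children.reduce((sum, child) => sum + cum(child), 0);
}

// Walk the tree and collect one { name, flat, cum } row per function.
function collect(node, rows = []) {
  rows.push({ name: node.name, flat: node.self, cum: cum(node) });
  node.children.forEach((child) => collect(child, rows));
  return rows;
}

// Sort by cum, highest first, to surface the expensive flows.
const rows = collect(tree).sort((a, b) => b.cum - a.cum);
console.table(rows);
```

In this toy data `parseJSON` has a small flat cost (5) but a large cum cost (35): the high flat time in `runtime.mallocgc` traces up the call chain to it, which is the user-land code worth changing.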

Step 3: Prioritize

Rank fixes by impact/risk ratio:

Priority	Criteria
Do first	Low risk, high impact (caching, pre-allocation, fast-reject)
Do second	Medium risk, high impact (library swap, algorithm change)
Do last	High risk, high impact (major refactor, custom implementation)
Skip	Any risk, low impact (micro-optimization below noise floor)

Rule: If a fix requires more than 1 day AND saves < 20% on the hot path, defer it.
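The table and the deferral rule can be sketched as a small classifier. The descriptor fields (`risk`, `impact`, `effortDays`, `hotPathSavingPct`) are hypothetical names, and checking "skip" before "defer" is an assumption; the text does not say which takes precedence.

```javascript
// Hypothetical fix descriptor: { risk, impact, effortDays, hotPathSavingPct }.
function prioritize(fix) {
  // Low impact is skipped regardless of risk.
  if (fix.impact === "low") return "skip";
  // Rule: more than 1 day of work AND under 20% saving on the hot path.
  if (fix.effortDays > 1 && fix.hotPathSavingPct < 20) return "defer";
  // Remaining fixes are high impact; order them by risk.
  const byRisk = { low: "do first", medium: "do second", high: "do last" };
  return byRisk[fix.risk];
}

const plan = [
  { name: "cache JWT verification", risk: "low", impact: "high", effortDays: 0.5, hotPathSavingPct: 40 },
  { name: "swap JSON library", risk: "medium", impact: "high", effortDays: 1, hotPathSavingPct: 25 },
  { name: "rewrite router", risk: "high", impact: "high", effortDays: 5, hotPathSavingPct: 10 },
].map((fix) => ({ name: fix.name, priority: prioritize(fix) }));
```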

Step 4: Optimize

Implement one fix at a time. For each fix:

Write tests FIRST (TDD — Red → Green → Refactor)
Implement the fix
Run all existing tests to verify no regression
Benchmark immediately

Never batch multiple optimizations into one commit. Each fix must be independently verifiable and revertable.

Step 5: Benchmark

Compare before/after with the exact same benchmark configuration (same -benchtime, same -count, same machine load). Report:

ns/op (latency)
B/op (memory per operation)
allocs/op (heap allocations per operation)
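As a rough illustration of the before/after report, the sketch below computes the percent change for each of the three metrics. The sample numbers are invented; for Go benchmarks, a tool such as benchstat does the same comparison with statistical testing.

```javascript
// Percent change per metric; negative means the "after" run is cheaper.
function compare(before, after) {
  const delta = {};
  for (const metric of ["ns/op", "B/op", "allocs/op"]) {
    delta[metric] = ((after[metric] - before[metric]) / before[metric]) * 100;
  }
  return delta;
}

// Invented sample results, both taken with the same benchmark configuration.
const before = { "ns/op": 2000, "B/op": 1024, "allocs/op": 20 };
const after = { "ns/op": 1500, "B/op": 512, "allocs/op": 10 };
const delta = compare(before, after);
console.log(delta);
```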

Step 6: When to Stop

Stop optimizing when any of these are true:

Remaining CPU is in hardware-optimized assembly (AES-NI, P-256, SIMD) — you cannot beat the hardware
Remaining allocations are from the language runtime itself (GC, goroutine stacks, HTTP server internals)
The fix requires a custom implementation of a well-audited library — the security/maintenance risk outweighs the perf gain
The measured improvement is < 5% and within benchmark noise
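The last criterion can be checked mechanically once repeated runs are available. This is a sketch under one loud assumption: "noise" is approximated as the relative spread, (max - min) / mean, of the baseline runs. A proper statistical test would be more rigorous.

```javascript
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Returns true when the improvement is both under 5% and no larger than the
// run-to-run spread of the baseline (an assumed, simplistic noise measure).
function improvementIsNoise(beforeRuns, afterRuns) {
  const base = mean(beforeRuns);
  const improvementPct = (1 - mean(afterRuns) / base) * 100;
  const noisePct = ((Math.max(...beforeRuns) - Math.min(...beforeRuns)) / base) * 100;
  return improvementPct < 5 && improvementPct <= noisePct;
}
```

With baseline runs of `[100, 102, 98]` ns/op and post-fix runs of `[98, 99, 97]`, the roughly 2% gain sits inside the 4% spread, so the function returns `true` and further work on that path is unlikely to pay off.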

Optimization Pattern Catalog

These are generic, language-agnostic patterns. Apply them when the profiling data shows the corresponding symptom.

Pattern: Result Caching

Symptom: Same expensive computation repeated with identical inputs (crypto verification, JSON parsing, regex compilation).

Fix: Cache results keyed by input hash. Use bounded LRU with TTL to prevent memory exhaustion.

Safety invariant: When caching security-sensitive results (auth tokens, permission checks):

ALWAYS re-validate expiry/revocation on cache hit
ALWAYS bound cache size (DoS protection)
ALWAYS set TTL shorter than the security credential's validity period
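A minimal sketch of a bounded LRU cache with TTL, plus a wrapper that re-validates credential expiry on every hit. The names `LruTtlCache`, `verifyCached`, `verify`, and the `exp` field (milliseconds) are hypothetical, and the raw token is used as the key for brevity where the text suggests an input hash.

```javascript
// Bounded LRU cache with TTL. A Map iterates in insertion order, so the
// first key is always the least recently used one.
class LruTtlCache {
  constructor(maxEntries, ttlMs, now = Date.now) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy in tests
    this.map = new Map();
  }

  get(key) {
    const entry = this.map.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) { // TTL check on every hit
      this.map.delete(key);
      return undefined;
    }
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key);
    if (this.map.size >= this.maxEntries) {
      this.map.delete(this.map.keys().next().value); // evict the LRU entry
    }
    this.map.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Hypothetical wrapper: `verify` is the expensive check (e.g. a signature
// verification) and returns claims carrying their own expiry in `exp`.
function verifyCached(cache, token, verify, nowMs) {
  let claims = cache.get(token);
  if (claims === undefined) {
    claims = verify(token);
    cache.set(token, claims);
  }
  // Re-validate the credential's own expiry even on a cache hit.
  if (nowMs >= claims.exp) throw new Error("token expired");
  return claims;
}
```

The size bound caps memory no matter how many distinct inputs arrive, and the expiry check runs after the cache lookup so a cached result can never outlive the credential it describes. Revocation checks would go in the same place.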

Pattern: Pre-allocation

Symptom: High allocs/op from repeatedly constructing the same objects (option structs, config slices, header maps).

Fix: Build the object once at init time, share it read-only across requests. Safe for concurrent use if the object is immutable after construction.
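A small sketch of the idea: the shared object is built once at module load and frozen, and per-request code only references it. The header values and the `buildResponse` helper are illustrative.

```javascript
// Built once at module load, then shared read-only by every request.
// The header values are illustrative.
const DEFAULT_HEADERS = Object.freeze({
  "content-type": "application/json",
  "cache-control": "no-store",
});

// Per-request work only references the shared object; nothing is rebuilt.
function buildResponse(body) {
  return { status: 200, headers: DEFAULT_HEADERS, body };
}
```

`Object.freeze` is shallow, so nested objects would need freezing as well before they are safe to share.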

Pattern: Fast-Reject / Short-Circuit

Symptom: Expensive validation path runs even for clearly invalid inputs.

Fix: Add a cheap structural pre-check before the expensive path. Examples: check string length before regex, count delimiters before parsing, check content-type before deserialization.
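For instance, a length and delimiter check can run before a regex. This is a sketch: the 254-character limit and the deliberately simple pattern are assumptions for illustration, not a complete email validator.

```javascript
const MAX_EMAIL_LENGTH = 254; // assumed limit for this sketch
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; // deliberately simple pattern

function isPlausibleEmail(input) {
  // Cheap structural checks first: type, length, and presence of "@".
  if (typeof input !== "string") return false;
  if (input.length < 3 || input.length > MAX_EMAIL_LENGTH) return false;
  if (!input.includes("@")) return false;
  // The regex runs only for inputs that passed the cheap checks.
  return EMAIL_RE.test(input);
}
```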

Pattern: Library Swap

Symptom: High allocation count or CPU in a third-party library's internal parsing/serialization.

Fix: Replace with a library that uses lower-allocation strategies (manual scanners vs encoding/json.Decoder, zero-copy parsing, arena allocation).

Safety invariant: When swapping security-critical libraries (JWT, TLS, crypto):

Explicitly restrict accepted algorithms (prevent algorithm confusion attacks)
Verify the replacement library is well-audited and actively maintained
Run the full existing test suite — no behavioral change allowed

Pattern: Pooling

Symptom: High GC pressure from many short-lived objects of the same type being allocated and discarded rapidly.

Fix: Use an object pool (sync.Pool in Go, object pool in Java, arena in Rust) to reuse allocations.

Caveat: Only effective when objects are uniform in size and have a clear acquire/release lifecycle. Misuse creates subtle bugs.
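A minimal pool sketch with an explicit acquire/release lifecycle. `BufferPool` is a hypothetical class for illustration; it pools only fixed-size buffers and resets them on release, which addresses the two conditions named in the caveat.

```javascript
class BufferPool {
  constructor(size, maxIdle) {
    this.size = size;       // every pooled buffer has this length
    this.maxIdle = maxIdle; // cap on retained buffers
    this.idle = [];
  }

  // Reuse an idle buffer if one exists, otherwise allocate a new one.
  acquire() {
    return this.idle.pop() ?? new Uint8Array(this.size);
  }

  // Return a buffer to the pool. The caller must not touch it afterwards.
  release(buf) {
    if (buf.length !== this.size) return; // only uniform objects are pooled
    buf.fill(0);                          // reset so no data leaks to the next user
    if (this.idle.length < this.maxIdle) this.idle.push(buf);
  }
}
```

Using a buffer after releasing it is the typical source of the subtle bugs mentioned above, because another caller may already own it.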

Pattern: Batching

Symptom: Many small I/O operations (DB queries, HTTP calls, file writes) dominating wall-clock time.

Fix: Batch operations into fewer, larger calls. Examples: batch INSERT, pipeline Redis commands, buffer writes.
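One way to sketch this is a small batcher that buffers items and hands them to a flush function in groups. `Batcher` and `flushFn` are hypothetical; `flushFn` stands in for a batch INSERT, a pipelined command, or a buffered write.

```javascript
// Collects items and passes them to `flushFn` in groups of `batchSize`.
class Batcher {
  constructor(batchSize, flushFn) {
    this.batchSize = batchSize;
    this.flushFn = flushFn; // hypothetical: performs one large I/O call
    this.pending = [];
  }

  async add(item) {
    this.pending.push(item);
    if (this.pending.length >= this.batchSize) await this.flush();
  }

  // Send whatever is buffered; call once more at shutdown for the remainder.
  async flush() {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    await this.flushFn(batch);
  }
}
```

Five items with a batch size of two produce three calls instead of five. A production version would usually also flush on a timer so a partial batch does not wait indefinitely.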

Pattern: Artifact Partitioning by Change Frequency

Symptom: Deploying a small change invalidates a large cached artifact (JS bundle, Docker image, compiled binary), forcing consumers to re-download/rebuild the entire thing.

Fix: Partition build artifacts by change frequency so that stable layers survive volatile deploys:

Stable layer: dependencies, vendor libraries, base images — changes rarely
Volatile layer: application code — changes on every deploy

Examples across stacks:

JS/Bundler: Vite manualChunks / Webpack splitChunks to isolate vendor libraries into separate chunks
Docker: multi-stage builds with COPY go.mod go.sum ./ + RUN go mod download BEFORE COPY . . so the dependency layer stays cached across builds
Monorepo: separate packages by change frequency so CI only rebuilds what changed

Safety invariant: Total artifact size stays the same or slightly increases (chunk overhead). The benefit is on repeat consumption — stable layers serve from cache.

When NOT to apply: One-shot artifacts with no caching benefit (single-use CI, ephemeral environments).
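For the JS/Bundler case, a sketch of a chunk-assignment function in the shape Rollup's `output.manualChunks` option accepts. The single "vendor" chunk is an illustrative choice, and option support varies between Vite versions, so treat the config line in the comment as an assumption to check against your version's documentation.

```javascript
// Receives a module id (file path) and returns a chunk name, or undefined
// to leave the bundler's default chunking in place.
function manualChunks(id) {
  if (id.includes("node_modules")) return "vendor"; // stable layer
  return undefined;                                  // volatile application code
}

// Where it would plug into a Vite config (vite.config.js):
// export default { build: { rollupOptions: { output: { manualChunks } } } };
```

With this split, a deploy that touches only application code leaves the vendor chunk's content unchanged, so returning visitors can keep serving it from cache.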

Pattern: Dependency Discovery Parallelization

Symptom: Sequential resource discovery creates waterfalls — each resource is discovered only after the previous one completes (download → parse → discover next → download → ...).

Fix: Declare dependencies as early as possible so the system can fetch them in parallel:

Move resource declarations upstream (earlier in the boot/parse sequence)
Use explicit hints to bypass sequential discovery chains

Examples across stacks:

Browser: <link rel="preconnect"> to establish connections before CSS/JS requests them; move CSS @import to HTML <link> for parallel discovery
Go: go mod download before build to prefetch modules
DB: connection pool warm-up at startup instead of on first query
DNS: dns-prefetch hints for domains the app will contact

Safety invariant: Only pre-declare resources you WILL use. Unused preconnects/prefetches waste resources (TCP connections, DNS queries, module downloads).
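The same waterfall shows up in application code. In this sketch a hypothetical `load` function tracks how many loads overlap, which makes the difference between sequential discovery and up-front declaration visible without relying on timing.

```javascript
// Hypothetical loader that records how many loads are in flight at once.
let active = 0;
let peak = 0;
function load(name) {
  active += 1;
  peak = Math.max(peak, active);
  return new Promise((resolve) => setTimeout(() => {
    active -= 1;
    resolve(name);
  }, 20));
}

// Waterfall: the second load starts only after the first completes.
async function sequential() {
  const config = await load("config");
  const user = await load("user");
  return [config, user];
}

// Declared up front: both loads start before either is awaited.
async function parallel() {
  return Promise.all([load("config"), load("user")]);
}
```

After `sequential()` the recorded peak is 1; after `parallel()` it is 2, meaning the two loads overlapped. This only helps when the loads are independent of each other.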

Pattern: Concurrent-Fetch Dedup

Symptom: Network tab shows two identical API calls fired at the same time. Multiple UI components mount simultaneously and each independently calls the same fetch function.

Fix: Add a loading-state guard (semaphore) at the store/service layer:

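let isLoading = false   // in-flight guard shared by all callers
let data = null         // last successfully fetched value

async function fetchData() {
    if (isLoading) return    // ← drop duplicate in-flight request
    isLoading = true
    try { data = await api.getData() }   // `api` is the app's own client
    finally { isLoading = false }
}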

When to apply: When the same data store is used by multiple co-mounted components (e.g., a navigation bar and a page view both calling fetchProfile() on mount).

Caveat: This is a simple semaphore, not request dedup. If the data needs refreshing after the in-flight call completes, the caller should retry. For advanced use cases, consider a proper request dedup cache (e.g., TanStack Query's staleTime).
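A step up from the boolean guard, still short of a full dedup cache, is to share the in-flight promise so that every concurrent caller receives the result. `dedupe` is a hypothetical helper, and it assumes `fetchFn` returns a promise.

```javascript
// Wraps an async function so concurrent callers share one in-flight promise.
function dedupe(fetchFn) {
  let inFlight = null;
  return function dedupedFetch() {
    if (inFlight === null) {
      // Clear the slot once the request settles so later calls fetch again.
      inFlight = fetchFn().finally(() => { inFlight = null; });
    }
    return inFlight;
  };
}
```

Unlike the guard version, a duplicate caller here gets the same resolved data (or the same rejection) instead of an early `undefined` return. Nothing is cached after the request settles, so this does not replace a staleness-aware cache.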

Anti-Patterns (Things NOT to Do)

Don't optimize runtime internals. If runtime.mallocgc or runtime.gcBgMarkWorker is high, fix the USER CODE that triggers allocations — don't try to tune the GC directly.
Don't replace battle-tested crypto with custom implementations. The performance ceiling of ECDSA/RSA is in the math. Accept it.
Don't optimize based on gut feeling. Always profile first. Premature optimization is the root of all evil.
Don't combine multiple optimizations into one commit. If a combined commit causes a regression, you can't isolate which fix is at fault.
Don't disable security features for performance. Algorithm restriction, input validation, and expiry checks are non-negotiable.
Don't benchmark without a stable baseline. Run benchmarks with fixed parameters (-benchtime, -count, same machine load). Without a reproducible baseline, before/after comparisons are meaningless noise.

Language Modules

Load the relevant language module when working with a specific runtime:

Module	Use when
Go	Go services, APIs, CLI tools
TypeScript	Node.js/Deno backend (event loop, streams, connection pools)
Python	Python services, CLI, data pipelines
Rust	Rust binaries, libraries
Java	Java/JVM services (JFR, GC tuning, JIT, JMH benchmarks)
C#	C#/.NET services (Span, ObjectPool, EF Core, BenchmarkDotNet)
Swift	Swift apps (Instruments, value types, TaskGroup, os_signpost)
Flutter	Flutter apps (const widgets, ListView.builder, isolates, DevTools)
C++	C++ (data-oriented design, cache locality, SIMD, Google Benchmark)
Kotlin	Kotlin/JVM (inline functions, sequences, value classes, coroutine overhead)
PHP	PHP (OPcache/JIT, eager loading, caching, queue offloading, phpbench)
Ruby	Ruby/Rails (eager loading, batch processing, caching, stackprof)
Frontend	Web frontends (JS/TS bundle, rendering, network)

Contributing: After completing a perf optimization session, extract generalizable patterns from your docs/research_logs/ findings into this catalog. Project-specific details stay in the research log; reusable patterns belong here.

Profiling Scripts

Language-specific data extraction scripts live in scripts/:

Script	Purpose
go-pprof.sh	Extract Go pprof CPU/heap profiles into agent-readable markdown
frontend-lighthouse.sh	Two modes: `lighthouse` (Core Web Vitals, needs Chrome) or `bundle` (Vite chunk analysis, always works)
