speedup-analysis

Analyze a slow process, identify bottleneck, write spec, delegate optimization to Codex. Triggers on: /speedup-analysis, speed this up, optimize performance, this is too slow, why is this slow.

Install command

npx skills add https://github.com/dkorduban/twilight-struggle-ai --skill speedup-analysis

Copy and paste this command into Claude Code to install the skill

Source

dkorduban/twilight-struggle-ai

Stars: 0

Forks: 0

Updated: April 4, 2026 at 07:30

SKILL.md

name	speedup-analysis
description	Analyze a slow process, identify bottleneck, write spec, delegate optimization to Codex. Triggers on: /speedup-analysis, speed this up, optimize performance, this is too slow, why is this slow.

Speedup Analysis — Opus Analyzes, Codex Implements

Per standing policy §8: when you encounter a slow process, don't accept it; analyze it and speed it up. Investment in speedups always pays off because experiments run many times.

Input

One of:

- A description of what's slow: /speedup-analysis ISMCTS takes 3 min/game
- A running process to profile: /speedup-analysis the current benchmark
- A script path: /speedup-analysis scripts/run_ismcts_diagnostic.sh

Execution

Step 1: Measure current performance (Opus, 1-2 turns)

Profile the slow operation:

# Time a representative run
time <command> 2>&1 | tail -20

# Check resource utilization during run
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader
top -bn1 | head -20

Record: wall time, GPU%, CPU%, memory usage, I/O.
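
The shell commands above can also be wrapped in a small script when the same measurement has to be repeated before and after a change. The following is a minimal sketch, not part of the skill itself: the function name `measure` and the returned field names are hypothetical, and the `resource` module it relies on is only available on Unix-like systems. It records wall time and child CPU time; GPU utilization and I/O still need the tools shown above.

```python
import resource
import subprocess
import time


def measure(cmd):
    """Run cmd once and return wall time, child CPU time, and the output tail.

    `measure` and the dict keys are illustrative names, not part of the skill.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # User + system CPU seconds consumed by the child process.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "returncode": proc.returncode,
        "wall_s": wall,
        "cpu_s": cpu,
        # Near 1.0: roughly one busy core. Well below 1.0: the process mostly
        # waits (GPU, I/O, sleep). Well above 1.0: several cores are in use.
        "cpu_per_wall": cpu / wall if wall > 0 else 0.0,
        # Same idea as `| tail -20` in the shell version.
        "tail": proc.stdout.splitlines()[-20:],
    }
```

Calling `measure(["bash", "scripts/run_ismcts_diagnostic.sh"])` before and after an optimization gives two dictionaries whose `wall_s` values can be compared directly.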

Step 2: Identify bottleneck (Opus, 1-2 turns)

Read the code being profiled. Look for:

- Unbatched NN inference. AXIOM: all NN inference must be batched. If single-sample forward passes exist, they are the #1 bottleneck. Fix: batch across the available parallelism (games, determinizations, time steps).
- Sequential loops over parallelizable work: games, determinizations, or simulations running one at a time when they could overlap.
- CPU inference while the GPU is idle: if the GPU is at 0% and the task does NN inference, move inference to CUDA.
- Single-threaded on multi-core: if CPU utilization is about 1/N_cores (one core busy, the rest idle), add threading or increase pool_size.
- Python overhead in a hot loop: if a tight loop calls Python per iteration, consider moving it to C++ or batching.
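
The first item in the list, unbatched inference, is easiest to see in code. The sketch below is an illustration under stated assumptions rather than code from the repository: `CountingModel` is a hypothetical stand-in for a network (it only doubles its inputs and counts how often it is called), and `batch_size=256` is an arbitrary default. The point is that both functions return the same outputs while the batched one makes far fewer model calls.

```python
def evaluate_unbatched(states, model_fn):
    """One model call per state: the single-sample pattern to look for."""
    return [model_fn([s])[0] for s in states]


def evaluate_batched(states, model_fn, batch_size=256):
    """Same outputs, but only ceil(len(states) / batch_size) model calls."""
    outputs = []
    for i in range(0, len(states), batch_size):
        outputs.extend(model_fn(states[i:i + batch_size]))
    return outputs


class CountingModel:
    """Hypothetical stand-in for a network: doubles inputs and counts calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, batch):
        self.calls += 1
        return [2.0 * x for x in batch]
```

With 1000 states gathered across games or determinizations, the unbatched version calls the model 1000 times and the batched version 4 times. On a real GPU model each call carries fixed launch and transfer overhead, which is why the call count matters.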

Step 3: Write spec (Opus, 1 turn)

Write specs/<name>_speedup.md with:

- Current performance (measured)
- Bottleneck identified
- Proposed fix (specific code changes)
- Expected speedup (estimated)
- Correctness verification plan (before/after comparison)
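
One way to keep specs uniform is to render them from a template. This is a sketch only: the skill prescribes the five sections and the file path, but the heading layout and the helper names `spec_path` and `render_spec` are assumptions made for illustration.

```python
from pathlib import Path

# Section titles taken from the list above; the order matches the list.
SPEC_SECTIONS = (
    "Current performance (measured)",
    "Bottleneck identified",
    "Proposed fix (specific code changes)",
    "Expected speedup (estimated)",
    "Correctness verification plan (before/after comparison)",
)


def spec_path(name, root="specs"):
    """Return the spec location: specs/<name>_speedup.md."""
    return Path(root) / f"{name}_speedup.md"


def render_spec(name, bodies):
    """Render the spec text; `bodies` holds one string per section, in order.

    The heading style used here is an assumption, not mandated by the skill.
    """
    if len(bodies) != len(SPEC_SECTIONS):
        raise ValueError(f"expected {len(SPEC_SECTIONS)} section bodies")
    lines = [f"# Speedup spec: {name}", ""]
    for title, body in zip(SPEC_SECTIONS, bodies):
        lines += [f"## {title}", "", body.strip(), ""]
    return "\n".join(lines)
```

Raising on a missing section is deliberate: a spec without a measured baseline or a verification plan cannot be checked after the change lands.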

Step 4: Delegate to Codex (background)

Agent(
  subagent_type: "bg-codex-implementer",
  isolation: "worktree",
  run_in_background: true,
  prompt: "TASK_ID: speedup_<name>\nMODE: implement\nTASK: <spec content>"
)

Report immediately:

Speedup analysis complete:
  Bottleneck: <description>
  Expected improvement: <X>x
  Delegated to Codex in background worktree.
  Verify after: run same benchmark, compare results within expected variance.

Step 5: Verify correctness (after Codex completes)

CRITICAL: Run the same benchmark/test before and after the change, then compare the results:

- Results must be bit-identical or within expected variance.
- A 10× speedup that silently changes results is worse than no speedup.
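
The two acceptance criteria can be sketched as follows. This is a minimal illustration under assumptions the skill does not state: "expected variance" is interpreted here as a two-proportion z-test on win counts from two runs of equal size, and the threshold `z_max=3.0` and both function names are hypothetical choices. Bit-identical output is the stricter check and may not hold once inference is batched, since batching can change floating-point rounding.

```python
import math


def bit_identical(before, after):
    """Strict check: the two result sequences match element for element."""
    return list(before) == list(after)


def within_variance(wins_before, wins_after, games, z_max=3.0):
    """Statistical check for two benchmark runs of `games` games each.

    Uses a pooled two-proportion z-test; z_max is an assumed threshold.
    """
    p_before = wins_before / games
    p_after = wins_after / games
    pooled = (wins_before + wins_after) / (2 * games)
    # Standard error of the difference between two proportions with equal n.
    se = math.sqrt(2 * pooled * (1 - pooled) / games)
    if se == 0.0:
        # Both runs won all games or lost all games.
        return p_before == p_after
    return abs(p_before - p_after) / se <= z_max
```

A run that goes from 1000 to 1010 wins out of 2000 passes this check, while a jump from 1000 to 1300 wins does not and should be treated as a behavior change rather than a speedup.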

Rules

- Be proactive: don't wait for the user to complain. When you see a slow process during normal work, initiate this analysis yourself.
- Don't trust "it's hard": historically, simple batching/parallelization patterns yielded 15-20× speedups that were initially dismissed as hard.
- Codex in background is free: implementation cost is near-zero when done in a background worktree. The only cost is verification time.
- Always verify correctness: optimization without verification is a regression risk.
- Increase parallelism within one process first: two 50%-GPU processes thrash. One at 100% is always better. (Standing policy §3)
