sp1-benchmark

Name: Sp1 Benchmark
Author: erigontech

Description: Use when the user asks to "benchmark", "run benchmark", "compare performance", "profile blocks", "cycle count comparison", or discusses SP1 prover performance testing. Runs batch block execution benchmarks and compares C++ vs Rust (or other) provers.

Updated: March 24, 2026 at 23:13

Repository: erigontech/zilkworm


SP1 Benchmark Skill

Run batch SP1 prover benchmarks, compare C++ vs Rust performance, and report results.

Overview

This skill benchmarks the z6m SP1 prover by executing blocks from witness data and measuring cycle counts, gas used, prover gas, and syscall counts. It supports comparing two prover builds (e.g., C++ vs Rust, or before/after an optimization).

Step 1: Resolve Parameters

Block range

If the user specifies a block count (e.g., "benchmark 500 blocks"), run that many blocks starting from the first available block.
If the user specifies explicit block numbers (e.g., "24490786 to 24491786"), use that range.
Default: 200 blocks starting from the first available block in the data directory.

Data directory

If the user specifies a directory, use it.
Otherwise, check memory files for recently used directories.
The --data-dir flag should point to the parent of the blocks/ directory (e.g., /mnt/my_drive/witness_blocks, not .../witness_blocks/blocks/).
Discover the available block range:
ls <data-dir>/blocks/ | sort -n | head -1   # first block
ls <data-dir>/blocks/ | sort -n | tail -1   # last block
ls <data-dir>/blocks/ | wc -l               # number of blocks
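The same discovery can be done in one pass from Python. This is a minimal sketch that assumes each entry under blocks/ is named with its block number, optionally followed by an extension (e.g. 24490786.json). The function name discover_block_range is hypothetical. Unlike wc -l, it ignores entries without a leading number.

```python
import os
import re


def discover_block_range(data_dir):
    """Return (first, last, count) for block entries under <data_dir>/blocks/.

    Assumption: entry names start with the decimal block number
    ("24490786" or "24490786.json"); other entries are ignored.
    """
    numbers = []
    for name in os.listdir(os.path.join(data_dir, "blocks")):
        m = re.match(r"(\d+)", name)
        if m:
            numbers.append(int(m.group(1)))
    if not numbers:
        raise ValueError(f"no block entries found in {data_dir}/blocks")
    numbers.sort()
    return numbers[0], numbers[-1], len(numbers)
```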

Comparison target

If the user asks to compare (e.g., "vs Rust", "vs main branch"), set up a comparison.
For Rust comparison: check if /tmp/main_500_worktree/prover/target/release/z6m_prover exists. If not, create an independent worktree of the main branch and build the Rust prover there via a background subagent.
For before/after comparison: create an independent worktree of the current commit and run the "before" benchmark there.

Output directory

Create a timestamped output directory: ./temp/benchmarks/<YYYY-MM-DD_HHMMSS>/
Generate the timestamp at the start of the run (before launching benchmarks).
All output files for this run go into this directory.
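A minimal sketch of creating the run directory once, up front, so every later file shares one timestamp. The helper name create_output_dir is hypothetical. exist_ok=False makes a second run started in the same second fail rather than silently overwrite results.

```python
from datetime import datetime
from pathlib import Path


def create_output_dir(root="./temp/benchmarks", now=None):
    """Create and return <root>/<YYYY-MM-DD_HHMMSS>/ for this run."""
    now = now or datetime.now()
    out = Path(root) / now.strftime("%Y-%m-%d_%H%M%S")
    # Refuse to reuse an existing directory so results are never overwritten.
    out.mkdir(parents=True, exist_ok=False)
    return out
```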

Step 2: Build (if needed)

If the prover binary doesn't exist or is stale, build it:

make z6m_guest_cpp && make z6m_prover_cpp

For comparison targets in worktrees, launch a background subagent (ALWAYS background, NEVER foreground) to build in the worktree.

Step 3: Run Benchmarks

CRITICAL: Always use background subagents or background Bash commands

NEVER run benchmarks in the foreground. Benchmarks take minutes to hours. Always use:

run_in_background: true

Prover invocation

Single block:

<prover_path> execute --block-number <N> --data-dir <data_dir>

Batch benchmark:

<prover_path> --test-service \
  --start-block <START> --end-block <END> \
  --data-dir <data_dir> \
  --execution-log-file <output_log>

Use absolute paths for the prover binary. The binary location depends on the worktree:

Main: /workspace/prover/target/release/z6m_prover
Worktree: <worktree_path>/prover/target/release/z6m_prover
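When launching from Python instead of a shell string, the batch invocation can be built as an argument list. This sketch uses only the flags shown above. The helper name batch_command is hypothetical. It rejects relative prover paths to enforce the absolute-path rule.

```python
import os


def batch_command(prover_path, start, end, data_dir, log_file):
    """Return argv for one --test-service batch run over [start, end]."""
    if not os.path.isabs(prover_path):
        raise ValueError("use an absolute path for the prover binary")
    return [
        prover_path, "--test-service",
        "--start-block", str(start), "--end-block", str(end),
        "--data-dir", data_dir,
        "--execution-log-file", log_file,
    ]
```

Passing the list to subprocess.Popen starts the prover without waiting for it, which keeps the run in the background.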

Git info

At the start of every run, capture commit and branch dynamically:

git log --oneline -1   # e.g., "b9ed9a35b Add memmove forwarding..."
git branch --show-current  # e.g., "chfast/memmove"

NEVER hardcode or guess commit/branch — always query git.
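A sketch of querying the same two git commands from Python so the values can be written straight into summary.md. The helper names parse_oneline and git_info are hypothetical. git branch --show-current prints an empty line on a detached HEAD, for example a worktree checked out at a bare commit.

```python
import subprocess


def parse_oneline(line):
    """Split `git log --oneline -1` output into (short_hash, subject)."""
    short_hash, _, subject = line.strip().partition(" ")
    return short_hash, subject


def git_info(repo_dir="."):
    """Query commit and branch for repo_dir; never hardcode them."""
    def run(*args):
        return subprocess.run(["git", *args], cwd=repo_dir, check=True,
                              capture_output=True, text=True).stdout

    short_hash, subject = parse_oneline(run("log", "--oneline", "-1"))
    branch = run("branch", "--show-current").strip()  # "" when HEAD is detached
    return {"commit": short_hash, "subject": subject, "branch": branch}
```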

Memory-aware parallel execution

Each z6m_prover instance uses ~8 GB of RAM. ALWAYS run in parallel when memory allows.

Step 3a: Assess available memory

awk '/MemAvailable/ {printf "%.0f\n", $2/1024/1024}' /proc/meminfo

max_parallel = floor((available_gb - 4) / 8), reserving 4 GB for the OS
Clamp the result to a minimum of 1 and a maximum of 8
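The memory rule, including the per-prover variant used for comparison runs, can be sketched as follows. The function names are hypothetical. The sketch assumes the cap of 8 also applies per prover in comparison runs, which the text does not state explicitly. MemAvailable is reported in kB.

```python
import math


def mem_available_gb(meminfo_text):
    """Parse MemAvailable (in kB) from /proc/meminfo text and return GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / 1024 / 1024
    raise ValueError("MemAvailable not found")


def max_parallel(available_gb, num_provers=1, per_instance_gb=8,
                 reserve_gb=4, cap=8):
    """floor((available_gb - reserve) / (8 * num_provers)), clamped to [1, cap]."""
    n = math.floor((available_gb - reserve_gb) / (per_instance_gb * num_provers))
    return max(1, min(n, cap))
```

Usage: `with open("/proc/meminfo") as f: n = max_parallel(mem_available_gb(f.read()))`, passing num_provers=2 for a comparison run.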

Step 3b: Split block range into chunks

Minimum chunk size: 25 blocks
actual_parallel = min(max_parallel, floor(total_blocks / 25))
actual_parallel = max(actual_parallel, 1)

Each chunk gets its own log: execution_chunk_N.log.
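A sketch of the chunking rule. It assumes --end-block is inclusive. If the prover treats it as exclusive, the chunk boundaries must be shifted by one to avoid skipping blocks. The helper name split_range is hypothetical. Any remainder is spread over the first chunks so chunk sizes differ by at most one block.

```python
def split_range(start, end, max_parallel, min_chunk=25):
    """Split the inclusive range [start, end] into contiguous (lo, hi) chunks."""
    total = end - start + 1
    # actual_parallel = max(min(max_parallel, floor(total / 25)), 1)
    n = max(min(max_parallel, total // min_chunk), 1)
    base, extra = divmod(total, n)
    chunks, lo = [], start
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append((lo, lo + size - 1))
        lo += size
    return chunks
```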

Step 3c: Staggered launch

Launch ALL chunks as parallel background Bash commands in a single message, each with a built-in sleep delay:

# Chunk N (delay = (N-1) * 30 seconds):
sleep <delay> && <prover_path> --test-service \
  --start-block <CHUNK_START> --end-block <CHUNK_END> \
  --data-dir <data_dir> \
  --execution-log-file <output_dir>/execution_chunk_<N>.log

Stagger is 30 seconds between chunks.
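The staggered command strings can be generated rather than typed by hand. This sketch produces one shell string per chunk, each meant to be submitted as its own background Bash command. The helper name staggered_commands is hypothetical. shlex.join requires Python 3.8 or newer.

```python
import shlex


def staggered_commands(prover_path, chunks, data_dir, output_dir, stagger_s=30):
    """One shell command per chunk; chunk N sleeps (N - 1) * stagger_s seconds first."""
    cmds = []
    for n, (lo, hi) in enumerate(chunks, start=1):
        argv = [prover_path, "--test-service",
                "--start-block", str(lo), "--end-block", str(hi),
                "--data-dir", data_dir,
                "--execution-log-file", f"{output_dir}/execution_chunk_{n}.log"]
        # shlex.join quotes any argument containing shell metacharacters.
        cmds.append(f"sleep {(n - 1) * stagger_s} && " + shlex.join(argv))
    return cmds
```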

Step 3d: Wait for all chunks to complete

All background commands send completion notifications. Wait for ALL before analysis.

Step 3e: Merge logs

cat <output_dir>/execution_chunk_*.log | sort -t' ' -k2 > <output_dir>/execution.log
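The same merge can be done in Python by sorting on the parsed block number. That ordering is correct even if the timestamp contains spaces or block numbers differ in digit count. This is a sketch with a hypothetical helper name. Unlike the cat | sort pipeline, it keeps only block-result lines.

```python
import glob
import os
import re

BLOCK_RE = re.compile(r"\bblock (\d+) executed")


def merge_chunk_logs(output_dir):
    """Merge execution_chunk_*.log into execution.log, ordered by block number."""
    entries = []
    for path in glob.glob(os.path.join(output_dir, "execution_chunk_*.log")):
        with open(path) as f:
            for line in f:
                m = BLOCK_RE.search(line)
                if m:  # non-result lines (startup messages etc.) are dropped
                    entries.append((int(m.group(1)), line.rstrip("\n") + "\n"))
    entries.sort(key=lambda e: e[0])
    merged = os.path.join(output_dir, "execution.log")
    with open(merged, "w") as f:
        f.writelines(line for _, line in entries)
    return merged
```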

Log file locations

Single run (parallel): execution_chunk_1.log ... execution_chunk_N.log → merged into execution.log
Single run (serial, max_parallel=1): execution.log directly
Comparison run: execution_cpp.log / execution_rust.log (each side gets its own chunks if parallel)

Comparison parallel execution

When comparing two provers, apply chunking to BOTH. Memory budget:

max_parallel_per_prover = floor((available_gb - 4) / (8 * 2))
Minimum 1 per prover.

Log format

Each line in the execution log:

[<timestamp>] block <N> executed, gas_used=<gas>, cycle_count=<cycles>, prover_gas=<pgas>, syscall_count=<sc>, input=<path>

Step 4: Analyze Comparison Results

Once both benchmarks complete, parse and compare using inline Python:

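import re

# One result line, e.g. "[<ts>] block 24490786 executed, gas_used=..., cycle_count=..., ..."
LINE_RE = re.compile(
    r'block (\d+) executed, gas_used=(\d+), cycle_count=(\d+), '
    r'prover_gas=(\d+), syscall_count=(\d+)'
)

def parse_log(path):
    """Return {block_number: metrics} for every executed block in the log."""
    blocks = {}
    with open(path) as f:
        for line in f:
            m = LINE_RE.search(line)
            if m:
                block, gas, cycles, pgas, syscalls = map(int, m.groups())
                blocks[block] = {
                    'cycles': cycles,
                    'gas': gas,
                    'prover_gas': pgas,
                    'syscall_count': syscalls,
                }
    return blocks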

Required comparison output metrics

Report a summary table with:

Metric	Prover A	Prover B	Delta
Blocks tested	N	N	-
Avg cycles	X	Y	+/-Z%
Avg prover gas	X	Y	+/-Z%
Avg cycles/gas	X	Y	+/-Z%
Avg prover_gas/gas	X	Y	+/-Z%
Total cycles	X	Y	savings
Total prover gas	X	Y	savings
Gas mismatches	N	-	-
A faster	N/total	-	pct
B faster	N/total	-	pct
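A sketch of the summary metrics, taking two dicts shaped like parse_log() output. It makes several assumptions that are not fixed by the text. Delta is (A - B) / B * 100, so a negative value means A is cheaper, which matches the sign convention in the reference results below. "Faster" is judged by cycle count. The per-gas averages are means of per-block ratios, skipping blocks with gas_used == 0. The helper name compare_summary is hypothetical.

```python
def compare_summary(a, b):
    """Summary metrics over the blocks present in both parse_log() results."""
    common = sorted(a.keys() & b.keys())
    if not common:
        raise ValueError("the two logs share no blocks")
    n = len(common)

    def pct(x, y):
        return (x - y) / y * 100  # negative: A is cheaper than B

    summary = {"blocks": n}
    for key in ("cycles", "prover_gas"):
        ta = sum(a[blk][key] for blk in common)
        tb = sum(b[blk][key] for blk in common)
        summary[f"avg_{key}"] = (ta / n, tb / n, pct(ta, tb))
        summary[f"total_{key}"] = (ta, tb, tb - ta)  # positive: A saves
        ra = [a[blk][key] / a[blk]["gas"] for blk in common if a[blk]["gas"]]
        rb = [b[blk][key] / b[blk]["gas"] for blk in common if b[blk]["gas"]]
        if ra and rb:
            ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
            summary[f"avg_{key}_per_gas"] = (ma, mb, pct(ma, mb))
    summary["gas_mismatches"] = sum(a[blk]["gas"] != b[blk]["gas"] for blk in common)
    summary["a_faster"] = sum(a[blk]["cycles"] < b[blk]["cycles"] for blk in common)
    summary["b_faster"] = sum(b[blk]["cycles"] < a[blk]["cycles"] for blk in common)
    return summary
```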

Percentile distribution (delta between A and B)

Percentile	cycles delta	prover_gas delta	cycles/gas delta	prover_gas/gas delta
p5	X%	X%	X%	X%
p10	X%	X%	X%	X%
p25	X%	X%	X%	X%
p50 (median)	X%	X%	X%	X%
p75	X%	X%	X%	X%
p90	X%	X%	X%	X%
p95	X%	X%	X%	X%
p99	X%	X%	X%	X%
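A sketch of the per-block delta percentiles. It uses linear interpolation between closest ranks, the same rule as NumPy's default percentile. This is a choice made here, since the text does not specify a percentile method. The helper names are hypothetical. When gas_used matches for a block, the cycles/gas delta equals the cycles delta, because gas cancels in the ratio. With zero gas mismatches, the per-gas columns therefore repeat the raw columns.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list."""
    if not sorted_vals:
        raise ValueError("empty data")
    k = (len(sorted_vals) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def delta_percentiles(a, b, key, ps=(5, 10, 25, 50, 75, 90, 95, 99)):
    """Percentiles of per-block (A - B) / B * 100 for `key` over common blocks."""
    deltas = sorted((a[blk][key] - b[blk][key]) / b[blk][key] * 100
                    for blk in a.keys() & b.keys())
    return {p: percentile(deltas, p) for p in ps}
```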

Gas correctness

Report the number of gas mismatches (blocks where gas_used differs between the two provers). This MUST be zero for a valid comparison.

Biggest wins/losses

Report the block numbers with the largest improvement and largest regression for both cycles and prover_gas.
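A sketch of picking the extreme blocks, using the same (A - B) / B * 100 convention. The most negative delta is A's largest improvement, and the most positive is its largest regression. If A wins on every block, the "worst" entry is simply its smallest improvement. The helper name is hypothetical. Call it once with "cycles" and once with "prover_gas".

```python
def biggest_win_and_loss(a, b, key):
    """Return ((block, delta), (block, delta)) for the best and worst delta on `key`."""
    deltas = {blk: (a[blk][key] - b[blk][key]) / b[blk][key] * 100
              for blk in a.keys() & b.keys()}
    if not deltas:
        raise ValueError("no blocks in common")
    best = min(deltas.items(), key=lambda kv: kv[1])
    worst = max(deltas.items(), key=lambda kv: kv[1])
    return best, worst
```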

Step 4a: Single Run Analysis

When running a single prover benchmark (no comparison target), produce a full standalone report.

Per-block table

Block	gas_used	cycle_count	prover_gas	syscall_count	cycles/gas	prover_gas/gas
N	X	Y	Z	S	Y/X	Z/X

For runs with more than 50 blocks, show only the 10 most expensive blocks (by cycle count) and the 10 least expensive, and note that the full per-block data is in the execution log.
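A sketch of building the per-block rows from parse_log() output. Small runs are listed in block order. Larger runs keep the 10 highest- and 10 lowest-cycle blocks, sorted by cycle count. The block-order choice for small runs and the None ratios for gas_used == 0 are assumptions, not requirements from the text. The helper name is hypothetical.

```python
def per_block_rows(blocks, limit=50, k=10):
    """(block, gas, cycles, prover_gas, syscalls, cycles/gas, prover_gas/gas) rows."""
    rows = []
    for blk, m in blocks.items():
        gas = m["gas"]
        rows.append((blk, gas, m["cycles"], m["prover_gas"], m["syscall_count"],
                     m["cycles"] / gas if gas else None,
                     m["prover_gas"] / gas if gas else None))
    if len(rows) <= limit:
        return sorted(rows)  # full table, ordered by block number
    rows.sort(key=lambda r: r[2], reverse=True)  # most cycles first
    return rows[:k] + rows[-k:]
```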

Summary statistics

Metric	Avg	Median	Min	Max	Total
cycle_count	X	X	X	X	X
prover_gas	X	X	X	X	X
gas_used	X	X	X	X	X
syscall_count	X	X	X	X	-
cycles/gas	X	X	X	X	total_cycles/total_gas
prover_gas/gas	X	X	X	X	total_pgas/total_gas
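A sketch of the summary-statistics table computed from parse_log() output with the standard statistics module. Following the table, syscall_count has no total. The ratio rows' Total column is the aggregate ratio (total cycles / total gas), not a sum of per-block ratios. Skipping blocks with gas_used == 0 in the ratio columns is an assumption. The helper name is hypothetical.

```python
import statistics


def summary_stats(blocks):
    """Avg/Median/Min/Max/Total for each metric of a single run."""
    def row(values, total):
        return {"avg": statistics.mean(values), "median": statistics.median(values),
                "min": min(values), "max": max(values), "total": total}

    vals = list(blocks.values())
    table = {}
    for key in ("cycles", "prover_gas", "gas", "syscall_count"):
        col = [m[key] for m in vals]
        table[key] = row(col, None if key == "syscall_count" else sum(col))
    total_gas = table["gas"]["total"]
    for key in ("cycles", "prover_gas"):
        ratios = [m[key] / m["gas"] for m in vals if m["gas"]]
        table[f"{key}/gas"] = row(ratios, table[key]["total"] / total_gas)
    return table
```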

Percentile distribution

Percentile	cycle_count	prover_gas	cycles/gas	prover_gas/gas
p5	X	X	X	X
p10	X	X	X	X
p25	X	X	X	X
p50	X	X	X	X
p75	X	X	X	X
p90	X	X	X	X
p95	X	X	X	X
p99	X	X	X	X

Step 5: Persist Results

Output directory structure

All results are saved to ./temp/benchmarks/<YYYY-MM-DD_HHMMSS>/ (relative to project root /workspace).

For a single run:

./temp/benchmarks/2026-03-06_143022/
  summary.md          # Self-contained human-readable report
  execution.log       # Raw prover output log

For a comparison run:

./temp/benchmarks/2026-03-06_143022/
  summary.md          # Self-contained comparison report
  execution_a.log     # Prover A raw log (e.g., execution_cpp.log)
  execution_b.log     # Prover B raw log (e.g., execution_rust.log)

summary.md format

The summary.md must be a self-contained report that includes:

Header: Date, block range, prover binary path(s), data directory, git commit hash(es)
Results: All tables from Step 4 or Step 4a (depending on run type)
Configuration: Build flags, branch name, any relevant environment details
Raw log paths: Absolute paths to the execution log files in the same directory

Example header for summary.md:

# SP1 Benchmark Results — 2026-03-06 14:30:22

- **Type**: Comparison (C++ vs Rust)
- **Block range**: 24490786 - 24491786 (983 blocks)
- **Data dir**: /mnt/data/witness_blocks
- **Prover A**: /tmp/feature_worktree/prover/target/release/z6m_prover (commit abc1234, branch som/remove-rust)
- **Prover B**: /tmp/main_500_worktree/prover/target/release/z6m_prover (commit def5678, branch main)
- **Logs**: execution_cpp.log, execution_rust.log

## Results
...

Memory updates

If results are significant (new baseline, regression detected, etc.), also update MEMORY.md with a one-line summary pointing to the benchmark directory.
Always note the benchmark directory path in the response to the user.

Reference: Recent Benchmark Results

1000-block benchmark (2026-03-06)

Range: 24490786-24491786 (983 blocks executed)
C++ avg: 175,421,331 cycles, Rust avg: 192,962,364 cycles
C++ faster on 983/983 blocks (100%)
Median improvement: -8.9% (C++ faster)
Distribution: p5=-10.7%, p25=-9.5%, p50=-8.9%, p75=-8.5%, p95=-8.0%
Zero gas mismatches
Logs: /tmp/cpp_1000_benchmark.log, /tmp/rust_1000_benchmark.log

Reference: Known Prover Locations

C++ (current branch): prover/target/release/z6m_prover
