Name: Benchmarkoor
Author: erigontech

Updated: May 22, 2026 at 13:57

Repository: erigontech/erigon

name	benchmarkoor
description	Run benchmarkoor performance benchmarks against a locally-built Erigon binary and produce per-test MGas/s comparison tables. Covers image build, dataset reset, run invocation, result parsing, and before/after comparisons.
allowed-tools	Bash, Read, Write, Edit, Glob, Monitor

Benchmarkoor: per-test throughput benchmarks

Benchmarkoor (ethpandaops/benchmarkoor) drives an execution client through the Engine API and measures engine_newPayloadV<N> throughput in MGas/s per test (the V<N> payload version depends on the dataset's hardfork — V5 for Amsterdam, V4 for Prague, etc.). This skill teaches you to run it against a locally-built Erigon, parse results, and compare two runs.

The skill assumes the host already has the benchmarkoor binary, a config YAML, a snapshot/working datadir pair, and Docker installed — it focuses on the agent workflow, not first-time setup.

References

(All $VAR placeholders used below — $BENCH_DIR, $CONFIG, $INSTANCE_ID, etc. — are defined in the next section, "Adapt to the user's environment first".)

When something in this skill is ambiguous, or for newer/unfamiliar config options, consult these upstream sources before guessing:

Skylenet's BAL benchmark runs notes: https://notes.ethereum.org/@skylenet/bal-benchmark-runs — narrative writeup of how the BAL devnet benchmarks were set up; useful for the intent behind the suite (which precompiles, why 120M gas, etc.) and for the dataset preparation steps that aren't visible from the YAML alone.
Benchmarkoor docs: https://github.com/ethpandaops/benchmarkoor/tree/master/docs — authoritative reference for config schema, CLI flags, and the datadir-method matrix.
Example configs: https://github.com/ethpandaops/benchmarkoor/tree/master/examples/configuration — working YAML samples covering different clients, network configs, and datadir methods. When the user wants a config knob you haven't seen before, find the closest example here and adapt rather than guessing.

If you change anything in $CONFIG that isn't already in this skill, cross-check it against the docs or examples links above before running.

Adapt to the user's environment first

Before running anything, identify the host-specific values. Don't hard-code these — different hosts and different datasets use different names. Ask the user if anything below is ambiguous.

Placeholder	What it is	How to discover
`$BENCH_DIR`	benchmarkoor host root (contains the binary, config, snapshot dirs)	The binary usually isn't in `$PATH`. Find it with `find ~ /opt /srv -maxdepth 4 -name benchmarkoor -type f -executable 2>/dev/null` and take the parent dir, or ask the user. Common pattern: `~/<dataset-name>/`
`$CONFIG`	YAML config file passed to `benchmarkoor run --config <…>`	`ls $BENCH_DIR/*.yaml` — there may be more than one (one per network/dataset). Ask which to use.
`$INSTANCE_ID`	The `instances[].id` to benchmark	`grep -E '^\s*-\s+id:' $CONFIG`
`$IMAGE_TAG`	Docker image tag the Erigon instance references	Look at `instances[].image` for the chosen `$INSTANCE_ID` (e.g. `erigon-local:traced`)
`$SNAPSHOT_DIR`	Read-only pristine dataset; never mutated	Look at `client.datadirs.erigon.source_dir` in the config, or the `PRISTINE=...` line in the reset script
`$WORKING_DIR`	Writable copy that benchmarkoor mutates	The host directory benchmarkoor mounts into the container as Erigon's `--datadir`; see `HYBRID=...` in the reset script, or `source_dir` for `method: direct` datadirs
`$RESET_SCRIPT`	Script that re-syncs $SNAPSHOT_DIR → $WORKING_DIR	Typically `reset-hybrid.sh` or `reset-<dataset>.sh` in `$BENCH_DIR`. May not exist if the snapshot/working dirs are the same (then state simply persists across runs).
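If you'd rather script the $INSTANCE_ID / $IMAGE_TAG lookup than eyeball grep output, here is a minimal sketch. It is a naive line scan, not a YAML parser (PyYAML isn't assumed to be installed), and it assumes each instance starts with a "- id:" line whose "image:" key appears before the next "- id:". list_instances is a hypothetical helper name. Like the grep in the table, it will also pick up "- id:" entries from any other list in the file.

```python
import re

# Match "- id: <value>" and "image: <value>", tolerating simple quotes and
# trailing comments. Hypothetical helper; not a real YAML parser.
ID_LINE = re.compile(r"""^\s*-\s+id:\s*["']?([^"'#\s]+)""")
IMAGE_LINE = re.compile(r"""^\s*image:\s*["']?([^"'#\s]+)""")


def list_instances(config_text):
    """Return [(instance_id, image)] in file order; image is None if absent."""
    instances = []
    for line in config_text.splitlines():
        m = ID_LINE.match(line)
        if m:
            instances.append([m.group(1), None])
            continue
        m = IMAGE_LINE.match(line)
        # Attach the first image: line seen after an id to that instance.
        if m and instances and instances[-1][1] is None:
            instances[-1][1] = m.group(1)
    return [tuple(entry) for entry in instances]
```

Call it on the text of $CONFIG and pick the entry for the instance the user wants to benchmark; confirm with the user if there is more than one Erigon entry.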

Concrete example for one host (perf-devnet-3):

$BENCH_DIR     = /home/erigon/perf-devnet-3-erigon-snapshot
$CONFIG        = benchmarkoor.interop.bal.yaml
$INSTANCE_ID   = erigon-bal-full
$IMAGE_TAG     = erigon-local:traced
$SNAPSHOT_DIR  = /home/erigon/perf-devnet-3-erigon-snapshot/erigon_snapshot
$WORKING_DIR   = /home/erigon/perf-devnet-3-erigon-snapshot/erigon_hybrid
$RESET_SCRIPT  = reset-hybrid.sh

Don't assume these names elsewhere — discover them per session.

Datadir setup approaches

Benchmarkoor's source tree (pkg/datadir/) exposes five client.datadirs.<client>.method values: copy, overlayfs, fuse-overlayfs, zfs, direct. Only the first four are listed in upstream's docs/configuration.md; direct is present in the code (pkg/datadir/direct.go) but documented in its source as "not suitable for normal benchmarking … intended for inspection / resume workflows." Choose based on host filesystem and how clean you need isolation between runs.

1. "Hybrid" (= `method: direct` + external reset script) — what we use today

Strictly speaking this is method: direct pointing at a pre-prepared writable copy (erigon_hybrid/) of a read-only pristine snapshot (erigon_snapshot/). A user-owned script (reset-hybrid.sh) rsyncs snapshot → working copy between runs, and the read-only snapshots/ subdir is bind-mounted from the pristine source to avoid duplicating immutable segment files.

Pros: works on any filesystem; no kernel/ZFS features needed; no per-test copy-up cost during execution; full reset is one rsync (~tens of seconds).
Cons: user-managed, not built into benchmarkoor; requires sudo for the reset (containers leave root-owned files in the working dir).
Use when: the host has no ZFS, and you want fast, repeatable, isolated runs without paying overlayfs copy-up at runtime.

Config side: method: direct pointing at $WORKING_DIR. Reset side: external script before each run.

2. `method: overlayfs` — doesn't work for our Erigon dataset

Native Linux overlayfs with $SNAPSHOT_DIR as the read-only lower layer and a /tmp/benchmarkoor-overlay-* upper layer. The mount itself is instant. The problem is copy-up: overlayfs copies a lower-layer file in its entirety into the upper layer the first time it is opened for writing, and MDBX opens the chaindata file read-write, so the kernel copies the whole (very large) chaindata file before Erigon can proceed, which takes minutes. Benchmarkoor's RPC-readiness timeout (observed at ~2 minutes; check pkg/runner/ for the current value) expires first, and Erigon is killed mid-open (log: Got interrupt, shutting down...).

Past evidence in our results dir: runs 1779001673_b1eeb60b_… and 1779002095_ae096a8b_… (May 17) — both timed out around the Opening Database step.

Use when: working with clients whose datadirs don't trigger heavy copy-up (smaller / less-touched files). For Erigon with the current dataset size, skip it.
Possible workaround if you must: increase whatever timeout benchmarkoor exposes for RPC readiness; we didn't pursue this.

3. `method: zfs` — promising once the host has ZFS

ZFS snapshot + clone provides copy-on-write isolation that should avoid the overlayfs copy-up problem because COW is the native operation, not a degraded fallback.

Requires: $SNAPSHOT_DIR must live on a ZFS dataset; root or appropriate ZFS delegations.
What benchmarkoor does: snapshots the source dataset, clones the snapshot to a working dataset, mounts that into the container; cleans up the clone after the run.
Not yet tested here: the current host's root FS is ext4, so we never exercised it. When migrating to a ZFS host, this is the path to try first — it should subsume the "hybrid" workflow with no external reset script needed.

4. `method: direct` (raw) — not for benchmarking, and dangerous if pointed at the snapshot

Mounts source_dir directly into the container with no isolation. Whatever Erigon writes persists in source_dir. From benchmarkoor's own code comment: "not suitable for normal benchmarking … intended for inspection / resume workflows."

⚠️ If source_dir is the pristine snapshot, it will be irreversibly mutated. The snapshot is ~2 TB on this host, and re-downloading it takes many hours. There is no automatic backup. Double-check client.datadirs.erigon.source_dir in the YAML before running with method: direct — if it points at $SNAPSHOT_DIR, change it or abort.
Use only when: you specifically want to inspect / iterate on the chain state left behind by a prior run, or you're debugging.
The "hybrid" approach above uses method: direct correctly because it points at a disposable working copy ($WORKING_DIR), not the pristine snapshot. That's the safe pattern.
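Because this failure mode is irreversible, a mechanical guard before any method: direct run is cheap insurance. A minimal sketch, assuming you've already read source_dir out of the YAML (by hand or with a parser of your choice); direct_source_is_safe is a hypothetical helper, not part of benchmarkoor. It resolves symlinks before comparing, so an alias to the snapshot is caught, but it cannot see through bind mounts.

```python
import os


def direct_source_is_safe(source_dir, snapshot_dir):
    """True if source_dir is disjoint from the pristine snapshot.

    Rejects source_dir == snapshot_dir, source_dir inside the snapshot, and
    the snapshot inside source_dir. Symlinks are resolved first; bind mounts
    are not detected.
    """
    src = os.path.realpath(source_dir)
    snap = os.path.realpath(snapshot_dir)
    shared = os.path.commonpath([src, snap])
    # If the common prefix equals one of the two paths, one contains the other.
    return shared not in (src, snap)
```

In the hybrid layout ($WORKING_DIR and $SNAPSHOT_DIR as siblings under $BENCH_DIR) this returns True; if it returns False, stop and fix client.datadirs.erigon.source_dir before running.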

(Bonus) `method: copy` and `method: fuse-overlayfs`

Two more methods exist but weren't explored:

copy — parallel file copy of source_dir to a fresh working dir each run. Universal but slow for large datadirs.
fuse-overlayfs — userspace overlayfs; documented as ~3× slower than native overlayfs. Fallback when native isn't available.

Layout convention (typical)

$BENCH_DIR/                                            # benchmarkoor host root
├── benchmarkoor                                       # binary (root-owned, requires sudo)
├── $CONFIG                                            # main config (one of possibly several)
├── $RESET_SCRIPT                                      # resets datadir (sudo-only); may not exist
├── $SNAPSHOT_DIR/                                     # read-only pristine dataset (never touched)
├── $WORKING_DIR/                                      # writable copy that benchmarkoor mutates
└── results/runs/                                      # per-run output dirs
    ├── index.json                                     # generated run index
    └── <unix_ts>_<short_hash>_<instance-id>/          # one dir per run

# The Erigon clone (containing build/bin/erigon) typically lives elsewhere on
# the host — sibling of $BENCH_DIR or a separate workspace — and is referenced
# via cp/COPY in Step 1. Don't assume it's under $BENCH_DIR.

Prerequisites (verify before starting)

Erigon binary built at build/bin/erigon inside the user's Erigon clone (e.g. $BENCH_DIR/erigon/build/bin/erigon when the clone lives there; adjust the paths in Step 1 otherwise). If missing, build with make erigon from the erigon repo.
Docker image $IMAGE_TAG exists. Confirm with sudo -n docker images | grep <image-name>.
Benchmarkoor binary at $BENCH_DIR/benchmarkoor. Requires sudo -n to invoke (controls docker, cpuset, cpufreq).
Dataset snapshot at $SNAPSHOT_DIR (untouched) and $WORKING_DIR (working copy). $RESET_SCRIPT rsyncs the former into the latter.
No conflicting Erigon process using the benchmark datadir. Check with pgrep -af "build/bin/erigon" and stop any local node that has $WORKING_DIR as its --datadir (many setups have a stop.sh next to the datadir; otherwise pkill -f "datadir.*$WORKING_DIR").
No stale benchmarkoor containers — they don't always get cleaned up on aborted runs. If you see benchmarkoor-<oldhash>-* containers still up, run sudo -n ./benchmarkoor cleanup (or sudo -n docker rm -f <container>) before starting.
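The last two checks can be scripted over the output of commands already used above (pgrep -af "build/bin/erigon" and docker ps --format '{{.Names}}'). A sketch with hypothetical helper names; it only flags candidates and leaves stopping or removing them to you. Run it before starting a benchmark, since a live run's own containers match the same benchmarkoor- prefix.

```python
import re


def conflicting_erigon(pgrep_output, working_dir):
    """Lines of `pgrep -af build/bin/erigon` output that reference working_dir.

    The path must be followed by "/", whitespace or end of line, so
    /x/erigon_hybrid does not match /x/erigon_hybrid2.
    """
    pattern = re.compile(re.escape(working_dir.rstrip("/")) + r"(?=/|\s|$)")
    return [line for line in pgrep_output.splitlines() if pattern.search(line)]


def stale_benchmarkoor_containers(docker_ps_names):
    """Names from `docker ps --format '{{.Names}}'` that look like benchmarkoor's."""
    return [name for name in docker_ps_names.split() if name.startswith("benchmarkoor-")]
```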

Workflow

Step 1 — Rebuild Docker image with the new binary

If you've changed Erigon source since the last image was built, you need a fresh image. A full Dockerfile build is slow; use a quick overlay instead:

mkdir -p /tmp/erigon-img-overlay
cp "$BENCH_DIR/erigon/build/bin/erigon" /tmp/erigon-img-overlay/erigon
# Unquoted EOF on purpose — $IMAGE_TAG must expand into the heredoc.
cat > /tmp/erigon-img-overlay/Dockerfile <<EOF
FROM $IMAGE_TAG
COPY --chown=erigon:erigon erigon /usr/local/bin/erigon
EOF
cd /tmp/erigon-img-overlay && sudo -n docker build -t "$IMAGE_TAG" .

This rebuilds and re-tags $IMAGE_TAG quickly (typically seconds), since only the final COPY layer changes. The original multi-stage Dockerfile build is only needed if base layers (OS, deps) changed.

If you've never built the base image, fall back to:

cd "$BENCH_DIR/erigon" && sudo -n docker build -t "$IMAGE_TAG" .

Caveat: the stock Erigon Dockerfile may not reproduce the original $IMAGE_TAG's build flags (e.g. tracing builds use extra args). If the original image was built with non-default args, ask the user how it was first built before falling back.
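To confirm the tag really points at a build that includes your latest binary (see also the image-tag row under Failure modes), you can compare the image's creation time with the binary's mtime. A sketch, assuming you pass in the output of docker image inspect --format '{{.Created}}' "$IMAGE_TAG"; the helper names are hypothetical. Caveat: if the binary was rebuilt with identical contents, Docker's layer cache can reuse the previous image, and this check will then report "stale" even though the image is correct.

```python
import os
import re
from datetime import datetime, timezone

# Docker prints RFC 3339 with up to 9 fractional digits, e.g.
# 2026-05-22T13:57:01.123456789Z; datetime only keeps microseconds.
CREATED = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})"
)


def parse_docker_created(created):
    """Parse `docker image inspect --format '{{.Created}}'` output to an aware datetime."""
    m = CREATED.fullmatch(created.strip())
    if not m:
        raise ValueError(f"unrecognised timestamp: {created!r}")
    base, frac, tz = m.groups()
    frac = (frac or "")[:6].ljust(6, "0")  # truncate/pad to microseconds
    tz = "+00:00" if tz == "Z" else tz
    return datetime.fromisoformat(f"{base}.{frac}{tz}")


def image_is_newer_than_binary(created, binary_path):
    """True if the image was created at or after the binary's last modification."""
    built = parse_docker_created(created)
    mtime = datetime.fromtimestamp(os.path.getmtime(binary_path), tz=timezone.utc)
    return built >= mtime
```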

Step 2 — Reset the working datadir

$RESET_SCRIPT (typically) rsyncs $SNAPSHOT_DIR/ → $WORKING_DIR/ (excluding the read-only snapshots/ bind mount). Requires sudo because containers leave root-owned files behind.

sudo -n bash "$BENCH_DIR/$RESET_SCRIPT"

Always run this before every benchmark run. Skipping it means leftover state from the previous run pollutes results.

If no reset script exists for this dataset, the user has chosen a setup where state persists across runs — in that case ask whether they want a manual rsync from $SNAPSHOT_DIR to $WORKING_DIR before starting, or to deliberately run on the prior state.
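For the no-reset-script case, a manual reset can be a single rsync. A sketch that only builds the argument list, assuming the hybrid layout (a top-level snapshots/ dir in $WORKING_DIR that must be left alone); reset_command is a hypothetical helper, and your site's $RESET_SCRIPT may do more than this, so read it before substituting.

```python
def reset_command(snapshot_dir, working_dir, excludes=("/snapshots/",)):
    """Build an argv that mirrors snapshot_dir into working_dir with rsync.

    Trailing slashes on both paths copy the *contents* of snapshot_dir into
    working_dir; --delete removes files the previous run created. Excluded
    paths are neither copied nor deleted (rsync only deletes excluded files
    when --delete-excluded is given).
    """
    argv = ["sudo", "-n", "rsync", "-a", "--delete"]
    argv += [f"--exclude={pattern}" for pattern in excludes]
    argv += [snapshot_dir.rstrip("/") + "/", working_dir.rstrip("/") + "/"]
    return argv
```

Run it with subprocess.run(reset_command(snapshot, working), check=True). The leading slash in /snapshots/ anchors the exclude to the top of the transfer; without it, rsync would skip every directory named snapshots at any depth.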

Step 3 — Inspect/edit the config

Open $BENCH_DIR/$CONFIG. Don't construct YAML from scratch — read the existing file and edit it in place; the snippet below shows the knobs you'll likely touch, not the full schema. (Full schema in the upstream examples/configuration/ reference.) Key knobs:

runner:
  benchmark:
    tests:
      # The filter regex picks which tests run. Edit alternations to add/remove
      # tests; edit the size suffix (e.g. 120M / 60M) to change gas budgets per test.
      filter: 'regex:__test_(<test_name_1>|<test_name_2>|...)\[.*benchmark_<size>M\]'

  client:
    config:
      resource_limits:
        # Prefer explicit `cpuset:` over `cpuset_count:` for reproducibility.
        # Pin to exactly 6 distinct physical cores to match ethpandaops's upstream
        # reference runs (which also use 6) so results are comparable.
        # See "CPU pinning" notes below for how to pick the actual ids per host.
        cpuset: [<6 logical CPU ids, one per distinct physical core>]
        # cpuset_count: 6         # alternative: random 6 CPUs each run — adds variance
        cpu_freq: "3600MHz"
        cpu_turboboost: false
        cpu_freq_governor: performance
        memory: "32g"

instances:
  - id: <instance-id>
    client: erigon
    image: <image-tag>           # must match the image tag from Step 1
    pull_policy: never            # critical — local image only
    extra_args:
      # fork overrides for the snapshot's chain state — values are dataset-specific

CPU pinning

cpuset: (explicit logical-CPU list) and cpuset_count: (random N CPUs each run) are mutually exclusive. Prefer cpuset: — it's deterministic across runs and lets you pick topology-aware values. cpuset_count picks N logical CPUs at random each run, which on SMT hosts produces different physical-core counts each time (the random selection often double-books some physical cores via their SMT siblings), baking noise into A/B comparisons.

Pin to exactly 6 logical CPUs, one per distinct physical core, avoiding SMT sibling pairs. The "6" matches what ethpandaops's upstream reference runs use (e.g. cpuset: [6,7,8,9,10,11] at https://benchmarkoor.core.ethpandaops.io/runs/), so any A/B against published reference numbers is core-count-comparable. If the host has more than 6 physical cores, leave the extras unpinned so the docker daemon, benchmarkoor itself, and the host kernel don't compete with the bench workload. If the host has fewer than 6 physical cores, that's a deeper problem — note it and ask the user.

Discover the host's topology:

lscpu | grep -E "^CPU|^Thread|^Core|^Socket|Model name"
# core_id is only unique within a socket, so print physical_package_id too.
for c in $(seq 0 $(($(nproc)-1))); do
  t=/sys/devices/system/cpu/cpu$c/topology
  printf 'cpu%s: socket=%s core=%s siblings=%s\n' \
    "$c" \
    "$(cat "$t/physical_package_id")" \
    "$(cat "$t/core_id")" \
    "$(cat "$t/thread_siblings_list")"
done

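Read off 6 logical CPUs with distinct (socket, core_id) pairs, i.e. skip SMT siblings. On a typical Linux topology where logical CPUs 0..N-1 are physical and N..2N-1 are their SMT siblings, pick any 6 from the lower half. Some hosts number siblings adjacently instead (e.g. 0 and 1 on the same core), so confirm against thread_siblings_list rather than assuming.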

The literal cpuset numbers in the reference run (6,7,8,9,10,11) are specific to that host — don't copy them; replicate the intent (deterministic + physical-only + count=6).
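The same selection can be done programmatically. A sketch assuming the standard sysfs layout used above; read_topology and pick_cpuset are hypothetical helpers. It keys cores on (physical_package_id, core_id) because core_id alone repeats across sockets, and it simply takes the lowest-numbered CPU of each core, so review the result before pasting it into cpuset:.

```python
from pathlib import Path


def read_topology(sysfs="/sys/devices/system/cpu"):
    """Map logical CPU id -> (physical_package_id, core_id) from sysfs."""
    topo = {}
    for cpu_dir in Path(sysfs).glob("cpu[0-9]*"):
        t = cpu_dir / "topology"
        if not (t / "core_id").exists():
            continue  # offline CPUs may not expose topology files
        topo[int(cpu_dir.name[3:])] = (
            int((t / "physical_package_id").read_text()),
            int((t / "core_id").read_text()),
        )
    return topo


def pick_cpuset(topo, count=6):
    """Pick `count` logical CPUs, one per distinct physical core, lowest ids first."""
    chosen, seen_cores = [], set()
    for cpu in sorted(topo):
        if topo[cpu] in seen_cores:
            continue  # an SMT sibling of a core we already picked
        seen_cores.add(topo[cpu])
        chosen.append(cpu)
        if len(chosen) == count:
            return chosen
    raise ValueError(f"found only {len(chosen)} distinct physical cores, need {count}")
```

On a host with many cores you may prefer to leave low-numbered cores (CPU 0 often carries more interrupt and housekeeping work) to the host; drop them from the dict before calling pick_cpuset.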

Sanity-check the filter

To know how many tests will actually run, dry-run the filter against the extracted test fixtures. Strip the regex: prefix from the YAML filter value and feed the rest to grep -E:

# e.g. for filter 'regex:__test_(blake2f_benchmark|ecrecover)\[.*benchmark_120M\]'
ls "$BENCH_DIR"/.cache/opcode-archive-extract-*/eest_bal/testing/ 2>/dev/null \
  | grep -cE '__test_(blake2f_benchmark|ecrecover)\[.*benchmark_120M\]'
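If you want the matched names rather than a count, the same check works in Python. A sketch, assuming (as the grep above does) that benchmarkoor applies the filter as an unanchored match against fixture file names; Python's re and Go's regexp (RE2 syntax) agree on simple patterns like these but not on every construct. matching_tests is a hypothetical helper.

```python
import re
from pathlib import Path


def matching_tests(filter_value, testing_dir):
    """Sorted fixture file names that a `regex:` filter value would select."""
    prefix = "regex:"
    if not filter_value.startswith(prefix):
        raise ValueError(f"expected a {prefix!r} filter, got {filter_value!r}")
    pattern = re.compile(filter_value[len(prefix):])
    # Unanchored search, mirroring `grep -E` on the directory listing.
    return sorted(p.name for p in Path(testing_dir).iterdir() if pattern.search(p.name))
```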

Step 4 — Run benchmarkoor

cd "$BENCH_DIR" && sudo -n ./benchmarkoor run \
  --config "$CONFIG" \
  --limit-instance-id "$INSTANCE_ID" \
  2>&1 | tee /tmp/benchmarkoor.log

Notes:

benchmarkoor run --help shows both --limit-instance-id (specific instance ids; what we use) and --limit-instance-client (any instance for a given client name). They coexist; pick the one matching how you've keyed your instances. Without either flag, benchmarkoor runs every instance in the config.
Each test runs in its own freshly recreated container, so suite wall-clock time scales roughly linearly with the number of tests the filter matches. Pre-test orchestration (container start, gas-bump, funding) dominates the actual test on this setup, so expect tens of seconds per test even for tiny test payloads.
Run in the background and watch progress: either tail the log with tail -F /tmp/benchmarkoor.log | grep --line-buffered -E "index=[0-9]+/|Error|FAIL|panic", or use the Monitor tool with the same filter to get notifications.
While running, sudo -n docker ps shows the active container (benchmarkoor-<runid>-<instance>-<index>).
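For a quick done/total readout, you can scan the log for the progress token the grep above watches. A sketch that assumes the token has the form index=<i>/<n>; the filter above only confirms the index=<digits>/ prefix, so check a real log line first. latest_progress is a hypothetical helper.

```python
import re

# Assumed progress token shape: index=<done>/<total>.
PROGRESS = re.compile(r"index=(\d+)/(\d+)")


def latest_progress(log_text):
    """Return (done, total) from the last index=<i>/<n> token, or None."""
    last = None
    for m in PROGRESS.finditer(log_text):
        last = (int(m.group(1)), int(m.group(2)))
    return last
```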

Step 5 — Locate results

ls -t "$BENCH_DIR"/results/runs/ | head -3

Most recent dir matches <unix_timestamp>_<short_hash>_<instance-id>/. Inside, each test has its own dir:

$BENCH_DIR/results/runs/<run-id>/
├── config.json                                        # snapshot of the YAML
├── result.json                                        # aggregated run-level stats
├── benchmarkoor.log
├── test_<name>.py__test_<func>[<params>].txt/
│   ├── setup.result-aggregated.json
│   ├── setup.result-details.json
│   ├── test.result-aggregated.json                    # ← per-test MGas/s lives here
│   └── test.result-details.json
└── ...                                                # one dir per test that ran

To confirm the run completed all matched tests, check result.json's tests_total / tests_passed (or grep Run result written ... tests_count=<N> in benchmarkoor.log).
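Picking the newest run by name avoids mistakes when several instances write to the same results/runs/. A sketch, assuming the <unix_ts>_<short_hash>_<instance-id> naming shown above with a hash that contains no underscores; latest_run is a hypothetical helper. It sorts on the numeric timestamp rather than on mtime (which ls -t uses), so touching or copying a run dir doesn't change the answer.

```python
import os
import re

RUN_DIR = re.compile(r"^(\d+)_([0-9A-Za-z]+)_(.+)$")


def latest_run(runs_dir, instance_id):
    """Newest <unix_ts>_<short_hash>_<instance_id> dir name, or None."""
    best = None
    for name in os.listdir(runs_dir):
        m = RUN_DIR.match(name)
        if not m or m.group(3) != instance_id:
            continue  # index.json, other instances, unrelated files
        if not os.path.isdir(os.path.join(runs_dir, name)):
            continue
        ts = int(m.group(1))
        if best is None or ts > best[0]:
            best = (ts, name)
    return best[1] if best else None
```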

The MGas/s value for each test is at:

.method_stats.mgas_s.engine_newPayloadV<N>.last

The V<N> suffix depends on the dataset's hardfork (V5 for Amsterdam, V4 for Prague, V3 for Cancun, etc., as noted at the top of this skill). Don't guess the key; inspect one test.result-aggregated.json first, e.g.:

jq -r '.method_stats.mgas_s | keys[]' \
  "$BENCH_DIR"/results/runs/<run-id>/test_*/test.result-aggregated.json | sort -u | head

A .last value (instead of .mean) is fine because each test runs exactly one such call against its payload. If you're A/B-comparing runs across hardforks, the keys differ — the comparator below will show n/a for any mismatched key.

Step 6 — Build a comparison table

Use a short Python script that reads two run dirs and prints a per-test speedup table sorted by ratio. It handles runs with different test counts and naming patterns (a test missing from one run shows n/a):

# Substitute <bench-dir>, <old-run-dirname>, <new-run-dirname> below before running.
# Python does NOT expand shell variables; use literal paths or os.environ.
import json, glob, os, re

RUNS_DIR  = '<bench-dir>/results/runs'   # or: os.environ['BENCH_DIR'] + '/results/runs'
before_id = '<old-run-dirname>'
after_id  = '<new-run-dirname>'

def shorten(name):
    m = re.search(r'test_(\w+)\.py__(test_\w+)\[(.+)\]', name)
    if not m: return name
    tname, params = m.group(2), m.group(3)
    parts = [tname]
    for label, pat in [('', r'opcode_(\w+)'), ('mod', r'mod_bits_(\d+)'),
                       ('rounds', r'rounds_(\d+)'), ('', r'benchmark_(\d+M)')]:
        mm = re.search(pat, params)
        if mm: parts.append(f'{label}{mm.group(1)}')
    return '-'.join(parts)

def mgas(run_id):
    """Read MGas/s from each test in a run. Auto-detects the engine_newPayloadV<N> key."""
    out = {}
    for f in sorted(glob.glob(os.path.join(RUNS_DIR, run_id, 'test_*/test.result-aggregated.json'))):
        with open(f) as fp: d = json.load(fp)
        name = os.path.basename(os.path.dirname(f)).removesuffix('.txt')
        stats = d.get('method_stats', {}).get('mgas_s', {})
        # Pick the first engine_newPayloadV* entry — same hardfork ⇒ same key across tests.
        key = next((k for k in stats if k.startswith('engine_newPayloadV')), None)
        if key and 'last' in stats[key]:
            out[name] = stats[key]['last']
    return out

b, a = mgas(before_id), mgas(after_id)
rows = [(shorten(n), b.get(n), a.get(n)) for n in sorted(set(b)|set(a))]
rows = [(n, bv, av,
         (av / bv) if (bv is not None and av is not None and bv > 0) else None)
        for n, bv, av in rows]
rows.sort(key=lambda r: r[3] if r[3] is not None else 0, reverse=True)

def fmt_num(x): return f'{x:>14.1f}' if x is not None else f'{"n/a":>14}'
def fmt_sp(s):  return f'{s:>8.2f}x' if s is not None else f'{"n/a":>9}'

print(f'{"Test":<55} {"Before":>14} {"After":>14} {"Speedup":>9}')
for n, bv, av, sp in rows:
    print(f'{n:<55} {fmt_num(bv)} {fmt_num(av)} {fmt_sp(sp)}')
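# Average only over tests present in both runs, so the ratio compares like with like.
common = set(b) & set(a)
sum_b = sum(b[n] for n in common)
sum_a = sum(a[n] for n in common)
if common and sum_b > 0:
    print(f'\navg over {len(common)} common tests: {sum_b/len(common):.1f} → {sum_a/len(common):.1f} ({sum_a/sum_b:.2f}x)')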

Render as a markdown table for inclusion in the response; write to /tmp/benchmark_comparison.md if the user wants to copy-paste. Important: the average is over the test set actually present in both runs. If the filter changed between runs, the averages aren't directly comparable — call that out explicitly.
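A sketch of the markdown rendering, assuming the rows list of (name, before, after, speedup) tuples built by the comparison script above; to_markdown is a hypothetical helper. Missing values render as n/a, as in the console output.

```python
def to_markdown(rows):
    """Render (name, before, after, speedup) rows as a markdown table string."""

    def cell(value, fmt, suffix=""):
        return f"{value:{fmt}}{suffix}" if value is not None else "n/a"

    lines = [
        "| Test | Before (MGas/s) | After (MGas/s) | Speedup |",
        "|---|---:|---:|---:|",
    ]
    for name, before, after, speedup in rows:
        safe_name = name.replace("|", "\\|")  # a bare | would split the cell
        lines.append(
            f"| {safe_name} | {cell(before, '.1f')} | {cell(after, '.1f')} "
            f"| {cell(speedup, '.2f', 'x')} |"
        )
    return "\n".join(lines) + "\n"
```

For example: open('/tmp/benchmark_comparison.md', 'w').write(to_markdown(rows)).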

Failure modes & gotchas

Symptom	Cause	Fix
`Failed to set turbo boost` warning	CPU governor not user-controllable	Harmless; ignore.
`HEAD failed; using cached file`	GitHub Actions artifact HEAD requires auth	Harmless if cache is present at `$BENCH_DIR/.cache/`.
`Container stopped for recreate` count = N–1 (not N) at end	Last container's "stopped" log fires after the suite completion log	Verify with `Run result written ... tests_count=<N>` in the log.
Big variance between identical runs	CPU governor not pinned, or other heavy workload	Always set `cpu_freq_governor: performance`; don't run other CPU-heavy tasks (full make test-all, syncing nodes) simultaneously.
Image-tag mismatch (the right `IMAGE_TAG` not used)	Docker cached an older layer	Rebuild the image (Step 1) explicitly; confirm `docker images` shows a recent `CREATED` time.
Old `benchmarkoor-<hash>-*` container lingering	Previous run aborted before cleanup	`sudo -n docker rm -f <container>` + `sudo -n ./benchmarkoor cleanup`.
Run "completes" instantly with 0 tests	Wrong cwd, or filter regex matches 0 tests	Confirm cwd is `$BENCH_DIR`; sanity-check the filter against `$BENCH_DIR/.cache/opcode-archive-extract-*/eest_bal/testing/*.txt`.

A few "what to remember" rules

Discover the host-specific names per session. Don't hard-code erigon_snapshot/erigon_hybrid/benchmarkoor.interop.bal.yaml/erigon-local:traced/erigon-bal-full — they vary between datasets and hosts.
Always run the reset script (if it exists) before each run. State leakage between runs is real and silently skews numbers.
Always pass --limit-instance-id when comparing just one client — otherwise the run also exercises every other client in the config (geth/besu/reth/nethermind/…) which adds tens of minutes and clutters results/runs/.
With pull_policy: never, the image tag in the YAML must already exist locally. Benchmarkoor will not pull it, so a missing local image fails immediately rather than silently downloading a stale upstream one.
results/runs/index.json is regenerated by generate-index-file; don't hand-edit. After a successful run it's auto-generated when generate_results_index: true in the config. If it's stale or missing (e.g. an aborted previous run, or comparing against an older directory the index doesn't list), regenerate explicitly: sudo -n ./benchmarkoor generate-index-file --config "$CONFIG".
A run dir is keyed by suite-hash (under results/suites/). Two runs with the same filter regex have the same suite hash, so comparing them is direct. A different filter ⇒ different suite hash ⇒ different test set, and shortened-name matching is the only sane cross-suite comparison — flag this to the user.
For PR-style before/after testing: stash the change, rebuild the Erigon binary (make erigon) and the image (Step 1), reset the working dir (Step 2), and run the baseline; then unstash, rebuild binary and image, reset, and run again. Compare the two newest run dirs. Don't compare a fresh run against an old result captured before unrelated config changes: too many variables.
