Ejecuta cualquier Skill en Manus
con un clic

Ejecuta cualquier Skill en Manus con un clic

$pwd:

optimize-model

Name: Optimize Model
Author: ethpandaops

// Deeply optimize a Xatu CBT transformation model query using live ClickHouse evidence. Use when the user wants performance analysis (not code edits) for `models/transformations/*.sql`, including dependency rendering, schema introspection, benchmark runs, and high-impact recommendations.

Ejecutar en Manus

$ git log --oneline --stat

stars:0

forks:1

updated:27 de febrero de 2026, 07:15

Explorador de archivos

13 archivos

SKILL.md

readonly

related-skills.json

mismo repositorio

optimize-model.md

from "ethpandaops/xatu-cbt"

Deeply optimize a Xatu CBT transformation model query using live ClickHouse evidence. Use when performance analysis is requested for `models/transformations/*.sql`, including dependency rendering, schema introspection (`SHOW CREATE` with `_local` table checks), cache-aware benchmarking, and high-impact recommendations without editing model files.

2026-02-270

package.json

"author": "ethpandaops"

"repository": "ethpandaops/xatu-cbt"

Abrir repositorio de GitHub Ver repositorios del creador

$ install --global

$ download --local

Ejecutar en Manus

$ useful --forSOC

Científicos de datosOcupaciones informáticas y matemáticas15-2051L4

name	optimize-model
description	Deeply optimize a Xatu CBT transformation model query using live ClickHouse evidence. Use when the user wants performance analysis (not code edits) for `models/transformations/*.sql`, including dependency rendering, schema introspection, benchmark runs, and high-impact recommendations.
compatibility	Requires Bash 3.2+, Python 3, jq, HTTP access to ClickHouse on port 8123, and internet access for web research.
argument-hint	<model_path_or_name>

Optimize CBT Transformation Model (No Model Edits)

Analyze transformation model $ARGUMENTS and report high-impact ClickHouse performance improvements.

Never modify the model file. Only report findings and recommendations.

Defaults To Confirm First

Before any analysis, collect and confirm (allow overrides):

External models cluster:
- endpoint: http://chendpoint-xatu-clickhouse.analytics.production.ethpandaops:8123
- default_database: default
- username: empty
- password: empty
Transformation models cluster:
- endpoint: http://chendpoint-xatu-cbt-clickhouse.analytics.production.ethpandaops:8123
- default_database: mainnet
- username: empty
- password: empty
- access_to_external_cluster: cluster('{remote_cluster}', database.table_name)
- Cluster substitution rule: when template uses cluster(...), external dependencies are resolved to <table>_local automatically.

If any value is missing or ambiguous, ask follow-up questions before running benchmarks.

Inputs

Target transformation model (example): models/transformations/fct_block.sql
Optional benchmark window overrides.
Optional period mode:
- sane (default): use frontmatter interval with protective caps to avoid timeouts.
- daring: expand beyond sane range for deeper testing.

Workflow

Resolve target model path

Accept full path or model name.
Normalize to models/transformations/<name>.sql.
Stop with a clear error if not found.

Run one-shot preparation via wrapper (no flag discovery)

Run:

SESSION_ID="$(date +%s)-$$"
PREP_OUTPUT="/tmp/optimize-model.${SESSION_ID}.prepare.json"

SESSION_ID="$SESSION_ID" PREP_OUTPUT="$PREP_OUTPUT" \
  .claude/skills/optimize-model/scripts/run_prepare.sh "$ARGUMENTS"

This command generates session-isolated /tmp artifacts.
Use summary, period, unresolved_fragments, and summary.introspection_ok from $PREP_OUTPUT.
Confirm period.bounds_start/period.bounds_end are sane before benchmarking.
If unresolved fragments remain, ask targeted follow-up questions and rerun.

Introspect dependency schemas

This is already executed by prep wrapper.
Always use SHOW CREATE TABLE for dependencies.
If a dependency is Distributed, also inspect its target table (often _local) and reason from local engine/order/partition keys.
Treat _local table ORDER BY as the effective primary key when PRIMARY KEY is not explicitly defined.
For any cluster(...) table access, always target _local tables instead of Distributed table names.

Early version-aware research pass (recommended before benchmarking)

Read ClickHouse versions from prep output first:
- jq '.clickhouse_versions' "$PREP_OUTPUT"
Use web search (explicitly) to review docs and changelog/release notes for the detected version before designing candidate rewrites.
Prefer doing this before baseline/candidate benchmarking; if a quick baseline run already happened, continue and complete research before final recommendations.
Use official ClickHouse docs first, then release/changelog pages relevant to the detected version and newer versions.
Include version-aware search terms such as:
- clickhouse <major.minor> release notes
- clickhouse <major.minor> query optimization
- clickhouse replacingmergetree final performance
Use references:
- references/clickhouse-playbook.md
- references/dependency-resolution.md
- references/research-sources.md

Benchmark rendered query (baseline, fast sample first)

Run via wrapper:

SESSION_ID="$SESSION_ID" \
  .claude/skills/optimize-model/scripts/run_benchmark.sh "$PREP_OUTPUT"

Wrapper output includes BENCH_OUTPUT=... path.
Do not manually widen bounds unless user opts into PERIOD_MODE=daring or provides explicit bounds.
The benchmark must disable query caches where supported.
Note cache caveats (filesystem/page cache may still influence timings).
Use explain output and query_log metrics for evidence.

Validate across multiple windows (required before recommendation)

Use at least 2 windows:
- sane window from frontmatter interval (default).
- One additional window (daring or explicit custom bounds).
Prefer 3 windows when runtime is acceptable (short/sane/daring) to reduce overfitting.
For each window:
- Produce window-specific baseline SQL via prep.
- Benchmark baseline.
- Benchmark each candidate rewrite derived from that same window-specific SQL via:

SESSION_ID="$SESSION_ID" WINDOW_LABEL="sane" CANDIDATE_LABEL="argmax" \
  .claude/skills/optimize-model/scripts/run_candidate_benchmark.sh "$PREP_OUTPUT" "/tmp/candidate.sql"

Run hash correctness checks baseline vs candidate for that same window.
It is acceptable to keep candidates with negative impact in the comparison table; mark them as rejected.

Hash correctness gate (required for every rewritten query)

For each candidate optimization SQL file, run:

SESSION_ID="$SESSION_ID" \
  .claude/skills/optimize-model/scripts/run_hash_check.sh "$PREP_OUTPUT" "/tmp/candidate.sql"

Only treat a candidate as valid when hashes_match = true.
If hash check fails, reject that candidate regardless of speed gains.
Wrapper exit semantics:
- Hash mismatches are reported via HASH_STATUS=mismatch and HASH_MATCH=false (non-fatal by default).
- Set HASH_MISMATCH_FAIL=1 to make mismatches return exit code 3.
- Execution/runtime failures still return non-zero and HASH_STATUS=error.
Hash check script behavior:
- Uses a version gate first:
  - ClickHouse >= 25.8: tries FORMAT Hash.
  - ClickHouse < 25.8: skips FORMAT Hash and directly uses exact byte-level comparison.
- If FORMAT Hash is attempted but returns UNKNOWN_FORMAT, automatically falls back to exact byte-level comparison using:
  - ORDER BY tuple(*)
  - FORMAT RowBinary (value-based fallback)
  - SHA256 over the response stream
If both methods fail, report that explicitly and do not claim exact-result equivalence.

Research-backed optimization pass

Use findings from the early research pass to prioritize only high-impact candidates.
Candidate rewrites are allowed in temporary SQL files (for benchmarking/hash checks), including moving/combining/removing CTEs when readability and performance both improve.

Recommendation filter

Recommend only changes likely to produce meaningful gains.
Do not trade readability for tiny gains.
Explicitly evaluate memory impact.
For ReplacingMergeTree tables, test whether FINAL can be removed and replaced with dedup logic such as argMax(column, update_date_time) when update_date_time is the version column.
Do not treat “no duplicates right now” as sufficient evidence to remove FINAL; future duplicates are expected in ReplacingMergeTree workflows.
Only recommend the argMax(..., update_date_time) pattern when benchmark evidence and result checks show correctness is preserved.
Always flag sparse high-cardinality key join risks (address/username-like keys) when relevant.
A candidate is recommended only if gains hold across multiple windows versus original baseline.

Strict Rules

Never edit model SQL files.
Temporary candidate SQL rewrites in /tmp are allowed for experiments and benchmarking.
Never propose materialized views or projections for this workflow.
Always use HTTP on port 8123 for ClickHouse access (curl style / HTTP API).
Never use clickhouse-client or any native-port (9000) workflow in this skill.
Wrapper shell scripts must remain Bash 3.2+ compatible (macOS default bash).
Do not run --help for skill scripts during normal flow; use the canonical commands documented above.
If a script fails, inspect stderr and rerun with corrected arguments instead of probing help output.
Always run hash correctness checks before recommending query rewrites.
Keep recommendations evidence-based (benchmark + explain + schema context).
If evidence is weak or mixed, explicitly say so.
Final report must include a Research Evidence section with version(s), web search queries, and source links used.
Final report must include a comparison table for baseline vs each candidate across tested windows.
Final report must include the final recommended query text (or explicitly state no rewrite is recommended).

Output Format

Provide exactly these sections:

Input Confirmation
Rendered Runnable SQL
Dependency & Schema Findings
Research Evidence
Benchmark Results (with cache caveats)
Hash Correctness Check
Optimization Comparison Table
Top Optimizations (high impact only)
Recommended Final Query
What Not To Change
Open Questions

Research Evidence requirements:

Include detected ClickHouse version(s) used for analysis.
Include the exact web search queries used.
Include links to docs/changelog pages actually consulted.
Include a short applicability note per key source (why it matters for this version/workload).

Hash Correctness Check requirements:

Include which method was used: format_hash or rowbinary_sha256_fallback.
For rowbinary_sha256_fallback, report that comparison semantics are value-based (FORMAT RowBinary).

Optimization Comparison Table requirements:

One row per candidate per window.
Include baseline and candidate metrics side by side and percentage deltas.
Minimum columns:
- candidate
- window_label
- hashes_match
- baseline_wall_time_s_median
- candidate_wall_time_s_median
- delta_wall_time_pct
- baseline_read_bytes_median
- candidate_read_bytes_median
- delta_read_bytes_pct
- baseline_peak_memory_usage_max
- candidate_peak_memory_usage_max
- delta_peak_memory_pct
- decision (recommended or rejected)

Script Notes

Wrapper prep: .claude/skills/optimize-model/scripts/run_prepare.sh
Wrapper benchmark: .claude/skills/optimize-model/scripts/run_benchmark.sh
Wrapper candidate benchmark: .claude/skills/optimize-model/scripts/run_candidate_benchmark.sh
Wrapper hash check: .claude/skills/optimize-model/scripts/run_hash_check.sh
Underlying prep: .claude/skills/optimize-model/scripts/prepare_model_analysis.py
Resolver: .claude/skills/optimize-model/scripts/resolve_model_sql.py
Schema introspection: .claude/skills/optimize-model/scripts/introspect_tables.py
Benchmarking: .claude/skills/optimize-model/scripts/benchmark_query.py
Hash compare: .claude/skills/optimize-model/scripts/compare_query_hash.py

If a script fails, surface exact error text and continue manually only if safety and correctness are preserved.

optimize-model

Más de este repositorio

Optimize CBT Transformation Model (No Model Edits)

Defaults To Confirm First

Inputs

Workflow

Strict Rules

Output Format

Script Notes

Optimize CBT Transformation Model (No Model Edits)

Defaults To Confirm First

Inputs

Workflow

Strict Rules

Output Format

Script Notes

Más de este repositorio