cmd-golden-tests
// Set up or extend golden/snapshot tests for a project. Covers fixture design, Makefile targets, snapshot storage, diff workflow, and update protocol.
| Field | Value |
|---|---|
| name | cmd-golden-tests |
| description | Set up or extend golden/snapshot tests for a project. Covers fixture design, Makefile targets, snapshot storage, diff workflow, and update protocol. |
| allowed-tools | Read, Write, Edit, Grep, Glob, Bash |
Golden (snapshot) tests capture the exact output of a pipeline or subsystem as a reference file, then fail any run that deviates from it. They are the highest-fidelity regression check: if anything in the pipeline changes — parsing, mapping, migration, export — the diff tells you exactly what shifted.
The paradigm for this skill comes from boredm/gint_to_boredm. Read it before adapting anything:
- makefiles/golden.mk
- scripts/golden_thompson.py
- server/tests/fixtures/golden.md

Answer these before writing any code or Makefile targets.
The input data the pipeline runs against. Must be:
In boredm: real .gpj files in thompson_sample/ (binary, not committed to the main repo — seeded via make seed-thompson).
How the system is invoked in the golden test. Three patterns:
| Pattern | Use when | Example |
|---|---|---|
| Direct function call | Testing a pure transformation | schema_grouping.build_grouping_outputs(schemas) |
| HTTP endpoint via TestClient | Testing a route end-to-end | POST /api/units/generate-migrations/{schema} |
| Full workflow via manifest | Testing the entire pipeline | Run wf1–wf7, read manifest from disk |
Avoid mixing patterns in the same golden — pick the right scope.
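As a rough illustration of the direct-function-call pattern, the sketch below checks a pure transformation against a stored golden file. `group_by_prefix` and `check_against_golden` are hypothetical stand-ins invented for this example; they are not functions from boredm.

```python
import json
import tempfile
from pathlib import Path


def group_by_prefix(names: list[str]) -> dict[str, list[str]]:
    """Hypothetical pure transformation standing in for the code under test."""
    groups: dict[str, list[str]] = {}
    for name in sorted(names):
        groups.setdefault(name.split("_", 1)[0], []).append(name)
    return groups


def check_against_golden(actual: dict, golden_path: Path) -> bool:
    """Return True when the actual output equals the stored golden JSON."""
    expected = json.loads(golden_path.read_text(encoding="utf-8"))
    return actual == expected


# Demo: store a golden once, then check a matching run and a drifting run.
with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "grouping_golden.json"
    golden.write_text(
        json.dumps({"a": ["a_1", "a_2"], "b": ["b_1"]}, indent=2, sort_keys=True) + "\n",
        encoding="utf-8",
    )
    matches = check_against_golden(group_by_prefix(["b_1", "a_2", "a_1"]), golden)
    drifted = check_against_golden(group_by_prefix(["b_1", "a_1"]), golden)
```

Because the transformation is pure, the test needs no server and no manifest; the same comparison would wrap a TestClient response or a manifest read for the other two patterns.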
The reference snapshot file. Key design choices:
Semantic golden (single compact file):
Phase-level goldens (per-schema per-phase files):
- {phase}_{schema}.json (e.g., wf1_grouping_file_group_1.json)

Unit-test-style fixture (step-by-step JSON):
- baseline → mutation → restored

A golden is never updated silently. The update workflow must be:
Makefile targets enforce this:
golden-{dataset}-verify # compare without running pipeline
golden-{dataset}-update # rewrite golden from latest output
golden-{dataset}-wf # run pipeline + compare (fails on diff)
| Scenario | Action |
|---|---|
| Expected change (feature added, behavior improved) | Inspect diff → make golden-{dataset}-update → commit both code and golden |
| Regression (pipeline broke something) | Fix root cause, never update golden to hide it |
| False positive (volatile field leaked into snapshot) | Remove volatile field from snapshot extractor, not from golden |
| False negative (golden too coarse, misses real change) | Add a phase-level golden or tighten the snapshot to cover the missed surface |
Follow this naming pattern. Replace {dataset} with the fixture dataset name (e.g., thompson).
## Golden Tests
golden-{dataset}-test ## Unit-test-like suite; pytest against JSON fixtures; no server needed
golden-{dataset}-wf ## Full end-to-end workflow + semantic manifest comparison; server required
golden-{dataset}-verify ## Compare latest manifest to golden without re-running workflow
golden-{dataset}-update ## Rewrite semantic golden from latest manifest output
golden-{dataset}-phase-verify ## Compare all phase-level goldens ({phase}_{schema}.json files)
golden-{dataset}-phase-update ## Update all phase-level goldens from latest manifest
Separate into two modes in the help output:
[Golden — unit-like]
golden-{dataset}-test Fast; no server; pytest fixtures
[Golden — full workflow]
golden-{dataset}-wf Slow; server required; full e2e
golden-{dataset}-verify Compare only; no re-run
golden-{dataset}-update Rewrite golden (inspect diff first!)
server/tests/fixtures/
  {dataset}_semantic_wf_verification_golden.json   # single compact semantic golden
  {dataset}_schema_grouping_golden.json            # unit-test-style fixture
  {dataset}_unit_migration_golden.json             # lifecycle fixture with steps
  {dataset}_phase_goldens/
    wf1_grouping_{schema}.json
    wf2_mapping_{schema}.json
    wf3_units_{schema}.json
    wf3b_migrations_{schema}.json
    wf4_normalized_{schema}.json
    wf5_qc_{schema}.json
    wf6_exports_{schema}.json
    wf7_insights_{schema}.json
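The layout above can be captured in two small path helpers so that scripts and tests agree on where each golden lives. This is a minimal sketch: the helper names and the `FIXTURES_DIR` constant are assumptions made for illustration, and only the directory and file naming come from the layout itself.

```python
from pathlib import Path

# Assumed fixtures root, taken from the layout shown above.
FIXTURES_DIR = Path("server/tests/fixtures")


def semantic_golden_path(dataset: str) -> Path:
    """Path of the single compact semantic golden for a dataset."""
    return FIXTURES_DIR / f"{dataset}_semantic_wf_verification_golden.json"


def phase_golden_path(dataset: str, phase: str, schema: str) -> Path:
    """Path of one phase-level golden, named {phase}_{schema}.json."""
    return FIXTURES_DIR / f"{dataset}_phase_goldens" / f"{phase}_{schema}.json"
```

For example, `phase_golden_path("thompson", "wf1_grouping", "file_group_1")` resolves to `server/tests/fixtures/thompson_phase_goldens/wf1_grouping_file_group_1.json`.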
A standalone script (not pytest) manages semantic and phase goldens. Key functions:
from pathlib import Path

def build_snapshot(manifest: dict) -> dict:
    """Extract stable, deterministic subset from full pipeline manifest."""
    # 1. Pull only the fields you care about (row counts, match types, etc.)
    # 2. Exclude volatile fields: timestamps, paths, binary hashes, job IDs
    # 3. Sort all nested dicts for deterministic output

def compare_to_golden(snapshot: dict, golden_path: Path) -> bool:
    """Unified diff, colorized. Returns True if match."""
    # Uses difflib.unified_diff with red/green color codes

def write_golden(golden_path: Path, snapshot: dict) -> None:
    """Rewrite golden file. Sort keys, trailing newline."""
    # json.dumps(data, indent=2, sort_keys=True) + "\n"
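One possible way to fill in those three functions is sketched below. It is not the boredm implementation: the `VOLATILE_KEYS` set and the `strip_volatile` and `render` helpers are assumptions made for this example, and the sketch uses a deny-list for brevity, whereas step 1 above suggests an allow-list of the fields you care about.

```python
import difflib
import json
from pathlib import Path

# Assumed volatile keys; adjust to whatever the real manifest contains.
VOLATILE_KEYS = {"created_at", "updated_at", "job_id", "path", "sha256"}


def strip_volatile(value):
    """Recursively drop volatile keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_volatile(v) for k, v in value.items() if k not in VOLATILE_KEYS}
    if isinstance(value, list):
        return [strip_volatile(v) for v in value]
    return value


def render(snapshot: dict) -> str:
    """Serialize deterministically: sorted keys, 2-space indent, trailing newline."""
    return json.dumps(snapshot, indent=2, sort_keys=True) + "\n"


def build_snapshot(manifest: dict) -> dict:
    """Deny-list variant: keep everything except the assumed volatile keys."""
    return strip_volatile(manifest)


def compare_to_golden(snapshot: dict, golden_path: Path) -> bool:
    """Print a colorized unified diff on mismatch; return True on match."""
    expected = golden_path.read_text(encoding="utf-8")
    actual = render(snapshot)
    if actual == expected:
        return True
    diff = difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        fromfile="golden", tofile="latest", lineterm="",
    )
    for line in diff:
        if line.startswith("+") and not line.startswith("+++"):
            print(f"\033[32m{line}\033[0m")  # green: present only in latest output
        elif line.startswith("-") and not line.startswith("---"):
            print(f"\033[31m{line}\033[0m")  # red: present only in golden
        else:
            print(line)
    return False


def write_golden(golden_path: Path, snapshot: dict) -> None:
    """Rewrite the golden file, creating parent directories if needed."""
    golden_path.parent.mkdir(parents=True, exist_ok=True)
    golden_path.write_text(render(snapshot), encoding="utf-8")
```

Routing both the comparison and the rewrite through the same `render` helper keeps the on-disk golden and the in-memory snapshot byte-comparable, so key order or whitespace alone never produces a diff.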
CLI flags:
--verify compare latest manifest to golden (default)
--update-golden rewrite semantic golden
--phase-goldens verify per-phase files
--update-phase-goldens rewrite per-phase files
--schema target a specific schema only
--all-schemas run across all schemas
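The flags above could be wired up with `argparse` roughly as follows. This is a sketch under assumptions: the flag names come from the list, while the mutually exclusive grouping and the `build_parser` and `resolve_mode` helpers are illustrative choices that the real script may not share.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the flags listed above."""
    parser = argparse.ArgumentParser(description="Verify or update golden snapshots.")

    # Assumption: only one mode may be selected per invocation.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--verify", action="store_true",
                      help="compare latest manifest to golden (default)")
    mode.add_argument("--update-golden", action="store_true",
                      help="rewrite semantic golden")
    mode.add_argument("--phase-goldens", action="store_true",
                      help="verify per-phase files")
    mode.add_argument("--update-phase-goldens", action="store_true",
                      help="rewrite per-phase files")

    # Assumption: a single schema and all schemas cannot be combined.
    scope = parser.add_mutually_exclusive_group()
    scope.add_argument("--schema", help="target a specific schema only")
    scope.add_argument("--all-schemas", action="store_true",
                       help="run across all schemas")
    return parser


def resolve_mode(args: argparse.Namespace) -> str:
    """Map parsed flags to a mode name, falling back to verify."""
    if args.update_golden:
        return "update-golden"
    if args.phase_goldens:
        return "phase-goldens"
    if args.update_phase_goldens:
        return "update-phase-goldens"
    return "verify"
```

Making verify the fallback means a bare invocation can never rewrite a golden, which matches the rule that updates must be explicit.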
These leak into diffs and cause false positives:
- created_at, updated_at, dataset directory names with dates

- make seed-{dataset}
- scripts/golden_{dataset}.py
- server/tests/test_*_golden.py