name	regen-shelltests
description	Regenerate expected output in failing shelltestrunner v3 fixtures (`hledger/test/*/.test`) after a hledger output-format change (alignment, column shifts, rendering tweaks). Use when many `.test` fixtures are stale because of a deliberate, audited rendering change.

Regenerating shelltestrunner fixtures

When a change to hledger's output rendering (alignment, padding, column positions) breaks many fixtures in hledger/test/, use tools/regen-shelltests.py to bulk-update them.

The script's default mode is whitespace-only: it overwrites a test's expected stdout with the actual output only when the difference is purely whitespace (column shifts). Anything else — added/removed lines, changed numbers, different commodities — is left for human review. This makes it safe to run on a batch of failing tests without losing in-flight content edits.

When to use this skill

The user is dealing with many failing functional tests after a deliberate rendering change, and wants the column shifts auto-applied while preserving any other content for hand-review.

Do NOT use this skill if the user wants to regenerate all output blindly without auditing — that's a sign the underlying change isn't well understood, and a hand-review of a few representative failures is the right next step.

Workflow

Build hledger first — the script invokes stack exec -- which hledger to find the binary and prepends its dir to PATH so test wrapper scripts (e.g. csvtest.sh) pick up the just-built one.
```
stack build hledger
```

Identify failing test files (not individual tests — the script processes whole files):

stack exec -- shelltest --execdir --exclude=/_ hledger/test/ \
  -x hledger/test/perf.test \
  -x ledger-compat/ledger-baseline -x ledger-compat/ledger-regress -x ledger-compat/ledger-extra \
  -j8 2>&1 \
  | grep "Failed\]" \
  | sed 's/^:\(.*\):[0-9]*: \[Failed\]$/\1/' \
  | sort -u > /tmp/failing-files.txt

Run the regen on just those files:
```
cat /tmp/failing-files.txt | xargs tools/regen-shelltests.py
```
Output looks like:
```
hledger/test/csv.test: 63 tests, 3 updated, 7 skipped
...
TOTAL: 102 tests, 9 updated, 17 skipped
```
"Updated" = expected stdout overwritten because the diff was whitespace-only. "Skipped" = left as-is for one of these reasons (the script logs each):
- Test has a stderr expectation (>2 … or >>>2 …).
- No exit marker (malformed or v2-format).
- Exit code mismatched (real failure, not just formatting).
- Diff was more than whitespace.
Re-run the tests to confirm the regen reduced failures:
```
just functest --hide
```
Hand-fix the remainder. Common remaining cases:
- Stderr-regex tests with embedded posting renders (errors/, journal/parse-errors.test, etc.) — adjust the spaces/carets in the regex by hand.
- Tests without exit markers — add >= or >= 0 if the format is malformed.
- Real content changes — investigate; these are not just-formatting issues.
git diff and review before committing. The whitespace-only safeguard is good but not infallible.

Flags

--all — bypass the whitespace-only check and overwrite whenever the run produced the expected exit code. Use only after you've audited a few failures and confirmed the actual output is correct in a non-whitespace way.

Limitations

Doesn't handle stderr regenerate. Stderr regex blocks (>2 /…/ and >>>2 /…/) are skipped because regex regen is fiddly (escaping, multi-line patterns, regex-tdfa size limits).
Doesn't handle v2-format input blocks (<<< / >>> / >>>=). Files using those are passed through verbatim.
Doesn't update doctests in Haskell sources (run just doctest separately, then hand-update).
Regenerates in place. Always work on a clean tree so git diff is meaningful for review.

Anti-patterns

Regen with --all on a fresh transcript without inspection. This is the same as accepting whatever hledger emits today as correct, including bugs. Always run with the default (whitespace-only) first; only escalate to --all for tests where you've manually verified the new output.
Regen across the whole hledger/test/ directory. The whitespace-only mode protects against this somewhat, but it's still safer to feed only the failing files — that way you'll notice if a passing test mysteriously changes.
Skipping the rebuild. If stack build hledger is stale, the regen reflects yesterday's binary, not your latest code. Always build first.

Regenerating shelltestrunner fixtures

When a change to hledger's output rendering (alignment, padding, column positions) breaks many fixtures in hledger/test/, use tools/regen-shelltests.py to bulk-update them.

When to use this skill

The user is dealing with many failing functional tests after a deliberate rendering change, and wants the column shifts auto-applied while preserving any other content for hand-review.

Workflow

Build hledger first — the script invokes stack exec -- which hledger to find the binary and prepends its dir to PATH so test wrapper scripts (e.g. csvtest.sh) pick up the just-built one.

stack build hledger

Identify failing test files (not individual tests — the script processes whole files):

stack exec -- shelltest --execdir --exclude=/_ hledger/test/ \
  -x hledger/test/perf.test \
  -x ledger-compat/ledger-baseline -x ledger-compat/ledger-regress -x ledger-compat/ledger-extra \
  -j8 2>&1 \
  | grep "Failed\]" \
  | sed 's/^:\(.*\):[0-9]*: \[Failed\]$/\1/' \
  | sort -u > /tmp/failing-files.txt

Run the regen on just those files:

cat /tmp/failing-files.txt | xargs tools/regen-shelltests.py

Output looks like:

hledger/test/csv.test: 63 tests, 3 updated, 7 skipped
...
TOTAL: 102 tests, 9 updated, 17 skipped

"Updated" = expected stdout overwritten because the diff was whitespace-only. "Skipped" = left as-is for one of these reasons (the script logs each):

Test has a stderr expectation (>2 … or >>>2 …).
No exit marker (malformed or v2-format).
Exit code mismatched (real failure, not just formatting).
Diff was more than whitespace.

Re-run the tests to confirm the regen reduced failures:

just functest --hide

Hand-fix the remainder. Common remaining cases:

Stderr-regex tests with embedded posting renders (errors/, journal/parse-errors.test, etc.) — adjust the spaces/carets in the regex by hand.
Tests without exit markers — add >= or >= 0 if the format is malformed.
Real content changes — investigate; these are not just-formatting issues.

git diff and review before committing. The whitespace-only safeguard is good but not infallible.

Flags

--all — bypass the whitespace-only check and overwrite whenever the run produced the expected exit code. Use only after you've audited a few failures and confirmed the actual output is correct in a non-whitespace way.

Limitations

Doesn't handle stderr regenerate. Stderr regex blocks (>2 /…/ and >>>2 /…/) are skipped because regex regen is fiddly (escaping, multi-line patterns, regex-tdfa size limits).

Doesn't handle v2-format input blocks (<<< / >>> / >>>=). Files using those are passed through verbatim.

Doesn't update doctests in Haskell sources (run just doctest separately, then hand-update).

Regenerates in place. Always work on a clean tree so git diff is meaningful for review.

Anti-patterns

Regen with --all on a fresh transcript without inspection. This is the same as accepting whatever hledger emits today as correct, including bugs. Always run with the default (whitespace-only) first; only escalate to --all for tests where you've manually verified the new output.

Regen across the whole hledger/test/ directory. The whitespace-only mode protects against this somewhat, but it's still safer to feed only the failing files — that way you'll notice if a passing test mysteriously changes.

Skipping the rebuild. If stack build hledger is stale, the regen reflects yesterday's binary, not your latest code. Always build first.