| argument-hint | [file] |
| disable-model-invocation | false |
| name | spreadsheets |
| user-invocable | true |
| description | Use when CSV, TSV, or Excel (.xlsx) is the primary input/output: inspect, clean, transform, dedupe, merge, validate, convert, recalc formulas, or create/fix spreadsheets. Do not trigger when tabular data is incidental. |
Spreadsheets
Opinionated tabular-data handling for macOS. TSV/CSV is the primary format; .xlsx is the exception. Stack: qsv for fast validation/profiling, duckdb for SQL, uv-run Python (stdlib csv + Decimal) for precision transforms, qsv excel/DuckDB/fastexcel for values-only Excel reads, XlsxWriter for new workbooks, openpyxl for existing workbook mechanics, and headless LibreOffice for formula recalculation. Run all Python through uv โ never bare python or pip.
Tool Selection
scripts/peek.py, scripts/profile.py, and scripts/recalc.py are bundled with this skill. Resolve them relative to this SKILL.md's directory, not the user's current project. They are not supposed to exist in the target repo.
| Job | Use |
|---|
| First look or structural validation of CSV/TSV | uv run scripts/peek.py <file> [--strict] |
| Quality profile of CSV/TSV | uv run scripts/profile.py <file> [--markdown] |
| Counts, stats, frequencies, dedupe, column select | qsv (use --cache-threshold 0 for stats) |
| Joins, group-bys, pivots, cross-file SQL, format conversion | duckdb -c "..." |
| Row-level transforms, precision-critical edits | uv run Python with a PEP 723 header |
Anything .xlsx in or out | read references/xlsx.md first |
Recalculating .xlsx formulas | uv run scripts/recalc.py <file.xlsx> |
| Row/column-aware diff of two tables | bunx daff old.tsv new.tsv |
| Interactive viewing (suggest to the user; never launch TUIs) | csvlens, vd, Numbers.app |
Read references/recipes.md for common exact-decimal transforms, idempotent appends, schema validation, keyed diffs, and safe workbook output patterns.
Hard Rules
- Decimals, never floats. Crypto amounts carry up to 18 decimals โ beyond float64. Keep amounts as strings end to end; compute with
decimal.Decimal or DuckDB DECIMAL(38, 18). Read with all_varchar = true in DuckDB and plain stdlib csv in Python. If pandas is unavoidable, pass dtype=str.
- Touch only what was asked. No reordering, re-quoting, renumbering, or whitespace "tidying" outside the requested change. The diff must contain the change and nothing else.
- House format for authored files: TSV; UTF-8 without BOM; LF line endings; single trailing newline; lowercase
snake_case headers; ISO 8601 dates (YYYY-MM-DD; prb-finance timestamps use YYYY-MM-DD@HH:MM:SS); . decimal point; no thousands separators or currency symbols inside cells; - for null. Conventions already present in an existing file override every one of these.
- Strip BOMs on read, never write them. Open files of unknown provenance with
encoding="utf-8-sig".
- Validate after editing. Before no-shape-change edits, save a
peek.py report; after the edit, run peek.py --strict --expect-like <before-report>. For intentional row/schema changes, use --strict --expect-columns <n> instead. Add --house for authored TSVs that should follow this skill's house format. Do not skip this because peek.py is absent from the target repo; the script lives next to this SKILL.md. In prb-finance: just tsv-check, then just cli::write-changed to regenerate derived reports โ never hand-edit generated .pool.tsv/.annual.tsv/.md artifacts.
- In-place edits are atomic. Write to a temp file next to the target, verify it, then
mv over the original.
- Finance data stays local. Treat transaction logs and bank/exchange exports as private tax records: never send their contents to web services or external APIs.
- Escape spreadsheet formula injection when writing cells sourced from external data: prefix a leading
=, +, or @ with ' (a bare - null is exempt).
Inspect
peek.py is at scripts/peek.py inside this skill directory, beside this SKILL.md. In normal installed-skill use, that means:
~/.agents/skills/spreadsheets/scripts/peek.py
It is not a project-local helper and does not need to be installed in the repo being edited. If your current directory is the skill directory, run:
uv run scripts/peek.py <file> [--rows N] [--strict] [--house] [--expect-like before.peek.json]
If your current directory is the target project, run it by absolute path:
uv run ~/.agents/skills/spreadsheets/scripts/peek.py <file> [--rows N] [--strict] [--house] [--expect-like before.peek.json]
The report includes status, issues, encoding and BOM, newline style and trailing newline, delimiter and how it was detected, header with duplicates flagged, column/row counts, ragged and empty rows, - null usage, qsv validation metadata when available, and sample rows. Default inspection exits 0 for parseable delimited files. Validation flags exit 1 when they find issues. Operational errors such as missing files, binary spreadsheets, or empty files exit 2.
Fast validation loop:
uv run ~/.agents/skills/spreadsheets/scripts/peek.py txs.tsv > txs.before.peek.json
uv run ~/.agents/skills/spreadsheets/scripts/peek.py txs.tsv --strict --expect-like txs.before.peek.json
uv run ~/.agents/skills/spreadsheets/scripts/peek.py txs.tsv --strict --expect-columns 12
uv run ~/.agents/skills/spreadsheets/scripts/peek.py txs.tsv --redact-samples
--expect-like catches drift in column count, header, delimiter, encoding, newline style, trailing newline, and data row count; it also fails on newly introduced ragged, empty, duplicate-header, BOM, or qsv validation problems. --engine auto uses a fast parser for simple unquoted delimited files and falls back to stdlib csv; use --engine python if you need to force the conservative path. On a binary spreadsheet, peek.py exits with a pointer to the xlsx workflow.
Full local quality profile:
uv run ~/.agents/skills/spreadsheets/scripts/profile.py txs.tsv --markdown --redact-samples
The profile combines peek.py, qsv stats/frequencies without sidecar caches, header safety, formula-injection detection, and next-step recommendations. Use JSON output by default for machine reading; use --markdown for a concise human report.
Quick follow-ups with qsv:
qsv count txs.tsv
qsv headers txs.tsv
qsv stats --cache-threshold 0 -E txs.tsv | qsv table
qsv frequency -s event txs.tsv
qsv select date_utc,amount txs.tsv
qsv dedup txs.tsv
qsv infers the input delimiter from the file extension, but stdout is always comma-separated. When the result must stay TSV, write it with -o out.tsv (the output extension sets the delimiter) โ never shell redirection. qsv stats creates sidecar caches by default when runs are slow; use --cache-threshold 0 unless the user explicitly wants reusable qsv caches or temporary indexes for very large files.
Query with DuckDB
Canonical read โ everything as strings, - mapped to NULL:
FROM read_csv('txs.tsv', delim = '\t', header = true, all_varchar = true, nullstr = '-');
SUMMARIZE SELECT * FROM read_csv('txs.tsv', delim = '\t', all_varchar = true, nullstr = '-');
SELECT event, SUM(amount::DECIMAL(38, 18)) AS total
FROM read_csv('txs.tsv', delim = '\t', all_varchar = true, nullstr = '-')
GROUP BY event
ORDER BY total DESC;
COPY (SELECT ...) TO 'out.tsv' (FORMAT csv, DELIMITER '\t', HEADER true, NULLSTR '-');
DuckDB also reads and writes .xlsx (read_xlsx, COPY ... (FORMAT xlsx)) โ see references/xlsx.md.
Transform with uv-run Python
Stdlib csv keeps every cell a string โ precision-safe by default. Script template:
import csv
from decimal import Decimal
with open("in.tsv", encoding="utf-8-sig", newline="") as f:
rows = list(csv.DictReader(f, delimiter="\t"))
with open("out.tsv", "w", encoding="utf-8", newline="") as f:
writer = csv.DictWriter(f, fieldnames=rows[0].keys(), delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
- Pass
newline="" to every open() the csv module touches, and lineterminator="\n" for LF output.
- Third-party deps go in the PEP 723 block; for one-liners use
uv run --with <pkg> python -c "...".
- For idempotent backfills, dedupe by multiset difference โ count existing identical rows and append only the surplus, because identical rows can be legitimate (e.g. batch payouts).
Excel (.xlsx)
Read references/xlsx.md whenever a .xlsx/.xlsm is input or deliverable: openpyxl create/edit, DuckDB xlsx I/O, styling, conversion recipes, and the recalculation loop. The two absolutes:
- Write real formulas (
=SUM(B2:B9)), not values precomputed in Python.
- After writing any formula, run
uv run scripts/recalc.py <file.xlsx> and deliver only when it exits 0 with "status": "success". Formula errors and incomplete cached values exit nonzero by default; use --soft only when you deliberately want a report without failing automation.
Recalculation needs LibreOffice: brew install --cask libreoffice. The script finds the app bundle on its own; soffice does not need to be on PATH.