| name | reverse-engineer |
| description | One-off, operator-run skill that reverse-engineers house coding standards, conventions, and anti-patterns from a repository's git history and GitLab MR discussions, then proposes concrete edits to the sql-migration skill's artifacts. Trigger phrases like "reverse engineer conventions", "mine review comments", "audit the sql-migration flow". |
reverse-engineer
A deeper, self-learning companion to sql-conventions-miner. Where the
miner extracts structural patterns from migration files, this skill
does archaeology: for each file it cares about, it walks the
commit → MR → reviewer-comments trail and synthesises why the file
looks the way it does. It also reads the other skills in this repo and
proposes concrete flow improvements.
Run this once at bootstrap, then periodically (quarterly is a good
cadence) to keep the runtime skill's knowledge fresh.
Operator
A human (DBA / tech lead / ops). Not per ticket. All outputs land in
.scratch/reverse-engineer/ as drafts and require human review before
any file under skills/sql-migration/ is updated.
Inputs
- Path to a full local clone of the migrations repo (so
git log,
git blame, git show work).
- Optional: commit range (
--since=<date>, --until=<date>,
--tag-range <old>..<new>).
- Optional: file/dir filter (default:
db/migrations/**/*.sql).
- Optional:
--with-mrs to enrich with GitLab MR metadata and comments
via ./bin/gitlab-read. If omitted, phase B is skipped and outputs
are produced from git alone.
Allowed tools
Read on anything under the clone's working tree and on
skills/** in this repo.
Bash restricted to: git log, git show, git blame,
git diff, git rev-list, git merge-base, git name-rev,
git cat-file.
./bin/gitlab-read (read-only GitLab access).
- Write is permitted only to
.scratch/reverse-engineer/**.
Denied: every other wrapper (db-query, migration-write, jira-*,
gitlab-mr), every direct network/DB client, any write outside
.scratch/reverse-engineer/**.
Method (god-level reverse engineering, five phases)
Phase 1 — Corpus selection
- List candidate files under the filter (default
db/migrations/**/*.sql).
- Stratify by change type (DDL keyword, path prefix, author,
decade of history) so the sample spans clusters, not just recency.
- Cap the working set at ~200 files; record the selection manifest
in
.scratch/reverse-engineer/manifest.json with reasons per file.
Phase 2 — Per-file git archaeology
For each file in the corpus:
git log --follow --format='%H%x09%ci%x09%an%x09%s' -- <file> →
ordered commit list (handles renames via --follow).
- For each commit:
git show --stat --format=fuller <sha> — what changed, by whom,
when.
git show <sha>:<file> — the file at that revision.
- Parse the commit message for MR references:
See merge request <proj>!<iid>, Merge branch ... into ...,
full GitLab URLs.
git blame --line-porcelain <file> — attribute current lines to
their introducing commits; cluster blame by author for "who owns
this pattern" signals.
- Record per-file in
.scratch/reverse-engineer/files/<sha-of-path>.json:
{ path, commits: [...], current_blame_summary, mr_refs: [...] }.
Phase 3 — MR enrichment (optional, --with-mrs)
For each unique MR reference from phase 2:
./bin/gitlab-read mr-view <iid> → title, description, merge state.
./bin/gitlab-read mr-diff <iid> → the final accepted diff.
./bin/gitlab-read mr-comments <iid> → reviewer discussion;
filter system notes.
- Summarise each MR into
.scratch/reverse-engineer/mrs/<iid>.json:
{ iid, title, description, reviewers, discussion_themes, requested_changes, rationale_signals }.
- Signals to extract from comments (heuristics — document them in
the manifest):
- "please wrap in BEGIN/COMMIT" → transaction-wrapping rule.
- "use IF NOT EXISTS" → idempotency rule.
- "batch this" / "too many rows" → batch-threshold rule.
- "revert" / "hotfix" / "broke prod" → anti-pattern.
- "header missing X" → header-template rule.
Phase 4 — Synthesis
Cluster the signals and emit drafts:
.scratch/reverse-engineer/conventions.draft.md — proposed
replacement for skills/sql-migration/conventions.md. Each section
cites the commit SHAs and MR iids that support it (so the DBA can
verify). Diff against the existing conventions.md for easy review.
.scratch/reverse-engineer/coding-standards.draft.md — broader
style and practice rules beyond the SQL specifics: commit message
shape, PR/MR title shape, branch naming, header required fields,
review SLAs, release cadence. This is a new artifact the runtime
skill can reference once a DBA signs off.
.scratch/reverse-engineer/anti-patterns.draft.md — replacement
for skills/sql-migration/anti-patterns.md, one entry per
revert/hotfix pair with commit SHAs and a one-line cause.
.scratch/reverse-engineer/templates/<type>.{forward,rollback}.draft.sql
— updated templates when the archaeology reveals a better idiom.
.scratch/reverse-engineer/examples/ — 5–10 canonical files with
short annotations.
Phase 5 — Self-review of the sql-migration skill (the "self-learning" loop)
Read the existing skill in place:
skills/sql-migration/SKILL.md
skills/sql-migration/checklist.md
skills/sql-migration/conventions.md
skills/sql-migration/templates/
skills/sql-migration/subagents/
Cross-reference against the archaeology from phases 2–4 and produce:
.scratch/reverse-engineer/skill-review.md — a structured report
with sections:
- Gaps — rules the history supports but the skill does not
enforce.
- Overreach — rules the skill enforces that the history does not
justify.
- Workflow edits — concrete step-by-step suggestions for
SKILL.md's Workflow section, with before/after snippets.
- Checklist edits — items to add, remove, or reword, each
backed by MR/commit citations.
- Template edits — specific placeholder or guard changes.
- Subagent edits — changes to the schema-discovery or
code-review contracts (e.g. "schema-discovery should also check
index existence; see !412").
Hard rules
- Drafts only. Never overwrites files under
skills/** or bin/**.
All output lands under .scratch/reverse-engineer/**.
- Read-only on the target clone. No commits, no pushes, no
network calls outside
./bin/gitlab-read. The operator opens the
improvement MR by hand after reviewing the drafts.
- Cite or stay silent. Every proposed rule must carry at least
one commit SHA or MR iid. No rule without a citation.
- Idempotent. Re-running the skill on the same inputs produces
the same outputs (order-independent clustering; stable filenames
derived from inputs).
- Bounded. If the corpus exceeds 200 files or 1000 MRs, the skill
STOPS and asks the operator to narrow the range rather than silently
truncating.
Execution by harness
- Claude Code: dispatched as a
Task subagent with tools scoped
to the allowed list above; write access limited to
.scratch/reverse-engineer/** via the per-task permission scope.
- Codex: runs under the standard sandbox
(
sandbox_mode = "workspace-write", network off). The main agent
narrows its own tool use to the allowed list for the duration of
each phase. .scratch/reverse-engineer/** is already covered by
the .scratch writable root in .codex/config.toml.
Output handoff
When complete, the skill prints a summary like:
reverse-engineer: wrote 7 drafts to .scratch/reverse-engineer/
- conventions.draft.md (diff: +42 / -11 lines vs. live)
- coding-standards.draft.md (NEW, 128 lines)
- anti-patterns.draft.md (22 entries, each with SHAs)
- templates/add_column_nullable.forward.draft.sql (diff: +3 / -1)
- templates/add_column_nullable.rollback.draft.sql (unchanged)
- examples/ (8 canonical files)
- skill-review.md (14 findings, 3 blockers)
Review these, diff against skills/sql-migration/, then open an MR
by hand if you want to promote them.
The operator reviews, possibly edits, and lands changes through the
normal review process — not via any wrapper in this repo.