| name | writing:analyze |
| description | Curate the writing plugin's trope ruleset by auditing wordlist entries against session history and surfacing candidate phrases. Use when refreshing trope detection, reviewing wordlist health, or mining sessions for new AI-writing patterns to add or stale rules to remove. |
| argument-hint | [--since date] [--model glob] [--judge] |
| disable-model-invocation | true |
| allowed-tools | ["Bash","Read","Skill(claude-code:session)"] |
Writing Analyze
Mine the session DuckDB index for assistant writing patterns, compare against the user's voice, and propose a diff to plugins/writing/wordlists/*.txt.
Arguments
Forward these from $ARGUMENTS to analyze.ts (see Run):
--since <date>: restrict the corpus to sessions on or after the date. Default: the full index.
--model <glob>: restrict to matching model IDs (e.g. *opus*). Default: all models.
--judge: add the LLM-judge pass over the deliverable corpus. See Meaning-Layer Judge. Default: off.
Prerequisites
Activate the claude-code:session skill first. Run its refresh script to update the index and capture the DB path:
DB_PATH=$(<session-skill-dir>/scripts/refresh.ts --refresh)
The refresh script prints the resolved DB path to stdout.
Build the local voice baseline once, and refresh it as new writing accumulates. It is the comparison surface for deliverable-aware rule health. The baseline is local-only, never committed, and stored in the plugin data directory (CLAUDE_PLUGIN_DATA, else ~/.claude/plugins/data/writing-bendrucker).
bun ${CLAUDE_SKILL_DIR}/scripts/ingest-voice.ts --source file --file <data-dir>/voice-baseline/github-prs.txt
bun ${CLAUDE_SKILL_DIR}/scripts/ingest-voice.ts --source github --author <user> --created 2019-01-01..2025-01-01
bun ${CLAUDE_SKILL_DIR}/scripts/voice-profile.ts
Both scripts write to the plugin data directory, outside the default sandbox allowlist, so run them with dangerouslyDisableSandbox: true (or via a terminal outside Claude Code).
If no profile exists, analyze still runs: deliverable-surface rules fall back to the chat audit and the report flags the baseline as not loaded.
Run
Pass the DB path via --session-db:
bun ${CLAUDE_SKILL_DIR}/scripts/analyze.ts --session-db "$DB_PATH"
bun ${CLAUDE_SKILL_DIR}/scripts/analyze.ts --session-db "$DB_PATH" --since 2026-04-01 --model '*opus*' --top 50
bun ${CLAUDE_SKILL_DIR}/scripts/analyze.ts --session-db "$DB_PATH" --project bendrucker.me --min-lift 7
Run with --help for all flags. --data-dir overrides where the voice baseline is read from (default: CLAUDE_PLUGIN_DATA or ~/.claude/plugins/data/writing-bendrucker). The script writes a markdown report to tmp/trope-analysis-<date>.md (override with --out). The report may quote any host in the combined index, so keep it under tmp/ and never paste host-specific content into committed work.
Meaning-Layer Judge
--judge adds an LLM-judge pass over the deliverable corpus (six binary criteria, information-density first). It requires ANTHROPIC_API_KEY, prints a cost estimate before any call, and caps the number of documents judged with --judge-limit (default 100, which costs cents per run on the default Haiku-class model):
bun ${CLAUDE_SKILL_DIR}/scripts/analyze.ts --session-db "$DB_PATH" --judge
The judge prompt is a versioned artifact (resources/judge/prompt.md). The report records its hash, and numbers from different hashes are not comparable. Judge flag rates are uncalibrated until the #791 labeling passes run (a user checkpoint). The standalone runner covers ad-hoc files, the reproducibility gate, and the #769 heading baseline:
bun ${CLAUDE_SKILL_DIR}/scripts/judge-run.ts files <paths...>
bun ${CLAUDE_SKILL_DIR}/scripts/judge-run.ts gate
bun ${CLAUDE_SKILL_DIR}/scripts/judge-run.ts headings tmp/heading-labels.tsv
See the "Meaning-Layer Judge" section of references/methodology.md for the rubric, prompt versioning, gate, and calibration protocol.
Metrics
Lift: how distinctive a phrase is to assistant output vs. user text. lift = rate_assistant / rate_user_smoothed, where rates are per-million-token frequencies. A lift of 10.0 means the assistant uses the phrase 10x more per token than the user. The --min-lift threshold (default 5.0) gates new candidate phrases only. Rule keep/remove uses a direct rate comparison plus --min-count, not lift (see methodology).
Session count: distinct sessions containing a phrase. Candidates require session count >= 3 to filter project-specific jargon that dominates a single session.
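The two metrics combine into the gate for new candidates. The following is a minimal TypeScript sketch, not the code in analyze.ts: the `PhraseStats` shape, the function names, and the add-one smoothing are assumptions made for illustration (the actual smoothing is described in references/methodology.md).

```typescript
// Hypothetical shape: counts for one phrase across the two corpora.
interface PhraseStats {
  phrase: string;
  assistantCount: number; // occurrences in model-generated text
  userCount: number; // occurrences in human-only user text
  sessionCount: number; // distinct sessions containing the phrase
}

const PER_MILLION = 1_000_000;

// Frequency expressed per million tokens.
function ratePerMillion(count: number, totalTokens: number): number {
  return (count * PER_MILLION) / totalTokens;
}

// lift = rate_assistant / rate_user_smoothed.
// ASSUMPTION: add-one smoothing on the user count, so a phrase the user
// never typed does not divide by zero. The real smoothing may differ.
function lift(
  stats: PhraseStats,
  assistantTokens: number,
  userTokens: number,
): number {
  const rateAssistant = ratePerMillion(stats.assistantCount, assistantTokens);
  const rateUserSmoothed = ratePerMillion(stats.userCount + 1, userTokens);
  return rateAssistant / rateUserSmoothed;
}

// Candidate gate: lift at or above --min-lift (default 5.0) and a
// session count of at least 3.
function isCandidate(
  stats: PhraseStats,
  assistantTokens: number,
  userTokens: number,
  minLift = 5.0,
  minSessions = 3,
): boolean {
  return (
    stats.sessionCount >= minSessions &&
    lift(stats, assistantTokens, userTokens) >= minLift
  );
}
```

Under this assumed smoothing, a phrase with a high lift that appears in only one or two sessions is still rejected, which is how the session-count floor filters jargon from a single project.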
Output
- Summary stats (corpus sizes, voice-baseline size, rule count, model breakdown)
- Voice delta (per-feature corpus rates beside the local baseline rates, each labeled skill-prescribed, skill-encouraged, or ungoverned so drift points at the right fix: tune the skill or build a detector; aggregate trends only, never per-document flags)
- Proposed removals, each tagged dead (model produced it fewer than --min-count times on the surface where the hook fires it) or not distinctive (baseline uses it at least as often as the model)
- Proposed additions (high-lift n-grams not already covered), each with its voice-baseline rate and a spot-checkable quote
- Rule health table (every entry with type, audit surface, and keep / remove (reason)), plus deliverable quotes for the deliverable-surface tells
- Structural pattern audit (the hook's regex patterns, hit counts across sessions)
- Structural signatures (part-of-speech tag sequences distinctive to the model's deliverable prose; word-independent, so they survive vocabulary drift between model releases)
- Meaning-layer audit (per-criterion judge flag rates with sampled spans, only when --judge is passed)
- Corrective feedback (short human messages naming a writing problem, with the preceding model output)
- Correction candidates (long-assistant, short-user pairs suggesting prose pushback)
Each rule is judged on the surface where the hook fires it. Chat-surface rules (openers, sycophantic patterns, conversational vocabulary) compare the model's chat against the user's chat. Deliverable-surface rules (flowery-phrases.txt, soft-phrasing.txt) compare the model's deliverable prose against the user's voice baseline, so a tell frequent in PR bodies and absent from the baseline reads as keep, not dead. A rule the model uses far more than the baseline is kept even when its lift reads low. Lift is not used for removal: the smoothed user baseline (see methodology) compresses it for any word the user never types, which would flag the model's strongest tells (delve, comprehensive, robust) for removal.
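The keep/remove decision described above can be sketched as a direct rate comparison. This is an illustrative TypeScript sketch, not the logic in analyze.ts: the `RuleUsage` shape, the `judgeRule` name, and the order of the two checks are assumptions, and `minCount` stands in for the --min-count flag, whose default is not stated here.

```typescript
type Verdict = "keep" | "remove (dead)" | "remove (not distinctive)";

// Hypothetical shape: one rule measured on the surface where the hook fires it.
// For chat-surface rules the baseline is the user's chat; for
// deliverable-surface rules it is the voice baseline.
interface RuleUsage {
  modelCount: number; // raw hits in the model's text on that surface
  modelRate: number; // per-million-token rate in the model's text
  baselineRate: number; // per-million-token rate in the comparison corpus
}

// Direct rate comparison plus a minimum count. Lift plays no part.
// ASSUMPTION: the "dead" check runs before the "not distinctive" check.
function judgeRule(usage: RuleUsage, minCount: number): Verdict {
  if (usage.modelCount < minCount) {
    return "remove (dead)";
  }
  if (usage.baselineRate >= usage.modelRate) {
    return "remove (not distinctive)";
  }
  return "keep";
}
```

A rule the model uses often and the baseline never uses has a baseline rate of zero, so it is kept here however the smoothed baseline would have compressed its lift.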
Corpora
Four corpora, each serving a different purpose:
- All model-generated text: conversational assistant text (text-export, role=assistant) combined with deliverable prose (deliverable-prose.sql). The session DB's text_content view captures only conversational text blocks, not tool inputs (Write/Edit/Bash); the deliverable query fills this gap. Used for n-gram candidates.
- Deliverable prose (deliverable-prose.sql): Write/Edit to prose files, Bash commands with --body/--message/-m. The candidate-mining corpus and the model side of the deliverable-aware rule audit.
- Human-only user text (text-export, role=user, filtered): the baseline for lift calculation. Filters out system-injected content (skill injections, context compaction summaries, task notifications, system reminders) and pasted model output (see methodology) that arrives as user-role messages but is machine-generated.
- Voice baseline (local-only, voice-profile.ts): the user's hand-written, pre-AI pull requests. The baseline side of the deliverable-aware rule audit and the "absent from my baseline" signal for additions.
The full-text search (FTS) rule health audit uses text_content (a view covering both roles) with its own FTS indexes for chat-surface rules. Deliverable-surface rules compare deliverable prose against the voice baseline.
Workflow
Review the report. Edit plugins/writing/wordlists/*.txt by hand, then re-run to confirm. The skill never edits wordlists.
Methodology
See references/methodology.md for query details, known gaps, and tuning guidance.
See references/linguistics.md for the part-of-speech tagger evaluation behind scripts/headings-eval.ts: classifier comparison, synthetic corpus generation, promotion criteria, and dependency earn/retire rules.