fill-cards

Updated: May 22, 2026, 15:27

Investigation-driven card grooming — investigate across data sources, synthesize findings into card content, present for approval

Source

paulyokota

paulyokota/FeedForward

Fill Cards

Find Shortcut stories with empty template sections, investigate across data sources (Intercom, PostHog, codebase, Slack, Jam), synthesize findings into card content, and present a complete draft for Paul's approval.

The goal is an approved draft, not a pushed card. Pushing to Shortcut is a separate step that happens only after Paul has reviewed the full card text and said to ship it. "Let me push it" is not approval. Present the wording, wait for explicit go-ahead.

Supports voice mode if available.

Arguments

$ARGUMENTS determines the mode:

SC-NNN (default): Single card, standard investigation. Run the full protocol below on the named card.
--lead: Parallel fill (3a) lead role. Find candidates, propose a product-area cluster split, brief the partner instance in an agenterminal thread, work your own cards, own session-end documentation.
--partner: Parallel fill (3a) partner role. Join the agenterminal thread, read your assignment, acknowledge, then work your assigned cards.

All three modes share the same per-card investigation protocol. The modes differ only in how cards are selected and how coordination happens.

Constraints

Mutation gate: A PreToolUse hook blocks all Slack and Shortcut mutations through Bash. Route through agenterminal.execute_approved or present the command for the user to run.
Human-in-the-loop: Present one card at a time with full content rendered. Wait for Paul's decision. Don't batch-execute without review.
Context protection: Delegate codebase exploration and high-volume Intercom filtering to subagents. Primary context is for targeted verification and synthesis, not broad exploration.
Mutation cap: 25 mutations per run. Each description update, state change, story link, or archive = 1. Finish the current item before checking the cap.
Rate limiting: 0.5s delay between sequential API calls. Respect Retry-After.
Before Shortcut or Slack API calls: use the wrapper scripts named in the play steps (Intercom: intercom-search.py, intercom-evidence.py; Shortcut: see Mutation Scripts table in box/shortcut-ops.md). If no wrapper exists for the specific operation, grep reference/tooling-logistics.md for the recipe. Don't re-derive payload shapes or endpoint paths.
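The rate-limiting constraint above can be sketched as a small sequential caller. This is a minimal illustration, not the wrapper scripts' actual behavior: the `(status, retry_after, body)` response shape, the function name, and treating HTTP 429 as the rate-limit signal are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, Tuple

MIN_DELAY_S = 0.5  # delay between sequential API calls, per the constraint

# Hypothetical response shape: (status_code, retry_after_seconds_or_None, body).
Response = Tuple[int, Optional[float], str]


def call_with_rate_limit(
    calls: list[Callable[[], Response]],
    sleep: Callable[[float], None] = time.sleep,
    max_retries: int = 3,
) -> list[str]:
    """Run calls one after another, pausing between them and honoring Retry-After."""
    bodies = []
    for index, call in enumerate(calls):
        if index > 0:
            sleep(MIN_DELAY_S)
        for attempt in range(max_retries + 1):
            status, retry_after, body = call()
            if status != 429:  # assumption: 429 means "rate limited"
                break
            if attempt == max_retries:
                raise RuntimeError("still rate limited after retries")
            # Wait the server-requested time, never less than the floor.
            sleep(max(retry_after or 0.0, MIN_DELAY_S))
        bodies.append(body)
    return bodies
```

Passing `sleep` as a parameter keeps the pacing logic testable without real waiting.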

Steps

0. Resume check (single-card mode only)

Before anything else:

ls .agent/fill-cards/ 2>/dev/null

If sc{NNN}.md exists for the card you are about to work:

Read it in full.
Re-set TodoWrite items from the Investigation Plan section — these do not survive compaction and must be restored for step 4b accounting to work.
Report current state to the user based only on what is written in the session file. Do not use the compaction summary. Name which steps completed, what was found, and where the session ended.
Ask whether to resume from that point or start fresh.
Wait for explicit direction before proceeding.

If the user chooses "start fresh": archive the existing file before proceeding:

mv .agent/fill-cards/sc{NNN}.md .agent/fill-cards/sc{NNN}.bak.$(date +%s).md

Then continue to Step 1. Preflight will create a new session file.

If no session file exists, continue to Step 1.

Trustworthy result files on resume: Only read sidecar .json files that have a matching ### {label} result saved receipt in the current session file — path matches and file exists on disk. Any sc{NNN}-*.json files on disk without a matching receipt in the current session file are stale and must be ignored. If multiple receipts exist for the same path, the one with the latest timestamp is authoritative; earlier receipts for the same path are superseded.
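The receipt rule above can be sketched as a parser over the session file text. This is a hedged illustration: the function names are hypothetical, and it assumes each receipt is a `### {label} result saved ({timestamp})` line followed by a line containing `Path: ...`, with timestamps in a sortable format such as ISO 8601 UTC.

```python
import re
from pathlib import Path

# Assumed receipt layout: a heading line, then a line containing "Path: <file>".
HEADER_RE = re.compile(r"^### (?P<label>[\w-]+) result saved \((?P<ts>[^)]+)\)$")
PATH_RE = re.compile(r"\bPath: (?P<path>\S+)")


def trusted_result_files(session_text: str, root: Path) -> dict[str, str]:
    """Map each trustworthy sidecar path to the timestamp of its latest receipt."""
    latest: dict[str, str] = {}
    pending_ts = None
    for line in session_text.splitlines():
        header = HEADER_RE.match(line.strip())
        if header:
            pending_ts = header.group("ts")
            continue
        found = PATH_RE.search(line)
        if pending_ts is not None and found:
            path = found.group("path")
            # Assumption: same-format ISO timestamps compare correctly as strings.
            if path not in latest or pending_ts > latest[path]:
                latest[path] = pending_ts
            pending_ts = None
    # A receipt only counts if the file it names is actually on disk.
    return {p: ts for p, ts in latest.items() if (root / p).is_file()}


def stale_result_files(card_number: int, trusted: dict[str, str], root: Path) -> list[str]:
    """List sc{NNN}-*.json sidecars on disk that have no matching receipt."""
    folder = root / ".agent" / "fill-cards"
    on_disk = sorted(
        p.relative_to(root).as_posix() for p in folder.glob(f"sc{card_number}-*.json")
    )
    return [p for p in on_disk if p not in trusted]
```

Anything returned by `stale_result_files` would be ignored on resume rather than read.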

1. Find candidates (single-card mode)

If $ARGUMENTS is SC-NNN, skip candidate discovery — you already have the card.

Otherwise, find cards with empty sections:

python3 box/shortcut-cards.py --state "In Definition" --empty-sections --summary

This ranks by most empty sections first, then oldest.
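The ranking rule can be expressed as a two-part sort key. A minimal sketch, assuming "oldest" means earliest creation date; the `Candidate` type and its fields are hypothetical and not taken from `shortcut-cards.py`.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    card_id: str
    empty_sections: int
    created: date  # assumption: "oldest" is judged by creation date


def rank_candidates(cards: list[Candidate]) -> list[Candidate]:
    """Most empty sections first; ties broken by oldest card first."""
    return sorted(cards, key=lambda c: (-c.empty_sections, c.created))
```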

2. Pre-flight

Run these before starting investigation. Not optional.

Fetch the card: python3 box/shortcut-cards.py --id SC-NNN --description. Check the workflow state: Need Requirements cards get context to support a stakeholder conversation. In Definition cards get the full treatment (investigation, solution sketch, all sections).
Bug or feature? Bug cards use a leaner template (skip blank Monetization, UI, Reporting, Release sections).
New feature or extension? Frame as extension when possible. Identify the closest existing feature surface.
- Same-day sibling cards: If other cards in the same product area were created the same day (brainstorm batch, investigation batch, or related bug filings), read all of them — not just those already in story links. Scope overlap and disambiguation needs are common and not caught by the story-links check alone. Proved SC-1517: SC-1518 created same day, same area, required explicit disambiguation but wasn't in story links.
- Check In Build cards in the same product area. A card actively changing pricing or feature behavior (e.g. SC-46 adding credit costs) invalidates assumptions on sibling cards. Fetch In Build cards for the product area before starting investigation.
Visual/UX cards: Ask for screenshots of the actual behavior before investigating the code mechanism. Code tells you what can happen; a screenshot tells you what does happen. Screenshots are the primary source for visual issues; code is secondary.
Run pre-flight checks (hard gate):
```
python3 box/preflight.py --product-area AREA
```
Replace AREA with the card's product area (e.g. SMARTPIN, TURBO). This verifies all tokens, DB connectivity, sync freshness, and extracts known PostHog event names for the product area. Results are automatically logged to the preflight_log audit table.

If preflight exits non-zero, stop. Do not proceed to investigation. A stale Intercom index means any "no signal" finding is unreliable. Report the failure to the user and wait for resolution before continuing. This is not advisory — investigating on stale data produces false negatives that are silent, permanent, and undetectable after the fact.
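The hard gate can be sketched as a wrapper that refuses to continue on any non-zero exit. This is an illustration only: `PreflightError` and `run_preflight` are hypothetical names, and the command is passed in so the sketch does not assume anything about `preflight.py` beyond its exit code.

```python
import subprocess


class PreflightError(RuntimeError):
    """Raised when preflight fails; investigation must not start."""


def run_preflight(command: list[str]) -> str:
    """Run the preflight command and stop the run if it exits non-zero."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Report the failure and wait for resolution instead of investigating.
        raise PreflightError(
            f"preflight exited {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

A call such as `run_preflight(["python3", "box/preflight.py", "--product-area", "SMARTPIN"])` would either return the preflight output or raise before any investigation step runs.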

Preflight output includes PostHog event names for the product area. Use these in step 4 instead of reading box/posthog-events.md separately.

After preflight passes, scan .claude/rules/product-knowledge.md for entries relevant to this card's product area. Note any feature state, entity scope, or vocabulary facts that apply to the investigation.

Run preflight with session file. Pass --session-file .agent/fill-cards/sc{NNN}.md to the preflight command. Preflight creates the file if it does not exist:

python3 box/preflight.py --product-area AREA --card-id SC-NNN --session-file .agent/fill-cards/sc{NNN}.md

Pass --session-file .agent/fill-cards/sc{NNN}.md to all subsequent script calls in this session. Pass session_file and save_path to all agenterminal.delegate calls. No separate file creation step needed.

Fill-cards does not use collect. Disk persistence is handled by save_path on delegate (push-first, PR #210). The checkpoint-approval gate that collect provides is not needed — approve_content at step 7 is the human-in-the-loop point. Push delivery keeps the agent responsive (no blocking call to intercept). [Delegate completed] and [Delegate terminal] messages arrive as session turns, not conversation thread entries.

3. Plan the investigation

Before dispatching anything, plan which data sources this card needs and what question each answers. Not every card needs all sources. A card about a UI change may not need PostHog. A card originating from a Slack idea with no user-facing reports may not need Intercom. Write the plan as a short list before starting. TodoWrite the plan items so they persist through step 4b (investigation accounting). Narrated plan items are invisible at accounting time; pending todos are not. Proved 2026-04-30: routing question planned in step 3, abandoned after one failed grep, accounting reported "Planned but NOT executed: None."

Write the plan to the session file. After setting todos, append to the session file:

## Investigation Plan
- {source 1}: {question}
- {source 2}: {question}

This is the cheapest recovery artifact: a fresh agent can restore todos from this section without replaying any investigation steps.

Record the delegation decision and expected read count for each data source. The investigation plan must explicitly state how each source will be handled. For codebase, direct investigation reads are capped at 3 per card. If the expected count exceeds 3, the plan must use delegation.

## Investigation Plan
- Intercom: delegate filter (Sonnet mode, >8 expected candidates)
- Codebase: delegate explore (3 axes: autosave architecture, event emission, FormResetHandler)
- PostHog: direct (iterative queries)

or:

## Investigation Plan
- Intercom: direct search (≤8 expected, narrow keyword)
- Codebase: direct (2 reads: grep for event name, read emit file)
- PostHog: direct (single funnel query)

The declared read count makes the delegation decision visible at planning time and auditable at accounting time (Step 4b). If actual reads exceed the declared count, the accounting must flag the deviation.

Bug cards: verify the code mechanism before searching Intercom. Understanding the failure mode tells you what symptoms to look for and how to distinguish this bug from related ones. Without code grounding, you evaluate Intercom conversations against the card's claims — proxy trust on the thing you're supposed to verify. Proved SC-1517: assessed conversation relevance against card description before reading any code; human caught it.

Multi-card sessions: compare investigation scope. If this is the Nth card in a session, list the data sources and search breadth used for the previous card before planning this one. The quality gate checklist measures artifact existence, not investigation depth — a card built from reused adjacent findings passes the same checklist as one with a full standalone investigation. Proved 2026-04-08: Ghostwriter card drafted from SC-1238 codebase findings with 2 Intercom searches vs 7+ for SC-1238; checklist passed; human caught it.

4. Investigate

Work through the data sources identified in the plan. The investigation phase is complete when every data source in the plan has findings (or an honest "no signal" result), and those findings came through the defined channel: delegated exploration or within-budget direct reads for codebase (see Step 4 budget rule), direct search or filter subagent for Intercom, saved insights for PostHog.

Intercom

Three-phase action through scripts. No raw SQL.

Search: python3 box/intercom-search.py "terms" (add --since, --no-canned, --limit as needed). The script has a mechanical freshness gate — it blocks with exit code 2 if the index is stale (>36h).

Delegation decision tree (based on expected volume from Step 3 plan):
- ≤8 expected results → Search directly (intercom-search.py in primary context). Primary reads all results. No delegation overhead.
- >8 expected results → Delegate intercom-filter (Sonnet mode): python3 box/compose-delegation.py intercom-filter --var SEARCH_GOAL="..." --var NOISE_PATTERNS="..." --var MODEL_OVERRIDE=sonnet. Delegate searches, reads, classifies. Primary reads SIGNAL conversations only. Intercom audit (Step 8) catches missed signal.
- Unknown volume → Default to delegation (Sonnet mode). The >8 path is safe for small result sets (just more overhead). The ≤8 path is unsafe for large result sets (fills primary context).
For multi-phase investigations with file_path, read the template directly (box/intercom-filter-prompt.md).
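The decision tree above reduces to one threshold plus a safe default. A minimal sketch; the function name and return labels are hypothetical.

```python
from typing import Optional

DIRECT_SEARCH_MAX = 8  # threshold from the decision tree


def intercom_route(expected_results: Optional[int]) -> str:
    """Return "direct" or "delegate" for an expected Intercom result count."""
    if expected_results is None:
        # Unknown volume: delegation is the safe default.
        return "delegate"
    return "direct" if expected_results <= DIRECT_SEARCH_MAX else "delegate"
```

The asymmetry is deliberate: delegating a small result set only costs overhead, while searching a large one directly fills primary context.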

Session persistence for intercom-filter delegation. Call agenterminal.delegate, passing:
- conversation_id: (the active session conversation)
- session_file: ".agent/fill-cards/sc{NNN}.md"
- session_label: "intercom-filter"
- save_path: ".agent/fill-cards/sc{NNN}-intercom-filter-result.json"
The delegate tool writes the task_id and label to the session file automatically. The server writes the result JSON to save_path on completion, before the push.

Continue other investigation work. When a [Delegate completed] or [Delegate terminal] message arrives for this delegation:
1. Check the header for saved_to= — if present, the file is on disk.
2. Verify: ls .agent/fill-cards/sc{NNN}-intercom-filter-result.json
3. If the file exists, append to the session file:
  
  ### intercom-filter result saved ({timestamp})
  
  Task ID: {task_id} Path: .agent/fill-cards/sc{NNN}-intercom-filter-result.json Status: {completed|timed_out}
4. If saved_to is absent from the header, append:
  
  intercom-filter result save failed ({timestamp})
  
  Task ID: {task_id} Expected path: .agent/fill-cards/sc{NNN}-intercom-filter-result.json
Vocabulary shift (after first pass). Card titles anchor search terms toward product framing ("wrong site," "duplicate draft"). Users say "combined," "jumbled," "can't find," "manually made copies." After the first search pass, pause and ask: "how would a user describe this to support?" Run a second pass with user-vocabulary terms before declaring search complete. Logged 5 times (SC-201, SC-874, SC-1108, SC-67, SC-984) — the audit agent catches it every time because it only sees the What section, not the card title.

Vocabulary category checklist (second pass). Before declaring Intercom search complete, check each category:
- Symptom language ("scrambled," "mixed up," "wrong order," "lost")
- Use-case framing ("editorial calendar," "campaign," "seasonal content")
- Non-English terms if the feature has international users (German, Spanish)
- Workaround language ("manually," "one by one," "how do I fix")

The primary search anchors to product framing from the card title; these categories reach users who describe the need, not the feature. Proved 2026-04-22: primary search (6 queries, English product terms) found 1 conversation; audit (20 queries including German, use-case framing) found 5 additional.
Check shipped-card timeline. Before citing Intercom conversations as current evidence, check whether a related card shipped since those conversations. Use --since to filter to post-ship dates. Pre-ship conversations about a now-fixed issue inflate the evidence count for a problem that no longer exists. Proved 2026-04-08: multiple Jan-Feb credit purchase complaints predated SC-470 (shipped Mar 18).
Read: python3 box/intercom-search.py --read ID1,ID2 for the conversations going on the card. Search snippets are not evidence. Read the actual conversation text before citing it. Long conversations (>30 parts): scan the final 5-10 exchanges separately. Topic transitions cluster in later parts as customers report new issues in existing threads. Classifying by opening topic misses these. Proved SC-1541: conversation classified as "engagement tracking" from opening messages; later parts contained support confirming the pin-add bug.

Cross-reference ambiguous conversations with PostHog. When a bug has a trigger condition visible in PostHog person properties (billing version, subscription state, feature flags), look up the Intercom contact's email in PostHog to confirm they match the bug's profile. This resolves "could be this bug" into confirmed or ruled out. Proved SC-1517: turned 1 confirmed + 2 ambiguous conversations into 4 confirmed matches across 8 months.
Evidence block: Use python3 box/intercom-evidence.py to generate the structured evidence block for the card. Records search date, index freshness, terms used, and links to every signal conversation. See the Intercom evidence schema in box/shortcut-ops.md for format details. Every card gets one of three blocks: searched/signal, searched/no-signal, or not-searched with rationale. Link all signal conversations classified as evidence, not a sample.
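The shipped-card timeline check described in this section amounts to partitioning conversations around a ship date. A hedged sketch: the function name and the ID-to-date mapping are hypothetical, and in practice the `--since` flag on `intercom-search.py` does the filtering.

```python
from datetime import date


def split_by_ship_date(
    conversations: dict[str, date], shipped: date
) -> tuple[list[str], list[str]]:
    """Separate conversation IDs into pre-ship and post-ship groups."""
    pre = sorted(cid for cid, day in conversations.items() if day < shipped)
    post = sorted(cid for cid, day in conversations.items() if day >= shipped)
    return pre, post
```

Only the post-ship group would count as current evidence; the pre-ship group describes a problem that may already be fixed.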

Codebase

Codebase exploration is delegated by default. Multi-file exploration (architecture discovery, feature surface mapping, instrumentation tracing) goes through compose-delegation.py codebase-explore:

python3 box/compose-delegation.py codebase-explore --var QUESTION="..." --var CODEBASE_PATH="/Users/paulyokota/Dev/aero"

For the file-writing variant, read the template directly (box/verified-explore-prompt.md, File-writing variant section). Keep to 3 search axes max per delegation.

After collecting: read the specific files the subagent identified that bear on card claims. Primary context is for targeted verification, not broad exploration.

Aggregate budget: 3 direct investigation reads per card. Direct codebase reads in primary context are capped at 3 per card investigation. A "direct investigation read" is one tool call that reads codebase content for the purpose of discovering facts: a Read or git show of a file/section, a Grep that returns file content, or a Bash command that outputs code. File-path-only greps (output_mode: files_with_matches) do not count.

Examples of what fits within 3: a single-file lookup (1 read), a stub-level validation (grep for event name + read emit file = 2 reads), a short chained lookup (2-3 reads). If the investigation would exceed 3, delegate.

Post-delegation verification reads are exempt. Reading files the subagent identified to verify claims against primary sources is expected and does not count against the budget.

Cross-cutting concerns use targeted grep delegation. Cross-cutting patterns (auth, session management, storage) touch many subsystems and timeout in explore delegates (SC-399). Delegate via agenterminal.delegate with a targeted grep-first prompt: instruct the delegate to grep for specific patterns and read only matching files. Do not use primary context for cross-cutting investigation.

When delegation fails (timeout, empty result, bad output): diagnose why. Common causes: too many search axes (split and re-dispatch), timeout too short (increase timeout_ms), wrong search terms (refine and re-dispatch). Do not absorb exploration into primary context. If you're tempted to "just read the files yourself," that's the signal to fix the delegation, not skip it.

Session persistence for codebase delegation. Call agenterminal.delegate, passing:

conversation_id: (the active session conversation)
session_file: ".agent/fill-cards/sc{NNN}.md"
session_label: "codebase-explore"
save_path: ".agent/fill-cards/sc{NNN}-codebase-result.json"

The delegate tool writes the task_id and label to the session file automatically. The server writes the result JSON to save_path on completion, before the push.

Continue other investigation work. When a [Delegate completed] or [Delegate terminal] message arrives for this delegation:

Check the header for saved_to= — if present, the file is on disk.
Verify: ls .agent/fill-cards/sc{NNN}-codebase-result.json
If the file exists, append to the session file:

### codebase-explore result saved ({timestamp})

Task ID: {task_id} Path: .agent/fill-cards/sc{NNN}-codebase-result.json Status: {completed|timed_out}
If saved_to is absent from the header, append:

codebase-explore result save failed ({timestamp})

Task ID: {task_id} Expected path: .agent/fill-cards/sc{NNN}-codebase-result.json
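The completion handling above can be sketched as a function that turns a delegate message header into the receipt text to append. This is a sketch under stated assumptions: `build_receipt` is a hypothetical name, the header is searched only for the `saved_to=` token, the `###` prefix on the failure receipt mirrors the saved receipt, and a header that claims `saved_to=` while the file is missing is treated as a failed save (the steps do not cover that case).

```python
import re
from pathlib import Path


def build_receipt(
    header: str,
    label: str,
    task_id: str,
    expected_path: str,
    timestamp: str,
    status: str = "completed",
) -> str:
    """Return the receipt block to append to the session file."""
    claims_saved = re.search(r"saved_to=\S+", header) is not None
    # Trust the header only if the file is really on disk.
    if claims_saved and Path(expected_path).is_file():
        return (
            f"### {label} result saved ({timestamp})\n"
            f"Task ID: {task_id} Path: {expected_path} Status: {status}\n"
        )
    return (
        f"### {label} result save failed ({timestamp})\n"
        f"Task ID: {task_id} Expected path: {expected_path}\n"
    )
```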

PostHog

Query for relevant events. Save queries as insights at query time, not after drafting. Every number on the card needs a linkable saved insight. Use the PostHog event names from preflight output (step 2). If preflight reported NO_MATCH for the product area, use the PostHog MCP event-definitions-list tool to discover events. For richer schema notes beyond event names, grep box/posthog-events.md for the product area section.

Write insight permalinks to the session file. After saving each insight, append to the session file:

## PostHog
- {event name} ({date}): {permalink}

Required manual write. The permalink is the only PostHog artifact that survives compaction.

Slack

Read thread content for cards that originated from Slack ideas. Use:

python3 box/slack-scanner.py --channel C0ADJ4ATJE4 --threads <permalink>

Reply text is in the thread_context array of the JSON output. For ideas with cross-channel links, the scanner's Play 1 output already includes thread content inline — check cross_channel_links[].thread_context before making a second call.

Check for file attachments. The scanner's files array lists images and screenshots. Download and view any screenshots:

curl -s -H "Authorization: Bearer $TOKEN" <url_private> -o /tmp/file.png

For visual/UX issues, screenshots are primary sources that code alone cannot substitute.

Jam recordings

If the card's Slack thread, Intercom evidence, or description contains a Jam URL (jam.dev/c/...), pull structured debug data via Jam MCP. getDetails first (overview + investigation guide), then getNetworkRequests (filter by statusCode="4xx" or "5xx"), getConsoleLogs, and getUserEvents for the reproduction timeline. This surfaces technical root causes that text descriptions miss. Especially valuable for bug cards: the Jam contains the reproduction evidence, not just a link to it.

4b. Investigation accounting

Before synthesizing, produce a structured accounting block:

Codebase files read directly: list with checkmarks and read method (full file, targeted sed range, grep). Separate investigation reads (discovering facts) from verification reads (confirming delegate findings). If a git show returned [...Nch omitted...], the file is partially read: note the omitted range and read it before marking complete. Proved 2026-04-23: delegate line-number labels filled in for unread sections, survived the accounting step, and produced three monitor escalations.
Codebase budget check: count investigation reads (not verification reads). Format:
```
### Budget check
Investigation reads (independent):
- [x] file1.ts (grep for event name)
- [x] file2.ts (read emit site)
Total: 2 (budget: 3)
Verification reads (delegate-identified):
- [x] file3.ts (confirmed delegate finding)
Total: 1 (exempt)
Budget status: WITHIN BUDGET (2/3)
```
If investigation reads exceed 3: flag as OVER BUDGET. The per-read categorization makes the count auditable — a reviewer can check whether a read classified as "verification" was actually independent investigation.
PostHog queries run: count, insights saved
Intercom: searches run (count), conversations read in full (list IDs), selection rationale if filtering from a larger set. When parallel searches run in the same turn, count each Bash call individually, not the turn count. Proved 2026-04-23: 3 turns × 2 parallel calls = 6 searches, miscounted as 5. When reads were batched, verify per-batch: "Batch N of M: [count] classified [inline at #turn / gap]." A gap entry forces explicit acknowledgment rather than silent rounding to "all." Proved 2026-04-28: "all narrated with per-conversation classification" written when 10/29 conversations (batches 5-6) had no individual classification text; monitor caught at high severity.
Slack: threads read
Planned but NOT executed: list anything from the investigation plan that was skipped, with rationale. If empty, write "None." Absorbing a gap into affirmative framing ("captured via X" when X was not a direct read) defeats the accounting purpose. Proved 2026-04-22: CS squad thread read planned, never executed; accounting said "captured via cross-channel link" instead of marking the gap. If a monitor injection about investigation gaps is in context, list the flagged items here and resolve before declaring complete. The accounting checkpoint is where completion bias activates on the verification process itself. Proved 2026-04-24: wrote "None" with a live medium-severity monitor flag about unverified SmartPin Generated/Queued emit sites.
Commitments completed/not: anything promised during investigation that was or was not done
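The budget check in the accounting list above can be generated mechanically from a list of categorized reads. A minimal sketch; the `Read` type and `budget_check` function are hypothetical, and the output mirrors the format shown in the budget check example.

```python
from dataclasses import dataclass

BUDGET = 3  # direct investigation reads allowed per card


@dataclass
class Read:
    path: str
    note: str
    kind: str  # "investigation" or "verification"


def budget_check(reads: list[Read]) -> str:
    """Render the budget check block; verification reads are exempt."""
    investigation = [r for r in reads if r.kind == "investigation"]
    verification = [r for r in reads if r.kind == "verification"]
    status = "WITHIN BUDGET" if len(investigation) <= BUDGET else "OVER BUDGET"
    lines = ["### Budget check", "Investigation reads (independent):"]
    lines += [f"- [x] {r.path} ({r.note})" for r in investigation]
    lines.append(f"Total: {len(investigation)} (budget: {BUDGET})")
    lines.append("Verification reads (delegate-identified):")
    lines += [f"- [x] {r.path} ({r.note})" for r in verification]
    lines.append(f"Total: {len(verification)} (exempt)")
    lines.append(f"Budget status: {status} ({len(investigation)}/{BUDGET})")
    return "\n".join(lines)
```

Recording the category per read, rather than only a total, is what lets a reviewer challenge a read that was labeled verification but was really independent investigation.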

This is not optional — it is the gate between investigation and synthesis. The accounting verifies investigation completeness before synthesis starts. Proved 2026-04-15: monitor flagged investigation-to-synthesis transition; accounting produced after the flag but should have been spontaneous.

Write the accounting block to the session file before advancing to synthesis. Do not proceed to Step 5 until this write is complete. Append verbatim:

## Accounting Block ({timestamp})
{full accounting block}

Then append:

## STATUS: accounting ({timestamp})

A fresh agent reading this file can determine exactly what was investigated without replaying any steps. This is the highest-value recovery artifact.

5. Synthesize

Synthesize into clear, concise product writing (bullet points, no jargon inflation)
Don't add ideas Paul didn't express, don't drop ideas he did
Removal/cleanup cards need a two-pass audit. Pass 1: "where is X mentioned?" (content inventory). Pass 2: "what breaks if we remove X?" (risk scan, e.g. URL::route() calls that would 500 on route removal). These hit different files. The second pass can be a separate delegation.
Black hat test on acceptance criteria. After writing completion criteria, ask: "could someone pass these checks without doing the work?" Narrow definitions and require human approval for judgment calls. Allowlists need closed boundaries.

6. Story links

After filling a card, search Shortcut for non-archived cards that share infrastructure, prerequisites, or overlapping scope. Propose links with the right verb:

| Verb | Direction | When to use |
| --- | --- | --- |
| `blocks` | Subject blocks Object | Object cannot ship without Subject being done first. Test: "Could Object ship if Subject didn't exist?" If no, it blocks. |
| `relates to` | Bidirectional | Cards share infrastructure, overlap in scope, or inform each other, but neither is a prerequisite. |
| `duplicates` | Subject duplicates Object | Used in Find Dupes play. Subject is the loser (gets archived). |

Default to relates to unless there's a genuine prerequisite dependency. Present proposed links alongside the card draft for approval.

For Released cards, read the description to confirm whether the fix addressed root cause or a surface symptom before including in the card landscape. Title + Released state is insufficient — ongoing Intercom complaints post-release indicate partial fixes. Proved SC-1798: SC-226 (Released) incorporated from title alone; description would reveal whether SmartPin multi-profile fix was root cause or patch.

For story link creation, use shortcut-mutate.py story-link:

python3 box/shortcut-mutate.py story-link SUBJECT_ID OBJECT_ID "relates to"

The script handles mutation + read-back verification in one call (exit 0 = verified). Route through execute_approved. Proved 2026-04-22: raw curl via reference/tooling-logistics.md recipe works but requires a separate verification call; the wrapper script eliminates that gap.

Verify between each story link mutation. The script's built-in verification covers the link itself. Still run shortcut-cards.py --id SC-NNN between links to confirm cumulative state before creating the next one. Batch momentum erodes per-mutation verification on sequential similar mutations — proved twice 2026-04-01.

Batch mutation rule. When executing 3+ sequential Shortcut mutations (card creates, updates, story links), verify each one independently before proceeding to the next. "Verify" means a GET request confirming the change, not reading the execute_approved response. Narrate the verification result explicitly — not just "verified." This pattern co-activates batch momentum (narration compresses) and proxy trust (API response treated as verification). Proved 2026-04-08: verification dropped by card 3 of 7, caught by human at card 4.
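The batch mutation rule can be sketched as a loop that never starts the next mutation until a fresh read-back confirms the previous one. This is an illustration with injected callables, not the real Shortcut client: `run_verified` and the `(name, mutate, verify)` triple are hypothetical, and `verify` stands in for a GET request against the card.

```python
from typing import Callable, Iterable, Tuple

Mutation = Tuple[str, Callable[[], None], Callable[[], bool]]


def run_verified(
    mutations: Iterable[Mutation],
    log: Callable[[str], None] = print,
) -> int:
    """Apply mutations one at a time, verifying each by read-back before the next."""
    done = 0
    for name, mutate, verify in mutations:
        mutate()
        # verify() must be a fresh read, not the mutation call's own response.
        if not verify():
            raise RuntimeError(f"{name}: read-back did not confirm the change")
        log(f"{name}: verified by read-back")
        done += 1
    return done
```

Raising on the first failed read-back stops batch momentum at the point where state diverged, instead of discovering it several mutations later.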

7. Present for approval

Sequence: Write → Read → spot-check → quality gate → approve_content. After writing the draft to a file, read it back before evaluating the quality gate. The quality gate's checkmarks are claims about the draft content: verify them against what was actually written, not what you intended to write.

Spot-check gate: before declaring the quality gate passed, run at least 2 fresh Grep or Read calls against the card's highest-stakes factual claims (file paths, specific code properties, render order). In-context memory of prior reads is not verification; it is reconstruction. Proved 2026-04-22 and 2026-04-23: quality gate passed from memory both times; fresh spot-checks confirmed accuracy but process was wrong both times.

Select spot-checks targeting the card's most novel assertion and the file that is ground truth for it. Event emission claims: the emit site file. Data model claims: the schema/type file. Don't spot-check adjacent confirming evidence when the highest-risk claim has a specific ground-truth file. Proved 2026-04-24: spot-checked data-layer code (import-processor.ts, smart-pin-repository.ts) instead of the event emit site (csv-import-page.tsx) for a novel "CsvImport lacks generationFrequency" claim.

Present the completed checklist inline first (this is the due diligence evidence). Then present the card description via approve_content with content_type: "card-draft", filename: "scNNN". The user reviews and can edit in the modal.

After approve_content returns, append to the session file:

## Draft approved ({timestamp})
Path: .agenterminal/approved/card-draft/sc{NNN}.md

## STATUS: approved ({timestamp})

The checklist is not presented without the draft, and the draft is not presented without the checklist. Any checklist item that fails must be resolved before presenting. If you can't resolve it, say so explicitly rather than presenting with a known gap.

### Pre-approval checklist

Related cards (all story links): SC-15, SC-68, SC-44
PostHog insights saved: [SmartPin adds](link), [Add distribution](link)
Intercom evidence block: schema-compliant (searched/no-signal/not-searched with date, index freshness, terms, all signal conversation IDs)
Codebase: all file paths on card read directly, no subagent-only claims

Card metadata: product area set, story links created
Severity (bug cards): Sev N (Level) — state the discriminator answer (see `reference/severity-framework.md`). Must be assessed here, not deferred to ship time.
Quality gate:
- Problem before solution
- Scoping-ready
- Verifiable evidence
- Observable done state

In multi-card sessions: copy-paste the checklist structure from card 1 for every subsequent card. Do not reconstruct it from memory — a template resists compression; a remembered format invites shortcuts. The checklist erodes predictably: manual verification drops first, then the checklist itself.

Checklist field definitions:

Card metadata: product area is set on the card (ship script requires it). Story links are proposed with verbs. Both verified before declaring ready to ship.
Related cards: list every card in the story links. All must be checked.
PostHog insights saved: every number cited on the card maps to a saved insight with a permalink. No ad-hoc query results without a saved link.
Intercom evidence block: the card uses the structured Intercom evidence schema from box/shortcut-ops.md. One of three formats: searched/signal, searched/no-signal, or not-searched with rationale. Must include: search date, index freshness date, all search terms used, and links to all signal conversations classified as evidence (not a sample). Generated via python3 box/intercom-evidence.py. Every cited conversation was read directly by the primary instance. IDs must be linked as full Intercom URLs (https://app.intercom.com/a/apps/2t3d8az2/inbox/inbox/all/conversations/{ID}), not plain text. "Verifiable" means the reader can click through, not construct the URL themselves. Subagent-classified conversations not yet verified by the primary instance are listed separately as unverified candidates and do not appear in the card's Evidence section. Verify counts before asserting. Run grep -c on the evidence file or comment file to confirm conversation counts match what you claim in the checklist and card body. Do not recall counts from memory — three count discrepancies in one session (2026-04-22) all came from recalling rather than counting.
Codebase: every file path named on the card was read directly in this session. No claims backed only by subagent output.
Quality gate: the four criteria from the Card Quality Gate section.
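
The count check in the Intercom evidence block definition above can be approximated in a few lines. This is a minimal sketch, not part of the play: `intercom_url`, `linked_conversation_ids`, and `count_matches_claim` are hypothetical helper names, and it assumes conversation IDs are digit strings. Only the URL pattern comes from the definition above.

```python
import re

# URL prefix copied from the checklist definition above.
INTERCOM_BASE = "https://app.intercom.com/a/apps/2t3d8az2/inbox/inbox/all/conversations/"


def intercom_url(conversation_id: str) -> str:
    """Build the full, clickable URL for one conversation ID."""
    return INTERCOM_BASE + conversation_id


def linked_conversation_ids(card_text: str) -> set[str]:
    """Return the distinct conversation IDs that appear as full URLs.

    Assumption: IDs are digit strings. Plain-text IDs are deliberately
    not counted, because the reader cannot click through them.
    """
    pattern = re.escape(INTERCOM_BASE) + r"(\d+)"
    return set(re.findall(pattern, card_text))


def count_matches_claim(card_text: str, claimed: int) -> bool:
    """True when the number of linked conversations equals the claimed count."""
    return len(linked_conversation_ids(card_text)) == claimed
```

Counting from the text rather than from memory is the point: a repeated link is counted once, and an ID left as plain text lowers the count instead of silently passing.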

8. Verification gate (after approve_content, before ship)

Dispatch two independent verification agents in parallel. State which verifiers you're dispatching and why — if skipping one, state the rationale explicitly. Silent omission is batch momentum (proved 2026-03-25: SC-348's skipped Intercom audit found genuinely valuable evidence when the human insisted on it).

a) Codebase verification (existing):

Compose the prompt: python3 box/compose-delegation.py codebase-verify --var card_draft_path="$SAVED_PATH". Parse the JSON output and pass prompt, output_instructions, and model to agenterminal.delegate. Also pass:
- conversation_id: a unique ID for this delegate (e.g. verify-codebase-{NNN})
- session_file: ".agent/fill-cards/sc{NNN}.md"
- session_label: "codebase-verify"
- save_path: ".agent/fill-cards/sc{NNN}-codebase-verify-result.json"

b) Intercom evidence audit (multi-pass):

Compose the prompt: python3 box/compose-delegation.py intercom-audit --var card_what_section="$(cat /tmp/what.txt)". Parse the JSON output and pass prompt, output_instructions, and model to agenterminal.delegate. Pass ONLY the card's What section — the auditor must reason independently about search terms. Do not pass the Evidence section. Also pass:
- conversation_id: a unique ID for this delegate (e.g. verify-intercom-{NNN})
- session_file: ".agent/fill-cards/sc{NNN}.md"
- session_label: "intercom-audit"
- save_path: ".agent/fill-cards/sc{NNN}-intercom-audit-result.json"

Dispatch both in the same message (they have no dependency). Each delegate must have a unique conversation_id — the server rejects a second delegate with an already-active conversation_id. Any unique string works (the server creates the conversation on the fly).
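
As a rough illustration of the two dispatches above, the sketch below assembles the argument set for each delegate. `build_delegate_args` is a hypothetical helper, and the `composed` dict stands in for the parsed JSON from compose-delegation.py. The field names and path patterns are taken from the steps above; nothing here calls agenterminal.delegate itself.

```python
def build_delegate_args(composed: dict, kind: str, card_number: int) -> dict:
    """Merge the composed prompt fields with the session-persistence fields.

    kind is "codebase" or "intercom".
    """
    labels = {"codebase": "codebase-verify", "intercom": "intercom-audit"}
    label = labels[kind]
    return {
        "prompt": composed["prompt"],
        "output_instructions": composed["output_instructions"],
        "model": composed["model"],
        # Unique per delegate, so the server does not reject the second dispatch.
        "conversation_id": f"verify-{kind}-{card_number}",
        "session_file": f".agent/fill-cards/sc{card_number}.md",
        "session_label": label,
        "save_path": f".agent/fill-cards/sc{card_number}-{label}-result.json",
    }
```

Deriving conversation_id from the verifier kind keeps the two IDs distinct by construction.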

After dispatching both delegates, append to the session file:

## STATUS: verifying ({timestamp})

The delegate tool writes each task_id and label to the session file automatically. The server writes each result JSON to save_path on completion, before the push.

Narrate the dispatch. Tell the user what you dispatched and roughly how long verification takes ("two verification agents running — codebase fact-check and Intercom audit — results will arrive as push messages"). Silence >2 minutes removes the user's ability to steer. Proved 2026-04-03: 22-minute silence, user had to ask "where are we."

As each [Delegate completed] or [Delegate terminal] push arrives:

1. Check the header for saved_to=. If present, the file is on disk.
2. Verify: ls {save_path}
3. If the file exists, append to the session file:
  
  {session_label} result saved ({timestamp})
  
  Task ID: {task_id} Path: {save_path} Status: {completed|timed_out}
4. If saved_to is absent from the header, append:
  
  {session_label} result save failed ({timestamp})
  
  Task ID: {task_id} Expected path: {save_path}
Codebase:
- If all_verified: true, pick 1-2 claim numbers from the delegate's report and run a fresh Read/Grep against the delegate's specific source_file and exact_evidence. Pre-delegation reads that confirm the same underlying facts do not count: the spot-check verifies the delegate's work product, not the facts themselves. Proved 2026-04-23: SC-1760, pre-delegation greps confirmed top-level claims but were substituted for post-collect verification of the delegate's 17 specific claim-evidence pairs.
- If all_verified: false, read the cited source_file yourself for each failed claim before presenting. The verifier's verified: false is a pointer to investigate, not a verdict to forward. If you already read the file earlier in the session, re-check your own reading against the claim; don't restructure your assessment around the verifier's conclusion. Then present findings. User decides whether to fix or ship as-is. Proved 2026-04-06: SC-1172, verifier flagged simplified param shape as failure; agent forwarded verdict without checking own prior read; human caught it.
- If the codebase verifier times out: do not simply proceed on the basis of files you read earlier. Run the inverse check: list every factual claim in the Architecture Context, then confirm each was verified against a primary source in this session. Any unverified claim is a reason to re-dispatch a narrower verifier (one codebase, fewer claims) or read the file yourself. For dense Architecture Context sections (20+ claims, multiple codebases), prefer splitting into two focused verifier dispatches from the start. Include negative assertions in the claim list. Claims like "no X exists in the codebase" from the delegate's not_found section need at least one targeted grep to convert trusted absence into verified absence. The inverse check pattern naturally lists positive claims but lets negative delegate claims pass unchecked. Proved 2026-04-20: "no bot actions" assertion from delegate passed unverified through the inverse check. Proved twice 2026-04-07: SC-64 and SC-1288 both timed out at 10 min.
Intercom audit: Two phases, in order. Do not assess before reading.
- Phase 1: Read all. Run the pre-composed read_commands from the auditor's output sequentially. These are --read ID1,ID2,ID3 commands grouping all IDs into batches of 3-5. Run every command; do not skip, truncate, or substitute partial reads (head, limit, first-N-lines). Do not assess relevance during this phase: read first, think second. The auditor's snippets and relevance classifications are proxies; you cannot judge what a conversation adds from a one-line snippet. Proved 2026-04-15: three effort substitution activations on the same surface (partial reads, filtering to "strongest 6," pre-judging from snippets). Human corrected all three.
- Phase 1 gate: count and narrate before proceeding. After each batch command, state: "Batch N of M: [conversation IDs read]. [Per-conversation signal classification: strong/weak/tangential + what it adds]." Then state the running count: "Read N of M audit batch commands." If N < M, run the remaining before proceeding to Phase 2. The per-batch summary serves two functions: it makes batch momentum visible (narration dropping from substantive to silent is the tell), and it forces per-conversation judgment rather than inheriting the auditor's relevance labels. Proved 2026-04-20 (session dd8e8dc1): assessed batch 1 substantively, went silent for batches 2-6 despite naming the completion bias risk in writing before starting. Also proved 2026-04-15 (session 278a42e5): advisory "run every command" was in the play text, read 6/27, concluded from proxies. Full reads added 10 meaningful conversations.
- Phase 2: Assess what the evidence says about the recommendation. Only after reading all conversations, determine two things: (1) which add distinct information to the evidence block, and (2) whether the evidence collectively says anything about whether the card's recommendation is correct. Artifact completeness (how many conversations to link) is not the only output of verification; investigation depth (what do these conversations mean for the card) matters more. Link every signal conversation classified as evidence per the schema ("link all signal conversations, not a sample"). Present findings to user. User decides whether to add them to the card.
- After assessment, regenerate the Intercom evidence block with all signal conversation IDs from filter + audit. Noise exclusions from the Sonnet-mode filter carry forward; audit additions are included. Exclude only false positives (wrong feature, wrong product). The schema handles >10 via the comment format. Don't filter to 'strongest': selectivity instinct conflicts with schema completeness. Proved 2026-04-20: presented 9 curated of 29 found; human redirected 'follow the schema fully.'
- If total_found == 0, note "Multi-pass audit: no additional signal" in the checklist.
- Proved 2026-04-06: assessed 9 conversations from auditor snippets, recommended ship without reading any. Human caught it twice. Proved 2026-04-14: read 5 of 22, declared rest redundant from snippets. Human challenge unlocked 3 material findings (Safari confirmed broken by eng, returning user case, international domains).
Spot-check before recommending ship. Before saying "ship as-is," make at least one Read or Grep call that could falsify the recommendation. Delegate reports that read well are exactly when proxy trust activates. Proved 2026-04-03: recommended ship based entirely on delegate output without a single primary-source check.

Do not auto-fix failed codebase claims or auto-add Intercom conversations. Present findings, let the user decide.
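
The codebase branch above reduces to a small selection rule. The sketch below is one way to express it, assuming the verifier result is a dict with a top-level `all_verified` flag and a `claims` list whose entries carry `verified`, `source_file`, and `exact_evidence`. The `claims` key and the list shape are assumptions, and `claims_to_recheck` is a hypothetical name.

```python
def claims_to_recheck(result: dict, spot_checks: int = 2) -> list[dict]:
    """Pick the claims the primary instance must re-read against source.

    all_verified true: a small spot-check sample of the delegate's claims.
    all_verified false: every failed claim, since a failure is a pointer
    to investigate rather than a verdict to forward.
    """
    claims = result.get("claims", [])
    if result.get("all_verified"):
        # Taking the first entries is an arbitrary choice for this sketch.
        return claims[:spot_checks]
    return [claim for claim in claims if not claim.get("verified")]
```

The function only selects what to read. The reads themselves, and the decision to fix or ship, stay with the primary instance and the user.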

9. Re-approval (if the card changed after initial `approve_content`)

If the card was modified after the initial approve_content — verification fixes, evidence additions, schema restructuring, any edits — re-present via approve_content before proceeding to ship. The initial approval covered the pre-verification draft, not the post-fix version. The user needs to see and approve what will actually be pushed.

Before editing, enumerate ALL changes. List every change from both (1) verifier failures and (2) audit findings, numbered. Then edit. Check each off. If the edit count < the list count, something was dropped. "Just fixing line numbers" after an audit that found substantive new content is effort substitution — the assessment scope must carry through to the execution scope. Proved 2026-04-20: SC-1657, Anna's conversation assessed as "important context" then silently excluded from the re-draft.

Re-approval cycle: Read → Edit → Read → Submit. Read the saved file at the saved_path from the previous approval. Edit the specific change. Re-read to verify no other instances of the problem remain (e.g., grep for all local file paths, not just the one flagged). Then submit. Do not reconstruct the card content from context memory — context memory is a proxy for the file, and proxy trust activates on your own prior output. Proved 2026-04-08: SC-1311.

Quantitative consistency gate. When any count, date, or quantitative claim changes in the re-edit, grep the full document for all instances of the old value before re-submitting. Downstream claims that derive from the changed number (e.g., "12 of 21" when the total changed to 30) need recalculation, not just find-and-replace. This is a mechanical pre-step, not a post-flag recovery. Proved 2026-05-15: evidence block updated from 21 to 30 conversations; "12 of 21" in Architecture Context and "12 Intercom conversations from ~11 distinct users" in What section left stale; required a third approval cycle after monitor flag.
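
The grep step in the quantitative consistency gate can be sketched as follows. `stale_lines` is a hypothetical helper, shown only to illustrate matching the old value as a whole number so that a search for 21 does not also hit 210.

```python
import re


def stale_lines(document: str, old_value: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that still mention old_value.

    The value is matched as a whole number: digits directly before or
    after it disqualify the match.
    """
    pattern = re.compile(rf"(?<!\d){re.escape(old_value)}(?!\d)")
    return [
        (number, line)
        for number, line in enumerate(document.splitlines(), start=1)
        if pattern.search(line)
    ]
```

Each hit still needs a human decision. A derived claim such as "12 of 21" has to be recalculated against the new total, not rewritten by find-and-replace.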

10. Completion (after verification passes or user accepts gaps)

All four steps, every time. This is mechanical, not conditional.

If the title needs to change, plan the title mutation alongside the ship command before executing either. shortcut-ship.py handles description + state + unassign but not title. A separate PUT is needed. Build the complete mutation set first, then execute sequentially with verification between each. Proved 2026-04-08: SC-1238 shipped with old title, required a follow-up mutation to correct.

Run python3 box/ship-gate.py enter before the first production execute_approved call (the hook will block production mutations until the plan is declared). The dry-run in step 1 below is exempt.

1. Pre-ship check (dry-run). Re-fetches the card's current state from Shortcut and shows what the ship command will change. Present the output to the user and wait for go-ahead before proceeding.
```
python3 box/shortcut-ship.py SC-NNN <saved_path> --severity N --dry-run
```
- Exit 0: clean. Present the output, wait for go-ahead.
- Exit 3: backward state transition. The card has moved past Backlog since you last fetched it (e.g. someone moved it to Near Term, In Build, or In Test while you were investigating). Present the warning. Suggest --description-only to update content without changing state. Do not ship without explicit user confirmation of the state change.
2. Ship. Submit via agenterminal.execute_approved:
```
python3 box/shortcut-ship.py SC-NNN <saved_path>
```
The script strips YAML frontmatter automatically. The saved file at .agenterminal/approved/card-draft/scNNN.md is compaction insurance and audit trail. The script handles all four operations in a single call: update description, move to Backlog, unassign owners, then re-fetch and verify (state, owners, section headers, description length). Exit code 0 = verified, 1 = verification failed, 2 = mutation failed, 3 = backward state transition blocked (use --force to override). Do not construct payloads or curl commands manually.
3. Evidence comment (when >10 conversations). If the Intercom evidence block says "full list in comment," post the comment via shortcut-comment.py SC-NNN "text". Route through execute_approved. This is part of shipping, not a follow-up. Verify the comment appears on the card. Proved 2026-04-28: declared "shipped" with evidence comment still pending; human correction required.
4. If any new PostHog event names were discovered during investigation, add them to the appropriate section in box/posthog-events.md (Grep for the section header). If the product area has no section, create one at the bottom and add it to the index at the top of the file.
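
The exit codes listed for shortcut-ship.py can be kept in one lookup so the meaning is never recalled from memory. This is an illustrative sketch: the code-to-meaning mapping comes from the ship step above, while `describe_ship_exit` and `may_declare_shipped` are hypothetical names. It interprets an exit code and does not run the script.

```python
SHIP_EXIT_CODES = {
    0: "verified",
    1: "verification failed",
    2: "mutation failed",
    3: "backward state transition blocked",
}


def describe_ship_exit(code: int) -> str:
    """Translate a shortcut-ship.py exit code into its documented meaning."""
    return SHIP_EXIT_CODES.get(code, f"unexpected exit code {code}")


def may_declare_shipped(code: int, evidence_comment_pending: bool) -> bool:
    """Shipped means exit 0 and no evidence comment still waiting to be posted."""
    return code == 0 and not evidence_comment_pending
```

The second function encodes the rule that the evidence comment is part of shipping: a clean exit with the comment still pending is not yet "shipped."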

Idempotency

Re-fetch current description from Shortcut before each update (don't use stale cache)
Only update sections Paul explicitly changed — preserve everything else

Parallel Fill Mode (--lead)

When $ARGUMENTS is --lead, you are the lead instance in a two-instance parallel fill session.

When to use parallel fill

3-4 cards to fill, at least two product-area clusters, agenterminal available. For 1-2 cards, single-instance /fill-cards SC-NNN is simpler. For 5+ cards, run multiple rounds rather than overloading one session.

Card limit

Three straightforward cards per instance. The original 2-card limit was calibrated for ~200K context where both instances compacted at card 3. With 1M context, both instances completed 3 cards without compaction. The qualifier matters: cards were pre-selected for low ambiguity (narrow scope, clear investigation path). High-ambiguity cards (pricing strategy, feature sunset, cross-cutting architecture) consume significantly more context per card. For mixed batches, count high-ambiguity cards as 2 toward the limit.
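
The weighting rule for mixed batches is simple arithmetic. A minimal sketch, with hypothetical names, where each card is represented only by whether it is high-ambiguity:

```python
CARD_LIMIT = 3  # per instance, from the card limit above


def weighted_load(high_ambiguity_flags: list[bool]) -> int:
    """Each card counts 1; a high-ambiguity card counts 2."""
    return sum(2 if high else 1 for high in high_ambiguity_flags)


def within_limit(high_ambiguity_flags: list[bool]) -> bool:
    """True when the weighted card count fits the per-instance limit."""
    return weighted_load(high_ambiguity_flags) <= CARD_LIMIT
```

Under this rule one high-ambiguity card plus one straightforward card fills an instance, and adding a third card of any kind exceeds the limit.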

Splitting

Cluster by product area, not random. Architecture knowledge compounds across cards in the same area: the second card is faster because the mental model from the first card carries over. If there's a natural "dense cluster + everything else" shape, that's the split. Example: SmartPin cluster (SC-135, SC-51, SC-68, SC-132) vs mixed bag (SC-90, SC-118, SC-131).

Lead coordination protocol

Find candidates using Step 1 above.
Propose the product-area cluster split to Paul.
Create an agenterminal conversation thread and brief both assignments.
Join the thread and post the briefing before starting pre-flight. Do not begin pre-flight until the partner instance has acknowledged in the thread.
Work your own assigned cards using Steps 2-9 above.
Own session-end documentation for this parallel session.

Coordination rules

The conversation thread is for the initial briefing and session-end sync. Don't post running status updates to the thread. Every message costs context in both instances for information the user already has from the session panes.
When a conversation notification arrives, respond before continuing tool work. A 10-second acknowledgment costs less than the trust damage from silence. This applies even mid-investigation.
Intercom search index refresh only needs to happen once. First instance to start runs it; second instance skips.
Paul sees both session panes directly and reviews/approves cards in whatever order he prefers.

Compaction risk

Two instances means twice the compaction exposure. The compaction gate hooks are the primary protection: if an instance compacts, its tools lock until explicitly resumed. The 3-card limit accounts for this; don't exceed it even if context feels comfortable mid-session.

Session-end

Each instance saves its own log observations and session notes. If an instance compacts before session-end, its takeaways are lost. This is acceptable for 3 cards of observations. The other instance and the human together can reconstruct what matters.

Parallel Fill Mode (--partner)

When $ARGUMENTS is --partner, you are the partner instance in a two-instance parallel fill session.

Partner coordination protocol

Join the agenterminal conversation thread.
Read the lead's briefing to get your assigned cards.
Acknowledge in the thread before starting pre-flight. The lead instance waits for this before beginning its own work.
Work your assigned cards using Steps 2-9 above.

Coordination rules

Same rules as the lead instance:

Thread is for briefing and session-end sync only. No running status updates.
Respond to conversation notifications before continuing tool work.
Skip Intercom search index refresh if the lead already ran it.

Everything else is the standard per-card protocol. Pre-flight, investigation, verification bar, quality gate, approval flow, completion steps: all identical to single-card mode.

name	fill-cards
description	Investigation-driven card grooming — investigate across data sources, synthesize findings into card content, present for approval
disable-model-invocation	true

Constraints

Mutation gate: A PreToolUse hook blocks all Slack and Shortcut mutations through Bash. Route through agenterminal.execute_approved or present the command for the user to run.
Human-in-the-loop: Present one card at a time with full content rendered. Wait for Paul's decision. Don't batch-execute without review.
Context protection: Delegate codebase exploration and high-volume Intercom filtering to subagents. Primary context is for targeted verification and synthesis, not broad exploration.
Mutation cap: 25 mutations per run. Each description update, state change, story link, or archive = 1. Finish the current item before checking the cap.
Rate limiting: 0.5s delay between sequential API calls. Respect Retry-After.
Before Shortcut or Slack API calls: use the wrapper scripts named in the play steps (Intercom: intercom-search.py, intercom-evidence.py; Shortcut: see Mutation Scripts table in box/shortcut-ops.md). If no wrapper exists for the specific operation, grep reference/tooling-logistics.md for the recipe. Don't re-derive payload shapes or endpoint paths.
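
The mutation cap and rate-limiting constraints above can be sketched as two small helpers. The names are hypothetical, and the sketch assumes Retry-After arrives as a number of seconds; the HTTP-date form of that header is not handled and falls back to the base delay.

```python
from typing import Optional

MUTATION_CAP = 25
BASE_DELAY_SECONDS = 0.5


def next_delay(retry_after: Optional[str]) -> float:
    """Seconds to wait before the next sequential API call.

    Assumption: retry_after is the raw header value in seconds, or None
    when the header is absent.
    """
    if retry_after is None:
        return BASE_DELAY_SECONDS
    try:
        return max(BASE_DELAY_SECONDS, float(retry_after))
    except ValueError:
        # Not a plain number (for example an HTTP date): use the base delay.
        return BASE_DELAY_SECONDS


def cap_reached(mutations_done: int) -> bool:
    """Check the cap only after the current item is finished."""
    return mutations_done >= MUTATION_CAP
```

A caller would sleep for `next_delay(...)` between calls and consult `cap_reached` between items, never in the middle of one.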

Steps

0. Resume check (single-card mode only)

Before anything else:

ls .agent/fill-cards/ 2>/dev/null

If sc{NNN}.md exists for the card you are about to work:

Read it in full.
Re-set TodoWrite items from the Investigation Plan section — these do not survive compaction and must be restored for step 4b accounting to work.
Report current state to the user based only on what is written in the session file. Do not use the compaction summary. Name which steps completed, what was found, and where the session ended.
Ask whether to resume from that point or start fresh.
Wait for explicit direction before proceeding.

If the user chooses "start fresh": archive the existing file before proceeding:

mv .agent/fill-cards/sc{NNN}.md .agent/fill-cards/sc{NNN}.bak.$(date +%s).md

Then continue to Step 1. Preflight will create a new session file.

If no session file exists, continue to Step 1.
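
The resume check above comes down to one existence test and one archive name. The sketch below mirrors the ls and mv commands with hypothetical helper names; it computes paths only and moves nothing.

```python
import time
from pathlib import Path
from typing import Optional

SESSION_DIR = Path(".agent/fill-cards")


def session_file(card_number: int) -> Path:
    """Path of the session file for one card."""
    return SESSION_DIR / f"sc{card_number}.md"


def archive_name(card_number: int, now: Optional[int] = None) -> Path:
    """Backup path matching the mv command above: sc{NNN}.bak.<epoch seconds>.md."""
    stamp = int(time.time()) if now is None else now
    return SESSION_DIR / f"sc{card_number}.bak.{stamp}.md"


def needs_resume_prompt(card_number: int) -> bool:
    """True when a session file exists and the user must choose resume or fresh."""
    return session_file(card_number).exists()
```

The epoch-seconds suffix keeps repeated fresh starts from overwriting earlier archives.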

1. Find candidates (single-card mode)

If $ARGUMENTS is SC-NNN, skip candidate discovery — you already have the card.

Otherwise, find cards with empty sections:

python3 box/shortcut-cards.py --state "In Definition" --empty-sections --summary

This ranks by most empty sections first, then oldest.

2. Pre-flight

Run these before starting investigation. Not optional.

Fetch the card: python3 box/shortcut-cards.py --id SC-NNN --description. Check the workflow state: Need Requirements cards get context to support a stakeholder conversation. In Definition cards get the full treatment (investigation, solution sketch, all sections).
Bug or feature? Bug cards use a leaner template (skip blank Monetization, UI, Reporting, Release sections).
New feature or extension? Frame as extension when possible. Identify the closest existing feature surface.
- Same-day sibling cards: If other cards in the same product area were created the same day (brainstorm batch, investigation batch, or related bug filings), read all of them — not just those already in story links. Scope overlap and disambiguation needs are common and not caught by the story-links check alone. Proved SC-1517: SC-1518 created same day, same area, required explicit disambiguation but wasn't in story links.
- Check In Build cards in the same product area. A card actively changing pricing or feature behavior (e.g. SC-46 adding credit costs) invalidates assumptions on sibling cards. Fetch In Build cards for the product area before starting investigation.
Visual/UX cards: Ask for screenshots of the actual behavior before investigating the code mechanism. Code tells you what can happen; a screenshot tells you what does happen. Screenshots are the primary source for visual issues; code is secondary.
Run pre-flight checks (hard gate):
```
python3 box/preflight.py --product-area AREA
```
Replace AREA with the card's product area (e.g. SMARTPIN, TURBO). This verifies all tokens, DB connectivity, sync freshness, and extracts known PostHog event names for the product area. Results are automatically logged to the preflight_log audit table.

If preflight exits non-zero, stop. Do not proceed to investigation. A stale Intercom index means any "no signal" finding is unreliable. Report the failure to the user and wait for resolution before continuing. This is not advisory — investigating on stale data produces false negatives that are silent, permanent, and undetectable after the fact.

Preflight output includes PostHog event names for the product area. Use these in step 4 instead of reading box/posthog-events.md separately.

After preflight passes, scan .claude/rules/product-knowledge.md for entries relevant to this card's product area. Note any feature state, entity scope, or vocabulary facts that apply to the investigation.

Run preflight with session file. Pass --session-file .agent/fill-cards/sc{NNN}.md to the preflight command. Preflight creates the file if it does not exist:

python3 box/preflight.py --product-area AREA --card-id SC-NNN --session-file .agent/fill-cards/sc{NNN}.md

3. Plan the investigation

Write the plan to the session file. After setting todos, append to the session file:

## Investigation Plan
- {source 1}: {question}
- {source 2}: {question}

This is the cheapest recovery artifact: a fresh agent can restore todos from this section without replaying any investigation steps.

Example plans:

## Investigation Plan
- Intercom: delegate filter (Sonnet mode, >8 expected candidates)
- Codebase: delegate explore (3 axes: autosave architecture, event emission, FormResetHandler)
- PostHog: direct (iterative queries)

or:

## Investigation Plan
- Intercom: direct search (≤8 expected, narrow keyword)
- Codebase: direct (2 reads: grep for event name, read emit file)
- PostHog: direct (single funnel query)

4. Investigate

Intercom

Three-phase action through scripts. No raw SQL.

Search: python3 box/intercom-search.py "terms" (add --since, --no-canned, --limit as needed). The script has a mechanical freshness gate — it blocks with exit code 2 if the index is stale (>36h).

Delegation decision tree (based on expected volume from Step 3 plan):
- ≤8 expected results → Search directly (intercom-search.py in primary context). Primary reads all results. No delegation overhead.
- >8 expected results → Delegate intercom-filter (Sonnet mode): python3 box/compose-delegation.py intercom-filter --var SEARCH_GOAL="..." --var NOISE_PATTERNS="..." --var MODEL_OVERRIDE=sonnet. Delegate searches, reads, classifies. Primary reads SIGNAL conversations only. Intercom audit (Step 8) catches missed signal.
- Unknown volume → Default to delegation (Sonnet mode). The >8 path is safe for small result sets (just more overhead). The ≤8 path is unsafe for large result sets (fills primary context).
For multi-phase investigations with file_path, read the template directly (box/intercom-filter-prompt.md).
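
The delegation decision tree above fits in one function. This is a sketch with a hypothetical name, where `None` stands for unknown volume:

```python
from typing import Optional

DIRECT_THRESHOLD = 8


def intercom_route(expected_results: Optional[int]) -> str:
    """Return "direct" or "delegate" for the Intercom search.

    None means the volume is unknown, which defaults to delegation:
    over-delegating costs overhead, while under-delegating fills the
    primary context.
    """
    if expected_results is None:
        return "delegate"
    return "direct" if expected_results <= DIRECT_THRESHOLD else "delegate"
```

The asymmetry is deliberate: the unknown case takes the path that is safe for both small and large result sets.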

Session persistence for intercom-filter delegation. Call agenterminal.delegate, passing:
- conversation_id: (the active session conversation)
- session_file: ".agent/fill-cards/sc{NNN}.md"
- session_label: "intercom-filter"
- save_path: ".agent/fill-cards/sc{NNN}-intercom-filter-result.json"
The delegate tool writes the task_id and label to the session file automatically. The server writes the result JSON to save_path on completion, before the push.

Continue other investigation work. When a [Delegate completed] or [Delegate terminal] message arrives for this delegation:
1. Check the header for saved_to= — if present, the file is on disk.
2. Verify: ls .agent/fill-cards/sc{NNN}-intercom-filter-result.json
3. If the file exists, append to the session file:
  
  intercom-filter result saved ({timestamp})
  
  Task ID: {task_id} Path: .agent/fill-cards/sc{NNN}-intercom-filter-result.json Status: {completed|timed_out}
4. If saved_to is absent from the header, append:
  
  intercom-filter result save failed ({timestamp})
  
  Task ID: {task_id} Expected path: .agent/fill-cards/sc{NNN}-intercom-filter-result.json
Vocabulary shift (after first pass). Card titles anchor search terms toward product framing ("wrong site," "duplicate draft"). Users say "combined," "jumbled," "can't find," "manually made copies." After the first search pass, pause and ask: "how would a user describe this to support?" Run a second pass with user-vocabulary terms before declaring search complete. Logged 5 times (SC-201, SC-874, SC-1108, SC-67, SC-984) — the audit agent catches it every time because it only sees the What section, not the card title.

Vocabulary category checklist (second pass). Before declaring Intercom search complete, check each category:
- Symptom language ("scrambled," "mixed up," "wrong order," "lost")
- Use-case framing ("editorial calendar," "campaign," "seasonal content")
- Non-English terms if the feature has international users (German, Spanish)
- Workaround language ("manually," "one by one," "how do I fix")
The primary search anchors to product framing from the card title; these categories reach users who describe the need, not the feature. Proved 2026-04-22: primary search (6 queries, English product terms) found 1 conversation; audit (20 queries including German, use-case framing) found 5 additional.
Check shipped-card timeline. Before citing Intercom conversations as current evidence, check whether a related card shipped since those conversations. Use --since to filter to post-ship dates. Pre-ship conversations about a now-fixed issue inflate the evidence count for a problem that no longer exists. Proved 2026-04-08: multiple Jan-Feb credit purchase complaints predated SC-470 (shipped Mar 18).
Read: python3 box/intercom-search.py --read ID1,ID2 for the conversations going on the card. Search snippets are not evidence. Read the actual conversation text before citing it. Long conversations (>30 parts): scan the final 5-10 exchanges separately. Topic transitions cluster in later parts as customers report new issues in existing threads. Classifying by opening topic misses these. Proved SC-1541: conversation classified as "engagement tracking" from opening messages; later parts contained support confirming the pin-add bug.

Cross-reference ambiguous conversations with PostHog. When a bug has a trigger condition visible in PostHog person properties (billing version, subscription state, feature flags), look up the Intercom contact's email in PostHog to confirm they match the bug's profile. Resolves "could be this bug" into confirmed or ruled out. Proved SC-1517: turned 1 confirmed + 2 ambiguous conversations into 4 confirmed matches across 8 months.
Evidence block: Use python3 box/intercom-evidence.py to generate the structured evidence block for the card. Records search date, index freshness, terms used, and links to every signal conversation. See the Intercom evidence schema in box/shortcut-ops.md for format details. Every card gets one of three blocks: searched/signal, searched/no-signal, or not-searched with rationale. Link all signal conversations classified as evidence, not a sample.
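
The shipped-card timeline check described earlier in this section amounts to splitting conversations around a ship date. A minimal sketch, assuming each conversation is known by ID and date; `split_by_ship_date` is a hypothetical helper and does not replace the --since flag on the search script.

```python
from datetime import date


def split_by_ship_date(
    conversations: dict[str, date], shipped_on: date
) -> tuple[list[str], list[str]]:
    """Split conversation IDs into pre-ship and post-ship groups.

    Assumption: a conversation dated on the ship date counts as post-ship.
    """
    pre = sorted(cid for cid, day in conversations.items() if day < shipped_on)
    post = sorted(cid for cid, day in conversations.items() if day >= shipped_on)
    return pre, post
```

Only the post-ship group supports a claim that the problem still exists. The pre-ship group can still be cited as history, labeled as such.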

Codebase

After collecting: read the specific files the subagent identified that bear on card claims. Primary context is for targeted verification, not broad exploration.

Post-delegation verification reads are exempt. Reading files the subagent identified to verify claims against primary sources is expected and does not count against the budget.

Session persistence for codebase delegation. Call agenterminal.delegate, passing:

conversation_id: (the active session conversation)
session_file: ".agent/fill-cards/sc{NNN}.md"
session_label: "codebase-explore"
save_path: ".agent/fill-cards/sc{NNN}-codebase-result.json"

The delegate tool writes the task_id and label to the session file automatically. The server writes the result JSON to save_path on completion, before the push.

Continue other investigation work. When a [Delegate completed] or [Delegate terminal] message arrives for this delegation:

1. Check the header for saved_to=. If present, the file is on disk.
2. Verify: ls .agent/fill-cards/sc{NNN}-codebase-result.json
3. If the file exists, append to the session file:
  
  codebase-explore result saved ({timestamp})
  
  Task ID: {task_id} Path: .agent/fill-cards/sc{NNN}-codebase-result.json Status: {completed|timed_out}
4. If saved_to is absent from the header, append:
  
  codebase-explore result save failed ({timestamp})
  
  Task ID: {task_id} Expected path: .agent/fill-cards/sc{NNN}-codebase-result.json
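
The push-handling steps above follow the same shape for every delegation, so the session-file entry can be produced mechanically. The sketch below assumes the push header is a single string and that the saved_to= value contains no whitespace; the real header format is not specified here, and both function names are hypothetical. It covers the header check only, so the ls verification still has to be run separately.

```python
import re
from typing import Optional


def saved_path_from_header(header: str) -> Optional[str]:
    """Extract the saved_to= value from a push header, or None if absent."""
    match = re.search(r"saved_to=(\S+)", header)
    return match.group(1) if match else None


def session_entry(label: str, task_id: str, expected_path: str,
                  header: str, status: str, timestamp: str) -> str:
    """Format the session-file entry for a saved or failed result."""
    if saved_path_from_header(header) is not None:
        return (f"{label} result saved ({timestamp})\n\n"
                f"Task ID: {task_id} Path: {expected_path} Status: {status}")
    return (f"{label} result save failed ({timestamp})\n\n"
            f"Task ID: {task_id} Expected path: {expected_path}")
```

Writing the failure entry explicitly matters as much as the success entry: a missing result that is never recorded looks identical to one that was never dispatched.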

PostHog

Write insight permalinks to the session file. After saving each insight, append to the session file:

## PostHog
- {event name} ({date}): {permalink}

Required manual write. The permalink is the only PostHog artifact that survives compaction.

Slack

Read thread content for cards that originated from Slack ideas. Use:

python3 box/slack-scanner.py --channel C0ADJ4ATJE4 --threads <permalink>

Check for file attachments. The scanner's files array lists images and screenshots. Download and view any screenshots:

curl -s -H "Authorization: Bearer $TOKEN" <url_private> -o /tmp/file.png

For visual/UX issues, screenshots are primary sources that code alone cannot substitute.

Jam recordings

4b. Investigation accounting

Before synthesizing, produce a structured accounting block:

Codebase files read directly: list with checkmarks and read method (full file, targeted sed range, grep). Separate investigation reads (discovering facts) from verification reads (confirming delegate findings). If a git show returned [...Nch omitted...], the file is partially read: note the omitted range and read it before marking complete. Proved 2026-04-23: delegate line-number labels filled in for unread sections, survived the accounting step, and produced three monitor escalations.
Codebase budget check: count investigation reads (not verification reads). Format:
```
### Budget check
Investigation reads (independent):
- [x] file1.ts (grep for event name)
- [x] file2.ts (read emit site)
Total: 2 (budget: 3)
Verification reads (delegate-identified):
- [x] file3.ts (confirmed delegate finding)
Total: 1 (exempt)
Budget status: WITHIN BUDGET (2/3)
```
If investigation reads exceed 3: flag as OVER BUDGET. The per-read categorization makes the count auditable — a reviewer can check whether a read classified as "verification" was actually independent investigation.
PostHog queries run: count, insights saved
Intercom: searches run (count), conversations read in full (list IDs), selection rationale if filtering from a larger set. When parallel searches run in the same turn, count each Bash call individually, not the turn count. Proved 2026-04-23: 3 turns × 2 parallel calls = 6 searches, miscounted as 5. When reads were batched, verify per-batch: "Batch N of M: [count] classified [inline at #turn / gap]." A gap entry forces explicit acknowledgment rather than silent rounding to "all." Proved 2026-04-28: "all narrated with per-conversation classification" written when 10/29 conversations (batches 5-6) had no individual classification text; monitor caught at high severity.
Slack: threads read
Planned but NOT executed: list anything from the investigation plan that was skipped, with rationale. If empty, write "None." Absorbing a gap into affirmative framing ("captured via X" when X was not a direct read) defeats the accounting purpose. Proved 2026-04-22: CS squad thread read planned, never executed; accounting said "captured via cross-channel link" instead of marking the gap. If a monitor injection about investigation gaps is in context, list the flagged items here and resolve before declaring complete. The accounting checkpoint is where completion bias activates on the verification process itself. Proved 2026-04-24: wrote "None" with a live medium-severity monitor flag about unverified SmartPin Generated/Queued emit sites.
Commitments completed/not: anything promised during investigation that was or was not done
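As a rough sketch of the budget check and the search-counting rule above, the block can be rendered from two explicit lists so the per-read categorization stays auditable. The function names and the `(path, note)` pair shape are assumptions for illustration; only the output format and the budget of 3 come from the text.

```python
def render_budget_check(investigation, verification, budget=3):
    """Render the budget-check block from two lists of (path, note) pairs."""
    lines = ["### Budget check", "Investigation reads (independent):"]
    lines += [f"- [x] {path} ({note})" for path, note in investigation]
    lines.append(f"Total: {len(investigation)} (budget: {budget})")
    lines.append("Verification reads (delegate-identified):")
    lines += [f"- [x] {path} ({note})" for path, note in verification]
    lines.append(f"Total: {len(verification)} (exempt)")
    # Only investigation reads count against the budget.
    status = "OVER BUDGET" if len(investigation) > budget else "WITHIN BUDGET"
    lines.append(f"Budget status: {status} ({len(investigation)}/{budget})")
    return "\n".join(lines)


def count_searches(calls_per_turn):
    """Count individual search calls, not turns (3 turns x 2 calls = 6)."""
    return sum(calls_per_turn)
```

Keeping the two lists separate at the call site is the point: a reviewer can challenge a single entry's classification without recounting everything.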

Write the accounting block to the session file before advancing to synthesis. Do not proceed to Step 5 until this write is complete. Append verbatim:

## Accounting Block ({timestamp})
{full accounting block}

Then append:

## STATUS: accounting ({timestamp})

A fresh agent reading this file can determine exactly what was investigated without replaying any steps. This is the highest-value recovery artifact.

5. Synthesize

Synthesize into clear, concise product writing (bullet points, no jargon inflation)
Don't add ideas Paul didn't express, don't drop ideas he did
Removal/cleanup cards need a two-pass audit. Pass 1: "where is X mentioned?" (content inventory). Pass 2: "what breaks if we remove X?" (risk scan, e.g. URL::route() calls that would 500 on route removal). These hit different files. The second pass can be a separate delegation.
Black hat test on acceptance criteria. After writing completion criteria, ask: "could someone pass these checks without doing the work?" Narrow definitions and require human approval for judgment calls. Allowlists need closed boundaries.

6. Story links

After filling a card, search Shortcut for non-archived cards that share infrastructure, prerequisites, or overlapping scope. Propose links with the right verb:

| Verb | Direction | When to use |
| --- | --- | --- |
| `blocks` | Subject blocks Object | Object cannot ship without Subject being done first. Test: "Could Object ship if Subject didn't exist?" If no, it blocks. |
| `relates to` | Bidirectional | Cards share infrastructure, overlap in scope, or inform each other, but neither is a prerequisite. |
| `duplicates` | Subject duplicates Object | Used in Find Dupes play. Subject is the loser (gets archived). |

Default to relates to unless there's a genuine prerequisite dependency. Present proposed links alongside the card draft for approval.

For story link creation, use shortcut-mutate.py story-link:

python3 box/shortcut-mutate.py story-link SUBJECT_ID OBJECT_ID "relates to"
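The verb choice above can be written down as a tiny decision function. This is a sketch under stated assumptions: the function names and boolean parameters are hypothetical, and the second helper only builds the argument list for the `story-link` command shown above; it does not run it, since mutations still go through `execute_approved`.

```python
VERBS = ("blocks", "relates to", "duplicates")


def choose_link_verb(object_can_ship_without_subject=True, is_duplicate=False):
    """Pick a story-link verb; defaults to "relates to"."""
    if is_duplicate:
        return "duplicates"
    # Test from the table: "Could Object ship if Subject didn't exist?"
    if not object_can_ship_without_subject:
        return "blocks"
    return "relates to"


def story_link_argv(subject_id, object_id, verb):
    """Build (but do not execute) the story-link command as an argv list."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    return ["python3", "box/shortcut-mutate.py", "story-link",
            str(subject_id), str(object_id), verb]
```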

7. Present for approval

After approve_content returns, append to the session file:

## Draft approved ({timestamp})
Path: .agenterminal/approved/card-draft/sc{NNN}.md

## STATUS: approved ({timestamp})

### Pre-approval checklist

Related cards (all story links): SC-15, SC-68, SC-44
PostHog insights saved: [SmartPin adds](link), [Add distribution](link)
Intercom evidence block: schema-compliant (searched/no-signal/not-searched with date, index freshness, terms, all signal conversation IDs)
Codebase: all file paths on card read directly, no subagent-only claims

Card metadata: product area set, story links created
Severity (bug cards): Sev N (Level) — state the discriminator answer (see `reference/severity-framework.md`). Must be assessed here, not deferred to ship time.
Quality gate:
- Problem before solution
- Scoping-ready
- Verifiable evidence
- Observable done state

Checklist field definitions:

Card metadata: product area is set on the card (ship script requires it). Story links are proposed with verbs. Both verified before declaring ready to ship.
Related cards: list every card in the story links. All must be checked.
PostHog insights saved: every number cited on the card maps to a saved insight with a permalink. No ad-hoc query results without a saved link.
Intercom evidence block: the card uses the structured Intercom evidence schema from box/shortcut-ops.md. One of three formats: searched/signal, searched/no-signal, or not-searched with rationale. Must include: search date, index freshness date, all search terms used, and links to all signal conversations classified as evidence (not a sample). Generated via python3 box/intercom-evidence.py. Every cited conversation was read directly by the primary instance. IDs must be linked as full Intercom URLs (https://app.intercom.com/a/apps/2t3d8az2/inbox/inbox/all/conversations/{ID}), not plain text. "Verifiable" means the reader can click through, not construct the URL themselves. Subagent-classified conversations not yet verified by the primary instance are listed separately as unverified candidates and do not appear in the card's Evidence section. Verify counts before asserting. Run grep -c on the evidence file or comment file to confirm conversation counts match what you claim in the checklist and card body. Do not recall counts from memory — three count discrepancies in one session (2026-04-22) all came from recalling rather than counting.
Codebase: every file path named on the card was read directly in this session. No claims backed only by subagent output.
Quality gate: the four criteria from the Card Quality Gate section.
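Two of the Intercom rules above (full clickable URLs, and counting instead of recalling) are easy to sketch. The helper names are hypothetical, and the assumption that conversation IDs are numeric is mine; the URL template is the one given in the checklist definition. Note that this counts distinct linked IDs, which is not the same as `grep -c` (that counts matching lines).

```python
import re

INTERCOM_URL = (
    "https://app.intercom.com/a/apps/2t3d8az2/inbox/inbox/all/conversations/{id}"
)


def conversation_links(ids):
    """Turn conversation IDs into full Intercom URLs."""
    return [INTERCOM_URL.format(id=i) for i in ids]


def count_linked_conversations(text):
    """Count distinct conversation IDs linked in evidence text.

    Assumption: IDs are digits only.
    """
    pattern = re.compile(r"/inbox/inbox/all/conversations/(\d+)")
    return len(set(pattern.findall(text)))
```

Comparing `count_linked_conversations(evidence_text)` with the number stated in the checklist is one way to catch a recalled count before it reaches the card.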

8. Verification gate (after approve_content, before ship)

a) Codebase verification (existing):

Compose the prompt: python3 box/compose-delegation.py codebase-verify --var card_draft_path="$SAVED_PATH". Parse the JSON output and pass prompt, output_instructions, and model to agenterminal.delegate. Also pass:
- conversation_id: a unique ID for this delegate (e.g. verify-codebase-{NNN})
- session_file: ".agent/fill-cards/sc{NNN}.md"
- session_label: "codebase-verify"
- save_path: ".agent/fill-cards/sc{NNN}-codebase-verify-result.json"

b) Intercom evidence audit (multi-pass):

Compose the prompt: python3 box/compose-delegation.py intercom-audit --var card_what_section="$(cat /tmp/what.txt)". Parse the JSON output and pass prompt, output_instructions, and model to agenterminal.delegate. Pass ONLY the card's What section — the auditor must reason independently about search terms. Do not pass the Evidence section. Also pass:
- conversation_id: a unique ID for this delegate (e.g. verify-intercom-{NNN})
- session_file: ".agent/fill-cards/sc{NNN}.md"
- session_label: "intercom-audit"
- save_path: ".agent/fill-cards/sc{NNN}-intercom-audit-result.json"
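The two dispatches above share a shape, which can be sketched as one helper. This is illustrative only: `build_delegate_call` is a hypothetical name, and it assumes the composed JSON exposes `prompt`, `output_instructions`, and `model` as top-level keys, as the text implies. It builds the argument dictionary; the actual call to `agenterminal.delegate` is left out.

```python
import json


def build_delegate_call(composed_json, kind, card_number):
    """Merge composed prompt fields with per-delegate bookkeeping arguments.

    `composed_json` is the JSON text printed by compose-delegation.py.
    `kind` is "codebase" or "intercom".
    """
    labels = {"codebase": "codebase-verify", "intercom": "intercom-audit"}
    label = labels[kind]  # KeyError on an unknown kind is intentional
    composed = json.loads(composed_json)
    # Pass only the three fields named in the text; ignore anything else.
    call = {key: composed[key] for key in ("prompt", "output_instructions", "model")}
    call.update(
        conversation_id=f"verify-{kind}-{card_number}",
        session_file=f".agent/fill-cards/sc{card_number}.md",
        session_label=label,
        save_path=f".agent/fill-cards/sc{card_number}-{label}-result.json",
    )
    return call
```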

After dispatching both delegates, append to the session file:

## STATUS: verifying ({timestamp})

The delegate tool writes each task_id and label to the session file automatically. The server writes each result JSON to save_path on completion, before the push.

Narrate the dispatch. Tell the user what you dispatched and roughly how long verification takes ("two verification agents running — codebase fact-check and Intercom audit — results will arrive as push messages"). Silence >2 minutes removes the user's ability to steer. Proved 2026-04-03: 22-minute silence, user had to ask "where are we."

As each [Delegate completed] or [Delegate terminal] push arrives:

Check the header for saved_to= — if present, the file is on disk.
Verify: ls {save_path}
If the file exists, append to the session file:

{session_label} result saved ({timestamp})

Task ID: {task_id}
Path: {save_path}
Status: {completed|timed_out}

If saved_to is absent from the header, append:

{session_label} result save failed ({timestamp})

Task ID: {task_id}
Expected path: {save_path}

Codebase: If all_verified: true, pick 1-2 claim numbers from the delegate's report and run a fresh Read/Grep against the delegate's specific source_file and exact_evidence. Pre-delegation reads that confirm the same underlying facts do not count: the spot-check verifies the delegate's work product, not the facts themselves. Proved 2026-04-23: SC-1760, pre-delegation greps confirmed top-level claims but were substituted for post-collect verification of the delegate's 17 specific claim-evidence pairs.

If all_verified: false, read the cited source_file yourself for each failed claim before presenting. The verifier's verified: false is a pointer to investigate, not a verdict to forward. If you already read the file earlier in the session, re-check your own reading against the claim; don't restructure your assessment around the verifier's conclusion. Then present findings. User decides whether to fix or ship as-is. Proved 2026-04-06: SC-1172, verifier flagged simplified param shape as failure; agent forwarded verdict without checking own prior read; human caught it.

If the codebase verifier times out: do not simply proceed on the basis of files you read earlier. Run the inverse check: list every factual claim in the Architecture Context, then confirm each was verified against a primary source in this session. Any unverified claim is a reason to re-dispatch a narrower verifier (one codebase, fewer claims) or read the file yourself. For dense Architecture Context sections (20+ claims, multiple codebases), prefer splitting into two focused verifier dispatches from the start. Proved twice 2026-04-07: SC-64 and SC-1288 both timed out at 10 min.

Include negative assertions in the claim list. Claims like "no X exists in the codebase" from the delegate's not_found section need at least one targeted grep to convert trusted absence into verified absence. The inverse check pattern naturally lists positive claims but lets negative delegate claims pass unchecked. Proved 2026-04-20: "no bot actions" assertion from delegate passed unverified through the inverse check.

Intercom audit: Two phases, in order. Do not assess before reading.

Phase 1: Read all. Run the pre-composed read_commands from the auditor's output sequentially. These are --read ID1,ID2,ID3 commands grouping all IDs into batches of 3-5. Run every command; do not skip, truncate, or substitute partial reads (head, limit, first-N-lines). Do not assess relevance during this phase: read first, think second. The auditor's snippets and relevance classifications are proxies; you cannot judge what a conversation adds from a one-line snippet. Proved 2026-04-15: three effort substitution activations on the same surface (partial reads, filtering to "strongest 6," pre-judging from snippets). Human corrected all three.

Phase 1 gate: count and narrate before proceeding. After each batch command, state: "Batch N of M: [conversation IDs read]. [Per-conversation signal classification: strong/weak/tangential + what it adds]." Then state the running count: "Read N of M audit batch commands." If N < M, run the remaining before proceeding to Phase 2. The per-batch summary serves two functions: it makes batch momentum visible (narration dropping from substantive to silent is the tell), and it forces per-conversation judgment rather than inheriting the auditor's relevance labels. Proved 2026-04-20 (session dd8e8dc1): assessed batch 1 substantively, went silent for batches 2-6 despite naming the completion bias risk in writing before starting. Also proved 2026-04-15 (session 278a42e5): advisory "run every command" was in the play text, read 6/27, concluded from proxies. Full reads added 10 meaningful conversations.

Phase 2: Assess what the evidence says about the recommendation. Only after reading all conversations, determine two things: (1) which add distinct information to the evidence block, and (2) whether the evidence collectively says anything about whether the card's recommendation is correct. Artifact completeness (how many conversations to link) is not the only output of verification; investigation depth (what do these conversations mean for the card) matters more. Link every signal conversation classified as evidence per the schema ("link all signal conversations, not a sample"). Present findings to user. User decides whether to add them to the card.

After assessment, regenerate the Intercom evidence block with all signal conversation IDs from filter + audit. Noise exclusions from the Sonnet-mode filter carry forward; audit additions are included. Exclude only false positives (wrong feature, wrong product). The schema handles >10 via the comment format. Don't filter to 'strongest': selectivity instinct conflicts with schema completeness. Proved 2026-04-20: presented 9 curated of 29 found; human redirected 'follow the schema fully.'

If total_found == 0, note "Multi-pass audit: no additional signal" in the checklist.

Proved 2026-04-06: assessed 9 conversations from auditor snippets, recommended ship without reading any. Human caught it twice. Proved 2026-04-14: read 5 of 22, declared rest redundant from snippets. Human challenge unlocked 3 material findings (Safari confirmed broken by eng, returning user case, international domains).
Spot-check before recommending ship. Before saying "ship as-is," make at least one Read or Grep call that could falsify the recommendation. Delegate reports that read well are exactly when proxy trust activates. Proved 2026-04-03: recommended ship based entirely on delegate output without a single primary-source check.

Do not auto-fix failed codebase claims or auto-add Intercom conversations. Present findings, let the user decide.
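The Phase 1 gate described above is essentially a counter that refuses to advance while any batch lacks narration. A minimal sketch, assuming a hypothetical `AuditReadGate` class and 1-based batch numbers (neither is part of the play's tooling):

```python
class AuditReadGate:
    """Track which audit batches were read and narrated before Phase 2."""

    def __init__(self, read_commands):
        self.read_commands = list(read_commands)
        self.narration = {}  # batch number (1-based) -> per-conversation summary

    def record(self, batch_number, summary):
        """Record the narration for one batch; empty narration is rejected."""
        if not 1 <= batch_number <= len(self.read_commands):
            raise IndexError(f"no batch {batch_number}")
        if not summary.strip():
            raise ValueError("a batch without narration counts as a gap")
        self.narration[batch_number] = summary

    def gaps(self):
        """Batch numbers that have not been narrated yet."""
        total = len(self.read_commands)
        return [n for n in range(1, total + 1) if n not in self.narration]

    def running_count(self):
        return (f"Read {len(self.narration)} of "
                f"{len(self.read_commands)} audit batch commands.")

    def ready_for_phase_2(self):
        return not self.gaps()
```

Rejecting an empty summary is the design choice that matters here: it makes silent batches show up as explicit gaps instead of being rounded to "all."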

9. Re-approval (if the card changed after initial `approve_content`)

10. Completion (after verification passes or user accepts gaps)

All four steps, every time. This is mechanical, not conditional.

Run python3 box/ship-gate.py enter before the first production execute_approved call (the hook will block production mutations until the plan is declared). The dry-run in step 1 below is exempt.

Pre-ship check (dry-run). Re-fetches the card's current state from Shortcut and shows what the ship command will change. Present the output to the user and wait for go-ahead before proceeding.
```
python3 box/shortcut-ship.py SC-NNN <saved_path> --severity N --dry-run
```
- Exit 0: clean. Present the output, wait for go-ahead.
- Exit 3: backward state transition — the card has moved past Backlog since you last fetched it (e.g. someone moved it to Near Term, In Build, or In Test while you were investigating). Present the warning. Suggest --description-only to update content without changing state. Do not ship without explicit user confirmation of the state change.
Ship. Submit via agenterminal.execute_approved:
```
python3 box/shortcut-ship.py SC-NNN <saved_path>
```
The script strips YAML frontmatter automatically. The saved file at .agenterminal/approved/card-draft/scNNN.md is compaction insurance and audit trail. The script handles all four operations in a single call: update description, move to Backlog, unassign owners, then re-fetch and verify (state, owners, section headers, description length). Exit code 0 = verified, 1 = verification failed, 2 = mutation failed, 3 = backward state transition blocked (use --force to override). Do not construct payloads or curl commands manually.
Evidence comment (when >10 conversations). If the Intercom evidence block says "full list in comment," post the comment via shortcut-comment.py SC-NNN "text". Route through execute_approved. This is part of shipping, not a follow-up. Verify the comment appears on the card. Proved 2026-04-28: declared "shipped" with evidence comment still pending; human correction required.
If any new PostHog event names were discovered during investigation, add them to the appropriate section in box/posthog-events.md (Grep for the section header). If the product area has no section, create one at the bottom and add it to the index at the top of the file.
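The exit codes listed above map onto next actions roughly as follows. This is a sketch, not part of `shortcut-ship.py`: the function name and the wording of each action are assumptions, and any exit code the text does not define is treated as a reason to stop.

```python
def ship_next_step(exit_code, dry_run=False):
    """Translate a shortcut-ship.py exit code into the next action."""
    if dry_run:
        steps = {
            0: "clean: present the output and wait for go-ahead",
            3: ("backward state transition: present the warning, suggest "
                "--description-only, wait for explicit confirmation"),
        }
    else:
        steps = {
            0: "verified",
            1: "verification failed: report to the user",
            2: "mutation failed: report to the user",
            3: ("backward state transition blocked: --force only with "
                "explicit user confirmation"),
        }
    # Codes the text does not define are never treated as success.
    return steps.get(exit_code, f"unexpected exit code {exit_code}: stop and report")
```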

Idempotency

Re-fetch current description from Shortcut before each update (don't use stale cache)
Only update sections Paul explicitly changed — preserve everything else
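One way to picture the second rule is a section-level merge over the freshly fetched description. The sketch below assumes card sections are delimited by `## ` headers, which the text does not confirm, and both function names are hypothetical. Sections not named in `changed` pass through untouched.

```python
def split_sections(description):
    """Split markdown into (header, body_lines) pairs.

    Text before the first "## " header is stored under header None.
    """
    sections = []
    header, body = None, []
    for line in description.splitlines():
        if line.startswith("## "):
            sections.append((header, body))
            header, body = line[3:].strip(), []
        else:
            body.append(line)
    sections.append((header, body))
    return sections


def merge_sections(current, changed):
    """Replace only the sections named in `changed`; preserve the rest."""
    out = []
    for header, body in split_sections(current):
        if header is not None:
            out.append(f"## {header}")
        if header in changed:
            out.extend(changed[header].splitlines())
        else:
            out.extend(body)
    return "\n".join(out)
```

`current` should be the description re-fetched from Shortcut immediately before the update, never a cached copy.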

Parallel Fill Mode (--lead)

When $ARGUMENTS is --lead, you are the lead instance in a two-instance parallel fill session.

When to use parallel fill

Card limit

Splitting

Lead coordination protocol

Find candidates using Step 1 above.
Propose the product-area cluster split to Paul.
Create an agenterminal conversation thread and brief both assignments.
Join the thread and post the briefing before starting pre-flight. Do not begin pre-flight until the partner instance has acknowledged in the thread.
Work your own assigned cards using Steps 2-10 above.
Own session-end documentation for this parallel session.

Coordination rules

The conversation thread is for the initial briefing and session-end sync. Don't post running status updates to the thread. Every message costs context in both instances for information the user already has from the session panes.
When a conversation notification arrives, respond before continuing tool work. A 10-second acknowledgment costs less than the trust damage from silence. This applies even mid-investigation.
Intercom search index refresh only needs to happen once. First instance to start runs it; second instance skips.
Paul sees both session panes directly and reviews/approves cards in whatever order he prefers.

Compaction risk

Session-end

Parallel Fill Mode (--partner)

When $ARGUMENTS is --partner, you are the partner instance in a two-instance parallel fill session.

Partner coordination protocol

Join the agenterminal conversation thread.
Read the lead's briefing to get your assigned cards.
Acknowledge in the thread before starting pre-flight. The lead instance waits for this before beginning its own work.
Work your assigned cards using Steps 2-10 above.

Coordination rules

Same rules as the lead instance:

Thread is for briefing and session-end sync only. No running status updates.
Respond to conversation notifications before continuing tool work.
Skip Intercom search index refresh if the lead already ran it.

Everything else is the standard per-card protocol. Pre-flight, investigation, verification bar, quality gate, approval flow, completion steps: all identical to single-card mode.
