name	daily-digest
description	Produce a daily "What We're Hearing" digest from Intercom conversations and post to Slack
disable-model-invocation	true

/daily-digest [date]

Produce a "What We're Hearing" digest for the specified date (default: prior business day, computed by the script — do not compute day-of-week yourself) and post to #daily-digest on Slack.

Quick reference

Channel: C0ASG05F1SB (#daily-digest)
Format: parent title + threaded body (Slack splits at ~3900 chars; thread makes this harmless)
Research artifacts: box/research/digest-{date}-*.json and digest-{date}-draft.md
Script: python3 box/daily-digest.py (steps 1-3, computes default date)

Steps

1. Run the data pipeline

python3 box/daily-digest.py [--sync]

Do not pass a date argument unless the user specifies a non-default date. The script computes the prior business day automatically and prints the resolved date with the day of week. If an explicit date is passed that differs from the computed default, the script prints a warning.
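For orientation only, the default-date rule can be pictured with a minimal sketch. It assumes "business day" means Monday through Friday with no holiday calendar; box/daily-digest.py may apply different rules, and the function name here is hypothetical. The date the script prints remains the source of truth.

```python
from datetime import date, timedelta


def prior_business_day(today: date) -> date:
    """Return the most recent weekday strictly before `today`.

    Assumption: weekends are the only non-business days (no holidays).
    """
    candidate = today - timedelta(days=1)
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6
    while candidate.weekday() >= 5:
        candidate -= timedelta(days=1)
    return candidate


# Print the resolved date with its day of week, as the script is said to do.
resolved = prior_business_day(date.today())
print(f"{resolved.isoformat()} ({resolved.strftime('%A')})")
```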

Use --sync if the target date is recent (within 24h) to ensure full coverage. The script:

Checks index freshness (hard gate, exits 2 if >36h stale)
Queries conversations with CANNED_PATTERNS + operator bot filter applied
Fetches Intercom API metadata (contact name, state, assignee, AI participation)
Saves filtered conversations and metadata to box/research/
Prints conversation IDs for classification

If the script blocks on freshness, run python3 box/intercom-sync.py --since {date} first, then retry.
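The freshness gate described above amounts to a single comparison. The sketch below is an illustration, not the script's code: the function name and the idea of passing in a last-indexed timestamp are assumptions, while the 36h threshold and exit code 2 come from the text.

```python
from datetime import datetime, timedelta, timezone

MAX_INDEX_AGE = timedelta(hours=36)  # threshold stated above
EXIT_OK = 0
EXIT_STALE = 2  # exit code stated above for a stale index


def freshness_exit_code(last_indexed_at: datetime, now: datetime) -> int:
    """Return EXIT_STALE when the index is more than 36h old, else EXIT_OK."""
    return EXIT_STALE if now - last_indexed_at > MAX_INDEX_AGE else EXIT_OK


# Example: an index last updated 40 hours ago blocks the run.
now = datetime.now(timezone.utc)
print(freshness_exit_code(now - timedelta(hours=40), now))  # prints 2
```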

2. Delegate classification

Split the conversation IDs from the script output into 3 roughly equal groups. Delegate each to a sonnet subagent with this prompt template:

You are classifying Intercom customer support conversations for a daily digest.

**Database**: `psql postgresql://localhost:5432/feedforward`

**Your assigned conversation IDs** ({N} conversations):
{comma-separated IDs}

**For each conversation**: Run this SQL to get the full text:
SET client_min_messages TO WARNING;
SELECT conversation_id, contact_email, full_text
FROM conversation_search_index WHERE conversation_id = '{id}';

Read the full_text and classify with:
- **summary**: One sentence describing what the customer is asking/reporting
- **area**: Product area (Billing, Pin Scheduler, SmartPin, Turbo, SmartBio,
  Account, General, Onboarding, Credits, Brand Kit, Keyword Research,
  Ghostwriter, Design Pins, or other)
- **skip**: true if NOT a real customer conversation (spam, outreach, sales pitch,
  automated email, directory listing, internal outreach, phishing). false for real
  customer conversations.
- **notable**: true if this is a specific bug report, interesting product signal,
  bot failure, or something unusual. false for routine questions.
- **themes**: Array of sub-theme tags at granular level. Use lowercase-hyphenated
  format. Not "billing" but "surprise-renewal-no-notification".
- **bot_quality**: One of: "appropriate" (bot handled well or escalated promptly),
  "unhelpful" (gave wrong/irrelevant instructions, repeated without adapting),
  "wrong_info" (stated incorrect product facts or fabricated capabilities),
  "escalation_failure" (customer asked for human, bot didn't escalate).
  Only assess the bot's behavior, not the outcome.

**Important**: Read the FULL text of each conversation, not just the first message.

Output instructions for each delegate:

Return a JSON array of objects. Each: conversation_id (string), summary (string),
area (string), skip (boolean), notable (boolean), themes (string array),
bot_quality (string). No prose, just the JSON array.
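Before merging, it may help to check each delegate's reply against the schema just described. The following is a minimal sketch; the helper name is hypothetical, and the field names and bot_quality values are taken from the prompt template above.

```python
import json

BOT_QUALITY_VALUES = {"appropriate", "unhelpful", "wrong_info", "escalation_failure"}
EXPECTED_TYPES = {
    "conversation_id": str,
    "summary": str,
    "area": str,
    "skip": bool,
    "notable": bool,
    "themes": list,
    "bot_quality": str,
}


def validate_delegate_output(raw: str) -> list[dict]:
    """Parse a delegate reply; raise ValueError on any schema violation."""
    records = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(records, list):
        raise ValueError("expected a JSON array")
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {index}: expected an object")
        for field, expected in EXPECTED_TYPES.items():
            if not isinstance(record.get(field), expected):
                raise ValueError(f"record {index}: {field} missing or not {expected.__name__}")
        if not all(isinstance(tag, str) for tag in record["themes"]):
            raise ValueError(f"record {index}: themes must be strings")
        if record["bot_quality"] not in BOT_QUALITY_VALUES:
            raise ValueError(f"record {index}: unknown bot_quality {record['bot_quality']!r}")
    return records
```

A reply that fails validation is a candidate for re-delegation rather than manual repair, since hand-edited classifications are harder to audit.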

Use model: claude-sonnet-4-6. Don't set timeout_ms — the default is the maximum (30 min).

Run all three delegates in parallel, then merge their results and save to box/research/digest-{date}-classifications.json.
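The split and merge around delegation can be sketched as follows. The function names are hypothetical, and the missing-ID and duplicate-ID checks are an added safeguard rather than something the pipeline is known to do.

```python
import json
from pathlib import Path


def split_into_groups(ids: list[str], groups: int = 3) -> list[list[str]]:
    """Split ids into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(ids), groups)
    chunks, start = [], 0
    for i in range(groups):
        size = base + (1 if i < extra else 0)
        chunks.append(ids[start:start + size])
        start += size
    return chunks


def merge_results(delegate_outputs: list[list[dict]], assigned_ids: list[str]) -> list[dict]:
    """Flatten delegate arrays; fail loudly on missing or duplicated IDs."""
    merged = [record for output in delegate_outputs for record in output]
    seen = [record["conversation_id"] for record in merged]
    if len(seen) != len(set(seen)):
        raise ValueError("duplicate conversation_id in delegate output")
    missing = set(assigned_ids) - set(seen)
    if missing:
        raise ValueError(f"unclassified conversations: {sorted(missing)}")
    return merged


def save_classifications(merged: list[dict], digest_date: str,
                         research_dir: Path = Path("box/research")) -> Path:
    """Write the merged array to the classifications path named in the text."""
    path = research_dir / f"digest-{digest_date}-classifications.json"
    path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return path
```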

Don't quote counts from delegate output as facts before spot-checking. Subagent area/skip/notable tallies are proxy data. Spot-check at least one spam and one notable before synthesizing counts into narration.

3. Primary reads

Quality gate: every conversation linked in the digest must be primary-read. Subagent classifications are filters, not findings.

Read priority:

ALL conversations flagged notable: true — use python3 box/intercom-search.py --read {id1},{id2},...
Enough of the large theme clusters to verify sub-theme characterization
Any non-notable conversation you plan to link in the digest

Do not head-limit primary reads (no | head, no limit parameter on Read calls). Signal appears in unexpected conversation positions. If a batch read is too large for context, split it into smaller batches rather than truncating. Proved 2026-04-28: head -400 on a 5-conversation batch happened to fit, but the practice risks silent truncation.
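The read list and its batching can be sketched as below. The helper names and the batch size of 5 are assumptions; the --read flag with comma-separated IDs is the form shown in the read priority list.

```python
def build_read_list(classifications: list[dict], planned_links: list[str]) -> list[str]:
    """All notable conversations plus anything to be linked, de-duplicated in order."""
    notable = [c["conversation_id"] for c in classifications if c["notable"]]
    # dict.fromkeys keeps first-seen order while dropping repeats
    return list(dict.fromkeys(notable + list(planned_links)))


def read_commands(ids: list[str], batch_size: int = 5) -> list[str]:
    """One command per batch: smaller batches instead of truncated output."""
    return [
        "python3 box/intercom-search.py --read " + ",".join(ids[i:i + batch_size])
        for i in range(0, len(ids), batch_size)
    ]
```

If a batch still overflows context, lowering batch_size keeps every conversation whole, which piping through head does not.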

As you read, note:

Subagent classifications to correct (notable status, area, skip)
Conversations the subagent missed as notable
Theme patterns emerging from the actual text (not from subagent labels)
Bot behavior: wrong instructions, escalation failures, fabricated info, repeated unhelpful responses. Use bot_quality classifications as a starting filter but verify during primary reads.

4. Draft the digest

Before composing: Output an accounting block as visible text in the conversation (not embedded in the draft file). List every conversation ID that will be linked in the digest. For each, note whether it was primary-read and any classification corrections. This enumeration precedes the draft Write — don't compose first and account after. The block's purpose is auditability: if it's inside the artifact, it was composed alongside the synthesis rather than functioning as a pre-synthesis gate. Proved 2026-04-16: draft linked 5 conversations not yet primary-read; accounting after the Write missed the gap. Proved 2026-04-22: accounting block embedded in draft file rather than visible output; Monitor caught as effort_substitution (medium).
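A small helper can make the gate mechanical: it refuses to render the accounting block while any linked conversation is still unread. This is a sketch; the function name and the exact line format are hypothetical, and only the listed contents (ID, primary-read status, corrections) come from the text.

```python
def accounting_block(linked_ids: list[str], primary_read: set[str],
                     corrections: dict[str, str]) -> str:
    """Render the pre-draft accounting text; raise if any linked ID is unread."""
    unread = [cid for cid in linked_ids if cid not in primary_read]
    if unread:
        raise ValueError(f"not primary-read, do not draft yet: {unread}")
    lines = [f"Accounting: {len(linked_ids)} linked conversations"]
    for cid in linked_ids:
        note = corrections.get(cid, "no corrections")
        lines.append(f"- {cid}: primary-read yes; {note}")
    return "\n".join(lines)
```

The returned text belongs in a conversation message, not in the draft file.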

Write to box/research/digest-{date}-draft.md. Re-read the classification file and metadata file before composing — don't work from memory.

Format (v2, theme-first):

_What We're Hearing_ ({day of week}, {month} {day})
_{N} customer conversations ({M} spam filtered). Here's what stood out._

*{Theme 1 title}*
{Count} of {total}. {Characterization of what people are saying.}
• *{Sub-theme}* ({count}) — {1-2 sentences}. Link · Link
• *{Sub-theme}* — {1-2 sentences}. Link

*{Theme 2 title}*
...

:red_circle: *Notable bugs*
• *{Bug title}* — {repro context}. Link
...

:eyes: *Signal worth watching*
• *{Signal}* — {why it matters}. Link
...

:robot_face: *Bot observations*
• *{Pattern}* — {what happened, which conversations}. Link

_By the numbers:_ {Area} {count} · {Area} {count} · ...
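The closing "By the numbers" line can be derived from the corrected classifications rather than typed by hand. In this sketch the ordering (descending count, then alphabetical) and the exclusion of skip=true rows are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def by_the_numbers(classifications: list[dict]) -> str:
    """Build the area tally line from non-skipped classifications."""
    counts = Counter(c["area"] for c in classifications if not c["skip"])
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "_By the numbers:_ " + " · ".join(f"{area} {count}" for area, count in ordered)
```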

Bot observations that assert product correctness (e.g. "bot fabricated a capability," "bot gave wrong explanation") require codebase verification before the label goes into the draft. Conversation text shows what the bot said; it does not establish whether the bot was factually wrong. Grep or read the relevant code/docs before using labels like "fabricated" or "wrong." Proved 2026-04-23: "fabricated capability" label for Turbo messaging was accurate but was applied from memory before codebase verification; user correction on Failed/Missed Posts tab exposed the class of error.

Link format: <https://app.intercom.com/a/inbox/_/inbox/conversation/{id}|{Name}>
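A minimal sketch of a link builder for this format follows. The function name is hypothetical; the escaping of &, < and > in the label follows Slack's documented message formatting rules, and it matters here because contact names are free text.

```python
def slack_link(conversation_id: str, name: str) -> str:
    """Format an Intercom conversation link in Slack's <url|label> syntax."""
    # Escape the three characters Slack treats as control characters.
    # "&" must be replaced first so the other entities are not double-escaped.
    label = name.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return f"<https://app.intercom.com/a/inbox/_/inbox/conversation/{conversation_id}|{label}>"
```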

Theme granularity: sub-themes within categories ("surprise renewals without notification" not "billing"). Let themes emerge from the data — don't copy prior day's structure.

5. Present for approval

Before presenting: Re-read the draft file and spot-check at least 2 attributed quotes against their source conversations. Don't vouch for accuracy from memory — the verification must precede the claim. Proved 2026-04-21: Billing 23/24 count error caught by reflection, but 3 quote attributions and 26 link mappings asserted without re-reads.
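A quote spot-check can be made concrete with a whitespace-tolerant substring test against the conversation's full_text. This is a sketch with hypothetical helper names; it only confirms that the words appear in the source, so a paraphrased quote will correctly fail.

```python
import re


def _normalize(text: str) -> str:
    """Unify curly quotes, collapse whitespace, and casefold."""
    for curly, straight in (("\u2018", "'"), ("\u2019", "'"), ("\u201c", '"'), ("\u201d", '"')):
        text = text.replace(curly, straight)
    return re.sub(r"\s+", " ", text).strip().casefold()


def quote_in_source(quote: str, full_text: str) -> bool:
    """True when the quote appears verbatim (modulo whitespace and case) in the source."""
    return _normalize(quote) in _normalize(full_text)
```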

Show the exact text to the user. This is the approval gate. The user reviews the framing, theme selection, and linked conversations.

In headless sessions, the user cannot see Read tool output. "Show the exact text" means copy the full draft into a conversation message. Reading the file is verification; posting it in the conversation is presentation. These are separate steps. Proved 2026-04-28: agent read the draft twice, told user "ready for review" twice, never showed the text.

6. Post to Slack

Two mutations via execute_approved, one at a time:

Parent message:

python3 box/slack-mutate.py post C0ASG05F1SB '{title text}'

Record the returned ts value.

Thread reply:

python3 box/slack-mutate.py reply C0ASG05F1SB {ts} "$(cat /tmp/ff-digest-body.txt)"

Save body to a temp file first to avoid shell escaping issues.

Verify the thread after posting (read back via Slack API). slack-mutate.py's VERIFIED output covers one fragment. If Slack split the body (>3900 chars), use slack_read_thread to confirm all sections landed.
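To know roughly how many thread replies to look for during verification, the body length can be compared with the split threshold. This is a rough sketch: where Slack actually breaks a long message is not specified here, so the estimate is a guide rather than a guarantee, and both function names are hypothetical.

```python
import math
from pathlib import Path

SLACK_SPLIT_CHARS = 3900  # approximate threshold stated in the text


def write_body(body: str, path: Path = Path("/tmp/ff-digest-body.txt")) -> Path:
    """Save the digest body to the temp file read by the reply command."""
    path.write_text(body, encoding="utf-8")
    return path


def expected_fragments(body: str, limit: int = SLACK_SPLIT_CHARS) -> int:
    """Estimate how many messages the body becomes if split every `limit` chars."""
    return max(1, math.ceil(len(body) / limit))
```

If the estimate is greater than 1, a single VERIFIED line is not sufficient evidence that the whole digest landed.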

Known gotchas

Slack splits at ~3900 chars. Thread format makes this harmless (just becomes 2-3 thread replies). Don't try to compress the digest to fit.
source.author.name for operator-initiated conversations returns the bot name. The metadata file may show bot names for some contacts. Check the conversation text if a contact name looks like a bot.
Sync lag. The 4h launchd cron means evening conversations may be missing when the digest runs the morning after. Use --sync for recent dates.
Subagent theme tags guide reads, they don't replace them. The notable flag has ~90% accuracy. Read the conversation before trusting the classification.
Don't copy prior day's theme structure. Each day's themes should emerge from fresh reads. Using yesterday's sections as a template biases what you see.