| name | release-review |
| description | Weekly release impact review — pull PostHog data for Released cards and tracked PRs, classify, draft observations, post to Slack |
| disable-model-invocation | true |
Release Review
Weekly post-release check-in. For each Released card whose completed_at falls
exactly 7, 14, 21, or 28 days ago, and each tracked PR whose merged_at falls
on the same dates: pull current data from the PostHog insight, classify the
observation, and report findings in #post-release-measurement. Leave a review
comment on every item examined (Shortcut comment for cards, DB record for PRs).
This is the follow-up to Released Cards (Play 6). Play 6 creates the measurement
surface for cards. Tracked PRs (box/tracked-prs.py) are registered manually
when meaningful work ships without a card. Both follow the same review cadence.
Also Read
Before starting, read these shared sections from box/shortcut-ops.md:
- Workspace Constants: Shortcut IDs, workflow states, mutation scripts
- API Quirks: Shortcut and Slack API gotchas
Constraints
- Mutation gate: A PreToolUse hook blocks all Slack and Shortcut mutations
through Bash. Route through
agenterminal.execute_approved or present the
command for the user to run.
- Human-in-the-loop: Present scoped card list at checkpoint (end of Phase 1).
Present each observation draft via
approve_content before posting.
- Context protection:
play7-gather.py (Phase 1+2a) and play7-post.py
(Phase 4-5) keep mechanical API work out of primary context. Primary context
is reserved for classification and observation drafting.
- Before Shortcut or Slack API calls: check
reference/tooling-logistics.md
for tested recipes. Don't re-derive payload shapes or endpoint paths from scratch.
Constants
| Constant | Value |
|---|
| #post-release-measurement | C0AGD4ZEC6M |
| PostHog project ID | 161414 |
Cadence
Run daily. Each run picks up cards released on the same day of the week 1-4
weeks ago. A card released on a Tuesday gets checked on the next four Tuesdays.
No batching across days of the week.
Monday exception: Monday runs also include the preceding Saturday and Sunday
for each lookback window, so weekend releases get covered without requiring a
weekend run. A card released on Saturday gets its first review the following
Monday (grouped into the week 1 cohort).
Arguments
$ARGUMENTS is an optional date override (format: YYYY-MM-DD). If omitted,
uses today's date.
Steps
0. Resume check
Before anything else, determine the run date (from $ARGUMENTS or today)
and check for an existing session file:
ls .agent/release-review/{YYYY-MM-DD}.md 2>/dev/null
If the session file exists for today's run date:
- Read it in full.
- Report current state to the user based only on what is written in the
session file. Name which steps completed, which items are in scope,
what classifications were made, which drafts were approved, and which
posts have been sent.
- Ask whether to resume from that point or start fresh.
- Wait for explicit direction before proceeding.
If the user chooses "start fresh": archive the existing session file
and all its sidecars:
STAMP=$(date +%s)
mv .agent/release-review/{YYYY-MM-DD}.md .agent/release-review/{YYYY-MM-DD}.bak.$STAMP.md
for f in .agent/release-review/{YYYY-MM-DD}-*; do
[ -e "$f" ] && mv "$f" "$f.bak.$STAMP" 2>/dev/null
done
Then continue to Step 1. The gather script will create a new session file.
If no session file exists, continue to Step 1.
Trustworthy sidecar files on resume: Only read sidecar files (JSON,
markdown, PNG) that have a matching receipt in the current session file
— path matches and file exists on disk. Any files on disk without a
matching receipt in the current session file are stale and must be
ignored. If multiple receipts exist for the same path, the last one
in file order is authoritative (append-only means file order = time
order).
1. Gather data (script)
One tool call replaces all API plumbing for Phase 1 and the mechanical parts
of Phase 2. Keeps ~20-30K of API response context out of the primary window.
python3 box/play7-gather.py --session-file .agent/release-review/{YYYY-MM-DD}.md
python3 box/play7-gather.py --date $ARGUMENTS --session-file .agent/release-review/{YYYY-MM-DD}.md
python3 box/play7-gather.py --verbose --session-file .agent/release-review/{YYYY-MM-DD}.md
The gather script creates the session file if it doesn't exist and copies the
gather JSON to .agent/release-review/{YYYY-MM-DD}-gather.json.
The script:
- Computes target dates per cohort (today minus 7/14/21/28 days; Monday
includes Saturday/Sunday for each window)
- Queries Shortcut for Released stories, filters by exact
completed_at match
- Queries
tracked_prs DB table for PRs merged on target dates
- For cards: fetches comments, extracts Play 6 insight links, identifies prior
Play 7 review comments
- For PRs: fetches prior reviews from
tracked_pr_reviews DB table
- Resolves PostHog insight IDs and fetches current data (blocking refresh)
- Flags items already reviewed at this week number (idempotency)
- Detects re-releases (
is_rerelease flag) for cards
- Excludes cards without Play 6 insights and PRs without attached insights
Each item in the output has a "source" field: "card" or "pr". PR entries
include "pr_number" and "pr_url" fields. Each item has an "insights" array
(one element for individual insights, many for dashboard-linked cards) and an
optional "dashboard_url" (set when insights came from a dashboard comment).
Output: /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-gather.json with cohorts (grouped items),
excluded list, and summary counts. run_date_formatted gives the day name
for the Slack intro — don't derive it manually.
1a. Extract posting plan
python3 box/play7-extract.py /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-gather.json --in-scope
Returns per-item JSON records with insight IDs, names, and URLs mapped to
play7-post.py argument names. Field mapping:
insights[].insight_id → --insight-id
insights[].insight_name → --insight-name
cohort_week → --week (PR reviews)
numeric_id → --card-id (cards)
Use this output for the checkpoint (step 2) and to construct play7-post.py
commands in steps 5-6. Add --cohort week_N to filter to a single cohort.
Classification (step 3) still reads the raw gather JSON — the extract
output is a posting bridge, not a replacement for the full gather data.
Items that need current_data, prior_reviews text, or last_slack_report
content for classification judgment should be read from the gather JSON.
2. Checkpoint
Present the item list grouped by cohort (released/merged 1/2/3/4 weeks ago)
with: card ID or PR number, title, completed/merged date, insight link,
already-reviewed status. Skip items flagged already_reviewed_this_week: true.
Tracked PRs show as "PR #NNNN" alongside "SC-NNN" cards.
Excluded cards: Review the excluded list. Cards excluded for "no Play 6
insight" may have been missed by Play 6 rather than intentionally skipped.
Assess each: bug fixes with measurable error rates, features with trackable
adoption, and billing/backend fixes with proxy signals are candidates for
insights. One-off actions (notifications, data backfills) and cosmetic changes
are not. Present excluded cards with assessment alongside in-scope cards.
If the user confirms any excluded cards should be added, create insights
(mini Play 6: test the query, save the insight, post the "Release impact
insight:" comment) and add those cards to the batch before proceeding.
Write scope to the session file. After the user confirms the item list,
append to the session file:
## Scope confirmed ({timestamp})
In-scope: {comma-separated item IDs}
Excluded: {item ID (reason), ...}
## STATUS: classifying ({timestamp})
3. Classify (primary context)
This is the reasoning step. For each item (card or PR) in scope, classify
into one of three outcomes:
- Insufficient data: Not enough volume or time elapsed. Fewer than ~10
events total, or the insight shows zero activity. Gets a type-1 comment,
no Slack post.
- No material change: There is data, but the numbers, trend direction,
and notable patterns look materially the same as the last Slack report for
this item. Compare current insight values against the observation text in
last_slack_report. Gets a type-2 comment, no Slack post.
- Interesting observation: Something worth reporting. New trend, notable
adoption, unexpected pattern, drop-off, anomaly, or enough data to draw a
new conclusion. Gets a type-3 comment and Slack post (after approval).
Suspicious data gate: If any insight data looks wrong during classification
(zero activity where there should be volume, numbers that don't match known
product behavior, contradictions between related insights), stop and investigate
before proceeding. Check the insight's query definition against the codebase to
verify event names and property filters. Fix broken insights before classifying.
Don't draft observations on data you don't trust — the normal posting flow
resumes after the data is verified or corrected. Proved: Mar 23 session found a
dashboard insight tracking nonexistent events.
The "no material change" comparison is always against the last Slack report
(type-3 comment), not the last card comment of any type. A card can accumulate
multiple type-1 or type-2 comments between Slack reports. It only comes back
to Slack when something actually changed relative to the last time it appeared
in the channel.
The gather script returns last_slack_report with the full text of the most
recent type-3 comment. Read it. Week 1 has no prior Slack report, so it's
always either insufficient data or interesting observation.
Dashboard-linked cards: When dashboard_url is set, the item has multiple
insights from a full dashboard (e.g. split tests). Classify based on the
overall picture across all insights, not per-insight. The Slack observation
should synthesize the dashboard — link to the dashboard URL, not individual
insights. Use play7-post.py slack with the first insight's ID for the chart
image (pick the most representative one).
3a. Classification checkpoint
Present each classification with a one-line rationale before writing any
drafts. Minimum format per item: SC-NNN: Title: [classification] — [why]. For
items with prior Slack reports, state what changed or didn't relative to
the last report. Wait for user confirmation before proceeding to step 4.
This is a gate, not a status update. The classification reasoning is the
core judgment step — if the user can't see and redirect individual
classifications, the control loop is broken for the entire drafting phase.
Product state changes. If any card in scope represents a sunset, rename,
feature flag change, or new feature launch, update .claude/rules/product-knowledge.md
so future sessions have the correct product state. This is an ADD trigger —
the release review is where product facts change.
Write classifications to the session file before advancing to Step 4.
Do not proceed to drafting until this write is complete. Append:
## Classifications ({timestamp})
| Item | Cohort | Classification | Rationale |
|---|---|---|---|
| {item_id} | {cohort} | {classification} | {one-line rationale} |
...
## STATUS: drafting ({timestamp})
This is a hard gate. A fresh agent reading this file can skip the entire
classification phase and proceed directly to drafting based on these
decisions. This is the highest-value recovery artifact.
Classification → review type mapping (deterministic, used for both
play7-post.py comment --review-type and resume interpretation):
| Final classification | Review type | Slack post? |
|---|
interesting_observation + approved | 3 | Yes |
insufficient_data | 1 | No |
no_material_change | 2 | No |
4. Draft observations
For each card classified as "interesting observation":
-
Draft in full Slack message format — the approved text is posted directly
to Slack with no reformatting:
Cards:
*<shortcut_url|SC-NNN: Card title>* · released [N] week/weeks ago ([Mon Day])
[TLDR — one sentence, the headline takeaway. Analytical, leads with the
conclusion, not the data.]
• [Supporting detail 1 — specific number with time window, comparison, or trend]
• [Supporting detail 2]
• [Supporting detail 3, if needed]
<posthog_insight_url|View insight in PostHog>
Tracked PRs:
*<pr_url|PR #NNNN: PR title>* · merged [N] week/weeks ago ([Mon Day])
[TLDR — one sentence, the headline takeaway.]
• [Supporting detail 1]
• [Supporting detail 2]
<posthog_insight_url|View insight in PostHog>
Dashboard-linked cards use the dashboard URL in the footer instead of
an individual insight URL.
Use • for bullets. Include specific numbers with time windows. The TLDR
is what you'd say if someone asked "so what happened with that feature?" —
lead with the conclusion, not the data. Tone varies per card: positive,
analytical, neutral, whatever fits.
-
This is the reasoning step. Not "the number went up." What does the data
mean in context? Is adoption tracking expectations? Any patterns worth
investigating? Is the feature solving the described problem?
-
Don't narrativize noisy data. Small week-over-week volume changes don't
tell a story. Low per-user counts are ambiguous (not evidence of retention
or abandonment). Spikes need user-level investigation before conclusions.
Resist constructing narratives ("rebound," "filling a workflow gap,"
"recurring pattern") that read well but aren't supported by the numbers.
-
Strip process language. Slack output should read like a product
performance update, not an internal process artifact. The audience cares
about what happened and what it means — not how we track it or where
a card sits in a review lifecycle. Specifically, never include:
- Review cadence references ("final automated review," "week 4 review,"
"exits the review window," "this card's last check")
- Play numbers, cohort week labels
- Any framing that implies the reader needs to understand our 4-week
review window or internal process to parse the message
State facts directly. "The instrumentation is catching anomalies" not
"This is the final automated review (week 4). The instrumentation is
catching anomalies."
-
Resolve answerable questions before drafting. When a draft would
contain uncertainty that could be resolved from primary sources (codebase,
database, PostHog), resolve it before including it. "Either X or Y" when
reading a file would tell you which one is a draft quality failure. Read
the code, query the data, then write the draft.
-
Write all drafts to /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-drafts.md,
one section per item (e.g. ## SC-562 or ## PR #2927), with the
full Slack message text (header + observation + footer) under each
header. Include the classification for each item.
Copy the drafts file to the session directory:
cp /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-drafts.md .agent/release-review/{YYYY-MM-DD}-drafts.md
Then append to the session file:
## Drafts ({timestamp})
Path: .agent/release-review/{YYYY-MM-DD}-drafts.md
Items drafted: {comma-separated item IDs}
4a. Quality review + chart export (parallel)
Two things happen in parallel while the user isn't waiting:
Chart export: For each item classified as "interesting observation,"
export the chart PNG and check for blank rendering:
python3 box/play7-post.py export-chart --insight-id NNNNN --session-file .agent/release-review/{YYYY-MM-DD}.md
When --session-file is set, the script writes the PNG directly to
.agent/release-review/play7-insight-{insight_id}.png (durable path) and
appends a per-chart receipt to the session file.
Returns {"ok": true, "path": "...", "is_blank": true/false}. For
dashboard-linked cards, export the representative insight's chart.
Record which charts are blank — these items will be posted as text-only
in Phase 5 (no chart attachment).
Quality review: Delegate a review agent to check all drafts against
quality criteria. This catches systematic issues (narrativizing,
unresolved questions, unsupported claims) before they reach the user's
approval queue.
Delegate with agent: "claude", model: "claude-sonnet-4-6".
Don't set timeout_ms — the default is the maximum (30 min).
Pass session recovery parameters on the delegate call:
session_file: ".agent/release-review/{YYYY-MM-DD}.md"
session_label: "quality-review"
save_path: ".agent/release-review/{YYYY-MM-DD}-quality-review.json"
When the delegate push arrives, verify saved_to= and append the result
saved/failed receipt to the session file (same pattern as fill-cards).
Pass the drafts file and the gather JSON (for fact-checking numbers
against source data). Include the review criteria verbatim in the prompt.
Review criteria (include in the delegate prompt):
-
Narrativizing: Does any draft construct a causal story not supported
by the data? Look for: attributing user motivation ("when users plan
their week"), framing noise as signal ("rebound," "filling a workflow
gap"), interpreting ambiguous patterns as trends. The fix is to state
the observation without the causal frame.
-
Unresolved answerable questions: Does any draft present uncertainty
("either X or Y," "suggests," "may indicate") for something that could
be resolved by reading code or querying data? If the codebase or database
could answer the question, flag it. The primary agent needs to do the
investigation and revise — the reviewer just identifies the gaps.
-
Unsupported causal claims: Does any draft claim one thing caused
another without evidence? Correlation language ("consistent with") is
acceptable; causal language ("because," "driven by," "due to") requires
supporting evidence in the draft.
-
Process language: Does any draft reference internal process — review
cadence, play numbers, week labels, "exits the review window," or any
framing that requires understanding the 4-week review system to parse?
Delegate output format: For each draft, return either PASS or a list
of issues. Each issue should quote the problematic text and name the
criterion violated. Also verify that specific numbers in the drafts match
the source data in the gather JSON.
After collecting the review: First, re-read the gather JSON (or
the data sections for each flagged card) and verify each reviewer finding
against the source data before applying any edits. At least one numerical
claim must be verified via a Read or Bash call against the gather JSON —
"I remember X" is not verification even when the data was recently in
context. Proved 2026-04-21: agent claimed spot-check of all 8 findings
without a Read call; monitor caught at high severity. Proved 2026-04-24:
all spot-checks performed from context memory; numbers correct but method
was proxy trust on own recall.
Then revise any drafts that received issues. If a flagged issue requires
primary source investigation (criterion 2), do the investigation (read
code, query data), then revise. Don't soften the language — resolve the
underlying question. If a criterion-1 flag is accurate, cut the narrative
framing and state the observation directly.
4b. Approve observations
Present each revised draft via approve_content with
content_type: "observation", filename: "scNNN-observation-wkN" (or
"prNNNN-observation-wkN" for PRs). One at a time. The user may edit,
change classification, or skip.
The approved text is the exact Slack message — no separate Slack message
approval step. It also serves as the source for Shortcut comments (Phase 6
extracts the observation body, stripping the Slack header and footer).
The saved file is compaction insurance — read the saved_path to recover
approved text after compaction.
Log every approval outcome to the session file. Only
interesting_observation items reach step 4b. Items classified as
insufficient_data or no_material_change are terminally dispositioned
by the Classifications table — they skip 4b and proceed to type-1/type-2
comments in step 6.
For each interesting_observation item presented, append one of:
Approved:
### {item_id} approved ({timestamp})
Path: {saved_path}
Reclassified (user decides this isn't an interesting observation):
### {item_id} reclassified ({timestamp})
Was: interesting_observation -> Now: {insufficient_data|no_material_change}
Reason: {user's reason}
After all interesting_observation items have been dispositioned, append:
## STATUS: approved ({timestamp})
Execution order: Post Slack report (Phase 5) first, then Shortcut review
comments (Phase 6). Slack is the audience-facing surface; card comments are
the trailing record.
4c. Tracking gaps assessment
Systematic pass across all reviewed items: "Is there instrumentation
missing that would make the next review cycle more useful?"
For each item, check:
- Are all planned events actually firing? (Compare insight description
against codebase emit calls.)
- Does the insight have the right denominator? (Toggle counts without
session counts can't produce rates.)
- Can the insight answer the card's original question? (A mechanism-fires
event without downstream conversion tracking can't measure goal
achievement.)
- Are there known failure paths that bypass instrumentation? (Client-side
timeouts that don't fire server events.)
Present confirmed gaps as a consolidated list. Each gap that warrants a
fix is a potential story — post to #ideas (one thread: parent summary +
one reply per gap) where sync-ideas will pick them up. Lower-priority
enrichment opportunities (property additions, breakdown expansions) can
be noted but don't need #ideas posts unless they're blocking a question
the team is actively asking.
4d. Ship-phase checkpoint
Before posting anything, present the full batch plan to the user:
"Entering ship phase. [N] Slack messages (1 intro + [N-1] observations)
and [M] Shortcut/PR review comments planned.
Slack: each post verified via channel history read-back (play7-post.py).
Shortcut: each comment verified via card comment re-fetch (play7-post.py).
PR reviews: each recorded to DB via play7-post.py pr-review.
Proceeding with posting. First: Slack intro."
Wait for user acknowledgment before the first execute_approved call. This
is the analytical-to-mechanical boundary — the moment where verification,
narration, and per-item judgment historically degrade. The pause makes the
batch size visible and gives the user an opportunity to steer pacing before
momentum builds.
Run python3 box/ship-gate.py enter before the first production
execute_approved call (the hook will block production mutations until the
plan is declared).
Append to the session file:
## STATUS: shipping ({timestamp})
5. Post Slack report
If zero cards classified as interesting observation, skip the Slack report
entirely. No intro, no card messages. Card comments (Phase 6) still get posted.
Intro message:
:bar_chart: Release impact review — [Day of week], [Month Day, Year]
[N] items reviewed from the last 4 weeks ([X] cards, [Y] tracked PRs).
[M] with updates worth reporting today.
Use run_date_formatted from the gather script for the day name.
python3 box/play7-post.py slack-intro \
--channel C0AGD4ZEC6M \
--text ":bar_chart: Release impact review — Wednesday, February 19, 2026\n\n4 released cards reviewed from the last 4 weeks.\n2 with updates worth reporting today." \
--session-file .agent/release-review/{YYYY-MM-DD}.md
Per-item message:
Each item uses the approved observation text from step 4b directly — no
reformatting. Items with non-blank charts are posted as file uploads
(chart appears inline below the text). Items with blank or failed charts
use --no-chart for text-only posting.
python3 box/play7-post.py slack \
--insight-id 12345 \
--insight-name "SC-123: Feature engagement" \
--channel C0AGD4ZEC6M \
--chart-path .agent/release-review/play7-insight-12345.png \
--comment "*<https://app.shortcut.com/tailwind/story/123|SC-123: Card title>* · released 2 weeks ago (Feb 5)\n\nAdoption hit 142 unique users...\n\n<https://us.posthog.com/project/161414/insights/abc123|View insight in PostHog>" \
--session-file .agent/release-review/{YYYY-MM-DD}.md
python3 box/play7-post.py slack \
--insight-id 12345 \
--insight-name "SC-123: Feature engagement" \
--channel C0AGD4ZEC6M \
--no-chart \
--comment "..." \
--session-file .agent/release-review/{YYYY-MM-DD}.md
The slack subcommand uses --chart-path to attach a pre-exported chart.
When the chart was detected as blank in step 4a, use --no-chart to post
as text-only via chat.postMessage (no file upload, no fallback export).
All posts go through execute_approved with timeout_ms: 120000. Post
order: intro first, then cards in cohort order (oldest releases first).
No separate Slack content approval. The observation was already approved
in Slack message format in step 4b. Post via execute_approved using the
approved text directly as the --comment argument.
Post-mutation verification: After posting the first Slack item,
pause and narrate what was posted before continuing. This is the highest-risk
transition in the play — completion bias activates here ("analysis is done,
just posting now") and suppresses verification. Narrate each subsequent post
individually. Do not batch-announce and silently execute.
Minimum per-post narration (structural floor — prevents compression toward
timestamp-only citations under batch repetition):
[ID] posted — [script] re-fetched [surface] and confirmed [what exists]. [Next item]:
Example: "SC-1045 posted — play7-post.py re-fetched channel history and confirmed
message with chart exists at ts X. Posting SC-234:"
6. Post review records
For every item examined, post a review record. For cards, this is a Shortcut
comment. For tracked PRs, this is a DB record via play7-post.py pr-review.
The "released/merged N week(s) ago" marker (use proper singular/plural:
"1 week ago," "2 weeks ago") indicates how long post-release this review is.
Type 1: Insufficient data
Release impact review (released [N] week/weeks ago): Reviewed [insight name].
Insufficient data to report — [brief reason, e.g. "12 events total since
release"].
Type 2: No material change
Release impact review (released [N] week/weeks ago): Reviewed [insight name].
No material change since [date of last Slack report].
Type 3: Observation (approved for Slack)
Release impact review (released [N] week/weeks ago): [The approved observation
text]. Reported in #post-release-measurement.
Post each via execute_approved (wraps already-approved observation text in
fixed format):
python3 box/play7-post.py comment \
--card-id 123 \
--review-type 1 \
--text "Release impact review (released 2 weeks ago): Reviewed SC-123: Feature engagement. Insufficient data to report — 8 events total since release." \
--session-file .agent/release-review/{YYYY-MM-DD}.md
python3 box/play7-post.py pr-review \
--pr 2927 --type 1 --week 1 \
--text "Release impact review (merged 1 week ago): Reviewed PR #2927: Pin spacing fix. Insufficient data to report — 3 events total since merge." \
--session-file .agent/release-review/{YYYY-MM-DD}.md
Pass --session-file to all play7-post.py calls. The script appends
receipts after its built-in verification succeeds. The agent does NOT manually
append ship-phase entries. Pass --review-type {1|2|3} on all comment calls
(use the classification → review type mapping from step 3a).
After all posts and comments are complete, append to the session file:
## STATUS: complete ({timestamp})
Tracking PRs
When meaningful work ships via PR without a Shortcut card, register it for
review:
python3 box/tracked-prs.py add --pr NNNN
python3 box/tracked-prs.py add --pr NNNN --title "..." --merged-at YYYY-MM-DD
python3 box/tracked-prs.py attach-insight --pr NNNN --insight-id abc123 \
--insight-url "https://..." --insight-name "PR #NNNN: ..."
python3 box/tracked-prs.py list
python3 box/tracked-prs.py list --due
Entry points: sync-ideas (when a cardless PR is spotted in Slack), or any
session where a shipped PR is noticed. The next Play 7 run picks it up
automatically.
Idempotency
Before each mutation:
- Review records: the gather script includes
already_reviewed_this_week
per item (for cards: scoped to current release cycle via completed_at;
for PRs: checked against tracked_pr_reviews). Skip if true.
- Slack posts: check the channel for a message containing the card ID or
PR number and today's date. If it exists, skip.
- For cards: re-fetch card state and comments from Shortcut before each
operation. For PRs: the DB is the source of truth.
Key behaviors
- An item's four-week review window starts from
completed_at (cards) or
merged_at (PRs). After week 4, the item falls out of scope.
- Cards without Play 6 insights and PRs without attached insights are excluded.
This play reads existing insights, not creates new ones (except at the
checkpoint for missed cards).
- The "interesting observation" judgment is the core value. It's a reasoning
step that considers the item's purpose, the data shape, and what the team
would find useful to know.
- If an item's insight was deleted or inaccessible in PostHog, flag it to the
user rather than silently skipping.