一键导入
release-review
Weekly release impact review — pull PostHog data for Released cards and tracked PRs, classify, draft observations, post to Slack
用 Codex 或 Claude 帮你安装 复制这段 Prompt,粘贴到 Codex、Claude 或其他助手里,让它检查 Skill 页面并帮你完成安装。
菜单
Weekly release impact review — pull PostHog data for Released cards and tracked PRs, classify, draft observations, post to Slack
用 Codex 或 Claude 帮你安装 复制这段 Prompt,粘贴到 Codex、Claude 或其他助手里,让它检查 Skill 页面并帮你完成安装。
Refresh priority signals, manage active epics, and refresh Near Term pool
Multi-session deliverable play for projects spanning 3+ sessions with concrete outputs (proposals, strategies, wireframes). Provides project-level structure, evidence provenance, and cross-session handoff.
Investigation-driven card grooming — investigate across data sources, synthesize findings into card content, present for approval
Use when acting as a reviewer in an AgenTerminal review conversation. Handles both code reviews (REVIEW_APPROVED) and plan reviews (PLAN_APPROVED).
Match Slack
Verify instrumentation, build measurement insights, close Slack loop for Released cards
基于 SOC 职业分类
| name | release-review |
| description | Weekly release impact review — pull PostHog data for Released cards and tracked PRs, classify, draft observations, post to Slack |
| disable-model-invocation | true |
Weekly post-release check-in. For each Released card whose completed_at falls
exactly 7, 14, 21, or 28 days ago, and each tracked PR whose merged_at falls
on the same dates: pull current data from the PostHog insight, classify the
observation, and report findings in #post-release-measurement. Leave a review
comment on every item examined (Shortcut comment for cards, DB record for PRs).
This is the follow-up to Released Cards (Play 6). Play 6 creates the measurement
surface for cards. Tracked PRs (box/tracked-prs.py) are registered manually
when meaningful work ships without a card. Both follow the same review cadence.
Before starting, read these shared sections from box/shortcut-ops.md:
agenterminal.execute_approved or present the
command for the user to run.approve_content before posting.play7-gather.py (Phase 1+2a) and play7-post.py
(Phase 4-5) keep mechanical API work out of primary context. Primary context
is reserved for classification and observation drafting.reference/tooling-logistics.md
for tested recipes. Don't re-derive payload shapes or endpoint paths from scratch.| Constant | Value |
|---|---|
| #post-release-measurement | C0AGD4ZEC6M |
| PostHog project ID | 161414 |
Run daily. Each run picks up cards released on the same day of the week 1-4 weeks ago. A card released on a Tuesday gets checked on the next four Tuesdays. No batching across days of the week.
Monday exception: Monday runs also include the preceding Saturday and Sunday for each lookback window, so weekend releases get covered without requiring a weekend run. A card released on Saturday gets its first review the following Monday (grouped into the week 1 cohort).
$ARGUMENTS is an optional date override (format: YYYY-MM-DD). If omitted,
uses today's date.
Before anything else, determine the run date (from $ARGUMENTS or today)
and check for an existing session file:
ls .agent/release-review/{YYYY-MM-DD}.md 2>/dev/null
If the session file exists for today's run date:
If the user chooses "start fresh": archive the existing session file and all its sidecars:
STAMP=$(date +%s)
mv .agent/release-review/{YYYY-MM-DD}.md .agent/release-review/{YYYY-MM-DD}.bak.$STAMP.md
for f in .agent/release-review/{YYYY-MM-DD}-*; do
[ -e "$f" ] && mv "$f" "$f.bak.$STAMP" 2>/dev/null
done
Then continue to Step 1. The gather script will create a new session file.
If no session file exists, continue to Step 1.
Trustworthy sidecar files on resume: Only read sidecar files (JSON, markdown, PNG) that have a matching receipt in the current session file — path matches and file exists on disk. Any files on disk without a matching receipt in the current session file are stale and must be ignored. If multiple receipts exist for the same path, the last one in file order is authoritative (append-only means file order = time order).
One tool call replaces all API plumbing for Phase 1 and the mechanical parts of Phase 2. Keeps ~20-30K of API response context out of the primary window.
# No arguments: use today's date
python3 box/play7-gather.py --session-file .agent/release-review/{YYYY-MM-DD}.md
# With date override from $ARGUMENTS
python3 box/play7-gather.py --date $ARGUMENTS --session-file .agent/release-review/{YYYY-MM-DD}.md
# Debug mode
python3 box/play7-gather.py --verbose --session-file .agent/release-review/{YYYY-MM-DD}.md
The gather script creates the session file if it doesn't exist and copies the
gather JSON to .agent/release-review/{YYYY-MM-DD}-gather.json.
The script:
completed_at matchtracked_prs DB table for PRs merged on target datestracked_pr_reviews DB tableis_rerelease flag) for cardsEach item in the output has a "source" field: "card" or "pr". PR entries
include "pr_number" and "pr_url" fields. Each item has an "insights" array
(one element for individual insights, many for dashboard-linked cards) and an
optional "dashboard_url" (set when insights came from a dashboard comment).
Output: /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-gather.json with cohorts (grouped items),
excluded list, and summary counts. run_date_formatted gives the day name
for the Slack intro — don't derive it manually.
python3 box/play7-extract.py /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-gather.json --in-scope
Returns per-item JSON records with insight IDs, names, and URLs mapped to
play7-post.py argument names. Field mapping:
insights[].insight_id → --insight-idinsights[].insight_name → --insight-namecohort_week → --week (PR reviews)numeric_id → --card-id (cards)Use this output for the checkpoint (step 2) and to construct play7-post.py
commands in steps 5-6. Add --cohort week_N to filter to a single cohort.
Classification (step 3) still reads the raw gather JSON — the extract
output is a posting bridge, not a replacement for the full gather data.
Items that need current_data, prior_reviews text, or last_slack_report
content for classification judgment should be read from the gather JSON.
Present the item list grouped by cohort (released/merged 1/2/3/4 weeks ago)
with: card ID or PR number, title, completed/merged date, insight link,
already-reviewed status. Skip items flagged already_reviewed_this_week: true.
Tracked PRs show as "PR #NNNN" alongside "SC-NNN" cards.
Excluded cards: Review the excluded list. Cards excluded for "no Play 6
insight" may have been missed by Play 6 rather than intentionally skipped.
Assess each: bug fixes with measurable error rates, features with trackable
adoption, and billing/backend fixes with proxy signals are candidates for
insights. One-off actions (notifications, data backfills) and cosmetic changes
are not. Present excluded cards with assessment alongside in-scope cards.
If the user confirms any excluded cards should be added, create insights (mini Play 6: test the query, save the insight, post the "Release impact insight:" comment) and add those cards to the batch before proceeding.
Write scope to the session file. After the user confirms the item list, append to the session file:
## Scope confirmed ({timestamp})
In-scope: {comma-separated item IDs}
Excluded: {item ID (reason), ...}
## STATUS: classifying ({timestamp})
This is the reasoning step. For each item (card or PR) in scope, classify into one of three outcomes:
last_slack_report. Gets a type-2 comment, no Slack post.Suspicious data gate: If any insight data looks wrong during classification (zero activity where there should be volume, numbers that don't match known product behavior, contradictions between related insights), stop and investigate before proceeding. Check the insight's query definition against the codebase to verify event names and property filters. Fix broken insights before classifying. Don't draft observations on data you don't trust — the normal posting flow resumes after the data is verified or corrected. Proved: Mar 23 session found a dashboard insight tracking nonexistent events.
The "no material change" comparison is always against the last Slack report (type-3 comment), not the last card comment of any type. A card can accumulate multiple type-1 or type-2 comments between Slack reports. It only comes back to Slack when something actually changed relative to the last time it appeared in the channel.
The gather script returns last_slack_report with the full text of the most
recent type-3 comment. Read it. Week 1 has no prior Slack report, so it's
always either insufficient data or interesting observation.
Dashboard-linked cards: When dashboard_url is set, the item has multiple
insights from a full dashboard (e.g. split tests). Classify based on the
overall picture across all insights, not per-insight. The Slack observation
should synthesize the dashboard — link to the dashboard URL, not individual
insights. Use play7-post.py slack with the first insight's ID for the chart
image (pick the most representative one).
Present each classification with a one-line rationale before writing any
drafts. Minimum format per item: SC-NNN: Title: [classification] — [why]. For
items with prior Slack reports, state what changed or didn't relative to
the last report. Wait for user confirmation before proceeding to step 4.
This is a gate, not a status update. The classification reasoning is the core judgment step — if the user can't see and redirect individual classifications, the control loop is broken for the entire drafting phase.
Product state changes. If any card in scope represents a sunset, rename,
feature flag change, or new feature launch, update .claude/rules/product-knowledge.md
so future sessions have the correct product state. This is an ADD trigger —
the release review is where product facts change.
Write classifications to the session file before advancing to Step 4. Do not proceed to drafting until this write is complete. Append:
## Classifications ({timestamp})
| Item | Cohort | Classification | Rationale |
|---|---|---|---|
| {item_id} | {cohort} | {classification} | {one-line rationale} |
...
## STATUS: drafting ({timestamp})
This is a hard gate. A fresh agent reading this file can skip the entire classification phase and proceed directly to drafting based on these decisions. This is the highest-value recovery artifact.
Classification → review type mapping (deterministic, used for both
play7-post.py comment --review-type and resume interpretation):
| Final classification | Review type | Slack post? |
|---|---|---|
interesting_observation + approved | 3 | Yes |
insufficient_data | 1 | No |
no_material_change | 2 | No |
For each card classified as "interesting observation":
Draft in full Slack message format — the approved text is posted directly to Slack with no reformatting:
Cards:
*<shortcut_url|SC-NNN: Card title>* · released [N] week/weeks ago ([Mon Day])
[TLDR — one sentence, the headline takeaway. Analytical, leads with the
conclusion, not the data.]
• [Supporting detail 1 — specific number with time window, comparison, or trend]
• [Supporting detail 2]
• [Supporting detail 3, if needed]
<posthog_insight_url|View insight in PostHog>
Tracked PRs:
*<pr_url|PR #NNNN: PR title>* · merged [N] week/weeks ago ([Mon Day])
[TLDR — one sentence, the headline takeaway.]
• [Supporting detail 1]
• [Supporting detail 2]
<posthog_insight_url|View insight in PostHog>
Dashboard-linked cards use the dashboard URL in the footer instead of an individual insight URL.
Use • for bullets. Include specific numbers with time windows. The TLDR
is what you'd say if someone asked "so what happened with that feature?" —
lead with the conclusion, not the data. Tone varies per card: positive,
analytical, neutral, whatever fits.
This is the reasoning step. Not "the number went up." What does the data mean in context? Is adoption tracking expectations? Any patterns worth investigating? Is the feature solving the described problem?
Don't narrativize noisy data. Small week-over-week volume changes don't tell a story. Low per-user counts are ambiguous (not evidence of retention or abandonment). Spikes need user-level investigation before conclusions. Resist constructing narratives ("rebound," "filling a workflow gap," "recurring pattern") that read well but aren't supported by the numbers.
Strip process language. Slack output should read like a product performance update, not an internal process artifact. The audience cares about what happened and what it means — not how we track it or where a card sits in a review lifecycle. Specifically, never include:
Resolve answerable questions before drafting. When a draft would contain uncertainty that could be resolved from primary sources (codebase, database, PostHog), resolve it before including it. "Either X or Y" when reading a file would tell you which one is a draft quality failure. Read the code, query the data, then write the draft.
Write all drafts to /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-drafts.md,
one section per item (e.g. ## SC-562 or ## PR #2927), with the
full Slack message text (header + observation + footer) under each
header. Include the classification for each item.
Copy the drafts file to the session directory:
cp /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-drafts.md .agent/release-review/{YYYY-MM-DD}-drafts.md
Then append to the session file:
## Drafts ({timestamp})
Path: .agent/release-review/{YYYY-MM-DD}-drafts.md
Items drafted: {comma-separated item IDs}
Two things happen in parallel while the user isn't waiting:
Chart export: For each item classified as "interesting observation," export the chart PNG and check for blank rendering:
python3 box/play7-post.py export-chart --insight-id NNNNN --session-file .agent/release-review/{YYYY-MM-DD}.md
When --session-file is set, the script writes the PNG directly to
.agent/release-review/play7-insight-{insight_id}.png (durable path) and
appends a per-chart receipt to the session file.
Returns {"ok": true, "path": "...", "is_blank": true/false}. For
dashboard-linked cards, export the representative insight's chart.
Record which charts are blank — these items will be posted as text-only
in Phase 5 (no chart attachment).
Quality review: Delegate a review agent to check all drafts against quality criteria. This catches systematic issues (narrativizing, unresolved questions, unsupported claims) before they reach the user's approval queue.
Delegate with agent: "claude", model: "claude-sonnet-4-6".
Don't set timeout_ms — the default is the maximum (30 min).
Pass session recovery parameters on the delegate call:
session_file: ".agent/release-review/{YYYY-MM-DD}.md"
session_label: "quality-review"
save_path: ".agent/release-review/{YYYY-MM-DD}-quality-review.json"
When the delegate push arrives, verify saved_to= and append the result
saved/failed receipt to the session file (same pattern as fill-cards).
Pass the drafts file and the gather JSON (for fact-checking numbers
against source data). Include the review criteria verbatim in the prompt.
Review criteria (include in the delegate prompt):
Narrativizing: Does any draft construct a causal story not supported by the data? Look for: attributing user motivation ("when users plan their week"), framing noise as signal ("rebound," "filling a workflow gap"), interpreting ambiguous patterns as trends. The fix is to state the observation without the causal frame.
Unresolved answerable questions: Does any draft present uncertainty ("either X or Y," "suggests," "may indicate") for something that could be resolved by reading code or querying data? If the codebase or database could answer the question, flag it. The primary agent needs to do the investigation and revise — the reviewer just identifies the gaps.
Unsupported causal claims: Does any draft claim one thing caused another without evidence? Correlation language ("consistent with") is acceptable; causal language ("because," "driven by," "due to") requires supporting evidence in the draft.
Process language: Does any draft reference internal process — review cadence, play numbers, week labels, "exits the review window," or any framing that requires understanding the 4-week review system to parse?
Delegate output format: For each draft, return either PASS or a list
of issues. Each issue should quote the problematic text and name the
criterion violated. Also verify that specific numbers in the drafts match
the source data in the gather JSON.
After collecting the review: First, re-read the gather JSON (or the data sections for each flagged card) and verify each reviewer finding against the source data before applying any edits. At least one numerical claim must be verified via a Read or Bash call against the gather JSON — "I remember X" is not verification even when the data was recently in context. Proved 2026-04-21: agent claimed spot-check of all 8 findings without a Read call; monitor caught at high severity. Proved 2026-04-24: all spot-checks performed from context memory; numbers correct but method was proxy trust on own recall.
Then revise any drafts that received issues. If a flagged issue requires primary source investigation (criterion 2), do the investigation (read code, query data), then revise. Don't soften the language — resolve the underlying question. If a criterion-1 flag is accurate, cut the narrative framing and state the observation directly.
Present each revised draft via approve_content with
content_type: "observation", filename: "scNNN-observation-wkN" (or
"prNNNN-observation-wkN" for PRs). One at a time. The user may edit,
change classification, or skip.
The approved text is the exact Slack message — no separate Slack message approval step. It also serves as the source for Shortcut comments (Phase 6 extracts the observation body, stripping the Slack header and footer).
The saved file is compaction insurance — read the saved_path to recover
approved text after compaction.
Log every approval outcome to the session file. Only
interesting_observation items reach step 4b. Items classified as
insufficient_data or no_material_change are terminally dispositioned
by the Classifications table — they skip 4b and proceed to type-1/type-2
comments in step 6.
For each interesting_observation item presented, append one of:
Approved:
### {item_id} approved ({timestamp})
Path: {saved_path}
Reclassified (user decides this isn't an interesting observation):
### {item_id} reclassified ({timestamp})
Was: interesting_observation -> Now: {insufficient_data|no_material_change}
Reason: {user's reason}
After all interesting_observation items have been dispositioned, append:
## STATUS: approved ({timestamp})
Execution order: Post Slack report (Phase 5) first, then Shortcut review comments (Phase 6). Slack is the audience-facing surface; card comments are the trailing record.
Systematic pass across all reviewed items: "Is there instrumentation missing that would make the next review cycle more useful?"
For each item, check:
Present confirmed gaps as a consolidated list. Each gap that warrants a fix is a potential story — post to #ideas (one thread: parent summary + one reply per gap) where sync-ideas will pick them up. Lower-priority enrichment opportunities (property additions, breakdown expansions) can be noted but don't need #ideas posts unless they're blocking a question the team is actively asking.
Before posting anything, present the full batch plan to the user:
"Entering ship phase. [N] Slack messages (1 intro + [N-1] observations) and [M] Shortcut/PR review comments planned.
Slack: each post verified via channel history read-back (play7-post.py). Shortcut: each comment verified via card comment re-fetch (play7-post.py). PR reviews: each recorded to DB via play7-post.py pr-review.
Proceeding with posting. First: Slack intro."
Wait for user acknowledgment before the first execute_approved call. This
is the analytical-to-mechanical boundary — the moment where verification,
narration, and per-item judgment historically degrade. The pause makes the
batch size visible and gives the user an opportunity to steer pacing before
momentum builds.
Run python3 box/ship-gate.py enter before the first production
execute_approved call (the hook will block production mutations until the
plan is declared).
Append to the session file:
## STATUS: shipping ({timestamp})
If zero cards classified as interesting observation, skip the Slack report entirely. No intro, no card messages. Card comments (Phase 6) still get posted.
Intro message:
:bar_chart: Release impact review — [Day of week], [Month Day, Year]
[N] items reviewed from the last 4 weeks ([X] cards, [Y] tracked PRs).
[M] with updates worth reporting today.
Use run_date_formatted from the gather script for the day name.
python3 box/play7-post.py slack-intro \
--channel C0AGD4ZEC6M \
--text ":bar_chart: Release impact review — Wednesday, February 19, 2026\n\n4 released cards reviewed from the last 4 weeks.\n2 with updates worth reporting today." \
--session-file .agent/release-review/{YYYY-MM-DD}.md
Per-item message:
Each item uses the approved observation text from step 4b directly — no
reformatting. Items with non-blank charts are posted as file uploads
(chart appears inline below the text). Items with blank or failed charts
use --no-chart for text-only posting.
# With pre-exported chart (non-blank) — use durable path from export-chart:
python3 box/play7-post.py slack \
--insight-id 12345 \
--insight-name "SC-123: Feature engagement" \
--channel C0AGD4ZEC6M \
--chart-path .agent/release-review/play7-insight-12345.png \
--comment "*<https://app.shortcut.com/tailwind/story/123|SC-123: Card title>* · released 2 weeks ago (Feb 5)\n\nAdoption hit 142 unique users...\n\n<https://us.posthog.com/project/161414/insights/abc123|View insight in PostHog>" \
--session-file .agent/release-review/{YYYY-MM-DD}.md
# Without chart (blank or export failed — use --no-chart):
python3 box/play7-post.py slack \
--insight-id 12345 \
--insight-name "SC-123: Feature engagement" \
--channel C0AGD4ZEC6M \
--no-chart \
--comment "..." \
--session-file .agent/release-review/{YYYY-MM-DD}.md
The slack subcommand uses --chart-path to attach a pre-exported chart.
When the chart was detected as blank in step 4a, use --no-chart to post
as text-only via chat.postMessage (no file upload, no fallback export).
All posts go through execute_approved with timeout_ms: 120000. Post
order: intro first, then cards in cohort order (oldest releases first).
No separate Slack content approval. The observation was already approved
in Slack message format in step 4b. Post via execute_approved using the
approved text directly as the --comment argument.
Post-mutation verification: After posting the first Slack item, pause and narrate what was posted before continuing. This is the highest-risk transition in the play — completion bias activates here ("analysis is done, just posting now") and suppresses verification. Narrate each subsequent post individually. Do not batch-announce and silently execute.
Minimum per-post narration (structural floor — prevents compression toward timestamp-only citations under batch repetition):
[ID] posted — [script] re-fetched [surface] and confirmed [what exists]. [Next item]:
Example: "SC-1045 posted — play7-post.py re-fetched channel history and confirmed message with chart exists at ts X. Posting SC-234:"
For every item examined, post a review record. For cards, this is a Shortcut
comment. For tracked PRs, this is a DB record via play7-post.py pr-review.
The "released/merged N week(s) ago" marker (use proper singular/plural:
"1 week ago," "2 weeks ago") indicates how long post-release this review is.
Type 1: Insufficient data
Release impact review (released [N] week/weeks ago): Reviewed [insight name].
Insufficient data to report — [brief reason, e.g. "12 events total since
release"].
Type 2: No material change
Release impact review (released [N] week/weeks ago): Reviewed [insight name].
No material change since [date of last Slack report].
Type 3: Observation (approved for Slack)
Release impact review (released [N] week/weeks ago): [The approved observation
text]. Reported in #post-release-measurement.
Post each via execute_approved (wraps already-approved observation text in
fixed format):
# For cards:
python3 box/play7-post.py comment \
--card-id 123 \
--review-type 1 \
--text "Release impact review (released 2 weeks ago): Reviewed SC-123: Feature engagement. Insufficient data to report — 8 events total since release." \
--session-file .agent/release-review/{YYYY-MM-DD}.md
# For tracked PRs:
python3 box/play7-post.py pr-review \
--pr 2927 --type 1 --week 1 \
--text "Release impact review (merged 1 week ago): Reviewed PR #2927: Pin spacing fix. Insufficient data to report — 3 events total since merge." \
--session-file .agent/release-review/{YYYY-MM-DD}.md
Pass --session-file to all play7-post.py calls. The script appends
receipts after its built-in verification succeeds. The agent does NOT manually
append ship-phase entries. Pass --review-type {1|2|3} on all comment calls
(use the classification → review type mapping from step 3a).
After all posts and comments are complete, append to the session file:
## STATUS: complete ({timestamp})
When meaningful work ships via PR without a Shortcut card, register it for review:
# Auto-fetches title and merge date from GitHub:
python3 box/tracked-prs.py add --pr NNNN
# With manual values (e.g. PR not yet merged):
python3 box/tracked-prs.py add --pr NNNN --title "..." --merged-at YYYY-MM-DD
# Attach a PostHog insight:
python3 box/tracked-prs.py attach-insight --pr NNNN --insight-id abc123 \
--insight-url "https://..." --insight-name "PR #NNNN: ..."
# List all tracked PRs:
python3 box/tracked-prs.py list
# List PRs due for review today:
python3 box/tracked-prs.py list --due
Entry points: sync-ideas (when a cardless PR is spotted in Slack), or any session where a shipped PR is noticed. The next Play 7 run picks it up automatically.
Before each mutation:
already_reviewed_this_week
per item (for cards: scoped to current release cycle via completed_at;
for PRs: checked against tracked_pr_reviews). Skip if true.completed_at (cards) or
merged_at (PRs). After week 4, the item falls out of scope.