一键在 Manus 中运行任何 Skill

release-review

星标2

分支0

更新时间2026年5月19日 13:47

Weekly release impact review — pull PostHog data for Released cards and tracked PRs, classify, draft observations, post to Slack

安装

用 Codex 或 Claude 帮你安装复制这段 Prompt，粘贴到 Codex、Claude 或其他助手里，让它检查 Skill 页面并帮你完成安装。

在 Manus 中运行

来源

paulyokota

paulyokota/FeedForward

打开 GitHub 仓库查看创作者相关仓库

下载

在 Manus 中运行

Release Review

Weekly post-release check-in. For each Released card whose completed_at falls exactly 7, 14, 21, or 28 days ago, and each tracked PR whose merged_at falls on the same dates: pull current data from the PostHog insight, classify the observation, and report findings in #post-release-measurement. Leave a review comment on every item examined (Shortcut comment for cards, DB record for PRs).

This is the follow-up to Released Cards (Play 6). Play 6 creates the measurement surface for cards. Tracked PRs (box/tracked-prs.py) are registered manually when meaningful work ships without a card. Both follow the same review cadence.

Constraints

Mutation gate: A PreToolUse hook blocks all Slack and Shortcut mutations through Bash. Route through agenterminal.execute_approved or present the command for the user to run.
Human-in-the-loop: Present scoped card list at checkpoint (end of Phase 1). Present each observation draft via approve_content before posting.
Context protection: play7-gather.py (Phase 1+2a) and play7-post.py (Phase 4-5) keep mechanical API work out of primary context. Primary context is reserved for classification and observation drafting.
Before Shortcut or Slack API calls: check reference/tooling-logistics.md for tested recipes. Don't re-derive payload shapes or endpoint paths from scratch.

Constants

Constant	Value
#post-release-measurement	`C0AGD4ZEC6M`
PostHog project ID	`161414`

Cadence

Run daily. Each run picks up cards released on the same day of the week 1-4 weeks ago. A card released on a Tuesday gets checked on the next four Tuesdays. No batching across days of the week.

Monday exception: Monday runs also include the preceding Saturday and Sunday for each lookback window, so weekend releases get covered without requiring a weekend run. A card released on Saturday gets its first review the following Monday (grouped into the week 1 cohort).

Arguments

$ARGUMENTS is an optional date override (format: YYYY-MM-DD). If omitted, uses today's date.

Steps

0. Resume check

Before anything else, determine the run date (from $ARGUMENTS or today) and check for an existing session file:

ls .agent/release-review/{YYYY-MM-DD}.md 2>/dev/null

If the session file exists for today's run date:

Read it in full.
Report current state to the user based only on what is written in the session file. Name which steps completed, which items are in scope, what classifications were made, which drafts were approved, and which posts have been sent.
Ask whether to resume from that point or start fresh.
Wait for explicit direction before proceeding.

If the user chooses "start fresh": archive the existing session file and all its sidecars:

STAMP=$(date +%s)
mv .agent/release-review/{YYYY-MM-DD}.md .agent/release-review/{YYYY-MM-DD}.bak.$STAMP.md
for f in .agent/release-review/{YYYY-MM-DD}-*; do
  [ -e "$f" ] && mv "$f" "$f.bak.$STAMP" 2>/dev/null
done

Then continue to Step 1. The gather script will create a new session file.

If no session file exists, continue to Step 1.

Trustworthy sidecar files on resume: Only read sidecar files (JSON, markdown, PNG) that have a matching receipt in the current session file — path matches and file exists on disk. Any files on disk without a matching receipt in the current session file are stale and must be ignored. If multiple receipts exist for the same path, the last one in file order is authoritative (append-only means file order = time order).

1. Gather data (script)

One tool call replaces all API plumbing for Phase 1 and the mechanical parts of Phase 2. Keeps ~20-30K of API response context out of the primary window.

# No arguments: use today's date
python3 box/play7-gather.py --session-file .agent/release-review/{YYYY-MM-DD}.md

# With date override from $ARGUMENTS
python3 box/play7-gather.py --date $ARGUMENTS --session-file .agent/release-review/{YYYY-MM-DD}.md

# Debug mode
python3 box/play7-gather.py --verbose --session-file .agent/release-review/{YYYY-MM-DD}.md

The gather script creates the session file if it doesn't exist and copies the gather JSON to .agent/release-review/{YYYY-MM-DD}-gather.json.

The script:

Computes target dates per cohort (today minus 7/14/21/28 days; Monday includes Saturday/Sunday for each window)
Queries Shortcut for Released stories, filters by exact completed_at match
Queries tracked_prs DB table for PRs merged on target dates
For cards: fetches comments, extracts Play 6 insight links, identifies prior Play 7 review comments
For PRs: fetches prior reviews from tracked_pr_reviews DB table
Resolves PostHog insight IDs and fetches current data (blocking refresh)
Flags items already reviewed at this week number (idempotency)
Detects re-releases (is_rerelease flag) for cards
Excludes cards without Play 6 insights and PRs without attached insights

Each item in the output has a "source" field: "card" or "pr". PR entries include "pr_number" and "pr_url" fields. Each item has an "insights" array (one element for individual insights, many for dashboard-linked cards) and an optional "dashboard_url" (set when insights came from a dashboard comment).

Output: /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-gather.json with cohorts (grouped items), excluded list, and summary counts. run_date_formatted gives the day name for the Slack intro — don't derive it manually.

1a. Extract posting plan

python3 box/play7-extract.py /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-gather.json --in-scope

Returns per-item JSON records with insight IDs, names, and URLs mapped to play7-post.py argument names. Field mapping:

insights[].insight_id → --insight-id
insights[].insight_name → --insight-name
cohort_week → --week (PR reviews)
numeric_id → --card-id (cards)

Use this output for the checkpoint (step 2) and to construct play7-post.py commands in steps 5-6. Add --cohort week_N to filter to a single cohort.

Classification (step 3) still reads the raw gather JSON — the extract output is a posting bridge, not a replacement for the full gather data. Items that need current_data, prior_reviews text, or last_slack_report content for classification judgment should be read from the gather JSON.

2. Checkpoint

Present the item list grouped by cohort (released/merged 1/2/3/4 weeks ago) with: card ID or PR number, title, completed/merged date, insight link, already-reviewed status. Skip items flagged already_reviewed_this_week: true. Tracked PRs show as "PR #NNNN" alongside "SC-NNN" cards.

Excluded cards: Review the excluded list. Cards excluded for "no Play 6 insight" may have been missed by Play 6 rather than intentionally skipped. Assess each: bug fixes with measurable error rates, features with trackable adoption, and billing/backend fixes with proxy signals are candidates for insights. One-off actions (notifications, data backfills) and cosmetic changes are not. Present excluded cards with assessment alongside in-scope cards.

If the user confirms any excluded cards should be added, create insights (mini Play 6: test the query, save the insight, post the "Release impact insight:" comment) and add those cards to the batch before proceeding.

Write scope to the session file. After the user confirms the item list, append to the session file:

## Scope confirmed ({timestamp})
In-scope: {comma-separated item IDs}
Excluded: {item ID (reason), ...}

## STATUS: classifying ({timestamp})

3. Classify (primary context)

This is the reasoning step. For each item (card or PR) in scope, classify into one of three outcomes:

Insufficient data: Not enough volume or time elapsed. Fewer than ~10 events total, or the insight shows zero activity. Gets a type-1 comment, no Slack post.
No material change: There is data, but the numbers, trend direction, and notable patterns look materially the same as the last Slack report for this item. Compare current insight values against the observation text in last_slack_report. Gets a type-2 comment, no Slack post.
Interesting observation: Something worth reporting. New trend, notable adoption, unexpected pattern, drop-off, anomaly, or enough data to draw a new conclusion. Gets a type-3 comment and Slack post (after approval).

Suspicious data gate: If any insight data looks wrong during classification (zero activity where there should be volume, numbers that don't match known product behavior, contradictions between related insights), stop and investigate before proceeding. Check the insight's query definition against the codebase to verify event names and property filters. Fix broken insights before classifying. Don't draft observations on data you don't trust — the normal posting flow resumes after the data is verified or corrected. Proved: Mar 23 session found a dashboard insight tracking nonexistent events.

The "no material change" comparison is always against the last Slack report (type-3 comment), not the last card comment of any type. A card can accumulate multiple type-1 or type-2 comments between Slack reports. It only comes back to Slack when something actually changed relative to the last time it appeared in the channel.

The gather script returns last_slack_report with the full text of the most recent type-3 comment. Read it. Week 1 has no prior Slack report, so it's always either insufficient data or interesting observation.

Dashboard-linked cards: When dashboard_url is set, the item has multiple insights from a full dashboard (e.g. split tests). Classify based on the overall picture across all insights, not per-insight. The Slack observation should synthesize the dashboard — link to the dashboard URL, not individual insights. Use play7-post.py slack with the first insight's ID for the chart image (pick the most representative one).

3a. Classification checkpoint

Present each classification with a one-line rationale before writing any drafts. Minimum format per item: SC-NNN: Title: [classification] — [why]. For items with prior Slack reports, state what changed or didn't relative to the last report. Wait for user confirmation before proceeding to step 4.

This is a gate, not a status update. The classification reasoning is the core judgment step — if the user can't see and redirect individual classifications, the control loop is broken for the entire drafting phase.

Product state changes. If any card in scope represents a sunset, rename, feature flag change, or new feature launch, update .claude/rules/product-knowledge.md so future sessions have the correct product state. This is an ADD trigger — the release review is where product facts change.

Write classifications to the session file before advancing to Step 4. Do not proceed to drafting until this write is complete. Append:

## Classifications ({timestamp})
| Item | Cohort | Classification | Rationale |
|---|---|---|---|
| {item_id} | {cohort} | {classification} | {one-line rationale} |
...

## STATUS: drafting ({timestamp})

This is a hard gate. A fresh agent reading this file can skip the entire classification phase and proceed directly to drafting based on these decisions. This is the highest-value recovery artifact.

Classification → review type mapping (deterministic, used for both play7-post.py comment --review-type and resume interpretation):

Final classification	Review type	Slack post?
`interesting_observation` + approved	3	Yes
`insufficient_data`	1	No
`no_material_change`	2	No

4. Draft observations

For each card classified as "interesting observation":

Draft in full Slack message format — the approved text is posted directly to Slack with no reformatting:

Cards:

*<shortcut_url|SC-NNN: Card title>*  · released [N] week/weeks ago ([Mon Day])

[TLDR — one sentence, the headline takeaway. Analytical, leads with the
conclusion, not the data.]

• [Supporting detail 1 — specific number with time window, comparison, or trend]
• [Supporting detail 2]
• [Supporting detail 3, if needed]

<posthog_insight_url|View insight in PostHog>

Tracked PRs:

*<pr_url|PR #NNNN: PR title>*  · merged [N] week/weeks ago ([Mon Day])

[TLDR — one sentence, the headline takeaway.]

• [Supporting detail 1]
• [Supporting detail 2]

<posthog_insight_url|View insight in PostHog>

Dashboard-linked cards use the dashboard URL in the footer instead of an individual insight URL.

Use • for bullets. Include specific numbers with time windows. The TLDR is what you'd say if someone asked "so what happened with that feature?" — lead with the conclusion, not the data. Tone varies per card: positive, analytical, neutral, whatever fits.

This is the reasoning step. Not "the number went up." What does the data mean in context? Is adoption tracking expectations? Any patterns worth investigating? Is the feature solving the described problem?
Don't narrativize noisy data. Small week-over-week volume changes don't tell a story. Low per-user counts are ambiguous (not evidence of retention or abandonment). Spikes need user-level investigation before conclusions. Resist constructing narratives ("rebound," "filling a workflow gap," "recurring pattern") that read well but aren't supported by the numbers.
Strip process language. Slack output should read like a product performance update, not an internal process artifact. The audience cares about what happened and what it means — not how we track it or where a card sits in a review lifecycle. Specifically, never include:
- Review cadence references ("final automated review," "week 4 review," "exits the review window," "this card's last check")
- Play numbers, cohort week labels
- Any framing that implies the reader needs to understand our 4-week review window or internal process to parse the message State facts directly. "The instrumentation is catching anomalies" not "This is the final automated review (week 4). The instrumentation is catching anomalies."
Resolve answerable questions before drafting. When a draft would contain uncertainty that could be resolved from primary sources (codebase, database, PostHog), resolve it before including it. "Either X or Y" when reading a file would tell you which one is a draft quality failure. Read the code, query the data, then write the draft.
Write all drafts to /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-drafts.md, one section per item (e.g. ## SC-562 or ## PR #2927), with the full Slack message text (header + observation + footer) under each header. Include the classification for each item.

Copy the drafts file to the session directory:

cp /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-drafts.md .agent/release-review/{YYYY-MM-DD}-drafts.md

Then append to the session file:

## Drafts ({timestamp})
Path: .agent/release-review/{YYYY-MM-DD}-drafts.md
Items drafted: {comma-separated item IDs}

4a. Quality review + chart export (parallel)

Two things happen in parallel while the user isn't waiting:

Chart export: For each item classified as "interesting observation," export the chart PNG and check for blank rendering:

python3 box/play7-post.py export-chart --insight-id NNNNN --session-file .agent/release-review/{YYYY-MM-DD}.md

When --session-file is set, the script writes the PNG directly to .agent/release-review/play7-insight-{insight_id}.png (durable path) and appends a per-chart receipt to the session file.

Returns {"ok": true, "path": "...", "is_blank": true/false}. For dashboard-linked cards, export the representative insight's chart. Record which charts are blank — these items will be posted as text-only in Phase 5 (no chart attachment).

Quality review: Delegate a review agent to check all drafts against quality criteria. This catches systematic issues (narrativizing, unresolved questions, unsupported claims) before they reach the user's approval queue.

Delegate with agent: "claude", model: "claude-sonnet-4-6". Don't set timeout_ms — the default is the maximum (30 min).

Pass session recovery parameters on the delegate call:

session_file: ".agent/release-review/{YYYY-MM-DD}.md"
session_label: "quality-review"
save_path: ".agent/release-review/{YYYY-MM-DD}-quality-review.json"

When the delegate push arrives, verify saved_to= and append the result saved/failed receipt to the session file (same pattern as fill-cards). Pass the drafts file and the gather JSON (for fact-checking numbers against source data). Include the review criteria verbatim in the prompt.

Review criteria (include in the delegate prompt):

Narrativizing: Does any draft construct a causal story not supported by the data? Look for: attributing user motivation ("when users plan their week"), framing noise as signal ("rebound," "filling a workflow gap"), interpreting ambiguous patterns as trends. The fix is to state the observation without the causal frame.
Unresolved answerable questions: Does any draft present uncertainty ("either X or Y," "suggests," "may indicate") for something that could be resolved by reading code or querying data? If the codebase or database could answer the question, flag it. The primary agent needs to do the investigation and revise — the reviewer just identifies the gaps.
Unsupported causal claims: Does any draft claim one thing caused another without evidence? Correlation language ("consistent with") is acceptable; causal language ("because," "driven by," "due to") requires supporting evidence in the draft.
Process language: Does any draft reference internal process — review cadence, play numbers, week labels, "exits the review window," or any framing that requires understanding the 4-week review system to parse?

Delegate output format: For each draft, return either PASS or a list of issues. Each issue should quote the problematic text and name the criterion violated. Also verify that specific numbers in the drafts match the source data in the gather JSON.

After collecting the review: First, re-read the gather JSON (or the data sections for each flagged card) and verify each reviewer finding against the source data before applying any edits. At least one numerical claim must be verified via a Read or Bash call against the gather JSON — "I remember X" is not verification even when the data was recently in context. Proved 2026-04-21: agent claimed spot-check of all 8 findings without a Read call; monitor caught at high severity. Proved 2026-04-24: all spot-checks performed from context memory; numbers correct but method was proxy trust on own recall.

Then revise any drafts that received issues. If a flagged issue requires primary source investigation (criterion 2), do the investigation (read code, query data), then revise. Don't soften the language — resolve the underlying question. If a criterion-1 flag is accurate, cut the narrative framing and state the observation directly.

4b. Approve observations

Present each revised draft via approve_content with content_type: "observation", filename: "scNNN-observation-wkN" (or "prNNNN-observation-wkN" for PRs). One at a time. The user may edit, change classification, or skip.

The approved text is the exact Slack message — no separate Slack message approval step. It also serves as the source for Shortcut comments (Phase 6 extracts the observation body, stripping the Slack header and footer).

The saved file is compaction insurance — read the saved_path to recover approved text after compaction.

Log every approval outcome to the session file. Only interesting_observation items reach step 4b. Items classified as insufficient_data or no_material_change are terminally dispositioned by the Classifications table — they skip 4b and proceed to type-1/type-2 comments in step 6.

For each interesting_observation item presented, append one of:

Approved:

### {item_id} approved ({timestamp})
Path: {saved_path}

Reclassified (user decides this isn't an interesting observation):

### {item_id} reclassified ({timestamp})
Was: interesting_observation -> Now: {insufficient_data|no_material_change}
Reason: {user's reason}

After all interesting_observation items have been dispositioned, append:

## STATUS: approved ({timestamp})

Execution order: Post Slack report (Phase 5) first, then Shortcut review comments (Phase 6). Slack is the audience-facing surface; card comments are the trailing record.

4c. Tracking gaps assessment

Systematic pass across all reviewed items: "Is there instrumentation missing that would make the next review cycle more useful?"

For each item, check:

Are all planned events actually firing? (Compare insight description against codebase emit calls.)
Does the insight have the right denominator? (Toggle counts without session counts can't produce rates.)
Can the insight answer the card's original question? (A mechanism-fires event without downstream conversion tracking can't measure goal achievement.)
Are there known failure paths that bypass instrumentation? (Client-side timeouts that don't fire server events.)

Present confirmed gaps as a consolidated list. Each gap that warrants a fix is a potential story — post to #ideas (one thread: parent summary + one reply per gap) where sync-ideas will pick them up. Lower-priority enrichment opportunities (property additions, breakdown expansions) can be noted but don't need #ideas posts unless they're blocking a question the team is actively asking.

4d. Ship-phase checkpoint

Before posting anything, present the full batch plan to the user:

"Entering ship phase. [N] Slack messages (1 intro + [N-1] observations) and [M] Shortcut/PR review comments planned.

Slack: each post verified via channel history read-back (play7-post.py). Shortcut: each comment verified via card comment re-fetch (play7-post.py). PR reviews: each recorded to DB via play7-post.py pr-review.

Proceeding with posting. First: Slack intro."

Wait for user acknowledgment before the first execute_approved call. This is the analytical-to-mechanical boundary — the moment where verification, narration, and per-item judgment historically degrade. The pause makes the batch size visible and gives the user an opportunity to steer pacing before momentum builds.

Run python3 box/ship-gate.py enter before the first production execute_approved call (the hook will block production mutations until the plan is declared).

Append to the session file:

## STATUS: shipping ({timestamp})

5. Post Slack report

If zero cards classified as interesting observation, skip the Slack report entirely. No intro, no card messages. Card comments (Phase 6) still get posted.

Intro message:

:bar_chart: Release impact review — [Day of week], [Month Day, Year]

[N] items reviewed from the last 4 weeks ([X] cards, [Y] tracked PRs).
[M] with updates worth reporting today.

Use run_date_formatted from the gather script for the day name.

python3 box/play7-post.py slack-intro \
  --channel C0AGD4ZEC6M \
  --text ":bar_chart: Release impact review — Wednesday, February 19, 2026\n\n4 released cards reviewed from the last 4 weeks.\n2 with updates worth reporting today." \
  --session-file .agent/release-review/{YYYY-MM-DD}.md

Per-item message:

Each item uses the approved observation text from step 4b directly — no reformatting. Items with non-blank charts are posted as file uploads (chart appears inline below the text). Items with blank or failed charts use --no-chart for text-only posting.

# With pre-exported chart (non-blank) — use durable path from export-chart:
python3 box/play7-post.py slack \
  --insight-id 12345 \
  --insight-name "SC-123: Feature engagement" \
  --channel C0AGD4ZEC6M \
  --chart-path .agent/release-review/play7-insight-12345.png \
  --comment "*<https://app.shortcut.com/tailwind/story/123|SC-123: Card title>*  · released 2 weeks ago (Feb 5)\n\nAdoption hit 142 unique users...\n\n<https://us.posthog.com/project/161414/insights/abc123|View insight in PostHog>" \
  --session-file .agent/release-review/{YYYY-MM-DD}.md

# Without chart (blank or export failed — use --no-chart):
python3 box/play7-post.py slack \
  --insight-id 12345 \
  --insight-name "SC-123: Feature engagement" \
  --channel C0AGD4ZEC6M \
  --no-chart \
  --comment "..." \
  --session-file .agent/release-review/{YYYY-MM-DD}.md

The slack subcommand uses --chart-path to attach a pre-exported chart. When the chart was detected as blank in step 4a, use --no-chart to post as text-only via chat.postMessage (no file upload, no fallback export).

All posts go through execute_approved with timeout_ms: 120000. Post order: intro first, then cards in cohort order (oldest releases first).

No separate Slack content approval. The observation was already approved in Slack message format in step 4b. Post via execute_approved using the approved text directly as the --comment argument.

Post-mutation verification: After posting the first Slack item, pause and narrate what was posted before continuing. This is the highest-risk transition in the play — completion bias activates here ("analysis is done, just posting now") and suppresses verification. Narrate each subsequent post individually. Do not batch-announce and silently execute.

Minimum per-post narration (structural floor — prevents compression toward timestamp-only citations under batch repetition):

[ID] posted — [script] re-fetched [surface] and confirmed [what exists]. [Next item]:

Example: "SC-1045 posted — play7-post.py re-fetched channel history and confirmed message with chart exists at ts X. Posting SC-234:"

6. Post review records

For every item examined, post a review record. For cards, this is a Shortcut comment. For tracked PRs, this is a DB record via play7-post.py pr-review. The "released/merged N week(s) ago" marker (use proper singular/plural: "1 week ago," "2 weeks ago") indicates how long post-release this review is.

Type 1: Insufficient data

Release impact review (released [N] week/weeks ago): Reviewed [insight name].
Insufficient data to report — [brief reason, e.g. "12 events total since
release"].

Type 2: No material change

Release impact review (released [N] week/weeks ago): Reviewed [insight name].
No material change since [date of last Slack report].

Type 3: Observation (approved for Slack)

Release impact review (released [N] week/weeks ago): [The approved observation
text]. Reported in #post-release-measurement.

Post each via execute_approved (wraps already-approved observation text in fixed format):

# For cards:
python3 box/play7-post.py comment \
  --card-id 123 \
  --review-type 1 \
  --text "Release impact review (released 2 weeks ago): Reviewed SC-123: Feature engagement. Insufficient data to report — 8 events total since release." \
  --session-file .agent/release-review/{YYYY-MM-DD}.md

# For tracked PRs:
python3 box/play7-post.py pr-review \
  --pr 2927 --type 1 --week 1 \
  --text "Release impact review (merged 1 week ago): Reviewed PR #2927: Pin spacing fix. Insufficient data to report — 3 events total since merge." \
  --session-file .agent/release-review/{YYYY-MM-DD}.md

Pass --session-file to all play7-post.py calls. The script appends receipts after its built-in verification succeeds. The agent does NOT manually append ship-phase entries. Pass --review-type {1|2|3} on all comment calls (use the classification → review type mapping from step 3a).

After all posts and comments are complete, append to the session file:

## STATUS: complete ({timestamp})

Tracking PRs

When meaningful work ships via PR without a Shortcut card, register it for review:

# Auto-fetches title and merge date from GitHub:
python3 box/tracked-prs.py add --pr NNNN

# With manual values (e.g. PR not yet merged):
python3 box/tracked-prs.py add --pr NNNN --title "..." --merged-at YYYY-MM-DD

# Attach a PostHog insight:
python3 box/tracked-prs.py attach-insight --pr NNNN --insight-id abc123 \
  --insight-url "https://..." --insight-name "PR #NNNN: ..."

# List all tracked PRs:
python3 box/tracked-prs.py list

# List PRs due for review today:
python3 box/tracked-prs.py list --due

Entry points: sync-ideas (when a cardless PR is spotted in Slack), or any session where a shipped PR is noticed. The next Play 7 run picks it up automatically.

Idempotency

Before each mutation:

Review records: the gather script includes already_reviewed_this_week per item (for cards: scoped to current release cycle via completed_at; for PRs: checked against tracked_pr_reviews). Skip if true.
Slack posts: check the channel for a message containing the card ID or PR number and today's date. If it exists, skip.
For cards: re-fetch card state and comments from Shortcut before each operation. For PRs: the DB is the source of truth.

Key behaviors

An item's four-week review window starts from completed_at (cards) or merged_at (PRs). After week 4, the item falls out of scope.
Cards without Play 6 insights and PRs without attached insights are excluded. This play reads existing insights, not creates new ones (except at the checkpoint for missed cards).
The "interesting observation" judgment is the core value. It's a reasoning step that considers the item's purpose, the data shape, and what the team would find useful to know.
If an item's insight was deleted or inaccessible in PostHog, flag it to the user rather than silently skipping.

name	release-review
description	Weekly release impact review — pull PostHog data for Released cards and tracked PRs, classify, draft observations, post to Slack
disable-model-invocation	true

Release Review

Constraints

Mutation gate: A PreToolUse hook blocks all Slack and Shortcut mutations through Bash. Route through agenterminal.execute_approved or present the command for the user to run.
Human-in-the-loop: Present scoped card list at checkpoint (end of Phase 1). Present each observation draft via approve_content before posting.
Context protection: play7-gather.py (Phase 1+2a) and play7-post.py (Phase 4-5) keep mechanical API work out of primary context. Primary context is reserved for classification and observation drafting.
Before Shortcut or Slack API calls: check reference/tooling-logistics.md for tested recipes. Don't re-derive payload shapes or endpoint paths from scratch.

Constants

Constant	Value
#post-release-measurement	`C0AGD4ZEC6M`
PostHog project ID	`161414`

Cadence

Run daily. Each run picks up cards released on the same day of the week 1-4 weeks ago. A card released on a Tuesday gets checked on the next four Tuesdays. No batching across days of the week.

Arguments

$ARGUMENTS is an optional date override (format: YYYY-MM-DD). If omitted, uses today's date.

Steps

0. Resume check

Before anything else, determine the run date (from $ARGUMENTS or today) and check for an existing session file:

ls .agent/release-review/{YYYY-MM-DD}.md 2>/dev/null

If the session file exists for today's run date:

Read it in full.
Report current state to the user based only on what is written in the session file. Name which steps completed, which items are in scope, what classifications were made, which drafts were approved, and which posts have been sent.
Ask whether to resume from that point or start fresh.
Wait for explicit direction before proceeding.

If the user chooses "start fresh": archive the existing session file and all its sidecars:

STAMP=$(date +%s)
mv .agent/release-review/{YYYY-MM-DD}.md .agent/release-review/{YYYY-MM-DD}.bak.$STAMP.md
for f in .agent/release-review/{YYYY-MM-DD}-*; do
  [ -e "$f" ] && mv "$f" "$f.bak.$STAMP" 2>/dev/null
done

Then continue to Step 1. The gather script will create a new session file.

If no session file exists, continue to Step 1.

1. Gather data (script)

One tool call replaces all API plumbing for Phase 1 and the mechanical parts of Phase 2. Keeps ~20-30K of API response context out of the primary window.

# No arguments: use today's date
python3 box/play7-gather.py --session-file .agent/release-review/{YYYY-MM-DD}.md

# With date override from $ARGUMENTS
python3 box/play7-gather.py --date $ARGUMENTS --session-file .agent/release-review/{YYYY-MM-DD}.md

# Debug mode
python3 box/play7-gather.py --verbose --session-file .agent/release-review/{YYYY-MM-DD}.md

The gather script creates the session file if it doesn't exist and copies the gather JSON to .agent/release-review/{YYYY-MM-DD}-gather.json.

The script:

Computes target dates per cohort (today minus 7/14/21/28 days; Monday includes Saturday/Sunday for each window)
Queries Shortcut for Released stories, filters by exact completed_at match
Queries tracked_prs DB table for PRs merged on target dates
For cards: fetches comments, extracts Play 6 insight links, identifies prior Play 7 review comments
For PRs: fetches prior reviews from tracked_pr_reviews DB table
Resolves PostHog insight IDs and fetches current data (blocking refresh)
Flags items already reviewed at this week number (idempotency)
Detects re-releases (is_rerelease flag) for cards
Excludes cards without Play 6 insights and PRs without attached insights

1a. Extract posting plan

python3 box/play7-extract.py /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-gather.json --in-scope

Returns per-item JSON records with insight IDs, names, and URLs mapped to play7-post.py argument names. Field mapping:

insights[].insight_id → --insight-id
insights[].insight_name → --insight-name
cohort_week → --week (PR reviews)
numeric_id → --card-id (cards)

Use this output for the checkpoint (step 2) and to construct play7-post.py commands in steps 5-6. Add --cohort week_N to filter to a single cohort.

2. Checkpoint

Write scope to the session file. After the user confirms the item list, append to the session file:

## Scope confirmed ({timestamp})
In-scope: {comma-separated item IDs}
Excluded: {item ID (reason), ...}

## STATUS: classifying ({timestamp})

3. Classify (primary context)

This is the reasoning step. For each item (card or PR) in scope, classify into one of three outcomes:

Insufficient data: Not enough volume or time elapsed. Fewer than ~10 events total, or the insight shows zero activity. Gets a type-1 comment, no Slack post.
No material change: There is data, but the numbers, trend direction, and notable patterns look materially the same as the last Slack report for this item. Compare current insight values against the observation text in last_slack_report. Gets a type-2 comment, no Slack post.
Interesting observation: Something worth reporting. New trend, notable adoption, unexpected pattern, drop-off, anomaly, or enough data to draw a new conclusion. Gets a type-3 comment and Slack post (after approval).

3a. Classification checkpoint

Write classifications to the session file before advancing to Step 4. Do not proceed to drafting until this write is complete. Append:

## Classifications ({timestamp})
| Item | Cohort | Classification | Rationale |
|---|---|---|---|
| {item_id} | {cohort} | {classification} | {one-line rationale} |
...

## STATUS: drafting ({timestamp})

This is a hard gate. A fresh agent reading this file can skip the entire classification phase and proceed directly to drafting based on these decisions. This is the highest-value recovery artifact.

Classification → review type mapping (deterministic, used for both play7-post.py comment --review-type and resume interpretation):

Final classification	Review type	Slack post?
`interesting_observation` + approved	3	Yes
`insufficient_data`	1	No
`no_material_change`	2	No

4. Draft observations

For each card classified as "interesting observation":

Draft in full Slack message format — the approved text is posted directly to Slack with no reformatting:

Cards:

*<shortcut_url|SC-NNN: Card title>*  · released [N] week/weeks ago ([Mon Day])

[TLDR — one sentence, the headline takeaway. Analytical, leads with the
conclusion, not the data.]

• [Supporting detail 1 — specific number with time window, comparison, or trend]
• [Supporting detail 2]
• [Supporting detail 3, if needed]

<posthog_insight_url|View insight in PostHog>

Tracked PRs:

*<pr_url|PR #NNNN: PR title>*  · merged [N] week/weeks ago ([Mon Day])

[TLDR — one sentence, the headline takeaway.]

• [Supporting detail 1]
• [Supporting detail 2]

<posthog_insight_url|View insight in PostHog>

Dashboard-linked cards use the dashboard URL in the footer instead of an individual insight URL.

This is the reasoning step. Not "the number went up." What does the data mean in context? Is adoption tracking expectations? Any patterns worth investigating? Is the feature solving the described problem?
Don't narrativize noisy data. Small week-over-week volume changes don't tell a story. Low per-user counts are ambiguous (not evidence of retention or abandonment). Spikes need user-level investigation before conclusions. Resist constructing narratives ("rebound," "filling a workflow gap," "recurring pattern") that read well but aren't supported by the numbers.
Strip process language. Slack output should read like a product performance update, not an internal process artifact. The audience cares about what happened and what it means — not how we track it or where a card sits in a review lifecycle. Specifically, never include:
- Review cadence references ("final automated review," "week 4 review," "exits the review window," "this card's last check")
- Play numbers, cohort week labels
- Any framing that implies the reader needs to understand our 4-week review window or internal process to parse the message State facts directly. "The instrumentation is catching anomalies" not "This is the final automated review (week 4). The instrumentation is catching anomalies."
Resolve answerable questions before drafting. When a draft would contain uncertainty that could be resolved from primary sources (codebase, database, PostHog), resolve it before including it. "Either X or Y" when reading a file would tell you which one is a draft quality failure. Read the code, query the data, then write the draft.
Write all drafts to /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-drafts.md, one section per item (e.g. ## SC-562 or ## PR #2927), with the full Slack message text (header + observation + footer) under each header. Include the classification for each item.

Copy the drafts file to the session directory:

cp /tmp/ff-$AGENTERMINAL_SESSION_ID/play7-drafts.md .agent/release-review/{YYYY-MM-DD}-drafts.md

Then append to the session file:

## Drafts ({timestamp})
Path: .agent/release-review/{YYYY-MM-DD}-drafts.md
Items drafted: {comma-separated item IDs}

4a. Quality review + chart export (parallel)

Two things happen in parallel while the user isn't waiting:

Chart export: For each item classified as "interesting observation," export the chart PNG and check for blank rendering:

python3 box/play7-post.py export-chart --insight-id NNNNN --session-file .agent/release-review/{YYYY-MM-DD}.md

When --session-file is set, the script writes the PNG directly to .agent/release-review/play7-insight-{insight_id}.png (durable path) and appends a per-chart receipt to the session file.

Delegate with agent: "claude", model: "claude-sonnet-4-6". Don't set timeout_ms — the default is the maximum (30 min).

Pass session recovery parameters on the delegate call:

session_file: ".agent/release-review/{YYYY-MM-DD}.md"
session_label: "quality-review"
save_path: ".agent/release-review/{YYYY-MM-DD}-quality-review.json"

Review criteria (include in the delegate prompt):

Narrativizing: Does any draft construct a causal story not supported by the data? Look for: attributing user motivation ("when users plan their week"), framing noise as signal ("rebound," "filling a workflow gap"), interpreting ambiguous patterns as trends. The fix is to state the observation without the causal frame.
Unresolved answerable questions: Does any draft present uncertainty ("either X or Y," "suggests," "may indicate") for something that could be resolved by reading code or querying data? If the codebase or database could answer the question, flag it. The primary agent needs to do the investigation and revise — the reviewer just identifies the gaps.
Unsupported causal claims: Does any draft claim one thing caused another without evidence? Correlation language ("consistent with") is acceptable; causal language ("because," "driven by," "due to") requires supporting evidence in the draft.
Process language: Does any draft reference internal process — review cadence, play numbers, week labels, "exits the review window," or any framing that requires understanding the 4-week review system to parse?

4b. Approve observations

The saved file is compaction insurance — read the saved_path to recover approved text after compaction.

For each interesting_observation item presented, append one of:

Approved:

### {item_id} approved ({timestamp})
Path: {saved_path}

Reclassified (user decides this isn't an interesting observation):

### {item_id} reclassified ({timestamp})
Was: interesting_observation -> Now: {insufficient_data|no_material_change}
Reason: {user's reason}

After all interesting_observation items have been dispositioned, append:

## STATUS: approved ({timestamp})

Execution order: Post Slack report (Phase 5) first, then Shortcut review comments (Phase 6). Slack is the audience-facing surface; card comments are the trailing record.

4c. Tracking gaps assessment

Systematic pass across all reviewed items: "Is there instrumentation missing that would make the next review cycle more useful?"

For each item, check:

Are all planned events actually firing? (Compare insight description against codebase emit calls.)
Does the insight have the right denominator? (Toggle counts without session counts can't produce rates.)
Can the insight answer the card's original question? (A mechanism-fires event without downstream conversion tracking can't measure goal achievement.)
Are there known failure paths that bypass instrumentation? (Client-side timeouts that don't fire server events.)

4d. Ship-phase checkpoint

Before posting anything, present the full batch plan to the user:

"Entering ship phase. [N] Slack messages (1 intro + [N-1] observations) and [M] Shortcut/PR review comments planned.

Proceeding with posting. First: Slack intro."

Run python3 box/ship-gate.py enter before the first production execute_approved call (the hook will block production mutations until the plan is declared).

Append to the session file:

## STATUS: shipping ({timestamp})

5. Post Slack report

If zero cards classified as interesting observation, skip the Slack report entirely. No intro, no card messages. Card comments (Phase 6) still get posted.

Intro message:

:bar_chart: Release impact review — [Day of week], [Month Day, Year]

[N] items reviewed from the last 4 weeks ([X] cards, [Y] tracked PRs).
[M] with updates worth reporting today.

Use run_date_formatted from the gather script for the day name.

python3 box/play7-post.py slack-intro \
  --channel C0AGD4ZEC6M \
  --text ":bar_chart: Release impact review — Wednesday, February 19, 2026\n\n4 released cards reviewed from the last 4 weeks.\n2 with updates worth reporting today." \
  --session-file .agent/release-review/{YYYY-MM-DD}.md

Per-item message:

# With pre-exported chart (non-blank) — use durable path from export-chart:
python3 box/play7-post.py slack \
  --insight-id 12345 \
  --insight-name "SC-123: Feature engagement" \
  --channel C0AGD4ZEC6M \
  --chart-path .agent/release-review/play7-insight-12345.png \
  --comment "*<https://app.shortcut.com/tailwind/story/123|SC-123: Card title>*  · released 2 weeks ago (Feb 5)\n\nAdoption hit 142 unique users...\n\n<https://us.posthog.com/project/161414/insights/abc123|View insight in PostHog>" \
  --session-file .agent/release-review/{YYYY-MM-DD}.md

# Without chart (blank or export failed — use --no-chart):
python3 box/play7-post.py slack \
  --insight-id 12345 \
  --insight-name "SC-123: Feature engagement" \
  --channel C0AGD4ZEC6M \
  --no-chart \
  --comment "..." \
  --session-file .agent/release-review/{YYYY-MM-DD}.md

All posts go through execute_approved with timeout_ms: 120000. Post order: intro first, then cards in cohort order (oldest releases first).

Minimum per-post narration (structural floor — prevents compression toward timestamp-only citations under batch repetition):

[ID] posted — [script] re-fetched [surface] and confirmed [what exists]. [Next item]:

Example: "SC-1045 posted — play7-post.py re-fetched channel history and confirmed message with chart exists at ts X. Posting SC-234:"

6. Post review records

Type 1: Insufficient data

Release impact review (released [N] week/weeks ago): Reviewed [insight name].
Insufficient data to report — [brief reason, e.g. "12 events total since
release"].

Type 2: No material change

Release impact review (released [N] week/weeks ago): Reviewed [insight name].
No material change since [date of last Slack report].

Type 3: Observation (approved for Slack)

Release impact review (released [N] week/weeks ago): [The approved observation
text]. Reported in #post-release-measurement.

Post each via execute_approved (wraps already-approved observation text in fixed format):

# For cards:
python3 box/play7-post.py comment \
  --card-id 123 \
  --review-type 1 \
  --text "Release impact review (released 2 weeks ago): Reviewed SC-123: Feature engagement. Insufficient data to report — 8 events total since release." \
  --session-file .agent/release-review/{YYYY-MM-DD}.md

# For tracked PRs:
python3 box/play7-post.py pr-review \
  --pr 2927 --type 1 --week 1 \
  --text "Release impact review (merged 1 week ago): Reviewed PR #2927: Pin spacing fix. Insufficient data to report — 3 events total since merge." \
  --session-file .agent/release-review/{YYYY-MM-DD}.md

After all posts and comments are complete, append to the session file:

## STATUS: complete ({timestamp})

Tracking PRs

When meaningful work ships via PR without a Shortcut card, register it for review:

# Auto-fetches title and merge date from GitHub:
python3 box/tracked-prs.py add --pr NNNN

# With manual values (e.g. PR not yet merged):
python3 box/tracked-prs.py add --pr NNNN --title "..." --merged-at YYYY-MM-DD

# Attach a PostHog insight:
python3 box/tracked-prs.py attach-insight --pr NNNN --insight-id abc123 \
  --insight-url "https://..." --insight-name "PR #NNNN: ..."

# List all tracked PRs:
python3 box/tracked-prs.py list

# List PRs due for review today:
python3 box/tracked-prs.py list --due

Entry points: sync-ideas (when a cardless PR is spotted in Slack), or any session where a shipped PR is noticed. The next Play 7 run picks it up automatically.

Idempotency

Before each mutation:

Review records: the gather script includes already_reviewed_this_week per item (for cards: scoped to current release cycle via completed_at; for PRs: checked against tracked_pr_reviews). Skip if true.
Slack posts: check the channel for a message containing the card ID or PR number and today's date. If it exists, skip.
For cards: re-fetch card state and comments from Shortcut before each operation. For PRs: the DB is the source of truth.

Key behaviors

An item's four-week review window starts from completed_at (cards) or merged_at (PRs). After week 4, the item falls out of scope.
Cards without Play 6 insights and PRs without attached insights are excluded. This play reads existing insights, not creates new ones (except at the checkpoint for missed cards).
The "interesting observation" judgment is the core value. It's a reasoning step that considers the item's purpose, the data shape, and what the team would find useful to know.
If an item's insight was deleted or inaccessible in PostHog, flag it to the user rather than silently skipping.

release-review

同仓库更多 Skills

同仓库更多 Skills

Release Review

Also Read

Constraints

Constants

Cadence

Arguments

Steps

0. Resume check

1. Gather data (script)

1a. Extract posting plan

2. Checkpoint

3. Classify (primary context)

3a. Classification checkpoint

4. Draft observations

4a. Quality review + chart export (parallel)

4b. Approve observations

4c. Tracking gaps assessment

4d. Ship-phase checkpoint

5. Post Slack report

6. Post review records

Tracking PRs

Idempotency

Key behaviors

Release Review

Also Read

Constraints

Constants

Cadence

Arguments

Steps

0. Resume check

1. Gather data (script)

1a. Extract posting plan

2. Checkpoint

3. Classify (primary context)

3a. Classification checkpoint

4. Draft observations

4a. Quality review + chart export (parallel)

4b. Approve observations

4c. Tracking gaps assessment

4d. Ship-phase checkpoint

5. Post Slack report

6. Post review records

Tracking PRs

Idempotency

Key behaviors