bug-discovery
Find untracked product bugs by searching Intercom for symptom keywords, cross-referencing PostHog failure events, and tracing codebase failure paths
| name | bug-discovery |
| description | Find untracked product bugs by searching Intercom for symptom keywords, cross-referencing PostHog failure events, and tracing codebase failure paths |
| disable-model-invocation | true |
Find untracked product bugs by searching Intercom for symptom keywords, cross-referencing reporters against PostHog failure events, and tracing failure paths in the codebase. Produces a lean bug card in Backlog.
This is the bug counterpart to Recurring Request (Play 4). Play 4 hunts for untracked feature requests using topic nouns. This play hunts for untracked bugs using symptom language.
$ARGUMENTS is optional:
{keyword or symptom}: Skip broad keyword scoring and focus the search on
the specified symptom term(s).

Before starting, read these shared sections from box/shortcut-ops.md:
[...]
[...] agenterminal.execute_approved or present the
command for the user to run.
[...] reference/tooling-logistics.md for
tested recipes. Don't re-derive payload shapes or endpoint paths.

python3 box/intercom-search.py "keyword"
python3 box/shortcut-cards.py --query 'title:"symptom keyword"' --summary
A title match is a candidate, not confirmation. If the search returns
results, read the matched card (shortcut-cards.py --id) and verify the
card's scope actually covers the discovered bug. Title keyword overlap
between different-scope bugs ("stuck in queue" vs "stuck spinning") silently
kills valid discoveries. Proved 2026-04-29 in sweep-channels: symptom
similarity ("scrape failure") assumed to match a specific card (Lambda
timeout) without verifying the mechanism.
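The candidate-versus-confirmation rule above can be sketched as a small gate. This is only an illustration, not part of the play's tooling: `CardMatch`, `scope_covers_bug`, and `classify_cluster` are hypothetical names, and the scope check itself remains a manual read of the matched card.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CardMatch:
    """A Shortcut card returned by a title search (hypothetical shape)."""
    card_id: str
    title: str
    # None = card not read yet; True/False = result of reading the card
    # and comparing its scope (mechanism) against the discovered bug.
    scope_covers_bug: Optional[bool] = None


def classify_cluster(matches: list[CardMatch]) -> str:
    """Classify a symptom cluster from its title-search matches.

    A title hit alone never makes a cluster 'tracked'; only a card whose
    scope was read and confirmed does.
    """
    if any(m.scope_covers_bug is True for m in matches):
        return "tracked"
    if any(m.scope_covers_bug is None for m in matches):
        return "needs-scope-check"  # read the card before deciding
    return "untracked"
```

Under these assumptions, a "stuck in queue" card that was read and found not to cover a "stuck spinning" bug leaves the cluster untracked instead of silently closing it.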
Gate: Mark each cluster as tracked/untracked before proceeding to Step 2.
Do not read Intercom conversations for tracked clusters. Proved twice
(2026-03-30, 2026-04-07): reading conversations then discovering existing
Shortcut cards wastes 10-20 min per cluster.

python3 box/intercom-search.py --read ID1,ID2,ID3

Search snippets are not evidence. Read the actual conversation text
before citing it.

[...] jam.dev/c/...), pull
structured debug data via Jam MCP: getDetails first (overview +
investigation guide), then getNetworkRequests (filter by statusCode="4xx"
or "5xx"), getConsoleLogs for errors, getMetadata for browser/OS
context, and getUserEvents for the reproduction timeline. This can surface
the technical root cause directly from the user's session.

Delegate exploration using agenterminal.delegate with the verified explore
prompt (box/verified-explore-prompt.md) for broad traces. For single-file
lookups ("does route X exist", "what does function Y do"), grep or direct read
is faster.
[...] approve_content with content_type: "card-draft",
filename: "scNNN".

Dispatch two independent verification agents in parallel. State which verifiers you're dispatching and why; if skipping one, state the rationale explicitly. Silent omission is batch momentum.

a) Codebase verification:
agenterminal.delegate with the prompt from box/card-verification-prompt.md,
filling {card_draft_path} with the saved file path.

b) Intercom evidence audit (multi-pass):
agenterminal.delegate with the prompt from box/intercom-evidence-audit-prompt.md,
filling {card_what_section} with ONLY the card's What section. Do not pass
the Evidence section: the auditor must reason independently about search terms.

Dispatch both in the same message (they have no dependency). Then collect both:
agenterminal.collect both results (checkpoints will fire).

If all_verified: true, pick 1-2 claim numbers from the
delegate's report and run a fresh Read/Grep against the delegate's specific
source_file and exact_evidence. Pre-delegation reads that confirm the
same underlying facts do not count: the spot-check verifies the delegate's
work product, not the facts themselves. Proved 2026-04-23: SC-1760.
If all_verified: false,
present the verification report. User decides whether to fix or ship as-is.

If total_relevant > 0, present the found conversations
alongside existing evidence. Read any promising ones via --read. User
decides whether to add them to the card. If total_relevant == 0, note
"Multi-pass audit: no additional signal" in the checklist.

Do not auto-fix failed codebase claims or auto-add Intercom conversations. Present findings, let the user decide.
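The branching on the two verifier results can be summarized in a short sketch. The field names `all_verified` and `total_relevant` come from the text above; the dictionary shape of the reports and the helper name `next_actions` are assumptions made for illustration only.

```python
def next_actions(codebase_report: dict, audit_report: dict) -> list[str]:
    """Map the two verifier reports to the follow-up steps described above.

    Nothing here fixes claims or adds evidence automatically; every
    action ends with the user deciding.
    """
    actions = []

    if codebase_report.get("all_verified") is True:
        actions.append(
            "spot-check 1-2 claims against the delegate's "
            "source_file and exact_evidence"
        )
    else:
        actions.append("present the verification report to the user")

    if audit_report.get("total_relevant", 0) > 0:
        actions.append("present found conversations alongside existing evidence")
    else:
        actions.append('note "Multi-pass audit: no additional signal"')

    return actions
```

Treating a missing `all_verified` key as a failed verification is a deliberately conservative choice in this sketch: an incomplete report goes to the user instead of being waved through.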
If the card was modified after the initial approve_content — verification fixes,
evidence additions, any edits — re-present via approve_content before proceeding
to ship. The initial approval covered the pre-verification draft, not the post-fix
version. The user needs to see and approve what will actually be pushed.
Re-approval cycle: Read → Edit → Read → Submit. Read the saved file at the
saved_path from the previous approval. Edit the specific change. Re-read to
verify no other instances of the problem remain (e.g., grep for all local file
paths, not just the one flagged). Then submit. Do not reconstruct the card
content from context memory — context memory is a proxy for the file, and proxy
trust activates on your own prior output. Proved 2026-04-08: SC-1311.
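The Read → Edit → Read → Submit cycle can be sketched as follows. This is a minimal illustration that assumes the card draft is a plain text file at `saved_path`; `edit_and_recheck` and `problem_pattern` are hypothetical names, and the submit step (re-presenting via approve_content) is left to the caller.

```python
import re
from pathlib import Path


def edit_and_recheck(saved_path: str, old: str, new: str,
                     problem_pattern: str) -> tuple[str, list[str]]:
    """Apply one edit to the saved card, then re-read and re-check it.

    Returns the re-read file content and any remaining matches of the
    broader problem pattern (e.g. all local file paths, not just the
    one that was flagged).
    """
    path = Path(saved_path)

    # Read: start from the file on disk, not from remembered content.
    text = path.read_text(encoding="utf-8")
    if old not in text:
        raise ValueError(f"{old!r} not found in {saved_path}")

    # Edit: change only the specific flagged text.
    path.write_text(text.replace(old, new), encoding="utf-8")

    # Read again: verify against what was actually written.
    reread = path.read_text(encoding="utf-8")
    remaining = re.findall(problem_pattern, reread)
    return reread, remaining
```

Submit only when `remaining` is empty; a non-empty list means the same class of problem still exists elsewhere in the card.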
[...] agenterminal.execute_approved:
python3 box/shortcut-ship.py SC-NNN <saved_path> --severity N
The script executes the PUT, re-fetches the card, and verifies inline
(state, owners, section headers, description length). Exit 0 = verified,
1 = verification failed, 2 = mutation failed. Set the Severity custom
field and record DIC severity via
python3 box/framework-rank.py score SC-NNN --severity N.
[...] box/posthog-events.md.

When $ARGUMENTS is --telemetry, start from PostHog failure dashboards
instead of Intercom symptom searches.
Play 5 starts from Intercom (what users say). This mode starts from PostHog (what the product does). They share Steps 4-8 but differ in discovery. Play 5 catches bugs users report in words we guessed. This mode catches bugs the telemetry records whether or not anyone reports them.
"Bug Discovery: Failure Event Monitor" (ID 1465582) https://us.posthog.com/project/161414/dashboard/1465582
Each insight tracks one failure event as daily unique users over 90 days.
Insight descriptions carry machine-readable thresholds in this format:
MEDIAN:N/d P90:N/d SPIKE:N/d TREND:up|down|stable|METHOD:...|UPDATED:...|WINDOW:90d
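A description in this format can be parsed with a few lines of Python. The sketch below assumes that fields are separated by whitespace or `|`, that each field is `KEY:VALUE`, and that values contain no spaces or pipes; the sample values in the usage note are invented, and `parse_thresholds` is a hypothetical helper name.

```python
import re


def parse_thresholds(description: str) -> dict:
    """Parse a 'MEDIAN:N/d P90:N/d ... |WINDOW:90d' threshold string.

    Rate fields (MEDIAN, P90, SPIKE) are converted to floats per day;
    all other fields are kept as strings.
    """
    fields: dict = {}
    for token in re.split(r"[\s|]+", description.strip()):
        # Split on the first colon only, so values may contain colons.
        key, sep, value = token.partition(":")
        if not sep:
            continue  # not a KEY:VALUE token
        fields[key] = value

    for key in ("MEDIAN", "P90", "SPIKE"):
        if key in fields:
            # "3/d" -> 3.0
            fields[key] = float(fields[key].split("/", 1)[0])
    return fields
```

For example, `parse_thresholds("MEDIAN:2/d P90:5/d SPIKE:9/d TREND:stable|WINDOW:90d")` yields numeric `MEDIAN`, `P90`, and `SPIKE` values alongside the string fields `TREND` and `WINDOW`.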
[...] mcp__posthog__dashboard-get (ID 1465582).
[...] mcp__posthog__insight-query.
[...] $exception by
$exception_types), break down by type even if the top-line is stable.
A declining category can mask a rising one. Proved 2026-04-14: $exception
at stable ~1/d was actually ChunkLoadError disappearing while DOMException
spiked 3x.
[...] mcp__posthog__event-definitions-list with q=fail
and q=error, compare against dashboard insights. Flag failure-shaped
events not yet on the dashboard.

After scanning (whether or not any anomaly was found), recalculate thresholds for all insights:
[...] mcp__posthog__insight-update

Same as Step 4 above:
Delegate exploration using agenterminal.delegate with the verified explore
prompt (box/verified-explore-prompt.md) for broad traces.
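The masking effect noted in the telemetry scan (a stable $exception top-line hiding a DOMException spike) is easy to check once per-type rates are available. The sketch below assumes the breakdown has already been reduced to average daily users per exception type for a baseline and a recent window; the numbers in the usage note are illustrative, not the 2026-04-14 data, and `rising_categories` is a hypothetical helper.

```python
def rising_categories(baseline: dict[str, float],
                      recent: dict[str, float],
                      factor: float = 3.0) -> list[str]:
    """Return categories whose recent rate is at least factor x baseline.

    Categories that did not exist in the baseline count as rising as
    soon as they show any activity.
    """
    rising = []
    for name, recent_rate in recent.items():
        base = baseline.get(name, 0.0)
        if base == 0.0:
            if recent_rate > 0:
                rising.append(name)
        elif recent_rate >= factor * base:
            rising.append(name)
    return sorted(rising)
```

With a baseline of `{"ChunkLoadError": 0.75, "DOMException": 0.25}` and a recent window of `{"ChunkLoadError": 0.25, "DOMException": 0.75}`, both totals are 1.0 per day, yet the function flags `DOMException`, which is exactly the case a top-line check would miss.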
Reverse of the standard flow — check Intercom after the telemetry signal:
Same as Step 5 above:
[...] reference/severity-framework.md.
[...] approve_content with content_type: "card-draft".
For new cards use sc-create-story.py;
for existing cards use shortcut-ship.py. Both via execute_approved.

Verify each mutation independently before moving to the next. Completion bias activates here ("card is done, rest is cleanup"):
shortcut-cards.py --id SC-NNN --description
framework-rank.py score output shows correct values
insight-get shows description with thresholds
log-cli.py

Each step: check --help before first invocation of any tool not yet used.
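For the shortcut-ship.py mutation, the exit codes documented earlier (0 = verified, 1 = verification failed, 2 = mutation failed) map directly onto what to do next. The sketch below is an assumption-labeled illustration: `describe_ship_exit` and `may_proceed` are hypothetical helpers, and the command itself is still expected to run through execute_approved, not from this code.

```python
# Exit codes as documented for box/shortcut-ship.py.
SHIP_OUTCOMES = {
    0: "verified",
    1: "verification failed",
    2: "mutation failed",
}


def describe_ship_exit(returncode: int) -> str:
    """Translate a shortcut-ship.py exit code into its documented meaning."""
    return SHIP_OUTCOMES.get(returncode, f"unexpected exit code {returncode}")


def may_proceed(returncode: int) -> bool:
    """Only a verified mutation (exit 0) allows moving to the next one."""
    return returncode == 0
```

Anything other than exit 0, including codes the script does not document, is treated as a stop condition, which matches the instruction to verify each mutation before starting the next.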