bug-discovery

Updated: May 1, 2026, 22:16

Find untracked product bugs by searching Intercom for symptom keywords, cross-referencing PostHog failure events, and tracing codebase failure paths

Source

paulyokota

paulyokota/FeedForward

Bug Discovery

Find untracked product bugs by searching Intercom for symptom keywords, cross-referencing reporters against PostHog failure events, and tracing failure paths in the codebase. Produces a lean bug card in Backlog.

This is the bug counterpart to Recurring Request (Play 4). Play 4 hunts for untracked feature requests using topic nouns. This play hunts for untracked bugs using symptom language.

Arguments

$ARGUMENTS is optional:

(no args): Start from Phase 1 — broad symptom keyword search across the Intercom index.
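{keyword or symptom}: Skip broad keyword scoring and focus the search on the specified symptom term(s).
--telemetry: Start from PostHog failure dashboards instead of Intercom symptom searches (see Telemetry-First Mode).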
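The argument handling can be pictured as a small dispatch. This is only a sketch: the function name `select_mode` and the mode labels are hypothetical, and the `--telemetry` branch reflects the Telemetry-First Mode described later in this document.

```python
from typing import Optional, Tuple


def select_mode(arguments: str) -> Tuple[str, Optional[str]]:
    """Map the raw $ARGUMENTS string to a (mode, symptom) pair.

    The mode labels are illustrative names, not identifiers from the skill.
    """
    text = arguments.strip()
    if not text:
        # No args: start from Phase 1, the broad symptom keyword search.
        return ("broad", None)
    if text == "--telemetry":
        # Start from PostHog failure dashboards instead of Intercom.
        return ("telemetry", None)
    # Anything else is treated as the symptom term(s) to focus on.
    return ("focused", text)
```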

Constraints

Mutation gate: A PreToolUse hook blocks all Slack and Shortcut mutations through Bash. Route through agenterminal.execute_approved or present the command for the user to run.
Human-in-the-loop: Present one bug card at a time with full content rendered. Wait for Paul's decision. Don't batch-execute without review.
Context protection: Delegate codebase exploration and high-volume Intercom filtering to subagents. Primary context is for targeted verification and synthesis, not broad exploration.
Before Shortcut API calls: check reference/tooling-logistics.md for tested recipes. Don't re-derive payload shapes or endpoint paths.
Don't stack unreliable methods. DB theme classifications + keyword matching against Shortcut titles compounds error. Go to primary sources and use reasoning.

Steps

1. Signal hunting

Refresh the Intercom search index (pre-flight step 6).
Search the local Intercom index for symptom keywords: "disappeared", "lost", "broken", "stuck", "failed", "error", "crash", "missing", "gone", "won't load":
```
python3 box/intercom-search.py "keyword"
```
Score each keyword by volume and signal-to-noise. Low volume + high signal ("disappeared": 7 hits, all real bugs) beats high volume + noise ("error": 98 hits, mostly billing/spam).
Check recency: are there instances in the last 30 days? If the most recent instance is 6+ months old, flag as historical and deprioritize. Bugs that stopped being reported may already be fixed.
Check Shortcut to confirm the bug isn't already tracked:
```
python3 box/shortcut-cards.py --query 'title:"symptom keyword"' --summary
```
A title match is a candidate, not confirmation. If the search returns results, read the matched card (shortcut-cards.py --id) and verify the card's scope actually covers the discovered bug. Title keyword overlap between different-scope bugs ("stuck in queue" vs "stuck spinning") silently kills valid discoveries. Proved 2026-04-29 in sweep-channels: symptom similarity ("scrape failure") was assumed to match a specific card (Lambda timeout) without verifying the mechanism.

Gate: Mark each cluster as tracked/untracked before proceeding to Step 2. Do not read Intercom conversations for tracked clusters. Proved twice (2026-03-30, 2026-04-07): reading conversations then discovering existing Shortcut cards wastes 10-20 min per cluster.
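The scoring and recency checks in this step could be sketched roughly as follows. Everything here is illustrative: `KeywordStats`, `signal_ratio`, `is_historical`, and `rank_keywords` are hypothetical names, the "real bug" count is assumed to come from a manual skim of the hits, and 180 days is used as a stand-in for "6+ months".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class KeywordStats:
    keyword: str
    hits: int          # total search hits for the keyword
    real_bugs: int     # hits judged to be real bug reports (manual skim)
    most_recent: date  # date of the newest matching conversation


def signal_ratio(stats: KeywordStats) -> float:
    """Fraction of hits that are real bugs; 0.0 when there are no hits."""
    return stats.real_bugs / stats.hits if stats.hits else 0.0


def is_historical(stats: KeywordStats, today: date, max_age_days: int = 180) -> bool:
    """True when the newest instance is roughly 6+ months old (assumed 180 days)."""
    return (today - stats.most_recent) >= timedelta(days=max_age_days)


def rank_keywords(all_stats: List[KeywordStats], today: date) -> List[KeywordStats]:
    """Recent keywords first, then highest signal-to-noise first."""
    return sorted(
        all_stats,
        key=lambda s: (is_historical(s, today), -signal_ratio(s)),
    )
```

Under this ordering a 7-hit keyword where every hit is a real bug ranks above a 98-hit keyword that is mostly noise, and anything flagged historical sinks to the bottom rather than being dropped.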

2. Read and cluster

Read 8-12 actual conversations behind the best keyword hits. Confirm they describe the same symptom:
```
python3 box/intercom-search.py --read ID1,ID2,ID3
```
Search snippets are not evidence. Read the actual conversation text before citing it.
Cluster into candidate bugs. One keyword search may surface multiple distinct issues.
If any conversation references a Jam recording URL (jam.dev/c/...), pull structured debug data via Jam MCP: getDetails first (overview + investigation guide), then getNetworkRequests (filter by statusCode="4xx" or "5xx"), getConsoleLogs for errors, getMetadata for browser/OS context, and getUserEvents for the reproduction timeline. This can surface the technical root cause directly from the user's session.

3. Cross-reference to PostHog

Pull contact emails from the Intercom conversations.
Query PostHog for failure/error events filtered by those emails. The Intercom-to-PostHog email cross-reference is the strongest evidence technique: it turns "users say X happens" into "users who say X happens have Y failure events."
Check volume: query the failure event unfiltered for total count, unique users, and weekly trend over 90 days. Spiky patterns suggest infrastructure issues. Steady patterns suggest code bugs.
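As a rough illustration of the email cross-reference, the sketch below counts failure events per Intercom reporter. It assumes the PostHog results have already been exported as `(email, event_name)` pairs; that shape, the function name `cross_reference`, and the case-insensitive email matching are assumptions for the example, not part of the skill's tooling.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def cross_reference(
    reporter_emails: Iterable[str],
    failure_events: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, int], List[str]]:
    """Return (failure event count per matched reporter, reporters with no events)."""
    # Normalise emails so "A@x.com" and "a@x.com" are treated as one reporter.
    reporters = {email.strip().lower() for email in reporter_emails}
    counts: Counter = Counter()
    for email, _event_name in failure_events:
        key = email.strip().lower()
        if key in reporters:
            counts[key] += 1
    matched = {email: counts[email] for email in reporters if counts[email] > 0}
    unmatched = sorted(reporters - set(matched))
    return matched, unmatched
```

Reporters in `matched` support the claim that "users who say X happens have Y failure events"; reporters in `unmatched` suggest the symptom may map to a different event.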

4. Codebase trace

Grep the aero codebase for the failure reason or error string.
Check whether the failure has user-facing UI copy (error messages, status indicators). Unmapped failure reasons that fall through to generic fallbacks are higher severity.
Trace the failure path: where does the error originate, how does it propagate, what does the user see?

Delegate exploration using agenterminal.delegate with the verified explore prompt (box/verified-explore-prompt.md) for broad traces. For single-file lookups ("does route X exist", "what does function Y do"), grep or direct read is faster.
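The "unmapped failure reasons" check can be thought of as a set difference between the reasons seen in telemetry and the reasons that have dedicated UI copy. The sketch below assumes both have already been extracted by hand or by a delegate; `unmapped_reasons` and the dictionary shape are hypothetical and do not describe how the aero codebase stores its messages.

```python
from typing import Dict, Iterable, List


def unmapped_reasons(observed_reasons: Iterable[str], ui_copy: Dict[str, str]) -> List[str]:
    """Return failure reasons that have no dedicated, non-empty user-facing message."""
    # A missing key and an empty message both mean the user sees a generic fallback.
    return sorted(r for r in set(observed_reasons) if not ui_copy.get(r))
```

Anything this returns would fall through to a generic fallback and is a candidate for higher severity.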

5. Card

Re-read before citing. If the card will reference specific line numbers or code structure, re-read those lines now (targeted offset/limit). In-context memory of code is not sufficient for line-number claims on a durable surface. Proved 2026-04-27: the monitor had to prompt a re-read that the card-drafting transition itself should have triggered.
Use the lean bug template: What, Evidence, Architecture Context. Skip Monetization, UI Representation, Reporting, Release Strategy unless they have real content.
Evidence should include:
- Intercom conversation links with verbatim quotes
- PostHog event counts with saved insight links
- The email cross-reference results
- If Jam recordings provided debug data, include the key findings (specific errors, failed requests)
Architecture Context for bugs can be more prescriptive than for features: root cause location, failure mechanism, specific code paths. The fix path is typically more deterministic.
Present the completed checklist inline, then present the card description via approve_content with content_type: "card-draft", filename: "scNNN".

6. Verification gate (after approve_content, before ship)

Dispatch two independent verification agents in parallel. State which verifiers you're dispatching and why — if skipping one, state the rationale explicitly. Silent omission is batch momentum.

a) Codebase verification:

agenterminal.delegate with the prompt from box/card-verification-prompt.md, filling {card_draft_path} with the saved file path.

b) Intercom evidence audit (multi-pass):

agenterminal.delegate with the prompt from box/intercom-evidence-audit-prompt.md, filling {card_what_section} with ONLY the card's What section. Do not pass the Evidence section — the auditor must reason independently about search terms.

Dispatch both in the same message (they have no dependency). Then collect both:

Collect both results via agenterminal.collect (checkpoints will fire).
Codebase:
- If all_verified: true, pick 1-2 claim numbers from the delegate's report and run a fresh Read/Grep against the delegate's specific source_file and exact_evidence. Pre-delegation reads that confirm the same underlying facts do not count: the spot-check verifies the delegate's work product, not the facts themselves. Proved 2026-04-23: SC-1760.
- If all_verified: false, present the verification report. User decides whether to fix or ship as-is.
Intercom audit:
- If total_relevant > 0, present the found conversations alongside existing evidence. Read any promising ones via --read. User decides whether to add them to the card.
- If total_relevant == 0, note "Multi-pass audit: no additional signal" in the checklist.

Do not auto-fix failed codebase claims or auto-add Intercom conversations. Present findings, let the user decide.
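The branching in this gate could be summarised as below. This is a sketch under assumptions: only the field names all_verified and total_relevant come from the text above, while the dictionary shape of the reports and the function name `next_actions` are hypothetical. Note that it only returns what to present; it never fixes or adds anything itself.

```python
from typing import Dict, List


def next_actions(codebase_report: Dict, intercom_report: Dict) -> List[str]:
    """Translate the two verifier reports into the next steps to present to the user."""
    actions = []
    if codebase_report.get("all_verified") is True:
        actions.append(
            "spot-check 1-2 claims against the delegate's source_file and exact_evidence"
        )
    else:
        # A missing or false flag is treated as not verified.
        actions.append("present the verification report; user decides fix or ship as-is")
    if intercom_report.get("total_relevant", 0) > 0:
        actions.append("present found conversations; user decides whether to add them")
    else:
        actions.append('note "Multi-pass audit: no additional signal" in the checklist')
    return actions
```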

7. Re-approval (if the card changed after initial approve_content)

If the card was modified after the initial approve_content — verification fixes, evidence additions, any edits — re-present via approve_content before proceeding to ship. The initial approval covered the pre-verification draft, not the post-fix version. The user needs to see and approve what will actually be pushed.

Re-approval cycle: Read → Edit → Read → Submit. Read the saved file at the saved_path from the previous approval. Edit the specific change. Re-read to verify no other instances of the problem remain (e.g., grep for all local file paths, not just the one flagged). Then submit. Do not reconstruct the card content from context memory — context memory is a proxy for the file, and proxy trust activates on your own prior output. Proved 2026-04-08: SC-1311.

8. Ship (after verification passes or user accepts gaps)

Submit via agenterminal.execute_approved:
```
python3 box/shortcut-ship.py SC-NNN <saved_path> --severity N
```
The script executes the PUT, re-fetches the card, and verifies inline (state, owners, section headers, description length). Exit 0 = verified, 1 = verification failed, 2 = mutation failed. Set the Severity custom field and record DIC severity via python3 box/framework-rank.py score SC-NNN --severity N.
If any new PostHog event names were discovered during investigation, add them to the appropriate section in box/posthog-events.md.
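For reference, the three documented exit codes of the ship script can be mapped to outcomes like this. The helper name `describe_exit` and the fallback message for other codes are assumptions; the script itself should still be run through agenterminal.execute_approved rather than invoked directly.

```python
# Exit codes as documented for box/shortcut-ship.py in this step.
EXIT_MEANINGS = {
    0: "verified",
    1: "verification failed",
    2: "mutation failed",
}


def describe_exit(code: int) -> str:
    """Return the documented meaning of an exit code, or flag it as unexpected."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")
```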

Telemetry-First Mode (--telemetry)

When $ARGUMENTS is --telemetry, start from PostHog failure dashboards instead of Intercom symptom searches.

Play 5 starts from Intercom (what users say). This mode starts from PostHog (what the product does). They share Steps 4-8 but differ in discovery. Play 5 catches bugs users report in words we guessed. This mode catches bugs the telemetry records whether or not anyone reports them.

Dashboard

"Bug Discovery: Failure Event Monitor" (ID 1465582) https://us.posthog.com/project/161414/dashboard/1465582

T1. Dashboard scan

Pull the dashboard insights via mcp__posthog__dashboard-get (ID 1465582).
For each insight, query current data via mcp__posthog__insight-query.
Parse the description thresholds. Flag any insight where:
- Spike: any day in the last 7 exceeds the SPIKE threshold
- Trend: 14-day trailing mean exceeds prior-14-day mean by >25%
- Composition shift: for events with sub-types (e.g. $exception by $exception_types), break down by type even if the top-line is stable. A declining category can mask a rising one. Proved 2026-04-14: $exception at stable ~1/d was actually ChunkLoadError disappearing while DOMException spiked 3x.
- New event: pull mcp__posthog__event-definitions-list with q=fail and q=error, compare against dashboard insights. Flag failure-shaped events not yet on the dashboard.
Present flagged events to the user with: event name, current rate vs threshold, trend direction, and whether it's a spike or a slow climb. Wait for user to select which to investigate. Clustering is analysis; choosing which failure to pursue is product judgment.
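The spike and trend rules above can be sketched for a single insight as follows. The sketch assumes the insight's data has already been reduced to a list of complete-day counts, oldest first, with at least 28 values; the function name `flag_insight` and that input shape are hypothetical. Composition shifts and new events need per-type breakdowns and are not covered here.

```python
from statistics import mean
from typing import List, Sequence


def flag_insight(daily: Sequence[float], spike_threshold: float) -> List[str]:
    """Return the flags ("spike", "trend") that apply to one insight."""
    flags = []
    # Spike: any day in the last 7 exceeds the SPIKE threshold.
    if any(value > spike_threshold for value in daily[-7:]):
        flags.append("spike")
    # Trend: 14-day trailing mean exceeds the prior-14-day mean by more than 25%.
    recent = mean(daily[-14:])
    prior = mean(daily[-28:-14])
    if recent > prior * 1.25:
        flags.append("trend")
    return flags
```

An empty result means the insight is quiet on these two rules only; it says nothing about composition shifts.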

T2. Threshold maintenance

After scanning (whether or not any anomaly was found), recalculate thresholds for all insights:

BASELINE: median of 90-day daily values (exclude partial today)
P90: 90th percentile
SPIKE: P90 x 1.5 (round up)
TREND: compare mean of last 14 complete days vs prior 14 days; >25% = up/down
Write updated descriptions via mcp__posthog__insight-update
Add any newly discovered failure events as new insights on the dashboard
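A minimal sketch of the BASELINE, P90, and SPIKE recalculation, assuming the caller passes only complete days (partial today already excluded). Percentile definitions vary; the "inclusive" interpolation used here is an assumption, as is the function name `recalc_thresholds`. The TREND comparison is left out of this sketch.

```python
import math
from statistics import median, quantiles
from typing import Dict, Sequence


def recalc_thresholds(daily: Sequence[float]) -> Dict[str, float]:
    """Compute BASELINE, P90, and SPIKE from 90 days of complete daily values."""
    baseline = median(daily)
    # quantiles(n=10) returns 9 cut points; the last one is the 90th percentile.
    p90 = quantiles(daily, n=10, method="inclusive")[-1]
    # SPIKE is P90 x 1.5, rounded up to the next whole number.
    spike = math.ceil(p90 * 1.5)
    return {"BASELINE": baseline, "P90": p90, "SPIKE": spike}
```

The resulting values are what would be written back into each insight's description.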

T3. Codebase trace

Same as Step 4 above:

Grep the aero codebase for the failure event name, error string, or failure reason.
Check whether the failure has user-facing UI copy (error messages, status indicators). Unmapped failure reasons that fall through to generic fallbacks are higher severity.
Trace the failure path: where does the error originate, how does it propagate, what does the user see?

Delegate exploration using agenterminal.delegate with the verified explore prompt (box/verified-explore-prompt.md) for broad traces.

T4. Cross-reference to Intercom

Reverse of the standard flow — check Intercom after the telemetry signal:

Search the local Intercom index for symptom language matching the failure. Use mechanism knowledge from the codebase trace to sharpen search terms (the code tells you what words users would use).
If conversations exist, read them. This adds qualitative context to the quantitative signal: how users describe it, how severe it feels to them, whether CS has a workaround.
If no conversations exist, that's itself a finding — users are hitting the failure but not reporting it. Note this on the card.

T5. Card

Same as Step 5 above:

Lean bug template: What, Evidence, Architecture Context.
Evidence should include: PostHog insight links with daily unique user counts, trend description, the threshold that triggered investigation. Intercom conversation links if any exist (with note if none found). Codebase file paths for the failure mechanism.
Assess severity per reference/severity-framework.md.
Present via approve_content with content_type: "card-draft".
After approval, create/ship the card. For new cards use sc-create-story.py; for existing cards use shortcut-ship.py. Both via execute_approved.

T6. Closing checklist

Verify each mutation independently before moving to the next. Completion bias activates here ("card is done, rest is cleanup"):

Card exists with correct content: shortcut-cards.py --id SC-NNN --description
Severity custom field set: read via Shortcut API
DIC score recorded: framework-rank.py score output shows correct values
Insight thresholds updated: insight-get shows description with thresholds
Investigation log entry: entry number returned by log-cli.py

Each step: check --help before first invocation of any tool not yet used.

name	bug-discovery
description	Find untracked product bugs by searching Intercom for symptom keywords, cross-referencing PostHog failure events, and tracing codebase failure paths
disable-model-invocation	true
