Run any Skill in Manus in one click


kb-audit

Stars: 2

Forks: 0

Updated April 30, 2026 at 18:15

Find and fix Intercom knowledge base gaps — either wrong article content (accuracy) or missing coverage for a product area (coverage). Verify against the codebase, ship corrected articles.


Source

paulyokota

paulyokota/FeedForward



Related occupations (SOC)

Based on the SOC occupational classification

Market Research Analysts and Marketing Specialists · Business and Financial Operations Occupations · SOC 13-1161

SKILL.md


More from this repository

priority-review

paulyokota/FeedForward

Refresh priority signals, manage active epics, and refresh Near Term pool

2026-05-29 · 2

deliverable

paulyokota/FeedForward

Multi-session deliverable play for projects spanning 3+ sessions with concrete outputs (proposals, strategies, wireframes). Provides project-level structure, evidence provenance, and cross-session handoff.

2026-05-27 · 2

fill-cards

paulyokota/FeedForward

Investigation-driven card grooming — investigate across data sources, synthesize findings into card content, present for approval

2026-05-22 · 2

agenterminal-code-reviewer

paulyokota/FeedForward

Use when acting as a reviewer in an AgenTerminal review conversation. Handles both code reviews (REVIEW_APPROVED) and plan reviews (PLAN_APPROVED).

2026-05-21 · 2

sync-ideas

paulyokota/FeedForward

Match Slack

2026-05-19 · 2

release-review

paulyokota/FeedForward

Weekly release impact review — pull PostHog data for Released cards and tracked PRs, classify, draft observations, post to Slack

2026-05-19 · 2


name	kb-audit
description	Find and fix Intercom knowledge base gaps — either wrong article content (accuracy) or missing coverage for a product area (coverage). Verify against the codebase, ship corrected articles.
disable-model-invocation	true

/kb-audit

Find and fix knowledge base gaps that cause Fin (Gabby) to give users wrong or missing information. Two modes depending on the signal:

Accuracy audit: Fin gave a wrong answer traceable to an article claim.
Coverage audit: A product area has inadequate KB coverage — Fin has nothing to draw from, so it fabricates or falls back to irrelevant articles.

Both modes share the same fix engine (steps 3-9).

Quick reference

Tracker: box/research/kb-audit-tracker.json (triaged conversations, audited articles, last search date)
Per-run artifacts: box/research/kb-audit-article-comparison-run{N}.md, box/research/kb-audit-corrected-article-run{N}.html
Run log / development history: box/research/kb-accuracy-audit-brief.md
Delivery: PUT /articles/{id} via Intercom API (production mutation — execute_approved)
Deliverable: one updated Intercom help center article per run

Error taxonomy

Four classes of article error, requiring different fixes:

Commission: Article says something factually wrong (e.g., "go to Settings → Pin Spacing Rules" when the path doesn't exist, or "Most Popular button in the top right corner" when the button has been removed). Fix: correct or remove the wrong claim.
Omission: Article is correct for what it covers but missing context that causes Fin to misapply it (e.g., image crop article applied to Reels because it never says "images only"). Fix: add clarifying section, preserve existing content.
Omission enabling hallucination: Article's framing is vague enough that Fin infers capabilities that don't exist (e.g., "community" framing led Fin to fabricate messaging features). Fix: add explicit "what this does NOT include" clarification.
Missing coverage: No article covers the topic at all, forcing Fin to fabricate or fall back to irrelevant articles. Fix: add a section to the most relevant existing article, or flag for new article creation. This is the typical finding in coverage audits.

Mode 1: Accuracy Audit

Signal: Fin gave a specific wrong answer. Goal: trace it to the article, fix the article.

1a. Find wrong Fin answers

Load the tracker (box/research/kb-audit-tracker.json) to get the last search date and skip already-triaged conversations.

Two discovery paths:

Path A — Daily digest bot observations. Check the #daily-digest channel for recent digests. The :robot_face: Bot observations section contains pre-classified candidates. Check those conversation IDs against the tracker's triaged_conversations to skip already-processed items.

Important: digest bot observations are intermediaries written by a prior Claude session. They are leads, not findings — read the actual conversation before classifying bot behavior as incorrect or article-traceable.

After triaging a digest's bot observations, add ALL conversations to the tracker (fixed, deferred, not_actionable) — not just the ones that produce fixes. This prevents re-triaging the same digest items in future sessions.
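The following is a minimal Python sketch of the tracker skip/record logic described above. It assumes that triaged_conversations is an object keyed by conversation ID, with a status field on each entry. That schema and the helper names are hypothetical, so check the real box/research/kb-audit-tracker.json before reusing it.

```python
import json
from pathlib import Path

# Assumed schema (hypothetical): {"triaged_conversations": {id: {"status": ...}},
#                                 "audited_articles": {...}, "last_search_date": "YYYY-MM-DD"}
STATUSES = {"fixed", "deferred", "not_actionable"}


def load_tracker(path: Path) -> dict:
    """Load the tracker, filling in the keys this sketch relies on."""
    data = json.loads(path.read_text()) if path.exists() else {}
    data.setdefault("triaged_conversations", {})
    data.setdefault("audited_articles", {})
    data.setdefault("last_search_date", None)
    return data


def untriaged(candidate_ids: list[str], tracker: dict,
              include_deferred: bool = False) -> list[str]:
    """Candidates not yet in the tracker (optionally re-surfacing deferred ones)."""
    seen = tracker["triaged_conversations"]
    out = []
    for cid in candidate_ids:
        entry = seen.get(cid)
        if entry is None or (include_deferred and entry.get("status") == "deferred"):
            out.append(cid)
    return out


def mark_triaged(tracker: dict, cid: str, status: str) -> None:
    """Record every triaged conversation, not just the ones that produced fixes."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    tracker["triaged_conversations"][cid] = {"status": status}
```

The include_deferred switch exists because deferred items are meant to be picked up again in a later session, while fixed and not_actionable items should stay skipped.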

Path B — Structural database query. Query the conversation search index for conversations matching structural signals of KB errors. This catches candidates the daily digest missed or that predate digest coverage.

SELECT conversation_id, part_count, created_at::date
FROM conversation_search_index
WHERE created_at >= '{since_date}'
  AND part_count >= 10
  AND (
    full_text LIKE '%Sources:%'                    -- Fin cited articles
    OR full_text LIKE '%connect you with a human%' -- human escalation
    OR full_text ~* '(can.t find|doesn.t exist|not there|where is the|that.s not|I don.t see)'
                                                   -- user describing missing/wrong UI
  )
  AND conversation_id NOT IN ({triaged_ids})     -- must be non-empty: NOT IN () is a syntax error
ORDER BY part_count DESC;

Why these signals: Validated against 8 fixed conversations from runs 1-5. No single signal catches all errors — they're complementary (OR, not AND). The 10-part floor filters for conversations long enough that the error cycle plays out (user asks → Fin answers wrong → user pushes back → human takes over). Below 10, signals match incidental text in short exchanges and marketing emails.

Signal behavior by conversation length:

20+ parts: all three signals reliable, high hit rate
15-19 parts: signals still work (42% hit rate in run 5 validation sample)
10-14 parts: escalation signal degrades (matches Fin auto-follow-up on abandoned conversations, not real failure), other signals still valid but lower hit rate
Below 10: signals match noise
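You can mirror the same three signals and the length bands in Python. This is useful for spot-checking a single conversation or sanity-checking what the SQL returned. It is a sketch, not part of the skill, and the band names are hypothetical labels for the thresholds listed above. As in the SQL, the two LIKE phrases are case-sensitive and the regex is not.

```python
import re

# Same three OR'd signals as the SQL query.
FIN_CITED = "Sources:"
ESCALATION = "connect you with a human"
USER_PUSHBACK = re.compile(
    r"(can.t find|doesn.t exist|not there|where is the|that.s not|I don.t see)",
    re.IGNORECASE,  # matches the SQL ~* operator
)


def signals(full_text: str) -> set[str]:
    """Which structural signals a conversation's text matches."""
    hits = set()
    if FIN_CITED in full_text:
        hits.add("fin_cited")
    if ESCALATION in full_text:
        hits.add("escalation")
    if USER_PUSHBACK.search(full_text):
        hits.add("user_pushback")
    return hits


def length_band(part_count: int) -> str:
    """Bucket a conversation by part count, following the reliability notes."""
    if part_count >= 20:
        return "reliable"
    if part_count >= 15:
        return "works"
    if part_count >= 10:
        return "escalation_degraded"
    return "noise"


def is_candidate(full_text: str, part_count: int) -> bool:
    """Same filter as the SQL: 10-part floor plus at least one signal."""
    return part_count >= 10 and bool(signals(full_text))


def weak_escalation_only(full_text: str, part_count: int) -> bool:
    """10-14 part matches whose only hit is the escalation phrase (likely auto-follow-up)."""
    return (length_band(part_count) == "escalation_degraded"
            and signals(full_text) == {"escalation"})
```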

Processing Path B results: Delegate a subagent to read candidates and classify as KB_ERROR / BOT_BEHAVIOR / NOT_FIN. Read each KB_ERROR candidate yourself before accepting the classification — subagent output is a filter, not a finding.
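Here is a small sketch of the "filter, not a finding" rule. It assumes the subagent returns rows like {"id": ..., "label": ...} and that a confirmed flag is set only after you have read the conversation yourself. Both the row shape and the flag are hypothetical.

```python
from collections import Counter

LABELS = {"KB_ERROR", "BOT_BEHAVIOR", "NOT_FIN"}


def manual_read_queue(rows: list[dict]) -> list[str]:
    """KB_ERROR candidates you still have to read yourself.

    Hypothetical row shape: {"id": ..., "label": ..., "confirmed": bool}.
    'confirmed' is set only after a first-hand read, never by the subagent.
    """
    for row in rows:
        if row["label"] not in LABELS:
            raise ValueError(f"unexpected label {row['label']!r} for {row['id']}")
    return [r["id"] for r in rows
            if r["label"] == "KB_ERROR" and not r.get("confirmed", False)]


def label_summary(rows: list[dict]) -> Counter:
    """Counts per label, handy for the triage summary shown to the user."""
    return Counter(r["label"] for r in rows)
```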

Triage criteria for each conversation (both paths):

Is this Fin giving wrong info (vs. user confusion, vs. product bug)?
Can the wrong claim be traced to a specific KB article?
Is Fin fabricating because no article covers the topic? (missing coverage — this is actionable, not a reason to close)
Is the article still published and unchanged since the incident?

Present triaged candidates to user for selection before proceeding.

Then proceed to step 3 (shared fix engine).

Mode 2: Coverage Audit

Signal: a product area may have inadequate KB coverage. Goal: understand the product, check what the KB covers, fix the gaps.

Triggers: CS pattern in Slack, feature release/migration, proactive review, customer-comms tracker, or an accuracy audit that reveals missing coverage rather than a wrong claim.

Hallucination risk on the proactive path. Coverage audits triggered by releases or the comms tracker (rather than wrong Fin answers) lack a grounding conversation. Without user messages to constrain the content, there's more room for assumption-based claims: definitions, attributions ("Pinterest requires X"), and behavioral descriptions that sound plausible but aren't code-verified. Every factual claim in new article content needs verification against a primary source (codebase or platform docs), not just plausibility. (Run 14: fabricated Simplified Pin definition survived to draft review.)

1b. Understand the product area

Build context on how the feature actually works. The depth here determines the quality of the article content downstream.

From a CS pattern (Slack thread, Intercom cluster): Read the thread or conversations to understand what users are experiencing. Then trace the mechanism in the codebase: how the feature works, not just what the symptom is. Search Intercom for the conversation cluster to gauge scope.
From a feature release: Read the PR(s) or commit history to understand what changed. Trace the user-facing behavior in the codebase.
From a proactive review: Start from the product area's code. Understand the key user-facing behaviors, settings, and edge cases.
From the customer-comms tracker: Pick a recently released or communicated feature from reference/customer-comms.md. Search the article API for existing coverage, then read the codebase to check completeness. Focus on things users would encounter and ask about: limits, requirements, interactions with other features, and what happens when something goes wrong. This is not a comprehensive technical audit; the question is "would Fin have an answer when a user asks about this?"

2b. Check existing KB coverage

Search the Intercom article API to find what coverage exists. Use the API, not the public help center — the public site shows collections, not the full article corpus Fin draws from.

# Anchor the key so similarly named variables in .env don't also match
INTERCOM_TOKEN=$(grep '^INTERCOM_ACCESS_TOKEN=' .env | cut -d= -f2-)
curl -s "https://api.intercom.io/articles/search?phrase={url_encoded_phrase}" \
  -H "Authorization: Bearer $INTERCOM_TOKEN" \
  -H "Accept: application/json" \
  -H "Intercom-Version: 2.11"

Search multiple phrases — feature name, user symptom language, related concepts. The search is fuzzy and returns ranked results; different phrasings surface different articles.

For each relevant article found, fetch the full body via GET /articles/{id} and check whether it covers the behaviors identified in step 1b. Map the gaps: what does the product do that no article explains?
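Below is a standard-library sketch of the multi-phrase search and gap mapping. The function names and the behavior-to-phrases mapping are hypothetical. Substring matching is a crude proxy that only tells you where to read closely. Extracting the article list from each search response is left out; check the response envelope against Intercom's API reference for version 2.11.

```python
from urllib.parse import urlencode

API = "https://api.intercom.io"


def search_urls(phrases: list[str]) -> list[str]:
    """One search URL per phrasing; urlencode handles the URL-encoding."""
    return [f"{API}/articles/search?{urlencode({'phrase': p})}" for p in phrases]


def merge_ranked(result_lists: list[list[dict]]) -> list[dict]:
    """Union articles across searches by id, keeping first-seen (ranked) order."""
    seen, merged = set(), []
    for results in result_lists:
        for art in results:
            if art["id"] not in seen:
                seen.add(art["id"])
                merged.append(art)
    return merged


def coverage_gaps(behaviors: dict[str, list[str]], bodies: list[str]) -> list[str]:
    """Behaviors from step 1b with no matching phrase in any fetched article body."""
    text = " ".join(bodies).lower()
    return [name for name, phrases in behaviors.items()
            if not any(p.lower() in text for p in phrases)]
```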

If coverage is adequate: report findings and stop. Not every audit produces a fix.

If gaps exist: identify the best article to add coverage to (usually the most relevant existing article in the same product area), or flag that a new article is needed. Optionally search Intercom conversations for user-impact evidence to strengthen the case and inform back-testing (step 7).

Then proceed to step 3 (shared fix engine).

Shared Fix Engine (Steps 3-9)

Both modes converge here. You have: an article to fix, an understanding of what's wrong (or missing), and codebase context.

3. Trace to source article

Search Intercom article API: GET /articles/search?phrase=...
Pull the full article: GET /articles/{id}
For accuracy audits: map Fin's specific claims to specific article text
For coverage audits: confirm the gap — the topic is genuinely absent
Save original article to /tmp/ff-article-{id}-original.html
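One way to fetch and save the original is shown below. It assumes the article JSON carries its HTML in a body field and that the token has already been read from .env. Only the request object is built here; the network fetch is shown as a comment.

```python
import json
import urllib.request
from pathlib import Path


def article_request(article_id, token: str) -> urllib.request.Request:
    """GET /articles/{id} with the headers this skill uses elsewhere."""
    return urllib.request.Request(
        f"https://api.intercom.io/articles/{article_id}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
            "Intercom-Version": "2.11",
        },
    )


def save_original(article: dict, out_dir: Path = Path("/tmp")) -> Path:
    """Write the article's HTML body to ff-article-{id}-original.html."""
    path = out_dir / f"ff-article-{article['id']}-original.html"
    path.write_text(article["body"], encoding="utf-8")
    return path


# Usage (network call, not run here):
# with urllib.request.urlopen(article_request("123", token)) as resp:
#     save_original(json.load(resp))
```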

Don't make article-content claims before reading. During triage, describe what the bot said and what the user experienced. Claims about what articles do or don't contain belong here, after reading the article.

4. Verify against codebase

This is where the value is. Don't shortcut it.

For each claim in the article (existing or planned new content), trace to the actual code. Use git -C /Users/paulyokota/Dev/aero show origin/main:path/to/file (local checkout may be behind remote).
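Reading from origin/main avoids a stale working tree. However, origin/main is a remote-tracking ref, so it is only as fresh as the last git fetch. The minimal sketch below fetches first. The helper names are hypothetical, and the repo path is the one given above.

```python
import subprocess

AERO = "/Users/paulyokota/Dev/aero"


def show_cmds(path: str, repo: str = AERO, ref: str = "origin/main") -> list[list[str]]:
    """Refresh remote-tracking refs, then read one file at `ref` without checking it out."""
    return [
        ["git", "-C", repo, "fetch", "origin"],
        ["git", "-C", repo, "show", f"{ref}:{path}"],
    ]


def read_at_origin(path: str, repo: str = AERO) -> str:
    """Run both commands; raises CalledProcessError if either fails."""
    fetch, show = show_cmds(path, repo)
    subprocess.run(fetch, check=True, capture_output=True)
    return subprocess.run(show, check=True, capture_output=True, text=True).stdout
```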

4a. Name the exact surface AND the rendering template. The article describes a user workflow. Identify which UI surface it's about, then find the specific template/component that renders it. A component existing in a different view doesn't confirm the article's claims. If the article has screenshots, download and inspect them for specific UI text, button labels, or layout patterns to search the codebase for.

Aero codebase domains:

destination-posts — new publisher / "Upload or Create a Post" flow
scheduler — legacy Pin Scheduler
advanced-scheduler — advanced scheduling
draft-gallery — draft management
create — Tailwind Create (design tool, NOT the scheduler)
smartpin / smartpin-v2 — SmartPin features (check which is GA)
turbo — Turbo features

Note: packages/extension/ is the Turbo browser extension, NOT the scheduling extension. The scheduling extension loads its UI from the legacy app's views (e.g., draft_pin.blade.php for the draft card, NOT post_preview_editable.blade.php).

Browser screenshots are zero-trust for article content. If Chrome tools were used to view the product, treat the screenshot as showing one account's flag-gated state. Every UI element, layout, and navigation path visible in a screenshot must be verified against code — especially feature flags on the rendering component and its parent layout. (Run 17: screenshot showed sidebar nav from product_focused_nav_enabled flag, nearly shipped nav-specific instructions to an article read by users without that flag.)

4b. Feature flag gate. Grep for feature flags in the file AND parent components. A behavior behind a flag may not be available to all users.

featureFlag / feature_flag / isEnabled / useFeature
-v2 directories or component names (may be beta-only)
Seed data for flag defaults (default '1' = GA)
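The rough scanner below covers the first two patterns listed above. It assumes you pass in the component and its parent components as {path: source} pairs. It surfaces lines to read, not GA verdicts, and seed-data defaults still have to be checked by hand.

```python
import re

# Patterns from the list above; a match is a lead for manual review, not a verdict.
FLAG_PATTERNS = re.compile(r"featureFlag|feature_flag|isEnabled|useFeature")
V2_NAME = re.compile(r"-v2\b|V2\b")


def scan_for_flags(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, line_no, line) for each flag-looking line.

    `files` maps path -> source text; include the component AND its parents.
    Line 0 marks a path-level hit (a -v2 directory or V2 component name).
    """
    hits = []
    for path, text in files.items():
        if V2_NAME.search(path):
            hits.append((path, 0, "v2 path: may be beta-only"))
        for n, line in enumerate(text.splitlines(), start=1):
            if FLAG_PATTERNS.search(line):
                hits.append((path, n, line.strip()))
    return hits
```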

For each flag found, write a GA verdict before proceeding: flag name, code location, rollout evidence (PostHog pageview split, seed data, or plan cascade default). If any documented behavior is behind a non-GA flag: STOP and present the verdict to the user before proceeding to step 5. The user decides whether to scope content to flagged users, wait for GA, or adjust approach. Step 5 cannot start with unresolved flag questions.
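You can make the gate explicit with a small record per flag. The FlagVerdict dataclass and the can_start_step5 helper are illustrative, not part of the skill. They refuse to proceed while any flag lacks rollout evidence or a positive GA verdict.

```python
from dataclasses import dataclass


@dataclass
class FlagVerdict:
    flag: str
    location: str          # file:line where the flag gates the behavior
    rollout_evidence: str  # PostHog pageview split, seed data, or plan cascade default
    ga: bool


def can_start_step5(verdicts: list[FlagVerdict]) -> tuple[bool, list[str]]:
    """Return (ok, flags_needing_a_user_decision).

    A flag blocks step 5 if it is not GA or has no recorded rollout evidence.
    """
    blocked = [v.flag for v in verdicts if not v.ga or not v.rollout_evidence.strip()]
    return (not blocked, blocked)
```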

The verifier (step 6) cannot catch this. A feature behind a flag IS factually correct code — the verifier will return all_verified: true because the claims are true for users who have the flag. GA status is orthogonal to claim accuracy. (Run 1: SmartPin v2 beta-only. Run 18: SmartPin CSV import v2-only, ~43% rollout. Both passed verification. Two sessions wasted on Run 18.)

4c. Check all surfaces. If the article describes a workflow that could happen in multiple places, verify behavior in each. An article claim might be true in one surface and false in another.

4d. Distinguish article error from Fin error. (Accuracy audits only.) Is the article itself wrong, or is Fin hallucinating beyond what the article says? If the article is correct for its scope but Fin misapplied it, the fix is adding clarification (omission), not changing existing content.

When codebase verification is inconclusive: code may live in a different repo or service, or be configured via ops/infra. Defer with explicit note in tracker, don't guess.

Product knowledge. Check .claude/rules/product-knowledge.md for known facts about this product area before investigating — it may already have the answer. If codebase verification reveals undocumented product mechanics (score formulas, feature interactions, UI label mappings), document in .claude/rules/product-knowledge.md and reference/tailwind-product.md. Show content to user before writing — .claude/ files are sensitive.

5. Write corrected article

Identify error type (commission / omission / omission enabling hallucination / missing coverage)
For commission: change only the wrong sections, preserve everything else
For omission: add clarifying section, don't change existing correct content
For missing coverage: add section to most relevant existing article
If linking to another article, verify the URL resolves before including
Show the new section text before writing — don't compose in a single Write call without previewing the content
Save corrected HTML to box/research/kb-audit-corrected-article-run{N}.html
Optimize for LLM retrieval. Fin retrieves article sections independently, so reason about the likely retrieval path and make sure each section works as a standalone answer. Explicitly state what the feature does (including "schedule," "publish," "create") rather than relying on context from other sections to imply it. Ground claims to the product surface name ("in Pin Scheduler") so Fin doesn't interpret them as statements about the external platform. The article still needs to read well for humans browsing the help center, but explicit capability statements serve both audiences. (Run 15: the article described the carousel creation workflow without stating that carousels publish; Fin inferred that publishing wasn't supported.)
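The sketch below handles the null url case noted in Known gotchas. It builds the support.tailwindapp.com/en/articles/{id}-{slug} form and then checks it. The slug rule is an assumption, and the HTTP check is what actually confirms the URL. Some servers reject HEAD; if that happens, retry with GET.

```python
import re
import urllib.request


def article_url(article_id, title: str) -> str:
    """Build the help-center URL when the API's url field is null.

    Assumed slug rule: lowercase, runs of non-alphanumerics collapsed to '-'.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"https://support.tailwindapp.com/en/articles/{article_id}-{slug}"


def link_ok(url: str, timeout: float = 10.0) -> bool:
    """True only for a final HTTP 200 (urlopen follows redirects)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False
```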

6. Codebase verification delegate

Before shipping, delegate an independent codebase verification of every factual claim in the new/changed sections. Use the card verification prompt pattern (box/card-verification-prompt.md) adapted for article content:

Scope: new/changed sections only (identify by HTML heading IDs)
Claim types: behavioral claims, UI claims, feature claims, mechanism claims, conditional claims, negative claims
Model: claude-sonnet-4-6
Gate: all_verified: true → proceed. all_verified: false → present report to user, user decides.

After collecting, spot-check at least 1-2 claims from the verifier's results against primary sources. Structured JSON with all_verified: true suppresses the instinct to check.
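This sketch combines the gate with a forced spot-check sample. The report shape {"all_verified": ..., "claims": [...]} is an assumption; the real schema comes from box/card-verification-prompt.md. The check uses `is True` strictly, so a string "true" does not pass.

```python
import random


def gate(report: dict, k: int = 2, seed=None) -> tuple[str, list]:
    """Return (next_action, claims_to_spot_check) from the verifier's JSON report."""
    claims = report.get("claims", [])
    rng = random.Random(seed)
    # Always pick claims to re-check, even when everything "verified".
    spot_check = rng.sample(claims, min(k, len(claims)))
    action = "proceed" if report.get("all_verified") is True else "present_to_user"
    return action, spot_check
```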

6b. External platform verification

Articles often make claims about external platforms (Pinterest, Instagram, Facebook). These are not verifiable from the codebase: the code shows what Tailwind does, not what the platform requires or supports.

For each claim in the new/changed sections, tag it as codebase (verifiable in aero) or external_platform (requires platform docs). For external claims, check the platform's help center or developer docs via Claude in Chrome (preferred for JS-rendered or auth-gated docs), tavily_extract, or WebFetch. If a claim can't be verified externally, flag it explicitly rather than shipping it as fact. Don't attribute requirements to external platforms based on Tailwind's implementation choices: "Tailwind forces X" ≠ "Pinterest requires X." (Run 14: Tailwind's code forced Simplified Pin on product-tagged Pins; this was nearly shipped as "This is a Pinterest requirement.")
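A tiny data structure keeps the tagging honest. The Claim fields and the ship_blockers helper are hypothetical. Any claim that is untagged or lacks evidence is reported rather than shipped as fact.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    source: str            # "codebase" or "external_platform"
    verified: bool = False
    evidence: str = ""     # e.g. file path at origin/main, or a platform doc URL


def ship_blockers(claims: list[Claim]) -> list[str]:
    """Claims that cannot ship as stated fact yet."""
    out = []
    for c in claims:
        if c.source not in ("codebase", "external_platform"):
            out.append(f"untagged: {c.text}")
        elif not (c.verified and c.evidence):
            out.append(f"unverified {c.source}: {c.text}")
    return out
```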

7. Back-test against user input

Check whether user messages that describe the problem contain enough signal to route Fin to the new content. Compare:

User's original messages (exact words, error strings, feature names)
The new section's heading and key phrases

For coverage audits: if Intercom conversations were read, use those messages. If the trigger was a release (no conversations yet), imagine likely user phrasings based on the product behavior and write headings accordingly.

If the user's language doesn't overlap with how the fix is written, adjust the section heading or opening sentence to match how users describe the problem. Codebase verification confirms the content is correct; this step confirms it's findable.
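Below is a crude lexical version of the back-test, useful as a first pass before judging by eye. It does no stemming, so "cropping" will not match "crop". The stop-word list is an arbitrary assumption.

```python
import re

# Arbitrary, assumed stop-word list.
STOP = {"the", "a", "an", "to", "i", "my", "is", "it", "in", "on", "of",
        "and", "how", "do", "can", "you"}


def terms(text: str) -> set[str]:
    """Lowercased word tokens minus stop words."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOP}


def back_test(user_messages: list[str], heading: str, key_phrases: list[str]) -> set[str]:
    """User terms that also appear in the new section's heading or key phrases.

    An empty result suggests the section is not findable from the user's wording.
    """
    user = set().union(*(terms(m) for m in user_messages))
    section = terms(heading).union(*(terms(p) for p in key_phrases))
    return user & section
```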

8. Pre-ship checklist

Before presenting the corrected article for approval:

☐ Feature flag check on all verified behaviors
☐ Multi-surface check — article's workflow verified in the specific surface(s) it describes, not just "this code exists somewhere"
☐ Diff only the changed sections against original — verify unchanged sections are byte-identical
☐ Cross-article links verified (HTTP 200)
☐ Error type identified and fix matches type
☐ Back-test passed — user's original message matches new section language
☐ Codebase verifier passed (or failures reviewed with user)
☐ Any unverified UI claims in the article noted explicitly — don't declare them unverifiable without exhausting leads
☐ External platform claims verified against platform docs (or flagged as unverifiable)
☐ Independent verifier ran as delegate (not inline reads) — "Was the verifier a fresh agent with no access to my session's reasoning?"
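You can script the byte-identical check by splitting both versions at headings and listing original sections that no longer appear verbatim. Splitting on h1/h2 tags assumes that is how these articles are structured. Compare your local corrected HTML against /tmp/ff-article-{id}-original.html before the PUT, because Intercom rewrites HTML on save.

```python
import re

# Zero-width split point before each <h1 ...> or <h2 ...> opening tag.
HEADING = re.compile(r"(?=<h[12][^>]*>)", re.IGNORECASE)


def split_sections(html_text: str) -> list[str]:
    """Split article HTML at each h1/h2, keeping the heading with its section."""
    return [s for s in HEADING.split(html_text) if s]


def sections_not_preserved(original: str, corrected: str) -> list[str]:
    """Original sections absent verbatim from the corrected HTML.

    Every item returned should be a section you intended to change.
    """
    corrected_sections = set(split_sections(corrected))
    return [s for s in split_sections(original) if s not in corrected_sections]
```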

9. Ship

Show exact proposed HTML changes to user (before/after for changed sections, summary of added sections). Get explicit approval.
Ship via PUT /articles/{id} through execute_approved
- Include Accept: application/json header (406 without it)
- Auth: Authorization: Bearer {INTERCOM_ACCESS_TOKEN}
- Version: Intercom-Version: 2.11
Post-mutation verification: Re-fetch the article via GET /articles/{id}. Check that key strings from each change are present in the live body. Do not rely solely on the PUT response status.
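The standard-library sketch below covers the PUT request and the post-mutation check. It only builds the request; sending it is still a production mutation that goes through execute_approved. Two parts are assumptions to confirm against a real re-fetch. The first is the Content-Type header. The second is the normalize rules, which are derived from the HTML-normalization entry in Known gotchas.

```python
import html
import json
import re
import urllib.request


def put_request(article_id, body_html: str, token: str) -> urllib.request.Request:
    """Build (but don't send) PUT /articles/{id} with the new HTML body."""
    return urllib.request.Request(
        f"https://api.intercom.io/articles/{article_id}",
        data=json.dumps({"body": body_html}).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",  # 406 without it
            "Content-Type": "application/json",
            "Intercom-Version": "2.11",
        },
    )


def normalize(s: str) -> str:
    """Undo Intercom's rewrites (h2 to h1, empty <p>, curly apostrophes) before comparing."""
    s = html.unescape(s)
    s = s.replace("\u2019", "'").replace("\u2018", "'")
    s = re.sub(r"<(/?)h2\b", r"<\1h1", s)
    s = re.sub(r"<p>\s*</p>", "", s)
    return re.sub(r"\s+", " ", s).strip()


def missing_after_put(live_body: str, key_strings: list[str]) -> list[str]:
    """Key strings from each change that are absent from the re-fetched live body."""
    live = normalize(live_body)
    return [k for k in key_strings if normalize(k) not in live]
```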

10. Update tracker

Update box/research/kb-audit-tracker.json:

Add fixed article to audited_articles with article_updated_at from the post-mutation re-fetch (not the PUT response)
Add all triaged conversations from this run to triaged_conversations (fixed, deferred, not_actionable) — all items, not just fixes
- fixed: Article was updated or created this run to address this conversation
- not_actionable: Read and triaged; not a KB content error (bot behavior, product bug, hallucination not traceable to article)
- deferred: Identified as a candidate but not fully investigated this run — pick up in a future session
Update last_search_date
For coverage audits: record discovery_source (e.g., slack_thread_CK4TLM9TR_p1777298260939679, release_pr_3426)
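The sketch below applies the step 10 bookkeeping to an in-memory tracker. It assumes triaged_conversations is keyed by conversation ID and audited_articles by article ID. Attaching discovery_source to the article entry is also an assumption.

```python
import json
from pathlib import Path

ALLOWED = {"fixed", "deferred", "not_actionable"}


def record_run(tracker: dict, *, article_id, live_updated_at, triaged: dict[str, str],
               search_date: str, discovery_source=None) -> dict:
    """Update the tracker for one run.

    live_updated_at must come from the post-mutation GET, not the PUT response.
    triaged maps every conversation triaged this run -> its status.
    """
    bad = {cid: s for cid, s in triaged.items() if s not in ALLOWED}
    if bad:
        raise ValueError(f"unknown statuses: {bad}")
    entry = {"article_updated_at": live_updated_at}
    if discovery_source:
        entry["discovery_source"] = discovery_source
    tracker.setdefault("audited_articles", {})[str(article_id)] = entry
    convs = tracker.setdefault("triaged_conversations", {})
    for cid, status in triaged.items():
        convs[cid] = {"status": status}
    tracker["last_search_date"] = search_date
    return tracker


def save_tracker(tracker: dict, path: Path) -> None:
    """Write stable, diff-friendly JSON."""
    path.write_text(json.dumps(tracker, indent=2, sort_keys=True) + "\n")
```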

Write comparison doc to box/research/kb-audit-article-comparison-run{N}.md.

11. Continue or end

If untriaged or deferred candidates remain from this session's discovery (digest observations, structural query results, or coverage gaps identified during investigation), present the remaining list. Do not suggest ending — present the choice neutrally.

Known gotchas

Intercom search freshness: intercom-search.py has a 36h freshness gate. If stale, sync must run first.
ai_agent_participated is on conversation metadata, not searchable via full-text index — need API or DB queries for this filter.
Article update is a production mutation — blocked by mutation gate, must route through execute_approved.
Intercom API version 2.11. Requires Accept: application/json on PUT.
Use the article API, not WebFetch, to enumerate KB coverage. The public help center shows collections; the API shows everything Fin can draw from. Search multiple phrasings — different terms surface different articles. (Run 6)
Fin draws from multiple articles for a single answer — tracing to one article requires matching specific claims, not just topic.
Feature flags: Code behind a flag may not be GA. SmartPin v2 code looked like article errors but was beta-only. (Run 1)
Multiple surfaces: A feature may exist in multiple UI surfaces with different behavior. Crop existed in Create but not in scheduler Drafts. (Run 2)
Conversation evidence is a pointer, not a finding. "This doesn't exist" tells you the user's experience, not what the system does.
Intercom article url field may be null. Construct URL as support.tailwindapp.com/en/articles/{id}-{slug} and verify with HTTP request.
Digest bot observations are intermediaries. Written by a prior Claude session. Read the actual conversation before classifying. (Run 5)
Escalation signal degrades below 15 parts. "Connect you with a human" matches Fin's auto-follow-up on abandoned conversations, not just real failure-to-resolve. (Run 5 threshold validation)
Hallucination often signals missing coverage. When Fin fabricates, check whether any article covers the topic — absence of coverage is an actionable KB gap, not a reason to mark not_actionable. (Run 5)
Match article language to product's own wording. Check in-app announcements, tooltips, and labels for how the product describes the feature. Use that language in the article, not your own phrasing. (Run 6: "navigation bar" matched the in-app announcement, not "Turbo navigation bar")
When wrapping mutations in scripts, show the script contents before execute_approved, not just the command path. execute_approved displays the command but not the script body — if the user approved curl commands and you switch to Python, re-show. Consent is to a specific artifact, not a category of action. (Run 7)
Intercom normalizes HTML on PUT: <h2> → <h1>, removes empty <p> tags, converts straight apostrophes to curly quotes. Post-mutation verification scripts must normalize before comparing, or they'll show false MISSING results. (Run 7)
Don't define external concepts inline without checking. When an article references a platform concept (Simplified Pin, Rich Pin, carousel), check the existing KB or platform docs for the definition. Inventing a plausible definition is the highest-risk hallucination pattern on the proactive path. (Run 14)
Implied capabilities don't work for LLM retrieval. An article that describes a creation workflow without explicitly stating the feature publishes will be interpreted by Fin as "you can build it but not publish it." Each article section must answer the question "can Tailwind do this?" directly, not by implication. (Run 15)
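The 36h freshness gate for intercom-search.py reduces to a single comparison. How the last sync time is obtained is not specified here, so last_sync is an assumed input and must be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS = timedelta(hours=36)


def search_index_fresh(last_sync: datetime, now=None) -> bool:
    """True if the last sync is within the 36h gate; otherwise run the sync first."""
    now = now or datetime.now(timezone.utc)
    return now - last_sync <= FRESHNESS
```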

name	kb-audit
description	Find and fix Intercom knowledge base gaps — either wrong article content (accuracy) or missing coverage for a product area (coverage). Verify against the codebase, ship corrected articles.
disable-model-invocation	true

/kb-audit

Find and fix knowledge base gaps that cause Fin (Gabby) to give users wrong or missing information. Two modes depending on the signal:

Accuracy audit: Fin gave a wrong answer traceable to an article claim.
Coverage audit: A product area has inadequate KB coverage — Fin has nothing to draw from, so it fabricates or falls back to irrelevant articles.

Both modes share the same fix engine (steps 3-9).

Quick reference

Tracker: box/research/kb-audit-tracker.json (triaged conversations, audited articles, last search date)
Per-run artifacts: box/research/kb-audit-article-comparison-run{N}.md, box/research/kb-audit-corrected-article-run{N}.html
Run log / development history: box/research/kb-accuracy-audit-brief.md
Delivery: PUT /articles/{id} via Intercom API (production mutation — execute_approved)
Deliverable: one updated Intercom help center article per run

Error taxonomy

Four classes of article error, requiring different fixes:

Commission: Article says something factually wrong (e.g., "go to Settings → Pin Spacing Rules" when the path doesn't exist, or "Most Popular button in the top right corner" when the button has been removed). Fix: correct or remove the wrong claim.
Omission: Article is correct for what it covers but missing context that causes Fin to misapply it (e.g., image crop article applied to Reels because it never says "images only"). Fix: add clarifying section, preserve existing content.
Omission enabling hallucination: Article's framing is vague enough that Fin infers capabilities that don't exist (e.g., "community" framing led Fin to fabricate messaging features). Fix: add explicit "what this does NOT include" clarification.
Missing coverage: No article covers the topic at all, forcing Fin to fabricate or fall back to irrelevant articles. Fix: add a section to the most relevant existing article, or flag for new article creation. This is the typical finding in coverage audits.

Mode 1: Accuracy Audit

Signal: Fin gave a specific wrong answer. Goal: trace it to the article, fix the article.

1a. Find Fin wrong answers

Load the tracker (box/research/kb-audit-tracker.json) to get the last search date and skip already-triaged conversations.

Two discovery paths:

SELECT conversation_id, part_count, created_at::date
FROM conversation_search_index
WHERE created_at >= '{since_date}'
  AND part_count >= 10
  AND (
    full_text LIKE '%Sources:%'                    -- Fin cited articles
    OR full_text LIKE '%connect you with a human%' -- human escalation
    OR full_text ~* '(can.t find|doesn.t exist|not there|where is the|that.s not|I don.t see)'
                                                   -- user describing missing/wrong UI
  )
  AND conversation_id NOT IN ({triaged_ids})
ORDER BY part_count DESC;

Signal behavior by conversation length:

20+ parts: all three signals reliable, high hit rate
15-19 parts: signals still work (42% hit rate in run 5 validation sample)
10-14 parts: escalation signal degrades (matches Fin auto-follow-up on abandoned conversations, not real failure), other signals still valid but lower hit rate
Below 10: signals match noise

Triage criteria for each conversation (both paths):

Is this Fin giving wrong info (vs. user confusion, vs. product bug)?
Can the wrong claim be traced to a specific KB article?
Is Fin fabricating because no article covers the topic? (missing coverage — this is actionable, not a reason to close)
Is the article still published and unchanged since the incident?

Present triaged candidates to user for selection before proceeding.

Then proceed to step 3 (shared fix engine).

Mode 2: Coverage Audit

Signal: a product area may have inadequate KB coverage. Goal: understand the product, check what the KB covers, fix the gaps.

Triggers: CS pattern in Slack, feature release/migration, proactive review, customer-comms tracker, or an accuracy audit that reveals missing coverage rather than a wrong claim.

1b. Understand the product area

Build context on how the feature actually works. The depth here determines the quality of the article content downstream.

From a CS pattern (Slack thread, Intercom cluster): Read the thread/ conversations to understand what users are experiencing. Then trace the mechanism in the codebase — how the feature works, not just what the symptom is. Search Intercom for the conversation cluster to gauge scope.
From a feature release: Read the PR(s) or commit history to understand what changed. Trace the user-facing behavior in the codebase.
From a proactive review: Start from the product area's code. Understand the key user-facing behaviors, settings, and edge cases.
From customer-comms tracker: Pick a recently released or communicated feature from reference/customer-comms.md. Search the article API for existing coverage, then read the codebase to check completeness. Focus on things users would encounter and ask about: limits, requirements, interactions with other features, and what happens when something goes wrong. Not a comprehensive technical audit -- the question is "would Fin have an answer when a user asks about this?"

2b. Check existing KB coverage

Search the Intercom article API to find what coverage exists. Use the API, not the public help center — the public site shows collections, not the full article corpus Fin draws from.

INTERCOM_TOKEN=$(grep 'INTERCOM_ACCESS_TOKEN' .env | cut -d= -f2-)
curl -s "https://api.intercom.io/articles/search?phrase={url_encoded_phrase}" \
  -H "Authorization: Bearer $INTERCOM_TOKEN" \
  -H "Accept: application/json" \
  -H "Intercom-Version: 2.11"

Search multiple phrases — feature name, user symptom language, related concepts. The search is fuzzy and returns ranked results; different phrasings surface different articles.

If coverage is adequate: report findings and stop. Not every audit produces a fix.

Then proceed to step 3 (shared fix engine).

Shared Fix Engine (Steps 3-9)

Both modes converge here. You have: an article to fix, an understanding of what's wrong (or missing), and codebase context.

3. Trace to source article

Search Intercom article API: GET /articles/search?phrase=...
Pull the full article: GET /articles/{id}
For accuracy audits: map Fin's specific claims to specific article text
For coverage audits: confirm the gap — the topic is genuinely absent
Save original article to /tmp/ff-article-{id}-original.html

4. Verify against codebase

This is where the value is. Don't shortcut it.

For each claim in the article (existing or planned new content), trace to the actual code. Use git -C /Users/paulyokota/Dev/aero show origin/main:path/to/file (local checkout may be behind remote).

Aero codebase domains:

destination-posts — new publisher / "Upload or Create a Post" flow
scheduler — legacy Pin Scheduler
advanced-scheduler — advanced scheduling
draft-gallery — draft management
create — Tailwind Create (design tool, NOT the scheduler)
smartpin / smartpin-v2 — SmartPin features (check which is GA)
turbo — Turbo features

4b. Feature flag gate. Grep for feature flags in the file AND parent components. A behavior behind a flag may not be available to all users.

featureFlag / feature_flag / isEnabled / useFeature
-v2 directories or component names (may be beta-only)
Seed data for flag defaults (default '1' = GA)
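A rough scanner for the first two signals in the list above, run over a file's text and over each parent component's text. It is a plain substring and regex search, so it will over-match; treat hits as leads to read, not verdicts. It does not parse seed data, since the seed format isn't described here.

```python
import re

# Flag-call names from the 4b list (substring match, so useFeatureFlag hits too).
FLAG_CALLS = re.compile(r"featureFlag|feature_flag|isEnabled|useFeature")
# "-v2" directory names or "...V2" component names: possibly beta-only code.
V2_NAMES = re.compile(r"[A-Za-z0-9_]*(?:-v2|V2)")


def flag_hits(source: str) -> list:
    """Return (line number, kind, matched text) for every flag-like hit."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in (("flag", FLAG_CALLS), ("v2", V2_NAMES)):
            for match in pattern.finditer(line):
                hits.append((lineno, kind, match.group(0)))
    return hits
```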

4c. Check all surfaces. If the article describes a workflow that could happen in multiple places, verify behavior in each. An article claim might be true in one surface and false in another.

When codebase verification is inconclusive (the code may live in a different repo or service, or be configured via ops/infra), defer with an explicit note in the tracker. Don't guess.

5. Write corrected article

Identify error type (commission / omission / omission enabling hallucination / missing coverage)
For commission: change only the wrong sections, preserve everything else
For omission: add clarifying section, don't change existing correct content
For missing coverage: add section to most relevant existing article
If linking to another article, verify the URL resolves before including
Show the new section text before writing — don't compose in a single Write call without previewing the content
Save corrected HTML to box/research/kb-audit-corrected-article-run{N}.html
Optimize for LLM retrieval. Fin retrieves article sections independently, so reason about the likely retrieval path and make sure each section works as a standalone answer. Explicitly state what the feature does (including "schedule," "publish," "create") rather than relying on context from other sections to imply it. Tie claims to the product surface name ("in Pin Scheduler") so Fin doesn't interpret them as statements about the external platform. The article still needs to read well for humans browsing the help center, but explicit capability statements serve both audiences. Shown in Run 15: the article described the carousel creation workflow without stating that carousels publish, and Fin inferred that publishing wasn't supported.
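For the link check, a sketch that prefers the API's url field and only builds a help-center URL when it is null. The slug rule in guess_slug (lowercased title, non-alphanumeric runs turned into hyphens) is an assumption, not Intercom's documented behavior, which is exactly why the result still has to pass resolves before it goes into an article.

```python
import re
import urllib.error
import urllib.request

HELP_CENTER = "https://support.tailwindapp.com/en/articles"


def guess_slug(title: str) -> str:
    """ASSUMPTION: slug = lowercased title with non-alphanumeric runs as '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def article_link(article: dict) -> str:
    """Use the API's url when present; otherwise construct a candidate URL."""
    if article.get("url"):
        return article["url"]
    return f"{HELP_CENTER}/{article['id']}-{guess_slug(article['title'])}"


def resolves(url: str, timeout: float = 10.0) -> bool:
    """True if the URL ends in HTTP 200 (urlopen follows redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError):
        return False
```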

6. Codebase verification delegate

Scope: new/changed sections only (identify by HTML heading IDs)
Claim types: behavioral claims, UI claims, feature claims, mechanism claims, conditional claims, negative claims
Model: claude-sonnet-4-6
Gate: all_verified: true → proceed. all_verified: false → present report to user, user decides.

After collecting the results, spot-check at least one or two of the verifier's claims against primary sources. A structured JSON report with all_verified: true suppresses the instinct to check.
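A sketch of the gate plus a forced spot-check sample. It assumes a hypothetical report shape, {"all_verified": bool, "claims": [...]}; adjust the keys to whatever the delegate actually returns. The sample is drawn regardless of the gate result, since a passing report is the case where checking gets skipped.

```python
import json
import random


def gate(report_json: str) -> str:
    """all_verified: true -> proceed; anything else -> show the report to the user."""
    report = json.loads(report_json)
    return "proceed" if report.get("all_verified") is True else "present_to_user"


def spot_check_sample(report_json: str, k: int = 2, seed=None) -> list:
    """Pick up to k claims to re-verify by hand against primary sources."""
    claims = json.loads(report_json).get("claims", [])
    return random.Random(seed).sample(claims, min(k, len(claims)))
```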

6b. External platform verification

7. Back-test against user input

Check whether user messages that describe the problem contain enough signal to route Fin to the new content. Compare:

User's original messages (exact words, error strings, feature names)
The new section's heading and key phrases
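A crude word-overlap version of that comparison, as a sanity check before reading the messages side by side. The stopword list and 3-character cutoff are arbitrary choices, and there is no stemming ("pin" and "pins" count as different words). Words in user_only are the user's vocabulary that the section never uses.

```python
import re

# Tiny illustrative stopword list; not tuned.
STOPWORDS = {"the", "and", "you", "can", "how", "for", "with", "this",
             "that", "not", "are", "but", "what", "does", "why", "when"}


def terms(text: str) -> set:
    """Lowercased words of 3+ characters, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower())
            if len(w) > 2 and w not in STOPWORDS}


def back_test(user_messages: list, heading: str, key_phrases: list) -> dict:
    """Compare user wording with the new section's heading and key phrases."""
    user = set().union(*(terms(m) for m in user_messages))
    section = terms(heading).union(*(terms(p) for p in key_phrases))
    return {"shared": sorted(user & section), "user_only": sorted(user - section)}
```

An empty shared list is a strong sign the section won't be retrieved for this user's phrasing.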

8. Pre-ship checklist

Before presenting the corrected article for approval:

☐ Feature flag check on all verified behaviors
☐ Multi-surface check — article's workflow verified in the specific surface(s) it describes, not just "this code exists somewhere"
☐ Diff against the original: only the intended sections changed, and unchanged sections are byte-identical
☐ Cross-article links verified (HTTP 200)
☐ Error type identified and fix matches type
☐ Back-test passed — user's original message matches new section language
☐ Codebase verifier passed (or failures reviewed with user)
☐ Any unverified UI claims in the article noted explicitly — don't declare them unverifiable without exhausting leads
☐ External platform claims verified against platform docs (or flagged as unverifiable)
☐ Independent verifier ran as delegate (not inline reads) — "Was the verifier a fresh agent with no access to my session's reasoning?"
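One way to run the byte-identical check: split both versions at headings that carry an id attribute and compare every section you didn't intend to change. This is a regex split, not an HTML parser, and it assumes each section starts at an id'd heading; anything before the first one is stored under "".

```python
import re

HEADING = re.compile(r'<h[1-6][^>]*\bid="([^"]+)"[^>]*>', re.IGNORECASE)


def sections_by_id(html: str) -> dict:
    """Map heading id -> text from that heading up to the next id'd heading."""
    matches = list(HEADING.finditer(html))
    sections = {"": html[: matches[0].start()] if matches else html}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        sections[m.group(1)] = html[m.start():end]
    return sections


def unexpected_changes(original: str, corrected: str, changed_ids: set) -> list:
    """List sections outside changed_ids that went missing or differ at all."""
    before, after = sections_by_id(original), sections_by_id(corrected)
    problems = []
    for sid, text in before.items():
        if sid in changed_ids:
            continue
        if sid not in after:
            problems.append(f"missing: {sid}")
        elif after[sid] != text:
            problems.append(f"modified: {sid}")
    return problems
```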

9. Ship

Show exact proposed HTML changes to user (before/after for changed sections, summary of added sections). Get explicit approval.
Ship via PUT /articles/{id} through execute_approved
- Include Accept: application/json header (406 without it)
- Auth: Authorization: Bearer {INTERCOM_ACCESS_TOKEN}
- Version: Intercom-Version: 2.11
Post-mutation verification: Re-fetch the article via GET /articles/{id}. Check that key strings from each change are present in the live body. Do not rely solely on the PUT response status.
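A sketch of the PUT request a wrapper script might send. It only builds the request; the actual send still goes through execute_approved, and the script body should be shown before running it. It assumes a JSON payload of {"body": html} and adds Content-Type: application/json for that payload, which the header list above doesn't mention; confirm both against Intercom's Articles API docs.

```python
import json
import urllib.request


def put_article_request(article_id: str, token: str, body_html: str) -> urllib.request.Request:
    """Build (not send) PUT /articles/{id} with the headers listed above."""
    payload = json.dumps({"body": body_html}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.intercom.io/articles/{article_id}",
        data=payload,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",        # 406 without it
            "Content-Type": "application/json",  # assumption: JSON payload
            "Intercom-Version": "2.11",
        },
    )
```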

10. Update tracker

Update box/research/kb-audit-tracker.json:

Add fixed article to audited_articles with article_updated_at from the post-mutation re-fetch (not the PUT response)
Add all triaged conversations from this run to triaged_conversations (fixed, deferred, not_actionable) — all items, not just fixes
- fixed: Article was updated or created this run to address this conversation
- not_actionable: Read and triaged; not a KB content error (bot behavior, product bug, hallucination not traceable to article)
- deferred: Identified as a candidate but not fully investigated this run — pick up in a future session
Update last_search_date
For coverage audits: record discovery_source (e.g., slack_thread_CK4TLM9TR_p1777298260939679, release_pr_3426)

Write comparison doc to box/research/kb-audit-article-comparison-run{N}.md.
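A sketch of the tracker update, under an assumed schema: audited_articles and triaged_conversations are lists, each triaged item has "id" and "status", and discovery_source is stored on the article entry. The real file may be shaped differently. It validates statuses before touching anything, and article_updated_at should be the value from the post-mutation re-fetch.

```python
import json
from pathlib import Path

TRACKER = Path("box/research/kb-audit-tracker.json")
STATUSES = {"fixed", "deferred", "not_actionable"}


def record_run(tracker: dict, triaged: list, search_date: str,
               article_id=None, article_updated_at=None,
               discovery_source=None) -> dict:
    """Apply one run's results to the tracker dict (schema assumed)."""
    for item in triaged:
        if item.get("status") not in STATUSES:
            raise ValueError(f"unknown status for {item.get('id')}: {item.get('status')}")
    if article_id is not None:
        entry = {"id": article_id, "article_updated_at": article_updated_at}
        if discovery_source:
            entry["discovery_source"] = discovery_source
        tracker.setdefault("audited_articles", []).append(entry)
    # Every triaged item goes in, not just the fixes.
    tracker.setdefault("triaged_conversations", []).extend(triaged)
    tracker["last_search_date"] = search_date
    return tracker


def save_run(**kwargs) -> None:
    """Load the tracker file, apply record_run, and write it back."""
    tracker = json.loads(TRACKER.read_text()) if TRACKER.exists() else {}
    TRACKER.write_text(json.dumps(record_run(tracker, **kwargs), indent=2) + "\n")
```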

11. Continue or end

Known gotchas

Intercom search freshness: intercom-search.py has a 36h freshness gate. If stale, sync must run first.
ai_agent_participated is on conversation metadata, not searchable via full-text index — need API or DB queries for this filter.
Article update is a production mutation — blocked by mutation gate, must route through execute_approved.
Intercom API version 2.11. Requires Accept: application/json on PUT.
Use the article API, not WebFetch, to enumerate KB coverage. The public help center shows collections; the API shows everything Fin can draw from. Search multiple phrasings — different terms surface different articles. (Run 6)
Fin draws from multiple articles for a single answer — tracing to one article requires matching specific claims, not just topic.
Feature flags: Code behind a flag may not be GA. SmartPin v2 code looked like article errors but was beta-only. (Run 1)
Multiple surfaces: A feature may exist in multiple UI surfaces with different behavior. Crop existed in Create but not scheduler Drafts. (Run 2)
Conversation evidence is a pointer, not a finding. "This doesn't exist" tells you the user's experience, not what the system does.
Intercom article url field may be null. Construct URL as support.tailwindapp.com/en/articles/{id}-{slug} and verify with HTTP request.
Digest bot observations are intermediaries. Written by a prior Claude session. Read the actual conversation before classifying. (Run 5)
Escalation signal degrades below 15 parts. "Connect you with a human" matches Fin's auto-follow-up on abandoned conversations, not just real failure-to-resolve. (Run 5 threshold validation)
Hallucination often signals missing coverage. When Fin fabricates, check whether any article covers the topic — absence of coverage is an actionable KB gap, not a reason to mark not_actionable. (Run 5)
Match article language to product's own wording. Check in-app announcements, tooltips, and labels for how the product describes the feature. Use that language in the article, not your own phrasing. (Run 6: "navigation bar" matched the in-app announcement, not "Turbo navigation bar")
When wrapping mutations in scripts, show the script contents before execute_approved, not just the command path. execute_approved displays the command but not the script body — if the user approved curl commands and you switch to Python, re-show. Consent is to a specific artifact, not a category of action. (Run 7)
Intercom normalizes HTML on PUT: <h2> → <h1>, removes empty <p> tags, converts straight apostrophes to curly quotes. Post-mutation verification scripts must normalize before comparing, or they'll show false MISSING results. (Run 7)
Don't define external concepts inline without checking. When an article references a platform concept (Simplified Pin, Rich Pin, carousel), check the existing KB or platform docs for the definition. Inventing a plausible definition is the highest-risk hallucination pattern on the proactive path. (Run 14)
Implied capabilities don't work for LLM retrieval. An article that describes a creation workflow without explicitly stating the feature publishes will be interpreted by Fin as "you can build it but not publish it." Each article section must answer the question "can Tailwind do this?" directly, not by implication. (Run 15)
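For the HTML-normalization gotcha, a minimal sketch of a post-mutation check that applies the three known rewrites (h2 to h1, empty p removed, curly apostrophes) to both the live body and the expected strings before comparing. It only covers the rewrites listed above; if Intercom changes anything else, extend normalize as you observe it.

```python
import re


def normalize(html: str) -> str:
    """Undo Intercom's known PUT-side rewrites so comparisons are fair."""
    html = html.replace("\u2019", "'").replace("\u2018", "'")         # curly -> straight apostrophes
    html = re.sub(r"<(/?)h2\b", r"<\1h1", html, flags=re.IGNORECASE)  # <h2> -> <h1>
    html = re.sub(r"<p>\s*</p>", "", html)                            # drop empty <p>
    return html


def missing_after_put(live_body: str, key_strings: list) -> list:
    """Key strings from each change that are absent from the re-fetched body."""
    live = normalize(live_body)
    return [s for s in key_strings if normalize(s) not in live]
```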