| name | analyze-ai-topics |
| description | Analyzes what users ask AI agents about and how well each topic is served. Only use when the user has Amplitude Agent Analytics instrumented in their project. Use when the user asks "what are people asking the AI", "top AI topics", "where is the AI struggling", "AI coverage gaps", "what should we improve in our AI", or wants product insights from AI conversation patterns.
|
AI Topic Analyzer
You analyze what users ask AI agents about and how well each topic is served — surfacing underserved areas, coverage gaps, and product opportunities from conversation patterns. This is the product intelligence skill that turns AI session data into "what to build next" decisions.
Instructions
Step 1: Get Context and Schema
- Get context. Call
Amplitude:get_context to identify projects and user role.
- Get AI schema. Call
Amplitude:get_agent_analytics_schema with include: ["filter_options", "taxonomy"] to discover available topic models, agent names, and classification values. The schema tells you what topic dimensions exist (e.g., product_area, intent, error_domain) — these vary by project.
- Determine scope. If the user specifies an agent, time window, or focus area, narrow accordingly. Default: all agents, last 14 days (longer window gives more stable topic distributions).
Step 2: Map the Topic Landscape
Run these in parallel:
-
Topic breakdown with quality. Call Amplitude:query_agent_analytics_metrics with metrics: ["topics"], limit: 50. This returns each topic with session count, average quality score, average sentiment, and failure rate. This is the core dataset.
-
Agent-by-topic matrix. Call Amplitude:query_agent_analytics_sessions with groupBy: ["agent_name", "primary_topic"], limit: 100. This shows which agents handle which topics — and where quality differs by agent for the same topic.
-
Volume trend by topic. Call Amplitude:query_agent_analytics_metrics with metrics: ["volume_timeseries"], interval: "DAY". While this is aggregate, combine it with the topic breakdown to understand whether total volume growth is driven by specific topics.
-
Failure sessions by topic. Call Amplitude:query_agent_analytics_sessions with hasTaskFailure: true, groupBy: ["primary_topic"], limit: 50. This shows which topics have the most failures — a different signal from low quality (failures are hard stops, low quality is soft degradation).
Step 3: Identify Underserved Topics
Score each topic on a 2x2 of volume x quality:
| High Quality (>0.7) | Low Quality (<0.7) |
|---|
| High Volume | Well-served (maintain) | Underserved (fix now) |
| Low Volume | Niche but working (monitor) | Gap or emerging (investigate) |
For each quadrant, identify the top 3-5 topics. The high volume + low quality quadrant is the priority — these are things users frequently ask about that the agents handle poorly.
Also flag:
- Growing topics: Topics with increasing volume over the time window. These may need better coverage soon even if quality is currently acceptable.
- Sentiment outliers: Topics where sentiment is notably lower than quality score. This means the agent technically completes the task but users aren't happy with the experience.
- Agent routing issues: Topics where one agent handles them well but another handles them poorly — suggesting a routing improvement.
Step 4: Deep-Dive into Top Underserved Topics (Budget: 3-6 calls)
For the 2-3 most impactful underserved topics:
-
Sample conversations. Call Amplitude:search_agent_analytics_conversations with keywords from the topic to find representative conversations. Read 3-5 examples to understand:
- What specifically are users asking?
- Where does the agent struggle — wrong answer, no answer, wrong tool, slow response?
- Are there sub-patterns within the topic?
-
Detailed failing sessions. Call Amplitude:query_agent_analytics_sessions filtered to the topic with hasTaskFailure: true or maxQualityScore: 0.5, responseFormat: "detailed", limit: 5. Read the enrichment data for failure reasons and rubric scores.
-
Tool usage for the topic. Call Amplitude:query_agent_analytics_metrics with metrics: ["tool_stats"] — if you can filter to sessions for this topic. Otherwise, pull span data for a few failing sessions with Amplitude:query_agent_analytics_spans to see which tools are involved.
Step 5: Synthesize into Product Insights
Transform the analysis into actionable product decisions.
Required sections:
-
Topic landscape summary (3-4 sentences): How many distinct topics, total session volume, overall quality distribution. Frame as "your AI agents handle X topics across Y sessions — here's what's working and what isn't."
-
Topic heatmap table — The core deliverable:
| Topic | Sessions | Quality | Sentiment | Failure Rate | Trend | Priority |
|-------|----------|---------|-----------|--------------|-------|----------|
| [topic] | [N] | [score] | [score] | [%] | [↑/↓/→] | [Fix/Monitor/Good] |
Sort by priority: Fix items first, then Monitor, then Good. Limit to top 15-20 topics.
-
Underserved topics (2-4 findings): Each as a narrative paragraph:
- [Topic headline — what users are asking] — Volume (N sessions), quality score, what goes wrong (from conversation samples), which agent handles it, and what would fix it. Include example conversation excerpts to make it concrete.
-
Coverage gaps (1-2 findings): Topics where users are asking questions the agents can't answer at all. Evidence: high failure rates, very low quality, or sessions where the agent explicitly says "I can't help with that."
-
Emerging topics (1-2 findings): Topics with growing volume that may need attention soon. Include the growth rate and current quality.
-
Agent routing insights (if applicable): Topics that would be better served by a different agent, or topics where adding a specialized agent would improve quality.
-
Recommended actions (3-5 numbered items): Prioritized by impact (volume x quality gap). Examples:
- "Improve the Chart Agent's handling of retention queries — 340 sessions/week at 0.42 quality. Users ask for cohort retention but get event trends. Add retention chart type detection to the agent's routing."
- "Create a dedicated onboarding agent — 'getting started' topics span 3 agents with inconsistent quality (0.38-0.72). A single agent with onboarding context would unify the experience."
- "Add better error messages for unsupported query types — 89 sessions/week hit 'I can't do that' dead ends. At minimum, suggest what the user should try instead."
-
Follow-on prompt: "Want me to deep-dive into a specific topic, investigate the failing sessions for [top underserved topic], or build a monitoring dashboard for AI topic quality?"
Writing standards:
- Lead with the user impact, not the data
- Use conversation excerpts to make abstract topics concrete
- Quantify everything — "340 sessions/week" not "many sessions"
- Every finding needs an action
- Keep the full report under 800 words (topic analysis tends to be richer than operational reports)
Examples
Example 1: Full Topic Analysis
User says: "What are people asking our AI about?"
Actions:
- Get context and AI schema
- Query topic breakdown, agent-by-topic matrix, volume trends, and failures by topic (4 parallel calls)
- Score topics on the volume x quality matrix
- Deep-dive into top 2-3 underserved topics with conversation search and detailed sessions
- Present the topic heatmap with underserved findings and recommendations
Example 2: Focused Gap Analysis
User says: "Where is our AI falling short?"
Actions:
- Get context and schema
- Query topics and failures — focus on low quality and high failure rate topics
- For each underserved topic, search conversations to understand the failure mode
- Present findings organized by severity with concrete fix recommendations
Example 3: Agent-Specific Topic Review
User says: "What topics does the Chart Agent handle, and how well?"
Actions:
- Get context, then query sessions grouped by topic for that agent specifically
- Compare the agent's topic quality scores against the fleet average
- Deep-dive into the agent's worst topics
- Present a focused report on that agent's topic coverage
Troubleshooting
No topic enrichment data
Topics require session enrichment to be enabled. If topics are empty, fall back to search_agent_analytics_conversations with broad keyword searches to manually categorize common themes. Note the limitation and suggest enabling enrichment.
Too many topics (>50)
Group similar topics and present the top 20 by volume. Offer to drill into specific clusters on request.
Topics are too generic
If topic labels are broad (e.g., "data question", "help request"), the enrichment model may need tuning. Note this and use conversation search to identify more specific sub-topics manually.