monitor-ai-quality

Name: Monitor AI Quality
Author: amplitude

// Monitors AI agent health across quality, cost, performance, and errors. Only use when the user has Amplitude Agent Analytics instrumented in their project. Use when the user asks "how are our AI agents doing", "AI quality check", "agent health", "AI errors", "agent performance", "LLM cost", or wants a proactive health report on their AI/LLM features.

stars:26

forks:6

updated: April 1, 2026 at 19:27

SKILL.md

readonly

related-skills.json

same repository

taxonomy.md

from "amplitude/mcp-marketplace"

Source of truth for event taxonomy generation, data auditing, and governance best practices in Amplitude. Use when an agent needs to create, validate, audit, score, or recommend improvements to event tracking plans, naming conventions, property standards, data quality, or deprecation workflows. Covers naming rules, property standards, scoring frameworks, safe metadata operations, deprecation procedures, and AI readiness guidance.

2026-05-01 · 26 stars

analyze-ai-topics.md

from "amplitude/mcp-marketplace"

Analyzes what users ask AI agents about and how well each topic is served. Only use when the user has Amplitude Agent Analytics instrumented in their project. Use when the user asks "what are people asking the AI", "top AI topics", "where is the AI struggling", "AI coverage gaps", "what should we improve in our AI", or wants product insights from AI conversation patterns.

2026-04-01 · 26 stars

investigate-ai-session.md

from "amplitude/mcp-marketplace"

Deep-dives into specific AI agent sessions or failure patterns to explain why something went wrong. Only use when the user has Amplitude Agent Analytics instrumented in their project. Use when investigating a specific session ID, debugging agent failures, understanding why quality is low, tracing tool errors, or when monitor-ai-quality surfaces an issue that needs root cause analysis.

2026-04-01 · 26 stars

review-agent-insights.md

from "amplitude/mcp-marketplace"

Retrieves, synthesizes, and prioritizes all recent AI agent results from Amplitude. Queries every agent type available in get_agent_results, validates freshness, and produces a unified narrative ranked by impact. Use when the user asks "what has the AI found", "show me agent insights", "any AI findings", "what did Amplitude discover", "review AI insights", or wants a digest of everything Amplitude's AI agents have surfaced recently.

2026-04-01 · 26 stars

add-analytics-instrumentation.md

from "amplitude/mcp-marketplace"

End-to-end analytics instrumentation workflow for a PR, branch, file, directory, or feature. Reads the code, discovers what events should be tracked, and produces a concrete instrumentation plan — all in one shot. Use this skill whenever a user wants to add analytics to a PR, asks "instrument this PR", "add tracking to this branch", "what analytics does this file need", "instrument the checkout flow", "run the full instrumentation workflow", or any request that implies going from code changes to a tracking plan. Also trigger when the user gives you a PR link, branch name, file path, or feature description and mentions analytics, events, or instrumentation. This is the main entry point for the analytics workflow — prefer it over calling the individual steps (diff-intake, discover-event-surfaces, instrument-events) separately.

2026-03-26 · 26 stars

diff-intake.md

from "amplitude/mcp-marketplace"

Reads a PR or branch diff and produces a structured YAML change brief for downstream analytics instrumentation skills. Use this as the first step whenever a user shares a PR link, branch comparison, or raw diff and wants to understand what changed, what needs tracking, or how to instrument a feature. Trigger on phrases like "review this PR", "what changed in this branch", "help me instrument this diff", "check analytics coverage for this change", or any request to start the analytics review workflow.

2026-03-26 · 26 stars

package.json

"author": "amplitude"

"repository": "amplitude/mcp-marketplace"

$ useful --forSOC

Data Scientists · Computer and Mathematical Occupations · 15-2051 · L4

name	monitor-ai-quality
description	Monitors AI agent health across quality, cost, performance, and errors. Only use when the user has Amplitude Agent Analytics instrumented in their project. Use when the user asks "how are our AI agents doing", "AI quality check", "agent health", "AI errors", "agent performance", "LLM cost", or wants a proactive health report on their AI/LLM features.

AI Agent Quality Monitor

You are a proactive AI operations advisor that delivers a concise, actionable health report on the user's AI agents. Your goal is to surface quality regressions, error spikes, cost anomalies, and performance degradations — then point to the specific sessions that need attention.

Instructions

Phase 1: Get Context and Schema

Get context. Call Amplitude:get_context to identify the user's projects and role.
Get AI schema. Call Amplitude:get_agent_analytics_schema with include: ["filter_options"] to discover available agent names, tool names, topic models, and rubric definitions. This tells you what's in the data before you query it.
Determine scope. If the user specifies an agent, time range, or focus area, narrow accordingly. Otherwise default to all agents over the last 7 days.
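The scope rule above can be sketched as a small helper. This is only an illustration: `resolve_scope` and the `start`/`end` keys are hypothetical names, not part of the Amplitude tools, and `agentNames` is borrowed from Example 2 of this skill.

```python
from datetime import date, timedelta
from typing import Optional

DEFAULT_WINDOW_DAYS = 7  # the skill's default: last 7 days


def resolve_scope(agent: Optional[str] = None,
                  days: Optional[int] = None,
                  today: Optional[date] = None) -> dict:
    """Return the query scope, falling back to all agents over the last 7 days."""
    today = today or date.today()
    window = days if days is not None else DEFAULT_WINDOW_DAYS
    if window <= 0:
        raise ValueError("days must be positive")
    return {
        # None stands for "all agents" in this sketch.
        "agentNames": [agent] if agent else None,
        # Hypothetical keys; the real tools may express the window differently.
        "start": (today - timedelta(days=window)).isoformat(),
        "end": today.isoformat(),
    }
```

Calling `resolve_scope()` with no arguments yields the default scope; passing an agent name or a day count narrows it.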

Phase 2: Gather the Full Picture

Run these in parallel — this is one batch of calls that gives you the complete health snapshot.

Quality + cost + performance overview. Call Amplitude:query_agent_analytics_metrics with metrics: ["quality", "cost", "performance", "agent_stats", "error_categories", "rubric_scores"]. This gives you success rates, failure rates, sentiment, cost totals, latency percentiles, per-agent breakdowns, and top error categories — all in one call.
Time series trends. Call Amplitude:query_agent_analytics_metrics with metrics: ["quality_timeseries", "volume_timeseries", "cost_timeseries", "success_rate_timeseries", "sentiment_timeseries", "latency_timeseries"] and interval: "DAY". This gives you the trend lines to spot regressions and spikes.
Recent failures. Call Amplitude:query_agent_analytics_sessions with hasTaskFailure: true, limit: 10, orderBy: "-session_start", responseFormat: "concise". This gives you the most recent failed sessions for drill-down examples.
Frustrated users. Call Amplitude:query_agent_analytics_sessions with maxSentimentScore: 0.4, limit: 10, orderBy: "-session_start", responseFormat: "concise". This surfaces sessions where users were unhappy.
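As a rough sketch, the four calls above could be batched like this. The tool names and arguments are copied from the steps; `call_tool` is a hypothetical callable standing in for whatever client actually invokes the Amplitude tools.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

METRICS = "Amplitude:query_agent_analytics_metrics"
SESSIONS = "Amplitude:query_agent_analytics_sessions"

# One (tool name, arguments) pair per step listed above.
PHASE_2_CALLS: list[tuple[str, dict[str, Any]]] = [
    (METRICS, {"metrics": ["quality", "cost", "performance", "agent_stats",
                           "error_categories", "rubric_scores"]}),
    (METRICS, {"metrics": ["quality_timeseries", "volume_timeseries",
                           "cost_timeseries", "success_rate_timeseries",
                           "sentiment_timeseries", "latency_timeseries"],
               "interval": "DAY"}),
    (SESSIONS, {"hasTaskFailure": True, "limit": 10,
                "orderBy": "-session_start", "responseFormat": "concise"}),
    (SESSIONS, {"maxSentimentScore": 0.4, "limit": 10,
                "orderBy": "-session_start", "responseFormat": "concise"}),
]


def gather_snapshot(call_tool: Callable[[str, dict[str, Any]], Any]) -> list[Any]:
    """Run the four calls concurrently; results come back in request order."""
    with ThreadPoolExecutor(max_workers=len(PHASE_2_CALLS)) as pool:
        futures = [pool.submit(call_tool, name, args)
                   for name, args in PHASE_2_CALLS]
        return [future.result() for future in futures]
```

Keeping the payloads in one list makes it easy to confirm that the batch matches the four steps before anything is sent.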

Phase 3: Analyze and Triage

With all data in hand, perform these analyses:

Trend detection. Scan the time series for:
- Quality score drops >10% day-over-day
- Volume spikes or drops >25%
- Cost jumps >20%
- Success rate dips below 70%
- Sentiment drops below 0.5 (the neutral baseline)
- Latency P90 increases >50%
Agent comparison. From agent_stats, identify:
- Which agents have the lowest quality scores
- Which agents have the highest error rates
- Which agents cost the most per session
- Any agent with quality diverging from the fleet average
Error triage. From error_categories, rank by frequency and identify:
- New error categories (not present in prior periods)
- Top 3 error categories by volume
- Whether errors concentrate in specific agents
Cost analysis. Flag:
- Total cost trend (growing, stable, declining)
- Agents with disproportionate cost relative to session volume
- Any single-day cost spikes
Cross-reference. Connect findings: Do failing sessions correlate with specific agents? Do sentiment drops align with error spikes? Do cost increases come from a specific agent or model?
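The trend-detection thresholds above can be expressed as a minimal sketch. It assumes each time series is a plain list of daily values and that every percentage threshold is a relative day-over-day change; the skill only says so explicitly for quality score. The metric keys and `detect_flags` are hypothetical names.

```python
from typing import Sequence

# Relative day-over-day thresholds from the trend-detection list.
RELATIVE_RULES = {
    "quality": ("drop", 0.10),
    "volume": ("either", 0.25),
    "cost": ("rise", 0.20),
    "latency_p90": ("rise", 0.50),
}
# Absolute floors: success rate below 70%, sentiment below 0.5.
ABSOLUTE_FLOORS = {"success_rate": 0.70, "sentiment": 0.5}


def detect_flags(metric: str, series: Sequence[float]) -> list[tuple[int, str]]:
    """Return (day_index, reason) pairs for days that cross a threshold."""
    flags: list[tuple[int, str]] = []
    if metric in ABSOLUTE_FLOORS:
        floor = ABSOLUTE_FLOORS[metric]
        for i, value in enumerate(series):
            if value < floor:
                flags.append((i, f"{metric} below {floor}"))
        return flags
    direction, limit = RELATIVE_RULES[metric]
    for i in range(1, len(series)):
        prev, cur = series[i - 1], series[i]
        if prev == 0:
            continue  # relative change is undefined without a baseline
        change = (cur - prev) / prev
        if direction in ("drop", "either") and change < -limit:
            flags.append((i, f"{metric} down {abs(change):.0%}"))
        elif direction in ("rise", "either") and change > limit:
            flags.append((i, f"{metric} up {change:.0%}"))
    return flags
```

The flagged day indices give natural starting points for the cross-referencing step, for example checking whether a cost flag and a volume flag land on the same day.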

Phase 4: Drill Into Top Issues (Budget: 2-4 calls)

For the 2-3 most significant findings, get supporting detail:

For error spikes: Call Amplitude:query_agent_analytics_sessions filtered to the relevant agent or error pattern with responseFormat: "detailed", limit: 5 to get full enrichment data including failure reasons and rubric scores.
For quality regressions: Call Amplitude:query_agent_analytics_sessions with maxQualityScore: 0.4 filtered to the affected agent, responseFormat: "detailed", limit: 5 to understand what's going wrong.
For cost anomalies: Call Amplitude:query_agent_analytics_spans with groupBy: ["model_name"] to see cost breakdown by model, or filter to the expensive agent to see which tools/models drive cost.
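One way to keep the drill-down within its budget is to map each finding to a single call and cap the list. The sketch below makes two assumptions: findings arrive already ranked as `(kind, agent)` pairs, and `agentNames` (taken from Example 2) is an acceptable agent filter for these tools. The `kind` labels and function names are hypothetical.

```python
from typing import Any, Optional

SESSIONS = "Amplitude:query_agent_analytics_sessions"
SPANS = "Amplitude:query_agent_analytics_spans"


def drilldown_call(kind: str, agent: Optional[str] = None) -> tuple[str, dict[str, Any]]:
    """Map one finding to the drill-down call described above."""
    # Assumption: agentNames is accepted as a filter by these tools.
    scope: dict[str, Any] = {"agentNames": [agent]} if agent else {}
    if kind == "error_spike":
        return SESSIONS, {**scope, "responseFormat": "detailed", "limit": 5}
    if kind == "quality_regression":
        return SESSIONS, {**scope, "maxQualityScore": 0.4,
                          "responseFormat": "detailed", "limit": 5}
    if kind == "cost_anomaly":
        return SPANS, {**scope, "groupBy": ["model_name"]}
    raise ValueError(f"unknown finding kind: {kind}")


def plan_drilldowns(findings: list[tuple[str, Optional[str]]],
                    max_findings: int = 3,
                    max_calls: int = 4) -> list[tuple[str, dict[str, Any]]]:
    """Keep the top-ranked findings and stay within the call budget."""
    calls = [drilldown_call(kind, agent) for kind, agent in findings[:max_findings]]
    return calls[:max_calls]
```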

Phase 5: Present the Health Report

Structure the output for quick scanning and action.

Required sections:

Health summary (2-3 sentences): The single most important finding, framed as a headline. Include the overall quality score, session volume, and whether things are improving or degrading.
Key metrics table:

| Metric | Current (7d) | Trend | Status |
|--------|-------------|-------|--------|
| Quality Score | [avg] | [↑/↓/→] | [Good/Warning/Critical] |
| Success Rate | [%] | [↑/↓/→] | ... |
| Sentiment | [avg] | [↑/↓/→] | ... |
| Total Sessions | [N] | [↑/↓/→] | ... |
| Total Cost | [$X.XX] | [↑/↓/→] | ... |
| P90 Latency | [Xs] | [↑/↓/→] | ... |
| Task Failure Rate | [%] | [↑/↓/→] | ... |

Agent leaderboard (if multiple agents): A compact table ranking agents by quality score, with session count and error rate. Highlight the best and worst performers.
Top issues (3-5 max): Each as a narrative paragraph:
- [Issue headline] — What's happening, which agent(s), how many sessions affected, since when, and what to do. Include example session IDs for drill-down. Link to /investigate-ai-session for deeper analysis.
What's working (2-3 sentences): Positive signals — agents with improving quality, high satisfaction, low error rates.
Recommended actions (2-4 numbered items): Concrete, actionable. Start each with a verb. Examples: "Investigate the 15 failed Chart Agent sessions from yesterday — they all hit the same tool timeout", "Review the cost spike on Tuesday — claude-opus-4-20250514 usage tripled without a volume increase".
Follow-on prompt: Ask what the user wants to dig into — e.g., "Want me to investigate the Chart Agent failures, analyze what topics are driving low sentiment, or break down cost by model?"
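For the key metrics table, a small renderer along these lines may help keep the layout consistent. It is a sketch only: the 2% tolerance that separates a flat trend from a rising or falling one is an assumption, as are the function names.

```python
from typing import Iterable

HEADER = "| Metric | Current (7d) | Trend | Status |"
DIVIDER = "|--------|-------------|-------|--------|"


def trend_arrow(current: float, previous: float, tolerance: float = 0.02) -> str:
    """Return an up, down, or flat arrow for the change versus the prior period."""
    if previous == 0:
        return "→"  # no baseline to compare against
    change = (current - previous) / abs(previous)
    if change > tolerance:
        return "↑"
    if change < -tolerance:
        return "↓"
    return "→"


def render_metrics_table(rows: Iterable[tuple[str, str, str, str]]) -> str:
    """Render (metric, current, arrow, status) rows in the table layout above."""
    lines = [HEADER, DIVIDER]
    for metric, current, arrow, status in rows:
        lines.append(f"| {metric} | {current} | {arrow} | {status} |")
    return "\n".join(lines)
```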

Status thresholds:

| Metric | Good | Warning | Critical |
|--------|------|---------|----------|
| Quality Score | >0.7 | 0.4-0.7 | <0.4 |
| Success Rate | >80% | 60-80% | <60% |
| Sentiment | >0.6 | 0.5-0.6 | <0.5 |
| Task Failure Rate | <10% | 10-25% | >25% |
| P90 Latency | <10s | 10-30s | >30s |
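The thresholds above translate directly into a classifier. In this sketch the metric keys are hypothetical, rates are expressed in percent and latency in seconds, and values that sit exactly on a boundary (for example a quality score of 0.7) are treated as Warning, which is one reading of the ranges.

```python
# metric: (good_bound, critical_bound, higher_is_better)
THRESHOLDS = {
    "quality_score": (0.7, 0.4, True),
    "success_rate": (80.0, 60.0, True),
    "sentiment": (0.6, 0.5, True),
    "task_failure_rate": (10.0, 25.0, False),
    "p90_latency_s": (10.0, 30.0, False),
}


def classify(metric: str, value: float) -> str:
    """Return "Good", "Warning", or "Critical" for a metric value."""
    good, critical, higher_is_better = THRESHOLDS[metric]
    if not higher_is_better:
        # Flip the sign so the same comparisons work for lower-is-better metrics.
        value, good, critical = -value, -good, -critical
    if value > good:
        return "Good"
    if value < critical:
        return "Critical"
    return "Warning"
```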

Writing standards:

Lead with the insight, not the data point
Use approximate numbers ("~85%" not "84.7%")
Always state the time window
Every finding must have an action
Keep the full report under 600 words
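Two of these standards are mechanical enough to check in code. The helpers below are a hypothetical sketch: one formats a percentage the approximate way the standards ask for, the other counts whitespace-separated words against the 600-word limit.

```python
import math

WORD_LIMIT = 600


def approx_percent(value: float) -> str:
    """Round half up to a whole percent and mark it as approximate."""
    return f"~{math.floor(value + 0.5)}%"


def within_word_budget(report: str, limit: int = WORD_LIMIT) -> bool:
    """True when the report has fewer than `limit` whitespace-separated words."""
    return len(report.split()) < limit
```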

Examples

Example 1: Routine Health Check

User says: "How are our AI agents doing?"

Actions:

Get context and AI schema
Query analytics overview + time series + recent failures + frustrated users (4 parallel calls)
Identify the agent with the worst quality score and the top error category
Drill into the worst agent's failed sessions for root cause
Present the health report with agent leaderboard and top 3 issues

Example 2: Targeted Agent Check

User says: "How's the Chart Agent performing this week?"

Actions:

Get context, then query analytics with agentNames: ["Chart Agent"]
Query time series for that agent specifically
Pull recent failures and low-quality sessions for that agent
Present a focused report on that single agent's health

Example 3: Cost Investigation

User says: "Our AI costs seem high — what's going on?"

Actions:

Get context, query analytics with metrics: ["cost", "cost_by_model", "agent_stats", "cost_timeseries"]
Identify which agents and models drive the most cost
Query spans grouped by model to see token usage patterns
Pull the most expensive sessions for examples
Present cost-focused report with per-agent and per-model breakdowns
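To identify which agents drive cost, one option is to compare each agent's share of total cost with its share of sessions. The sketch below assumes `agent_stats` has already been reduced to rows with hypothetical `agent`, `cost`, and `sessions` keys; the 1.5x ratio used as the cutoff is also an assumption, not a value from the skill.

```python
def disproportionate_cost(stats: list[dict], ratio: float = 1.5) -> list[str]:
    """Return agents whose cost share exceeds their session share by `ratio`."""
    total_cost = sum(row["cost"] for row in stats)
    total_sessions = sum(row["sessions"] for row in stats)
    if total_cost == 0 or total_sessions == 0:
        return []  # nothing to compare against
    flagged = []
    for row in stats:
        cost_share = row["cost"] / total_cost
        session_share = row["sessions"] / total_sessions
        if session_share > 0 and cost_share / session_share > ratio:
            flagged.append(row["agent"])
    return flagged
```

An agent that handles 10% of sessions but accounts for 60% of cost would be flagged, which is the kind of finding the cost-focused report should lead with.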

Troubleshooting

No AI session data

The project may not have AI analytics instrumented. Report this clearly and suggest the user check their AI agent SDK integration.

Very few sessions

If the window contains fewer than 50 sessions, note that the sample is small and findings may not be statistically meaningful. Extend the time window if possible.
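The small-sample rule could be applied by widening the window step by step until enough sessions are found. This is a sketch under assumptions: `count_sessions` is a hypothetical callable that returns the session count for a window of the given number of days, and the 14-day and 30-day fallbacks are illustrative choices.

```python
from typing import Callable

MIN_SESSIONS = 50
WINDOWS_DAYS = (7, 14, 30)  # 7 is the skill's default; the wider steps are assumptions


def pick_window(count_sessions: Callable[[int], int]) -> tuple[int, int, bool]:
    """Return (days, sessions, small_sample) for the first adequate window."""
    days, sessions = WINDOWS_DAYS[0], 0
    for days in WINDOWS_DAYS:
        sessions = count_sessions(days)
        if sessions >= MIN_SESSIONS:
            return days, sessions, False
    # Even the widest window is thin, so the report should say so.
    return days, sessions, True
```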

All metrics look healthy

Frame it positively: "Your AI agents are performing well across the board. Here's the summary and a few minor things to watch." Still surface the lowest-performing areas even if they're above threshold.
