
apm-service-dependencies

Name: APM Service Dependencies
Author: elastic

Map the application topology from APM telemetry: which services call which, over what protocols, with what call volume and latency. Use when the user asks "what calls X", "what depends on X", "show me the topology", "what are the upstream/downstream services", "where does this service fit", or is doing root-cause investigation and needs to trace how a problem propagates through the call graph. Also trigger for "service map", "dependency graph", "blast radius of service X", or "who's the dependency of Y". Requires Elastic APM; do not trigger for log-only or metrics-only customers.


stars:7

forks:6

updated:2026-05-01 08:58

SKILL.md


APM Service Dependencies

This tool answers "how is my application wired together?" It returns the APM dependency graph — a set of directed edges from caller services to callee services, with protocol (http, grpc, dns, etc.), call volume, and per-service health (span count, latency, errors) when requested.

Prerequisites

Elastic APM with OTel-instrumented services producing span.destination.service.resource values.
Optional: Kubernetes metadata on the spans (for namespace filtering).

If the user is log-only or metrics-only (no APM), this tool won't work. Do not reach for it.

Tools

Tool	Purpose
`apm-service-dependencies`	Fetch the dependency graph (full or focal).
`apm-health-summary`	Prerequisite view: which services are degraded? Then map their neighborhood with this skill.
`ml-anomalies`	Drill into anomalies affecting a service discovered in the graph.
`k8s-blast-radius`	Assess node-level impact when a service is K8s-deployed and a node is implicated.

How to call apm-service-dependencies

Focal mode (most common)

Use when you know which service is the investigation target. Returns only that service's direct upstream and downstream neighbors — much easier to reason about than the full graph.

{
  "service": "checkout",
  "lookback": "1h",
  "include_health": true
}

Full-graph mode

Use sparingly — only when the user explicitly wants the whole topology, or during initial environment discovery.

{
  "lookback": "1h",
  "include_health": false
}

Parameter-filling guidance:

service: the exact OTel service.name as deployed, typically lowercase and hyphenated for multi-word services (frontend, checkout, product-catalog). Do not build a name by joining the user's words: if the user says "checkout service", pass checkout, not checkoutservice. If the name is ambiguous, ask the user to confirm before calling. If the tool returns no edges for the named focal service, confirm the name with the user before fuzzy-matching.
namespace: only if the user scopes to a K8s namespace AND services are K8s-deployed.
lookback: default 1h for any unqualified prompt. Don't drop to 15m unless the user explicitly says something time-localized ("right now / this second"). Use 24h to smooth transient topology changes; use the user's literal window when they give one ("past 30 minutes" → 30m).
include_health: default true. Set false for a topology-only response when you don't need latency/error data.
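
The parameter guidance above can be sketched as a small helper that assembles the tool arguments. This is a minimal illustration, not part of the skill: the function name build_dependency_args is hypothetical, and only the parameter names (service, namespace, lookback, include_health) and their defaults come from the guidance above.

```python
from typing import Optional


def build_dependency_args(
    service: Optional[str] = None,
    namespace: Optional[str] = None,
    lookback: str = "1h",
    include_health: bool = True,
) -> dict:
    """Assemble arguments for apm-service-dependencies (hypothetical helper)."""
    args: dict = {"lookback": lookback, "include_health": include_health}
    if service is not None:
        # Focal mode: pass service.name exactly as deployed, unmodified.
        args["service"] = service
    if namespace is not None:
        # Only set when the user scopes to a K8s namespace.
        args["namespace"] = namespace
    return args


# Focal mode with the documented defaults.
focal = build_dependency_args(service="checkout")
# Full-graph, topology-only request: no service, health stats disabled.
full_graph = build_dependency_args(include_health=False)
```

Leaving service unset is what selects full-graph mode here, which mirrors the two JSON payloads shown earlier.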

After the tool returns

Response shape:

services: list of nodes with optional language/deployment/namespace metadata and health stats.
edges: directed edges with source, target, protocol, port, call_count, avg_latency_us.
focal_service + upstream + downstream (focal mode only).
service_count / edge_count.
data_coverage_note (only on focal mode when the focal service has inbound but zero outbound edges): flags a likely instrumentation gap — don't claim the service is a leaf; relay the note to the user as an advisory.

Ignore _setup_notice if present — it's view-side chrome (welcome banner) that the UI handles. Don't echo or summarize it in chat.
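
As a rough sketch of handling the response, the snippet below drops _setup_notice and derives upstream and downstream neighbors from the edge list. The helper names and the sample values are invented for illustration; only the field names (edges, source, target, protocol, port, call_count, avg_latency_us, focal_service) come from the response shape described above.

```python
from __future__ import annotations


def drop_setup_notice(response: dict) -> dict:
    """Return a copy of the response without the view-side _setup_notice."""
    return {k: v for k, v in response.items() if k != "_setup_notice"}


def split_neighbors(edges: list[dict], focal: str) -> tuple[list[str], list[str]]:
    """Derive (upstream, downstream) service names for the focal service."""
    # Upstream: services that call the focal service.
    upstream = sorted({e["source"] for e in edges if e["target"] == focal})
    # Downstream: services the focal service calls.
    downstream = sorted({e["target"] for e in edges if e["source"] == focal})
    return upstream, downstream


# Illustrative data only; every value below is made up for this example.
sample = {
    "_setup_notice": "welcome banner",
    "focal_service": "checkout",
    "edges": [
        {"source": "frontend", "target": "checkout", "protocol": "grpc",
         "port": 8080, "call_count": 1200, "avg_latency_us": 4500},
        {"source": "checkout", "target": "product-catalog", "protocol": "grpc",
         "port": 8080, "call_count": 3400, "avg_latency_us": 900},
    ],
}

cleaned = drop_setup_notice(sample)
upstream, downstream = split_neighbors(cleaned["edges"], cleaned["focal_service"])
```

In focal mode the tool already returns upstream and downstream, so deriving them from edges is mainly useful as a cross-check or for full-graph responses.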

Lead your narrative with:

Focal service: "checkout has 3 upstream callers and 5 downstream dependencies."
Upstream callers (who depends on this service): name them, note call volumes. An outage of the focal service cascades up to them.
Downstream dependencies (what this service relies on): name them. Problems in them cascade into the focal service.
Hot edges: highest call volume or latency — likely the load-bearing paths.
Follow-ups: suggest ml-anomalies on a specific neighbor if its health shows errors or elevated latency.
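
The narrative order above could be supported by two small helpers: one for the headline sentence and one for picking hot edges. This is a sketch under assumptions: the function names are hypothetical, the payment service and all numbers are invented, and ranking by call volume with latency as a tie-breaker is just one reasonable reading of "highest call volume or latency".

```python
def count_phrase(n: int, singular: str, plural: str) -> str:
    """Format a count with the matching singular or plural noun."""
    return f"{n} {singular if n == 1 else plural}"


def headline(focal: str, upstream: list, downstream: list) -> str:
    """Build the opening sentence for a focal-mode narrative."""
    callers = count_phrase(len(upstream), "upstream caller", "upstream callers")
    deps = count_phrase(
        len(downstream), "downstream dependency", "downstream dependencies"
    )
    return f"{focal} has {callers} and {deps}."


def hot_edges(edges: list, top_n: int = 3) -> list:
    """Rank edges by call volume, breaking ties by average latency."""
    ranked = sorted(
        edges,
        key=lambda e: (e["call_count"], e["avg_latency_us"]),
        reverse=True,
    )
    return ranked[:top_n]


# Illustrative edges only; the numbers are invented.
edges = [
    {"source": "frontend", "target": "checkout",
     "call_count": 1200, "avg_latency_us": 4500},
    {"source": "checkout", "target": "product-catalog",
     "call_count": 3400, "avg_latency_us": 900},
    {"source": "checkout", "target": "payment",
     "call_count": 1200, "avg_latency_us": 8000},
]

summary = headline("checkout", ["frontend"], ["payment", "product-catalog"])
top = hot_edges(edges, top_n=2)
```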

Key principles

Prefer focal mode. The full graph is hard to narrate; a focal subgraph is crisp.
Direction matters. Upstream = who calls me (blast radius goes up). Downstream = what I call (problems cascade in). Don't mix them up in explanations.
Protocols and ports are clues. A DNS edge tells you the callee is resolved by name (k8s service?). A high-port HTTP call to a specific target hints at a sidecar or proxy.
Empty or tiny graphs are a signal. If the focal service has zero edges, either the lookback is too narrow, the service isn't instrumented, or the name is wrong. Do not silently report "no dependencies."
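
The last principle can be made concrete with a guard that refuses to report "no dependencies" silently. A minimal sketch, assuming a hypothetical helper name; the three candidate causes are the ones listed above.

```python
from typing import Optional


def explain_empty_graph(focal: str, lookback: str, edge_count: int) -> Optional[str]:
    """Return an advisory when the focal service has zero edges, else None."""
    if edge_count > 0:
        return None
    return (
        f"No edges were returned for '{focal}' over the last {lookback}. "
        "Possible causes: the lookback is too narrow, the service is not "
        "instrumented, or the name is wrong."
    )


advisory = explain_empty_graph("checkout", "1h", 0)
no_advisory = explain_empty_graph("checkout", "1h", 4)
```

Returning None for a non-empty graph keeps the normal narrative path untouched; the advisory only appears when the result needs explaining.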

Investigation discipline

One tool call per turn. After this tool returns, narrate what the topology shows — root, leaves, edges with abnormal latency or error rates — before firing another tool. Each call adds a widget to the chat; piling 3-4 in a row after a single "yes" looks like a runaway agent.
Sequential offers, not OR. Don't ask "Want me to check pod health or ML anomalies?" — phrase as a chain: "Want me to start with X? If that doesn't explain it, I can follow up with Y." The user's "yes" then maps to one call, not several.
Commit to a plan before "yes." When a follow-up needs multiple tools (e.g. flagd pod resources → flagd traces → upstream caller anomalies), lay out the chain first and execute one step at a time, narrating between each. Don't pre-fire the chain because the user agreed in principle.
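
One way to picture "lay out the chain first and execute one step at a time" is a queue that releases a single step per user confirmation. This is purely illustrative: the InvestigationPlan class is hypothetical, and the step labels are the example chain from the text, not real tool names.

```python
from collections import deque
from typing import Optional


class InvestigationPlan:
    """Holds an ordered chain of follow-up steps (hypothetical helper)."""

    def __init__(self, steps: list) -> None:
        self._pending = deque(steps)

    def describe(self) -> str:
        """Lay out the remaining chain so the user sees it before agreeing."""
        return " -> ".join(self._pending)

    def next_step(self) -> Optional[str]:
        """Release exactly one step; call once per user confirmation."""
        return self._pending.popleft() if self._pending else None


plan = InvestigationPlan(
    ["flagd pod resources", "flagd traces", "upstream caller anomalies"]
)
outline = plan.describe()   # shown to the user up front
first = plan.next_step()    # run only this step after the user's "yes"
```

Narration of the result would happen between successive next_step calls, so a single "yes" never triggers more than one tool call.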

related-skills.json

Same repository

observe.md

from "elastic/example-mcp-app-observability"

The agent's Elastic-access primitive. Four modes: wait for an ML anomaly to fire, poll an ES|QL metric (live-sample or wait for a threshold), read a single-instance scalar value, or return a full ES|QL table. Use when the user says "tell me when...", "let me know if...", "wait until X drops below Y", "watch for anything unusual", "monitor for the next N minutes", "poll until stable", "what is X right now", "list …", "which … are …", or wants transient (session-scoped) monitoring or ad-hoc querying without creating a persistent Kibana rule. Also trigger for "keep an eye on" and post-remediation validation.

2026-05-07

mcp-app-dev-setup.md

from "elastic/example-mcp-app-observability"

Bootstrap or repair a development environment for the Elastic Observability MCP App with Forge as the data driver. Use when the user says "set up Forge for me", "get me ready to work on this MCP app", "run the validation suite", "I just cloned this repo, what now", or wants the dev environment refreshed after a long gap. Verifies sibling Forge clone, Python venv, cluster credentials, MCP harness, and runs a smoke test against the canonical validation suite.

2026-05-01

apm-health-summary.md

from "elastic/example-mcp-app-observability"

Get a cluster-level rollup of service health from APM telemetry — the "how's my environment right now?" entry point for observability investigations. Use whenever the user asks about HEALTH, STATUS, or general wellbeing of an environment / cluster / namespace ("how's my cluster", "status of the X env", "what's broken", "any issues", "show me the health of …", "give me a status report", "what should I look at", "things feel slow"). This applies regardless of any time qualifier — "show me the health of X over the past hour" still routes here (with lookback="1h"), NOT to observe. observe is for raw-metric queries; this tool is for the rollup. Gracefully degrades: layers in Kubernetes pod data and ML anomaly context when those backends are present, but still returns useful APM-only output if they aren't. Do not use for log-only or metrics-only customers — this tool requires Elastic APM.

2026-05-01

ml-anomalies.md

from "elastic/example-mcp-app-observability"

Query Elastic ML anomaly detection results to understand what's behaving unusually, why, and how badly. Use when the user asks "what's anomalous", "is anything unusual happening", "why is X slow/spiking", "show me the weirdness", or mentions memory growth, CPU spikes, restart patterns, unusual latency, unexpected error rates, or drift from typical behavior. Also trigger for "ML anomalies", "anomaly detection", "Elastic ML", "what does ML think", or when the user wants to understand behavior that deviates from baseline. The tool opens an inline explainer view with a severity gauge, plain-English narrative, and per-entity deviation breakdown — so the agent should USE the visualization, not just dump JSON.

2026-05-01

manage-alerts.md

from "elastic/example-mcp-app-observability"

CRUD for Kibana alerting rules — create, list, get, or delete custom-threshold rules. Use when the user says "alert me when", "create a rule for", "page me if", "set up an alert", "show me my rules", "what alerts do I have", "delete that alert", "remove the rule". Backend-agnostic — works on any metric field in any index pattern (metrics-*, logs-*, traces-apm*, custom). For transient session-scoped monitoring use `observe` instead. Requires Kibana with the Alerting feature enabled — the tool is auto-disabled when no Kibana URL is configured.

2026-04-30

k8s-blast-radius.md

from "elastic/example-mcp-app-observability"

Assess the impact of a Kubernetes node going offline — which deployments lose all replicas (full outage), which lose partial capacity (degraded), which are unaffected, and whether the cluster has enough spare capacity to reschedule the lost pods. Use when the user asks "what happens if node X goes down", "what's the blast radius of draining this node", "can I safely maintain node Y", "what's running on this node", "if I evict this node what breaks", or is planning node maintenance, a cluster upgrade, or investigating an actual node failure. Requires Kubernetes (kubeletstats metrics) and Elastic APM for downstream service impact — do not trigger for non-K8s deployments.

2026-04-30

package.json

"author": "elastic"

"repository": "elastic/example-mcp-app-observability"

