diagnose-entity-graph

Name: Diagnose Entity Graph
Author: grafana

// Diagnose Entity Graph problems: missing entities, missing edges, disconnected clusters, or filtering issues. Use when the user reports that Entity Graph doesn't look right, services are missing, edges aren't appearing, or environments can't be filtered. Triggers for: "entity graph is empty", "services missing from entity graph", "no edges in entity graph", "disconnected services", "can't filter entity graph", "entity graph not working", "diagnose entity graph", "debug knowledge graph".

stars:302

forks:21

updated: May 22, 2026 20:18

package.json

"author": "grafana"

"repository": "grafana/gcx"

Diagnose Entity Graph

Systematic diagnosis of Entity Graph problems using gcx commands. Follow the steps in order — each step narrows the cause. Be direct and report findings concisely.

Prerequisites

gcx must be installed (v0.2.14+) and configured with a valid context. If gcx kg diagnose is available (fork or future release), use it as a shortcut where noted. Otherwise, the individual commands below produce equivalent results.

gcx config view
gcx kg status

If kg status returns an error, use the setup-gcx skill first.

Step 1: Stack Health

gcx kg status

Check: status must be "complete" and enabled must be true. If not, the Knowledge Graph hasn't been onboarded — stop here and direct the user to the Asserts app onboarding flow.

Shortcut: gcx kg diagnose runs this plus all subsequent checks in parallel.
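
The Step 1 gate can be written as a small shell predicate. This is a minimal sketch: the function name `kg_onboarded` is hypothetical, and it takes the `status` and `enabled` values as arguments because the output format of `gcx kg status` is not specified here, so extracting the two fields is left to the reader.

```shell
# Hypothetical helper: succeeds only when the Knowledge Graph is onboarded.
# $1 = status value, $2 = enabled value, both read from `gcx kg status`.
kg_onboarded() {
  [ "$1" = "complete" ] && [ "$2" = "true" ]
}

# Example with hand-copied values; replace them with what your stack reports.
if kg_onboarded "complete" "true"; then
  echo "KG onboarded: continue to Step 2"
else
  echo "KG not onboarded: stop and use the Asserts app onboarding flow"
fi
```

Both conditions must hold, so a stack that reports `complete` with `enabled` set to `false` still stops at this step.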

Step 2: Entity Counts and Scopes

gcx kg health --since 1h
gcx kg meta scopes

Check: totalEntities should be > 0. The meta scopes output shows available env, site, and namespace values.

If scoping to a specific environment, note the exact env value — you'll use it in all subsequent queries.

Step 3: Source Metrics in Mimir

Check whether the raw telemetry that feeds Entity Graph exists. Raw Tempo metrics use deployment_environment, not asserts_env.

Note the label shape difference between the two metrics: traces_target_info describes a single service so it has one deployment_environment label; traces_service_graph_request_total describes an edge between two services and exposes the env on both sides as client_deployment_environment and server_deployment_environment — there is no unified deployment_environment label.

# Service identity (OTel traces)
gcx metrics query 'count(traces_target_info)' --since 1h
gcx metrics query 'count(traces_target_info{deployment_environment="ENV"})' --since 1h

# Call data (inter-service HTTP/gRPC)
gcx metrics query 'count(traces_service_graph_request_total)' --since 1h
# Filter on server side (use client_deployment_environment for outbound view):
gcx metrics query 'count(traces_service_graph_request_total{server_deployment_environment="ENV"})' --since 1h

Interpret:

Both have data → traces are flowing. Continue to Step 4.
Both empty → no OTel traces for this environment. Entities may still exist via Prometheus scraping. Continue to Step 4.

For more specific verdicts on this metric pair (Tempo metrics generation disabled, broken trace context propagation, service-name collision via self-loop edges), run gcx kg diagnose --env ENV and read the check results — the command encodes the detection logic and emits a targeted recommendation per case.
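
The label shape difference described in this step is easy to get wrong when typing queries by hand. The sketch below builds the `count(...)` expression with the matching environment label for each metric. The helper name `env_count_query` and its argument order are assumptions made for illustration; only the metric and label names come from the commands above.

```shell
# Hypothetical helper: prints a count() query with the right env label.
# $1 = metric name, $2 = environment value,
# $3 = side (server or client), used only for the service graph metric.
env_count_query() {
  case "$1" in
    traces_service_graph_request_total)
      # Edge metric: the env is exposed per side, never as a unified label.
      printf 'count(%s{%s_deployment_environment="%s"})\n' "$1" "${3:-server}" "$2"
      ;;
    *)
      # Single-service metrics such as traces_target_info.
      printf 'count(%s{deployment_environment="%s"})\n' "$1" "$2"
      ;;
  esac
}

env_count_query traces_target_info ENV
env_count_query traces_service_graph_request_total ENV server
env_count_query traces_service_graph_request_total ENV client
```

The printed expression can be passed to the same command form used above, for example `gcx metrics query "$(env_count_query traces_target_info ENV)" --since 1h`.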

Step 4: Recording Rules

Recording rules convert raw metrics into the asserts:* metrics that Entity Graph consumes. These use asserts_env, not deployment_environment.

# Entity discovery (central to how services appear)
gcx metrics query 'count(asserts:mixin_workload_job{asserts_env="ENV"})' --since 1h

# CALLS edges
gcx metrics query 'count(asserts:relation:calls{asserts_env="ENV"})' --since 1h

# Request rate KPI
gcx metrics query 'count(asserts:request:rate5m{asserts_env="ENV"})' --since 1h

Interpret:

asserts:mixin_workload_job has data but asserts:relation:calls doesn't → entities are discovered but no edges exist. Continue to Step 5.
All empty → recording rules aren't producing output. Check Step 6 (labels).
All have data → pipeline is healthy. For a specific missing service, go to Step 7.
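
The interpretation list above amounts to a small decision table over three counts. A minimal sketch follows, assuming you copy each count from the query results by hand and use 0 when a query returns no data. The function name `step4_next` and the `unclear` fallback are hypothetical additions, not part of gcx.

```shell
# Hypothetical helper: maps the three Step 4 counts to the next step.
# $1 = asserts:mixin_workload_job count
# $2 = asserts:relation:calls count
# $3 = asserts:request:rate5m count
step4_next() {
  if [ "$1" -eq 0 ] && [ "$2" -eq 0 ] && [ "$3" -eq 0 ]; then
    echo "step6"    # nothing produced: check the label pipeline
  elif [ "$1" -gt 0 ] && [ "$2" -eq 0 ]; then
    echo "step5"    # entities without edges: analyze edge sources
  elif [ "$1" -gt 0 ] && [ "$2" -gt 0 ] && [ "$3" -gt 0 ]; then
    echo "step7"    # pipeline healthy: investigate the specific service
  else
    echo "unclear"  # combination not covered by the list above
  fi
}

# Example counts are placeholders, not real results.
step4_next 42 0 17
```

Combinations the list does not mention, such as edges without discovered entities, fall through to `unclear` so they are investigated manually instead of being forced into one of the three verdicts.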

Step 5: Edge Source Analysis

CALLS edges can come from 11 sources, not just OTel traces:

| Source | Input Metric | Requires Traces? |
|---|---|---|
| `app_o11y_servicegraph` | `traces_service_graph_request_total` | Yes |
| `springboot` | `http_server_requests_seconds_count` | No |
| `nginx_ingress` | `nginx_ingress_controller_requests` | No |
| `istio` | `istio_requests_total` | No |
| `aws_rds` | CloudWatch RDS metrics | No |
| `aws_dynamodb` | CloudWatch DynamoDB metrics | No |
| `aws_s3` | CloudWatch S3 metrics | No |
| `aws_applicationelb` | CloudWatch ALB metrics | No |
| `azure_flexible_server` | Azure DB metrics | No |
| `kafka_exporter` | Kafka exporter metrics | No |
| `dbo11y_*` | Database observability metrics | No |

# What edge sources are active on this stack?
gcx metrics labels --label asserts_source

# Check common Prometheus-based sources for a namespace:
gcx metrics query 'count(http_server_requests_seconds_count{namespace="NS"})' --since 1h
gcx metrics query 'count(nginx_ingress_controller_requests{namespace="NS"})' --since 1h
gcx metrics query 'count(istio_requests_total{namespace="NS"})' --since 1h

Critical: Check for the asserts_env gap. If a source metric exists but has no asserts_env label, the recording rules silently drop it. This is the most common reason for "metrics present but no edges":

# For each source that returned data above, check if it has asserts_env:
gcx metrics query 'count(istio_requests_total{asserts_env!=""})' --since 1h
gcx metrics query 'count(http_server_requests_seconds_count{asserts_env!=""})' --since 1h
gcx metrics query 'count(nginx_ingress_controller_requests{asserts_env!=""})' --since 1h

If the metric exists but the asserts_env!="" query returns "No data", the Mimir relabeling rules don't cover this source. The fix is to add a relabeling rule that maps namespace or another label to asserts_env for this metric.
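
The gap check compares two counts per source metric: all series, and series that carry a non-empty `asserts_env`. The sketch below shows that comparison, assuming the counts are copied by hand from the query results, with 0 standing in for "No data". The helper names `edge_source_state` and `gap_query` and the state labels are hypothetical.

```shell
# Hypothetical helper: classifies one edge source from two counts.
# $1 = series count for the metric
# $2 = series count for the same metric with asserts_env!=""
edge_source_state() {
  if [ "$1" -eq 0 ]; then
    echo "absent"          # source metric does not exist
  elif [ "$2" -eq 0 ]; then
    echo "relabeling-gap"  # data exists but the recording rules drop it
  else
    echo "usable"          # at least some series carry asserts_env
  fi
}

# Prints the gap query for a metric, in the form used above.
gap_query() {
  printf 'count(%s{asserts_env!=""})\n' "$1"
}

for metric in istio_requests_total http_server_requests_seconds_count \
              nginx_ingress_controller_requests; do
  gap_query "$metric"
done

# Example counts are placeholders, not real results.
edge_source_state 120 0
```

A `usable` result only says that some series carry the label. A partial gap, where only certain namespaces lack `asserts_env`, still needs a per-namespace query.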

Interpret:

No edge sources for this environment → edges are expected to be missing. Services need tracing or one of the Prometheus-based sources above.
Edge source exists but missing asserts_env → relabeling gap. Recording rules require asserts_env!="" and will silently ignore this data.
Services discovered via JMX (job contains jmx) → JMX alone cannot produce edges. Spring Boot Actuator or OTel tracing is needed.

Shortcut: where it is available, gcx kg diagnose detects this gap automatically and warns when edge source metrics exist but lack asserts_env.

Most common fix: If metrics have deployment_environment but not asserts_env, the Asserts environment mapping is misconfigured. Go to Asserts app → Configuration → Connect Environment → Prometheus and set the environment label to deployment_environment. This tells the Mimir relabeling pipeline to derive asserts_env from deployment_environment on all incoming metrics — not just target_info.

If metrics lack both deployment_environment AND asserts_env: The scrape pipeline needs to add deployment_environment first. In Alloy, use prometheus.relabel to copy namespace (or another label) to deployment_environment before remote_write. Then configure the Connect Environment page as above.

Alternative path: Enable OTel tracing to get edges via traces_service_graph_request_total instead. Tempo generates this metric server-side with asserts_env already populated, bypassing the Mimir relabeling pipeline entirely.

Step 6: Label Pipeline

The most common issue: deployment_environment isn't mapped to asserts_env.

gcx metrics labels --label deployment_environment
gcx metrics labels --label asserts_env

Check: Every deployment_environment value should have a corresponding asserts_env value. If one is missing, the Mimir relabeling rules aren't configured for that environment.

Extra asserts_env values (like AWS account IDs) that don't match any deployment_environment are normal — they come from non-OTel sources.
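
The cross-reference is a one-way set difference: report every `deployment_environment` value with no counterpart in `asserts_env`, and ignore extras on the `asserts_env` side. The sketch below assumes a mapped environment keeps the same string in both labels, and that you paste the two value lists in by hand as space-separated words. The helper name `missing_asserts_env` and the sample values are hypothetical.

```shell
# Hypothetical helper: lists deployment_environment values that have no
# identical asserts_env value. Both arguments are space-separated lists,
# so values containing spaces are not supported.
# $1 = deployment_environment values, $2 = asserts_env values
missing_asserts_env() {
  for value in $1; do
    case " $2 " in
      *" $value "*) ;;              # mapped: nothing to report
      *) printf '%s\n' "$value" ;;  # unmapped: relabeling not configured
    esac
  done
}

# Sample values are placeholders; the account-ID-like extra is ignored.
missing_asserts_env "prod staging dev" "prod dev 123456789012"
```

With these sample lists the only value reported is `staging`, since `123456789012` appears only on the `asserts_env` side.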

Shortcut: gcx kg diagnose labels automates this cross-reference.

Step 7: Per-Service Investigation

For a specific missing or edge-less service:

# Find in graph
gcx kg cypher "MATCH (s:Service {name: \"SERVICE\"}) RETURN s" --since 1h

# Check relationships
gcx kg cypher "MATCH (s:Service {name: \"SERVICE\"})-[r]-(other) RETURN s, r, other" --since 1h

# Source metrics
gcx metrics query 'count(traces_service_graph_request_total{client="SERVICE"})' --since 1h
gcx metrics query 'count(traces_service_graph_request_total{server="SERVICE"})' --since 1h

# Recording rule output
gcx metrics query 'count(asserts:relation:calls{service="SERVICE"})' --since 1h
gcx metrics query 'count(asserts:mixin_workload_job{service="SERVICE"})' --since 1h

Interpret:

Found via Cypher but no relationships → check source metrics above.
Series with server="SERVICE" exist in traces_service_graph_request_total but asserts:relation:calls doesn't → recording rule label mismatch (check asserts_env and namespace).
Not found via Cypher → check traces_target_info{service_name="SERVICE"}.
Leaf services (queue consumers, processors) correctly have no outgoing edges.
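
The per-service interpretation can be sketched as a function over four counts. It assumes you copy each count from the outputs of the commands above, using 0 for an empty result. The function name `service_verdict`, its argument order, and the wording of the messages are hypothetical.

```shell
# Hypothetical helper: applies the Step 7 interpretation list.
# $1 = nodes returned by the first Cypher query
# $2 = relationships returned by the second Cypher query
# $3 = traces_service_graph_request_total series with server="SERVICE"
# $4 = asserts:relation:calls series for the service
service_verdict() {
  if [ "$1" -eq 0 ]; then
    echo "not in graph: check traces_target_info for this service_name"
  elif [ "$3" -gt 0 ] && [ "$4" -eq 0 ]; then
    echo "label mismatch: check asserts_env and namespace"
  elif [ "$2" -eq 0 ]; then
    echo "no relationships: check source metrics"
  else
    echo "service has relationships"
  fi
}

# Example counts are placeholders, not real results.
service_verdict 1 0 12 0
```

The label-mismatch branch is tested before the generic no-relationships branch because it is the more specific finding. A leaf service with only incoming edges lands in the last branch, which matches the note that missing outgoing edges are expected there.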

Shortcut: gcx kg diagnose service SERVICE --env ENV runs all checks and produces an interpreted diagnosis with suggested next steps. It also detects two common patterns that present as "missing entities":

Service-name collision (multiple workloads share one service.name, collapsing into one entity).
Env-scope split (workloads in the same namespace disagree on deployment.environment, so cross-env calls don't render as edges).

Read the diagnose check's Recommendation for the specific fix.

Producing a Report

Summarize findings as:

Stack health — KG enabled and complete?
Entity count — how many for the scoped environment?
Discovery path — OTel traces, Prometheus scrape, or cloud integration?
Trace data — do traces_target_info and traces_service_graph_request_total exist?
Edge data — does asserts:relation:calls exist? Which asserts_source values?
Alternative edge sources — Spring Boot, nginx, Istio, cloud integrations available?
Label mapping — deployment_environment correctly mapped to asserts_env?
Conclusion — expected state or configuration issue?
Recommendations — what would fix it?
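
The report sections above can be printed as a fixed skeleton so that no section is silently skipped. This is a minimal sketch: the function name `print_report`, the one-line-per-section layout, and the `not checked` filler are assumptions, and the findings are passed in by hand in the order listed above.

```shell
# Hypothetical helper: prints one line per report section, in order.
# Arguments are the findings, one per section; missing ones are flagged.
print_report() {
  for title in "Stack health" "Entity count" "Discovery path" "Trace data" \
               "Edge data" "Alternative edge sources" "Label mapping" \
               "Conclusion" "Recommendations"; do
    printf '%s: %s\n' "$title" "${1:-not checked}"
    # Advance to the next finding while any remain.
    if [ "$#" -gt 0 ]; then shift; fi
  done
}

# Placeholder findings for the first two sections only.
print_report "KG enabled and complete" "entities present for ENV"
```

Sections without a finding print as `not checked`, which makes a partial diagnosis visible in the report.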

When recommending a fix, set expectations on convergence time. The metrics the Knowledge Graph reads from (asserts:* recording rules, and the traces_* series Tempo generates) are time series queried over a lookback window, so data from the broken state keeps appearing in queries for roughly 5–15 minutes after the fix is applied. The Entity Graph UI should stabilize on the corrected state once that window has passed.
