
observability

Stars: 648 · Forks: 179 · Updated 23 June 2026, 02:01



Source: lablup/backend.ai


SKILL.md

name	observability
description	Observe Backend.AI logs, metrics, and traces during local development via the Grafana MCP (Loki logs, Prometheus metrics, Tempo traces, Pyroscope profiles). Use after restarting a service to verify behavior instead of tailing console output.
invoke_method	automatic
auto_execute	false
enabled	true
tags	["observability","grafana","logs","metrics","traces"]

Observability — Logs & Metrics During Development

The local halfstack ships a full observability stack. Services emit logs/traces over OTEL ([otel] enabled = true in manager.toml / agent.toml / storage-proxy.toml), so logs land in Loki and metrics in Prometheus automatically. Query them through the Grafana MCP (grafana server in .mcp.json) — this is the single source of truth for runtime logs and metrics during development. The MCP runs as a resident container (backendai-half-grafana-mcp, streamable-http at http://localhost:3001/mcp) inside the observability profile, so .mcp.json just points Claude at that URL.

Stack

Service	URL	Datasource UID	Purpose
Grafana	http://localhost:3000 (`backend` / `develove`)	—	UI / MCP target
Grafana MCP	http://localhost:3001/mcp	—	MCP server Claude connects to
Loki	http://localhost:3100	`loki_ds_001`	Logs
Prometheus	http://localhost:9090	`prom_ds_001`	Metrics
Tempo	http://localhost:3200	`tempo_ds_001`	Traces
Pyroscope	http://localhost:4040	`pyroscope_ds_001`	Profiles
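To check the stack from a script rather than by eye, you can hit each service's readiness endpoint. This is a minimal sketch: the paths (/api/health for Grafana, /ready for Loki, Tempo and Pyroscope, /-/ready for Prometheus) are the upstream projects' usual health endpoints, assumed rather than confirmed for the halfstack images. The Grafana MCP is left out because it speaks MCP rather than answering a plain health check.

```python
import urllib.request

# Upstream default readiness paths; assumed to be unchanged in the halfstack images.
HEALTH_ENDPOINTS = {
    "grafana": "http://localhost:3000/api/health",
    "loki": "http://localhost:3100/ready",
    "prometheus": "http://localhost:9090/-/ready",
    "tempo": "http://localhost:3200/ready",
    "pyroscope": "http://localhost:4040/ready",
}


def probe(url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if the endpoint answers HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError (e.g. 503 while starting) and socket errors
        return False


def check_stack(opener=urllib.request.urlopen):
    """Map each service name to its readiness result."""
    return {name: probe(url, opener=opener) for name, url in HEALTH_ENDPOINTS.items()}


if __name__ == "__main__":
    for name, ok in check_stack().items():
        print(f"{name:<11} {'up' if ok else 'DOWN'}")
```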

If these containers are not running (including the MCP), bring up the observability profile — see /halfstack.
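If you suspect only the MCP container is missing, docker ps can confirm it by name. The helper below is a sketch that shells out to the Docker CLI; the container name comes from this skill, while the idea that the other halfstack containers share the backendai-half- prefix is an assumption.

```python
import subprocess

MCP_CONTAINER = "backendai-half-grafana-mcp"  # name given in this skill


def running_containers(name_filter, run=subprocess.run):
    """Return names of running containers whose name contains `name_filter`."""
    result = run(
        ["docker", "ps", "--filter", f"name={name_filter}", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def mcp_is_running(run=subprocess.run):
    # The name= filter matches partial names, so compare the exact name afterwards.
    return MCP_CONTAINER in running_containers(MCP_CONTAINER, run=run)
```

Call mcp_is_running() before reaching for the MCP; running_containers("backendai-half-") lists everything that matches the assumed prefix.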

Log streams are labelled by service_name (manager, agent, …). Confirm what is flowing with the Loki label-values tool before querying.
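The MCP's label-values tool is the intended route. If you want the same answer straight from Loki (for example while debugging the MCP itself), Loki's HTTP API exposes it at /loki/api/v1/label/<name>/values. A minimal sketch, assuming Loki listens on the port from the Stack table:

```python
import json
import urllib.parse
import urllib.request

LOKI_URL = "http://localhost:3100"


def label_values_url(label, base=LOKI_URL, start_ns=None, end_ns=None):
    """Build the Loki HTTP API URL that lists values seen for `label`."""
    url = f"{base}/loki/api/v1/label/{urllib.parse.quote(label)}/values"
    params = {k: v for k, v in (("start", start_ns), ("end", end_ns)) if v is not None}
    return f"{url}?{urllib.parse.urlencode(params)}" if params else url


def parse_label_values(body):
    """Extract the sorted value list from a Loki label-values JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Loki returned status {payload.get('status')!r}")
    return sorted(payload.get("data", []))


def service_names(base=LOKI_URL):
    """Which service_name streams are currently flowing (manager, agent, ...)."""
    with urllib.request.urlopen(label_values_url("service_name", base), timeout=5) as resp:
        return parse_label_values(resp.read())
```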

Grafana MCP — Common Queries

Use the grafana MCP tools (e.g. list_datasources, query_loki_logs, query_prometheus, list_loki_label_values, search_dashboards). Verify exact tool names/params from the MCP itself; the queries below are what you pass.
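To see the real tool names and parameters, ask the MCP with a tools/list request and read each tool's inputSchema. The sketch below only builds MCP JSON-RPC messages and summarizes a tools/list result; it leaves out the Streamable HTTP transport details (the initialize handshake and session header). The argument names in EXAMPLE_CALL are hypothetical placeholders to be replaced with whatever tools/list reports.

```python
import itertools

_ids = itertools.count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request, the message format MCP uses."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def list_tools_request():
    return jsonrpc_request("tools/list")


def call_tool_request(name, arguments):
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


def summarize_tools(list_result):
    """Map tool name -> sorted required parameter names from a tools/list result."""
    summary = {}
    for tool in list_result.get("tools", []):
        schema = tool.get("inputSchema", {})
        summary[tool["name"]] = sorted(schema.get("required", []))
    return summary


# Hypothetical argument names: confirm them against summarize_tools() output first.
EXAMPLE_CALL = call_tool_request(
    "query_loki_logs",
    {"datasourceUid": "loki_ds_001", "logql": '{service_name="manager"} |= "error"'},
)
```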

Logs (Loki / LogQL) — datasource loki_ds_001:

{service_name="manager"}                       # recent manager logs
{service_name="manager"} |= "error"            # lines containing "error" (case-sensitive)
{service_name="agent"} | json | level="ERROR"  # structured (JSON) filter
{service_name="manager"} |= "<request-id>"     # trace one request end-to-end
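Outside the MCP, the same LogQL can go to Loki's /loki/api/v1/query_range endpoint. This sketch builds the request for a recent time window and flattens the streams response into timestamped lines, newest first; the 15-minute default window and 100-line limit are arbitrary choices, not values from this skill.

```python
import json
import time
import urllib.parse

LOKI_URL = "http://localhost:3100"


def query_range_url(logql, minutes=15, limit=100, base=LOKI_URL, now_ns=None):
    """Build a Loki /query_range URL covering the last `minutes` minutes."""
    end = now_ns if now_ns is not None else time.time_ns()
    start = end - minutes * 60 * 1_000_000_000  # Loki accepts nanosecond epochs
    params = {"query": logql, "start": start, "end": end, "limit": limit, "direction": "backward"}
    return f"{base}/loki/api/v1/query_range?{urllib.parse.urlencode(params)}"


def log_lines(body):
    """Flatten a streams response into (timestamp_ns, labels, line), newest first."""
    data = json.loads(body)["data"]
    out = []
    for stream in data.get("result", []):
        for ts, line in stream["values"]:
            out.append((int(ts), stream["stream"], line))
    return sorted(out, key=lambda entry: entry[0], reverse=True)
```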

Metrics (Prometheus / PromQL) — datasource prom_ds_001:

up                                                     # which targets are healthy
backendai_api_request_count                            # REST request counter
rate(backendai_api_request_count[1m])                  # request rate
backendai_api_request_duration_sec_bucket              # API latency histogram
backendai_graphql_request_count                        # GraphQL operation counter

Metric definitions live in src/ai/backend/common/metrics/metric.py (backendai_api_request_*, backendai_graphql_request_*); component-specific metrics under src/ai/backend/{manager,agent,storage}/metrics/.

Dashboards / traces / profiles: the pre-built dashboard is provisioned from grafana-dashboards/dashboard.json (discover via search_dashboards). Tempo (tempo_ds_001) holds distributed traces; Pyroscope (pyroscope_ds_001) holds profiles.
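Grafana's and Tempo's HTTP APIs cover the same ground as search_dashboards and a trace lookup. A sketch, assuming Grafana accepts basic auth with the credentials from the Stack table and that you already have a hex trace ID (for example from a log line):

```python
import base64
import urllib.parse
import urllib.request

GRAFANA_URL = "http://localhost:3000"
TEMPO_URL = "http://localhost:3200"


def grafana_request(path, params, user="backend", password="develove"):
    """Build an authenticated GET request against the Grafana HTTP API."""
    url = f"{GRAFANA_URL}{path}?{urllib.parse.urlencode(params)}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def dashboard_search_request(query=""):
    # type=dash-db restricts results to dashboards (folders are excluded).
    return grafana_request("/api/search", {"query": query, "type": "dash-db"})


def tempo_trace_url(trace_id):
    """Tempo's trace-by-ID endpoint; trace IDs are hex strings."""
    int(trace_id, 16)  # raise ValueError early for a malformed ID
    return f"{TEMPO_URL}/api/traces/{trace_id}"
```

Pass the request to urllib.request.urlopen and decode the JSON body to list dashboards.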

Standard Verification Loop

After a code change:

1. Restart the affected service: ./dev restart mgr (see /local-dev).
2. Exercise it, e.g. with a ./bai call (see /bai-cli).
3. Observe via the Grafana MCP: query Loki for that service_name to confirm the request ran without errors, and query Prometheus to confirm the expected metric moved.
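The loop above can be scripted with each step injected, so it works whether you query through MCP tools or the HTTP APIs. This is a sketch, not part of the repository: verify_change, VerifyResult and restart_manager are illustrative names, and the 15-second settle time is an assumed upper bound on Loki ingestion and Prometheus scrape delay. The settle before the first metric read matters because counters reset when a service restarts.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class VerifyResult:
    error_lines: int
    metric_before: float
    metric_after: float

    @property
    def ok(self) -> bool:
        # Healthy run: no new error lines and the counter went up.
        return self.error_lines == 0 and self.metric_after > self.metric_before


def verify_change(
    restart: Callable[[], None],
    exercise: Callable[[], None],
    count_errors_since: Callable[[int], int],
    read_metric: Callable[[], float],
    settle_seconds: float = 15.0,
) -> VerifyResult:
    """Restart, exercise, then observe, mirroring the three steps above."""
    restart()
    started_ns = time.time_ns()  # only count log lines emitted after the restart
    time.sleep(settle_seconds)  # let Prometheus scrape the restarted (reset) counter
    before = read_metric()
    exercise()
    time.sleep(settle_seconds)  # let logs and the new sample arrive
    return VerifyResult(count_errors_since(started_ns), before, read_metric())


def restart_manager() -> None:
    # Same command as step 1; assumes the working directory is the repo root.
    subprocess.run(["./dev", "restart", "mgr"], check=True)
```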

This replaces tailing console output — always confirm runtime behavior through the MCP.

Related Skills

/local-dev — Restart services before observing.
/bai-cli — Exercise the API, then verify logs/metrics here.
/halfstack — Bring up / inspect the observability containers.

More from this repository

api-guide

lablup/backend.ai

Guide for implementing REST and GraphQL APIs (create, get, search, update, delete, purge, scope prefix patterns, admin_ prefix, SearchScope, BaseFilterAdapter, @api_function, Click CLI)

Updated 2026-06-23 · Stars: 648

bai-cli

lablup/backend.ai

./bai v2 CLI usage for testing/verifying API endpoints and managing resources — entity-command reference, login/config, search patterns, testing workflow (REST API client, NOT the v1 backend.ai CLI)

Updated 2026-06-23 · Stars: 648

bep-guide

lablup/backend.ai

Guide for writing and managing BEPs (Backend.AI Enhancement Proposals) - creation workflow, document segmentation, context-for-ai blocks, Decision Log

Updated 2026-06-23 · Stars: 648

cli-sdk-guide

lablup/backend.ai

Guide for implementing Backend.AI client SDK and CLI (Session, BaseFunction, @api_function, Click commands, Pydantic models, FieldSpec, output handlers, APIConfig, testing)

Updated 2026-06-23 · Stars: 648

code-trace

lablup/backend.ai

Trace a feature across the layered architecture (REST v2, GraphQL, Service, Repository, DB, Errors) and explore the entity source tree — includes how to read the 21k-line supergraph.graphql without loading it whole

Updated 2026-06-23 · Stars: 648

db-migrate

lablup/backend.ai

Inspect and apply database schema migrations (alembic current/heads, upgrade, downgrade, diverged heads) for Backend.AI components (manager, accountmgr, appproxy)

Updated 2026-06-23 · Stars: 648
