baseline-monitoring
Index of baseline metric families for Razorpay services. Use it to determine which standard metrics must exist for HTTP, gRPC, workers, egress, outbox, Go runtime, and relevant infra components.
| Field | Value |
|---|---|
| name | baseline-monitoring |
| description | Index of baseline metric families for Razorpay services. Use it to determine which standard metrics must exist for HTTP, gRPC, workers, egress, outbox, Go runtime, and relevant infra components. |
Defines the standard metric families agents must consider when working on service observability.
This skill covers metrics only. Logging and tracing standards are separate and should be loaded from their own skills or references when needed.
Observability in this skill is divided into three buckets:

- Read references/baseline-metrics.md first.
- For a gRPC-only service, read references/grpc-metrics.md and skip references/http-metrics.md.
- Skip the references/infra/ files unless the change explicitly involves those components.
- When the repo owns a Go runtime, also read references/go-runtime-metrics.md.

Use this section as the default routing rule.

| If the service has this | Then make sure these metrics exist | Read |
|---|---|---|
| Any owned request or job-handling surface | Traffic, latency, outcome, health, and runtime baseline coverage | references/baseline-metrics.md |
| HTTP or REST ingress | HTTP request, status, latency, and error metric families | references/http-metrics.md |
| gRPC ingress | gRPC request, method, status, latency, and error metric families | references/grpc-metrics.md |
| Outbound or downstream calls | Egress traffic, latency, and error metric families | references/egress-metrics.md |
| Background workers or queue processors | Queue depth, processing rate, failure, and duration metric families | references/worker-metrics.md |
| Outbox flow | Outbox store, handler, backlog-age, and lifecycle-time metric families | references/outbox-metrics.md |
| Go runtime owned by the repo | Go process and runtime metric families | references/go-runtime-metrics.md |
| Traefik in front of the service | Traefik request-volume and latency metrics | references/infra/traefik-metrics.md |
| Edge or CDN layer relevant to the service | Edge latency, error, cache, geo, bandwidth, and certificate metrics | references/infra/edge-metrics.md |
| Uptime or health ownership is relevant | Uptime, health-check, and availability metrics | references/infra/uptime-metrics.md |
| Canary rollout analysis exists | Canary versus baseline comparison metrics | references/infra/canary-metrics.md |
| Kubernetes runtime is part of the service view | Pod, deployment, node, quota, and cluster-event metrics | references/infra/kubernetes-metrics.md |
| Kafka is part of the service topology | Kafka throughput, message-rate, and lag metrics | references/infra/kafka-metrics.md |
| ALB fronts the service | ALB request, latency, error, and healthy-host metrics | references/infra/aws-alb-metrics.md |
| Auto Scaling Groups are part of the service topology | ASG capacity, instance-health, and scaling activity metrics | references/infra/aws-asg-metrics.md |
| RDS is a dependency in the service path | RDS connection, capacity, latency, throughput, and replication metrics | references/infra/aws-rds-metrics.md |
| ElastiCache is a dependency in the service path | Cache usage, connection, CPU, eviction, and replication metrics | references/infra/aws-elasticache-metrics.md |
| SQS or SNS is part of the service flow | Queue backlog, queue-aging, publish, and delivery metrics | references/infra/aws-sqs-sns-metrics.md |
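
The routing table above can be read as a lookup from service traits to reference files. The following is a minimal sketch of that lookup, not part of the skill itself: the `Traits` struct, its field names, the `ReferencesFor` function, and the short infra keys (such as `aws-sqs-sns`) are hypothetical names introduced for illustration. It also assumes the baseline file always applies, since every service covered here owns some request or job-handling surface.

```go
package main

import "fmt"

// Traits is a hypothetical summary of what a service owns.
// Each field corresponds to one row of the routing table.
type Traits struct {
	HTTP      bool
	GRPC      bool
	Egress    bool
	Worker    bool
	Outbox    bool
	GoRuntime bool
	// Infra holds hypothetical short keys that mirror the file names
	// under references/infra/, for example "kafka" or "aws-rds".
	Infra []string
}

// ReferencesFor returns the reference files to read, baseline first.
// Assumption: references/baseline-metrics.md applies to every service.
func ReferencesFor(t Traits) []string {
	refs := []string{"references/baseline-metrics.md"}
	add := func(applies bool, file string) {
		if applies {
			refs = append(refs, "references/"+file)
		}
	}
	add(t.HTTP, "http-metrics.md")
	add(t.GRPC, "grpc-metrics.md")
	add(t.Egress, "egress-metrics.md")
	add(t.Worker, "worker-metrics.md")
	add(t.Outbox, "outbox-metrics.md")
	add(t.GoRuntime, "go-runtime-metrics.md")
	// Infra files are included only when explicitly requested.
	for _, component := range t.Infra {
		refs = append(refs, "references/infra/"+component+"-metrics.md")
	}
	return refs
}

func main() {
	// A worker-only service that uses SQS and an outbox.
	svc := Traits{Worker: true, Outbox: true, Infra: []string{"aws-sqs-sns"}}
	for _, ref := range ReferencesFor(svc) {
		fmt.Println(ref)
	}
}
```

Keeping the infra components as an explicit list, rather than boolean fields, reflects the rule that infra files are loaded only when a change involves those components.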
- references/baseline-metrics.md: compulsory baseline metric families and how to choose the additional files that apply.
- references/http-metrics.md: HTTP and REST ingress metrics.
- references/grpc-metrics.md: gRPC ingress metrics.
- references/egress-metrics.md: outbound dependency and external API metrics.
- references/worker-metrics.md: async worker and queue-processing metrics.
- references/outbox-metrics.md: outbox lifecycle metrics.
- references/go-runtime-metrics.md: Go process and runtime health metrics.
- references/infra/traefik-metrics.md: Traefik request and latency metrics.
- references/infra/edge-metrics.md: edge, CDN, and perimeter metrics.
- references/infra/uptime-metrics.md: uptime, SLA, and health-check metrics.
- references/infra/canary-metrics.md: canary-versus-baseline comparison metrics.
- references/infra/kubernetes-metrics.md: pod, deployment, node, quota, and cluster health metrics.
- references/infra/kafka-metrics.md: Kafka throughput and lag metrics.
- references/infra/aws-alb-metrics.md: ALB request, latency, error, and healthy-host metrics.
- references/infra/aws-asg-metrics.md: ASG capacity, health, and scaling metrics.
- references/infra/aws-rds-metrics.md: RDS capacity, performance, and error metrics.
- references/infra/aws-elasticache-metrics.md: ElastiCache usage and health metrics.
- references/infra/aws-sqs-sns-metrics.md: SNS publish and SQS queue backlog metrics.

New gRPC service:

- references/baseline-metrics.md
- references/grpc-metrics.md
- references/go-runtime-metrics.md

Existing HTTP API adding a new downstream call:

- references/baseline-metrics.md
- references/http-metrics.md
- references/egress-metrics.md
- references/go-runtime-metrics.md for Go runtime coverage

Worker-only service using SQS and outbox:

- references/baseline-metrics.md
- references/worker-metrics.md
- references/outbox-metrics.md
- references/infra/aws-sqs-sns-metrics.md

When working on a repo, the agent should reason in this order:
Use this skill as a checklist of required metric families.
At the end of the implementation, the agent must list, in the PR description, every metric it added.
For each metric, include: