alert-coverage-analyzer

Name: Alert Coverage Analyzer
Author: razorpay

Analyze alert coverage for Razorpay services by discovering multi-source monitoring (Prometheus, CloudWatch RDS, Performance Insights, Coralogix logs), scanning application metrics, and identifying missing business-critical metrics. Leverages repo skill Observability/Monitoring sections (when available) and verifies existing coverage with user before recommendations. Use when the user asks to analyze alert coverage, check for missing alerts, audit monitoring completeness, add metrics and alerts for a service, or improve observability. Only works with Razorpay repositories.

stars:30

forks:4

updated: May 5, 2026, 11:18

Alert Coverage Analyzer

Comprehensive workflow to analyze, identify, and add missing metrics and alerts for Razorpay services.

Prerequisites

Must be run in a Razorpay repository (verified via git remote)
Requires GitHub CLI (gh) for cloning repos and creating PRs
Strongly Recommended: Repo skill for the service with an Observability/Monitoring section (format: {service-name}-skill)
- Provides documented SLIs/SLOs to validate metric coverage
- Lists known monitoring gaps (pre-validated blind spots)
- Defines service-specific metric naming conventions
- Documents critical flows requiring monitoring
- Significantly improves analysis accuracy and completeness (can identify 2-3x more relevant gaps)
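The first prerequisite (a Razorpay repository, verified via git remote) could be checked roughly as follows. This is a minimal sketch, not the skill's actual verification script: it assumes that "Razorpay repository" means the origin remote points at the razorpay organisation on github.com, and the function names `is_razorpay_remote` and `origin_url` are hypothetical.

```python
import re
import subprocess

# Assumption: a Razorpay repository is one whose origin remote lives under
# github.com/razorpay. The skill's real verification script may differ.
_RAZORPAY_REMOTE = re.compile(
    r"^(?:https://github\.com/|git@github\.com:|ssh://git@github\.com/)"
    r"razorpay/[\w.-]+?(?:\.git)?/?$"
)


def is_razorpay_remote(url: str) -> bool:
    """Return True if a git remote URL belongs to the razorpay GitHub org."""
    return bool(_RAZORPAY_REMOTE.match(url.strip()))


def origin_url(repo_dir: str = ".") -> str:
    """Read the origin remote URL with `git remote get-url origin`."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "remote", "get-url", "origin"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

A caller would typically run `is_razorpay_remote(origin_url())` and stop the workflow early when it returns False.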

Workflow Overview

Discovery Phase:

Verify Razorpay repository
Identify service name (auto-detect or manual)
Select target regions - Ask user which regions to deploy alerts to (prod-rules, prod-us-rules, prod-sg-rules)
Check repo skill (MANDATORY - read observability docs first)
Setup GitHub CLI
Locate alert-rules repository
Multi-source discovery (across all selected regions):
- Application alerts (Prometheus)
- Infrastructure alerts (RDS, EKS, Lambda)
- ASK: Log-based alerts (Coralogix)
- ASK: Cloud monitoring (Performance Insights)
Scan repository for existing metrics
Verify understanding with user (prevent false positives)
Identify actual missing metrics

Implementation Phase (after user confirmation):

Add metrics to application code
Create alert rules (for all selected regions)
Push branches
Create PRs (or provide manual fallback)
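The region selection step can be pictured as a small mapping from the user's answer to the folders in the alert-rules repository. The folder names below come from this document; the answer keys ("india", "us", "singapore", "all") and the function name `resolve_regions` are hypothetical and only illustrate the idea.

```python
# Folder names are taken from the skill text; the keys are illustrative.
REGION_FOLDERS = {
    "india": "prod-rules",
    "us": "prod-us-rules",
    "singapore": "prod-sg-rules",
}


def resolve_regions(choice: str) -> list[str]:
    """Map a user's answer to the list of alert-rules folders to target."""
    normalized = choice.strip().lower()
    if normalized in ("all", "all regions"):
        return list(REGION_FOLDERS.values())
    selected = []
    # A custom selection is given as a comma-separated list of region keys.
    for part in normalized.split(","):
        key = part.strip()
        if key not in REGION_FOLDERS:
            raise ValueError(f"unknown region: {key!r}")
        folder = REGION_FOLDERS[key]
        if folder not in selected:
            selected.append(folder)
    return selected
```

Rejecting unknown answers instead of silently defaulting to India keeps the "always ask the user" rule visible in the code path.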

Examples

Basic Usage

User: "Analyze alert coverage for this service"

Workflow:

Verifies Razorpay repository and identifies service name
Asks user to select target regions (e.g., "All regions")
Checks for repo skill with observability documentation
Scans existing alerts across all regions (Prometheus + CloudWatch infrastructure)
Verifies coverage with user (histogram _count, Performance Insights, log alerts)
Identifies actual gaps (2-5 missing metrics typically)
Creates branches, adds metrics, creates alert rules for all selected regions, opens PRs

Typical Output:

✅ Target regions: prod-rules, prod-us-rules, prod-sg-rules
✅ Found 403 alerts in prod-rules, 58 in prod-us-rules, 85 in prod-sg-rules
✅ ~600 metrics, 5 RDS instances monitored
✅ Verified: Histogram _count, HandleHelper tracking, Performance Insights, Coralogix panics
⚠️  Found 2 actual gaps: External service latency, Broken tracing pipeline
✅ PRs created: /pull/1234 (metrics), /pull/5678 (alerts for all regions)
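Per-region counts like the ones in the output above could be produced by a scan along these lines. This is a rough sketch under stated assumptions: rule files are YAML files under each region folder, each rule starts with a `- alert:` line, and a file is attributed to a service when the service name appears anywhere in it. The function name `count_alerts` is hypothetical, and a real implementation would more likely parse the YAML and filter on the `service` label.

```python
from pathlib import Path


def count_alerts(rules_root: str, regions: list[str], service: str) -> dict[str, int]:
    """Count `- alert:` entries per region in rule files that mention the service."""
    counts = {}
    for region in regions:
        total = 0
        region_dir = Path(rules_root) / region
        # A missing folder just means no alerts exist for that region yet.
        files = sorted(region_dir.rglob("*.y*ml")) if region_dir.is_dir() else []
        for path in files:
            text = path.read_text(encoding="utf-8", errors="replace")
            if service not in text:
                continue
            # Crude line scan: every rule in a matching file is counted.
            total += sum(
                1 for line in text.splitlines()
                if line.strip().startswith("- alert:")
            )
        counts[region] = total
    return counts
```

A region that reports zero alerts for a service running there is a candidate gap to raise with the user, not yet a confirmed one.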

For detailed examples, see:

Detailed Examples - Complete workflow, multi-source verification, first-time setup, troubleshooting scenarios

Workflow

See Detailed Workflow Steps for complete implementation instructions.

Quick reference:

Verify Razorpay repository - Run verification script
Identify service name - Auto-detect or ask user
Select target regions - Ask which regions to deploy alerts to:
- All regions (prod-rules, prod-us-rules, prod-sg-rules)
- India only (prod-rules)
- US only (prod-us-rules)
- Singapore only (prod-sg-rules)
- Custom selection
Check repo skill (MANDATORY) - Read observability docs first to understand existing monitoring
Setup GitHub CLI - Install and authenticate if needed
Locate alert-rules repo - Ask user for path or clone to /tmp
Multi-source discovery (across all selected regions):
- Search application alerts (Prometheus) in prod-rules, prod-us-rules, prod-sg-rules
- Search infrastructure alerts (RDS, EKS, Lambda, ElastiCache)
- ASK user about log-based alerts (Coralogix)
- ASK user about cloud monitoring (Performance Insights)
- Display which regions have existing alerts
Scan repository for metrics - Find existing Prometheus metrics in code
Verify with user (CRITICAL) - Confirm understanding about histogram _count, HandleHelper, Performance Insights, log alerts
Identify actual gaps - Compare against repo skill SLIs/SLOs and business flows
Add metrics to code - Create branch, define metrics, instrument code, commit
Create alert rules - Create branch in alert-rules, add YAML rules for each selected region, commit
Push and create PRs - Push both branches, create PRs with gh CLI (PR title includes regions)
Handle failures - See Troubleshooting Guide if PR creation fails
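The "histogram _count" check in the verification step matters because a Prometheus histogram already exposes `_bucket`, `_sum`, and `_count` series, so a request-count metric may exist even though no counter is declared. The sketch below illustrates this for the metric scan. It assumes the service is written in Go and declares metrics through the Prometheus Go client's `New...` constructors with a literal `Name:` field; it ignores `Namespace` and `Subsystem` prefixes, and the function name `exposed_series` is hypothetical.

```python
import re

# Series suffixes a Prometheus histogram exposes for its base name.
HISTOGRAM_SUFFIXES = ("_bucket", "_sum", "_count")

# Assumption: metrics are declared with the Go client's Opts structs, e.g.
# prometheus.NewHistogramVec(prometheus.HistogramOpts{Name: "..."}, ...).
_DECLARATION = re.compile(
    r'New(Counter|Gauge|Histogram|Summary)(?:Vec)?\(.*?Name:\s*"([^"]+)"',
    re.DOTALL,
)


def exposed_series(go_source: str) -> set[str]:
    """Return the series names that the declared metrics expose when scraped."""
    series = set()
    for kind, name in _DECLARATION.findall(go_source):
        if kind == "Histogram":
            series.update(name + suffix for suffix in HISTOGRAM_SUFFIXES)
        elif kind == "Summary":
            # Summaries expose the base name (per quantile) plus _sum and _count.
            series.update({name, name + "_sum", name + "_count"})
        else:
            series.add(name)
    return series
```

Comparing alert expressions against this expanded set, rather than against the declared names alone, avoids reporting a missing request counter when a latency histogram already provides `_count`.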

References

Alert Patterns: See references/alert-patterns.md for common alert patterns and thresholds
Metric Examples: See references/metric-examples.md for business-critical metric patterns
Monitoring Sources: See references/monitoring-sources.md for comprehensive checklist of all monitoring sources to verify (Prometheus, CloudWatch, Performance Insights, Coralogix, etc.)

Notes

Multi-Source Monitoring: Most services use 3-5 monitoring tools (Prometheus, CloudWatch, Coralogix, Performance Insights). Always verify coverage across ALL sources before recommending gaps. See references/monitoring-sources.md for complete checklist.
Repo Skills: If a repo skill exists for the service, its Observability/Monitoring section is the most authoritative source for:
- What metrics should exist (documented SLIs/SLOs)
- Known monitoring blind spots (documented gaps)
- Service-specific metric naming conventions
- Critical flows that require monitoring
- Alert severity and escalation policies
Always start with the observability section when identifying gaps.
Metrics follow Prometheus naming conventions: <namespace>_<metric>_<unit>_<type>
Alert files are organized by environment and region:
- prod-rules/ - India (default production)
- prod-us-rules/ - US region
- prod-sg-rules/ - Singapore region
- nonprod-rules/ - Non-production environments
Regional deployment: Services running in multiple regions should have alerts in all regional folders. Always ask user which regions to target.
All alerts require: severity, bu, pod, service, slack_channel labels
All alerts require: identifier, description, Runbook, vajra_link annotations
Thresholds should be based on production traffic patterns, not arbitrary values
CRITICAL: Never use high cardinality labels (merchant_id, terminal_id, user_id, payment_id, etc.) - causes Prometheus memory exhaustion
CRITICAL: External service alerts are owned by those services, so do not add alerts for downstream API calls. Exception: always add latency metrics (p99, p95, p50) and 5XX error metrics for downstream service calls, so that dependency health and SLA compliance can be tracked
Label cardinality must be < 100 unique values - use logs for per-entity debugging
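The label, annotation, and cardinality rules above lend themselves to a mechanical check before a PR is opened. The following is a minimal sketch, assuming each alert rule has already been parsed into a dictionary with `labels` and `annotations` mappings. The required names and the forbidden labels are copied from the notes; the function names `lint_alert_rule` and `forbidden_metric_labels` are hypothetical.

```python
REQUIRED_LABELS = {"severity", "bu", "pod", "service", "slack_channel"}
REQUIRED_ANNOTATIONS = {"identifier", "description", "Runbook", "vajra_link"}
# Only the labels the notes name explicitly; their "etc." is not expanded here.
HIGH_CARDINALITY_LABELS = {"merchant_id", "terminal_id", "user_id", "payment_id"}


def lint_alert_rule(rule: dict) -> list[str]:
    """Return human-readable problems found in one parsed alert rule."""
    problems = []
    labels = rule.get("labels") or {}
    annotations = rule.get("annotations") or {}
    for name in sorted(REQUIRED_LABELS - set(labels)):
        problems.append(f"missing label: {name}")
    for name in sorted(REQUIRED_ANNOTATIONS - set(annotations)):
        problems.append(f"missing annotation: {name}")
    return problems


def forbidden_metric_labels(label_names: list[str]) -> list[str]:
    """Return the metric label names that the notes forbid as high cardinality."""
    return sorted(HIGH_CARDINALITY_LABELS.intersection(label_names))
```

The limit of fewer than 100 unique values per label cannot be checked from names alone; it would need a query against live series data, which this sketch does not attempt.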


package.json

"author": "razorpay"

"repository": "razorpay/trino-gateway"
