monitoring-skill
Monitoring and observability with Prometheus, Grafana, ELK Stack, and distributed tracing.
| Field | Value |
|---|---|
| name | monitoring-skill |
| description | Monitoring and observability with Prometheus, Grafana, ELK Stack, and distributed tracing. |
| sasmp_version | 1.3.0 |
| bonded_agent | 06-monitoring-observability |
| bond_type | PRIMARY_BOND |
| parameters | [{"name":"pillar","type":"string","required":false,"enum":["metrics","logs","traces","all"],"default":"all"},{"name":"tool","type":"string","required":false,"enum":["prometheus","grafana","elk","jaeger"],"default":"prometheus"}] |
| retry_config | {"strategy":"exponential_backoff","initial_delay_ms":1000,"max_retries":3} |
| observability | {"logging":"structured","metrics":"enabled"} |
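The `retry_config` entry names an exponential backoff strategy with a 1000 ms initial delay and 3 retries. A minimal sketch of the resulting wait schedule follows; the doubling factor and the function name `backoff_delays_ms` are assumptions, since the metadata does not state a multiplier.

```python
def backoff_delays_ms(initial_delay_ms=1000, max_retries=3, factor=2):
    """Return the wait before each retry attempt, in milliseconds.

    `factor` is an assumed multiplier; retry_config only gives the
    strategy name, the initial delay, and the retry count.
    """
    return [initial_delay_ms * factor ** attempt for attempt in range(max_retries)]


print(backoff_delays_ms())  # [1000, 2000, 4000]
```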
Master the three pillars of observability: metrics, logs, and traces.
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| pillar | string | No | all | Observability pillar |
| tool | string | No | prometheus | Tool focus |
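The two parameters above are optional and restricted to the enum values listed in the skill metadata. One way a caller might apply the defaults and reject unknown values is sketched below; `resolve_params` is a hypothetical helper, not part of the skill itself.

```python
# Allowed values and defaults, copied from the skill's parameter metadata.
ALLOWED = {
    "pillar": ("metrics", "logs", "traces", "all"),
    "tool": ("prometheus", "grafana", "elk", "jaeger"),
}
DEFAULTS = {"pillar": "all", "tool": "prometheus"}


def resolve_params(**given):
    """Merge caller-supplied values over the defaults, validating each one."""
    params = dict(DEFAULTS)
    for name, value in given.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown parameter: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"{name} must be one of {ALLOWED[name]}, got {value!r}")
        params[name] = value
    return params


print(resolve_params(pillar="logs"))  # {'pillar': 'logs', 'tool': 'prometheus'}
```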
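```promql
# PromQL
# Request rate per service
sum(rate(http_requests_total[5m])) by (service)
# p99 latency, aggregating buckets across instances by le
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
# Percentage of requests returning 5xx
100 * sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
```

```bash
# Prometheus API
curl http://localhost:9090/api/v1/targets
curl 'http://localhost:9090/api/v1/query?query=up'
# Reload the configuration; requires Prometheus to run with --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload

# Alertmanager (assumes alertmanager.url is set in the amtool config)
# amtool expects a comment on new silences by default
amtool silence add alertname="HighLatency" --duration=2h --comment="investigating latency"
amtool alert
```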
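The `curl` query above can also be issued from a script. The sketch below builds the same `/api/v1/query` URL with proper encoding and flattens an instant-vector response; the helper names are hypothetical, and the base URL assumes the local Prometheus from the commands above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(promql, base_url="http://localhost:9090"):
    """URL-encode a PromQL expression for the instant-query endpoint."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Turn a decoded instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is {"metric": {...}, "value": [timestamp, "value-as-string"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def instant_query(promql, base_url="http://localhost:9090"):
    """Run the query against a live server (needs network access)."""
    with urlopen(build_query_url(promql, base_url), timeout=10) as resp:
        return parse_instant_vector(json.load(resp))


print(build_query_url("up"))  # http://localhost:9090/api/v1/query?query=up
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without a running server.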
| Signal | Metric |
|---|---|
| Latency | histogram_quantile(0.99, ...) |
| Traffic | sum(rate(requests_total[5m])) |
| Errors | rate(errors_total[5m]) |
| Saturation | node_memory_MemAvailable_bytes |
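The latency row relies on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below illustrates that idea in simplified form; it assumes positive bucket bounds and a final `+Inf` bucket, the name `estimate_quantile` and the sample counts are made up, and it does not cover every edge case the real function handles.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    Simplified sketch: assumes 0 < q <= 1, positive bounds, and that the
    last bucket after sorting is the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank lands in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            # Linear interpolation inside the bucket holding the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count


# Hypothetical latency histogram: 100 requests, bounds in seconds.
sample = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
print(estimate_quantile(0.95, sample))
```

With these counts the 0.95 quantile falls in the 0.5 to 1.0 second bucket, so the estimate is only as precise as that bucket is narrow.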
| Symptom | Root Cause | Solution |
|---|---|---|
| No data | Scrape failing | Check targets page |
| Alert not firing | PromQL error | Test in UI |
| High cardinality | Too many labels | Reduce labels |
| Slow queries | Too much data | Add aggregation |
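For the high-cardinality row, it helps to see which metric and which label produce the most series. The sketch below works on a list of label sets in the shape returned by the Prometheus `/api/v1/series` endpoint; the helper names and the sample data are hypothetical.

```python
from collections import Counter


def series_per_metric(series):
    """Count how many series each metric name has."""
    return Counter(labels["__name__"] for labels in series)


def label_value_counts(series, metric):
    """For one metric, count the distinct values seen for each label."""
    values = {}
    for labels in series:
        if labels.get("__name__") != metric:
            continue
        for key, value in labels.items():
            if key != "__name__":
                values.setdefault(key, set()).add(value)
    return {key: len(seen) for key, seen in values.items()}


# Hypothetical label sets: user_id is the label inflating cardinality.
sample = [
    {"__name__": "http_requests_total", "service": "api", "user_id": "1"},
    {"__name__": "http_requests_total", "service": "api", "user_id": "2"},
    {"__name__": "http_requests_total", "service": "api", "user_id": "3"},
    {"__name__": "up", "job": "node"},
]
print(series_per_metric(sample).most_common(1))  # [('http_requests_total', 3)]
print(label_value_counts(sample, "http_requests_total"))  # {'service': 1, 'user_id': 3}
```

A label whose distinct-value count grows with users or requests is usually the first candidate to drop.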
Check the `/targets` page and the service logs (`journalctl -u prometheus`).
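The targets check from the troubleshooting table can be scripted against the JSON returned by `/api/v1/targets`. This is a minimal sketch: `unhealthy_targets` is a hypothetical helper, and the sample payload is invented to show the fields it reads.

```python
def unhealthy_targets(payload):
    """List active targets whose last scrape did not succeed."""
    failing = []
    for target in payload["data"]["activeTargets"]:
        if target.get("health") != "up":
            failing.append({
                "job": target.get("labels", {}).get("job"),
                "scrapeUrl": target.get("scrapeUrl"),
                "lastError": target.get("lastError"),
            })
    return failing


# Invented payload in the shape of a /api/v1/targets response.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "node"}, "scrapeUrl": "http://10.0.0.1:9100/metrics",
             "health": "up", "lastError": ""},
            {"labels": {"job": "api"}, "scrapeUrl": "http://10.0.0.2:8080/metrics",
             "health": "down", "lastError": "connection refused"},
        ]
    },
}
for target in unhealthy_targets(sample):
    print(target["job"], target["scrapeUrl"], target["lastError"])
```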