k8s-hpa-cost-tuning
// Tune Kubernetes HPA, topology spread, requests, and scale-down behavior for cluster cost audits, incidents, replica/node issues, and over-reservation.
| Field | Value |
| --- | --- |
| name | k8s-hpa-cost-tuning |
| description | Tune Kubernetes HPA, topology spread, requests, and scale-down behavior for cluster cost audits, incidents, replica/node issues, and over-reservation. |
Declare a mode before executing this skill. All reasoning, thresholds, and recommendations depend on this choice.
mode = audit | incident
If no mode is provided, refuse to run and request clarification.
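A tool that implements this skill can enforce the rule mechanically. The sketch below is hypothetical (`requireMode` and `MODES` are illustrative names, not part of the scripts): it accepts only `audit` or `incident` and refuses everything else instead of guessing.

```javascript
// Hypothetical mode guard: no declared mode, no run.
const MODES = ["audit", "incident"];

function requireMode(mode) {
  if (!MODES.includes(mode)) {
    // Refuse rather than guess: thresholds and recommendations differ per mode.
    throw new Error(`Declare a mode first: mode = audit | incident (got ${String(mode)})`);
  }
  return mode;
}
```

For example, `requireMode(process.env.MODE)` would stop a run when a (hypothetical) `MODE` variable is unset.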
mode = audit: Periodic cost-savings audit
Run on a schedule (weekly or bi-weekly) to:
This mode assumes no active incident and prioritizes stability-preserving recommendations.
mode = incident: Post-incident scaling analysis
Run after a production incident or anomaly, attaching:
This mode focuses on:
This skill assumes Datadog for observability and standard Kubernetes HPA + Cluster Autoscaler.
Kubernetes scaling is a three-layer system:
Cost optimization only works if all three layers can move downward.
Key takeaway: HPA decides quantity, scheduler decides placement, autoscaler decides cost. Scale-up can be aggressive; scale-down must be possible. If replicas drop but nodes do not, the scheduler is the bottleneck.
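The takeaway can be turned into a quick triage check. This is a hypothetical helper: the snapshot fields `replicas` and `nodes` are assumptions about what you record before and after load drops, and the node rule simply encodes the rule of thumb above rather than proving where the bottleneck is.

```javascript
// Hypothetical triage helper: compare snapshots taken before and after load drops.
// Returns the first layer that failed to move downward, or null if all moved.
function stuckLayer(before, after) {
  if (after.replicas >= before.replicas) return "hpa"; // quantity never went down
  if (after.nodes >= before.nodes) return "scheduler"; // replicas fell, nodes did not
  return null; // HPA, scheduler, and autoscaler all moved downward
}
```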
The utility scripts query three metric families:
- CPU usage % (`kubernetes.cpu.usage.total / node.cpu_allocatable`)
- CPU requested % (`kubernetes.cpu.requests / node.cpu_allocatable`)

CPU requested % must go down after scale-down for cost savings to be real. If memory usage stays above its target, memory drives scale-up even when CPU is idle.
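A minimal sketch of the two CPU ratios, assuming the inputs are cluster totals already converted to cores. Datadog typically reports `kubernetes.cpu.usage.total` in nanocores, so that series may need dividing by 1e9 first; the function name and field names are illustrative.

```javascript
// Cluster-level CPU percentages for the two ratios listed above.
// Assumption: all inputs are in cores (convert nanocores beforehand).
function cpuPercentages({ usedCores, requestedCores, allocatableCores }) {
  const usedPct = (100 * usedCores) / allocatableCores;
  const requestedPct = (100 * requestedCores) / allocatableCores;
  return {
    usedPct,
    requestedPct,
    // Reserved but idle capacity: the over-reservation this skill tries to shrink.
    idleReservedPct: requestedPct - usedPct,
  };
}
```

A large gap between `requestedPct` and `usedPct` is the over-reservation signal; it only turns into savings once nodes can actually be removed.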
When scale-down is slow or blocked:
Result: permanent over-reservation.
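```yaml
# Goes under spec of an autoscaling/v2 HorizontalPodAutoscaler
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60   # act on the highest recommendation from the last 60 s
    selectPolicy: Max                # if several policies apply, use the one allowing the biggest drop
    policies:
      - type: Percent
        value: 50                    # remove up to 50% of replicas...
        periodSeconds: 30            # ...per 30-second period
```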
Effects: fast reaction once load drops, predictable replica collapse, low flapping risk.
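To get a feel for how quickly this policy collapses replicas, here is a deliberately simplified model. It is an assumption, not the controller's exact algorithm: the real HPA re-evaluates every 15 seconds by default and tracks a sliding window of scale events, and the rounding of the Percent limit is assumed here to truncate toward zero.

```javascript
// Simplified model of the scaleDown behavior above (assumption, not the real controller):
// nothing happens until the stabilization window has passed, then each period removes
// at most `percent`% of the replicas present at the start of that period.
function simulateScaleDown({ start, desired, windowSeconds, percent, periodSeconds }) {
  const timeline = [{ t: 0, replicas: start }];
  let replicas = start;
  let t = windowSeconds; // first reduction once the stabilization window has elapsed
  while (replicas > desired) {
    // Percent limit, truncated toward zero (assumed rounding), never below `desired`.
    const next = Math.max(desired, Math.floor(replicas * (1 - percent / 100)));
    if (next >= replicas) break; // policy allows no progress (e.g. percent = 0)
    replicas = next;
    timeline.push({ t, replicas });
    t += periodSeconds;
  }
  return timeline;
}
```

With `start: 20`, `desired: 3`, a 60 s window, 50%, and 30 s periods, the model reaches 3 replicas at t = 120 s (20 → 10 → 5 → 3).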
Topology spread must never prevent pod consolidation during scale-down.
Strict constraints block scheduler flexibility and freeze cluster size.
```yaml
maxSkew: 1
whenUnsatisfiable: DoNotSchedule
```
Pods cannot collapse onto fewer nodes. Nodes never drain. Reserved CPU/memory never decreases.
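Why strict spread resists packing can be seen from the scheduler's skew rule: for a candidate node, skew is the node's count of matching pods plus one (the incoming pod) minus the minimum count across all eligible nodes, zeros included. The sketch below is simplified and ignores `minDomains`, node inclusion policies, and taints; `canPlace` is an illustrative name.

```javascript
// Simplified PodTopologySpread filter for whenUnsatisfiable: DoNotSchedule.
// podsPerNode: matching-pod counts for every eligible node (zeros included).
function canPlace(podsPerNode, candidate, maxSkew) {
  const globalMin = Math.min(...Object.values(podsPerNode));
  // +1 because the incoming pod itself matches the constraint's labelSelector.
  const skew = podsPerNode[candidate] + 1 - globalMin;
  return skew <= maxSkew;
}
```

With counts `{ a: 2, b: 1, c: 1 }`, stacking another pod on `a` gives skew 2, which `maxSkew: 1` rejects and `maxSkew: 2` allows. With `ScheduleAnyway` the rule only lowers a node's score instead of filtering it out.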
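```yaml
topologySpreadConstraints:
  - topologyKey: kubernetes.io/hostname
    maxSkew: 2
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:        # pods whose spread is measured; without it, no pods are counted
      matchLabels:
        app: your-app
```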
Strong preference for spreading while allowing bin-packing during scale-down and enabling node removal.
When hard guarantees are required:
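```yaml
topologySpreadConstraints:
  - topologyKey: topology.kubernetes.io/zone
    maxSkew: 1
    whenUnsatisfiable: DoNotSchedule
    labelSelector:        # pods whose spread is measured; without it, no pods are counted
      matchLabels:
        app: your-app
```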
Do not combine this with strict hostname-level spread.
To avoid hot nodes without blocking scale-down:
```yaml
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: your-app
```
Anti-affinity is advisory and cost-safe.
Practical defaults:
- `targetCPUUtilizationPercentage: 70`
- `targetMemoryUtilizationPercentage: 75–80`
- Adjust one knob at a time.
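These targets feed the HPA's documented per-metric rule, desiredReplicas = ceil(currentReplicas × current / target), which is skipped when the ratio is within the default 10% tolerance. With several metrics the HPA takes the largest result, which is why a hot memory metric wins even when CPU is idle. The sketch below mirrors that rule; the function names are illustrative.

```javascript
// Default tolerance (--horizontal-pod-autoscaler-tolerance) is 0.1.
const TOLERANCE = 0.1;

// Per-metric proposal: ceil(current * currentPct / targetPct), unless within tolerance.
function desiredForMetric(currentReplicas, currentPct, targetPct) {
  const ratio = currentPct / targetPct;
  if (Math.abs(ratio - 1) <= TOLERANCE) return currentReplicas;
  return Math.ceil(currentReplicas * ratio);
}

// With several metrics the largest proposal wins.
function desiredReplicas(currentReplicas, metrics) {
  return Math.max(
    ...metrics.map((m) => desiredForMetric(currentReplicas, m.currentPct, m.targetPct))
  );
}
```

For example, with 10 replicas, CPU at 20% against a 70% target proposes 3, but memory at 100% against an 80% target proposes 13, so the HPA scales to 13.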
Run weekly (or after changes):
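```bash
# HPA replica counts and current/target metric values
kubectl -n <namespace> get hpa <deployment>
kubectl -n <namespace> describe hpa <deployment>
# Per-container and per-node usage (requires the Metrics API, e.g. metrics-server)
kubectl -n <namespace> top pod --containers
kubectl top node
# Pods grouped by node, to spot spreading vs. bin-packing
kubectl -n <namespace> get pods -o wide --sort-by=.spec.nodeName
```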
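To check consolidation numerically rather than by eye, pod placement can be counted from the JSON form of the same listing (`kubectl -n <namespace> get pods -o json`, a v1 `PodList`). The sketch reads the real `items[].spec.nodeName` field; `podsPerNode` is an illustrative name.

```javascript
// Count pods per node from a parsed `kubectl get pods -o json` PodList.
function podsPerNode(podList) {
  const counts = {};
  for (const pod of podList.items ?? []) {
    // Pending pods have no nodeName yet.
    const node = pod.spec?.nodeName ?? "<unscheduled>";
    counts[node] = (counts[node] ?? 0) + 1;
  }
  return counts;
}
```

Save the output with `kubectl -n <namespace> get pods -o json > pods.json` and pass `JSON.parse(fs.readFileSync("pods.json", "utf8"))`; an even one-per-node pattern that never tightens after scale-down matches the frozen-cluster symptom described above.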
Both scripts require Datadog credentials:
```bash
export DD_API_KEY=...
export DD_APP_KEY=...
export DD_SITE=datadoghq.com # optional, defaults to datadoghq.com
```
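A sketch of how these variables map onto Datadog API requests. The API host is `api.` plus the site, and Datadog authenticates with the `DD-API-KEY` and `DD-APPLICATION-KEY` headers; the helper itself is hypothetical and not taken from the scripts.

```javascript
// Hypothetical helper: build Datadog API settings from the environment variables above.
function datadogConfig(env) {
  const { DD_API_KEY, DD_APP_KEY } = env;
  if (!DD_API_KEY || !DD_APP_KEY) {
    throw new Error("DD_API_KEY and DD_APP_KEY must be set");
  }
  const site = env.DD_SITE || "datadoghq.com"; // same default as the scripts
  return {
    baseUrl: `https://api.${site}`,
    headers: {
      "DD-API-KEY": DD_API_KEY,
      "DD-APPLICATION-KEY": DD_APP_KEY,
    },
  };
}
```

Call it as `datadogConfig(process.env)` so a missing key fails before any query is sent.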
audit-metrics.mjs: Cost-savings discovery
Scan a cluster over a wide window (default 24 h) to find over-reservation and waste.
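One way to turn audit-window averages into a worklist is to rank workloads by reserved-but-unused CPU. The input shape (`deployment`, `requestedCores`, `usedCores`) is a hypothetical summary, not the script's actual JSON output.

```javascript
// Rank workloads by idle reserved CPU over the audit window.
// Input rows are hypothetical per-deployment averages, in cores.
function rankOverReservation(rows) {
  return rows
    .map((r) => ({
      deployment: r.deployment,
      idleCores: r.requestedCores - r.usedCores,
      usedOfRequestedPct: r.requestedCores > 0 ? (100 * r.usedCores) / r.requestedCores : 0,
    }))
    .filter((r) => r.idleCores > 0) // skip workloads using at least what they request
    .sort((a, b) => b.idleCores - a.idleCores); // biggest waste first
}
```

Starting from the top of this list and lowering requests one workload at a time keeps with the "adjust one knob at a time" advice.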
```bash
# Cluster-wide audit
node scripts/audit-metrics.mjs --cluster <cluster>

# With deployment deep-dive
node scripts/audit-metrics.mjs \
  --cluster <cluster> \
  --namespace <namespace> \
  --deployment <deployment>
```
Reports:
incident-metrics.mjs: Post-incident analysis
Collect metrics for a narrow incident window and get a tuning recommendation.
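The `--from`/`--to` window has to become query bounds at some point. The sketch below assumes Datadog's v1 timeseries endpoint (`GET /api/v1/query`), which takes `from` and `to` in epoch seconds plus a `query` string; both helper names are hypothetical and not taken from the script.

```javascript
// Convert ISO 8601 --from/--to values into epoch-second bounds.
function incidentWindow(fromIso, toIso) {
  const from = Date.parse(fromIso);
  const to = Date.parse(toIso);
  if (Number.isNaN(from) || Number.isNaN(to)) {
    throw new Error("--from and --to must be ISO 8601 timestamps");
  }
  if (from >= to) throw new Error("--from must be earlier than --to");
  return { from: Math.floor(from / 1000), to: Math.floor(to / 1000) };
}

// Build a v1 timeseries query URL for that window.
function queryUrl(baseUrl, query, window) {
  const params = new URLSearchParams({
    from: String(window.from),
    to: String(window.to),
    query,
  });
  return `${baseUrl}/api/v1/query?${params}`;
}
```

For example, `queryUrl("https://api.datadoghq.com", "avg:kubernetes.cpu.usage.total{kube_deployment:web}", incidentWindow("2024-05-01T10:00:00Z", "2024-05-01T11:00:00Z"))` covers exactly one hour.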
```bash
node scripts/incident-metrics.mjs \
  --cluster <cluster> \
  --namespace <namespace> \
  --deployment <deployment> \
  --from <ISO8601> \
  --to <ISO8601>
```
Reports:
- `limits.memory` unchanged unless OOMKills or near-limit memory usage are confirmed

Use `--out <path>` to save the full JSON for deeper analysis or diffing across runs. Run either script with `--help` to see all options (relative windows, custom HPA name, pretty JSON).
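The `limits.memory` rule from the incident report can be written as a small guard. This is a hypothetical sketch: the field names are illustrative, and the 0.9 "near-limit" ratio is an assumed threshold, not a value taken from the scripts.

```javascript
// Hypothetical guard: only propose a limits.memory change when OOMKills occurred
// or peak usage came within `nearLimitRatio` of the limit (0.9 is an assumption).
function mayTouchMemoryLimit({ oomKills, peakBytes, limitBytes }, nearLimitRatio = 0.9) {
  if (oomKills > 0) return true;
  if (!limitBytes) return false; // no limit set: no near-limit signal to act on
  return peakBytes / limitBytes >= nearLimitRatio;
}
```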