Generate a combined release blog post for ai-platform-engineering. Produces a single docs/releases/YYYY-MM-DD-release-X-Y-Z.md file containing release notes and the upgrade guide (migration guide) inline. Use when cutting a release, when a user asks "what changed in 0.4.x", or when a user is upgrading their values.yaml to a new chart version.
Audit and update every moving part of the ai-platform-engineering documentation. Checks release blog posts, features page, agent docs, homepage version strings, Docusaurus version config, and sidebar completeness. Fixes what is stale and reports what needs manual attention. Use after cutting a release, adding a new agent, or updating platform features.
Produce a thorough incident post-mortem report after an outage or customer-impacting event. Covers executive summary, impact, detailed timeline, root cause, contributing factors, corrective and preventive actions, and lessons learned. Use when the user asks to write, draft, or complete a post-mortem, blameless review, or incident review document.
Analyze AWS costs by service, account, and time period. Identifies top spenders, cost anomalies, and optimization opportunities. Use when reviewing cloud spend, preparing cost reports, or investigating unexpected charges.
Check the health and sync status of all ArgoCD applications across clusters. Identifies out-of-sync, degraded, or unhealthy deployments and provides actionable remediation steps. Use when monitoring deployments, troubleshooting sync failures, or verifying environment health after a release.
Check Kubernetes cluster health across EKS clusters, including pod status, node conditions, resource utilization, and pending alerts. Use when monitoring infrastructure health, investigating capacity issues, or performing cluster audits.
Correlate PagerDuty incidents with Jira tickets and recent ArgoCD deployments to accelerate root cause analysis. Orchestrates multiple agents to build a timeline of events. Use when investigating active incidents, performing post-mortems, or correlating alerts with changes.
Generate a comprehensive on-call handoff document by aggregating open incidents, ongoing issues, recent deployments, and systems to watch. Orchestrates PagerDuty, Jira, and ArgoCD agents. Use during on-call rotation changes or shift handoffs.