gke-productionize
Assists in preparing applications and clusters on GKE for production.
| Field | Value |
| --- | --- |
| name | gke-productionize |
| description | Assists in preparing applications and clusters on GKE for production. |
This skill acts as a high-level orchestrator for preparing a GKE cluster and its workloads for production readiness.
> [!IMPORTANT]
> This is a meta-skill (orchestrator skill). You are expected to invoke and run many of the specialized skills listed in this document as part of the overall productionization process. Do not attempt to implement all production readiness features directly within this skill; instead, use this skill to assess the environment and then delegate to the specific skill for each domain.
This skill is adaptable to:
Before making recommendations, discover the current state of the environment.
Run these commands to understand the cluster setup:
- `gcloud container clusters describe <cluster-name> --location <location> --project <project>`
  - Look for `autopilot: true` in the describe output.
  - Look for `releaseChannel`.

If a specific application is targeted, discover its configuration:

- `kubectl get deployment <app-name> -n <namespace> -o yaml`
- `kubectl get namespace <namespace> -o yaml` (Look for Pod Security Standards labels).
- `kubectl get hpa -n <namespace>`
- `kubectl get pdb -n <namespace>`
- `kubectl get networkpolicy -n <namespace>`

Before implementation, you MUST run the skill for each relevant specialized area listed below and incorporate its guidance into your assessment and plan. Failure to do so will result in a non-compliant production configuration.
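The discovery commands above can also emit JSON (`--format=json` for `gcloud`, `-o json` for `kubectl`), which makes the findings easier to record. The following is a minimal sketch of that idea, not part of the skill itself. The function names `summarize_cluster` and `pss_labels` are hypothetical, the sample payloads are made up, and the field shapes (`autopilot.enabled`, `releaseChannel.channel`) are assumptions about the describe output that should be checked against a real cluster.

```python
import json

# Label keys used by Kubernetes Pod Security Admission.
PSS_LABEL_KEYS = (
    "pod-security.kubernetes.io/enforce",
    "pod-security.kubernetes.io/audit",
    "pod-security.kubernetes.io/warn",
)


def summarize_cluster(describe_json: str) -> dict:
    """Pull the Autopilot flag and release channel out of describe output."""
    cluster = json.loads(describe_json)
    autopilot = cluster.get("autopilot")
    # Assumption: the field is either {"enabled": true} or a bare boolean.
    if isinstance(autopilot, dict):
        autopilot = autopilot.get("enabled", False)
    channel = (cluster.get("releaseChannel") or {}).get("channel")
    return {"autopilot": bool(autopilot), "release_channel": channel}


def pss_labels(namespace_json: str) -> dict:
    """Return only the Pod Security Standards labels set on a namespace."""
    namespace = json.loads(namespace_json)
    labels = (namespace.get("metadata") or {}).get("labels") or {}
    return {key: labels[key] for key in PSS_LABEL_KEYS if key in labels}


if __name__ == "__main__":
    # Hypothetical sample payloads, not output from a real cluster.
    cluster_sample = (
        '{"autopilot": {"enabled": true}, '
        '"releaseChannel": {"channel": "REGULAR"}}'
    )
    namespace_sample = (
        '{"metadata": {"labels": '
        '{"pod-security.kubernetes.io/enforce": "restricted"}}}'
    )
    print(summarize_cluster(cluster_sample))
    print(pss_labels(namespace_sample))
```

An empty result from `pss_labels` suggests the namespace has no Pod Security Standards labels, which is worth flagging in the security portion of the assessment.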
If the application is not yet running on GKE, you MUST run the gke-app-onboarding skill for planning containerization, image building, and basic deployment.
Ensure workloads have appropriate resources and autoscaling.
- Run the gke-workload-scaling skill for configuring HPA, VPA, and resource limits.

Ensure adequate logging and monitoring are in place.

- Run the gke-observability skill for setting up Cloud Logging, Monitoring, and Managed Prometheus.

Ensure high availability and graceful degradation.

- Run the gke-reliability skill for configuring regional clusters, PDBs, and health probes.

Harden the cluster and workloads.

- Run the gke-workload-security skill for Workload Identity, Network Policies, and Shielded Nodes.
- `default` ServiceAccount.

Ensure stateful data is protected.

- Run the gke-backup-dr skill for configuring Backup for GKE and restore procedures.

Secure external access.

- Run the gke-networking-edge skill for Gateway API, Ingress, and Cloud Armor.

Ensure efficient use of resources.

- Run the gke-cost-optimization skill for strategies on rightsizing, quotas, and Spot VMs.

After the assessment, provide a summary report with a RAG (Red, Amber, Green) status for each area and an overall readiness score. This helps prioritize remediation efforts.
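One possible way to assemble the summary report is sketched below. The source does not define how the overall readiness score is computed, so the weighting here (Green = 1, Amber = 0.5, Red = 0, averaged and shown as a percentage) is purely an assumption, as are the names `Rag`, `readiness_score`, and `render_report` and the example statuses.

```python
from enum import Enum


class Rag(Enum):
    # Assumed weights; the skill does not prescribe a scoring formula.
    RED = 0.0
    AMBER = 0.5
    GREEN = 1.0


def readiness_score(statuses: dict) -> float:
    """Return the mean status weight as a percentage from 0 to 100."""
    if not statuses:
        raise ValueError("at least one assessed area is required")
    total = sum(status.value for status in statuses.values())
    return round(100 * total / len(statuses), 1)


def render_report(statuses: dict) -> str:
    """List areas worst-first, then append the overall score."""
    ordered = sorted(statuses.items(), key=lambda item: item[1].value)
    lines = [f"{area:<16} {status.name}" for area, status in ordered]
    lines.append(f"Overall readiness: {readiness_score(statuses)}%")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical assessment results for the areas covered by this skill.
    example = {
        "Scaling": Rag.GREEN,
        "Observability": Rag.AMBER,
        "Reliability": Rag.GREEN,
        "Security": Rag.RED,
        "Backup and DR": Rag.AMBER,
        "Networking": Rag.GREEN,
        "Cost": Rag.AMBER,
    }
    print(render_report(example))
```

Sorting Red areas to the top keeps the report aligned with its purpose of prioritizing remediation. A different weighting, such as treating any Red area as blocking, may suit stricter environments.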