gke-cluster-lifecycle
// Guidance on managing the lifecycle and upgrades of Google Kubernetes Engine (GKE) clusters.
| name | gke-cluster-lifecycle |
| description | Guidance on managing the lifecycle and upgrades of Google Kubernetes Engine (GKE) clusters. |
This skill provides guidance on managing the lifecycle and upgrades of Google Kubernetes Engine (GKE) clusters.
Managing cluster upgrades is crucial for security and access to new features. GKE provides automated upgrades, but they must be configured to minimize disruption.
Release channels allow you to choose the balance between stability and feature availability.
Command to set release channel:
gcloud container clusters update <cluster-name> \
    --release-channel=stable \
    --region=<region>
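After switching channels it can be useful to read the setting back. The sketch below only prints the two commands for review and does not run them. The function name `release_channel_cmds` and the sample values `my-cluster` and `us-central1` are hypothetical, and the accepted channel list is deliberately limited to rapid, regular, and stable for this illustration.

```shell
# Sketch: print (not run) the commands to set and then confirm a release channel.
# Arguments are placeholders supplied by the caller.
release_channel_cmds() {
  local cluster="$1" region="$2" channel="$3"
  case "$channel" in
    rapid|regular|stable) ;;
    *) echo "unsupported channel: $channel" >&2; return 1 ;;
  esac
  echo "gcloud container clusters update $cluster --release-channel=$channel --region=$region"
  # Reads the enrolled channel back from the cluster description.
  echo "gcloud container clusters describe $cluster --region=$region --format='value(releaseChannel.channel)'"
}

# Hypothetical cluster and region names, for illustration only.
release_channel_cmds my-cluster us-central1 stable
```

Printing the commands first keeps the change reviewable; the output can be pasted into a terminal once the names are confirmed.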
Surge upgrades allow you to specify how many nodes can be created above the target size during an upgrade, minimizing disruption.
Example configuration:
gcloud container node-pools update <pool-name> \
    --cluster=<cluster-name> \
    --max-surge-upgrade=2 \
    --max-unavailable-upgrade=0 \
    --region=<region>
Setting max-unavailable-upgrade=0 ensures that no nodes are taken offline before new ones are ready.
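To get a feel for how these two settings affect upgrade duration, the following rough sketch estimates the number of upgrade batches for a pool. It assumes nodes are upgraded in groups of max-surge-upgrade plus max-unavailable-upgrade and ignores per-zone effects and pod drain time. The helper name `upgrade_batches` is hypothetical.

```shell
# Sketch: estimate how many batches a surge upgrade needs.
# Assumes (max_surge + max_unavailable) nodes are upgraded at a time.
upgrade_batches() {
  local nodes="$1" surge="$2" unavailable="$3"
  local parallel=$((surge + unavailable))
  if [ "$parallel" -le 0 ]; then
    echo "max-surge and max-unavailable cannot both be 0" >&2
    return 1
  fi
  # Ceiling division: nodes / parallel, rounded up.
  echo $(( (nodes + parallel - 1) / parallel ))
}

upgrade_batches 10 2 0   # prints 5
```

With the example configuration above (surge 2, unavailable 0), a 10-node pool would be upgraded two nodes at a time, in roughly five rounds, without dropping below the target size.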
For high-risk upgrades, you can create a new node pool (Green) with the new version, test it, and then migrate workloads from the old node pool (Blue).
Steps:
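One possible sequence for such a manual migration is sketched below, under stated assumptions: create the Green pool, cordon the Blue nodes, drain them, then delete the Blue pool. The function name `blue_green_plan`, the pool names, and the version string are hypothetical placeholders. The function only prints the commands so they can be reviewed and run one at a time, with testing of the Green pool done between the first and second command.

```shell
# Sketch: print a manual blue-green node pool migration plan.
# Nothing is executed; every argument is a caller-supplied placeholder.
blue_green_plan() {
  local cluster="$1" region="$2" blue="$3" green="$4" version="$5"
  # 1. Create the Green pool on the new node version.
  echo "gcloud container node-pools create $green --cluster=$cluster --region=$region --node-version=$version"
  # 2. Stop new pods from being scheduled onto Blue nodes.
  echo "kubectl cordon -l cloud.google.com/gke-nodepool=$blue"
  # 3. Evict workloads from Blue nodes so they reschedule onto Green.
  echo "kubectl drain -l cloud.google.com/gke-nodepool=$blue --ignore-daemonsets --delete-emptydir-data"
  # 4. Remove the Blue pool once workloads are healthy on Green.
  echo "gcloud container node-pools delete $blue --cluster=$cluster --region=$region"
}

# Hypothetical names and version, for illustration only.
blue_green_plan my-cluster us-central1 blue-pool green-pool 1.30
```

Keeping the Blue pool cordoned but not yet deleted leaves a rollback path: uncordoning it lets workloads move back if the Green pool misbehaves.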
Best practices: use a release channel (Stable or Regular), and configure max-surge-upgrade to ensure availability during upgrades.