gke-workload-security
// Workflows for auditing and hardening the security of GKE workloads.
| name | gke-workload-security |
| description | Workflows for auditing and hardening the security of GKE workloads. |
This skill provides workflows and best practices for securing GKE workloads. It covers security auditing, Identity and Access Management (Workload Identity), Network Security (Network Policies), and Node Security.
Assess the current security posture of your cluster using the provided audit script.
Capabilities:
Command:
./scripts/audit_cluster.sh <cluster-name> <region> <project-id>
Workload Identity allows Kubernetes Service Accounts (KSAs) to impersonate Google Service Accounts (GSAs). This is the recommended method for workloads to access Google Cloud APIs.
Steps:
Create Namespace and KSA:
kubectl create namespace workload-identity-test-ns
kubectl create serviceaccount <ksa-name> \
--namespace workload-identity-test-ns
Bind KSA to GSA:
gcloud iam service-accounts add-iam-policy-binding <gsa-name>@<project-id>.iam.gserviceaccount.com \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:<project-id>.svc.id.goog[workload-identity-test-ns/<ksa-name>]"
Annotate KSA:
kubectl annotate serviceaccount <ksa-name> \
--namespace workload-identity-test-ns \
iam.gke.io/gcp-service-account=<gsa-name>@<project-id>.iam.gserviceaccount.com
Verify Example Pod:
Use the existing asset assets/workload-identity-pod.yaml to test the configuration. Update the <ksa-name> in the file first.
kubectl apply -f ./assets/workload-identity-pod.yaml -n workload-identity-test-ns
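The contents of assets/workload-identity-pod.yaml are not reproduced here. As a rough sketch of what the steps above amount to declaratively, the manifest below defines an annotated KSA and a test Pod that runs as it. The Pod name, the google/cloud-sdk:slim image, and the sleep command are assumptions chosen for illustration, not the contents of the asset file.

```yaml
# Hypothetical sketch: declarative form of the "Create KSA" and "Annotate KSA" steps.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: <ksa-name>
  namespace: workload-identity-test-ns
  annotations:
    # Links the KSA to the GSA it is allowed to impersonate.
    iam.gke.io/gcp-service-account: <gsa-name>@<project-id>.iam.gserviceaccount.com
---
# Hypothetical test Pod; name, image, and command are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: workload-identity-test
  namespace: workload-identity-test-ns
spec:
  # The Pod must run as the annotated KSA for the binding to apply.
  serviceAccountName: <ksa-name>
  containers:
  - name: workload-identity-test
    image: google/cloud-sdk:slim
    # Keep the container alive so you can exec into it.
    command: ["sleep", "infinity"]
```

With a Pod like this running, `kubectl exec -it workload-identity-test -n workload-identity-test-ns -- gcloud auth list` should show the GSA as the active account if the IAM binding and annotation are correct.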
Control traffic flow between Pods using Network Policies. By default, all traffic is allowed.
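Enable Network Policy Enforcement: on an existing cluster, enable the add-on first, then turn on enforcement. The second command re-creates the cluster's node pools.
gcloud container clusters update <cluster-name> \
--update-addons=NetworkPolicy=ENABLED \
--region <region>
gcloud container clusters update <cluster-name> \
--enable-network-policy \
--region <region>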
[!NOTE] If your cluster uses Dataplane V2 (`--enable-dataplane-v2`), Network Policy enforcement is built in and this step is not required (and may fail).
Apply Default Deny Policy: Isolate namespaces by denying all ingress and egress traffic by default.
Replace <target-namespace> with the namespace you want to isolate.
kubectl apply -f ./assets/default-deny-netpol.yaml -n <target-namespace>
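The asset file itself is not shown here. A default-deny policy of the kind described above typically looks like the following sketch; the policy name is an assumption, and the actual assets/default-deny-netpol.yaml may differ.

```yaml
# Hypothetical sketch of a default-deny policy; the name is illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  # An empty podSelector selects every Pod in the namespace.
  podSelector: {}
  # Listing both types with no ingress/egress rules denies all traffic
  # in both directions for the selected Pods.
  policyTypes:
  - Ingress
  - Egress
```

Because this also denies egress, Pods in the namespace lose DNS resolution until a further policy explicitly allows it, so plan the allow rules before applying it to a namespace with running workloads.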
Ensure nodes are running with verifiable integrity.
Command:
gcloud container clusters update <cluster-name> \
--enable-shielded-nodes \
--region <region>
Run untrusted workloads in a sandbox for extra isolation.
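Enable GKE Sandbox: on Standard clusters, GKE Sandbox is enabled per node pool, so add a node pool with the sandbox turned on.
gcloud container node-pools create <node-pool-name> \
--cluster <cluster-name> \
--sandbox type=gvisor \
--region <region>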
Run a Sandboxed Pod:
Add runtimeClassName: gvisor to your Pod spec.
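As a minimal sketch of where that field goes, the Pod below requests the gVisor runtime. The Pod name and the nginx image are assumptions used only for illustration.

```yaml
# Hypothetical sandboxed Pod; name and image are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-example
spec:
  # Schedules the Pod onto sandbox-enabled nodes and runs it under gVisor.
  runtimeClassName: gvisor
  containers:
  - name: app
    image: nginx
```

The field sits at the Pod spec level, not under an individual container, so in a Deployment it belongs under spec.template.spec.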
Enforce security policies on namespaces using labels.
Enforce Restricted Profile:
kubectl label --overwrite ns <namespace> \
pod-security.kubernetes.io/enforce=restricted \
pod-security.kubernetes.io/enforce-version=latest
[!NOTE] Using `latest` ensures you use the policies corresponding to the cluster's current version. You can pin it to a specific version (e.g., `v1.30`) to lock down the namespace to the policies of a specific release.
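Once a namespace enforces the restricted profile, Pods that do not set the required security context fields are rejected at admission. The sketch below shows a Pod that should satisfy the restricted profile; the Pod name, the UID, and the image placeholder are assumptions, and the image must be able to run as a non-root user.

```yaml
# Hypothetical Pod intended to pass the "restricted" Pod Security Standard.
apiVersion: v1
kind: Pod
metadata:
  name: restricted-example
  namespace: <namespace>
spec:
  securityContext:
    # Restricted requires non-root execution and a seccomp profile.
    runAsNonRoot: true
    runAsUser: 1000          # illustrative UID; any non-zero UID the image supports
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: <your-image>      # placeholder; must work when run as a non-root user
    securityContext:
      # Restricted forbids privilege escalation and requires dropping all capabilities.
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
```

Running `kubectl apply --dry-run=server` against the labeled namespace is one way to check whether a manifest would be admitted before deploying it.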
Mount secrets from Google Cloud Secret Manager directly as volumes in your pods.
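Prerequisites: A Secret Manager CSI driver must be installed on the cluster. The manifests below match the open-source Secrets Store CSI Driver with the Google Secret Manager provider (provider: gcp); the managed GKE Secret Manager add-on uses provider: gke and the driver secrets-store-gke.csi.k8s.io instead.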
Example SecretProviderClass:
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: my-secret-provider
spec:
  provider: gcp
  parameters:
    secrets: |
      - resourceName: "projects/<project-id>/secrets/my-secret/versions/latest"
        fileName: "my-secret-file"
Example Pod Spec excerpt:
spec:
  containers:
  - name: my-app
    volumeMounts:
    - name: secrets-store-inline
      mountPath: "/mnt/secrets"
      readOnly: true
  volumes:
  - name: secrets-store-inline
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: "my-secret-provider"
If using GKE Dataplane V2, you can log allowed and denied connections.
Steps:
Configure the NetworkLogging custom resource.
Example NetworkLogging Manifest:
apiVersion: networking.gke.io/v1alpha1
kind: NetworkLogging
metadata:
  name: default
spec:
  cluster:
    allow:
      log: true
      delegate: true
    deny:
      log: true
      delegate: true
This will log connection details to Cloud Logging.
Enforce baseline or restricted Pod Security Standards on all non-system namespaces.