
Name: Gke Ai Troubleshooting Skill Creation Guide
Author: GoogleCloudPlatform



Updated: 2026-05-04 14:12


SKILL.md


name	gke-ai-troubleshooting-skill-creation-guide
description	Expert instructions for building high-quality GKE troubleshooting skills. Codifies Step 0 context rules, zero-hallucination signatures, and explicit LQL/PromQL query requirements.

Troubleshooting Skill Creation Guide

Use this guide to build high-quality troubleshooting skills that enable AI agents to diagnose complex failures in GKE workloads.

🏗️ Skill Structure Standard

Mandatory Components

SKILL.md: The core diagnostic and resolution workflow.
README.md: Public-facing overview and "When to use" guide.
references/failure_signatures.md: Authentic log and metric signatures sourced from real incidents.
scripts/validate_queries.sh: Automatic syntax validator for all queries.
TEST.md: Manual verification plan for humans.
EVAL.textproto: Evaluation suite for performance tracking.

Optional Components

BUILD: Build definition.
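The component list above lends itself to a pre-submission check. A minimal Python sketch follows; it assumes a skill is one directory containing the files at exactly the relative paths named in this guide, and `missing_components` is a hypothetical helper rather than existing tooling.

```python
from pathlib import Path

# Relative paths taken from the Skill Structure Standard.
MANDATORY_COMPONENTS = (
    "SKILL.md",
    "README.md",
    "references/failure_signatures.md",
    "scripts/validate_queries.sh",
    "TEST.md",
    "EVAL.textproto",
)
OPTIONAL_COMPONENTS = ("BUILD",)


def missing_components(skill_dir: str) -> list[str]:
    """Return the mandatory files that do not exist under skill_dir.

    Optional components such as BUILD are never reported as missing.
    """
    root = Path(skill_dir)
    return [rel for rel in MANDATORY_COMPONENTS if not (root / rel).is_file()]
```

A non-empty return value lists what still has to be written; BUILD is deliberately left out of the check because it is optional.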

🏷️ Naming Conventions

Directory Name: MUST be kebab-case (e.g., gke-ai-troubleshooting-tpu-vbar-oom).
Skill Name: MUST match the directory name.
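Both naming rules can be checked together. This sketch assumes "kebab-case" means lowercase ASCII letters and digits separated by single hyphens (no leading, trailing, or doubled hyphens); `naming_problems` is a hypothetical name.

```python
import re

# Assumed definition of kebab-case: lowercase ASCII letters and digits,
# separated by single hyphens, with no leading or trailing hyphen.
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def naming_problems(directory_name: str, skill_name: str) -> list[str]:
    """Return human-readable naming violations (an empty list means compliant)."""
    problems = []
    if not KEBAB_CASE.fullmatch(directory_name):
        problems.append(f"directory name {directory_name!r} is not kebab-case")
    if skill_name != directory_name:
        problems.append(
            f"skill name {skill_name!r} does not match directory name {directory_name!r}"
        )
    return problems
```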

🔍 Diagnostic Workflow Standards

Step 0: Mandatory Context

Every skill MUST begin with a "Step 0" that collects the context required by all later steps.

Mandatory Fields: <project_id>, <location>, <cluster_name>, <timestamp>.
Optional/Case-by-Case Fields: <node_name>, <workload_name>, <workload_namespace>, <nodepool_name>.
Time Rule: Reject relative times (e.g., "5 minutes ago"); <timestamp> must be absolute. Calculate a window of [T - 30m] to [T + 30m], where T is that <timestamp>.
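Step 0 can be enforced before any query runs. The sketch below is one way to do it, under these assumptions: answers arrive as a dict keyed by the field names above, <timestamp> is ISO 8601 with an explicit UTC offset, and the computed window is stored under hypothetical `start_time` and `end_time` keys. Relative phrases fail to parse and are rejected rather than guessed at.

```python
from datetime import datetime, timedelta, timezone

MANDATORY_FIELDS = ("project_id", "location", "cluster_name", "timestamp")
OPTIONAL_FIELDS = ("node_name", "workload_name", "workload_namespace", "nodepool_name")
HALF_WINDOW = timedelta(minutes=30)
RFC3339_UTC = "%Y-%m-%dT%H:%M:%SZ"


def step0_context(answers: dict[str, str]) -> dict[str, str]:
    """Validate Step 0 answers and add the [T - 30m, T + 30m] query window.

    start_time and end_time are hypothetical key names for the computed window.
    """
    missing = [field for field in MANDATORY_FIELDS if not answers.get(field)]
    if missing:
        raise ValueError(f"Step 0 is missing mandatory fields: {missing}")

    # fromisoformat() only accepts absolute timestamps, so relative input such as
    # "5 minutes ago" raises ValueError here. A trailing "Z" is normalized for Python < 3.11.
    t = datetime.fromisoformat(answers["timestamp"].replace("Z", "+00:00"))
    if t.tzinfo is None:
        raise ValueError("timestamp must include an explicit UTC offset")
    t = t.astimezone(timezone.utc)

    context = {field: answers[field] for field in MANDATORY_FIELDS}
    context.update({f: answers[f] for f in OPTIONAL_FIELDS if answers.get(f)})
    context["start_time"] = (t - HALF_WINDOW).strftime(RFC3339_UTC)
    context["end_time"] = (t + HALF_WINDOW).strftime(RFC3339_UTC)
    return context
```

For example, a <timestamp> of 2026-05-04T14:12:00Z yields a window from 2026-05-04T13:42:00Z to 2026-05-04T14:42:00Z.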

Diagnostic Steps

Explicit Queries: Every step MUST provide a ready-to-use Cloud Logging (LQL) or Cloud Monitoring (PromQL) query.
Placeholder Syntax: Use angle brackets (e.g., <project_id>) instead of curly braces for placeholders, to avoid template resolution errors.
Risk Categorization: Label every step as [Low Risk] (Read-only) or [High Risk] (Mutative/Destructive).
Automation: For each step, specify whether the agent should proceed automatically or wait for user confirmation.
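The four rules above map naturally onto a small data structure. This is an illustrative sketch, not a required format: `DiagnosticStep`, its fields, and the `start_time`/`end_time` placeholders are hypothetical, and the example query only scopes Cloud Logging to one GKE node (the `k8s_node` resource type) and a time window, with no error filter, so it invents no failure signature.

```python
import re
from dataclasses import dataclass

# Matches <placeholder> tokens such as <project_id>. LQL comparison operators
# like >= and <= are not matched because "<" is not followed by a lowercase name and ">".
PLACEHOLDER = re.compile(r"<([a-z_]+)>")
RISK_LABELS = ("Low Risk", "High Risk")  # read-only vs. mutative/destructive


@dataclass(frozen=True)
class DiagnosticStep:
    title: str
    risk: str           # one of RISK_LABELS
    auto_proceed: bool  # True: run without asking; False: wait for user confirmation
    query: str          # ready-to-use LQL or PromQL with <placeholder> tokens

    def __post_init__(self) -> None:
        if self.risk not in RISK_LABELS:
            raise ValueError(f"risk must be one of {RISK_LABELS}, got {self.risk!r}")

    def header(self) -> str:
        mode = "proceed automatically" if self.auto_proceed else "wait for user confirmation"
        return f"[{self.risk}] {self.title} ({mode})"

    def render(self, context: dict[str, str]) -> str:
        """Fill every <placeholder>; refuse to emit a half-filled query."""
        missing = sorted({n for n in PLACEHOLDER.findall(self.query) if n not in context})
        if missing:
            raise KeyError(f"no value for placeholders: {missing}")
        return PLACEHOLDER.sub(lambda m: context[m.group(1)], self.query)


# Illustrative read-only step scoped to one node and the Step 0 time window.
NODE_LOGS = DiagnosticStep(
    title="Read node logs in the incident window",
    risk="Low Risk",
    auto_proceed=True,
    query=(
        'resource.type="k8s_node"\n'
        'resource.labels.project_id="<project_id>"\n'
        'resource.labels.cluster_name="<cluster_name>"\n'
        'resource.labels.node_name="<node_name>"\n'
        'timestamp>="<start_time>" AND timestamp<="<end_time>"'
    ),
)
```

Because `render` raises on any unfilled placeholder, the agent cannot run a query that still contains <project_id> or a similar token. Angle brackets also stay unambiguous inside PromQL, where curly braces already delimit label matchers.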

🛠️ Accuracy & Validation

Zero Hallucination

Never synthesize example logs or metrics.
Source signatures from real incidents and anonymize where necessary.
DO NOT EXTRAPOLATE: Only include steps and queries that were verified in the source conversation.
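Anonymizing a real signature should remove identifiers without rewording the error itself, since rewording would amount to synthesizing a new log. A minimal sketch, assuming the sensitive values are known in advance and that IPv4 addresses are the only pattern worth masking generically; `anonymize` and the <ip_address> placeholder are hypothetical.

```python
import re

# IPv4 addresses are replaced generically; every other value must be listed explicitly.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def anonymize(signature: str, identifiers: dict[str, str]) -> str:
    """Replace sensitive values in a real signature with <placeholder> tokens.

    identifiers maps each real value (e.g. a project ID) to a placeholder name.
    Longer values are replaced first so that one value embedded in another
    (a cluster name inside a node name, say) does not leave a partial match.
    The rest of the signature text is left exactly as observed.
    """
    for value in sorted(identifiers, key=len, reverse=True):
        signature = signature.replace(value, f"<{identifiers[value]}>")
    return IPV4.sub("<ip_address>", signature)
```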

Security & Privacy

No Raw Dumps: Do not instruct the agent to dump raw logs into shared spaces such as bug reports or chat.
Signal Only: Instruct the agent to summarize findings and report only high-signal information (e.g., "Found specific error pattern X on node Y").
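The "Signal Only" rule amounts to aggregating before reporting. The sketch below counts matches per node and emits one sentence per node, in the style of the example above. The simplified entry shape (`node_name` and `message` keys) is an assumption, not the Cloud Logging `LogEntry` schema, and the pattern is supplied by the caller, for instance one taken from references/failure_signatures.md.

```python
from collections import Counter


def summarize_matches(entries: list[dict[str, str]], pattern: str) -> list[str]:
    """Reduce raw log entries to one high-signal line per affected node.

    Raw messages are only counted, never echoed back into the report.
    """
    counts = Counter(
        e["node_name"] for e in entries if pattern in e.get("message", "")
    )
    return [
        f"Found error pattern {pattern!r} on node {node} (matching entries: {n})"
        for node, n in sorted(counts.items())
    ]
```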

Automated Validation

Every skill MUST include a script at scripts/validate_queries.sh that runs each of its LQL queries through query_logs or gcloud logging read ... --limit=1 to verify that the query is valid.
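The required file is a shell script; the loop it needs is sketched here in Python only to keep one language across these examples. The sketch assumes the queries already have their <placeholders> filled in, that the gcloud CLI is installed and authenticated, and that `gcloud logging read` exits with a non-zero status when a filter cannot be parsed. An empty result still counts as a pass, because only syntax is being checked. The query_logs path is not shown, and `build_command` and `validate_queries` are hypothetical names.

```python
import subprocess
from typing import Callable, Sequence


def build_command(lql_query: str, project_id: str) -> list[str]:
    """argv for a cheap syntax check that fetches at most one matching entry."""
    return ["gcloud", "logging", "read", lql_query, f"--project={project_id}", "--limit=1"]


def validate_queries(
    queries: Sequence[str],
    project_id: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[str]:
    """Return the queries that gcloud rejected (non-zero exit status).

    run is injectable so the loop can be exercised without calling gcloud.
    """
    failures = []
    for query in queries:
        result = run(build_command(query, project_id), capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(query)
    return failures
```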

📋 Best Practices

Conciseness: Keep instructions lean. Focus on "what to do" and "how to verify".
Public Ready: Remove all internal notes, personal bookmarks, and project-specific jargon.
Error Signatures: Explicitly link to references/failure_signatures.md in relevant diagnostic steps.
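Several textual rules in this guide (Step 0, risk labels, angle-bracket placeholders, and the failure_signatures.md link) can be approximated with a lint over SKILL.md. This is a coarse, document-level sketch: it cannot confirm that every individual step carries a label or a link, and `lint_skill_md` is a hypothetical helper.

```python
import re

# {name} with no "=" inside looks like a template placeholder. PromQL label
# matchers such as {cluster="x"} contain "=" and are therefore not flagged.
CURLY_PLACEHOLDER = re.compile(r"\{[a-z_]+\}")
RISK_LABEL = re.compile(r"\[(?:Low|High) Risk\]")


def lint_skill_md(text: str) -> list[str]:
    """Return guideline violations found in the text of a SKILL.md file."""
    problems = []
    if "Step 0" not in text:
        problems.append('no "Step 0" context section')
    if "references/failure_signatures.md" not in text:
        problems.append("no link to references/failure_signatures.md")
    for match in CURLY_PLACEHOLDER.finditer(text):
        name = match.group()[1:-1]
        problems.append(f"curly-brace placeholder {match.group()}; use <{name}> instead")
    if not RISK_LABEL.search(text):
        problems.append("no step is labeled [Low Risk] or [High Risk]")
    return problems
```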

Related skills (same repository: GoogleCloudPlatform/gke-mcp)

gke-ai-troubleshooting-tpu-connection-failure-vbar-oom.md
Diagnose and prevent `vbar_control_agent` segfaults and OOMs caused by race conditions during TPU device resets and frequent metrics collection (e.g. every 3s). Use when TPU slice initialization fails or `vbar_control_agent` crashes on TPU v6e nodes.
Updated: 2026-05-04

gke-productionize.md
Assists in preparing applications and clusters on GKE for production.
Updated: 2026-04-29

gke-app-onboarding.md
Workflows for containerizing and deploying applications to GKE for the first time.
Updated: 2026-04-29

gke-workload-security.md
Workflows for auditing and hardening the security of GKE workloads.
Updated: 2026-04-21

gke-cost-analysis.md
Answer natural language questions about GKE-related costs by leveraging BigQuery export and cost allocation data.
Updated: 2026-04-15

gke-cluster-creator.md
Guides the user through creating GKE clusters using pre-defined templates (Standard, Autopilot, GPU/AI).
Updated: 2026-04-13

Repository: GoogleCloudPlatform/gke-mcp

