cluster-resource-health

Name: Cluster Resource Health
Author: cnoe-io

Check Kubernetes cluster health including pod status, node conditions, resource utilization, and pending alerts across EKS clusters. Use when monitoring infrastructure health, investigating capacity issues, or performing cluster audits.


stars:366

forks:64

updated: April 15, 2026, 11:21

SKILL.md


name: cluster-resource-health
description: Check Kubernetes cluster health including pod status, node conditions, resource utilization, and pending alerts across EKS clusters. Use when monitoring infrastructure health, investigating capacity issues, or performing cluster audits.

Cluster Resource Health

Query AWS EKS clusters for node health, pod status, resource utilization, and alerts to produce a cluster health dashboard.

Instructions

Phase 1: Cluster Overview (AWS Agent)

List EKS clusters and their status
Check Kubernetes version - current vs. latest, end-of-support date
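
The Phase 1 overview could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: it assumes each cluster arrives as a dict carrying the `name`, `status`, and `version` fields of an EKS DescribeCluster response, and that the caller supplies the latest available version. `ClusterOverview`, `parse_minor`, and `summarize_clusters` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClusterOverview:
    name: str
    status: str
    version: str
    minor_versions_behind: int


def parse_minor(version: str) -> tuple[int, int]:
    """Split a Kubernetes version such as "1.29" into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def summarize_clusters(clusters: list[dict], latest_version: str) -> list[ClusterOverview]:
    """Build one overview row per cluster.

    Assumption: every dict has "name", "status" and "version" keys, as in
    the "cluster" object of an EKS DescribeCluster response.
    """
    latest_major, latest_minor = parse_minor(latest_version)
    rows = []
    for cluster in clusters:
        major, minor = parse_minor(cluster["version"])
        # Only compare minor versions within the same major version.
        behind = latest_minor - minor if major == latest_major else 0
        rows.append(
            ClusterOverview(
                name=cluster["name"],
                status=cluster["status"],
                version=cluster["version"],
                minor_versions_behind=max(behind, 0),
            )
        )
    return rows
```

The end-of-support date is left out here because it has to come from an external source that the text does not specify.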

Phase 2: Node Health

Inspect node conditions - Ready, MemoryPressure, DiskPressure, PIDPressure
Resource utilization per node - CPU, Memory, Pod count
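
One way to evaluate the node conditions listed above is to compare each condition against the status a healthy node would report. The sketch below assumes the input is the parsed JSON from `kubectl get nodes -o json`; `HEALTHY_STATUS`, `unhealthy_conditions`, and `node_health` are hypothetical names.

```python
from __future__ import annotations

# Condition type -> status string reported by a healthy node.
HEALTHY_STATUS = {
    "Ready": "True",
    "MemoryPressure": "False",
    "DiskPressure": "False",
    "PIDPressure": "False",
}


def unhealthy_conditions(node: dict) -> list[str]:
    """Return the condition types on one node that deviate from healthy."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        expected = HEALTHY_STATUS.get(cond.get("type"))
        # A status of "Unknown" also counts as a deviation.
        if expected is not None and cond.get("status") != expected:
            problems.append(cond["type"])
    return problems


def node_health(node_list: dict) -> dict[str, list[str]]:
    """Map node name -> unhealthy condition types for a NodeList document."""
    return {
        node["metadata"]["name"]: unhealthy_conditions(node)
        for node in node_list.get("items", [])
    }
```

A node maps to an empty list when every tracked condition is healthy, so a "12/12 Ready" style count can be derived by counting the nodes whose list does not contain `Ready`.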

Phase 3: Pod Health

Identify problematic pods - CrashLoopBackOff, ImagePullBackOff, OOMKilled, Pending
Namespace-level summary - pods running, pending, failed per namespace
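
The pod checks above might be implemented along these lines. The sketch assumes the input is the parsed JSON from `kubectl get pods --all-namespaces -o json`; the function names are hypothetical, and the logic only covers the four problem states the text names.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Waiting reasons that the skill text singles out as problematic.
WAITING_REASONS = {"CrashLoopBackOff", "ImagePullBackOff"}


def pod_problems(pod: dict) -> set[str]:
    """Return the problem labels that apply to a single pod."""
    status = pod.get("status", {})
    problems = set()
    if status.get("phase") == "Pending":
        problems.add("Pending")
    for cs in status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in WAITING_REASONS:
            problems.add(waiting["reason"])
        # OOMKilled is a termination reason; it may be on the current
        # state or on the previous run recorded in lastState.
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                problems.add("OOMKilled")
    return problems


def namespace_summary(pod_list: dict) -> dict[str, Counter]:
    """Count pods per phase (Running, Pending, Failed, ...) in each namespace."""
    summary: dict[str, Counter] = defaultdict(Counter)
    for pod in pod_list.get("items", []):
        namespace = pod["metadata"]["namespace"]
        phase = pod.get("status", {}).get("phase", "Unknown")
        summary[namespace][phase] += 1
    return dict(summary)
```

A pod stuck pulling its image is typically still in the `Pending` phase, so it can carry both the `Pending` and `ImagePullBackOff` labels at once.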

Phase 4: Resource Capacity Analysis

Cluster-wide utilization - total CPU/Memory requested vs. allocatable
Capacity risks - nodes at >80%, namespaces exceeding quotas
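
The capacity arithmetic can be sketched as below. The parsers handle only a subset of Kubernetes quantity notation (millicores for CPU, binary suffixes for memory), which is a simplifying assumption. The risk threshold is a parameter because the text cites 80% here and 85% under Guidelines. All function names are hypothetical.

```python
from __future__ import annotations


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity ("500m" or "2") to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


# Binary suffixes only; decimal suffixes such as "M" or "G" are not handled.
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity ("128Mi", "1Gi", or plain bytes) to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)


def utilization_pct(requested: float, allocatable: float) -> int:
    """Requested as a whole-number percentage of allocatable."""
    if allocatable <= 0:
        raise ValueError("allocatable must be positive")
    return round(100 * requested / allocatable)


def capacity_risks(
    nodes: dict[str, tuple[float, float]], threshold_pct: float = 80.0
) -> list[str]:
    """Names of nodes whose requested/allocatable ratio exceeds the threshold.

    `nodes` maps node name -> (requested, allocatable) in the same unit.
    """
    return sorted(
        name
        for name, (requested, allocatable) in nodes.items()
        if 100 * requested / allocatable > threshold_pct
    )
```

With the figures from the sample report, `utilization_pct(38, 48)` gives 79 and `utilization_pct(96, 128)` gives 75.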

Output Format

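```markdown
## Cluster Resource Health Report

### Cluster Summary

| Cluster | Version | Nodes | Status | Overall Health |
|---------|---------|-------|--------|----------------|
| prod-us-west-2 | 1.29 | 12/12 Ready | Active | HEALTHY |

### Resource Utilization

| Resource | Requested | Allocatable | Utilization |
|----------|-----------|-------------|-------------|
| CPU | 38 cores | 48 cores | 79% |
| Memory | 96 Gi | 128 Gi | 75% |
```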
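
A report in the format above could be assembled from already-collected rows with a small renderer like this one. It is a sketch only: the heading levels and the helper names `markdown_table` and `render_report` are assumptions, and the rows are expected to be pre-formatted strings.

```python
from __future__ import annotations


def markdown_table(headers: list[str], rows: list[list[str]]) -> str:
    """Render a pipe-delimited Markdown table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


def render_report(cluster_rows: list[list[str]], resource_rows: list[list[str]]) -> str:
    """Join the two report sections under their headings."""
    sections = [
        "## Cluster Resource Health Report",
        "### Cluster Summary",
        markdown_table(
            ["Cluster", "Version", "Nodes", "Status", "Overall Health"], cluster_rows
        ),
        "### Resource Utilization",
        markdown_table(
            ["Resource", "Requested", "Allocatable", "Utilization"], resource_rows
        ),
    ]
    return "\n\n".join(sections)
```

Keeping the table renderer separate from the section layout makes it straightforward to add further sections, such as a pod health table, without touching the formatting logic.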

Examples

"Check the health of our EKS clusters"
"Are there any failing pods in production?"
"Show me cluster resource utilization"
"Which nodes are under memory pressure?"

Guidelines

Check all clusters unless a specific cluster is requested
Flag any node above 85% resource utilization as a capacity risk
For CrashLoopBackOff pods, suggest checking logs as the immediate action
EKS version end-of-support should be flagged at least 90 days before EOL
Use kubectl read-only commands only (never modify cluster state during health checks)
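
Two of these guidelines lend themselves to simple guard functions. The sketch below is one possible reading, not the skill's own code: the allow-list of kubectl verbs is an assumption about what counts as read-only, and the end-of-support date must be supplied by the caller. `is_read_only` and `eol_flag_due` are hypothetical names.

```python
from __future__ import annotations

from datetime import date

# Assumed allow-list of kubectl verbs that do not modify cluster state.
READ_ONLY_VERBS = {
    "get", "describe", "logs", "top",
    "version", "cluster-info", "api-resources", "explain",
}


def is_read_only(kubectl_args: list[str]) -> bool:
    """True when the first non-flag argument is an allow-listed verb.

    Deliberately conservative: a flag value placed before the verb
    (for example ["-n", "kube-system", "get", "pods"]) is treated as the
    verb and rejected, so put flags after the verb.
    """
    for arg in kubectl_args:
        if arg.startswith("-"):
            continue
        return arg in READ_ONLY_VERBS
    return False


def eol_flag_due(end_of_support: date, today: date, lead_days: int = 90) -> bool:
    """True once the end-of-support date is within `lead_days` days (or past)."""
    return (end_of_support - today).days <= lead_days
```

For example, `is_read_only(["get", "pods", "-A"])` passes while `is_read_only(["delete", "pod", "api"])` does not.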


package.json

"author": "cnoe-io"

"repository": "cnoe-io/ai-platform-engineering"


$ useful --forSOC

Network and Computer Systems Administrators, Computer and Mathematical Occupations, 15-1244, L4

