network-rca

Name: Network Rca
Author: kubeshark

Description: Kubernetes network root cause analysis skill powered by Kubeshark MCP. Use this skill whenever the user wants to investigate past incidents, perform retrospective traffic analysis, take or manage traffic snapshots, extract PCAPs, dissect L7 API calls from historical captures, compare traffic patterns over time, detect drift or anomalies between snapshots, or do any kind of forensic network analysis in Kubernetes. Also trigger when the user mentions snapshots, raw capture, PCAP extraction, traffic replay, postmortem analysis, "what happened yesterday/last week", root cause analysis, RCA, cloud snapshot storage, snapshot dissection, or KFL filters for historical traffic. Even if the user just says "figure out what went wrong" or "compare today's traffic to yesterday" in a Kubernetes context, use this skill.


stars:11,912

forks:538

updated: May 19, 2026 05:53


package.json

"author": "kubeshark"

"repository": "kubeshark/kubeshark"



Network Root Cause Analysis with Kubeshark MCP

You are a Kubernetes network forensics specialist. Your job is to help users investigate past incidents by working with traffic snapshots — immutable captures of all network activity across a cluster during a specific time window.

Kubeshark is a search engine for network traffic. Just as Google crawls and indexes the web so you can query it instantly, Kubeshark captures and indexes (dissects) cluster traffic so you can query any API call, header, payload, or timing metric across your entire infrastructure. Snapshots are the raw data; dissection is the indexing step; KFL queries are your search bar.

Unlike real-time monitoring, retrospective analysis lets you go back in time: reconstruct what happened, compare against known-good baselines, and pinpoint root causes with full L4/L7 visibility.

Timezone Handling

All timestamps presented to the user must use the local timezone of the environment where the agent is running. Users think in local time ("this happened around 3pm"), and UTC-only output adds friction during incident response when speed matters.

Rules

Detect the local timezone at the start of every investigation. Use the system clock or environment (e.g., date +%Z or equivalent) to determine the timezone.
Present local time as the primary reference in all output — summaries, event correlations, time-range references, and tables.
Show UTC in parentheses for clarity, e.g., 15:03:22 IST (13:03:22 UTC).
Convert tool responses — Kubeshark MCP tools return timestamps in UTC. Always convert these to local time before presenting to the user.
Use local time in natural language — when describing events, say "the spike at 3:23 PM" not "the spike at 12:23 UTC".
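
The display rule above can be sketched as a small formatting helper. This is an illustration only: the function name is hypothetical, and the fixed UTC+2 zone labelled IST is an assumption chosen to mirror the sample output shown in this skill.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def format_local_with_utc(ts_utc: datetime, local_tz: tzinfo) -> str:
    """Render an aware timestamp as 'HH:MM:SS TZ (HH:MM:SS UTC)'."""
    if ts_utc.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    local = ts_utc.astimezone(local_tz)
    utc = ts_utc.astimezone(timezone.utc)
    return f"{local:%H:%M:%S %Z} ({utc:%H:%M:%S} UTC)"


# Assumption: a fixed UTC+2 zone labelled "IST", used only for illustration.
# In a live session, datetime.now().astimezone().tzinfo returns the
# system's own zone instead.
IST = timezone(timedelta(hours=2), "IST")
oldest = datetime(2026, 3, 14, 16, 12, 34, tzinfo=timezone.utc)
formatted = format_local_with_utc(oldest, IST)
print(formatted)  # 18:12:34 IST (16:12:34 UTC)
```

Rejecting naive datetimes up front avoids silently treating a UTC value as local time, which is the mistake these rules are meant to prevent.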

Snapshot Creation

When creating snapshots, Kubeshark MCP tools accept UTC timestamps. Convert the user's local time references to UTC before passing them to tools like create_snapshot or export_snapshot_pcap. Confirm the converted window with the user if there's any ambiguity.
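
The reverse conversion might look like the sketch below. The helper name is hypothetical, and the ISO 8601 output is an assumption: this skill does not state the exact timestamp format that create_snapshot or export_snapshot_pcap expect, so adapt the final formatting to the tool schema.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def local_window_to_utc(start_local: datetime, end_local: datetime,
                        tz: tzinfo) -> tuple[str, str]:
    """Interpret naive local times in tz and return ISO 8601 UTC strings."""
    if end_local <= start_local:
        raise ValueError("window end must be after window start")
    start = start_local.replace(tzinfo=tz).astimezone(timezone.utc)
    end = end_local.replace(tzinfo=tz).astimezone(timezone.utc)
    return start.isoformat(), end.isoformat()


# Assumption: the user is in a UTC+2 zone and reports "around 3pm".
IST = timezone(timedelta(hours=2), "IST")
start_utc, end_utc = local_window_to_utc(
    datetime(2026, 3, 14, 15, 0), datetime(2026, 3, 14, 15, 30), IST
)
print(start_utc, end_utc)
```

Echoing both the local and the converted UTC window back to the user before creating the snapshot is a cheap way to catch an off-by-one-hour mistake.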

Prerequisites

Before starting any analysis, verify the environment is ready.

Kubeshark MCP Health Check

Confirm the Kubeshark MCP is accessible and tools are available. Look for tools like list_api_calls, list_l4_flows, create_snapshot, etc.

Tool: check_kubeshark_status

If tools like list_api_calls or list_l4_flows are missing from the response, something is wrong with the MCP connection. Guide the user through setup (see Setup Reference at the bottom).

Raw Capture Must Be Enabled

Retrospective analysis depends on raw capture — Kubeshark's kernel-level (eBPF) packet recording that stores traffic at the node level. Without it, snapshots have nothing to work with.

Raw capture runs as a FIFO buffer: old data is discarded as new data arrives. The buffer size determines how far back you can go. Larger buffer = wider snapshot window.

tap:
  capture:
    raw:
      enabled: true
      storageSize: 10Gi    # Per-node FIFO buffer

If raw capture isn't enabled, inform the user that retrospective analysis requires it and share the configuration above.
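
To give the user a rough idea of how far back a given buffer reaches, the size-to-window relationship can be estimated as below. This is a back-of-the-envelope sketch: the 2 MiB/s per-node traffic rate is a made-up figure, and the calculation assumes the whole buffer holds packet data with no storage overhead.

```python
GIB = 1024 ** 3  # Kubernetes "Gi" suffix
MIB = 1024 ** 2


def retention_seconds(storage_bytes: int, bytes_per_second: float) -> float:
    """Approximate how many seconds of traffic fit in a FIFO buffer."""
    if bytes_per_second <= 0:
        raise ValueError("traffic rate must be positive")
    return storage_bytes / bytes_per_second


# Hypothetical node: 10Gi buffer, sustained 2 MiB/s of captured traffic.
window_s = retention_seconds(10 * GIB, 2 * MIB)
print(f"{window_s:.0f} s (~{window_s / 60:.0f} min)")  # 5120 s (~85 min)
```

Real traffic is bursty, so treat the result as an order-of-magnitude guide and confirm the actual window with get_data_boundaries.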

Snapshot Storage

Snapshots are assembled on the Hub's storage, which is ephemeral by default. For serious forensic work, persistent storage is recommended:

tap:
  snapshots:
    local:
      storageClass: gp2
      storageSize: 1000Gi

Core Workflow

Every investigation starts with a snapshot and then follows one of two investigation routes, depending on your goal. The overall sequence:

Determine time window — When did the issue occur? Use get_data_boundaries to see what raw capture data (L4) is available.
Check the L7 (dissected) window — Before any KFL query on live data, call get_l7_data_boundaries. It returns the per-node + cluster-wide range of dissected API call data plus a dissection_enabled flag. Treat L4 (get_data_boundaries) as the snapshot/PCAP window and L7 (get_l7_data_boundaries) as the KFL-query window — they can differ significantly because L7 only starts producing entries once dissection is enabled (existing raw capture is not retroactively dissected).
Create or locate a snapshot — Either take a new snapshot covering the incident window, or find an existing one with list_snapshots.
Choose your investigation route — PCAP or Dissection (see below).

Choosing the Right Route

Aspect	PCAP Route	Dissection Route
Speed	Immediate — no indexing needed	Takes time to index
Filtering	Nodes, time window, BPF filters	Kubernetes & API-level (pods, labels, paths, status codes)
Output	Cluster-wide PCAP files	Structured query results
Investigation by	Human (Wireshark)	AI agent or human (queryable database)
Best for	Compliance, sharing with network teams, Wireshark deep-dives	Root cause analysis, API-level debugging, automated investigation

Both routes are valid and complementary. Use PCAP when you need raw packets for human analysis or compliance. Use Dissection when you want an AI agent to search and analyze traffic programmatically.

Default to Dissection. Unless the user explicitly asks for a PCAP file or Wireshark export, assume Dissection is needed. Any question about workloads, APIs, services, pods, error rates, latency, or traffic patterns requires dissected data.

Snapshot Operations

Both routes start here. A snapshot is an immutable freeze of all cluster traffic in a time window.

Check Data Boundaries

Tool: get_data_boundaries

Check what raw capture data exists across the cluster. You can only create snapshots within these boundaries — data outside the window has been rotated out of the FIFO buffer.

Example response (raw tool output is in UTC — convert to local time before presenting):

Cluster-wide:
  Oldest: 2026-03-14 18:12:34 IST (16:12:34 UTC)
  Newest: 2026-03-14 20:05:20 IST (18:05:20 UTC)

Per node:
  ┌─────────────────────────────┬───────────────────────────────┬───────────────────────────────┐
  │            Node             │            Oldest             │            Newest             │
  ├─────────────────────────────┼───────────────────────────────┼───────────────────────────────┤
  │ ip-10-0-25-170.ec2.internal │ 18:12:34 IST (16:12:34 UTC)  │ 20:03:39 IST (18:03:39 UTC)  │
  │ ip-10-0-32-115.ec2.internal │ 18:13:45 IST (16:13:45 UTC)  │ 20:05:20 IST (18:05:20 UTC)  │
  └─────────────────────────────┴───────────────────────────────┴───────────────────────────────┘

If the incident falls outside the available window, the data has been rotated out. Suggest increasing storageSize for future coverage.
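
A boundary check of this kind can be sketched as follows. The function name and the dictionary shape are hypothetical (the real tool response layout may differ); the node names and times are taken from the example response above, expressed in UTC.

```python
from datetime import datetime, timezone


def utc(hour: int, minute: int, second: int) -> datetime:
    """Build a UTC timestamp on the example date."""
    return datetime(2026, 3, 14, hour, minute, second, tzinfo=timezone.utc)


def nodes_covering(window, node_bounds):
    """Return nodes whose [oldest, newest] range fully contains the window."""
    start, end = window
    return [
        name
        for name, (oldest, newest) in node_bounds.items()
        if oldest <= start and end <= newest
    ]


nodes = {
    "ip-10-0-25-170.ec2.internal": (utc(16, 12, 34), utc(18, 3, 39)),
    "ip-10-0-32-115.ec2.internal": (utc(16, 13, 45), utc(18, 5, 20)),
}
incident = (utc(16, 13, 0), utc(16, 30, 0))
covering = nodes_covering(incident, nodes)
print(covering)  # ['ip-10-0-25-170.ec2.internal']
```

Here the second node started capturing 45 seconds after the incident window opens, so only the first node covers it completely. Whether partial coverage is acceptable is a judgment call to raise with the user.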

Check L7 (Dissected) Data Boundaries

Tool: get_l7_data_boundaries

Check what dissected L7 entries exist across the cluster. This is the pre-flight check before any KFL query against live data. The response contains:

dissection_enabled: if false, KFL queries on live data return nothing, regardless of the L4 boundaries. Enabling dissection applies only going forward; raw capture that already exists is not retroactively dissected.
cluster.oldest_ts / cluster.newest_ts: cluster-wide window where KFL on live data has any chance of returning results.
nodes[].oldest_ts / nodes[].newest_ts: per-node windows for narrowing queries.

Key distinction:

Aspect	L4 (`get_data_boundaries`)	L7 (`get_l7_data_boundaries`)
Data	Raw PCAP capture	Dissected API call entries
Useful for	Snapshots, PCAP extraction	KFL queries
Backfill	Comes from FIFO ring buffer	Only forward from the moment dissection is enabled

If the user is asking an API-level question and dissection_enabled is false, enable it first — but tell the user they will only see entries captured after enabling, never the historical window.
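
The pre-flight decision can be sketched as a simple overlap test. The field names dissection_enabled, cluster.oldest_ts, and cluster.newest_ts come from the description above; the nested-dict shape and the use of numeric epoch timestamps are assumptions, as is the function name.

```python
def live_kfl_can_cover(l7: dict, start_ts: float, end_ts: float) -> bool:
    """True if live dissected data overlaps the requested window at all."""
    if not l7.get("dissection_enabled", False):
        return False
    cluster = l7.get("cluster") or {}
    oldest = cluster.get("oldest_ts")
    newest = cluster.get("newest_ts")
    if oldest is None or newest is None:
        return False
    # Two closed intervals overlap when each starts before the other ends.
    return start_ts <= newest and end_ts >= oldest


# Illustrative values only (assumed epoch seconds).
l7_response = {
    "dissection_enabled": True,
    "cluster": {"oldest_ts": 1000, "newest_ts": 2000},
}
print(live_kfl_can_cover(l7_response, 1500, 2500))  # True
print(live_kfl_can_cover(l7_response, 100, 900))    # False
```

A False result does not rule out the investigation: a snapshot of the L4 window can still be created and handled through the PCAP route.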

Create a Snapshot

Tool: create_snapshot

Specify nodes (or cluster-wide) and a time window within the data boundaries. Snapshots include raw capture files, Kubernetes pod events, and eBPF cgroup events.

Snapshots take time to build. Check status with get_snapshot — wait until completed before proceeding with either route.
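
Waiting for a snapshot can be expressed as a bounded polling loop. In this sketch the status fetcher is injected as a plain callable, since how get_snapshot is invoked depends on the MCP client; the "completed" value follows the wording above, while the other status strings, the interval, and the timeout are assumptions.

```python
import time
from typing import Callable


def wait_for_status(fetch_status: Callable[[], str],
                    done: str = "completed",
                    interval_s: float = 5.0,
                    timeout_s: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll fetch_status until it returns done; False if the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_status() == done:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Stand-in for repeated get_snapshot calls (hypothetical status sequence).
fake_statuses = iter(["pending", "in_progress", "completed"])
finished = wait_for_status(lambda: next(fake_statuses), sleep=lambda _: None)
print(finished)  # True
```

A timeout keeps the agent from waiting indefinitely on a snapshot that failed or was never started.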

List Existing Snapshots

Tool: list_snapshots

Shows all snapshots on the local Hub, with name, size, status, and node count.

Cloud Storage

Snapshots on the Hub are ephemeral. Cloud storage (S3, GCS, Azure Blob) provides long-term retention. Snapshots can be downloaded to any cluster with Kubeshark — not necessarily the original one.

Check cloud status: get_cloud_storage_status
Upload to cloud: upload_snapshot_to_cloud
Download from cloud: download_snapshot_from_cloud

Route 1: PCAP

The PCAP route does not require dissection. It works directly with the raw snapshot data to produce filtered, cluster-wide PCAP files. Use this route when:

You need raw packets for Wireshark analysis
You're sharing captures with network teams
You need evidence for compliance or audit
A human will perform the investigation (not an AI agent)

Filtering a PCAP

Tool: export_snapshot_pcap

Filter the snapshot down to what matters using:

Nodes — specific cluster nodes only
Time — sub-window within the snapshot
BPF filter — standard Berkeley Packet Filter syntax (e.g., host 10.0.53.101, port 8080, net 10.0.0.0/16)

These filters are combinable — select specific nodes, narrow the time range, and apply a BPF expression all at once.

Workload-to-BPF Workflow

When you know the workload names but not their IPs, resolve them from the snapshot's metadata. Snapshots preserve pod-to-IP mappings from capture time, so resolution is accurate even if pods have been rescheduled since.

Tool: list_workloads

Use list_workloads with name + namespace for a singular lookup (works live and against snapshots), or with snapshot_id + filters for a broader scan.

Example workflow (singular lookup), extracting a PCAP for specific workloads:

Resolve IPs: list_workloads with name: "orders-594487879c-7ddxf", namespace: "prod" → IPs: ["10.0.53.101"]
Resolve IPs: list_workloads with name: "payment-service-6b8f9d-x2k4p", namespace: "prod" → IPs: ["10.0.53.205"]
Build BPF: host 10.0.53.101 or host 10.0.53.205
Export: export_snapshot_pcap with that BPF filter

Example workflow (filtered scan), extracting a PCAP for all workloads in a snapshot that match a pattern:

List workloads: list_workloads with snapshot_id, namespaces: ["prod"], name_regex: "payment.*" → returns all matching workloads with their IPs
Collect all IPs from the response
Build BPF: host 10.0.53.205 or host 10.0.53.210 or ...
Export: export_snapshot_pcap with that BPF filter

This gives you a cluster-wide PCAP filtered to exactly the workloads involved in the incident — ready for Wireshark or long-term storage.
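
The "Build BPF" step in both workflows amounts to joining host clauses with or. A minimal sketch, with a hypothetical helper name, that also validates and de-duplicates the addresses collected from list_workloads:

```python
import ipaddress


def build_bpf_host_filter(ips):
    """Join IPs into a BPF expression such as 'host A or host B'."""
    unique = []
    for ip in ips:
        # ip_address() raises ValueError on anything that is not a valid IP.
        normalized = str(ipaddress.ip_address(ip))
        if normalized not in unique:
            unique.append(normalized)
    if not unique:
        raise ValueError("at least one IP is required")
    return " or ".join(f"host {ip}" for ip in unique)


bpf = build_bpf_host_filter(["10.0.53.101", "10.0.53.205", "10.0.53.101"])
print(bpf)  # host 10.0.53.101 or host 10.0.53.205
```

Validating each address before building the expression keeps a malformed value from producing a filter that fails or, worse, silently matches the wrong traffic.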

IP-to-Workload Resolution

When you have an IP address (e.g., from a PCAP or L4 flow) and need to identify the workload behind it:

Tool: list_ips

Use list_ips with ip for a singular lookup (works live and against snapshots), or with snapshot_id + filters for a broader scan.

Example — singular lookup: list_ips with ip: "10.0.53.101", snapshot_id: "snap-abc" → returns pod/service identity for that IP.

Example — filtered scan: list_ips with snapshot_id: "snap-abc", namespaces: ["prod"], labels: {"app": "payment"} → returns all IPs associated with workloads matching those filters.

Route 2: Dissection

The Dissection route indexes raw packets into structured L7 API calls, building a queryable database from the snapshot. Use this route when:

An AI agent is performing the investigation
You need to search by Kubernetes context (pods, namespaces, labels, services)
You need to search by API elements (paths, status codes, headers, payloads)
You want structured responses you can analyze programmatically
You need to drill into the payload of a specific API call

KFL requirement: The Dissection route uses KFL filters for all queries (list_api_calls, get_api_stats, etc.). Before constructing any KFL filter, load the KFL skill (skills/kfl/). KFL is statically typed — incorrect field names or syntax will fail silently or error. If the KFL skill is not available, suggest the user install it:

ln -s /path/to/kubeshark/skills/kfl ~/.claude/skills/kfl

If the KFL skill cannot be loaded, only use the exact filter examples shown in this skill. Do not improvise or guess at field names, operators, or syntax. KFL field names differ from what you might expect (e.g., status_code not response.status, src.pod.namespace not src.namespace). Using incorrect fields produces wrong results without warning.

Dissection Is Required — Do Not Skip This

Any question about workloads, Kubernetes resources, services, pods, namespaces, or API calls requires dissection. Only the PCAP route works without it. If the user asks anything about traffic content, API behavior, error rates, latency, or service-to-service communication, you must ensure dissection is active before attempting to answer.

Do not wait for dissection to complete on its own — it will not start by itself.

Follow this sequence every time before using list_api_calls, get_api_call, or get_api_stats:

Check status: Call get_snapshot_dissection_status (or list_snapshot_dissections) to see if a dissection already exists for this snapshot.
If dissection exists and is completed — proceed with your query. No further action needed.
If dissection is in progress — wait for it to complete, then proceed.
If no dissection exists — you must call start_snapshot_dissection to trigger it. Then monitor progress with get_snapshot_dissection_status until it completes.

Never assume dissection is running. Never wait for a dissection that was not started. The agent is responsible for triggering dissection when it is missing.
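
The sequence above reduces to a three-way decision. In this sketch, None stands for "no dissection exists", and the status strings are assumptions: only the completed and in-progress states are described here, and the actual values returned by get_snapshot_dissection_status may be spelled differently.

```python
from typing import Optional


def dissection_next_step(status: Optional[str]) -> str:
    """Map a dissection status to the next action the agent should take."""
    if status is None:
        # Nothing has been started, and it will not start by itself.
        return "start_snapshot_dissection"
    if status == "completed":
        return "query"
    # Any other status is treated as still running.
    return "wait"


for observed in (None, "in_progress", "completed"):
    print(observed, "->", dissection_next_step(observed))
```

Making the "no dissection" case explicit guards against the failure mode called out above: waiting on a dissection that was never triggered.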

Tool: start_snapshot_dissection

Dissection takes time proportional to snapshot size — it parses every packet, reassembles streams, and builds the index. After completion, these tools become available:

list_api_calls — Search API transactions with KFL filters
get_api_call — Drill into a specific call (headers, body, timing, payload)
get_api_stats — Aggregated statistics (throughput, error rates, latency)

Every Question Is a Query

Every user prompt that involves APIs, workloads, services, pods, namespaces, or Kubernetes semantics should translate into a list_api_calls call with an appropriate KFL filter. Do not answer from memory or prior results — always run a fresh query that matches what the user is asking.

Examples of user prompts and the queries they should trigger:

User says	Action
"Show me all 500 errors"	`list_api_calls` with KFL: `http && status_code == 500`
"What's hitting the payment service?"	`list_api_calls` with KFL: `dst.service.name == "payment-service"`
"Any DNS failures?"	`list_api_calls` with KFL: `dns && status_code != 0`
"Show traffic from namespace prod to staging"	`list_api_calls` with KFL: `src.pod.namespace == "prod" && dst.pod.namespace == "staging"`
"What are the slowest API calls?"	`list_api_calls` with KFL: `http && elapsed_time > 5000000`

The user's natural language maps to KFL. Your job is to translate intent into the right filter and run the query — don't summarize old results or speculate without fresh data.

Investigation Strategy

Start broad, then narrow:

get_api_stats — Get the overall picture: error rates, latency percentiles, throughput. Look for spikes or anomalies.
list_api_calls filtered by error codes (4xx, 5xx) or high latency — find the problematic transactions.
get_api_call on specific calls — inspect headers, bodies, timing, and full payload to understand what went wrong.
Use KFL filters to slice by namespace, service, protocol, or any combination.

Example list_api_calls response (filtered to http && status_code >= 500, timestamps converted from UTC to local):

┌──────────────────────────────────────────┬────────┬──────────────────────────┬────────┬───────────┐
│                Timestamp                 │ Method │           URL            │ Status │  Elapsed  │
├──────────────────────────────────────────┼────────┼──────────────────────────┼────────┼───────────┤
│ 2026-03-14 19:23:45 IST (17:23:45 UTC)  │ POST   │ /api/v1/orders/charge    │ 503    │ 12,340 ms │
│ 2026-03-14 19:23:46 IST (17:23:46 UTC)  │ POST   │ /api/v1/orders/charge    │ 503    │ 11,890 ms │
│ 2026-03-14 19:23:48 IST (17:23:48 UTC)  │ GET    │ /api/v1/inventory/check  │ 500    │  8,210 ms │
│ 2026-03-14 19:24:01 IST (17:24:01 UTC)  │ POST   │ /api/v1/payments/process │ 502    │ 30,000 ms │
└──────────────────────────────────────────┴────────┴──────────────────────────┴────────┴───────────┘
Src: api-gateway (prod)  →  Dst: payment-service (prod)

Use the pattern of repeated failures and high latency to identify the failing service chain, then drill into individual calls with get_api_call.
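
One way to surface that pattern is to group the failing calls by URL and compare counts and latency. The record keys below (url, status, elapsed_ms) are hypothetical; the values are the four rows from the example response above.

```python
from collections import defaultdict
from statistics import mean


def summarize_failures(calls):
    """Group calls by URL; return count and mean latency, busiest first."""
    by_url = defaultdict(list)
    for call in calls:
        by_url[call["url"]].append(call["elapsed_ms"])
    summary = {
        url: {"count": len(times), "mean_elapsed_ms": mean(times)}
        for url, times in by_url.items()
    }
    return dict(sorted(summary.items(),
                       key=lambda item: item[1]["count"], reverse=True))


calls = [
    {"url": "/api/v1/orders/charge", "status": 503, "elapsed_ms": 12340},
    {"url": "/api/v1/orders/charge", "status": 503, "elapsed_ms": 11890},
    {"url": "/api/v1/inventory/check", "status": 500, "elapsed_ms": 8210},
    {"url": "/api/v1/payments/process", "status": 502, "elapsed_ms": 30000},
]
failure_summary = summarize_failures(calls)
for url, stats in failure_summary.items():
    print(url, stats)
```

The endpoint with the most failures is a reasonable first candidate for get_api_call, while a single very slow call (30,000 ms here) may point to a timeout further down the chain.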

KFL Filters for Dissected Traffic

Layer filters progressively when investigating:

// Step 1: Protocol + namespace
http && dst.pod.namespace == "production"

// Step 2: Add error condition
http && dst.pod.namespace == "production" && status_code >= 500

// Step 3: Narrow to service
http && dst.pod.namespace == "production" && status_code >= 500 && dst.service.name == "payment-service"

// Step 4: Narrow to endpoint
http && dst.pod.namespace == "production" && status_code >= 500 && dst.service.name == "payment-service" && path.contains("/charge")
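
The layering above is plain conjunction, so the progressive filters can be generated from a list of clauses. The helper name is hypothetical; the clauses are copied from the steps above and no new KFL fields are introduced.

```python
def kfl_and(*clauses: str) -> str:
    """Join non-empty KFL clauses with '&&'.

    Clauses that contain '||' should be parenthesized by the caller.
    """
    parts = [clause.strip() for clause in clauses if clause and clause.strip()]
    if not parts:
        raise ValueError("at least one clause is required")
    return " && ".join(parts)


clauses = [
    'http',
    'dst.pod.namespace == "production"',
    'status_code >= 500',
    'dst.service.name == "payment-service"',
    'path.contains("/charge")',
]
# Step 1 uses the first two clauses; each later step adds one more.
layered = [kfl_and(*clauses[:n]) for n in range(2, len(clauses) + 1)]
for step, kfl in enumerate(layered, start=1):
    print(f"Step {step}: {kfl}")
```

Running the broader filter first shows how many entries each added clause removes, which helps confirm that a narrow filter is empty for the right reason.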

Other common RCA filters:

dns && dns_response && status_code != 0              // Failed DNS lookups
src.service.namespace != dst.service.namespace        // Cross-namespace traffic
http && elapsed_time > 5000000                        // Slow transactions (> 5s)
conn && conn_state == "open" && conn_local_bytes > 1000000  // High-volume connections

Combining Both Routes

The two routes are complementary. A common pattern:

Start with Dissection — let the AI agent search and identify the root cause
Once you've pinpointed the problematic workloads, use list_workloads to get their IPs (singular lookup by name+namespace, or filtered scan by namespace/regex/labels against the snapshot)
Switch to PCAP — export a filtered PCAP of just those workloads for Wireshark deep-dive, sharing with the network team, or compliance archival

Use Cases

Post-Incident RCA

Identify the incident time window from alerts, logs, or user reports
Check get_data_boundaries — is the window still in raw capture (L4)?
Check get_l7_data_boundaries — was dissection enabled at that time, and does the window overlap with the L7 entry range? If dissection_enabled is false or the window predates the L7 range, the Dissection route is limited to whatever entries exist now — falling back to the PCAP route is often the right call.
create_snapshot covering the incident window (add a 15-minute buffer)
Dissection route: start_snapshot_dissection → get_api_stats → list_api_calls → get_api_call → follow the dependency chain
PCAP route: list_workloads → export_snapshot_pcap with BPF → hand off to Wireshark or archive
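
The buffered snapshot window can be computed and clamped to the available data in one step. This sketch assumes the 15-minute buffer is applied on both sides of the incident, which the steps above do not specify; the function name is hypothetical, and the boundaries reuse the example values from Check Data Boundaries (in UTC).

```python
from datetime import datetime, timedelta, timezone


def padded_window(start, end, oldest, newest, pad=timedelta(minutes=15)):
    """Widen [start, end] by pad on each side, clamped to [oldest, newest]."""
    padded_start = max(start - pad, oldest)
    padded_end = min(end + pad, newest)
    if padded_start >= padded_end:
        raise ValueError("incident window lies outside the available capture data")
    return padded_start, padded_end


def utc(hour, minute, second=0):
    return datetime(2026, 3, 14, hour, minute, second, tzinfo=timezone.utc)


oldest, newest = utc(16, 12, 34), utc(18, 5, 20)
snap_start, snap_end = padded_window(utc(17, 20), utc(17, 30), oldest, newest)
print(snap_start.time(), snap_end.time())  # 17:05:00 17:45:00
```

Clamping matters near the edges of the FIFO buffer, where the padded start may already have been rotated out.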

Other Use Cases

Trend analysis — Take snapshots at regular intervals and compare get_api_stats across them to detect latency drift, error rate changes, or new service-to-service connections.
Forensic preservation — create_snapshot + upload_snapshot_to_cloud for immutable, long-term evidence. Downloadable to any cluster months later.
Production-to-local replay — Upload a production snapshot to cloud, download it on a local KinD cluster, and investigate safely.
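
For trend analysis, comparing two sets of statistics can be reduced to a relative-change check. The metric names, the values, and the 20% threshold below are all invented for illustration; the actual shape of a get_api_stats response is not described in this skill.

```python
def relative_drift(baseline: dict, current: dict, threshold: float = 0.2) -> dict:
    """Return metrics whose relative change from baseline meets the threshold."""
    drift = {}
    for metric, base in baseline.items():
        if metric not in current or base == 0:
            continue  # cannot compute a relative change
        change = (current[metric] - base) / base
        if abs(change) >= threshold:
            drift[metric] = change
    return drift


# Hypothetical stats from two snapshots taken a day apart.
yesterday = {"error_rate_pct": 1.0, "p99_latency_ms": 200.0, "requests_per_s": 500.0}
today = {"error_rate_pct": 3.0, "p99_latency_ms": 210.0, "requests_per_s": 480.0}
drifted = relative_drift(yesterday, today)
print(drifted)  # {'error_rate_pct': 2.0}
```

A flagged metric is a starting point for a list_api_calls query against the newer snapshot, not a conclusion in itself.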

Setup Reference

For CLI installation, MCP configuration, verification, and troubleshooting, see references/setup.md.
