networkpolicy-debug

Name: Networkpolicy Debug
Author: scitix

Diagnose NetworkPolicy-related connectivity issues (traffic unexpectedly blocked, default-deny effects, egress blocking DNS). Identifies which NetworkPolicies affect a pod, checks ingress/egress rules, and verifies CNI support.

updated:March 13, 2026 at 02:44


"author": "scitix"

"repository": "scitix/siclaw"


name	networkpolicy-debug
description	Diagnose NetworkPolicy-related connectivity issues (traffic unexpectedly blocked, default-deny effects, egress blocking DNS). Identifies which NetworkPolicies affect a pod, checks ingress/egress rules, and verifies CNI support.

NetworkPolicy Connectivity Diagnosis

When pod-to-pod or pod-to-external communication is unexpectedly blocked, and Service/DNS/Ingress diagnostics show no issues, NetworkPolicy is a common root cause. Follow this flow to identify whether a NetworkPolicy is blocking traffic.

Scope: This skill is for diagnosis only. Once you identify the root cause, report it to the user and stop. Do NOT attempt to modify or delete NetworkPolicies — that should be left to the user or cluster administrator.

When to use: Pod connectivity "suddenly broke" or a newly deployed pod cannot reach other services. Typical clues:

service-debug shows endpoints exist and ports match, but connections time out
dns-debug shows DNS timeouts (may be egress NetworkPolicy blocking UDP 53)
Traffic works from some pods but not others in the same namespace
A new NetworkPolicy was recently applied

Not for other network issues: If the problem is DNS resolution → use dns-debug. If the problem is Service having no endpoints → use service-debug. If the problem is Ingress routing → use ingress-debug. This skill specifically diagnoses NetworkPolicy-level blocking.

Key Concepts

A NetworkPolicy selects pods via podSelector and defines allowed ingress (incoming) and/or egress (outgoing) traffic rules.
NetworkPolicy is deny-by-default once applied. If any NetworkPolicy selects a pod for a given direction (ingress or egress), all traffic in that direction is denied EXCEPT what is explicitly allowed by the rules. Pods with NO NetworkPolicy selecting them allow all traffic.
Multiple NetworkPolicies selecting the same pod are additive (union) — a connection is allowed if ANY matching policy permits it.
NetworkPolicy requires CNI support. If the CNI plugin does not support NetworkPolicy (e.g., Flannel without additional plugins), policies are silently ignored — they can be created but have no effect.
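hostNetwork: true pods are a special case. The Kubernetes API leaves NetworkPolicy behavior for pods in the host network namespace undefined. Most CNI plugins cannot tell their traffic apart from node traffic, so such pods are ignored by podSelector and namespaceSelector matching, both as targets and as sources, and a default-deny policy neither protects nor restricts them. A few CNIs can identify hostNetwork pod traffic and apply policy to it, so confirm the behavior for the CNI in use.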
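The "no policy means allow all" and "policies are additive" rules can be sketched in a few lines of Python. This is a deliberately simplified model, not how any CNI implements it: each policy that selects the pod is reduced to a hypothetical set of allowed ports, and peers are ignored.

```python
def ingress_allowed(port, selecting_policies):
    """Return True if `port` is reachable on a pod.

    `selecting_policies` holds one set of allowed ports per policy that
    selects the pod for ingress (a simplified, hypothetical representation).
    """
    if not selecting_policies:
        # No policy selects the pod, so all traffic is allowed.
        return True
    # Policies are additive: one policy allowing the port is enough.
    return any(port in allowed for allowed in selecting_policies)


print(ingress_allowed(443, [{80}, {443}]))  # allowed by the second policy
print(ingress_allowed(22, [{80}, {443}]))   # selected, but no policy allows 22
```

Note that a single policy with an empty allow set (the default-deny pattern) blocks everything until another policy adds an allow rule.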

Diagnostic Flow

1. Verify CNI supports NetworkPolicy

Not all CNI plugins enforce NetworkPolicy. If the CNI does not support it, policies are silently ignored — they exist as API objects but have no effect.

Check which CNI is running:

kubectl get pods -n kube-system -o custom-columns='NAME:.metadata.name' | grep -E 'calico|cilium|weave|antrea|flannel|canal|kube-router'

If there are no results, the CNI may run in a different namespace (e.g., cilium in a cilium namespace, calico in calico-system). Search all namespaces:

kubectl get pods -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name' | grep -E 'calico|cilium|weave|antrea|flannel|canal|kube-router'

CNI	NetworkPolicy support
Calico	Yes
Cilium	Yes (also supports extended CiliumNetworkPolicy)
Weave Net	Yes
Antrea	Yes
Canal (Flannel + Calico)	Yes
kube-router	Yes
Flannel (standalone)	No — policies are silently ignored
kubenet	No

If the CNI does not support NetworkPolicy:

Policies exist but do nothing — not the cause of blocked traffic, look elsewhere
If the user expects policies to work, they need to switch to a CNI that supports them

If the CNI does support NetworkPolicy, continue to step 2.
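As a rough illustration of how the grep output maps onto the support table, the sketch below classifies pod names by keyword. The keyword table and the sample pod names are assumptions for illustration; real pod names vary by installation, and a keyword match is only a hint, not proof of enforcement.

```python
# Keyword -> whether standard NetworkPolicy is enforced (from the table above).
SUPPORT = {
    "calico": True,
    "cilium": True,
    "weave": True,
    "antrea": True,
    "canal": True,
    "kube-router": True,
    "flannel": False,
}


def detect_cnis(pod_names):
    """Map each CNI keyword found in the pod names to its support flag."""
    found = {}
    for name in pod_names:
        for keyword, supported in SUPPORT.items():
            if keyword in name:
                found[keyword] = supported
    return found


# Hypothetical pod names as they might appear in kube-system.
print(detect_cnis(["kube-flannel-ds-abc12", "coredns-5d78c9869d-xyz"]))
```

An empty result means none of the known keywords appeared, which matches the "check other namespaces" case in this step.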

2. List NetworkPolicies in the namespace

kubectl get networkpolicy -n <ns>

If no NetworkPolicies exist in the namespace, standard Kubernetes NetworkPolicy is not the cause — all traffic is allowed by default. However, if the CNI is Calico or Cilium, also check for CNI-specific extended policies that operate independently of standard NetworkPolicy:

# Cilium extended policies
kubectl get ciliumnetworkpolicy -n <ns> 2>/dev/null
kubectl get ciliumclusterwidenetworkpolicy 2>/dev/null

# Calico extended policies
kubectl get networkpolicy.crd.projectcalico.org -n <ns> 2>/dev/null
kubectl get globalnetworkpolicy.crd.projectcalico.org 2>/dev/null

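These CNI-specific policies can block traffic even when no standard NetworkPolicy exists. Unlike standard NetworkPolicy, which can only allow traffic, they can also contain explicit deny rules, and Calico evaluates its policies in a configurable order, so they can override what a standard policy appears to allow. If extended policies exist, examine their rules using -o yaml.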

If neither standard nor extended policies exist, look elsewhere (firewall rules, service mesh, node-level iptables).

If policies exist (standard or extended), continue to step 3.

3. Identify which policies affect the target pod

Kubernetes does not provide a direct API to query "which policies affect this pod." You must manually match each policy's podSelector against the pod's labels.

Get the pod's labels:

kubectl get pod <pod> -n <ns> -o jsonpath='{.metadata.labels}'

Get all NetworkPolicies with their full pod selectors:

kubectl get networkpolicy -n <ns> -o custom-columns='NAME:.metadata.name,SELECTOR:.spec.podSelector'

Note: podSelector can use both matchLabels (exact key-value pairs) and matchExpressions (operators like In, NotIn, Exists, DoesNotExist). The command above shows both forms. If the output is truncated, use -o yaml to see the full selector.

A NetworkPolicy affects the pod if:

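Every key-value pair in the policy's podSelector.matchLabels is present in the pod's labels (the pod may carry extra labels)
Every condition in the policy's podSelector.matchExpressions is satisfied by the pod's labels (e.g., {key: tier, operator: Exists} matches any pod with a tier label); when both matchLabels and matchExpressions are set, the pod must satisfy both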
An empty podSelector ({}) matches ALL pods in the namespace

List the matching policies — these are the ones controlling the pod's traffic.
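The manual matching described above can be sketched as a small Python function. It assumes the selector and the pod labels have already been loaded as dictionaries (for example from kubectl ... -o json); the function name is hypothetical.

```python
def selector_matches(selector, labels):
    """Return True if a LabelSelector dict matches a pod's label dict."""
    # matchLabels: every listed key/value must be present on the pod.
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    # matchExpressions: every expression must hold (ANDed with matchLabels).
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In" and labels.get(key) not in values:
            return False
        if op == "NotIn" and key in labels and labels[key] in values:
            return False
        if op == "Exists" and key not in labels:
            return False
        if op == "DoesNotExist" and key in labels:
            return False
    # An empty selector ({}) falls through and matches every pod.
    return True


pod_labels = {"app": "web", "tier": "frontend"}
print(selector_matches({"matchLabels": {"app": "web"}}, pod_labels))
print(selector_matches({"matchExpressions": [
    {"key": "tier", "operator": "NotIn", "values": ["frontend"]}]}, pod_labels))
```

Running each policy's podSelector through a check like this against the target pod's labels yields the list of policies that control it.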

4. Check for default-deny policies

A common pattern is a namespace-wide "deny all" policy:

kubectl get networkpolicy -n <ns> -o yaml

Look for policies with empty ingress or egress rules:

Default deny all ingress:

spec:
  podSelector: {}     # matches all pods
  policyTypes:
  - Ingress
  # no ingress rules = deny all incoming

Default deny all egress:

spec:
  podSelector: {}
  policyTypes:
  - Egress
  # no egress rules = deny all outgoing

Default deny both:

spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

If a default-deny policy exists, ALL traffic in the direction(s) listed in its policyTypes is blocked for every pod in the namespace unless another NetworkPolicy explicitly allows it.
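When a namespace has many policies, spotting the default-deny ones by eye is tedious. A minimal sketch, assuming each policy's spec has been loaded as a dictionary and treating only a literal empty podSelector ({}) as "all pods":

```python
def default_deny_directions(spec):
    """Return the directions a policy spec denies entirely for all pods."""
    if spec.get("podSelector", {}) != {}:
        return []  # selects only some pods, so not a namespace-wide default
    denied = []
    # With policyTypes omitted, Ingress is always implied.
    for direction in spec.get("policyTypes", ["Ingress"]):
        # A controlled direction with no rules denies everything.
        if not spec.get(direction.lower()):
            denied.append(direction)
    return denied


print(default_deny_directions({"podSelector": {}, "policyTypes": ["Ingress", "Egress"]}))
```

A policy that lists a direction but also carries rules for it is an allow-list rather than a default-deny, so it is not reported.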

5. Determine which directions a policy controls

Before diagnosing ingress or egress, first confirm which direction(s) each matching policy actually controls. This depends on the policyTypes field:

kubectl get networkpolicy <policy-name> -n <ns> -o jsonpath='{.spec.policyTypes}'

`policyTypes` value	Ingress controlled?	Egress controlled?
`[Ingress]`	Yes	No — egress is unrestricted
`[Egress]`	No — ingress is unrestricted	Yes
`[Ingress, Egress]`	Yes	Yes
Omitted entirely	Yes (always implied)	Only if `egress` rules exist

The omitted case is a common trap: If policyTypes is not specified but the policy has ingress rules and no egress rules, only ingress is controlled and egress remains fully open. Egress is added only when the policy contains at least one egress rule; an empty rule ({}) counts, but an empty list (egress: []) does not.

If the connectivity issue is incoming traffic, focus on policies that control ingress (step 6). If outgoing, focus on egress (step 7). Do not waste time analyzing a direction the policy does not control.
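The behavior matrix above can be expressed as a short function. This sketch mirrors the defaulting rule as described in this step; the function name is hypothetical and the input is assumed to be a policy spec loaded as a dictionary.

```python
def effective_policy_types(spec):
    """Return the directions a NetworkPolicy spec controls."""
    types = spec.get("policyTypes")
    if types:
        return list(types)  # explicit value wins
    # Omitted: Ingress is always implied.
    types = ["Ingress"]
    # Egress is implied only when at least one egress rule is present.
    if spec.get("egress"):
        types.append("Egress")
    return types


print(effective_policy_types({"ingress": [{}]}))
print(effective_policy_types({"egress": [{}]}))
```

If the result does not include the direction of the failing connection, that policy can be skipped for this diagnosis.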

6. Diagnose blocked ingress (incoming traffic to the pod)

If external pods or services cannot reach the target pod, check the ingress rules of all matching policies.

For each matching policy:

kubectl get networkpolicy <policy-name> -n <ns> -o yaml

Check the ingress section. Traffic is allowed if the source matches ANY from rule:

from[].podSelector — allows traffic from pods with matching labels in the SAME namespace
from[].namespaceSelector — allows traffic from pods in namespaces with matching labels
from[].podSelector + namespaceSelector (in same from entry) — AND logic: pods must match both selectors
from[].ipBlock — allows traffic from specific CIDR ranges

Common issue: separate vs combined selectors

# AND logic — pod must be in matching namespace AND have matching labels
ingress:
- from:
  - namespaceSelector:
      matchLabels: {env: prod}
    podSelector:
      matchLabels: {role: frontend}

# OR logic — ANY pod in matching namespace OR ANY pod with matching labels
ingress:
- from:
  - namespaceSelector:
      matchLabels: {env: prod}
  - podSelector:
      matchLabels: {role: frontend}

The difference is whether namespaceSelector and podSelector are in the same list item (AND) or separate list items (OR). This is a frequent source of misconfiguration.

Check if the source pod's labels and namespace match any from rule. If not, the ingress is blocked.
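To make the AND/OR distinction concrete, the following sketch evaluates a from list against a source pod. It is a simplified model under stated assumptions: only matchLabels is handled, ipBlock peers are skipped, and all function and parameter names are hypothetical.

```python
def labels_match(selector, labels):
    """matchLabels-only check; an empty selector matches everything."""
    wanted = selector.get("matchLabels", {})
    return all(labels.get(k) == v for k, v in wanted.items())


def peer_allows(peer, policy_ns, src_ns, src_ns_labels, src_pod_labels):
    """Evaluate one `from` entry; selectors inside one entry are ANDed."""
    has_ns = "namespaceSelector" in peer
    has_pod = "podSelector" in peer
    if not has_ns and not has_pod:
        return False  # ipBlock-only peer, not handled in this sketch
    if has_ns:
        ns_ok = labels_match(peer["namespaceSelector"], src_ns_labels)
    else:
        # podSelector alone only covers the policy's own namespace.
        ns_ok = src_ns == policy_ns
    pod_ok = labels_match(peer["podSelector"], src_pod_labels) if has_pod else True
    return ns_ok and pod_ok


def rule_allows(from_peers, policy_ns, src_ns, src_ns_labels, src_pod_labels):
    """Entries in the `from` list are ORed; an empty list matches all sources."""
    if not from_peers:
        return True
    return any(
        peer_allows(p, policy_ns, src_ns, src_ns_labels, src_pod_labels)
        for p in from_peers
    )


and_form = [{"namespaceSelector": {"matchLabels": {"env": "prod"}},
             "podSelector": {"matchLabels": {"role": "frontend"}}}]
or_form = [{"namespaceSelector": {"matchLabels": {"env": "prod"}}},
           {"podSelector": {"matchLabels": {"role": "frontend"}}}]

# A backend pod in a prod-labelled namespace: blocked by AND, allowed by OR.
source = ("app", "shop", {"env": "prod"}, {"role": "backend"})
print(rule_allows(and_form, *source), rule_allows(or_form, *source))
```

The same source pod gets opposite verdicts from the two forms, which is why the indentation of podSelector deserves a careful look in the YAML.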

Also check the ports section — if specified, only listed ports/protocols are allowed:

ingress:
- from: [...]
  ports:
  - protocol: TCP
    port: 8080

The port field can be a number or a named port (e.g., port: http). If a named port is used, it must match a containerPort name defined in the target pod's spec. If the pod does not define that port name, the rule will not match.

If the source is connecting on a different port, it will be blocked even if the from selector matches.
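The port check, including named-port resolution, can be sketched as follows. This is an illustrative model only: it ignores the endPort range field, and container_ports is a hypothetical mapping of the target pod's containerPort names to numbers.

```python
def port_allowed(rule_ports, protocol, dest_port, container_ports):
    """Check a connection against one rule's `ports` list."""
    if not rule_ports:
        return True  # no `ports` field: every port is allowed
    for entry in rule_ports:
        if entry.get("protocol", "TCP") != protocol:
            continue  # protocol defaults to TCP when omitted
        port = entry.get("port")
        if port is None:
            return True  # protocol only: all ports for that protocol
        if isinstance(port, str):
            # Named port: resolve against the pod; unknown names never match.
            port = container_ports.get(port)
        if port == dest_port:
            return True
    return False


rule = [{"protocol": "TCP", "port": "http"}]
print(port_allowed(rule, "TCP", 8080, {"http": 8080}))  # name resolves
print(port_allowed(rule, "TCP", 8080, {}))              # pod lacks the name
```

The second call shows the named-port trap: the rule looks correct, but it never matches because the pod does not define a port called http.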

NodePort / LoadBalancer SNAT issue:

When external traffic enters through a NodePort or LoadBalancer Service, kube-proxy may SNAT the source IP to the node's IP. This means podSelector and namespaceSelector rules in ingress will NOT match the original client or source pod — they will see the node IP instead.

Check the Service's externalTrafficPolicy:

kubectl get svc <service-name> -n <ns> -o jsonpath='{.spec.externalTrafficPolicy}'

Cluster (default) — source IP is SNATed to node IP. Ingress podSelector/namespaceSelector rules cannot match the original source. Use ipBlock with the node CIDR range instead.
Local — original source IP is preserved, but traffic is only routed to pods on the node that received the request.

If the target pod has ingress NetworkPolicy and receives traffic via NodePort/LoadBalancer with externalTrafficPolicy: Cluster, from: podSelector rules will fail silently — the traffic appears to come from a node IP, not a pod IP.

7. Diagnose blocked egress (outgoing traffic from the pod)

If the pod cannot reach other services or external endpoints, check the egress rules.

For each matching policy that includes Egress in policyTypes:

kubectl get networkpolicy <policy-name> -n <ns> -o yaml

Check the egress section. The same selector logic applies as ingress (podSelector, namespaceSelector, ipBlock).

Critical: DNS egress

If any egress NetworkPolicy is applied to a pod, DNS traffic (UDP/TCP port 53) must be explicitly allowed, otherwise all DNS resolution will fail:

egress:
- to:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: kube-system
  ports:
  - protocol: UDP
    port: 53
  - protocol: TCP
    port: 53

Note: The example above targets only kube-system where CoreDNS runs. A broader alternative is namespaceSelector: {} (matches all namespaces), which is simpler but allows port 53 traffic to any namespace. When diagnosing, check whether ANY rule allows UDP/TCP 53 — the specificity of the namespace selector is a security concern but not a functionality blocker.

Symptoms of blocked DNS egress:

nslookup times out from the pod
Service names cannot be resolved but IP-based connections work
Looks identical to a CoreDNS failure but only affects pods with egress policies

If the user reports DNS timeouts and the pod has an egress NetworkPolicy, check DNS port allowance FIRST before investigating CoreDNS with dns-debug.
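A quick way to run the "check DNS port allowance first" step is to scan the egress rules for port 53. The sketch below assumes the rules were loaded as a list of dictionaries; it ignores the to peers and named ports, so a positive result still requires checking that the destination covers the DNS pods.

```python
def dns_protocols_allowed(egress_rules):
    """Return the protocols (UDP/TCP) for which some egress rule opens port 53."""
    allowed = set()
    for rule in egress_rules:
        ports = rule.get("ports")
        if not ports:
            # No port restriction on this rule: both protocols pass.
            allowed |= {"UDP", "TCP"}
            continue
        for entry in ports:
            protocol = entry.get("protocol", "TCP")  # TCP is the default
            if entry.get("port") in (53, None) and protocol in ("UDP", "TCP"):
                allowed.add(protocol)
    return allowed


https_only = [{"ports": [{"protocol": "TCP", "port": 443}]}]
print(dns_protocols_allowed(https_only))  # empty set: DNS is blocked
```

An empty result for a pod with an egress policy points to the policy rather than CoreDNS as the likely cause of the timeouts.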

API Server egress

The second most common egress issue after DNS. Pods that need to call the Kubernetes API (operators, controllers, pods using service account tokens) must be able to reach the API server. The API server endpoint is typically outside the pod network, so podSelector/namespaceSelector rules will not match it — use ipBlock instead.

Find the API server endpoint:

kubectl get endpoints kubernetes -n default

Symptoms of blocked API server egress:

kubectl commands from within the pod time out (but DNS works — service names resolve)
Operators or controllers cannot watch or list resources
Service account token authentication fails
Pod logs show "connection refused" or "i/o timeout" when calling the API

The key difference from DNS blocking: with DNS blocked, name resolution itself fails. With API server blocked, names resolve but the TCP connection to the API server times out.
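Since the API server is usually matched through ipBlock, it helps to check whether its address actually falls inside an allowed CIDR and outside any except subnet. A minimal sketch using Python's standard ipaddress module; the addresses are hypothetical examples, not values from a real cluster.

```python
import ipaddress


def ip_block_allows(ip_block, address):
    """Return True if `address` is inside cidr and not in any `except` subnet."""
    addr = ipaddress.ip_address(address)
    if addr not in ipaddress.ip_network(ip_block["cidr"]):
        return False
    excluded = ip_block.get("except", [])
    return not any(addr in ipaddress.ip_network(subnet) for subnet in excluded)


block = {"cidr": "10.0.0.0/8", "except": ["10.244.0.0/16"]}
print(ip_block_allows(block, "10.96.0.1"))   # inside the CIDR
print(ip_block_allows(block, "10.244.3.7"))  # carved out by `except`
```

Feed it the address reported by the endpoints command above and each ipBlock from the pod's egress rules to see whether any of them covers the API server.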

8. Cross-namespace communication

When pods in different namespaces need to communicate, NetworkPolicies on BOTH sides may need to allow the traffic:

The destination pod's NetworkPolicy must allow ingress from the source namespace/pod
The source pod's NetworkPolicy (if it has egress rules) must allow egress to the destination namespace/pod

Check both sides:

# Destination namespace policies
kubectl get networkpolicy -n <destination-ns>

# Source namespace policies
kubectl get networkpolicy -n <source-ns>

For namespaceSelector to work, the target namespace must have the referenced labels:

kubectl get namespace <ns> --show-labels

If the namespace lacks the expected labels, the namespaceSelector will not match and traffic will be blocked.
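The label dependency can be illustrated with a small sketch that lists which namespaces a namespaceSelector would pick. The namespace names and labels are made up for the example, and only matchLabels is handled; the kubernetes.io/metadata.name label is the one the control plane sets automatically on every namespace.

```python
def namespaces_selected(selector, namespaces):
    """Return the sorted names of namespaces whose labels satisfy matchLabels."""
    wanted = selector.get("matchLabels", {})
    return sorted(
        name
        for name, labels in namespaces.items()
        if all(labels.get(k) == v for k, v in wanted.items())
    )


# Hypothetical namespaces as shown by `kubectl get namespace --show-labels`.
namespaces = {
    "shop": {"kubernetes.io/metadata.name": "shop", "env": "prod"},
    "batch": {"kubernetes.io/metadata.name": "batch"},
}
print(namespaces_selected({"matchLabels": {"env": "prod"}}, namespaces))
```

Here batch is silently left out of an env: prod selector because it was never labelled, while a selector on kubernetes.io/metadata.name would still find it.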

Notes

No policy = allow all. NetworkPolicy is not deny-by-default at the cluster level. Only pods explicitly selected by at least one NetworkPolicy have restrictions. This means adding the FIRST NetworkPolicy to a namespace can suddenly break existing communication.
Policies are additive. If policy A allows port 80 and policy B allows port 443 for the same pod, both ports are allowed. Policies never subtract permissions from each other.
policyTypes matters. See step 5 for the full behavior matrix. Misunderstanding which direction a policy controls is a common cause of wasted debugging effort.
CIDR ranges and pod IPs. Using ipBlock with pod CIDR ranges is fragile — pod IPs change. Prefer podSelector / namespaceSelector for in-cluster traffic. ipBlock is best for external IPs. Also check for except subnets within ipBlock — a rule may allow a broad CIDR (e.g., 10.0.0.0/8) but exclude a specific subnet (e.g., except: [10.244.0.0/16]), causing unexpected blocks for IPs in the excluded range.
Service mesh interaction. If the cluster runs Istio, Linkerd, or similar service meshes, traffic may be additionally controlled by the mesh's own policies (AuthorizationPolicy, etc.). NetworkPolicy operates at L3/L4, while service mesh policies typically operate at L7.
GPU clusters: multi-NIC / RDMA traffic is NOT affected by NetworkPolicy. In GPU training clusters, pods typically have multiple network interfaces: a primary NIC (eth0) managed by the CNI, and secondary NICs (net1, etc.) for RDMA/InfiniBand/RoCE provisioned via Multus + SR-IOV or host-device plugin. NetworkPolicy only applies to the primary CNI-managed interface. RDMA/NCCL traffic on secondary interfaces bypasses CNI entirely and is invisible to NetworkPolicy. If a training job's GPU-to-GPU communication (NCCL) fails, NetworkPolicy is NOT the cause — investigate the RDMA network instead. If the same pod cannot reach the API server, download data, or resolve DNS, those go through the primary NIC and CAN be blocked by NetworkPolicy.
Quick verification: To confirm a NetworkPolicy is the cause, test connectivity from a pod in the same namespace that is NOT selected by any NetworkPolicy (or from a different namespace without policies). If the same connection works from the unaffected pod, the NetworkPolicy is confirmed as the blocker.
For cross-reference: if DNS is timing out, check egress rules here first, then use dns-debug. If Service endpoints exist but connections fail, check ingress rules here, then use service-debug.
