| name | dns-debug |
| description | Use when diagnosing DNS resolution failures, investigating blocked domains, debugging website or app connectivity problems reported by users, querying Blocky DNS query logs, or editing Blocky allowlist/denylist configuration in `kubernetes/apps/dns-private/blocky/`. Covers `./scripts/hops.py dns` subcommands (`search`, `logs`, `blocked`, `test`) and the `log_entries` table in the Blocky CNPG cluster. Triggers on phrases like "site X is broken", "DNS isn't working", "why is this domain blocked", "check Blocky logs", "allowlist this domain", or any edit to `blocky/data/config.yaml`. Do NOT use for authoritative/external-dns troubleshooting (different providers, different tooling). |
DNS Debugging
Diagnose DNS resolution failures using Blocky query logs stored in PostgreSQL. Primary scenario: a
user reports a website is broken and you need to determine whether Blocky is blocking a required
domain.
Tool
./scripts/hops.py dns queries the log_entries table in the Blocky CNPG PostgreSQL cluster via
kubectl exec.
Run ./scripts/hops.py dns --help for full usage.
Quick Reference
./scripts/hops.py dns search homedepot -f 24h
./scripts/hops.py dns logs -c 192.168.3.40 -f 1h
./scripts/hops.py dns logs -c pixel -f 1h
./scripts/hops.py dns blocked -c 192.168.3.40 -f 1h
./scripts/hops.py dns logs -c 192.168.3.40 -d homedepot -f 2h
./scripts/hops.py dns blocked -c kids -f 24h
./scripts/hops.py dns blocked --json -c 192.168.3.40
The -c/--client flag accepts: partial IP, partial device name (from reverse DNS), CIDR notation,
or VLAN name. All matching is case-insensitive.
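As a rough illustration of the four accepted forms, the sketch below classifies a -c value. It is not the logic used by hops.py: the function name, the precedence order, and the vlan_names argument are all assumptions made for this example, and only IPv4-style values are handled.

```python
import ipaddress


def classify_client_filter(value: str, vlan_names: set[str]) -> str:
    """Guess which kind of -c/--client value was passed (hypothetical helper)."""
    text = value.strip().lower()  # matching is case-insensitive
    if text in {name.lower() for name in vlan_names}:
        return "vlan"
    if "/" in text:
        try:
            # strict=False accepts host bits, e.g. 192.168.3.40/24
            ipaddress.ip_network(text, strict=False)
            return "cidr"
        except ValueError:
            pass
    # Partial IPv4 values such as "192.168.3" contain only digits and dots.
    if text and all(ch.isdigit() or ch == "." for ch in text):
        return "partial_ip"
    # Anything else is treated as a partial device name.
    return "device_name"
```

For example, with vlan_names={"kids"}, the values 192.168.3.40, pixel, and Kids would be classified as a partial IP, a device name, and a VLAN respectively.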
Diagnostic Workflow
When a user reports "website X is broken" or "app Y is not working":
- Identify the device. If the user names a specific device, use -c with the device name to
  target it directly: ./scripts/hops.py dns blocked -c pixel -f 1h. NEVER assume a VLAN based on
  the app or use case; always confirm or search. If the device is unknown, search for the domain to
  find it: ./scripts/hops.py dns search <domain> -f 24h and pick the client with the most recent
  last_seen timestamp.
- Get blocked queries for that client to find the offending domain:
  ./scripts/hops.py dns blocked -c <ip-or-name> -f 1h
  The blocked domain is often not the main site but a subdomain (API, CDN, auth service).
- Verify the block by checking the reason field. It indicates which blocklist group matched
  (e.g., ads, threats, social).
- Determine the fix (see Remediation below).
- After pushing the fix, Flux applies the change and Blocky reloads automatically (reloader
  annotation). Verify resolution:
  ./scripts/hops.py dns logs -c <ip-or-name> -d <domain> -f 5m
  The domain should now show RESOLVED or CACHED instead of BLOCKED.
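When a client has many blocked queries, it can help to group them by domain and reason before deciding on a fix. The sketch below assumes the output of ./scripts/hops.py dns blocked --json is a JSON array of objects with question_name and reason keys; those key names mirror the log_entries columns mentioned in this file, but the actual JSON shape is an assumption, so check real output first.

```python
import json
from collections import Counter


def summarize_blocked(json_text: str) -> list[tuple[str, str, int]]:
    """Count blocked queries per (domain, reason), most frequent first.

    Assumes json_text is a JSON array of objects that each carry
    "question_name" and "reason" keys (hypothetical output shape).
    """
    rows = json.loads(json_text)
    counts = Counter((row["question_name"], row["reason"]) for row in rows)
    return [(domain, reason, n) for (domain, reason), n in counts.most_common()]
```

The subdomain at the top of the result is usually the first allowlist candidate to investigate.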
Remediation
Option A: Add an allowlist entry (single domain fix)
Add an allowlists section to the blocking config in
kubernetes/apps/dns-private/blocky/data/config.yaml. The allowlist group name must match the
denylist group that blocked the domain.
blocking:
  denylists:
    ads:
      - https://...
  allowlists:
    ads:
      - |
        alloweddomain.com
        api.someservice.com
Allowlists take precedence over denylists within the same group. A domain present in both the deny
and allow list for a group will be allowed.
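The precedence rule can be modelled in a few lines. This is a simplified sketch, not Blocky's implementation: it uses exact domain matches only, and the function and parameter names are made up for illustration.

```python
def is_blocked(
    domain: str,
    group: str,
    denylists: dict[str, set[str]],
    allowlists: dict[str, set[str]],
) -> bool:
    """Return True if the domain is blocked for the given group (simplified model)."""
    name = domain.lower()
    # An allowlist entry in the same group wins over any denylist entry.
    if name in allowlists.get(group, set()):
        return False
    return name in denylists.get(group, set())


deny = {"ads": {"api.someservice.com", "tracker.example.com"}}
allow_same_group = {"ads": {"api.someservice.com"}}
allow_other_group = {"social": {"api.someservice.com"}}

print(is_blocked("api.someservice.com", "ads", deny, allow_same_group))   # False
print(is_blocked("api.someservice.com", "ads", deny, allow_other_group))  # True
```

The second call shows why the group names must match: an allowlist entry under a different group does not lift the block.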
Option B: Switch to a less aggressive blocklist
If false positives are frequent for a list category, consider switching to a less aggressive
variant. Read the current config to identify which list tier is in use, then check upstream for
available tiers.
Option C: Remove a blocklist group from a client group
If an entire category is causing problems for a VLAN, remove it from clientGroupsBlock for that
VLAN.
Configuration Reference
Read kubernetes/apps/dns-private/blocky/data/config.yaml for current:
- VLAN-to-subnet mappings and which block groups apply to each
- Denylist URLs and their group names
- Any existing allowlist entries
Unimplemented Subcommands
The following subcommands were deferred. If you need one during diagnosis, implement it in
./scripts/hops.py dns following the patterns of the existing subcommands, then update this skill
file to move it from this list to the Quick Reference section above.
- top-domains: Top queried domains by count. Flags: -f, -c, -l. GROUP BY question_name,
  ORDER BY count DESC.
- top-blocked: Top blocked domains by count. Same as top-domains but filtered to
  response_type = 'BLOCKED'. Include the reason column to show which blocklist group matched.
- top-clients: Top clients by query volume. Flags: -f, -l. GROUP BY client_ip, client_name,
  ORDER BY count DESC.
- slow: Queries exceeding a duration threshold. Flags: -f, -c, --threshold (milliseconds,
  default 500). Filter duration_ms >= threshold, ORDER BY duration_ms DESC.
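As a starting point for top-domains and top-blocked, the query could be built roughly as follows. This is a sketch under several assumptions: the timestamp column is taken to be request_ts (verify against the actual log_entries schema), the function names are hypothetical, the %s placeholders follow psycopg conventions and would need adapting to however hops.py passes SQL through kubectl exec, and the -c filter is omitted.

```python
import re
from datetime import timedelta

# Only the units that appear in this file's examples (5m, 1h, 24h).
_UNITS = {"m": "minutes", "h": "hours"}


def parse_window(text: str) -> timedelta:
    """Turn a -f value such as '24h' or '5m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mh])", text.strip().lower())
    if not match:
        raise ValueError(f"unsupported time window: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def top_domains_query(window: str, limit: int = 20, blocked_only: bool = False):
    """Build (sql, params) for top-domains, or top-blocked when blocked_only is set."""
    conditions = ["request_ts >= now() - %s"]  # request_ts is an assumed column name
    params: list = [parse_window(window)]
    if blocked_only:
        conditions.append("response_type = 'BLOCKED'")
    # top-blocked also reports which blocklist group matched.
    columns = "question_name, reason" if blocked_only else "question_name"
    sql = (
        f"SELECT {columns}, count(*) AS count "
        "FROM log_entries "
        f"WHERE {' AND '.join(conditions)} "
        f"GROUP BY {columns} "
        "ORDER BY count DESC "
        "LIMIT %s"
    )
    params.append(limit)
    return sql, params
```

Sharing one builder keeps top-blocked a strict variant of top-domains, as the descriptions above suggest.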
Pattern Detection
When adding allowlist entries, check git history for systemic issues:
git log --oneline --invert-grep --author="renovate" \
-- kubernetes/apps/dns-private/blocky/data/config.yaml
If you see 3+ allowlist additions in a short period, the denylist tier may be too aggressive.
Propose downgrading the list tier rather than maintaining a growing allowlist.
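The "3+ additions in a short period" check can be made concrete with a sliding window over commit dates. In this sketch the 30-day window is an assumption (the guideline above does not define "short period"), the function name is hypothetical, and the commit timestamps are expected to come from the git log output, already filtered to allowlist changes.

```python
from datetime import datetime, timedelta


def has_allowlist_burst(
    commit_times: list[datetime],
    threshold: int = 3,
    window: timedelta = timedelta(days=30),  # assumed meaning of "short period"
) -> bool:
    """Return True if at least `threshold` commits fall within any `window` span."""
    times = sorted(commit_times)
    for i in range(len(times) - threshold + 1):
        # Compare the first and last commit of each run of `threshold` commits.
        if times[i + threshold - 1] - times[i] <= window:
            return True
    return False
```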