cortex-troubleshoot
Troubleshoot cortex connection failures, missing logs, unhealthy containers, restart loops, or vague "logs aren't working" reports.
| name | cortex-troubleshoot |
| description | Troubleshoot cortex connection failures, missing logs, unhealthy containers, restart loops, or vague "logs aren't working" reports. |
Diagnose cortex problems systematically. Use the binary's observability counters and existing diagnostic tooling rather than guessing — the codebase exposes most state needed to localize a failure.
Match the user's report against one of these branches and follow only that branch. Don't run every check; that's what cortex-dr is for, and a full sweep is overwhelming when the failure is narrow.
Branch A: connection failures

Most common cause: empty or wrong `$CLAUDE_PLUGIN_OPTION_SERVER_URL`, mismatched `$CLAUDE_PLUGIN_OPTION_API_TOKEN`, or service not running.

1. Check the listener: `ss -tlnp | grep -E ":$CLAUDE_PLUGIN_OPTION_MCP_PORT"`. If the output is empty, the service is down → branch C.
2. Open `~/.claude/settings.json`, find the `pluginConfigs` key that starts with `cortex@`, and inspect `options.server_url`. An empty string is a known footgun (the `.mcp.json` substitution produces a literal `/mcp`). Check that the value is non-empty, has a scheme, and has no trailing `/mcp`.
3. Probe the route without credentials: `curl -sS -o /dev/null -w '%{http_code}' "$CLAUDE_PLUGIN_OPTION_SERVER_URL/mcp"`.
   - If `$CLAUDE_PLUGIN_OPTION_NO_AUTH` is `true` or no bearer/OAuth auth is configured, `200` or an MCP protocol-level `400`/`405` can be normal route evidence.
   - If auth is enabled, expect `401` for an unauthenticated request.
   - If `404`, the route is wrong or a different server owns that port. If connection refused, branch C.
   - If `200` while auth is intended to be enabled, flag it as an auth configuration mismatch.
4. Test with the token: `curl -sS -X POST -H "Authorization: Bearer $CLAUDE_PLUGIN_OPTION_API_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json, text/event-stream" -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-06-18","capabilities":{},"clientInfo":{"name":"curl","version":"0"}}}' "$CLAUDE_PLUGIN_OPTION_SERVER_URL/mcp"`. `401` = wrong token. `200` with a valid response = server fine, problem is in Claude Code's MCP client config. For OAuth mode, use the OAuth client flow instead of bearer-token curl. Note: if this test fails unexpectedly, verify that the MCP protocol version string (`2025-06-18`) matches the current spec.

Branch B: missing logs

1. Run `cortex action=hosts`. If the host is absent, no logs ever arrived → check forwarding config on `<host>`. If it is present with an old `last_seen`, forwarding stopped → check rsyslog/forwarder on the host.
2. Check the syslog listener: `ss -tlnp | grep -E ":(${CLAUDE_PLUGIN_OPTION_SYSLOG_HOST_PORT:-$CLAUDE_PLUGIN_OPTION_SYSLOG_PORT})\\b"` should show our process or container port publish. From `<host>`: `nc -zv <our_host> "${CLAUDE_PLUGIN_OPTION_SYSLOG_HOST_PORT:-$CLAUDE_PLUGIN_OPTION_SYSLOG_PORT}"` should connect.
3. Check the forwarder: `ssh <host> "sudo journalctl -t rsyslogd -n 30 --no-pager"`. Look for `omfwd` errors (DNS resolution, peer closed, EOF on TCP). Common patterns we've seen: stale forwarder pointing at a dead host, idle TCP timeout flapping, missing rsyslog drop-in.
4. Check the drop-in: `ssh <host> "cat /etc/rsyslog.d/99-cortex.conf 2>/dev/null"` should contain `*.* @@<our_host>:<externally reachable syslog port>` (TCP), usually `${CLAUDE_PLUGIN_OPTION_SYSLOG_HOST_PORT:-$CLAUDE_PLUGIN_OPTION_SYSLOG_PORT}`. If it is missing or wrong, use `cortex-deploy-dropins`.
5. If the user expects logs from hosts in `$CLAUDE_PLUGIN_OPTION_FLEET_HOSTS` but doesn't see them, check `$CLAUDE_PLUGIN_OPTION_DOCKER_INGEST_ENABLED`. If `false`, ingest is off entirely. If `true`, verify the docker-socket-proxy on that host is reachable: `curl -sS http://<host>:2375/_ping` should return `OK`.

Branch C: service down, unhealthy containers, restart loops

1. Check container status: `docker ps --filter name=cortex --format '{{.Status}}'`.
2. Use `cortex-logs` for the last 100 lines, or run `docker compose logs` manually. Look for: panic messages, port-bind errors (`address already in use`), DB lock errors, OOM kills.
3. Common causes:
   - Port conflict: `$CLAUDE_PLUGIN_OPTION_SYSLOG_PORT` or `$CLAUDE_PLUGIN_OPTION_MCP_PORT` held by another process. First identify the owner with `ss -tulpn`/`lsof`/`fuser`; only kill or restart anything after the user approves the specific process and impact.
   - DB lock (a `cortex stdio` process holds it): run `pgrep -af "cortex"` to list candidates; only kill stragglers after approval.
   - Outdated image: run `docker compose pull` to refresh.
   - Healthcheck fails but `/health` works manually: the container is unhealthy because the healthcheck command inside the image is wrong or can't run. Compare the image version to what you expect: `docker inspect cortex | jq '.[0].Config.Image'`.

Branch D: vague "logs aren't working" reports

Use `cortex-dr` for the comprehensive preflight and health check. Its PASS / WARN / FAIL output narrows the problem to a specific check. Then re-enter this skill on the failing check's category.
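The connection-failure checks on `server_url` and on the unauthenticated `/mcp` probe can be sketched as two small shell helpers. This is a minimal sketch, not part of cortex: the function names `check_server_url` and `interpret_mcp_probe` and their result tokens are hypothetical, the scheme check assumes `http://` or `https://`, and treating curl's `000` (no HTTP response at all) as the "connection refused" case is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and result tokens are hypothetical, not part of cortex.

# Validate a server_url value: non-empty, has a scheme, no trailing /mcp.
# Assumes the scheme is http:// or https://.
check_server_url() {
  local url="$1"
  if [ -z "$url" ]; then
    echo "empty"
    return 1
  fi
  case "$url" in
    http://*|https://*) ;;
    *) echo "no-scheme"; return 1 ;;
  esac
  case "$url" in
    */mcp|*/mcp/) echo "trailing-mcp"; return 1 ;;
  esac
  echo "ok"
}

# Interpret the status code of an unauthenticated request to <server_url>/mcp.
#   $1: HTTP status code (curl -w '%{http_code}' prints 000 when no response arrived)
#   $2: "true" when auth is intended to be enabled
interpret_mcp_probe() {
  local code="$1" auth_expected="$2"
  case "$code" in
    000) echo "branch-c" ;;        # no HTTP response: treat as service down
    404) echo "wrong-route" ;;     # wrong route, or another server owns the port
    401)
      if [ "$auth_expected" = "true" ]; then echo "auth-ok"; else echo "inconclusive"; fi ;;
    200)
      if [ "$auth_expected" = "true" ]; then echo "auth-mismatch"; else echo "route-ok"; fi ;;
    400|405)
      if [ "$auth_expected" = "true" ]; then echo "inconclusive"; else echo "route-ok"; fi ;;
    *) echo "inconclusive" ;;      # not covered by the rules above
  esac
}
```

One possible use is `interpret_mcp_probe "$(curl -sS -o /dev/null -w '%{http_code}' "$CLAUDE_PLUGIN_OPTION_SERVER_URL/mcp")" true`. Combinations the skill does not describe are reported as `inconclusive` rather than guessed.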
The binary exposes runtime counters via cortex action=stats and /health. Useful signals:
- `total_logs` not increasing → ingest pipeline is broken, not just MCP
- `write_blocked: true` → storage budget tripped, oldest logs being purged but can't keep up; check `$CLAUDE_PLUGIN_OPTION_MAX_DB_SIZE_MB` vs disk free
- `phantom_fts_rows` growing → retention purges aren't merging FTS5 cleanly; usually self-recovers
- `last_ingest_at` minutes-stale → forwarders aren't reaching us
- `RuntimeObservability` (since v0.13.0): UDP/TCP packets, ingest queue depth, writer flush failures. Pull these via the `/health` endpoint or the `stats` action and use them to localize "ingest path" vs "writer path" failures.

Prefer `cortex-redeploy` over manual Docker commands.
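The counter signals above can be turned into a quick triage over two samples taken some time apart. This is a minimal sketch under stated assumptions: the function name `triage_counters`, its argument order, its result tokens, and the 120-second staleness default are all hypothetical, and extracting the values from `cortex action=stats` or `/health` output is left out because the response format is not shown here. A flat `total_logs` can also just mean a quiet period, so treat the result as a hint.

```shell
#!/usr/bin/env bash
# Sketch only: function name, arguments, tokens, and threshold are hypothetical.
#   $1: total_logs from an earlier sample
#   $2: total_logs from the current sample
#   $3: write_blocked ("true" or "false")
#   $4: seconds elapsed since last_ingest_at
#   $5: staleness threshold in seconds (assumed default: 120)
triage_counters() {
  local prev_total="$1" cur_total="$2" write_blocked="$3" ingest_age_s="$4"
  local stale_after_s="${5:-120}"
  local findings=""

  # total_logs not increasing: ingest pipeline problem, not just MCP.
  if [ "$cur_total" -le "$prev_total" ]; then
    findings="$findings ingest-stalled"
  fi
  # write_blocked: true means the storage budget tripped.
  if [ "$write_blocked" = "true" ]; then
    findings="$findings storage-budget"
  fi
  # last_ingest_at minutes-stale: forwarders are not reaching the server.
  if [ "$ingest_age_s" -ge "$stale_after_s" ]; then
    findings="$findings forwarders-silent"
  fi

  if [ -z "$findings" ]; then
    echo "ok"
  else
    echo "${findings# }"   # strip the leading space
  fi
}
```

For example, `triage_counters 100 150 true 600` reports both the storage and the forwarder signal, which points at the writer path and the ingest path separately instead of a single vague failure.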