name	troubleshoot
description	Investigate runtime or production issues from logs, errors, or unexpected behavior (Marcus + Alex workflow)
disable-model-invocation	true

Troubleshoot: $ARGUMENTS

Unlike /fix-issue (code-level bugs), /troubleshoot investigates runtime/production problems from symptoms.

Step 1 — Gather Symptoms

Identify what's happening:

What errors are users seeing? (status codes, error messages)
When did it start? (after a deploy, config change, or spontaneously?)
Is it affecting all requests or specific endpoints?
Is the health endpoint responding? What status?

Step 2 — Check Infrastructure

# Is the process running? Hit the health endpoint (see project config for URL)
# Check external resource connectivity via device/info endpoint
# Check recent logs — look for ERROR and WARNING entries

For common infrastructure issues, see the Troubleshooting table in the project config, which lists symptoms, likely causes, and checks.
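
As a rough illustration of the first check, the sketch below probes a health endpoint and separates "process answered with an error status" from "process unreachable". The function name `check_health` and the example URL are assumptions made for illustration; the real URL is in the project config.

```python
import urllib.error
import urllib.request


def check_health(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return (reachable, status_code, body) for a health endpoint.

    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return True, resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as exc:
        # The process answered, but with an error status (for example 503).
        return True, exc.code, str(exc.reason)
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, or timeout: the process may be down.
        return False, None, str(exc)


if __name__ == "__main__":
    # Hypothetical URL; substitute the one from the project config.
    print(check_health("http://localhost:8000/health"))
```

A reachable endpoint with a non-200 status usually points toward Steps 3 and 4 (configuration, logs), while an unreachable one suggests checking whether the process is running at all.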

Step 3 — Check Configuration

Read the config file and verify env vars are set correctly. See the Environment Variables table in the project config for all settings and their defaults. Common misconfigurations:

Mock mode accidentally enabled in production
Resource identifiers don't match actual hardware
API key missing where one is required
Rate limit too restrictive
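
Some of the misconfigurations above can be checked mechanically. The sketch below is only an illustration: the variable names `APP_ENV`, `MOCK_MODE`, `API_KEY`, and `RATE_LIMIT`, and the threshold of 10, are hypothetical stand-ins for whatever the Environment Variables table in the project config actually defines. It does not cover the resource-identifier check, which needs access to the real hardware.

```python
TRUTHY = {"1", "true", "yes", "on"}


def find_misconfigurations(env, min_rate_limit=10):
    """Return human-readable problems found in an env-var mapping.

    All variable names here are hypothetical; use the project's real names.
    """
    problems = []
    production = env.get("APP_ENV", "").lower() == "production"

    if production and env.get("MOCK_MODE", "").lower() in TRUTHY:
        problems.append("mock mode enabled in production")

    if production and not env.get("API_KEY"):
        problems.append("API key not set")

    raw_limit = env.get("RATE_LIMIT")
    if raw_limit is not None:
        try:
            if int(raw_limit) < min_rate_limit:
                problems.append(f"rate limit {raw_limit} below {min_rate_limit}")
        except ValueError:
            problems.append(f"rate limit {raw_limit!r} is not an integer")

    return problems
```

Passing `os.environ` as `env` would check the current process environment; passing a plain dict makes the same check usable against a parsed config file.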

Step 4 — Check Application Logs

Look for patterns in log output:

Audit logger entries — are state changes being logged?
WARNING level — resource disconnection, degraded mode
ERROR level — communication failures, unhandled exceptions
Startup messages — did all components initialize?
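
A quick way to spot these patterns is to count entries per level and pull out the first few matches. The sketch below assumes each log line contains a standard level name (`WARNING`, `ERROR`, and so on) as a separate word; the actual log format of the project may differ, and the helper names are made up for illustration.

```python
import re
from collections import Counter

LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def summarize_levels(lines):
    """Count log lines by the first level name found on each line."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def first_matches(lines, level, limit=5):
    """Return up to `limit` lines that mention the given level."""
    pattern = re.compile(rf"\b{re.escape(level)}\b")
    return [line.rstrip("\n") for line in lines if pattern.search(line)][:limit]
```

For a log file, `summarize_levels(open(path))` gives a first impression of how noisy each level is, and `first_matches(open(path), "ERROR")` shows where the failures begin.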

Step 5 — Reproduce Locally

# Start in mock/dev mode to isolate hardware vs software issues (see dev run command in project config)
# Hit the failing endpoint

If it works in mock mode → hardware/external resource issue
If it fails in mock mode → software bug → switch to /fix-issue

Step 6 — Resolution

If config issue → fix env vars, restart
If hardware/resource issue → check connection, permissions, drivers
If software bug → use /fix-issue to fix the code
If performance issue → use /optimize to profile and fix
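
The branching in Steps 5 and 6 can be written down as a small lookup. This is a sketch of the decision logic only; the names `Finding`, `NEXT_STEP`, and `classify_reproduction` are invented for illustration and are not part of the project.

```python
from enum import Enum


class Finding(Enum):
    CONFIG = "config issue"
    HARDWARE = "hardware/resource issue"
    SOFTWARE = "software bug"
    PERFORMANCE = "performance issue"


# Resolution per finding, taken from Step 6.
NEXT_STEP = {
    Finding.CONFIG: "fix env vars, restart",
    Finding.HARDWARE: "check connection, permissions, drivers",
    Finding.SOFTWARE: "use /fix-issue to fix the code",
    Finding.PERFORMANCE: "use /optimize to profile and fix",
}


def classify_reproduction(works_in_mock):
    """Step 5: working in mock mode points at hardware, failing at software."""
    return Finding.HARDWARE if works_in_mock else Finding.SOFTWARE
```

For example, `NEXT_STEP[classify_reproduction(False)]` yields the hand-off to /fix-issue.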

Step 7 — Document

Log the incident:

What happened, when, how long it lasted
Root cause identified
Fix applied
Prevention steps (monitoring, alerting, tests)
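
One possible shape for such an incident record is sketched below. The class and field names are assumptions chosen to mirror the four items above; the project may already have its own incident template, in which case that should take precedence.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class IncidentRecord:
    summary: str            # what happened
    started_at: datetime    # when it started
    duration: timedelta     # how long it lasted
    root_cause: str
    fix_applied: str
    prevention: list = field(default_factory=list)  # monitoring, alerting, tests

    def to_text(self):
        """Render the record as plain text for a log or ticket."""
        lines = [
            f"What happened: {self.summary}",
            f"Started: {self.started_at.isoformat()}",
            f"Duration: {self.duration}",
            f"Root cause: {self.root_cause}",
            f"Fix applied: {self.fix_applied}",
            "Prevention steps:",
        ]
        lines += [f"  - {step}" for step in self.prevention]
        return "\n".join(lines)
```

Keeping the prevention steps as a list makes it easy to turn each one into a follow-up task.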
