debug-prod
Investigate production issues using logs, database, and Identity Platform. Read-only by default.
| Field | Value |
|---|---|
| name | debug-prod |
| description | Investigate production issues using logs, database, and Identity Platform. Read-only by default. |
| user_invocable | true |
Investigate production issues by querying logs, database state, and Identity Platform.
NEVER take destructive or mutating actions in production without explicit user approval. This includes:
- kubectl apply, kubectl delete, or kubectl edit

Read-only operations are always safe. When you need to take a mutating action, describe what you want to do, why, and the expected impact, then wait for approval before executing.
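The approval rule can be sketched as a small pre-flight check. This is only an illustration, not part of the skill: `needs_approval` and `APPROVAL_REQUIRED_VERBS` are hypothetical names, and the verb set contains only the kubectl verbs this document names, so it should not be read as an exhaustive list of mutating operations.

```python
import shlex

# Verbs this document names as requiring approval. Illustrative, not exhaustive.
APPROVAL_REQUIRED_VERBS = {"apply", "delete", "edit", "exec"}


def needs_approval(command: str) -> bool:
    """Return True if a kubectl command line mentions a verb that needs approval.

    The check is deliberately conservative: it flags the command if ANY token
    matches, so flags placed before the verb (e.g. `-n default`) cannot hide it.
    """
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "kubectl":
        return False  # This sketch only inspects kubectl invocations.
    return any(token in APPROVAL_REQUIRED_VERBS for token in tokens[1:])


if __name__ == "__main__":
    for cmd in (
        "kubectl logs deployment/go-api --tail=100",
        "kubectl -n default delete pod example-pod",
    ):
        print(f"{cmd!r}: needs approval = {needs_approval(cmd)}")
```

Because the check matches any token, it can over-flag (for example, a resource that happens to be named `edit`). For a guard in front of production, that is the safer direction to err in.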
If the user has described the issue, start investigating immediately. Do not ask clarifying questions unless the problem description is genuinely ambiguous.
| Resource | Value |
|---|---|
| GCP Project | eval-prod-485520 |
| GKE Cluster | eval-prod-gke (zone: us-east1-b) |
| Cloud SQL | eval-prod-db (private IP: 10.100.0.3) |
| K8s Namespace | default |
| Deployments | go-api, frontend, executor, centrifugo, redis |
| Domain | eval.delquillan.com |
Ensure GCP project is set and GKE credentials are available:
gcloud config set project eval-prod-485520
gcloud container clusters get-credentials eval-prod-gke --zone us-east1-b
Use the sections below based on the type of issue. Run multiple queries in parallel when possible.
Recent logs from a specific service:
# Live logs from go-api (most recent pod)
kubectl logs deployment/go-api --tail=100
# Logs from a specific time window (use Cloud Logging for historical)
gcloud logging read \
'resource.type="k8s_container" AND resource.labels.container_name="go-api" AND timestamp>="2026-01-01T00:00:00Z" AND timestamp<="2026-01-01T01:00:00Z"' \
--limit=100 --format=json
Filter for errors or specific paths:
# API errors (non-healthcheck)
gcloud logging read \
'resource.type="k8s_container" AND resource.labels.container_name="go-api" AND jsonPayload.status>=400 AND NOT jsonPayload.path="/readyz" AND NOT jsonPayload.path="/healthz"' \
--limit=50 --format=json --freshness=1h
# Specific API path
gcloud logging read \
'resource.type="k8s_container" AND resource.labels.container_name="go-api" AND jsonPayload.path:"/auth/accept-invite"' \
--limit=20 --format=json --freshness=1h
# Frontend logs
kubectl logs deployment/frontend --tail=100
# Executor logs
kubectl logs deployment/executor --tail=100
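The Cloud Logging filters above all share one shape: a resource type, a container name, and optional clauses joined with `AND`. A minimal sketch of a builder for them follows. The function name `build_log_filter` and its parameters are hypothetical; the field names are taken from the commands above.

```python
def build_log_filter(container, start=None, end=None, min_status=None, exclude_paths=()):
    """Assemble a Cloud Logging filter string like the ones used in this document."""
    clauses = [
        'resource.type="k8s_container"',
        f'resource.labels.container_name="{container}"',
    ]
    if start:
        clauses.append(f'timestamp>="{start}"')  # RFC 3339 timestamp, e.g. 2026-01-01T00:00:00Z
    if end:
        clauses.append(f'timestamp<="{end}"')
    if min_status is not None:
        clauses.append(f"jsonPayload.status>={min_status}")
    for path in exclude_paths:
        clauses.append(f'NOT jsonPayload.path="{path}"')  # drop health-check noise
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Reproduces the "API errors (non-healthcheck)" filter.
    print(build_log_filter("go-api", min_status=400, exclude_paths=("/readyz", "/healthz")))
```

The resulting string is meant to be passed as the quoted filter argument to `gcloud logging read`, alongside the same `--limit`, `--format`, and `--freshness` flags shown above.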
Parse structured log output:
The go-api emits JSON logs. Use python or jq to extract fields:
gcloud logging read '<FILTER>' --limit=50 --format=json > /tmp/logs.json
python3 -c "
import json
with open('/tmp/logs.json') as f:
    entries = json.load(f)
for e in entries:
    jp = e.get('jsonPayload', {})
    path = jp.get('path', '')
    # Skip health-check and metrics noise
    if path in ('/readyz', '/healthz', '/metrics'):
        continue
    print(f'{e[\"timestamp\"]}: {jp.get(\"method\",\"\")} {path} status={jp.get(\"status\",\"\")} msg={jp.get(\"msg\",\"\")}')
"
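When there are many entries, a per-path summary can be easier to scan than line-by-line output. The sketch below counts entries by `(path, status)`, assuming the same `jsonPayload` fields as the snippet above. The function name and the sample entries are made up for illustration and are not real production logs.

```python
from collections import Counter

NOISE_PATHS = {"/readyz", "/healthz", "/metrics"}


def count_by_path_and_status(entries):
    """Count log entries per (path, status), skipping health-check and metrics paths."""
    counts = Counter()
    for entry in entries:
        payload = entry.get("jsonPayload", {})
        path = payload.get("path", "")
        if path in NOISE_PATHS:
            continue
        counts[(path, payload.get("status"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample shaped like `gcloud logging read --format=json` output.
    sample = [
        {"jsonPayload": {"path": "/auth/accept-invite", "status": 500}},
        {"jsonPayload": {"path": "/auth/accept-invite", "status": 500}},
        {"jsonPayload": {"path": "/healthz", "status": 200}},
        {"jsonPayload": {"path": "/auth/me", "status": 401}},
    ]
    for (path, status), n in count_by_path_and_status(sample).most_common():
        print(f"{n:4d}  {status}  {path}")
```

To use it on real output, load `/tmp/logs.json` with `json.load` and pass the resulting list to the function.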
# Pod status and restarts
kubectl get pods -n default
# Recent events (scheduling failures, OOM kills, etc.)
kubectl get events -n default --sort-by='.metadata.creationTimestamp' | tail -20
# Resource usage
kubectl top pods -n default
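Restart counts can also be read programmatically from `kubectl get pods -n default -o json`. The sketch below sums `restartCount` across each pod's `containerStatuses`. The function name and the sample pod names are hypothetical.

```python
def restart_counts(pod_list):
    """Map pod name -> total container restarts, given parsed `kubectl get pods -o json` output."""
    result = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        # containerStatuses can be absent while a pod is still being scheduled.
        statuses = pod.get("status", {}).get("containerStatuses", [])
        result[name] = sum(s.get("restartCount", 0) for s in statuses)
    return result


if __name__ == "__main__":
    # Hypothetical sample; real input would come from json.load on the kubectl output.
    sample = {
        "items": [
            {"metadata": {"name": "go-api-example"},
             "status": {"containerStatuses": [{"restartCount": 3}]}},
            {"metadata": {"name": "executor-example"}, "status": {}},
        ]
    }
    for name, restarts in sorted(restart_counts(sample).items(), key=lambda kv: -kv[1]):
        print(f"{restarts:3d}  {name}")
```

A pod with a high restart count is a natural next target for `kubectl logs` or the events listing above.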
Starting the tunnel:
Use the provided proxy script, which creates a socat pod in GKE and port-forwards to localhost:
./scripts/db-proxy.sh # binds to localhost:5433
./scripts/db-proxy.sh 5434 # custom port
The script requires PGPASSWORD. Retrieve it via Terraform:
cd infrastructure/terraform/environments/prod
export PGPASSWORD=$(terraform output -raw cloudsql_database_password)
Or from the Kubernetes secret:
export PGPASSWORD=$(kubectl get secret app-secrets -o jsonpath='{.data.DATABASE_PASSWORD}' | base64 -d)
Connecting:
Always use the read-only reader user for debugging. Only use app if you need write access (which requires user approval).
# Read-only (preferred for debugging)
export PGPASSWORD=$(kubectl get secret app-secrets -o jsonpath='{.data.READER_DATABASE_PASSWORD}' | base64 -d)
psql "host=127.0.0.1 port=5433 dbname=eval user=reader sslmode=require"
# Read-write (only with user approval)
export PGPASSWORD=$(kubectl get secret app-secrets -o jsonpath='{.data.DATABASE_PASSWORD}' | base64 -d)
psql "host=127.0.0.1 port=5433 dbname=eval user=app sslmode=require"
Quick one-off queries (no tunnel needed):
For simple queries, use a temporary pod with the reader user:
# Get reader password
READER_PW=$(kubectl get secret app-secrets -o jsonpath='{.data.READER_DATABASE_PASSWORD}' | base64 -d)
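# Pass sslmode in the connection string; `--set` only defines a psql variable
# and does not affect the connection.
kubectl run psql-tmp --image=postgres:15 --restart=Never --rm -i \
--env="PGPASSWORD=${READER_PW}" \
--command -- psql "host=10.100.0.3 user=reader dbname=eval sslmode=require" \
-c "SELECT ..."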
Common diagnostic queries:
-- List users with their roles (oldest first)
SELECT id, email, role, external_id, namespace_id, created_at FROM users ORDER BY created_at;
-- Check invitations
SELECT id, email, target_role, namespace_id, created_at, consumed_at, revoked_at FROM invitations ORDER BY created_at DESC;
-- Active sessions
SELECT id, class_id, status, created_at FROM sessions WHERE status = 'active';
-- Orphaned references: users whose namespace_id points to a missing namespace
SELECT u.id, u.email, u.external_id FROM users u
WHERE NOT EXISTS (SELECT 1 FROM namespaces n WHERE n.id = u.namespace_id)
AND u.namespace_id IS NOT NULL;
Look up a user by email:
curl -s -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: eval-prod-485520" \
-H "Content-Type: application/json" \
-d '{"email": ["user@example.com"]}' \
"https://identitytoolkit.googleapis.com/v1/projects/eval-prod-485520/accounts:lookup"
Look up a user by Firebase UID:
curl -s -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: eval-prod-485520" \
-H "Content-Type: application/json" \
-d '{"localId": ["<firebase-uid>"]}' \
"https://identitytoolkit.googleapis.com/v1/projects/eval-prod-485520/accounts:lookup"
An empty response (no users field) means the user does not exist in Identity Platform.
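The "no `users` field" rule can be checked in code once the `accounts:lookup` response has been parsed as JSON. A minimal sketch, where `find_uid` is a hypothetical helper name and the sample responses are invented:

```python
def find_uid(lookup_response):
    """Return the first user's localId from an accounts:lookup response, or None if absent."""
    users = lookup_response.get("users") or []  # a missing field means the user does not exist
    if not users:
        return None
    return users[0].get("localId")


if __name__ == "__main__":
    # Hypothetical responses for illustration.
    found = {"users": [{"localId": "example-uid", "email": "user@example.com"}]}
    missing = {}
    print(find_uid(found))
    print(find_uid(missing))
```

The returned value is the Firebase UID, which is what the `external_id` column in the `users` table is expected to hold.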
Check Identity Platform configuration:
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: eval-prod-485520" \
"https://identitytoolkit.googleapis.com/v2/projects/eval-prod-485520/config"
Cross-reference DB and Firebase:
A common failure mode is DB/Firebase user mismatch — the user exists in one but not the other. Always check both sides:
- The user's external_id in the database
- That external_id (= Firebase UID) in Identity Platform
- The /auth/me response after Firebase sign-in

# Deployment status
kubectl get deployments -n default
# ConfigMap values (non-secret)
kubectl get configmap app-config -o yaml
# Secret keys (list only, don't dump values unnecessarily)
kubectl get secret app-secrets -o jsonpath='{.data}' | python3 -c "import json,sys; [print(k) for k in json.loads(sys.stdin.read())]"
# Ingress / service endpoints
kubectl get ingress,svc -n default
After investigating, present your findings to the user.
Wait for user approval before executing any fix that mutates production state.
The approval requirement also covers kubectl exec into production containers for ad-hoc operations.