debug-troubleshooting

Stars: 0

Forks: 0

Last updated: 12 February 2026 at 15:14

Systematic debugging — reproduce, isolate, trace root cause, verify fix. Covers code path tracing, log analysis, binary search for regressions, and hypothesis-driven debugging. Use when: Something is broken and you need to find the root cause in code or configuration. The error is in application logic, a regression was introduced, or behavior doesn't match expectations. Don't use when: The issue is a pod not starting or crashing (use pod-troubleshooting), a Flux reconciliation failure (use flux-debugging), a CI pipeline failure (use ci-diagnosis), or a Ceph/storage issue (use storage-ops). Don't use for code review of proposed changes (use code-review). Outputs: Root cause analysis with specific file:line references, a proposed fix, and verification steps to confirm the fix works.

Source: rajsinghtech/openclaw-workspace

SKILL.md


name	Debug Troubleshooting
description	Systematic debugging — reproduce, isolate, trace root cause, verify fix. Covers code path tracing, log analysis, binary search for regressions, and hypothesis-driven debugging. Use when: Something is broken and you need to find the root cause in code or configuration. The error is in application logic, a regression was introduced, or behavior doesn't match expectations. Don't use when: The issue is a pod not starting or crashing (use pod-troubleshooting), a Flux reconciliation failure (use flux-debugging), a CI pipeline failure (use ci-diagnosis), or a Ceph/storage issue (use storage-ops). Don't use for code review of proposed changes (use code-review). Outputs: Root cause analysis with specific file:line references, a proposed fix, and verification steps to confirm the fix works.
requires	["gh","git"]

Debug Troubleshooting

Routing

Use This Skill When

An error message needs to be traced to its source in code
Behavior changed after a commit and you need to find the regression
Application logic is producing wrong results
You need to understand a code path to propose a fix
Someone says "this used to work" or "it's returning the wrong thing"

Don't Use This Skill When

Pod is in CrashLoopBackOff or ImagePullBackOff → use pod-troubleshooting
Flux kustomization won't reconcile → use flux-debugging
CI build/push failed → use ci-diagnosis
You're reviewing a PR, not debugging live behavior → use code-review
Ceph is unhealthy or PVCs are stuck → use storage-ops
You need to understand cluster architecture → use cluster-context

Approach

1. Reproduce: understand what's failing and under what conditions
2. Isolate: narrow down to the specific component, file, or line
3. Root cause: find the actual bug, not just the symptom
4. Fix: propose a minimal, targeted fix
5. Verify: explain how to confirm the fix works

Code Debugging

Read the error

# Get error from logs
kubectl logs -l app.kubernetes.io/name=openclaw -n openclaw -c openclaw --tail=100

# Get error from CI
gh run view <id> --repo rajsinghtech/<repo> --log-failed
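
When the log tail is noisy, a small filter can surface the lines worth reading first. The sketch below assumes errors are marked with common keywords (error, fatal, panic, exception); the keyword list and the function name `filter_errors` are illustrative assumptions, not something this skill prescribes.

```shell
#!/usr/bin/env sh
# filter_errors: read log text on stdin and print only the lines that look
# like errors, each prefixed with its line number in the input.
# The keyword list is an assumption; adjust it to the application's log format.
filter_errors() {
  # "|| true" keeps the function from failing when nothing matches.
  grep -inE 'error|fatal|panic|exception' || true
}

# Hypothetical usage with the log command shown above:
#   kubectl logs -l app.kubernetes.io/name=openclaw -n openclaw -c openclaw --tail=100 | filter_errors
```

The line numbers make it easier to jump back into the full log for surrounding context once a suspicious line is found.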

Trace the code path

# Clone and search
git clone https://github.com/rajsinghtech/<repo>.git /tmp/debug
cd /tmp/debug

# Find where the error originates
grep -rn "error message text" .
grep -rn "function_name" .

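# For regressions: binary search with git bisect
git bisect start
git bisect bad HEAD
git bisect good <last-known-good-commit>
# Test the commit git checks out, mark it, and repeat until git names the first bad commit
git bisect good   # or: git bisect bad
# Return to the original HEAD when done
git bisect reset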
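
The manual test-and-mark loop can be automated with `git bisect run`, which runs a command at each step and treats exit code 0 as good, 125 as skip, and any other code from 1 to 127 as bad. The sketch below is a self-contained demo: it builds a throwaway repository in which the seventh commit introduces a fake bug, then lets git find it. The file names, commit messages, and the check command are all invented for illustration.

```shell
#!/usr/bin/env sh
set -eu

# Throwaway repository: commits 1-6 are "good", commit 7 introduces the bug.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "debug@example.com"
git config user.name "Debug Demo"

for i in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$i" -ge 7 ]; then echo "fail" > status.txt; else echo "pass" > status.txt; fi
  echo "$i" > counter.txt
  git add -A
  git commit -q -m "commit $i"
done

# The root commit stands in for the last known good commit.
good="$(git rev-list --max-parents=0 HEAD)"

git bisect start HEAD "$good" >/dev/null
# The check exits 0 when status.txt is exactly "pass", non-zero otherwise.
git bisect run sh -c 'grep -qx pass status.txt' >/dev/null

# refs/bisect/bad points at the first bad commit once bisect finishes.
first_bad_subject="$(git log -1 --format=%s refs/bisect/bad)"
git bisect reset >/dev/null 2>&1

echo "first bad commit: $first_bad_subject"
```

In a real repository the `sh -c '...'` check would be replaced by whatever reproduces the regression, such as a single failing test.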

Common patterns

| Symptom | Likely Cause |
| --- | --- |
| `container "main" not found` | Wrong container name; use `openclaw` |
| `EBUSY: resource busy` | Atomic write on ConfigMap subPath mount |
| `manifest invalid` | Pushed via `docker push` instead of `skopeo` |
| `${VAR}` not resolved | Missing `$${}` escaping for Flux postBuild |
| `command not found` | Tool not in Dockerfile or wrong PATH |
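
The table above can also be applied mechanically as a first pass over a log line. The following is a minimal sketch; the function name `likely_cause` and the glob patterns are assumptions about how these messages appear in practice, and a match is only a hint about where to look.

```shell
#!/usr/bin/env sh
# likely_cause: map one log line to the likely cause listed in the table above.
# Patterns are illustrative; real messages may be worded differently.
likely_cause() {
  case "$1" in
    *'container "main" not found'*) echo 'Wrong container name; use openclaw' ;;
    *'EBUSY: resource busy'*)       echo 'Atomic write on ConfigMap subPath mount' ;;
    *'manifest invalid'*)           echo 'Pushed via docker push instead of skopeo' ;;
    *'${'*'}'*)                     echo 'Missing $${} escaping for Flux postBuild' ;;
    *'command not found'*)          echo 'Tool not in Dockerfile or wrong PATH' ;;
    *)                              echo 'No known pattern; trace the code path' ;;
  esac
}

# Example:
#   likely_cause 'Error: EBUSY: resource busy or locked'
```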

Infrastructure Debugging

Follow the chain: Flux source → Kustomization → Deployment → Pod → Container

# Flux source
flux get source git -A | grep openclaw

# Kustomization
flux get kustomization -A | grep openclaw

# Pod
kubectl get pods -n openclaw -o wide
kubectl describe pod -l app.kubernetes.io/name=openclaw -n openclaw

# Container logs
kubectl logs -l app.kubernetes.io/name=openclaw -n openclaw -c openclaw --tail=50
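
Walking the chain in order means the first failing stage is usually the one to investigate. A rough sketch of that idea follows: a helper, hypothetically named `first_failing_stage`, takes `name=command` pairs and reports the first stage whose command exits non-zero. The health checks in the usage comment are simplistic assumptions (they only test that something matching is listed), not a complete readiness test.

```shell
#!/usr/bin/env sh
# first_failing_stage: run "name=command" pairs in order and print the name of
# the first stage whose command exits non-zero, or "none" if every stage passes.
first_failing_stage() {
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first "="
    cmd="${pair#*=}"     # everything after the first "="
    if ! sh -c "$cmd" >/dev/null 2>&1; then
      echo "$name"
      return 1
    fi
  done
  echo "none"
}

# Hypothetical usage against the chain above:
#   first_failing_stage \
#     "flux-source=flux get source git -A | grep -q openclaw" \
#     "kustomization=flux get kustomization -A | grep -q openclaw" \
#     "pod=kubectl get pods -n openclaw -l app.kubernetes.io/name=openclaw | grep -q Running"
```

Stopping at the first failure avoids chasing downstream symptoms that are only consequences of an earlier broken stage.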

Root Cause Analysis Template

## Root Cause Analysis

**Issue:** <one-line description>
**Reported:** <how the issue was discovered>
**Impact:** <what's broken and for whom>

### Timeline
1. <event that triggered the issue>
2. <symptoms observed>
3. <investigation steps taken>

### Root Cause
<specific explanation — file, line, logic error>

### Fix
<minimal change needed — include diff or description>

### Verification
<steps to confirm the fix works>

### Prevention
<what would catch this earlier — test, lint rule, CI check>

Compaction Notes

For long debugging sessions:

mkdir -p /tmp/outputs before writing any artifacts
Write intermediate findings to /tmp/outputs/debug-notes.md as you go
Record hypotheses tested and eliminated — don't re-test after compaction
Commit the root cause analysis once found
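
One way to keep these notes consistent is a pair of small helpers. This is a sketch only: the function names, the status labels, and the line format are assumptions, while the default path is the one named above.

```shell
#!/usr/bin/env sh
# NOTES_FILE defaults to the path named above; override it when experimenting.
NOTES_FILE="${NOTES_FILE:-/tmp/outputs/debug-notes.md}"

# note_hypothesis STATUS TEXT...: append one timestamped line, for example
# "- [eliminated] <UTC timestamp> stale ConfigMap mount", to the notes file.
note_hypothesis() {
  status="$1"; shift
  mkdir -p "$(dirname "$NOTES_FILE")"
  printf -- '- [%s] %s %s\n' "$status" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$NOTES_FILE"
}

# already_tested TEXT: succeed if a hypothesis containing TEXT was recorded.
already_tested() {
  [ -f "$NOTES_FILE" ] && grep -qF -- "$1" "$NOTES_FILE"
}

# Example:
#   note_hypothesis eliminated "stale ConfigMap mount"
#   already_tested "stale ConfigMap mount" && echo "skip, already ruled out"
```

Checking `already_tested` before starting on a hypothesis is what prevents re-testing after compaction.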

Edge Cases

Intermittent failures: Check for race conditions, timing-dependent behavior, resource exhaustion
Works locally, fails in cluster: Check env vars, network policies, volume mounts, DNS resolution
Error only in logs, no user-visible symptom: Still investigate — silent errors become loud failures later
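
For intermittent failures, it helps to measure how often the failure occurs before changing anything, so that a fix can be verified against the same number. A minimal sketch, assuming the failure can be reproduced by a single command; `count_failures` is a hypothetical helper name.

```shell
#!/usr/bin/env sh
# count_failures RUNS CMD...: run CMD the given number of times and print how
# many of those runs exited non-zero.
count_failures() {
  runs="$1"; shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}

# Example (the test command is a placeholder):
#   count_failures 50 ./run-one-test.sh
```

A failure count that changes with load or timing points toward the race conditions and resource exhaustion mentioned above.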
