| name | debug-addon |
| description | Diagnose KubeBlocks addon database failures from user-provided logs, YAML, debug bundles, local/Git source trees, or live read-only Kubernetes evidence. Use when Codex is asked to investigate KubeBlocks addon installation, Helm rendering, Cluster/Component reconciliation, InstanceSet/Pod/PVC/Service failures, database startup logs, config/script/template issues, backup/restore jobs, custom OpsDefinitions, KubeBlocks 1.0 behavior, or to collect a narrow read-only kubectl evidence bundle. |
Debug Addon
Investigate KubeBlocks addon incidents progressively: identify the boundary of evidence, collect or index only what is needed, classify the failure layer, then report a sourced diagnosis. Write the final report in Chinese unless the user asks for another language.
Non-Negotiable Rules
- Keep Kubernetes access read-only. Do not run kubectl exec, attach, cp, port-forward, apply, patch, delete, replace, scale, rollout commands, debug pods, or database mutation commands.
- Start narrow. Use the user symptom, bundle summary, events, status conditions, recent log tails, source-map summaries, and targeted searches before loading full files.
- Preserve provenance. Tie every important claim to a user artifact, path, command output, Kubernetes resource, log line, or source location.
- Separate fact from hypothesis. State confidence as high, medium, or low, and give the next validation step.
- Ask for only the single highest-impact missing artifact when evidence is insufficient.
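The read-only rule above can be sketched as a small guard function. This is a minimal illustration, not part of the bundled helper: the function name kubectl_verb_allowed is hypothetical, "debug pods" is read here as the kubectl debug verb (an assumption), and the deny list only mirrors the verbs named above. The authoritative allow/deny policy remains references/kubernetes-readonly-policy.md.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeeds (exit 0) when the kubectl verb is NOT one of the
# verbs forbidden by the read-only rule, and fails (exit 1) when it is.
# A deny list is only a first filter; verbs not listed here may still mutate state.
kubectl_verb_allowed() {
  case "${1:-}" in
    exec|attach|cp|port-forward|apply|patch|delete|replace|scale|rollout|debug)
      return 1 ;;
    "")
      return 1 ;;  # no verb given: refuse rather than guess
    *)
      return 0 ;;
  esac
}
```

A caller would check the verb before running anything, for example `kubectl_verb_allowed get && kubectl get pods -n <ns>`.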
Bundled Resources
scripts/kb-agent.sh: portable wrapper for the Go helper CLI. It prefers $KB_ADDON_DEBUG_AGENT_BIN, then kb-addon-debug-agent on PATH, then the bundled source under tools/kb-addon-debug-agent, with $KB_ADDON_DEBUG_AGENT_REPO only as an explicit development override.
references/bundle-layout.md: supported bundle/archive inputs and progressive artifact review.
references/kubernetes-readonly-policy.md: allowed/disallowed live Kubernetes commands and confirmation rules.
references/investigation-checklist.md: layer-by-layer investigation checklist.
references/failure-taxonomy.md: symptom classes and targeted search terms.
references/addons/kubeblocks-addon-model.md: addon architecture, chart/source layout, KubeBlocks definition resources, scripts, parameters, backup/restore, and ops resources.
references/addons/addon-debugging-guide.md: concrete addon incident playbook and command patterns.
references/kubeblocks/kubeblocks-architecture-0.9.md: KubeBlocks 0.9 controller/API behavior. Read only when version evidence indicates 0.9 or 0.9.x.
references/kubeblocks/kubeblocks-architecture-1.0.md: KubeBlocks 1.0 controller/API behavior. Read only when version evidence indicates 1.0 or 1.0.x.
references/kubeblocks/kubeblocks-architecture-1.1.md: KubeBlocks 1.1 controller/API behavior. Read only when version evidence indicates 1.1 or 1.1.x.
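The version-gated references above amount to a lookup from observed KubeBlocks version to one architecture file. A minimal sketch, assuming a hypothetical function name kb_arch_reference and assuming version strings may carry a leading "v" (as in image tags); versions outside 0.9, 1.0, and 1.1 deliberately return nothing:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the architecture reference that matches a
# KubeBlocks version such as 1.0, 1.0.3, or v0.9.2. Fails for other versions.
kb_arch_reference() {
  local version="${1#v}"  # drop an optional leading "v" (assumed tag style)
  case "$version" in
    0.9|0.9.*) echo "references/kubeblocks/kubeblocks-architecture-0.9.md" ;;
    1.0|1.0.*) echo "references/kubeblocks/kubeblocks-architecture-1.0.md" ;;
    1.1|1.1.*) echo "references/kubeblocks/kubeblocks-architecture-1.1.md" ;;
    *) return 1 ;;        # unknown release family: read no architecture file
  esac
}
```

The patterns require a dot after the minor number, so a version like 1.10.2 is not mistaken for 1.1.x.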
Entry Modes
Choose exactly one mode first:
- Provided-evidence mode: use when the user supplied logs, YAML, bundles, local directories, archives, source paths, screenshots converted to text, or says cluster access is unavailable.
- Live-cluster mode: use when the user explicitly asks Codex to inspect a cluster, provides kubeconfig/kubectl context, or gives only a symptom plus namespace/resource context.
Capture or infer the failing operation, addon/database engine, namespace, Cluster/Component/Pod/Backup/Restore names, time window, recent change, KubeBlocks version, addon chart/source version, and whether live cluster access is available.
Provided-Evidence Workflow
- If there is a bundle, archive, or evidence directory, index it:
  skills/debug-addon/scripts/kb-agent.sh bundle index --input <bundle-or-dir> --output <workdir>/bundle-index
- Read <workdir>/bundle-index/summary.md, then search for exact errors, resource names, pod names, condition reasons, and timestamps:
  skills/debug-addon/scripts/kb-agent.sh bundle search --index <workdir>/bundle-index --query "<error-or-resource>" --format markdown
- Read references/bundle-layout.md when the artifact shape is unclear or the bundle is large.
- Before reading source, identify the failing addon and release family from evidence. Prefer Addon CR names/status, Helm release names like kb-addon-<addon>, chart labels/annotations, Chart.yaml, ComponentDefinition/ComponentVersion names, Cluster clusterDef, pod labels, install Job logs, or user-provided addon/version notes.
- Treat KubeBlocks source as low priority. Use references/kubeblocks/*.md first for controller/API architecture. Read KubeBlocks source only when addon evidence and references are insufficient and the evidence points to KubeBlocks controller internals, API schema mismatch, or a likely KubeBlocks regression.
- Map only the relevant addon source. Do not map a whole addon monorepo as the analysis target. Use one of:
  skills/debug-addon/scripts/kb-agent.sh source map --addon-name <addon> --addon-version <0.9|1.0|1.1> --output <workdir>/source-map
  skills/debug-addon/scripts/kb-agent.sh source map --addon <addon-dir-or-repo> --addon-name <addon> --addon-version <0.9|1.0|1.1> --output <workdir>/source-map
  The helper tries https://github.com/apecloud/kubeblocks-addons.git and https://github.com/apecloud/apecloud-addons.git on the matching release-<major>.<minor> branch, then falls back to local search roots. If Git/network access fails, provide --local-search <repo-root> or --addon <local-addon-dir-or-repo>.
- Pass --kubeblocks <path-or-git-url> only for the low-priority KubeBlocks-source cases described above.
- Read <workdir>/source-map/source-map.md first. Inspect only files from the selected addon path tagged as addon manifests, templates, scripts, config, backup/restore, ops, or files matching the active symptom.
- If no structured bundle exists, build the evidence chain from the user text, then ask for the single artifact most likely to confirm or disprove the leading hypothesis.
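Two pieces of the workflow above are simple string derivations: the addon name from a Helm release named kb-addon-<addon>, and the release family (the value for --addon-version, and the suffix of the release-<major>.<minor> branch) from a full version. The sketch below shows one way to compute them. The function names are hypothetical, and handling of a leading "v" or a pre-release suffix is an assumption, not documented helper behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: "v1.0.3" or "1.0.3" -> "1.0". Fails on non-version input.
kb_release_family() {
  local version="${1#v}"
  case "$version" in
    [0-9]*.[0-9]*) ;;        # needs at least <digit>...<dot><digit>
    *) return 1 ;;
  esac
  local major="${version%%.*}"     # text before the first dot
  local rest="${version#*.}"       # text after the first dot
  local minor="${rest%%[!0-9]*}"   # leading digits of the remainder
  echo "${major}.${minor}"
}

# Hypothetical helper: "0.9.2" -> "release-0.9" (branch naming from the text above).
kb_release_branch() {
  local family
  family="$(kb_release_family "$1")" || return 1
  echo "release-${family}"
}

# Hypothetical helper: "kb-addon-mysql" -> "mysql". Fails for other release names.
kb_addon_from_helm_release() {
  case "${1:-}" in
    kb-addon-?*) echo "${1#kb-addon-}" ;;
    *) return 1 ;;
  esac
}
```

The results can then feed the documented command, for example `--addon-name "$(kb_addon_from_helm_release kb-addon-mysql)" --addon-version "$(kb_release_family 1.0.3)"`. Evidence from a single source should still be cross-checked against the other version signals listed above.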
Live-Cluster Workflow
- Read references/kubernetes-readonly-policy.md before any live collection.
- Derive the narrowest namespace and selector from the user input. Prefer Cluster name, Component name, pod name, job name, Backup/Restore name, or KubeBlocks labels over broad namespace collection.
- Generate the collection plan first:
  skills/debug-addon/scripts/kb-agent.sh k8s collect --namespace <ns> --selector <selector> --pod <pod> --job <job> --output <workdir>/k8s --dry-run
  Omit unknown scope flags.
- If the user explicitly requested live investigation and the scope is narrow, execute the read-only collection with --confirm using the same scope flags. If no selector/resource scope exists, ask before broad collection; after approval use --confirm --confirm-broad.
- Read <workdir>/k8s/collection-results.md, then index the collected directory and continue as provided-evidence mode:
  skills/debug-addon/scripts/kb-agent.sh bundle index --input <workdir>/k8s --output <workdir>/bundle-index
- If live collection fails because kubeconfig, permissions, or connectivity are unavailable, fall back to provided-evidence mode and request the minimal missing artifact.
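The plan-then-confirm sequence above can be sketched as a function that assembles the collect command line from whatever scope is known. It only prints the command and uses only the flags shown in this section. The function name and argument order are hypothetical, and treating "no selector, pod, or job" as the broad case is an interpretation of the rule above, so user approval is still required before running a broad command.

```shell
#!/usr/bin/env bash
# Hypothetical builder. Usage: kb_collect_command <plan|run> <output-dir> [ns] [selector] [pod] [job]
# Pass "" for any unknown scope value; its flag is then omitted.
kb_collect_command() {
  local mode="${1:-plan}" out="${2:-}" ns="${3:-}" selector="${4:-}" pod="${5:-}" job="${6:-}"
  local -a args=(skills/debug-addon/scripts/kb-agent.sh k8s collect)
  if [ -n "$ns" ]; then args+=(--namespace "$ns"); fi
  if [ -n "$selector" ]; then args+=(--selector "$selector"); fi
  if [ -n "$pod" ]; then args+=(--pod "$pod"); fi
  if [ -n "$job" ]; then args+=(--job "$job"); fi
  args+=(--output "$out")
  if [ "$mode" = "run" ]; then
    args+=(--confirm)
    # No selector/resource scope: broad collection, which needs prior approval.
    if [ -z "$selector$pod$job" ]; then args+=(--confirm-broad); fi
  else
    args+=(--dry-run)
  fi
  echo "${args[*]}"
}
```

Calling it first with plan and then with run keeps the scope flags identical between the dry run and the confirmed collection.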
Analysis Flow
- For addon architecture, source layout, Helm rendering, definitions, scripts, parameters, backup/restore, or custom ops, read references/addons/kubeblocks-addon-model.md.
- For a concrete addon incident, read references/addons/addon-debugging-guide.md after the addon model and before loading large source files or logs.
- Identify addon name and release family before source mapping. If the release family is missing, infer it from installed KubeBlocks/addon version evidence or ask for that one missing detail.
- If KubeBlocks version evidence is 0.9, 1.0, 1.1, or a matching patch release, read the corresponding references/kubeblocks/kubeblocks-architecture-<major>.<minor>.md before analyzing controller/API behavior. Version evidence can come from chart/appVersion, controller image tags, kbcli version, bundle metadata, source branches, or installed CRD versions.
- Use references/failure-taxonomy.md to classify the symptom: install/render, reconciliation, scheduling/images, probes/startup, config render, backup/restore, replication/failover, storage, network/auth.
- Use references/investigation-checklist.md to avoid jumping from a single log line to a root cause.
- For dataprotection Backup/Restore failures, always inspect the owning Backup/Restore, Job template, selected target pod, BackupPolicy, BackupRepo, pod nodeSelector, and current node labels. If the action is node-pinned, compare kubernetes.io/hostname in the pod/job template with node labels.
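The node-pinning comparison in the last item can be sketched as a string check on evidence that was already collected. This assumes the node labels are available as one comma-separated key=value string, the form shown in the LABELS column of the read-only `kubectl get node <node> --show-labels`. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds when the node's label string contains
# kubernetes.io/hostname=<pinned>, where <pinned> is the value taken from the
# pod/job template nodeSelector. Both arguments come from collected evidence.
node_matches_pinned_hostname() {
  local pinned="$1" labels="$2"
  # Wrap in commas so the match covers one whole key=value entry,
  # not a prefix of a longer hostname.
  case ",${labels}," in
    *",kubernetes.io/hostname=${pinned},"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A mismatch for every current node is consistent with a node-pinned action that can no longer be scheduled. It is still a hypothesis until the pod or Job events confirm a scheduling failure.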
Report Format
Use this structure unless the user asks for something else:
## 结论
## 影响范围
## 证据链
## 可能根因
## 置信度
## 下一步验证命令
## 修复建议
## 未决问题
Prefer concrete commands, file paths, resource names, and timestamped evidence over broad advice.