OpenClaw Troubleshooting and Diagnostics
Stars: 45 · Forks: 12 · Updated: March 3, 2026, 03:28
| Field | Value |
|---|---|
| name | troubleshooting |
| description | OpenClaw troubleshooting and diagnostics |
| version | 1.0.0 |
| author | terminal-skills |
| tags | ["openclaw","troubleshooting","debug","diagnosis","monitoring"] |
Troubleshooting approaches, diagnostic commands, and fixes for common OpenClaw problems.
```shell
# API health check
curl -s http://localhost:8080/api/health | jq .
# Detailed health information
curl -s http://localhost:8080/api/health/detail | jq .
# Component status
curl -s http://localhost:8080/api/status | jq .
# Example response
# {
#   "status": "UP",
#   "components": {
#     "db": { "status": "UP" },
#     "redis": { "status": "UP" },
#     "scheduler": { "status": "UP" },
#     "workers": { "status": "UP", "count": 3 }
#   }
# }
```
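After a (re)start it can help to poll the health endpoint instead of re-running `curl` by hand. The helper below is a minimal sketch: `wait_for_health` is a hypothetical name, and it assumes `/api/health` answers HTTP 200 only when the service is healthy. If the endpoint returns 200 even when a component is DOWN, check the `status` field with `jq` instead.

```shell
# Poll a health URL until it returns HTTP 200 or the retry budget runs out.
# Usage: wait_for_health URL [RETRIES] [INTERVAL_SECONDS]
# Example: wait_for_health http://localhost:8080/api/health 30 2
wait_for_health() {
    local url=$1 retries=${2:-30} interval=${3:-2} i=0 code
    while [ "$i" -lt "$retries" ]; do
        i=$((i + 1))
        # -o /dev/null discards the body; -w prints only the status code ("000" if no connection)
        code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")
        if [ "$code" = "200" ]; then
            echo "healthy after $i attempt(s)"
            return 0
        fi
        echo "attempt $i/$retries: HTTP ${code:-none}" >&2
        # Sleep only if another attempt follows
        if [ "$i" -lt "$retries" ]; then
            sleep "$interval"
        fi
    done
    return 1
}
```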
```shell
# Cluster information
curl -s http://localhost:8080/api/cluster/info | jq .
# Node list
curl -s http://localhost:8080/api/cluster/nodes | jq .
# Worker status
curl -s http://localhost:8080/api/workers | jq .
# List only ONLINE workers
curl -s http://localhost:8080/api/workers | jq '.[] | select(.status == "ONLINE")'
```
```shell
# Server logs
tail -f /opt/openclaw/logs/openclaw-server.log
tail -f /opt/openclaw/logs/error.log
# Worker logs
tail -f /opt/openclaw/logs/openclaw-worker.log
# Task execution logs
ls -la /opt/openclaw/logs/tasks/
# When running in Docker
docker logs -f openclaw-server
docker logs -f --tail 100 openclaw-worker
```
```shell
# Search for errors
grep -i "error\|exception\|failed" /opt/openclaw/logs/openclaw-server.log
# Search for a specific task
grep "taskId=12345" /opt/openclaw/logs/openclaw-server.log
# Search a specific time window (here 10:00 to 10:59 on 2024-01-15)
grep "2024-01-15 10:" /opt/openclaw/logs/openclaw-server.log
# Count occurrences per exception class, most frequent first
grep -oP '[\w.$]+(Exception|Error)\b' /opt/openclaw/logs/error.log | sort | uniq -c | sort -rn
# Watch errors in real time
tail -f /opt/openclaw/logs/openclaw-server.log | grep -i --color "error\|exception"
```
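To see whether errors cluster around a particular moment (a deploy, a database failover), counting matching lines per minute is often more telling than reading them one by one. This sketch assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, which matches the time-window `grep` above but is not confirmed for OpenClaw; `errors_per_minute` is a hypothetical helper.

```shell
# Count lines mentioning "error" or "exception" (case-insensitive) per minute.
# Assumes lines start with "YYYY-MM-DD HH:MM:SS..."; reads stdin if no file is given.
# Example: errors_per_minute /opt/openclaw/logs/openclaw-server.log | tail -20
errors_per_minute() {
    awk '
        tolower($0) ~ /error|exception/ && $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
            # $2 looks like "10:23:45.123"; keep only "HH:MM"
            count[$1 " " substr($2, 1, 5)]++
        }
        END { for (k in count) print k, count[k] }
    ' "${1:--}" | sort
}
```

Stack-trace continuation lines (`at ...`) carry no timestamp and are skipped, so each logged error is counted roughly once.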
```shell
# Change a logger's level at runtime
curl -X POST http://localhost:8080/api/admin/logging/level \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -d '{"logger": "com.openclaw.scheduler", "level": "DEBUG"}'
# Show the current log levels
curl http://localhost:8080/api/admin/logging/level \
  -H "Authorization: Bearer ${TOKEN}"
```
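```shell
# Test the database connection
mysql -h localhost -u openclaw -p -e "SELECT 1"
# Check the number of connections
mysql -u root -p -e "SHOW STATUS LIKE 'Threads_connected'"
mysql -u root -p -e "SHOW PROCESSLIST"
# Check the application's connection pool
curl -s http://localhost:8080/api/admin/datasource/pool | jq .
# List active (non-sleeping) statements to spot slow queries
mysql -u root -p -e "SHOW FULL PROCESSLIST" | grep -v Sleep
# Slow query log (the path depends on slow_query_log_file)
tail -f /var/log/mysql/slow.log
```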
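`grep -v Sleep` shows every active statement but not how long each has been running. In batch mode (`-B -N`) the `mysql` client prints tab-separated rows in the `SHOW FULL PROCESSLIST` column order (Id, User, Host, db, Command, Time, State, Info), so a small `awk` filter can keep only statements running longer than a threshold. `long_queries` is a hypothetical helper name.

```shell
# Print "Id<TAB>Time<TAB>Info" for non-sleeping connections whose Time exceeds
# the threshold in seconds (default 10).
# Example: mysql -u root -p -B -N -e "SHOW FULL PROCESSLIST" | long_queries 30
long_queries() {
    local threshold=${1:-10}
    # Columns: 1 Id, 2 User, 3 Host, 4 db, 5 Command, 6 Time, 7 State, 8 Info
    awk -F '\t' -v t="$threshold" '$5 != "Sleep" && $6 + 0 > t + 0 { print $1 "\t" $6 "s\t" $8 }'
}
```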
```shell
# Check lock waits. information_schema.INNODB_LOCK_WAITS and INNODB_LOCKS were
# removed in MySQL 8.0; the sys schema view works on both 5.7 and 8.0
mysql -u root -p -e "SELECT * FROM sys.innodb_lock_waits"
# Check table status
mysql -u root -p openclaw -e "SHOW TABLE STATUS"
# Inspect a query plan
mysql -u root -p openclaw -e "EXPLAIN SELECT * FROM task WHERE status = 'RUNNING'"
# Count tasks by status
mysql -u root -p openclaw -e "
SELECT status, COUNT(*) AS count
FROM task
GROUP BY status
"
# Find orphaned tasks (still RUNNING on a worker that is gone or offline)
mysql -u root -p openclaw -e "
SELECT t.* FROM task t
LEFT JOIN worker w ON t.worker_id = w.id
WHERE t.status = 'RUNNING' AND (w.id IS NULL OR w.status != 'ONLINE')
"
```
```shell
# Test the Redis connection
redis-cli -h localhost -p 6379 ping
# Client connection info
redis-cli -h localhost -p 6379 info clients
# Memory usage
redis-cli -h localhost -p 6379 info memory
# Slow log (last 10 entries)
redis-cli -h localhost -p 6379 slowlog get 10
# List OpenClaw keys; --scan iterates with SCAN instead of the blocking KEYS command
redis-cli -h localhost -p 6379 --scan --pattern "openclaw:*"
# Inspect specific keys
redis-cli -h localhost -p 6379 get "openclaw:task:12345"
redis-cli -h localhost -p 6379 hgetall "openclaw:worker:worker-1"
# Queue length
redis-cli -h localhost -p 6379 llen "openclaw:task:queue:default"
# Memory analysis (DEBUG is disabled by default since Redis 7.0; see enable-debug-command)
redis-cli -h localhost -p 6379 memory usage "openclaw:task:12345"
redis-cli -h localhost -p 6379 debug object "openclaw:task:12345"
```
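Once `--scan` returns the key names, grouping them by prefix shows which kind of OpenClaw key dominates the keyspace. The `openclaw:<type>:...` layout is inferred from the key names above, and `key_prefix_counts` is a hypothetical helper.

```shell
# Group key names (one per line on stdin) by their first two ":"-separated
# segments, e.g. "openclaw:task:12345" -> "openclaw:task"; largest group first.
# Example: redis-cli -h localhost -p 6379 --scan --pattern "openclaw:*" | key_prefix_counts
key_prefix_counts() {
    awk -F ':' '
        { key = (NF >= 2) ? ($1 ":" $2) : $0; c[key]++ }
        END { for (p in c) print c[p], p }
    ' | sort -k1,1nr -k2,2
}
```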
```shell
# Task details
curl -s http://localhost:8080/api/tasks/12345 | jq .
# Execution history
curl -s http://localhost:8080/api/tasks/12345/executions | jq .
# Failed tasks
curl -s "http://localhost:8080/api/tasks?status=FAILED&limit=10" | jq .
# Timed-out tasks
curl -s "http://localhost:8080/api/tasks?status=TIMEOUT&limit=10" | jq .
# Task execution log via the API
curl -s http://localhost:8080/api/tasks/12345/log | jq .
# Local task log file
cat /opt/openclaw/logs/tasks/12345.log
# Follow the log of a running task
tail -f /opt/openclaw/logs/tasks/12345.log
```
```shell
# Possible reasons a task is not being dispatched
# 1. No worker is available (empty output means none is ONLINE)
curl -s http://localhost:8080/api/workers | jq '.[] | select(.status == "ONLINE")'
# 2. Workers are at capacity
curl -s http://localhost:8080/api/workers | jq '.[] | {name, runningTasks, maxTasks}'
# 3. The task queue is blocked
curl -s http://localhost:8080/api/admin/queue/status | jq .
# 4. No worker serves the task's group
GROUP=$(curl -s http://localhost:8080/api/tasks/12345 | jq -r '.group')
curl -s http://localhost:8080/api/workers | jq --arg g "$GROUP" '.[] | select(.group == $g)'
```
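Checks 1 and 2 can be folded into one summary. The sketch below assumes the `/api/workers` objects expose `name`, `status`, `runningTasks`, and `maxTasks` (the fields queried above) and that `jq` flattens them into tab-separated lines; `free_capacity` is a hypothetical helper that returns non-zero when no ONLINE worker has a free slot.

```shell
# Input: "name<TAB>status<TAB>runningTasks<TAB>maxTasks" lines.
# Prints each full ONLINE worker, then a summary; exit status 1 if no slot is free.
# Example:
#   curl -s http://localhost:8080/api/workers \
#     | jq -r '.[] | [.name, .status, .runningTasks, .maxTasks] | @tsv' \
#     | free_capacity
free_capacity() {
    awk -F '\t' '
        $2 == "ONLINE" {
            online++
            free = $4 - $3
            if (free > 0) { slots += free; avail++ }
            else print "FULL: " $1 " (" $3 "/" $4 ")"
        }
        END {
            printf "online=%d with_free_slots=%d free_slots=%d\n", online, avail, slots
            exit (avail > 0 ? 0 : 1)
        }'
}
```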
```shell
# Check the task's timeout setting
curl -s http://localhost:8080/api/tasks/12345 | jq '.timeout'
# Compare it with the actual execution times
curl -s http://localhost:8080/api/tasks/12345/executions | jq '.[] | {startTime, endTime, duration}'
# Raise the timeout if needed (7200000 = 2 hours, assuming milliseconds)
curl -X PUT http://localhost:8080/api/tasks/12345 \
  -H "Content-Type: application/json" \
  -d '{"timeout": 7200000}'
```
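Timeout and duration values are easier to compare in hours and minutes. These helpers assume both fields are in milliseconds, as the `7200000` example suggests; `ms_to_hms` and `timeout_usage_pct` are hypothetical names. A run that regularly uses most of its timeout is a candidate for a higher limit, while one that fails far below it usually has a different cause.

```shell
# Convert milliseconds to HH:MM:SS.
# Example: ms_to_hms "$(curl -s http://localhost:8080/api/tasks/12345 | jq '.timeout')"
ms_to_hms() {
    local s=$(( $1 / 1000 ))
    printf '%02d:%02d:%02d\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

# Percentage of the timeout ($1, ms) consumed by a run lasting $2 ms (integer, rounded down).
timeout_usage_pct() {
    echo $(( $2 * 100 / $1 ))
}
```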
```shell
# Show why the latest execution failed
curl -s http://localhost:8080/api/tasks/12345/executions | jq '.[-1] | {status, errorMessage, errorStack}'
# Count execution attempts (including retries)
curl -s http://localhost:8080/api/tasks/12345/executions | jq 'length'
# Retry the task manually
curl -X POST http://localhost:8080/api/tasks/12345/retry
```
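```shell
# Check the worker log for registration/connection messages
tail -f /opt/openclaw/logs/openclaw-worker.log | grep -i "register\|connect"
# Check network connectivity to the server's port 9090
telnet openclaw-server 9090
nc -zv openclaw-server 9090
# List gRPC services (run on the server; requires gRPC server reflection)
grpcurl -plaintext localhost:9090 list
# Check the server-side log
grep "worker" /opt/openclaw/logs/openclaw-server.log | tail -50
```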
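Minimal hosts often lack both `telnet` and `nc`. Bash can open a TCP connection through its `/dev/tcp/HOST/PORT` pseudo-path, and coreutils `timeout` keeps the check from hanging on filtered ports. `port_open` is a hypothetical helper; it only proves that the TCP handshake succeeds, not that the service behind the port is healthy.

```shell
# Return 0 if a TCP connection to HOST:PORT opens within SECS seconds (default 3).
# Needs bash (for /dev/tcp) and coreutils timeout.
# Example: port_open openclaw-server 9090 && echo reachable || echo unreachable
port_open() {
    local host=$1 port=$2 secs=${3:-3}
    # $1/$2 inside the inner script are the host and port passed after "_"
    timeout "$secs" bash -c ': > "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```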
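```shell
# Check the heartbeat settings
grep "heartbeat" /opt/openclaw/conf/worker.yml
# Check network latency
ping -c 10 openclaw-server
# Check system resources (pgrep -d, joins multiple PIDs for top -p)
top -p "$(pgrep -d, -f openclaw-worker)"
free -h
df -h
# Check JVM GC statistics (jstat takes exactly one PID; -o picks the oldest match)
jstat -gc "$(pgrep -o -f openclaw-worker)"
```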
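`df -h` output is easy to skim but awkward to alert on. A sketch that flags filesystems at or above a usage threshold by reading POSIX-format `df -P` output, where column 5 is the capacity percentage and column 6 the mount point. `disk_over` is a hypothetical name, and mount points containing spaces are not handled.

```shell
# Print "mountpoint usage%" for filesystems at or above the threshold (default 90).
# Example: df -P | disk_over 85
disk_over() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)   # "95%" -> "95"
            if (pct + 0 >= limit + 0) print $6, $5
        }'
}
```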
```shell
# Load on each worker
curl -s http://localhost:8080/api/workers | jq '.[] | {name, runningTasks, completedTasks}'
# Scheduling strategy
curl -s http://localhost:8080/api/admin/config | jq '.scheduler.strategy'
# Worker weights
curl -s http://localhost:8080/api/workers | jq '.[] | {name, weight}'
```
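To judge whether the distribution is actually skewed, compare each worker's `completedTasks` with the average. A sketch assuming the `jq` projection in the usage comment; `load_skew` is a hypothetical helper. With equal weights, a max/mean ratio well above 1 suggests the strategy or the group assignment is sending work unevenly.

```shell
# Input: "name<TAB>count" lines. Prints min, max, mean and the max/mean ratio.
# Example:
#   curl -s http://localhost:8080/api/workers \
#     | jq -r '.[] | [.name, .completedTasks] | @tsv' | load_skew
load_skew() {
    awk -F '\t' '
        {
            v = $2 + 0
            n++; sum += v
            if (n == 1 || v > max) { max = v; maxw = $1 }
            if (n == 1 || v < min) { min = v; minw = $1 }
        }
        END {
            if (n == 0 || sum == 0) { print "no data"; exit 1 }
            mean = sum / n
            printf "min=%d (%s) max=%d (%s) mean=%.1f max/mean=%.2f\n", min, minw, max, maxw, mean, max / mean
        }'
}
```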
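```shell
# Per-thread CPU usage of the server JVM
# (pgrep -o picks the oldest match, so each command gets exactly one PID)
top -H -p "$(pgrep -o -f openclaw-server)"
# Thread dump
jstack "$(pgrep -o -f openclaw-server)" > thread_dump.txt
# Show RUNNABLE threads (the likely CPU consumers)
jstack "$(pgrep -o -f openclaw-server)" | grep -A 30 "RUNNABLE"
# Thread pool status
curl -s http://localhost:8080/api/admin/threadpool | jq .
```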
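Grepping every RUNNABLE thread is noisy. A more targeted approach is to take the busiest thread ID from `top -H` and find that thread in the dump. HotSpot dumps print the native thread ID as `nid=`, in hexadecimal on JDK 8 through at least 17; newer JDKs may print it in decimal, so the helper below matches both forms. `tid_to_nid` and `thread_by_tid` are hypothetical names, and blank lines are assumed to separate thread blocks, as in standard `jstack` output.

```shell
# Convert a decimal thread ID (PID column of `top -H`) to the hex nid form.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Print the stack of the thread whose header contains nid=<hex> or nid=<decimal>.
# Example:
#   top -H -p "$(pgrep -o -f openclaw-server)"   # note the busiest thread, e.g. 6699
#   jstack "$(pgrep -o -f openclaw-server)" | thread_by_tid 6699
thread_by_tid() {
    local hex
    hex=$(printf '0x%x' "$1")
    awk -v a="nid=$hex " -v b="nid=$1 " '
        /^"/ { show = (index($0, a) > 0 || index($0, b) > 0) }   # thread header line
        show && NF == 0 { exit }                                 # blank line ends the block
        show { print }
    '
}
```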
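```shell
# JVM memory/GC statistics, sampled every 1000 ms
jstat -gc "$(pgrep -o -f openclaw-server)" 1000
# Heap dump (pauses the JVM; the file can be as large as the used heap)
jmap -dump:format=b,file=heapdump.hprof "$(pgrep -o -f openclaw-server)"
# Class histogram: look for classes whose instance counts keep growing (possible leak)
jmap -histo "$(pgrep -o -f openclaw-server)" | head -30
# GC log analysis
grep "GC" /opt/openclaw/logs/gc.log | tail -50
```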
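For a quick check or an alert, the old-generation occupancy is usually the most telling `jstat` number: if it stays near 100% even right after full GCs, the heap is too small or something is leaking. `jstat -gcutil` reports it in the `O` column; because the column set varies between JDK versions, the sketch locates `O` by its header name. `old_gen_pct` is a hypothetical helper.

```shell
# Print the old-gen occupancy (%) from the last sample of `jstat -gcutil` output.
# Example: jstat -gcutil "$(pgrep -o -f openclaw-server)" 1000 5 | old_gen_pct
old_gen_pct() {
    awk '
        $1 == "S0" { col = 0; for (i = 1; i <= NF; i++) if ($i == "O") col = i; next }
        col && NF { last = $col }
        END { if (last == "") exit 1; print last }
    '
}
```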
```shell
# Connection counts
ss -s
netstat -an | grep ':8080 ' | wc -l
# TIME_WAIT count
netstat -an | grep TIME_WAIT | wc -l
# HTTP latency breakdown
curl -o /dev/null -s -w "Connect: %{time_connect}s\nTTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n" http://localhost:8080/api/health
```
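`ss -s` gives totals but no per-state breakdown for a single port. Counting the first column of `ss -tan` output provides that; `tcp_states` is a hypothetical helper, and the `( sport = :8080 )` filter in the usage comment restricts it to the server port.

```shell
# Count TCP sockets per state from `ss -tan` output (column 1 is State).
# Example: ss -tan '( sport = :8080 )' | tcp_states
tcp_states() {
    awk 'NR > 1 { c[$1]++ } END { for (s in c) print s, c[s] }' | sort
}
```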
```shell
# Current leader node
curl -s http://localhost:8080/api/cluster/leader | jq .
# Leader election log entries
grep "leader\|election" /opt/openclaw/logs/openclaw-server.log
# Force a new election (use with caution)
curl -X POST http://localhost:8080/api/admin/cluster/reelect \
  -H "Authorization: Bearer ${TOKEN}"
# Node synchronization status
curl -s http://localhost:8080/api/cluster/sync/status | jq .
# Trigger synchronization manually
curl -X POST http://localhost:8080/api/admin/cluster/sync \
  -H "Authorization: Bearer ${TOKEN}"
```
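With several server nodes, every node should report the same leader; differing answers point to a split brain or a node that has fallen behind. This sketch only compares the answers. The node names and the `leaderId` field in the usage comment are assumptions, because the response format of `/api/cluster/leader` is not shown here; `leader_consensus` is a hypothetical helper that returns non-zero unless all nodes agree.

```shell
# Input: "node<TAB>reported_leader" lines. Prints which nodes report each leader;
# exit status 0 only if exactly one distinct leader was reported.
# Example (hypothetical hostnames and field name):
#   for node in node1 node2 node3; do
#     printf '%s\t%s\n' "$node" \
#       "$(curl -s "http://$node:8080/api/cluster/leader" | jq -r '.leaderId')"
#   done | leader_consensus
leader_consensus() {
    awk -F '\t' '
        {
            if (!($2 in cnt)) distinct++
            cnt[$2]++
            seen[$2] = seen[$2] " " $1
        }
        END {
            for (l in cnt) print "leader " l ":" seen[l]
            exit (distinct == 1 ? 0 : 1)
        }'
}
```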
```shell
# Run diagnostics
curl -s http://localhost:8080/api/admin/diagnose | jq .
# Save a diagnostic report
curl -s http://localhost:8080/api/admin/diagnose/report > diagnose_report.json
# Export system information
curl -s http://localhost:8080/api/admin/system/info > system_info.json
# Prometheus metrics
curl -s http://localhost:8080/actuator/prometheus
# Key metrics
curl -s http://localhost:8080/actuator/metrics/openclaw.task.completed
curl -s http://localhost:8080/actuator/metrics/openclaw.task.failed
curl -s http://localhost:8080/actuator/metrics/openclaw.worker.active
```
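The Prometheus endpoint returns every metric at once. A sketch that extracts one metric family from the text exposition format. The name `openclaw_task_failed_total` is an assumption (if the metrics are exported through Micrometer, the `openclaw.task.failed` counter would typically appear under that name), label values containing spaces are not handled, and `prom_metric` is a hypothetical helper.

```shell
# Print "series value" for every sample of the named metric, skipping # HELP/# TYPE lines.
# Example:
#   curl -s http://localhost:8080/actuator/prometheus | prom_metric openclaw_task_failed_total
prom_metric() {
    awk -v name="$1" '
        /^#/ || NF == 0 { next }
        {
            # The metric name is everything before "{" (or the whole first field)
            i = index($1, "{")
            base = (i > 0) ? substr($1, 1, i - 1) : $1
            if (base == name) print $1, $2
        }'
}
```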
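| Symptom | Possible cause | Diagnostic command | Fix |
|---|---|---|---|
| Service fails to start | Port already in use / bad configuration | netstat -tlnp, startup log | Free the port, correct the configuration |
| Database connection fails | Bad configuration / network problem | mysql -h host -u user -p | Check the configuration and network |
| Redis connection fails | Bad configuration / Redis not running | redis-cli ping | Check the configuration, start Redis |
| Worker registration fails | Network unreachable / bad configuration | telnet server 9090 | Check the network and configuration |
| Task execution fails | Script error / timeout | Task log | Fix the script, adjust the timeout |
| Task backlog | Too few workers / blocked workers | Queue status | Add workers |
| Out of memory | Memory limits set too low | jstat -gc, jmap | Increase the memory settings |
| Slow responses | Slow database queries / GC pauses | Slow query log, GC log | Optimize SQL, tune the JVM |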
```shell
#!/bin/bash
# collect_diagnostic.sh - collect diagnostic information into a tarball
DIAG_DIR="/tmp/openclaw_diag_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$DIAG_DIR"
# System information
uname -a > "$DIAG_DIR/system_info.txt"
free -h >> "$DIAG_DIR/system_info.txt"
df -h >> "$DIAG_DIR/system_info.txt"
# Service status
curl -s http://localhost:8080/api/health > "$DIAG_DIR/health.json"
curl -s http://localhost:8080/api/cluster/info > "$DIAG_DIR/cluster.json"
# Logs (last 1000 lines)
tail -n 1000 /opt/openclaw/logs/openclaw-server.log > "$DIAG_DIR/server.log"
tail -n 1000 /opt/openclaw/logs/error.log > "$DIAG_DIR/error.log"
# Thread dump (pgrep -o yields a single PID; errors are discarded if no JVM is found)
jstack "$(pgrep -o -f openclaw-server)" > "$DIAG_DIR/thread_dump.txt" 2>/dev/null
# Package
tar -czf "$DIAG_DIR.tar.gz" -C /tmp "$(basename "$DIAG_DIR")"
echo "Diagnostic bundle saved: $DIAG_DIR.tar.gz"
```
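Logs and status output collected by the script may contain passwords or bearer tokens. A minimal redaction pass can be run over the collected text files before the `tar` step. The key names it masks are assumptions, so extend the list to match your configuration; `redact` is a hypothetical helper and no substitute for reviewing the bundle.

```shell
# Mask values after password/token/secret keys (any letter case) and after "Bearer".
# Example, before packaging:
#   for f in "$DIAG_DIR"/*.log "$DIAG_DIR"/*.json; do
#     redact < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
#   done
redact() {
    sed -E \
        -e 's/([Pp][Aa][Ss][Ss][Ww][Oo][Rr][Dd]|[Tt][Oo][Kk][Ee][Nn]|[Ss][Ee][Cc][Rr][Ee][Tt])([[:space:]]*[:=][[:space:]]*)[^[:space:],;&"]+/\1\2***REDACTED***/g' \
        -e 's/(Bearer[[:space:]]+)[^[:space:]]+/\1***REDACTED***/g'
}
```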