| name | analyze-local |
| description | Use this skill when a local container won't start, a service is unreachable from the host, a local docker-compose stack is misbehaving, or as the Docker-layer diagnosis step of a local bugfix flow — including when the user describes the symptom without naming Docker — to run a Docker-specific local diagnostic that collects container status, logs, networking, and resource usage and diagnoses issues, applying the SRE or DevOps role for investigation. For multi-scope environment analysis (Docker + Kubernetes + CI runner + drift snapshot + optional auto-fix) use `/env-analyze` instead. |
| context | fork |
| argument-hint | container name, symptom, or issue description |
| allowed-tools | Read Grep Glob Bash |
Analyze Local Docker Environment
Systematic analysis of the local Docker environment. Collects container status, logs, networking, resource usage, and diagnoses issues. Works standalone or as an entry point for local bugfixing.
0. Gather Context
Read CLAUDE.md (or AGENTS.md) at the project root to identify expected services, tech stack, and local development setup (docker-compose files, service dependencies).
1. Clarify the Goal
Ask the user:
- What is the problem or question? (e.g., "container won't start", "service unreachable", "high memory usage", "just want a health check")
- Which services are affected? (specific container names, or "all")
- When did the issue start? (after a code change, config update, restart, or unknown)
If invoked as part of a bugfix flow — extract the problem statement from the parent context instead of asking.
2. Apply Appropriate Role
Select and apply the role based on the problem type:
| Problem Type | Primary Role | Rationale |
|---|
| Container crashes, restarts, health checks | Agent(sre-engineer) | Reliability, observability, troubleshooting |
| Networking, DNS, port conflicts, connectivity | Agent(sre-engineer) | K8s/Docker networking diagnostics |
| Dockerfile, image builds, compose config | Agent(devops-engineer) | Container orchestration, Docker expertise |
| CI/CD pipeline failures in local env | Agent(devops-engineer) | Build and deploy pipeline expertise |
| Resource exhaustion (CPU, memory, disk) | Agent(sre-engineer) | Capacity, resource management |
| Application errors visible in logs | Stack-specific role | Agent(java-engineer), Agent(python-engineer), Agent(frontend-engineer) based on service stack |
| General / unclear | Agent(software-engineer) | Broad debugging methodology |
Announce the applied role to the user. If multiple problem types are present, apply multiple roles.
3. Collect Environment Snapshot
Run the following diagnostic commands to gather the current state. Present results as a structured summary.
3a. Docker Daemon and System
// turbo
docker version
docker info --format '{{.OperatingSystem}} | Containers: {{.Containers}} (Running: {{.ContainersRunning}}, Stopped: {{.ContainersStopped}}) | Images: {{.Images}}'
docker system df
Record: Docker version, OS, total containers, disk usage.
3b. Container Status
// turbo
docker ps -a --format "table {{.ID}}\t{{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}\t{{.State}}"
Record: For each container — name, image, status (Up/Exited/Restarting), ports, uptime.
Flag issues:
- Containers in
Exited or Restarting state
- Containers with
unhealthy health status
- Missing containers that should be running (ask user for expected services)
3c. Docker Compose (if applicable)
If a docker-compose.yml or compose.yaml is present in the project:
// turbo
docker compose ps -a
docker compose config --services
Record: Compose project name, service list, which services are up/down.
3d. Logs for Problematic Containers
For each container flagged in 3b (or the user-specified service):
docker logs --tail 100 --timestamps <container_name>
Record: Last 100 lines of logs. Look for:
- Error messages, stack traces, exceptions
- Connection refused / timeout errors
- OOM killed signals
- Configuration errors (missing env vars, wrong paths)
3e. Networking
// turbo
docker network ls
docker network inspect <network_name>
For connectivity issues:
docker exec <container> ping -c 2 <target_host>
docker exec <container> nslookup <service_name>
docker port <container>
Record: Networks, container IP assignments, port mappings, DNS resolution.
3f. Resource Usage
// turbo
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.NetIO}}\t{{.BlockIO}}"
Record: CPU %, memory usage/limit, network I/O, block I/O per container.
Flag issues:
- Memory usage > 80% of limit
- CPU consistently > 90%
- Containers without memory limits set
3g. Volumes and Mounts
// turbo
docker volume ls
docker inspect --format '{{range .Mounts}}{{.Type}}: {{.Source}} -> {{.Destination}} ({{.Mode}}){{"\n"}}{{end}}' <container_name>
Record: Volume mounts, bind mounts, permissions (ro/rw).
3h. Health and Local Telemetry
Even on a single Docker host, name the methodology applied — this matches the production approach and surfaces gaps:
- USE Method (Brendan Gregg) — for
docker stats reads: Utilization (CPU%, MemPerc), Saturation (memory at limit, swap usage, blocked I/O), Errors (restart count, OOMKilled flag). Reference.
- RED Method (Tom Wilkie) — for any container exposing HTTP: Rate, Errors, Duration. Apply it locally if the stack mirrors prod (Prometheus exporters, OTel collector). Reference.
- For Golden Signals, RED, USE deep-dive and full method/problem matrix → see
analyze-prod skill, Step 4h.
Docker-specific telemetry commands beyond Step 3:
// turbo
docker stats --no-stream
docker compose logs -f --since 10m
docker inspect --format='{{json .State.Health}}' <container>
docker inspect --format='{{.State.OOMKilled}} {{.State.ExitCode}} {{.State.RestartCount}}' <container>
If the local stack mirrors prod observability (Promtail/Loki + Grafana, Prometheus + cadvisor, Jaeger/Tempo via OTel collector) — query those directly using the same patterns documented in analyze-prod Step 4i.
4. Analyze Findings
Using the applied role's expertise, analyze the collected data:
- Correlate: Match the user's problem statement with the diagnostic data
- Identify root cause: Use the applied role's debugging methodology
- Check common causes (in order of likelihood):
<common_issues>
- Container won't start: Missing env vars, wrong image tag, port conflict, entrypoint error, missing volume mount
- Service unreachable: Wrong port mapping, network mismatch, DNS not resolving, firewall/security group, service not listening on 0.0.0.0
- Container restarts repeatedly: OOM killed (check
docker inspect for OOMKilled), health check failing, application crash loop, dependency not ready
- Slow performance: Resource limits too low, no memory limit (swapping), disk I/O bottleneck, too many containers for available resources
- Build failures: Dockerfile syntax, missing build context files, base image unavailable, layer cache invalidation
- Volume/data issues: Permission denied (user mismatch), stale volume data, bind mount path wrong on host
</common_issues>
5. Present Diagnosis
Structure the diagnosis as:
## Environment Summary
- Docker: [version], [OS]
- Containers: [running]/[total] | Compose: [yes/no]
- Disk usage: [used/available]
## Findings
### [Issue 1: title]
- **Symptom**: what was observed
- **Evidence**: specific log lines, metrics, or status
- **Root cause**: why it's happening
- **Severity**: critical / warning / info
### [Issue 2: title]
...
## Recommendations
1. [Fix for issue 1] — [command or config change]
2. [Fix for issue 2] — [command or config change]
...
## Environment Health: [HEALTHY | DEGRADED | CRITICAL]
6. Fix or Escalate
Based on the diagnosis:
- If fix is straightforward (restart, config change, env var): Propose the fix and apply after user approval
- If fix requires code changes: Transition to the appropriate stack-specific role and apply the bugfix following the role's debugging methodology
- If fix requires infrastructure changes (Dockerfile, compose, networking): Apply with
Agent(devops-engineer) patterns
- If root cause is unclear: Propose additional diagnostic steps (increase log verbosity, attach to container, profile resource usage)
After applying any fix, re-run the relevant diagnostic commands from Step 3 to verify the fix resolved the issue.
7. Summary
Present the completed analysis:
- Problem: original user report
- Role(s) applied: which roles were used
- Root cause: what was found
- Fix applied: what was changed (or "no fix needed — environment is healthy")
- Verification: confirmation that the issue is resolved
- Prevention: recommendations to avoid recurrence (e.g., add health checks, set resource limits, pin image versions)
Integration
- Called by:
/bugfix (environment diagnostics step)
- Roles:
Agent(devops-engineer), Agent(sre-engineer)