| name | agent-debugger |
| description | Use when a program crashes, a test fails, or code produces wrong results and reading the source isn't enough to see why. Lets you pause execution at any line and inspect the actual runtime state (variable values, types, call stacks) to find what went wrong. Can attach to running servers by PID, with no restart or code changes needed. |
| allowed-tools | Bash(npx -y agent-debugger:*), Bash(agent-debugger:*) |
Agent Debugger
A debugger for AI agents. Set breakpoints, inspect state, evaluate expressions, test fixes in-place. Attach to running servers by PID — no restart, no code changes, no manual setup.
Philosophy
The debugger is a scalpel, not a flashlight. You don't turn it on to look around. You turn it on to make one precise cut — confirm or kill a specific hypothesis about why the program is broken. If you're "exploring" in the debugger, you've already lost.
Every session starts before the debugger. Read the code. Read the traceback. Form a theory. Know exactly what breakpoint you'll set and what eval you'll run before you type a single command. The debugger is the experiment, not the investigation.
eval is the only command that matters. vars, step, stack, source — these are all setup. The eval is the actual experiment. It's where you test your hypothesis against reality. Everything else is scaffolding to get you to the right eval at the right moment.
Many bugs don't need a debugger. Read the traceback. Read the code. Check the types. Grep for the error message. Look at git blame. Most bugs surrender to careful reading. Reach for the debugger only when the bug depends on runtime state you can't determine statically.
The Rules
- Read first, debug second. Never start a debug session without reading the relevant code and forming a hypothesis. The debugger confirms theories; it doesn't generate them.
- One breakpoint, one question. Each breakpoint should answer a specific question. "Is x a string here?" "Is balance negative after this call?" "Does this branch execute?" If you can't articulate the question, you're not ready to debug.
- Eval, don't dump. vars dumps everything and answers nothing. eval "type(data['age'])" answers exactly one question. Prefer eval. Always.
- Never step through loops. A loop with 100 iterations is 100 step commands. A conditional breakpoint is 1 command. Use --break "file:line:i == 50" to jump straight to the iteration that matters.
- Two strikes, new theory. If your hypothesis was wrong twice, stop. Your mental model of the code is broken, not the debugger session. Close, re-read the code, form a completely different theory, then start a new session with different breakpoints. Continuing to probe the same area has rapidly diminishing returns.
- Test the fix before writing it. The debugger gives you a live REPL in the exact context of the bug. Use eval to run your proposed fix expression before editing any code. If it works in eval, it will very likely work in the code.
- Prove the fix, write the test. After fixing, re-run the program to verify. Then write the smallest possible test that catches the bug. A fix without a test is a fix that will regress.
- Close the session. Always. A stale session blocks the next one.
Bootstrap
- If agent-debugger is available globally, use it directly.
- Otherwise, use npx -y agent-debugger (zero-install, no prompts).
Commands
agent-debugger start <script> --break file:line[:condition] [--runtime path] [--args ...]
npx -y agent-debugger start <script> --break file:line[:condition] [--runtime path] [--args ...]
agent-debugger attach --pid <PID> [--break file:line]
agent-debugger attach [host:]port [--break file:line]
agent-debugger eval <expression>
agent-debugger vars
agent-debugger step [into|out]
agent-debugger continue
agent-debugger stack
agent-debugger break file:line[:cond]
agent-debugger source
agent-debugger status
agent-debugger close
Multiple --break flags are supported. Conditions are expressions in the target language: --break "app.py:42:len(items) > 10".
Debugging a Running Server
Use attach --pid to debug any running Python server without restarting it or changing code. debugpy is auto-installed if missing. Virtualenvs are auto-detected.
Alternative: start the server with debugpy yourself and attach by port:
python -m debugpy --listen 5678 -m uvicorn app:main
agent-debugger attach 5678 --break routes.py:42
Supported Languages
| Language | Extension | Adapter | Requirement |
|---|---|---|---|
| Python | .py | debugpy | Auto-installed on attach. Or: pip install debugpy |
| JavaScript/TypeScript | .js/.ts | Node Inspector | Node.js |
| Go | .go | Delve | go install github.com/go-delve/delve/cmd/dlv@latest |
| Rust/C/C++ | .rs/.c/.cpp | CodeLLDB | CODELLDB_PATH env var |
The Playbook
These are not suggestions. These are the right way to handle each class of bug.
Start vs Attach — Choose First
Before anything else, decide how to connect:
- The process is already running (server, daemon, worker, long-lived service) → attach --pid. Always. Don't restart it: you'll lose the state you need to inspect.
- You need to run a script from scratch (CLI tool, test file, one-off script) → start <script>.
If you're debugging a web server (uvicorn, Flask, FastAPI, Django, Express, etc.), a background worker, or any long-running process — attach is the default, not start. Find the PID, attach, set breakpoints, trigger the code path, inspect.
ps aux | grep uvicorn
agent-debugger attach --pid 12345 --break routes.py:42
curl localhost:8000/api/endpoint
agent-debugger continue
agent-debugger eval "request.body"
agent-debugger close
For a script run from scratch, use start instead:
agent-debugger start app.py --break "app.py:25"
Type Bugs
A value has the wrong type somewhere in the pipeline. Don't step through — go straight to the suspect and ask.
agent-debugger start app.py --break "app.py:25"
agent-debugger eval "type(data['age'])"
agent-debugger eval "int(data['age'])"
agent-debugger close
Two commands after the breakpoint. Done.
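As a rough illustration of what those two evals reveal, the sketch below reproduces the situation in plain Python. The `data` dict and its contents are hypothetical stand-ins for whatever is in scope at the breakpoint; nothing here is part of agent-debugger itself.

```python
import json

# Hypothetical payload: the age arrives as a JSON string, not a number.
data = json.loads('{"name": "Ada", "age": "36"}')

# What eval "type(data['age'])" would report at the breakpoint.
observed_type = type(data["age"])

# What eval "int(data['age'])" would return: the candidate fix.
converted = int(data["age"])

# The original failure: adding a str to an int raises TypeError.
try:
    total = 10 + data["age"]
except TypeError as exc:
    failure = str(exc)
```

The first eval names the problem (a str where an int was expected) and the second confirms that the conversion succeeds on the real value before any code is edited.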
Data Pipeline Bugs
Something in a batch is wrong. Don't look at individual records — assert the shape of the whole batch.
agent-debugger start etl.py --break "etl.py:90"
agent-debugger eval "all(isinstance(v, int) for v in result.values())"
agent-debugger eval "[k for k,v in result.items() if not isinstance(v, int)]"
agent-debugger close
One breakpoint, two evals. The first asks "is anything wrong?", the second asks "what exactly?"
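To make the two-step pattern concrete, here is a small sketch run against a made-up batch. The `result` dict is a hypothetical example, assumed to map keys to values that should all be integers.

```python
# Hypothetical batch result: one value slipped through as a string.
result = {"a": 1, "b": 2, "c": "3", "d": 4}

# First eval: a single yes/no answer for the whole batch.
batch_ok = all(isinstance(v, int) for v in result.values())

# Second eval: only worth running if the first came back False.
bad_keys = [k for k, v in result.items() if not isinstance(v, int)]
```

One caveat when writing such checks in Python: `bool` is a subclass of `int`, so a stray `True` or `False` would pass `isinstance(v, int)`. If booleans count as bad data in your pipeline, the condition needs to exclude them explicitly.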
Loop Bugs (The Wolf Fence)
A loop processes N items and something goes wrong at an unknown iteration. Binary search it.
agent-debugger start app.py --break "app.py:45:i == 500"
agent-debugger eval "is_valid(result)"
agent-debugger close
agent-debugger start app.py --break "app.py:45:i == 750"
agent-debugger eval "is_valid(result)"
agent-debugger close
agent-debugger start app.py --break "app.py:45:i == 625"
About 10 sessions (log2 of 1000 is roughly 10) to find the bug in 1000 items. Not 1000 step commands.
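The probe sequence 500, 750, 625 is a binary search over iteration indices. The sketch below shows the arithmetic, assuming the state is valid on every iteration before the bug and invalid on every iteration from then on (if validity flips back and forth, bisection does not apply). The function `first_bad_iteration` and the predicate `is_valid_at` are hypothetical names; in practice each call to the predicate corresponds to one start / eval / close session with the condition `i == mid`.

```python
from typing import Callable, List, Tuple


def first_bad_iteration(
    n: int, is_valid_at: Callable[[int], bool]
) -> Tuple[int, List[int]]:
    """Return (first failing index, list of indices probed).

    Assumes iterations are valid up to some point and invalid afterwards.
    Returns n as the index if no iteration fails.
    """
    lo, hi = 0, n  # every index < lo is valid; every index >= hi is invalid
    probes: List[int] = []
    while lo < hi:
        mid = (lo + hi) // 2
        probes.append(mid)  # one debugger session per probe
        if is_valid_at(mid):
            lo = mid + 1  # bug is later
        else:
            hi = mid  # bug is here or earlier
    return lo, probes


# Hypothetical scenario: the state first goes bad at iteration 613.
first_bad, probes = first_bad_iteration(1000, lambda i: i < 613)
```

Each probe halves the remaining range, so 1000 iterations need at most 10 probes.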
Invariant Violations
You know what should never happen. Tell the debugger to catch the exact moment it does. Each command below is an independent example; run one per session.
agent-debugger start bank.py --break "bank.py:68:account.balance < 0"
agent-debugger start pipeline.py --break "pipeline.py:30:not isinstance(value, (int, float))"
agent-debugger start app.py --break "app.py:55:len(results) > 100"
If it hits, you've caught the crime in progress. If it doesn't hit, your theory was wrong — move on.
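Conceptually, a conditional breakpoint behaves like a guard that is checked every time the line runs and pauses on the first pass where the condition is true. The sketch below is only an analogy in plain Python, not a description of how agent-debugger evaluates conditions; the `Account` class and `first_violation` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Account:  # hypothetical stand-in for the object in bank.py
    balance: int


def first_violation(account: Account, withdrawals: List[int]) -> Optional[int]:
    """Apply withdrawals in order; return the index where the balance
    first goes negative, or None if the invariant always holds."""
    for i, amount in enumerate(withdrawals):
        account.balance -= amount
        # The condition "account.balance < 0" is checked on every pass.
        if account.balance < 0:
            return i  # the breakpoint would pause here
    return None  # the breakpoint never fires


hit = first_violation(Account(100), [30, 50, 40, 10])
miss = first_violation(Account(100), [30, 50])
```

Both outcomes carry information: a hit gives you the exact iteration and state, and a miss rules the theory out.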
Recursion / Deep Call Chains
The stack tells you how you arrived. The eval tells you why you're wrong.
agent-debugger start tree.py --break "tree.py:22"
agent-debugger stack
agent-debugger eval "current_depth"
agent-debugger eval "max_depth"
agent-debugger close
"Where Does This Bad Data Come From?"
You found bad data downstream. Pivot upstream.
agent-debugger start app.py --break "handler.py:55"
agent-debugger eval "data['age']"
agent-debugger close
agent-debugger start app.py --break "loader.py:22"
agent-debugger eval "raw_row"
agent-debugger close
Don't fix the symptom at the handler. Fix the cause at the loader.
"Which of These 3 Functions Is the Culprit?"
Set breakpoints at all suspects. The runtime tells you which one fires.
ps aux | grep uvicorn
agent-debugger attach --pid 12345 \
--break "auth.py:30" \
--break "validate.py:55" \
--break "handler.py:80"
curl localhost:8000/api/endpoint
agent-debugger continue
agent-debugger eval "request.payload"
agent-debugger close
For scripts, same idea with start:
agent-debugger start app.py \
--break "auth.py:30" \
--break "validate.py:55" \
--break "handler.py:80"
agent-debugger eval "request.payload"
agent-debugger close
Testing a Fix In-Place
You think you know the fix. Prove it before editing, while still paused at the breakpoint where the bug occurs.
agent-debugger eval "total + int(data['age'])"
agent-debugger eval "int(data['age'])"
agent-debugger eval "sum(int(d['age']) if isinstance(d['age'], str) else d['age'] for d in users)"
agent-debugger close
Falsifying Your Theory
Design evals that would break your hypothesis, not confirm it. Confirmation bias is the #1 debugging trap.
agent-debugger eval "isinstance(data['age'], str)"
agent-debugger eval "isinstance(users[0]['age'], str)"
agent-debugger eval "isinstance(users[1]['age'], str)"
agent-debugger eval "users[2]"
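The difference between confirming and falsifying can be sketched with a made-up dataset. Here the theory is "every age is a string"; the `users` list is hypothetical and chosen so that the first two records support the theory while the third contradicts it.

```python
# Hypothetical records: the theory is "every age is a string".
users = [{"age": "36"}, {"age": "41"}, {"age": 29}]

# Confirming evals: each passes and tempts you to stop looking.
confirms = [
    isinstance(users[0]["age"], str),
    isinstance(users[1]["age"], str),
]

# Falsifying eval: search for any record that contradicts the theory.
counterexamples = [
    i for i, u in enumerate(users) if not isinstance(u["age"], str)
]

# A fix that survives the counterexample has to handle both types.
total = sum(
    int(u["age"]) if isinstance(u["age"], str) else u["age"] for u in users
)
```

Checking only the first records would have confirmed a wrong theory. The search for counterexamples finds the mixed-type record in one eval.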
Never Do This
Never step blindly. If you're running step more than 3 times in a row, you need a breakpoint, not more steps.
Never start without reading code. The debugger doesn't find bugs. You find bugs by reading code and forming theories. The debugger just confirms them.
Never dump vars when you have a question. vars is for the rare case when you genuinely don't know what variables exist. If you have a theory, eval tests it directly.
Never debug timing bugs with the debugger. Pausing execution changes timing. Race conditions disappear under observation. Use logging.
Never keep going after 2 failed hypotheses. Close. Re-read. Rethink. Your mental model is wrong, and more debugger commands won't fix your mental model.
Never leave a session open. agent-debugger close. Always. Every time.
Never fix without verifying. Run the program after the fix. If you can, toggle the fix to prove causation. Then write a test.
Notes
- Use absolute paths for breakpoints
- One session at a time: close before starting another
- attach --pid auto-installs debugpy, so no manual setup is needed
- attach --pid requires lldb (macOS, included with Xcode CLI tools) or gdb (Linux)
- Program stdout goes to the daemon, so use eval to inspect output values