| name | agent-tui |
| description | Main Agents: Do NOT use this skill directly. If you need to test the TUI, invoke the `tui_tester` subagent. Drive terminal UI (TUI) applications programmatically for testing, automation, and inspection. Use when: automating CLI/TUI interactions, regression testing terminal apps, or verifying interactive behavior. Also use when: user asks "what is agent-tui", "what does agent-tui do", "demo agent-tui", "show me agent-tui", "how does agent-tui work", or wants to see it in action.
|
🚨 CRITICAL: macOS Daemon Workaround & Gemini CLI Usage 🚨
When using agent-tui in this macOS environment, the default background daemonization process crashes, causing Connection refused (os error 61) errors.
You MUST start the daemon manually shielded from TTY hangups before running any agent-tui commands. Using nohup is insufficient; you must use tmux to provide a fully isolated pseudo-terminal.
To support parallel runs, only restart the daemon if it is not currently running:
if ! agent-tui sessions >/dev/null 2>&1; then
tmux kill-session -t agent-tui 2>/dev/null || true
agent-tui daemon stop 2>/dev/null || true
rm -f /tmp/agent-tui*
tmux new-session -d -s agent-tui 'agent-tui daemon start --foreground > /tmp/agent-tui-daemon.log 2>&1'
sleep 1
fi
Session ID vs PID (Crucial for Reconnection)
When agent-tui run returns JSON, it includes both a session_id and a pid. The pid is purely informational (the OS process ID of the child command). You do not use the pid to reconnect or issue commands. You must always use the session_id (e.g., --session <id>).
If the daemon crashes (os error 61), the pseudo-terminal is destroyed. Even if the child pid survives as an orphan, you cannot reconnect to it. You must restart the daemon using the workaround above and start a completely new session.
Testing the Gemini CLI
When testing the Gemini CLI with agent-tui, there are several strict requirements to ensure deterministic and accurate behavior:
- Build Before Running:
agent-tui runs the built JS files, not TypeScript. You MUST run npm run build or npm run build:all after making code changes and before launching the CLI with agent-tui.
- Bypass Trust Modals: Always pass
GEMINI_CLI_TRUST_WORKSPACE=true in the environment. If you don't, any new project-level agents or extensions will trigger a full-screen "Acknowledge and Enable" modal. This modal steals focus, swallows automation keystrokes, and causes agent-tui wait commands to time out.
- Isolated Environments: If you need to test without real user credentials or existing agents interfering, isolate the global settings using
GEMINI_CLI_HOME=<some-test-dir>.
- Testing State Deltas (e.g., Reloads): If you are testing features that report deltas (e.g.,
/agents reload outputting "1 new local subagent"), you MUST:
- Start the CLI first so it establishes its baseline registry.
- Use a separate shell command (outside of
agent-tui) to write the new agent .md/.toml file.
- Use
agent-tui type and press to trigger the /agents reload command inside the running session.
- (If you add the files before starting the CLI, they become part of the baseline and won't trigger the delta logic).
env GEMINI_CLI_TRUST_WORKSPACE=true GEMINI_CLI_HOME=test-gemini-home agent-tui run -d "$(pwd)" node packages/cli/dist/index.js
Terminal Automation Mastery
Prerequisites
- Supported OS: macOS or Linux (Windows not supported yet).
- Verify install:
agent-tui --version
If not installed, use one of:
curl -fsSL https://raw.githubusercontent.com/pproenca/agent-tui/master/install.sh | sh
npm i -g agent-tui
pnpm add -g agent-tui
bun add -g agent-tui
cargo install --git https://github.com/pproenca/agent-tui.git --path cli/crates/agent-tui
If you used the install script, ensure ~/.local/bin is on your PATH.
Philosophy: Why Terminal Automation Is Different
Terminal UIs are stateless from the observer's perspective. Unlike web browsers with a persistent DOM, terminal automation works with a constantly-refreshed character grid. This fundamental difference shapes everything:
| Web Automation | Terminal Automation |
|---|
| DOM persists across interactions | Screen buffer is redrawn constantly |
| Selectors are stable | Text positions may shift |
| Query once, act many times | Must re-verify before EVERY action |
| Network events signal completion | Must detect visual stability |
The Core Insight: agent-tui gives you vision without memory. Each screenshot is a fresh observation. Previous state means nothing after the UI changes. This isn't a limitation—it's the nature of terminal interaction.
Mental Model: The Feedback Loop
Think of terminal automation as a closed-loop control system:
┌──────────────────────────────────────────────┐
│ │
▼ │
OBSERVE ──► DECIDE ──► ACT ──► WAIT ──► VERIFY ───┘
│ │
│ │
└─────── NEVER skip ◄────────────────────┘
Each phase is mandatory. Skipping verification is the #1 cause of flaky automation.
The "Fresh Eyes" Principle
Every time you need to interact with the UI:
- Take a fresh screenshot — your previous one is now stale
- Locate your target visually — text positions may have changed
- Verify the state — the UI may have changed unexpectedly
- Act only when stable — animations and loading states cause failures
This feels slower, but it's the only reliable approach. Optimistic reuse of stale state causes intermittent failures that are painful to debug.
Critical Rules (Non-Negotiable)
RULE 1: Atomic Execution (No Pipelining)
You are FORBIDDEN from chaining commands with && (e.g., type "x" && press Enter && wait). Modals or UI updates can intercept your keystrokes. You MUST execute one atomic action, wait, screenshot, and verify before taking the next action in a new turn.
RULE 2: Re-snapshot after EVERY action
The UI state is invalidated by any change. Always take a fresh screenshot before acting again.
RULE 3: Never act on unstable UI
If the UI is animating, loading, or transitioning, wait --stable first. Acting during transitions because race conditions.
RULE 4: Verify before claiming success
Use wait "expected text" --assert to confirm outcomes. Don't assume an action worked—prove it.
RULE 5: Error Recovery
If a wait command times out, DO NOT blindly restart or kill the session. Execute screenshot to visually diagnose what unexpected UI element (modal, error dialog, lost focus) intercepted the flow.
RULE 6: Clean up sessions
Always end with agent-tui kill. Orphaned sessions consume resources and can interfere with future runs.
Decision Framework
Which Screenshot Mode?
Use screenshot --format json when parsing automation output, or plain screenshot for human readable text.
How to Wait?
What are you waiting for?
│
├─► Specific text to appear
│ └─► `wait "text" --assert` (fails if not found)
│
├─► Specific text to disappear
│ └─► `wait "text" --gone --assert`
│
├─► UI to stop changing (animations, loading)
│ └─► `wait --stable`
│
└─► Multiple conditions
└─► Chain waits sequentially
How to Act?
What do you need to do?
│
├─► Type text into the terminal
│ └─► `type "text"`
│
├─► Send keyboard shortcuts/navigation
│ └─► `press Ctrl+C` or `press ArrowDown Enter`
Core Workflow
The canonical automation loop:
agent-tui run <command> [-- args...]
agent-tui screenshot --format json
agent-tui type "text"
agent-tui press Enter
agent-tui wait "Expected" --assert
agent-tui kill
Anti-Patterns (What NOT to Do)
❌ Acting During Animation/Loading
agent-tui run my-app
agent-tui screenshot --format json
agent-tui type "value"
agent-tui run my-app
agent-tui wait --stable
agent-tui screenshot --format json
agent-tui type "value"
❌ Assuming Success Without Verification
agent-tui type "value"
agent-tui press Enter
agent-tui type "value"
agent-tui press Enter
agent-tui wait "Success" --assert
❌ Skipping Cleanup
agent-tui run my-app
agent-tui run my-app
agent-tui kill
Before You Start: Clarify Requirements
Before automating any TUI, gather this information:
- Command: What exactly to run? (
my-app --flag or npm start?)
- Success criteria: What text/state indicates success?
- Input sequence: What keystrokes/data to enter, in what order?
- Safety: Is it safe to submit forms, delete data, etc.?
- Auth: Does it need login? Test credentials?
- Live preview: Does the user want to watch? (
agent-tui live start --open)
If any of these are unclear, ask before running.
Demo Mode: Showing What agent-tui Can Do
When a user asks what agent-tui is, wants a demo, or asks "show me how it works":
- Don't explain—demonstrate. Actions speak louder than words.
- Use the live preview so they can watch in real-time.
- Run
top—it's universal and shows dynamic real-time updates.
Quick demo trigger phrases:
- "What is agent-tui?" / "What does agent-tui do?"
- "Demo agent-tui" / "Show me agent-tui"
- "How does agent-tui work?" / "See it in action"
Failure Recovery
| Symptom | Diagnosis | Solution |
|---|
| "Text not found" | Stale view or text moved | Re-snapshot, locate text again |
| Wait times out | UI didn't reach expected state | Check screenshot, verify expectations |
| "Daemon not running" | Daemon crashed or not started | agent-tui daemon start |
| Unexpected layout | Wrong terminal size | agent-tui resize --cols 120 --rows 40 |
| Session unresponsive | App crashed or hung | agent-tui kill, then re-run |
| Repeated failures | Something fundamentally wrong | Stop after 3-5 attempts, ask user |
Self-Discovery: Use --help
You don't need to memorize every flag. The CLI is self-documenting:
agent-tui --help
agent-tui run --help
agent-tui screenshot --help
agent-tui wait --help
When in doubt, ask the CLI. This skill teaches when and why to use commands. For exact flags and syntax, --help is authoritative.
Quick Reference
agent-tui run <cmd> [-- args]
agent-tui screenshot
agent-tui screenshot --format json
agent-tui press Enter
agent-tui press Ctrl+C
agent-tui type "text"
agent-tui wait "text" --assert
agent-tui wait "text" --gone --assert
agent-tui wait --stable
agent-tui sessions
agent-tui live start --open
agent-tui kill