| name | pilotty |
| description | Automates terminal TUI applications (vim, htop, lazygit, dialog) through managed PTY sessions. Use when the user needs to interact with terminal apps, edit files in vim/nano, navigate TUI menus, click terminal buttons/checkboxes, or automate CLI workflows with interactive prompts. |
| allowed-tools | Bash(pilotty:*) |
Terminal Automation with pilotty
CRITICAL: Argument Positioning
All flags (--name, -s, --format, etc.) MUST come BEFORE positional arguments:
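Correct (flags before positional arguments):
pilotty spawn --name myapp vim file.txt
pilotty key -s myapp Enter
pilotty snapshot -s myapp --format text
Wrong (flags after positional arguments; with spawn, trailing flags are passed to the app instead of pilotty):
pilotty spawn vim file.txt --name myapp
pilotty key Enter -s myapp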
This is the #1 cause of agent failures. When in doubt: flags first, then command/args.
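One way to make the flags-first rule hard to violate is a small wrapper that always places the session flag ahead of the positional arguments. The sketch below is an illustration, not part of pilotty: the function name pt and the variable PT_SESSION are hypothetical, and it assumes the subcommand accepts -s, so it is not meant for spawn, which takes --name.

```shell
# Hypothetical wrapper (not part of pilotty): inserts -s <session> before
# any positional arguments so the flags-first rule is always respected.
# PT_SESSION is an illustrative variable name; it falls back to "default".
pt() {
  subcommand=$1
  shift
  pilotty "$subcommand" -s "${PT_SESSION:-default}" "$@"
}
```

With PT_SESSION=myapp set, pt key Enter expands to pilotty key -s myapp Enter.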
Quick start
pilotty spawn vim file.txt
pilotty wait-for "file.txt"
pilotty snapshot
pilotty key i
pilotty type "Hello, World!"
pilotty key Escape
pilotty kill
Core workflow
- Spawn: pilotty spawn <command> starts the app in a background PTY
- Wait: pilotty wait-for <text> ensures the app is ready
- Snapshot: pilotty snapshot returns screen state with detected UI elements
- Understand: Parse elements[] to identify buttons, inputs, toggles
- Interact: Use keyboard commands (key, type) to navigate and interact
- Re-snapshot: Check content_hash to detect screen changes
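The six steps above can be sketched as a single shell function. This is a minimal illustration that uses only commands and flags listed in this document; the names workflow_demo and snapshot_hash are hypothetical, and the hash is pulled out of the JSON textually with sed to keep the sketch dependency-free.

```shell
# snapshot_hash <session>: print the content_hash of the current screen.
snapshot_hash() {
  pilotty snapshot -s "$1" |
    sed -n 's/.*"content_hash":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# workflow_demo <session> <ready-text> <key> <command> [args...]
# Prints "changed" or "unchanged" depending on whether the key press
# altered the screen.
workflow_demo() {
  session=$1 ready=$2 key=$3
  shift 3
  pilotty spawn --name "$session" "$@" || return 1      # 1. Spawn
  pilotty wait-for -s "$session" "$ready" || return 1   # 2. Wait
  before=$(snapshot_hash "$session")                    # 3. Snapshot
  # 4. Understand: a real script would inspect elements[] at this point.
  pilotty key -s "$session" "$key" || return 1          # 5. Interact
  after=$(snapshot_hash "$session")                     # 6. Re-snapshot
  if [ "$before" != "$after" ]; then
    echo "changed"
  else
    echo "unchanged"
  fi
}
```

For example, workflow_demo editor "hello.txt" i vim /tmp/hello.txt would spawn vim, wait for the file name to appear, press i, and report whether the screen changed.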
Commands
Session management
pilotty spawn <command>
pilotty spawn --name myapp <cmd>
pilotty kill
pilotty kill -s myapp
pilotty list-sessions
pilotty daemon
pilotty shutdown
pilotty examples
Screen capture
pilotty snapshot
pilotty snapshot --format compact
pilotty snapshot --format text
pilotty snapshot -s myapp
HASH=$(pilotty snapshot | jq '.content_hash')
pilotty key Enter
pilotty snapshot --await-change $HASH
pilotty snapshot --await-change $HASH --settle 50
Input
pilotty type "hello"
pilotty type -s myapp "text"
pilotty key Enter
pilotty key Ctrl+C
pilotty key Escape
pilotty key Tab
pilotty key F1
pilotty key Alt+F
pilotty key Up
pilotty key -s myapp Ctrl+S
pilotty key "Ctrl+X m"
pilotty key "Escape : w q Enter"
pilotty key --delay 50 "a b c"
pilotty key -s myapp "Tab Tab Enter"
Interaction
pilotty click 5 10
pilotty click -s myapp 10 20
pilotty scroll up
pilotty scroll down 5
pilotty scroll -s myapp up 10
Terminal control
pilotty resize 120 40
pilotty resize -s myapp 80 24
pilotty wait-for "Ready"
pilotty wait-for -r "Error"
pilotty wait-for -t 5000 "Done"
pilotty wait-for -s editor "~"
Global options
| Option | Description |
|---|---|
| -s, --session <name> | Target specific session (default: "default") |
| --format <fmt> | Snapshot format: full, compact, text |
| -t, --timeout <ms> | Timeout for wait-for and await-change (default: 30000) |
| -r, --regex | Treat wait-for pattern as regex |
| --name <name> | Session name for spawn command |
| --delay <ms> | Delay between keys in a sequence (default: 0, max: 10000) |
| --await-change <hash> | Block snapshot until content_hash differs |
| --settle <ms> | Wait for screen to be stable for this many ms (default: 0) |
Environment variables
PILOTTY_SESSION="mysession"        # session used when -s is not given
PILOTTY_SOCKET_DIR="/tmp/pilotty"  # directory for the daemon socket
RUST_LOG="debug"                   # log verbosity
Snapshot Output
The snapshot command returns structured JSON with detected UI elements:
{
"snapshot_id": 42,
"size": { "cols": 80, "rows": 24 },
"cursor": { "row": 5, "col": 10, "visible": true },
"text": "Settings:\n [x] Notifications [ ] Dark mode\n [Save] [Cancel]",
"elements": [
{ "kind": "toggle", "row": 1, "col": 2, "width": 3, "text": "[x]", "confidence": 1.0, "checked": true },
{ "kind": "toggle", "row": 1, "col": 20, "width": 3, "text": "[ ]", "confidence": 1.0, "checked": false },
{ "kind": "button", "row": 2, "col": 2, "width": 6, "text": "[Save]", "confidence": 0.8 },
{ "kind": "button", "row": 2, "col": 10, "width": 8, "text": "[Cancel]", "confidence": 0.8 }
],
"content_hash": 12345678901234567890
}
Use --format text for a plain text view with cursor indicator:
--- Terminal 80x24 | Cursor: (5, 10) ---
bash-3.2$ [_]
The [_] shows cursor position. Use the text content to understand screen state and navigate with keyboard commands.
Element Detection
pilotty automatically detects interactive UI elements in terminal applications. Elements provide read-only context to help understand UI structure.
Element Kinds
| Kind | Detection Patterns | Confidence | Fields |
|---|---|---|---|
| toggle | [x], [ ], [*], ☑, ☐ | 1.0 | checked: bool |
| button | Inverse video, [OK], <Cancel>, (Submit) | 1.0 / 0.8 | focused: bool (if true) |
| input | Cursor position, ____ underscores | 1.0 / 0.6 | focused: bool (if true) |
Element Fields
| Field | Type | Description |
|---|---|---|
| kind | string | Element type: button, input, or toggle |
| row | number | Row position (0-based from top) |
| col | number | Column position (0-based from left) |
| width | number | Width in terminal cells (CJK chars = 2) |
| text | string | Text content of the element |
| confidence | number | Detection confidence (0.0-1.0) |
| focused | bool | Whether element has focus (only present if true) |
| checked | bool | Toggle state (only present for toggles) |
Confidence Levels
| Confidence | Meaning |
|---|---|
| 1.0 | High confidence: Cursor position, inverse video, checkbox patterns |
| 0.8 | Medium confidence: Bracket patterns [OK], <Cancel> |
| 0.6 | Lower confidence: Underscore input fields ____ |
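Since lower-confidence detections are more likely to be false positives, a script may want to filter elements[] by a threshold before relying on them. A minimal sketch, assuming jq is installed; the function name confident_elements is illustrative and the 0.8 default is only a suggestion.

```shell
# confident_elements [min-confidence]
# Reads snapshot JSON on stdin and prints, one compact JSON object per line,
# the elements whose confidence is at or above the threshold (default 0.8).
confident_elements() {
  jq -c --argjson min "${1:-0.8}" \
    '.elements[] | select(.confidence >= $min)'
}
```

Typical use would be pilotty snapshot | confident_elements 1.0 to keep only cursor, inverse-video, and checkbox detections.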
Wait for Screen Changes (Recommended)
Stop guessing sleep durations! Use --await-change to wait for the screen to actually update:
HASH=$(pilotty snapshot | jq '.content_hash')
pilotty key Enter
pilotty snapshot --await-change $HASH
pilotty snapshot --await-change $HASH --settle 100
Flags:
| Flag | Description |
|---|---|
| --await-change <HASH> | Block until content_hash differs from this value |
| --settle <MS> | After change detected, wait for screen to be stable for MS |
| -t, --timeout <MS> | Maximum wait time (default: 30000) |
Why this is better than sleep:
- sleep 1 is a guess - too short causes race conditions, too long slows automation
- --await-change waits exactly as long as needed - no more, no less
- --settle handles apps that render progressively (show partial, then complete)
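The capture, act, await sequence can be wrapped in a helper so each call site stays on one line. This is a sketch under stated assumptions: act_and_wait and hash_of are illustrative names, not pilotty commands. The hash is extracted textually with sed rather than jq because jq releases before 1.7 parse numbers as IEEE 754 doubles, which can round a 20-digit content_hash like the one in the sample snapshot output; whether that matters depends on the jq version installed.

```shell
# hash_of: read snapshot JSON on stdin, print the content_hash digits as-is.
hash_of() {
  sed -n 's/.*"content_hash":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# act_and_wait <session> <settle-ms> <key-sequence>
# Captures the current hash, sends the keys, then blocks until the screen
# has changed and stayed stable for <settle-ms>. Prints the new snapshot.
act_and_wait() {
  session=$1 settle=$2 keys=$3
  hash=$(pilotty snapshot -s "$session" | hash_of)
  if [ -z "$hash" ]; then
    echo "no content_hash found in snapshot" >&2
    return 1
  fi
  pilotty key -s "$session" "$keys" || return 1
  pilotty snapshot -s "$session" --await-change "$hash" --settle "$settle"
}
```

For example, act_and_wait myapp 50 Enter presses Enter in session myapp and returns the snapshot taken once the screen has settled for 50 ms.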
Waiting for Streaming AI Responses
When interacting with AI-powered TUIs (such as opencode) that stream responses, use a longer --settle time, because the screen keeps updating as tokens arrive:
HASH=$(pilotty snapshot -s myapp | jq -r '.content_hash')
pilotty type -s myapp "write me a poem about ai agents"
pilotty key -s myapp Enter
pilotty snapshot -s myapp --await-change "$HASH" --settle 3000 -t 60000
pilotty scroll -s myapp up 10
pilotty snapshot -s myapp --format text
Key parameters for streaming:
- --settle 2000-3000: AI responses have pauses between chunks; 2-3 seconds ensures streaming is truly done
- -t 60000: Extend timeout beyond the 30s default for longer generations
- The settle timer resets on each screen change, so it naturally waits until streaming stops
Manual Change Detection
For manual polling (not recommended), use content_hash directly:
SNAP1=$(pilotty snapshot)
HASH1=$(echo "$SNAP1" | jq -r '.content_hash')
pilotty key Tab
SNAP2=$(pilotty snapshot)
HASH2=$(echo "$SNAP2" | jq -r '.content_hash')
if [ "$HASH1" != "$HASH2" ]; then
echo "Screen changed - re-analyze elements"
fi
Using Elements Effectively
Elements are read-only context for understanding the UI. Use keyboard navigation for reliable interaction:
pilotty snapshot | jq '.elements'
pilotty key Tab
pilotty key Space
pilotty key Enter
pilotty snapshot | jq '.elements[] | select(.kind == "toggle")'
Key insight: Use elements to understand WHAT is on screen, use keyboard to interact with it.
Navigation Approach
pilotty uses keyboard-first navigation, just like a human would:
pilotty snapshot --format text
pilotty key Tab
pilotty key Enter
pilotty key Escape
pilotty key Up
pilotty key Space
pilotty type "search term"
pilotty key Enter
pilotty click 5 10
Key insight: Parse the snapshot text and elements to understand what's on screen, then use keyboard commands to navigate. This works reliably across all TUI applications.
Example: Edit file with vim
pilotty spawn --name editor vim /tmp/hello.txt
pilotty wait-for -s editor "hello.txt"
HASH=$(pilotty snapshot -s editor | jq '.content_hash')
pilotty key -s editor i
pilotty type -s editor "Hello from pilotty!"
pilotty snapshot -s editor --await-change $HASH --settle 50
pilotty key -s editor "Escape : w q Enter"
pilotty list-sessions
Alternative using individual keys:
pilotty key -s editor Escape
pilotty type -s editor ":wq"
pilotty key -s editor Enter
Example: Dialog checklist interaction
pilotty spawn --name opts dialog --checklist "Select features:" 12 50 4 \
"notifications" "Push notifications" on \
"darkmode" "Dark mode theme" off \
"autosave" "Auto-save documents" on \
"telemetry" "Usage analytics" off
pilotty snapshot -s opts --settle 200
SNAP=$(pilotty snapshot -s opts)
echo "$SNAP" | jq '.elements[] | select(.kind == "toggle")'
HASH=$(echo "$SNAP" | jq '.content_hash')
pilotty key -s opts Down
pilotty key -s opts Space
pilotty snapshot -s opts --await-change $HASH | jq '.elements[] | select(.kind == "toggle") | {text, checked}'
pilotty key -s opts Enter
pilotty kill -s opts
Example: Form filling with elements
pilotty spawn --name form my-form-app
pilotty snapshot -s form | jq '.elements'
pilotty type -s form "myusername"
pilotty key -s form Tab
pilotty type -s form "mypassword"
pilotty key -s form Tab
pilotty key -s form Space
pilotty key -s form Tab
pilotty key -s form Enter
pilotty snapshot -s form --format text
Example: Monitor with htop
pilotty spawn --name monitor htop
pilotty wait-for -s monitor "CPU"
pilotty snapshot -s monitor --format text
pilotty key -s monitor F9
pilotty key -s monitor q
pilotty kill -s monitor
Example: Interact with AI TUI (opencode, etc.)
AI-powered TUIs stream responses, requiring special handling:
pilotty spawn --name ai opencode
pilotty wait-for -s ai -t 15000 "Ask anything"
HASH=$(pilotty snapshot -s ai | jq -r '.content_hash')
pilotty type -s ai "explain the architecture of this codebase"
pilotty key -s ai Enter
pilotty snapshot -s ai --await-change "$HASH" --settle 3000 -t 60000 --format text
pilotty scroll -s ai up 20
pilotty snapshot -s ai --format text
pilotty kill -s ai
Gotchas with AI apps:
- Use --settle 2000-3000 because AI responses pause between chunks
- Extend timeout with -t 60000 for complex prompts
- Long responses may scroll the terminal; use scroll up to see the beginning
- The settle timer resets on each screen update, so it waits for true completion
Sessions
Each session is isolated with its own:
- PTY (pseudo-terminal)
- Screen buffer
- Child process
pilotty spawn --name monitoring htop
pilotty spawn --name editor vim file.txt
pilotty snapshot -s monitoring
pilotty key -s editor Ctrl+S
pilotty list-sessions
pilotty kill -s editor
The first session spawned without --name is automatically named default.
Important: The --name flag must come before the command. Everything after the command is passed as arguments to that command.
Daemon Architecture
pilotty uses a background daemon for session management:
- Auto-start: Daemon starts on first command
- Auto-stop: Shuts down after 5 minutes with no sessions
- Session cleanup: Sessions removed when process exits (within 500ms)
- Shared state: Multiple CLI calls share sessions
You rarely need to manage the daemon manually.
Error Handling
Errors include actionable suggestions:
{
"code": "SESSION_NOT_FOUND",
"message": "Session 'abc123' not found",
"suggestion": "Run 'pilotty list-sessions' to see available sessions"
}
{
"code": "SPAWN_FAILED",
"message": "Failed to spawn process: command not found",
"suggestion": "Check that the command exists and is in PATH"
}
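A script can branch on the code field and surface the suggestion to the user. The sketch below works on error JSON supplied on stdin; whether pilotty writes errors to stdout or stderr is not stated here, so that wiring is left to the caller. The names json_field and explain_error are illustrative, and the sed extraction assumes simple string values without escaped double quotes.

```shell
# json_field <name>: print the value of a string field from JSON on stdin.
json_field() {
  sed -n 's/.*"'"$1"'":[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# explain_error: read an error object on stdin, print a short next step
# chosen from the error code, followed by pilotty's own suggestion.
explain_error() {
  err=$(cat)
  code=$(printf '%s\n' "$err" | json_field code)
  case "$code" in
    SESSION_NOT_FOUND) echo "retry after listing sessions" ;;
    SPAWN_FAILED)      echo "check the command and PATH" ;;
    *)                 echo "unhandled error: ${code:-unknown}" ;;
  esac
  printf '%s\n' "$err" | json_field suggestion
}
```

Only the two codes shown in this section are handled; any other code falls through to the generic branch.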
Common Patterns
Reliable action + wait (recommended)
HASH=$(pilotty snapshot | jq '.content_hash')
pilotty key Enter
pilotty snapshot --await-change $HASH --settle 50
Wait then act
pilotty spawn my-app
pilotty wait-for "Ready"
pilotty snapshot
Check state before action
if ! pilotty snapshot --format text | grep -q "Error"; then
  pilotty key Enter
fi
Check for specific element
pilotty snapshot | jq '.elements[] | select(.kind == "toggle") | {text, checked}' | head -1
pilotty snapshot | jq '.elements[] | select(.row == 5 and .col == 10)'
Retry on timeout
pilotty wait-for -t 5000 "Ready" || {
pilotty snapshot --format text
}
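The same idea extends to a bounded retry loop. A minimal sketch, assuming (as the pattern above implies) that wait-for exits non-zero on timeout; wait_ready is an illustrative name and the defaults of 3 attempts and 5000 ms are arbitrary.

```shell
# wait_ready <session> <text> [attempts] [timeout-ms]
# Retries wait-for up to <attempts> times. After each timeout it prints the
# current screen to stderr so the caller can see what the app is showing.
wait_ready() {
  session=$1 text=$2 attempts=${3:-3} timeout=${4:-5000}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if pilotty wait-for -s "$session" -t "$timeout" "$text"; then
      return 0
    fi
    echo "attempt $i/$attempts timed out; current screen:" >&2
    pilotty snapshot -s "$session" --format text >&2
    i=$((i + 1))
  done
  return 1
}
```

A caller could then write wait_ready myapp "Ready" 5 2000 || pilotty kill -s myapp to give up cleanly after five short attempts.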
Deep-dive Documentation
For detailed patterns and edge cases, see:
Ready-to-use Templates
Executable workflow scripts:
Usage:
./templates/vim-workflow.sh /tmp/myfile.txt "File content here"
./templates/dialog-interaction.sh
./templates/multi-session.sh
./templates/element-detection.sh