android
// Control an Android phone remotely — navigate apps, tap, type, swipe, and automate Uber, WhatsApp, Spotify, Maps, Settings, Tinder
| name | android |
| description | Control an Android phone remotely — navigate apps, tap, type, swipe, and automate Uber, WhatsApp, Spotify, Maps, Settings, Tinder |
| version | 1.0.0 |
| metadata | {"hermes":{"tags":["android","phone","automation","accessibility"],"category":"android"}} |
You can control an Android phone remotely using the android_* tools. The phone runs a companion app called Hermes Bridge which exposes an HTTP API. You communicate with it over the network — no USB, no ADB, no physical connection needed.
Hermes Agent (this server) ──HTTP──> Hermes Bridge app (Android phone)
├── Reads screen via AccessibilityService
├── Performs taps, types, swipes
└── Authenticated via pairing code
When the user wants to connect their phone, ask for their pairing code — a 6-character code shown in the Hermes Bridge app (e.g. K7V3NP).
Then call:
android_setup("<pairing_code>")
This does two things: it starts the relay and configures the connection.
Relay the user_instructions field from the result directly to the user. It contains the server IP and port they need to type into the phone app.
After the user taps Connect on their phone, the phone connects to this server via WebSocket. Call android_ping() to verify the connection is live.
Do NOT ask about:
Just ask for the pairing code, call setup, and relay the instructions.
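The connection flow above can be sketched roughly as follows. This is an illustrative sketch, not the actual implementation: the tools are passed in as plain callables so the example runs on its own, `connect_phone` and `tell_user` are hypothetical names, and the `connected` key in the ping result is an assumption. Only the `user_instructions` field and the 6-character code length come from the text above.

```python
from typing import Callable


def connect_phone(
    pairing_code: str,
    android_setup: Callable[[str], dict],
    android_ping: Callable[[], dict],
    tell_user: Callable[[str], None],
) -> bool:
    """Pair with the phone and report whether the connection looks live."""
    code = pairing_code.strip()
    if len(code) != 6:
        raise ValueError("pairing code must be 6 characters")

    result = android_setup(code)
    # Relay the instructions verbatim; they hold the server IP and port.
    tell_user(result["user_instructions"])

    # In a real session, ping only after the user has tapped Connect.
    # Assumption: android_ping() returns a dict with a boolean "connected" key.
    return bool(android_ping().get("connected"))
```

Passing the tools in as arguments keeps the sketch independent of how the agent runtime actually exposes them.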
You have these 38 tools. Use them by name — they are function calls.
- android_ping(): check if the phone is connected and responding.
- android_setup(pairing_code): start the relay and configure the connection.
- android_read_screen(include_bounds=False): get the full accessibility tree as JSON. Returns every visible UI element with text, className, nodeId, clickable, etc. Always call this before interacting.
- android_screenshot(): capture a screenshot as base64 PNG. Use when the accessibility tree doesn't show enough (canvas apps, image-heavy UIs).
- android_current_app(): get the package name and activity of the foreground app.
- android_open_app(package): launch any app by package name. This is the primary way to open apps. Do NOT try to find and tap app icons. Example: android_open_app("com.instagram.android")
- android_get_apps(): list all installed apps with package names. Use this if you don't know the package name.
- android_tap(x, y, node_id): tap by coordinates or node ID. Prefer node_id from read_screen.
- android_tap_text(text, exact=False): tap the first element matching text. Most convenient for buttons, menu items, links.
- android_type(text, clear_first=False): type into the currently focused input field. Tap the field first.
- android_swipe(direction, distance="medium"): swipe up/down/left/right. Distances: short, medium, long.
- android_scroll(direction, node_id=None): scroll a specific element or the whole screen.
- android_press_key(key): press a key. Options: back, home, recents, power, volume_up, volume_down, enter, delete, tab, escape, search, notifications
- android_wait(text, class_name, timeout_ms=5000): poll until an element appears. Use after navigation or loading.

For any task, follow this pattern and then STOP:

1. android_open_app(package): open the app
2. android_read_screen(): see what's on screen
3. Interact as needed (android_tap_text, android_type, android_swipe, and so on)
4. android_read_screen() or android_screenshot(): verify the result

- Open apps with android_open_app(package): never try to find and tap the icon on the home screen or app drawer.
- Prefer android_read_screen() over android_screenshot(): read_screen is faster and structured. Only use screenshot when the accessibility tree is insufficient (canvas/image-heavy apps).
- Prefer android_tap_text("Button Text") over coordinates: it's more reliable.
- If you don't know the package name, call android_get_apps() and search the results.
- Go back: android_press_key("back"). Go home: android_press_key("home").

| App | Package |
|---|---|
| Uber | com.ubercab |
| Bolt | com.bolt.client |
| WhatsApp | com.whatsapp |
| Spotify | com.spotify.music |
| Google Maps | com.google.android.apps.maps |
| Chrome | com.android.chrome |
| Gmail | com.google.android.gm |
| Instagram | com.instagram.android |
| X/Twitter | com.twitter.android |
| Tinder | com.tinder |
| Settings | com.android.settings |
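
The general pattern (open the app, read the screen, act, verify) might look like the sketch below, using a few packages from the table above. It is illustrative only: the tools are supplied as a dict of callables so the code runs standalone, the helper names are hypothetical, and two things are assumptions about the real tools: that child elements are nested under a `children` key in the accessibility tree, and that android_tap can be called with node_id alone.

```python
from typing import Iterator, Optional

# A subset of the package table above.
PACKAGES = {
    "uber": "com.ubercab",
    "whatsapp": "com.whatsapp",
    "spotify": "com.spotify.music",
    "settings": "com.android.settings",
}


def iter_nodes(node: dict) -> Iterator[dict]:
    """Yield a node and all of its descendants, depth first."""
    yield node
    # Assumption: child elements are nested under a "children" key.
    for child in node.get("children", []):
        yield from iter_nodes(child)


def find_by_text(tree: dict, text: str) -> Optional[dict]:
    """Return the first node whose text contains `text`, ignoring case."""
    needle = text.lower()
    for node in iter_nodes(tree):
        if needle in (node.get("text") or "").lower():
            return node
    return None


def open_and_tap(app: str, label: str, tools: dict) -> bool:
    """Open an app, read the screen, tap a label, then check the screen changed."""
    tools["android_open_app"](PACKAGES[app])
    before = tools["android_read_screen"]()
    target = find_by_text(before, label)
    if target is None:
        return False
    # Assumption: android_tap accepts node_id without coordinates.
    tools["android_tap"](node_id=target["nodeId"])
    after = tools["android_read_screen"]()
    return after != before
```

Comparing the tree before and after the tap is a crude verification step; a real task would usually check for a specific expected element instead.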
1. android_open_app("com.ubercab")
2. android_wait(text="Where to?", timeout_ms=8000)
3. android_tap_text("Where to?")
4. android_type("<destination>", clear_first=True)
5. android_wait(text="<destination keyword>"), then tap the suggestion
6. android_read_screen(): read price and car options
7. android_tap_text("UberX"), then android_tap_text("Confirm UberX")
8. android_wait(text="Finding your driver", timeout_ms=10000)

Pitfalls: Uber may block accessibility taps on some versions; fall back to screenshot + coordinates. Always mention surge pricing to the user.
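
One way to handle blocked text taps is to fall back to coordinates, sketched below. The pitfall above suggests picking coordinates from a screenshot; this sketch swaps in a different source, the bounds from android_read_screen(include_bounds=True), because that can be shown in code. Several details are assumptions rather than documented behavior: that android_tap_text returns a dict with a `success` flag, that each node carries a `bounds` dict with left/top/right/bottom keys, and that children sit under a `children` key. The helper names are hypothetical.

```python
from typing import Optional


def center_of(bounds: dict) -> tuple[int, int]:
    """Return the integer center point of a bounding box."""
    x = (bounds["left"] + bounds["right"]) // 2
    y = (bounds["top"] + bounds["bottom"]) // 2
    return x, y


def find_bounds(node: dict, label: str) -> Optional[dict]:
    """Depth-first search for the bounds of the first node whose text contains label."""
    if label.lower() in (node.get("text") or "").lower() and "bounds" in node:
        return node["bounds"]
    for child in node.get("children", []):
        found = find_bounds(child, label)
        if found is not None:
            return found
    return None


def tap_with_fallback(label: str, tools: dict) -> str:
    """Try a text tap first, then fall back to tapping the element's center."""
    # Assumption: android_tap_text returns a dict with a "success" flag.
    result = tools["android_tap_text"](label)
    if result.get("success"):
        return "text"
    tree = tools["android_read_screen"](include_bounds=True)
    bounds = find_bounds(tree, label)
    if bounds is None:
        return "not_found"
    x, y = center_of(bounds)
    tools["android_tap"](x, y)
    return "coordinates"
```

If the element is missing from the tree altogether, the function returns "not_found", which is the point at which a screenshot becomes the better option.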
1. android_open_app("com.whatsapp")
2. android_wait(text="Chats")
3. android_tap_text("<contact name>") if the chat is listed; otherwise android_tap_text("New chat"), type the contact name, and tap the match
4. android_tap_text("Type a message")
5. android_type("<message text>")
6. android_tap_text("Send") or android_press_key("enter")

Pitfalls: Message input is android.widget.EditText. Read screen after typing to verify before sending.
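
The "verify before sending" advice could be implemented along these lines. This is a sketch under assumptions: the tools are passed in as a dict of callables so the code runs standalone, the helper names are hypothetical, and the accessibility tree is assumed to nest child elements under a `children` key. The `className` and `text` fields are the ones named in the tool description.

```python
def typed_text_matches(tree: dict, expected: str) -> bool:
    """Return True if any EditText node in the tree holds exactly `expected`."""
    stack = [tree]
    while stack:
        node = stack.pop()
        is_input = node.get("className") == "android.widget.EditText"
        if is_input and node.get("text") == expected:
            return True
        # Assumption: child elements are nested under a "children" key.
        stack.extend(node.get("children", []))
    return False


def send_if_verified(message: str, tools: dict) -> bool:
    """Type a message, confirm it landed in the input field, then send it."""
    tools["android_type"](message)
    if not typed_text_matches(tools["android_read_screen"](), message):
        return False  # Do not send text that was not typed as intended.
    tools["android_tap_text"]("Send")
    return True
```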
1. android_open_app("com.spotify.music")
2. android_wait(text="Search", timeout_ms=8000)
3. android_tap_text("Search")
4. android_wait(class_name="android.widget.EditText")
5. android_type("<query>", clear_first=True)
6. android_wait(text="Songs", timeout_ms=5000)
7. android_read_screen(), then tap the desired result

Playback: android_tap_text("Play"), android_tap_text("Next"), android_tap_text("Pause")

Pitfalls: Spotify uses custom views, so a screenshot may be more useful than read_screen.
1. android_open_app("com.google.android.apps.maps")
2. android_wait(text="Search here", timeout_ms=8000)
3. android_tap_text("Search here")
4. android_type("<destination>", clear_first=True)
5. android_tap_text("Directions")
6. android_read_screen(): report time, distance, and route to the user
7. android_tap_text("Start")

Pitfalls: Maps uses heavy canvas rendering; prefer android_screenshot(). Exit navigation with android_press_key("back").
1. android_open_app("com.android.settings")
2. android_wait(text="Settings", timeout_ms=5000)
3. android_read_screen() to find specific toggles

Pitfalls: Settings UI varies across manufacturers (Samsung, Pixel, Xiaomi). Always call read_screen to discover the actual labels. Use android_scroll("down") if the setting is not visible.
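
Because labels vary by manufacturer and may be off screen, a read-then-scroll loop is one reasonable way to locate a setting. The sketch below is illustrative: the tools are passed in as a dict of callables so it runs standalone, `contains_text`, `scroll_to_label`, and `max_scrolls` are hypothetical names, and the tree is assumed to nest child elements under a `children` key.

```python
def contains_text(node: dict, label: str) -> bool:
    """Return True if any node in the tree has text containing `label`, ignoring case."""
    if label.lower() in (node.get("text") or "").lower():
        return True
    # Assumption: child elements are nested under a "children" key.
    return any(contains_text(child, label) for child in node.get("children", []))


def scroll_to_label(label: str, tools: dict, max_scrolls: int = 10) -> bool:
    """Read the screen, scrolling down between reads, until `label` is visible."""
    for attempt in range(max_scrolls + 1):
        if contains_text(tools["android_read_screen"](), label):
            return True
        if attempt < max_scrolls:
            tools["android_scroll"]("down")
    return False
```

The `max_scrolls` cap keeps the loop from running forever when the label does not exist on this device.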
1. android_open_app("com.tinder")
2. android_wait(timeout_ms=8000)
3. android_read_screen() + android_screenshot(): Tinder is image-heavy

IMPORTANT: Always confirm with the user before swiping or messaging.

- android_swipe("right"): like
- android_swipe("left"): pass
- android_swipe("up"): Super Like

Pitfalls: Tinder uses a custom UI, so the accessibility tree is limited; prefer screenshots. "It's a Match!" popup: tap anywhere to dismiss.
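
The confirm-before-swiping rule can be enforced with a small guard, sketched here. The function name and the `confirm` callback are hypothetical; in practice the confirmation would be a question put to the user in the conversation, not a function call.

```python
from typing import Callable

ALLOWED_DIRECTIONS = {"right", "left", "up"}


def confirmed_swipe(
    direction: str,
    confirm: Callable[[str], bool],
    android_swipe: Callable[[str], None],
) -> bool:
    """Swipe only after the user has explicitly approved this direction."""
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"unsupported swipe direction: {direction}")
    if not confirm(f"Swipe {direction} on this profile?"):
        return False  # The user declined, so nothing is sent to the phone.
    android_swipe(direction)
    return True
```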