browser-control-ios-touch-fallback

Name: Browser Control Ios Touch Fallback
Author: cybersemics

stars:355

forks:137

updated: 2026-05-27 18:16

SKILL.md

name	browser-control-ios-touch-fallback
description	Fallback for iOS native touch dispatch when the wdio MCP's standard touch tools (`tap_element`, `swipe`, `drag_and_drop`, `mobile:` commands) produce the wrong behaviour for what a real finger would do. Posts to the legacy JSONWP `/touch/perform` endpoint via curl. Use sparingly: only when a standard tool has visibly misfired, or for a recipe in the catalogue below.
allowed-tools	["bash","wdio"]

This skill is a fallback, not the default. The standard wdio-mcp touch tools (tap_element, swipe, drag_and_drop, and mobile: commands via execute_script) cover the overwhelming majority of iOS interactions and you should reach for them first. Drop down to this skill only when:

A specific gesture is documented below in Recipes (e.g. double-tap-to-select).
You tried a standard tool and the result was visibly wrong: the field blurred when you expected a selection, the gesture didn't register at all, or the recognizer interpreted it as a different gesture from the one a real finger would have produced.

The reason this layer exists at all: a few iOS WKWebView gesture recognizers (notably double-tap-to-select) reject the synthetic touches that mobile: <cmd> and W3C /actions dispatch — they only fire for touches delivered through XCUITest's tap primitive via the legacy JSONWP route.

Mechanism

Appium-XCUITest still proxies the JSONWP TouchAction endpoint at POST /session/{sid}/touch/perform. It accepts a sequence of single-touch actions and dispatches them through XCUICoordinate, which iOS's recognizers honour.

A helper script encapsulates the auth, hub URL, and session-ID lookup so the agent only needs to pass the action sequence:

.github/skills/browser-control-ios-touch-fallback/touch-perform.sh '<actions JSON>'

<actions JSON> is the inner actions array — the wrapper {"actions": …} is added by the script.
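
To make the wrapping concrete, here is a minimal sketch of how an inner actions array could become the request that gets posted. The function name `buildTouchPerformRequest` and its return shape are hypothetical and are not taken from touch-perform.sh; the path is relative to the hub URL, which the real helper hard-codes.

```javascript
// Hypothetical helper: the name and return shape are illustrative only.
function buildTouchPerformRequest(sessionId, actions) {
  if (typeof sessionId !== 'string' || sessionId.trim() === '') {
    throw new Error('missing session ID')
  }
  if (!Array.isArray(actions) || actions.length === 0) {
    throw new Error('actions must be a non-empty array')
  }
  return {
    method: 'POST',
    // Path relative to the hub URL; trim() drops a trailing newline from a file read.
    path: `/session/${sessionId.trim()}/touch/perform`,
    // The wrapper object around the inner actions array.
    body: JSON.stringify({ actions }),
  }
}

const request = buildTouchPerformRequest('abc123', [
  { action: 'tap', options: { x: 196, y: 164 } },
])
console.log(request.path)
console.log(request.body)
```

The caller only ever supplies the array; the session ID and the `{"actions": …}` envelope are attached in one place, which mirrors what the helper is described as doing.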

Inputs (already set up by `browser-control-ios` during bootstrap)

The helper reads these — browser-control-ios writes/exports them when the session is opened, so you shouldn't need to do anything yourself:

Session ID at /tmp/em-bs-session.txt
BROWSERSTACK_USERNAME and BROWSERSTACK_ACCESS_KEY in env (from GitHub Actions secrets in the Copilot runner)
Hub URL — hard-coded in the helper

If any of these is missing, the helper exits with a clear error; surface that error rather than trying to recreate the inputs here.

Action primitives (verified)

Action	Body	What it does
`tap`	`{"action":"tap","options":{"x":N,"y":N}}`	Atomic XCUICoordinate.tap at a screen point
`press`	`{"action":"press","options":{"x":N,"y":N}}`	Touch-down at a screen point
`wait`	`{"action":"wait","options":{"ms":N}}`	Pause
`release`	`{"action":"release"}`	Touch-up matching the most recent `press`

Likely supported but not verified on this stack: moveTo, longPress, cancel. Probe before relying on them — they're standard JSONWP but the Appium-XCUITest proxy may not expose every one.

Multi-touch (pinch / two-finger tap) uses a separate endpoint, /touch/multi/perform, which is unverified here.
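
One way to act on the verified/unverified split is to check a sequence before sending it. The sketch below is illustrative and not part of the skill: `VERIFIED_ACTIONS` simply restates the four primitives from the table above, and `unverifiedActions` is a hypothetical name.

```javascript
// The four primitives the table above lists as verified on this stack.
const VERIFIED_ACTIONS = new Set(['tap', 'press', 'wait', 'release'])

// Returns the distinct action names in a sequence that are not in the
// verified set, in order of first appearance. Hypothetical helper.
function unverifiedActions(actions) {
  const seen = new Set()
  for (const step of actions) {
    if (!VERIFIED_ACTIONS.has(step.action)) seen.add(step.action)
  }
  return [...seen]
}

const longPressDrag = [
  { action: 'longPress', options: { x: 100, y: 200 } },
  { action: 'moveTo', options: { x: 100, y: 400 } },
  { action: 'release' },
]
console.log(unverifiedActions(longPressDrag)) // [ 'longPress', 'moveTo' ]
```

A non-empty result is a cue to probe those actions on the device first rather than assuming the proxy exposes them.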

Coordinates

/touch/perform consumes native screen points. For em's full-screen Capacitor webview on iPhone, these happen to coincide with the CSS-px coordinates returned by getBoundingClientRect: the WebView spans the full 393×852-point screen at origin (0, 0), visualViewport.offsetTop/Left == 0, and scale == 1. So a (cx, cy) computed in the web context works as-is.

Critical gotcha — re-fetch coordinates after every focus-state change. When the iOS keyboard opens or closes, em's React layout shifts the editable by ~26 px (toolbar / safe-area adjustment driven by state.isKeyboardOpen). The same data-editable will have cy=138 blurred and cy=164 focused. Tapping the wrong frame's coords lands outside the word and blurs the field.

Pattern:

// in webview context, via execute_script
var e = document.querySelector('[data-editable]')
var r = e.getBoundingClientRect()
return JSON.stringify({ cx: Math.round(r.x + r.width / 2), cy: Math.round(r.y + r.height / 2) })

Run this after any tap that changes focus, before using the coords for the next gesture.

If the Capacitor app ever ships with the webview NOT full-screen, or visualViewport scale != 1, you'll need to add the WebView's native origin (query //XCUIElementTypeWebView) and account for visualViewport.offsetLeft/offsetTop * scale. Identity only holds while the WebView spans the screen.
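
If that day comes, the conversion might look like the sketch below. The mapping is an assumption and has not been verified on this stack: it treats getBoundingClientRect values as layout-viewport CSS px, subtracts the visual viewport offset, multiplies by the scale, and then adds the WebView's native origin. The function name and parameter shapes are hypothetical; check the result against a real device before relying on it.

```javascript
// Assumed mapping (verify on a device before relying on it):
//   native = webviewOrigin + (cssCoord - visualViewportOffset) * scale
function cssToNativePoint(css, webviewOrigin, viewport) {
  const { offsetLeft = 0, offsetTop = 0, scale = 1 } = viewport
  return {
    x: Math.round(webviewOrigin.x + (css.cx - offsetLeft) * scale),
    y: Math.round(webviewOrigin.y + (css.cy - offsetTop) * scale),
  }
}

// Today's full-screen case: origin (0, 0), no offset, scale 1, so the
// result equals the CSS-px input.
console.log(cssToNativePoint({ cx: 196, cy: 164 }, { x: 0, y: 0 }, { scale: 1 }))
```

With origin (0, 0), zero offsets, and scale 1 the function reduces to the identity, which matches the current behaviour described above.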

Recipes

Double-tap-to-select a word

Verified: two tap actions, 100 ms gap, at the focused-state word center. iOS dispatches as a double-tap, WebKit's recognizer selects the word, and the native Cut | Copy | Paste | Replace… menu appears.

.github/skills/browser-control-ios-touch-fallback/touch-perform.sh \
  '[{"action":"tap","options":{"x":'"$CX"',"y":'"$CY"'}},
    {"action":"wait","options":{"ms":100}},
    {"action":"tap","options":{"x":'"$CX"',"y":'"$CY"'}}]'

The higher-level recipe (focus → re-fetch coords → dispatch → verify) lives in interaction-ios-select-text; reach for that one for the text-selection use case rather than calling this directly.
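
When the coordinates are already in hand on the JavaScript side, the same three-step sequence can be built as data and serialized, which avoids hand-written shell quoting. This is a sketch only: `doubleTapActions` is a hypothetical name, and the 100 ms default simply restates the gap verified above.

```javascript
// Builds the tap / wait / tap sequence for a double-tap at (cx, cy).
// Hypothetical helper; the default gap restates the verified 100 ms.
function doubleTapActions(cx, cy, gapMs = 100) {
  if (!Number.isFinite(cx) || !Number.isFinite(cy)) {
    throw new Error('cx and cy must be finite numbers; re-fetch the coordinates')
  }
  const tap = () => ({ action: 'tap', options: { x: cx, y: cy } })
  return [tap(), { action: 'wait', options: { ms: gapMs } }, tap()]
}

// The serialized array is what touch-perform.sh expects as its argument.
const actionsJson = JSON.stringify(doubleTapActions(196, 164))
console.log(actionsJson)
```

The finite-number check is there to catch an undefined or NaN coordinate before it reaches the device, where it would otherwise surface as a confusing misfire.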

Why not just use this for every touch?

Two reasons:

No selector form. Every call needs an explicit (x, y), which means an execute_script for getBoundingClientRect plus the curl — two calls instead of one tap_element('#foo'). The MCP's selector-based tools hide the coordinate-management cost.
Re-fetch burden. Focus-state shifts mean stale coords from any prior call are dangerous. The fewer raw coordinate calls, the fewer chances to get this wrong.

Keep this as a fallback. The standard tools are faster and less brittle, and for most gestures iOS recognizes their touches in the same way.

related-skills.json

Same repository

plan.md

from "cybersemics/em"

ALWAYS USE THIS SKILL before writing implementation code for any non-trivial change. Produces a written architectural plan grounded in the existing codebase, then critiques it — both stages, one agent, one atomic unit.

2026-05-27, 355 stars

browser-control-ios.md

from "cybersemics/em"

iOS environment bring-up for driving the em Capacitor app on BrowserStack App Automate via the wdio MCP — native (XCUITest) AND web (WKWebView) in one session. Invoked by browser-control when the target is `ios`; not normally called directly.

2026-05-27, 355 stars

interaction-ios-select-text.md

from "cybersemics/em"

Select a word inside an em editable on iOS App Automate so the native `Cut | Copy | Paste` edit menu appears and `window.getSelection()` reflects the word. Use when an iOS issue's repro depends on having text selected within a thought (formatting commands, edit-menu-driven flows, drag-blocking, etc.). Composes on top of `browser-control-ios-touch-fallback` for the underlying touch dispatch.

2026-05-27, 355 stars

ci-monitor.md

from "cybersemics/em"

Use this skill after pushing commits or when asked about CI status or to fix failing tests. It monitors GitHub Actions workflow runs for the current branch, waits for completion, returns which checks passed or failed with error details, and provides a methodology for iterating until all checks pass.

2026-04-27, 355 stars

puppeteer-update-snapshots.md

from "cybersemics/em"

Regenerate Puppeteer image snapshots using Docker. Use this skill when Puppeteer tests fail due to missing or outdated snapshots. Only use if the UI change was intentional, matches the user’s request or if you otherwise deem it to be necessary. NEVER use this skill to mask legitimate failures. ALWAYS explain to the user why you felt you needed to update snapshots.

2026-04-27, 355 stars

test-diagnosis.md

from "cybersemics/em"

Use this skill when CI checks have failed. It analyzes the failure logs, identifies the specific failing test or build step, categorizes the failure type, and provides guidance on how to fix it. Use in combination with the CI Monitor skill.

2026-04-27, 355 stars

package.json

"author": "cybersemics"

"repository": "cybersemics/em"
