| name | pwa-device-testing |
| description | This skill should be used when the user asks to "test on device", "test on emulator", "run emulator", "launch AVD", "test PWA", "test on Android", "test on mobile", "verify on real device", "check on phone", or discusses testing a feature on an actual device or emulator rather than headless Playwright. Also use when validating features that headless browsers cannot cover (biometric, PWA install, Chrome autofill, touch gestures, password managers). Use proactively when a feature has been implemented that touches any of these capabilities. |
PWA Device Testing
Headless Playwright tests cover logic and layout but cannot validate:
- Chrome password manager / autofill behavior
- WebAuthn biometric (fingerprint, face)
- PWA install-to-homescreen and standalone mode
- Real touch gestures, virtual keyboard, IME
- Service worker update UX on actual Chrome
- CSS safe-area-inset rendering on notched devices
This skill provides the correct setup, known pitfalls, and ready-to-use templates for testing on real Chrome via Android emulator.
Quick Start
scripts/setup-avd.sh
scripts/run-appium-tests.sh
scripts/run-appium-tests.sh --suite my-label
npm run test:emulator
IMPORTANT: ALWAYS run Appium/emulator tests via scripts/run-appium-tests.sh, never bare npx playwright test --config=playwright.appium.config.js. The script handles screen recording, ANR dialog dismissal, archival to test-history/, and ffprobe validation. Without it, test runs produce no video evidence for review.
Fast-gate tests (tsc, eslint, headless npx playwright test) are fine to run directly -- they don't touch the emulator and don't need recording.
npm run test:emulator (via scripts/run-emulator-tests.sh) handles the legacy CDP path: boots emulator if needed, enables Chrome debugging, sets up port forwarding, runs Playwright over CDP.
Architecture
Host machine                          Android Emulator (Pixel 7, API 35)
+-----------------------+             +---------------------------+
| MobiSSH server :8081  |<--adb rev-->| Chrome tab: localhost:8081|
| Playwright test runner|--CDP:9222-->| Chrome DevTools socket    |
+-----------------------+             +---------------------------+
- CDP connection: Playwright connectOverCDP() to real Chrome via ADB-forwarded DevTools port
- Port forwarding: adb reverse tcp:8081 tcp:8081 so emulator's localhost reaches host
- Single worker CDP: One CDP connection per test file, fresh tab per test
Interaction Design Principles (learned the hard way)
Emulator tests must be faithful proxies for human interaction
Every dialog, prompt, and overlay that appears during a test flow must be handled the way a real user would handle it: see it, understand what it's asking, and dismiss it appropriately. This includes both app-owned dialogs (host key accept, vault setup) and Chrome-native UI (save password bar, add username suggestion, notification prompt).
If a test can't click a button, a real user can't either. The test is surfacing a real UX bug.
Never use force: true to work around click interception
When Playwright reports <div class="vault-dialog">...</div> intercepts pointer events, the correct response is to fix the CSS/layout so the button is actually clickable, NOT to bypass Playwright's actionability checks with { force: true }. Using force: true:
- Hides real interaction bugs from the test suite
- Creates a test that passes while the actual user flow is broken
- Masks layout overflow on mobile viewports
Common causes of click interception on mobile:
- align-items: center on position: fixed; inset: 0 overlays -- when the dialog content is taller than the viewport, content overflows and sibling elements intercept clicks. Fix: align-items: flex-start + overflow-y: auto on the overlay, margin: auto 0 on the dialog for vertical centering that still allows scrolling.
- Chrome-native UI (password save bar, username suggestion) appearing as a layer between your app dialog and the click target. Fix: suppress Chrome autofill on test fields with autocomplete="off", pre-grant notifications, use --disable-fre flag.
- Keyboard pushing elements up so labels overlap buttons. Fix: ensure sufficient spacing, or scroll the button into view first.
Feature removal is a valid outcome
The selection overlay feature (#55) went through 6+ commits, was feature-flagged off, still caused interference with core scroll behavior (#143), and was ultimately removed entirely (600 lines deleted in 0ac4010). This validates the "know when to quit" principle: if every fix introduces a new bug, the abstraction is wrong. Strip it, ship the working core, and re-approach later with a cleaner design.
Always verify server version before asking user to test
scripts/verify-test-ready.sh [--url https://mobissh.tailbe5094.ts.net]
Checks server currency, HTTP endpoint hash match, and prints the ?reset=1 URL for SW cache busting. Exit 1 = not ready.
Critical Pitfalls (learned the hard way)
Chrome DevTools socket requires set-debug-app
Handled automatically by scripts/run-emulator-tests.sh and scripts/run-appium-tests.sh. If running manually:
adb shell am set-debug-app --persistent com.android.chrome
then force-stop and relaunch Chrome.
No browser.newContext() on Android Chrome
Android Chrome's CDP exposes a single default browser context. Calling browser.newContext() throws: Protocol error (Target.createBrowserContext): Failed to create browser context.
Correct pattern:
const context = browser.contexts()[0];
const page = await context.newPage();
Shared localStorage across tests
Since all tabs share the single default context, localStorage is shared. Every test fixture MUST clear localStorage AND reload before the test runs. The app reads localStorage on init (panel state, vault, profiles), so clearing alone doesn't help if the app already initialized with stale state:
await page.goto(BASE_URL, { waitUntil: 'domcontentloaded' });
await page.evaluate(() => localStorage.clear());
await page.reload({ waitUntil: 'domcontentloaded' });
Vault snapshot fixture (skip per-test vault setup)
Creating the vault from scratch per test wastes ~5s (keyboard dismiss, modal wait, form fill). The vaultSnapshot worker-scoped fixture creates the vault once per test file and snapshots the localStorage keys. The emulatorPage fixture restores the snapshot and auto-unlocks:
vaultSnapshot: [async ({ cdpBrowser }, use) => {
await use(snapshot);
}, { scope: 'worker' }],
emulatorPage: async ({ cdpBrowser, vaultSnapshot }, use) => {
await page.addInitScript(() => {
Object.defineProperty(window, '__appReady', { });
});
await page.reload({ waitUntil: 'domcontentloaded' });
await page.waitForFunction(() => window.__vaultUnlocked === true);
},
Key insight: The vault unlock bar doesn't appear on startup — it appears later when ensureVaultKeyWithUI() is called during profile operations. Checking for the bar after reload will find needsUnlock=false (race condition). The addInitScript + __appReady hook approach intercepts the app boot to unlock the vault before any UI flow triggers it.
Use emulatorPage for tests that need a pre-configured vault. Use cleanPage for tests that exercise vault setup from scratch.
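The snapshot and restore halves of that pattern can be sketched as two small helpers. This is a minimal illustration, not the project's fixture code: the function names and the key-prefix filter are assumptions, and the helpers take any Storage-like object so they can be exercised outside a browser.

```javascript
// Hypothetical helpers (names and prefix convention are assumptions).

// Copy every key that starts with `prefix` out of a Storage-like object.
function snapshotStorage(storage, prefix = '') {
  const snapshot = {};
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    if (key !== null && key.startsWith(prefix)) {
      snapshot[key] = storage.getItem(key);
    }
  }
  return snapshot;
}

// Wipe the storage, then write the snapshot back so no stale keys survive.
function restoreStorage(storage, snapshot) {
  storage.clear();
  for (const [key, value] of Object.entries(snapshot)) {
    storage.setItem(key, value);
  }
}
```

In a real fixture the same logic would run against window.localStorage inside page.evaluate(), followed by the reload described above so the app initializes from the restored state.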
Playwright outputDir isolation (CRITICAL)
Playwright clears its outputDir on each run. The default is test-results/, so an unconfigured run wipes emulator recordings, report.json, frames, and any other non-Playwright artifacts stored there.
Every Playwright config MUST set a dedicated outputDir:
playwright.config.js → test-results/headless
playwright.emulator.config.js → test-results/playwright-emulator
playwright.appium.config.js → test-results-appium
playwright.browserstack.config.js → test-results/browserstack
Without this, run-emulator-tests.sh writes recording.mp4 and report.json to test-results/emulator/, then the next Playwright run wipes them.
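A small check can catch a config that would clobber pipeline artifacts before it ever runs. The sketch below is hypothetical: the function name and the shape of the `configs` map (config filename to outputDir) are assumptions, and nothing like it is claimed to exist in the repo.

```javascript
// Hypothetical lint helper: flags outputDir values that are missing,
// shared between configs, or that contain a protected artifact directory.
function findOutputDirProblems(configs, protectedDirs) {
  const stripSlash = (p) => p.replace(/\/+$/, '');
  const guarded = protectedDirs.map(stripSlash);
  const seen = new Map();
  const problems = [];
  for (const [file, outputDir] of Object.entries(configs)) {
    if (!outputDir) {
      problems.push(`${file}: outputDir not set (defaults to test-results)`);
      continue;
    }
    const dir = stripSlash(outputDir);
    for (const prot of guarded) {
      // Clearing `dir` also clears anything nested inside it.
      if (prot === dir || prot.startsWith(dir + '/')) {
        problems.push(`${file}: outputDir ${dir} would wipe ${prot}`);
      }
    }
    if (seen.has(dir)) {
      problems.push(`${file}: shares outputDir ${dir} with ${seen.get(dir)}`);
    } else {
      seen.set(dir, file);
    }
  }
  return problems;
}
```

Called with the four mappings listed above and ['test-results/emulator'] as the protected directory, it should report no problems; pointing any config at plain test-results would be flagged.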
workers: 1 is mandatory for CDP
Parallel Playwright workers each try to interact with the same single Chrome instance over CDP. This causes "Target page, context or browser has been closed" across all tests. Always set workers: 1 in the emulator config:
module.exports = defineConfig({
workers: 1,
});
Inject page state AFTER navigation, not before
Any page.evaluate() state injection (WS spies, test globals) done before page.goto() gets destroyed by the navigation. Always inject on the live, already-loaded page:
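// WRONG: state set before goto() is destroyed by the navigation
await page.evaluate(() => { window.__spy = []; });
await page.goto(BASE_URL);
// CORRECT: navigate first, then inject on the loaded page
await page.goto(BASE_URL, { waitUntil: 'domcontentloaded' });
await page.evaluate(() => { window.__spy = []; });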
Use actionTimeout for fast selector failure
Without actionTimeout, a bad selector (e.g. #connectBtn that doesn't exist) waits the full test timeout (60s), then cleanup closes the page, producing a misleading "Target page closed" error. Set a short action timeout so bad selectors fail fast with the actual error:
use: {
actionTimeout: 10_000,
}
Elements may not have IDs
Don't assume HTML elements have IDs. Use semantic/structural selectors:
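// WRONG: assumes the button has an ID
await page.locator('#connectBtn').click();
// CORRECT: structural selector that doesn't depend on a button ID
await page.locator('#connectForm button[type="submit"]').click();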
Page.screencastFrame CDP doesn't work on Android emulator
The Page.startScreencast / Page.screencastFrame CDP API returns 0 frames on Android emulator Chrome. It works on desktop Chrome but the emulator's GPU pipeline doesn't produce screencast frames. Don't rely on frame count assertions. Screenshots via page.screenshot() work fine as an alternative.
Worker-scoped CDP connection is mandatory
Creating a new connectOverCDP() per test destabilises the DevTools socket. After ~4-5 connect/disconnect cycles, the connection drops with "Target page, context or browser has been closed." Use a worker-scoped fixture:
cdpBrowser: [async ({}, use) => {
const browser = await chromium.connectOverCDP(`http://127.0.0.1:${CDP_PORT}`);
await use(browser);
await browser.close();
}, { scope: 'worker' }],
Chrome nag modals block test visibility on first launch
Handled by scripts/run-appium-tests.sh (pre-grants notifications, sets Chrome flags) and the Appium fixture's modal-dismiss step. If writing a new test runner, replicate the three-layer defense:
- adb shell pm grant com.android.chrome android.permission.POST_NOTIFICATIONS
- Chrome --disable-fre --no-first-run --no-default-browser-check flags
- Fixture-level modal dismiss with 2s timeout fallback
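The third layer, a dismiss attempt that gives up after 2s, might look roughly like this. It is a sketch under assumptions: `dismissIfPresent` is a hypothetical name, and `tryDismiss` stands for whatever click the fixture actually performs.

```javascript
// Hypothetical helper: try to dismiss a modal, but never block the test
// for longer than `timeoutMs`. Resolves true if the dismiss succeeded,
// false if it failed or timed out (i.e. the modal was probably absent).
async function dismissIfPresent(tryDismiss, timeoutMs = 2000) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  const attempt = Promise.resolve()
    .then(tryDismiss)
    .then(() => true, () => false);
  try {
    return await Promise.race([attempt, timeout]);
  } finally {
    clearTimeout(timer); // don't keep the event loop alive after the race
  }
}
```

With Playwright, `tryDismiss` could be something like `() => page.getByRole('button', { name: 'No thanks' }).click()`, where the button label is only an example and will differ per Chrome version.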
KVM group membership requires session reload
After sudo usermod -aG kvm $USER, the current shell doesn't pick up the new group. Use sg kvm -c 'emulator ...' or start a new login session.
AVD config uses " = " (with spaces)
The config.ini generated by avdmanager uses key = value (space-equals-space), not key=value. Sed patterns without spaces silently fail and the fallback echo creates duplicate keys. The setup script uses a set_avd_prop helper that handles both formats.
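The format-tolerant update is easy to get wrong, so here is the idea sketched in JavaScript rather than the setup script's shell. `setAvdProp` is a hypothetical stand-in, not the script's actual set_avd_prop; it matches the key with or without spaces around the equals sign and appends only when the key is truly absent.

```javascript
// Hypothetical sketch: update or append `key = value` in config.ini text.
function setAvdProp(iniText, key, value) {
  // Escape regex metacharacters (keys such as hw.ramSize contain dots).
  const escaped = key.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Matches both "key=value" and "key = value" on a line of its own.
  const pattern = new RegExp(`^${escaped}\\s*=.*$`, 'm');
  const line = `${key} = ${value}`;
  if (pattern.test(iniText)) {
    // Function replacer so "$" in the value is not treated specially.
    return iniText.replace(pattern, () => line);
  }
  const separator = iniText === '' || iniText.endsWith('\n') ? '' : '\n';
  return `${iniText}${separator}${line}\n`;
}
```

Because the existing line is replaced in place, running it twice with the same key never produces the duplicate-key situation described above.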
WebAuthn biometric toggle in headless test browsers
prfAvailable() returns true in Playwright's headless Chromium (PublicKeyCredential API exists) but navigator.credentials.create() hangs forever (no authenticator). In headless tests, uncheck the biometric toggle via page.evaluate:
await page.evaluate(() => {
const cb = document.getElementById('vaultEnableBio');
if (cb) cb.checked = false;
});
The CSS toggle hides the checkbox with opacity:0; width:0; height:0, so Playwright's isVisible() returns false and uncheck() silently skips it. Always use page.evaluate for CSS-hidden form elements.
Real SSH Integration via Docker
For features that need a live SSH connection (gestures, terminal buffer, command execution), use the Docker test-sshd container instead of WebSocket mocks:
docker compose -f docker-compose.test.yml up -d test-sshd
ssh -p 2222 testuser@localhost
The sshd-fixture.js helper starts the container automatically and exposes credentials to tests. The setupRealSSHConnection(page, sshServer) helper in fixtures.js handles: SSRF bypass for localhost, WS URL rewriting (localhost→10.0.2.2), connect form fill, host key acceptance, and waiting for connected state. Vault setup is handled by the vaultSnapshot + emulatorPage fixture (no per-test vault creation).
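The localhost→10.0.2.2 rewrite exists because 10.0.2.2 is the Android emulator's alias for the host's loopback interface. A minimal sketch of such a rewrite, assuming it only needs to swap the hostname (the function name is hypothetical and the real helper may do more):

```javascript
// Hypothetical helper: point loopback URLs at the emulator's host alias.
function rewriteForEmulator(rawUrl) {
  const url = new URL(rawUrl);
  if (url.hostname === 'localhost' || url.hostname === '127.0.0.1') {
    url.hostname = '10.0.2.2';
  }
  return url.toString();
}
```

Non-loopback URLs pass through unchanged, so the helper is safe to apply to every connection target in a test.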
Touch Gesture Testing
Gesture helpers use CDP Input.dispatchTouchEvent which goes through Chrome's real input pipeline and fires DOM touch events faithfully. An in-page touch visualizer draws green dots/trails at finger positions so gestures are visible in screen recordings (Android's pointer_location overlay doesn't register CDP touches).
Helpers in tests/emulator/fixtures.js:
- swipe(page, selector, startX, startY, endX, endY, steps) -- single-finger swipe via CDP
- pinch(page, selector, startDist, endDist, steps) -- two-finger pinch via CDP
- sendCommand(page, cmd) -- type into IME input char-by-char
The touch visualizer is injected automatically by swipe() and pinch(). Green dots show current finger positions, small trail dots persist for 2s to show the gesture path.
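For orientation, a single-finger swipe over CDP reduces to one touchStart, a series of touchMove events, and a touchEnd with no touch points. The sketch below is not the fixtures.js implementation: it takes absolute viewport coordinates instead of a selector, omits the visualizer, and `swipeViaCdp` and `interpolate` are hypothetical names. `client` is anything with a CDP-style send(method, params).

```javascript
// Evenly spaced points from start to end, inclusive (steps + 1 points).
function interpolate(startX, startY, endX, endY, steps) {
  const count = Math.max(1, steps);
  const points = [];
  for (let i = 0; i <= count; i++) {
    const t = i / count;
    points.push({
      x: Math.round(startX + (endX - startX) * t),
      y: Math.round(startY + (endY - startY) * t),
    });
  }
  return points;
}

// Hypothetical sketch of a swipe driven through Input.dispatchTouchEvent.
async function swipeViaCdp(client, startX, startY, endX, endY, steps = 10) {
  const [first, ...rest] = interpolate(startX, startY, endX, endY, steps);
  await client.send('Input.dispatchTouchEvent', { type: 'touchStart', touchPoints: [first] });
  for (const point of rest) {
    await client.send('Input.dispatchTouchEvent', { type: 'touchMove', touchPoints: [point] });
  }
  // touchEnd carries an empty touchPoints array: the finger is lifted.
  await client.send('Input.dispatchTouchEvent', { type: 'touchEnd', touchPoints: [] });
}
```

In Playwright, a suitable client can be obtained with `await page.context().newCDPSession(page)`.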
Verify gesture effects through app state, not visual diffs:
const vp = await page.evaluate(() => window.__testTerminal.buffer.active.viewportY);
const msgs = await page.evaluate(() => window.__mockWsSpy.filter(...));
const font = await page.evaluate(() => window.__testTerminal.options.fontSize);
Exploratory Interaction Testing (script-first, then assert)
Before writing test assertions, use the emulator to explore what actually happens when a user performs a gesture. This inverts the typical TDD approach: instead of starting with expected behavior and checking if it matches, start by simulating the physical interaction, capturing what the device shows, and reviewing the visual evidence to discover the domain of possibilities.
The workflow
1. Script the interaction textually. Before writing any test code, describe the interaction as a sequence of physical actions:
- Navigate to Settings panel
- Place two fingers on screen, 200px apart
- Move fingers apart to 400px (pinch out / zoom in)
- Observe: does the bottom bar stay anchored? Does content scale?
- Release fingers
- Observe: does layout return to normal?
2. Translate to emulator script. Use the gesture helpers to replay the interaction on the emulator, with screen recording running but NO assertions:
test('explore: pinch zoom on settings panel', async ({ emulatorPage: page }, testInfo) => {
await page.locator('[data-panel="settings"]').click();
await page.waitForSelector('#panel-settings.active');
await screenshot(page, testInfo, '01-before-pinch');
await pinch(page, '#panel-settings', 100, 300);
await page.waitForTimeout(500);
await screenshot(page, testInfo, '02-during-zoom');
await page.waitForTimeout(1000);
await screenshot(page, testInfo, '03-after-zoom');
});
3. Capture the video baseline. Run the test, extract frames, and review them:
scripts/run-emulator-tests.sh explore.spec.js
scripts/extract-test-frames.sh --test "explore"
4. Analyze the frames. Read the extracted PNGs to understand what actually happened:
- Did the layout break? Where did elements move?
- Did the browser do something unexpected (native zoom, address bar animation)?
- Is there a Chrome-native prompt or overlay appearing?
- What does the user actually see at each step?
5. Form expectations from evidence. Only after seeing what the device does, write assertions:
- If the layout broke, the assertion documents the fix target
- If it worked correctly, the assertion locks in the behavior
- If something unexpected happened, investigate before asserting
6. Iterate against the baseline. After making code changes:
- Re-run the same interaction
- Extract new frames
- Compare visually against the baseline frames
- Repeat until the new frames match the imagined expectation
Why this matters
The AI cannot predict how Chrome on Android will behave with a specific gesture on a specific layout. The emulator is a real device running real Chrome -- it has its own opinions about what pinch zoom does, how visualViewport reports changes, when the address bar hides, etc.
By scripting the interaction first and observing the result, you:
- Discover behaviors you didn't anticipate (Chrome ignoring user-scalable=no, keyboard pushing dialogs off-screen)
- Build assertions from evidence, not assumptions
- Create a visual baseline that makes regressions immediately visible
- Avoid the trap of writing a test that passes but doesn't match what a real user sees
Example: discovering the #139 pinch zoom bug
Without exploratory testing, you'd assume "pinch zoom can't happen because I set user-scalable=no." With it:
- Script: pinch on Settings panel
- Capture: frame shows bottom bar expanded over content
- Discover: Chrome ignores user-scalable=no, visualViewport.resize fires, handler shrinks #app
- Fix: guard with vv.scale === 1
- Re-capture: frame shows layout intact during zoom
- Assert: bottom bar height unchanged after pinch
Test Maturation Phases
Integration tests on real devices go through distinct phases. Tune verbosity and strictness to match the current phase.
Phase 1: Bootstrapping
- Reporter: ['list', { printSteps: true }] for maximum visibility
- actionTimeout: 10_000 for fast failure on wrong selectors
- retries: 0 so every failure is visible and investigated
- screenshot: 'on' for every test (attached to HTML report)
- Assertions should be tight and specific
- Every failure gets root-caused, not retried away
Phase 2: Stabilization (after core workflows are reliable)
- Keep verbose reporter but consider adding retries: 1 for flaky infra
- Start grouping related assertions (e.g. one test for "connect and verify state" instead of separate connect/verify)
- Add trace: 'retain-on-failure' for post-mortem debugging
Phase 3: Maintenance (stable test suite)
- Switch to ['line'] reporter for compact output
- retries: 2 for infrastructure flakiness (emulator hiccups, CDP drops)
- Loosen assertions that break on minor UI changes (check behavior not exact values)
- Guard against false positives: if a test hasn't failed in 20 runs, verify it can still detect regressions
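The per-phase settings above can be collected in one place so a config switches phase with a single argument. This is a sketch, not an existing project file: `settingsForPhase` is a hypothetical name, and each phase includes only the options the lists above name explicitly.

```javascript
// Hypothetical helper: Playwright config fragments for each maturation phase.
function settingsForPhase(phase) {
  switch (phase) {
    case 1: // Bootstrapping: maximum visibility, no retries
      return {
        reporter: [['list', { printSteps: true }]],
        retries: 0,
        use: { actionTimeout: 10_000, screenshot: 'on' },
      };
    case 2: // Stabilization: still verbose, one retry, traces on failure
      return {
        reporter: [['list', { printSteps: true }]],
        retries: 1,
        use: { trace: 'retain-on-failure' },
      };
    case 3: // Maintenance: compact output, tolerate infra flakiness
      return {
        reporter: [['line']],
        retries: 2,
        use: {},
      };
    default:
      throw new RangeError(`Unknown phase: ${phase}`);
  }
}
```

The result could be spread into defineConfig({ ...settingsForPhase(1), workers: 1 }); whether later phases should carry actionTimeout or screenshots forward is a project decision the lists above leave open.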
MobiSSH is in Phase 2 (31 Appium tests passing, line reporter active, worker-scoped sessions stable). Phase 1 guidance is retained for new test areas or other projects using this skill.
Key principle: never skip Phase 1. The debugging cost of silent failures in integration tests is 10x higher than in unit tests.
Templates and Scripts
Everything needed to add Android emulator testing to a new project is in the skill assets. Copy, adapt the marked variables, and run.
Setup (one-time):
- scripts/setup-avd.sh -- Installs Android SDK, creates AVD, tunes config. Adapt AVD_NAME, SYSTEM_IMAGE, DEVICE_PROFILE.
Test infrastructure:
- assets/run-emulator-tests-template.sh -- Full test runner: boots emulator, sets up CDP, starts screen recording, runs Playwright, collects baseline screenshots + video. Adapt APP_PORT, APP_SERVER_CMD, AVD_NAME.
- assets/emulator-config-template.js -- Playwright config for CDP connection. Adapt SERVER_PORT, SERVER_CMD.
- assets/emulator-fixtures-template.js -- Worker-scoped CDP browser, per-test tab with localStorage isolation + reload pattern.
- assets/emulator-test-template.js -- Single test file with screenshot helpers and common patterns (vault setup, form interaction).
Baseline results:
Test artifacts are split between two directories to prevent Playwright's auto-cleanup from destroying recordings and reports:
test-results/emulator/ -- pipeline artifacts (NOT managed by Playwright):
- recording.mp4 -- full screen recording from screenrecord
- report.json -- Playwright JSON reporter output with per-test timing
- frames/ -- extracted video frames at test-critical moments (see below)
- workflow-report.html -- narrative HTML report with failure styling and video seek
test-results/playwright-emulator/ -- Playwright-managed artifacts (cleared each run):
- Per-test subdirectories with screenshots and traces
Review server (tools/review-server/serve.js on port 9090) displays both:
- Dashboard, Emulator, Frames, Recordings tabs with per-test screenshots
- Mark Golden / File Issue archival features
- Per-test issue buttons (files GitHub issue with video timestamps)
- Filter tabs (All/Failed/Passed) with seek-to-video links
Add to .gitignore:
playwright-report/
playwright-report-emulator/
test-results/
!test-results/emulator/
Post-Run Video Frame Extraction
Playwright's failure screenshots only show the DOM state at timeout -- after the problem has already manifested. The screen recording captures everything, but a 3-minute video is useless for debugging without timestamps.
scripts/extract-test-frames.sh bridges this gap: it reads the JSON test report, correlates each test's wall-clock timing with the video timeline, and extracts PNG frames at critical moments via ffmpeg. This gives the AI (or a human) a visual timeline of what was actually on screen when each test ran.
Usage:
scripts/extract-test-frames.sh
scripts/extract-test-frames.sh --failed
scripts/extract-test-frames.sh --test "vault"
Output example:
test-results/emulator/frames/
smoke-page-loads-and-renders-MobiSSH-shell-0-before.png
smoke-page-loads-and-renders-MobiSSH-shell-1-midpoint.png
smoke-page-loads-and-renders-MobiSSH-shell-2-end-passed.png
gestures-vertical-swipe-scrolls-terminal-0-before.png
gestures-vertical-swipe-scrolls-terminal-1a-quarter.png # failed: extra frames
gestures-vertical-swipe-scrolls-terminal-1-midpoint.png
gestures-vertical-swipe-scrolls-terminal-1b-threequarter.png
gestures-vertical-swipe-scrolls-terminal-2-end-failed.png
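The timing correlation behind those filenames is simple arithmetic: subtract the recording's start time from the test's start time to get a video offset, then sample at fractions of the test's duration. The sketch below illustrates that arithmetic only. It is not the script's actual code, the function names are hypothetical, and it assumes the report provides a start time and duration in milliseconds for each test.

```javascript
// Hypothetical sketch: where in the video to grab frames for one test.
function frameOffsets(recordingStartMs, testStartMs, durationMs, failed) {
  const startSec = (testStartMs - recordingStartMs) / 1000;
  const durationSec = durationMs / 1000;
  // Clamp at 0 in case the test started before the recording did.
  const at = (fraction) => Math.max(0, startSec + durationSec * fraction);
  const frames = [
    { label: '0-before', seconds: at(0) },
    { label: '1-midpoint', seconds: at(0.5) },
    { label: failed ? '2-end-failed' : '2-end-passed', seconds: at(1) },
  ];
  if (failed) {
    // Failed tests get extra quarter-point frames.
    frames.push(
      { label: '1a-quarter', seconds: at(0.25) },
      { label: '1b-threequarter', seconds: at(0.75) },
    );
  }
  return frames.sort((a, b) => a.seconds - b.seconds);
}

// ffmpeg arguments that extract a single frame at `seconds`.
function ffmpegArgs(recording, seconds, outFile) {
  return ['-ss', seconds.toFixed(3), '-i', recording, '-frames:v', '1', '-y', outFile];
}
```

Clock skew between the host (which timestamps the report) and the device (which records the video) shifts every offset by the same amount, which is why the uniform review described later is a useful cross-check.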
Why this matters for AI-assisted debugging:
When the AI reviews a test failure, it can read the extracted frames to understand whether:
- The app was showing the expected dialog (host key, vault) or an unexpected interruption (Chrome "Save password?" bar, notification prompt)
- A button click failed because of a real layout bug vs. a Chrome-native overlay
- The test flow reached the right screen before timing out
- The emulator was responsive or frozen
This is the exit route from "guess why it failed from the error message" to "see what the user would have seen."
Uniform Recording Review
scripts/extract-test-frames.sh extracts frames aligned to test boundaries. But sometimes you need to see what happened between tests or across the full recording:
- Diagnosing what Chrome showed during page load or vault setup (before the first test's start timestamp)
- Understanding gaps between tests (teardown, reload, navigation)
- When test-aligned frame timing is off (video/report clock skew)
- Reviewing the full user-visible flow end-to-end
scripts/review-recording.sh samples the recording at uniform intervals and dumps frames to test-results/emulator/review/.
scripts/review-recording.sh
scripts/review-recording.sh --interval 3
scripts/review-recording.sh --recording path
Use this instead of writing one-off ffmpeg -ss commands. Read the output frames with the Read tool to visually inspect the recording.
Narrative Workflow Report
After frame extraction, scripts/generate-workflow-report.py generates a self-contained HTML report that combines:
- Per-test step screenshots (named screenshots from screenshot() helper, embedded as base64)
- Video frames extracted at critical moments (before, midpoint, end, plus quarter-points for failures)
- Error messages for failed tests
- Full video embed with download link
- Clickable images that expand on click for detailed inspection
Usage:
python3 scripts/generate-workflow-report.py
python3 scripts/generate-workflow-report.py --open
scripts/run-emulator-tests.sh
Output: test-results/emulator/workflow-report.html -- a single ~4MB HTML file with all images embedded (no external dependencies, viewable offline).
This is the primary artifact for human review of exploratory workflow tests. Present it to the user after every emulator test run. The report answers: "What did the user see at each step of this workflow?"
Prerequisites: ffmpeg and jq must be installed. The emulator Playwright config must include the JSON reporter (already configured):
reporter: [
['list', { printSteps: true }],
['html', { open: 'never', outputFolder: 'playwright-report-emulator' }],
['json', { outputFile: 'test-results/emulator/report.json' }],
],
Testing Checklists
Vault / Credential Testing
Biometric / WebAuthn Testing
adb emu finger touch 1
PWA Install Testing
Touch / IME Testing
Useful ADB Commands
adb devices
adb shell getprop ro.build.version.release
adb reverse tcp:8081 tcp:8081
adb forward tcp:9222 localabstract:chrome_devtools_remote
curl -sf http://127.0.0.1:9222/json/version
adb emu finger touch 1
adb shell input text 'hello'
adb exec-out screencap -p > /tmp/screenshot.png
adb logcat -s chromium
adb shell am start -a android.intent.action.VIEW -d 'http://localhost:8081'
adb emu kill
Troubleshooting
Emulator won't start / KVM error: Add user to kvm group: sudo usermod -aG kvm $USER. Then sg kvm -c 'emulator -avd MobiSSH_Pixel7' or re-login.
CDP not reachable after Chrome launch: Run adb shell am set-debug-app --persistent com.android.chrome, force-stop Chrome, relaunch it, then re-forward: adb forward tcp:9222 localabstract:chrome_devtools_remote.
Port forwarding not working: adb reverse --list to verify. If empty, re-run adb reverse tcp:8081 tcp:8081.
Tests fail with "Target page, context or browser has been closed": CDP connection is being recreated per test. Use worker-scoped cdpBrowser fixture (see fixtures.js).
WebAuthn not prompting for fingerprint: Chrome 120+ required. Emulator needs fingerprint enrolled in Settings > Security > Fingerprint (use adb emu finger touch 1 during setup flow).
When to Use This vs Playwright
| Scenario | Tool |
|---|---|
| Logic, state, DOM structure | Playwright |
| CSS layout on specific viewports | Playwright |
| Chrome autofill / password manager | Emulator |
| WebAuthn / biometric | Emulator |
| PWA install + standalone mode | Emulator |
| Real touch/swipe/IME | Emulator |
| Service worker update flow | Emulator |
| Visual regression on notched devices | Emulator |
Rule of thumb: if the feature involves browser-native UI that Playwright's headless Chromium doesn't have (password manager, biometric prompt, install banner), use the emulator.