daytona-flow-validator
Daytona UI flow validation loop. Use when validating real app behavior, checking a Daytona flow, proving a bug is fixed, or deciding pass/fail from CDP snapshots, screenshots, and assertions.
| name | daytona-flow-validator |
| description | Daytona UI flow validation loop. Use when validating real app behavior, checking a Daytona flow, proving a bug is fixed, or deciding pass/fail from CDP snapshots, screenshots, and assertions. |
Use this skill to decide whether a Daytona Electron or browser flow actually works. Launching the sandbox is separate. This skill owns the feedback loop.
Never report success from a click, script return value, or recording alone. Validate the same path a human would take, using CDP to drive Chrome or Electron instead of replacing the journey with hidden state changes. Every meaningful action must follow this loop:
1. Observe with browser_snapshot first.
2. Act with browser_click or browser_fill against snapshot UIDs whenever possible.
3. Observe again with browser_snapshot.
If any assertion is missing, the flow is not validated yet.
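The loop above can be expressed as a small check over a recorded step. This is a minimal sketch, not part of the skill itself: the record shape (before, action, after, assertions) and the name isStepValidated are hypothetical, and the snapshots are whatever browser_snapshot returned around the action.

```javascript
// Hypothetical record shape: { before, action, after, assertions: [{ name, passed }] }.
// `before` and `after` hold whatever browser_snapshot returned around the action.
function isStepValidated(record) {
  const missing = [];
  if (record.before == null) missing.push('before snapshot');
  if (!record.action) missing.push('action');
  if (record.after == null) missing.push('after snapshot');
  const assertions = Array.isArray(record.assertions) ? record.assertions : [];
  // A step with no assertion is never validated, even if the action succeeded.
  if (assertions.length === 0) missing.push('assertion');
  const failed = assertions.filter((a) => a.passed !== true).map((a) => a.name);
  return { validated: missing.length === 0 && failed.length === 0, missing, failed };
}
```

A step that only records a click and a return value reports validated: false with 'assertion' in missing, which matches the rule that a click alone is not success.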
Use browser_eval, direct API calls, localStorage writes, filesystem edits, or
database changes only when a human-visible path is impossible, unavailable in the
current product, or needed as setup for data that the UI cannot create yet. When
you use one of these shortcuts, say so in the report and do not let it replace the
visible click-by-click demo claim.
For founder, designer, PR, or eval evidence, record the journey as a reviewer would experience it:
For a UI flow, collect all of these when feasible:
- browser_list shows the intended target.
- navigator.userAgent contains Electron/ for desktop flows, or does not for standalone Chrome flows.
- A daytona exec process/log/health check for sidecars, Den, worker proxy, or mock servers.
- A frame-by-frame HTML proof index; see daytona-recording-artifacts for how to produce the index.
Frame proof is the default. Video is the exception for interactions that need motion. When the user says "test this on Daytona" and UI is involved, always produce frame-by-frame HTML proof unless the user explicitly asks for video.
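The navigator.userAgent check above is easy to get backwards, so a tiny helper may be useful. This is a sketch under assumptions: checkRuntime and the 'electron' / 'chrome' labels are hypothetical names, and the user agent string is whatever evaluating navigator.userAgent in the target returned.

```javascript
// `userAgent` is the string returned by evaluating navigator.userAgent in the target.
// `expected` is 'electron' for desktop flows or 'chrome' for standalone Chrome flows
// (both labels are illustrative, not part of any tool API).
function checkRuntime(userAgent, expected) {
  const isElectron = userAgent.includes('Electron/');
  const pass = expected === 'electron' ? isElectron : !isElectron;
  return { expected, isElectron, pass };
}
```

Recording the returned object in the report keeps the runtime claim tied to an observed value rather than to which sandbox was launched.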
Use this structure for each step in the final report:
Step: <what was attempted>
Before: <snapshot/eval showed X>
Action: <tool/selector used>
After: <snapshot/eval showed Y>
Assertion: pass/fail because <observable signal>
Evidence: <screenshot path or artifact URL if captured>
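The report structure above can be produced mechanically from a recorded step. The following is a minimal sketch; formatStepReport and the field names on the step object are hypothetical and simply mirror the template lines.

```javascript
// Field names mirror the report template; the function and object shape are illustrative.
function formatStepReport(step) {
  const verdict = step.passed ? 'pass' : 'fail';
  const lines = [
    `Step: ${step.name}`,
    `Before: ${step.before}`,
    `Action: ${step.action}`,
    `After: ${step.after}`,
    `Assertion: ${verdict} because ${step.signal}`,
  ];
  // Evidence is optional: include it only when something was actually captured.
  if (step.evidence) lines.push(`Evidence: ${step.evidence}`);
  return lines.join('\n');
}
```

Leaving the Evidence line out when nothing was captured avoids implying proof that does not exist.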
Start with browser_snapshot for normal UI controls because it gives stable
UIDs for browser_click and browser_fill. Treat browser_eval as an escape
hatch, not the default interaction mechanism. Use browser_eval when:
Even when browser_eval is necessary, keep the user-visible state coherent:
observe before, perform the minimal hidden action, then observe the visible result
with browser_snapshot or a screenshot.
Prefer synthetic paste for the OpenWork composer:
(function pasteComposer(text) {
  // Locate the Lexical contenteditable editor that backs the composer.
  var editor = document.querySelector('[contenteditable="true"][data-lexical-editor="true"]');
  if (!editor) return { ok: false, reason: 'composer not found' };
  editor.focus();
  // Build a synthetic clipboard payload and dispatch it as a paste event.
  var data = new DataTransfer();
  data.setData('text/plain', text);
  editor.dispatchEvent(new ClipboardEvent('paste', { bubbles: true, cancelable: true, clipboardData: data }));
  // Return the editor text so the caller can assert that the paste landed.
  return { ok: true, text: editor.innerText };
})('Reply with exactly: Daytona validation OK')
Then assert the Run button is enabled before clicking it.
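One way to assert the Run button state is a read-only check passed to browser_eval. This is a sketch only: it assumes the Run control is a button element whose visible text is exactly "Run", which the source does not confirm, so adjust the lookup to the real markup. The function takes the document as a parameter; inside browser_eval it would be called as runButtonState(document).

```javascript
// Read-only check: it inspects the DOM and never clicks or mutates anything.
// Hypothetical lookup: assumes a <button> whose trimmed text is "Run".
function runButtonState(doc) {
  const buttons = Array.from(doc.querySelectorAll('button'));
  const run = buttons.find((b) => (b.textContent || '').trim() === 'Run');
  if (!run) return { ok: false, reason: 'Run button not found' };
  const disabled = run.disabled === true || run.getAttribute('aria-disabled') === 'true';
  return { ok: !disabled, disabled };
}
```

Because this only reads state, it complements rather than replaces the browser_snapshot taken before the click.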
Use CDP for renderer UI first. When the flow opens native Linux UI that CDP cannot control, such as GTK file pickers, OS permission dialogs, XFCE windows, or Electron-native dialogs, switch to desktop automation inside the sandbox.
Check/install the tools:
daytona exec "$SANDBOX" -- "bash -lc 'if ! command -v xdotool >/dev/null 2>&1; then sudo apt-get update && sudo apt-get install -y xdotool wmctrl; fi'"
Inspect the real desktop window state before acting:
daytona exec "$SANDBOX" -- "bash -lc 'DISPLAY=:99 wmctrl -l; DISPLAY=:99 xdotool getactivewindow getwindowname 2>/dev/null || true'"
Native file picker pattern:
daytona exec "$SANDBOX" -- "bash -lc 'DISPLAY=:99 xdotool search --name \"Authorize folder\" windowactivate; DISPLAY=:99 xdotool mousemove 760 151 click 1 key ctrl+a type --delay 1 -- \"/workspace/hello\" key Return; sleep 1; DISPLAY=:99 xdotool mousemove 1465 927 click 1'"
Rules for native desktop automation:
- Run wmctrl -l before and after to prove the expected native window opened or closed.
- Prefer window names such as Authorize folder over coordinates when possible.
- Dismiss leftover native dialogs with Escape before capturing evidence.
Close stale native dialogs before recording or screenshots:
daytona exec "$SANDBOX" -- 'bash -lc '\''DISPLAY=:99 xdotool search --name "Authorize folder" windowclose %@ 2>/dev/null || true; sleep 1; DISPLAY=:99 wmctrl -l'\'''
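The before/after wmctrl -l rule can be turned into an explicit assertion by diffing the two listings. The sketch below assumes the standard wmctrl -l column layout (window id, desktop number, client host, title); parseWindowTitles and diffWindows are hypothetical helper names, and the inputs are the raw command output captured through daytona exec.

```javascript
// Parses `wmctrl -l` output: window id, desktop number, client host, then the title.
// Splitting on whitespace collapses repeated spaces inside titles, which is fine
// for matching names such as "Authorize folder".
function parseWindowTitles(output) {
  return output
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => line.split(/\s+/).slice(3).join(' '));
}

// Compares listings captured before and after an action.
function diffWindows(beforeOutput, afterOutput) {
  const before = parseWindowTitles(beforeOutput);
  const after = parseWindowTitles(afterOutput);
  return {
    opened: after.filter((t) => !before.includes(t)),
    closed: before.filter((t) => !after.includes(t)),
  };
}
```

An opened list containing Authorize folder proves the picker appeared; the same title in closed after the close command proves it is gone before a screenshot is taken.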
Use browser screenshots for renderer state:
browser_screenshot({ browser_url: CDP_URL, target_id: TARGET_ID })
Use Daytona display screenshots for noVNC/window state:
daytona exec "$SANDBOX" -- 'bash .devcontainer/capture-daytona-screenshot.sh'
Do not treat screenshots as the only assertion. Inspect text/state with CDP too.
Before publishing, commenting, or reporting a screenshot URL, open the saved
image and visually verify it matches the claim. Use webfetch on the artifact
URL, Read on the local PNG path, or another image-capable viewer. A screenshot
is not valid evidence until the image itself has been inspected.
For every screenshot, assert these visual checks:
If any check fails, mark the evidence as failed, fix the visible state, capture a new screenshot, inspect the new image, and only then share it. If bad evidence was already posted, post a superseding correction that clearly says the earlier screenshot was invalid.
Before every Daytona display screenshot, run a native-window check and fail fast if a picker is present:
daytona exec "$SANDBOX" -- 'bash -lc '\''DISPLAY=:99 wmctrl -l | tee /tmp/windows-before-shot.txt; ! grep -q "Authorize folder" /tmp/windows-before-shot.txt && DISPLAY=:99 .devcontainer/capture-daytona-screenshot.sh --output /daytona-artifacts/screenshots/<flow>/<step>.png'\'''
If a Chromium or Electron window is intentionally part of the shot, activate the
right window first with wmctrl -a "OpenWork - Dev" or
wmctrl -a "OpenWork Cloud - Chromium" so the screenshot shows the intended
journey step.
When a step fails:
- Capture the visible state with browser_snapshot or document.body.innerText.
Common logs:
daytona exec "$SANDBOX" -- 'tail -120 /tmp/electron.log'
daytona exec "$SANDBOX" -- 'tail -120 /tmp/vite.log'
daytona exec "$SERVER_SANDBOX" -- 'tail -120 /tmp/den-api.log'
For Den Web flows specifically:
- If the page stays on Checking account, Loading your workspace, or Checking workspace access, verify whether the client hydrated by trying a real browser_click/browser_fill, not only browser_eval.
- Try next build + next start before declaring the app broken.
Use one of these verdicts:
- Passed: every expected outcome has an observable assertion and frame-by-frame proof is published.
- Failed: at least one assertion disproves the expected outcome.
- Incomplete: the sandbox/tooling failed, evidence is missing, or only a recording/screenshot was collected without frame proof.
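The three verdicts can be derived from a run summary rather than chosen by feel. This is a minimal sketch: decideVerdict and the summary fields (toolingFailed, frameProofPublished, assertions) are hypothetical, and ranking a disproved assertion above missing evidence is an assumption, since the definitions above do not state a precedence.

```javascript
// Hypothetical input: { toolingFailed, frameProofPublished, assertions: [{ name, passed }] }.
function decideVerdict(run) {
  const assertions = run.assertions || [];
  // Assumption: a disproved assertion outranks missing evidence or tooling trouble.
  if (assertions.some((a) => a.passed === false)) return 'Failed';
  // Every expected outcome needs an observed, passing assertion.
  const allObserved = assertions.length > 0 && assertions.every((a) => a.passed === true);
  if (run.toolingFailed || !allObserved || !run.frameProofPublished) return 'Incomplete';
  return 'Passed';
}
```

With this ordering, a run whose assertions all pass but whose frame proof was never published still reports Incomplete, which matches the rule that a recording or screenshot alone is not enough.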