run-evals
Run OpenWork UI evals on a Daytona sandbox or local Electron instance. Handles sandbox creation, service startup, and eval execution via CDP browser tools.
| Field | Value |
|---|---|
| name | run-evals |
| description | Run OpenWork UI evals on a Daytona sandbox or local Electron instance. Handles sandbox creation, service startup, and eval execution via CDP browser tools. |
Run the OpenWork UI evaluation flows against a real Electron app. Prefer a fresh Daytona sandbox for each run, with a local test fallback when Daytona is unavailable.
Prerequisites:
- daytona CLI installed and logged in (daytona login)
- The "Different AI" organization selected (daytona organization use "Different AI")
- .devcontainer/ files exist in the repo
- openwork-eval-secrets volume populated once with bash .devcontainer/setup-daytona-secrets-volume.sh .newtoken

Use the repo helper unless you need to debug a specific Daytona step manually:
daytona organization use "Different AI"
bash .devcontainer/test-on-daytona.sh <branch-or-commit>
The helper:
- creates a fresh VNC-capable Daytona sandbox from the reusable openwork-eval-vnc snapshot when present, falling back to the VNC Dockerfile when needed;
- mounts the reusable openwork-eval-secrets:/daytona-secrets volume and the reusable openwork-eval-pnpm-store pnpm cache volume;
- starts XFCE/noVNC, Vite, and Electron with Daytona-safe graphics flags;
- waits for CDP, then prints the CDP and noVNC URLs.
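Before the first Daytona run it can help to confirm the helper scripts are actually in the checkout. The sketch below is a hypothetical Node.js preflight check, not part of the repo: the script list is taken from the commands in this guide, `missingDevcontainerFiles` is an illustrative name, and it does not verify the daytona CLI login or organization.

```javascript
// Hypothetical preflight check: lists the .devcontainer helper scripts named
// in this guide that are missing from the repo, so you can switch to the
// local fallback early. Assumes it runs from the repo root.
const fs = require("fs");
const path = require("path");

const REQUIRED_SCRIPTS = [
  ".devcontainer/test-on-daytona.sh",
  ".devcontainer/setup-daytona-secrets-volume.sh",
  ".devcontainer/create-daytona-openwork-snapshot.sh",
];

// `exists` is injectable so the check can be exercised without touching disk.
function missingDevcontainerFiles(repoRoot, exists = fs.existsSync) {
  return REQUIRED_SCRIPTS.filter((rel) => !exists(path.join(repoRoot, rel)));
}

const missing = missingDevcontainerFiles(process.cwd());
if (missing.length > 0) {
  console.log("Missing Daytona helpers, use the local fallback:", missing.join(", "));
} else {
  console.log("All Daytona helper scripts are present.");
}
```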
Refresh the snapshot when dependencies or base setup change:
bash .devcontainer/create-daytona-openwork-snapshot.sh
The snapshot intentionally excludes node_modules to stay below Daytona's 20 GB
snapshot limit. Dependency installs reuse the pnpm store volume.
For OpenAI/provider eval coverage, create and populate the openwork-eval-secrets volume once before the first run:
bash .devcontainer/setup-daytona-secrets-volume.sh .newtoken
Do not print the key. Future eval sandboxes reuse the same volume.
Use the Electron CDP URL printed by test-on-daytona.sh with the browser tools:
browser_list({ browser_url: "<CDP_URL>" })
→ should list an "OpenWork" page target
If browser_list fails, inspect /tmp/electron.log. The real CDP success marker is Chromium's "DevTools listening on ws://127.0.0.1:9825/..." line; OpenWork's own "Electron CDP exposed" line alone does not prove CDP is reachable.
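To tell those two log lines apart mechanically, a small parser can look only for Chromium's marker. This is a sketch assuming the log contains Chromium's standard "DevTools listening on ws://..." line; `findDevToolsUrl` is a hypothetical helper, and the log would first need to be copied off the sandbox.

```javascript
// Sketch: return the WebSocket debugger URL from Electron log text, or null.
// Chromium prints "DevTools listening on ws://..." once remote debugging is
// actually serving; an app-level "CDP exposed" message is not proof of that.
// If Electron restarted, the log may hold several markers, so keep the last.
function findDevToolsUrl(logText) {
  const matches = [...logText.matchAll(/DevTools listening on (ws:\/\/\S+)/g)];
  return matches.length > 0 ? matches[matches.length - 1][1] : null;
}

// Hypothetical usage on a local copy of /tmp/electron.log:
// const text = require("fs").readFileSync("electron.log", "utf8");
// console.log(findDevToolsUrl(text) ?? "CDP not listening yet");
```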
If the app shows the Welcome page, create a workspace:
1. Create the directory on the sandbox:
   daytona exec "$SANDBOX" 'mkdir -p /workspace/hello'
2. Follow the workspace creation flow from evals/daytona-flows.md Flow 1, selecting the folder with:
   { key: "selectedFolder", value: "/workspace/hello" }

Read the eval file from evals/ and execute each step using the browser tools.
For each step, make the matching browser_evaluate, browser_click, or browser_screenshot call.
Clicking buttons:
browser_evaluate({ browser_url: URL, expression: "(function() { var btns = document.querySelectorAll('button'); for (var i = 0; i < btns.length; i++) { if (btns[i].textContent.indexOf('BUTTON_TEXT') !== -1) { btns[i].click(); return 'clicked'; } } return 'not found'; })()" })
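Hand-editing BUTTON_TEXT inside that one-liner breaks as soon as a label contains a quote. A hypothetical helper can generate the same expression with the label escaped by JSON.stringify; the click logic is unchanged from the one-liner above, and `clickButtonExpression` is an illustrative name, not an existing tool.

```javascript
// Sketch: build the browser_evaluate expression that clicks the first button
// whose text contains `buttonText`. JSON.stringify emits a quoted, escaped
// string literal, so labels with quotes or backslashes cannot break the code.
function clickButtonExpression(buttonText) {
  const label = JSON.stringify(String(buttonText));
  return (
    "(function() {" +
    " var btns = document.querySelectorAll('button');" +
    " for (var i = 0; i < btns.length; i++) {" +
    "  if (btns[i].textContent.indexOf(" + label + ") !== -1) {" +
    "   btns[i].click(); return 'clicked';" +
    "  }" +
    " }" +
    " return 'not found';" +
    "})()"
  );
}

// Pass the result as `expression` in a browser_evaluate call.
console.log(clickButtonExpression("Create workspace"));
```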
Typing in Lexical editors:
browser_evaluate({ browser_url: URL, expression: "(function() { var e = document.querySelector('[contenteditable=true]'); e.focus(); document.execCommand('insertText', false, 'YOUR TEXT'); return 'typed'; })()" })
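The typing one-liner has the same quoting problem and throws a TypeError if no editor is mounted yet. The sketch below is a hypothetical variant that escapes the text and returns 'no editor' in that case; like the original, it assumes the first contenteditable element is the Lexical composer and relies on document.execCommand('insertText'), which is deprecated but still implemented in Chromium.

```javascript
// Sketch: build the browser_evaluate expression that types `text` into the
// first contenteditable element. JSON.stringify escapes the text; the null
// check reports a missing editor instead of throwing.
function typeTextExpression(text) {
  const literal = JSON.stringify(String(text));
  return (
    "(function() {" +
    " var e = document.querySelector('[contenteditable=true]');" +
    " if (!e) return 'no editor';" +
    " e.focus();" +
    " document.execCommand('insertText', false, " + literal + ");" +
    " return 'typed';" +
    "})()"
  );
}

console.log(typeTextExpression('Say "hello" from the eval'));
```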
Injecting the folder path (bypassing the native picker):
Read the element's __reactFiber$ property, follow it to the CreateWorkspaceModal component, and dispatch { key: "selectedFolder", value: "/path" } to its reducer. Full code is in evals/daytona-flows.md Flow 1 Step 5.
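The fiber lookup half of that trick can be sketched on its own. This is a hypothetical illustration, not the code from evals/daytona-flows.md: it relies on React DOM's internal "__reactFiber$<random>" expando key and on component names surviving the build (true for a Vite dev build, not guaranteed for minified output), so treat it as fragile. The reducer dispatch itself is left to Flow 1 Step 5.

```javascript
// Hypothetical sketch: starting from a DOM node inside the modal, find its
// React fiber and walk up `.return` links to the nearest component whose
// function name (or displayName) matches. Uses React internals; may break
// across React versions.
function findComponentFiber(domNode, componentName) {
  const key = Object.keys(domNode).find((k) => k.startsWith("__reactFiber$"));
  if (!key) return null;
  for (let fiber = domNode[key]; fiber; fiber = fiber.return) {
    const type = fiber.type;
    // Host fibers have string types like "div", so name lookups yield undefined.
    const name = type && (type.displayName || type.name);
    if (name === componentName) return fiber;
  }
  return null;
}
```

Inside browser_evaluate, this function would be inlined into the expression string and called on an element from the open modal.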
Checking page state:
browser_evaluate({ browser_url: URL, expression: "document.body.innerText.substring(0, 500)" })
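Because UI updates land asynchronously after a click, checking page state once can race the render. A polling loop around the expression above is one option; in this sketch `evaluate` is a hypothetical async wrapper you supply around a browser_evaluate call, and `waitForText` is an illustrative name.

```javascript
// Sketch: poll the page text until it contains `needle` or the timeout ends.
// `evaluate` receives the expression string and resolves to the page text.
const PAGE_TEXT_EXPRESSION = "document.body.innerText.substring(0, 500)";

async function waitForText(evaluate, needle, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const text = await evaluate(PAGE_TEXT_EXPRESSION);
    if (typeof text === "string" && text.includes(needle)) return true;
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Since the expression keeps only the first 500 characters, a needle further down the page will never match; widen the substring if needed.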
Screenshots:
browser_screenshot({ browser_url: URL })
Always include a local fallback in the result. Use it when Daytona is down, quota-limited, or the sandbox cannot expose CDP. At minimum, run the closest local verification commands and report that the Daytona path was unavailable.
pnpm install
pnpm --filter @openwork/app typecheck
pnpm --filter @openwork/app build
For UI flow verification, start the local app and attach browser tools to the local Electron CDP endpoint, then run the same eval steps from evals/.
pnpm dev
Report clearly whether the result came from Daytona or the local fallback.
Delete the sandbox when you are done:
daytona delete "$SANDBOX"
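If the evals are driven from a script, a try/finally wrapper keeps a failing step from leaking the sandbox. This is a sketch under assumptions: `runEvals` stands in for your own driver code, `withSandboxCleanup` is a hypothetical name, and the only CLI call is the daytona delete command shown above, run without a shell so the id needs no quoting.

```javascript
// Sketch: always run `daytona delete <id>` after the evals, pass or fail.
const { execFileSync } = require("child_process");

function deleteSandbox(sandboxId) {
  execFileSync("daytona", ["delete", sandboxId], { stdio: "inherit" });
}

// `remove` is injectable so the cleanup path can be tested without Daytona.
async function withSandboxCleanup(sandboxId, runEvals, remove = deleteSandbox) {
  try {
    return await runEvals(sandboxId);
  } finally {
    remove(sandboxId);
  }
}
```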