run-evals
Run OpenWork UI evals on a Daytona sandbox or local Electron instance. Handles sandbox creation, service startup, and eval execution via CDP browser tools.
| name | run-evals |
| description | Run OpenWork UI evals on a Daytona sandbox or local Electron instance. Handles sandbox creation, service startup, and eval execution via CDP browser tools. |
Run the OpenWork UI evaluation flows against a real Electron app, either on a Daytona cloud sandbox or a local instance.
- daytona CLI installed and logged in (daytona login)
- Organization selected (daytona organization use "Different AI")
- .devcontainer/ files exist in the repo
Check for an existing sandbox first:
daytona list
If there is no openwork-test sandbox, create one:
daytona create \
--name openwork-test \
--dockerfile .devcontainer/Dockerfile \
--context .devcontainer/Dockerfile \
--context .devcontainer/start-display.sh \
--context .devcontainer/start-services.sh \
--class large \
--memory 8 \
--auto-stop 60 \
--public \
--target us
daytona exec openwork-test 'bash /workspace/.devcontainer/start-services.sh'
Wait for it to start (this runs in the background and may time out; that's OK).
# Get CDP URL
daytona preview-url openwork-test -p 9825
Then use the browser tools to verify:
browser_list({ browser_url: "<CDP_URL>" })
→ should show an "OpenWork" page target
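Because the services start in the background, the "OpenWork" target may not be listed on the first attempt. A minimal retry sketch follows. It assumes a hypothetical async wrapper, `listTargets`, that calls the browser_list tool and resolves to an array of objects with a `title` field; the real tool's return shape may differ, so treat the field name as an assumption.

```javascript
// Poll until a page target whose title contains "OpenWork" appears.
// listTargets is a hypothetical wrapper around the browser_list tool.
async function waitForOpenWorkTarget(listTargets, { attempts = 10, delayMs = 3000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      const targets = await listTargets();
      const match = targets.find((t) => String(t.title || "").includes("OpenWork"));
      if (match) return match;
    } catch (err) {
      // CDP endpoint not reachable yet; fall through and retry.
    }
    // Sleep between attempts, but not after the last one.
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error("OpenWork page target did not appear after " + attempts + " attempts");
}
```

The attempt count and delay are arbitrary defaults; adjust them to how long the sandbox usually takes to boot.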
If the app shows the Welcome page, create a workspace:
Create directory on sandbox:
daytona exec openwork-test 'mkdir -p /workspace/hello'
Follow the workspace creation flow from evals/daytona-flows.md Flow 1:
{ key: "selectedFolder", value: "/workspace/hello" }
Read the eval file from evals/ and execute each step using the browser tools.
For each step:
Make the matching browser_evaluate / browser_click / browser_screenshot call.
Clicking buttons:
browser_evaluate({ browser_url: URL, expression: "(function() { var btns = document.querySelectorAll('button'); for (var i = 0; i < btns.length; i++) { if (btns[i].textContent.indexOf('BUTTON_TEXT') !== -1) { btns[i].click(); return 'clicked'; } } return 'not found'; })()" })
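Substituting BUTTON_TEXT by hand breaks as soon as a label contains a quote or backslash. One possible way around that is to generate the expression string with a small helper. `buildClickExpression` is a hypothetical name and not part of the eval tooling; it only produces the string to pass as `expression`.

```javascript
// Hypothetical helper: returns the expression string for browser_evaluate.
// JSON.stringify turns the label into a safely escaped JavaScript string literal.
function buildClickExpression(buttonText) {
  const needle = JSON.stringify(buttonText);
  return (
    "(function() {" +
    " var btns = document.querySelectorAll('button');" +
    " for (var i = 0; i < btns.length; i++) {" +
    "  if (btns[i].textContent.indexOf(" + needle + ") !== -1) {" +
    "   btns[i].click(); return 'clicked';" +
    "  }" +
    " }" +
    " return 'not found';" +
    "})()"
  );
}
```

The result would be used as `browser_evaluate({ browser_url: URL, expression: buildClickExpression("BUTTON_TEXT") })`. Like the original snippet, it clicks the first button whose text contains the label.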
Typing in Lexical editors:
browser_evaluate({ browser_url: URL, expression: "(function() { var e = document.querySelector('[contenteditable=true]'); e.focus(); document.execCommand('insertText', false, 'YOUR TEXT'); return 'typed'; })()" })
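The snippet above throws if no contenteditable element is on the page yet, and YOUR TEXT has the same quoting problem as button labels. A hedged variant that guards against a missing editor and escapes the text is sketched below; `buildTypeExpression` is a hypothetical helper name, not something the repo provides.

```javascript
// Hypothetical helper: returns the expression string for browser_evaluate.
// Reports 'editor not found' instead of throwing when the editor is absent.
function buildTypeExpression(text) {
  const payload = JSON.stringify(text);
  return (
    "(function() {" +
    " var e = document.querySelector('[contenteditable=true]');" +
    " if (!e) { return 'editor not found'; }" +
    " e.focus();" +
    " document.execCommand('insertText', false, " + payload + ");" +
    " return 'typed';" +
    "})()"
  );
}
```

A return value of 'editor not found' is a signal to take a screenshot or check the page state before retrying the step.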
Injecting folder path (bypass native picker):
Use the __reactFiber$ → CreateWorkspaceModal reducer dispatch with { key: "selectedFolder", value: "/path" }. Full code in evals/daytona-flows.md Flow 1 Step 5.
Checking page state:
browser_evaluate({ browser_url: URL, expression: "document.body.innerText.substring(0, 500)" })
Screenshots:
browser_screenshot({ browser_url: URL })
daytona stop openwork-test # preserves state for re-runs
daytona delete openwork-test # full cleanup