run-evals

Name: Run Evals
Author: jasonkneen

// Run OpenWork UI evals on a Daytona sandbox or local Electron instance. Handles sandbox creation, service startup, and eval execution via CDP browser tools.

updated:May 13, 2026 at 22:26

SKILL.md

name	run-evals
description	Run OpenWork UI evals on a Daytona sandbox or local Electron instance. Handles sandbox creation, service startup, and eval execution via CDP browser tools.

Skill: Run Evals

Run the OpenWork UI evaluation flows against a real Electron app — either on a Daytona cloud sandbox or a local instance.

When to use

- User says "run evals on Daytona" or "run this flow on Daytona"
- User wants to verify a UI change end-to-end
- User wants to test the onboarding, session, or settings flows

Prerequisites

- daytona CLI installed and logged in (daytona login)
- Using the "Different AI" org (daytona organization use "Different AI")
- The .devcontainer/ files exist in the repo

Workflow

Step 1: Create sandbox (if not running)

Check for existing sandbox first:

daytona list

If no openwork-test sandbox exists, create one:

daytona create \
  --name openwork-test \
  --dockerfile .devcontainer/Dockerfile \
  --context .devcontainer/Dockerfile \
  --context .devcontainer/start-display.sh \
  --context .devcontainer/start-services.sh \
  --class large \
  --memory 8 \
  --auto-stop 60 \
  --public \
  --target us

Step 2: Start services

daytona exec openwork-test 'bash /workspace/.devcontainer/start-services.sh'

Wait for the services to start. The script runs in the background, so the exec call may time out; that is OK.

Step 3: Verify

# Get CDP URL
daytona preview-url openwork-test -p 9825

Then use the browser tools to verify:

browser_list({ browser_url: "<CDP_URL>" })
→ should show "OpenWork" page target
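
The check above could be automated along these lines. This is a minimal sketch that assumes browser_list returns an array shaped like the entries of CDP's /json/list endpoint (objects with `type` and `title`); the tool's real return shape is not stated here and may differ. `findOpenWorkPage` is a hypothetical helper name.

```javascript
// Assumption: `targets` is an array of objects with at least `type` and `title`,
// as in CDP's /json/list output. The actual browser_list result may differ.
function findOpenWorkPage(targets) {
  const match = targets.find(
    (t) =>
      t.type === "page" &&
      typeof t.title === "string" &&
      t.title.includes("OpenWork")
  );
  // Return null rather than undefined so callers can test for a missing target explicitly.
  return match || null;
}
```

A null result would suggest the services from Step 2 have not finished starting yet.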

Step 4: Create a workspace (if on welcome page)

If the app shows the Welcome page, create a workspace:

Create directory on sandbox:

daytona exec openwork-test 'mkdir -p /workspace/hello'

Follow the workspace creation flow from evals/daytona-flows.md Flow 1:
- Click "Get started" → "Local workspace"
- Inject path via React fiber dispatch: { key: "selectedFolder", value: "/workspace/hello" }
- Click "Create Workspace"
- Wait 10s for opencode sidecar to boot
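
The fixed 10-second wait in the last item could instead be written as a bounded poll, so the flow continues as soon as the sidecar is ready. The sketch below is one way to do that; `waitFor` and its options are hypothetical, and the `check` function (for example, one that calls browser_evaluate and looks for text that only appears once the workspace is open) is something you would supply.

```javascript
// Hypothetical helper: calls `check` repeatedly until it resolves to a truthy
// value or `timeoutMs` has elapsed. Resolves to true on success, false on timeout.
// `sleep` can be overridden (for example in tests) to avoid real delays.
async function waitFor(check, { timeoutMs = 10000, intervalMs = 500, sleep } = {}) {
  const pause = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    await pause(intervalMs);
  }
}
```

The default timeout mirrors the 10 seconds given above; the 500 ms interval is an arbitrary choice.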

Step 5: Run the requested eval

Read the eval file from evals/ and execute each step using the browser tools.

For each step:

- Execute the browser_evaluate / browser_click / browser_screenshot call
- Verify the expected outcome
- Report pass/fail
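
The execute, verify, report loop above might be sketched as follows. The step shape `{ name, run, verify }` and the function names are assumptions made for illustration; the eval files in evals/ may describe steps differently.

```javascript
// Hypothetical step shape: { name, run, verify }.
// `run` performs the browser tool call; `verify` inspects its result and
// returns a truthy value when the expected outcome holds.
async function runSteps(steps) {
  const results = [];
  for (const step of steps) {
    try {
      const output = await step.run();
      const passed = Boolean(await step.verify(output));
      results.push({ name: step.name, passed });
    } catch (err) {
      // A thrown error counts as a failed step; later steps still run.
      results.push({ name: step.name, passed: false, error: String(err) });
    }
  }
  return results;
}

// Renders one "PASS name" or "FAIL name" line per step.
function formatReport(results) {
  return results.map((r) => `${r.passed ? "PASS" : "FAIL"} ${r.name}`).join("\n");
}
```

This sketch keeps going after a failure so the report covers every step; stopping at the first failure would be an equally reasonable choice when later steps depend on earlier ones.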

Key techniques

Clicking buttons:

browser_evaluate({ browser_url: URL, expression: "(function() { var btns = document.querySelectorAll('button'); for (var i = 0; i < btns.length; i++) { if (btns[i].textContent.indexOf('BUTTON_TEXT') !== -1) { btns[i].click(); return 'clicked'; } } return 'not found'; })()" })
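
Substituting BUTTON_TEXT by hand can break the snippet if the label contains quotes. A small builder, sketched below, produces the same expression with the label safely escaped; `clickButtonExpression` is a hypothetical helper, not part of the browser tools.

```javascript
// Builds the expression string to pass to browser_evaluate.
// JSON.stringify emits a valid JavaScript string literal, so quotes or
// backslashes in the label cannot terminate the string early.
function clickButtonExpression(label) {
  return (
    "(function() {" +
    " var wanted = " + JSON.stringify(label) + ";" +
    " var btns = document.querySelectorAll('button');" +
    " for (var i = 0; i < btns.length; i++) {" +
    " if (btns[i].textContent.indexOf(wanted) !== -1) { btns[i].click(); return 'clicked'; }" +
    " }" +
    " return 'not found';" +
    " })()"
  );
}
```

The result would be used as `browser_evaluate({ browser_url: URL, expression: clickButtonExpression("Create Workspace") })`.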

Typing in Lexical editors:

browser_evaluate({ browser_url: URL, expression: "(function() { var e = document.querySelector('[contenteditable=true]'); if (!e) return 'not found'; e.focus(); document.execCommand('insertText', false, 'YOUR TEXT'); return 'typed'; })()" })

Injecting folder path (bypass native picker): Use the __reactFiber$ → CreateWorkspaceModal reducer dispatch with { key: "selectedFolder", value: "/path" }. Full code in evals/daytona-flows.md Flow 1 Step 5.
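
For orientation, the first part of that technique (locating the fiber from a DOM node) generally looks like the sketch below. This is not the code from evals/daytona-flows.md. It relies on React internals: React stores the fiber on DOM nodes under a property whose name starts with `__reactFiber$` plus a random suffix, and each fiber links to its parent through `return`. Matching by `type.name` is an assumption that fails when component names are minified. Both helper names are hypothetical.

```javascript
// Returns the fiber React attached to a DOM node, or null if none is found.
function getReactFiber(node) {
  const key = Object.keys(node).find((k) => k.startsWith("__reactFiber$"));
  return key ? node[key] : null;
}

// Walks up the parent chain looking for a function component with the given name.
// Assumption: component names survive the build (they may be minified in production).
function findAncestorFiber(fiber, componentName) {
  for (let f = fiber; f; f = f.return) {
    if (typeof f.type === "function" && f.type.name === componentName) return f;
  }
  return null;
}
```

Inside a browser_evaluate expression, the starting node would typically be an element rendered within the modal, such as one of its buttons.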

Checking page state:

browser_evaluate({ browser_url: URL, expression: "document.body.innerText.substring(0, 500)" })

Screenshots:

browser_screenshot({ browser_url: URL })

Teardown

daytona stop openwork-test    # preserves state for re-runs
daytona delete openwork-test  # full cleanup

package.json

"author": "jasonkneen"

"repository": "jasonkneen/openwork"

