| name | evals-context |
| description | Provides context about the Roo Code evals system structure in this monorepo. Use when tasks mention "evals", "evaluation", "eval runs", "eval exercises", or working with the evals infrastructure. Helps distinguish between the evals execution system (packages/evals, apps/web-evals) and the public website evals display page (apps/web-roo-code/src/app/evals). |
Evals Codebase Context
When to Use This Skill
Use this skill when the task involves:
- Modifying or debugging the evals execution infrastructure
- Adding new eval exercises or languages
- Working with the evals web interface (apps/web-evals)
- Modifying the public evals display page on roocode.com
- Understanding where evals code lives in this monorepo
When NOT to Use This Skill
Do NOT use this skill when:
- Working on unrelated parts of the codebase (extension, webview-ui, etc.)
- The task is purely about the VS Code extension's core functionality
- Working on the main website pages that don't involve evals
Key Disambiguation: Two "Evals" Locations
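This monorepo has two distinct evals-related areas that are easy to confuse: the evals execution system (packages/evals/ and apps/web-evals/) and the public website evals page (apps/web-roo-code/src/app/evals/). The exercises themselves live in a separate repository: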
| Component | Path | Purpose |
|---|---|---|
| Evals Execution System | packages/evals/ | Core eval infrastructure: CLI, DB schema, Docker configs |
| Evals Management UI | apps/web-evals/ | Next.js app for creating/monitoring eval runs (localhost:3446) |
| Website Evals Page | apps/web-roo-code/src/app/evals/ | Public roocode.com page displaying eval results |
| External Exercises Repo | Roo-Code-Evals | Actual coding exercises (NOT in this monorepo) |
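As a rough illustration of the table above, the path prefixes can be turned into a small lookup that says which evals area a file belongs to. This is only a sketch: `classifyEvalsPath`, `EvalsArea`, and the area labels are hypothetical names invented for this example and do not exist in the monorepo.

```typescript
// Hypothetical helper (not part of the monorepo): maps a repo-relative path
// to the evals area it belongs to, following the table above.
type EvalsArea = "execution-system" | "management-ui" | "website-page" | "not-evals";

const AREA_PREFIXES: Array<[string, EvalsArea]> = [
  ["packages/evals/", "execution-system"],
  ["apps/web-evals/", "management-ui"],
  ["apps/web-roo-code/src/app/evals/", "website-page"],
];

function classifyEvalsPath(filePath: string): EvalsArea {
  // Normalize Windows separators and a leading "./" so prefixes compare cleanly.
  const normalized = filePath.replace(/\\/g, "/").replace(/^\.\//, "");
  for (const [prefix, area] of AREA_PREFIXES) {
    if (normalized.startsWith(prefix)) return area;
  }
  return "not-evals";
}

console.log(classifyEvalsPath("packages/evals/src/cli/runTask.ts")); // execution-system
console.log(classifyEvalsPath("apps/web-roo-code/src/app/evals/plot.tsx")); // website-page
console.log(classifyEvalsPath("apps/web-roo-code/src/app/page.tsx")); // not-evals
```

Note that other pages under apps/web-roo-code/ fall outside all three prefixes, which matches the "When NOT to Use This Skill" guidance for main website pages that don't involve evals.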
Directory Structure Reference
packages/evals/ - Core Evals Package
packages/evals/
├── ARCHITECTURE.md          # Detailed architecture documentation
├── ADDING-EVALS.md          # Guide for adding new exercises/languages
├── README.md                # Setup and running instructions
├── docker-compose.yml       # Container orchestration
├── Dockerfile.runner        # Runner container definition
├── Dockerfile.web           # Web app container
├── drizzle.config.ts        # Database ORM config
├── src/
│   ├── index.ts             # Package exports
│   ├── cli/                 # CLI commands for running evals
│   │   ├── runEvals.ts      # Orchestrates complete eval runs
│   │   ├── runTask.ts       # Executes individual tasks in containers
│   │   ├── runUnitTest.ts   # Validates task completion via tests
│   │   └── redis.ts         # Redis pub/sub integration
│   ├── db/
│   │   ├── schema.ts        # Database schema (runs, tasks)
│   │   ├── queries/         # Database query functions
│   │   └── migrations/      # SQL migrations
│   └── exercises/
│       └── index.ts         # Exercise loading utilities
└── scripts/
    └── setup.sh             # Local macOS setup script
apps/web-evals/ - Evals Management Web App
apps/web-evals/
├── src/
│   ├── app/
│   │   ├── page.tsx         # Home page (runs list)
│   │   ├── runs/
│   │   │   ├── new/         # Create new eval run
│   │   │   └── [id]/        # View specific run status
│   │   └── api/runs/        # SSE streaming endpoint
│   ├── actions/             # Server actions
│   │   ├── runs.ts          # Run CRUD operations
│   │   ├── tasks.ts         # Task queries
│   │   ├── exercises.ts     # Exercise listing
│   │   └── heartbeat.ts     # Controller health checks
│   ├── hooks/               # React hooks (SSE, models, etc.)
│   └── lib/                 # Utilities and schemas
apps/web-roo-code/src/app/evals/ - Public Website Evals Page
apps/web-roo-code/src/app/evals/
├── page.tsx                 # Fetches and displays public eval results
├── evals.tsx                # Main evals display component
├── plot.tsx                 # Visualization component
└── types.ts                 # EvalRun type (extends packages/evals types)
This page displays eval results on the public roocode.com website. It imports types from @roo-code/evals but does NOT run evals.
Architecture Overview
The evals system is a distributed evaluation platform that runs AI coding tasks in isolated VS Code environments:
┌─────────────────────────────────────────────────────────────┐
│  Web App (apps/web-evals) ───────────────────────────────   │
│    │                                                        │
│    ▼                                                        │
│  PostgreSQL ◄────► Controller Container                     │
│    │                        │                               │
│    ▼                        ▼                               │
│  Redis ◄───► Runner Containers (1-25 parallel)              │
└─────────────────────────────────────────────────────────────┘
Key components:
- Controller: Orchestrates eval runs, spawns runners, manages task queue (p-queue)
- Runner: Isolated Docker container with VS Code + Roo Code extension + language runtimes
- Redis: Pub/sub for real-time events (NOT task queuing)
- PostgreSQL: Stores runs, tasks, metrics
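The split between task scheduling and event publishing described above can be sketched in a few lines. This is a minimal in-process illustration, not the controller's actual code: the real system uses p-queue and Redis, whereas here `runWithConcurrency`, `TaskEvent`, and the `publish` callback are hypothetical stand-ins. Scheduling is handled by a bounded pool of workers, while events go out on a separate channel that plays no part in deciding which task runs next.

```typescript
// Hypothetical event shape; stands in for messages published over Redis pub/sub.
type TaskEvent = { type: "taskStarted" | "taskCompleted"; index: number };

// Hypothetical stand-in for the controller: runs tasks with bounded concurrency
// and reports progress through `publish`, which never influences scheduling.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  concurrency: number,
  publish: (event: TaskEvent) => void,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker claims the next unclaimed task index until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      publish({ type: "taskStarted", index });
      results[index] = await tasks[index]();
      publish({ type: "taskCompleted", index });
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: six simulated tasks, at most two running at once.
async function demo(): Promise<{ results: number[]; peak: number }> {
  let active = 0;
  let peak = 0;
  const publish = (event: TaskEvent): void => {
    active += event.type === "taskStarted" ? 1 : -1;
    peak = Math.max(peak, active);
  };
  const tasks = Array.from({ length: 6 }, (_, i) => async () => {
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    return i * 2;
  });
  const results = await runWithConcurrency(tasks, 2, publish);
  return { results, peak };
}

demo().then((out) => console.log(out));
```

In this sketch the `concurrency` argument corresponds to the number of parallel runners (1-25 in the diagram), and the subscriber only observes progress, which mirrors the note that Redis is used for real-time events rather than task queuing.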
Common Tasks Quick Reference
Adding a New Eval Exercise
- Add exercise to Roo-Code-Evals repo (external)
- See packages/evals/ADDING-EVALS.md for structure
Modifying Eval CLI Behavior
Edit files in packages/evals/src/cli/ (runEvals.ts, runTask.ts, runUnitTest.ts, redis.ts).
Modifying the Evals Web Interface
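Edit files in apps/web-evals/src/ (app/ for pages and the SSE endpoint, actions/ for server actions, hooks/ and lib/ for shared code).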
Modifying the Public Evals Display Page
Edit files in apps/web-roo-code/src/app/evals/ (page.tsx, evals.tsx, plot.tsx, types.ts).
Database Schema Changes
- Edit packages/evals/src/db/schema.ts
- Generate migration: cd packages/evals && pnpm drizzle-kit generate
- Apply migration: pnpm drizzle-kit migrate
Running Evals Locally
pnpm evals
Ports (defaults):
- PostgreSQL: 5433
- Redis: 6380
- Web: 3446
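For scripts that need to reach these services, the default ports above could be captured in one place. The following is a small sketch under stated assumptions: `DEFAULT_PORTS`, `SCHEMES`, and `localUrl` are hypothetical names, and the `localhost` host and URL schemes are assumptions for local development. Real connection strings would also need credentials and a database name, which this document does not specify.

```typescript
// Default local ports from the list above.
const DEFAULT_PORTS = { postgres: 5433, redis: 6380, web: 3446 } as const;

type Service = keyof typeof DEFAULT_PORTS;

// Assumed URL schemes for each service (not taken from the repo).
const SCHEMES: Record<Service, string> = {
  postgres: "postgres",
  redis: "redis",
  web: "http",
};

// Builds a base URL for a local service, allowing per-service port overrides.
function localUrl(service: Service, overrides: Partial<Record<Service, number>> = {}): string {
  const port = overrides[service] ?? DEFAULT_PORTS[service];
  return `${SCHEMES[service]}://localhost:${port}`;
}

console.log(localUrl("web")); // http://localhost:3446
console.log(localUrl("redis", { redis: 6379 })); // redis://localhost:6379
```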
Testing
cd packages/evals && npx vitest run
cd apps/web-evals && npx vitest run
Key Types/Exports from @roo-code/evals
The package exports are defined in packages/evals/src/index.ts:
- Database queries: getRuns, getTasks, getTaskMetrics, etc.
- Schema types: Run, Task, TaskMetrics
- Used by both apps/web-evals and apps/web-roo-code