generate-verifiers-env

Name: Generate Verifiers Env
Author: adithya-s-k

Builds a Verifiers (PrimeIntellect) variant of an RL environment. Use whenever someone asks to scaffold a Verifiers env, port to Verifiers, build an in-process toolkit, set up a `vf.ToolEnv` with a Rubric, or wire up a TRL `GRPOTrainer` rollout. Verifiers is the right framework when the user wants in-process tools (no HTTP server), structured tool calling driven by plain Python functions, composable reward rubrics with multiple grader functions, fast iteration with no Docker, or the cleanest path from prototype to TRL training. Output is a runnable `<env_dir>/verifiers/` folder with `env.py` (toolkit + standalone tool functions + `create_verifiers_env`), `rollout.py`, and `pyproject.toml`. Use for prompts like "make a verifiers env for X", "wrap my game in verifiers", or "set up a vf.ToolEnv".


updated:May 6, 2026 at 10:21


package.json

"author": "adithya-s-k"

"repository": "adithya-s-k/RL_Envs_101"



generate-verifiers-env

Build the Verifiers variant of an env. Verifiers is in-process — no HTTP server, no Docker, no HF Space. The trainer (or a manual rollout) imports tool functions directly from env.py.

Concept

PrimeIntellect Verifiers is a Python library — not a server framework. It provides vf.ToolEnv (multi-turn rollout), vf.Rubric (composable async graders), and adapters into TRL GRPOTrainer. The trainer or rollout owns the LLM client; the env owns the tools and the grader.

When the user has a shared domain module (<domain>.py) and wants a Verifiers variant, wrap it as a toolkit class plus standalone tool functions. Don't duplicate domain logic.

Archetypes

| Archetype | Hallmarks |
| --- | --- |
| Pure-Python game | One `@tool`-style function, terminal reward via rubric checking the trajectory. |
| Stateful sandbox in-process | Toolkit owns the sandbox (E2B, browser); `initialize()` is lazy; `cleanup()` is mandatory in `finally`. |
| Vision env | Drive the toolkit manually (skip `vf.ToolEnv` since vision content blocks aren't first-class in verifiers' rollout). Send the screenshot in the user message each turn. |

Two consumption paths (always provide both)

Path A — `DesktopToolkit`-style class (used by TRL adapter + manual rollout)

class WordleToolkit:
    def __init__(self): ...
    def initialize(self): ...     # lazy E2B / state init
    def cleanup(self): ...        # kill sandbox
    def reset(self): ...          # new episode
    def guess(self, word: str) -> str:
        """Submit a 5-letter word guess. Returns colored feedback."""
        ...

Public methods are introspected as tools by the TRL adapter. Docstrings become tool descriptions.
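
To make the introspection idea concrete, here is a minimal sketch of how public methods and their docstrings could be collected from a toolkit instance. It is not the TRL adapter's actual code: `discover_tools` and the `LIFECYCLE` exclusion set are hypothetical names, and skipping the lifecycle hooks is an assumption about what an adapter would want to expose.

```python
import inspect


class WordleToolkit:
    """Toy stand-in for the toolkit above; game logic is omitted."""

    def initialize(self) -> None:
        """Lazily create state."""

    def cleanup(self) -> None:
        """Release resources."""

    def reset(self) -> None:
        """Start a new episode."""

    def guess(self, word: str) -> str:
        """Submit a 5-letter word guess. Returns colored feedback."""
        return f"feedback for {word}"

    def _score(self, word: str) -> int:
        # Private helper: the leading underscore keeps it out of the tool list.
        return 0


# Assumption: lifecycle hooks are not offered to the model as tools.
LIFECYCLE = {"initialize", "cleanup", "reset"}


def discover_tools(kit) -> dict:
    """Map each public, non-lifecycle method name to its docstring."""
    tools = {}
    for name, method in inspect.getmembers(kit, predicate=inspect.ismethod):
        if name.startswith("_") or name in LIFECYCLE:
            continue
        tools[name] = inspect.getdoc(method) or ""
    return tools


print(discover_tools(WordleToolkit()))
```

Keeping helpers private (underscore-prefixed) is what keeps them out of the tool list in this sketch, so any method left public should carry a docstring written for the model.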

Path B — `vf.ToolEnv` for native verifiers `env.evaluate(client, model)`

def create_verifiers_env():
    import verifiers as vf
    from datasets import Dataset
    dataset = Dataset.from_list([{"question": t["task"], "answer": t["expected_output"]} for t in TASKS])
    async def correctness(completion, answer, **kwargs) -> float:
        # read from the completion trajectory; return 0.0–1.0
        ...
    rubric = vf.Rubric(funcs=[correctness])
    return vf.ToolEnv(tools=TOOL_FUNCTIONS, max_turns=8, dataset=dataset, rubric=rubric, system_prompt="...")

TOOL_FUNCTIONS is a list of plain Python functions (not bound methods). They can share state via a module-level toolkit instance.

Recommended file layout

The user picks the actual paths. The canonical shape:

<env_dir>/verifiers/
├── pyproject.toml      # verifiers + e2b-* + datasets + python-dotenv + openai
├── __init__.py
├── env.py              # Toolkit class + standalone tool fns + create_verifiers_env()
├── rollout.py          # Drives the toolkit manually with the openai client
└── README.md

Implementation order

1. The toolkit class

- __init__ takes config (api_key="", app="firefox", etc.). Don't create the sandbox here; that is too eager.
- initialize() is the lazy creation hook. Always call it from each tool method.
- cleanup() kills the sandbox. Always call it from finally in the rollout.
- reset() calls cleanup() + reinitializes. Used between episodes by the TRL adapter.
- Each tool method:
  - takes typed args (used for OpenAI tool-schema generation via inspect)
  - has a docstring (becomes the tool description; first paragraph only)
  - calls self.initialize() first, mutates state, returns a string
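
The lifecycle rules above can be sketched with a toy toolkit. `FakeSandbox` and `CounterToolkit` are hypothetical stand-ins (a real env would hold an E2B sandbox or browser session instead), so treat this as an illustration of the lazy-init pattern rather than a reference implementation.

```python
class FakeSandbox:
    """Hypothetical stand-in for an E2B sandbox or browser session."""

    def __init__(self):
        self.alive = True

    def kill(self) -> None:
        self.alive = False


class CounterToolkit:
    def __init__(self, start: int = 0):
        # Config only: no sandbox is created here.
        self.start = start
        self.sandbox = None
        self.count = start

    def initialize(self) -> None:
        # Idempotent: create the sandbox on first use, no-op afterwards.
        if self.sandbox is None:
            self.sandbox = FakeSandbox()
            self.count = self.start

    def cleanup(self) -> None:
        # Safe to call repeatedly, including when nothing was created.
        if self.sandbox is not None:
            self.sandbox.kill()
            self.sandbox = None

    def reset(self) -> None:
        # New episode: tear down the old sandbox and start fresh.
        self.cleanup()
        self.initialize()

    def increment(self, amount: int) -> str:
        """Add amount to the counter and report the new value."""
        self.initialize()
        self.count += amount
        return f"count={self.count}"
```

Making `initialize()` and `cleanup()` idempotent is a design choice in this sketch: it lets every tool method call `initialize()` unconditionally and lets the rollout call `cleanup()` from `finally` without checking state.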

2. Standalone tool functions for `vf.ToolEnv`

Module-level shared toolkit, plus thin wrappers:

from typing import Optional

_shared: Optional[WordleToolkit] = None
def _kit():
    global _shared
    if _shared is None:
        _shared = WordleToolkit()
    return _shared

def guess(word: str) -> str:
    """Submit a 5-letter word guess."""
    return _kit().guess(word)

TOOL_FUNCTIONS = [guess]

Why both? The TRL adapter wants the toolkit class (per-rollout instance, isolated state). vf.ToolEnv wants free functions. Don't pick one — provide both.

3. The rubric

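Rubrics are composable graders. Each grader has the signature async def func(completion, answer, **kwargs) -> float. Pass several to vf.Rubric(funcs=[...]) and their scores are combined into a single reward, with equal weights by default; pass weights to change the mix (see the verifiers docs for the exact combination rule in your version).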

For a single-criterion env, one grader suffices:

async def correctness(completion, answer, **kwargs) -> float:
    if not completion: return 0.0
    final = completion[-1]
    # content can be None on tool-call messages, so fall back to ""
    last = (final.get("content") or "") if isinstance(final, dict) else str(final)
    return 1.0 if answer.strip() in str(last).strip() else 0.0

For multi-criterion envs (e.g. computer-use envs that need both terminate(success) AND a state check), one grader can award partial credit:

async def correctness(completion, answer, **kwargs) -> float:
    seen_success = any("terminated: success" in str(m) for m in completion)
    seen_expected = any(answer in str(m) for m in completion)
    return 1.0 if (seen_success and seen_expected) else (0.5 if seen_success else 0.0)
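
As an alternative to folding several criteria into one function, separate graders can be combined with weights. The sketch below is plain Python written only to show the arithmetic; it is not how verifiers implements vf.Rubric, and `combined_reward`, `brevity`, and the weights are hypothetical. It assumes a weighted sum; check the verifiers docs for the rule your version applies.

```python
import asyncio


async def correctness(completion, answer, **kwargs) -> float:
    # Full credit if the expected answer appears anywhere in the trajectory.
    return 1.0 if any(answer in str(m) for m in completion) else 0.0


async def brevity(completion, answer, **kwargs) -> float:
    # Hypothetical second criterion: full credit for 4 messages or fewer.
    return 1.0 if len(completion) <= 4 else 0.5


async def combined_reward(funcs, weights, completion, answer) -> float:
    # Run all graders concurrently, then take the weighted sum of scores.
    scores = await asyncio.gather(
        *(f(completion=completion, answer=answer) for f in funcs)
    )
    return sum(w * s for w, s in zip(weights, scores))


completion = [{"role": "assistant", "content": "The word is CRANE"}]
reward = asyncio.run(
    combined_reward([correctness, brevity], [1.0, 0.2], completion, "CRANE")
)
print(reward)
```

Splitting criteria this way keeps each grader independently testable, at the cost of having to choose weights.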

4. Rollout — `rollout.py`

Build OpenAI tool schemas from the function signatures + docstrings via inspect:

import inspect
from typing import get_args, get_origin, get_type_hints

def func_to_openai_tool(fn):
    sig = inspect.signature(fn)
    hints = get_type_hints(fn)
    # Only the first docstring paragraph becomes the tool description.
    doc = (fn.__doc__ or "").strip().split("\n\n")[0]
    properties, required = {}, []
    for name, p in sig.parameters.items():
        ann = hints.get(name, str)  # unannotated params are treated as strings
        if get_origin(ann) is list:
            # list[int] -> array of integers; any other list -> array of strings
            inner = get_args(ann)
            properties[name] = {"type": "array", "items": {"type": "integer" if (inner and inner[0] is int) else "string"}}
        elif ann is int:    properties[name] = {"type": "integer"}
        elif ann is float:  properties[name] = {"type": "number"}
        elif ann is bool:   properties[name] = {"type": "boolean"}
        else:               properties[name] = {"type": "string"}
        # Params without a default value are required.
        if p.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "function", "function": {
        "name": fn.__name__, "description": doc,
        "parameters": {"type": "object", "properties": properties, "required": required},
    }}

This pattern works for any toolkit. Use it as the standard adapter from Python signatures to OpenAI tool schemas.
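
The other half of the rollout loop is routing a tool call back to the Python function. Because the schema's name is fn.__name__, a name-keyed lookup is enough. The sketch below assumes the tool call arrives as a name plus a JSON string of arguments (the shape OpenAI chat completions use); `dispatch_tool_call` and `TOOLS_BY_NAME` are hypothetical names, and the `guess` body is a placeholder.

```python
import json


def guess(word: str) -> str:
    """Submit a 5-letter word guess."""
    return f"guessed {word.upper()}"  # placeholder for the real toolkit call


TOOL_FUNCTIONS = [guess]
TOOLS_BY_NAME = {fn.__name__: fn for fn in TOOL_FUNCTIONS}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the named tool and always return a string for the tool message."""
    fn = TOOLS_BY_NAME.get(name)
    if fn is None:
        return f"error: unknown tool {name!r}"
    try:
        args = json.loads(arguments_json or "{}")
        return str(fn(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong argument names: report it to the model
        # instead of crashing the rollout.
        return f"error: bad arguments for {name}: {exc}"


print(dispatch_tool_call("guess", '{"word": "crane"}'))
```

Returning errors as strings, rather than raising, gives the model a chance to correct its call on the next turn.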

For multimodal envs, drive the toolkit manually (don't use vf.ToolEnv since vision-content blocks aren't first-class in verifiers' rollout). Send the latest screenshot in the user message every turn:

text, b64 = kit._ctrl.screenshot()      # if you exposed _ctrl
messages.append({"role": "user", "content": [
    {"type": "text", "text": "Latest screenshot:"},
    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
]})

Validation gates

1. Toolkit imports cleanly: uv run python -c "from env import DesktopToolkit, TOOL_FUNCTIONS" (substitute your own toolkit class for DesktopToolkit, e.g. WordleToolkit)
2. vf.ToolEnv builds: uv run python -c "from env import create_verifiers_env; env = create_verifiers_env(); print(env)"
3. Manual rollout: MAX_TURNS=3 uv run python rollout.py runs end-to-end against a real backend (E2B or whatever the env uses).
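
For the manual-rollout gate, the skeleton below shows one way a rollout loop could honor MAX_TURNS and guarantee cleanup. It is a sketch under assumptions: `EchoToolkit`, `run_rollout`, and the `policy` callable (a stand-in for the LLM call) are hypothetical, and the default of 8 turns simply mirrors the max_turns=8 used in the vf.ToolEnv example.

```python
import os


class EchoToolkit:
    """Hypothetical toolkit used only to exercise the loop."""

    def __init__(self):
        self.cleaned_up = False

    def initialize(self) -> None:
        pass

    def cleanup(self) -> None:
        self.cleaned_up = True

    def echo(self, text: str) -> str:
        """Return the text unchanged."""
        self.initialize()
        return text


def run_rollout(kit, policy, max_turns=None) -> list:
    # Assumption: rollout.py reads its turn budget from the MAX_TURNS env var.
    if max_turns is None:
        max_turns = int(os.environ.get("MAX_TURNS", "8"))
    transcript = []
    try:
        for turn in range(max_turns):
            action = policy(turn, transcript)  # stand-in for the LLM call
            if action is None:  # the policy decided to stop early
                break
            transcript.append(kit.echo(action))
    finally:
        # Runs on success, early exit, and exceptions alike.
        kit.cleanup()
    return transcript


kit = EchoToolkit()
print(run_rollout(kit, lambda turn, transcript: f"turn {turn}", max_turns=3))
```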

Gotchas

- ModuleNotFoundError (attrs): e2b-desktop transitively needs attrs but doesn't pin it. Add attrs>=23.0 to dependencies.
- TypedDict vs dataclass for verifiers data structures: most are TypedDicts. Access by key, not attribute. (Same trap exists in skyrl-gym; we hit it during the desktop_env port.)
- Tool-schema **kwargs is forbidden: vLLM (used by some trainers) can't introspect **kwargs for JSON schema generation. Define explicit params, even if empty.
- Don't return huge strings: verifiers passes the result through to the model verbatim. A 100KB log dump will blow your context. Truncate / summarize in the tool method.
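
Two of these gotchas are easy to guard against in code. The helpers below are a hedged sketch: `truncate_tool_output`, `has_var_keyword`, and the 4000-character default are hypothetical choices, not part of verifiers or vLLM.

```python
import inspect


def truncate_tool_output(text: str, limit: int = 4000) -> str:
    """Cap a tool result and tell the model how much was dropped."""
    if len(text) <= limit:
        return text
    omitted = len(text) - limit
    return f"{text[:limit]}\n[truncated {omitted} characters]"


def has_var_keyword(fn) -> bool:
    """True if fn declares **kwargs, which breaks schema generation."""
    return any(
        p.kind is inspect.Parameter.VAR_KEYWORD
        for p in inspect.signature(fn).parameters.values()
    )


def guess(word: str) -> str:
    """Submit a 5-letter word guess."""
    return truncate_tool_output(f"feedback for {word}")


# Fail fast at import time instead of at schema-generation time.
assert not has_var_keyword(guess)
```

Running the `has_var_keyword` check over every entry in TOOL_FUNCTIONS at import time surfaces the problem before a trainer ever sees the tools.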

Reference

references/architecture.md — vf.ToolEnv internals + Rubric composition + TRL adapter shape

Official documentation

PrimeIntellect-ai/verifiers — source repo
Prime Intellect Verifiers docs
verifiers/docs/environments.md — ToolEnv / StatefulToolEnv / MultiTurnEnv reference
verifiers on PyPI — latest is 0.1.9+ (Jan 2026)
