Execute qualquer Skill no Manus
com um clique

Execute qualquer Skill no Manus com um clique

$pwd:

generate-ors-env

Name: Generate Ors Env
Author: adithya-s-k

// Builds an Open Reward Standard (ORS) variant of an RL environment using the official `openreward` Python package. Use whenever someone asks to scaffold an ORS env, port to OpenReward, add per-tool-call rewards, deploy to OpenReward.ai, or wrap an existing env in the ORS protocol. ORS is the right framework when the user wants HTTP+REST+SSE, rewards arriving inline with each tool call (not post-episode), task-spec-driven sessions, splits (train/val/test), or deployment to OpenReward.ai or HF Spaces. Output is a runnable `<env_dir>/ors/` folder with `server.py`, `tasks.py`, `pyproject.toml`, `Dockerfile.spaces`, and `rollout.py`. Use for prompts like "wrap my env in ORS", "make an OpenReward env for X", or "add per-call reward to my env".

Executar no Manus

$ git log --oneline --stat

stars:136

forks:15

updated:6 de maio de 2026 às 10:21

Explorador de arquivos

2 arquivos

SKILL.md

readonly

related-skills.json

mesmo repositório

rl-env-from-description.md

from "adithya-s-k/RL_Envs_101"

Turns a user's plain-English description of an RL training environment into runnable code across the four target frameworks — OpenEnv, OpenReward (ORS), Verifiers, and NeMo Gym. Use whenever someone describes an environment they want to build ("I want to train an agent that does X", "make an env where the model has to Y"), asks to scaffold a new env, asks to port an existing env to one of these frameworks, or asks how to design tools/rewards/state for a new env. Use even when the user does not explicitly say "RL environment" — descriptions like "agent that browses the web", "tool-calling agent for SQL", or "game-playing agent" all qualify. Drives the full flow — clarifying interview, env-name selection, shared-domain extraction, per-framework implementation, and rollout-based smoke tests.

2026-05-06136

generate-nemo-gym-env.md

from "adithya-s-k/RL_Envs_101"

Builds a NeMo Gym (NVIDIA) variant of an RL environment. Use whenever someone asks to scaffold a NeMo Gym Resources Server, port an existing env to NeMo Gym, expose tools as `app.post()` endpoints with cookie-based sessions, add a post-episode `/verify` reward grader, or deploy a NeMo Gym env to HF Spaces. NeMo Gym is the right framework when the user wants HTTP+REST with cookie session handling, raw `requests`-driven rollouts (no SDK client), Ray-based orchestration, or NVIDIA NeMo / TRL training integration with a `responses_create_params` + `ground_truth` dataset format. Output is a runnable `<env_dir>/nemo_gym/` folder with `server.py`, `pyproject.toml`, `Dockerfile`, `configs/<env>.yaml`, and `rollout.py`. Use for prompts like "wrap my env in NeMo Gym", "make a NeMo resources server for X", or "add a post-episode grader to my env".

2026-05-06136

generate-openenv-env.md

from "adithya-s-k/RL_Envs_101"

Builds an OpenEnv (Meta) variant of an RL environment. Use whenever someone asks to scaffold an OpenEnv server, port an existing env to OpenEnv, add MCP tools to an env, or deploy an OpenEnv to HF Spaces. OpenEnv is the right framework when the user wants HTTP+MCP, structured tool calls discovered via `list_tools()`, an optional Gradio UI, sandbox-backed sessions, or deployment as a Docker container / HF Space. Output is a runnable `<env_dir>/openenv/` folder with `server/app.py`, `server/<env>_environment.py`, `pyproject.toml`, `Dockerfile`, and `rollout.py`. Use for prompts like "wrap my game in OpenEnv", "make an MCP env for X", or "add the openenv variant".

2026-05-06136

generate-verifiers-env.md

from "adithya-s-k/RL_Envs_101"

Builds a Verifiers (PrimeIntellect) variant of an RL environment. Use whenever someone asks to scaffold a Verifiers env, port to Verifiers, build an in-process toolkit, set up a `vf.ToolEnv` with a Rubric, or wire up a TRL `GRPOTrainer` rollout. Verifiers is the right framework when the user wants in-process tools (no HTTP server), structured tool calling driven by plain Python functions, composable reward rubrics with multiple grader functions, fast iteration with no Docker, or the cleanest path from prototype to TRL training. Output is a runnable `<env_dir>/verifiers/` folder with `env.py` (toolkit + standalone tool functions + `create_verifiers_env`), `rollout.py`, and `pyproject.toml`. Use for prompts like "make a verifiers env for X", "wrap my game in verifiers", or "set up a vf.ToolEnv".

2026-05-06136

package.json

"author": "adithya-s-k"

"repository": "adithya-s-k/RL_Envs_101"

Abrir repositório GitHub Ver repositórios do creator

$ install --global

$ download --local

Executar no Manus

$ useful --forSOC

Cientistas de dadosInformática e Matemática15-2051L4

name

generate-ors-env

description

Builds an Open Reward Standard (ORS) variant of an RL environment using the official `openreward` Python package. Use whenever someone asks to scaffold an ORS env, port to OpenReward, add per-tool-call rewards, deploy to OpenReward.ai, or wrap an existing env in the ORS protocol. ORS is the right framework when the user wants HTTP+REST+SSE, rewards arriving inline with each tool call (not post-episode), task-spec-driven sessions, splits (train/val/test), or deployment to OpenReward.ai or HF Spaces. Output is a runnable `<env_dir>/ors/` folder with `server.py`, `tasks.py`, `pyproject.toml`, `Dockerfile.spaces`, and `rollout.py`. Use for prompts like "wrap my env in ORS", "make an OpenReward env for X", or "add per-call reward to my env".

generate-ors-env

Build the ORS variant of an env using the official openreward >= 0.1.33 package (the ors-sdk name is a common mistake — it does not exist on PyPI).

Concept

ORS is the Open Reward Standard (openrewardstandard.io) — an HTTP REST + Server-Sent Events protocol for agent envs. Reward arrives inline with every ToolOutput, which is the framework's defining feature compared to OpenEnv (external/post-hoc reward) and NeMo Gym (post-episode /verify).

When the user has a shared domain module (<domain>.py) and wants an ORS variant, never duplicate domain logic into the framework folder — wrap it.

Archetypes

Archetype	Hallmarks
Pure-Python game	Single `@tool`, `tasks.py` with N task dicts forming the `train` split, terminal reward via `finished=True`.
Stateful sandbox	`setup()` allocates resources from `task_spec`; `teardown()` frees them; per-tool reward stubs.
Vision / computer-use	`ImageBlock(data=<base64>, mimeType="image/png")` returns; `terminate(status)` tool emits the terminal reward.

Imports — exactly these

Server side:

from openreward.environments import (
    Environment, Server, tool, ToolOutput, TextBlock, Split, ImageBlock,
)

Client side (rollouts):

from openreward import EnvironmentsAPI
api = EnvironmentsAPI(base_url=URL, api_key="")
env = api.get(ENV_NAME)

Don't use OpenReward(api_key=..., base_url=...) even though it's the high-level client. It prepends matrix. / api. / construct. subdomains to the base URL — that breaks HF Space URLs. EnvironmentsAPI talks to base_url verbatim.

Architecture

<env_dir>/ors/
├── pyproject.toml         # openreward>=0.1.33 + e2b-* (if needed) + pydantic
├── __init__.py
├── Dockerfile             # local dev image
├── Dockerfile.spaces      # HF Space (port 7860, single-stage pip install)
├── README.spaces.md       # HF Space frontmatter
├── server.py              # the Environment subclass + main()
├── tasks.py               # list of dicts (task_spec for each task)
├── rollout.py             # or rollout_openai.py + rollout_qwen.py
└── README.md              # one-page dev README

Implementation order

1. Tasks file — `tasks.py`

A list of plain dicts. Each dict becomes a task_spec per session. ORS auto-wraps these into Task objects on list_tasks().

TASKS = [
    {"answer": "apple", "task": "Guess the 5-letter word."},
    # ...
]

2. The Environment subclass — `server.py`

from pydantic import BaseModel
from openreward.environments import Environment, Server, tool, ToolOutput, TextBlock, Split

class GuessInput(BaseModel):
    word: str

class WordleORS(Environment):
    def __init__(self, task_spec=None, secrets=None, **kw):
        super().__init__(task_spec=task_spec or {}, secrets=secrets or {})
        self._game = None

    def setup(self):                      # called on first tool invocation
        self._game = WordleGame(self.task_spec.get("answer"))

    def teardown(self):                   # called on session delete
        self._game = None

    @classmethod
    def list_splits(cls): return [Split(name="train", type="train")]

    @classmethod
    def list_tasks(cls, split): return TASKS

    def get_prompt(self):
        return [TextBlock(text="Play Wordle. Guess the 5-letter word.")]

    @tool
    def guess(self, params: GuessInput) -> ToolOutput:
        feedback = self._game.guess(params.word)
        return ToolOutput(
            blocks=[TextBlock(text=feedback)],
            reward=self._game.reward,
            finished=self._game.done,
        )

Key contracts:

Tools take a params: PydanticModel as the second arg. ORS uses the model's JSON schema as the tool's input_schema.
Empty inputs still need a Pydantic model (class _Empty(BaseModel): pass). Don't omit the param.
ToolOutput.blocks is [TextBlock | ImageBlock]. For images: ImageBlock(data=<base64>, mimeType="image/png"). Vision models actually see this.
reward is float | None. None means "no reward this step"; 0.0 means "stepped, scored zero". For pure terminal reward, return None everywhere except in the last ToolOutput.
finished=True ends the session. Pair with reward=1.0 (or whatever) to give the rollout a clean stop.
task_spec is a dict you read from self.task_spec — no schema validation. If you want validation, do it in setup().

3. Server entry point — `server.py` main

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=8080)
    parser.add_argument("--host", type=str, default="0.0.0.0")
    args = parser.parse_args()
    Server([WordleORS]).run(host=args.host, port=args.port)

The endpoint name is auto-derived from the class name lowercased — WordleORS → wordleors. Tell the user this so they know what ENV_NAME to pass.

4. Rollout

Always discover tools and tasks from the env. Don't hardcode names:

api = EnvironmentsAPI(base_url=ENV_URL, api_key="")
env = api.get("wordleors")
tasks = env.list_tasks("train")
tools = env.list_tools(format="openai")     # built-in OpenAI tool-schema converter
with env.session(task=tasks[0]) as session:
    prompt = session.get_prompt()
    result = session.call_tool("guess", {"word": "crane"})
    # result.blocks, result.reward, result.finished

For vision envs, the screenshot tool returns an ImageBlock — read it as b.data (already base64). Pass that into the model's image content.

5. Dockerfiles

Dockerfile.spaces is the HF Space deploy image. Keep it minimal:

FROM python:3.11-slim
RUN useradd -m -u 1000 user
RUN pip install --no-cache-dir openreward pydantic <other-deps>
USER user
ENV HOME=/home/user PATH=/home/user/.local/bin:$PATH
WORKDIR $HOME/app
COPY --chown=user . $HOME/app
EXPOSE 7860
CMD ["python", "server.py", "--host", "0.0.0.0", "--port", "7860"]

README.spaces.md:

---
title: My Env ORS
emoji: 🎯
colorFrom: pink
colorTo: indigo
sdk: docker
app_port: 7860
tags: [ors, openreward]
---

Pushing to HF Spaces

Create a Space named <owner>/<env_name>-ors. Set E2B_API_KEY (and any other secrets) as Space secrets, not environment variables — they survive rebuilds. The local .env file should not be uploaded.

api.add_space_secret(repo_id="<owner>/<env>-ors", key="E2B_API_KEY", value="...")
api.upload_file(path_or_fileobj="Dockerfile.spaces", path_in_repo="Dockerfile", repo_id=...)
api.upload_file(path_or_fileobj="README.spaces.md", path_in_repo="README.md", repo_id=...)
# upload server.py, tasks.py, __init__.py, pyproject.toml

Validation gates

Local server — uv run python server.py --port 8772 then curl http://localhost:8772/list_environments returns ["<envname>"].
Tool discovery — curl http://localhost:8772/<envname>/tools | jq '.tools | length' matches the number of @tool methods.
End-to-end — MAX_TURNS=3 uv run python rollout.py drives the model through at least one tool call without errors.

Gotchas (from real-world ORS work)

from openreward.environments.types import Task — wrong; Task is in openreward.api.environments.types and you usually don't import it. list_tasks can return plain dicts; ORS wraps them.
OpenReward(base_url=URL) rewrites the URL — prepends matrix. / api. / construct. subdomains. For HF Spaces, use EnvironmentsAPI(base_url=URL, api_key="") directly.
e2b-desktop without e2b — e2b-desktop imports from e2b, but doesn't pin it. Add both to dependencies.
Endpoint name is the lowercased class name — MyEnvORS becomes myenvors. Tell users this explicitly so their ENV_NAME env var is right.

Reference

references/architecture.md — protocol shape + Server / Environment / Session lifecycle

Official documentation

openrewardstandard.io — protocol specification
docs.openreward.ai — Python SDK + platform docs
openreward on PyPI — current package (latest 0.1.81+)
Talc-AI/OpenReward on GitHub — source

generate-ors-env

Mais deste repositório

Mais deste repositório

generate-ors-env

Concept

Archetypes

Imports — exactly these

Architecture

Implementation order

1. Tasks file — tasks.py

2. The Environment subclass — server.py

3. Server entry point — server.py main

4. Rollout

5. Dockerfiles

Pushing to HF Spaces

Validation gates

Gotchas (from real-world ORS work)

Reference

Official documentation

generate-ors-env

Concept

Archetypes

Imports — exactly these

Architecture

Implementation order

1. Tasks file — tasks.py

2. The Environment subclass — server.py

3. Server entry point — server.py main

4. Rollout

5. Dockerfiles

Pushing to HF Spaces

Validation gates

Gotchas (from real-world ORS work)

Reference

Official documentation

1. Tasks file — `tasks.py`

2. The Environment subclass — `server.py`

3. Server entry point — `server.py` main

1. Tasks file — `tasks.py`

2. The Environment subclass — `server.py`

3. Server entry point — `server.py` main