rl-env-from-description

Name: RL Env From Description
Author: adithya-s-k

Turns a user's plain-English description of an RL training environment into runnable code across the four target frameworks: OpenEnv, OpenReward (ORS), Verifiers, and NeMo Gym. Use whenever someone describes an environment they want to build ("I want to train an agent that does X", "make an env where the model has to Y"), asks to scaffold a new env, asks to port an existing env to one of these frameworks, or asks how to design tools/rewards/state for a new env. Use even when the user does not explicitly say "RL environment"; descriptions like "agent that browses the web", "tool-calling agent for SQL", or "game-playing agent" all qualify. Drives the full flow: clarifying interview, env-name selection, shared-domain extraction, per-framework implementation, and rollout-based smoke tests.


stars: 136

forks: 15

updated: May 6, 2026 at 11:34

related-skills.json

Same repository

generate-nemo-gym-env.md

from "adithya-s-k/RL_Envs_101"

Builds a NeMo Gym (NVIDIA) variant of an RL environment. Use whenever someone asks to scaffold a NeMo Gym Resources Server, port an existing env to NeMo Gym, expose tools as `app.post()` endpoints with cookie-based sessions, add a post-episode `/verify` reward grader, or deploy a NeMo Gym env to HF Spaces. NeMo Gym is the right framework when the user wants HTTP+REST with cookie session handling, raw `requests`-driven rollouts (no SDK client), Ray-based orchestration, or NVIDIA NeMo / TRL training integration with a `responses_create_params` + `ground_truth` dataset format. Output is a runnable `<env_dir>/nemo_gym/` folder with `server.py`, `pyproject.toml`, `Dockerfile`, `configs/<env>.yaml`, and `rollout.py`. Use for prompts like "wrap my env in NeMo Gym", "make a NeMo resources server for X", or "add a post-episode grader to my env".


generate-openenv-env.md

from "adithya-s-k/RL_Envs_101"

Builds an OpenEnv (Meta) variant of an RL environment. Use whenever someone asks to scaffold an OpenEnv server, port an existing env to OpenEnv, add MCP tools to an env, or deploy an OpenEnv to HF Spaces. OpenEnv is the right framework when the user wants HTTP+MCP, structured tool calls discovered via `list_tools()`, an optional Gradio UI, sandbox-backed sessions, or deployment as a Docker container / HF Space. Output is a runnable `<env_dir>/openenv/` folder with `server/app.py`, `server/<env>_environment.py`, `pyproject.toml`, `Dockerfile`, and `rollout.py`. Use for prompts like "wrap my game in OpenEnv", "make an MCP env for X", or "add the openenv variant".


generate-ors-env.md

from "adithya-s-k/RL_Envs_101"

Builds an Open Reward Standard (ORS) variant of an RL environment using the official `openreward` Python package. Use whenever someone asks to scaffold an ORS env, port to OpenReward, add per-tool-call rewards, deploy to OpenReward.ai, or wrap an existing env in the ORS protocol. ORS is the right framework when the user wants HTTP+REST+SSE, rewards arriving inline with each tool call (not post-episode), task-spec-driven sessions, splits (train/val/test), or deployment to OpenReward.ai or HF Spaces. Output is a runnable `<env_dir>/ors/` folder with `server.py`, `tasks.py`, `pyproject.toml`, `Dockerfile.spaces`, and `rollout.py`. Use for prompts like "wrap my env in ORS", "make an OpenReward env for X", or "add per-call reward to my env".


generate-verifiers-env.md

from "adithya-s-k/RL_Envs_101"

Builds a Verifiers (PrimeIntellect) variant of an RL environment. Use whenever someone asks to scaffold a Verifiers env, port to Verifiers, build an in-process toolkit, set up a `vf.ToolEnv` with a Rubric, or wire up a TRL `GRPOTrainer` rollout. Verifiers is the right framework when the user wants in-process tools (no HTTP server), structured tool calling driven by plain Python functions, composable reward rubrics with multiple grader functions, fast iteration with no Docker, or the cleanest path from prototype to TRL training. Output is a runnable `<env_dir>/verifiers/` folder with `env.py` (toolkit + standalone tool functions + `create_verifiers_env`), `rollout.py`, and `pyproject.toml`. Use for prompts like "make a verifiers env for X", "wrap my game in verifiers", or "set up a vf.ToolEnv".



RL Env From Description

Convert a plain-English description of an RL training environment into runnable code across OpenEnv, OpenReward (ORS), Verifiers, and NeMo Gym. Two other framework variants (SkyRL Gym, GEM) are secondary and only relevant for text-action-with-tag-parsing envs — produce them only if the user asks.

When to use

A user describes an env in plain English (a goal, an action surface, or a reward shape) and wants code.
A user asks to "build an env for X", "scaffold an RL env", "port this env to OpenEnv/ORS/Verifiers/NeMo Gym".
A user already has a runnable env in one framework and wants the same env in others.
A user asks "what's the right way to design my reward / state / tool surface for this task" — start with the interview below, then implement.

Do not use for: training runs (TRL/GRPO config), evaluation harness work, or general agent-design questions that don't end with new env code.

Recommended layout (suggest, don't impose)

A clean shape that scales well — but the user gets to pick the actual paths:

<env_dir>/                          # whatever the user names it
├── <domain>.py                     # SHARED pure logic (e.g. game.py)
├── tasks.py                        # SHARED list of task dicts (optional)
├── openenv/                        # OpenEnv variant (HTTP, MCP)
├── ors/                            # ORS variant (HTTP, REST + SSE)
├── verifiers/                      # Verifiers variant (in-process)
└── nemo_gym/                       # NeMo Gym variant (HTTP, REST + cookies)

Inside each framework folder, the public contract is:

pyproject.toml — framework-specific deps
__init__.py
One implementation file (server.py for ORS, server/<env>_environment.py for OpenEnv, env.py for Verifiers, server.py for NeMo Gym)
rollout.py — runs an LLM against the env end-to-end
README.md — one-page consumption guide

Always ask the user where they want files written. If they don't have a preference, propose the layout above. Don't force it.
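As a rough illustration of the suggested layout, the sketch below creates the skeleton with empty placeholder files. The helper name `scaffold` and the default domain name `game` are hypothetical; only the folder and file names come from the layout above. The framework-specific implementation file is left out on purpose, since its name and location differ per framework.

```python
from pathlib import Path
import tempfile

# Folder names taken from the recommended layout; the helper itself is illustrative.
FRAMEWORK_DIRS = ("openenv", "ors", "verifiers", "nemo_gym")
# Files every framework folder shares (the implementation file varies, so it is omitted).
CONTRACT_FILES = ("pyproject.toml", "__init__.py", "rollout.py", "README.md")


def scaffold(env_dir: Path, domain: str = "game") -> list:
    """Create the suggested skeleton under env_dir and return the files created."""
    created = []
    env_dir.mkdir(parents=True, exist_ok=True)
    # Shared, framework-free files live at the top level.
    for name in (f"{domain}.py", "tasks.py"):
        path = env_dir / name
        path.touch(exist_ok=True)
        created.append(path)
    # Each framework folder gets the same public contract files.
    for framework in FRAMEWORK_DIRS:
        folder = env_dir / framework
        folder.mkdir(exist_ok=True)
        for name in CONTRACT_FILES:
            path = folder / name
            path.touch(exist_ok=True)
            created.append(path)
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory; a real run would use the path the user chose.
    with tempfile.TemporaryDirectory() as tmp:
        for created_file in scaffold(Path(tmp) / "connect_four"):
            print(created_file.relative_to(tmp))
```

The target path is a required argument rather than a default, which matches the rule that the user decides where files are written.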

The four-step flow

Step 1 — Interview the user (focused, not exhaustive)

Ask only the questions that determine architecture. The full bank lives in references/interview.md; the must-cover set is:

What does the agent DO? One sentence describing the goal and the loop.
Action surface — structured tool calls (most cases) or free text with tag parsing (rare; only when the model has no tool-calling support).
State — does anything persist across turns? Per-session sandbox? In-memory dict? Nothing?
Reward — when does it fire (per-step, on terminate, post-episode)? What's the success criterion?
External backends — sandbox (E2B), web service, none?
Termination — fixed turn cap, model-emits-terminate, or a derived condition.
Where should the files live? Project-relative path; never assume.

If the user has already given enough signal in their description (e.g. they cited an existing env they want to mirror), skip questions whose answers are obvious. Don't make people repeat themselves.

When in doubt about an architectural choice, propose a default with a one-line rationale and let the user veto.
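One way to keep track of the interview answers is a small record type with defaults for the common case. This is only a sketch: `EnvSpec`, its field names, and the chosen defaults are hypothetical and are not part of the skill or of any framework.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionSurface(Enum):
    TOOL_CALLS = "structured tool calls"
    TAGGED_TEXT = "free text with tag parsing"


class RewardTiming(Enum):
    PER_STEP = "per-step"
    ON_TERMINATE = "on terminate"
    POST_EPISODE = "post-episode"


@dataclass
class EnvSpec:
    """Answers to the must-cover interview questions (hypothetical structure)."""
    goal: str                                   # what the agent does, in one sentence
    action_surface: ActionSurface = ActionSurface.TOOL_CALLS
    state: str = "none"                         # e.g. "in-memory dict", "per-session sandbox"
    reward_timing: RewardTiming = RewardTiming.ON_TERMINATE
    success_criterion: str = ""
    backend: str = "none"                       # e.g. "E2B", "web service"
    termination: str = "fixed turn cap"
    output_dir: Optional[str] = None            # never assumed; must come from the user

    def open_questions(self) -> list:
        """Return the answers that still have to be collected from the user."""
        missing = []
        if not self.goal.strip():
            missing.append("goal")
        if not self.success_criterion.strip():
            missing.append("success criterion")
        if self.output_dir is None:
            missing.append("output directory")
        return missing


if __name__ == "__main__":
    spec = EnvSpec(goal="Play connect-four against a scripted opponent")
    print(spec.open_questions())
```

Defaults stand in for the "propose a default and let the user veto" rule, while fields without a sensible default show up in `open_questions()` so they are asked rather than guessed.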

Step 2 — Pick the closest archetype

Match the user's description to one of these archetypes; tell them which archetype you're using and why:

| Archetype | Hallmarks | Typical reward shape |
| --- | --- | --- |
| Pure-Python game | Deterministic, single tool, no external services, multi-turn | Terminal reward (1.0/0.0) or per-step from game state |
| Stateful sandbox | Real backend (E2B Code Interpreter, browser, DB), structured tool calls, state persists across calls | External grader (string match, unit tests, LLM judge) |
| Vision / computer-use | Screenshots + mouse/keyboard, 19-tool action surface modelled on Anthropic's `computer_20251124` | Terminal reward via `terminate(status)` tool |
| Text-action with parsing | Model emits free text containing tags; env parses (use only if the model has no tool-calling support) | Per-step from parsed action results |
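The matching can be approximated with a few yes/no checks on the hallmarks listed above. The function below is a hypothetical heuristic, not part of the skill; the order of the checks is an assumption, and a real description may need human judgment.

```python
def pick_archetype(*, tagged_text: bool, screenshots: bool, external_backend: bool) -> str:
    """Map three hallmark flags to an archetype name (illustrative heuristic only)."""
    if tagged_text:
        # Only chosen when the model has no tool-calling support.
        return "Text-action with parsing"
    if screenshots:
        return "Vision / computer-use"
    if external_backend:
        return "Stateful sandbox"
    # No parsing, no screenshots, no backend: plain deterministic logic.
    return "Pure-Python game"


if __name__ == "__main__":
    # Connect-four: tool calls, no screenshots, no external services.
    print(pick_archetype(tagged_text=False, screenshots=False, external_backend=False))
```

Whatever the function returns, the user should still be told which archetype was picked and why.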

Step 3 — Implement in dependency order

Write the shared module first, then the per-framework variants. The frameworks themselves can be implemented in any order.

<env_dir>/<domain>.py + <env_dir>/tasks.py: the only files that contain domain logic. Frameworks just wrap them. Keep them pure-Python; no framework imports.
OpenEnv: read references/openenv.md (planner-level) or trigger the generate-openenv-env skill (full workflow). Use MCPEnvironment + @mcp.tool + create_app(...) in server/app.py.
ORS: read references/ors.md or trigger generate-ors-env. Use Environment + @tool methods + ToolOutput(blocks=[...], reward=..., finished=...). Per-tool-call reward is the framework's defining feature.
Verifiers: read references/verifiers.md or trigger generate-verifiers-env. Plain Python tool functions on a toolkit class; vf.ToolEnv + vf.Rubric for native consumption.
NeMo Gym: read references/nemo_gym.md or trigger generate-nemo-gym-env. SimpleResourcesServer with one app.post("/<tool>") per tool; cookie sessions; post-episode /verify reward.
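To make the "shared module first" idea concrete, here is a minimal sketch of a framework-free domain module and a thin wrapper around it. The number-guessing game, the `GuessGame` class, and `make_guess_tool` are all hypothetical examples; the wrapper is a plain Python closure and deliberately uses none of the framework APIs named above.

```python
import random
from dataclasses import dataclass


@dataclass
class GuessGame:
    """Pure domain logic: no HTTP, no framework imports."""
    secret: int
    max_guesses: int = 5
    guesses: int = 0
    solved: bool = False

    @classmethod
    def new(cls, seed: int, low: int = 1, high: int = 20) -> "GuessGame":
        # Seeded so the same task always yields the same secret.
        return cls(secret=random.Random(seed).randint(low, high))

    @property
    def done(self) -> bool:
        return self.solved or self.guesses >= self.max_guesses

    def guess(self, value: int) -> str:
        if self.done:
            raise RuntimeError("episode already finished")
        self.guesses += 1
        if value == self.secret:
            self.solved = True
            return "correct"
        return "too low" if value < self.secret else "too high"

    def reward(self) -> float:
        return 1.0 if self.solved else 0.0


def make_guess_tool(game: GuessGame):
    """Thin wrapper: what a framework variant would add on top of the shared module."""
    def guess(value: int) -> dict:
        feedback = game.guess(value)
        return {"feedback": feedback, "reward": game.reward(), "finished": game.done}
    return guess


if __name__ == "__main__":
    tool = make_guess_tool(GuessGame(secret=7))
    print(tool(10))  # {'feedback': 'too high', 'reward': 0.0, 'finished': False}
    print(tool(7))   # {'feedback': 'correct', 'reward': 1.0, 'finished': True}
```

Because the wrapper only forwards calls and reshapes the result, each framework variant can swap in its own tool decorator or endpoint without touching the game rules.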

Step 4 — Validate end-to-end before declaring done

Each framework folder gets ONE smoke rollout against a small LLM (Qwen via HF Router by default, or OpenAI if OPENAI_API_KEY is set). The rollout must:

Discover the tools the env exposes (don't hardcode names — except for NeMo Gym, which has no list_tools()).
Drive a 3–5 turn loop and print every tool call + result.
Fail loudly if a tool call errors. (MAX_TURNS=3 for the smoke check.)
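The three requirements above can be sketched as a framework-agnostic loop. Everything here is a stand-in: tools are modelled as a plain dict of callables, and `policy` is a hypothetical function that plays the role of the LLM, so the sketch runs without network access or API keys.

```python
from typing import Any, Callable

MAX_TURNS = 3  # smoke-check cap mentioned in the text


def smoke_rollout(tools: dict, policy: Callable[[list, list], tuple],
                  max_turns: int = MAX_TURNS) -> list:
    """Run a short loop, print every call and result, and raise on the first error."""
    # Discover tool names from the env rather than hardcoding them.
    tool_names = sorted(tools)
    transcript = []
    for turn in range(1, max_turns + 1):
        name, args = policy(tool_names, transcript)
        if name not in tools:
            raise RuntimeError(f"turn {turn}: model called unknown tool {name!r}")
        try:
            result: Any = tools[name](**args)
        except Exception as exc:
            # Fail loudly: surface the failing call instead of continuing.
            raise RuntimeError(f"turn {turn}: {name}({args}) failed") from exc
        print(f"[turn {turn}] {name}({args}) -> {result}")
        transcript.append({"tool": name, "args": args, "result": result})
    return transcript


if __name__ == "__main__":
    # Stand-ins: a one-tool env and a scripted policy instead of a real LLM.
    def echo(text: str) -> str:
        return text.upper()

    def scripted_policy(tool_names, transcript):
        return tool_names[0], {"text": f"turn {len(transcript) + 1}"}

    smoke_rollout({"echo": echo}, scripted_policy)
```

In a real rollout the `tools` mapping would come from the framework's own discovery mechanism and `policy` would call the model; the loop structure stays the same.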

If the env needs an external backend (E2B, etc.), check for the relevant secret in .env and stop with a clear error if it's missing.
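A minimal sketch of that check follows. It uses a hand-rolled parser for simple `KEY=VALUE` lines rather than a dotenv library, and the helper names `read_env_file` and `require_secret` are hypothetical. The secret name in the demo is a placeholder, not the variable any particular backend expects.

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are ignored."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_secret(name: str, env_file: str = ".env") -> str:
    """Return the secret from the process environment or the env file, else stop."""
    value = os.environ.get(name) or read_env_file(env_file).get(name, "")
    if not value:
        # Stop with a clear message instead of failing deep inside the rollout.
        raise SystemExit(f"{name} is not set; add it to {env_file} before running the rollout")
    return value


if __name__ == "__main__":
    # Placeholder name for illustration; use whatever key the chosen backend documents.
    print(len(require_secret("EXAMPLE_BACKEND_API_KEY")))
```

Calling `require_secret` at the top of `rollout.py` keeps the failure close to its cause.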

What success looks like

A user typing "make me an env where the agent plays connect-four at path/to/connect_four/" should end with:

path/to/connect_four/game.py (the shared engine), tasks.py (a few starting positions)
path/to/connect_four/openenv/, .../ors/, .../verifiers/, .../nemo_gym/ all runnable
4 green rollout smoke tests (NeMo Gym tested via deployed Space if local Ray init fails on shared nodes)
Whatever README convention the project uses, updated

…in one continuous flow, with the user only answering 5–7 questions along the way.

Reference docs

references/interview.md — full question bank with example answers
references/openenv.md — OpenEnv-specific implementation notes (planner-level; defers to generate-openenv-env)
references/ors.md — ORS planner-level (defers to generate-ors-env)
references/verifiers.md — Verifiers planner-level (defers to generate-verifiers-env)
references/nemo_gym.md — NeMo Gym planner-level (defers to generate-nemo-gym-env)

When the user wants only one framework variant, trigger the framework-specific skill directly: generate-openenv-env, generate-ors-env, generate-verifiers-env, or generate-nemo-gym-env.

Hard guardrails

Don't impose a folder layout. Suggest the recommended one once; respect the user's choice if they want different paths.
Don't skip the shared domain module. Cross-framework consistency is impossible without it. Every framework variant must wrap the same <domain>.py — never duplicate logic.
Don't run training. This skill ends with rollouts. Training/eval is a separate concern.
Don't invent APIs. When unsure about a framework's actual call shape, read its references/architecture.md (in the framework-specific skill) before writing code.
Coordinate spaces matter for vision envs. Declare the convention (pixel vs normalized 0–1000) in the prompt. Qwen2.5-VL emits 0–1000 normalized; the rollout adapter must rescale.
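For the coordinate-space guardrail, the rescaling step in a rollout adapter might look like the sketch below. The function name and the choice to map the top of the normalized range onto the last pixel index (`width - 1`, `height - 1`) are assumptions; an env may define the mapping differently, which is why the convention should be declared in the prompt.

```python
def rescale_point(x_norm: float, y_norm: float, width: int, height: int,
                  scale: int = 1000) -> tuple:
    """Convert a point normalized to [0, scale] into integer pixel coordinates."""
    if not (0 <= x_norm <= scale and 0 <= y_norm <= scale):
        raise ValueError(f"expected coordinates in [0, {scale}], got ({x_norm}, {y_norm})")
    # Assumed convention: 0 maps to pixel 0 and `scale` maps to the last pixel index.
    x = round(x_norm / scale * (width - 1))
    y = round(y_norm / scale * (height - 1))
    return x, y


if __name__ == "__main__":
    # Bottom-right corner of a 1920x1080 screenshot.
    print(rescale_point(1000, 1000, width=1920, height=1080))  # (1919, 1079)
```

Rejecting out-of-range input makes a mismatch between pixel and normalized conventions visible immediately instead of producing clicks in the wrong place.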

Official documentation

OpenEnv: meta-pytorch/OpenEnv · docs
OpenReward (ORS): openrewardstandard.io · docs.openreward.ai · openreward on PyPI
Verifiers: PrimeIntellect-ai/verifiers · docs
NeMo Gym: NVIDIA-NeMo/Gym · docs
