review-environments

Name: Review Environments
Author: PrimeIntellect-ai

// Review verifiers environments for correctness, robustness, and ecosystem compatibility. Use when asked for environment code review, quality audit, migration validation, or release readiness checks for local environments or environments pulled from the Hub.

stars: 4,143

forks: 553

updated: 30 May 2026, 09:35

SKILL.md

name	review-environments
description	Review verifiers environments for correctness, robustness, and ecosystem compatibility. Use when asked for environment code review, quality audit, migration validation, or release readiness checks for local environments or environments pulled from the Hub.

Review Environments

Goal

Find correctness risks and regressions first, then assess maintainability and ecosystem compliance.

Review Input Modes

Local environment module in ./environments/<env_name>.
Pulled Hub environment via prime env pull owner/name.
Installed package in the active workspace.

Review Workflow

Identify environment contract:

load_environment(...)
base class and rollout behavior (SingleTurnEnv, MultiTurnEnv, ToolEnv/MCPEnv/StatefulToolEnv, SandboxEnv/PythonEnv, V1 vf.Env with explicit vf.Taskset/vf.Harness objects for framework programs, CliAgentEnv for sandboxed agents)
rubric and metrics

Verify installability and the runtime entrypoint with the canonical eval path. Do not add --skip-upload unless the user explicitly requests that deviation; standard runs are saved automatically and appear in the private Evaluations tab and in prime eval view:

prime env install <env>
prime eval run <env> -m openai/gpt-4.1-mini -n 5

Trace reward pipeline and validate scoring semantics.
Run targeted checks for tool/stateful behavior where applicable.
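A minimal sketch of the first two workflow steps in plain Python, assuming the environment module can be imported and that the reviewer runs the CLI themselves. describe_entrypoint and canonical_eval_commands are hypothetical helpers, not part of verifiers or the prime CLI; they only reuse the commands and flags quoted above.

```python
import inspect
from types import ModuleType


def describe_entrypoint(module: ModuleType) -> str:
    """Return the load_environment signature, or fail if the contract is missing."""
    fn = getattr(module, "load_environment", None)
    if not callable(fn):
        raise AttributeError(f"{module.__name__} has no callable load_environment")
    return f"load_environment{inspect.signature(fn)}"


def canonical_eval_commands(env: str, model: str = "openai/gpt-4.1-mini", n: int = 5,
                            skip_upload_requested: bool = False) -> list[list[str]]:
    """Build the install and smoke-eval commands from the canonical path."""
    install = ["prime", "env", "install", env]
    run = ["prime", "eval", "run", env, "-m", model, "-n", str(n)]
    # Uploads stay on by default; only add the flag when the user asked for it.
    if skip_upload_requested:
        run.append("--skip-upload")
    return [install, run]
```

Each command list can be passed to subprocess.run(cmd, check=True). Keeping --skip-upload behind an explicit argument mirrors the rule that uploads stay on unless the user asks otherwise.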

Endpoint And Model Selection Nudge

Encourage endpoint alias setup in configs/endpoints.toml for reproducible review runs.
Check api_client_type when reviewing non-default providers. openai_chat_completions is the default; set openai_responses or anthropic_messages explicitly in the endpoint config when those protocols are required.
Ask whether review coverage should prioritize instruct or reasoning behavior.
Instruct go-tos: gpt-4.1 series, qwen3 instruct series.
Reasoning go-tos: gpt-5 series, qwen3 thinking series, glm series.
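To see which protocol each endpoint will actually use, a reviewer can resolve the effective api_client_type per alias. This sketch assumes the endpoints file has already been parsed (for example with tomllib on Python 3.11+) into a dict mapping each alias to a table with an optional api_client_type key. That schema is an assumption, and the accepted values are limited to the three named above; adapt both to the real configs/endpoints.toml layout.

```python
DEFAULT_CLIENT_TYPE = "openai_chat_completions"
# Only the values named in this skill; the real set may be larger.
KNOWN_CLIENT_TYPES = {"openai_chat_completions", "openai_responses", "anthropic_messages"}


def effective_client_types(endpoints: dict[str, dict]) -> dict[str, str]:
    """Map each endpoint alias to the api_client_type it will actually use."""
    resolved = {}
    for alias, cfg in endpoints.items():
        client_type = cfg.get("api_client_type", DEFAULT_CLIENT_TYPE)
        if client_type not in KNOWN_CLIENT_TYPES:
            raise ValueError(f"{alias}: unknown api_client_type {client_type!r}")
        resolved[alias] = client_type
    return resolved


def implicit_defaults(endpoints: dict[str, dict]) -> list[str]:
    """Aliases that silently fall back to the default protocol; check these by hand."""
    return sorted(alias for alias, cfg in endpoints.items() if "api_client_type" not in cfg)
```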

Critical Review Criteria

Reward correctness:

Prefer deterministic, explicit checks or LLM judges.
Flag best-effort keyword or style heuristics unless explicitly approved.
Verify the scoring semantics from code before treating a low reward as an implementation failure. Some environments intentionally complete with 0.0 reward when the model fails the task.
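The difference between a deterministic check and a best-effort heuristic is easiest to see side by side. This is plain Python with hypothetical function names, not the verifiers rubric API.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise does not change the score."""
    return re.sub(r"\s+", " ", text).strip().lower()


def exact_match_reward(completion: str, answer: str) -> float:
    """Deterministic check: 1.0 only if the normalized completion equals the answer."""
    return 1.0 if normalize(completion) == normalize(answer) else 0.0


def keyword_reward(completion: str, answer: str) -> float:
    """Best-effort heuristic a reviewer should flag: any mention of the answer scores."""
    return 1.0 if answer.lower() in completion.lower() else 0.0
```

With answer "42", keyword_reward gives 1.0 to the completion "The answer is not 42", while exact_match_reward returns 0.0 for it. That 0.0 is intended scoring for a wrong answer, not an implementation failure.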

Environment self-containment:

Flag any requirement for user-managed background services that must already be running before load_environment() is called.
Require environment-managed lifecycle for sandboxes/sessions.
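Environment-managed lifecycle usually means the environment acquires the sandbox or session itself and guarantees teardown, even when a rollout fails. A minimal sketch with a context manager, where FakeSandbox and managed_sandbox are hypothetical stand-ins rather than SandboxEnv internals:

```python
from contextlib import contextmanager


class FakeSandbox:
    """Hypothetical stand-in for a sandbox or session handle."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


@contextmanager
def managed_sandbox():
    """The environment, not the user, starts the sandbox and guarantees teardown."""
    sandbox = FakeSandbox()
    sandbox.start()
    try:
        yield sandbox
    finally:
        sandbox.stop()  # runs even if the rollout raises
```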

v1 taskset/harness contracts:

Expect new taskset/harness environments to use the v1 vf.Env / vf.Taskset / vf.Harness boundary, with load_taskset(config: MyTasksetConfig) and optional load_harness(config: MyHarnessConfig) defining child config types, plus the canonical load_environment(config: vf.EnvConfig) shim delegating through vf.load_taskset(config=config.taskset) and vf.load_harness(config=config.harness).
Expect tasksets to own task data, task-owned tools, user behavior, metrics, rewards, and task-specific config. Flag one-off harness classes that only wrap task behavior.
Review v1 implementations against the generated prime env init my-env --v1 shape: task settings in TasksetConfig, tasks in load_tasks, task-owned tools in load_toolsets, user behavior in User subclasses, lifecycle/metrics/rewards as @vf.* methods, and typed component entrypoints through load_taskset, optional load_harness, and load_environment.
Expect shared dependencies to use bindings owned by the taskset, toolset, user, program, or harness that needs them. Flag pre-initialized resource objects passed through environment loaders; object entries should be serializable loader paths or no-arg loader callables.
Verify Task data is serializable, state remains serializable at rollout boundaries, and model/client controls flow through runtime state rather than top-level dataset columns.
For V1 harness programs, verify framework clients consume state.get_endpoint_config(api="chat") rather than hardcoding an upstream LLM endpoint. For CliAgentEnv agents, verify sandboxed agent code consumes the injected interception endpoint; the proxy is what makes rollouts visible to the rubric.
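Two of these checks are mechanical enough to script. The sketch below uses json.dumps as a proxy for "serializable" and assumes loader paths are written as "module:attr" strings; both are assumptions for illustration, not the documented verifiers format.

```python
import inspect
import json
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Proxy check: True if json.dumps accepts the value unchanged."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


def is_valid_binding(entry: Any) -> bool:
    """Accept a "module:attr" loader path or a callable that needs no arguments."""
    if isinstance(entry, str):
        module, sep, attr = entry.partition(":")
        return bool(module and sep and attr)
    if callable(entry):
        try:
            params = inspect.signature(entry).parameters.values()
        except (TypeError, ValueError):
            return False  # no introspectable signature
        # Every parameter must be optional for a no-arg call to succeed.
        return all(
            p.default is not inspect.Parameter.empty
            or p.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
            for p in params
        )
    return False
```

A pre-initialized client object in task data fails the first check, and a loader that needs arguments fails the second.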

Migration fidelity:

For ports, verify one-to-one equivalence of prompts, tool traces, and scoring logic.
Flag any assumption made without an explicit user decision.
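For a port, one way to check one-to-one equivalence is to replay the same fixed completions through the original and the ported implementation and compare the captured records pairwise. RolloutRecord and fidelity_mismatches are hypothetical; how prompts, tool traces, and scores are captured depends on the source library.

```python
from dataclasses import dataclass


@dataclass
class RolloutRecord:
    """Hypothetical per-example record captured from the original eval or the port."""
    prompt: str
    tool_calls: list[str]
    score: float


def fidelity_mismatches(original: list[RolloutRecord], port: list[RolloutRecord],
                        tol: float = 1e-9) -> list[str]:
    """Compare records pairwise and describe every field that differs."""
    issues = []
    if len(original) != len(port):
        issues.append(f"example count differs: {len(original)} vs {len(port)}")
    for i, (a, b) in enumerate(zip(original, port)):
        if a.prompt != b.prompt:
            issues.append(f"example {i}: prompt differs")
        if a.tool_calls != b.tool_calls:
            issues.append(f"example {i}: tool trace differs")
        if abs(a.score - b.score) > tol:
            issues.append(f"example {i}: score {a.score} vs {b.score}")
    return issues
```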

Secrets handling:

Ensure required keys are validated in load_environment() with vf.ensure_keys(...).
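The check itself should use vf.ensure_keys(...) as stated above. This stdlib sketch only illustrates the property a reviewer should look for: load_environment() fails fast and names every missing key before any rollout starts. require_keys is a hypothetical name, not a substitute for the verifiers helper.

```python
import os
from collections.abc import Mapping


def require_keys(names: list[str], env: Mapping[str, str] = os.environ) -> None:
    """Raise once, naming every missing or empty variable, before any rollout starts."""
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
```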

Performance and scaling:

Identify obvious bottlenecks in dataset loading, rubric calls, or tool execution.
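A quick way to find obvious bottlenecks is to time each stage in isolation before reaching for a profiler. In this sketch the stage callables are hypothetical stand-ins for dataset loading, a rubric call, and a tool call; for finer detail, the standard library's cProfile module is the usual next step.

```python
import time
from typing import Callable


def time_stages(stages: dict[str, Callable[[], object]]) -> dict[str, float]:
    """Run each zero-argument stage once and record wall-clock seconds."""
    timings = {}
    for name, fn in stages.items():
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    return timings


def slowest_stage(timings: dict[str, float]) -> str:
    """Name of the stage that took the longest."""
    return max(timings, key=timings.get)
```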

Packaging and repo hygiene:

If an environment was renamed or moved, verify pyproject.toml, README/docs references, package include paths, tests, and generated AGENTS output were updated together.
Flag bytecode, coverage files, local eval outputs, and temporary build artifacts unless they are intentional release assets.
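A rough hygiene scan can run over the list of tracked files, for example the output of git ls-files. The patterns below cover common Python bytecode, coverage, and build artifacts; the outputs directory is an assumed location for local eval results and should be adjusted to the repository's actual layout.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# File-name patterns for common artifacts (bytecode, coverage data, build metadata).
ARTIFACT_PATTERNS = ["*.pyc", "*.pyo", ".coverage", ".coverage.*", "coverage.xml", "*.egg-info"]
# Directories whose contents are rarely release assets.
# "outputs" is an assumed local eval-output location; adjust it to the repository.
ARTIFACT_DIRS = {"__pycache__", "htmlcov", "build", "dist", "outputs"}


def flag_artifacts(paths: list[str]) -> list[str]:
    """Return the tracked paths that look like generated artifacts, in input order."""
    flagged = []
    for raw in paths:
        parts = PurePosixPath(raw).parts
        in_artifact_dir = any(part in ARTIFACT_DIRS for part in parts[:-1])
        name_matches = any(fnmatch(part, pattern) for part in parts for pattern in ARTIFACT_PATTERNS)
        if in_artifact_dir or name_matches:
            flagged.append(raw)
    return flagged
```

To feed it real data, pass subprocess.run(["git", "ls-files"], capture_output=True, text=True, check=True).stdout.splitlines(). Hits are candidates for review, since some may be intentional release assets.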

Config And Docs Surface

Check that eval, GEPA, RL, and Hosted Training examples use the same public TOML shape where applicable.
For v1 configs, route settings through taskset and harness child config sections; do not subclass EnvConfig just to narrow child config types, and avoid root env config knobs.
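Config shape drift is easy to check once each example config is parsed into nested dicts (for instance with tomllib). This sketch assumes the environment settings live under a shared table named env, which is hypothetical; key_paths, shape_differences, and root_knobs are illustrative helpers. Treat their output as prompts for review, since some differences between eval and training configs are legitimate, and pass any root keys the real schema allows through the allowed argument.

```python
def key_paths(table: dict, prefix: str = "") -> set[str]:
    """Flatten nested tables into dotted key paths such as "taskset.split"."""
    paths = set()
    for key, value in table.items():
        if isinstance(value, dict):
            paths |= key_paths(value, f"{prefix}{key}.")
        else:
            paths.add(f"{prefix}{key}")
    return paths


def shape_differences(configs: dict[str, dict], section: str = "env") -> dict[str, set[str]]:
    """For each named config, the keys it lacks relative to the union of all configs."""
    shapes = {name: key_paths(cfg.get(section, {})) for name, cfg in configs.items()}
    union = set().union(*shapes.values())
    return {name: union - keys for name, keys in shapes.items() if union - keys}


def root_knobs(env_table: dict, allowed: tuple[str, ...] = ("taskset", "harness")) -> list[str]:
    """Keys set directly on the env table instead of inside a child section."""
    return sorted(key for key in env_table if key not in allowed)
```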
If docs changed public behavior, verify the relevant bundled skill was updated too.

Findings Format

Return findings first, sorted by severity:

P0/P1 bugs and behavioral mismatches.
P2 quality risks and maintainability issues.
Test gaps and missing eval coverage.

Include file paths, exact lines, impact, and a concrete fix direction for each finding.

If No Findings

State explicitly that no defects were found, then list residual risk and untested areas.
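A minimal sketch of the report shape, assuming findings are collected as simple records. The severity labels follow the list above, with "test-gap" as a hypothetical label for the last category; Finding and render_report are illustrative, not part of any tool.

```python
from dataclasses import dataclass

# Lower value sorts first; "test-gap" is a hypothetical label for the last category.
SEVERITY_ORDER = {"P0": 0, "P1": 1, "P2": 2, "test-gap": 3}


@dataclass
class Finding:
    severity: str  # one of the SEVERITY_ORDER keys
    path: str
    line: int
    impact: str
    fix: str


def render_report(findings: list[Finding], residual_risks: list[str]) -> str:
    """Findings sorted by severity, or an explicit no-defects statement plus residual risk."""
    if not findings:
        lines = ["No defects found.", "Residual risk and untested areas:"]
        lines += [f"- {risk}" for risk in residual_risks]
        return "\n".join(lines)
    ordered = sorted(findings, key=lambda f: (SEVERITY_ORDER[f.severity], f.path, f.line))
    return "\n".join(
        f"[{f.severity}] {f.path}:{f.line} | impact: {f.impact} | fix: {f.fix}" for f in ordered
    )
```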

related-skills.json

same repository

create-environments.md

from "PrimeIntellect-ai/verifiers"

Create or migrate verifiers environments for the Prime Lab ecosystem. Use when asked to build a new environment from scratch, port an eval or benchmark from papers or other libraries, start from an environment on the Hub, or convert existing tasks into a package that exposes load_environment and installs cleanly with prime env install.

2026-05-30 · 4.1k

browse-environments.md

from "PrimeIntellect-ai/verifiers"

Discover and inspect verifiers environments through the Prime ecosystem. Use when asked to find environments on the Hub, compare options, inspect metadata, check action status, pull local copies for inspection, or choose environment starting points before evaluation, training, or migration work.

2026-05-30 · 4.1k

train-with-environments.md

from "PrimeIntellect-ai/verifiers"

Train models with verifiers environments using hosted RL or prime-rl. Use when asked to configure RL runs, tune key hyperparameters, diagnose instability, set up difficulty filtering, or create practical train and eval loops for new environments.

2026-05-30 · 4.1k

evaluate-environments.md

from "PrimeIntellect-ai/verifiers"

Run and analyze evaluations for verifiers environments using prime eval. Use when asked to smoke-test environments, run benchmark sweeps, resume interrupted evaluations, compare models, inspect sample-level outputs, or produce evaluation summaries suitable for deciding next steps.

2026-05-29 · 4.1k

optimize-with-environments.md

from "PrimeIntellect-ai/verifiers"

Optimize environment system prompts with GEPA through prime gepa run. Use when asked to improve prompt performance without gradient training, compare baseline versus optimized prompts, run GEPA from CLI or TOML configs, or interpret GEPA outputs before deployment.

2026-05-14 · 4.1k

brainstorm.md

from "PrimeIntellect-ai/verifiers"

Run interactive brainstorming across verifiers environments, evaluations, GEPA, and RL training. Use when the user wants ideation, literature scanning, concept teaching, roadmap planning, or research program design grounded in local CLI sources, verifiers, and RL trainer code.

2026-05-06 · 4.1k

package.json

"author": "PrimeIntellect-ai"

"repository": "PrimeIntellect-ai/verifiers"

Useful for (SOC): Software Quality Assurance Analysts and Testers · Computer and Mathematical Occupations · 15-1253 · L4
