---
name: optimize-with-environments
description: Optimize environment system prompts with GEPA through prime gepa run. Use when asked to improve prompt performance without gradient training, compare baseline versus optimized prompts, run GEPA from CLI or TOML configs, or interpret GEPA outputs before deployment.
---
# Optimize With Environments
## Goal

Use GEPA to optimize system prompts in a controlled, reproducible loop.
## Scope

The current GEPA path covers system prompt optimization only. If the user asks for unsupported optimization targets, stop and clarify before proceeding.
## Endpoint And Model Selection Nudge

- Encourage users to define reusable aliases in `configs/endpoints.toml`.
- Ask whether optimization should be validated on instruct or reasoning models.
- Instruct go-tos: gpt-4.1 series, qwen3 instruct series.
- Reasoning go-tos: gpt-5 series, qwen3 thinking series, glm series.
- For benchmark reporting, keep the model family fixed between baseline and optimized comparisons unless the user requests a cross-family study.
- Endpoint entries support optional `headers` (or `extra_headers`) for custom HTTP headers. GEPA inherits these from the registry for both the main model and the reflection model:

```toml
[[endpoint]]
endpoint_id = "my-proxy"
model = "gpt-4.1-mini"
url = "https://api.example/v1"
key = "OPENAI_API_KEY"
headers = { "X-Custom-Header" = "value" }
```
## Core Workflow

- Verify the baseline first with `prime eval run`. Keep the default save behavior; do not add `--skip-upload` unless the user explicitly requests that deviation:

```sh
prime eval run my-env -m openai/gpt-4.1-mini -n 50 -r 3 -s
```

- For v1 Taskset + Harness environments, confirm prompt-like fields are exposed in the saved state or task info before GEPA reflection; BYO Harness implementations may render richer trajectories than classic `MultiTurnEnv` examples.
- Run GEPA:

```sh
prime gepa run my-env -m openai/gpt-4.1-mini -M openai/gpt-4.1-mini -B 500 -n 100 -N 50
```

- Or run from a config (a hypothetical config sketch follows this list):

```sh
prime gepa run configs/gepa/qwen-3-5.toml
```

- Re-evaluate with the optimized prompt and compare against the baseline.
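
The config file schema is not documented in this skill, so the following is a hypothetical sketch only: every key name is an assumption that mirrors the CLI flags above, not a confirmed schema. Verify key names against your installed `prime` version before use.

```toml
# Hypothetical GEPA config; key names assume a 1:1 mapping to the CLI
# flags shown above and may differ in the real schema.
env = "my-env"                            # environment to optimize
model = "openai/gpt-4.1-mini"             # main model (-m)
reflection_model = "openai/gpt-4.1-mini"  # reflection model (-M)
max_calls = 500                           # optimization budget (-B)
num_train = 100                           # train split size (-n)
num_val = 50                              # validation split size (-N)
```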
## High-Value Settings

- `-B/--max-calls`: total optimization budget.
- `-n/--num-train` and `-N/--num-val`: train/validation split sizes.
- `--minibatch-size`: reflection granularity.
- `--perfect-score`: skip already-solved minibatches when the max score is known.
- `--state-columns`: include environment-specific context in reflection data.
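
These settings compose into a single invocation. A sketch, where the minibatch size, perfect score, and state column values are illustrative placeholders, and the argument format for `--perfect-score` and `--state-columns` is an assumption:

```sh
# Illustrative values only; budget and split sizes reuse the numbers above.
prime gepa run my-env \
  -m openai/gpt-4.1-mini -M openai/gpt-4.1-mini \
  -B 500 -n 100 -N 50 \
  --minibatch-size 4 \
  --perfect-score 1.0 \
  --state-columns answer
```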
## Output Artifacts

Expect and inspect:

- `best_prompt.txt`
- `pareto_frontier.jsonl`
- `metadata.json`
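
Before re-evaluating, the frontier file can be sanity-checked from the shell. A minimal sketch, assuming each JSONL record carries a numeric `score` field (field names may differ across GEPA versions):

```sh
# Count candidate prompts on the Pareto frontier.
wc -l < pareto_frontier.jsonl

# Print the highest-scoring record (assumes a numeric "score" field).
jq -s 'max_by(.score)' pareto_frontier.jsonl
```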
## Quality Rules

- Do not optimize on top of broken reward logic.
- For weak deterministic checks, fix rubric quality before GEPA tuning.
- Keep model, sampling, and dataset conditions stable during baseline-vs-GEPA comparison.
- Report limitations directly when feature gaps block requested optimization.
## Deliverable

Return:

- Baseline metrics.
- Optimized metrics.
- Prompt diff summary.
- Recommendation to adopt, iterate, or stop.
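
For the prompt diff summary, a plain unified diff against the baseline usually suffices. A sketch, assuming the baseline system prompt was saved to a local file (`baseline_prompt.txt` is a hypothetical name; GEPA itself only writes `best_prompt.txt`):

```sh
# Unified diff: baseline system prompt vs. GEPA's best candidate.
diff -u baseline_prompt.txt best_prompt.txt
```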