trainer-synthesize

Updated April 10, 2026 at 15:57

Build official `evals/evals.json` cases and explicit APO datasets from grounded or computed source data. Use whenever a prompt or skill needs eval rows, `train.jsonl`, or `val.jsonl`, especially when correct outputs must be derived from raw fields, business rules, or verifier-backed synthetic examples.

Source

Tyler-R-Kendrick

Tyler-R-Kendrick/copilot-auto-training

name	trainer-synthesize
description	Build official `evals/evals.json` cases and explicit APO datasets from grounded or computed source data. Use whenever a prompt or skill needs eval rows, `train.jsonl`, or `val.jsonl`, especially when correct outputs must be derived from raw fields, business rules, or verifier-backed synthetic examples.
license	MIT
compatibility	Requires Python 3.11+. Produces official eval manifests plus explicit APO datasets for this repository.
metadata	{"author":"your-org","version":"0.1.0"}

Synthesize

Use this skill to turn source material into official skill eval manifests and explicit APO datasets.

In this repository, authored eval cases live in evals/evals.json, optional eval assets live in evals/files/, and APO training data lives in explicit datasets/train.jsonl and datasets/val.jsonl files. Work compute-first. If labels, expected outputs, assertions, or JSONL rows depend on derived values, plan the computation before writing them. Synthesize only after the ground truth is provided or derived.

When to use this skill

The workflow needs evals/evals.json for a markdown prompt or skill.
The workflow also needs explicit train.jsonl and val.jsonl files before handing off to a downstream optimizer.
The user has examples, CSV or JSON data, tables, schemas, business rules, or source notes from an approved research brief.
The agent should convert known source material into high-quality eval cases.
The agent should convert grounded source rows into explicit APO datasets without blurring them together with authored eval manifests.
The expected outputs are not directly written down and must be derived from raw fields.
The agent needs explicit transforms such as map, filter, reduce, grouping, joins, sorting, normalization, or rule evaluation before authoring eval rows.
The agent should add synthetic rows only after grounded or computed rows establish the target task pattern.

If no credible source material exists yet, use a primary-source research workflow first instead of guessing.

Inputs

prompt_file: target markdown prompt
task_description: short description of the real task the prompt should solve
scoring_rule: expected answer format or evaluation rule
Optional split_strategy: how to divide grounded rows into train and validation sets when APO datasets are required
Optional source material such as csv_file, JSON, tables, schemas, rule definitions, lookup tables, existing examples, or structured notes from an approved research brief

Resolve Before Writing

Resolve these inputs before authoring eval cases:

raw source fields
derived target values
transformation logic
output schema or answer format
dataset row shape and split boundary
constraints, edge cases, and disallowed shortcuts

If any of these are missing, ask for them before synthesizing. Do not invent hidden rules, thresholds, or labels.
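
The pre-flight check above can be made mechanical by recording each item and listing the ones still unresolved. This is a minimal sketch only: `REQUIRED_ITEMS`, its key names, and the `missing_inputs` helper are hypothetical and not part of this repository.

```python
# Hypothetical pre-flight check. The key names are illustrative labels for
# the six items listed above, not identifiers from this repository.
REQUIRED_ITEMS = (
    "raw_source_fields",
    "derived_target_values",
    "transformation_logic",
    "output_schema",
    "row_shape_and_split",
    "constraints_and_edge_cases",
)


def missing_inputs(resolved: dict[str, object]) -> list[str]:
    """Return the required items that are absent or empty in `resolved`."""
    return [key for key in REQUIRED_ITEMS if not resolved.get(key)]


# Example: only two of the six items have been supplied so far.
resolved = {
    "raw_source_fields": ["order_id", "amount", "region"],
    "output_schema": "one label per order",
}
questions = missing_inputs(resolved)
if questions:
    print("Ask before synthesizing:", ", ".join(questions))
```

Anything left in `questions` becomes a question for the caller rather than a value to guess.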

Output

Return, in order:

1. A synthesis plan
2. Missing-input questions, if any
3. The final evals/evals.json content and any evals/files/ assets
4. The final train.jsonl and val.jsonl content when APO datasets are required
5. A brief provenance and split note that explains which rows are grounded, which rows are synthetic, and how validation rows were held out

If the needed inputs are already present, state that the plan is fully satisfied and proceed. In this repository, producing only evals/evals.json is the exception; when the output is meant to feed an optimizer, treat train.jsonl and val.jsonl as part of the done state.

Synthesis Plan

Before writing any eval rows, build a short plan with these sections:

Target layout: derived evals/evals.json path, evals/files/ assets, explicit datasets/train.jsonl and datasets/val.jsonl paths, and any workspace directory implied by the prompt file.
Observed inputs: the fields directly present in the source data or prompt placeholders.
Derived outputs: the labels, scores, extracted facts, summaries, or structured answers that must be computed.
Computation recipe: the exact operations required to go from observed inputs to derived outputs. Use concrete language such as map, filter, reduce, group, join, normalize, rank, or rule-match.
Verification path: how each computed value will be checked independently.
Dataset split: how grounded rows, boundary cases, and verifier-backed synthetic rows will be distributed between train and validation without leakage.
Missing inputs: fields, rules, thresholds, label definitions, or source files that still need to be elicited.

For repeated or error-prone computation, prefer a script or deterministic transformation over manual mental arithmetic.
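
As an illustration of a computation recipe expressed as a deterministic transformation, the sketch below derives a label per customer with filter, group, reduce, and rule-match steps, then checks one value by a second route. The field names, the sample rows, and the 100.0 threshold are assumptions standing in for rules the caller would supply.

```python
from collections import defaultdict

# Illustrative source rows. Field names and values are made up for the sketch.
rows = [
    {"customer": "a", "amount": 40.0, "status": "paid"},
    {"customer": "a", "amount": 70.0, "status": "paid"},
    {"customer": "b", "amount": 30.0, "status": "refunded"},
    {"customer": "b", "amount": 20.0, "status": "paid"},
]
HIGH_VALUE_THRESHOLD = 100.0  # assumed business rule, supplied by the caller


def derive_labels(rows: list[dict], threshold: float) -> dict[str, str]:
    # filter: keep only paid orders
    paid = [r for r in rows if r["status"] == "paid"]
    # group + reduce: total paid amount per customer
    totals: dict[str, float] = defaultdict(float)
    for r in paid:
        totals[r["customer"]] += r["amount"]
    # rule-match: map each total to a label
    return {c: ("high" if t >= threshold else "standard") for c, t in totals.items()}


labels = derive_labels(rows, HIGH_VALUE_THRESHOLD)

# Verification path: recompute one total by a separate expression and compare.
check_a = sum(r["amount"] for r in rows if r["customer"] == "a" and r["status"] == "paid")
assert (check_a >= HIGH_VALUE_THRESHOLD) == (labels["a"] == "high")
```

Keeping the threshold as a named input makes it obvious when a rule is missing rather than silently invented.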

Use scripts/run_synthesize.py to derive the canonical target paths and prompt placeholders before writing files. If the source rows already exist in CSV form, use the repository JSONL generator utility to bootstrap explicit train.jsonl and val.jsonl files under the modern datasets/ layout.
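
The behavior of scripts/run_synthesize.py is not reproduced here. As a rough picture of what deriving target paths and placeholders involves, the sketch below assumes that evals/ and datasets/ sit next to the prompt file and that placeholders use `{name}` braces. Both are assumptions, and `derive_layout` and `find_placeholders` are hypothetical helper names.

```python
import string
from pathlib import Path


def derive_layout(prompt_file: str) -> dict[str, Path]:
    """Assumed layout: evals/ and datasets/ live beside the prompt file."""
    base = Path(prompt_file).parent
    return {
        "evals": base / "evals" / "evals.json",
        "files": base / "evals" / "files",
        "train": base / "datasets" / "train.jsonl",
        "val": base / "datasets" / "val.jsonl",
    }


def find_placeholders(template: str) -> list[str]:
    """List `{name}` placeholders in order of first appearance."""
    names: list[str] = []
    for _, field, _, _ in string.Formatter().parse(template):
        if field and field not in names:
            names.append(field)
    return names


# Hypothetical prompt path and template text, used only for illustration.
layout = derive_layout("skills/classify/prompt.md")
placeholders = find_placeholders("Classify {ticket_text} for {region}. Again: {region}")
```

A prompt that contains literal braces, such as inline JSON, would need a different parser than `string.Formatter`.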

Dataset contract

Keep authored eval cases and APO datasets separate:

evals/evals.json should contain realistic user prompts, human-readable expected_output, optional files, and objective assertions.
train.jsonl and val.jsonl should contain evaluator-facing rows such as input, expected, expected_json, reference, criteria, and scoring as needed by the task.
Do not leak evaluator-only fields into the prompt-visible input path.
Preserve prompt placeholders exactly. If the template uses multiple placeholders, emit structured JSONL input that satisfies those names.
Keep at least one grounded or independently verified row in both train and validation splits.
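
The contract above can be pictured with one authored eval case and one dataset row side by side. The shapes are illustrative: the keys follow the field names mentioned in this section, but the exact schema this repository expects, the `exact_match` scoring value, and the `leaked_fields` helper are assumptions.

```python
import json

# One authored eval case: user-like prompt, readable expected output,
# optional files, objective assertions.
eval_case = {
    "prompt": "Can you tell me which tier customer a falls into this month?",
    "expected_output": "Customer a is in the high tier.",
    "files": [],
    "assertions": ["The answer names the tier 'high'."],
}

# One APO dataset row: `input` is prompt-visible and keyed by placeholder
# name; the remaining keys are evaluator-only.
jsonl_row = {
    "input": {"customer": "a", "orders": "40.0, 70.0"},
    "expected": "high",
    "scoring": "exact_match",  # assumed scoring-mode name
}

EVALUATOR_ONLY = {"expected", "expected_json", "reference", "criteria", "scoring"}


def leaked_fields(row: dict) -> set[str]:
    """Evaluator-only keys that appear inside the prompt-visible input."""
    visible = row["input"]
    return EVALUATOR_ONLY & set(visible) if isinstance(visible, dict) else set()


assert not leaked_fields(jsonl_row)
line = json.dumps(jsonl_row, ensure_ascii=False)  # one JSON object per line
```

Writing each row with `json.dumps` keeps train.jsonl and val.jsonl to one object per line, which is what the JSONL format requires.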

Synthetic-data standard

Synthetic data is only acceptable here when it is grounded and verified. Use verifier-backed generation: draft candidate rows, then independently check them with deterministic rules, derived ground truth, or a stricter verifier before keeping them.

Apply that standard like this:

Prefer real or research-grounded source rows first.
Compute or verify the target answers independently from the generator whenever possible.
Use synthetic generation to expand coverage, not to replace missing truth conditions.
Reject candidate rows that cannot be traced back to a clear rule, source, or verification result.
Keep generator and verifier roles conceptually separate. A candidate example is not valid just because it sounds plausible.

Core rule: generate, verify, filter, then keep only high-confidence rows.
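
A minimal sketch of the generate, verify, filter loop follows. The candidate rows, the `origin` tag, and the threshold rule inside `verifier` are assumptions chosen for illustration. The point is only that the verifier recomputes the label from raw fields instead of trusting the generator.

```python
def verifier(row: dict) -> bool:
    """Independently recompute the label from raw fields (assumed rule)."""
    total = sum(row["amounts"])
    expected = "high" if total >= 100.0 else "standard"
    return row["label"] == expected


# Candidate rows as a generator might draft them; one label is wrong.
candidates = [
    {"amounts": [60.0, 45.0], "label": "high", "origin": "synthetic"},
    {"amounts": [99.99], "label": "high", "origin": "synthetic"},  # near-miss, mislabeled
    {"amounts": [100.0], "label": "high", "origin": "synthetic"},  # boundary case
]

# Keep only rows the verifier confirms; everything else is discarded.
kept = [row for row in candidates if verifier(row)]
rejected = [row for row in candidates if not verifier(row)]
```

The mislabeled near-miss sounds plausible but fails verification, so it never reaches the dataset.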

Process

1. Inspect the prompt placeholders and derive the official evals/evals.json target path and any evals/files/ assets.
2. Build the synthesis plan before authoring any rows. Identify observed fields, derived targets, the computation recipe, and the verification method.
3. If the computation depends on missing inputs, ask for them before continuing.
4. Start from grounded source material such as user rows, CSV input, tables, or an approved research brief.
5. Compute target values with explicit transforms such as map, filter, reduce, grouping, joins, normalization, and rule evaluation.
6. Keep grounded examples first; add synthetic rows only after the task pattern is established.
7. Verify every synthetic candidate independently and discard failures.
8. Fill coverage deliberately: edge cases, class balance, hard near-misses, null or missing inputs, malformed inputs, and realistic long-tail cases.
9. Convert the resulting tasks into official eval cases with realistic prompt, expected_output, optional files, and objective assertions.
10. Convert the same grounded task set into explicit train.jsonl and val.jsonl rows, keeping evaluator-only fields out of the prompt-visible path.
11. Split rows so validation remains a genuine holdout. Explain the split if the source set is small or skewed.
12. Produce evals/evals.json, any required evals/files/ assets, and explicit dataset files under the canonical layout next to the prompt or skill.
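
One way to keep validation a genuine holdout is to split by a group key rather than by row, so related rows never straddle the boundary. The sketch below is an assumption about how that could be done, not the repository's split logic: `assign_split`, `split_rows`, the `customer` key, and the 20 percent default are all hypothetical.

```python
import hashlib


def assign_split(group_key: str, val_percent: int = 20) -> str:
    """Deterministically map a group key to 'train' or 'val'."""
    digest = hashlib.sha256(group_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "val" if bucket < val_percent else "train"


def split_rows(rows: list[dict], key: str, val_percent: int = 20) -> dict[str, list[dict]]:
    """Send every row that shares a group key to the same split."""
    splits: dict[str, list[dict]] = {"train": [], "val": []}
    for row in rows:
        splits[assign_split(str(row[key]), val_percent)].append(row)
    return splits


# Illustrative rows: 40 orders spread over 10 customers.
rows = [{"customer": f"c{i % 10}", "order": i} for i in range(40)]
splits = split_rows(rows, key="customer")

train_keys = {r["customer"] for r in splits["train"]}
val_keys = {r["customer"] for r in splits["val"]}
assert train_keys.isdisjoint(val_keys)  # no group appears on both sides
```

Because the assignment depends only on the key, rerunning the split gives the same result. With a small source set a hash split can still leave one side empty or skewed, which is when the split needs an explicit explanation.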

Quality Rules

Keep eval prompts realistic and user-like, not schematic labels.
Keep JSONL rows faithful to the prompt interface and scoring mode.
Prefer small, diverse eval sets over large repetitive sets.
Make synthetic rows distributionally plausible for the task domain.
Include hard negatives and boundary cases, not just easy canonical examples.
Avoid leakage from the prompt wording into the expected answer when the real task would not provide that shortcut.
Keep assertions objective and observable. If a property is subjective, leave it for later human review.
Keep validation rows meaningfully different from training rows; do not create a split that only differs by superficial wording.
Never blur authored evals/evals.json fixtures together with train.jsonl or val.jsonl datasets.
Never hide unresolved ambiguity by writing overconfident expected outputs.
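
The rule about validation rows that differ only by superficial wording can be spot-checked with a simple token-overlap comparison. This is a rough heuristic sketch, not a repository utility: the helper names and the 0.8 threshold are assumptions, and token overlap will miss paraphrases.

```python
import re


def normalize(text: str) -> frozenset[str]:
    """Lowercase, drop punctuation, and return the set of word tokens."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Share of tokens the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicates(train: list[str], val: list[str], threshold: float = 0.8):
    """Pairs (train_text, val_text) whose token overlap meets the threshold."""
    return [
        (t, v)
        for t in train
        for v in val
        if jaccard(normalize(t), normalize(v)) >= threshold
    ]


train_prompts = ["Which tier is customer A in?"]
val_prompts = ["Which tier is customer A in??", "Summarize refunds for March."]
flagged = near_duplicates(train_prompts, val_prompts)
```

Flagged pairs are candidates for rewriting or moving into the same split. Deciding which is still a judgment call.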

Elicit If Missing

Ask for the missing items that block correct computation, such as:

field definitions and units
aggregation windows or grouping keys
label taxonomy and tie-breaking rules
normalization rules and text-cleaning rules
required scoring mode and dataset row fields
train or validation split constraints when the caller already has a benchmark plan
allowed reference files or lookup tables
examples of tricky failures or borderline cases

If none are missing, say so explicitly and continue.

Guardrails

Do not claim a downstream optimizer will infer or synthesize missing datasets later.
Do not route around missing rules by inventing labels, thresholds, or tie-breakers.
Do not add benchmark, leader-election, or optimizer-runtime behavior to this skill.
Do not emit empty dataset splits. If there is not enough trustworthy data for both train and validation, say so.
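
The last guardrail, together with the earlier requirement of at least one grounded or independently verified row per split, can be checked before any file is written. The sketch assumes each row carries a hypothetical `origin` tag; `TRUSTED_ORIGINS` and `split_problems` are illustrative names.

```python
TRUSTED_ORIGINS = {"grounded", "verified"}  # hypothetical provenance tags


def split_problems(splits: dict[str, list[dict]]) -> list[str]:
    """Describe why a train/val split should not be emitted, if at all."""
    problems = []
    for name in ("train", "val"):
        rows = splits.get(name, [])
        if not rows:
            problems.append(f"{name} split is empty")
        elif not any(r.get("origin") in TRUSTED_ORIGINS for r in rows):
            problems.append(f"{name} split has no grounded or verified row")
    return problems


# Example: validation has no rows, so the split should be reported, not written.
splits = {
    "train": [{"origin": "grounded"}, {"origin": "synthetic"}],
    "val": [],
}
problems = split_problems(splits)
if problems:
    print("Not enough trustworthy data:", "; ".join(problems))
```

An empty `problems` list means both splits can be written. Otherwise the message goes back to the caller instead of a padded dataset.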

Naming rationale

synthesize is the best public name here because it focuses on conversion and example generation after source material has already been gathered.