example-synthesizer
// Generate SFT training examples from raw sources using Self-Instruct / Evol-Instruct / SQuAD / STaR patterns
Generate Datasheet, Model Card, and Data Statement from a dataset manifest
Deterministically rebuild a dataset from its manifest and verify fixity equivalence
Create a versioned training dataset with manifest, fixity, provenance, and archive snapshot
End-to-end training dataset pipeline — acquire sources through publication
Detect training-eval overlap against benchmark sets before dataset publication
Convert canonical training examples to Alpaca format for training frameworks
| Field | Value |
|---|---|
| name | example-synthesizer |
| description | Generate SFT training examples from raw sources using Self-Instruct / Evol-Instruct / SQuAD / STaR patterns |
| namespace | training-complete |
| category | synthesis |
| platforms | ["claude","copilot","cursor","factory","windsurf","warp","codex","opencode","openclaw","hermes"] |
| commandHint | {"argumentHint":"<source-glob> [--pattern <name>] [--count <n>] [--temperature <t>] [--model <haiku|sonnet|opus>]"} |
Generate supervised fine-tuning (SFT) examples from raw ingested sources by delegating to the semantic-memory kernel and applying a chosen synthesis pattern. Produces fully-provenanced example records ready for preference generation, decontamination, and dataset versioning.
<source-glob> (required): A glob matching ingested source records or a seed example set (e.g., sources/*, examples/seed/*).
--pattern <self-instruct|evol-instruct|squad|star> (optional): Synthesis pattern to apply. Default: self-instruct.
--count <n> (optional): Target number of examples to generate per source. Default: 10.
--temperature <t> (optional): Generation temperature. Default: 0.7 (balance of diversity and coherence).
--model <haiku|sonnet|opus> (optional): Generator model. Default: sonnet per RLM cost guidance (sonnet is the recommended tier for synthesis: haiku loses coherence on complex patterns, and opus is cost-prohibitive at scale).
--seed <int> (optional): Random seed for reproducibility. Default: system time.
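
The argument list above can be mirrored in a small parser. This is a minimal sketch, assuming a Python front end built on argparse; the function names `build_parser` and `parse_args` are hypothetical, and deriving the default seed from `time.time()` is only one reading of "system time".

```python
import argparse
import time

# Choices and defaults are taken from the argument list above.
PATTERNS = ["self-instruct", "evol-instruct", "squad", "star"]
MODELS = ["haiku", "sonnet", "opus"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="example-synthesizer")
    parser.add_argument("source_glob", metavar="<source-glob>",
                        help="glob matching ingested sources or a seed example set")
    parser.add_argument("--pattern", choices=PATTERNS, default="self-instruct")
    parser.add_argument("--count", type=int, default=10,
                        help="target number of examples per source")
    parser.add_argument("--temperature", type=float, default=0.7)
    parser.add_argument("--model", choices=MODELS, default="sonnet")
    parser.add_argument("--seed", type=int, default=None)
    return parser


def parse_args(argv):
    """Parse argv and fill in the time-based seed when none is given."""
    args = build_parser().parse_args(argv)
    if args.seed is None:
        # Assumption: "system time" is interpreted as whole seconds since the epoch.
        args.seed = int(time.time())
    return args
```

Resolving the seed at parse time, rather than leaving it unset, means the value actually used can be recorded in the run log and in per-example metadata.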
| Pattern | Reference | Mechanism |
|---|---|---|
| Self-Instruct | REF-375 | Bootstrap new instruction/response pairs from a small pool of seed examples by prompting the generator to produce novel-but-similar tasks |
| Evol-Instruct | — | Apply depth evolution (add constraints, deepen reasoning) and breadth evolution (change topic, rephrase) to existing examples |
| SQuAD-style | REF-454 | Extract span-grounded question/answer pairs from document sources; each answer cites a specific passage |
| STaR | REF-445 | Augment existing examples with chain-of-thought reasoning traces; rationales are filtered by whether they yield the correct answer |
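
To make the STaR row concrete, here is a minimal sketch of the filtering step, assuming each generated rationale ends in a predicted answer that can be compared with a known reference answer. The `Rationale` record, the `normalize` helper, and exact-match comparison are illustrative assumptions, not the skill's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Rationale:
    """A generated chain-of-thought trace for one example (hypothetical record)."""
    example_id: str
    reasoning: str
    predicted_answer: str


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not matter."""
    return " ".join(answer.strip().lower().split())


def filter_rationales(rationales, gold_answers):
    """Keep only rationales whose predicted answer matches the reference answer.

    gold_answers maps example_id -> reference answer. Rationales for examples
    without a reference answer are dropped, since they cannot be verified.
    """
    kept = []
    for rationale in rationales:
        gold = gold_answers.get(rationale.example_id)
        if gold is not None and normalize(rationale.predicted_answer) == normalize(gold):
            kept.append(rationale)
    return kept
```

Exact match after normalization is the simplest possible check; tasks with free-form answers would need a more tolerant comparison.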
1. Resolve the source glob through the memory-ingest consumer interface; load records into generator context.
2. Validate --pattern against source type (SQuAD requires document sources; STaR requires existing I/O pairs).
3. Generate count × len(sources) candidates.
4. Run each candidate through example-quality-assess for GRADE rating.
5. Tag each example with synthetic: true, synthetic_depth: 1 (or source depth + 1 for evolved examples), and per-example metadata (see below).
6. Write accepted examples to the derivedPages.synthesizedExamples collection in the training-complete memory consumer.
7. Log the run via memory-log-append with op synthetic-generate, including pattern, count, model, temperature, seed, acceptance rate.
8. Write reports/synthesis-<timestamp>.md with pattern, input/output counts, quality distribution, and quarantine pointers.

Every synthesized example carries:
metadata:
  synthetic: true
  synthetic_depth: 1             # incremented if derived from another synthetic example
  seeds_used: [ex-abc, ex-def]   # IDs of source/seed records
  generator_agent: example-synthesizer
  model: sonnet
  temperature: 0.7
  pattern: self-instruct
  seed: 42
This lineage feeds grade-on-ingest, decontamination-check, and dataset versioning downstream.
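
The pattern validation, candidate count, and metadata rules described above can be sketched as follows. This is an illustration under stated assumptions: the source-type labels `document` and `io_pair`, the record layout (dicts with `id` and an optional `synthetic_depth`), treating raw sources as depth 0, and taking the deepest parent when several seeds are used are all hypothetical choices, not confirmed behavior.

```python
# Assumption: source records carry a type label; these label names are hypothetical.
REQUIRED_SOURCE_TYPE = {
    "squad": "document",  # SQuAD requires document sources
    "star": "io_pair",    # STaR requires existing I/O pairs
}


def validate_pattern(pattern, source_types):
    """Raise ValueError if the chosen pattern cannot run on the given source types."""
    required = REQUIRED_SOURCE_TYPE.get(pattern)
    if required is None:
        return  # no source-type constraint is documented for this pattern
    mismatched = sorted({t for t in source_types if t != required})
    if mismatched:
        raise ValueError(
            f"pattern {pattern!r} requires {required!r} sources, got {mismatched}"
        )


def candidate_count(count, num_sources):
    """Number of candidates to generate: count x len(sources)."""
    return count * num_sources


def build_metadata(seed_records, *, pattern, model, temperature, seed):
    """Build the per-example metadata block for one synthesized example."""
    # Raw sources are treated as depth 0 (assumption), so first-generation
    # synthetic examples get depth 1 and evolved examples get parent depth + 1.
    parent_depth = max(
        (record.get("synthetic_depth", 0) for record in seed_records), default=0
    )
    return {
        "synthetic": True,
        "synthetic_depth": parent_depth + 1,
        "seeds_used": [record["id"] for record in seed_records],
        "generator_agent": "example-synthesizer",
        "model": model,
        "temperature": temperature,
        "pattern": pattern,
        "seed": seed,
    }
```

For instance, seeding from two raw records `ex-abc` and `ex-def` gives `synthetic_depth: 1`, while evolving an example that already has depth 1 gives depth 2.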
Generated candidates are NOT accepted automatically. Each runs through example-quality-assess:
- Examples at or above --min-grade (default MODERATE) land in derivedPages.synthesizedExamples.
- Examples below the threshold go to derivedPages.synthesizedQuarantine for human review (per human-authorization rule: no auto-delete).
- synthetic-generate-error.
- decontamination-check.

# Bootstrap 50 examples per seed using Self-Instruct
example-synthesizer "examples/seed/*" --pattern self-instruct --count 50
# Deepen existing examples via Evol-Instruct with higher creativity
example-synthesizer "examples/raw/*" --pattern evol-instruct --temperature 0.9 --count 5
# Extract Q&A from ingested papers with reproducible seed
example-synthesizer "sources/papers/*" --pattern squad --count 20 --seed 42 --model opus
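
The acceptance gate described earlier, where each candidate is graded by example-quality-assess and then either accepted or quarantined, can be sketched as below. The grade ladder (VERY_LOW, LOW, MODERATE, HIGH) follows the usual GRADE levels and is an assumption here, as are the `route` and `acceptance_rate` helpers and the candidate record shape.

```python
# Assumption: GRADE ratings ordered from weakest to strongest.
GRADE_ORDER = ["VERY_LOW", "LOW", "MODERATE", "HIGH"]

ACCEPTED = "derivedPages.synthesizedExamples"
QUARANTINE = "derivedPages.synthesizedQuarantine"


def route(candidates, min_grade="MODERATE"):
    """Split graded candidates into accepted and quarantined collections.

    Each candidate is a dict with at least a "grade" key. Nothing is deleted:
    candidates below the threshold are kept in quarantine for human review.
    """
    threshold = GRADE_ORDER.index(min_grade)
    routed = {ACCEPTED: [], QUARANTINE: []}
    for candidate in candidates:
        if GRADE_ORDER.index(candidate["grade"]) >= threshold:
            routed[ACCEPTED].append(candidate)
        else:
            routed[QUARANTINE].append(candidate)
    return routed


def acceptance_rate(routed):
    """Fraction of candidates accepted, as it might appear in the run log entry."""
    total = len(routed[ACCEPTED]) + len(routed[QUARANTINE])
    return len(routed[ACCEPTED]) / total if total else 0.0
```

Keeping rejected candidates in a separate collection, rather than dropping them, matches the no-auto-delete rule and keeps the quarantine pointers in the report meaningful.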
- @agentic/code/addons/semantic-memory/skills/memory-ingest/SKILL.md
- @agentic/code/frameworks/training-complete/skills/example-quality-assess/SKILL.md
- synthetic-data-generator agent
- @agentic/code/addons/semantic-memory/skills/memory-log-append/SKILL.md
- ADR-022 D10