format-adapter-alpaca
// Convert canonical training examples to Alpaca format for training frameworks
| name | format-adapter-alpaca |
| description | Convert canonical training examples to Alpaca format for training frameworks |
| namespace | training-complete |
| category | format |
| platforms | ["claude","copilot","cursor","factory","windsurf","warp","codex","opencode","openclaw","hermes"] |
| commandHint | {"argumentHint":"<input-glob> [--output <path>] [--validate-round-trip]"} |
Convert canonical training example records (@agentic/code/frameworks/training-complete/schemas/example-record.yaml) into Alpaca-format JSONL for downstream SFT training frameworks. Alpaca is the original Stanford self-instruct format and remains widely supported by trainers like Axolotl, LLaMA-Factory, and Unsloth.
- `<input-glob>` (required): glob of canonical records (e.g., `examples/raw/*.json`)
- `--output <path>` (optional): output JSONL path. Default: `.aiwg/training/exports/alpaca-<timestamp>.jsonl`
- `--validate-round-trip` (optional): reload output and diff against canonical invariants before succeeding

One JSON object per line with fields {instruction, input, output}:
{"instruction": "You are a helpful assistant.", "input": "Explain photosynthesis in one sentence.", "output": "Photosynthesis is the process by which plants convert sunlight, water, and CO2 into glucose and oxygen."}
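As a rough illustration of the line format above, the following sketch parses one JSONL line and checks that the three Alpaca fields are present as strings. The helper name `parse_alpaca_line` and the `ALPACA_FIELDS` constant are hypothetical and not part of this skill.

```python
import json

# The three fields named in the format description above.
ALPACA_FIELDS = ("instruction", "input", "output")


def parse_alpaca_line(line: str) -> dict:
    """Parse one JSONL line and check it looks like an Alpaca row.

    Hypothetical helper for illustration; raises ValueError on a bad row.
    """
    row = json.loads(line)
    if not isinstance(row, dict):
        raise ValueError("each line must be a JSON object")
    for field in ALPACA_FIELDS:
        # Every field must be present and must be a string ("" is allowed here).
        if not isinstance(row.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    return row
```

This only checks shape. Whether `instruction` and `output` are non-empty is a separate check.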
1. Validate each record against `example-record.yaml`; reject invalid records.
2. Map fields:
   - `instruction` ← `input.system` (fall back to `input.user` if there is no system prompt)
   - `input` ← `input.user` (empty string `""` if `input.system` was empty and `input.user` was promoted to `instruction`)
   - `output` ← `output.assistant`
3. Check that `instruction` and `output` are non-empty; reject preference/tool_use records (not representable in Alpaca; route them to the sharegpt/chatml adapter).
4. Round-trip check (with `--validate-round-trip`): parse the output back and confirm that the canonical invariants (`id`, `task_type`, `input.user`, `output.assistant`, `quality_grade`, `license`, `provenance_id`) survive via the sidecar.
5. Log a `format-convert` event via `memory-log-append`.

Of the canonical invariants, the Alpaca fields cover only `input.user` and `output.assistant`. All other invariant fields (`id`, `task_type`, `quality_grade`, `license`, `provenance_id`) are preserved via the sidecar.
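The field mapping and rejection rules above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `to_alpaca` is hypothetical, records are assumed to be already schema-validated dicts, and it assumes that preference and tool_use records are identified by their `task_type` value.

```python
from typing import Optional

# Assumption: unrepresentable records are recognised by task_type.
UNSUPPORTED_TASK_TYPES = {"preference", "tool_use"}


def to_alpaca(record: dict) -> Optional[dict]:
    """Map one canonical record to an Alpaca row, or return None to reject it."""
    if record.get("task_type") in UNSUPPORTED_TASK_TYPES:
        return None  # should be routed to the sharegpt/chatml adapter instead

    inp = record.get("input", {})
    system = inp.get("system") or ""
    user = inp.get("user") or ""
    assistant = record.get("output", {}).get("assistant") or ""

    if system:
        instruction, alpaca_input = system, user
    else:
        # No system prompt: promote the user turn to instruction, leave input empty.
        instruction, alpaca_input = user, ""

    # instruction and output must both be non-empty.
    if not instruction or not assistant:
        return None
    return {"instruction": instruction, "input": alpaca_input, "output": assistant}
```

Returning `None` for a rejected record keeps the caller free to count rejections or reroute the record, rather than raising on every unsupported input.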
The sidecar is written alongside the output as `<output>.metadata.yaml`. It contains a list keyed by line number with: `id`, `task_type`, `metadata.*`, `output.reasoning_trace`, `output.tool_calls`, `input.context_refs`, and `input.tools_available`. Reasoning traces and tool calls cannot be represented in Alpaca, so they always go to the sidecar.
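One way to picture the sidecar contents is the sketch below, which collects the listed fields for each exported line. The function names, the flat key names inside each entry, and the 1-based line numbering are assumptions for illustration; the actual sidecar layout is not specified here. Serializing the result to YAML is left out, since it would need a YAML library.

```python
from typing import List


def sidecar_entry(line_number: int, record: dict) -> dict:
    """Collect the fields that the Alpaca row for this line cannot carry."""
    inp = record.get("input", {})
    out = record.get("output", {})
    return {
        "line": line_number,  # assumed key name
        "id": record.get("id"),
        "task_type": record.get("task_type"),
        "metadata": record.get("metadata", {}),
        "reasoning_trace": out.get("reasoning_trace"),
        "tool_calls": out.get("tool_calls"),
        "context_refs": inp.get("context_refs"),
        "tools_available": inp.get("tools_available"),
    }


def build_sidecar(records: List[dict]) -> List[dict]:
    """Build one entry per exported record, in output order.

    Assumption: line numbers start at 1 and follow the JSONL line order.
    """
    return [sidecar_entry(n, r) for n, r in enumerate(records, start=1)]
```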
- `--validate-round-trip` reconstructs canonical invariants 100% from (JSONL + sidecar).
- `format-convert` event is logged with input count, output count, and rejection count.

@agentic/code/addons/semantic-memory/skills/memory-log-append/SKILL.md: logging the `format-convert` event
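The three counts in the logged event could be tallied during conversion along these lines. This is only a sketch: `convert_all` and the event key names are hypothetical, the `convert` argument stands for any per-record mapper that returns `None` on rejection, and the actual call into `memory-log-append` is not shown because its interface is not described here.

```python
import json
from typing import Callable, Iterable, List, Optional, Tuple


def convert_all(
    records: Iterable[dict],
    convert: Callable[[dict], Optional[dict]],
) -> Tuple[List[str], dict]:
    """Convert records to JSONL lines and tally counts for the log event."""
    lines: List[str] = []
    total = 0
    rejected = 0
    for record in records:
        total += 1
        row = convert(record)
        if row is None:
            rejected += 1
            continue
        # One JSON object per line; keep non-ASCII text readable.
        lines.append(json.dumps(row, ensure_ascii=False))
    event = {
        "event": "format-convert",  # key names here are assumptions
        "input_count": total,
        "output_count": len(lines),
        "rejection_count": rejected,
    }
    return lines, event
```

By construction, `input_count` equals `output_count + rejection_count`, which gives a cheap consistency check on the logged event.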