format-adapter-chatml
// Convert canonical training examples to ChatML format for training frameworks
| name | format-adapter-chatml |
| description | Convert canonical training examples to ChatML format for training frameworks |
| namespace | training-complete |
| category | format |
| platforms | ["claude","copilot","cursor","factory","windsurf","warp","codex","opencode","openclaw","hermes"] |
| commandHint | {"argumentHint":"<input-glob> [--output <path>] [--validate-round-trip]"} |
Convert canonical training example records (@agentic/code/frameworks/training-complete/schemas/example-record.yaml) into ChatML / OpenAI messages format — the native structure used by OpenAI fine-tuning, most modern chat models, and HuggingFace apply_chat_template.
- `SFTTrainer` and a ChatML tokenizer template
- `tool_calls` structure without serialization losses

Arguments:

- `<input-glob>` (required): glob of canonical records
- `--output <path>` (optional): default `.aiwg/training/exports/chatml-<timestamp>.jsonl`
- `--validate-round-trip` (optional): reload output and verify invariants

One JSON object per line containing a `messages` array with typed roles:
{"messages": [{"role": "system", "content": "You are helpful."}, {"role": "user", "content": "What time is it?"}, {"role": "assistant", "content": null, "tool_calls": [{"id": "t1", "type": "function", "function": {"name": "now", "arguments": "{}"}}]}, {"role": "tool", "tool_call_id": "t1", "content": "12:00"}]}
Roles: system | user | assistant | tool. Native tool_calls on assistant messages.
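A minimal sketch of how one exported line could be checked against the role list above. The function name `check_chatml_line` and the rule that every `tool_call_id` must match an earlier assistant tool call are assumptions made for illustration, not the skill's documented invariants.

```python
import json

# Roles named in the format description above.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def check_chatml_line(line):
    """Return a list of problems found in one JSONL line (empty list means OK)."""
    record = json.loads(line)
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' array"]

    problems = []
    known_call_ids = set()
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {index}: unexpected role {role!r}")
        elif role == "assistant":
            # Remember the ids of native tool calls so tool messages can refer to them.
            for call in message.get("tool_calls") or []:
                known_call_ids.add(call.get("id"))
        elif role == "tool" and message.get("tool_call_id") not in known_call_ids:
            # ASSUMPTION: a tool message should answer an earlier assistant tool call.
            problems.append(f"message {index}: tool_call_id matches no earlier tool call")
    return problems


good_line = json.dumps({"messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "What time is it?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "t1", "type": "function",
         "function": {"name": "now", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "t1", "content": "12:00"},
]})
bad_line = json.dumps({"messages": [
    {"role": "user", "content": "hi"},
    {"role": "tool", "tool_call_id": "t9", "content": "12:00"},
]})

print(check_chatml_line(good_line))  # []
print(check_chatml_line(bad_line))
```

A check like this is cheap to run over every line before handing the file to a trainer, since a dangling `tool_call_id` is easy to introduce and hard to spot by eye.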
`messages` array:

- `input.system` → `{role: "system", content: ...}` (if present)
- `input.user` → `{role: "user", content: ...}`
- `output.assistant` → `{role: "assistant", content: ...}` with native `tool_calls` attached
- tool results → `{role: "tool", tool_call_id, content}`

With `--validate-round-trip`, rebuild the canonical record and verify invariants. Log the `format-convert` event.

ChatML preserves `input.system`, `input.user`, `output.assistant`, and `output.tool_calls` natively. Preserved via sidecar: `id`, `task_type`, full `metadata`, and `output.reasoning_trace` (ChatML has no first-class CoT field, so reasoning lives in the sidecar unless `<thinking>` tags are used).
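The field mapping above can be sketched as a small function. This is an illustration under stated assumptions, not the skill's actual implementation: the name `to_chatml` is hypothetical, and so is the `output.tool_results` field, since the source does not say where tool results live in the canonical record.

```python
import json


def to_chatml(record):
    """Map one canonical record (a dict) to a ChatML {"messages": [...]} object."""
    inp = record.get("input", {})
    out = record.get("output", {})
    messages = []

    # input.system is optional and only emitted when present.
    if inp.get("system"):
        messages.append({"role": "system", "content": inp["system"]})

    messages.append({"role": "user", "content": inp["user"]})

    # output.assistant becomes the assistant turn; tool calls stay native.
    assistant = {"role": "assistant", "content": out.get("assistant")}
    if out.get("tool_calls"):
        assistant["tool_calls"] = out["tool_calls"]
    messages.append(assistant)

    # ASSUMPTION: tool results sit in a hypothetical output.tool_results list
    # of {"tool_call_id": ..., "content": ...} entries.
    for result in out.get("tool_results", []):
        messages.append({
            "role": "tool",
            "tool_call_id": result["tool_call_id"],
            "content": result["content"],
        })

    return {"messages": messages}


record = {
    "id": "ex-001",
    "input": {"system": "You are helpful.", "user": "What time is it?"},
    "output": {
        "assistant": None,
        "tool_calls": [{"id": "t1", "type": "function",
                        "function": {"name": "now", "arguments": "{}"}}],
        "tool_results": [{"tool_call_id": "t1", "content": "12:00"}],
    },
}

# One JSON object per line, as the output format requires.
line = json.dumps(to_chatml(record))
print(line)
```

Fields such as `id` are deliberately not copied into the ChatML object here; per the text above they belong in the sidecar.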
<output>.metadata.yaml holds per-line: id, task_type, full metadata.*, output.reasoning_trace, and any context_refs / tools_available schemas that were not inlined into messages.
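One way to picture the sidecar's role in a round trip is sketched below. The helper names `split_sidecar` and `rebuild` are hypothetical, the sidecar entry is kept as a plain dict (writing it out as YAML is omitted), and tool messages, `context_refs`, and `tools_available` are not covered by this sketch.

```python
# Top-level canonical fields that ChatML cannot carry, per the description above.
SIDECAR_FIELDS = ("id", "task_type", "metadata")


def split_sidecar(record):
    """Collect the fields of one canonical record that go to the sidecar."""
    entry = {key: record[key] for key in SIDECAR_FIELDS if key in record}
    trace = record.get("output", {}).get("reasoning_trace")
    if trace is not None:
        entry["reasoning_trace"] = trace
    return entry


def rebuild(messages, sidecar_entry):
    """Rebuild a canonical record from ChatML messages plus its sidecar entry."""
    record = {key: sidecar_entry[key] for key in SIDECAR_FIELDS if key in sidecar_entry}
    inp, out = {}, {}
    for message in messages:
        role = message["role"]
        if role == "system":
            inp["system"] = message["content"]
        elif role == "user":
            inp["user"] = message["content"]
        elif role == "assistant":
            out["assistant"] = message["content"]
            if "tool_calls" in message:
                out["tool_calls"] = message["tool_calls"]
        # Tool messages are intentionally not handled in this sketch.
    if "reasoning_trace" in sidecar_entry:
        out["reasoning_trace"] = sidecar_entry["reasoning_trace"]
    record["input"] = inp
    record["output"] = out
    return record


original = {
    "id": "ex-002",
    "task_type": "qa",
    "metadata": {"source": "example"},
    "input": {"system": "You are helpful.", "user": "What is 2 + 2?"},
    "output": {"assistant": "4", "reasoning_trace": "2 + 2 = 4"},
}
messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]

sidecar_entry = split_sidecar(original)
# The round trip holds when messages plus sidecar reproduce the original record.
print(rebuild(messages, sidecar_entry) == original)  # True
```

Comparing the rebuilt record to the original with plain equality is the simplest possible invariant; a real validator would likely check more specific properties.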
- `tool_use` records round-trip without loss (native `tool_calls` used).
- `--validate-round-trip` reconstructs all canonical invariants.
- `format-convert` event logged with input/output/rejection counts.

- @agentic/code/addons/semantic-memory/skills/memory-log-append/SKILL.md: logging the `format-convert` event