
format-adapter-chatml

Name: Format Adapter Chatml
Author: jmagly

// Convert canonical training examples to ChatML format for training frameworks


$ git log --oneline --stat

stars: 1

forks: 1

updated: April 15, 2026 at 06:11

SKILL.md


name	format-adapter-chatml
description	Convert canonical training examples to ChatML format for training frameworks
namespace	training-complete
category	format
platforms	["claude","copilot","cursor","factory","windsurf","warp","codex","opencode","openclaw","hermes"]
commandHint	{"argumentHint":"<input-glob> [--output <path>] [--validate-round-trip]"}

format-adapter-chatml

Convert canonical training example records (@agentic/code/frameworks/training-complete/schemas/example-record.yaml) into ChatML / OpenAI messages format, the native structure used by OpenAI fine-tuning, by most modern chat models, and by Hugging Face's apply_chat_template.

When to Use

Fine-tuning on OpenAI-compatible APIs (gpt-4o-mini, etc.)
Training chat models with HuggingFace SFTTrainer and a ChatML tokenizer template
Preserving native tool_calls structure without serialization losses

Parameters

<input-glob> (required) — glob of canonical records
--output <path> (optional) — default: .aiwg/training/exports/chatml-<timestamp>.jsonl
--validate-round-trip (optional) — reload output and verify invariants
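
The three parameters above could be wired up as in the following minimal sketch. It is not the skill's actual entry point: the function names are hypothetical, and the layout of <timestamp> is not specified in this document, so the compact UTC stamp used here is an assumption.

```python
import argparse
from datetime import datetime, timezone


def build_parser() -> argparse.ArgumentParser:
    """CLI surface mirroring the three documented parameters."""
    parser = argparse.ArgumentParser(prog="format-adapter-chatml")
    parser.add_argument("input_glob", help="glob of canonical records")
    parser.add_argument("--output", default=None, help="output JSONL path")
    parser.add_argument("--validate-round-trip", action="store_true",
                        help="reload output and verify invariants")
    return parser


def default_output_path(now: datetime) -> str:
    # Assumption: the <timestamp> format is not given in the skill;
    # a compact UTC stamp is used here purely for illustration.
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return f".aiwg/training/exports/chatml-{stamp}.jsonl"


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse arguments and fill in the documented default for --output."""
    args = build_parser().parse_args(argv)
    if args.output is None:
        args.output = default_output_path(datetime.now(timezone.utc))
    return args


if __name__ == "__main__":
    print(parse_args(["records/*.yaml", "--validate-round-trip"]))
```

Resolving the default path after parsing, rather than in the argparse default, keeps the timestamp tied to the moment of the run instead of the moment the parser was built.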

Format Spec

One JSON object per line containing a messages array with typed roles:

{"messages": [{"role": "system", "content": "You are helpful."}, {"role": "user", "content": "What time is it?"}, {"role": "assistant", "content": null, "tool_calls": [{"id": "t1", "type": "function", "function": {"name": "now", "arguments": "{}"}}]}, {"role": "tool", "tool_call_id": "t1", "content": "12:00"}]}

Valid roles are system, user, assistant, and tool. Tool calls are carried natively in the tool_calls field of assistant messages.
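
To make the line format concrete, the sketch below parses the example line shown above with Python's standard json module, checks each role against the four allowed values, and lists the functions named in tool_calls. The helper names are hypothetical and are not part of the skill.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def read_chatml_line(line: str) -> list[dict]:
    """Parse one JSONL line and return its messages, checking the role enum."""
    record = json.loads(line)
    messages = record["messages"]
    for message in messages:
        if message["role"] not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {message['role']!r}")
    return messages


def tool_call_names(messages: list[dict]) -> list[str]:
    """Collect function names from native tool_calls on any message."""
    names = []
    for message in messages:
        # "content": null messages may still carry tool_calls.
        for call in message.get("tool_calls") or []:
            names.append(call["function"]["name"])
    return names


# The example line from the format spec, split only for readability.
EXAMPLE = (
    '{"messages": ['
    '{"role": "system", "content": "You are helpful."}, '
    '{"role": "user", "content": "What time is it?"}, '
    '{"role": "assistant", "content": null, "tool_calls": [{"id": "t1", '
    '"type": "function", "function": {"name": "now", "arguments": "{}"}}]}, '
    '{"role": "tool", "tool_call_id": "t1", "content": "12:00"}]}'
)

messages = read_chatml_line(EXAMPLE)
print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'tool']
print(tool_call_names(messages))      # ['now']
```

Note that function.arguments is itself a JSON-encoded string ("{}" here), so it needs a second json.loads call if the argument values are required.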

Operation

Load canonical records — validate against schema.
Transform — build messages array:
- input.system → {role: "system", content: ...} (if present)
- input.user → {role: "user", content: ...}
- output.assistant → {role: "assistant", content: ...} with native tool_calls attached
- Tool results (from tool_use chains) → {role: "tool", tool_call_id, content}
Validate target — OpenAI messages schema: non-empty content OR non-empty tool_calls on assistant; valid role enum.
Round-trip check (if --validate-round-trip) — rebuild canonical record and verify invariants.
Write output + log — emit JSONL, write sidecar, append format-convert event.
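
The transform and validate steps above can be sketched as follows. This is an illustration under stated assumptions, not the skill's implementation: the canonical record is treated as a plain dict with input and output keys, and the tool_results key is hypothetical, since this document does not say where tool results live in the schema.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def to_messages(record: dict) -> list[dict]:
    """Build a messages array from a canonical record (assumed dict layout)."""
    inp, out = record["input"], record["output"]
    messages = []
    if inp.get("system"):
        messages.append({"role": "system", "content": inp["system"]})
    messages.append({"role": "user", "content": inp["user"]})

    assistant = {"role": "assistant", "content": out.get("assistant")}
    if out.get("tool_calls"):
        assistant["tool_calls"] = out["tool_calls"]  # attached natively
    messages.append(assistant)

    # Assumption: tool results sit under "tool_results"; the real schema may differ.
    for result in record.get("tool_results", []):
        messages.append({"role": "tool",
                         "tool_call_id": result["tool_call_id"],
                         "content": result["content"]})
    return messages


def validate_messages(messages: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the sequence passed."""
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: invalid role {role!r}")
        if role == "assistant" and not msg.get("content") and not msg.get("tool_calls"):
            problems.append(f"message {i}: assistant needs content or tool_calls")
    return problems


record = {
    "input": {"system": "You are helpful.", "user": "What time is it?"},
    "output": {"assistant": None,
               "tool_calls": [{"id": "t1", "type": "function",
                               "function": {"name": "now", "arguments": "{}"}}]},
    "tool_results": [{"tool_call_id": "t1", "content": "12:00"}],
}
messages = to_messages(record)
if not validate_messages(messages):
    print(json.dumps({"messages": messages}))  # one JSONL line
```

Returning a list of problems instead of raising makes it straightforward to reject a record with a logged reason, as the acceptance criteria require.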

Round-Trip Invariants

ChatML preserves input.system, input.user, output.assistant, and output.tool_calls natively. The sidecar preserves id, task_type, the full metadata, and output.reasoning_trace. ChatML has no first-class chain-of-thought (CoT) field, so the reasoning trace lives in the sidecar unless <thinking> tags are used.

Sidecar Metadata

<output>.metadata.yaml holds per-line: id, task_type, full metadata.*, output.reasoning_trace, and any context_refs / tools_available schemas that were not inlined into messages.
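
The split between natively preserved fields and sidecar fields can be illustrated with a small round-trip sketch. Everything here is an assumption made for illustration: the record layout, the helper names, and the flat sidecar entry shape are hypothetical. The sketch also leaves out tool results, context_refs, tools_available, and the YAML serialization itself.

```python
def to_sidecar_entry(record: dict) -> dict:
    """Collect the fields that the messages array cannot carry natively."""
    return {
        "id": record["id"],
        "task_type": record["task_type"],
        "metadata": record.get("metadata", {}),
        "reasoning_trace": record["output"].get("reasoning_trace"),
    }


def rebuild_record(messages: list[dict], sidecar: dict) -> dict:
    """Rebuild a canonical record from one messages array plus its sidecar entry."""
    by_role = {}
    for message in messages:
        by_role.setdefault(message["role"], message)  # first message per role

    inp = {"user": by_role["user"]["content"]}
    if "system" in by_role:
        inp["system"] = by_role["system"]["content"]

    assistant = by_role["assistant"]
    out = {"assistant": assistant.get("content")}
    if assistant.get("tool_calls"):
        out["tool_calls"] = assistant["tool_calls"]
    if sidecar.get("reasoning_trace") is not None:
        out["reasoning_trace"] = sidecar["reasoning_trace"]

    return {"id": sidecar["id"], "task_type": sidecar["task_type"],
            "metadata": sidecar["metadata"], "input": inp, "output": out}


original = {
    "id": "ex-001",
    "task_type": "tool_use",
    "metadata": {"source": "synthetic"},
    "input": {"system": "You are helpful.", "user": "What time is it?"},
    "output": {"assistant": None,
               "tool_calls": [{"id": "t1", "type": "function",
                               "function": {"name": "now", "arguments": "{}"}}],
               "reasoning_trace": "Need the clock tool."},
}
messages = [
    {"role": "system", "content": original["input"]["system"]},
    {"role": "user", "content": original["input"]["user"]},
    {"role": "assistant", "content": None,
     "tool_calls": original["output"]["tool_calls"]},
]
rebuilt = rebuild_record(messages, to_sidecar_entry(original))
print(rebuilt == original)  # True
```

Comparing the rebuilt record with the original by plain equality is the simplest form of the round-trip check; a real implementation would likely compare only the fields named as invariants.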

Acceptance Criteria

Every canonical record emits a valid ChatML message sequence (or is rejected with a logged reason).
tool_use records round-trip without loss (native tool_calls used).
--validate-round-trip reconstructs all canonical invariants.
format-convert event logged with input/output/rejection counts.

References

REF-472 — ChatML / ShareGPT / OpenAI messages format comparison
ADR-022 D7 — canonical + adapter strategy

Delegation

@agentic/code/addons/semantic-memory/skills/memory-log-append/SKILL.md — logging the format-convert event


package.json

"author": "jmagly"

"repository": "jmagly/aiwg-training"


