azure-ml-llm-trainer
Train or fine-tune LLMs on Azure ML managed compute with TRL trainers. Uses direct trainer loops (SFT, DPO, RL) without relying on serverless APIs or Hugging Face infrastructure.
Generate synthetic and simulated datasets for evaluation and fine-tuning using Azure AI Foundry simulators. Create non-adversarial task data, adversarial safety data, and conversation datasets without manual data collection.
Evaluate generative AI applications and models locally or in the cloud using Azure AI Evaluation SDK. Measure quality, safety, and performance with built-in and custom evaluators.
| Field | Value |
|---|---|
| name | azure-ml-llm-trainer |
| description | Train or fine-tune LLMs on Azure ML managed compute with TRL trainers. Uses direct trainer loops (SFT, DPO, RL) without relying on serverless APIs or Hugging Face infrastructure. |
| license | See repository root |
This skill provides direct training on Azure ML managed compute using TRL trainers—an alternative to Azure AI Foundry's serverless fine-tuning APIs.
Four fine-tuning options in Azure AI Foundry:

- `create_finetuning_job()` for Phi, Mistral; no compute setup needed

Use this skill when:
These are templates in the `examples/` directory. Generate new files in your project based on these templates:

- `examples/submit_sft_job.py`: template for submitting SFT training jobs
- `examples/src/train_sft.py`: template for the SFT trainer entry point (TRL `SFTTrainer`)
- `examples/submit_dpo_job.py`: template for DPO training job submission
- `examples/src/train_dpo.py`: template for the DPO trainer entry point (TRL `DPOTrainer`)
- `examples/submit_rl_job.py`: template for RL/PPO training job submission
- `examples/src/train_rl.py`: template for the RL trainer entry point (TRL `PPOTrainer`)
- `examples/environment/conda.yml`: template for runtime dependencies (transformers, trl, datasets, torch)

Do NOT reference these files directly. Copy and adapt them for your project structure.
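Here is a minimal sketch of the "copy, don't reference" step. It assumes the skill's templates live under `examples/` in a local checkout. The destination names follow the SFT quick-start steps in this document. `copy_templates` and `SFT_TEMPLATE_MAP` are hypothetical names and are not part of the skill.

```python
import shutil
from pathlib import Path

# Hypothetical mapping: template path inside the skill -> destination in your project.
SFT_TEMPLATE_MAP = {
    "examples/submit_sft_job.py": "submit_training.py",
    "examples/src/train_sft.py": "src/train_sft.py",
    "examples/environment/conda.yml": "environment/conda.yml",
}


def copy_templates(skill_root: Path, project_root: Path, mapping: dict[str, str]) -> list[Path]:
    """Copy each template into the project so it can be adapted rather than referenced."""
    copied = []
    for src_rel, dst_rel in mapping.items():
        src = skill_root / src_rel
        dst = project_root / dst_rel
        if dst.exists():
            # Refuse to overwrite a file you may already have adapted.
            raise FileExistsError(f"{dst} already exists")
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies content and file metadata
        copied.append(dst)
    return copied
```

For example, `copy_templates(Path("path/to/skill"), Path("."), SFT_TEMPLATE_MAP)` copies the three SFT files into the current project. The function refuses to overwrite existing files so that a rerun cannot silently discard your edits.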
Quick start (SFT):

- Copy `examples/submit_sft_job.py` to your project as `submit_training.py`.
- Copy `examples/src/train_sft.py` to your project as `src/train_sft.py`.
- Copy `examples/environment/conda.yml` to your project as `environment/conda.yml`.
- Data: each JSONL record needs a `"messages"` field in chat-completion format, for example `{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}`. The trainer uses this field directly.
- Submit: `python submit_training.py --compute <compute-name> --data-path <azureml://.../dataset.jsonl> --model-name azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1`

DPO:

- Data: each record needs `"chosen"` and `"rejected"` fields, each holding chat-completion messages: `{"chosen": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}], "rejected": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}`.
- Hyperparameters: `beta` (default 0.1) controls the KL penalty; `l2_multiplier` (default 0.1) controls regularization.
- Submit: `python sample/submit_dpo_job.py --compute <compute-name> --data-path <azureml://.../dpo.jsonl> --model-name azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1 --beta 0.1 --l2_multiplier 0.1`

RL (PPO):

- Data: each record needs a `"prompt"` field (string). It may also include a `"reward"` field (float) as an explicit reward signal, for example `{"prompt": "user message", "reward": 0.5}`. If `"reward"` is missing, length-based reward shaping is used as a fallback.
- Submit: `python sample/submit_rl_job.py --compute <compute-name> --data-path <azureml://.../rl.jsonl> --model-name azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1`

General guidance:

- Models: use Azure ML registry URIs (for example `azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1`). Avoid Hugging Face downloads.
- Hyperparameters:
  - SFT: `batch_size`, `learning_rate` (default 2e-5), `n_epochs` (default 1), `seed`
  - DPO: `beta` (KL penalty, default 0.1), `l2_multiplier` (regularization, default 0.1)
  - RL: `ppo_epochs`, `learning_rate`, reward shaping via custom logic
- Data: reference datasets with `azureml://` URIs. Keep datasets in Azure; do not rely on external sources.

| Criterion | Direct Training (This Skill) | Serverless API | Managed Compute | OpenAI API |
|---|---|---|---|---|
| Control | Full (trainer config, callbacks) | Limited | UI-based | Limited |
| Cost Model | Per compute hour | Per training token | Per training token | Per training token |
| Setup | Requires compute cluster | Automatic | Automatic | N/A (Azure OpenAI) |
| Supported Methods | SFT, DPO, RL/PPO (TRL) | SFT (mostly) | SFT | SFT, DPO, RL with graders |
| SDK/Programmatic | Yes (full MLClient) | Yes (Python) | Minimal (mostly UI) | Yes (OpenAI SDK) |
| Best for | Experimentation, research, custom loss | Production quick-start | Production (non-devs) | Production OpenAI models |
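Below is a minimal, stdlib-only sketch for producing an SFT file in the `"messages"` chat-completion format described above, before you upload it to Azure ML. `validate_sft_record` and `write_sft_jsonl` are hypothetical helpers. Restricting roles to system/user/assistant is an assumption based on the example record; the actual template may accept others.

```python
import json
from pathlib import Path

# Assumption: only the roles shown in the example record are allowed.
VALID_ROLES = {"system", "user", "assistant"}


def validate_sft_record(record: dict) -> None:
    """Raise ValueError unless `record` has a non-empty "messages" list of role/content dicts."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError('record needs a non-empty "messages" list')
    for msg in messages:
        if (not isinstance(msg, dict)
                or msg.get("role") not in VALID_ROLES
                or not isinstance(msg.get("content"), str)):
            raise ValueError(f"bad message: {msg!r}")


def write_sft_jsonl(records: list[dict], path: Path) -> int:
    """Validate every record first, then write one JSON object per line (JSONL)."""
    for record in records:
        validate_sft_record(record)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

Validating everything before writing means a single malformed record never leaves a half-written file behind.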
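A DPO record pairs a preferred and a dispreferred response. The sketch below checks the `"chosen"`/`"rejected"` structure described above. Requiring both branches to share every turn except the final assistant reply is an assumption about well-formed preference data, not a documented rule of the template. `validate_dpo_record` and `parse_dpo_line` are hypothetical names.

```python
import json


def validate_dpo_record(record: dict) -> None:
    """Raise ValueError unless the record is a usable chosen/rejected preference pair."""
    for key in ("chosen", "rejected"):
        msgs = record.get(key)
        # Each branch needs at least the user prompt and one assistant reply.
        if not isinstance(msgs, list) or len(msgs) < 2:
            raise ValueError(f'"{key}" needs at least a user and an assistant message')
        if not all(isinstance(m, dict) and isinstance(m.get("content"), str) for m in msgs):
            raise ValueError(f'"{key}" messages need string "content" fields')
        if msgs[-1].get("role") != "assistant":
            raise ValueError(f'"{key}" must end with the assistant response')
    # Assumption: both branches answer the same prompt, so only the final turn may differ.
    if record["chosen"][:-1] != record["rejected"][:-1]:
        raise ValueError("chosen and rejected should share the same prompt turns")


def parse_dpo_line(line: str) -> dict:
    """Parse one JSONL line and validate it as a DPO preference record."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each JSONL line must be a JSON object")
    validate_dpo_record(record)
    return record
```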
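The following sketch illustrates what the DPO `beta` hyperparameter does. It implements the standard (sigmoid) DPO loss for a single preference pair, the default `loss_type` in TRL's `DPOTrainer`, from summed sequence log-probabilities under the policy and the frozen reference model. It does not model `l2_multiplier`, because that term is not part of the standard DPO objective and its exact use in the template is not described here. `dpo_pair_loss` is a hypothetical name.

```python
import math


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_pair_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                  ref_chosen_logp: float, ref_rejected_logp: float,
                  beta: float = 0.1) -> float:
    """Sigmoid DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    # Log-ratio of each response: how far the policy has moved from the reference model.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(z)) == softplus(-z)
    return _softplus(-logits)
```

When the policy equals the reference, the loss is log 2 for any `beta`. A larger `beta` sharpens the loss around the log-ratio margin. In the underlying objective, it also corresponds to a stronger KL penalty that keeps the policy closer to the reference model.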
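The RL data format carries an optional explicit reward, with length-based shaping as the fallback. The sketch below shows that control flow. The scoring in `length_reward` (word count divided by a hypothetical `target_words`, capped at 1.0) is an illustrative placeholder, not the template's actual formula. `parse_rl_line` and `reward_for` are also hypothetical names.

```python
import json
from typing import Optional


def parse_rl_line(line: str) -> tuple[str, Optional[float]]:
    """Return (prompt, explicit reward or None) from one JSONL line."""
    record = json.loads(line)
    prompt = record.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        raise ValueError('record needs a non-empty string "prompt"')
    reward = record.get("reward")
    if reward is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(reward, bool) or not isinstance(reward, (int, float)):
            raise ValueError('"reward" must be a number')
        reward = float(reward)
    return prompt, reward


def length_reward(response: str, target_words: int = 50) -> float:
    """Hypothetical fallback: reward grows with response length, capped at 1.0."""
    return min(len(response.split()) / target_words, 1.0)


def reward_for(explicit: Optional[float], response: str) -> float:
    """Prefer the dataset's reward; fall back to length-based shaping when it is missing."""
    return explicit if explicit is not None else length_reward(response)
```

A pure length reward pushes the policy toward longer answers. The cap limits that effect, but explicit rewards remain the more reliable signal.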
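The submit commands above share `--compute`, `--data-path` and `--model-name`, and the DPO command adds `--beta` and `--l2_multiplier`. The sketch below shows how a submit script might parse those flags and enforce the "keep data and models in Azure" guidance. The SFT flags `--learning_rate`, `--n_epochs` and `--seed` are hypothetical flag names built from the listed hyperparameters, and using the Phi-3 URI as a default is a design assumption. This is not the template's actual CLI.

```python
import argparse

DEFAULT_MODEL = "azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1"


def build_parser(method: str) -> argparse.ArgumentParser:
    """Sketch of a submit-script CLI for "sft" or "dpo"; flag set is partly hypothetical."""
    parser = argparse.ArgumentParser(description=f"Submit a {method.upper()} job to Azure ML")
    parser.add_argument("--compute", required=True, help="Azure ML compute cluster name")
    parser.add_argument("--data-path", required=True, help="azureml:// URI of the JSONL dataset")
    parser.add_argument("--model-name", default=DEFAULT_MODEL, help="azureml:// registry model URI")
    if method == "sft":
        # Hypothetical flag names; defaults follow the hyperparameter list.
        parser.add_argument("--learning_rate", type=float, default=2e-5)
        parser.add_argument("--n_epochs", type=int, default=1)
        parser.add_argument("--seed", type=int, default=None, help="unset unless provided")
    elif method == "dpo":
        parser.add_argument("--beta", type=float, default=0.1, help="KL penalty strength")
        parser.add_argument("--l2_multiplier", type=float, default=0.1, help="regularization weight")
    return parser


def validate_args(args: argparse.Namespace) -> None:
    """Reject data or model locations that are not azureml:// URIs."""
    for name in ("data_path", "model_name"):
        if not getattr(args, name).startswith("azureml://"):
            raise ValueError(f"--{name.replace('_', '-')} must be an azureml:// URI")
```

Checking URIs before submission catches an accidental Hugging Face model ID or public URL locally, before any compute is spent.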