name	azure-ml-llm-trainer
description	Train or fine-tune LLMs on Azure ML managed compute with TRL trainers. Uses direct trainer loops (SFT, DPO, RL) without relying on serverless APIs or Hugging Face infrastructure.
license	See repository root

Azure ML LLM Trainer

This skill provides direct training on Azure ML managed compute using TRL trainers—an alternative to Azure AI Foundry's serverless fine-tuning APIs.

Four fine-tuning options in Azure AI Foundry:

Serverless API (Foundry models) — Use create_finetuning_job() for Phi, Mistral; no compute setup needed
OpenAI API (OpenAI models) — Use OpenAI SDK with Azure endpoint for GPT-4o, GPT-4 Turbo
Managed Compute (Portal UI) — Web UI–driven fine-tuning with automatic compute provisioning; limited SDK
Direct Training (This Skill) — Run TRL trainers on your own Azure ML compute for full control and transparency

Use this skill when:

You need full control over training loops and hyperparameters
You want to use TRL (Transformer Reinforcement Learning) methods directly
You prefer running on your own compute resources (no vendor lock-in)
You want to experiment with advanced training techniques (LoRA, gradient checkpointing, etc.)

Template Files

These templates live in the examples/ directory. Generate new files in your project based on them:

examples/submit_sft_job.py — Template for submitting SFT training jobs
examples/src/train_sft.py — Template for SFT trainer entry point (TRL SFTTrainer)
examples/submit_dpo_job.py — Template for DPO training job submission
examples/src/train_dpo.py — Template for DPO trainer entry point (TRL DPOTrainer)
examples/submit_rl_job.py — Template for RL/PPO training job submission
examples/src/train_rl.py — Template for RL trainer entry point (TRL PPOTrainer)
examples/environment/conda.yml — Template for runtime dependencies (transformers, trl, datasets, torch)

Do NOT reference these files directly. Copy and adapt them for your project structure.

Quick start

Run az login, then set the environment variables AZURE_SUBSCRIPTION_ID, AZURE_RESOURCE_GROUP, and AZUREML_WORKSPACE_NAME.
Create training files in your project:
- Copy examples/submit_sft_job.py to your project as submit_training.py
- Copy examples/src/train_sft.py to your project as src/train_sft.py
- Copy examples/environment/conda.yml to your project as environment/conda.yml
Upload a JSONL dataset to a workspace datastore (workspaceblobstore or your own). SFT dataset format: each line must be a valid JSON object with a "messages" field in chat-completion format: {"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}. The trainer reads this field directly, so no further preprocessing is needed.
Ensure a compute target exists (GPU recommended, for example gpu-cluster).
Submit: python submit_training.py --compute <compute-name> --data-path <azureml://.../dataset.jsonl> --model-name azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1.
Monitor in Azure ML studio; trained weights land in the job output folder.
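Because the trainer consumes the "messages" field as-is, it can be worth checking a dataset locally before uploading it. The following is a minimal sketch using only the standard library; the function names and the restriction to the system, user, and assistant roles are assumptions for illustration, not part of the templates.

```python
from __future__ import annotations

import json

# Assumption: only the three roles shown in the format example are accepted.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_sft_line(line: str) -> list[str]:
    """Return the problems found in one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value is not an object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ['"messages" must be a non-empty list']

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has no string content")
    return problems


def validate_sft_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to their problems; blank lines are skipped."""
    report = {}
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if line.strip():
                problems = validate_sft_line(line)
                if problems:
                    report[number] = problems
    return report
```

Calling validate_sft_file("dataset.jsonl") returns an empty dict when every line passes; otherwise each key is a line number to fix before submitting the job.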

DPO quick start

Dataset format (JSONL): Each line must contain "chosen" and "rejected" fields with chat-completion format messages: {"chosen": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}], "rejected": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}.
Hyperparameters: beta (default 0.1) sets the strength of the KL penalty that keeps the policy close to the reference model; l2_multiplier (default 0.1) sets the regularization strength.
Submit from your adapted copy of examples/submit_dpo_job.py: python submit_dpo_job.py --compute <compute-name> --data-path <azureml://.../dpo.jsonl> --model-name azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1 --beta 0.1 --l2_multiplier 0.1.
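To make the role of beta concrete, here is a small sketch of the standard per-example DPO loss in plain Python. It takes summed log-probabilities of the chosen and rejected responses under the policy and the reference model. This is the textbook objective, not necessarily what the template computes; in particular, l2_multiplier is not part of the standard loss and is not modelled here. The function names are illustrative.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one preference pair.

    The margin measures how much more the policy favours the chosen
    response over the rejected one, relative to the reference model.
    beta scales that margin before it enters the log-sigmoid.
    """
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_shift - rejected_shift))
```

When the policy equals the reference model the margin is zero and the loss is log 2 regardless of beta. For a fixed positive margin, a larger beta gives a smaller loss, so the policy is rewarded more strongly for each unit it moves away from the reference in the preferred direction.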

RL (PPO-style) quick start

Dataset format (JSONL): Each line must contain a "prompt" field (string) and may contain a "reward" field (float) as an explicit reward signal, for example {"prompt": "user message", "reward": 0.5}. If reward is missing, the trainer falls back to length-based reward shaping.
Submit from your adapted copy of examples/submit_rl_job.py: python submit_rl_job.py --compute <compute-name> --data-path <azureml://.../rl.jsonl> --model-name azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1.
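The reward fallback described above might look roughly like the sketch below. The exact shaping formula is not specified here, so the linear word-count rule, the target_words parameter, and all function names are hypothetical placeholders to replace with your own logic.

```python
import json


def length_based_reward(response: str, target_words: int = 50) -> float:
    """Hypothetical shaping: grows linearly with word count, capped at 1.0."""
    if target_words <= 0:
        raise ValueError("target_words must be positive")
    return min(len(response.split()) / target_words, 1.0)


def resolve_reward(record: dict, response: str, target_words: int = 50) -> float:
    """Use the record's explicit reward if present, else the length fallback."""
    reward = record.get("reward")
    if reward is not None:
        return float(reward)
    return length_based_reward(response, target_words)


def load_rl_records(lines):
    """Parse JSONL lines, requiring a string "prompt" field on each record."""
    records = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if not isinstance(record, dict) or not isinstance(record.get("prompt"), str):
            raise ValueError(f'line {number}: "prompt" must be a string')
        records.append(record)
    return records
```

Checking for None rather than truthiness means an explicit reward of 0.0 is respected instead of silently triggering the fallback.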

Notes

Why direct training? Serverless APIs abstract away training details; direct training gives you full control over trainer config, callbacks, checkpointing, and custom loss functions.
Model source: Use fine-tuning-enabled base models from Azure AI Foundry model catalog (e.g., azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1). Avoid Hugging Face downloads.
Hyperparameters:
- SFT: batch_size, learning_rate (default 2e-5), n_epochs (default 1), seed
- DPO: Add beta (KL penalty, default 0.1), l2_multiplier (regularization, default 0.1)
- RL/PPO: ppo_epochs, learning_rate, reward shaping via custom logic
Data: Must be in Azure ML datastores as JSONL; referenced via azureml:// URIs. Keep datasets in Azure; do not rely on external sources.
Artifacts: Trained models are saved to the job output folder; register them as an Azure AI Foundry model for deployment or further fine-tuning.
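Putting these notes together, a submission script could be sketched as follows with the azure-ai-ml SDK (MLClient, command, Input, Output). This is an illustration under assumptions, not the contents of examples/submit_sft_job.py: the entry-point flags (--data-path, --model-path, --output-dir), the input and output names, and the model_type and image parameters are all hypothetical and must match your own train_sft.py and environment.

```python
import os


def build_train_command(hyperparams: dict) -> str:
    """Build the command line for a hypothetical train_sft.py entry point.

    The ${{...}} placeholders are resolved by Azure ML at run time.
    """
    parts = [
        "python train_sft.py",
        "--data-path ${{inputs.train_data}}",
        "--model-path ${{inputs.base_model}}",
        "--output-dir ${{outputs.model_dir}}",
    ]
    for name, value in sorted(hyperparams.items()):
        parts.append(f"--{name} {value}")
    return " ".join(parts)


def submit_sft_job(compute, data_path, model_name, image, hyperparams,
                   model_type="mlflow_model"):
    """Submit a command job; model_type and image are caller-supplied assumptions."""
    # Imported lazily so build_train_command works without the Azure SDK installed.
    from azure.ai.ml import Input, MLClient, Output, command
    from azure.ai.ml.entities import Environment
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
        resource_group_name=os.environ["AZURE_RESOURCE_GROUP"],
        workspace_name=os.environ["AZUREML_WORKSPACE_NAME"],
    )
    job = command(
        code="./src",
        command=build_train_command(hyperparams),
        inputs={
            "train_data": Input(type="uri_file", path=data_path),  # azureml:// URI
            "base_model": Input(type=model_type, path=model_name),
        },
        # Trained weights written here end up in the job output folder.
        outputs={"model_dir": Output(type="uri_folder")},
        environment=Environment(image=image, conda_file="environment/conda.yml"),
        compute=compute,
        display_name="sft-direct-training",
    )
    return ml_client.jobs.create_or_update(job)
```

For example, build_train_command({"learning_rate": 2e-5, "n_epochs": 1}) appends the SFT defaults listed above to the command line. Keeping the command builder separate from the SDK calls makes the hyperparameter wiring easy to check locally before spending compute hours.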

When to Use This vs Other Fine-Tuning Methods

Criterion	Direct Training (This Skill)	Serverless API	Managed Compute	OpenAI API
Control	Full (trainer config, callbacks)	Limited	UI-based	Limited
Cost Model	Per compute hour	Per training tokens	Per training tokens	Per training tokens
Setup	Requires compute cluster	Automatic	Automatic	N/A (Azure OpenAI)
Supported Methods	SFT, DPO, RL/PPO (TRL)	SFT (mostly)	SFT	SFT, DPO, RL with graders
SDK/Programmatic	Yes (full MLClient)	Yes (Python)	Minimal (mostly UI)	Yes (OpenAI SDK)
Best for	Experimentation, research, custom loss	Production quick-start	Production (non-devs)	Production OpenAI models

