train-dpo
// Direct Preference Optimization (DPO) fine-tuning with TRL's `DPOTrainer`. Triggered when the user wants to align a model on preference data (pairwise comparisons, chosen-vs-rejected pairs) or improve an existing SFT checkpoint with a preference dataset.
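
As a rough illustration of what "chosen-vs-rejected data" means here, the sketch below shows one preference record and the standard sigmoid DPO loss for a single pair. It is not the skill's own code: the record assumes TRL's explicit-prompt preference layout (`prompt`, `chosen`, `rejected`), the example texts are invented, and `dpo_pair_loss` is a hypothetical helper that takes already-summed sequence log-probabilities rather than calling a model.

```python
import math

# Assumed record layout: TRL's explicit-prompt preference format uses
# "prompt", "chosen", and "rejected" text fields. The texts are made up.
example_pair = {
    "prompt": "What is 2 + 2?",
    "chosen": "2 + 2 equals 4.",
    "rejected": "2 + 2 equals 5.",
}


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one pair, given summed sequence log-probabilities.

    Hypothetical helper for illustration; beta=0.1 is an assumed default.
    """
    # How much more strongly the policy prefers the chosen answer over the
    # rejected one, relative to the frozen reference model.
    margin = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)); for very negative x this is
    # approximately -x, which avoids overflow in exp().
    return math.log1p(math.exp(-x)) if x > -30 else -x


# A policy identical to the reference has zero margin.
baseline = dpo_pair_loss(-3.0, -3.0, -3.0, -3.0)
# A policy that has shifted probability toward the chosen answer.
improved = dpo_pair_loss(-1.0, -5.0, -3.0, -3.0)
```

With a zero margin the loss equals log 2, and it falls as the policy favors the chosen response more than the reference does. The `beta` parameter controls how strongly deviations from the reference are weighted.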