kl-trajectory-decoupling-llm-distillation

KL-Trajectory Decoupling methodology: a unified theoretical framework that decomposes LLM distillation into two orthogonal choices, the prefix distribution (what to condition on) and the trajectory distribution (how to generate responses). It shows that SFT, DAgger, Offline RL, and On-Policy Distillation (OPD) differ along these two axes. Use when: analyzing distillation methods; choosing between SFT, DAgger, OPD, and Offline RL; designing new distillation algorithms; understanding KL divergence in LLM fine-tuning; weighing RL vs. imitation learning trade-offs. Activation: KL trajectory decoupling, LLM distillation framework, SFT DAgger OPD comparison, prefix trajectory decomposition, distillation design space, on-policy off-policy distillation.
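The two choices can be made concrete with a toy example. The sketch below is not taken from the skill itself: the two-token vocabulary, the `TEACHER` and `STUDENT` tables, and the `mean_token_kl` helper are all hypothetical. It also assumes one possible reading of the second axis, namely which model's next-token distribution the KL expectation is taken under (forward vs. reverse KL); the skill's own definition may differ.

```python
import math
import random

VOCAB = ["a", "b"]

# Hypothetical toy policies: next-token probabilities depend only on the
# previous token (None means the empty prefix).
TEACHER = {None: [0.9, 0.1], "a": [0.8, 0.2], "b": [0.05, 0.95]}
STUDENT = {None: [0.5, 0.5], "a": [0.6, 0.4], "b": [0.5, 0.5]}


def next_token_probs(policy, prefix):
    """Return the policy's next-token distribution for a given prefix."""
    last = prefix[-1] if prefix else None
    return policy[last]


def sample_sequence(policy, length, rng):
    """Roll out `length` tokens from `policy`, one token at a time."""
    seq = []
    for _ in range(length):
        probs = next_token_probs(policy, seq)
        seq.append(rng.choices(VOCAB, weights=probs)[0])
    return seq


def kl(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_token_kl(prefix_policy, direction, n_samples=2000, length=4, seed=0):
    """Average per-token KL between teacher and student.

    prefix_policy: the model whose rollouts supply the prefixes (axis 1).
    direction: "forward" = KL(teacher || student),
               "reverse" = KL(student || teacher) (assumed stand-in for axis 2).
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n_samples):
        seq = sample_sequence(prefix_policy, length, rng)
        for t in range(length):
            prefix = seq[:t]
            p_teacher = next_token_probs(TEACHER, prefix)
            p_student = next_token_probs(STUDENT, prefix)
            if direction == "forward":
                total += kl(p_teacher, p_student)
            else:
                total += kl(p_student, p_teacher)
            count += 1
    return total / count


if __name__ == "__main__":
    for name, policy in [("teacher", TEACHER), ("student", STUDENT)]:
        for direction in ("forward", "reverse"):
            value = mean_token_kl(policy, direction)
            print(f"prefixes from {name}, {direction} KL: {value:.4f}")
```

Roughly, teacher-generated prefixes correspond to the off-policy setting and student-generated prefixes to the on-policy setting. The same per-token divergence gives a different number depending on which prefixes it is averaged over, which is why the prefix choice can be varied independently of the divergence.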

Install command
npx skills add https://github.com/hiyenwong/ai_collection --skill kl-trajectory-decoupling-llm-distillation

Copy and paste this command into Claude Code to install the skill

Stars: 1
Forks: 0
Updated: June 4, 2026 at 02:00