بنقرة واحدة
distilqwen
// DistilQwen2.5 — Alibaba's industrial practices for training distilled open lightweight language models. Knowledge distillation from Qwen2.5 72B into smaller 0.5B-7B models.
// DistilQwen2.5 — Alibaba's industrial practices for training distilled open lightweight language models. Knowledge distillation from Qwen2.5 72B into smaller 0.5B-7B models.
Prepare and validate Parameter Golf record folders: self-contained train_gpt.py, README.md, submission.json, FineWeb SP1024 BPB accounting, artifact-size logging, run logs, and PR-ready folder hygiene.
Run Parameter Golf competition submissions on RunPod GPU Pods. Covers required operator inputs, RunPod pod specs, FineWeb SP1024 data caching, record-folder hygiene, torchrun launch commands, monitoring, artifact-size checks, and result collection.
Hands-on implementation template and API reference for writing, tuning, debugging, and benchmarking Triton GPU kernels. Covers the full triton.language API surface, autotuning patterns, profiling workflows, and production integration.
Hands-on implementation template and API reference for writing, tuning, debugging, and benchmarking Triton GPU kernels. Covers the full triton.language API surface, autotuning patterns, profiling workflows, and production integration.
Tencent AngelSlim — accessible, comprehensive, and efficient toolkit for large model compression. Quantization (FP8/INT4/NVFP4/1.25-bit), pruning, speculative decoding (Eagle3), and diffusion model compression.
Intel Neural Compressor — SOTA low-bit LLM quantization (INT8/FP8/INT4/NVFP4), sparsity, pruning, and distillation for PyTorch, TensorFlow, and ONNX Runtime.
| name | distilqwen |
| description | DistilQwen2.5 — Alibaba's industrial practices for training distilled open lightweight language models. Knowledge distillation from Qwen2.5 72B into smaller 0.5B-7B models. |
| tags | ["distilqwen","knowledge-distillation","qwen","lightweight-llm","alibaba","zorai"] |
DistilQwen2.5 (Alibaba, arXiv:2504.15027) provides industrial practices for training distilled open lightweight LLMs. The approach distills Qwen2.5-72B-Instruct into smaller models (0.5B, 1.5B, 3B, 7B) with strong performance retention.
The distilled models are available on HuggingFace as distilqwen/distilqwen2.5-*-instruct and can be used directly:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("distilqwen/distilqwen2.5-1.5b-instruct")
tokenizer = AutoTokenizer.from_pretrained("distilqwen/distilqwen2.5-1.5b-instruct")