Name: Knowledge Distillation
Author: mkurman

stars: 308

forks: 24

updated: May 12, 2026, 17:36

SKILL.md

name	knowledge-distillation
description	Knowledge distillation techniques for model compression: logit-level, feature-level, and relation-based distillation. KD-Lib library and practical workflows for training student models.
tags	["knowledge-distillation","model-compression","student-teacher","kd-lib","pytorch","zorai"]

Overview

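Knowledge distillation transfers knowledge from a larger teacher model to a smaller student model: the student is trained to reproduce the teacher's behavior rather than only the ground-truth labels. It is the middle step of the P-KD-Q compression pipeline: pruning (P), then knowledge distillation (KD), then quantization (Q).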

Installation

uv pip install kd-lib

Logit Distillation

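import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Weighted sum of the soft-target KD loss and the hard-label cross-entropy."""
    # Soften both distributions with the same temperature.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_prob = F.log_softmax(student_logits / temperature, dim=-1)
    # kl_div takes log-probabilities as input and probabilities as target;
    # the T^2 factor keeps the soft-target gradients on the same scale as the CE term.
    kd = F.kl_div(soft_prob, soft_targets, reduction="batchmean") * (temperature ** 2)
    ce = F.cross_entropy(student_logits, labels)
    # alpha weights the distillation term, (1 - alpha) the hard-label term.
    return alpha * kd + (1 - alpha) * ce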
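
To see what the temperature parameter does to the teacher's targets, the following is a small standalone sketch in plain Python (no PyTorch needed). The helper `softmax_with_temperature` and the three example logits are hypothetical and chosen only for illustration; they are not part of KD-Lib or of the skill itself.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over a list of floats after dividing each logit by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for one example with three classes.
teacher_logits = [4.0, 2.0, 0.0]

sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=4.0)

print([round(p, 3) for p in sharp])  # [0.867, 0.117, 0.016]
print([round(p, 3) for p in soft])   # [0.506, 0.307, 0.186]
```

At temperature 4.0 the two non-top classes receive far more probability mass than at 1.0, so the student gets a stronger signal about how the teacher ranks the wrong classes. This is the information the soft-target term in `kd_loss` passes on.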

Distillation Methods

Method	What It Transfers	Best For
Logit distillation	Output probability distribution	Classification, generation
Feature distillation	Intermediate hidden states	Transformer layers
Relation distillation	Relationships between representations	Structured outputs
Self-distillation	Model teaches itself	No teacher needed
Online distillation	Teacher & student train jointly	Both models improve
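
As an illustration of the feature distillation row above, here is a minimal PyTorch sketch that matches one student hidden state to one teacher hidden state. It assumes the two models have different hidden widths, so a learned linear projection maps the student features into the teacher's space before a mean-squared-error comparison. The class name `FeatureDistiller`, the tensor shapes, and the choice of MSE are illustrative assumptions, not KD-Lib's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    """Projects student hidden states to the teacher's width and compares them with MSE."""

    def __init__(self, student_dim, teacher_dim):
        super().__init__()
        # Learned projection, trained jointly with the student.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_hidden, teacher_hidden):
        # The teacher is treated as a fixed target, so no gradient flows into it.
        return F.mse_loss(self.proj(student_hidden), teacher_hidden.detach())

torch.manual_seed(0)
# Hypothetical shapes: 2 sequences, 5 tokens, student width 64, teacher width 128.
student_hidden = torch.randn(2, 5, 64, requires_grad=True)
teacher_hidden = torch.randn(2, 5, 128)

distiller = FeatureDistiller(student_dim=64, teacher_dim=128)
loss = distiller(student_hidden, teacher_hidden)
loss.backward()  # gradients reach both the student features and the projection
print(loss.item())
```

In a real training loop this term would typically be added to the logit-level loss with its own weight, and the hidden states would come from chosen layers of each model rather than from random tensors.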

References

related-skills.json

package.json

"author": "mkurman"

"repository": "mkurman/zorai"
