intel-neural-compressor

Name: Intel Neural Compressor
Author: mkurman

Intel Neural Compressor: SOTA low-bit LLM quantization (INT8/FP8/INT4/NVFP4), sparsity, pruning, and distillation for PyTorch, TensorFlow, and ONNX Runtime.


stars: 308

forks: 24

updated: May 12, 2026 at 17:36

SKILL.md


name	intel-neural-compressor
description	Intel Neural Compressor — SOTA low-bit LLM quantization (INT8/FP8/INT4/NVFP4), sparsity, pruning, and distillation for PyTorch, TensorFlow, and ONNX Runtime.
tags	["intel-neural-compressor","quantization","pruning","distillation","inc","intel","zorai"]

Overview

Intel Neural Compressor provides low-bit quantization (INT8, FP8, INT4, MXFP4, NVFP4), sparsity, pruning, and knowledge distillation for optimizing models on Intel hardware and beyond.
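To give a rough sense of what the bit widths listed above mean in practice, the following sketch estimates raw weight storage from the nominal bits per weight. It is back-of-the-envelope arithmetic only: the 7B parameter count is a hypothetical example, FP16 is included purely as a baseline, and per-block scales, zero-points, and file metadata are ignored.

```python
# Nominal bits per weight element. MXFP4 and NVFP4 also store shared block
# scales, which this estimate deliberately ignores. FP16 is a baseline only.
BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "FP8": 8, "INT4": 4, "MXFP4": 4, "NVFP4": 4}


def weight_bytes(num_params: int, fmt: str) -> float:
    """Raw weight storage in bytes, ignoring scales, zero-points, and metadata."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8


if __name__ == "__main__":
    num_params = 7_000_000_000  # hypothetical 7B-parameter model
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt:>6}: {weight_bytes(num_params, fmt) / 1e9:.1f} GB")
```

Real checkpoints will be somewhat larger than these figures, since quantized formats carry scaling factors and some layers are often left in higher precision.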

Installation

uv pip install neural-compressor
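After installing, it can be useful to confirm that the distribution is visible to the active interpreter. The helper below is a small sketch using only the standard library; `installed_version` is a hypothetical name introduced here, and it simply returns `None` when the package is not found.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "neural-compressor") -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "neural-compressor is not installed")
```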

Basic Quantization

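from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# Post-training quantization (2.x-style API; the default config targets static INT8).
# `model` is the FP32 model and `calib_dataloader` yields calibration batches.
conf = PostTrainingQuantConfig()
q_model = fit(model=model, conf=conf, calib_dataloader=calib_dataloader)
q_model.save("quantized_model")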
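For intuition about what INT8 post-training quantization does to a tensor, here is a minimal, library-free sketch of symmetric per-tensor quantization. It is an illustration under simplifying assumptions (one scale per tensor, no calibration data, no zero-point), not a description of how Intel Neural Compressor implements it; the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale maps the largest magnitude onto 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Map integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("quantized:", q)
    print("max round-trip error:", worst)
```

The round-trip error per element is bounded by half the scale, which is why tensors with a few large outliers tend to quantize poorly under a single per-tensor scale.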

Pruning

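from neural_compressor.training import WeightPruningConfig, prepare_compression

# Assumes the 2.x training API; `model` is an existing model to be pruned.
config = WeightPruningConfig(pruning_type="snip_momentum", target_sparsity=0.3)
compression_manager = prepare_compression(model, config)
# Pruning is applied during training: call compression_manager.callbacks
# (on_train_begin, on_step_begin, ..., on_train_end) around your training loop.
pruned_model = compression_manager.model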
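To make the `target_sparsity` setting concrete, the sketch below zeroes the smallest-magnitude entries of a weight list until the requested fraction is zero. Note that this is plain magnitude pruning, swapped in for illustration because it needs no gradients; it is not the `snip_momentum` criterion, and the function names are hypothetical.

```python
from typing import List


def magnitude_prune(weights: List[float], target_sparsity: float) -> List[float]:
    """Zero the smallest-magnitude weights until `target_sparsity` is reached."""
    if not 0.0 <= target_sparsity <= 1.0:
        raise ValueError("target_sparsity must be in [0, 1]")
    num_to_prune = int(round(len(weights) * target_sparsity))
    # Indices ordered by ascending magnitude; the first num_to_prune are zeroed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:num_to_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


def sparsity(weights: List[float]) -> float:
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights) if weights else 0.0


if __name__ == "__main__":
    weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.03, 0.6, -0.8, 0.1]
    pruned = magnitude_prune(weights, target_sparsity=0.3)
    print(pruned)
    print("sparsity:", sparsity(pruned))
```

With ten weights and a target of 0.3, the three entries closest to zero are removed and the rest are left untouched.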

References

related-skills.json
