angelslim

Name: Angelslim
Author: mkurman

// Tencent AngelSlim — accessible, comprehensive, and efficient toolkit for large model compression. Quantization (FP8/INT4/NVFP4/1.25-bit), pruning, speculative decoding (Eagle3), and diffusion model compression.

stars: 308

forks: 24

updated: May 12, 2026 at 17:36

SKILL.md

name	angelslim
description	Tencent AngelSlim — accessible, comprehensive, and efficient toolkit for large model compression. Quantization (FP8/INT4/NVFP4/1.25-bit), pruning, speculative decoding (Eagle3), and diffusion model compression.
tags	["angelslim","model-compression","quantization","pruning","speculative-decoding","tencent","zorai"]

Overview

AngelSlim integrates mainstream compression algorithms into a unified framework with one-click access. It supports FP8/INT8/INT4/NVFP4/1.25-bit quantization, pruning, Eagle3 speculative decoding, and diffusion model compression for LLMs, VLMs, and audio models.

Installation

uv pip install angelslim
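A quick way to confirm the install succeeded is to ask Python for the installed distribution's version. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `angelslim` is taken from the install command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "angelslim" is the distribution name used in the install command.
version = installed_version("angelslim")
print(f"angelslim {version}" if version else "angelslim is not installed")
```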

Basic Quantization (PTQ)

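import angelslim as slim

# `model` is assumed to be an already-loaded model object; loading it is not shown here.
# The two calls below are alternatives: apply one or the other, not both in sequence.

# FP8 static quantization with the default quantization config
model = slim.quantize(model, dtype="fp8_static", qconfig="default")

# INT4 GPTQ, calibrated on the wikitext2 dataset
model = slim.quantize(model, dtype="int4_gptq", dataset="wikitext2")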
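To make concrete what 4-bit weight quantization does to the numbers, here is a minimal pure-Python sketch of symmetric per-tensor round-to-nearest quantization. This is not GPTQ, AWQ, or AngelSlim's implementation: those methods use calibration data to reduce the error, while this sketch only shows the basic scale, round, and clamp step. The function names and sample weights are hypothetical.

```python
def quantize_int4(weights):
    """Symmetric per-tensor round-to-nearest quantization to the INT4 range [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude onto code 7; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floating-point weights from integer codes."""
    return [c * scale for c in codes]


weights = [0.70, -0.32, 0.11, 0.0, -0.70]  # illustrative values only
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)
print("scale:", scale)
print("max abs error:", max_error)
```

With round-to-nearest, the reconstruction error of any weight that was not clamped is at most half the scale, which is why a tensor with a few large outliers quantizes poorly: the outliers inflate the scale for every other weight.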

Compression Strategies

Method	Precision	Best For
FP8-Static/Dynamic	8-bit	General LLM deployment
INT4 GPTQ/AWQ/GPTAQ	4-bit	Memory-constrained serving
NVFP4	4-bit (NVIDIA)	Blackwell GPUs
Sherry	1.25-bit	Extreme compression
STQ1_0	1.25-bit	On-device deployment
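The precision column translates directly into weight storage. As a rough sketch, the helper below estimates the weight footprint from the parameter count and bits per weight. The 7-billion-parameter model and the 16-bit baseline are assumptions chosen for illustration, and the estimate ignores quantization scales, zero points, activations, and the KV cache.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes), ignoring scales and metadata."""
    return num_params * bits_per_weight / 8 / 1e9


# Bit widths from the table, plus a 16-bit baseline (an assumption for comparison).
PRECISIONS = {
    "16-bit baseline": 16,
    "FP8": 8,
    "INT4 / NVFP4": 4,
    "1.25-bit": 1.25,
}

NUM_PARAMS = 7e9  # hypothetical 7B-parameter model

for name, bits in PRECISIONS.items():
    print(f"{name:>16}: {weight_memory_gb(NUM_PARAMS, bits):6.2f} GB")
```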

Speculative Decoding (Eagle3)

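# Train Eagle3 draft model
# `model` is the target model; `draft_model_config` is assumed to be defined beforehand.
slim.eagle3.train(model, draft_model_config)

# Inference with Eagle3
# `input_ids` is assumed to be the tokenized prompt.
output = model.generate_with_eagle3(input_ids, max_new_tokens=256)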
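The idea behind speculative decoding can be sketched without any model library. The toy below is a generic greedy draft-and-verify loop, not Eagle3 itself (Eagle3 trains a draft head on the target model's internal features and this sketch does not attempt that). Both "models" are hypothetical deterministic functions over a 10-token vocabulary, and the draft is deliberately wrong at every third position so that rejections occur.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def target_next(tokens: List[int]) -> int:
    """Hypothetical 'large' model: a deterministic toy rule over a 10-token vocabulary."""
    return (tokens[-1] * 3 + len(tokens)) % 10


def draft_next(tokens: List[int]) -> int:
    """Hypothetical 'small' model: matches the target except at every third position."""
    guess = target_next(tokens)
    return (guess + 1) % 10 if len(tokens) % 3 == 0 else guess


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain greedy decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft k tokens, keep the prefix the target agrees with."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The draft model proposes k tokens.
        proposal = list(tokens)
        for _ in range(k):
            proposal.append(draft(proposal))
        # 2. The target checks each drafted position. A real system scores all
        #    positions in one batched forward pass; here it is a simple loop.
        for i in range(len(tokens), len(proposal)):
            expected = target(proposal[:i])
            if proposal[i] != expected:
                # First mismatch: keep the accepted prefix plus the target's own token.
                tokens = proposal[:i] + [expected]
                break
        else:
            # Every drafted token was accepted; the target adds one bonus token.
            tokens = proposal + [target(proposal)]
    return tokens[:limit]


baseline = greedy_decode(target_next, [1], 20)
speculative = speculative_decode(target_next, draft_next, [1], 20, k=4)
print("outputs match:", baseline == speculative)
```

The design point worth noting is that verification never changes the result: every token kept is one the target model would have produced on its own, so the speedup comes only from accepting several drafted tokens per target pass.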

References

related-skills.json

Same repository

parameter-golf-submission.md

from "mkurman/zorai"

Prepare and validate Parameter Golf record folders: self-contained train_gpt.py, README.md, submission.json, FineWeb SP1024 BPB accounting, artifact-size logging, run logs, and PR-ready folder hygiene.

2026-05-26 (308 stars)

runpod-parameter-golf.md

from "mkurman/zorai"

Run Parameter Golf competition submissions on RunPod GPU Pods. Covers required operator inputs, RunPod pod specs, FineWeb SP1024 data caching, record-folder hygiene, torchrun launch commands, monitoring, artifact-size checks, and result collection.

2026-05-26 (308 stars)

triton-kernel-programming.md

from "mkurman/zorai"

Hands-on implementation template and API reference for writing, tuning, debugging, and benchmarking Triton GPU kernels. Covers the full triton.language API surface, autotuning patterns, profiling workflows, and production integration.

2026-05-17 (308 stars)

triton-kernel-programming.md

DistilQwen2.5 — Alibaba's industrial practices for training distilled open lightweight language models. Knowledge distillation from Qwen2.5 72B into smaller 0.5B-7B models.

2026-05-12 (308 stars)

intel-neural-compressor.md

from "mkurman/zorai"

Intel Neural Compressor — SOTA low-bit LLM quantization (INT8/FP8/INT4/NVFP4), sparsity, pruning, and distillation for PyTorch, TensorFlow, and ONNX Runtime.

2026-05-12 (308 stars)

package.json

"author": "mkurman"

"repository": "mkurman/zorai"

$ useful --forSOC

Data Scientists · Computer and Mathematical Occupations · 15-2051 · L4
