Skip to main content
Run any Skill in Manus
with one click

awq-quantization

Stars9,996
Forks745
UpdatedNovember 25, 2025 at 21:59

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.

Installation

Install with Codex or Claude Copy this prompt, paste it into Codex, Claude, or another assistant, and let it review the skill page and install it for you.

File Explorer
3 files
SKILL.md
readonly