add-vlm-model

Name: Add Vlm Model
Author: intel

Repository: intel/auto-round

name	add-vlm-model
description	Add support for a new Vision-Language Model (VLM) to AutoRound, including multimodal block handler, calibration dataset template, and special model handling. Use when integrating a new VLM like LLaVA, Qwen2-VL, GLM-Image, Phi-Vision, or similar multi-modal models for quantization.

Adding a New Vision-Language Model to AutoRound

Overview

This skill guides you through adding support for a new Vision-Language Model (VLM) to AutoRound. VLMs need special handling because they usually combine separate components (a vision encoder, a projector, and a language model), and calibration may require multimodal image-and-text data.

The integration involves three parts:

Multimodal Block Handler — Tell AutoRound how to find quantizable blocks
MLLM Calibration Path — Ensure MLLMCalibrator can build and feed calibration samples
Special Model Handler — Handle model-specific forward pass quirks

Prerequisites

Before starting, determine:

Model architecture: What sub-modules exist? (vision encoder, projector, language model, audio tower, etc.)
Model type: The model_type string from config.json
Block structure: Where are the transformer layers? (e.g., model.layers, thinker.model.layers, language_model.layers)
Text-only support: Can the model be calibrated with text-only data?
Batch size limitations: Does the VLM have restrictions on batch size?
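To answer the block-structure question, it helps to list every stack of repeated sub-modules. The sketch below does this on a mock model built from `types.SimpleNamespace`; `find_block_containers` is a hypothetical helper, not part of AutoRound. On a real PyTorch model you would iterate `model.named_modules()` and look for `torch.nn.ModuleList` instances instead, because child modules are stored in `_modules` rather than as plain attributes.

```python
from types import SimpleNamespace


def find_block_containers(root, prefix="", min_len=2, max_depth=6):
    """Return (dotted_path, length) for list attributes that look like block stacks.

    Hypothetical helper for mock objects: any list/tuple holding at least
    `min_len` attribute-bearing objects is treated as a stack of blocks.
    """
    found = []
    if max_depth < 0:
        return found
    for name, child in vars(root).items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(child, (list, tuple)):
            if len(child) >= min_len and all(hasattr(c, "__dict__") for c in child):
                found.append((path, len(child)))
        elif hasattr(child, "__dict__"):
            # Recurse into nested sub-modules such as model.language_model.
            found.extend(find_block_containers(child, path, min_len, max_depth - 1))
    return found


def make_block():
    """Mock transformer block with attention and MLP sub-modules."""
    return SimpleNamespace(attn=SimpleNamespace(), mlp=SimpleNamespace())


# Mock VLM mirroring the YourVLM layout used in Step 1.
demo_model = SimpleNamespace(
    model=SimpleNamespace(
        vision_encoder=SimpleNamespace(blocks=[make_block() for _ in range(4)]),
        projector=SimpleNamespace(layers=[make_block() for _ in range(2)]),
        language_model=SimpleNamespace(layers=[make_block() for _ in range(8)]),
    )
)

print(find_block_containers(demo_model))
# [('model.vision_encoder.blocks', 4), ('model.projector.layers', 2), ('model.language_model.layers', 8)]
```

The discovered paths are the prefixes a block handler has to emit, with a `.{i}` index appended per layer.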

Step 1: Add Multimodal Block Handler

Edit auto_round/special_model_handler.py:

1a. Create a block discovery function

def _get_your_vlm_multimodal_block(model, quant_vision=False):
    """Get block names for YourVLM model.

    YourVLM structure:
    - model.vision_encoder.blocks: vision encoder
    - model.projector.layers: vision-language projector
    - model.language_model.layers: text decoder

    By default, only the text decoder is quantized. Set quant_vision=True
    to include vision encoder and projector blocks.
    """
    block_names = []

    if quant_vision:
        if hasattr(model, "model") and hasattr(model.model, "vision_encoder"):
            if hasattr(model.model.vision_encoder, "blocks"):
                block_names.append(
                    [f"model.vision_encoder.blocks.{i}" for i in range(len(model.model.vision_encoder.blocks))]
                )
        # Add projector if it has quantizable layers
        if hasattr(model, "model") and hasattr(model.model, "projector"):
            if hasattr(model.model.projector, "layers"):
                block_names.append([f"model.projector.layers.{i}" for i in range(len(model.model.projector.layers))])

    # Language model layers (always quantized)
    if hasattr(model, "model") and hasattr(model.model, "language_model"):
        if hasattr(model.model.language_model, "layers"):
            block_names.append(
                [f"model.language_model.layers.{i}" for i in range(len(model.model.language_model.layers))]
            )

    return block_names

1b. Register in the `SPECIAL_MULTIMODAL_BLOCK` dict

Find the SPECIAL_MULTIMODAL_BLOCK dictionary (in special_model_handler.py) and add your model:

SPECIAL_MULTIMODAL_BLOCK["your_vlm"] = _get_your_vlm_multimodal_block

The key must match the model_type from the model's config.json.

1c. Add to support lists

# If your VLM supports text-only calibration (most do), add its model_type:
SUPPORT_ONLY_TEXT_MODELS.append("your_vlm")

# If your VLM has batch size limitations, add it to the existing tuple:
mllms_with_limited_bs = (
    # ...existing model types...
    "your_vlm",
)
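How these lists are consumed is internal to AutoRound. The sketch below is a hypothetical illustration (`calibration_plan` is not an AutoRound function) of one plausible policy: membership in `SUPPORT_ONLY_TEXT_MODELS` allows text-only calibration, and membership in `mllms_with_limited_bs` forces `batch_size=1` while scaling `gradient_accumulate_steps` so the effective batch size stays the same. The list contents are illustrative; the real lists contain other model types.

```python
# Illustrative subsets only; the real lists in special_model_handler.py contain other entries.
SUPPORT_ONLY_TEXT_MODELS = ["your_vlm"]
mllms_with_limited_bs = ("your_vlm",)


def calibration_plan(model_type, batch_size, gradient_accumulate_steps=1):
    """Hypothetical policy showing how the two support lists could shape calibration."""
    text_only_ok = model_type in SUPPORT_ONLY_TEXT_MODELS
    if model_type in mllms_with_limited_bs and batch_size != 1:
        # Force batch_size=1 but keep batch_size * gradient_accumulate_steps constant.
        gradient_accumulate_steps *= batch_size
        batch_size = 1
    return {
        "text_only_ok": text_only_ok,
        "batch_size": batch_size,
        "gradient_accumulate_steps": gradient_accumulate_steps,
    }


print(calibration_plan("your_vlm", batch_size=8))
# {'text_only_ok': True, 'batch_size': 1, 'gradient_accumulate_steps': 8}
```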

Step 2: Wire MLLM Calibration

In AutoRound's newer compressor/calibrator architecture, multimodal calibration is routed through three modules:

auto_round/compressors/mllm_mixin.py for compressor construction and calibrator selection
auto_round/calibration/mllm.py for template selection, dataloader creation, and calibration forward calls
auto_round/special_model_handler.py for multimodal block discovery and special forwards

If your model works with an existing template/processor, prefer passing template=..., processor=..., or image_processor=... through AutoRound / ExtraConfig instead of adding compressor code.
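The routing decision itself is made by `_get_calibrator_kind()` in `mllm_mixin.py`, and its exact rules are not described here. As a rough mental model only, the sketch below assumes a model takes the MLLM path when it is given an explicit template or processor, or when its model_type has a registered multimodal block handler. `get_calibrator_kind` and its `"text"` return value are hypothetical.

```python
from typing import Optional


def get_calibrator_kind(
    model_type: str,
    multimodal_block_registry: dict,
    template: Optional[str] = None,
    processor: Optional[object] = None,
) -> str:
    """Hypothetical routing rule; AutoRound's _get_calibrator_kind() may decide differently.

    An explicit template or processor, or a registered multimodal block handler,
    selects the multimodal path; everything else falls back to text calibration.
    """
    if template is not None or processor is not None:
        return "mllm"
    if model_type in multimodal_block_registry:
        return "mllm"
    return "text"


registry = {"your_vlm": object()}  # stands in for SPECIAL_MULTIMODAL_BLOCK
print(get_calibrator_kind("your_vlm", registry))                        # mllm
print(get_calibrator_kind("plain_llm", registry))                       # text
print(get_calibrator_kind("plain_llm", registry, template="your_vlm"))  # mllm
```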

Step 3: Add Calibration Template

The built-in MLLM template and processor registries live in auto_round/compressors/mllm/ and are consumed by MLLMCalibrator. A new built-in template must therefore be loadable by auto_round/calibration/mllm.py, which retrieves it via get_template().

3a. Create template JSON

Create a template JSON file in auto_round/compressors/mllm/templates/:

{
    "model_type": "your_vlm",
    "format_user": "<|user|>\n{content}\n",
    "format_assistant": "<|assistant|>\n{content}\n",
    "format_system": "<|system|>\n{content}\n",
    "format_observation": "",
    "system": "",
    "separator": "",
    "stop_words": ["<|end|>"]
}

Adjust the template fields to match your model's chat format. Check the model's tokenizer_config.json or documentation for the correct chat template.
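Before registering, it is worth checking that the fields actually reproduce the model's chat format. The sketch assumes each `format_*` string is filled with Python's `str.format(content=...)` and that the pieces are joined with `separator`; how AutoRound's template code combines these fields is an assumption to verify in `template.py`. `render_prompt` is hypothetical.

```python
import json

# The template from step 3a, kept as a raw string so the JSON escape "\n" is decoded
# by json.loads rather than by Python.
TEMPLATE_JSON = r"""
{
    "model_type": "your_vlm",
    "format_user": "<|user|>\n{content}\n",
    "format_assistant": "<|assistant|>\n{content}\n",
    "format_system": "<|system|>\n{content}\n",
    "format_observation": "",
    "system": "",
    "separator": "",
    "stop_words": ["<|end|>"]
}
"""


def render_prompt(template, turns, system=None):
    """Hypothetical renderer: fill each format_<role> field with that turn's content."""
    parts = []
    system_text = template.get("system", "") if system is None else system
    if system_text and template.get("format_system"):
        parts.append(template["format_system"].format(content=system_text))
    for role, content in turns:
        parts.append(template[f"format_{role}"].format(content=content))
    return template.get("separator", "").join(parts)


template = json.loads(TEMPLATE_JSON)
prompt = render_prompt(template, [("user", "<image>\nDescribe the image."), ("assistant", "A cat on a sofa.")])
print(repr(prompt))
# '<|user|>\n<image>\nDescribe the image.\n<|assistant|>\nA cat on a sofa.\n'
```

Compare the rendered string with what the model's own `tokenizer.apply_chat_template(...)` produces for the same conversation; any mismatch in role markers or newlines points to a wrong field.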

3b. Register the template

Register the model type in auto_round/compressors/mllm/template.py:

_register_template(
    "your_vlm",
    default_dataset="liuhaotian/llava_conv_58k",  # or appropriate dataset
    processor=PROCESSORS["default"],  # or a custom processor
)
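A registry like this is typically a dict keyed by model_type. The sketch below is a hypothetical stand-in (`register_template`, `lookup_template` and `TemplateEntry` are not AutoRound names) showing why a missing or misspelled registration can go unnoticed: it assumes unknown model types fall back to a `default` entry with only a warning, so the wrong chat format is used silently. Check `get_template()` for the actual fallback behavior.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class TemplateEntry:
    model_type: str
    default_dataset: Optional[str] = None
    processor: Optional[Callable] = None


TEMPLATES: Dict[str, TemplateEntry] = {}


def register_template(model_type, default_dataset=None, processor=None):
    """Hypothetical stand-in for _register_template(): store an entry keyed by model_type."""
    TEMPLATES[model_type] = TemplateEntry(model_type, default_dataset, processor)
    return TEMPLATES[model_type]


def lookup_template(model_type):
    """Hypothetical stand-in for get_template(): unknown types fall back to "default"."""
    if model_type in TEMPLATES:
        return TEMPLATES[model_type]
    logging.warning("no template registered for %r, using 'default'", model_type)
    return TEMPLATES["default"]


register_template("default")
register_template("your_vlm", default_dataset="liuhaotian/llava_conv_58k")

print(lookup_template("your_vlm").default_dataset)  # liuhaotian/llava_conv_58k
print(lookup_template("unknown_vlm").model_type)    # default (after a warning)
```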

3c. Add a custom processor (if needed)

If your model requires special image/prompt processing for calibration, create a processor in auto_round/compressors/mllm/processor.py, which is used by MLLMCalibrator:

def _your_vlm_processor(raw_data, model_path, seqlen, processor=None, **kwargs):
    """Process calibration data for YourVLM.

    Args:
        raw_data: Dataset samples
        model_path: Path to the model
        seqlen: Sequence length for calibration
        processor: The model's processor

    Returns:
        list: Processed samples ready for calibration
    """
    # Build prompts with images and text
    ...

PROCESSORS["your_vlm"] = _your_vlm_processor
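The body of such a processor usually turns each raw record into a prompt, tokenizes it, and truncates it to `seqlen`. The sketch below assumes LLaVA-style records (an `image` field plus `conversations` entries with `from`/`value` keys, as in LLaVA's instruction-tuning JSON) and uses toy whitespace tokenization in place of the model's real processor. `process_samples` and its parameters are hypothetical and deliberately do not follow the `_your_vlm_processor` signature above; image loading and pixel preprocessing are left out.

```python
from typing import Callable, Dict, List, Tuple

# Role names used in LLaVA-style conversation records.
ROLE_MAP = {"human": "user", "gpt": "assistant"}


def conversations_to_turns(sample: Dict) -> List[Tuple[str, str]]:
    """Convert {"from": ..., "value": ...} messages into (role, content) pairs."""
    return [(ROLE_MAP[msg["from"]], msg["value"]) for msg in sample["conversations"]]


def process_samples(
    raw_data: List[Dict],
    seqlen: int,
    render: Callable[[List[Tuple[str, str]]], str],
    tokenize: Callable[[str], List],
) -> List[Dict]:
    """Hypothetical processor body: render, tokenize, truncate to seqlen, keep the image reference."""
    processed = []
    for sample in raw_data:
        ids = tokenize(render(conversations_to_turns(sample)))[:seqlen]
        processed.append({"input_ids": ids, "image": sample.get("image")})
    return processed


def toy_render(turns):
    """Toy renderer: join turn contents with spaces (a real one would apply the template)."""
    return " ".join(content for _, content in turns)


raw = [{
    "image": "000001.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is shown?"},
        {"from": "gpt", "value": "A red bus."},
    ],
}]
out = process_samples(raw, seqlen=3, render=toy_render, tokenize=str.split)
print(out)  # [{'input_ids': ['<image>', 'What', 'is'], 'image': '000001.jpg'}]
```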

Step 4: Handle Special Forward Pass (If Needed)

If your VLM's forward() method is non-standard (e.g., requires special kwargs, has multiple model components that need separate handling), add a custom forward wrapper in special_model_handler.py:

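def _your_vlm_forward(model, **kwargs):
    """Custom forward pass for YourVLM during calibration."""
    # Preprocess or rename model-specific inputs here if needed.
    # Route inputs to the correct sub-model; kwargs the language model does not
    # accept (e.g. pixel_values) must be removed or handled before this call.
    return model.language_model(**kwargs)

def _handle_special_model(model):
    ...  # existing model-specific branches
    if hasattr(model, "config") and model.config.model_type == "your_vlm":
        from functools import partial

        # Bind the model so model.forward(**kwargs) calls _your_vlm_forward(model, **kwargs).
        model.forward = partial(_your_vlm_forward, model)
    return model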
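A common reason for such a wrapper is that the calibration batch carries keys the language model's `forward()` does not accept. The sketch below uses toy classes in place of real `nn.Module`s to show the `partial` binding together with a hypothetical signature-based filter. Filtering silently drops inputs such as `pixel_values`, which only makes sense for text-only calibration; when vision blocks are calibrated, those inputs must be routed to the vision tower instead.

```python
import inspect
from functools import partial


class ToyLanguageModel:
    """Mimics nn.Module: __call__ takes *args/**kwargs and delegates to forward()."""

    def forward(self, input_ids, attention_mask=None):
        return {"n_tokens": len(input_ids), "masked": attention_mask is not None}

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


class ToyVLM:
    def __init__(self):
        self.language_model = ToyLanguageModel()

    def forward(self, **kwargs):
        raise TypeError("stock forward requires pixel_values")


def _your_vlm_forward(model, **kwargs):
    """Pass only the kwargs the language model's forward() accepts."""
    # Inspect forward(), not the module: a (*args, **kwargs) __call__ hides the real parameters.
    params = inspect.signature(model.language_model.forward).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return model.language_model(**kwargs)
    routed = {k: v for k, v in kwargs.items() if k in params}
    return model.language_model(**routed)


vlm = ToyVLM()
vlm.forward = partial(_your_vlm_forward, vlm)  # instance attribute shadows ToyVLM.forward
out = vlm.forward(input_ids=[1, 2, 3], attention_mask=[1, 1, 1], pixel_values="dropped")
print(out)  # {'n_tokens': 3, 'masked': True}
```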

Step 5: Add Custom Calibration Dataset (Optional)

If your model needs a specialized calibration dataset loader, create one in auto_round/calib_dataset.py using the @register_dataset decorator:

@register_dataset("your_vlm_dataset")
class YourVLMDataset:
    def __init__(self, dataset_name, model_path, seqlen, **kwargs):
        # Load and preprocess calibration samples into self.data here.
        self.data = []

    def __len__(self):
        return len(self.data)

    def __iter__(self):
        for sample in self.data:
            yield sample
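The decorator pattern above can be exercised without AutoRound installed. The sketch below uses a local stand-in `register_dataset` and `DATASET_REGISTRY` (both hypothetical; AutoRound's decorator and the arguments it passes to registered loaders may differ) and simplifies the constructor to `samples, seqlen, nsamples` for the demo.

```python
from typing import Callable, Dict, Iterator, List

DATASET_REGISTRY: Dict[str, type] = {}


def register_dataset(name: str) -> Callable[[type], type]:
    """Stand-in decorator: record the class under `name` and return it unchanged."""
    def decorator(cls: type) -> type:
        DATASET_REGISTRY[name] = cls
        return cls
    return decorator


@register_dataset("your_vlm_dataset")
class YourVLMDataset:
    def __init__(self, samples: List[Dict], seqlen: int, nsamples: int):
        # Keep the first nsamples records and truncate their text to seqlen whitespace tokens.
        self.data = [
            {**s, "text": " ".join(s["text"].split()[:seqlen])} for s in samples[:nsamples]
        ]

    def __len__(self) -> int:
        return len(self.data)

    def __iter__(self) -> Iterator[Dict]:
        yield from self.data


samples = [{"text": "a b c"}, {"text": "d e f g"}, {"text": "x y"}]
ds = DATASET_REGISTRY["your_vlm_dataset"](samples, seqlen=2, nsamples=2)
print(len(ds), [s["text"] for s in ds])  # 2 ['a b', 'd e']
```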

Step 6: Test

from auto_round import AutoRound


def test_your_vlm_quantization():
    model_name = "your-org/your-vlm-small"
    ar = AutoRound(
        model_name,
        bits=4,
        group_size=128,
        iters=2,
        nsamples=2,
        quant_nontext_module=False,  # text-only quantization
    )
    compressed_model, _ = ar.quantize()
    ar.save_quantized(output_dir="./tmp_your_vlm", format="auto_round")

Test with vision quantization:

ar = AutoRound(
    model_name,
    bits=4,
    group_size=128,
    quant_nontext_module=True,  # also quantize vision encoder
)
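Beyond checking that `quantize()` runs, a test can assert on what was written to disk. The sketch assumes the `auto_round` export writes a `quantization_config` dict containing `bits` and `group_size` into the saved config.json; confirm the key names against a real export before relying on it. `check_quantization_config` is hypothetical, and the demo runs on a hand-written fixture rather than a real export.

```python
import json
import tempfile
from pathlib import Path


def check_quantization_config(output_dir, bits, group_size):
    """Hypothetical post-export check on <output_dir>/config.json."""
    config = json.loads((Path(output_dir) / "config.json").read_text())
    qcfg = config.get("quantization_config") or {}
    return qcfg.get("bits") == bits and qcfg.get("group_size") == group_size


# Demo on a hand-written fixture, not a real export.
with tempfile.TemporaryDirectory() as tmp:
    fixture = {"model_type": "your_vlm", "quantization_config": {"bits": 4, "group_size": 128}}
    (Path(tmp) / "config.json").write_text(json.dumps(fixture))
    demo_ok = check_quantization_config(tmp, bits=4, group_size=128)
    demo_wrong_bits = check_quantization_config(tmp, bits=8, group_size=128)

print(demo_ok, demo_wrong_bits)  # True False
```

In `test_your_vlm_quantization`, the check would run after `save_quantized`, e.g. `assert check_quantization_config("./tmp_your_vlm", bits=4, group_size=128)`.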

Step 7: Update Documentation

Add your model to the supported VLM list in README.md
Update README_CN.md with the same changes (Chinese translation required)
Add example quantization script if the model has special usage patterns

Reference: Existing VLM Implementations

Model Type	Block Handler	Template	Special Forward
`llava`	`_get_llava_multimodal_block`	llava template	No
`qwen2_vl`	`_get_qwen2_vl_multimodal_block`	qwen2_vl template	No
`qwen2_5_omni`	`_get_qwen2_5_omni_multimodal_block`	qwen2_5_omni template	Yes (`_qwen2_5_omni_forward`)
`qwen3_omni_moe`	`_get_qwen3_omni_moe_multimodal_block`	qwen3_omni_moe template	Yes (`_qwen3_omni_moe_forward`)
`deepseek_vl_v2`	`_get_deepseek_vl2_multimodal_block`	deepseek_vl_v2 template	Yes (`_deepseek_vl2_forward`)
`glm_image`	`_get_glm_image_multimodal_block`	glm_image template	No
`phi3_v`	via generic handler	phi3_v template	No

Key Registration Points

What	Where	Mechanism
Block handler	`special_model_handler.py`	`SPECIAL_MULTIMODAL_BLOCK[model_type]`
Text-only support	`special_model_handler.py`	`SUPPORT_ONLY_TEXT_MODELS` list
Batch limit	`special_model_handler.py`	`mllms_with_limited_bs` tuple
MLLM routing	`compressors/mllm_mixin.py`	`_get_calibrator_kind() -> "mllm"`
MLLM calibration	`calibration/mllm.py`	`MLLMCalibrator.calib()`
Template	`compressors/mllm/template.py`	`_register_template()`
Processor	`compressors/mllm/processor.py`	`PROCESSORS` dict
Custom forward	`special_model_handler.py`	`_handle_special_model()`
Dataset loader	`calib_dataset.py`	`@register_dataset()`
