adapt-new-diffusion-model

Name: Adapt New Diffusion Model
Author: intel

Description: Adapt AutoRound to support a new diffusion model architecture (DiT, UNet, hybrid AR+DiT). Use when a new diffusion model fails quantization, needs custom output configs, requires a custom pipeline function, or is a hybrid architecture with both autoregressive and diffusion components.

Updated: May 14, 2026, 01:40

Repository: intel/auto-round


Adapting AutoRound for a New Diffusion Model Architecture

Overview

AutoRound's new diffusion path uses auto_round/compressors/diffusion_mixin.py, auto_round/calibration/diffusion.py, and the quantizer implementations under auto_round/algorithms/quantization/. This skill covers what code changes are needed when a new diffusion model doesn't work out-of-the-box. Common reasons for adaptation:

- Transformer block type not registered in DIFFUSION_OUTPUT_CONFIGS
- Non-standard pipeline API (not compatible with pipe(prompts, ...))
- Hybrid architecture with both AR and diffusion components
- Model not detected as a diffusion model

Step 0: Diagnose the Problem

from auto_round import AutoRound

ar = AutoRound(
    "your-org/your-diffusion-model",
    scheme="W4A16",
    iters=2,
    nsamples=2,
    num_inference_steps=5,
)
ar.quantize_and_save(output_dir="./test_output", format="fake")

| Error / Symptom | Root Cause | Fix Section |
| --- | --- | --- |
| "using LLM mode" instead of Diffusion | Model not detected as diffusion | Step 1 |
| `assert len(output_config) == len(tmp_output)` | Block output config mismatch | Step 2 |
| Pipeline call fails | Non-standard inference API | Step 3 |
| Hybrid model only quantizes DiT | AR component not handled | Step 4 |

Step 1: Ensure Model Detection

AutoRound detects diffusion models by checking for model_index.json in the model directory:

# auto_round/utils/model.py (excerpt; body omitted)
def is_diffusion_model(model_or_path):
    # Checks for model_index.json presence
    ...
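Before starting a quantization run, the same condition can be checked by hand. The following is a minimal sketch for local directories only; `has_model_index` and `pipeline_class_name` are hypothetical helpers written for illustration, not AutoRound functions, and the real `is_diffusion_model()` may handle more cases (for example hub IDs or already-loaded pipelines). The `_class_name` key is the field diffusers writes into model_index.json to record the pipeline class.

```python
import json
from pathlib import Path


def has_model_index(model_dir):
    """Return True if a local model directory contains model_index.json.

    Hypothetical helper for illustration; it only covers local paths.
    """
    return (Path(model_dir) / "model_index.json").is_file()


def pipeline_class_name(model_dir):
    """Return the pipeline class recorded in model_index.json, or None."""
    index_path = Path(model_dir) / "model_index.json"
    if not index_path.is_file():
        return None
    with index_path.open(encoding="utf-8") as f:
        return json.load(f).get("_class_name")
```

If `has_model_index()` returns False for your checkpoint, that is a likely explanation for the "using LLM mode" symptom in the Step 0 table.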

If your model doesn't have model_index.json, either create one in the model directory or pass the diffusion-specific options explicitly, through the new architecture's ExtraConfig or as AutoRound keyword arguments:

from auto_round import AutoRound
from auto_round.compressors.config import ExtraConfig

# `model` is a placeholder for your model path or model object
ar = AutoRound(
    model,
    extra_config=ExtraConfig(num_inference_steps=5),
)

Pipeline Loading

diffusion_load_model() uses AutoPipelineForText2Image.from_pretrained() and extracts pipe.transformer as the quantizable model. If your model uses a different attribute (e.g., pipe.unet), this needs adjustment in auto_round/utils/model.py.
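One possible shape for that adjustment is a small lookup that tries several attribute names in order. This is a sketch under assumptions: `pick_denoiser` is a hypothetical helper, not part of AutoRound, and the `SimpleNamespace` object stands in for a real diffusers pipeline.

```python
from types import SimpleNamespace


def pick_denoiser(pipe, candidates=("transformer", "unet")):
    """Return (attribute_name, module) for the first denoiser found on pipe."""
    for name in candidates:
        module = getattr(pipe, name, None)
        if module is not None:
            return name, module
    raise AttributeError(f"pipeline has none of the attributes {candidates}")


# Stand-in pipeline that only exposes a UNet.
unet_pipe = SimpleNamespace(unet="fake-unet-module")
print(pick_denoiser(unet_pipe))  # ('unet', 'fake-unet-module')
```

Checking `transformer` first keeps the current behavior for DiT pipelines and only falls back to `unet` when no transformer is present.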

Step 2: Register Transformer Block Output Config

This is the most common adaptation needed. DIFFUSION_OUTPUT_CONFIGS maps transformer block class names to their output tensor names. Without this, calibration crashes because AutoRound doesn't know how to collect activations.

Find your block class name

import diffusers

pipe = diffusers.AutoPipelineForText2Image.from_pretrained("your-model")
for name, module in pipe.transformer.named_modules():
    if hasattr(module, "forward") and "block" in name.lower():
        print(f"{name}: {type(module).__name__}")

Register in `DIFFUSION_OUTPUT_CONFIGS`

Edit auto_round/algorithms/quantization/base.py:

class BaseQuantizers:
    DIFFUSION_OUTPUT_CONFIGS = {
        "FluxTransformerBlock": ["encoder_hidden_states", "hidden_states"],
        "FluxSingleTransformerBlock": ["encoder_hidden_states", "hidden_states"],
        # Add your block type:
        "YourTransformerBlock": ["hidden_states"],  # output tensor names in order
    }

The list must match the exact order of tensors returned by the block's forward() method.
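To see which block types still need an entry, the class names found in the model can be compared against the registered keys. The sketch below uses stand-in classes and a local copy of the two Flux entries shown above; `unregistered_block_types` is a hypothetical helper, and the "class name ends in Block" filter is an assumption that may not fit every architecture.

```python
class FluxTransformerBlock:  # stand-in for the diffusers class of the same name
    pass


class YourTransformerBlock:  # hypothetical new block type
    pass


# Local copy of the entries shown above; the real dict lives on BaseQuantizers.
DIFFUSION_OUTPUT_CONFIGS = {
    "FluxTransformerBlock": ["encoder_hidden_states", "hidden_states"],
    "FluxSingleTransformerBlock": ["encoder_hidden_states", "hidden_states"],
}


def unregistered_block_types(named_modules, configs):
    """Return sorted class names ending in 'Block' that have no output config."""
    seen = {type(module).__name__ for _, module in named_modules}
    return sorted(n for n in seen if n.endswith("Block") and n not in configs)


modules = [
    ("transformer_blocks.0", FluxTransformerBlock()),
    ("your_blocks.0", YourTransformerBlock()),
]
print(unregistered_block_types(modules, DIFFUSION_OUTPUT_CONFIGS))  # ['YourTransformerBlock']
```

With a real pipeline, `pipe.transformer.named_modules()` would take the place of the `modules` list.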

How to determine output tensor names

1. Read the block's forward() method in diffusers source code
2. Identify what tensors it returns (usually hidden_states, sometimes also encoder_hidden_states)
3. List them in the order they're returned

Example: If forward() returns (hidden_states, encoder_hidden_states):

BaseQuantizers.DIFFUSION_OUTPUT_CONFIGS["YourBlock"] = ["hidden_states", "encoder_hidden_states"]

Example: If forward() returns just hidden_states:

BaseQuantizers.DIFFUSION_OUTPUT_CONFIGS["YourBlock"] = ["hidden_states"]
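A candidate config can be sanity-checked against what a block actually returns before it is registered. The sketch below mirrors the length check quoted in the Step 0 table; `name_block_outputs` is a hypothetical helper, and plain strings stand in for tensors.

```python
def name_block_outputs(output_config, block_output):
    """Pair configured tensor names with the values a block's forward() returned."""
    # A block that returns a single tensor is treated as a one-element tuple.
    if isinstance(block_output, (tuple, list)):
        outputs = tuple(block_output)
    else:
        outputs = (block_output,)
    if len(output_config) != len(outputs):
        raise ValueError(
            f"config lists {len(output_config)} names "
            f"but forward() returned {len(outputs)} values"
        )
    return dict(zip(output_config, outputs))


# Two-output block: names are assigned strictly by position.
print(name_block_outputs(["hidden_states", "encoder_hidden_states"], ("h", "e")))
# {'hidden_states': 'h', 'encoder_hidden_states': 'e'}
```

Because the pairing is positional, a config with the right length but the wrong order passes this check and still mislabels the activations, which is why the order has to be read from forward() itself.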

Step 3: Handle Non-Standard Pipeline API

If your model's inference API differs from the standard pipe(prompts, guidance_scale=..., num_inference_steps=...), provide a custom pipeline function.

Option A: Add a custom pipeline dispatch in `DiffusionCalibrator`

Update auto_round/calibration/diffusion.py so DiffusionCalibrator.calib() dispatches through a small helper instead of calling pipe(...) directly:

class DiffusionCalibrator(LLMCalibrator):
    ...

    def _run_pipeline(self, pipe, prompts, generator):
        if getattr(pipe, "_autoround_pipeline_fn", None) is not None:
            pipe._autoround_pipeline_fn(
                pipe,
                prompts,
                guidance_scale=self.compressor.guidance_scale,
                num_inference_steps=self.compressor.num_inference_steps,
                generator=generator,
            )
            return
        pipe(
            prompts,
            guidance_scale=self.compressor.guidance_scale,
            num_inference_steps=self.compressor.num_inference_steps,
            generator=generator,
        )

Option B: Attach a model-specific function during model loading

For a known model family, attach _autoround_pipeline_fn in auto_round/utils/model.py or auto_round/special_model_handler.py:

pipe._autoround_pipeline_fn = your_model_pipeline_fn
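The attached function has to accept the arguments that the Option A dispatch passes and translate them into the model's own API. The sketch below shows one way this could look; `FakePipeline`, its `generate()` method, and the `steps`/`cfg_scale` parameter names are all hypothetical stand-ins, and `run_pipeline` is a free-function version of the dispatch logic from Option A.

```python
class FakePipeline:
    """Stand-in for a pipeline whose entry point is not pipe(prompts, ...)."""

    def __init__(self):
        self.calls = []

    def generate(self, text, steps, cfg_scale, generator=None):  # hypothetical API
        self.calls.append({"text": text, "steps": steps, "cfg_scale": cfg_scale})


def your_model_pipeline_fn(pipe, prompts, guidance_scale, num_inference_steps, generator=None):
    """Translate the standard calibration arguments into the model's own API."""
    pipe.generate(
        text=prompts,
        steps=num_inference_steps,
        cfg_scale=guidance_scale,
        generator=generator,
    )


def run_pipeline(pipe, prompts, **kwargs):
    """Prefer the attached hook; otherwise fall back to calling the pipeline."""
    hook = getattr(pipe, "_autoround_pipeline_fn", None)
    if hook is not None:
        return hook(pipe, prompts, **kwargs)
    return pipe(prompts, **kwargs)


pipe = FakePipeline()
pipe._autoround_pipeline_fn = your_model_pipeline_fn
run_pipeline(pipe, ["a photo of a cat"], guidance_scale=7.5, num_inference_steps=5)
print(pipe.calls)
# [{'text': ['a photo of a cat'], 'steps': 5, 'cfg_scale': 7.5}]
```

Assigning the function as an instance attribute leaves it unbound, so the pipeline is passed explicitly as the first argument, matching the call in Option A.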

Option C: Add a dedicated branch in `DiffusionCalibrator`

For full control, update auto_round/calibration/diffusion.py so DiffusionCalibrator.calib() dispatches through your custom pipeline function:

import torch


class DiffusionCalibrator(LLMCalibrator):
    ...

    def _run_pipeline(self, pipe, prompts):
        c = self.compressor
        # Seed a generator only when a seed was configured
        generator = None
        if c.generator_seed is not None:
            generator = torch.Generator(device=pipe.device).manual_seed(c.generator_seed)
        pipe.your_custom_generate(
            prompts,
            steps=c.num_inference_steps,
            cfg=c.guidance_scale,
            generator=generator,
        )

Step 4: Add Hybrid AR+DiT Support

For models with both autoregressive and diffusion components (e.g., GLM-Image).

4a. Register AR component

Add hybrid routing through the new architecture. Start with auto_round/autoround.py, auto_round/compressors/entry.py, and auto_round/compressors/diffusion_mixin.py. If a reusable AR-component registry is needed, place it near the new routing code:

HYBRID_AR_COMPONENTS = [
    "vision_language_encoder",  # GLM-Image
    "your_ar_component",  # Your model's AR attribute name
]

The attribute name must match what exists on the diffusers pipeline object (i.e., pipe.your_ar_component).
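A quick way to confirm the attribute name is to check which registered names are actually present on the loaded pipeline. This is a sketch only: `find_ar_components` is a hypothetical helper and the `SimpleNamespace` object stands in for a diffusers pipeline.

```python
from types import SimpleNamespace

HYBRID_AR_COMPONENTS = [
    "vision_language_encoder",  # GLM-Image
    "your_ar_component",  # Your model's AR attribute name
]


def find_ar_components(pipe, names=HYBRID_AR_COMPONENTS):
    """Return the registered AR attribute names that exist (and are set) on pipe."""
    return [name for name in names if getattr(pipe, name, None) is not None]


# Stand-in hybrid pipeline exposing a DiT and one AR component.
hybrid_pipe = SimpleNamespace(transformer="dit", vision_language_encoder="ar")
print(find_ar_components(hybrid_pipe))  # ['vision_language_encoder']
```

An empty result for a model you expect to be hybrid suggests the name in the registry does not match the pipeline attribute.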

4b. Register DiT block output config

Add the DiT-specific output config in BaseQuantizers.DIFFUSION_OUTPUT_CONFIGS:

BaseQuantizers.DIFFUSION_OUTPUT_CONFIGS["YourDiTBlock"] = ["hidden_states", "encoder_hidden_states"]

4c. Register AR block handler

In auto_round/special_model_handler.py, add a block handler for the AR component so AutoRound knows which layers to quantize:

def _get_your_hybrid_multimodal_block(model, quant_vision=False):
    block_names = []
    if quant_vision and hasattr(model, "vision_encoder"):
        block_names.append([f"vision_encoder.blocks.{i}" for i in range(len(model.vision_encoder.blocks))])
    block_names.append([f"language_model.layers.{i}" for i in range(len(model.language_model.layers))])
    return block_names


SPECIAL_MULTIMODAL_BLOCK["your_model_type"] = _get_your_hybrid_multimodal_block

Hybrid quantization flow

The new hybrid flow should run two phases:

Phase 1 (AR): Quantizes the AR component using text calibration data (MLLM-style)
Phase 2 (DiT): Quantizes the DiT component using diffusion pipeline calibration

ar = AutoRound(
    "your-hybrid-model",
    dataset="coco2014",  # DiT calibration
    ar_dataset="NeelNanda/pile-10k",  # AR calibration
    quant_ar=True,
    quant_dit=True,
)

Step 5: Add Custom Calibration Dataset (Optional)

If your model needs a specific dataset format, edit the diffusion calibration path used by the new architecture:

- auto_round/calibration/diffusion.py for how diffusion prompts are loaded and consumed
- auto_round/calib_dataset.py for reusable dataset registration helpers

def get_diffusion_dataloader(dataset_name, nsamples, **kwargs):  # remaining parameters elided in the original
    # Add handling for your dataset format
    if dataset_name == "your_custom_dataset":
        return _load_your_dataset(dataset_name, nsamples)  # your own loader function
    ...

The default coco2014 dataset works for most text-to-image models. Custom datasets need a TSV file with id and caption columns.
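A reader for such a file could look like the sketch below, using only the standard library. `load_captions_tsv` is a hypothetical helper, not an AutoRound function, and it assumes the file has a header row naming the `id` and `caption` columns.

```python
import csv


def load_captions_tsv(path, nsamples=None):
    """Read (id, caption) pairs from a TSV file with 'id' and 'caption' columns."""
    samples = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        # Fail early if the header row lacks a required column.
        missing = {"id", "caption"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"TSV is missing columns: {sorted(missing)}")
        for row in reader:
            if nsamples is not None and len(samples) >= nsamples:
                break
            samples.append((row["id"], row["caption"]))
    return samples
```

Validating the header up front turns a malformed file into a clear error at load time instead of a KeyError partway through calibration.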

Step 6: Test

from auto_round import AutoRound


def test_your_diffusion_model():
    # Small iters/nsamples keep the smoke test fast
    ar = AutoRound(
        "your-org/your-diffusion-model",
        scheme="W4A16",
        iters=2,
        nsamples=4,
        num_inference_steps=5,
        guidance_scale=7.5,
    )
    compressed_model, layer_config = ar.quantize()
    assert len(layer_config) > 0, "No layers quantized"
    ar.save_quantized(output_dir="./test_output", format="fake")

For hybrid models, test both phases:

ar = AutoRound(
    "your-hybrid-model",
    quant_ar=True,
    quant_dit=True,
    iters=2,
    nsamples=4,
)

Checklist

Key Files

| File | Purpose |
| --- | --- |
| `auto_round/algorithms/quantization/base.py` | `BaseQuantizers.DIFFUSION_OUTPUT_CONFIGS` |
| `auto_round/calibration/diffusion.py` | `DiffusionCalibrator`, pipeline-driving calibration logic |
| `auto_round/compressors/diffusion_mixin.py` | Diffusion compressor mixin and calibrator routing |
| `auto_round/compressors/entry.py` | New-architecture `AutoRoundCompatible` factory routing |
| `auto_round/utils/model.py` | `is_diffusion_model()`, `diffusion_load_model()` |
| `auto_round/special_model_handler.py` | AR block handlers for hybrid models |
| `auto_round/autoround.py` | Model type routing (diffusion vs hybrid vs LLM) |

Reference: Existing Adaptations

| Model | Type | What Was Adapted |
| --- | --- | --- |
| FLUX.1-dev | Pure DiT | `DIFFUSION_OUTPUT_CONFIGS` for `FluxTransformerBlock`/`FluxSingleTransformerBlock` |
| GLM-Image | Hybrid AR+DiT | AR routing + `SPECIAL_MULTIMODAL_BLOCK` + DiT `DIFFUSION_OUTPUT_CONFIGS` |
| NextStep | Custom pipeline | Model-specific pipeline function attached by model handler / loader |
