| name | adapt-new-diffusion-model |
| description | Adapt AutoRound to support a new diffusion model architecture (DiT, UNet, hybrid AR+DiT). Use when a new diffusion model fails quantization, needs custom output configs, requires a custom pipeline function, or is a hybrid architecture with both autoregressive and diffusion components. |
Adapting AutoRound for a New Diffusion Model Architecture
Overview
AutoRound's new diffusion path uses auto_round/compressors/diffusion_mixin.py,
auto_round/calibration/diffusion.py, and the quantizer implementations under
auto_round/algorithms/quantization/. This skill covers what code changes are
needed when a new diffusion model doesn't work out-of-the-box. Common reasons
for adaptation:
- Transformer block type not registered in DIFFUSION_OUTPUT_CONFIGS
- Non-standard pipeline API (not compatible with pipe(prompts, ...))
- Hybrid architecture with both AR and diffusion components
- Model not detected as a diffusion model
Step 0: Diagnose the Problem
from auto_round import AutoRound

ar = AutoRound(
    "your-org/your-diffusion-model",
    scheme="W4A16",
    iters=2,
    nsamples=2,
    num_inference_steps=5,
)
ar.quantize_and_save(output_dir="./test_output", format="fake")
| Error / Symptom | Root Cause | Fix Section |
|---|---|---|
| "using LLM mode" instead of Diffusion | Model not detected as diffusion | Step 1 |
| assert len(output_config) == len(tmp_output) | Block output config mismatch | Step 2 |
| Pipeline call fails | Non-standard inference API | Step 3 |
| Hybrid model only quantizes DiT | AR component not handled | Step 4 |
Step 1: Ensure Model Detection
AutoRound detects diffusion models by checking for model_index.json in the
model directory:
def is_diffusion_model(model_or_path):
    # Checks for model_index.json in the model directory (body omitted here)
    ...
If your model doesn't have model_index.json, either create one in the model
directory or pass the diffusion-specific options explicitly, through the new
architecture's ExtraConfig or as AutoRound keyword arguments:
from auto_round import AutoRound
from auto_round.compressors.config import ExtraConfig

# `model` is the model name or local path, as in Step 0
ar = AutoRound(
    model,
    extra_config=ExtraConfig(num_inference_steps=5),
)
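To illustrate the "create one" route, here is a minimal sketch that checks for model_index.json and writes a bare-bones one. The helper names has_model_index and write_minimal_model_index are hypothetical and not part of AutoRound. The file layout (a `_class_name` key plus one `[library, class_name]` entry per component) follows the diffusers convention; whether such a hand-written index is enough for your pipeline to load is an assumption you should verify.

```python
import json
from pathlib import Path


def has_model_index(model_dir):
    """Return True if model_dir contains a model_index.json file."""
    return (Path(model_dir) / "model_index.json").is_file()


def write_minimal_model_index(model_dir, pipeline_class, components):
    """Write a minimal model_index.json (hypothetical helper, not an AutoRound API).

    components maps a pipeline attribute name to [library, class_name],
    e.g. {"transformer": ["diffusers", "YourTransformer2DModel"]}.
    """
    index = {"_class_name": pipeline_class}
    index.update(components)
    path = Path(model_dir) / "model_index.json"
    path.write_text(json.dumps(index, indent=2))
    return path
```

A typical call would be write_minimal_model_index("./your-model", "YourPipeline", {"transformer": ["diffusers", "YourTransformer2DModel"]}), where both class names are placeholders for your model's real classes.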
Pipeline Loading
diffusion_load_model() uses AutoPipelineForText2Image.from_pretrained() and
extracts pipe.transformer as the quantizable model. If your model uses a
different attribute (e.g., pipe.unet), this needs adjustment in
auto_round/utils/model.py.
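One possible shape for that adjustment is a small lookup that tries several attribute names in order. This is only a sketch: get_denoiser and DENOISER_ATTRS are hypothetical names, and the actual change in diffusion_load_model() may need to look different.

```python
from types import SimpleNamespace

# Candidate attribute names, checked in order. "transformer" and "unet" come
# from the text above; extend the tuple for other pipeline layouts.
DENOISER_ATTRS = ("transformer", "unet")


def get_denoiser(pipe, attrs=DENOISER_ATTRS):
    """Return (attr_name, module) for the first denoiser attribute found on pipe."""
    for attr in attrs:
        module = getattr(pipe, attr, None)
        if module is not None:
            return attr, module
    raise AttributeError(f"pipeline has none of the attributes {attrs}")


# Stand-in pipeline object used only to exercise the helper.
fake_pipe = SimpleNamespace(unet="unet-module")
name, module = get_denoiser(fake_pipe)
print(name)  # unet
```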
Step 2: Register Transformer Block Output Config
This is the most common adaptation needed. DIFFUSION_OUTPUT_CONFIGS maps
transformer block class names to their output tensor names. Without this,
calibration crashes because AutoRound doesn't know how to collect activations.
Find your block class name
import diffusers

pipe = diffusers.AutoPipelineForText2Image.from_pretrained("your-model")
for name, module in pipe.transformer.named_modules():
    if hasattr(module, "forward") and "block" in name.lower():
        print(f"{name}: {type(module).__name__}")
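The loop above prints every submodule whose name contains "block", which can be noisy. As an alternative sketch, the hypothetical helper count_block_types below counts class names only for indexed entries near the top of the module tree (for example "transformer_blocks.0"). It assumes nothing beyond a named_modules() method that yields (name, module) pairs, as torch.nn.Module provides, and the depth cutoff is a heuristic that may need tuning for nested models.

```python
from collections import Counter


def count_block_types(model, max_depth=2):
    """Count class names of indexed submodules such as "transformer_blocks.0".

    Only names with at most max_depth dotted parts are considered, so inner
    lists inside each block (e.g. "transformer_blocks.0.ff.net.0") are skipped.
    """
    counts = Counter()
    for name, module in model.named_modules():
        parts = name.split(".")
        if len(parts) <= max_depth and parts[-1].isdigit():
            counts[type(module).__name__] += 1
    return counts
```

Calling count_block_types(pipe.transformer) should then give one entry per repeated block class, which are the candidates for DIFFUSION_OUTPUT_CONFIGS keys.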
Register in DIFFUSION_OUTPUT_CONFIGS
Edit auto_round/algorithms/quantization/base.py:
class BaseQuantizers:
    DIFFUSION_OUTPUT_CONFIGS = {
        "FluxTransformerBlock": ["encoder_hidden_states", "hidden_states"],
        "FluxSingleTransformerBlock": ["encoder_hidden_states", "hidden_states"],
        "YourTransformerBlock": ["hidden_states"],
    }
The list must match the exact order of tensors returned by the block's
forward() method.
How to determine output tensor names
- Read the block's forward() method in diffusers source code
- Identify what tensors it returns (usually hidden_states, sometimes also encoder_hidden_states)
- List them in the order they're returned
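The steps above can be partly automated by parsing the source of forward() and listing what each return statement returns. The sketch below uses only the standard library (ast.unparse needs Python 3.9 or later); returned_names is a hypothetical helper, and it also reports returns of any nested functions, so treat its output as a hint to confirm against the source.

```python
import ast
import textwrap


def returned_names(forward_source):
    """List what each `return` statement in a function's source returns.

    Each entry is a list of source strings, one per returned value, in order.
    """
    tree = ast.parse(textwrap.dedent(forward_source))
    results = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Return) and node.value is not None:
            value = node.value
            elts = value.elts if isinstance(value, ast.Tuple) else [value]
            results.append([ast.unparse(e) for e in elts])
    return results


example = '''
def forward(self, hidden_states, encoder_hidden_states):
    hidden_states = hidden_states + 1
    return encoder_hidden_states, hidden_states
'''
print(returned_names(example))  # [['encoder_hidden_states', 'hidden_states']]
```

For a real block, the source string can be obtained with inspect.getsource(type(block).forward).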
Example: If forward() returns (hidden_states, encoder_hidden_states):
BaseQuantizers.DIFFUSION_OUTPUT_CONFIGS["YourBlock"] = ["hidden_states", "encoder_hidden_states"]
Example: If forward() returns just hidden_states:
BaseQuantizers.DIFFUSION_OUTPUT_CONFIGS["YourBlock"] = ["hidden_states"]
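Because a mismatch only surfaces as an assertion failure during calibration, it can help to check a new entry against one real block output first. The following is a minimal sketch of such a check; check_output_config is a hypothetical helper and not part of AutoRound, and the strings in the usage lines stand in for tensors.

```python
def check_output_config(block_class_name, block_outputs, output_configs):
    """Pair configured names with a block's outputs, failing early on a mismatch.

    block_outputs is whatever forward() returned: a single tensor or a tuple.
    """
    if block_class_name not in output_configs:
        raise KeyError(f"{block_class_name} is not registered in the output configs")
    names = output_configs[block_class_name]
    outputs = block_outputs if isinstance(block_outputs, (tuple, list)) else (block_outputs,)
    if len(names) != len(outputs):
        raise ValueError(
            f"{block_class_name}: config lists {len(names)} names "
            f"but forward() returned {len(outputs)} values"
        )
    return dict(zip(names, outputs))


configs = {"YourBlock": ["hidden_states", "encoder_hidden_states"]}
paired = check_output_config("YourBlock", ("h", "e"), configs)
print(paired)  # {'hidden_states': 'h', 'encoder_hidden_states': 'e'}
```

Note that the check only catches a wrong count. A swapped order has the right length and still needs to be confirmed by reading forward().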
Step 3: Handle Non-Standard Pipeline API
If your model's inference API differs from the standard
pipe(prompts, guidance_scale=..., num_inference_steps=...), provide a custom
pipeline function.
Option A: Add a custom pipeline dispatch in DiffusionCalibrator
Update auto_round/calibration/diffusion.py so DiffusionCalibrator.calib()
dispatches through a small helper instead of calling pipe(...) directly:
class DiffusionCalibrator(LLMCalibrator):
    ...

    def _run_pipeline(self, pipe, prompts, generator):
        if getattr(pipe, "_autoround_pipeline_fn", None) is not None:
            pipe._autoround_pipeline_fn(
                pipe,
                prompts,
                guidance_scale=self.compressor.guidance_scale,
                num_inference_steps=self.compressor.num_inference_steps,
                generator=generator,
            )
            return
        pipe(
            prompts,
            guidance_scale=self.compressor.guidance_scale,
            num_inference_steps=self.compressor.num_inference_steps,
            generator=generator,
        )
Option B: Attach a model-specific function during model loading
For a known model family, attach _autoround_pipeline_fn in
auto_round/utils/model.py or auto_round/special_model_handler.py:
pipe._autoround_pipeline_fn = your_model_pipeline_fn
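As a sketch of what your_model_pipeline_fn could look like, the adapter below accepts the keyword arguments that Option A passes and forwards them to a non-standard API. The method name your_custom_generate and its steps/cfg parameters are placeholders for whatever your pipeline actually exposes, and FakePipe exists only to make the example runnable.

```python
def your_model_pipeline_fn(pipe, prompts, guidance_scale, num_inference_steps, generator=None):
    """Adapt the standard calibration arguments to a non-standard pipeline API.

    `your_custom_generate`, `steps`, and `cfg` are placeholder names.
    """
    return pipe.your_custom_generate(
        prompts,
        steps=num_inference_steps,
        cfg=guidance_scale,
        generator=generator,
    )


class FakePipe:
    """Stand-in pipeline that records the arguments it receives."""

    def __init__(self):
        self.calls = []

    def your_custom_generate(self, prompts, steps, cfg, generator=None):
        self.calls.append({"prompts": prompts, "steps": steps, "cfg": cfg})


pipe = FakePipe()
pipe._autoround_pipeline_fn = your_model_pipeline_fn
pipe._autoround_pipeline_fn(pipe, ["a cat"], guidance_scale=7.5, num_inference_steps=5)
print(pipe.calls)  # [{'prompts': ['a cat'], 'steps': 5, 'cfg': 7.5}]
```

A function stored as an instance attribute is not bound to the instance, which is why the pipeline is passed explicitly as the first argument, matching the call in Option A.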
Option C: Add a dedicated branch in DiffusionCalibrator
For full control, update auto_round/calibration/diffusion.py so
DiffusionCalibrator.calib() dispatches through your custom pipeline function:
import torch


class DiffusionCalibrator(LLMCalibrator):
    ...

    def _run_pipeline(self, pipe, prompts):
        c = self.compressor
        # Build a seeded generator only when a seed was configured
        generator = None
        if c.generator_seed is not None:
            generator = torch.Generator(device=pipe.device).manual_seed(c.generator_seed)
        pipe.your_custom_generate(
            prompts,
            steps=c.num_inference_steps,
            cfg=c.guidance_scale,
            generator=generator,
        )
Step 4: Add Hybrid AR+DiT Support
This step applies to models with both autoregressive and diffusion components
(e.g., GLM-Image).
4a. Register AR component
Add hybrid routing through the new architecture. Start with
auto_round/autoround.py, auto_round/compressors/entry.py, and
auto_round/compressors/diffusion_mixin.py.
If a reusable AR-component registry is needed, place it near the new routing code:
HYBRID_AR_COMPONENTS = [
    "vision_language_encoder",
    "your_ar_component",
]
The attribute name must match what exists on the diffusers pipeline object
(i.e., pipe.your_ar_component).
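A quick way to confirm that the registered names match the pipeline is to look them up before wiring in any routing. The sketch below assumes the registry is a plain list of attribute names as shown above; find_ar_components is a hypothetical helper, and the SimpleNamespace object stands in for a real pipeline.

```python
from types import SimpleNamespace

HYBRID_AR_COMPONENTS = ["vision_language_encoder", "your_ar_component"]


def find_ar_components(pipe, candidates=HYBRID_AR_COMPONENTS):
    """Return {attribute_name: module} for candidates present on pipe and not None."""
    found = {}
    for name in candidates:
        module = getattr(pipe, name, None)
        if module is not None:
            found[name] = module
    return found


# Stand-in pipeline exposing one AR component and a DiT transformer.
fake_pipe = SimpleNamespace(your_ar_component="ar-module", transformer="dit-module")
print(find_ar_components(fake_pipe))  # {'your_ar_component': 'ar-module'}
```

An empty result suggests the attribute name is misspelled or the pipeline stores its AR component under a different name.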
4b. Register DiT block output config
Add the DiT-specific output config in BaseQuantizers.DIFFUSION_OUTPUT_CONFIGS:
BaseQuantizers.DIFFUSION_OUTPUT_CONFIGS["YourDiTBlock"] = ["hidden_states", "encoder_hidden_states"]
4c. Register AR block handler
In auto_round/special_model_handler.py, add a block handler for the AR
component so AutoRound knows which layers to quantize:
def _get_your_hybrid_multimodal_block(model, quant_vision=False):
    block_names = []
    if quant_vision and hasattr(model, "vision_encoder"):
        block_names.append([f"vision_encoder.blocks.{i}" for i in range(len(model.vision_encoder.blocks))])
    block_names.append([f"language_model.layers.{i}" for i in range(len(model.language_model.layers))])
    return block_names


SPECIAL_MULTIMODAL_BLOCK["your_model_type"] = _get_your_hybrid_multimodal_block
Hybrid quantization flow
The new hybrid flow should run two phases:
- Phase 1 (AR): Quantizes the AR component using text calibration data (MLLM-style)
- Phase 2 (DiT): Quantizes the DiT component using diffusion pipeline calibration
from auto_round import AutoRound

ar = AutoRound(
    "your-hybrid-model",
    dataset="coco2014",
    ar_dataset="NeelNanda/pile-10k",
    quant_ar=True,
    quant_dit=True,
)
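The two-phase ordering can be sketched as a small driver. This is an illustration only: run_hybrid_quantization and its two callables are hypothetical, and treating quant_ar and quant_dit as independent per-phase switches is an assumption based on their names, not a description of the actual routing code.

```python
def run_hybrid_quantization(quantize_ar_phase, quantize_dit_phase, quant_ar=True, quant_dit=True):
    """Run the enabled phases in order and return the names of those that ran."""
    ran = []
    if quant_ar:
        # Phase 1: AR component, calibrated on text data
        quantize_ar_phase()
        ran.append("ar")
    if quant_dit:
        # Phase 2: DiT component, calibrated by driving the diffusion pipeline
        quantize_dit_phase()
        ran.append("dit")
    return ran
```

Keeping the phases as separate callables makes it easy to run and debug one component at a time, for example with quant_ar=False while adapting the DiT side.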
Step 5: Add Custom Calibration Dataset (Optional)
If your model needs a specific dataset format, edit the diffusion calibration
path used by the new architecture:
- auto_round/calibration/diffusion.py for how diffusion prompts are loaded and consumed
- auto_round/calib_dataset.py for reusable dataset registration helpers
def get_diffusion_dataloader(dataset_name, nsamples, ...):
    if dataset_name == "your_custom_dataset":
        return _load_your_dataset(dataset_name, nsamples)
    ...
The default coco2014 dataset works for most text-to-image models. Custom
datasets need a TSV file with id and caption columns.
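For reference, reading such a TSV file takes only the standard library. The sketch below assumes the file has a header row naming the id and caption columns; load_caption_tsv is a hypothetical helper, and how AutoRound itself parses the file may differ.

```python
import csv
import io


def load_caption_tsv(fileobj, nsamples=None):
    """Read (id, caption) rows from a tab-separated file with a header row."""
    reader = csv.DictReader(fileobj, delimiter="\t")
    missing = {"id", "caption"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"TSV is missing required columns: {sorted(missing)}")
    rows = []
    for row in reader:
        if nsamples is not None and len(rows) >= nsamples:
            break
        rows.append((row["id"], row["caption"]))
    return rows


sample = io.StringIO("id\tcaption\n1\ta red bicycle\n2\ta bowl of fruit\n")
print(load_caption_tsv(sample, nsamples=1))  # [('1', 'a red bicycle')]
```

For a file on disk, pass open(path, newline="", encoding="utf-8") instead of the in-memory sample.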
Step 6: Test
from auto_round import AutoRound


def test_your_diffusion_model():
    # Small iters/nsamples keep the smoke test fast
    ar = AutoRound(
        "your-org/your-diffusion-model",
        scheme="W4A16",
        iters=2,
        nsamples=4,
        num_inference_steps=5,
        guidance_scale=7.5,
    )
    compressed_model, layer_config = ar.quantize()
    assert len(layer_config) > 0, "No layers quantized"
    ar.save_quantized(output_dir="./test_output", format="fake")
For hybrid models, test both phases:
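ar = AutoRound(
    "your-hybrid-model",
    quant_ar=True,
    quant_dit=True,
    iters=2,
    nsamples=4,
)
# Run quantization as in the test above so both phases are exercised
compressed_model, layer_config = ar.quantize()
assert len(layer_config) > 0, "No layers quantized"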
Checklist
Key Files
| File | Purpose |
|---|---|
| auto_round/algorithms/quantization/base.py | BaseQuantizers.DIFFUSION_OUTPUT_CONFIGS |
| auto_round/calibration/diffusion.py | DiffusionCalibrator, pipeline-driving calibration logic |
| auto_round/compressors/diffusion_mixin.py | Diffusion compressor mixin and calibrator routing |
| auto_round/compressors/entry.py | New-architecture AutoRoundCompatible factory routing |
| auto_round/utils/model.py | is_diffusion_model(), diffusion_load_model() |
| auto_round/special_model_handler.py | AR block handlers for hybrid models |
| auto_round/autoround.py | Model type routing (diffusion vs hybrid vs LLM) |
Reference: Existing Adaptations
| Model | Type | What Was Adapted |
|---|---|---|
| FLUX.1-dev | Pure DiT | DIFFUSION_OUTPUT_CONFIGS for FluxTransformerBlock/FluxSingleTransformerBlock |
| GLM-Image | Hybrid AR+DiT | AR routing + SPECIAL_MULTIMODAL_BLOCK + DiT DIFFUSION_OUTPUT_CONFIGS |
| NextStep | Custom pipeline | model-specific pipeline function attached by model handler / loader |