llava-onevision2-consistency
Bilingual guide for running and interpreting LLaVA-OneVision2 HF vs Megatron consistency checks across TP and PP settings
| name | llava-onevision2-consistency |
| description | Bilingual guide for running and interpreting LLaVA-OneVision2 HF vs Megatron consistency checks across TP and PP settings |
| compatibility | opencode |
| metadata | {"domain":"model-validation","framework":"llava-onevision2","repo":"llava-onevision2"} |
Use this skill when validating whether a HuggingFace checkpoint and a Megatron/MCore checkpoint are behaviorally consistent in this repository.
在这个仓库里,需要验证 HuggingFace checkpoint 和 Megatron/MCore checkpoint 是否行为一致时,使用这个 skill。
There are two test systems in this repo:
本仓库有两套测试系统:
tests/consistency/conftest.py — session fixtures, HF→mcore conversion, Megatron initialization
tests/consistency/test_model_consistency.py — 6 integration tests
tests/consistency/test_consistency_utils.py — 10 utility functions + 11 unit tests
tests/consistency/run_consistency_tests.sh — shell wrapper with auto-conversion + torchrun
tests/consistency/conftest.py —— session 级 fixture、HF→mcore 转换、Megatron 初始化
tests/consistency/test_model_consistency.py —— 6 个集成测试
tests/consistency/test_consistency_utils.py —— 10 个工具函数 + 11 个单元测试
tests/consistency/run_consistency_tests.sh —— shell 入口,自动转换 + torchrun
The legacy scripts examples/llava_onevision2/check_model_consistency.sh and examples/llava_onevision2/check_model_consistency.py are kept for historical reference only; use the pytest suite for new work.
The pytest suite assumes only the HF checkpoint exists as input. The mcore checkpoint is generated automatically via conversion.
pytest 测试套件假设 只有 HF checkpoint 作为输入。mcore checkpoint 通过转换 自动生成。
HF auto-model (input)
→ convert_4b_hf_to_mcore.sh (auto-run by conftest.py or run_consistency_tests.sh)
→ mcore checkpoint (generated)
→ both models loaded → 6 tests run
The pytest suite does not exercise the reverse path. For the p14m2 variant, two scripts ship for this:
| Script | Use case |
|---|---|
| examples/llava_onevision2/convert/convert_4b_p14m2_mcore_to_hf.sh | Single mcore→HF pass (deploy, inference debug) |
| examples/llava_onevision2/convert/convert_4b_p14m2_mcore_to_release.sh | Re-shard mcore via HF round-trip (change TP/PP without retraining) |
# mcore → HF (auto-detects /release subdir; pass either form)
bash examples/llava_onevision2/convert/convert_4b_p14m2_mcore_to_hf.sh \
/train_tmp/llava_onevision2_4b_p14m2_mcore_tp1pp1 \
/train_tmp/llava_onevision2_4b_p14m2_hf_out \
1 1
# Re-shard: mcore TP=1 PP=1 → mcore TP=2 PP=4 (round-trips through HF)
bash examples/llava_onevision2/convert/convert_4b_p14m2_mcore_to_release.sh \
/src_mcore_tp1pp1 /dst_mcore_tp2pp4 2 4 0,12,12,12
Round-trip correctness (TP=1 PP=1, verified 2026-05-25):
mcore → HF → mcore is bitwise identical to the original mcore checkpoint
(588 non-empty tensors compared, max abs diff = 0.000e+00, 0 shape mismatches,
0 missing keys). This is the strongest correctness guarantee for the reverse
path. Use this whenever changing TP/PP layout without retraining.
pytest 套件 不覆盖 反向路径。p14m2 variant 提供两个脚本(单次 mcore→HF
用于部署/推理 debug,mcore→release 用于通过 HF 中转改 TP/PP 切分)。
回环 mcore→HF→mcore 在 TP=1 PP=1 下与原始 mcore 逐位一致(588 个非空 tensor,
max abs diff = 0.000e+00,0 形状不匹配,0 缺失键,2026-05-25 验证)。
在不重训的前提下改 TP/PP layout 时使用回环。
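A round-trip check of this kind can be sketched as below. This is only an illustration, not the script that produced the numbers above: `compare_state_dicts` is a hypothetical helper, and it assumes both checkpoints have already been loaded into plain dicts mapping tensor names to NumPy arrays.

```python
import numpy as np


def compare_state_dicts(ref, other):
    """Summarize differences between two {name: ndarray} dicts (hypothetical helper)."""
    missing = sorted(set(ref) - set(other))
    shape_mismatches = []
    max_abs_diff = 0.0
    compared = 0
    for name in sorted(set(ref) & set(other)):
        a, b = ref[name], other[name]
        if a.shape != b.shape:
            shape_mismatches.append(name)
            continue
        if a.size == 0:
            continue  # empty tensors are not counted as compared
        compared += 1
        # Upcast so the subtraction itself does not lose precision.
        diff = float(np.max(np.abs(a.astype(np.float64) - b.astype(np.float64))))
        max_abs_diff = max(max_abs_diff, diff)
    return {
        "compared": compared,
        "max_abs_diff": max_abs_diff,
        "shape_mismatches": shape_mismatches,
        "missing_keys": missing,
    }
```

A clean round trip would report a `max_abs_diff` of 0.0 with empty `shape_mismatches` and `missing_keys` lists.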
Path convention note: convert_4b_p14m2_mcore_to_hf.sh auto-detects <load>/release, so you can pass either the parent dir or the explicit release path. Sibling scripts (4b, p14m3, p16m3, 8b, 30b) still require the explicit /release path.
路径约定说明:convert_4b_p14m2_mcore_to_hf.sh 会自动检测 <load>/release,父目录或显式 release 路径都可以传。Sibling 脚本仍要求显式 /release。
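The auto-detection behavior can be pictured with a small sketch. The conversion script is a shell script whose internals are not shown here, so `resolve_release_dir` is a hypothetical Python equivalent that only illustrates the "parent dir or explicit release path" rule.

```python
from pathlib import Path


def resolve_release_dir(load_path):
    """Return <load>/release when it exists, otherwise the given path (hypothetical helper)."""
    path = Path(load_path)
    if path.name == "release":
        return path  # caller already passed the explicit release path
    candidate = path / "release"
    return candidate if candidate.is_dir() else path
```

With this rule both `/ckpt` and `/ckpt/release` resolve to the same directory, which is why either form can be passed.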
tests/consistency/
├── __init__.py # empty package init / 空包初始化
├── conftest.py # 9 session fixtures (209 lines) / 9 个 session fixture(209 行)
├── test_consistency_utils.py # 10 utilities + 11 unit tests (373 lines, DO NOT MODIFY) / 10 个工具 + 11 个单元测试(373 行,不要改)
├── test_model_consistency.py # 6 integration tests (402 lines) / 6 个集成测试(402 行)
└── run_consistency_tests.sh # shell wrapper (60 lines) / shell 入口(60 行)
The table below lists the session-scoped fixtures exposed by conftest.py, with their purpose and default source:
| Fixture | Scope | Description |
|---|---|---|
| hf_model_path | session | HF auto-model directory (env: HF_MODEL_PATH) |
| converted_mcore_path | session | Auto-converts HF→mcore if MCORE_CHECKPOINT_PATH not set |
| preprocessor_path | session | Processor path (defaults to HF_MODEL_PATH) |
| test_image_path | session | Local test image (default: asset/performance.png) |
| megatron_init | session | Initializes Megatron via sys.argv override |
| hf_config | session | LlavaOnevision2Config.from_pretrained() |
| hf_vision_model | session | LlavaOnevision2Model.from_pretrained().visual on cuda bf16 |
| hf_cond_gen_model | session | LlavaOnevision2ForConditionalGeneration on cuda bf16 |
| mcore_model | session | Megatron get_model() + load_checkpoint() |
| hf_processor | session | AutoProcessor.from_pretrained() |
Compares all mapped weights between HF and mcore vision models:
比较 HF 和 mcore 视觉模型之间所有映射权重:
- QKV layout conversion via convert_hf_qkv_to_mcore_layout (interleaved Q/K/V per head) / 通过 convert_hf_qkv_to_mcore_layout 做 QKV 布局转换(每 head 交织 Q/K/V)
- TP-aware gather via _maybe_gather_tp_weight / 通过 _maybe_gather_tp_weight 做 TP-aware gather

Compares forward_debug outputs at 4 strategic points:
在 4 个关键点比较 forward_debug 输出:
- after_patch_embed: patch embedding output / patch embedding 输出
- rotary_pos_emb: rotary position embedding (aligned via align_rotary_debug_tensors) / 旋转位置编码(通过 align_rotary_debug_tensors 对齐)
- after_pre_layernorm: after pre-layernorm / 经过 pre-layernorm 之后
- before_adapter: final encoder output before adapter / 进入 adapter 之前的最终 encoder 输出

Compares vision + adapter pipeline output:
比较视觉 + adapter pipeline 输出:
- HF: forward_debug['after_merger']
- mcore: vision_model() → adapter()

Layer-by-layer comparison of all 24 encoder layers:
逐层比较所有 24 个 encoder 层:
- layer_{i}_input and layer_{i}_output for each layer / 每层的 layer_{i}_input 和 layer_{i}_output
- input_hidden_states: initial encoder input / 初始 encoder 输入
- final_output: final encoder output / 最终 encoder 输出
- Uses align_encoder_debug_tensors for shape alignment / 用 align_encoder_debug_tensors 做形状对齐

End-to-end LLM logits comparison:
端到端 LLM logits 比较:
- Loads LlavaOnevision2ForConditionalGeneration (HF) and the full mcore model / 加载 HF 的 LlavaOnevision2ForConditionalGeneration 和完整 mcore 模型

Validates HF model loading methods are equivalent:
验证 HF 模型加载方式等价:
- from_pretrained() vs manual load_file() from safetensors / from_pretrained() 对比从 safetensors 手动 load_file()
- Compares all vision weights (exact match via np.allclose) / 比较所有 vision 权重(用 np.allclose 做精确匹配)
- Compares forward_debug outputs (cosine > 0.9999) / 比较 forward_debug 输出(cosine > 0.9999)

| Variable | Default | Description |
|---|---|---|
| HF_MODEL_PATH | <path/to/hf_checkpoint> | HF checkpoint (the only required input) |
| MCORE_CHECKPOINT_PATH | (auto-generated) | Set to skip conversion |
| PREPROCESSOR_PATH | $HF_MODEL_PATH | Image processor path |
| TEST_IMAGE_PATH | $REPO_ROOT/asset/performance.png | Local test image |
| CONSISTENCY_TEST_TP | 1 | Tensor parallel size |
| CONSISTENCY_TEST_PP | 1 | Pipeline parallel size |
| AIAK_TRAINING_PATH | $REPO_ROOT | AIAK training framework root |
| AIAK_MAGATRON_PATH | $REPO_ROOT/aiak_megatron | AIAK Megatron path |
| MASTER_PORT | 29500 | Distributed master port |
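As a rough illustration of how these variables and their defaults fit together, the sketch below reads a subset of them and derives the process count as TP * PP. `read_consistency_env` is a hypothetical helper, not the code in conftest.py, and the returned key names are made up for this example.

```python
import os


def read_consistency_env(environ=None):
    """Read a subset of the variables above with their documented defaults (hypothetical helper)."""
    env = os.environ if environ is None else environ
    hf_model_path = env.get("HF_MODEL_PATH")
    if not hf_model_path:
        raise RuntimeError("HF_MODEL_PATH is the only required input and must be set")
    tp = int(env.get("CONSISTENCY_TEST_TP", "1"))
    pp = int(env.get("CONSISTENCY_TEST_PP", "1"))
    return {
        "hf_model_path": hf_model_path,
        # An empty or unset value means the mcore checkpoint must be generated.
        "mcore_checkpoint_path": env.get("MCORE_CHECKPOINT_PATH") or None,
        "preprocessor_path": env.get("PREPROCESSOR_PATH", hf_model_path),
        "tp": tp,
        "pp": pp,
        "world_size": tp * pp,  # one process per GPU, as in --nproc_per_node
        "master_port": int(env.get("MASTER_PORT", "29500")),
    }
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to exercise without touching the real environment.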
All Python must run inside the container llava_megatron_container_ax.
所有 Python 必须在容器 llava_megatron_container_ax 内运行。
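# Inside container, from repo root:
# 在容器内、仓库根目录执行:
# Default run (TP=1, PP=1, default marker filter)
bash tests/consistency/run_consistency_tests.sh
# Clear the marker filter so tests marked slow also run
bash tests/consistency/run_consistency_tests.sh -m ""
# Run with TP=2, PP=1 on a different master port
TP=2 PP=1 MASTER_PORT=29501 bash tests/consistency/run_consistency_tests.sh
# Reuse an existing mcore checkpoint and skip conversion
MCORE_CHECKPOINT_PATH=/path/to/existing bash tests/consistency/run_consistency_tests.sh
# Run only the unit tests for the utility functions
pytest tests/consistency/test_consistency_utils.py -v
# Run a single integration test
bash tests/consistency/run_consistency_tests.sh -k test_weight_consistency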
The wrapper script run_consistency_tests.sh performs these steps:
1. Validates that HF_MODEL_PATH and TEST_IMAGE_PATH exist / 校验 HF_MODEL_PATH 和 TEST_IMAGE_PATH 存在
2. If MCORE_CHECKPOINT_PATH is empty, runs convert_4b_hf_to_mcore.sh to generate it / 如果 MCORE_CHECKPOINT_PATH 为空,跑 convert_4b_hf_to_mcore.sh 生成
3. Sets PYTHONPATH to include transformers_impl/llavaonevision2, aiak_megatron, and the repo root / 把 transformers_impl/llavaonevision2、aiak_megatron、仓库根目录加入 PYTHONPATH
4. Launches torchrun --nproc_per_node=$((TP*PP)) with pytest / 用 torchrun --nproc_per_node=$((TP*PP)) 启动 pytest

Since pytest has its own arg parsing, Megatron CLI args can't be passed via the command line. The solution:
由于 pytest 有自己的参数解析,Megatron CLI 参数不能通过命令行传递。解决方案:
1. The shell script exports env vars (HF_MODEL_PATH, MCORE_CHECKPOINT_PATH, CONSISTENCY_TEST_TP/PP, etc.) / shell 脚本导出环境变量(HF_MODEL_PATH、MCORE_CHECKPOINT_PATH、CONSISTENCY_TEST_TP/PP 等)
2. conftest.py reads the env vars and temporarily overrides sys.argv with constructed Megatron CLI args / conftest.py 读取环境变量,临时把 sys.argv 替换成构造好的 Megatron CLI 参数
3. Calls parse_arguments() + initialize_aiak_megatron() inside the override / 在替换期内调用 parse_arguments() + initialize_aiak_megatron()
4. Restores sys.argv afterward / 完事后恢复 sys.argv

| Symptom | Likely Cause | Fix |
|---|---|---|
| weight_consistency fails on QKV | QKV layout conversion bug | Check convert_hf_qkv_to_mcore_layout for num_heads |
| weight_consistency fails on many keys | Wrong model / TP/PP mismatch | Verify HF_MODEL_PATH and conversion TP/PP |
| vision_encoder rotary_pos_emb fails | Debug tensor shape mismatch | Check align_rotary_debug_tensors — HF (1,S,64) vs mcore (S,32) |
| encoder_layer_wise late layers fail | Debug capture timing / layout | Usually not a real model bug if weight + merger pass |
| llm_output shape mismatch | Wrong tokenization or attention mask | Check prompt formatting and attention_mask.logical_not() |
| Megatron init fails | Wrong CLI args | Check _build_megatron_cli_args in conftest.py |
| Conversion fails | Missing AIAK_TRAINING_PATH | Export it before running |
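The sys.argv override that conftest.py uses for Megatron initialization (relevant when "Megatron init fails") can be sketched with a context manager. This is a minimal illustration under assumptions: `overridden_argv` is a hypothetical name, and the actual conftest.py may structure the override differently.

```python
import contextlib
import sys


@contextlib.contextmanager
def overridden_argv(argv):
    """Temporarily replace sys.argv, restoring it even if the body raises (hypothetical helper)."""
    saved = sys.argv
    sys.argv = list(argv)
    try:
        yield
    finally:
        sys.argv = saved


# Intended usage, shown as comments because it needs the repository's Megatron setup:
# with overridden_argv(["pytest"] + megatron_cli_args):
#     parse_arguments()
#     initialize_aiak_megatron()
```

Restoring in a `finally` block matters here: if argument parsing fails, pytest still sees its own original arguments afterward.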
| HF Key | mcore Key |
|---|---|
| embeddings.patch_embedding | patch_embed.proj |
| embeddings.class_embedding | class_embedding |
| layernorm_pre/post | pre_layernorm/post_layernorm |
| encoder.layers.{i}.layer_norm1 | decoder.layers.{i}.self_attention.linear_qkv.layer_norm |
| encoder.layers.{i}.self_attn.qkv | decoder.layers.{i}.self_attention.linear_qkv |
| encoder.layers.{i}.self_attn.proj | decoder.layers.{i}.self_attention.linear_proj |
| encoder.layers.{i}.layer_norm2 | decoder.layers.{i}.mlp.linear_fc1.layer_norm |
| encoder.layers.{i}.mlp.fc1/fc2 | decoder.layers.{i}.mlp.linear_fc1/fc2 |
QKV weights need layout conversion: HF stores [Q_all, K_all, V_all], mcore stores interleaved [Q_h0, K_h0, V_h0, Q_h1, K_h1, V_h1, ...].
QKV 权重需要布局转换:HF 存储 [Q_all, K_all, V_all],mcore 存储交织的 [Q_h0, K_h0, V_h0, Q_h1, K_h1, V_h1, ...]。
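The layout change amounts to a row permutation of the fused QKV weight. The sketch below computes that permutation in plain Python; it is not the repository's convert_hf_qkv_to_mcore_layout, `qkv_interleave_rows` is a hypothetical name, and it assumes Q, K, and V each have `num_heads * head_dim` rows (no grouped-query attention).

```python
def qkv_interleave_rows(num_heads, head_dim):
    """Row order that turns [Q_all, K_all, V_all] into [Q_h0, K_h0, V_h0, Q_h1, ...]."""
    hidden = num_heads * head_dim
    order = []
    for head in range(num_heads):
        for block in range(3):  # 0 = Q, 1 = K, 2 = V in the HF layout
            start = block * hidden + head * head_dim
            order.extend(range(start, start + head_dim))
    return order


# Example: 2 heads with head_dim 2, so Q occupies rows 0-3, K rows 4-7, V rows 8-11.
print(qkv_interleave_rows(2, 2))  # [0, 1, 4, 5, 8, 9, 2, 3, 6, 7, 10, 11]
```

With a tensor library the conversion is then an index select along the first dimension (for example `weight[order]`), and the same order applies to a fused bias vector.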
HF and Megatron expose different rotary_pos_emb debug shapes:
HF 和 Megatron 暴露不同形状的 rotary_pos_emb debug 张量:
- HF: (1, S, 64)
- mcore: (S, 32)

The align_rotary_debug_tensors function handles this by squeezing the batch dim and concatenating mcore's half-dim.
align_rotary_debug_tensors 函数通过去掉 batch 维度并拼接 mcore 的半维度来处理。
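A shape-level sketch of that alignment is shown below. It is not the repository's align_rotary_debug_tensors: `align_rotary_sketch` is a hypothetical name, and it assumes that "concatenating mcore's half-dim" means repeating the 32-wide tensor along the last axis to reach 64.

```python
import numpy as np


def align_rotary_sketch(hf_rotary, mcore_rotary):
    """Bring HF (1, S, 64) and mcore (S, 32) debug tensors to a common (S, 64) shape."""
    hf_aligned = np.squeeze(hf_rotary, axis=0)  # drop the batch dim: (S, 64)
    # Assumption: the full-width embedding is the half-width one repeated twice.
    mcore_aligned = np.concatenate([mcore_rotary, mcore_rotary], axis=-1)  # (S, 64)
    return hf_aligned, mcore_aligned
```

Once both tensors share a shape, they can be compared with the same tolerance checks as the other debug outputs.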
When PP > 1, not every pipeline stage owns vision_model, adapter, or decoder post-process outputs. Tests must skip non-owner stages.
当 PP > 1 时,不是每个 pipeline stage 都拥有 vision_model、adapter 或 decoder 后处理输出。测试必须跳过非 owner stage。
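One way to express the skip rule is a small ownership predicate. The mapping below is an assumption made for illustration only (vision_model and adapter on the first stage, decoder post-process outputs on the last); this guide does not state which stage owns which component, and `stage_owns` is a hypothetical helper.

```python
def stage_owns(component, pp_rank, pp_size):
    """Return True if the given pipeline stage is assumed to own the component."""
    if component in ("vision_model", "adapter"):
        return pp_rank == 0  # assumption: vision side lives on the first stage
    if component == "decoder_post_process":
        return pp_rank == pp_size - 1  # assumption: logits come from the last stage
    raise ValueError(f"unknown component: {component}")


# Inside a test, a non-owner stage would skip rather than fail, for example:
# if not stage_owns("vision_model", pp_rank, pp_size):
#     pytest.skip("this pipeline stage does not own vision_model")
```

Skipping instead of asserting keeps a multi-stage run from reporting failures on ranks that have nothing to compare.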
When TP > 1, use _maybe_gather_tp_weight to gather shards before comparison. It gathers along first dim for QKV/FC1, last dim for proj/FC2.
当 TP > 1 时,用 _maybe_gather_tp_weight 在比较前 gather shards。QKV/FC1 沿第一维 gather,proj/FC2 沿最后一维。
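The gather rule can be illustrated with in-process shards. The real _maybe_gather_tp_weight presumably collects shards across ranks; the sketch below only shows the concatenation axis per weight kind, and `gather_tp_shards` and its `kind` labels are hypothetical.

```python
import numpy as np

FIRST_DIM_KINDS = ("qkv", "fc1")   # sharded along the first (output) dim
LAST_DIM_KINDS = ("proj", "fc2")   # sharded along the last (input) dim


def gather_tp_shards(shards, kind):
    """Reassemble a full 2-D weight from per-rank shards, ordered by TP rank."""
    if kind in FIRST_DIM_KINDS:
        axis = 0
    elif kind in LAST_DIM_KINDS:
        axis = -1
    else:
        raise ValueError(f"unknown weight kind: {kind}")
    return np.concatenate(shards, axis=axis)
```

Shards must be passed in TP-rank order, since concatenation in a different order yields a weight with the right shape but wrong contents.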
HF and mcore models use the same 2x2 memory layout, so no pixel value conversion is needed between them.
HF 和 mcore 模型使用相同的 2x2 内存布局,无需转换 pixel values。
If weight_consistency + merger pass but encoder_layer_wise fails in late layers, suspect debug capture semantics rather than real model bugs.
如果 weight_consistency + merger 通过但 encoder_layer_wise 在后面层失败,优先怀疑 debug 捕获语义而非模型真错。
If the run fails, check in this order:
如果运行失败,按以下顺序排查:
1. Is the container running? docker exec -it llava_megatron_container_ax bash
2. Does HF_MODEL_PATH exist and contain safetensors files?
3. Did the HF→mcore conversion succeed? Check stderr output.
4. Does the container have enough GPUs for TP * PP?
5. Is MASTER_PORT already in use? Try a different port.
6. Did test_weight_consistency fail? → Fix this first before investigating other tests.
7. Is the failure in a @pytest.mark.slow test? → Run fast tests first with the default marker filter.

1. 容器是否在运行?docker exec -it llava_megatron_container_ax bash
2. HF_MODEL_PATH 是否存在且包含 safetensors 文件?
3. HF→mcore 转换是否成功?检查 stderr 输出。
4. 容器 GPU 数量是否满足 TP * PP?
5. MASTER_PORT 是否被占用?换一个端口试试。
6. test_weight_consistency 是否失败?→ 先修这个再看其他测试。
7. 失败的是否是 @pytest.mark.slow 测试?→ 先用默认 marker 跑 fast 测试。
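Two of the checks above (safetensors files present, MASTER_PORT free) are easy to automate before launching a run. The helpers below are a hypothetical preflight sketch using only the standard library; they are not part of the test suite.

```python
import socket
from pathlib import Path


def has_safetensors(model_dir):
    """True if the directory exists and contains at least one .safetensors file."""
    path = Path(model_dir)
    return path.is_dir() and any(path.glob("*.safetensors"))


def port_is_free(port, host="127.0.0.1"):
    """True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True
```

A free port at check time can still be taken before torchrun starts, so treat the result as a hint rather than a guarantee.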