package.json

"author": "EvolvingLMMs-Lab"

"repository": "EvolvingLMMs-Lab/LLaVA-OneVision-2"

| Script | Use case |
| --- | --- |
| `examples/llava_onevision2/convert/convert_4b_p14m2_mcore_to_hf.sh` | Single mcore→HF pass (deploy, inference debug) |
| `examples/llava_onevision2/convert/convert_4b_p14m2_mcore_to_release.sh` | Re-shard mcore via HF round-trip (change TP/PP without retraining) |

    # mcore → HF (auto-detects /release subdir; pass either form)
    bash examples/llava_onevision2/convert/convert_4b_p14m2_mcore_to_hf.sh \
        /train_tmp/llava_onevision2_4b_p14m2_mcore_tp1pp1 \
        /train_tmp/llava_onevision2_4b_p14m2_hf_out \
        1 1

    # Re-shard: mcore TP=1 PP=1 → mcore TP=2 PP=4 (round-trips through HF)
    bash examples/llava_onevision2/convert/convert_4b_p14m2_mcore_to_release.sh \
        /src_mcore_tp1pp1 /dst_mcore_tp2pp4 2 4 0,12,12,12

    tests/consistency/
    ├── __init__.py                  # empty package init / 空包初始化
    ├── conftest.py                  # 9 session fixtures (209 lines) / 9 个 session fixture（209 行）
    ├── test_consistency_utils.py    # 10 utilities + 11 unit tests (373 lines, DO NOT MODIFY) / 10 个工具 + 11 个单元测试（373 行，不要改）
    ├── test_model_consistency.py    # 6 integration tests (402 lines) / 6 个集成测试（402 行）
    └── run_consistency_tests.sh     # shell wrapper (60 lines) / shell 入口（60 行）

| Fixture | Scope | Description |
| --- | --- | --- |
| `hf_model_path` | session | HF auto-model directory (env: `HF_MODEL_PATH`) |
| `converted_mcore_path` | session | Auto-converts HF→mcore if `MCORE_CHECKPOINT_PATH` not set |
| `preprocessor_path` | session | Processor path (defaults to `HF_MODEL_PATH`) |
| `test_image_path` | session | Local test image (default: `asset/performance.png`) |
| `megatron_init` | session | Initializes Megatron via `sys.argv` override |
| `hf_config` | session | `LlavaOnevision2Config.from_pretrained()` |
| `hf_vision_model` | session | `LlavaOnevision2Model.from_pretrained().visual` on cuda bf16 |
| `hf_cond_gen_model` | session | `LlavaOnevision2ForConditionalGeneration` on cuda bf16 |
| `mcore_model` | session | Megatron `get_model()` + `load_checkpoint()` |
| `hf_processor` | session | `AutoProcessor.from_pretrained()` |
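The `megatron_init` fixture is described as initializing Megatron through a `sys.argv` override. A minimal sketch of that general pattern follows. The helper names `override_argv` and `build_cli_args` are hypothetical and are not taken from the repository, where the real argument builder is `_build_megatron_cli_args` in `conftest.py`. Only the two parallel-size flags, which are standard Megatron-LM arguments, are shown.

```python
import sys
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def override_argv(cli_args: List[str]) -> Iterator[None]:
    """Temporarily replace sys.argv so argparse-based init code sees cli_args."""
    saved = sys.argv
    program = saved[0] if saved else "pytest"
    sys.argv = [program] + list(cli_args)
    try:
        yield
    finally:
        # Always restore the original arguments, even if initialization raises.
        sys.argv = saved


def build_cli_args(tp: int, pp: int) -> List[str]:
    """Hypothetical stand-in for _build_megatron_cli_args (illustration only)."""
    return [
        "--tensor-model-parallel-size", str(tp),
        "--pipeline-model-parallel-size", str(pp),
    ]
```

Inside a session-scoped fixture, the Megatron initialization call would run within `with override_argv(build_cli_args(tp, pp)):` so that pytest's own command-line arguments never reach Megatron's argument parser.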

| Variable | Default | Description |
| --- | --- | --- |
| `HF_MODEL_PATH` | `<path/to/hf_checkpoint>` | HF checkpoint (the only required input) |
| `MCORE_CHECKPOINT_PATH` | (auto-generated) | Set to skip conversion |
| `PREPROCESSOR_PATH` | `$HF_MODEL_PATH` | Image processor path |
| `TEST_IMAGE_PATH` | `$REPO_ROOT/asset/performance.png` | Local test image |
| `CONSISTENCY_TEST_TP` | `1` | Tensor parallel size |
| `CONSISTENCY_TEST_PP` | `1` | Pipeline parallel size |
| `AIAK_TRAINING_PATH` | `$REPO_ROOT` | AIAK training framework root |
| `AIAK_MAGATRON_PATH` | `$REPO_ROOT/aiak_megatron` | AIAK Megatron path |
| `MASTER_PORT` | `29500` | Distributed master port |
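The defaults in the table can be read as a small resolution rule: only `HF_MODEL_PATH` must be provided, and every other variable falls back to a value derived from it or from the repository root. The sketch below illustrates that rule under stated assumptions. The function `resolve_consistency_env` is hypothetical, and the actual logic in `run_consistency_tests.sh` and `conftest.py` may differ.

```python
from typing import Dict, Mapping, Optional


def resolve_consistency_env(
    env: Mapping[str, str], repo_root: str
) -> Dict[str, Optional[str]]:
    """Apply the documented defaults to a mapping of environment variables."""
    hf_model_path = env.get("HF_MODEL_PATH")
    if not hf_model_path:
        # The table marks this as the only required input.
        raise ValueError("HF_MODEL_PATH is required")
    return {
        "HF_MODEL_PATH": hf_model_path,
        # None means "not set": the suite would auto-convert HF -> mcore.
        "MCORE_CHECKPOINT_PATH": env.get("MCORE_CHECKPOINT_PATH"),
        "PREPROCESSOR_PATH": env.get("PREPROCESSOR_PATH", hf_model_path),
        "TEST_IMAGE_PATH": env.get(
            "TEST_IMAGE_PATH", f"{repo_root}/asset/performance.png"
        ),
        "CONSISTENCY_TEST_TP": env.get("CONSISTENCY_TEST_TP", "1"),
        "CONSISTENCY_TEST_PP": env.get("CONSISTENCY_TEST_PP", "1"),
        "AIAK_TRAINING_PATH": env.get("AIAK_TRAINING_PATH", repo_root),
        # The variable name is spelled as documented (MAGATRON).
        "AIAK_MAGATRON_PATH": env.get(
            "AIAK_MAGATRON_PATH", f"{repo_root}/aiak_megatron"
        ),
        "MASTER_PORT": env.get("MASTER_PORT", "29500"),
    }
```

Calling `resolve_consistency_env(os.environ, repo_root)` before launching the tests is one way to print the effective configuration and confirm which checkpoint and parallel sizes a run will use.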

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| `weight_consistency` fails on QKV | QKV layout conversion bug | Check `convert_hf_qkv_to_mcore_layout` for `num_heads` |
| `weight_consistency` fails on many keys | Wrong model / TP/PP mismatch | Verify `HF_MODEL_PATH` and conversion TP/PP |
| `vision_encoder` `rotary_pos_emb` fails | Debug tensor shape mismatch | Check `align_rotary_debug_tensors`: HF (1,S,64) vs mcore (S,32) |
| `encoder_layer_wise` late layers fail | Debug capture timing / layout | Usually not a real model bug if weight + merger pass |
| `llm_output` shape mismatch | Wrong tokenization or attention mask | Check prompt formatting and `attention_mask.logical_not()` |
| Megatron init fails | Wrong CLI args | Check `_build_megatron_cli_args` in `conftest.py` |
| Conversion fails | Missing `AIAK_TRAINING_PATH` | Export it before running |

| HF Key | mcore Key |
| --- | --- |
| `embeddings.patch_embedding` | `patch_embed.proj` |
| `embeddings.class_embedding` | `class_embedding` |
| `layernorm_pre/post` | `pre_layernorm/post_layernorm` |
| `encoder.layers.{i}.layer_norm1` | `decoder.layers.{i}.self_attention.linear_qkv.layer_norm` |
| `encoder.layers.{i}.self_attn.qkv` | `decoder.layers.{i}.self_attention.linear_qkv` |
| `encoder.layers.{i}.self_attn.proj` | `decoder.layers.{i}.self_attention.linear_proj` |
| `encoder.layers.{i}.layer_norm2` | `decoder.layers.{i}.mlp.linear_fc1.layer_norm` |
| `encoder.layers.{i}.mlp.fc1/fc2` | `decoder.layers.{i}.mlp.linear_fc1/fc2` |
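The mapping above can be expressed as a small set of rename rules, which is handy when checking by hand which mcore module a failing HF key corresponds to. The sketch below is an illustration only: `map_module_name` is a hypothetical helper, not the repository's converter. It works on module names exactly as listed in the table and deliberately leaves out parameter suffixes and the QKV layout conversion, which the real conversion code handles. It also assumes that `layernorm_pre/post` and `fc1/fc2` pair up in the order written.

```python
import re
from typing import List, Optional, Tuple

# (pattern on the HF module name, replacement template for the mcore name)
_RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"embeddings\.patch_embedding"), "patch_embed.proj"),
    (re.compile(r"embeddings\.class_embedding"), "class_embedding"),
    (re.compile(r"layernorm_pre"), "pre_layernorm"),
    (re.compile(r"layernorm_post"), "post_layernorm"),
    (re.compile(r"encoder\.layers\.(\d+)\.layer_norm1"),
     r"decoder.layers.\1.self_attention.linear_qkv.layer_norm"),
    (re.compile(r"encoder\.layers\.(\d+)\.self_attn\.qkv"),
     r"decoder.layers.\1.self_attention.linear_qkv"),
    (re.compile(r"encoder\.layers\.(\d+)\.self_attn\.proj"),
     r"decoder.layers.\1.self_attention.linear_proj"),
    (re.compile(r"encoder\.layers\.(\d+)\.layer_norm2"),
     r"decoder.layers.\1.mlp.linear_fc1.layer_norm"),
    (re.compile(r"encoder\.layers\.(\d+)\.mlp\.fc([12])"),
     r"decoder.layers.\1.mlp.linear_fc\2"),
]


def map_module_name(hf_name: str) -> Optional[str]:
    """Return the mcore module name for an HF module name, or None if unmapped."""
    for pattern, template in _RULES:
        match = pattern.fullmatch(hf_name)
        if match:
            # expand() substitutes the captured layer index into the template.
            return match.expand(template)
    return None
```

Returning `None` for unmapped names, rather than passing them through, makes gaps in the rule set visible instead of silently producing keys that do not exist on the mcore side.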

name	llava-onevision2-consistency
description	Bilingual guide for running and interpreting LLaVA-OneVision2 HF vs Megatron consistency checks across TP and PP settings
compatibility	opencode
metadata	{"domain":"model-validation","framework":"llava-onevision2","repo":"llava-onevision2"}

llava-onevision2-consistency

Purpose / 用途

1. pytest test suite (recommended / 推荐)

2. Legacy monolithic script (reference only / 仅供参考)

Architecture / 架构

Direction: HF → mcore

Direction: mcore → HF (reverse / deploy / round-trip) / 反向：mcore → HF（部署 / 回环）

Test file structure / 测试文件结构

Fixtures in conftest.py / conftest.py 中的 fixtures

What the 6 tests check / 6 个测试检查什么

test_weight_consistency (fast)

test_vision_encoder_consistency_336px (fast)

test_mllm_after_merger_336px (fast)

test_encoder_layer_wise_consistency (slow)

test_llm_output_consistency (slow)

test_hf_loading_consistency (slow)

Environment variables / 环境变量

How to run / 怎么跑

Quick: run non-slow tests with auto-conversion / 快速：跑非 slow 测试 + 自动转换

Run all tests including slow / 跑全部测试（含 slow）

Custom TP/PP / 自定义 TP/PP

Skip conversion (pre-existing mcore checkpoint) / 跳过转换（已有 mcore checkpoint）

Run only unit tests (no GPU needed, works on host) / 只跑单元测试（不需要 GPU，host 上也能跑）

Run specific integration test / 跑指定的集成测试

What run_consistency_tests.sh does / run_consistency_tests.sh 做了什么

What conftest.py does for Megatron init / conftest.py 如何初始化 Megatron

How to interpret failures / 如何解读失败

Priority order for diagnosis / 诊断优先顺序

Common failure causes / 常见失败原因

Key weight mapping / 关键权重映射

Known repo-local lessons / 当前仓库已知经验

1. Rotary debug representation must be aligned

2. PP-aware testing is necessary

3. TP-aware weight comparison is necessary

4. HF and mcore use the same pixel value 2x2 memory layout

5. Encoder-layer-wise failures may be debug-layout issues

Minimal troubleshooting checklist / 最小排查清单
