package.json

"author": "EvolvingLMMs-Lab"

"repository": "EvolvingLMMs-Lab/LLaVA-OneVision-2"

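```python
import random

# Excerpt: __iter__ of LengthPoolSortDataset
def __iter__(self):
    pool = []
    for batch_idx, sample in enumerate(self.dataset):
        pool.append(sample)
        if len(pool) >= self.pool_size:
            pool.sort(key=self.key_fn)                 # 1. sort the pool by length
            shuffle_seed = 42 + batch_idx              # 2. deterministic seed: every rank fills its pool at the same batch_idx
            random.Random(shuffle_seed).shuffle(pool)  # 3. same seed, so every rank applies the same permutation
            for s in pool:
                yield s
            pool.clear()
    # Flush the final partial pool so trailing samples are not silently dropped
    if pool:
        pool.sort(key=self.key_fn)
        random.Random(42 + batch_idx).shuffle(pool)
        yield from pool
```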
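The cross-rank alignment rests on a property of Python's `random.Random`: `shuffle` picks its swap positions from the generator without looking at the list contents, so two generators with the same seed apply the same index permutation to any two lists of equal length. A minimal check, assuming illustrative pool values and using hypothetical helper names (`permutation_for`, `shuffled`):

```python
import random

def permutation_for(seed: int, n: int) -> list[int]:
    """Return the index permutation that random.Random(seed).shuffle applies to a length-n list."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx

def shuffled(values: list[int], seed: int) -> list[int]:
    """Shuffle a copy of values with a freshly seeded generator."""
    out = list(values)
    random.Random(seed).shuffle(out)
    return out

# Two ranks holding different, already-sorted pools of the same size (illustrative values).
rank0 = [100, 102, 105, 108, 5000]
rank1 = [101, 103, 106, 109, 5001]
seed = 42 + 4  # the seed the loop above uses when a pool of size 5 first fills

perm = permutation_for(seed, len(rank0))
# Output slot i on both ranks holds the element from sorted position perm[i].
assert shuffled(rank0, seed) == [rank0[p] for p in perm]
assert shuffled(rank1, seed) == [rank1[p] for p in perm]
print(perm, shuffled(rank0, seed), shuffled(rank1, seed))
```

Because both ranks index into their sorted pools with the same `perm`, neighbours in sorted order stay neighbours across ranks after the shuffle.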

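Example: each rank sorts its own pool, and because the seed is identical on every rank, the shuffle applies the same permutation = [3, 0, 4, 1, 2] everywhere:
Rank 0 output: [108, 100, 5000, 102, 105]
Rank 1 output: [109, 101, 5001, 103, 106]
Rank 2 output: [110, 99, 5002, 104, 107]
→ At each position, the lengths are approximately equal across ranks.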
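The worked example can be reproduced by sorting each rank's pool and reading it out in permutation order. The arrival order of each pool is made up for illustration, the permutation is copied from the example rather than derived from a real seed, and `apply_permutation` is a hypothetical helper:

```python
def apply_permutation(pool, perm, key_fn=lambda x: x):
    """Sort a pool by key_fn, then emit sorted[perm[0]], sorted[perm[1]], ..."""
    ordered = sorted(pool, key=key_fn)
    return [ordered[p] for p in perm]

perm = [3, 0, 4, 1, 2]
pools = {
    0: [5000, 105, 100, 108, 102],  # hypothetical arrival order on rank 0
    1: [101, 5001, 109, 103, 106],  # hypothetical arrival order on rank 1
    2: [107, 99, 110, 5002, 104],   # hypothetical arrival order on rank 2
}
outputs = {rank: apply_permutation(pool, perm) for rank, pool in pools.items()}
for rank, out in outputs.items():
    print(f"Rank {rank}:", out)
# Rank 0: [108, 100, 5000, 102, 105]
# Rank 1: [109, 101, 5001, 103, 106]
# Rank 2: [110, 99, 5002, 104, 107]

# Spread of lengths across ranks at each output position.
for position in range(len(perm)):
    lengths = [outputs[r][position] for r in outputs]
    print(position, max(lengths) - min(lengths))  # 2 at every position
```

Within one rank, consecutive outputs jump between roughly 100 and 5000, but across ranks every position differs by only 2.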

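| Scheme | Effect |
|---|---|
| Sort only, no shuffle | Every rank runs its short samples first and its long samples last. Early steps are very fast, later steps very slow, and training dynamics become unstable. |
| Sort + shuffle | Lengths vary randomly from step to step but stay consistent across ranks. Step time is stable and cross-rank synchronization overhead is small. |
| Neither sort nor shuffle | At each step, lengths are random and differ across ranks, so the fast ranks wait for the slow ones. |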
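The three schemes in the table can be compared with a toy cost model. This sketch assumes a synchronous step lasts as long as the longest sample on any rank, that step time is proportional to sample length, and that each rank holds a single pool of uniformly distributed synthetic lengths; `make_streams` and `per_step_cost` are hypothetical names:

```python
import random

def per_step_cost(rank_streams):
    """Assume a synchronous step lasts as long as the longest sample across ranks.
    Returns the per-step times and the total time faster ranks spend waiting."""
    step_times, idle = [], 0
    for lengths in zip(*rank_streams):  # one tuple of lengths per step
        slowest = max(lengths)
        step_times.append(slowest)
        idle += sum(slowest - n for n in lengths)
    return step_times, idle

def make_streams(scheme, num_ranks=4, n=1000, seed=0):
    """Build one pool of n synthetic lengths per rank and order it by scheme."""
    rng = random.Random(seed)
    streams = []
    for _ in range(num_ranks):
        data = [rng.randint(100, 5000) for _ in range(n)]
        if scheme in ("sort_only", "sort_shuffle"):
            data.sort()
        if scheme == "sort_shuffle":
            random.Random(42).shuffle(data)  # same seed on every rank
        streams.append(data)
    return streams

results = {s: per_step_cost(make_streams(s)) for s in ("sort_only", "sort_shuffle", "no_sort")}
for scheme, (steps, idle) in results.items():
    print(f"{scheme:13s} idle={idle:>9d} first_step={steps[0]:>5d} last_step={steps[-1]:>5d}")
```

Under this model, sort-only and sort + shuffle waste exactly the same idle time, because they pair the same samples across ranks and differ only in order. Sort-only step times never decrease, however, which is the drift the table warns about. No sorting wastes far more time on waiting.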

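| pool_size | Cross-rank length synchronization accuracy | Memory overhead / first-pool output latency |
|---|---|---|
| Small (~100) | Poor: each rank's quantile estimates are inaccurate | Low |
| Medium (~1000-10000) | Good: recommended range | Medium |
| Large (~full dataset) | Perfect synchronization | High |
| Limit = dataset size | Equivalent to a global sort | Impractical |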
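A rough simulation of the pool_size trade-off, assuming uniform synthetic lengths and 4 ranks. Smaller pools give each rank a noisier estimate of the length quantiles, so the cross-rank spread at each position grows, while the number of samples buffered before the first output equals pool_size. `pooled_stream`, `mean_spread` and `first_output_latency` are hypothetical names; `pooled_stream` follows the sort, seed and shuffle steps of the loop above, using plain integer lengths as the sort key:

```python
import random

def pooled_stream(lengths, pool_size, base_seed=42):
    """Sort each full pool, then shuffle it with a seed derived from the sample index."""
    pool = []
    for idx, n in enumerate(lengths):
        pool.append(n)
        if len(pool) >= pool_size:
            pool.sort()
            random.Random(base_seed + idx).shuffle(pool)
            yield from pool
            pool.clear()
    # A trailing partial pool is ignored here; the inputs below are sized to avoid one.

def mean_spread(pool_size, num_ranks=4, n=20_000, seed=0):
    """Average (max - min) length across ranks at each output position."""
    rng = random.Random(seed)
    streams = []
    for _ in range(num_ranks):
        lengths = [rng.randint(100, 5000) for _ in range(n)]  # synthetic lengths
        streams.append(list(pooled_stream(lengths, pool_size)))
    spreads = [max(col) - min(col) for col in zip(*streams)]
    return sum(spreads) / len(spreads)

def first_output_latency(pool_size):
    """Count how many input samples are consumed before the first sample is yielded."""
    consumed = []
    def source():
        for i in range(10 * pool_size):
            consumed.append(i)
            yield i
    next(pooled_stream(source(), pool_size))
    return len(consumed)

for size in (100, 2000):
    print(f"pool_size={size:>5d} mean_spread={mean_spread(size):8.1f} "
          f"first_output_after={first_output_latency(size)} samples")
```

In this synthetic setup, each rank's i-th output is an order statistic of pool_size draws, so the spread shrinks roughly with 1/sqrt(pool_size); real length distributions will shift the numbers but not the direction of the trade-off.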

name	length-pool-sort-dataset
description	Bilingual guide for understanding LengthPoolSortDataset cross-rank length synchronization mechanism in multi-GPU training
compatibility	opencode
metadata	{"domain":"distributed-training","framework":"megatron-energon","repo":"llava-onevision2"}

length-pool-sort-dataset

Purpose / 用途

Core mechanism / 核心机制

Three-step pipeline / 三步流水线

Pipeline position / 在 pipeline 中的位置

Why it accelerates training / 为什么能加速训练

The problem / 问题

The solution / 解决方案

Why it works — step by step / 为什么有效——逐步分析

Step 1: Sort aligns the i-th position across ranks / 排序对齐各 rank 第 i 个位置

Step 2: Same seed preserves alignment after shuffle / 同 seed 保持 shuffle 后的对齐

Why both sort AND shuffle are needed / 为什么排序和 shuffle 缺一不可

pool_size tuning / pool_size 调优

Multi-worker behavior (num_workers > 1) / 多 worker 行为

Checkpoint resume caveat / checkpoint 恢复注意事项

key_fn / 排序键

What to check during debugging / 调试时要检查什么

Expected outputs when using this skill / 使用本 skill 时的期望输出
