offline-packing-env-vars

Name: Offline Packing Env Vars
Author: EvolvingLMMs-Lab

// Bilingual guide for the OFFLINE_PACKING_BMR and OFFLINE_PACKED_DATA environment variables that control LLaVA-OneVision2 training-side packing — what each gate does, why both must be enabled together, MBS=1 requirement, and the dead OFFLINE_PACKING_VQA branch


stars:998

forks:72

updated:May 6, 2026 at 15:55

Related skills from the same repository (EvolvingLMMs-Lab/LLaVA-OneVision-2):

- llava-onevision2-consistency.md (2026-05-26): Bilingual guide for running and interpreting LLaVA-OneVision2 HF vs Megatron consistency checks across TP and PP settings
- merge-ov2.md (2026-05-26): Bilingual guide for merging ViT + LLM into LlavaOnevision2 HF checkpoint and validating weight/inference consistency
- commit-message.md (2026-05-06): Guide for writing clear, consistent git commit messages following this repository's conventions
- distributed-offline-packing.md (2026-04-28): Bilingual guide for running offline_packing/auto_pipe.sh across multiple nodes to produce padding-free packed WebDataset shards for SFT, with Energon Metadataset assembly
- cu-lengths-attention-flow.md (2026-03-26): Bilingual guide for understanding how cu_lengths controls attention behavior across ViT and LLM stages, and how patch_positions scope differs between the two
- length-pool-sort-dataset.md (2026-03-26): Bilingual guide for understanding LengthPoolSortDataset cross-rank length synchronization mechanism in multi-GPU training


| Env var | Status | Default | Read at | Effect |
| --- | --- | --- | --- | --- |
| `OFFLINE_PACKING_BMR` | ALIVE | `0` | `aiak_training_llm/data/multimodal/task_encoder.py:194` | Inside `PackedCaptioningSample` handling, unrolls each packed entry into a `MultiMixQASample` (BMR-style, with full prompt/caption messages). When `0`, falls through to the legacy `CaptioningSample` branch, which loses the multi-turn structure. |
| `OFFLINE_PACKED_DATA` | ALIVE | `0` | `aiak_training_llm/data/multimodal/task_encoder.py:363` | Inside `batch()`, replaces the dummy `cu_lengths = [[0]]` with the real per-sample `s.cu_lengths` stacked across the batch. Without this, the consumer side cannot construct `PackedSeqParams`. |
| `OFFLINE_PACKING_VQA` | DEAD | n/a | nowhere in `aiak_training_llm/` | Mentioned in the README and several legacy shell scripts under `examples/llava_onevision1_5/` and `examples/llava_onevision2/quick_start_video_2b/`, but no source file reads it. Setting it has zero runtime effect. Treat as documentation noise. |
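The table above lends itself to a pre-launch sanity check. The sketch below is a hypothetical helper (`check_packing_env` is not part of the repository). It assumes `OFFLINE_PACKING_BMR` is parsed the same way `batch()` parses `OFFLINE_PACKED_DATA` (an integer compared to 1), and it deliberately ignores `is_packing_enabled`, which can also open the batch-layer gate.

```python
import os


def check_packing_env(env=None, micro_batch_size=1):
    """Return a list of problems; an empty list means the settings look consistent."""
    env = os.environ if env is None else env
    # Assumption: both flags are integers compared to 1, as in the batch() gate.
    bmr_on = int(env.get("OFFLINE_PACKING_BMR", 0)) == 1
    packed_on = int(env.get("OFFLINE_PACKED_DATA", 0)) == 1

    problems = []
    if bmr_on != packed_on:
        problems.append(
            "mixed state: OFFLINE_PACKING_BMR and OFFLINE_PACKED_DATA must be "
            "both 1 or both 0/unset"
        )
    if packed_on and micro_batch_size != 1:
        problems.append("packing is on but micro-batch size is not 1")
    if "OFFLINE_PACKING_VQA" in env:
        problems.append("OFFLINE_PACKING_VQA is set, but no source file reads it")
    return problems
```

Calling `check_packing_env()` with no arguments inspects the current process environment; passing a plain dict makes the check easy to exercise in isolation.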

    # Cumulative sample lengths are needed for packing, otherwise use dummy values.
    cu_lengths = torch.tensor([[0]], dtype=torch.int32)
    max_lengths = torch.tensor([[0]], dtype=torch.int32)
    if self.is_packing_enabled or int(os.environ.get("OFFLINE_PACKED_DATA", 0)) == 1:
        cu_lengths = torch.stack([s.cu_lengths for s in samples])
        max_lengths = torch.tensor([s.max_length for s in samples], dtype=torch.int32)

    packed_seq_params = None
    ...
    if cu_lengths.shape == torch.Size([1, 1]):
        pass  # treat as not packed
    else:
        assert cu_lengths.shape[0] == 1, "micro-batch-size must be 1 for packing"
        packed_seq_params = PackedSeqParams(
            qkv_format="thd",
            cu_seqlens_q=cu_lengths[0],
            cu_seqlens_kv=cu_lengths[0],
            ...
        )
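To make the consumer's decision concrete without a GPU or Megatron installed, here is a minimal pure-Python imitation of the shape check. It uses nested lists in place of tensors and a plain dict in place of `PackedSeqParams`; the function name `consumer_packed_params` is hypothetical and only mirrors the control flow of the excerpt above.

```python
def consumer_packed_params(cu_lengths):
    """Mimic the consumer: dummy [[0]] means 'not packed', otherwise require one row."""
    shape = (len(cu_lengths), len(cu_lengths[0]))
    if shape == (1, 1):
        return None  # treated as not packed
    # One row per micro-batch entry, so more than one row means MBS > 1.
    assert shape[0] == 1, "micro-batch-size must be 1 for packing"
    return {
        "qkv_format": "thd",
        "cu_seqlens_q": cu_lengths[0],
        "cu_seqlens_kv": cu_lengths[0],
    }


print(consumer_packed_params([[0]]))        # dummy value: prints None
print(consumer_packed_params([[0, 3, 7]]))  # real offsets: prints the params dict
```

Passing two rows, for example `[[0, 3], [0, 4]]`, trips the assertion, which is the MBS=1 requirement in miniature.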

| BMR | PACKED_DATA | Result |
| --- | --- | --- |
| 0 | 0 | No packing. Each sample is treated independently. Slow but correct (if the data is unpacked). |
| 1 | 0 | SILENT BUG. Data is encoded as packed sub-samples (BMR) and `cu_lengths` is built, but `batch()` discards it in favor of the dummy `[[0]]`. The consumer sees `shape == [1,1]` → `packed_seq_params = None` → flash-attn applies a single causal mask across the entire packed sequence → attention leaks across sub-samples. The loss looks fine; the model silently learns wrong attention. |
| 0 | 1 | HIDDEN CORRUPTION. Sub-samples are encoded via the legacy path, so the boundaries in `cu_lengths` do not align with the token sequence. The consumer applies varlen attention with wrong offsets. |
| 1 | 1 | CORRECT. BMR encodes properly, PACKED_DATA forwards the real offsets, the consumer builds `PackedSeqParams`, and flash-attn applies a per-sub-sample causal mask via `cu_seqlens_q`/`cu_seqlens_kv`. |
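The "single causal mask" failure can be quantified with a toy example. The sketch below is illustrative only: it builds boolean masks in plain Python rather than calling flash-attn, and the helper names are hypothetical. It counts how many (query, key) pairs a single causal mask allows that a per-sub-sample mask derived from the cumulative offsets would forbid.

```python
def causal_mask(total_len):
    """Single causal mask over the whole packed sequence."""
    return [[k <= q for k in range(total_len)] for q in range(total_len)]


def packed_causal_mask(cu_seqlens):
    """Block-diagonal causal mask: each token only sees its own sub-sample."""
    total = cu_seqlens[-1]
    mask = [[False] * total for _ in range(total)]
    for start, end in zip(cu_seqlens, cu_seqlens[1:]):
        for q in range(start, end):
            for k in range(start, q + 1):
                mask[q][k] = True
    return mask


def leaked_pairs(cu_seqlens):
    """Count pairs allowed by the single mask but forbidden by the packed mask."""
    full = causal_mask(cu_seqlens[-1])
    packed = packed_causal_mask(cu_seqlens)
    return sum(
        f and not p
        for full_row, packed_row in zip(full, packed)
        for f, p in zip(full_row, packed_row)
    )


# Two sub-samples of lengths 3 and 2: both tokens of the second
# sub-sample can see all three tokens of the first.
print(leaked_pairs([0, 3, 5]))  # → 6
```

Leakage grows with the product of sub-sample lengths, which is why it matters most for long packed sequences even though the loss curve gives no obvious signal.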

    ┌─────────────────────────────────────────────────────────────────┐
    │ Offline preprocessing (auto_pipe.sh, separate skill)            │
    │ Produces WebDataset shards with PackedCaptioningSample format   │
    └──────────────────────────────┬──────────────────────────────────┘
                                   │ Energon dataloader yields PackedCaptioningSample
                                   ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │ task_encoder.encode_sample()                                    │
    │  if OFFLINE_PACKING_BMR == 1:                        ◄── GATE 1 │
    │    for each sub-sample → MultiMixQASample → encode_multi_mix_qa │
    │  else:                                                          │
    │    for each sub-sample → CaptioningSample → encode_captioning   │
    │  pack_selected_samples(l_samples)                               │
    │    → ImageTaskSamplePacked with cu_lengths=[0,L1,L1+L2,...]     │
    └──────────────────────────────┬──────────────────────────────────┘
                                   │
                                   ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │ task_encoder.batch()                                            │
    │  if is_packing_enabled or OFFLINE_PACKED_DATA==1:    ◄── GATE 2 │
    │    cu_lengths = stack([s.cu_lengths for s in samples])          │
    │  else:                                                          │
    │    cu_lengths = [[0]]  # dummy, signals "not packed"            │
    └──────────────────────────────┬──────────────────────────────────┘
                                   │ batch dict broadcast via tensor_parallel
                                   ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │ pretrain_llava_onevision2.get_batch_on_this_tp_rank()           │
    │  if cu_lengths.shape == [1,1]: packed_seq_params = None         │
    │  else:                                                          │
    │    assert cu_lengths.shape[0] == 1  # MBS=1 required            │
    │    packed_seq_params = PackedSeqParams(                         │
    │        qkv_format="thd",                                        │
    │        cu_seqlens_q=cu_lengths[0],                              │
    │        cu_seqlens_kv=cu_lengths[0], ...)                        │
    └──────────────────────────────┬──────────────────────────────────┘
                                   │
                                   ▼
    Model forward → flash-attn varlen (see cu-lengths-attention-flow skill)
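The `cu_lengths=[0,L1,L1+L2,...]` layout in the flow is a running sum of sub-sample lengths with a leading zero. The following sketch shows that layout and how the offsets recover sub-sample boundaries. It is not the repository's `pack_selected_samples`; `build_cu_lengths` and `split_by_cu_lengths` are hypothetical names used for illustration.

```python
from itertools import accumulate


def build_cu_lengths(sub_sample_lengths):
    """Cumulative offsets [0, L1, L1+L2, ...] for a packed sequence."""
    return [0, *accumulate(sub_sample_lengths)]


def split_by_cu_lengths(tokens, cu_lengths):
    """Recover the sub-samples from a packed token list using the offsets."""
    return [tokens[start:end] for start, end in zip(cu_lengths, cu_lengths[1:])]


cu = build_cu_lengths([3, 2, 4])
print(cu)                                       # → [0, 3, 5, 9]
print(split_by_cu_lengths(list(range(9)), cu))  # → [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
```

If the offsets were computed for a different encoding than the one that produced the tokens, the slices would cut in the wrong places, which is the misalignment that arises when only one of the two gates is enabled.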

    # ───────────────────────────────────────────────────────────
    # Packing env vars: both REQUIRED for padding-free training
    # Set both to '1' when DATA_PATH points to offline-packed shards
    # (PackedCaptioningSample format, e.g. produced by auto_pipe.sh)
    # Leave both as '0' (or unset) for unpacked datasets.
    # Mixed states are silent bugs; see offline-packing-env-vars skill.
    # ───────────────────────────────────────────────────────────
    export OFFLINE_PACKING_BMR='1'
    export OFFLINE_PACKED_DATA='1'

    # Hard requirement when packing is on
    MBS=1
    # Throughput knobs: GBS via grad-accum, longer SEQ_LEN, more PP (not MBS)
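With MBS pinned to 1, the global batch size has to come from data parallelism and gradient accumulation. The arithmetic below assumes the usual Megatron-style relation GBS = MBS × DP × accumulation steps; the function name and the example numbers are hypothetical and not taken from the repository's scripts.

```python
def grad_accum_steps(global_batch_size, data_parallel_size, micro_batch_size=1):
    """Accumulation steps needed to reach a global batch size at a fixed MBS."""
    per_step = micro_batch_size * data_parallel_size
    if global_batch_size % per_step != 0:
        raise ValueError(
            f"GBS={global_batch_size} is not divisible by MBS*DP={per_step}"
        )
    return global_batch_size // per_step


# Example: GBS=128 across 8 data-parallel ranks with MBS=1.
print(grad_accum_steps(128, 8))  # → 16
```

Because each packed micro-batch already holds several sub-samples, the effective number of training samples per step is larger than GBS suggests.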

| File | Lines | What |
| --- | --- | --- |
| `aiak_training_llm/data/multimodal/task_encoder.py` | 186-279 | `PackedCaptioningSample` branch + Gate 1 (`OFFLINE_PACKING_BMR`) |
| `aiak_training_llm/data/multimodal/task_encoder.py` | 359-365 | `batch()` Gate 2 (`OFFLINE_PACKED_DATA`) |
| `aiak_training_llm/data/multimodal/task_encoder.py` | 401-477 | `pack_selected_samples`: builds the real `cu_lengths` |
| `aiak_training_llm/train/pretrain/pretrain_llava_onevision2.py` | 145-168 | Consumer: `cu_lengths.shape` check + `PackedSeqParams` construction + MBS=1 assert |
| `aiak_training_llm/train/pretrain/pretrain_llava_onevision2.py` | 171-207 | SP padding for `packed_seq_params` (TP/SP-only path) |


name	offline-packing-env-vars
description	Bilingual guide for the OFFLINE_PACKING_BMR and OFFLINE_PACKED_DATA environment variables that control LLaVA-OneVision2 training-side packing — what each gate does, why both must be enabled together, MBS=1 requirement, and the dead OFFLINE_PACKING_VQA branch
compatibility	opencode
metadata	{"domain":"training-pipeline","framework":"llava-onevision2","repo":"llava-onevision2"}


offline-packing-env-vars

Purpose / 用途

TL;DR / 一句话总结

The Three Env Vars / 三个环境变量真相表

Two-Stage Gate Architecture / 两段式 Gate 架构

Gate 1 — Data Layer (`OFFLINE_PACKING_BMR`)

Gate 2 — Batch Layer (`OFFLINE_PACKED_DATA`)

Why both gates must fire / 为什么必须两个都开

MBS=1 Hard Requirement / MBS=1 硬性要求

End-to-End Flow / 端到端流程图

Recipe: Correct Stage-N Script Snippet / 正确的训练脚本片段

Concrete Stage-1 A/B Pair (this repo) / 本仓库的 Stage-1 A/B 对照

Diagnostics / 排查清单

Cross-References / 交叉引用

Source File Index / 源文件索引
