v2a-cross-domain-offline-rl

V2A methodology — unifying Value Alignment, Assignment, and dynamics alignment for cross-domain offline RL with heterogeneous datasets from multiple source domains collected by diverse behavior policies.

Install command

npx skills add https://github.com/hiyenwong/ai_collection --skill v2a-cross-domain-offline-rl

Copy and paste this command into Claude Code to install the skill

Source

hiyenwong/ai_collection

Stars: 1

Forks: 0

Updated: June 4, 2026 at 02:00

SKILL.md

name	v2a-cross-domain-offline-rl
description	V2A methodology — unifying Value Alignment, Assignment, and dynamics alignment for cross-domain offline RL with heterogeneous datasets from multiple source domains collected by diverse behavior policies.

V2A: Value Alignment + Assignment for Cross-Domain Offline RL

Paper: Unifying Value Alignment and Assignment in Cross-Domain Offline Reinforcement Learning with Heterogeneous Datasets
arXiv: 2605.24862
Authors: Zhongjian Qiao, Jiafei Lyu, Chenjia Bai, Peisong Wang, Siyang Gao, Shuang Qiu
Submitted: 24 May 2026 (accepted at ICML 2026)

Core Idea

Cross-domain offline RL aims to learn a target-domain policy from a small amount of target data combined with source data whose dynamics differ from the target's (a dynamics shift). When the source data comes from multiple source domains and was collected by diverse behavior policies, a critical yet overlooked issue emerges: value misassignment.

Value misassignment undermines value alignment, misleads data filtering toward selecting suboptimal samples, and loosens the suboptimality gap, degrading agent performance.

The proposed V2A framework integrates dynamics alignment, value alignment, and value assignment to address this.

Key Contributions

Identifies value misassignment in heterogeneous cross-domain offline RL; the authors present this as the first work to study the multi-source, multi-behavior-policy setting.
V2A framework with three components:
- Dynamics alignment via temporally-consistent modality representation learning
- Value alignment via modality-aware advantage learning
- Value assignment via selective data filtering
Empirical results: V2A significantly outperforms strong baselines in general heterogeneous cross-domain offline RL settings.

Method Details

V2A Framework

Dynamics Alignment
- Extract dynamics modalities from source datasets using temporally-consistent modality representation learning
- Learn representations that capture the underlying dynamics of each source domain
Value Alignment
- Modality-aware advantage learning to rectify value alignment across domains
- Ensures value estimates are comparable across different source domains
Value Assignment
- Data filtering paradigm to selectively share source data for policy learning
- Filters out samples that would cause value misassignment
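
As a rough illustration of the dynamics-alignment step, the sketch below assigns each source transition to a dynamics modality. It is not the paper's method: the state change s' - s stands in for a learned temporally-consistent representation, and a nearest-centroid rule stands in for modality extraction. The centroids, the transitions, and the function names are all hypothetical.

```python
import math

def transition_feature(state, next_state):
    """Crude stand-in for a learned dynamics representation: the state change s' - s."""
    return [n - s for s, n in zip(state, next_state)]

def nearest_modality(feature, centroids):
    """Index of the centroid closest to the feature (Euclidean distance)."""
    distances = [math.dist(feature, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical centroids, one per dynamics modality (assumed, not learned here).
centroids = [[0.1, 0.0], [1.0, 0.0]]

# Hypothetical source transitions as (state, next_state) pairs.
transitions = [
    ([0.0, 0.0], [0.1, 0.0]),
    ([0.5, 0.2], [1.4, 0.2]),
    ([1.0, 1.0], [1.2, 1.0]),
]

modalities = [
    nearest_modality(transition_feature(s, ns), centroids)
    for s, ns in transitions
]
print(modalities)
```

In a real system the feature would come from the trained dynamics encoder, and the modalities would be discovered from data rather than fixed by hand.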

Key Insight

Value misassignment arises when:

Source datasets have different dynamics (multiple domains)
Source datasets are collected by different behavior policies
Standard value alignment methods fail to account for these differences

V2A addresses this by:

First identifying the dynamics modality of each source sample
Then learning modality-aware value estimates
Finally filtering data based on both dynamics alignment AND value alignment
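
The second step above, modality-aware value estimation, can be pictured with a small sketch. It assumes each source sample already carries a modality id and an observed return, and it uses the per-modality mean return as a baseline in place of a learned critic. That substitution is a simplification for illustration, not the paper's advantage-learning objective; the sample values are made up.

```python
from collections import defaultdict
from statistics import mean

def modality_aware_advantages(samples):
    """samples: list of (modality_id, observed_return) pairs.

    Returns one advantage per sample, measured against the mean return
    of that sample's own modality rather than a single pooled baseline.
    """
    returns_by_modality = defaultdict(list)
    for modality, ret in samples:
        returns_by_modality[modality].append(ret)
    baselines = {m: mean(rs) for m, rs in returns_by_modality.items()}
    return [ret - baselines[modality] for modality, ret in samples]

# Hypothetical samples: modality 1 yields much larger returns than modality 0.
samples = [(0, 1.0), (0, 3.0), (1, 10.0), (1, 14.0)]
advantages = modality_aware_advantages(samples)
print(advantages)
```

With a single pooled baseline, every modality-0 sample would look worse than every modality-1 sample regardless of how good the action was within its own dynamics; the per-modality baseline removes that offset so advantages are comparable across domains.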

Implementation Considerations

Use a dynamics encoder to extract temporally-consistent representations.
Use modality-aware heads to estimate advantages, one head per dynamics cluster.
Set the data filtering threshold on a combined dynamics-alignment and value-alignment score.
Initialize the target-domain policy from the filtered source data.
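
A minimal sketch of the filtering threshold, under stated assumptions: the source only says the score combines dynamics and value alignment, so the weighted sum, the `weight` parameter, the score ranges, and the sample fields below are hypothetical choices made for illustration.

```python
def combined_score(dynamics_score, value_score, weight=0.5):
    """Weighted sum of the two alignment scores (an assumed combination rule)."""
    return weight * dynamics_score + (1.0 - weight) * value_score

def filter_source_samples(samples, threshold, weight=0.5):
    """Keep the ids of samples whose combined score reaches the threshold."""
    return [
        s["id"]
        for s in samples
        if combined_score(s["dynamics_score"], s["value_score"], weight) >= threshold
    ]

# Hypothetical source samples with scores in [0, 1]; higher means better aligned.
samples = [
    {"id": "a", "dynamics_score": 0.9, "value_score": 0.8},
    {"id": "b", "dynamics_score": 0.9, "value_score": 0.1},
    {"id": "c", "dynamics_score": 0.2, "value_score": 0.9},
]

kept = filter_source_samples(samples, threshold=0.6)
print(kept)
```

Sample "b" is well aligned in dynamics but poorly aligned in value, and "c" is the reverse; both fall below the threshold, which reflects the point that filtering should depend on both criteria rather than on dynamics alone.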

Activation Keywords

cross-domain offline RL, heterogeneous offline RL, value misassignment, value alignment, dynamics alignment, V2A, multiple source domains, offline RL transfer, dynamics shift, modality-aware RL, offline policy transfer

Related Work

Dynamics alignment: Matching source/target dynamics via representation learning
Value alignment: Ensuring value estimates are consistent across domains
Offline RL transfer: Using source data to bootstrap target policy
