
action-quantization-behavior-cloning

Establish regret bounds for behavior cloning with discretized actions combining statistical error and quantization error terms. Prove smoothness requirements for safe quantizer design, show that learning-based quantizers fail these requirements, and propose model-based augmentation to reduce error dependence from H² to H.


Install command

npx skills add https://github.com/ADu2021/skillXiv --skill action-quantization-behavior-cloning

Copy and paste this command into Claude Code to install the skill

Source

ADu2021/skillXiv

Stars: 3

Forks: 0

Updated: March 26, 2026 at 15:00

SKILL.md


name	action-quantization-behavior-cloning
title	Understanding Behavior Cloning with Action Quantization: Regret Bounds and Quantizer Design
version	0.0.3
engine	skillxiv-v0.0.3-claude-opus-4.6
license	MIT
url	https://arxiv.org/abs/2603.20538
keywords	["Behavior Cloning","Action Quantization","Imitation Learning","Theoretical Bounds","Quantizer Design"]
description	Establish regret bounds for behavior cloning with discretized actions combining statistical error and quantization error terms. Prove smoothness requirements for safe quantizer design, show that learning-based quantizers fail these requirements, and propose model-based augmentation to reduce error dependence from H² to H.

Understanding Behavior Cloning with Action Quantization

Research Question

Why do some action discretization schemes fail catastrophically in behavior cloning despite low in-distribution quantization error? What are the theoretical limits on BC with quantized actions?

Analytical Instrument: Regret Decomposition

Decompose the regret into two additive error sources:

Statistical Component: Uncertainty from finite samples: H√(log|Π|/n)

- Standard BC sample complexity
- Depends on policy class size |Π|, number of samples n, and horizon H

Quantization Component: Discretization error propagation: H²·εq

- Key insight: quadratic scaling in horizon H
- Comes from compounding state divergence across steps
- Even a small per-step quantization error εq is amplified by a factor of H² over long horizons

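Regret Bound (up to constant factors): R(n) ≲ H√(log|Π|/n) + H²·εq

The quantization term grows quadratically in H while the statistical term grows only linearly, and only the statistical term shrinks as n increases, so quantization error dominates at long horizons.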
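The two terms of the bound can be evaluated directly to see which one dominates for a given setting. The following is a minimal sketch with constant factors dropped; the function name `regret_terms` and its parameter names are illustrative choices, not taken from the paper.

```python
import math


def regret_terms(horizon, log_policy_class_size, n_samples, eps_q):
    """Return (statistical, quantization) terms of the bound, constants dropped.

    statistical  = H * sqrt(log|Pi| / n)
    quantization = H^2 * eps_q
    """
    statistical = horizon * math.sqrt(log_policy_class_size / n_samples)
    quantization = horizon ** 2 * eps_q
    return statistical, quantization


# Example sweep over horizons; the values of log|Pi|, n, and eps_q are arbitrary.
for H in (10, 50, 100):
    stat, quant = regret_terms(H, log_policy_class_size=10.0,
                               n_samples=10_000, eps_q=0.01)
    print(f"H={H}: statistical={stat:.3f}, quantization={quant:.3f}")
```

Increasing `n_samples` shrinks only the first returned value; the second is unaffected by the amount of data.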

Controls & Theoretical Framework

Probabilistic Incremental Input-to-State Stability (P-IISS): Formal condition for dynamics robustness under action perturbations. Captures when small action errors remain localized (don't compound exponentially).
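As a rough intuition for what "localized" means, consider a deterministic scalar linear system x_{t+1} = a·x_t + b·u_t. This toy system is an assumption made for illustration only; it is not the paper's probabilistic definition of P-IISS. Two rollouts that differ by a constant action error stay within a bounded gap when |a| < 1 and drift apart when |a| ≥ 1.

```python
def max_state_gap(a, b, action_error, steps):
    """Largest gap between two rollouts of x_{t+1} = a*x_t + b*u_t.

    Both rollouts start from the same state; one of them applies actions
    that are offset by a constant `action_error` at every step.
    """
    gap = 0.0
    worst = 0.0
    for _ in range(steps):
        # The gap obeys the same linear recursion, driven by the action error.
        gap = a * gap + b * action_error
        worst = max(worst, abs(gap))
    return worst


stable = max_state_gap(a=0.5, b=1.0, action_error=0.01, steps=100)
marginal = max_state_gap(a=1.0, b=1.0, action_error=0.01, steps=100)
unstable = max_state_gap(a=1.1, b=1.0, action_error=0.01, steps=100)
print(stable, marginal, unstable)
```

In the stable case the gap never exceeds |b|·action_error / (1 − |a|); in the marginal case it grows linearly with the number of steps, and in the unstable case it grows geometrically.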

Relaxed Total Variation Continuity (RTVC): Characterizes when quantized policies maintain smooth decision boundaries. The paper proves that general learning-based quantizers violate this requirement.

Binning-Based Quantizers Satisfy RTVC: Uniform discretization (binning) naturally preserves smoothness; learned quantizers optimize in-distribution error at the expense of smoothness guarantees.

# Quantizer Design Comparison:
#
# Learning-Based Quantizer (FAILS):
#   - Low in-distribution error (good on training data)
#   - Non-smooth boundaries between bins
#   - Fails RTVC: creates discontinuities in policy
#   - Result: exponential state divergence under deployment
#
# Binning-Based Quantizer (WORKS):
#   - Slightly higher in-distribution error
#   - Smooth boundaries by construction
#   - Satisfies RTVC: stable state evolution
#   - Result: controlled error propagation
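A uniform binning quantizer for a scalar action can be sketched as follows. It assumes actions lie in a known interval [low, high]; the names `make_uniform_quantizer` and `quantize` are hypothetical. The sketch only illustrates the bounded per-action error of binning and does not by itself verify RTVC.

```python
def make_uniform_quantizer(low, high, num_bins):
    """Build a quantizer that maps an action in [low, high] to its bin center."""
    width = (high - low) / num_bins

    def quantize(action):
        # Clip to the valid range, then find the bin index.
        clipped = min(max(action, low), high)
        index = min(int((clipped - low) / width), num_bins - 1)
        return low + (index + 0.5) * width

    return quantize, width


quantize, width = make_uniform_quantizer(low=-1.0, high=1.0, num_bins=8)
print(quantize(0.3), width)
```

For any in-range action, the quantization error is at most half a bin width, regardless of the training data. As a consequence, two actions a and b are mapped to values at most |a − b| + width apart.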

Key Findings

Finding 1: Quantization Dominates at Long Horizons. For H=100 and εq=0.01, the quantization term (100²·0.01 = 100) dominates the statistical term (100·√0.001 ≈ 3.2, taking log|Π|/n = 0.001). Collecting more data cannot help if the quantizer is poor, because only the statistical term shrinks with n.
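Setting the two terms equal, H·√(log|Π|/n) = H²·εq, gives the horizon beyond which the quantization term is the larger one: H* = √(log|Π|/n) / εq. A minimal sketch, with constants dropped and a hypothetical function name:

```python
import math


def crossover_horizon(log_policy_class_size, n_samples, eps_q):
    """Horizon H* at which H*sqrt(log|Pi|/n) equals H^2 * eps_q."""
    return math.sqrt(log_policy_class_size / n_samples) / eps_q


# Assumed setting: log|Pi|/n = 0.001 and eps_q = 0.01.
h_star = crossover_horizon(log_policy_class_size=1.0, n_samples=1000, eps_q=0.01)
print(f"quantization dominates for H > {h_star:.2f}")
```

Under these assumed values the crossover is only a few steps, so at H=100 the quantization term is far past the point where it takes over.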

Finding 2: Smoothness Requirement is Non-Negotiable. Learned quantizers that minimize training error but violate RTVC suffer exponential error amplification. Collecting more or better data does not fix this.

Finding 3: Model-Based Augmentation Reduces Horizon Dependence. By learning an auxiliary transition model that keeps rollouts in-distribution, the quantization error drops from H²·εq to H·εq, an improvement from quadratic to linear scaling in the horizon.

# Model-based augmentation: learn auxiliary dynamics
# During imitation learning, simultaneously train:
# 1. Policy π_θ from expert demonstrations
# 2. Transition model M_φ on expert trajectory data
#
# Loss = Imitation(π) + TD(M)  # joint optimization
# During deployment: use model to detect out-of-distribution
# trajectories; apply corrective actions to return to safe region
#
# Result: quantization error scales as H·εq instead of H²·εq
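The difference between quadratic and linear scaling can be seen in a toy accounting model. This is an illustrative assumption, not the paper's algorithm: each step adds εq of state deviation, the per-step cost equals the current deviation, and the flag `reset_each_step` stands in for any mechanism that keeps the rollout in-distribution so that deviation does not carry over.

```python
def cumulative_error(horizon, eps_q, reset_each_step):
    """Sum of per-step deviations over a rollout in a toy error model."""
    deviation = 0.0
    total = 0.0
    for _ in range(horizon):
        if reset_each_step:
            # Deviation is pulled back each step, so only fresh error remains.
            deviation = eps_q
        else:
            # Deviation from earlier steps carries over and accumulates.
            deviation += eps_q
        total += deviation
    return total


compounding = cumulative_error(horizon=100, eps_q=0.01, reset_each_step=False)
corrected = cumulative_error(horizon=100, eps_q=0.01, reset_each_step=True)
print(compounding, corrected)
```

Without correction the total is εq·H(H+1)/2, which is of order H²·εq; with correction it is H·εq.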

Implications for Practitioners

- Use Binning, Not Learned Quantizers: Even if a learned quantizer has lower training error, binning provides the safety guarantees needed for deployment.
- Horizon Matters: Short-horizon tasks (H<20) tolerate poor quantizers; long-horizon tasks (H>50) require careful quantizer design.
- Dynamics Stability is Critical: Check system stability (the P-IISS property) before deployment. Marginally stable or unstable dynamics combined with quantization lead to failure.
- Model Augmentation ROI: If rollouts frequently go out-of-distribution, training an auxiliary transition model can be worth a 2× or greater error reduction.
- Sample Size Strategies:
  - For short horizons: increase samples (improves the statistical term)
  - For long horizons: improve quantizer design (addresses the quadratic H²·εq term)

Information-Theoretic Limits

Lower bounds prove that both terms of the regret decomposition (statistical + quantization) are unavoidable: no algorithm achieves better rates. However, practitioners can optimize:

- Quantizer smoothness (reduce εq via careful discretization)
- Horizon length (design tasks to require fewer steps)
- Dynamics stability (prefer deterministic or low-variance environments)

