| name | linear-recurrent-memory-pomdp-rl |
| description | Theoretical justification for why linear recurrent neural networks work as memory units in partially observable RL. Constructs linear filters that reproduce HMM belief logits and achieve vanishing state-decoding error under near-deterministic transitions. |
Linear Recurrent Memory in POMDP RL
Paper: arXiv:2605.31261 | Submitted: 29 May 2026
Authors: Yike Zhao, Onno Eberhard, Malek Khammassi, Ali H. Sayed, Michael Muehlebach
Core Concept
This work provides theoretical justification for the empirical success of linear recurrent neural networks (LRNNs) as memory units for reinforcement learning in partially observable Markov decision processes (POMDPs). It constructs two linear filters that together explain why LRNNs are effective.
Key Results
- Filter 1: Exactly reproduces pre-softmax logits of HMM belief vector → sufficient statistic for optimal policy
- Filter 2: Achieves vanishing state-decoding error under nearly deterministic transitions → reduces state ambiguity to near zero
Mathematical Framework
1. Hidden Markov Model (HMM) Setup
State: s_t ∈ S (hidden)
Observation: o_t ∈ O (observed)
Transition: P(s_t | s_{t-1})
Emission: P(o_t | s_t)
Belief vector b_t = P(s_t | o_1, ..., o_t)
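The belief vector can be computed recursively with the standard HMM forward filter (predict, weight by the emission likelihood, normalize). A minimal NumPy sketch follows; the row-stochastic convention for `T`, the emission matrix layout `E[j, o]`, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def belief_update(b_prev, T, E, obs):
    """One step of the HMM forward filter.

    Assumed conventions (not from the paper):
      T[i, j] = P(s_t = j | s_{t-1} = i)   (row-stochastic)
      E[j, o] = P(o_t = o | s_t = j)
    """
    predicted = T.T @ b_prev          # predict: push the belief through the transitions
    unnorm = E[:, obs] * predicted    # correct: weight by the emission likelihood
    return unnorm / unnorm.sum()      # normalize to a probability vector

# Toy 2-state, 2-observation HMM (illustrative numbers only)
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
E = np.array([[0.7, 0.3],
              [0.1, 0.9]])
b = np.array([0.5, 0.5])              # uniform prior before any observation
for obs in [0, 0, 1]:
    b = belief_update(b, T, E, obs)
```

The normalization step is what makes this update nonlinear in `b`; the filters in the next section work on logits instead.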
2. Linear Filter Construction
Filter 1 (Exact Belief Reproduction):
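For a deterministic, invertible transition matrix T (a permutation of the states):
ℓ_t = T^{-1} · ℓ_{t-1} + f(o_t)
where f(o_t) is an observation-dependent input and softmax(ℓ_t) = b_t, so ℓ_t reproduces the exact pre-softmax logits of the belief.
This ℓ_t is a sufficient statistic for the optimal policy π*(a | ℓ_t).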
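The recurrence above can be checked numerically on a toy HMM. The sketch below assumes a row-stochastic permutation matrix `T` (so that `T^{-1}` equals its transpose), strictly positive emissions, and takes f(o_t) to be the log-emission vector log P(o_t | ·); that choice of f is an assumption made for illustration and may differ from the paper's definition.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Deterministic 3-state cycle 0 -> 1 -> 2 -> 0 as a permutation matrix
T = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
A = np.linalg.inv(T)                  # equals T.T for a permutation matrix
E = np.array([[0.6, 0.2, 0.2],        # E[j, o] = P(o | s = j), all entries > 0
              [0.2, 0.6, 0.2],
              [0.2, 0.2, 0.6]])
obs_seq = [0, 1, 1, 2, 0]

b = np.full(3, 1 / 3)                 # exact belief, uniform prior
ell = np.log(b)                       # logits of the same prior
for o in obs_seq:
    # Exact Bayes filter: multiply, then normalize (nonlinear in b)
    b = E[:, o] * (T.T @ b)
    b /= b.sum()
    # Linear recurrence on logits: no normalization, no nonlinearity
    ell = A @ ell + np.log(E[:, o])
    assert np.allclose(softmax(ell), b)
```

The check works because an elementwise exponential commutes with a permutation, so permuting logits is the same as permuting probabilities; the missing normalizer is absorbed by the softmax.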
Filter 2 (Vanishing Decoding Error):
For nearly deterministic T (each row places almost all of its probability on a single next state):
e_decoder → 0 as T → deterministic
State ambiguity is reduced to near zero.
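The paper's Filter 2 construction is not reproduced here. As a rough illustration of the limiting behavior only, the sketch below runs a linear logit filter built from a deterministic limit `P` against the exact belief under a perturbed transition matrix `T_eps = (1 - eps) P + eps / n`, and measures the L1 gap between the two. The perturbation model, the helper name `belief_gap`, and the gap metric are all assumptions chosen for this example; they are a proxy for, not a definition of, the decoding error e_decoder.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def belief_gap(eps, obs_seq):
    """L1 gap between the exact belief under T_eps and the softmax of a
    linear logit filter that uses only the deterministic limit P."""
    n = 3
    P = np.roll(np.eye(n), 1, axis=1)           # cycle 0 -> 1 -> 2 -> 0
    T_eps = (1 - eps) * P + eps / n             # rows still sum to 1
    E = np.full((n, n), 0.2) + 0.4 * np.eye(n)  # E[j, o] = P(o | s = j)
    b = np.full(n, 1 / n)
    ell = np.log(b)
    for o in obs_seq:
        b = E[:, o] * (T_eps.T @ b)             # exact filter, noisy transitions
        b /= b.sum()
        ell = P.T @ ell + np.log(E[:, o])       # linear filter, ignores the noise
    return np.abs(softmax(ell) - b).sum()

obs_seq = [0, 1, 1, 2, 0]
gaps = {eps: belief_gap(eps, obs_seq) for eps in (0.0, 1e-8, 1e-4, 1e-1)}
```

At `eps = 0` the two filters coincide up to rounding, and for a fixed observation sequence the gap tends to zero as `eps` does, since the exact belief depends continuously on the transition matrix.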
3. Action-Controlled Extension
For POMDPs with action-dependent transitions:
T(a): transition matrix conditioned on action a
ℓ_t = T(a_{t-1})^{-1} · ℓ_{t-1} + f(o_t, a_{t-1})
The linear filter becomes time-varying: the recurrence matrix applied at each step is selected by the action taken.
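A small sketch of the action-controlled case, under the same assumptions as a deterministic HMM filter: each `T[a]` is a row-stochastic permutation matrix, the action that drives the transition into s_t is indexed as a_{t-1}, and the emission model is kept action-independent for brevity. The two actions ("stay" and "advance") and all numbers are hypothetical.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

n = 3
# Hypothetical deterministic dynamics: action 0 stays put, action 1 advances the cycle
T = {0: np.eye(n), 1: np.roll(np.eye(n), 1, axis=1)}
A = {a: T[a].T for a in T}                   # inverse of a permutation is its transpose
E = np.full((n, n), 0.2) + 0.4 * np.eye(n)   # E[j, o] = P(o | s = j), action-independent here

trajectory = [(1, 1), (0, 1), (1, 2), (1, 0)]  # (action a_{t-1}, observation o_t)
b = np.full(n, 1 / n)
ell = np.log(b)
for a, o in trajectory:
    b = E[:, o] * (T[a].T @ b)               # exact belief update under action a
    b /= b.sum()
    ell = A[a] @ ell + np.log(E[:, o])       # linear in ell; the matrix is selected by a
    assert np.allclose(softmax(ell), b)
```

The update stays linear in `ell` at every step; only the choice of matrix varies with the action.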
Why Linear RNNs Work
Sufficient Statistics Property
LRNNs approximate the belief vector computation:
- Linear recurrences can reproduce HMM belief logits
- The belief is a sufficient statistic of the history, so it is the optimal memory for POMDP decision making
- Linear structure preserves this information
Near-Deterministic Case
When environment transitions are nearly deterministic:
- LRNNs achieve near-perfect state decoding
- Memory requirements are minimal
- A linear recurrence is sufficient for state tracking
Implementation Guidelines
1. Linear Recurrent Unit Design
h_t = A · h_{t-1} + B · x_t
output = C · h_t
- A: Transition dynamics matrix (approximates the inverse T^{-1} of the HMM transition matrix)
- B: Observation embedding
- C: Policy readout
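A minimal sketch of such a unit with hand-set weights, to show how A, B, and C map onto the filter: `A` is the inverse of a toy permutation transition matrix, `B` holds log-emission columns so that a one-hot observation selects log P(o | ·), and `C` is a random placeholder readout. The class name `LinearRecurrentUnit` and all numbers are hypothetical; in practice the three matrices would be learned rather than set by hand.

```python
import numpy as np

class LinearRecurrentUnit:
    """h_t = A h_{t-1} + B x_t,  output_t = C h_t  (no nonlinearity in the recurrence)."""

    def __init__(self, A, B, C, h0):
        self.A, self.B, self.C = A, B, C
        self.h = h0.copy()

    def step(self, x):
        self.h = self.A @ self.h + self.B @ x
        return self.C @ self.h

n_states, n_obs, n_actions = 3, 3, 2
P = np.roll(np.eye(n_states), 1, axis=1)                      # deterministic cycle (toy)
E = np.full((n_states, n_obs), 0.2) + 0.4 * np.eye(n_states)  # E[j, o] = P(o | s = j)
A = P.T                                                       # T^{-1} for a permutation
B = np.log(E)                                                 # column o = log P(o | s)
C = np.random.default_rng(0).normal(size=(n_actions, n_states))  # placeholder readout
lru = LinearRecurrentUnit(A, B, C, h0=np.log(np.full(n_states, 1 / n_states)))

for o in [0, 1, 1]:
    x = np.eye(n_obs)[o]                                      # one-hot observation
    action_scores = lru.step(x)
```

With these weights the hidden state `lru.h` carries the belief logits, and `action_scores` is a linear readout of them.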
2. Key Design Principles
- Match transition structure: A should approximate T^{-1} of environment
- Observation encoding: B transforms observations to filter updates
- Dimension: the hidden dimension is set by the cardinality of the state space |S| (one logit per hidden state)
3. Training Considerations
- LRNNs can be trained via standard RL (policy gradient, actor-critic)
- The recurrence is linear in the hidden state, so it can be unrolled in closed form, which keeps optimization efficient
- No complex nonlinear memory (e.g. LSTM or GRU gating) is needed
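One commonly cited reason linear recurrences are convenient to train is that they unroll into a closed-form sum, h_T = Σ_k A^{T-k} B x_k (with h_0 = 0), whose terms do not depend on one another. The sketch below only verifies that identity on random matrices; it is not a training loop, and the dimensions and scaling are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, steps = 4, 3, 6
A = 0.5 * rng.normal(size=(d, d))
B = rng.normal(size=(d, m))
xs = rng.normal(size=(steps, m))

# Sequential evaluation of h_t = A h_{t-1} + B x_t with h_0 = 0
h = np.zeros(d)
for x in xs:
    h = A @ h + B @ x

# Unrolled form: each term A^(steps-1-k) B x_k can be computed independently
h_unrolled = sum(np.linalg.matrix_power(A, steps - 1 - k) @ B @ xs[k]
                 for k in range(steps))
assert np.allclose(h, h_unrolled)
```

Because the terms are independent, they can be evaluated in parallel across time steps, which is not possible when a nonlinearity sits inside the recurrence.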
Applications
Primary Use Cases
- POMDP RL - partially observable environments
- Recurrent policy networks - memory-efficient architectures
- Belief-based planning - optimal decision making under uncertainty
- Efficient RL architectures - reduced computational complexity
When to Use Linear Recurrent Memory
- Partially observable environments
- Near-deterministic dynamics
- Need for efficient, interpretable memory
- Belief-based decision making
Related Architectures
- Mamba/SSMs: Linear recurrent state-space models
- Linear Transformers: Linear attention mechanisms
- RWKV: Receptance Weighted Key Value
Performance Insights
From paper experiments:
- Linear filter serves as strong feature extractor
- Matches or exceeds nonlinear RNN performance in tested games
- More efficient than LSTM/GRU for near-deterministic POMDPs
Activation Keywords
- linear recurrent memory RL
- POMDP belief tracking
- sufficient statistic RL
- state decoding error
- HMM belief vector
- linear RNN theory
- recurrent memory justification
Related Skills
- [[efficient-tdmpc]] - Model-based RL with memory
- [[precise-sde-consistent-rl-flow-matching]] - Flow matching RL
- [[rat-randomized-advantage-transformation]] - Policy optimization
References
- Paper: arXiv:2605.31261 - "Why Linear Recurrent Memory Works in Partially Observable Reinforcement Learning"
- HMM theory: belief vector computation
- Linear RNN architectures: Mamba, RWKV, Linear Transformers
Category: reinforcement-learning, POMDP, recurrent-memory, theoretical-RL