name	llm-sleep-paradigm-self-modify-consolidate
description	Sleep-Dreaming paradigm for LLM continual learning via Knowledge Seeding (on-policy distillation + RL imitation learning) and RL-generated synthetic curriculum for self-improvement.

Language Models Need Sleep: Self-Modification and Memory Consolidation

Core Concept

Inspired by human learning, the paper introduces a Sleep paradigm that enables LLMs to:

Continually learn and transfer in-context knowledge to long-term parameters
Distill short-term fragile memories into stable knowledge
Recursively improve with self-generated training data

Two-Stage Process

Stage 1: Memory Consolidation (Knowledge Seeding)

Upward distillation: memories held by a smaller version of the model are distilled into a larger network
Generalized Distillation: combines on-policy distillation with RL-based imitation learning
Preserves existing knowledge while expanding capacity
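
The on-policy distillation part of Knowledge Seeding can be illustrated with a toy next-token example. This is a minimal sketch, not the paper's implementation: it assumes the objective is the reverse KL divergence KL(student || teacher), taken in expectation under the student's own distribution (which is what makes it on-policy), and it optimizes raw logits directly instead of network weights. The names `softmax`, `reverse_kl`, and `on_policy_distill_step` are hypothetical.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher) for a single next-token distribution."""
    return sum(
        s * math.log(s / t)
        for s, t in zip(student_probs, teacher_probs)
        if s > 0
    )

def on_policy_distill_step(student_logits, teacher_probs, lr=0.5):
    """One gradient-descent step on KL(student || teacher) w.r.t. student logits.

    Assumption: teacher_probs are strictly positive. Returns the updated
    logits and the KL value measured before the update.
    """
    s = softmax(student_logits)
    kl = reverse_kl(s, teacher_probs)
    # Analytic gradient: dKL/dz_i = s_i * (log(s_i / t_i) - KL)
    grads = [si * (math.log(si / ti) - kl) for si, ti in zip(s, teacher_probs)]
    new_logits = [z - lr * g for z, g in zip(student_logits, grads)]
    return new_logits, kl
```

Calling `on_policy_distill_step` repeatedly with a fixed `teacher_probs` drives the student distribution toward the teacher's. In a real system the teacher would be the model that holds the memory in context and the student the expanded network, with the expectation estimated from sequences sampled from the student.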

Stage 2: Dreaming (Self-Improvement)

RL generates a curriculum of synthetic data
The model rehearses new knowledge on this curriculum
The model refines its existing capabilities
No human supervision is required
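
One way to picture an RL-generated curriculum is a bandit that chooses which topic to generate synthetic practice for and is rewarded by how much the learner improves. The sketch below is a toy under stated assumptions: the source does not specify the reward or the policy, so the learning-progress reward, the epsilon-greedy policy, the fixed 10% mastery update, and the name `dream_curriculum` are all hypothetical stand-ins.

```python
import random

def dream_curriculum(skills, rounds=200, epsilon=0.1, seed=0):
    """Toy dreaming loop.

    `skills` maps topic -> mastery in [0, 1] (hypothetical learner state).
    Returns the updated mastery dict and how often each topic was practised.
    """
    rng = random.Random(seed)
    skills = dict(skills)                 # work on a copy, leave the input intact
    topics = list(skills)
    value = {t: 1.0 for t in topics}      # optimistic start so every topic gets tried
    counts = {t: 0 for t in topics}
    for _ in range(rounds):
        # Curriculum policy: epsilon-greedy over estimated learning progress.
        if rng.random() < epsilon:
            topic = rng.choice(topics)
        else:
            topic = max(topics, key=lambda t: value[t])
        # Rehearsal stand-in: practising closes 10% of the remaining gap.
        before = skills[topic]
        skills[topic] = before + 0.1 * (1.0 - before)
        reward = skills[topic] - before   # learning progress used as the RL reward
        counts[topic] += 1
        # Recency-weighted estimate so the policy tracks current progress.
        value[topic] += 0.5 * (reward - value[topic])
    return skills, counts
```

No labels enter the loop: the reward is computed entirely from the learner's own change in mastery, which mirrors the "no human supervision" property in this toy setting.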

Implementation

Knowledge Seeding process: on-policy distillation + RL imitation learning
Dreaming phase: RL-curriculum generation
Recursive self-improvement loop
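
The control flow of the recursive loop can be sketched as a wake phase followed by the two sleep stages. This is only a structural illustration: `ToyModel`, `wake`, `consolidate`, `dream`, and `sleep_cycle` are hypothetical names, "parameters" are modelled as a set of facts, and the dreaming step simply pairs stored facts instead of running RL.

```python
from dataclasses import dataclass, field

@dataclass
class ToyModel:
    params: set = field(default_factory=set)     # long-term knowledge
    context: list = field(default_factory=list)  # short-term, in-context memory

def wake(model, observations):
    """Accumulate new observations in the short-term context."""
    model.context.extend(observations)

def consolidate(model):
    """Stage 1 stand-in: move in-context facts into long-term parameters."""
    model.params.update(model.context)
    model.context.clear()

def dream(model):
    """Stage 2 stand-in: build synthetic items from facts already stored."""
    atoms = sorted(p for p in model.params if "+" not in p)
    synthetic = {f"{a}+{b}" for i, a in enumerate(atoms) for b in atoms[i + 1:]}
    model.params.update(synthetic)

def sleep_cycle(model, stream):
    """Alternate wake and sleep over a stream of observation batches."""
    for batch in stream:
        wake(model, batch)
        consolidate(model)
        dream(model)
    return model
```

For example, `sleep_cycle(ToyModel(), [["a", "b"], ["c"]])` leaves the context empty and stores both the observed facts and the combinations derived from them, so each cycle builds on what earlier cycles consolidated.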

Applications

Long-horizon tasks
Continual learning
Knowledge incorporation
Few-shot generalization

Source

arXiv: 2606.03979
Title: Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
Authors: Ali Behrouz, Farnoosh Hashemi, Vahab Mirrokni

Activation Keywords

sleep paradigm, knowledge seeding, dreaming phase, continual learning, self-modification, memory consolidation, synthetic curriculum
