name	precise-sde-consistent-rl-flow-matching
description	Precise — SDE-consistent stochastic sampling for RL post-training of flow-matching models with clean-latent posterior mean freezing.

Precise: SDE-Consistent RL Flow Matching

Overview

Stochastic sampler design for RL post-training of flow-matching models. Decomposes the sampler into two concerns: exploration amount (how much noise is injected) and discretization faithfulness (how closely the discrete update follows the continuous SDE). Uses "clean-latent posterior mean freezing" to keep sampling SDE-consistent at small step counts.

Core Methodology

Problem

Flow-matching models sample with a deterministic ODE, but RL needs a stochastic policy
Applying the standard reverse-time SDE directly introduces excess noise at small step counts
More injected noise improves exploration but reduces denoising stability
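
The excess-noise problem can be illustrated on a toy case. The sketch below is not taken from the source. It assumes the rectified-flow convention x_t = (1 - t) * x_0 + t * eps (t = 1 is pure noise), a reverse-time SDE that shares its marginals with the ODE, and data concentrated at x_0 = 0 so that the exact velocity is v = eps. The names euler_maruyama_step and noise_std_after_step are hypothetical. Under these assumptions the noise left at time s should have standard deviation s, and the code measures how far one discrete step overshoots that.

```python
import math

def euler_maruyama_step(x_t, v, t, s, sigma, z):
    """One Euler-Maruyama step of the reverse-time SDE from time t down to s < t.

    Assumes x_t = (1 - t) * x_0 + t * eps, so eps_hat = x_t + (1 - t) * v
    and the score is -eps_hat / t. z is a standard normal draw.
    """
    dt = t - s
    eps_hat = x_t + (1.0 - t) * v
    drift = v + sigma**2 / (2.0 * t) * eps_hat
    return x_t - dt * drift + sigma * math.sqrt(dt) * z

def noise_std_after_step(t, s, sigma):
    """Noise std in x_s for the toy case x_0 = 0, eps = 1 (so x_t = t, v = 1)."""
    kept = euler_maruyama_step(t, 1.0, t, s, sigma, 0.0)          # weight on the old noise
    fresh = euler_maruyama_step(t, 1.0, t, s, sigma, 1.0) - kept  # weight on the new noise
    return math.hypot(kept, fresh)

# The target noise level at time s is s itself; anything above it is excess.
coarse = noise_std_after_step(0.8, 0.4, 0.7) - 0.4  # one large step
fine = noise_std_after_step(0.8, 0.7, 0.7) - 0.7    # one small step
```

In this toy setting the excess is positive for any sigma > 0 and shrinks as the step gets smaller, which is why the issue matters most when only a few sampling steps are used.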

Solution: Precise Framework

Sampler Decomposition: Exploration amount + discretization faithfulness
SDE Schedule Design: Balance exploration vs. denoising stability
Clean-Latent Posterior Mean Freezing: Freeze posterior mean during sampling
SDE-Consistency: Ensure discretization matches continuous SDE at small steps

Key Insight

At small step counts, standard samplers add excess discretization noise. Precise freezes the clean-latent posterior mean to keep the denoising trajectory SDE-consistent.
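
One way to read the freezing idea, offered here as an assumption rather than the source's exact update rule: compute the clean-latent estimate once per step, hold it fixed across the whole step, and re-noise to the target level in closed form, in the style of a stochastic DDIM update. The sketch assumes the rectified-flow convention x_t = (1 - t) * x_0 + t * eps. The name frozen_mean_step and its sigma parameter are hypothetical.

```python
import math

def frozen_mean_step(x_t, v, t, s, sigma, z):
    """Step from time t to s < t with the clean-latent estimate held fixed.

    sigma is the std of freshly injected noise (0 <= sigma <= s) and z is a
    standard normal draw. Works on floats or on arrays supporting arithmetic.
    """
    if not 0.0 <= sigma <= s:
        raise ValueError("sigma must lie in [0, s] to keep the noise level at s")
    x0_hat = x_t - t * v            # clean-latent posterior mean estimate
    eps_hat = x_t + (1.0 - t) * v   # matching noise estimate
    # Keep sqrt(s^2 - sigma^2) of the old noise and add sigma of new noise,
    # so the total noise std is s for every allowed sigma.
    mean = (1.0 - s) * x0_hat + math.sqrt(s * s - sigma * sigma) * eps_hat
    return mean + sigma * z

# With sigma = 0 this is the deterministic Euler step x_t - (t - s) * v.
x_s = frozen_mean_step(x_t=0.5, v=1.2, t=0.8, s=0.4, sigma=0.0, z=0.0)
```

The design choice in this sketch is that sigma controls only how much of the noise is resampled, not how much noise remains, so the exploration amount can be changed without moving the trajectory off the intended noise level.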

Implementation Steps

Derive SDE schedule from ODE: add noise proportional to exploration needs
Identify clean-latent posterior mean in flow-matching architecture
Freeze posterior mean during sampling process
Tune exploration amount based on reward landscape
Use small step counts for RL efficiency
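
The steps above can be sketched as a short rollout loop that also records the per-step Gaussian log-probabilities a policy-gradient method would need. This is an illustration under stated assumptions, not the source's algorithm: it uses the rectified-flow convention x_t = (1 - t) * x_0 + t * eps, a scalar latent, a uniform time grid, and a hypothetical schedule sigma = explore * s. The names rollout, explore, and toy_velocity are invented for the example.

```python
import math
import random

def gaussian_log_prob(x, mean, sigma):
    """Log-density of a scalar Gaussian N(mean, sigma^2) at x."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def rollout(velocity_fn, x_init, num_steps, explore, rng):
    """Sample from t = 1 down to t = 0 and collect per-step log-probs.

    explore in [0, 1] sets the exploration amount: the fraction of the
    target noise level s that is resampled at each step (an assumption).
    """
    times = [1.0 - i / num_steps for i in range(num_steps + 1)]
    x, log_probs = x_init, []
    for t, s in zip(times[:-1], times[1:]):
        v = velocity_fn(x, t)
        x0_hat = x - t * v            # clean-latent estimate, fixed for this step
        eps_hat = x + (1.0 - t) * v   # matching noise estimate
        sigma = explore * s
        keep = math.sqrt(max(s * s - sigma * sigma, 0.0))
        mean = (1.0 - s) * x0_hat + keep * eps_hat
        if sigma > 0.0:
            x = mean + sigma * rng.gauss(0.0, 1.0)
            log_probs.append(gaussian_log_prob(x, mean, sigma))
        else:
            x = mean                  # the final step (s = 0) is deterministic
    return x, log_probs

def toy_velocity(x, t):
    """Exact velocity when all data sits at x_0 = 2.0 (toy stand-in for a model)."""
    return (x - 2.0) / t

rng = random.Random(0)
final, log_probs = rollout(toy_velocity, rng.gauss(0.0, 1.0), num_steps=4, explore=0.5, rng=rng)
```

In the toy case every trajectory ends at the data point regardless of the noise drawn, and one log-probability is recorded per stochastic step. With a real model, velocity_fn would wrap the network and the log-probabilities would feed the RL objective.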

Applications

RL post-training for image generation models
Human-preference alignment with reward models (PickScore, HPSv2.1)
Flow-matching models with reward optimization
Text-to-image generation RL

Pitfalls

Don't: Apply the standard reverse-time SDE directly at small step counts
Check: Posterior mean freezing is implemented correctly
Monitor: Training time relative to prior methods (a reduction is expected)
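
The implementation check can be made concrete with a small probe. The sketch below is a hypothetical helper, not something the source provides. It assumes the rectified-flow convention x_t = (1 - t) * x_0 + t * eps and a step function with the signature step_fn(x_t, v, t, s, sigma, z) that is linear in its inputs, then tests two properties on a toy case: the step reduces to the Euler ODE step when sigma = 0, and the noise level after the step equals the target s.

```python
import math

def check_sampler_step(step_fn, t=0.8, s=0.4, sigma=0.3, tol=1e-9):
    """Probe step_fn(x_t, v, t, s, sigma, z) -> x_s on a toy case.

    Toy case: x_0 = 0 and eps = 1, so x_t = t and the exact velocity is v = 1.
    """
    x_t, v = t, 1.0
    # Property 1: with no injected noise the step is the plain Euler ODE step.
    euler = x_t - (t - s) * v
    reduces_to_euler = abs(step_fn(x_t, v, t, s, 0.0, 0.0) - euler) < tol
    # Property 2: old and new noise together have std equal to the target s.
    kept = step_fn(x_t, v, t, s, sigma, 0.0)          # weight on the old noise
    fresh = step_fn(x_t, v, t, s, sigma, 1.0) - kept  # weight on the new noise
    preserves_noise_level = abs(math.hypot(kept, fresh) - s) < tol
    return {
        "reduces_to_euler": reduces_to_euler,
        "preserves_noise_level": preserves_noise_level,
    }
```

Pass in whatever step function the sampler uses. A step that passes the first property but fails the second is injecting more or less noise than the schedule intends at that step size.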

Related Skills

[[som-score-based-meanflow-policy-optimization]] — MeanFlow one-step policy
[[daca-grpo-denoising-credit-assignment]] — Denoising-aware GRPO

Activation Keywords

Precise, SDE-consistent sampling, flow-matching RL, stochastic sampler design, posterior mean freezing, diffusion RL post-training, clean-latent freezing, small-step sampling

Source

arXiv:2605.23522 — Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models
