name	curverl-distribution-aware-rlvr
description	CurveRL methodology — principled distribution-aware context reweighting for RLVR, using quantile coordinate transform where prompt weights depend on pass-rate rank and density rather than absolute values.

CurveRL: Distribution-Aware Context Reweighting for LLM Reasoning

Paper: CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning
arXiv: 2605.24331
Authors: Ke Sun, Yizhou Zhao, Jiayi Xin, Qi Long, Weijie Su
Submitted: 23 May 2026

Core Idea

Prompt-level (context-level) reweighting is central to reinforcement learning with verifiable rewards (RLVR) for LLM reasoning, but the principle that determines the optimal weighting is poorly understood. CurveRL formulates prompt reweighting as a functional derivative in pass-rate function space, which yields a unified framework that accommodates both REINFORCE and GRPO. The algorithm applies a quantile coordinate transform: each prompt's weight depends on the rank of its pass rate and on the density of pass rates around it, not on the absolute pass-rate value.

Key Contributions

Unified optimality framework: Prompt reweighting as functional derivative in pass-rate function space
CurveRL algorithm: Distribution-aware reweighting via quantile coordinate transform
Empirical results: reported to outperform GRPO consistently across multiple benchmarks
Context-distribution control as a principled axis for RLVR analysis

Method Details

Unified Framework

Given a pass rate p_i for each prompt i, the optimal weight is w_i = F'(p_i), where the derivative of the objective F is taken in pass-rate function space. Different choices of F recover REINFORCE (linear F) and GRPO (group-based).
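The relation w_i = F'(p_i) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the GRPO-like objective F(p) = 2·arcsin(√p) is an assumption derived from standard-deviation normalization of binary rewards (which gives F'(p) = 1/√(p(1−p))), not a form taken from the paper.

```python
import math

def weight_from_objective(F, p, eps=1e-6):
    """Central finite-difference estimate of F'(p)."""
    return (F(p + eps) - F(p - eps)) / (2 * eps)

def reinforce_objective(p):
    # Linear F: the plain expected pass rate, so F'(p) = 1 for every prompt.
    return p

def grpo_like_objective(p):
    # ASSUMPTION (not from the paper): F(p) = 2*asin(sqrt(p)),
    # whose derivative is 1 / sqrt(p * (1 - p)).
    return 2.0 * math.asin(math.sqrt(p))

pass_rates = [0.1, 0.5, 0.9]
reinforce_weights = [weight_from_objective(reinforce_objective, p) for p in pass_rates]
grpo_like_weights = [weight_from_objective(grpo_like_objective, p) for p in pass_rates]

for p, w_r, w_g in zip(pass_rates, reinforce_weights, grpo_like_weights):
    print(f"p={p:.1f}  linear-F weight={w_r:.3f}  GRPO-like weight={w_g:.3f}")
```

Under these assumptions, the linear objective weights every prompt equally, while the GRPO-like objective puts more weight on prompts whose pass rate is near 0 or 1.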

CurveRL Algorithm

Map pass rates to quantile values based on rank in current batch
Density-aware weighting: weight depends on local density around each prompt
Online adaptation as pass-rate distribution evolves
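The first two steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tie handling (average ranks), the Gaussian kernel, and the `bandwidth` parameter are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def quantile_coordinates(pass_rates):
    """Map each pass rate to rank/N in (0, 1], averaging ranks over ties."""
    p = np.asarray(pass_rates, dtype=float)
    sorted_p = np.sort(p)
    # Number of batch values strictly below, and at or below, each p_i.
    lower = np.searchsorted(sorted_p, p, side="left")
    upper = np.searchsorted(sorted_p, p, side="right")
    # Tied values occupy ranks lower+1 .. upper; use their average.
    avg_rank = (lower + 1 + upper) / 2.0
    return avg_rank / p.size

def local_density(pass_rates, bandwidth=0.1):
    """Gaussian kernel density estimate evaluated at each pass rate (assumed estimator)."""
    p = np.asarray(pass_rates, dtype=float)
    diffs = (p[:, None] - p[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth

batch_pass_rates = [0.0, 0.25, 0.25, 0.5, 1.0]
quantiles = quantile_coordinates(batch_pass_rates)
densities = local_density(batch_pass_rates)
print(quantiles)
print(densities)
```

Because both quantities are recomputed from the current batch, they shift automatically as the pass-rate distribution moves during training.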

Key Formula

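w_i = φ(rank(p_i) / N) · ψ(density(p_i))

Here N is the number of prompts in the current batch, rank(p_i) is the rank of prompt i's pass rate within that batch, and density(p_i) is the local density of pass rates around p_i.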
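To make the product form concrete, here is a small sketch with placeholder choices for φ and ψ. Both functions are hypothetical and chosen only for illustration; the normalization to mean 1 is likewise an assumption, added so the overall gradient scale stays comparable to unweighted training.

```python
import numpy as np

def phi(q):
    # HYPOTHETICAL rank term: largest for mid-ranked prompts, zero at q = 0 and q = 1.
    return 4.0 * q * (1.0 - q)

def psi(d):
    # HYPOTHETICAL density term: down-weights prompts in crowded pass-rate regions.
    return 1.0 / (1.0 + d)

def curverl_style_weights(quantiles, densities):
    """Compute w_i = phi(q_i) * psi(d_i), then rescale so the weights average to 1."""
    q = np.asarray(quantiles, dtype=float)
    d = np.asarray(densities, dtype=float)
    w = phi(q) * psi(d)
    total = w.sum()
    # Fall back to uniform weights if every raw weight is zero.
    return w * (w.size / total) if total > 0 else np.ones_like(w)

example_quantiles = [0.2, 0.5, 0.5, 0.8, 1.0]   # rank(p_i) / N
example_densities = [1.0, 2.0, 2.0, 1.0, 0.5]   # density(p_i)
weights = curverl_style_weights(example_quantiles, example_densities)
print(weights)
```

Swapping in different φ and ψ changes only the two small functions; the rank and density inputs stay the same.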

Implementation

Maintain running pass-rate distribution estimate
Quantile buckets adapt to batch statistics
Drop-in replacement for GRPO weighting scheme
No architecture changes needed
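One way the points above might fit together is sketched below. The sliding-window tracker, its `maxlen` parameter, and the `reweight_advantages` helper are all hypothetical names and design choices, not the paper's code; the sketch only shows that prompt weights can be applied to already-computed per-sample advantages without touching the model.

```python
from collections import deque
import numpy as np

class RunningPassRates:
    """Sliding window of recently observed per-prompt pass rates (hypothetical helper)."""

    def __init__(self, maxlen=1024):
        self.window = deque(maxlen=maxlen)

    def update(self, batch_pass_rates):
        # Oldest entries are evicted automatically once the window is full.
        self.window.extend(float(p) for p in batch_pass_rates)

    def quantile_of(self, p):
        """Fraction of stored pass rates that are <= p (0.5 if nothing is stored yet)."""
        if not self.window:
            return 0.5
        values = np.fromiter(self.window, dtype=float)
        return float(np.mean(values <= p))

def reweight_advantages(advantages, prompt_weights):
    """Scale each prompt's per-sample advantages by that prompt's weight.

    advantages: shape (num_prompts, samples_per_prompt)
    prompt_weights: shape (num_prompts,)
    """
    a = np.asarray(advantages, dtype=float)
    w = np.asarray(prompt_weights, dtype=float)
    return a * w[:, None]

tracker = RunningPassRates(maxlen=4)
tracker.update([0.0, 0.5, 1.0])
q_first = tracker.quantile_of(0.5)
tracker.update([0.25, 0.75])      # window is now [0.5, 1.0, 0.25, 0.75]
q_second = tracker.quantile_of(0.5)

scaled = reweight_advantages([[1.0, -1.0], [2.0, -2.0]], [0.5, 2.0])
print(q_first, q_second)
print(scaled)
```

A fixed-size window is only one option for the running estimate; an exponentially decayed histogram would serve the same purpose.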

Activation Keywords

CurveRL, distribution-aware reweighting, RLVR, prompt reweighting, context reweighting, quantile coordinate transform, LLM reasoning RL, pass-rate reweighting, context-distribution control

