agpo-adaptive-group-policy-optimization

AGPO (Adaptive Group Policy Optimization) is a critic-free refinement of GRPO that uses group-level statistics to adaptively control update magnitude and exploration. A shared, probe-derived statistical state drives two mechanisms: adaptive clipping (based on reward dispersion, skewness, probe entropy, policy entropy, and KL drift) and bidirectional adaptive temperature sampling. It is reported to outperform PPO and GRPO on 9 math/STEM benchmarks with Qwen2.5-14B.

Use when: improving GRPO training stability, reducing the hyperparameter tuning burden in LLM RL post-training, or adding adaptive exploration for reasoning models.

Activation: AGPO, adaptive group policy optimization, adaptive clipping GRPO, bidirectional adaptive temperature, critic-free RLVR, statistical feedback control.
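
To make the idea of a group-level statistical state concrete, here is a minimal Python sketch. It assumes a single group of scalar rewards for one prompt and computes the dispersion and skewness named above, plus GRPO-style group-standardized advantages and the standard clipped surrogate term. The function `adaptive_clip` and its parameters (`base`, `ref_std`, `lo`, `hi`) are hypothetical placeholders chosen for illustration; the description above does not give AGPO's actual clipping rule, and this sketch does not reproduce it.

```python
import math
from statistics import fmean, pstdev


def group_stats(rewards):
    """Return (mean, dispersion, skewness) for one group of rewards."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population standard deviation as the dispersion
    if std == 0.0:
        return mean, 0.0, 0.0  # identical rewards: no dispersion, no skew
    skew = fmean([((r - mean) / std) ** 3 for r in rewards])
    return mean, std, skew


def group_advantages(rewards, eps=1e-8):
    """Critic-free advantages: rewards standardized within their group."""
    mean, std, _ = group_stats(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def adaptive_clip(std, base=0.2, ref_std=0.5, lo=0.05, hi=0.4):
    """Hypothetical rule, NOT the AGPO formula: scale the clip range with
    reward dispersion relative to a reference value, then bound it."""
    return min(hi, max(lo, base * std / ref_std))


def clipped_surrogate(ratio, advantage, clip_eps):
    """Standard PPO/GRPO clipped surrogate term for one sample."""
    clipped_ratio = min(1.0 + clip_eps, max(1.0 - clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]  # e.g. verifier pass/fail for 4 samples
    mean, std, skew = group_stats(rewards)
    clip_eps = adaptive_clip(std)
    for adv in group_advantages(rewards):
        print(round(clipped_surrogate(1.5, adv, clip_eps), 4))
```

The clip range here responds only to dispersion; a fuller version would also take skewness, probe entropy, policy entropy, and KL drift as inputs, as the description lists.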

Install command
npx skills add https://github.com/hiyenwong/ai_collection --skill agpo-adaptive-group-policy-optimization

Copy and paste this command into Claude Code to install the skill

Stars: 1
Forks: 0
Updated: June 4, 2026 at 02:00
SKILL.md