advantage-collapse-grpo-avspo

Advantage Collapse in Group Relative Policy Optimization (GRPO): Diagnosis and Mitigation via Adaptive Virtual Sample Policy Optimization (AVSPO). Accepted at ICML 2026. Introduces the Advantage Collapse Rate (ACR) metric for diagnosing training stagnation, and proposes AVSPO, which injects virtual reward samples guided by real-time ACR monitoring. Use when: diagnosing GRPO training failures, improving RL post-training for LLM reasoning, or mitigating advantage collapse. Activation: advantage collapse GRPO, AVSPO, ACR metric, GRPO diagnosis, virtual sample policy optimization, RLVR training stagnation.
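
As a rough illustration of the failure mode named above: GRPO standardizes each reward against the other samples drawn for the same prompt, so a group whose rewards are all identical yields zero advantage for every sample and contributes no policy gradient. The sketch below assumes ACR is the fraction of groups in a batch with (near-)identical rewards; that definition, the function names, and the `eps` and `tol` parameters are illustrative assumptions, not taken from the skill or the paper. AVSPO itself is not sketched here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def advantage_collapse_rate(groups, tol=1e-8):
    """ASSUMED definition: fraction of groups whose rewards are all (near-)identical."""
    if not groups:
        return 0.0
    collapsed = sum(1 for g in groups if max(g) - min(g) <= tol)
    return collapsed / len(groups)


# Hypothetical batch of binary (verifiable) rewards, four samples per prompt.
batch = [
    [1.0, 1.0, 1.0, 1.0],  # every sample correct: advantages are all zero
    [0.0, 0.0, 0.0, 0.0],  # every sample wrong: advantages are all zero
    [1.0, 0.0, 0.0, 1.0],  # mixed outcomes: informative group
    [0.0, 0.0, 1.0, 0.0],  # mixed outcomes: informative group
]

acr = advantage_collapse_rate(batch)
collapsed_adv = group_relative_advantages(batch[0])
mixed_adv = group_relative_advantages(batch[2])
print(f"ACR = {acr:.2f}")
```

Under this assumed definition, a persistently high ACR would suggest that most prompts are either too easy or too hard for the current policy, which is the stagnation signal the description refers to.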

Install command
npx skills add https://github.com/hiyenwong/ai_collection --skill advantage-collapse-grpo-avspo

Copy and paste this command into Claude Code to install the skill

Stars: 1
Forks: 0
Updated: June 4, 2026 at 02:00