
gcpo-cooperative-policy-optimization


Overview

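Group Cooperative Policy Optimization (GCPO) replaces the winner-takes-all competition of GRPO (Group Relative Policy Optimization) with team cooperation. Each rollout is rewarded by its contribution to the team's coverage of valid solutions, measured as the determinant volume spanned by the reward-weighted semantic embeddings of the group's rollouts. The method is designed to address exploration collapse in reinforcement learning with verifiable rewards (RLVR) for LLM reasoning.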
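The overview compresses the credit-assignment idea into one sentence. The Python sketch that follows illustrates one plausible reading of it, under loudly stated assumptions: each rollout has a semantic embedding (here hand-made 2-D vectors, not the output of a real encoder), embeddings are unit-normalized and scaled by the rollout's verifiable reward, team coverage is the regularized volume log det(I + L) of the resulting Gram matrix L, and a rollout's credit is the drop in coverage when it is left out, standardized within the group as in GRPO. The function names (`coverage`, `cooperative_advantages`, etc.) and the leave-one-out rule are hypothetical; this is not the skill's or the paper's implementation.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: the credit rule (leave-one-out change in
# log det(I + L)) and the group standardization are assumptions, not GCPO's code.


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale an embedding to unit length so only its direction (meaning) counts."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def logdet_spd(matrix: List[List[float]]) -> float:
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][i] = math.sqrt(s)
                total += 2.0 * math.log(lower[i][i])
            else:
                lower[i][j] = s / lower[j][j]
    return total


def coverage(embeddings: Sequence[Sequence[float]], rewards: Sequence[float],
             members: Sequence[int]) -> float:
    """Hypothetical team coverage: log det(I + L_S), L_ij = <r_i e_i, r_j e_j>."""
    weighted = [[rewards[i] * x for x in normalize(embeddings[i])] for i in members]
    n = len(weighted)
    kernel = [[sum(a * b for a, b in zip(weighted[p], weighted[q])) + (1.0 if p == q else 0.0)
               for q in range(n)] for p in range(n)]
    return logdet_spd(kernel)


def group_normalize(values: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style standardization within one group of rollouts."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / (std + eps) for v in values]


def cooperative_advantages(embeddings: Sequence[Sequence[float]],
                           rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Credit each rollout with its marginal (leave-one-out) contribution to coverage."""
    group = list(range(len(rewards)))
    full = coverage(embeddings, rewards, group)
    contributions = [full - coverage(embeddings, rewards, [j for j in group if j != i])
                     for i in group]
    return group_normalize(contributions, eps)


if __name__ == "__main__":
    # Hypothetical group: rollouts 0 and 1 are correct near-duplicates,
    # rollout 2 is correct but semantically distinct, rollout 3 is wrong.
    embeddings = [[1.0, 0.0], [0.99, 0.141], [0.0, 1.0], [0.7, 0.7]]
    rewards = [1.0, 1.0, 1.0, 0.0]
    print("GRPO:", [round(a, 3) for a in group_normalize(rewards)])
    print("GCPO sketch:", [round(a, 3) for a in cooperative_advantages(embeddings, rewards)])
```

Under this reading, GRPO gives all three correct rollouts the same advantage, while the cooperative credit ranks the semantically distinct correct rollout above the two near-duplicates; the failed rollout adds no volume and ends up with a negative advantage after standardization. The identity term in I + L is a design choice of the sketch: it keeps the matrix positive definite, so a zero-reward rollout contributes exactly nothing instead of collapsing the determinant to zero.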

Install command
npx skills add https://github.com/hiyenwong/ai_collection --skill gcpo-cooperative-policy-optimization

Copy and paste this command into Claude Code to install the skill

Stars: 1
Forks: 0
Updated: June 4, 2026 at 02:00