gaussian-grpo

Overview

Gaussian Group Relative Policy Optimization (G²RPO) for multimodal RL training. Replaces linear scaling with distributional matching to ensure gradient equity across diverse tasks. Use when training multimodal models, balancing perception vs reasoning, or stabilizing RL across heterogeneous reward topologies. Keywords: G²RPO, Gaussian GRPO, multimodal RL, entropy shaping, response length shaping, GRPO, reinforcement learning.
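
The contrast between linear scaling and distributional matching can be sketched in a few lines. The listing does not spell out the exact G²RPO transform, so the example below rests on an assumption: it reads "distributional matching" as mapping each reward's within-group rank to a standard-normal quantile. The function names `linear_advantages` and `gaussian_advantages` are hypothetical and are not taken from the skill itself.

```python
from statistics import NormalDist, mean, pstdev


def linear_advantages(rewards, eps=1e-8):
    """GRPO-style linear scaling: subtract the group mean, divide by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def gaussian_advantages(rewards):
    """Assumed form of distributional matching (not confirmed by the source):
    replace each reward with the standard-normal quantile of its rank in the group."""
    n = len(rewards)
    order = sorted(range(n), key=lambda i: rewards[i])
    std_normal = NormalDist()
    adv = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied rewards starting at sorted position i.
        j = i
        while j + 1 < n and rewards[order[j + 1]] == rewards[order[i]]:
            j += 1
        # Average rank of the tie, shifted by 0.5 so the probability stays in (0, 1).
        p = ((i + j) / 2 + 0.5) / n
        q = std_normal.inv_cdf(p)
        for k in range(i, j + 1):
            adv[order[k]] = q
        i = j + 1
    return adv


if __name__ == "__main__":
    sparse = [0.0, 0.0, 0.0, 1.0]   # e.g. a binary correctness reward
    skewed = [0.1, 0.2, 0.3, 5.0]   # e.g. a continuous reward with an outlier
    for name, group in (("sparse", sparse), ("skewed", skewed)):
        print(name, "linear  ", [round(a, 3) for a in linear_advantages(group)])
        print(name, "gaussian", [round(a, 3) for a in gaussian_advantages(group)])
```

Under this reading, the advantages depend only on the ordering of rewards within a group, so tasks with very different reward scales or shapes produce advantages drawn from the same target distribution. That is one plausible route to the "gradient equity" the overview mentions, but the skill's actual transform may differ.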

Install command
npx skills add https://github.com/hiyenwong/ai_collection --skill gaussian-grpo

Copy and paste this command into Claude Code to install the skill

Stars: 1
Forks: 0
Updated: June 4, 2026 at 02:00