delta-discriminative-token-credit-assignment

Overview

DelTA (Discriminative Token Credit Assignment) is a methodology for Reinforcement Learning from Verifiable Rewards (RLVR). It introduces a discriminator view of RLVR updates, showing that the policy-gradient update implicitly acts as a linear discriminator over token-gradient vectors. It then proposes token-level coefficient estimation to amplify discriminative directions and downweight shared patterns (e.g. formatting tokens). It is reported to outperform baselines by 3.26 points on Qwen3-8B and 2.62 points on Qwen3-14B across math benchmarks.

Use when: improving token-level credit assignment in RLVR/GRPO, RL post-training for LLM reasoning, or reducing the influence of formatting tokens in policy gradients.

Activation: DelTA, discriminative token credit assignment, token-level RLVR, RLVR discriminator, token coefficient estimation, side-specific token gradients.
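The discriminator view can be made concrete with a toy example. The sketch below is an illustration under stated assumptions, not the DelTA implementation. It treats each token as a small gradient vector, assumes advantages of +1 for correct responses and -1 for incorrect ones, and takes the difference of the two side means as the linear discriminator direction. The function names and the clipped-projection weighting in `token_coefficients` are hypothetical stand-ins for the skill's coefficient estimation, chosen only to show how a pattern shared by both sides can end up with a low weight.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_vec(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def discriminator_direction(pos_grads, neg_grads):
    """Difference of side means.

    Assumption: with +1/-1 advantages, the aggregate update points along
    (mean of positive-side token gradients) - (mean of negative-side ones),
    so scoring a token gradient g by dot(g, direction) is a linear
    discriminator between the two sides.
    """
    mean_pos = mean_vec(pos_grads)
    mean_neg = mean_vec(neg_grads)
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def token_coefficients(grads, direction, side):
    """Hypothetical weighting, not taken from the source.

    Each token is weighted by how strongly its gradient agrees with its own
    side of the discriminator (side is +1 or -1), clipped at zero and
    normalized to sum to 1. Falls back to uniform weights if every score is 0.
    """
    scores = [max(0.0, side * dot(g, direction)) for g in grads]
    total = sum(scores)
    if total == 0:
        return [1.0 / len(grads)] * len(grads)
    return [s / total for s in scores]


# Toy data: axis 0 is a pattern shared by both sides (e.g. a formatting token),
# axis 1 is the direction that actually separates correct from incorrect.
pos_grads = [[1.0, 0.0], [0.0, 1.0]]   # tokens from a correct response
neg_grads = [[1.0, 0.0], [0.0, -1.0]]  # tokens from an incorrect response

direction = discriminator_direction(pos_grads, neg_grads)
pos_coeffs = token_coefficients(pos_grads, direction, side=+1)
neg_coeffs = token_coefficients(neg_grads, direction, side=-1)

print("direction:", direction)
print("positive-side coefficients:", pos_coeffs)
print("negative-side coefficients:", neg_coeffs)
```

In this toy setup the shared component cancels out of the direction, so the shared token receives zero weight on both sides while the separating token keeps all of it. In practice token gradients are high-dimensional, and the coefficient estimator described in SKILL.md may differ substantially from this heuristic.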

Install command
npx skills add https://github.com/hiyenwong/ai_collection --skill delta-discriminative-token-credit-assignment

Copy and paste this command into Claude Code to install the skill

Stars: 1
Forks: 0
Updated: June 4, 2026 at 02:00