Complete workflow for adding a new RL training algorithm. Covers paradigm selection, TrainingArguments subclass, trainer implementation, registry, example config, and verification. Trigger: 'add algorithm', 'new trainer', 'new training method', 'implement algorithm'.
Mandatory pre-commit code review gate. Checks constraint violations, cross-module consistency, and implementation quality. Trigger proactively when changes span multiple files or touch shared infrastructure. Trigger: 'review', 'check before commit'.
Complete workflow for adding a new model adapter. Covers analysis, sample dataclass, adapter implementation (4 abstract methods + per-modality encoder overrides), registry, example YAML, and verification. Trigger: 'add model', 'support new model', 'integrate model', 'new adapter'.
Feature development with cross-module impact analysis. Covers trainer hierarchy, model adapters, reward pipeline, config system, sample dataclasses, and distributed training paths. Trigger: 'add feature', 'implement', 'refactor', 'reorganize', 'new capability'.
Bug fixing and debugging for ANY error, crash, loss divergence, gradient explosion, distributed hang, NaN, or unexpected behavior. Covers quick fixes and full protocol with 5-phase investigation. Trigger: 'fix bug', 'fix error', 'broken', 'crash', 'doesn't work', 'fails with', 'loss NaN', 'training hangs', 'OOM'.
Complete workflow for adding a new reward model. Covers pointwise vs groupwise design, __call__ contract, registration, YAML config, multi-reward setup, and verification. Trigger: 'add reward', 'new reward model', 'custom reward', 'scoring function'.