agent-validation-integration
v4.1.0 integration of agent validation learnings into live trading, backtesting, and feedback loops
| name | agent-validation-integration |
| description | v4.1.0 integration of agent validation learnings into live trading, backtesting, and feedback loops |
| author | Claude Code |
| date | "2026-02-21T00:00:00.000Z" |
| Item | Details |
|---|---|
| Date | 2026-02-21 |
| Goal | Integrate v3.0 agent validation insights (overtrading, DSR dominance, direction collapse) into the full pipeline: backtest diagnostics, live monitoring, gating, and training feedback loop |
| Environment | Alpaca Trading v4.0.0 → v4.1.0 |
| Status | Implemented, 757 tests passing |
v3.0 agent validation showed that agents correctly identified CRITICAL issues (DSR dominance, overtrading, direction collapse), but the insights stayed trapped in training logs. No mechanism existed to carry them into backtest diagnostics, live monitoring, gating, or the training feedback loop.
Additionally, gating thresholds (APPROVED: fitness>=0.70, PF>=1.8) were unreachable, causing every model to classify as DROP.
Added ModelHealthDiagnostics to BacktestResult:
```python
from dataclasses import dataclass


@dataclass
class ModelHealthDiagnostics:
    hold_pct: float              # share of HOLD actions, as a fraction in [0, 1]
    buy_pct: float
    sell_pct: float
    close_pct: float
    trades_per_bar: float
    direction_accuracy: float
    is_overtrading: bool         # hold_pct < 0.30
    is_direction_collapse: bool  # direction_accuracy < 0.45
```
Both BacktestEngine and RealisticBacktestEngine track actions during simulation.
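As a rough illustration of how tracked actions could be turned into these diagnostics, here is a minimal sketch. The helper `summarize_actions`, the action labels (`"HOLD"`, `"BUY"`, `"SELL"`, `"CLOSE"`), and the `n_trades`/`n_correct` inputs are assumptions for illustration, not the engines' actual interface; only the field names and the 0.30 / 0.45 cut-offs come from the dataclass above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModelHealthDiagnostics:
    hold_pct: float
    buy_pct: float
    sell_pct: float
    close_pct: float
    trades_per_bar: float
    direction_accuracy: float
    is_overtrading: bool
    is_direction_collapse: bool


def summarize_actions(actions: Sequence[str], n_trades: int,
                      n_correct: int) -> ModelHealthDiagnostics:
    """Hypothetical helper: build diagnostics from one action label per bar."""
    n_bars = len(actions)
    if n_bars == 0:
        raise ValueError("no actions recorded")
    counts = Counter(actions)  # missing labels count as 0
    hold_pct = counts["HOLD"] / n_bars
    direction_accuracy = n_correct / n_trades if n_trades else 0.0
    return ModelHealthDiagnostics(
        hold_pct=hold_pct,
        buy_pct=counts["BUY"] / n_bars,
        sell_pct=counts["SELL"] / n_bars,
        close_pct=counts["CLOSE"] / n_bars,
        trades_per_bar=n_trades / n_bars,
        direction_accuracy=direction_accuracy,
        is_overtrading=hold_pct < 0.30,
        # Assumption: only flag collapse when at least one trade was scored.
        is_direction_collapse=n_trades > 0 and direction_accuracy < 0.45,
    )
```

Storing the flags next to the raw fractions lets a report show both why a model was flagged and by how much.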
```python
# ModelHealthMonitor (evaluation/model_health.py)
monitor = ModelHealthMonitor(window=100)
monitor.record(symbol, action, confidence, price)
health = monitor.check_health(symbol)  # HealthSnapshot

# Integrated into live_trader.py main loop
# Writes to gate_status.json for dashboard consumption
```
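The `window=100` argument suggests a rolling per-symbol window. A minimal sketch of that idea follows, assuming a bounded deque per symbol; `RollingActionWindow` and its methods are hypothetical stand-ins, not the real `ModelHealthMonitor` internals.

```python
from collections import defaultdict, deque
from typing import Optional


class RollingActionWindow:
    """Hypothetical stand-in: keeps the last `window` actions per symbol."""

    def __init__(self, window: int = 100) -> None:
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._actions = defaultdict(lambda: deque(maxlen=window))

    def record(self, symbol: str, action: str) -> None:
        self._actions[symbol].append(action)

    def hold_pct(self, symbol: str) -> Optional[float]:
        """Fraction of HOLD actions in the window, or None if nothing recorded."""
        window = self._actions.get(symbol)  # .get() does not create an entry
        if not window:
            return None
        return sum(a == "HOLD" for a in window) / len(window)
```

Returning `None` for an unseen symbol keeps "no data yet" distinct from "never holds", which matters when the value feeds an overtrading alert.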
```python
# Added to CircuitBreakerConfig:
overtrading_hold_pct_threshold: float = 0.15
direction_accuracy_threshold: float = 0.40

# RealTimeRiskMonitor.check_model_health(health_data) triggers alerts
```
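A sketch of what such a check might look like is shown below. It assumes `health_data` is a dict with `hold_pct` and `direction_accuracy` keys and that an alert fires when a value drops below its threshold. The dict shape, the alert strings, and the free-function form are assumptions, and only the two threshold fields come from the config above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CircuitBreakerConfig:
    # Only the two model-health fields are reproduced here.
    overtrading_hold_pct_threshold: float = 0.15
    direction_accuracy_threshold: float = 0.40


def check_model_health(health_data: Dict[str, float],
                       config: CircuitBreakerConfig) -> List[str]:
    """Hypothetical check: return one alert message per breached threshold."""
    alerts: List[str] = []
    hold_pct = health_data.get("hold_pct")
    if hold_pct is not None and hold_pct < config.overtrading_hold_pct_threshold:
        alerts.append(f"overtrading: hold_pct={hold_pct:.2f}")
    accuracy = health_data.get("direction_accuracy")
    if accuracy is not None and accuracy < config.direction_accuracy_threshold:
        alerts.append(f"direction collapse: direction_accuracy={accuracy:.2f}")
    return alerts
```

The live thresholds (0.15, 0.40) are looser than the backtest flags (0.30, 0.45), so a model with `hold_pct` of 0.20 is flagged in a backtest report but does not trip the live breaker.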
```python
# LivePerformanceBridge (evaluation/live_bridge.py)
bridge = LivePerformanceBridge(db_path="data/trading_performance.db")
bridge.sync()  # Reads PerformanceTracker SQLite → writes AgentMemory JSON

# Agent prompts automatically include live data via format_for_prompt()
```
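The SQLite-to-JSON step can be pictured roughly as follows. This is a sketch under stated assumptions: the `trades` table with a `pnl` column, the `live_trade_count` / `live_total_pnl` keys, and the function name are all hypothetical, since the real PerformanceTracker schema and AgentMemory layout are not shown here.

```python
import json
import sqlite3
from contextlib import closing
from pathlib import Path


def sync_live_performance(db_path: str, memory_path: str) -> dict:
    """Hypothetical sync: copy aggregate live stats into a JSON memory file."""
    with closing(sqlite3.connect(db_path)) as conn:
        # Assumed schema: a `trades` table with a numeric `pnl` column.
        count, total_pnl = conn.execute(
            "SELECT COUNT(*), COALESCE(SUM(pnl), 0.0) FROM trades"
        ).fetchone()

    path = Path(memory_path)
    # Load existing memory so training-derived fields are kept, not overwritten.
    memory = json.loads(path.read_text()) if path.exists() else {}
    memory["live_trade_count"] = count
    memory["live_total_pnl"] = total_pnl
    path.write_text(json.dumps(memory, indent=2))
    return memory
```

Prefixing the injected keys with `live_` makes them easy to preserve when other code rewrites the same file.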
```python
# After training with agents:
diagnostics = trainer.get_diagnostic_summary()

# Next training run:
new_config = config.apply_diagnostic_overrides(diagnostics)
```
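One way to picture the override step is a config that returns an adjusted copy of itself. In the sketch below the fields `trade_penalty` and `dsr_weight`, the diagnostic keys, and the scaling factors are all hypothetical; only the method name `apply_diagnostic_overrides` appears in the snippet above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingConfig:
    trade_penalty: float = 0.001  # hypothetical field
    dsr_weight: float = 1.0       # hypothetical field

    def apply_diagnostic_overrides(self, diagnostics: dict) -> "TrainingConfig":
        """Return a new config adjusted for the issues the diagnostics flag."""
        overrides = {}
        if diagnostics.get("is_overtrading"):
            # Make each trade more expensive to discourage churn.
            overrides["trade_penalty"] = self.trade_penalty * 2
        if diagnostics.get("dsr_dominance"):
            # Reduce the weight of the dominating reward term.
            overrides["dsr_weight"] = self.dsr_weight * 0.5
        return replace(self, **overrides)
```

Returning a copy rather than mutating in place keeps the previous run's config intact for comparison.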
| Metric | APPROVED | REVIEW |
|---|---|---|
| Fitness | ≥0.35 (was 0.70) | ≥0.10 (was 0.50) |
| PF | ≥1.4 (was 1.8) | ≥1.1 (was 1.3) |
| Consistency | ≥70% (was 85%) | ≥50% (was 65%) |
| MaxDD | ≤10% (was 8%) | ≤20% (was 15%) |
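The recalibrated thresholds can be read as a three-way classifier. The sketch below assumes a tier requires all four criteria at once and that consistency and drawdown are passed as fractions. Both points are assumptions, as is the function name `classify_model`.

```python
def classify_model(fitness: float, profit_factor: float,
                   consistency: float, max_drawdown: float) -> str:
    """Hypothetical gate: APPROVED, REVIEW, or DROP from the new thresholds."""
    if (fitness >= 0.35 and profit_factor >= 1.4
            and consistency >= 0.70 and max_drawdown <= 0.10):
        return "APPROVED"
    if (fitness >= 0.10 and profit_factor >= 1.1
            and consistency >= 0.50 and max_drawdown <= 0.20):
        return "REVIEW"
    return "DROP"
```

For instance, a model with fitness 0.40, PF 1.5, 75% consistency, and 8% drawdown is APPROVED here, whereas the old fitness floor of 0.70 would have excluded it.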
| Attempt | Why it Failed | Lesson Learned |
|---|---|---|
| Unreachable gating thresholds (PF>=1.8) | Every model classified as DROP → zero useful signal | Calibrate thresholds to population, tighten as models improve |
| Backtest without action distribution | Can't detect overtrading in historical results | Always track action distribution alongside PF/Sharpe |
| No live-to-training feedback | Agent memory only has training data, misses live performance drift | LivePerformanceBridge closes the loop |
| Direct import in signals.py | Circular import: signals → server → routes → signals | Use lazy import in function wrapper (_get_gate_data()) |
| hasattr(mock, 'obs_dim') in tests | Always True for Mock objects → TypeError in arithmetic | Use isinstance(getattr(obj, 'attr', None), expected_type) |
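The `Mock` pitfall in the last row is easy to reproduce with the standard library. The sketch below uses a hypothetical helper, `get_obs_dim`, to show the type-checked lookup; the fallback value of 0 is an assumption.

```python
from unittest.mock import Mock


def get_obs_dim(env: object, default: int = 0) -> int:
    """Return env.obs_dim only if it is a real int; otherwise the default."""
    value = getattr(env, "obs_dim", None)
    return value if isinstance(value, int) else default


class RealEnv:
    obs_dim = 8


mock_env = Mock()
# Mock auto-creates any attribute on access, so hasattr() is always True.
assert hasattr(mock_env, "obs_dim")
# The type check rejects the auto-created Mock attribute.
assert get_obs_dim(mock_env) == 0
assert get_obs_dim(RealEnv()) == 8
```

Passing `Mock(spec=RealEnv)` in tests is another option, since a spec'd mock only exposes attributes that exist on the real class.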
- write_gate_status() in live_trader.py is the IPC mechanism between trader and dashboard: extend it for new data, don't create parallel channels
- save_run_summary() must preserve live_* fields when recomputing training patterns

- alpaca_trading/evaluation/model_health.py: ModelHealthMonitor
- alpaca_trading/evaluation/live_bridge.py: LivePerformanceBridge
- alpaca_trading/training/multi_agent.py: get_diagnostic_summary()