agent-validation-integration
v4.1.0 integration of agent validation learnings into live trading, backtesting, and feedback loops
| Field | Value |
|---|---|
| name | agent-validation-integration |
| description | v4.1.0 integration of agent validation learnings into live trading, backtesting, and feedback loops |
| author | Claude Code |
| date | "2026-02-21T00:00:00.000Z" |

| Item | Details |
|---|---|
| Date | 2026-02-21 |
| Goal | Integrate v3.0 agent validation insights (overtrading, DSR dominance, direction collapse) into the full pipeline: backtest diagnostics, live monitoring, gating, and training feedback loop |
| Environment | Alpaca Trading v4.0.0 → v4.1.0 |
| Status | Implemented, 757 tests passing |
v3.0 agent validation showed that agents correctly identified CRITICAL issues (DSR dominance, overtrading, direction collapse), but the insights stayed trapped in training logs. No mechanism existed to carry them into backtest diagnostics, live monitoring, gating, or the training feedback loop.
Additionally, gating thresholds (APPROVED: fitness>=0.70, PF>=1.8) were unreachable, causing every model to classify as DROP.
Added `ModelHealthDiagnostics` to `BacktestResult`:

    from dataclasses import dataclass

    @dataclass
    class ModelHealthDiagnostics:
        hold_pct: float              # share of HOLD actions
        buy_pct: float               # share of BUY actions
        sell_pct: float              # share of SELL actions
        close_pct: float             # share of CLOSE actions
        trades_per_bar: float
        direction_accuracy: float
        is_overtrading: bool         # hold_pct < 0.30
        is_direction_collapse: bool  # direction_accuracy < 0.45

Both `BacktestEngine` and `RealisticBacktestEngine` track actions during simulation.
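
To make the diagnostics concrete, here is a minimal sketch of deriving them from a per-bar action log. The helper `summarize_actions`, the action labels (`"HOLD"`, `"BUY"`, `"SELL"`, `"CLOSE"`), and the definitions of `trades_per_bar` and `direction_accuracy` are assumptions for illustration; only the field names and the two thresholds come from the text above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModelHealthDiagnostics:
    hold_pct: float
    buy_pct: float
    sell_pct: float
    close_pct: float
    trades_per_bar: float
    direction_accuracy: float
    is_overtrading: bool
    is_direction_collapse: bool


def summarize_actions(
    actions: Sequence[str], direction_hits: int, direction_calls: int
) -> ModelHealthDiagnostics:
    """Hypothetical helper: build diagnostics from one action label per bar."""
    n = len(actions)
    if n == 0:
        raise ValueError("need at least one bar")
    counts = Counter(actions)  # missing labels count as 0
    hold = counts["HOLD"] / n
    # Assumption: accuracy = correct directional calls / all directional calls.
    accuracy = direction_hits / direction_calls if direction_calls else 0.0
    return ModelHealthDiagnostics(
        hold_pct=hold,
        buy_pct=counts["BUY"] / n,
        sell_pct=counts["SELL"] / n,
        close_pct=counts["CLOSE"] / n,
        # Assumption: every non-HOLD action counts as a trade.
        trades_per_bar=(n - counts["HOLD"]) / n,
        direction_accuracy=accuracy,
        is_overtrading=hold < 0.30,
        # Only flag collapse when there were directional calls to judge.
        is_direction_collapse=direction_calls > 0 and accuracy < 0.45,
    )


actions = ["BUY", "SELL", "HOLD", "BUY", "CLOSE", "SELL", "BUY", "SELL", "BUY", "SELL"]
diag = summarize_actions(actions, direction_hits=3, direction_calls=8)
print(diag.is_overtrading, diag.is_direction_collapse)  # True True
```

With one HOLD in ten bars and 3 of 8 directional calls correct, both flags trip, which is the pattern the v3.0 validation described.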
`ModelHealthMonitor` (`evaluation/model_health.py`):

    monitor = ModelHealthMonitor(window=100)
    monitor.record(symbol, action, confidence, price)
    health = monitor.check_health(symbol)  # HealthSnapshot
    # Integrated into live_trader.py main loop
    # Writes to gate_status.json for dashboard consumption
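
The `window=100` argument suggests a rolling window over recent decisions. The sketch below shows one way such a monitor could work, using a bounded `deque` per symbol. `RollingHealthMonitor` and the fields of this `HealthSnapshot` are hypothetical stand-ins, not the project's actual classes.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    # Hypothetical fields; the real HealthSnapshot may differ.
    symbol: str
    samples: int
    hold_pct: float
    mean_confidence: float


class RollingHealthMonitor:
    """Illustrative stand-in for ModelHealthMonitor (not the project's code)."""

    def __init__(self, window: int = 100) -> None:
        # One bounded deque per symbol: the oldest record drops off automatically.
        self._records = defaultdict(lambda: deque(maxlen=window))

    def record(self, symbol: str, action: str, confidence: float, price: float) -> None:
        self._records[symbol].append((action, confidence, price))

    def check_health(self, symbol: str) -> HealthSnapshot:
        records = self._records[symbol]
        n = len(records)
        if n == 0:
            return HealthSnapshot(symbol, 0, 0.0, 0.0)
        holds = sum(1 for action, _, _ in records if action == "HOLD")
        mean_conf = sum(conf for _, conf, _ in records) / n
        return HealthSnapshot(symbol, n, holds / n, mean_conf)


monitor = RollingHealthMonitor(window=4)
for action in ["HOLD", "HOLD", "BUY", "SELL", "BUY", "SELL"]:
    monitor.record("AAPL", action, confidence=0.5, price=100.0)
snapshot = monitor.check_health("AAPL")
print(snapshot.samples, snapshot.hold_pct)  # 4 0.0
```

Because the window holds only the last four records, the two early HOLD actions no longer count, so the snapshot reflects recent behavior rather than the full session.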
Added to `CircuitBreakerConfig`:

    overtrading_hold_pct_threshold: float = 0.15
    direction_accuracy_threshold: float = 0.40
    # RealTimeRiskMonitor.check_model_health(health_data) triggers alerts
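
A sketch of how the two thresholds might drive alerts. The shape of `health_data` (a dict with `hold_pct` and `direction_accuracy` keys), the alert names, and the "below threshold triggers" direction are assumptions inferred from the diagnostics flags. The real `RealTimeRiskMonitor.check_model_health` may behave differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CircuitBreakerConfig:
    # Only the two documented fields; the real config presumably has more.
    overtrading_hold_pct_threshold: float = 0.15
    direction_accuracy_threshold: float = 0.40


def check_model_health(health_data: dict, config: CircuitBreakerConfig) -> List[str]:
    """Hypothetical check: return the names of any alerts that fire."""
    alerts = []
    # Too few HOLD actions is treated as overtrading.
    if health_data["hold_pct"] < config.overtrading_hold_pct_threshold:
        alerts.append("overtrading")
    # Accuracy below the threshold is treated as direction collapse.
    if health_data["direction_accuracy"] < config.direction_accuracy_threshold:
        alerts.append("direction_collapse")
    return alerts


alerts = check_model_health(
    {"hold_pct": 0.10, "direction_accuracy": 0.55}, CircuitBreakerConfig()
)
print(alerts)  # ['overtrading']
```

Note that the live thresholds (0.15, 0.40) are looser than the backtest flags (0.30, 0.45), so a model can be flagged in backtest diagnostics before it would trip a live alert.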
`LivePerformanceBridge` (`evaluation/live_bridge.py`):

    bridge = LivePerformanceBridge(db_path="data/trading_performance.db")
    bridge.sync()  # Reads PerformanceTracker SQLite → writes AgentMemory JSON
    # Agent prompts automatically include live data via format_for_prompt()
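
The SQLite-to-JSON step can be sketched with the standard library alone. The `trades` table, its `symbol` and `pnl` columns, and the `live_*` key names are invented for illustration. The actual `PerformanceTracker` schema and `AgentMemory` format are not shown in this document.

```python
import json
import sqlite3
from pathlib import Path


def sync_live_performance(db_path: str, memory_path: str) -> dict:
    """Hypothetical sync: aggregate per-symbol live results into a JSON file."""
    conn = sqlite3.connect(db_path)
    try:
        # Assumed schema: trades(symbol TEXT, pnl REAL).
        rows = conn.execute(
            "SELECT symbol, COUNT(*), SUM(pnl) FROM trades "
            "GROUP BY symbol ORDER BY symbol"
        ).fetchall()
    finally:
        conn.close()
    summary = {
        symbol: {"live_trades": count, "live_pnl": total}
        for symbol, count, total in rows
    }
    # Write the aggregate where the training-side memory could pick it up.
    Path(memory_path).write_text(json.dumps(summary, indent=2))
    return summary
```

Prefixing the keys with `live_` keeps live statistics distinguishable from training-derived fields in the same memory file.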
After training with agents:

    diagnostics = trainer.get_diagnostic_summary()
    # Next training run:
    new_config = config.apply_diagnostic_overrides(diagnostics)
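
One plausible shape for `apply_diagnostic_overrides` is a pure function that returns an adjusted copy of the config. In the sketch below, the `TrainingConfig` fields (`trade_penalty`, `entropy_coef`), the diagnostic keys, and the adjustment sizes are all hypothetical. Only the method name and the call pattern come from the snippet above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingConfig:
    # Hypothetical fields, chosen only for illustration.
    trade_penalty: float = 0.0
    entropy_coef: float = 0.01

    def apply_diagnostic_overrides(self, diagnostics: dict) -> "TrainingConfig":
        """Return a new config adjusted for the issues flagged in diagnostics."""
        overrides = {}
        if diagnostics.get("is_overtrading"):
            # Discourage excessive trading in the next run.
            overrides["trade_penalty"] = self.trade_penalty + 0.001
        if diagnostics.get("is_direction_collapse"):
            # Encourage exploration when the policy collapses to one direction.
            overrides["entropy_coef"] = self.entropy_coef * 2
        return replace(self, **overrides)


config = TrainingConfig()
new_config = config.apply_diagnostic_overrides({"is_overtrading": True})
print(new_config.trade_penalty, new_config.entropy_coef)  # 0.001 0.01
```

Returning a copy rather than mutating in place leaves the original config available for comparison between runs.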
| Metric | APPROVED | REVIEW |
|---|---|---|
| Fitness | ≥0.35 (was 0.70) | ≥0.10 (was 0.50) |
| PF | ≥1.4 (was 1.8) | ≥1.1 (was 1.3) |
| Consistency | ≥70% (was 85%) | ≥50% (was 65%) |
| MaxDD | ≤10% (was 8%) | ≤20% (was 15%) |
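
The recalibrated thresholds in the table can be expressed as a small classifier. The function name `classify_model` and the rule that a tier requires all four criteria to pass are assumptions. The numeric thresholds are the v4.1.0 values from the table.

```python
def classify_model(
    fitness: float, profit_factor: float, consistency: float, max_drawdown: float
) -> str:
    """Return "APPROVED", "REVIEW", or "DROP" using the v4.1.0 thresholds.

    consistency and max_drawdown are fractions (0.70 means 70%).
    Assumption: a tier requires all four criteria to pass.
    """
    if (
        fitness >= 0.35
        and profit_factor >= 1.4
        and consistency >= 0.70
        and max_drawdown <= 0.10
    ):
        return "APPROVED"
    if (
        fitness >= 0.10
        and profit_factor >= 1.1
        and consistency >= 0.50
        and max_drawdown <= 0.20
    ):
        return "REVIEW"
    return "DROP"


print(classify_model(0.40, 1.5, 0.75, 0.08))  # APPROVED
print(classify_model(0.40, 1.5, 0.75, 0.12))  # REVIEW (drawdown above 10%)
print(classify_model(0.05, 2.0, 0.90, 0.05))  # DROP (fitness below 0.10)
```

Under the old thresholds (fitness ≥ 0.70, PF ≥ 1.8 for APPROVED, fitness ≥ 0.50 for REVIEW), all three of these example models would have landed in DROP.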
| Attempt | Why it Failed | Lesson Learned |
|---|---|---|
| Unreachable gating thresholds (PF>=1.8) | Every model classified as DROP → zero useful signal | Calibrate thresholds to population, tighten as models improve |
| Backtest without action distribution | Can't detect overtrading in historical results | Always track action distribution alongside PF/Sharpe |
| No live-to-training feedback | Agent memory only has training data, misses live performance drift | LivePerformanceBridge closes the loop |
| Direct import in signals.py | Circular import: signals → server → routes → signals | Use lazy import in function wrapper (_get_gate_data()) |
| `hasattr(mock, 'obs_dim')` in tests | Always `True` for `Mock` objects → `TypeError` in arithmetic | Use `isinstance(getattr(obj, 'attr', None), expected_type)` |
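
The `Mock` pitfall in the last row is easy to reproduce with `unittest.mock`. The helper `get_obs_dim` is a hypothetical name. The pattern it uses is the `isinstance(getattr(...))` check recommended in the table.

```python
from unittest.mock import Mock


def get_obs_dim(env, default: int = 0) -> int:
    """Return env.obs_dim only when it is a real int; otherwise use default."""
    value = getattr(env, "obs_dim", None)
    return value if isinstance(value, int) else default


mock_env = Mock()
# Mock auto-creates attributes on access, so hasattr() is always True...
assert hasattr(mock_env, "obs_dim")
# ...but the value is another Mock, not a number, so the type check rejects it.
print(get_obs_dim(mock_env, default=8))  # 8

# Once the test sets a concrete value, the helper returns it.
mock_env.obs_dim = 16
print(get_obs_dim(mock_env, default=8))  # 16
```

The type check works the same for real objects and mocks, so production code does not need a test-only branch.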
- `write_gate_status()` in `live_trader.py` is the IPC mechanism between trader and dashboard: extend it for new data, don't create parallel channels
- `save_run_summary()` must preserve `live_*` fields when recomputing training patterns

Relevant files:

- `alpaca_trading/evaluation/model_health.py`: `ModelHealthMonitor`
- `alpaca_trading/evaluation/live_bridge.py`: `LivePerformanceBridge`
- `alpaca_trading/training/multi_agent.py`: `get_diagnostic_summary()`
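
The requirement that `save_run_summary()` preserve `live_*` fields amounts to a merge in which recomputed training fields win and live fields survive. A minimal sketch, assuming the summary is a flat dict. `merge_run_summary` and the example keys are hypothetical.

```python
def merge_run_summary(existing: dict, recomputed: dict) -> dict:
    """Hypothetical merge: recomputed training patterns win, live_* fields survive."""
    live_fields = {k: v for k, v in existing.items() if k.startswith("live_")}
    # Live fields are applied last so a recompute can never overwrite them.
    return {**recomputed, **live_fields}


existing = {"pattern": "old", "live_pnl": 1.2}
merged = merge_run_summary(existing, {"pattern": "new"})
print(merged)  # {'pattern': 'new', 'live_pnl': 1.2}
```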