backtest-observation-mismatch

Stars: 2

Forks: 0

Updated: March 23, 2026 at 13:59

Backtest observation features that receive hardcoded defaults vs training dynamic values — and why fixing them DEGRADES results. Includes walk-forward batch mode fix.

Source

smith6jt-cop

smith6jt-cop/Skills_Registry

SKILL.md

name	backtest-observation-mismatch
description	Backtest observation features that receive hardcoded defaults vs training dynamic values — and why fixing them DEGRADES results. Includes walk-forward batch mode fix.
version	1.0.0
tags	["backtest","observation","mismatch","walk-forward","win-rate"]
triggers	["backtest win rate","backtest accuracy","observation mismatch","walk-forward batch","BacktestObservationBuilder","sub-50% win rate"]

Backtest Observation Feature Mismatch (v5.3.0)

Problem

Of the 65 features built by BacktestObservationBuilder.get_obs_at_bar(), 16 receive hardcoded default values instead of the dynamic values they take during training. The investigation started after backtests showed sub-50% win rates.

Finding: Sub-50% Win Rates Are Expected

Models use high R-multiple strategies: the win rate is below 50%, but the average win is much larger than the average loss. Example: CNM_1Hour has a 46.7% win rate but a profit factor (PF) of 1.80, because its average win ($138.86) is 2.05x its average loss ($67.61). This is normal in quantitative trading.
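The arithmetic behind that example can be checked directly. The sketch below assumes the 46.7% win rate corresponds to 14 wins and 16 losses out of the 30 trades reported for CNM_1Hour (14/30 ≈ 46.7%). That split is an inference, not a figure taken from the backtest output, and the function names are illustrative only.

```python
def profit_factor(wins: int, losses: int, avg_win: float, avg_loss: float) -> float:
    """Gross profit divided by gross loss."""
    return (wins * avg_win) / (losses * avg_loss)


def breakeven_win_rate(avg_win: float, avg_loss: float) -> float:
    """Win rate at which the expected P&L per trade is zero."""
    return avg_loss / (avg_win + avg_loss)


# Average win/loss are the CNM_1Hour figures quoted above.
# ASSUMPTION: 14 wins / 16 losses, inferred from 46.7% of 30 trades.
wins, losses = 14, 16
avg_win, avg_loss = 138.86, 67.61

pf = profit_factor(wins, losses, avg_win, avg_loss)
payoff_ratio = avg_win / avg_loss
breakeven = breakeven_win_rate(avg_win, avg_loss)

print(f"win rate            = {wins / (wins + losses):.1%}")
print(f"payoff ratio        = {payoff_ratio:.2f}")
print(f"profit factor       = {pf:.2f}")
print(f"break-even win rate = {breakeven:.1%}")
```

Under these assumptions the profit factor comes out at roughly 1.80, and the break-even win rate for a 2.05x payoff ratio is about 33%, well below the observed 46.7%.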

Feature Defaults (DO NOT CHANGE)

| Feature Group | Count | Default | Training Value | Why Default Is Better |
| --- | --- | --- | --- | --- |
| Intraday | 4 | 0.5 | `step % 390 / 390` | Training pattern uses per-episode step counter that can't be replicated in backtest |
| Account state | 3 | 0.0/0.5/0.0 | Dynamic P&L/win_rate/drawdown | Episode dynamics (4096 envs, ~400 steps) don't match single-pass backtest |
| Position sizing | 3 | 0.0/1.0/0.0 | Dynamic position/capital state | Same episode dynamics issue |
| Regime bars | 3 | 0.0 | Dynamic 0-1.0 duration | Infrastructure exists but requires GPU Markov system |
| Calendar | 7 | 0.5 fallback | Real calendar data | Usually works; 0.5 is safety fallback |
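To make the intraday row concrete, here is a minimal sketch of why a per-episode step counter is hard to reproduce in a single-pass backtest. It assumes the counter restarts at zero on every episode reset, so the feature depends on where the episode began rather than on the market bar itself. The function names and the reset offsets are hypothetical; only the `step % 390 / 390` expression and the 0.5 default come from the table.

```python
BARS_PER_CYCLE = 390             # modulus from the table's training expression
BACKTEST_INTRADAY_DEFAULT = 0.5  # hardcoded backtest default from the table


def training_intraday_feature(step: int) -> float:
    """Training-side value: step % 390 / 390, with step counted per episode."""
    return (step % BARS_PER_CYCLE) / BARS_PER_CYCLE


def feature_seen_in_training(bar_index: int, episode_start: int) -> float:
    """ASSUMPTION: step is the number of bars since the episode's reset point."""
    step = bar_index - episode_start
    return training_intraday_feature(step)


# The same market bar, seen from two hypothetical episodes with different resets.
bar = 1000
value_a = feature_seen_in_training(bar, episode_start=900)  # step = 100
value_b = feature_seen_in_training(bar, episode_start=700)  # step = 300

print(f"episode A: {value_a:.3f}")
print(f"episode B: {value_b:.3f}")
print(f"backtest default: {BACKTEST_INTRADAY_DEFAULT}")
```

If training really behaves this way, the model never saw a stable mapping from bar to intraday value, which would be consistent with the neutral 0.5 default outperforming both real timestamps and a replayed `step % 390`.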

Experiment Results (2026-03-23)

Control: All defaults → CNM_1Hour PF=1.80, Win Rate=46.7%, 30 trades
Real timestamps for intraday: PF=1.01, Win Rate=38.0%, 50 trades — DEGRADED
step%390 for intraday: PF=0.81, Win Rate=41.5%, 41 trades — DEGRADED
Dynamic account state: PF=0.91, Win Rate=35.0%, 20 trades — DEGRADED
Only regime bars: PF=1.80, Win Rate=46.7%, 30 trades — NO EFFECT (GPU Markov not loaded)

Walk-Forward Batch Fix

In scripts/run_backtest.py, --all --walk-forward N was silently running single-pass backtests because run_all_models() ignored the walk_forward parameter. The fix passes walk_forward through to run_walk_forward() for each model.
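The bug pattern and its fix can be sketched as follows. This is not the code from scripts/run_backtest.py: the bodies, the signatures, the `run_single_pass` helper, and the second model name are hypothetical stand-ins, and only the names `run_all_models`, `run_walk_forward`, and `walk_forward` come from the description above.

```python
from typing import List, Optional


def run_single_pass(model: str) -> str:
    """Hypothetical stand-in for a single-pass backtest."""
    return f"{model}: single-pass"


def run_walk_forward(model: str, n_folds: int) -> str:
    """Hypothetical stand-in for a per-model walk-forward run."""
    return f"{model}: walk-forward x{n_folds}"


def run_all_models_buggy(models: List[str], walk_forward: Optional[int] = None) -> List[str]:
    # Bug pattern: walk_forward is accepted but never used,
    # so every model silently gets a single-pass backtest.
    return [run_single_pass(m) for m in models]


def run_all_models(models: List[str], walk_forward: Optional[int] = None) -> List[str]:
    # Fix pattern: forward the parameter to the walk-forward runner per model.
    results = []
    for m in models:
        if walk_forward:
            results.append(run_walk_forward(m, walk_forward))
        else:
            results.append(run_single_pass(m))
    return results


models = ["CNM_1Hour", "EXAMPLE_15Min"]  # second name is made up for illustration
print(run_all_models_buggy(models, walk_forward=3))
print(run_all_models(models, walk_forward=3))
```

A quick regression check along these lines (asserting that the batch path actually reports walk-forward runs when N is given) would have caught the silent fallback.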

Files

alpaca_trading/gpu/inference_obs_builder.py — BacktestObservationBuilder
alpaca_trading/gpu/vectorized_env.py — Training observation construction
alpaca_trading/backtest/engine.py — BacktestEngine.run()
scripts/run_backtest.py — CLI with batch walk-forward fix

Proper Fix (Future)

A proper fix would require either (1) retraining the models with backtest-compatible observation patterns, or (2) implementing full episode simulation in the backtest that exactly matches the training environment's dynamics (4096 parallel envs with random resets).
