| name | quant-backtest |
| description | Institutional-grade Python backtesting framework builder for Codex. Use this skill whenever the user mentions backtesting, quant strategy, alpha model, trading system, signal engine, portfolio backtest, walk-forward optimization, strategy performance, Sharpe ratio calculation, look-ahead bias, transaction cost modeling, or building any systematic trading infrastructure. Also trigger on mentions of vectorized signals, position sizing, mark-to-market, risk metrics (VaR, Sortino, Calmar), regime filters, or factor attribution. If the user says "backtest this idea" or "test my trading strategy", use this skill. |
Quant Backtesting Framework
Build modular, institutional-grade Python backtesting systems. Keep the architecture strategy-agnostic and enforce rigorous data handling, transaction cost modeling, and performance attribution.
Purpose
Use this skill to:
- design a reusable Python backtesting framework
- implement or extend strategy modules
- add risk, attribution, or transaction cost logic
- debug look-ahead bias and data leakage
- scaffold walk-forward and out-of-sample validation workflows
Inputs to Gather
Collect as many of these as are available from the user or repository:
- target asset class or instruments
- bar frequency and date range
- data source or storage format
- strategy hypothesis and signal logic
- position sizing method
- transaction cost assumptions
- benchmark or factor model
- desired outputs such as charts, metrics, reports, or tests
If important inputs are missing, make reasonable defaults explicit and proceed.
Operating Principles
- Prevent look-ahead bias structurally.
- Use log returns internally for compounding math.
- Prefer vectorized pandas and numpy implementations.
- Treat transaction costs as mandatory, not optional.
- Separate in-sample fitting from out-of-sample evaluation.
- Keep modules loosely coupled and easy to swap.
Execution Plan
1) Inspect the repo or task shape
- Find whether the user wants a fresh framework, a strategy added, a bug fixed, or metrics extended.
- Identify existing file layout, coding style, and tests.
- If the repo already has conventions, follow them.
2) Map the work to this module layout
Use or adapt this structure:
backtest/
āāā __init__.py
āāā data.py
āāā signals.py
āāā positions.py
āāā tca.py
āāā risk.py
āāā regime.py
āāā strategy.py
āāā dashboard.py
āāā engine.py
3) Build or update the core modules
Data layer
- Define a
DataProvider abstract base class.
- Default backend should support DuckDB + Parquet.
- Preserve timezone-aware timestamps where possible.
- Forward-fill sparse data carefully, then drop leading invalid rows.
- Standardize outputs to a consistent tabular format.
Signal engine
- Use a
SignalGenerator base class.
- Put raw predictive logic in
generate_raw().
- Apply
.shift(1) in generate() as a non-negotiable invariant.
- Keep signals vectorized and timestamp aligned.
Position and accounting engine
- Track positions, cash, gross exposure, net exposure, leverage, and mark-to-market equity.
- Support at least equal-weight, fully deployed, and volatility-aware sizing.
- Ensure position application timing is consistent with shifted signals.
Transaction cost analysis
- Model commissions, slippage, and regulatory fees where relevant.
- Base cost calculations on turnover or notional traded.
- Report both gross and net results, but emphasize net results.
Risk and metrics
Compute at least:
- total return
- CAGR
- annualized volatility
- Sharpe
- Sortino
- max drawdown
- Calmar
- VaR 95 and 99, historical and parametric
Where feasible, also include:
- expected shortfall
- beta / alpha attribution
- factor regression
- turnover
- hit rate
- exposure statistics
Regime filter
- Keep regime logic separate from alpha logic.
- Support simple threshold filters and optional probabilistic models such as HMMs.
- Shift regime masks when needed so they do not leak future information.
Strategy abstraction
- Create a base
Strategy class that composes:
- a signal generator
- a position manager
- an optional regime filter
Orchestrator
- Build a backtest engine that wires together:
- data fetch
- signal generation
- sizing
- cost application
- equity curve construction
- metric computation
4) Validate correctness
Before finalizing:
- confirm signals are shifted
- confirm returns and positions are aligned
- confirm transaction costs are applied
- confirm metrics run on realized strategy returns
- confirm tests or sanity checks cover edge cases
5) Deliver useful outputs
Depending on the task, provide:
- code changes
- a concise architecture summary
- example usage
- tests
- assumptions and next steps
- notes on limitations or future extension points
Reference Implementation Notes
Data layer skeleton
from abc import ABC, abstractmethod
import numpy as np
import pandas as pd
class DataProvider(ABC):
@abstractmethod
def fetch(self, symbols: list[str], start: str, end: str) -> pd.DataFrame:
pass
def log_returns(self, prices: pd.DataFrame) -> pd.DataFrame:
return np.log(prices / prices.shift(1))
def clean(self, df: pd.DataFrame) -> pd.DataFrame:
return df.ffill().dropna(how="all")
Signal safety skeleton
from abc import ABC, abstractmethod
import pandas as pd
class SignalGenerator(ABC):
@abstractmethod
def generate_raw(self, data: pd.DataFrame) -> pd.DataFrame:
pass
def generate(self, data: pd.DataFrame) -> pd.DataFrame:
raw = self.generate_raw(data)
return raw.shift(1)
Core invariant checklist
- never override the shifted execution invariant without a very explicit reason
- never present uncosted performance as the primary result
- never treat in-sample performance as expected live performance
- never mix raw prices and return series without clear alignment
Task-Specific Playbooks
A. Build from scratch
- Create the module layout.
- Implement base abstractions first.
- Add one example strategy end to end.
- Add a fee model and risk metrics.
- Add a minimal dashboard or report output.
- Add tests for alignment, look-ahead protection, and metric sanity.
B. Add a strategy
- Inspect the current base strategy and signal interfaces.
- Implement new signal logic in
generate_raw().
- Reuse existing sizing, cost, and metrics machinery.
- Add tests showing no future leakage.
- Include a usage example with realistic parameters.
C. Add risk metrics
- Extend the metrics engine rather than scattering calculations.
- Keep annualization assumptions centralized.
- Document formulas and units.
- Add regression or benchmark alignment checks when attribution is involved.
D. Debug look-ahead bias
- Check whether signals are shifted.
- Check whether rolling windows include the current bar improperly.
- Check whether positions and returns are multiplied on the right dates.
- Check whether regime filters or benchmark series leak future data.
- Add explicit tests that fail when shift protection is removed.
E. Add transaction cost modeling
- Identify the portfolio turnover source.
- Compute trades from position deltas, not level positions.
- Apply cost assumptions consistently by asset class.
- Report performance degradation from gross to net.
Output Style
When using this skill, prefer:
- production-leaning code over pseudo-code
- minimal but clear comments
- explicit assumptions
- concise summaries of what changed and why
- tests when files are modified
Example Quick Start
provider = DuckDBParquetProvider("./data")
strategy = MyStrategy("example")
engine = BacktestEngine(provider, FeeSchedule(pct_of_notional=0.001, slippage_bps=5))
results = engine.run(strategy, ["SPY"], "2010-01-01", "2024-01-01")
print(results["metrics"])
Final Checks
Before finishing, verify:
- the framework remains strategy-agnostic
- data, signal, execution, accounting, and metrics are modular
- performance metrics are based on net realized returns
- the result is easy for a future engineer to extend