| name | evolution-engine |
| description | Domain knowledge for the Evolution Engine — LLM-powered autonomous strategy discovery from raw OHLCV data. Covers the generate-backtest-select-evolve loop, vectorized backtesting, out-of-sample validation, and strategy graduation. Use when discovering trading patterns, running backtests, evolving strategies, or reviewing evolution logs. Triggers on "evolve", "discover patterns", "backtest", "evolution", "strategy generation", "candidate strategy". |
Evolution Engine
Overview
The Evolution Engine autonomously discovers trading strategies from raw price data. It uses LLM-powered pattern generation combined with vectorized backtesting to evolve, test, and graduate viable trading rules — without manual rule writing.
This is not parameter optimization on a known strategy. It's open-ended strategy discovery: the LLM proposes novel entry/exit logic, the engine validates it against real data, and natural selection eliminates the losers.
How It Works
The Evolution Loop
OHLCV Data → LLM Generation → Vectorized Backtest → Selection → Mutation → Repeat
↓
Out-of-Sample Validation
↓
Graduated Strategies
Step-by-Step
- Data Fetch: Pull OHLCV candles from Binance public API (no key needed)
- Generate: LLM analyzes price patterns and proposes N candidate strategies (entry/exit rules, position sizing, stop loss)
- Backtest: Each candidate is backtested vectorized (numpy, no loop-per-candle) for speed
- Score: Candidates scored by Sharpe ratio, win rate, max drawdown, total return
- Select: Top K candidates survive. Bottom candidates are eliminated (graveyard).
- Mutate: LLM takes survivors and generates variations (parameter tweaks, rule modifications)
- Repeat: Steps 3-6 for N generations
- Validate: Final survivors are tested on held-out out-of-sample data
- Graduate: Strategies that pass OOS validation are marked as graduated
Key Design Decisions
- LLM generates rules, not parameters. The engine doesn't optimize MACD(12,26,9) → MACD(14,28,10). It discovers entirely new rule combinations.
- Vectorized backtesting. No candle-by-candle loops. Numpy vectorized operations make backtests 100x faster than event-driven simulators.
- OOS validation is mandatory. In-sample performance means nothing. Only OOS-validated strategies graduate.
- Graveyard is data. Failed strategies are logged with failure reasons. This prevents re-discovering the same dead ends.
MCP Tools
| Tool | Purpose |
|---|
evolution_fetch_market_data | Fetch OHLCV data from Binance for a symbol/timeframe/period |
evolution_discover_patterns | LLM-powered pattern discovery — generates N candidate strategies |
evolution_run_backtest | Backtest a single candidate — returns Sharpe, win rate, drawdown |
evolution_evolve_strategy | Full evolution loop: generate → backtest → select → mutate × N generations |
evolution_get_log | History of evolution runs: graduated strategies, graveyard, metrics |
Backtest Metrics
Every backtest produces:
| Metric | Minimum for Graduation |
|---|
| Sharpe Ratio | > 1.0 (OOS) |
| Win Rate | > 40% |
| Max Drawdown | < 25% |
| Number of Trades | > 30 (statistical significance) |
| Profit Factor | > 1.2 |
These thresholds are guidelines. Context matters — a Sharpe of 0.9 with 500 trades may be more reliable than 2.5 with 15 trades.
Best Practices
Before Running Evolution
- Choose the right timeframe. 1h and 4h produce the most tradeable strategies. 1m is noise. 1d may not have enough data points.
- Use enough data. 90 days minimum for 1h data. 180 days for 4h. Less data = more overfitting risk.
- Start small. 3 generations × 10 candidates is a good starting point. Don't jump to 10 × 50.
During Evolution
- Don't interrupt. Each generation builds on the previous. Stopping mid-run wastes compute.
- Monitor the graveyard. If 90% of candidates fail on the same metric (e.g., max drawdown), the symbol/timeframe may not be suitable.
- Watch for convergence. If surviving strategies across generations look increasingly similar, the engine has found a local optimum.
After Evolution
- Never deploy without OOS validation. In-sample results are marketing, not science.
- Paper trade first. Even OOS-validated strategies should be paper traded for 2-4 weeks.
- Check regime sensitivity. A strategy discovered in a trending market may fail in ranging conditions. Test across multiple market regimes.
- Log everything. Use
evolution_get_log to review what was tried, what failed, and why.
When NOT to Use Evolution
- Not for parameter optimization. If you already have a strategy and just want to tune parameters, use a traditional optimizer.
- Not for HFT. The engine works on candle data, not tick data. Sub-minute strategies need different infrastructure.
- Not as a replacement for domain knowledge. Evolution discovers patterns, but you still need to understand why a pattern works before risking real money.
Common Mistakes
| Mistake | Why It's Bad | Fix |
|---|
| Too few data points | Strategies overfit to noise | Use 90+ days for 1h, 180+ for 4h |
| Skipping OOS validation | In-sample Sharpe of 3.0 means nothing | Always validate on held-out data |
| Too many generations | Overfitting through excessive selection pressure | 3-5 generations is usually sufficient |
| Deploying immediately | No buffer for regime changes | Paper trade 2-4 weeks first |
| Ignoring the graveyard | Re-discovering dead strategies wastes compute | Review evolution_get_log before new runs |
| Using correlated symbols | BTCUSDT and ETHUSDT strategies overlap heavily | Test on uncorrelated markets |
Requirements
ANTHROPIC_API_KEY — Required for LLM-powered pattern discovery
- Binance public API — Used for OHLCV data (no API key needed)
- Python with numpy — For vectorized backtesting