Program backtesting
Program strategies are replayed against historical kline data. The sandbox receives the same MarketData API, but backed by historical snapshots instead of live data.
Running a program backtest
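As a conceptual illustration of the replay idea described above, the sketch below feeds stored candles to a strategy one step at a time. Every name in it (`Kline`, `HistoricalMarketData`, `get_klines`, `replay`) is hypothetical and chosen for the example; it is not the platform's MarketData API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Kline:
    """One historical candle (hypothetical shape, for illustration only)."""
    open_time: int  # epoch milliseconds
    close: float


class HistoricalMarketData:
    """Hypothetical stand-in for a market data API backed by stored klines.

    Only candles up to the current replay step are visible, so a strategy
    cannot look ahead into the future.
    """

    def __init__(self, klines: List[Kline]) -> None:
        self._klines = klines
        self._cursor = 0

    def advance(self) -> bool:
        """Reveal the next candle; return False once history is exhausted."""
        if self._cursor >= len(self._klines):
            return False
        self._cursor += 1
        return True

    def get_klines(self, limit: int) -> List[Kline]:
        """Return up to `limit` of the most recent visible candles."""
        if limit <= 0:
            return []
        return self._klines[: self._cursor][-limit:]


def replay(klines: List[Kline],
           strategy: Callable[[HistoricalMarketData], str]) -> List[str]:
    """Call the strategy once per candle and collect its decisions."""
    market = HistoricalMarketData(klines)
    decisions = []
    while market.advance():
        decisions.append(strategy(market))
    return decisions


def momentum(market: HistoricalMarketData) -> str:
    """Toy strategy: buy if the last close rose, sell if it did not."""
    candles = market.get_klines(limit=2)
    if len(candles) < 2:
        return "hold"
    return "buy" if candles[-1].close > candles[-2].close else "sell"
```

Because the strategy talks only to the data object, the same strategy function could in principle run unchanged against a live-backed implementation with the same methods.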
Prompt backtesting
Prompt backtesting re-runs AI decisions on historical market snapshots. This helps you evaluate whether prompt changes improve decision quality without waiting for live results.
Prompt backtests call the LLM for each historical snapshot, so they consume API credits and take longer than program backtests. Use shorter time ranges and larger intervals for initial iteration.
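To see why range and interval matter, a rough estimate of the number of LLM calls can help. The sketch below assumes one call per symbol per candle, which is an assumption about how snapshots are generated; `snapshot_count` is a hypothetical helper, not a platform function.

```python
from datetime import datetime, timedelta

# Interval strings from the configuration table, mapped to durations.
INTERVALS = {
    "1m": timedelta(minutes=1),
    "5m": timedelta(minutes=5),
    "15m": timedelta(minutes=15),
    "1h": timedelta(hours=1),
    "4h": timedelta(hours=4),
    "1d": timedelta(days=1),
}


def snapshot_count(start: datetime, end: datetime,
                   interval: str, n_symbols: int = 1) -> int:
    """Estimate LLM calls, assuming one call per symbol per candle."""
    if end <= start:
        raise ValueError("end must be after start")
    steps = (end - start) // INTERVALS[interval]
    return steps * n_symbols


start, end = datetime(2024, 1, 1), datetime(2024, 1, 31)  # 30 days
print(snapshot_count(start, end, "1m"))  # 43200
print(snapshot_count(start, end, "1h"))  # 720
print(snapshot_count(start, end, "4h"))  # 180
```

Under this assumption, moving from `1h` to `4h` over the same 30 days cuts the call count by a factor of four.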
Configuration options
| Parameter | Type | Default | Description |
|---|---|---|---|
| `symbols` | string[] | — | Symbols to include in the backtest |
| `start_date` | datetime | — | Start of the historical period |
| `end_date` | datetime | — | End of the historical period |
| `interval` | string | `1h` | Candlestick interval (`1m`, `5m`, `15m`, `1h`, `4h`, `1d`) |
| `initial_balance` | float | `10000.0` | Starting virtual balance in USD |
| `slippage_pct` | float | `0.05` | Simulated slippage as a percentage of order value |
| `maker_fee_pct` | float | `0.02` | Simulated maker fee percentage |
| `taker_fee_pct` | float | `0.05` | Simulated taker fee percentage |
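The parameters above could be modeled client-side as a small validated structure. This is a minimal sketch: the field names and defaults come from the table, while the `BacktestConfig` class, its validation rules, and the `BTCUSDT` symbol are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

VALID_INTERVALS = ("1m", "5m", "15m", "1h", "4h", "1d")


@dataclass
class BacktestConfig:
    """Hypothetical container mirroring the documented parameters."""
    symbols: List[str]
    start_date: datetime
    end_date: datetime
    interval: str = "1h"
    initial_balance: float = 10000.0
    slippage_pct: float = 0.05
    maker_fee_pct: float = 0.02
    taker_fee_pct: float = 0.05

    def __post_init__(self) -> None:
        # Validation rules here are assumptions, not documented behavior.
        if not self.symbols:
            raise ValueError("at least one symbol is required")
        if self.end_date <= self.start_date:
            raise ValueError("end_date must be after start_date")
        if self.interval not in VALID_INTERVALS:
            raise ValueError(f"unsupported interval: {self.interval}")
        if self.initial_balance <= 0:
            raise ValueError("initial_balance must be positive")


config = BacktestConfig(
    symbols=["BTCUSDT"],  # example symbol, chosen for illustration
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 2, 1),
    interval="4h",
)
```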
Results
Backtest results include a comprehensive performance summary:
Metrics
| Metric | Description |
|---|---|
| Total PnL | Net profit/loss over the test period |
| PnL % | Return as a percentage of initial balance |
| Sharpe Ratio | Risk-adjusted return (annualized) |
| Max Drawdown | Largest peak-to-trough decline |
| Win Rate | Percentage of profitable trades |
| Profit Factor | Gross profit divided by gross loss |
| Total Trades | Number of trades executed |
| Avg Trade Duration | Mean time a position was held |
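For intuition, several of these metrics can be recomputed from per-trade PnL values and an equity series. The sketch below uses common textbook definitions; the exact formulas the backtester applies are not stated here, so the zero risk-free rate, the sample standard deviation, and drawdown expressed as a fraction of the peak are all assumptions.

```python
import math
from typing import Dict, Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the peak."""
    if not equity:
        raise ValueError("equity series is empty")
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(returns: Sequence[float], periods_per_year: int) -> float:
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    n = len(returns)
    if n < 2:
        return 0.0
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    std = math.sqrt(variance)
    if std == 0:
        return 0.0
    return mean / std * math.sqrt(periods_per_year)


def summarize(trade_pnls: Sequence[float],
              initial_balance: float) -> Dict[str, float]:
    """Compute PnL, win rate, and profit factor from per-trade PnL."""
    total_pnl = sum(trade_pnls)
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    wins = sum(1 for p in trade_pnls if p > 0)
    n = len(trade_pnls)
    if gross_loss > 0:
        profit_factor = gross_profit / gross_loss
    else:
        # No losing trades: the ratio is undefined, reported here as inf.
        profit_factor = math.inf if gross_profit > 0 else 0.0
    return {
        "total_pnl": total_pnl,
        "pnl_pct": 100.0 * total_pnl / initial_balance,
        "win_rate_pct": 100.0 * wins / n if n else 0.0,
        "profit_factor": profit_factor,
        "total_trades": n,
    }
```

For example, trades of +100, -50, +200, and -50 on a 10,000 USD balance give a total PnL of 200, a 2% return, a 50% win rate, and a profit factor of 300 / 100 = 3.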
Example response
Equity curve
The equity curve tracks portfolio value over time. Use it to visualize drawdown periods and growth patterns.
Trade log
Every simulated trade is recorded with:
- Entry/exit timestamps and prices
- Position size, leverage, and direction
- PnL and fees
- The decision reasoning (program output or LLM response)
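The fields listed above can be pictured as one record per trade, from which a realized equity curve follows directly. This is a sketch only: the `Trade` field names are hypothetical, `pnl` is assumed to be gross of fees, and the curve is sampled at trade exits rather than marked to market, which may differ from how the platform builds its curve.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Trade:
    """Hypothetical trade log record covering the fields listed above."""
    entry_time: datetime
    exit_time: datetime
    entry_price: float
    exit_price: float
    size: float
    leverage: float
    direction: str   # "long" or "short"
    pnl: float       # assumed gross of fees
    fees: float
    reasoning: str   # program output or LLM response


def equity_curve(trades: List[Trade],
                 initial_balance: float) -> List[Tuple[datetime, float]]:
    """Realized balance after each trade closes, in exit-time order."""
    balance = initial_balance
    curve = []
    for trade in sorted(trades, key=lambda t: t.exit_time):
        balance += trade.pnl - trade.fees
        curve.append((trade.exit_time, balance))
    return curve
```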
Best practices
Use realistic fees
Set slippage and fee parameters to match your actual exchange costs. Overly optimistic assumptions inflate results.
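A quick calculation shows how much these parameters matter. The sketch below assumes the values are in percent units (so `0.05` means 0.05%), that both legs of a trade pay the taker fee, and that slippage applies on entry and exit; `round_trip_cost` is a hypothetical helper.

```python
def round_trip_cost(order_value: float,
                    slippage_pct: float = 0.05,
                    fee_pct: float = 0.05) -> float:
    """Cost in USD of opening and closing a position of `order_value`.

    Assumes percent units and identical costs on entry and exit.
    """
    per_side = order_value * (slippage_pct + fee_pct) / 100.0
    return 2.0 * per_side


def break_even_move_pct(slippage_pct: float = 0.05,
                        fee_pct: float = 0.05) -> float:
    """Price move, in percent, needed just to cover round-trip costs."""
    return 2.0 * (slippage_pct + fee_pct)


print(round_trip_cost(1000.0))
print(break_even_move_pct())
```

With the default values, a 1,000 USD order costs about 2 USD per round trip, so a trade needs roughly a 0.2% favorable move before it shows any profit. A strategy that trades often with small targets is especially sensitive to these settings.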
Avoid overfitting
Test on out-of-sample data. Split your historical period into train and test windows.
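One simple way to create the two windows is a chronological split, tuning on the earlier part and validating on the later part. The helper below is a hypothetical sketch; the 70/30 default is an arbitrary choice for illustration.

```python
from datetime import datetime
from typing import Tuple

Window = Tuple[datetime, datetime]


def split_period(start: datetime, end: datetime,
                 train_fraction: float = 0.7) -> Tuple[Window, Window]:
    """Split [start, end) chronologically into train and test windows."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    if end <= start:
        raise ValueError("end must be after start")
    cutoff = start + (end - start) * train_fraction
    return (start, cutoff), (cutoff, end)


train, test = split_period(datetime(2024, 1, 1), datetime(2024, 7, 1))
# Tune the strategy on `train`, then run it once, unchanged, on `test`.
```

Keeping the test window strictly later than the train window avoids leaking future information into the tuning step.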
Account for regime changes
Markets shift between trending and ranging. A strategy that works in one regime may fail in another.
Start with longer intervals
Begin with `4h` or `1d` intervals to iterate quickly, then refine with shorter intervals once the logic is sound.