Single Agent Backtesting

The single agent backtest runs the perpetual futures decision logic against historical OHLCV data, simulating trades without placing real orders. Use it to validate your configuration before going live.

Quick Start

uv run python backtest.py \
  --symbol BTC \
  --strategy single \
  --start-date 2024-01-01 \
  --end-date 2024-12-01

Command Reference

usage: backtest.py [--symbol SYMBOL] [--strategy {single,grid}]
                   [--start-date DATE] [--end-date DATE]
                   [--config CONFIG] [--resume-from REPORT]
                   [--output-dir DIR]

Options:
  --symbol          Asset to backtest (e.g. BTC, ETH)
  --strategy        Backtest strategy: single (default) or grid
  --start-date      Start date in YYYY-MM-DD format
  --end-date        End date in YYYY-MM-DD format
  --config          Path to config.yaml (default: config.yaml)
  --resume-from     Resume from a previous run's live_report.json
  --output-dir      Output directory (default: backtest_results/)

How the Backtest Works

Data download: Historical OHLCV data for the symbol and date range is fetched from Hyperliquid. Data is cached locally under data/ to avoid re-downloading.
Indicator computation: The same indicators.py functions used in live trading compute MA, RSI, MACD, and Bollinger Bands from the historical candles.
LLM decisions: The actual LLM is called for each decision point using your configured model. This means backtesting uses real API credits.
Simulated execution: Trades are simulated using the historical close price. Slippage and fees are modeled (Hyperliquid Tier 0 fees).
Results output: A detailed report is saved to backtest_results/.
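The simulated execution step can be pictured with a minimal sketch. The names simulate_fill and Fill are hypothetical and not taken from the Quant Flow source, and both rate constants are placeholder assumptions; check the current Hyperliquid fee schedule and your own slippage estimate before reusing them.

```python
from dataclasses import dataclass

# Assumed values for illustration only, not read from Quant Flow.
TAKER_FEE_RATE = 0.00045   # 0.045% of notional (assumption)
SLIPPAGE_RATE = 0.0005     # 0.05% adverse price move (assumption)


@dataclass
class Fill:
    price: float     # execution price after slippage
    fee_usd: float   # fee charged on the filled notional


def simulate_fill(close: float, size: float, side: str,
                  fee_rate: float = TAKER_FEE_RATE,
                  slippage_rate: float = SLIPPAGE_RATE) -> Fill:
    """Fill `size` units at the candle close, moved against the trader."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    # Buys fill slightly above the close, sells slightly below it.
    direction = 1 if side == "buy" else -1
    price = close * (1 + direction * slippage_rate)
    # The fee is proportional to the notional value of the fill.
    fee_usd = abs(size) * price * fee_rate
    return Fill(price=price, fee_usd=fee_usd)


fill = simulate_fill(close=60_000.0, size=0.01, side="buy")
print(f"filled at {fill.price:.2f}, fee {fill.fee_usd:.4f} USD")
```

Applying slippage against the trader on both sides keeps the simulation conservative: a round trip always costs two fees plus two slippage penalties.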

Real LLM Calls

Unlike some backtest frameworks that replay cached decisions, Quant Flow calls the LLM for each historical decision point. This gives realistic results but costs real API credits. A 12-month backtest on BTC at 3-minute intervals amounts to roughly 175,000 decision points (about 525,600 minutes in a year ÷ 3). Raise scheduler.interval_minutes to reduce the number of calls and control cost.
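To estimate the cost before starting a run, the number of decision points can be computed from the date range and the interval. This is a rough sketch that assumes one decision per interval, an exclusive end date, and no gaps in the data; the function name decision_points is hypothetical.

```python
from datetime import date


def decision_points(start: date, end: date, interval_minutes: int) -> int:
    """Count decision points between start (inclusive) and end (exclusive)."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    total_minutes = (end - start).days * 24 * 60
    return total_minutes // interval_minutes


# Full year 2024 (a leap year, 366 days) at 3-minute intervals.
print(decision_points(date(2024, 1, 1), date(2025, 1, 1), 3))    # 175680

# The Quick Start range at 30-minute intervals.
print(decision_points(date(2024, 1, 1), date(2024, 12, 1), 30))  # 16080
```

Multiply the result by the number of LLM calls per decision and your model's per-call price to get an approximate budget.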

Output

After the backtest completes, the results are written to backtest_results/<symbol>_<timestamp>/:

backtest_results/
└── BTC_20241201_143022/
    ├── live_report.json       # full trade-by-trade log
    ├── summary.json           # aggregate statistics
    ├── equity_curve.png       # equity curve chart
    └── trade_log.csv          # CSV export of all trades

Key Metrics in summary.json

{
  "total_return_pct": 34.2,
  "sharpe_ratio": 1.87,
  "max_drawdown_pct": -12.4,
  "win_rate": 0.58,
  "total_trades": 247,
  "avg_profit_per_trade_usd": 1.83,
  "largest_win_usd": 48.20,
  "largest_loss_usd": -23.10,
  "profit_factor": 1.94
}
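For comparing several runs, a short script can load summary.json and print the headline numbers on one line. This sketch assumes the file contains the keys shown above; the helper names load_summary and format_summary are hypothetical.

```python
import json
from pathlib import Path


def load_summary(path: str) -> dict:
    """Read a summary.json file into a dictionary."""
    return json.loads(Path(path).read_text())


def format_summary(summary: dict) -> str:
    """Render the headline metrics as a single line."""
    return (
        f"return {summary['total_return_pct']:+.1f}% | "
        f"Sharpe {summary['sharpe_ratio']:.2f} | "
        f"max DD {summary['max_drawdown_pct']:.1f}% | "
        f"win rate {summary['win_rate']:.0%} | "
        f"trades {summary['total_trades']}"
    )


# Sample values copied from the example summary.json above.
sample = {
    "total_return_pct": 34.2,
    "sharpe_ratio": 1.87,
    "max_drawdown_pct": -12.4,
    "win_rate": 0.58,
    "total_trades": 247,
}
print(format_summary(sample))
# return +34.2% | Sharpe 1.87 | max DD -12.4% | win rate 58% | trades 247
```

With a real run, replace sample with load_summary("backtest_results/BTC_20241201_143022/summary.json").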

Resuming Interrupted Backtests

If a backtest is interrupted (network issue, API error), resume from where it stopped:

uv run python backtest.py \
  --symbol BTC \
  --strategy single \
  --start-date 2024-01-01 \
  --end-date 2024-12-01 \
  --resume-from backtest_results/BTC_20241201_143022/live_report.json

The backtest reads existing trades from live_report.json and continues from the last completed decision point.

Configuration Tips for Backtesting

Use a separate config file for backtesting to avoid accidentally changing live settings:

cp config.yaml config.backtest.yaml

Adjust for cost efficiency:

# config.backtest.yaml
scheduler:
  interval_minutes: 30     # decide every 30 min instead of every 3 (about 10x fewer API calls)

debate:
  enabled: false           # save API credits during backtest

market_monitor:
  enabled: false           # not applicable in backtest

Run with:

uv run python backtest.py --symbol BTC --strategy single \
  --start-date 2024-01-01 --end-date 2024-12-01 \
  --config config.backtest.yaml

Understanding Results

On Win Rate

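A win rate of 50–60% is typical for a well-tuned strategy. Profit factor (gross profit ÷ gross loss) matters more, because a strategy with a modest win rate is still profitable when its wins are larger than its losses. Aim for a profit factor above 1.5.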
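Both figures can be recomputed from a list of per-trade profits, for example the PnL column of trade_log.csv. This is a minimal sketch using the definitions given here; it is not necessarily how Quant Flow computes them, and the function names are hypothetical.

```python
def win_rate(pnls: list[float]) -> float:
    """Fraction of trades that closed with a positive PnL."""
    if not pnls:
        raise ValueError("no trades")
    return sum(1 for p in pnls if p > 0) / len(pnls)


def profit_factor(pnls: list[float]) -> float:
    """Gross profit divided by gross loss (infinite when nothing was lost)."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss == 0:
        return float("inf")
    return gross_profit / gross_loss


# Two wins out of five trades, but the wins are larger than the losses.
pnls = [30.0, -10.0, 20.0, -15.0, -5.0]
print(f"win rate {win_rate(pnls):.0%}")            # win rate 40%
print(f"profit factor {profit_factor(pnls):.2f}")  # profit factor 1.67
```

The example shows a win rate below 50% alongside a profit factor above 1.5, which is why the win rate alone is a poor guide.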

On Drawdown

A max drawdown of 10–15% is generally acceptable for a leveraged crypto strategy. A higher backtest drawdown means a greater risk of hitting the protections.max_drawdown.max_drawdown_pct limit in live trading.
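Max drawdown is the largest peak-to-trough decline of the equity curve. The sketch below assumes it is measured as a percentage of the running peak and reported as a negative number, matching the sign of max_drawdown_pct in the example summary.json; whether Quant Flow uses exactly this definition is an assumption.

```python
def max_drawdown_pct(equity: list[float]) -> float:
    """Largest decline from a running peak, as a negative percentage."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                   # highest equity seen so far
        drawdown = (value - peak) / peak * 100    # zero or negative
        worst = min(worst, drawdown)
    return worst


# Peak of 1100 followed by a trough of 968 is a 12% drawdown.
equity = [1000.0, 1100.0, 968.0, 1050.0, 1200.0, 1140.0]
print(f"{max_drawdown_pct(equity):.1f}%")  # -12.0%
```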

On Sharpe Ratio

Sharpe > 1.5 is good; > 2.0 is excellent for crypto markets.
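For reference, an annualized Sharpe ratio can be computed from periodic returns as below. This sketch assumes daily returns, a zero risk-free rate, and 365 periods per year because crypto markets trade every day; the source does not state which conventions Quant Flow uses, so reported values may differ.

```python
import math
import statistics


def sharpe_ratio(returns: list[float],
                 periods_per_year: int = 365,
                 risk_free_per_period: float = 0.0) -> float:
    """Annualized Sharpe ratio: mean excess return over its standard deviation."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free_per_period for r in returns]
    stdev = statistics.stdev(excess)   # sample standard deviation
    if stdev == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    # Scale the per-period ratio by sqrt(periods) to annualize it.
    return statistics.mean(excess) / stdev * math.sqrt(periods_per_year)


daily_returns = [0.010, -0.005, 0.007, 0.002, -0.003]
print(f"Sharpe: {sharpe_ratio(daily_returns):.2f}")
```

Because the annualization factor depends on the return frequency, only compare Sharpe ratios that were computed with the same period length.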

Backtest ≠ Live Performance

Historical results do not guarantee future performance. LLM behavior, market microstructure, and funding rate dynamics change over time. Always start live trading with minimum position sizes.
