The ML4T Workflow: From Model to Strategy Backtesting

This chapter integrates the various building blocks of the machine learning for trading (ML4T) workflow and presents an end-to-end perspective on the process of designing, simulating, and evaluating an ML-driven trading strategy. Most importantly, it demonstrates in more detail how to prepare, design, run and evaluate a backtest using the Python libraries backtrader and Zipline.

The ultimate goal of the ML4T workflow is to gather evidence from historical data that helps decide whether to deploy a candidate strategy in a live market and put financial resources at risk. This process builds on the skills you developed in the previous chapters because it relies on your ability to

work with a diverse set of data sources to engineer informative factors,
design ML models that generate predictive signals to inform your trading strategy, and
optimize the resulting portfolio from a risk-return perspective.

A realistic simulation of your strategy also needs to faithfully represent how security markets operate and how trades are executed. Therefore, the institutional details of exchanges, such as which order types are available and how prices are determined (see Chapter 2, Market and Fundamental Data), also matter when you design a backtest or evaluate whether a backtesting engine includes the requisite features for accurate performance measurements. Finally, there are several methodological aspects that require attention to avoid biased results and false discoveries that will lead to poor investment decisions.

The data used for some of the backtest simulations are generated by the script data_prep.py in the data directory and are based on the linear regression return predictions in Chapter 7, Linear Models.

How to backtest an ML-driven strategy

In a nutshell, the ML4T workflow is about backtesting a trading strategy that leverages machine learning to generate trading signals, select and size positions, or optimize the execution of trades. It involves the following steps, with a specific investment universe and horizon in mind:
- Source and prepare market, fundamental, and alternative data
- Engineer predictive alpha factors and features
- Design, tune, and evaluate ML models to generate trading signals
- Decide on trades based on these signals, e.g. by applying rules
- Size individual positions in the portfolio context
- Simulate the resulting trades using historical market data
- Evaluate how the resulting positions would have performed

The pitfalls of backtesting and how to avoid them

Backtesting simulates an algorithmic strategy based on historical data with the goal of producing performance results that generalize to new market conditions. In addition to the generic uncertainty around predictions in the context of ever-changing markets, several implementation aspects can bias the results and increase the risk of mistaking in-sample performance for patterns that will hold out-of-sample.

Getting the data right

Data issues that undermine the validity of a backtest include
- look-ahead bias,
- survivorship bias,
- outlier control, as well as
- the selection of the sample period.
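A common source of look-ahead bias is a signal that is aligned with the very return it was computed from. The following minimal sketch uses synthetic returns and a deliberately naive signal (both are assumptions for illustration, not data from this chapter) to show how lagging the signal by one period removes the spurious profit:

```python
import random

random.seed(42)

# Synthetic daily returns with zero mean: there is nothing to predict here.
returns = [random.gauss(0.0, 0.01) for _ in range(1_000)]

# Naive signal that only becomes known at the close of each day:
# the sign of that day's return.
signal = [1.0 if r > 0 else -1.0 for r in returns]

# Biased: the position on day t uses the signal from day t, i.e. information
# that is only available after the return has already been realized.
biased = [s * r for s, r in zip(signal, returns)]

# Point-in-time: the position on day t uses the signal from day t-1.
lagged_signal = [0.0] + signal[:-1]
honest = [s * r for s, r in zip(lagged_signal, returns)]

biased_mean = sum(biased) / len(biased)
honest_mean = sum(honest) / len(honest)
print(f"mean daily return with look-ahead: {biased_mean:.5f}")
print(f"mean daily return with lagged signal: {honest_mean:.5f}")
```

The biased version earns the absolute value of every return, which no tradable strategy could achieve. With pandas, the same lag is typically expressed as `signal.shift(1)`.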

Getting the simulation right

Practical issues related to the implementation of the historical simulation include:
- a failure to mark to market to accurately reflect market prices and account for drawdowns;
- unrealistic assumptions about the availability, cost, or market impact of trades; or
- incorrect timing of signals and trade execution.
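To illustrate the first point, the sketch below compares a trade-level view (entry and exit price only) with a daily mark-to-market view of the same position. The prices and the helper `max_drawdown` are hypothetical and chosen only to make the effect visible:

```python
# Hypothetical daily closing prices for a position held from the first
# to the last observation.
prices = [100.0, 96.0, 90.0, 94.0, 101.0, 105.0]
shares = 10

# Trade-level view: only the entry and exit prices are used.
trade_return = prices[-1] / prices[0] - 1

# Mark-to-market view: revalue the position at every close.
equity = [shares * p for p in prices]


def max_drawdown(equity_curve):
    """Largest peak-to-trough decline as a (negative) fraction of the running peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1)
    return worst


drawdown = max_drawdown(equity)
print(f"trade return: {trade_return:.1%}, max drawdown: {drawdown:.1%}")
```

The trade looks like a clean 5% gain, yet the position lost 10% of its value along the way, which matters for risk measures, margin, and position sizing.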

Getting the statistics right: Data-snooping and backtest-overfitting

The most prominent challenge to backtest validity, including to published results, relates to the discovery of spurious patterns due to multiple testing during the strategy-selection process. Selecting a strategy after testing different candidates on the same data will likely bias the choice because a positive outcome is more likely to be due to the stochastic nature of the performance measure itself. In other words, the strategy is overly tailored, or overfit, to the data at hand and produces deceptively positive results.
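The effect of multiple testing can be reproduced with a small simulation. The sketch below generates many "strategies" whose returns are pure noise (an assumption made for illustration) and then picks the best annualized Sharpe ratio, as a naive selection process would:

```python
import random
import statistics

random.seed(0)

n_strategies = 200   # number of candidates tried on the same sample
n_days = 252         # one year of daily returns per candidate


def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    return statistics.mean(returns) / statistics.stdev(returns) * periods_per_year ** 0.5


# Every candidate has a true Sharpe ratio of zero by construction.
sharpes = []
for _ in range(n_strategies):
    daily_returns = [random.gauss(0.0, 0.01) for _ in range(n_days)]
    sharpes.append(annualized_sharpe(daily_returns))

best = max(sharpes)
average = statistics.mean(sharpes)
print(f"average Sharpe across candidates: {average:.2f}")
print(f"best Sharpe after selection: {best:.2f}")
```

The average is close to zero, as it should be, while the selected "winner" shows an attractive Sharpe ratio that reflects nothing but the number of trials.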

Marcos Lopez de Prado has published extensively on the risks of backtesting and how to detect or avoid them. This includes an online simulator of backtest overfitting.

Code Example: The deflated Sharpe Ratio

David Bailey and Marcos Lopez de Prado derived a deflated Sharpe ratio (SR) to compute the probability that the SR is statistically significant while controlling for the inflationary effect of multiple testing, non-normal returns, and shorter sample lengths.

The Python script deflated_sharpe_ratio in the directory multiple_testing contains an implementation with references for the derivation of the related formulas.
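As a rough, standalone illustration (not the repository script), the deflated SR can be sketched with the standard library alone, following the formulas in Bailey and Lopez de Prado: the expected maximum SR among N unskilled trials serves as the benchmark for the probabilistic SR. The function names and the input values below are assumptions chosen for illustration, and all SR inputs are per period rather than annualized:

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_norm = NormalDist()              # standard normal distribution


def expected_max_sharpe(sr_variance, n_trials):
    """Approximate expected maximum SR among n_trials (>= 2) unskilled strategies."""
    a = _norm.inv_cdf(1 - 1 / n_trials)
    b = _norm.inv_cdf(1 - 1 / (n_trials * math.e))
    return math.sqrt(sr_variance) * ((1 - EULER_GAMMA) * a + EULER_GAMMA * b)


def probabilistic_sharpe(sr, sr_benchmark, n_obs, skew=0.0, kurtosis=3.0):
    """Probability that the true SR exceeds sr_benchmark, given n_obs returns."""
    denominator = math.sqrt(1 - skew * sr + (kurtosis - 1) / 4 * sr ** 2)
    return _norm.cdf((sr - sr_benchmark) * math.sqrt(n_obs - 1) / denominator)


def deflated_sharpe(sr, sr_variance, n_trials, n_obs, skew=0.0, kurtosis=3.0):
    """Probabilistic SR evaluated against the expected maximum SR across trials."""
    sr_benchmark = expected_max_sharpe(sr_variance, n_trials)
    return probabilistic_sharpe(sr, sr_benchmark, n_obs, skew, kurtosis)


# Illustrative inputs: annualized SR of 2.5 selected from 100 trials whose
# annualized SRs have variance 0.5, estimated on 1,250 daily returns with
# negative skew and fat tails.
periods_per_year = 250
dsr = deflated_sharpe(
    sr=2.5 / math.sqrt(periods_per_year),
    sr_variance=0.5 / periods_per_year,
    n_trials=100,
    n_obs=1250,
    skew=-3.0,
    kurtosis=10.0,
)
print(f"Deflated SR: {dsr:.3f}")  # roughly 0.90
```

In this setting an annualized SR of 2.5 falls just short of significance at the 95% level once the number of trials and the non-normality of returns are taken into account.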

References

The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality, Bailey, David and Lopez de Prado, Marcos, Journal of Portfolio Management, 2013
Backtest Overfitting: An Interactive Example
Backtesting, Lopez de Prado, Marcos, 2015
Secretary Problem (Optimal Stopping)
Optimal Stopping and Applications, Ferguson, Math Department, UCLA
Advances in Machine Learning Lectures 4/10 - Backtesting I, Marcos Lopez de Prado, 2018
Advances in Machine Learning Lectures 5/10 - Backtesting II, Marcos Lopez de Prado, 2018

How a backtesting engine works

Put simply, a backtesting engine iterates over historical prices (and other data), passes the current values to your algorithm, receives orders in return, and keeps track of the resulting positions and their value. In practice, there are numerous requirements to create a realistic and robust simulation of the ML4T workflow depicted above. The difference between vectorized and event-driven approaches illustrates how the faithful reproduction of the actual trading environment adds significant complexity.
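The loop described above can be sketched in a few lines. This is a toy engine under strong assumptions (one asset, one price per bar, orders filled in full at the next bar's price, no costs), and the names `Order`, `up_day_algo`, and `run_backtest` are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Order:
    size: int  # positive = buy, negative = sell


def up_day_algo(history, position):
    """Hypothetical strategy: hold 10 shares after an up day, otherwise be flat."""
    if len(history) < 2:
        return None
    target = 10 if history[-1] > history[-2] else 0
    delta = target - position
    return Order(delta) if delta else None


def run_backtest(prices, algo, cash=10_000.0):
    position, pending, history, equity = 0, None, [], []
    for price in prices:
        # 1. Fill the order submitted on the previous bar at the current price.
        if pending is not None:
            cash -= pending.size * price
            position += pending.size
            pending = None
        # 2. Pass the data available so far to the algorithm and collect its order.
        history.append(price)
        pending = algo(history, position)
        # 3. Mark the portfolio to market.
        equity.append(cash + position * price)
    return equity


prices = [100.0, 101.0, 103.0, 102.0, 104.0, 105.0]
equity = run_backtest(prices, up_day_algo)
print(equity)
```

Because an order is only filled on the bar after it was submitted, the algorithm cannot trade on a price it has just observed, which is the kind of timing discipline an event-driven engine enforces by design.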

Vectorized vs event-driven backtesting

A vectorized backtest is the most basic way to evaluate a strategy. It simply multiplies a signal vector that represents the target position size with a vector of returns for the investment horizon to compute the period performance.

Code example: a simple vectorized backtest

We illustrate the vectorized approach using the daily return predictions that we created using ridge regression in Chapter 7, Linear Models.

The code examples for this section are in the notebook vectorized_backtest.
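Independently of the notebook, the core of a vectorized backtest fits in a few lines of NumPy. The sketch below uses random numbers as a stand-in for both the asset returns and the model predictions (an assumption for illustration; the notebook uses the ridge regression predictions instead), so the resulting performance is meaningless and only the mechanics matter:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_days, n_assets = 250, 5

# Synthetic daily asset returns and stand-in model predictions.
returns = rng.normal(0.0005, 0.01, size=(n_days, n_assets))
predictions = rng.normal(0.0, 1.0, size=(n_days, n_assets))

# Target weights: long assets with positive predictions, short the others,
# scaled so that the gross exposure sums to one on each day.
signal = np.sign(predictions)
weights = signal / np.abs(signal).sum(axis=1, keepdims=True)

# Weights formed on day t earn the returns of day t+1, which avoids
# look-ahead bias in the alignment of signals and returns.
strategy_returns = (weights[:-1] * returns[1:]).sum(axis=1)
cumulative_return = np.cumprod(1 + strategy_returns) - 1

print(f"total return over {len(strategy_returns)} days: {cumulative_return[-1]:.2%}")
```

The whole simulation is a single element-wise product and a sum, which is why this approach is fast but ignores order handling, trading costs, and intraperiod portfolio changes.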

Key Implementation Aspects

The requirements for a realistic simulation may be met by a single platform that supports all steps of the process in an end-to-end fashion, or by multiple tools that each specialize in different aspects. For instance, you could handle the design and testing of ML models that generate signals using generic ML libraries like scikit-learn or others that we will encounter in this book and feed the model outputs into a separate backtesting engine. Alternatively, you could run the entire ML4T workflow end-to-end on a single platform such as Quantopian or QuantConnect.

The following implementation details need to be addressed to put this process in action, and are discussed in more detail in this section of the book:
- Data ingestion: Format, frequency, and timing
- Factor engineering: Built-in computations vs third-party libraries
- ML models, predictions, and signals
- Trading rules and execution
- Performance evaluation

backtrader: a flexible tool for local backtests

backtrader is a popular, flexible, and user-friendly Python library for local backtests with great documentation, developed since 2015 by Daniel Rodriguez. In addition to a large and active community of individual traders, there are several banks and trading houses that use backtrader to prototype and test new strategies before porting them to a production-ready platform using, e.g., Java. You can also use backtrader for live trading with several brokers of your choice (see the backtrader documentation and Chapter 23, Next Steps).

The code examples for this section are in the notebook backtesting_with_backtrader.

Key concepts of backtrader’s Cerebro architecture

Backtrader’s Cerebro (Spanish for “brain”) architecture represents the key components of the backtesting workflow as (extensible) Python objects. These objects interact to facilitate the processing of input data and the computation of factors, formulate and execute a strategy, receive and execute orders, and track and measure performance. A Cerebro instance orchestrates the overall process, from collecting inputs and executing the backtest bar by bar to providing results.

The library uses conventions for these interactions that allow you to omit some detail and streamline the backtesting setup. I highly recommend browsing the documentation to dive deeper if you plan on using backtrader to develop your own strategies.

Code Example: How to use backtrader in practice

We will demonstrate backtrader using the same daily return predictions from the ridge regression in Chapter 7, Linear Models, that we used for the vectorized backtest earlier in this chapter. We will create the Cerebro instance, load the data, formulate and add the Strategy, run the backtest, and review the results.

The notebook backtesting_with_backtrader contains the code examples and some additional details.

zipline: production-ready backtesting by Quantopian

The open source Zipline library is an event-driven backtesting system maintained and used in production by the crowd-sourced quantitative investment fund Quantopian to facilitate algorithm development and live trading. It automates the algorithm's reaction to trade events and provides it with current and historical point-in-time data that avoids look-ahead bias.

In Chapter 4, we introduced Zipline to simulate the computation of alpha factors, and in Chapter 5 we added trades to simulate a simple strategy and measure its performance as well as optimize portfolio holdings using different techniques.

Code Examples: Ingesting Data and training ML models, offline and on Quantopian

The code for this section is in the subdirectory ml4t_workflow_with_zipline. Please see the README for details.