Financial Feature Engineering: How to research Alpha Factors

Algorithmic trading strategies are driven by signals that indicate when to buy or sell assets to generate superior returns relative to a benchmark such as an index. The portion of an asset's return that is not explained by exposure to this benchmark is called alpha, and hence the signals that aim to produce such uncorrelated returns are also called alpha factors.
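To make the definition concrete, the following minimal sketch estimates alpha as the intercept of a least-squares regression of asset returns on benchmark returns. The return series are synthetic and the parameter values (`true_alpha`, `true_beta`) are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily returns (illustrative only, not real market data).
benchmark = rng.normal(0.0004, 0.01, size=1000)
true_alpha, true_beta = 0.0002, 1.2  # assumed values for the simulation
asset = true_alpha + true_beta * benchmark + rng.normal(0, 0.005, size=1000)

# Least-squares fit of asset returns on benchmark returns:
# the slope is beta (benchmark exposure), the intercept is alpha
# (the part of the return not explained by that exposure).
beta, alpha = np.polyfit(benchmark, asset, deg=1)
print(f"alpha={alpha:.5f}, beta={beta:.2f}")
```

In practice, the regression would typically use excess returns over a risk-free rate and may include several benchmark factors rather than one.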

If you are already familiar with ML, you know that feature engineering is a key ingredient of successful predictions. Trading is no different. Investment, however, benefits from decades of research into how markets work and which features explain or predict price movements better than others. This chapter provides an overview as a starting point for your own search for alpha factors.

This chapter also presents key tools that facilitate computing and testing alpha factors. We will highlight how the NumPy, pandas, and TA-Lib libraries facilitate the manipulation of data, and present popular smoothing techniques like wavelets and the Kalman filter that help reduce noise in data.

We also preview how you can use the trading simulator Zipline to evaluate the predictive performance of (traditional) alpha factors. We discuss key alpha factor metrics like the information coefficient and factor turnover. An in-depth introduction to backtesting trading strategies that use machine learning follows in Chapter 6, which covers the ML4T workflow that we will use throughout the book to evaluate trading strategies.

Please see the Appendix - Alpha Factor Library for additional material on this topic, including numerous code examples that compute a broad range of alpha factors.

Alpha Factors in practice: from data to signals

Alpha factors are transformations of market, fundamental, and alternative data that contain predictive signals. They are designed to capture risks that drive asset returns. One set of factors describes fundamental, economy-wide variables such as growth, inflation, volatility, productivity, and demographic risk. Another set consists of tradeable investment styles such as the market portfolio, value-growth investing, and momentum investing.

There are also factors that explain price movements based on the economics or institutional setting of financial markets, or on investor behavior, including known biases of this behavior. The economic theory behind factors can be rational, where the factors have high returns over the long run to compensate for their low returns during bad times, or behavioral, where factor risk premiums result from the possibly biased or not entirely rational behavior of agents that is not arbitraged away.

Building on Decades of Factor Research

In an idealized world, categories of risk factors should be independent of each other (orthogonal), yield positive risk premia, and form a complete set that spans all dimensions of risk and explains the systematic risks for assets in a given class. In practice, these requirements will hold only approximately.


Engineering alpha factors that predict returns

Based on a conceptual understanding of key factor categories, their rationale, and popular metrics, a key task is to identify new factors that may better capture the risks embodied by the return drivers laid out previously, or that uncover new drivers altogether. In either case, it will be important to compare the performance of innovative factors to that of known factors to identify incremental signal gains.

Code Example: How to engineer factors using pandas and NumPy

The notebook feature_engineering.ipynb in the data directory illustrates how to engineer basic factors.
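As a rough illustration of the kind of computation involved, the sketch below derives momentum-style factors from prices with pandas and NumPy. It is not taken from the notebook: the tickers, the synthetic price data, and the lookback windows are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic daily close prices for three hypothetical tickers.
dates = pd.bdate_range("2018-01-01", periods=500)
tickers = ["AAA", "BBB", "CCC"]
log_ret = rng.normal(0.0003, 0.01, size=(len(dates), len(tickers)))
prices = pd.DataFrame(
    100 * np.exp(log_ret.cumsum(axis=0)), index=dates, columns=tickers
)

# Momentum factors: trailing returns over several lookbacks (trading days).
lookbacks = [5, 21, 63]
factors = {f"return_{n}d": prices / prices.shift(n) - 1 for n in lookbacks}

# Cross-sectional z-score of 21-day momentum on each date, so that
# factor values are comparable across time.
mom = factors["return_21d"]
mom_z = mom.sub(mom.mean(axis=1), axis=0).div(mom.std(axis=1), axis=0)

# Forward 5-day return: the quantity a factor would try to predict.
# shift(-5) looks ahead, so this is a label and must never be an input.
fwd_5d = prices.shift(-5) / prices - 1
```

Keeping the forward return separate from the factor columns helps avoid accidentally introducing look-ahead bias when the data is later fed to a model.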

Code Example: How to use TA-Lib to create technical alpha factors

The notebook how_to_use_talib illustrates the usage of TA-Lib, which includes a broad range of common technical indicators. These indicators have in common that they only use market data, i.e., price and volume information.
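To show what such an indicator computes, here is a hand-rolled pandas sketch of Bollinger Bands on a synthetic close series. TA-Lib exposes this indicator as `talib.BBANDS`; the code below deliberately avoids the library so that it runs without the underlying C dependency, and the window length and band width are assumed defaults rather than values from the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic close prices (illustrative only).
close = pd.Series(100 * np.exp(rng.normal(0, 0.01, 250).cumsum()), name="close")

window, n_std = 20, 2  # assumed parameters

# Middle band: simple moving average; outer bands: +/- n_std rolling
# standard deviations around it.
middle = close.rolling(window).mean()
std = close.rolling(window).std(ddof=0)
upper = middle + n_std * std
lower = middle - n_std * std

# Position of the close within the bands:
# 0 at the lower band, 0.5 at the middle band, 1 at the upper band.
percent_b = (close - lower) / (upper - lower)
```

A normalized quantity such as `percent_b` is often more useful as a model feature than the raw band levels because it is comparable across assets with different price levels.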

The notebook common_alpha_factors in the appendix contains dozens of additional examples.

Code Example: How to denoise your Alpha Factors with the Kalman Filter

The notebook kalman_filter_and_wavelets demonstrates the use of the Kalman filter using the PyKalman package for smoothing; we will also use it in Chapter 9 when we develop a pairs trading strategy.
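The idea behind the filter can be sketched in a few lines of NumPy. The function below is a hand-written forward Kalman filter for a local-level (random walk plus noise) model; it does not use PyKalman, and the function name, the variance parameters, and the initial state are assumptions made for this illustration.

```python
import numpy as np


def kalman_filter_level(y, process_var=1e-4, obs_var=1.0):
    """Forward Kalman filter for a local-level model (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    level = np.empty_like(y)
    x, p = y[0], 1.0  # assumed initial state estimate and variance
    for t, obs in enumerate(y):
        # Predict: the level follows a random walk, so the estimate is
        # unchanged and only its uncertainty grows.
        p = p + process_var
        # Update: blend prediction and observation using the Kalman gain.
        k = p / (p + obs_var)
        x = x + k * (obs - x)
        p = (1 - k) * p
        level[t] = x
    return level


# Synthetic example: a slowly drifting level observed with heavy noise.
rng = np.random.default_rng(0)
true_level = np.cumsum(rng.normal(0, 0.01, 500))
noisy = true_level + rng.normal(0, 1.0, 500)
filtered = kalman_filter_level(noisy, process_var=1e-4, obs_var=1.0)
```

Because each estimate uses only observations up to time `t`, the output can serve as a feature without introducing look-ahead bias. The ratio of `process_var` to `obs_var` controls how quickly the estimate reacts to new data.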

Code Example: How to preprocess your noisy signals using Wavelets

The notebook kalman_filter_and_wavelets also demonstrates how to work with wavelets using the PyWavelets package.
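The principle of wavelet denoising (decompose, shrink small detail coefficients, reconstruct) can be sketched with a hand-written Haar transform in NumPy. This is not the PyWavelets API; the function name `haar_denoise`, the number of levels, and the choice of a universal soft threshold are assumptions for illustration.

```python
import numpy as np


def haar_denoise(x, levels=3, threshold=None):
    """Multi-level Haar decomposition with soft thresholding (sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n % 2**levels != 0:
        raise ValueError("length must be divisible by 2**levels")

    # Decompose: each level splits the signal into a coarse
    # approximation and detail coefficients.
    details, approx = [], x
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)

    if threshold is None:
        # Universal threshold, with the noise level estimated from the
        # median absolute deviation of the finest detail coefficients.
        sigma = np.median(np.abs(details[0])) / 0.6745
        threshold = sigma * np.sqrt(2 * np.log(n))

    # Soft-threshold the details: shrink towards zero, dropping small ones.
    details = [np.sign(d) * np.maximum(np.abs(d) - threshold, 0) for d in details]

    # Reconstruct from the coarsest level back to the original length.
    for d in reversed(details):
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + d) / np.sqrt(2)
        out[1::2] = (approx - d) / np.sqrt(2)
        approx = out
    return approx


# Synthetic example: a sine wave with additive Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256, endpoint=False)
clean = np.sin(2 * np.pi * 2 * t)
noisy = clean + rng.normal(0, 0.3, t.size)
denoised = haar_denoise(noisy, levels=3)
```

Note that this transform uses the entire series at once, so applying it to a full price history and then using the result as a feature would leak future information. For backtesting, it would need to be applied on a rolling basis.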


From signals to trades: backtesting with `Zipline`

The open source zipline library is an event-driven backtesting system maintained and used in production by the crowd-sourced quantitative investment fund Quantopian to facilitate algorithm development and live trading. It automates the algorithm's reaction to trade events and provides it with current and historical point-in-time data that avoids look-ahead bias.

  • Chapter 8 contains a more comprehensive introduction to Zipline.
  • Please follow the instructions in the installation folder.

Code Example: How to use Zipline to backtest a single-factor strategy

The notebook single_factor_zipline develops and tests a simple mean-reversion factor that measures how much recent performance has deviated from the historical average. Short-term reversal is a common strategy that takes advantage of the weakly predictive pattern that stock price increases are likely to revert over horizons from less than a minute to one month.
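The factor logic itself can be sketched in pandas, independently of Zipline. The example below is not the notebook's code: it uses synthetic prices for hypothetical tickers and assumes 21 trading days per month and a 252-day history window to compute a z-score of the latest monthly return relative to its own trailing history.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Synthetic daily close prices for four hypothetical tickers.
dates = pd.bdate_range("2017-01-02", periods=600)
tickers = ["AAA", "BBB", "CCC", "DDD"]
prices = pd.DataFrame(
    100 * np.exp(rng.normal(0, 0.012, (len(dates), len(tickers))).cumsum(axis=0)),
    index=dates,
    columns=tickers,
)

MONTH, YEAR = 21, 252  # assumed trading-day conventions

# Trailing one-month return for each day and ticker.
monthly_ret = prices / prices.shift(MONTH) - 1

# Deviation of the latest monthly return from its trailing one-year
# average, scaled by the trailing standard deviation (a z-score).
hist_mean = monthly_ret.rolling(YEAR).mean()
hist_std = monthly_ret.rolling(YEAR).std()
mean_reversion = (monthly_ret - hist_mean) / hist_std

# On the last date, the lowest z-score marks the largest recent
# underperformance (long candidate), the highest the short candidate.
latest = mean_reversion.iloc[-1].sort_values()
longs, shorts = latest.index[:1], latest.index[-1:]
```

A backtest would recompute this ranking at each rebalancing date and also account for trading costs, which matter for a high-turnover signal like short-term reversal.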

Code Example: Combining factors from diverse data sources on the Quantopian platform

The Quantopian research environment is tailored to the rapid testing of predictive alpha factors. The process is very similar because it builds on zipline, but offers much richer access to data sources.

The notebook multiple_factors_quantopian_research illustrates how to compute alpha factors not only from market data, as before, but also from fundamental and alternative data.

Code Example: Separating signal and noise – how to use alphalens

The notebook performance_eval_alphalens introduces the alphalens library for the performance analysis of predictive (alpha) factors, open-sourced by Quantopian. It demonstrates how it integrates with the backtesting library zipline and the portfolio performance and risk analysis library pyfolio that we will explore in the next chapter.

alphalens facilitates the analysis of the predictive power of alpha factors concerning the:

  • Correlation of the signals with subsequent returns
  • Profitability of an equal or factor-weighted portfolio based on a (subset of) the signals
  • Turnover of factors to indicate the potential trading costs
  • Factor-performance during specific events
  • Breakdowns of the preceding by sector

The analysis can be conducted using tearsheets or individual computations and plots. The tearsheets are illustrated in the online repo to save some space.

  • See the detailed alphalens tutorial by Quantopian.
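Two of these metrics, the information coefficient and factor turnover, are simple enough to sketch by hand. The example below does not use the alphalens API: it computes a per-date Spearman rank correlation between a synthetic factor and synthetic forward returns, and the share of names that newly enter the top quintile each day. The data, the asset names, and the 80% cutoff are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Synthetic factor values and forward returns for 30 hypothetical assets.
dates = pd.bdate_range("2020-01-01", periods=120)
assets = [f"S{i:02d}" for i in range(30)]
factor = pd.DataFrame(rng.normal(size=(120, 30)), index=dates, columns=assets)
noise = pd.DataFrame(rng.normal(0, 0.02, (120, 30)), index=dates, columns=assets)
fwd_returns = 0.002 * factor + noise  # returns load weakly on the factor

# Information coefficient per date: Spearman rank correlation across
# assets, computed as the Pearson correlation of cross-sectional ranks.
ic = factor.rank(axis=1).corrwith(fwd_returns.rank(axis=1), axis=1)
print(f"mean IC: {ic.mean():.3f}, IC std: {ic.std():.3f}")

# Turnover of the top quintile: fraction of names in the top 20% today
# that were not in the top 20% on the previous date.
top = (factor.rank(axis=1, pct=True) > 0.8).to_numpy()
entered = top[1:] & ~top[:-1]
turnover = pd.Series(entered.sum(axis=1) / top[1:].sum(axis=1), index=dates[1:])
```

Because the synthetic factor is redrawn independently every day, its turnover is high. A factor built from slowly changing data would show much lower values and correspondingly lower implied trading costs.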

Alternative Algorithmic Trading Libraries and Platforms