Dangerous Lies Your Backtest Tells
We are easily hooked on the dopamine rush of seeing profitable equity curves during backtesting. The allure of parabolic returns is often so strong that it blinds us to the inherent flaws that exist, to varying degrees, in every backtest.
Backtesting, while often seen as an essential step in designing and verifying trading strategies, is far from a foolproof method. Many traders place too much confidence in their backtested results, only to see their strategies fail in live markets. The reality is that backtesting is riddled with limitations and biases that lead to a false sense of security in a strategy’s effectiveness. Let’s take a comprehensive look at the many flaws of backtesting, and explore the common pitfalls of using a simple backtest as your only method of verifying a strategy’s efficacy.
1. Choosing the Winning Team After the Game is Already Over
(Selection Bias)
When selecting instruments for backtesting, it is common to choose assets you are already interested in or those that performed well in the past. This introduces selection bias, as the strategy is tested on assets that may have been outliers. While this may produce impressive backtest results, it creates an illusion of reliability that may not hold up when applied to other assets or future market conditions - a theme common to most of the backtesting drawbacks explored here.
Example:
Imagine backtesting a long-only strategy using only tech stocks that surged during a market boom. The strategy might look incredibly successful in the backtest, but when applied to other sectors or different market phases it will most likely fail to perform - because the selection was based on past winners rather than a broader, more balanced approach.
2. You Only See the Ships that Make it to Shore
(Survivorship Bias)
Similar to the above, survivorship bias occurs when backtests only include assets that have survived the test period - excluding those that were delisted, went bankrupt, or failed entirely. This creates a skewed dataset, once again inflating performance metrics beyond reasonable levels. By only focusing on assets that are still around, you overlook the fact that many others didn’t make it - and these failures could have significantly impacted the strategy’s results. Ignoring delisted companies or rug-pulled crypto projects is itself a form of selection bias: purely because your chosen instruments didn’t go to zero, they must have performed better than the ones that did.
Example:
Suppose you backtest a low-cap cryptocurrency strategy. If your backtest spans, say, five years, it can give the illusion of success - but what’s missing is the hundreds of tokens that were launched and failed during the same period. How can we possibly assume that we will be lucky enough to only pick tokens that survive the next five years?
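To make the effect concrete, here is a minimal Python sketch that compares the average return of survivors against the full universe. The tickers and return figures are invented purely for illustration, and equal weighting is an assumption.

```python
# Hypothetical five-year total returns for a small token universe.
# A return of -1.0 means the token went to zero (delisted or rug-pulled).
universe = {
    "AAA": 2.50,   # survived
    "BBB": 0.40,   # survived
    "CCC": -1.00,  # failed
    "DDD": -1.00,  # failed
    "EEE": -0.20,  # survived
}

def average_return(returns):
    """Equal-weighted average of total returns."""
    return sum(returns) / len(returns)

# What a survivor-only backtest sees
survivors = [r for r in universe.values() if r > -1.0]
survivor_avg = average_return(survivors)

# What someone picking from the full universe actually faced
full_avg = average_return(list(universe.values()))

print(f"Survivors only: {survivor_avg:.2%}")
print(f"Full universe:  {full_avg:.2%}")
```

With these made-up numbers the survivor-only average is far higher than the full-universe average, even though nothing about the strategy changed. Only the dataset did.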
3. Reading Tomorrow’s News Today
(Look-Ahead Bias)
Look-ahead bias occurs when future information is unintentionally used in past decision making during a backtest. In an automated system this is usually caused by coding errors, and it leads to unreasonable and unrepeatable results. Look-ahead bias isn’t limited to algorithmic backtesting - it can also affect manual backtests. Traders will often skip false signals because they can already see the outcome of the trade. This knowledge of the future can distort a manual backtest, whether the trader acts on it consciously or subconsciously.
// An extreme example: the entry condition reads tomorrow's close, which is unknown today
if Current_Price < Tomorrows_Close
    strategy.entry("Enter a Long Position", strategy.long)
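The same mistake can be reproduced outside of a charting platform. The Python sketch below is an illustration only: the prices and the function names (leaky_signals, causal_signals, next_day_pnl) are hypothetical. It compares a signal that peeks at the next close with one that uses only data available on the day of the decision.

```python
closes = [100.0, 102.0, 101.0, 105.0, 104.0, 108.0]  # hypothetical daily closes

def leaky_signals(prices):
    """Long on day i if tomorrow's close is higher.
    Reads prices[i + 1], which is not known on day i (look-ahead)."""
    return [prices[i] < prices[i + 1] for i in range(len(prices) - 1)]

def causal_signals(prices):
    """Long on day i if today's close is above yesterday's.
    Reads only prices up to and including day i."""
    return [False] + [prices[i] > prices[i - 1] for i in range(1, len(prices) - 1)]

def next_day_pnl(prices, signals):
    """Sum of next-day price changes on the days with a long signal."""
    return sum(prices[i + 1] - prices[i] for i, go_long in enumerate(signals) if go_long)

leaky_pnl = next_day_pnl(closes, leaky_signals(closes))
causal_pnl = next_day_pnl(closes, causal_signals(closes))

print(f"With look-ahead:    {leaky_pnl:+.1f}")
print(f"Without look-ahead: {causal_pnl:+.1f}")
```

The leaky version never takes a losing trade, which is exactly the kind of result that should raise suspicion rather than excitement.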
4. Perfecting the Final Chord, but Forgetting the Song
(Recency Bias)
Recency bias occurs when traders place too much emphasis on the most recent data or market conditions in a backtest. This usually occurs when a trader feels they missed an opportunity in the past few months - and tries to develop a strategy that would have captured that specific move. By focusing too heavily on recent history, it is easy to neglect the fact that markets usually move in long cyclical phases. This over-optimisation for recent conditions will, at best, result in a strategy that performs well in the short term but fails as soon as market dynamics shift.
Example:
Frustrated by missing the most recent leg of the bull market, a trader develops a strategy that would have performed perfectly during this period. However, when the trader begins live trading at the top of the market, the strategy quickly fails. It was only optimized for that short and specific market phase and was unable to adapt to the changing market conditions.
5. Forcing the Square into the Round Hole
(Overfitting)
Overfitting occurs when a strategy is excessively optimized for historical data, capturing noise and random fluctuations rather than meaningful patterns. It is common when traders test too many parameter combinations, tweaking their strategy until it fits the past data perfectly. In contrast to the previous point, this over-optimisation can occur on data of any length, whether it spans a few years or much longer periods.
Example:
A trader adjusts a large range of parameters in a high-frequency strategy by incredibly small increments, then uses the calibration that yields the highest backtest performance.
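One common way to expose this kind of parameter hunting is to hold back part of the data and evaluate the chosen parameters on it afterwards. The Python sketch below is a rough illustration under stated assumptions: the returns are pure random noise with no real edge, and the momentum rule, the momentum_pnl helper and the lookback range are all hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Pure noise: daily returns with zero true edge (an assumption for illustration).
returns = [rng.gauss(0.0, 0.01) for _ in range(1000)]
in_sample, out_of_sample = returns[:500], returns[500:]

def momentum_pnl(rets, lookback):
    """Hold long for one day whenever the sum of the previous
    `lookback` returns is positive. Uses only past returns."""
    total = 0.0
    for i in range(lookback, len(rets)):
        if sum(rets[i - lookback:i]) > 0:
            total += rets[i]
    return total

# Try many lookbacks and keep whichever looked best on the in-sample half.
lookbacks = range(2, 60)
in_scores = {n: momentum_pnl(in_sample, n) for n in lookbacks}
best = max(in_scores, key=in_scores.get)

print(f"Best lookback in-sample: {best}")
print(f"In-sample P&L:     {in_scores[best]:+.4f}")
print(f"Out-of-sample P&L: {momentum_pnl(out_of_sample, best):+.4f}")
```

Because the data contains no real pattern, whichever lookback wins in-sample has simply fit the noise best out of dozens of attempts, and there is no reason to expect its in-sample result to repeat on the held-out half.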
6. Mixing Oil and Water
(Conflating Trend and Mean Reversion Systems)
Traders often attempt to design strategies that perform well in both trending and mean reverting environments, which leads to muddled logic and poor performance in ALL environments. A trend following strategy is meant to capitalize on sustained price movements, and should naturally underperform during mean-reverting or ‘ranging’ periods. In a range-bound market, a trend-following strategy will often buy near the top of the range after detecting strength, only for the price to reverse. Conversely, a mean reversion strategy is built to profit from oscillations around a stable point. Forcing both approaches into a single system results in unrealistic backtest performance and poor real-world results.
A common mistake is allowing a trend following strategy to ‘accidentally’ perform well during mean-reverting periods. This skews the backtest metrics because any gains during non-trending markets are multiplied significantly during actual trends. As a result, the backtest shows artificially positive performance - but the strategy quickly falls apart in live trading. Normally, a trend following strategy would incur losses during a range-bound market and only begin to recover once a new trend emerges. However, if a strategy is overfit to handle both the trend and mean reversion periods of the past, it doesn’t need to recover losses and instead compounds gains during the entire trend. This creates inflated backtest results that won’t hold up in real trading.
Example:
A trader develops a trend following system that, through over-optimization, performs surprisingly well during mean-reversion phases. In the backtest, the strategy shows strong returns, even in ranging markets. However, in live trading, the system fails, leaving the trader with poor performance. Instead, the trader should have accepted ‘lower’ returns from a strategy that wasn’t overfit - because in live markets robust strategies with mediocre backtests perform better than overfit strategies that only excel in backtesting.
7. Seeing the World Through a Keyhole
(Limited Data Skewed by Outliers)
Strategies built on assets with limited data are highly susceptible to skewed results, especially when outliers dominate the dataset. Without sufficient data, it becomes nearly impossible to assess whether a strategy can consistently perform into the future. Some strategies, like trend following, are designed to capture outliers, that is, the periods of performance above the norm. The issue arises when testing on a small sample, as it’s difficult to determine whether the strategy can consistently capture trends or just got lucky.
Example:
A trader develops a trend following strategy for a cryptocurrency that has recently launched. The backtest shows massive gains, as it is common for projects to make large returns as soon as they are listed. However, without enough data history, it is impossible to assess the actual effectiveness of this strategy, as its performance metrics are positively skewed by the ‘listing pump.’
The image shows a cryptocurrency project launched in October 2020. At first glance, the EMA Crossover strategy appears profitable, but a closer look reveals that most of the profit comes from the first trade, which is considered an outlier. If that trade were removed, the strategy as a whole would become unprofitable. Following this strategy is essentially betting on the project to experience another sharp rise similar to what occurred in 2020. While technically this isn’t impossible, it is much riskier - a more proven and verified strategy would increase your probability of success.
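A quick way to check for this kind of outlier dependence is to recompute total profit with the best trade removed. The Python sketch below uses an invented trade list, not the data from the chart described above, and pnl_without_best is a hypothetical helper.

```python
# Hypothetical per-trade profit and loss; the first trade is the 'listing pump'.
trade_pnls = [950.0, -80.0, -120.0, 60.0, -150.0, 40.0, -90.0]

def pnl_without_best(pnls, n=1):
    """Total profit after removing the n largest trades."""
    if n <= 0:
        return sum(pnls)
    return sum(sorted(pnls)[:-n])

total = sum(trade_pnls)
stripped = pnl_without_best(trade_pnls)

print(f"All trades:         {total:+.2f}")
print(f"Without best trade: {stripped:+.2f}")
```

If the sign of the result flips once a single trade is removed, as it does with these numbers, the backtest is describing one event rather than a repeatable edge.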
8. Designing a Car that Doesn’t Fit on the Road
(Execution Constraints and Position Sizing)
In backtesting, real world constraints such as minimum or maximum order sizes are often ignored, leading to unrealistic trade execution. Traders may find that they don’t have enough capital to satisfy the minimum order size - either immediately or after a small drawdown. Additionally, compounded returns in a backtest can lead to absurd position sizes that could never be bought or sold in the real market. This is particularly problematic for deep backtests.
Example:
A backtest shows spectacular growth, with the account size ballooning over time and resulting in an extremely high profit percentage. However, in real-world conditions, the position size required to keep executing the strategy becomes so large that it exceeds the liquidity of the market - making it impossible to achieve comparable profit percentages in real-world trading.
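The gap between unconstrained compounding and a liquidity-limited account can be sketched in a few lines of Python. Everything here is an assumption for illustration: the fixed 5% return per trade, the starting capital, the hypothetical grow function and the maximum position the market can absorb.

```python
def grow(start_capital, trade_return, n_trades, max_position=None):
    """Compound a fixed per-trade return, optionally capping how much
    capital can actually be deployed in a single trade."""
    capital = start_capital
    for _ in range(n_trades):
        position = capital if max_position is None else min(capital, max_position)
        capital += position * trade_return
    return capital

# Backtest assumption: the whole account is always deployable.
uncapped = grow(10_000, 0.05, 200)
# Live assumption: the market absorbs at most 1,000,000 per trade.
capped = grow(10_000, 0.05, 200, max_position=1_000_000)

print(f"Uncapped backtest: {uncapped:,.0f}")
print(f"Liquidity-capped:  {capped:,.0f}")
```

Once the account outgrows the cap, growth switches from exponential to linear, so the two curves diverge more with every additional trade.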
9. Death by a Thousand Paper Cuts
(Not Accounting for Fees, Commissions and Slippage)
When performing a backtest, traders often overlook critical transaction costs such as fees, slippage and spreads. These seemingly small costs can accumulate and significantly erode profits, especially for strategies that rely on frequent trades with a low average return per trade. Slippage should also include execution slippage - the time delay between receiving a signal from a system, placing an order and its execution. This is particularly problematic for lower timeframe trading, where even minor delays can swing a strategy from profitable to unprofitable.
Example:
A day trader runs a backtest on a scalping strategy and sees parabolic returns. However, in live trading, the small profits from each trade are wiped out by broker commissions, spreads and slippage, which comes both from position size and from trades being executed slightly later than expected. This strategy, while successful in the backtest, failed to account for the ‘death by a thousand paper cuts.’
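The arithmetic behind this is simple enough to sketch in Python. The fee, slippage and gross return figures below are hypothetical, as are the helper names, and the sketch assumes every trade earns the same average return and pays costs on both entry and exit.

```python
def net_return_per_trade(gross_return, fee_per_side, slippage_per_side):
    """Gross return minus round-trip costs (entry plus exit),
    all expressed as fractions of the position's notional value."""
    return gross_return - 2 * (fee_per_side + slippage_per_side)

def compound(per_trade_return, n_trades):
    """Total return after compounding the same per-trade return n_trades times."""
    return (1 + per_trade_return) ** n_trades - 1

gross = 0.0015     # 0.15% average gross profit per trade (hypothetical)
fee = 0.0005       # 0.05% fee per side (hypothetical)
slippage = 0.0003  # 0.03% slippage per side (hypothetical)

net = net_return_per_trade(gross, fee, slippage)

print(f"Per trade: gross {gross:.4%}, net {net:.4%}")
print(f"After 1000 trades, gross: {compound(gross, 1000):+.1%}")
print(f"After 1000 trades, net:   {compound(net, 1000):+.1%}")
```

With these inputs the round-trip cost is slightly larger than the average gross profit, so a strategy that compounds impressively before costs loses money after them.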
10. Filling Half of the Grocery Cart
(Partial Order Fills)
In low liquidity environments, or when trading large position sizes, partial order fills are common - meaning traders only get a portion of their order executed at their desired price. This can significantly impact returns. Backtests will usually assume complete fills at the exact target price. However, in reality a trader experiencing a partial order fill must decide whether to complete the position at a worse price or leave a portion of the target position size out of the market. Both choices will lead to results that are not comparable to the backtested results.
Example:
A trader places a limit order to buy 100 shares of a low-liquidity stock at a price of $10. The order is only partially filled, with 60 shares bought at $10, while the remaining 40 shares are only available at a new, higher price. The trader now faces the choice of paying more, or leaving part of the trade out. This is a major deviation from the backtest, which assumed the complete position was bought at $10.
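The cost of completing the position can be expressed as a volume-weighted average fill price. In the Python sketch below, the 60 shares at $10 come from the example above, while the $10.20 price for the remaining 40 shares is an assumed figure, since the example does not state it.

```python
def average_fill_price(fills):
    """Volume-weighted average price over a list of (quantity, price) fills."""
    total_qty = sum(qty for qty, _ in fills)
    return sum(qty * price for qty, price in fills) / total_qty

backtest_fills = [(100, 10.00)]           # backtest assumes a complete fill at $10
live_fills = [(60, 10.00), (40, 10.20)]   # the $10.20 is a hypothetical price

backtest_avg = average_fill_price(backtest_fills)
live_avg = average_fill_price(live_fills)

print(f"Backtest entry: {backtest_avg:.2f}")
print(f"Live entry:     {live_avg:.2f}")
print(f"Extra cost on 100 shares: {(live_avg - backtest_avg) * 100:.2f}")
```

A few cents per share looks harmless on one trade, but for a strategy with a thin average edge it is the same kind of recurring drag as fees and slippage.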
11. Betting on Lightning Striking Twice
(Black Swan Events)
Black swan events are rare, inherently unpredictable, and have a significant impact on financial markets. Traders often fall into the trap of building systems that avoid drawdowns during past black swan events, overfitting their strategies to these rare occurrences. Such strategies are unlikely to succeed in regular market conditions and offer no extra edge in protecting a trader from future black swan events.
Example:
After the FTX collapse caused a sharp drop in crypto prices, a trader chooses to develop a swing trading strategy designed to avoid all losses during this event. However, by optimizing the strategy to exit positions before the collapse, the trader unintentionally overfits it. As a result, the strategy begins to sell off positions too early in other situations, cutting profits short. Prior to the FTX collapse, the market was still in an uptrend, and there were no clear signs of an impending downturn - so attempting to optimize for such a rare event ends up compromising the strategy’s performance in more typical market conditions.
12. Expecting a Week’s Pay After Only Working One Shift
(Time of Day and Day of Week Restrictions)
Many traders are only able to trade during specific hours or days of the week, yet their backtests often include data from periods where they are unavailable - such as overnight sessions. This creates an unrealistic expectation of returns. For example, in markets like crypto that trade 24/7, backtesting a day trading strategy on the full market period gives a false impression of potential profits if you can only trade during certain hours. Additionally, market participants differ depending on the time of day, as entire countries wake up and go to sleep at different times. One could assume that human behavior as a whole stays the same, but the number of participants and the available liquidity will definitely change.
Example:
A day trader backtests a strategy using 24/7 crypto market data - but is only able to trade on weekday afternoons due to other commitments.
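One way to bring the backtest closer to reality is to keep only the trades that fall inside the hours you can actually trade. The Python sketch below is a minimal illustration: the trade list, the 13:00 to 17:00 window and the hypothetical is_tradable helper are all assumptions, and timestamps are taken to be in the trader's local time.

```python
from datetime import datetime

# Hypothetical backtest trades: (entry timestamp, profit or loss)
trades = [
    (datetime(2024, 1, 1, 3, 0), 120.0),    # Monday 03:00, overnight
    (datetime(2024, 1, 2, 14, 0), -30.0),   # Tuesday 14:00
    (datetime(2024, 1, 3, 15, 30), 45.0),   # Wednesday 15:30
    (datetime(2024, 1, 4, 22, 0), 80.0),    # Thursday 22:00, evening
    (datetime(2024, 1, 6, 14, 0), 200.0),   # Saturday 14:00, weekend
]

def is_tradable(ts, start_hour=13, end_hour=17):
    """True for weekday timestamps inside the available window (end hour exclusive)."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour

full_pnl = sum(pnl for _, pnl in trades)
tradable_pnl = sum(pnl for ts, pnl in trades if is_tradable(ts))

print(f"All sessions:       {full_pnl:+.2f}")
print(f"Weekday afternoons: {tradable_pnl:+.2f}")
```

In this made-up sample most of the profit comes from trades the trader could never have taken, so the filtered figure is the one that should set expectations.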
13. Siphoning Gas from a Moving Car
(Capital Drain and Addition)
Backtests frequently assume infinite compounding, where no capital is ever added or withdrawn from the trading account. In practice, however, traders will regularly add or remove funds - which significantly impacts the performance of a strategy. For instance, withdrawing money during a drawdown forces the strategy to work harder to recover losses, as it now requires higher returns to break even. Similarly, adding capital can skew results by altering position sizing. While it is necessary to manage capital in this way, backtests usually don’t account for these changes, which once again leads to results that are not repeated in practice.
Example:
A trader consistently pulls a portion of profits from their account each month. In the backtest, no withdrawals are considered, and the strategy appears highly profitable. However, in live trading these regular withdrawals put pressure on the account. Especially over longer periods, the reduced compounding effect leads to significant underperformance relative to the backtest.
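The effect of regular withdrawals on compounding can be sketched in Python as follows. The 2% monthly return, the five-year horizon, the withdrawal amount and the final_balance helper are hypothetical, and the sketch assumes the withdrawal is taken at the end of each month.

```python
def final_balance(start, monthly_return, months, monthly_withdrawal=0.0):
    """Compound monthly, withdrawing a fixed amount at the end of each month.
    The balance is floored at zero."""
    balance = start
    for _ in range(months):
        balance *= 1 + monthly_return
        balance = max(balance - monthly_withdrawal, 0.0)
    return balance

months = 60
withdrawal = 150.0

no_withdrawals = final_balance(10_000, 0.02, months)
with_withdrawals = final_balance(10_000, 0.02, months, withdrawal)
total_withdrawn = withdrawal * months

print(f"Backtest (no withdrawals): {no_withdrawals:,.0f}")
print(f"Live balance:              {with_withdrawals:,.0f}")
print(f"Live balance + withdrawn:  {with_withdrawals + total_withdrawn:,.0f}")
```

Even after adding the withdrawn cash back in, the live total falls short of the backtest figure, because every withdrawal also removes the future growth that money would have compounded into.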
14. Your Subscription Service Increases Its Price Without You Realizing
(Interest Rates and Funding Costs)
The ‘cost of capital’ - such as leverage costs, interest rates and funding fees - can fluctuate over time, but backtests often overlook these dynamic costs or even fail to account for them altogether. In live markets, these changes can significantly erode profit margins. Not considering these costs, especially the factors affecting their variability, can easily turn a profitable backtest into an unprofitable strategy in live trading.
Example:
A trader backtests a strategy for use in cryptocurrency perpetual futures. The strategy is designed for bull markets but fails to account for the rising funding rates frequently seen during periods of high demand. As the cost to maintain an open position skyrockets, the trader’s profit margins quickly shrink, making the strategy far less viable than the backtest indicated. This is particularly dangerous because as the funding fees erode the position’s margin, the liquidation price rises faster than expected, potentially resulting in the entire position being liquidated - even though the trade appeared profitable on paper.
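A rough Python sketch shows how quickly funding can add up relative to posted margin. It rests on several simplifying assumptions that vary by exchange: funding is charged on the position's notional value at fixed 8-hour intervals, a positive rate means longs pay, and the notional stays constant. All rates, sizes and the funding_paid helper are hypothetical.

```python
def funding_paid(notional, rates):
    """Funding paid by a long position: notional times each interval's rate, summed.
    Assumes a constant notional and that positive rates are paid by longs."""
    return sum(notional * rate for rate in rates)

notional = 10_000.0   # position size (hypothetical)
margin = 1_000.0      # posted margin, i.e. 10x leverage (hypothetical)
intervals = 30 * 3    # 30 days of 8-hour funding intervals (assumed schedule)

calm = funding_paid(notional, [0.0001] * intervals)    # 0.01% per interval
heated = funding_paid(notional, [0.0010] * intervals)  # 0.10% per interval

print(f"Calm market:   {calm:.2f} paid, {calm / margin:.0%} of margin")
print(f"Heated market: {heated:.2f} paid, {heated / margin:.0%} of margin")
```

Under these assumptions a tenfold rise in the funding rate turns a modest monthly cost into one that consumes most of the margin, which is why a backtest using a flat or zero funding rate can badly misstate both profit and liquidation risk.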
15. You Can’t Ride the Wave Past the Shore
(Alpha Decay)
In highly competitive markets, especially in high-frequency trading, the edge of a strategy (alpha) can erode over time as more participants exploit similar inefficiencies. This gradual loss of profitability - known as alpha decay - often isn’t captured in backtesting, which assumes static market conditions. Alpha decay is particularly relevant in high-frequency trading, where competition and frontrunning are more intense, while it tends to be less of an issue in higher time-frame swing trading.
16. Playing Chess Against Yourself and Expecting to Win Every Time
(Psychological Factors)
Psychological biases still affect fully systematic traders. The assumption that traders will follow their strategy without hesitation or emotional interference rarely holds true in live trading, especially during periods of drawdown or high volatility. Manual and automated traders alike feel the same compulsion after experiencing drawdown. The temptation to tweak or abandon a strategy during this period is strong and often leads to the worst decision. Many traders report that after modifying a ‘losing’ strategy, the new version performs worse than the original: it has been adjusted to avoid the losses of the past and, by virtue of overfitting, misses future gains.
Example:
An algorithmic trader watches as their automated strategy experiences a significant drawdown. Panicking, the trader tweaks the parameters in order to avoid further losses. Shortly after, the original strategy would have recovered, but the modified version continues to struggle as the adjustments were made in reaction to short term losses instead of accounting for long term performance.
Final Note:
Congratulations if you made it this far! This might not be the most exciting topic, but it’s essential knowledge for every trader and investor. This article was written to warn you of the dangers of relying on backtests - and provides a checklist of common pitfalls to watch out for. Whether you’re running your own backtest or reviewing someone else’s, it’s critical to look beyond the shiny numbers and assess the real-world viability. What looks great on paper may not hold up in the real world.
Best of luck in the markets - but remember: stay prudent, and you’ll make your own luck!
Adaptive vs. Over-fitting Strategies
Adaptive trading strategies and over-fitting strategies are two approaches that have been the subject of much debate in the world of financial markets. On one side, adaptive trading strategies involve the use of machine learning algorithms to analyze market data and adapt to changing market conditions in real-time. These strategies aim to optimize trading performance by continuously learning from market data and adjusting their approach accordingly.
On the other side, over-fitting strategies involve the use of complex models that may be too sensitive to the specific characteristics of a particular market dataset. This can result in the model making predictions that are not applicable to other market conditions, leading to poor performance when the model is deployed in live trading.
One argument in favor of adaptive trading strategies is that they have the potential to significantly improve trading performance by continuously learning from market data and adapting to changing conditions. These strategies can also be more flexible and responsive to market changes, allowing traders to take advantage of opportunities as they arise.
However, there are also valid concerns about the use of adaptive trading strategies. One potential issue is that these strategies may be prone to overfitting, where the model becomes too closely tied to the specific characteristics of a particular dataset and is not able to generalize well to other market conditions. This can lead to poor performance when the model is deployed in live trading.
Another concern about adaptive trading strategies is that they may require a large amount of data to be effective, which may not be practical for traders who are working with limited data sets. Additionally, these strategies may be more complex and require more advanced technical expertise to implement and maintain, which may not be feasible for all traders.
Overall, the debate between adaptive trading strategies and over-fitting strategies is a complex one, and there is no one-size-fits-all answer. The best approach will depend on the specific needs and goals of the trader, as well as the resources and expertise available. Ultimately, it is important for traders to carefully consider the pros and cons of both approaches and choose the one that is most appropriate for their needs.