Evidence-Based Technical Analysis

Evidence-Based Technical Analysis is a great overview. It deserves a general audience, not one restricted to technical analysis or even to data analysis.

Professor David Aronson explains the scientific method and how to analyze data. He has great quotes discrediting the Efficient Market Hypothesis. He presents some important observations from Behavioral Finance. He brings out many of the statistical pitfalls found in the literature. I enjoyed reading that scientists don’t use the scientific method. I have known that for years.

In a case study, Professor Aronson examined 6402 technical indicator rules. None showed statistical significance at the 95% level, which was disappointing.

At this point, I introduce my own analysis of what went wrong and how to correct it.

The statistical technique was very careful to eliminate data mining bias, but it was not careful about the power of the test. To show statistical significance, an individual trading rule had to beat randomness by 15% per year. That is an exceedingly high hurdle. Rules exceeding randomness by 5% or more were commonplace.
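Here is a rough back-of-the-envelope sketch of why the hurdle climbs so high. The volatility, sample length, and Bonferroni-style correction below are my own illustrative assumptions, not the parameters of the actual study, which used a bootstrap-based adjustment.

from statistics import NormalDist

# Illustrative assumptions only; not the actual parameters of the case study.
annual_vol = 0.15   # standard deviation of a rule's annual excess return
years = 25          # length of the back-test sample
n_rules = 6402      # number of rules examined

se = annual_vol / years ** 0.5   # standard error of the mean annual excess return
z = NormalDist()

# One rule tested alone, one-sided 5% test
single_hurdle = z.inv_cdf(0.95) * se

# 6402 rules with a Bonferroni-style family-wise 5% correction
family_hurdle = z.inv_cdf(1 - 0.05 / n_rules) * se

print(f"Hurdle for one rule tested alone:     {single_hurdle:.1%} per year")
print(f"Hurdle after testing {n_rules} rules:   {family_hurdle:.1%} per year")

With these assumptions the hurdle rises from roughly 5% to roughly 13% per year just from the multiplicity correction. Demanding a good chance of actually detecting a genuine edge, rather than merely clearing the critical value half the time, pushes it higher still, into the neighborhood of the 15% figure.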

Data mining bias is what happens when you select the most significant of many possible rules. We frequently see such randomness reported as fact in medical screening. In the case study, there were 6402 rules. Even if none of them had any effect whatsoever, you would expect 5% of them (about 320 rules) to show statistical significance at the 95% level. You would expect 0.1% of them (about 6 rules) to show statistical significance at the 99.9% level. You could easily have one of them show statistical significance at the 99.99% or even the 99.999% level.
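These expected counts are easy to reproduce by simulation. The following sketch, entirely my own illustration, draws 6402 p-values under a pure-noise null and tallies the spurious discoveries.

import numpy as np

rng = np.random.default_rng(0)
n_rules, n_trials = 6402, 1000

count_95, count_999, best_p = [], [], []
for _ in range(n_trials):
    p = rng.uniform(size=n_rules)        # p-values of 6402 rules with no real effect at all
    count_95.append(np.sum(p < 0.05))    # "significant at the 95% level"
    count_999.append(np.sum(p < 0.001))  # "significant at the 99.9% level"
    best_p.append(p.min())               # the single most impressive rule

print(f"Average rules passing 95%:   {np.mean(count_95):.0f}")   # about 320
print(f"Average rules passing 99.9%: {np.mean(count_999):.1f}")  # about 6
print(f"Share of trials where the best rule beat 99.99%: "
      f"{np.mean(np.array(best_p) < 0.0001):.0%}")               # roughly half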

Such is the effect of data mining bias.

Keep in mind that the 15% hurdle is built into the data. It cannot be removed with the existing test design. At the same time, understand that a reliable improvement in the total return of 1% is huge when compounded over a decade.
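The arithmetic of that last point is simple. Assuming a 7% base return, a figure of my own choosing for illustration:

# A reliable 1% edge compounded over a decade (8% versus an assumed 7% base return)
base, improved, years = 1.07, 1.08, 10
print(f"Extra wealth after {years} years: {improved**years / base**years - 1:.1%}")
# close to 10% more ending wealth, and the gap keeps widening in later decades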

This is a major challenge in test design.

Increasing the power of the test to a reasonable level requires dramatically reducing the number of conditions tested. Doing so is difficult, but it can be rewarding.

The first step is to cluster rules together. You must introduce plausible theories to make this meaningful. You end up with two sets of results: those that show whether there is an effect and those that optimize the rule.

For example, you may be able to establish that moving averages work with a high degree of confidence. You may optimize the length of the moving average at a much lower level of confidence. You might select the very best condition at a confidence level no better than a coin toss.
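Here is a sketch of what such a two-stage design might look like, using made-up prices and a hypothetical "price above its moving average" rule family. This is my own illustration, not the procedure from the book.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily returns with mild trend persistence so moving-average rules have a chance
n_days = 5000
returns = 0.0003 + 0.01 * rng.standard_normal(n_days)
for t in range(1, n_days):
    returns[t] += 0.05 * returns[t - 1]
prices = np.exp(np.cumsum(returns))

def rule_returns(length):
    """Daily returns of a 'long when price is above its moving average, else in cash' rule."""
    ma = np.convolve(prices, np.ones(length) / length, mode="valid")  # MA ending at each day
    signal = prices[length - 1:-1] > ma[:-1]      # yesterday's price versus yesterday's MA
    return np.where(signal, returns[length:], 0.0)

lengths = [20, 50, 100, 200]
series = [rule_returns(L) for L in lengths]
n = min(len(s) for s in series)
family = np.mean([s[-n:] for s in series], axis=0)   # average the family members day by day

# Stage 1: does the family as a whole show an effect?  (For simplicity this tests against
# zero; a careful test would use a detrended benchmark.)
t_family = family.mean() / (family.std(ddof=1) / np.sqrt(n))

# Stage 2: which length looks best within the family?  A much weaker claim.
best = lengths[int(np.argmax([s[-n:].mean() for s in series]))]

print(f"Family t-statistic: {t_family:.2f}")
print(f"Best length in-sample: {best} (hold this choice with far less confidence)")

The first number addresses the clustered question of whether moving averages work at all. The second merely picks a favorite within the cluster and deserves far less confidence.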

Another way to increase the power of a test might be to eliminate bootstrapping. Bootstrapping can destroy underlying causal relationships in the data. I discovered this when I examined the Sortino ratio and Managing Downside Risk in Financial Markets (Current Research E). In particular, using monthly data removed the effect of valuations on stock returns.
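A simple way to see the concern, again with made-up numbers of my own: resampling observations with replacement preserves their distribution but scrambles their ordering, so any serial dependence, such as valuation-driven mean reversion, disappears.

import numpy as np

rng = np.random.default_rng(2)

# A mean-reverting series standing in for valuation-driven returns (illustrative only)
n = 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = -0.3 * x[t - 1] + rng.standard_normal()

def lag1_autocorr(series):
    return np.corrcoef(series[:-1], series[1:])[0, 1]

resample = rng.choice(x, size=n, replace=True)   # a simple bootstrap draw

print(f"Lag-1 autocorrelation, original series:    {lag1_autocorr(x):+.2f}")        # about -0.30
print(f"Lag-1 autocorrelation, bootstrap resample: {lag1_autocorr(resample):+.2f}") # about 0.00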

It is possible that some of the 6402 rules are as powerful as 15% (annualized) when applied to actual historical data as opposed to bootstrapped data.

Another opportunity is tying certain rules to specific conditions. For example, the behavior of stock prices is different when a buyout bid has been made. Stockholders are under intense pressure to decide whether to accept an early offer or to wait for a better price, which might never come. Not all buyout attempts succeed.

The rules during buyout attempts should differ from the rules at other times.

Have fun.

John Walter Russell
May 31, 2007
Corrected: June 3, 2007
Modified: June 20, 2007
