Data Mining Bias

This extends my discussion of Evidence-Based Technical Analysis. I write this to clarify the role that theories play in coping with data mining bias.

The Introduction of Theories

Evidence-Based Technical Analysis provides a great overview of many statistical issues. It draws our attention to Data Mining Bias, a serious source of error. It deserves a general audience.

Many data surveys fail to take it into account. They claim statistical significance when there is none. If you examine enough possibilities, you will find something that passes a standard statistical test. The cause may be random. But if you test enough “possible” causes, something will pass.
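A rough sketch of the arithmetic may help. It assumes the candidate causes are independent and that none of them has any real effect, so each p-value is uniformly distributed. The function names and the 5% level are my own choices for illustration. Under those assumptions, 20 candidates give about a 64% chance that at least one passes.

```python
import random


def chance_of_false_discovery(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent tests passes
    at level alpha when every candidate cause is pure noise."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_false_discovery(num_tests, alpha=0.05, trials=20000, seed=0):
    """Monte Carlo check of the formula above."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # With no real effect, each p-value is uniform on [0, 1).
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        exact = chance_of_false_discovery(n)
        approx = simulate_false_discovery(n)
        print(f"{n:4d} candidates: exact {exact:.3f}, simulated {approx:.3f}")
```

Real candidates are rarely independent, so treat the numbers as a guide to the size of the problem and not as a precise estimate.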

This is a common issue. How you approach the matter depends, in part, on how serious the consequences are of claiming that cause-and-effect exists when, in fact, it does not. Many times, the penalty is virtually nonexistent. At other times, such as in medical research, the penalty can be severe.

My experience is in protecting military airplanes.

Ideally, an airplane detects any radar pulse directed toward it while it remains silent. A determined enemy also controls his emissions carefully. Detecting hostile interceptors at long range requires sensitive receivers. The microwave spectrum is broad. A sensitive receiver that searches all of it examines many possibilities, and noise alone will sometimes look like a radar pulse. Data mining bias is an issue.

If you have stealth, you have solved the problem. Stealth does not mean invisibility. It means greatly reduced visibility. A hostile airplane has to be very close. Its signal has to be very strong. You can detect it in spite of data mining bias.

It takes more effort without stealth. You must live with false alarms. It helps to introduce theories. Traditionally, aircraft have been most vulnerable to attacks from the rear within a small cone near the line of flight. It makes sense to pay more attention to radar pulses from this region than to pulses from other regions.
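One way to express "pay more attention to this region" in statistical terms is to split a fixed false alarm budget unevenly. The sketch below uses a weighted Bonferroni allocation, which is my own substitution for illustration and not a description of any real warning receiver. The sector names, the weights, and the p-values are all hypothetical.

```python
def allocate_alpha(weights, total_alpha=0.05):
    """Split a total false alarm budget across sectors in proportion
    to how much attention the theory says each sector deserves."""
    total_weight = sum(weights.values())
    return {name: total_alpha * w / total_weight for name, w in weights.items()}


def flagged_sectors(p_values, thresholds):
    """Return the sectors whose p-value clears that sector's threshold."""
    return [name for name, p in p_values.items() if p <= thresholds[name]]


# Hypothetical weights: the theory says the rear cone matters most.
weights = {"rear_cone": 8.0, "front": 1.0, "left": 1.0, "right": 1.0}
thresholds = allocate_alpha(weights)

# Hypothetical measurements: the same weak evidence in two sectors.
p_values = {"rear_cone": 0.02, "front": 0.02, "left": 0.40, "right": 0.75}

if __name__ == "__main__":
    print(flagged_sectors(p_values, thresholds))
```

The same weak evidence raises an alarm in the rear cone and is ignored in front. The thresholds still add up to the original budget, so the overall false alarm rate does not grow.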

A hostile interceptor does not simply fire a missile. It must be directed into position. It must acquire the target. We can use such information. We can generate powerful theories.

In recent years, the military has joined its aircraft and other resources into networks. If any portion of the network knows about a hostile aircraft, it warns the entire flight. This allows us to form even more powerful theories. We can extract useful information from even the weakest radar pulse.
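Bayes' rule gives one way to see why a warning from the network makes a weak pulse useful. The sketch below is only an illustration. The detection rate, the false alarm rate, and both prior probabilities are numbers I made up, and the function name is hypothetical.

```python
def posterior_hostile(prior, hit_rate, false_alarm_rate):
    """Bayes' rule: probability that a hostile aircraft is present,
    given that the receiver reported a pulse."""
    true_alarm = prior * hit_rate
    false_alarm = (1.0 - prior) * false_alarm_rate
    return true_alarm / (true_alarm + false_alarm)


if __name__ == "__main__":
    # Made-up figures for a weak pulse: detected half the time when real,
    # and mimicked by noise 5% of the time.
    hit_rate, false_alarm_rate = 0.5, 0.05

    # No warning from the network: a hostile aircraft is unlikely a priori.
    alone = posterior_hostile(0.001, hit_rate, false_alarm_rate)

    # The network has warned the flight: the prior is far higher.
    warned = posterior_hostile(0.5, hit_rate, false_alarm_rate)

    print(f"without warning: {alone:.3f}")
    print(f"with warning:    {warned:.3f}")
```

With these figures, the same weak pulse is almost certainly a false alarm on its own and very likely real once the network has issued a warning.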

It is in this sense that theories overcome data mining bias. We focus only on what makes sense. We reject the false alarms that we cannot avoid.

Have fun.

John Walter Russell
August 12, 2007