Edited: Refusing to See the Obvious

You should reject any claim that an effect does not exist when the only evidence offered is that a statistical test failed to declare significance. Such claims are false.

What is even worse, and I see it often, is that people go out of their way to avoid seeing the obvious.

The Proper Statistical Setting

Statistics can never show that two things are the same. Statistics identify differences, not similarities.

A standard statistical test may declare significance when there is none. The chance of such a false alarm or false positive is the type 1 error. When we test at a 95% confidence level, the chance of a type 1 error is 5%.

Statisticians can and do show similarities, but in a roundabout manner. After making a series of assumptions and setting up a test, a statistician can calculate how likely his test would be to detect an actual difference of a specified magnitude. This tells us about the likelihood of a missed detection, which is the type 2 error.

A plot of the probability of detecting an actual difference (one minus the probability of a type 2 error) versus the size of that difference is called the POWER of a test. Unless you specify the power of a test, any failure to find statistical significance is meaningless.
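As a minimal sketch of these two error rates, here is a simulation in Python. The sample size, effect size, and use of an ordinary two-sample t-test are assumptions chosen purely for illustration. The rejection rate when the two populations are identical approximates the type 1 error; the rejection rate when a real difference exists approximates the power, and one minus that rate is the type 2 error.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, trials, alpha = 30, 10_000, 0.05   # assumed sample size, simulations, significance level
    true_difference = 0.5                 # assumed real difference, in standard deviations

    def rejection_rate(difference):
        """Fraction of simulated two-sample t-tests that declare significance."""
        rejections = 0
        for _ in range(trials):
            a = rng.normal(0.0, 1.0, n)
            b = rng.normal(difference, 1.0, n)
            if stats.ttest_ind(a, b).pvalue < alpha:
                rejections += 1
        return rejections / trials

    type1 = rejection_rate(0.0)              # false alarms: no real difference exists
    power = rejection_rate(true_difference)  # detections: a real difference exists
    print(f"type 1 error rate ~ {type1:.3f}")                      # close to alpha
    print(f"power ~ {power:.3f}, type 2 error ~ {1 - power:.3f}")

With these assumed numbers the power comes out well below one, which is precisely the information a bare "not statistically significant" report omits.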

The Scandal in Financial Statistics

Here are quotes from pages 382-383 of David Dreman's Contrarian Investment Strategies: The Next Generation:

"..The statistics of the original mutual fund researchers in the sixties and early seventies failed to turn up such above-average performance by any investors.."

"..On closer examination, the efficient market victory vanished.."

"Even to be flagged on the screen, the manager had to outperform the market by 5.83% annually for 14 years. When we remember a top manager might beat the market by 1.5 or 2% a year over this length of time, the returns required by Jensen to pick up managers outperforming the averages were impossibly high. Only a manager in the league of a Buffett or Templeton might make the grade. One fund outperformed the market by 2.2% a year for 20 years, but according to Jensen's calculations, this superb performance was not statistically significant."

"In another study..it was not possible at a 95% statistical confidence level to say a portfolio up 90% over 10 years was better managed than another portfolio down 3%.."

Faulty Statistical Assumptions

The basic concept behind statistics is brilliant. You set up a hypothesis, called the null hypothesis. Then you try to reject it.

If an assumed probability distribution would have produced an observed result less than 5% of the time, you declare at a 95% confidence level that the differences are real, not simply the result of randomness.

Most often, the assumed probability distribution is the normal, Gaussian, bell-shaped curve. [For prices, it is actually lognormal: the percentage increases and decreases have a normal distribution, not the prices themselves.]
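Here is a minimal sketch of that bracketed point, with the mean, standard deviation, and horizon assumed purely for illustration: if the annual percentage changes (log returns) are normal, the compounded prices come out lognormal, skewed to the upside rather than bell shaped.

    import numpy as np

    rng = np.random.default_rng(1)

    # Assumed for illustration: normally distributed annual log returns,
    # 7% mean and 18% standard deviation, compounded over 30 years.
    log_returns = rng.normal(0.07, 0.18, size=(100_000, 30))
    prices = 100.0 * np.exp(log_returns.sum(axis=1))   # lognormally distributed prices

    # The returns are symmetric (mean ~ median); the prices are not (mean well above median).
    print(f"annual returns: mean {log_returns.mean():.4f}, median {np.median(log_returns):.4f}")
    print(f"ending prices:  mean {prices.mean():,.0f}, median {np.median(prices):,.0f}")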

The lognormal approximation is excellent, and it works exceedingly well AT THE 90% CONFIDENCE LEVEL (two-sided, 95% one-sided confidence level). It becomes an exceedingly dangerous assumption when you try to assign high levels of confidence.

Benoit Mandelbrot, who wrote The (Mis)Behavior of Markets, points to the wildly inaccurate risk assessments of insurance and insurance-related products (such as commodity futures and cotton farms). A standard finance model declared the odds of ruin at one chance in ten billion billion when the real-world odds were one in ten to one in thirty (pages 232-233).
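Here is a minimal sketch of why the tails matter so much. The six-standard-deviation threshold and the Student-t distribution with three degrees of freedom are assumptions standing in for a fat-tailed market, not Mandelbrot's model; the point is only that the normal curve treats a large move as essentially impossible while a fat-tailed curve treats it as merely rare.

    import math
    from scipy import stats

    df = 3                               # assumed fat-tailed stand-in: Student-t, 3 degrees of freedom
    threshold = 6.0                      # an assumed six-standard-deviation move
    t_scale = math.sqrt(df / (df - 2))   # rescale so the t-distribution has unit variance

    # Probability of a move at least this extreme, in either direction.
    p_normal = 2 * stats.norm.sf(threshold)
    p_fat = 2 * stats.t.sf(threshold * t_scale, df)

    print(f"normal curve:       about 1 chance in {1 / p_normal:,.0f}")
    print(f"fat-tailed t({df}): about 1 chance in {1 / p_fat:,.0f}")

With these assumptions the normal curve calls the move roughly a one-in-five-hundred-million event, while the fat-tailed curve calls it closer to one in five hundred. The same kind of mismatch lies behind gaps like the one Mandelbrot describes.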

Refusing to See the Obvious

For some reason, which I do not understand, it is considered OK for a financial researcher to miss the obvious. If researchers were to report the sensitivity of their tests accurately, that is, if they were to report the power of their tests and the type 2 errors, we would not run into this problem.

It is my opinion that most failures to see the obvious are honest mistakes. It is also my opinion that many of those making honest mistakes know better. But they are lazy.

Let's take an easy example. If we plot real (inflation-adjusted) stock returns versus time on a semilog graph, we see small variations about a straight line. The line corresponds to a real, annualized return of around 6.5% to 7.0%. It is obvious that the long-term return of the stock market has been predictable, and it is reasonable to expect it to continue to be predictable.
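For those who want to see the arithmetic, here is a minimal sketch using a synthetic real (inflation-adjusted) index in place of the actual historical record; the 6.8% trend and the size of the year-to-year scatter are assumptions. Fitting a straight line to the logarithm of the index recovers the long-term annualized real return.

    import numpy as np

    rng = np.random.default_rng(2)

    # Synthetic stand-in for a real (inflation-adjusted) total return index:
    # an assumed 6.8% real trend with small variations about the trend line.
    # Substitute the actual historical series here.
    years = np.arange(1800, 2001)
    log_index = np.log(1.068) * (years - years[0]) + rng.normal(0.0, 0.20, years.size)

    # On a semilog plot this is a straight line; its slope is the annualized real return.
    slope, intercept = np.polyfit(years, log_index, 1)
    print(f"fitted long-term real return: {np.exp(slope) - 1:.2%}")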

We don't have to be demanding to get useful information. For those with the proper technical skills, it is straightforward to introduce formalism and to show whatever rigor is needed.

In essence, if the price strays far from the trend line set by the long-term rate of return, it will correct. We do not insist that we know the exact details. We do insist that corrections will take place.

If someone simply wants to be contentious, he can demand all sorts of definitions and set criteria that we cannot meet. I have seen this kind of thing many times. It is not technically challenging to refute. Not by any stretch. But it can be exceedingly tedious.

The latest approach that I have seen for rejecting the obvious uses data overlap as a subtle excuse. We have two centuries of stock market returns (1800-2000), with better numbers in later periods, that vary slightly around a rate of 6.5% to 7.0%. If we use a 10-year (or 20-year) period as the minimum amount of time for reporting effects and IF WE INSIST ON NO DATA OVERLAP whatsoever, we have suddenly compressed 200 years of fluctuating stock prices into 20 (or 10) distinct data points (known as degrees of freedom).

It is more difficult to claim statistical significance with 20 data points than with 200.
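As a minimal sketch of that compression, using the 200 years of annual returns mentioned above (the margin-of-error comparison, stated in units of one data point's standard deviation, is my own illustration of the point):

    from scipy import stats

    years = 200
    for window in (10, 20):
        non_overlapping = years // window    # 20 or 10 distinct data points
        overlapping = years - window + 1     # 191 or 181 overlapping sequences
        # 95% margin of error for a mean, in units of one data point's standard
        # deviation: it grows roughly as one over the square root of the count.
        margin = stats.t.ppf(0.975, non_overlapping - 1) / non_overlapping ** 0.5
        print(f"{window}-year windows: {non_overlapping} non-overlapping points, "
              f"{overlapping} overlapping; margin of error ~ {margin:.2f} standard deviations")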

Before too long, you won't be able to conclude anything from the data. Best of all, you give the IMPRESSION, BUT NOT THE REALITY, of doing things right, of applying rigor.

Let us think through this statistical problem with the intent of extracting useful information. Does it make sense to talk about a long-term return of the stock market? Or is the obvious only an illusion?

Look at two 10-year sequences, offset by one year. Consider 1946-1956 and 1947-1957. The two sequences overlap almost entirely: the first has 1946, the second has 1957, and both share the years 1947-1956.

We could draw a line for the shared years of returns at the overall historical rate. Now the question becomes whether the year-to-year random price fluctuations from the two non-shared years, 1946 and 1957, are big enough to account for the deviations from the overall historical returns.
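Here is a minimal sketch of that arithmetic, using made-up annual returns (the mean, standard deviation, and year labels are assumptions): the shared years cancel, so the difference between the averages of two adjacent overlapping sequences comes entirely from the two non-shared years, divided by the window length.

    import numpy as np

    rng = np.random.default_rng(3)
    window = 10                                   # length of each sequence, in years
    annual = rng.normal(0.068, 0.18, window + 1)  # made-up returns; index 0 stands in for 1946, index 10 for 1957

    first = annual[:window]    # the earlier sequence (includes the 1946 stand-in)
    second = annual[1:]        # the sequence offset by one year (includes the 1957 stand-in)

    # The shared years cancel exactly: both lines print the same number.
    print(second.mean() - first.mean())
    print((annual[-1] - annual[0]) / window)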

The answer is that one year is not quite big enough but two are. From our 200 years, we still have the information from more than 100 independent data points (degrees of freedom).

We do not claim to know the exact confidence level. But we would not have claimed a confidence level above 90% (two-sided, 95% one-sided confidence level) anyway.

Have fun.

John Walter Russell
December 4, 2005