Statistical Approximations

I have copied this part of a write-up from Professor Peter Ponzo’s Gummy Stuff web site. I wrote this portion.

Handling the statistics is a more difficult task. There are no standard formulas available for us to use.

We are content to apply the Central Limit Theorem. We restrict ourselves to very coarse levels of precision. We act as if the actual distribution were Gaussian (i.e., normal or bell shaped). We limit ourselves to 90% confidence limits. We reject outright any claims to high statistical precision (such as 98% or 99%). We hope that the actual precision is in the right ballpark (such as between 80% and 95%).
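To put numbers on the Gaussian assumption, here is a minimal sketch using only Python's standard library. The function names are hypothetical, chosen for illustration. It shows the multiplier on the standard deviation that corresponds to two-sided 90% limits, and the coverage that a given multiplier would deliver if the distribution really were Gaussian.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_multiplier(confidence):
    """Number of standard deviations on each side of the mean
    that encloses the given fraction of a Gaussian distribution."""
    tail = (1.0 - confidence) / 2.0
    return STANDARD_NORMAL.inv_cdf(1.0 - tail)


def coverage(multiplier):
    """Fraction of a Gaussian distribution that lies within
    plus or minus `multiplier` standard deviations of the mean."""
    return 2.0 * STANDARD_NORMAL.cdf(multiplier) - 1.0


if __name__ == "__main__":
    # Compare the nominal 90% level with the 80% to 95% ballpark range.
    for level in (0.80, 0.90, 0.95):
        print(f"{level:.0%} limits: +/- {two_sided_multiplier(level):.3f} sigma")
```

The multipliers for 80% and 95% bracket the 90% value, which gives a feel for how much the limits would move if the true precision sat anywhere in that ballpark.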

The underlying issue with statistics is that our historical sequences overlap.

One way of handling such a situation is to demand almost complete independence of all data sets. Typically, one calculates an autocorrelation function and determines the number of years that it takes to reach 70% to 90% of the total area (or energy). This kind of approach is helpful when one is most interested in eliminating false alarms.
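The independence-based approach described above might be sketched as follows. This is an illustration under stated assumptions, not the procedure used in this write-up: the function names are hypothetical, "energy" is taken to mean the sum of squared autocorrelation values, and the 80% threshold is just one choice within the 70% to 90% range.

```python
def autocorrelation(series):
    """Sample autocorrelation for lags 0..n-1, normalized so lag 0 equals 1."""
    n = len(series)
    mean = sum(series) / n
    centered = [value - mean for value in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / denom
        for lag in range(n)
    ]


def lag_capturing_fraction(acf, fraction=0.8):
    """Smallest lag at which the cumulative squared autocorrelation
    (the assumed 'energy') reaches the given fraction of its total."""
    total = sum(r * r for r in acf)
    running = 0.0
    for lag, r in enumerate(acf):
        running += r * r
        if running >= fraction * total:
            return lag
    return len(acf) - 1
```

With annual data, the returned lag is a number of years. Data points spaced at least that far apart would be treated as nearly independent, at the cost of discarding most of the overlapping sequences.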

We look at the problem differently. We want high sensitivity. We are willing to accept a reasonable amount of error.

We observe that prices swing radically from one year to the next but that E10, which is the average of a decade’s earnings, varies slowly. When looking at historical sequences and overlapping data, we focus on the independent data points: the first (or last) year of a sequence. If we have two historical sequences, they will differ by two (or more) points: the first year of one sequence and the last year of the other. Next, we ask whether prices can shift enough in those two years to cause the observed variation in Historical Surviving Withdrawal Rates. The answer turns out to be that they easily come close to doing just that. The effect of overlapping sequences turns out to be a reduction in the effective number of data points (degrees of freedom) by something less than one half.

Keep in mind that, if Historical Surviving Withdrawal Rates were all the same number, there would be a lot of variation in a plot of them versus the percentage earnings yield (100% / [P/E10] ). This scatter would be caused by price changes.

We lose one degree of freedom because we calculate the slope of the line. We lose another because we estimate variance from the data.
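The bookkeeping can be sketched in a few lines. The function name and the retained_fraction parameter are hypothetical. The default of 0.5 is an assumption that treats "something less than one half" as a conservative floor, so the true effective count should be somewhat higher.

```python
def effective_degrees_of_freedom(n_sequences, retained_fraction=0.5):
    """Rough degrees-of-freedom count for overlapping historical sequences.

    retained_fraction is the share of data points treated as independent.
    Overlap removes less than one half, so 0.5 is a conservative floor.
    """
    if not 0.5 <= retained_fraction <= 1.0:
        raise ValueError("retained_fraction should lie between 0.5 and 1.0")
    effective_points = n_sequences * retained_fraction
    # One degree of freedom for the slope, one for the variance estimate.
    remaining = effective_points - 2
    if remaining <= 0:
        raise ValueError("too few sequences to leave any degrees of freedom")
    return remaining
```

Calling effective_degrees_of_freedom with the number of historical sequences gives a lower bound on the count to use when judging how far to trust the fitted line.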

In this way, we have converted our nonstandard statistical problem to something very close to a standard curve-fitting problem along with a reasonable adjustment.

For the full article, read “Safe Withdrawal Rates versus Valuations.” Both of us participated in the first part.

Gummy's Tutorial (JWR Stuff)