# Common Errors in Statistics and How to Avoid Them - Good P.I


A common and lamentable fallacy is that the maximum likelihood estimator has many desirable properties, namely that it is unbiased and minimizes the mean-squared error. But this is true only for the maximum likelihood estimator of the mean of a normal distribution.[2]
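As one standard illustration (a textbook fact, not an example taken from this text), consider independent normal observations $X_1, \dots, X_n$ with unknown mean and variance $\sigma^2$. The maximum likelihood estimator of the variance divides by $n$ rather than $n - 1$, and so it is biased downward:

```latex
\hat{\sigma}^2_{\mathrm{ML}} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2,
\qquad
E\left[\hat{\sigma}^2_{\mathrm{ML}}\right] = \frac{n-1}{n}\,\sigma^2 < \sigma^2 .
```

The bias shrinks as $n$ grows, which is consistent with the remark that the desirable properties may hold for very large samples.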

Statistics instructors would be well advised to avoid introducing maximum likelihood estimation and to focus instead on methods for obtaining minimum loss estimators for a wide variety of loss functions.

INTERVAL ESTIMATES

Point estimates are seldom satisfactory in and of themselves. First, if the observations are continuous, the probability is zero that a point estimate will be correct and equal the estimated parameter. Second, we still require some estimate of the precision of the point estimate.

In this section, we consider one form of interval estimate derived from bootstrap measures of precision. A second form, derived from tests of hypotheses, will be considered in the next chapter.

Nonparametric Bootstrap

The bootstrap can help us obtain an interval estimate for any aspect of a distribution (a median, a variance, a percentile, or a correlation coefficient) if the observations are independent and all come from distributions with the same value of the parameter to be estimated. This interval provides us with an estimate of the precision of the corresponding point estimate.

[2] It is also true in some cases for very large samples. How large the sample must be in each case will depend both upon the parameter being estimated and upon the distribution from which the observations are drawn.

CHAPTER 4 ESTIMATION 45

We resample with replacement repeatedly from the original sample, 1000 times or so, computing the sample statistic for each bootstrap sample.

For example, here are the heights of a group of 22 adolescents, measured in centimeters and ordered from shortest to tallest.

137.0 138.5 140.0 141.0 142.0 143.5 145.0 147.0 148.5 150.0 153.0 154.0

155.0 156.5 157.0 158.0 158.5 159.0 160.5 161.0 162.0 167.5

The median height lies somewhere between 153 and 154 cm. If we want to extend this result to the population, we need an estimate of the precision of this average.

Our first bootstrap sample, arranged in increasing order of magnitude for ease in reading, might look like this:

138.5 138.5 140.0 141.0 141.0 143.5 145.0 147.0 148.5 150.0 153.0 154.0

155.0 156.5 157.0 158.5 159.0 159.0 159.0 160.5 161.0 162.0

Several of the values have been repeated; this is not surprising because we are sampling with replacement, treating the original sample as a stand-in for the much larger population from which the original sample was drawn. The minimum of this bootstrap sample is 138.5, higher than that of the original sample; the maximum at 162.0 is less than the original, while the median remains unchanged at 153.5.

137.0 138.5 138.5 141.0 141.0 142.0 143.5 145.0 145.0 147.0 148.5 148.5

150.0 150.0 153.0 155.0 158.0 158.5 160.5 160.5 161.0 167.5

In this second bootstrap sample, again we find repeated values; this time the minimum, maximum, and median are 137.0, 167.5, and 148.5, respectively.

The medians of 50 bootstrapped samples drawn from our sample ranged between 142.25 and 158.25, with a median of 152.75 (see Figure 4.1). These numbers provide an insight into what might have been had we sampled repeatedly from the original population.
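The resampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `bootstrap_medians` and the fixed seed are assumptions made for the example, and because the draws are random the range it reports will not match the 142.25 to 158.25 quoted in the text.

```python
import random
import statistics

# Heights (cm) of the 22 adolescents listed in the text.
HEIGHTS = [137.0, 138.5, 140.0, 141.0, 142.0, 143.5, 145.0, 147.0, 148.5,
           150.0, 153.0, 154.0, 155.0, 156.5, 157.0, 158.0, 158.5, 159.0,
           160.5, 161.0, 162.0, 167.5]


def bootstrap_medians(sample, n_resamples, seed=0):
    """Return the median of each of n_resamples bootstrap samples."""
    rng = random.Random(seed)  # hypothetical seed, fixed only for repeatability
    medians = []
    for _ in range(n_resamples):
        # Draw len(sample) values from the sample, with replacement.
        resample = rng.choices(sample, k=len(sample))
        medians.append(statistics.median(resample))
    return medians


medians = bootstrap_medians(HEIGHTS, n_resamples=50)
print(statistics.median(HEIGHTS))   # median of the original sample: 153.5
print(min(medians), max(medians))   # range of the 50 bootstrap medians
```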

[Figure omitted: one-dimensional plot of the bootstrap medians; axis labeled "Medians of bootstrap samples," running from 142.25 to 158.25.]

FIGURE 4.1 Scatterplot of 50 Bootstrap Medians Derived from a Sample of Heights.

46 PART II HYPOTHESIS TESTING AND ESTIMATION

We can improve on the interval estimate {142.25, 158.25} if we are willing to accept a small probability that the interval will fail to include the true value of the population median. We will take several hundred bootstrap samples instead of a mere 50, and we will use the 5th and 95th percentiles of the resulting bootstrap distribution to establish the boundaries of a 90% confidence interval.
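A percentile interval of this kind might be computed as follows. This is a sketch under stated assumptions: the helper names `percentile` and `percentile_interval`, the choice of 500 resamples (standing in for "several hundred"), the linear-interpolation rule for percentiles, and the seed are all choices made for the illustration, not details given in the text.

```python
import random
import statistics

# Heights (cm) of the 22 adolescents listed in the text.
HEIGHTS = [137.0, 138.5, 140.0, 141.0, 142.0, 143.5, 145.0, 147.0, 148.5,
           150.0, 153.0, 154.0, 155.0, 156.5, 157.0, 158.0, 158.5, 159.0,
           160.5, 161.0, 162.0, 167.5]


def percentile(sorted_values, p):
    """Percentile (0 <= p <= 100) by linear interpolation between order statistics."""
    position = (len(sorted_values) - 1) * p / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def percentile_interval(sample, statistic, n_resamples=500, level=0.90, seed=0):
    """Bootstrap percentile interval for `statistic` at the given confidence level."""
    rng = random.Random(seed)  # hypothetical seed, fixed only for repeatability
    estimates = sorted(
        statistic(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    tail = 100.0 * (1.0 - level) / 2.0  # 5 for a 90% interval
    return percentile(estimates, tail), percentile(estimates, 100.0 - tail)


low, high = percentile_interval(HEIGHTS, statistics.median)
print(low, high)  # 5th and 95th percentiles of the bootstrap medians
```

Passing the statistic as an argument keeps the same routine usable for a mean, a variance, or any other function of the sample.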

This method might be used equally well to obtain an interval estimate for any other population attribute: the mean and variance, the 5th percentile or the 25th, and the interquartile range. When several observations are made simultaneously on each subject, the bootstrap can be used to estimate covariances and correlations among the variables. The bootstrap is particularly valuable when trying to obtain an interval estimate for a ratio or for the mean and variance of a nonsymmetric distribution.
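When several observations are made on each subject, the natural resampling unit is the subject, so that each pair of measurements stays together. The sketch below illustrates this for a correlation coefficient. The paired data are synthetic, generated inside the code purely for demonstration, and the names `pearson` and `bootstrap_correlations` are hypothetical helpers, not anything defined in the text.

```python
import math
import random


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlations(pairs, n_resamples, seed=0):
    """Correlation of each bootstrap sample, resampling whole subjects."""
    rng = random.Random(seed)
    return [pearson(rng.choices(pairs, k=len(pairs))) for _ in range(n_resamples)]


# Synthetic, made-up data for illustration only: 30 subjects, two measurements each.
data_rng = random.Random(42)
xs = [data_rng.gauss(150.0, 8.0) for _ in range(30)]
pairs = [(x, 0.5 * x + data_rng.gauss(0.0, 4.0)) for x in xs]

estimates = sorted(bootstrap_correlations(pairs, n_resamples=500))
# Approximate 5th and 95th percentiles taken directly from the order statistics.
low, high = estimates[25], estimates[474]
print(pearson(pairs), low, high)
```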

Unfortunately, such intervals have two deficiencies:

1. They are biased; that is, they are more likely to contain certain false values of the parameter being estimated than the true one (Efron, 1987).

2. They are wider and less efficient than they could be (Efron, 1987).

Two methods have been proposed to correct these deficiencies; let us consider each in turn.

The first is the Hall-Wilson correction [Hall and Wilson, 1991], in which the bootstrap estimate is Studentized. For the one-sample case, we want an interval estimate based on the distribution of $(\hat{\theta}_b - \hat{\theta})/s_b$, where $\hat{\theta}$ and $\hat{\theta}_b$ are the estimates of the unknown parameter based on the original and bootstrap sample, respectively, and $s_b$ denotes the standard deviation of the bootstrap sample. An estimate $s$ of the population variance is required to transform the resultant interval into one about $\theta$ (see Carpenter and Bithell [2000]).
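A Studentized interval of this general kind might look as follows for the mean of the height data. This is a rough sketch that follows the description above literally, not a claim about the exact Hall-Wilson procedure: the function name `studentized_interval`, the 1000 resamples, the seed, and the use of the sample standard deviation as the scale estimate `s` are all assumptions made for the illustration.

```python
import random
import statistics

# Heights (cm) of the 22 adolescents listed in the text.
HEIGHTS = [137.0, 138.5, 140.0, 141.0, 142.0, 143.5, 145.0, 147.0, 148.5,
           150.0, 153.0, 154.0, 155.0, 156.5, 157.0, 158.0, 158.5, 159.0,
           160.5, 161.0, 162.0, 167.5]


def studentized_interval(sample, n_resamples=1000, level=0.90, seed=0):
    """Studentized bootstrap interval for the mean (illustrative sketch).

    Assumes the sample is not constant, so resamples with zero spread are rare.
    """
    rng = random.Random(seed)  # hypothetical seed, fixed only for repeatability
    n = len(sample)
    theta_hat = statistics.mean(sample)   # estimate from the original sample
    s = statistics.stdev(sample)          # scale estimate from the original sample
    t_values = []
    while len(t_values) < n_resamples:
        resample = rng.choices(sample, k=n)
        s_b = statistics.stdev(resample)  # standard deviation of the bootstrap sample
        if s_b == 0.0:
            continue  # degenerate resample in which every value is identical
        t_values.append((statistics.mean(resample) - theta_hat) / s_b)
    t_values.sort()
    tail = (1.0 - level) / 2.0
    # Approximate quantiles of the Studentized statistic from its order statistics.
    t_low = t_values[round(tail * n_resamples)]
    t_high = t_values[round((1.0 - tail) * n_resamples) - 1]
    # Invert the pivot: the upper quantile sets the lower limit, and vice versa.
    return theta_hat - t_high * s, theta_hat - t_low * s


low, high = studentized_interval(HEIGHTS)
print(low, high)
```

The endpoints swap roles when the pivot is inverted, which is why the upper quantile of the Studentized statistic determines the lower limit of the interval.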
