# Common Errors in Statistics and How to Avoid Them - Good, P.I.


2. The location parameters of the two distributions are the same or differ by a known quantity.

3. The two samples are equal in size.

4. The samples are large enough that asymptotic approximations to the distribution of the test statistic are valid.

As an example, the first published solution to this classic testing problem is the z test proposed by Welch [1937] based on the ratio of the two sample variances. If the observations are normally distributed, this ratio has the F distribution, and the test whose critical values are determined by the F distribution is uniformly most powerful among all unbiased tests (Lehmann, 1986, Section 5.3). But with even small deviations from normality, significance levels based on the F distribution are grossly in error (Lehmann, 1986, Section 5.4).

Box and Anderson [1955] propose a correction to the F distribution for “almost” normal data, based on an asymptotic approximation to the permutation distribution of the F ratio. Not surprisingly, their approximation is close to correct only for normally distributed data or for very large samples. The Box-Anderson statistic results in an error rate of 21%, twice the desired value of 10%, when two samples of size 15 are drawn from a gamma distribution with four degrees of freedom.

A more recent permutation test (Bailor, 1989) based on complete enumeration of the permutation distribution of the sample F ratio is exact only when the location parameters of the two distributions are known or are known to be equal.

The test proposed by Miller [1968] yields conservative Type I errors, less than or equal to the declared error, unless the sample sizes are unequal. A 10% test with samples of size 12 and 8 taken from normal populations yielded Type I errors 14% of the time.

Fligner and Killeen [1976] propose a permutation test based on the sum of the absolute deviations from the combined sample mean. Their test may be appropriate when the medians of the two populations are equal, but can be virtually worthless otherwise, accepting the null hypothesis up to 100% of the time. In the first edition, Good [2001] proposed a test based on permutations of the absolute deviations from the individual sample medians; this test, alas, is only asymptotically exact and even then only for approximately equal sample sizes, as shown by Baker [1995].

To compute the primitive bootstrap introduced by Efron [1979], we would take successive pairs of samples: one of n observations from the sampling distribution F_n, which assigns mass 1/n to the values {X_i: i = 1, ..., n}, and one of m observations from the sampling distribution G_m, which assigns mass 1/m to the values {X_j: j = n + 1, ..., n + m}. For each pair we compute the ratio of the sample variances

R = (s_n²/(n − 1)) / (s_m²/(m − 1))

We would use the resultant bootstrap distribution to test the hypothesis that the variance of F equals the variance of G against the alternative that the variance of G is larger. Under this test, we reject the null hypothesis if the 100(1 − α) percentile is less than 1.
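As a sketch in Python of the resampling scheme just described (the function names, resample count, and seed are our own choices, not from the text), one might write:

```python
import random

def sample_variance(s):
    """Unbiased sample variance: sum of squared deviations over (len - 1)."""
    mu = sum(s) / len(s)
    return sum((v - mu) ** 2 for v in s) / (len(s) - 1)

def primitive_bootstrap_test(x, y, n_boot=2000, alpha=0.10, seed=1):
    """Primitive bootstrap for H0: var(F) = var(G) vs. H1: var(G) > var(F).

    Draws pairs of resamples with replacement from F_n and G_m, computes the
    variance ratio R for each pair, and rejects H0 if the 100*(1 - alpha)
    percentile of the bootstrap distribution of R falls below 1.
    """
    rng = random.Random(seed)
    ratios = []
    while len(ratios) < n_boot:
        xb = [rng.choice(x) for _ in x]
        yb = [rng.choice(y) for _ in y]
        if sample_variance(yb) > 0:  # skip degenerate (constant) resamples
            ratios.append(sample_variance(xb) / sample_variance(yb))
    ratios.sort()
    upper = ratios[min(int(n_boot * (1 - alpha)), n_boot - 1)]
    return upper, upper < 1
```

With small samples this rejection rule inherits the poor coverage discussed below; the sketch is meant only to make the resampling scheme concrete.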

This primitive bootstrap and the associated confidence intervals are close to exact only for very large samples with hundreds of observations. More often, the true coverage probability is larger than the desired value.

Two corrections yield vastly improved results. First, for unequal-sized samples, Efron [1982] suggests that more accurate confidence intervals can be obtained using the test statistic

R′ = (s_n²/n) / (s_m²/m)

Second, applying the bias and acceleration corrections described in Chapter 3 to the bootstrap distribution of R' yields almost exact intervals.
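Reading s² as the sum of squared deviations about the sample mean (one plausible interpretation, not spelled out in the text; under it, s²/(n − 1) in R above is the usual unbiased variance), the corrected statistic R′ can be sketched in Python as:

```python
def sum_sq(s):
    """Sum of squared deviations about the sample mean."""
    mu = sum(s) / len(s)
    return sum((v - mu) ** 2 for v in s)

def corrected_ratio(x, y):
    """Efron's [1982] statistic R' = (s_n^2/n) / (s_m^2/m) for unequal
    sample sizes, with s^2 read as the sum of squared deviations
    (an assumption on our part)."""
    return (sum_sq(x) / len(x)) / (sum_sq(y) / len(y))
```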


Lest we keep you in suspense, an exact, distribution-free, and more powerful test for comparing variances can be derived based on the permutation distribution of Aly's statistic.

This statistic, proposed by Aly [1990], is

δ = Σ_{i=1}^{m−1} i(m − i)(X_(i+1) − X_(i))

where X_(1) < X_(2) < ... < X_(m) are the order statistics of the first sample.

Suppose we have two samples: the first contains the measurements 121, 123, 126, 128.5, and 129, and the second 153, 154, 155, 156, and 158. We replace these with the successive differences z_1i = X_(i+1) − X_(i), that is, 2, 3, 2.5, 0.5 for the first sample and z_2i = 1, 1, 1, 2 for the second.

The original value of the test statistic is 8 + 18 + 15 + 2 = 43, each difference being multiplied by its weight i(m − i), here 4, 6, 6, 4. Under the hypothesis of equal dispersions in the two populations, we can exchange labels between z_1i and z_2i for any or all of the values of i. One possible rearrangement of the labels on the deviations puts {2, 1, 1, 2} in the first sample, which yields a value of 8 + 6 + 6 + 8 = 28.

There are 2⁴ = 16 rearrangements of the labels in all, of which two, {2, 3, 2.5, 2} and {1, 3, 2.5, 2}, yield a larger value of Aly's statistic than the original observations. A one-sided test would have three out of 16 rearrangements as or more extreme than the original, and a two-sided test six. In either case, we would accept the null hypothesis, though the wiser course would be to defer judgment until we have taken more observations.
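The complete enumeration for this worked example is easy to reproduce; a short Python sketch (function and variable names are ours):

```python
from itertools import product

def aly_delta(z):
    """Aly's statistic: sum over i of i*(m - i)*z_i, where z_i is the i-th
    successive difference of the order statistics and m = len(z) + 1."""
    m = len(z) + 1
    return sum(i * (m - i) * zi for i, zi in enumerate(z, start=1))

# Successive differences from the two samples in the text
z1 = [2.0, 3.0, 2.5, 0.5]   # first sample: 121, 123, 126, 128.5, 129
z2 = [1.0, 1.0, 1.0, 2.0]   # second sample: 153, 154, 155, 156, 158

observed = aly_delta(z1)    # 8 + 18 + 15 + 2

# Exchange labels independently at each position i: 2**4 = 16 rearrangements
perm = [aly_delta([b if swap else a for a, b, swap in zip(z1, z2, swaps)])
        for swaps in product([False, True], repeat=len(z1))]

one_sided_p = sum(v >= observed for v in perm) / len(perm)
print(observed, sorted(perm), one_sided_p)
```

Because the null distribution here has only 16 points, complete enumeration is trivial; for larger samples one would sample rearrangements at random instead.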
