# Common Errors in Statistics and How to Avoid Them - Good P.I

If our second sample is larger than the first, we have to resample in two stages. First, we select a subset of m values at random without replacement from the n observations in the second, larger sample and compute the order statistics and their differences. Second, we examine all possible values of Aly's measure of dispersion for permutations of the combined sample, as we did when the two samples were equal in size, and compare Aly's measure for the original observations with this distribution. We repeat this procedure several times to check for consistency.

$$\delta = \sum_{i=1}^{m-1} i(m-i)\left(X_{(i+1)} - X_{(i)}\right)$$
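As a rough illustration, Aly's measure (the weighted sum of gaps between successive order statistics, with weights i(m - i)) and the first resampling stage for unequal sample sizes might be sketched as follows. The function names `aly_delta` and `subsample_larger` and the data values are hypothetical, chosen only for this example; the permutation step itself is not shown.

```python
import random

def aly_delta(sample):
    """Aly's statistic: sum over i = 1..m-1 of i*(m-i)*(X_(i+1) - X_(i))."""
    x = sorted(sample)            # order statistics X_(1) <= ... <= X_(m)
    m = len(x)
    # x[i] is X_(i+1) and x[i - 1] is X_(i) in 0-based indexing
    return sum(i * (m - i) * (x[i] - x[i - 1]) for i in range(1, m))

def subsample_larger(larger, m, rng):
    """Stage one for unequal sizes: draw m of the n values without replacement."""
    return rng.sample(larger, m)

rng = random.Random(0)            # fixed seed so the sketch is repeatable
first = [121, 123, 126, 128.5, 129]             # illustrative data, m = 5
second = [153, 154, 155, 156, 158, 159, 161]    # illustrative data, n = 7
reduced = subsample_larger(second, len(first), rng)
print(aly_delta(first), aly_delta(reduced))
```

Repeating the subsampling with different seeds is one way to carry out the consistency check described above.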

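$$F = \frac{\sum_{i=1}^{I} n_i\left(\bar{X}_{i.} - \bar{X}_{..}\right)^2/(I-1)}{\sum_{i=1}^{I}\sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}_{i.}\right)^2/(N-I)}$$

has at least three major limitations: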

62 PART II HYPOTHESIS TESTING AND ESTIMATION

MATCH SIGNIFICANCE LEVELS BEFORE PERFORMING POWER COMPARISONS

When we studied the small-sample properties of parametric tests based on asymptotic approximations that had performed well in previously published power comparisons, we uncovered another major error in statistics: the failure to match significance levels before performing power comparisons. Asymptotic approximations to the cutoff values were used rather than exact values or near estimates.

When a statistical test takes the form of an interval, that is, if we reject when S < c and accept otherwise, then power is a nondecreasing function of significance level; a test based on an interval may have greater power at the 10% significance level than a second different test evaluated at the 5% significance level, even though the second test is uniformly more powerful than the first. To see this, let H denote the primary hypothesis and let K denote an alternative hypothesis:

If $\Pr\{S < c \mid H\} = \alpha < \alpha' = \Pr\{S < c' \mid H\}$, then $c < c'$, and $\beta = \Pr\{S < c \mid K\} \le \Pr\{S < c' \mid K\} = \beta'$.

Consider a second statistical test depending on $S$ via the monotone increasing function $h$, where we reject if $h[S] < d$ and accept otherwise. If the cutoff values $d < d'$ correspond to the same significance levels $\alpha < \alpha'$, then $\beta < \Pr\{h[S] < d \mid K\} < \beta'$. Even though the second test is more powerful than the first at level $\alpha$, this will not be apparent if we substitute an approximate cutoff point $c'$ for an exact one $c$ when comparing the two tests.

To ensure matched significance levels in your own power comparisons, proceed in two stages: First, use simulations to derive exact cutoff values. Then, use these derived cutoff values in determining power. Using this approach, we were able to show that an exact permutation test based on Aly's statistic was more powerful for comparing variances than any of the numerous published inexact parametric tests.
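The two-stage procedure can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the statistics compared (sample mean and sample median), the normal data model, the shift of -1 under K, and all function names are hypothetical choices made for the example, not the tests studied in the text. Each test rejects when its statistic falls below a cutoff, matching the form S < c used above.

```python
import random
import statistics

def simulate(stat, draw, n_sims, rng):
    """Sorted simulated values of a statistic under one data-generating model."""
    return sorted(stat(draw(rng)) for _ in range(n_sims))

def exact_cutoff(null_values, alpha):
    """Cutoff c whose simulated Pr{S < c | H} does not exceed alpha."""
    return null_values[int(alpha * len(null_values))]

def rejection_rate(values, cutoff):
    """Fraction of simulated values with S < cutoff."""
    return sum(v < cutoff for v in values) / len(values)

N_OBS, N_SIMS, ALPHA = 10, 2000, 0.05   # illustrative settings

def draw_h(rng):
    # Hypothesis H (assumed here): standard normal observations
    return [rng.gauss(0.0, 1.0) for _ in range(N_OBS)]

def draw_k(rng):
    # Alternative K (assumed here): mean shifted down by one unit
    return [rng.gauss(-1.0, 1.0) for _ in range(N_OBS)]

rng = random.Random(1)
results = {}
for name, stat in [("mean", statistics.mean), ("median", statistics.median)]:
    # Stage 1: derive the cutoff by simulation under H
    null_values = simulate(stat, draw_h, N_SIMS, rng)
    c = exact_cutoff(null_values, ALPHA)
    level = rejection_rate(null_values, c)
    # Stage 2: use that same cutoff to estimate power under K
    alt_values = simulate(stat, draw_k, N_SIMS, rng)
    results[name] = (c, level, rejection_rate(alt_values, c))

for name, (c, level, power) in results.items():
    print(f"{name}: cutoff={c:.3f} level={level:.3f} power={power:.3f}")
```

Because both cutoffs are derived from simulation at the same level, any difference in the estimated power reflects the statistics themselves rather than a mismatch in significance levels.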

1. Its significance level is heavily dependent on the assumption of normality.

2. The F ratio is optimal for losses that are proportional to the square of the error and is suboptimal otherwise.

3. The F ratio is an omnibus statistic offering all-round power against many alternatives but no particular advantage against any specific one of them.

"Normality is a myth; there never has, and never will be a normal distribution."

Geary [1947, p. 241]

A permutation test is preferred for the k-sample analysis. These tests are distribution-free (though the variances must be the same for all treatments). And you can choose the test statistic that is optimal for a given alternative and loss function and not be limited by the availability of tables.

CHAPTER 5 TESTING HYPOTHESES: CHOOSING A TEST STATISTIC 63

We take as our model $X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, where we select $\mu$ so that the treatment effects $\alpha_i$ sum to zero; $i = 1, \ldots, I$ denotes the treatment, and $j = 1, \ldots, n_i$. We assume that the error terms $\{\varepsilon_{ij}\}$ are independent and identically distributed.

We consider two loss functions: one in which the losses associated with overlooking a real treatment effect, a Type II error, are proportional to the sum of the squares of the treatment effects, $\alpha_i^2$ (LS), the other in which the losses are proportional to the sum of the absolute values of the treatment effects, $|\alpha_i|$ (LAD).

Our hypothesis, a null hypothesis, is that the differential treatment effects, the $\{\alpha_i\}$, are all zero. We will also consider two alternative hypotheses: $K_U$, that at least one of the differential treatment effects $\alpha_i$ is not zero, and $K_O$, that $K_U$ is true and there is an ordered response such that $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_I$.

For testing against $K_U$ with the LS loss function, Good [2002, p. 126] recommends the use of the statistic $F_2 = \sum_i \left(\sum_j X_{ij}\right)^2$, which is equivalent to the F ratio once terms that are invariant under permutations are eliminated.

For testing against $K_U$ with the LAD loss function, Good [2002, p. 126] recommends the use of the statistic $F_1 = \sum_i \left|\sum_j X_{ij}\right|$.
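A Monte Carlo version of the k-sample permutation test with these two statistics might look like the sketch below. Several choices here are assumptions made for illustration rather than details given in the text: the observations are centered on the grand mean before the group sums are formed (without centering, the sum of absolute group sums would not change under relabeling when all observations are positive), the groups are of equal size, random relabelings stand in for a full enumeration, and the data are invented.

```python
import math
import random

def split_sums(values, sizes):
    """Sum consecutive blocks of `values` whose lengths are given by `sizes`."""
    sums, start = [], 0
    for n in sizes:
        sums.append(sum(values[start:start + n]))
        start += n
    return sums

def f2(group_sums):
    """LS statistic: sum of squared group sums."""
    return sum(t * t for t in group_sums)

def f1(group_sums):
    """LAD statistic: sum of absolute group sums."""
    return sum(abs(t) for t in group_sums)

def k_sample_permutation_test(groups, stat, n_perm=999, seed=0):
    """Return (observed statistic, Monte Carlo p-value); large values reject."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]
    grand_mean = sum(pooled) / len(pooled)
    centered = [x - grand_mean for x in pooled]   # assumption: center first
    observed = stat(split_sums(centered, sizes))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(centered)                     # random relabeling
        value = stat(split_sums(centered, sizes))
        if value >= observed or math.isclose(value, observed):
            hits += 1
    # Count the observed arrangement itself among the relabelings
    return observed, (hits + 1) / (n_perm + 1)

groups = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]  # invented, equal group sizes
print(k_sample_permutation_test(groups, f2))
print(k_sample_permutation_test(groups, f1))
```

Swapping in a different statistic only requires passing another function of the group sums, which is the flexibility the permutation approach offers over tabled distributions.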

For testing against $K_O$, Good [2001, p. 46] recommends the use of the Pitman correlation statistic $\sum_i f[i] \sum_j X_{ij}$, where $f[i]$ is a monotone increasing function of $i$ that depends upon the alternative. For example, for testing for a dose response in animals where $i$ denotes the dose, one might use $f[i] = \log[i + 1]$.
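The Pitman correlation statistic with f[i] = log[i + 1] can be sketched in the same spirit. The helper names, the invented dose-response data, the use of random relabelings instead of full enumeration, and the convention of rejecting for large values (appropriate when the response is expected to increase with i) are all assumptions made for this example.

```python
import math
import random

def pitman_statistic(groups, f):
    """Sum over groups i = 1..I of f(i) times the sum of observations in group i."""
    return sum(f(i) * sum(g) for i, g in enumerate(groups, start=1))

def log_dose(i):
    """Weight function from the dose-response example: f[i] = log[i + 1]."""
    return math.log(i + 1)

def ordered_permutation_test(groups, f, n_perm=999, seed=0):
    """Return (observed statistic, Monte Carlo p-value) for an increasing trend."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]
    observed = pitman_statistic(groups, f)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # random relabeling of doses
        relabeled, start = [], 0
        for n in sizes:
            relabeled.append(pooled[start:start + n])
            start += n
        value = pitman_statistic(relabeled, f)
        if value >= observed or math.isclose(value, observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Invented responses at three increasing dose levels
doses = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]
print(ordered_permutation_test(doses, log_dose))
```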

A permutation test based on the original observations is appropriate only if one can assume that under the null hypothesis the observations are identically distributed in each of the populations from which the samples are drawn. If we cannot make this assumption, we will need to transform the observations, throwing away some of the information about them so that the distributions of the transformed observations are identical.
