# Common Errors in Statistics and How to Avoid Them - Good P.I


Unequal Variances

If the variances of the two populations are not the same, neither the t test nor the permutation test will yield exact significance levels despite pronouncements to the contrary of numerous experts regarding the permutation tests.

More important than comparing the means of populations can be determining why the variances are different.

There are numerous possible solutions for the Behrens-Fisher problem of unequal variances in the treatment groups. These include the following:

3 Interested readers may want to verify this for themselves by writing out all the possible assignments of six items into two groups of three: 1 2 3 / 4 5 6, 1 2 4 / 3 5 6, and so forth.


Wilcoxon test; the use of the ranks in the combined sample reduces the impact (though not the entire effect) of the difference in variability between the two samples.

Generalized Wilcoxon test (see O'Brien [1988]).

Procedure described in Manly and Francis [1999].

Procedure described in Chapter 7 of Weerahandi [1995].

Procedure described in Chapter 10 of Pesarin [2001].

Bootstrap. See the section on dependent observations in what follows.

Permutation test. Phillip Good conducted simulations for sample sizes between 6 and 12 drawn from normally distributed populations. The populations in these simulations had variances that differed by up to a factor of five, and nominal p values of 5% were accurate to within 1.5%.
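A simulation of the kind described in the last item can be sketched as follows. This is a minimal illustration, not a reproduction of Good's study: the statistic (absolute difference in means), the number of replications, the seed, and the function names `permutation_p_value` and `rejection_rate` are all assumptions chosen for the example. Both samples are drawn from normal populations with the same mean, so every rejection is a false positive and the rejection rate estimates the actual significance level.

```python
import itertools
import random
import statistics


def permutation_p_value(x, y):
    """Exact two-sided permutation p-value for the difference in means.

    Enumerates every way of splitting the pooled sample into groups of
    the original sizes (feasible only for small samples).
    """
    pooled = list(x) + list(y)
    n_x = len(x)
    n_y = len(y)
    total = sum(pooled)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    as_extreme = 0
    n_splits = 0
    for idx in itertools.combinations(range(len(pooled)), n_x):
        sum_x = sum(pooled[i] for i in idx)
        diff = abs(sum_x / n_x - (total - sum_x) / n_y)
        n_splits += 1
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_splits


def rejection_rate(n_x, n_y, sd_x, sd_y, alpha=0.05, n_reps=400, seed=0):
    """Estimate how often the test rejects when the means are equal."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        x = [rng.gauss(0.0, sd_x) for _ in range(n_x)]
        y = [rng.gauss(0.0, sd_y) for _ in range(n_y)]
        if permutation_p_value(x, y) <= alpha:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    # Hypothetical setting: six observations per group, variances 1 and 5.
    print(rejection_rate(6, 6, 1.0, 5 ** 0.5))
```

Comparing the printed rate with the nominal 5% level, for several sample sizes and variance ratios, gives a rough sense of how far the actual level drifts when the variances differ.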

Hilton [1996] compared the power of the Wilcoxon test, O'Brien's test, and the Smirnov test in the presence of both location shift and scale (variance) alternatives. As the relative influence of the difference in variances grows, the O'Brien test is most powerful. The Wilcoxon test loses power in the face of different variances. If the variance ratio is 4:1, the Wilcoxon test is not trustworthy.

One point is unequivocal. William Anderson writes, "The first issue is to understand why the variances are so different, and what does this mean to the patient. It may well be the case that a new treatment is not appropriate because of higher variance, even if the difference in means is favorable. This issue is important whether or not the difference was anticipated.

"Even if the regulatory agency does not raise the issue, I want to do so internally."

David Salsburg agrees. If patients have been assigned at random to the various treatment groups, the existence of a significant difference in any parameter of the distribution suggests that there is a difference in treatment effect. The problem is not how to compare the means but how to determine what aspect of this difference is relevant to the purpose of the study.

Since the variances are significantly different, I can think of two situations where this might occur:

1. In many measurements there are minimum and maximum values that are possible, e.g. the Hamilton Depression Scale, or the number of painful joints in arthritis. If one of the treatments is very effective, it will tend to push values into one of the extremes. This will produce a change in distribution from a relatively symmetric one to a skewed one, with a corresponding change in variance.

2. The experimental subjects may represent a mixture of populations. The difference in variance may occur because the effective treatment is effective for only a subset of the population. A locally most powerful test is given in Conover and Salsburg [1988].
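Both situations can be illustrated with a small simulation. The sketch below rests on assumptions made up for the example: normally distributed scores, a hypothetical scale bounded between 0 and 50, and a treatment that shifts responders by a fixed amount. The helper names are illustrative only. In the first case, clipping at the floor of the scale shrinks the variance of the treated group; in the second, a mixture of responders and nonresponders inflates it by `p * (1 - p) * shift ** 2`, where `p` is the fraction of responders.

```python
import random
import statistics


def simulate_bounded_scores(n, mean, sd, floor, ceiling, rng):
    """Draw n normal scores and clip them to the limits of the scale."""
    return [min(ceiling, max(floor, rng.gauss(mean, sd))) for _ in range(n)]


def simulate_mixture(n, responder_fraction, shift, sd, rng):
    """Draw n outcomes where only a fraction of subjects respond."""
    outcomes = []
    for _ in range(n):
        effect = shift if rng.random() < responder_fraction else 0.0
        outcomes.append(rng.gauss(effect, sd))
    return outcomes


def mixture_variance(responder_fraction, shift, sd):
    """Theoretical variance of the two-component mixture above."""
    p = responder_fraction
    return sd ** 2 + p * (1.0 - p) * shift ** 2


if __name__ == "__main__":
    rng = random.Random(1)

    # Situation 1: an effective treatment pushes scores toward the floor.
    control = simulate_bounded_scores(5000, 20.0, 6.0, 0.0, 50.0, rng)
    treated = simulate_bounded_scores(5000, 3.0, 6.0, 0.0, 50.0, rng)
    print("bounded scale:", statistics.variance(control),
          statistics.variance(treated))

    # Situation 2: the treatment works for only 30% of the subjects.
    mixed = simulate_mixture(20000, 0.3, 10.0, 5.0, rng)
    print("mixture:", statistics.variance(mixed),
          mixture_variance(0.3, 10.0, 5.0))
```

In either case the difference in variance is itself evidence about how the treatment works, which is the point the passage makes.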

Dependent Observations

The preceding statistical methods are not applicable if the observations are interdependent. There are five cases in which, with some effort, analysis may still be possible: repeated measures, clusters, known or equal pairwise dependence, a moving average or autoregressive process, and group randomized trials.

Repeated Measures. Repeated measures on a single subject can be dealt with in a variety of ways including treating them as a single multivariate observation. Good [2001, Section 5.6] and Pesarin [2001, Chapter 11] review a variety of permutation tests for use when there are repeated measures.

Another alternative is to use one of the standard modeling approaches such as random- or mixed-effects models or generalized estimating equations (GEEs). See Chapter 10 for a full discussion.

Clusters. Occasionally, data will have been gathered in clusters from families and other groups who share common values, work, or leisure habits.

If stratification is not appropriate, treat each cluster as if it were a single observation, replacing individual values with a summary statistic such as an arithmetic average (Mosteller and Tukey, 1977).

Cluster-by-cluster means are unlikely to be identically distributed, having variances, for example, that will depend on the number of individuals that make up the cluster. A permutation test based on these means would not be exact.

If there is a sufficiently large number of such clusters in each treatment group, the bootstrap defined in Chapter 3 is the appropriate method of analysis.
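The cluster-level approach can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Chapter 3: it reduces each cluster to its arithmetic mean and then applies a simple percentile bootstrap to the difference between the two treatment groups, resampling whole clusters rather than individuals. The function names, the number of resamples, and the example data are hypothetical.

```python
import random
import statistics


def cluster_means(clusters):
    """Replace each cluster of individual values with its arithmetic mean."""
    return [statistics.fmean(cluster) for cluster in clusters]


def bootstrap_ci_difference(means_a, means_b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the difference in group means.

    Cluster means are resampled with replacement within each treatment
    group, so the cluster, not the individual, is the unit of analysis.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(means_a, k=len(means_a))
        resample_b = rng.choices(means_b, k=len(means_b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lower_idx = int((1.0 - level) / 2.0 * n_boot)
    upper_idx = int((1.0 + level) / 2.0 * n_boot) - 1
    return diffs[lower_idx], diffs[upper_idx]


if __name__ == "__main__":
    # Made-up data: each inner list holds the measurements from one family.
    treatment = [[5.1, 4.8, 5.5], [6.0, 6.2], [4.9, 5.0, 5.3, 5.1], [5.8]]
    control = [[4.0, 4.4], [4.7, 4.5, 4.9], [3.9], [4.2, 4.1, 4.6]]
    print(bootstrap_ci_difference(cluster_means(treatment),
                                  cluster_means(control)))
```

With only four clusters per group, as in this toy data, the interval would not be reliable; the text's condition that each treatment group contain a sufficiently large number of clusters applies.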
