Let us now explore the implications of these assumptions in a variety of practical testing situations including comparing the means of two populations, comparing the variances of two populations, comparing the means of three or more populations, and testing for significance in two-factor and higher-order experimental designs.
In each instance, before we choose1 a statistic, we check which assumptions are satisfied, which procedures are most robust to violation of these assumptions, and which are most powerful for a given significance level and sample size. To find the most powerful test, we determine which procedure requires the smallest sample size for given levels of Type I and Type II error.
1 Whether Republican or Democrat, Liberal or Conservative, male or female, we have the right to choose and need not be limited by what textbooks, half-remembered teacher pronouncements, or software dictate.
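As a rough illustration of comparing procedures by the sample size they require, the sketch below computes an approximate per-group sample size for fixed Type I and Type II error rates. It assumes a one-sided two-sample test under a normal approximation with a known common standard deviation; the function name and the example values are hypothetical and are not taken from the text.

```python
from math import ceil
from statistics import NormalDist

def per_group_sample_size(alpha, beta, sigma, delta):
    """Approximate n per group for a one-sided two-sample z test.

    alpha: Type I error rate, beta: Type II error rate,
    sigma: common standard deviation (assumed known),
    delta: difference in means we want to detect.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(1 - beta)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Illustrative values only: 5% Type I error, 20% Type II error,
# and a difference of half a standard deviation.
n = per_group_sample_size(alpha=0.05, beta=0.20, sigma=1.0, delta=0.5)
print(n)
```

Running the same calculation for two competing procedures, each with its own approximation, and preferring the one with the smaller n is the comparison the paragraph describes.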
52 PART II HYPOTHESIS TESTING AND ESTIMATION
VERIFY THE DATA
The first step in any analysis is to verify that the data have been entered correctly. As noted in Chapter 3, GIGO: garbage in, garbage out. A short time ago, a junior biostatistician came into my office asking for help with covariate adjustments for race. "The data for race doesn't make sense," she said. Indeed the proportions of the various races did seem incorrect. No "adjustment" could be made. Nor was there any reason to believe that race was the only variable affected. The first and only solution was to do a thorough examination of the database and, where necessary, trace the data back to its origins until all the bad data had been replaced with good.
The SAS programmer's best analysis tool is PROC MEANS. By merely examining the maximum and minimum values of all variables, it often is possible to detect data that were entered in error. Some years ago, I found that the minimum value of one essential variable was zero. I brought this to the attention of a domain expert who told me that a zero was impossible. As it turns out, the data were full of zeros, the explanation being that the executive in charge had been faking results. Of the 150 subjects in the database, only 50 were real.
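The same range check can be sketched outside SAS. The fragment below uses only the Python standard library to report the minimum and maximum of every numeric column; the function name, column names, and records are invented for illustration.

```python
def column_ranges(rows):
    """Return {column: (min, max)} over the numeric values in a list of dicts."""
    ranges = {}
    for row in rows:
        for name, value in row.items():
            if not isinstance(value, (int, float)) or isinstance(value, bool):
                continue  # skip non-numeric fields such as labels or missing values
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges

# Hypothetical records; a weight of zero should prompt a question to a domain expert.
records = [
    {"age": 34, "weight_kg": 71.5},
    {"age": 51, "weight_kg": 0.0},
    {"age": 29, "weight_kg": 64.2},
]
ranges = column_ranges(records)
for name, (lo, hi) in sorted(ranges.items()):
    print(f"{name}: min={lo}, max={hi}")
```

A check like this flags suspicious extremes only; deciding whether a value is impossible still takes someone who knows the subject matter.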
Before you begin any analysis, verify that the data have been entered correctly.
COMPARING MEANS OF TWO POPULATIONS
The most common test for comparing the means of two populations is based upon Student’s t. For Student’s t test to provide significance levels that are exact rather than approximate, all the observations must be independent and, under the null hypothesis, all the observations must come from identical normal distributions.
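For reference, the usual pooled-variance form of the two-sample t statistic can be computed as follows. This is a minimal sketch using the standard library; the function name and the sample values are hypothetical, and converting the statistic to a p value would additionally require the t distribution with the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooling is appropriate only if both samples share a common variance,
    # as they do under the null hypothesis of identical distributions.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, df

# Illustrative data only.
t_stat, df = pooled_t([12.1, 14.3, 13.8, 15.0], [10.2, 11.9, 11.1, 12.4])
print(t_stat, df)
```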
Even if the distribution is not normal, the significance level of the t test is almost exact for sample sizes greater than 12; for most of the distributions one encounters in practice,2 the significance level of the t test is usually within a percent or so of the correct value for sample sizes between 6 and 12.
There are more powerful tests than the t test for testing against nonnormal alternatives. For example, a permutation test replacing the original observations with their normal scores is more powerful than the t test (Lehmann and D’Abrera, 1988).
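The text does not say which definition of normal scores is intended. The sketch below assumes one common choice, van der Waerden-type scores, in which each observation is replaced by the standard normal quantile of rank/(n + 1); the function name and the data are hypothetical.

```python
from statistics import NormalDist

def normal_scores(values):
    """Replace each value by inv_cdf(rank / (n + 1)); ties share their average rank."""
    n = len(values)
    ordered = sorted(values)
    avg_rank = {}
    for v in set(ordered):
        first = ordered.index(v) + 1   # 1-based rank of the first occurrence
        count = ordered.count(v)       # number of tied observations
        avg_rank[v] = first + (count - 1) / 2
    z = NormalDist()
    return [z.inv_cdf(avg_rank[v] / (n + 1)) for v in values]

# Illustrative skewed data.
scores = normal_scores([3.1, 0.4, 12.7, 5.5, 0.9])
print(scores)
```

The scores, rather than the raw observations, would then serve as the input to the test statistic.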
2 Here and throughout this text, we deliberately ignore the many exceptional cases (to the delight of the true mathematician) that one is unlikely to encounter in the real world.
CHAPTER 5 TESTING HYPOTHESES: CHOOSING A TEST STATISTIC 53
Permutation tests are derived by looking at the distribution of values the test statistic would take for each of the possible assignments of treatments to subjects. For example, if in an experiment two treatments were assigned at random to six subjects so that three subjects got one treatment and three the other, there would have been a total of 20 possible assignments of treatments to subjects.3 To determine a p value, we compute for the data in hand each of the 20 possible values the test statistic might have taken. We then compare the actual value of the test statistic with these 20 values. If our test statistic corresponds to the most extreme value, we say that p = 1/20 = 0.05 (or 1/10 = 0.10 if this is a two-tailed permutation test).
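The six-subject example can be enumerated directly. The sketch below is one way to do so, assuming the test statistic is the sum of the responses in the first treatment group and that large values are extreme; the function name and the responses are hypothetical.

```python
from itertools import combinations

def permutation_p_value(treated, control):
    """One-sided exact permutation p value using the sum of the treated group.

    Enumerates every way of assigning len(treated) of the pooled observations
    to the first treatment and counts the assignments whose sum is at least
    as large as the one observed.
    """
    pooled = list(treated) + list(control)
    k = len(treated)
    observed = sum(treated)
    sums = [sum(pooled[i] for i in idx)
            for idx in combinations(range(len(pooled)), k)]
    as_extreme = sum(1 for s in sums if s >= observed)
    return as_extreme / len(sums), len(sums)

# Hypothetical responses: the three largest values all fall in the first group.
p, n_assignments = permutation_p_value([121, 118, 110], [34, 12, 22])
print(n_assignments, p)
```

With three of six subjects in each group there are 20 assignments, and because the observed assignment gives the single most extreme sum, the one-sided p value is 1/20.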
Against specific normal alternatives, this two-sample permutation test provides a most powerful unbiased test of the distribution-free hypothesis that the centers of the two distributions are the same (Lehmann, 1986, p. 239). For large samples, its power against normal alternatives is almost the same as that of Student’s t test (Albers, Bickel, and van Zwet, 1976). Against other distributions, by appropriate choice of the test statistic, its power can be superior (Lambert, 1985; Maritz, 1996).
When the logic of a situation calls for demonstration of similarity rather than differences among responses to various treatments, then equivalence tests are often more relevant than tests with traditional no-effect null hypotheses (Anderson and Hauck, 1986; Dixon, 1998, pp. 257-301).
Two distributions F and G such that G[x] = F[x - d] are said to be equivalent provided that |d| < A, where A is the smallest difference of clinical significance. To test for equivalence, we obtain a confidence interval for d, rejecting equivalence only if this interval contains values that exceed A in absolute value. The width of a confidence interval decreases as the sample size increases; thus a very large sample may be required to demonstrate equivalence just as a very large sample may be required to demonstrate a clinically significant effect.
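The interval-based decision rule can be sketched as follows. The sketch assumes that the shift d is estimated by the difference in sample means and that a large-sample normal approximation gives the interval; the 90% level, the function names, and the data are likewise assumptions made for illustration, not prescriptions from the text.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def shift_confidence_interval(x, y, level=0.90):
    """Normal-approximation confidence interval for d = mean(y) - mean(x)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    d_hat = mean(y) - mean(x)
    return d_hat - z * se, d_hat + z * se

def equivalent(x, y, A, level=0.90):
    """Accept equivalence only if the whole interval lies inside (-A, A)."""
    lo, hi = shift_confidence_interval(x, y, level)
    return -A < lo and hi < A

# Illustrative samples from F (x) and G (y).
x = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.0, 9.7]
y = [10.2, 9.9, 10.4, 10.0, 10.1, 10.3, 9.9, 10.0]
print(shift_confidence_interval(x, y))
print(equivalent(x, y, A=0.5), equivalent(x, y, A=0.2))
```

With these eight observations per group the interval fits inside a margin of 0.5 but not inside a margin of 0.2, which illustrates why a tight margin calls for a larger sample.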