For testing $H_2$: $\beta_j = 0$ for all $j$, choose one of the following test statistics, as we did in the section on one-way analysis: $F_{22} = \sum_j \left(\sum_i \sum_k X_{ijk}\right)^2$, $F_{21} = \sum_j \left|\sum_i \sum_k X_{ijk}\right|$, or $R_2 = \sum_j g[j] \sum_i \sum_k X_{ijk}$, where $g[j]$ is a monotone function of $j$, and determine the distribution of its values with respect to the rearrangements in $P_C$.
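As a rough illustration, the following sketch computes the three column-effect statistics and a Monte Carlo permutation p value. It reads F22, F21, and R2 as the sum of squared column totals, the sum of absolute column totals, and a g-weighted sum of column totals. The rearrangement scheme is an assumption: observations are simply shuffled among columns within each row, which is a simplification and not necessarily the exact set of rearrangements the text has in mind. All function names and the data layout are hypothetical.

```python
import random

def column_totals(x):
    """x[i][j] is the list of observations in row i, column j."""
    n_cols = len(x[0])
    return [sum(sum(row[j]) for row in x) for j in range(n_cols)]

def f22(x):
    # Sum over columns of the squared column total.
    return sum(t ** 2 for t in column_totals(x))

def f21(x):
    # Sum over columns of the absolute column total.
    return sum(abs(t) for t in column_totals(x))

def r2(x, g=lambda j: j):
    # Weighted sum of column totals; g must be monotone in j (identity assumed here).
    return sum(g(j) * t for j, t in enumerate(column_totals(x), start=1))

def shuffle_within_rows(x, rng):
    """Reassign each row's observations to its columns at random, keeping cell sizes."""
    out = []
    for row in x:
        pooled = [v for cell in row for v in cell]
        rng.shuffle(pooled)
        new_row, pos = [], 0
        for cell in row:
            new_row.append(pooled[pos:pos + len(cell)])
            pos += len(cell)
        out.append(new_row)
    return out

def permutation_p_value(x, statistic, n_resamples=2000, seed=0):
    """Share of rearrangements whose statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = statistic(x)
    hits = 0
    for _ in range(n_resamples):
        if statistic(shuffle_within_rows(x, rng)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_resamples + 1)

# Hypothetical 2 x 2 design with two observations per cell.
data = [[[1, 2], [11, 12]],
        [[2, 3], [12, 13]]]
p_f22 = permutation_p_value(data, f22)
```

Under this within-row scheme the grand total is fixed, so F21 is constant when all column totals have the same sign; centering the observations first would be needed for it to discriminate.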
Tests for the parameters of three-way and higher-order experimental designs can be obtained via the same approach; use a multidimensional lattice and such additional multivalued properties of the balls as charm and spin. Proofs may be seen at http://users.oco.net/drphilgood/resamp.htm.
Unbalanced designs with unequal numbers per cell may result from unanticipated losses during the conduct of an experiment or survey (or from an extremely poor initial design). There are two approaches to their analysis:
CHAPTER 5 TESTING HYPOTHESES: CHOOSING A TEST STATISTIC 69
Permutation tests can be applied to unbalanced as well as balanced experimental designs, provided only that there are sufficient observations in each cell to avoid confounding of the main effects and interactions. Even in this latter case, exact permutation tests are available; see Pesarin [2001, p. 237].
First, [...] observations, recognizing that the results may be somewhat tainted.
Second, we might bootstrap along one of the following lines:
• If only one or two observations are missing, create a balanced design by discarding observations at random; repeat to obtain a distribution of p values (Baker, 1995).
• If there are actual holes in the design, so that there are missing combinations, create a test statistic that does not require the missing data. Obtain its distribution by bootstrap means. See Good [2000, pp. 68-70] for an example.
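The first bullet can be sketched roughly as follows: trim every cell to the size of the smallest cell by discarding observations at random, compute a p value for the balanced data, and repeat. The test applied to each balanced data set is an assumption made for illustration only (a one-way Monte Carlo permutation test on the sum of squared cell totals); the function names and the dictionary layout are hypothetical.

```python
import random

def balance_at_random(cells, rng):
    """Trim every cell to the smallest cell size by discarding observations at random."""
    n_min = min(len(obs) for obs in cells.values())
    return {name: rng.sample(obs, n_min) for name, obs in cells.items()}

def one_way_p_value(cells, rng, n_resamples=500):
    """Monte Carlo permutation p value; statistic = sum of squared cell totals."""
    groups = list(cells.values())
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]

    def stat(values):
        total, pos = 0.0, 0
        for n in sizes:
            total += sum(values[pos:pos + n]) ** 2
            pos += n
        return total

    observed = stat(pooled)  # computed before any shuffling
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if stat(pooled) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_resamples + 1)

def p_value_distribution(cells, n_repeats=200, seed=0):
    """Repeat the random balancing to obtain a distribution of p values."""
    rng = random.Random(seed)
    return [one_way_p_value(balance_at_random(cells, rng), rng)
            for _ in range(n_repeats)]

# Hypothetical unbalanced data: cell "a" has one observation too many.
cells = {"a": [1, 2, 3, 4, 5], "b": [11, 12, 13, 14], "c": [21, 22, 23, 24]}
p_values = p_value_distribution(cells, n_repeats=20)
```

Reporting the whole list (or its range), not a single favorable value, keeps the analysis honest about the effect of the random discarding.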
A major source of error in the analysis of contingency tables is to associate the Pearson chi-square statistic, a quite useful measure of the difference between observed and expected values, with the chi-square distribution. The latter is the distribution of $Z^2$, where $Z$ has the standard normal distribution.
Just as the means of very large samples have almost normal distributions, so the means of very large numbers of squared values tend to almost chi-square distributions. Pearson’s chi-square statistic is no exception to the rule. If the probabilities of an observation falling in a particular cell of a contingency table are roughly the same for all rows and columns, then convergence to the chi-square distribution can be quite rapid. But for sparse tables, the chi-square distribution can be quite misleading (Delucchi, 1983).
We recommend using an exact permutation procedure, particularly now that software for a variety of testing situations is both commercially and freely available (see footnote 8). As in Fisher, we determine the proportion of tables with the same marginals that are as extreme as, or more extreme than, our original table.
The problem lies in defining what is meant by “extreme.” The errors lie in failing to report how we arrived at our definition.
For example, in obtaining a two-tailed test for independence in a 2 × 2 contingency table, we can treat each table strictly in accordance with its probability under the multinomial distribution (Fisher’s method) or weight each table by the value of the Pearson chi-square statistic for that table. The situation is even more complicated with general R × C tables where a dozen different statistics compete for our attention.
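To make the two definitions of "extreme" concrete, here is a minimal sketch for a 2 × 2 table. It enumerates every table with the observed marginals, assigns each its probability conditional on those marginals (which is hypergeometric), and then sums probabilities in two ways: over tables no more probable than the observed one, and over tables whose Pearson chi-square statistic is at least as large as the observed one. The function names are hypothetical, and the sketch assumes all four marginal totals are positive.

```python
from math import comb

def tables_with_same_margins(a, b, c, d):
    """Yield (table, probability) for every 2 x 2 table sharing the observed margins."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        prob = comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)
        yield (x, r1 - x, c1 - x, r2 - c1 + x), prob

def pearson_chi_square(a, b, c, d):
    """Pearson chi-square statistic; assumes every marginal total is positive."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    return sum((observed[i][j] - rows[i] * cols[j] / n) ** 2
               / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))

def exact_p_values(a, b, c, d, tol=1e-9):
    """Two-tailed exact p values under two definitions of 'extreme'."""
    tables = list(tables_with_same_margins(a, b, c, d))
    p_obs = next(p for t, p in tables if t == (a, b, c, d))
    chi_obs = pearson_chi_square(a, b, c, d)
    by_probability = sum(p for t, p in tables if p <= p_obs * (1 + tol))
    by_chi_square = sum(p for t, p in tables
                        if pearson_chi_square(*t) >= chi_obs * (1 - tol))
    return by_probability, by_chi_square

# Hypothetical table: rows are treatments, columns are outcomes.
p_by_probability, p_by_chi_square = exact_p_values(3, 1, 1, 3)
```

The two orderings agree for this symmetric table but need not agree in general, which is why the definition used should be reported.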
8 Examples include StatXact® from http://www.cytel.com, RT from www.west-inc.com, NPC Test from http://www.methodologica.it, and R (freeware) from http://www.r-project.org.
70 PART II HYPOTHESIS TESTING AND ESTIMATION
The chief errors in practice lie in failing to report all of the following:
• Whether we used a one-tailed or two-tailed test and why.
• Whether the categories are ordered or unordered.
• Which statistic was employed and why.
Chapter 9 contains a discussion of a final, not inconsiderable source of error, the neglect of confounding variables that may be responsible for creating an illusory association or concealing an association that actually exists.
Violation of assumptions can affect not only the significance level of a test but the power of the test as well; see Tukey and McLaughlin, and Box and Tiao. For example, while the significance level of the t test is robust to departures from normality, the power of the t test is not. Thus, the two-sample permutation test may always be preferable.
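A two-sample permutation test of the kind mentioned above might look like the following sketch. It enumerates every way of splitting the pooled observations into two groups of the original sizes, so it is exact but only practical for small samples. The choice of the absolute difference in means as the test statistic, and the function name, are assumptions made for illustration.

```python
from itertools import combinations

def two_sample_permutation_p_value(x, y):
    """Exact two-sided p value using the absolute difference in group means."""
    pooled = list(x) + list(y)
    n = len(x)
    m = len(pooled) - n
    total = sum(pooled)

    def abs_diff(sum_x):
        # Mean of the first group minus mean of the remaining observations.
        return abs(sum_x / n - (total - sum_x) / m)

    observed = abs_diff(sum(x))
    count = extreme = 0
    for idx in combinations(range(len(pooled)), n):
        count += 1
        if abs_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            extreme += 1
    return extreme / count

# Hypothetical data: two small, clearly separated samples.
p = two_sample_permutation_p_value([1, 2, 3], [10, 11, 12])
```

With three observations per group there are only 20 possible splits, so the smallest attainable two-sided p value here is 2/20; larger samples would typically call for Monte Carlo sampling of the rearrangements instead of full enumeration.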
If blocking (including matched pairs) was used in the original design, then the same division into blocks should be employed in the analysis. Confounding factors such as sex, race, and diabetic condition can easily mask the effect we hoped to measure through the comparison of two samples. Similarly, an overall risk factor can be totally misleading (Gigerenzer, 2002). Blocking reduces the differences between subjects so that differences between treatment groups stand out, that is, if the appropriate analysis is used. Thus, paired data should always be analyzed with the paired t test or its permutation equivalent, not with the group t test.
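One common form of the permutation equivalent of the paired t test is the sign-flip test sketched below: under the null hypothesis the two measurements within a pair are exchangeable, so each within-pair difference is equally likely to carry either sign. The use of the absolute sum of differences as the statistic, and the function name, are assumptions made for illustration.

```python
from itertools import product

def paired_permutation_p_value(before, after):
    """Exact two-sided p value from all 2**n sign assignments of the paired differences."""
    diffs = [a - b for b, a in zip(before, after)]
    observed = abs(sum(diffs))
    count = extreme = 0
    for signs in product((1, -1), repeat=len(diffs)):
        count += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / count

# Hypothetical matched pairs: large differences between subjects,
# small but consistent differences within each pair.
before = [10, 20, 30, 40, 50]
after = [11, 22, 33, 44, 55]
p = paired_permutation_p_value(before, after)
```

Because the analysis works only with within-pair differences, the large spread between subjects in this example does not enter the statistic, which is the point of retaining the blocking in the analysis.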