# Common Errors in Statistics and How to Avoid Them - Good P.I


As a consequence of these properties, the variance of a sum of independent normally distributed random variables can be decomposed into the sum of a series of independent chi-square variables. We use these independent variables in the analysis of variance (ANOVA) to construct a series of independent tests of the model parameters.

Unfortunately, even slight deviations from normality negate these properties; not only are ANOVA p values in error because they are taken from the wrong distribution, but they are in error because the various tests are interdependent.

When constructing a permutation test for multifactor designs, we must also proceed with great caution for fear that the resulting tests will be interdependent.

The residuals in a two-way complete experimental design are not exchangeable even if the design is balanced as they are both correlated and functions of all the data (Lehmann and D’Abrera, 1988). To see this, suppose our model is $X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$, where $\sum_i \alpha_i = \sum_j \beta_j = \sum_i \gamma_{ij} = \sum_j \gamma_{ij} = 0$.

Eliminating the main effects in the traditional manner, that is, setting $X'_{ijk} = X_{ijk} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{X}_{...}$, one obtains the test statistic $I$ first derived by Still and White [1981]. A permutation test based on the statistic $I$ will not be exact because even if the error terms $\{\epsilon_{ijk}\}$ are exchangeable, the residuals $X'_{ijk} = \epsilon_{ijk} - \bar{\epsilon}_{i..} - \bar{\epsilon}_{.j.} + \bar{\epsilon}_{...}$ are weakly correlated, with the correlation depending on the subscripts.
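The dependence of the correlation on the subscripts can be checked numerically. The sketch below is illustrative only: it assumes a 2 × 3 layout with three observations per cell and, for simplicity, uncorrelated errors with equal variance. The names `residuals`, `M`, and `corr` are hypothetical helpers, not part of any package.

```python
from itertools import product

I, J, K = 2, 3, 3  # rows, columns, observations per cell (assumed layout)
cells = list(product(range(I), range(J), range(K)))

def residuals(x):
    """Return x_ijk - rowmean_i - colmean_j + grandmean for x keyed by (i, j, k)."""
    grand = sum(x.values()) / (I * J * K)
    row = [sum(x[i, j, k] for j in range(J) for k in range(K)) / (J * K)
           for i in range(I)]
    col = [sum(x[i, j, k] for i in range(I) for k in range(K)) / (I * K)
           for j in range(J)]
    return {(i, j, k): x[i, j, k] - row[i] - col[j] + grand for (i, j, k) in cells}

# The residual map is linear; column c of its matrix M is the residual
# vector obtained from the unit vector at position c.
M = {}
for c in cells:
    unit = {d: 1.0 if d == c else 0.0 for d in cells}
    r = residuals(unit)
    for d in cells:
        M[d, c] = r[d]

def cov(p, q):
    """Covariance of residuals p and q, up to sigma^2: entry (p, q) of M M^T."""
    return sum(M[p, c] * M[q, c] for c in cells)

def corr(p, q):
    return cov(p, q) / (cov(p, p) * cov(q, q)) ** 0.5

same_cell = corr((0, 0, 0), (0, 0, 1))  # same row, same column
same_row = corr((0, 0, 0), (0, 1, 0))   # same row, different column
same_col = corr((0, 0, 0), (1, 0, 0))   # different row, same column
neither = corr((0, 0, 0), (1, 1, 0))    # different row, different column
print(same_cell, same_row, same_col, neither)
```

Under these assumptions the four correlations are all small but all different, so relabeling the residuals changes their joint distribution, which is what makes them non-exchangeable.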

Nonetheless, the literature is filled with references to permutation tests for the two-way and higher-order designs that produce misleading values. Included in this category are those permutation tests based on the ranks of the observations that may be found in many statistics software packages.

CHAPTER 5 TESTING HYPOTHESES: CHOOSING A TEST STATISTIC 67

FIGURE 5.1 A 2 × 3 Design with Three Observations per Cell.

FIGURE 5.2 A 2 × 3 Design with Three Observations per Cell after π ∈ PR.

The recent efforts of Salmaso [2003] and Pesarin [2001] have resulted in a breakthrough that extends to higher-order designs. The key lies in the concept of weak exchangeability with respect to a subset of the possible permutations. The simplified discussion of weak exchangeability presented here is abstracted from Good [2003].

Think of the set of observations {Xijk} in terms of a rectangular lattice L with K colored, shaped balls at each vertex. All the balls in the same column have the same color initially, a color which is distinct from the color of the balls in any other column. All the balls in the same row have the same shape initially, a shape which is distinct from the shape of the balls in any other row. See Fig. 5.1.

Let P denote the set of rearrangements or permutations that preserve the number of balls at each row and column of the lattice. P is a group.7

Let PR denote the set of exchanges of balls among rows and within columns which (a) preserve the number of balls at each row and column of the lattice and (b) result in the numbers of each shape within each row being the same in each column. PR is the basis of a subgroup of P. See Fig. 5.2.
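One way to picture an exchange of this kind is sketched below for the two-row case only. This is an assumption-laden illustration, not the authors' algorithm: the function `exchange_rows` and the parameter `nu` (the number of balls swapped in every column) are hypothetical names, and each ball is labeled with its original (row, column, position) so that its origin stays visible after the move.

```python
import random

def exchange_rows(lattice, nu, rng):
    """Swap nu randomly chosen balls between row 0 and row 1 within every column.

    lattice[i][j] is the list of K balls at row i, column j (two rows assumed).
    Using the same nu in every column keeps the exchange synchronized.
    """
    new = [[list(cell) for cell in row] for row in lattice]
    for j in range(len(new[0])):
        k = len(new[0][j])
        up = rng.sample(range(k), nu)    # positions leaving row 0
        down = rng.sample(range(k), nu)  # positions leaving row 1
        for a, b in zip(up, down):
            new[0][j][a], new[1][j][b] = new[1][j][b], new[0][j][a]
    return new

# A 2 x 3 lattice with three balls per vertex; ball = (row, column, position).
lattice = [[[(i, j, k) for k in range(3)] for j in range(3)] for i in range(2)]
moved = exchange_rows(lattice, nu=1, rng=random.Random(0))

# (a) every vertex still holds three balls
sizes = [[len(cell) for cell in row] for row in moved]
# (b) the count of row-0 balls now sitting in row 1 is the same in each column
foreign = [sum(1 for ball in moved[1][j] if ball[0] == 0) for j in range(3)]
# balls move among rows only, never out of their column
in_column = all(ball[1] == j
                for row in moved for j, cell in enumerate(row) for ball in cell)
print(sizes, foreign, in_column)
```

Exchanges among columns and within rows could be sketched the same way by swapping the roles of the two indices.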

Let PC denote the set of exchanges of balls among columns and within rows which (a) preserve the number of balls at each row and column of the lattice and (b) result in the numbers of each color within each column being the same in each row. PC is the basis of a subgroup of P. See Fig. 5.3.

Let PRC denote the set of exchanges of balls that preserve the number of balls at each row and column of the lattice and which result in (a) an exchange of balls between both rows and columns (or no exchange at all), (b) the numbers of each color within each column being the same in each row, and (c) the numbers of each shape within each row being the same in each column. PRC is the basis of a subgroup of P.

7 See Hungerford [1974] or http://members.tripod.com/~dogschool/ for a thorough discussion of algebraic group properties.

68 PART II HYPOTHESIS TESTING AND ESTIMATION

FIGURE 5.3 A 2 × 3 Design with Three Observations per Cell after π ∈ PC.

The only element these three subgroups PRC, PR, and PC have in common is the rearrangement that leaves the observations with the same row and column labels they had to begin with. As a result, tests based on these three different subsets of permutations are independent of one another.

For testing H3: $\gamma_{ij} = 0$ for all $i$ and $j$, determine the distribution of the values of $S = \sum_{1 \le i < i' \le I_1} \sum_{1 \le j < j' \le I_2} (X_{ij} + X_{i'j'} - X_{i'j} - X_{ij'})$ with respect to the rearrangements in PRC. If the value of S for the observations as they were originally labeled is not an extreme value of this permutation distribution, then we can accept the hypothesis H3 of no interactions and proceed to test for main effects.
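The interaction statistic itself is simple to compute. The sketch below evaluates S for a table of cell summaries; treating each entry as the mean of the observations in that cell is an assumption made here for illustration, and `interaction_statistic` is a hypothetical name. It covers only the statistic, not the generation of rearrangements in PRC.

```python
def interaction_statistic(cell_means):
    """Sum the contrasts X[i][j] + X[i'][j'] - X[i'][j] - X[i][j'] over i < i', j < j'."""
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    total = 0.0
    for i in range(n_rows):
        for ip in range(i + 1, n_rows):
            for j in range(n_cols):
                for jp in range(j + 1, n_cols):
                    total += (cell_means[i][j] + cell_means[ip][jp]
                              - cell_means[ip][j] - cell_means[i][jp])
    return total

# Purely additive table (row effect plus column effect): every contrast cancels.
additive = [[1.0, 2.0, 3.0],
            [5.0, 6.0, 7.0]]
# Same first row, but the last cell of the second row departs from additivity.
interacting = [[1.0, 2.0, 3.0],
               [1.0, 2.0, 6.0]]

s_additive = interaction_statistic(additive)        # 0.0
s_interacting = interaction_statistic(interacting)  # 6.0
print(s_additive, s_interacting)
```

Each contrast removes the row and column effects, so only the interaction terms and the errors contribute to S.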

For testing H1: $\alpha_i = 0$ for all $i$, choose one of the following test statistics as we did in the section on one-way analysis, $F_{12} = \sum_i (\sum_j \sum_k X_{ijk})^2$, $F_{11} = \sum_i |\sum_j \sum_k X_{ijk}|$, or $R_1 = \sum_i g[i] \sum_j \sum_k X_{ijk}$, where $g[i]$ is a monotone function of $i$, and determine the distribution of its values with respect to the rearrangements in PR.
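The main-effects test can be sketched as a Monte Carlo procedure. The code below is a rough illustration under several assumptions that are not in the text: the design has exactly two rows, rearrangements are drawn at random rather than enumerated, the number swapped per column is weighted so that each distinct rearrangement is equally likely, and the data are invented. All function names are hypothetical.

```python
import random
from math import comb

def row_totals(lattice):
    """lattice[i][j] is the list of observations in row i, column j."""
    return [sum(sum(cell) for cell in row) for row in lattice]

def f12(lattice):
    return sum(t ** 2 for t in row_totals(lattice))

def f11(lattice):
    return sum(abs(t) for t in row_totals(lattice))

def r1(lattice, g):
    """g is a monotone function of the row index."""
    return sum(g(i) * t for i, t in enumerate(row_totals(lattice)))

def random_row_exchange(lattice, rng):
    """Swap the same number nu of observations between the two rows in every column."""
    n_cols, k = len(lattice[0]), len(lattice[0][0])
    # comb(k, nu) ** (2 * n_cols) distinct rearrangements exist for each nu.
    weights = [comb(k, nu) ** (2 * n_cols) for nu in range(k + 1)]
    nu = rng.choices(range(k + 1), weights=weights)[0]
    new = [[list(cell) for cell in row] for row in lattice]
    for j in range(n_cols):
        up = rng.sample(range(k), nu)
        down = rng.sample(range(k), nu)
        for a, b in zip(up, down):
            new[0][j][a], new[1][j][b] = new[1][j][b], new[0][j][a]
    return new

def permutation_p_value(lattice, statistic, n_draws, rng):
    """Share of sampled rearrangements whose statistic is at least the observed one."""
    observed = statistic(lattice)
    hits = sum(statistic(random_row_exchange(lattice, rng)) >= observed
               for _ in range(n_draws))
    return (hits + 1) / (n_draws + 1)  # the observed labeling counts as one draw

# Invented 2 x 3 data, three observations per cell, with a large row effect.
data = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[11, 12, 13], [14, 15, 16], [17, 18, 19]],
]
p_value = permutation_p_value(data, f12, n_draws=2000, rng=random.Random(1))
print(p_value)
```

With positive data and a fixed grand total, `f11` as written takes the same value for every rearrangement, so `f12` is used in the example. Designs with more than two rows would need a more general exchange scheme than the one assumed here.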
