Common Errors in Statistics and How to Avoid Them
Good, P.I. Wiley, 2003. 235 p.

For example, for testing against K0, Lehmann [1999, p. 372] recommends the use of the Jonckheere-Terpstra statistic, the number of pairs in which an observation from one group is less than an observation from a higher-dose group. The penalty we pay for using this statistic and ignoring the actual values of the observations is a marked reduction in power for small samples and a less pronounced loss for larger ones.
If there are just two samples, the test based on the Jonckheere-Terpstra statistic is identical to the Mann-Whitney test. For very large samples, with identically distributed observations in both samples, 100 observations would be needed with this test to obtain the same power as a permutation
test based on the original values of 95 observations. This is not a price one would want to pay in human or animal experiments.
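The Jonckheere-Terpstra statistic as defined above (a count over between-group pairs) is straightforward to compute directly. The following is a minimal sketch; the function name and the half-credit convention for ties are our own choices, not the book's:

```python
import numpy as np

def jonckheere_terpstra(*groups):
    """Jonckheere-Terpstra statistic: the number of pairs in which an
    observation from one group is less than an observation from a
    higher-dose group. Ties are counted as 1/2 (a common convention)."""
    jt = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for x in np.asarray(groups[i]):
                for y in np.asarray(groups[j]):
                    if x < y:
                        jt += 1.0
                    elif x == y:
                        jt += 0.5
    return jt

# Three ordered dose groups: every between-group pair is increasing,
# so all 12 pairs count.
print(jonckheere_terpstra([1, 2], [3, 4], [5, 6]))  # 12.0
```

With exactly two groups, this count is the Mann-Whitney U statistic, consistent with the remark above that the two tests then coincide.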
Similar caveats hold for the parametric ANOVA approach to the analysis of a two-factor experimental design, with two additions:
1. The sample sizes must be the same in each cell; that is, the design must be balanced.
2. A test for interaction must precede any test for main effects.
Imbalance in the design will result in the confounding of main effects with interactions. Consider the following two-factor model for crop yield:
X_ijk = μ + α_i + β_j + γ_ij + ε_ijk
Now suppose that the observations in a two-factor experimental design are normally distributed as in the following diagram taken from Cornfield and Tukey (1956):
N(0,1) | N(2,1)
N(2,1) | N(0,1)
There are no main effects in this example—both row means and both column means have the same expectations, but there is a clear interaction represented by the two nonzero off-diagonal elements.
If the design is balanced, with equal numbers per cell, the lack of significant main effects and the presence of a significant interaction should and will be confirmed by our analysis. But suppose that the design is not in balance, that for every 10 observations in the first column, we have only one observation in the second. Because of this imbalance, when we use the F ratio or equivalent statistic to test for the main effect, we will uncover a false “row” effect that is actually due to the interaction between rows and columns. The main effect is confounded with the interaction.
If a design is unbalanced as in the preceding example, we cannot test for a “pure” main effect or a “pure” interaction. But we may be able to test for the combination of a main effect with an interaction by using the statistic that we would use to test for the main effect alone. This combined effect will not be confounded with the main effects of other unrelated factors.
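The confounding described above can be seen without any random sampling by looking at expected row averages under the Cornfield-Tukey cell means. The sketch below is ours (the function name and cell counts are illustrative assumptions); it shows that with a 10:1 column imbalance, the pooled row averages differ even though neither row has a true main effect:

```python
import numpy as np

# Cell expectations from the Cornfield-Tukey diagram: no row or column
# main effects, but a pure interaction.
cell_means = np.array([[0.0, 2.0],
                       [2.0, 0.0]])

# Balanced design: equal cell counts.
balanced_n = np.array([[10, 10],
                       [10, 10]])

# Unbalanced design: ten observations in column 1 for every one in column 2.
unbalanced_n = np.array([[10, 1],
                         [10, 1]])

def expected_row_means(means, counts):
    """Expected per-row averages when the cells are pooled with the
    given cell counts."""
    return (means * counts).sum(axis=1) / counts.sum(axis=1)

print(expected_row_means(cell_means, balanced_n))    # equal: [1. 1.]
print(expected_row_means(cell_means, unbalanced_n))  # unequal: [2/11, 20/11]
```

Under balance the two row averages agree (both 1.0), so no spurious row effect can arise; under the 10:1 imbalance they are roughly 0.18 and 1.82, which an F test for the row main effect will read as a "row" effect that is really the interaction.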
Whether or not the design is balanced, the presence of an interaction may zero out a cofactor-specific main effect or make such an effect impossible to detect. More important, the presence of a significant interaction may render the concept of a single “main effect” meaningless. For example, suppose we decide to test the effect of fertilizer and sunlight on plant growth. With too little sunlight, a fertilizer would be completely ineffective. Its effects only appear when sufficient sunlight is present. Aspirin and warfarin can both reduce the likelihood of repeated heart attacks when used alone; you don’t want to mix them!
Gunter Hartel offers the following example: Using five observations per cell and random normals as indicated in Cornfield and Tukey’s diagram, a two-way ANOVA without interaction yields the following results:
Source    df   Sum of Squares   F Ratio   Prob > F
Row        1       0.15590273    0.0594     0.8104
Col        1       0.10862944    0.0414     0.8412
Error     17      44.639303
Adding the interaction term yields
Source    df   Sum of Squares   F Ratio   Prob > F
Row        1       0.155903      0.1012     0.7545
Col        1       0.108629      0.0705     0.7940
Row*Col    1      19.986020     12.9709     0.0024
Error     16      24.653283
Expanding the first row of the experiment to 80 observations rather than 10, the main-effects-only table becomes:
Source    df   Sum of Squares   F Ratio   Prob > F
Row        1       0.080246      0.0510     0.8218
Col        1      57.028458     36.2522     <.0001
Error     88     138.43327
But with the interaction term it is:
Source    df   Sum of Squares   F Ratio   Prob > F
Row        1       0.075881      0.0627     0.8029
Col        1       0.053909      0.0445     0.8333
Row*Col    1      33.145790     27.3887     <.0001
Error     87     105.28747
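For the balanced case, the sum-of-squares decomposition behind tables like Hartel's can be computed by hand. The sketch below is ours: it handles only balanced layouts (equal replicates per cell), which is exactly why the unbalanced runs above require a different, regression-based fit, and since Hartel's random normal draws are not reproduced here it will not recreate his exact numbers:

```python
import numpy as np

def twoway_anova_ss(X):
    """Sums of squares for a balanced two-way layout.

    X has shape (rows, cols, n), with n replicates per cell."""
    a, b, n = X.shape
    grand = X.mean()
    row = X.mean(axis=(1, 2))       # row means
    col = X.mean(axis=(0, 2))       # column means
    cell = X.mean(axis=2)           # cell means
    ss_row = n * b * np.sum((row - grand) ** 2)
    ss_col = n * a * np.sum((col - grand) ** 2)
    inter = cell - row[:, None] - col[None, :] + grand
    ss_int = n * np.sum(inter ** 2)
    ss_err = np.sum((X - cell[:, :, None]) ** 2)
    return {"row": ss_row, "col": ss_col, "row*col": ss_int, "error": ss_err}

# A noise-free version of the Cornfield-Tukey pattern: constant cells
# [[0, 2], [2, 0]] with two replicates each. All of the variation lands
# in the interaction term, none in the main effects.
X = np.zeros((2, 2, 2))
X[0, 1, :] = 2.0
X[1, 0, :] = 2.0
print(twoway_anova_ss(X))
```

Note that omitting the interaction term, as in the first table, folds SS(Row*Col) into the error line: 24.65 + 19.99 is the 44.64 shown for error in the no-interaction fit.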
Independent Tests
Normally distributed random variables (as in Figure 7.1) have some remarkable properties:
• The sum (or difference) of two independent normally distributed random variables is a normally distributed random variable.
• The square of a normally distributed random variable has the chi-square distribution (to within a multiplicative constant); the sum of two variables with the chi-square distribution also has a chi-square distribution (with additional degrees of freedom).
• A variable with the chi-square distribution can be decomposed into the sum of several independent chi-square variables.
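Each of the three properties above can be checked empirically by simulation; the following sketch (sample size and seed are arbitrary choices of ours) compares sample moments against the theoretical means and variances:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Sum of independent normals: N(1, 2^2) + N(3, 4^2) ~ N(4, 20)
x = rng.normal(1, 2, n) + rng.normal(3, 4, n)
print(x.mean(), x.var())    # near 4 and 20

# Square of a standard normal ~ chi-square with 1 df (mean 1, variance 2)
z2 = rng.standard_normal(n) ** 2
print(z2.mean(), z2.var())  # near 1 and 2

# Sum of two independent chi-square(1) variables ~ chi-square(2)
w = rng.standard_normal(n) ** 2 + rng.standard_normal(n) ** 2
print(w.mean(), w.var())    # near 2 and 4
```

The simulation only checks the first two moments, of course; the full distributional claims are classical results and can be verified with a goodness-of-fit test if desired.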