# Common Errors in Statistics and How to Avoid Them - Good P.I


To analyze a block design (for example, where we have sampled separately from whites, blacks, and Hispanics), the permutation test statistic is $S = \sum_{b=1}^{B} \sum_{j} x_{bj}$, where $x_{bj}$ is the $j$th observation in the control sample in the $b$th block, and the rearranging of labels between control and treated samples takes place separately and independently within each of the $B$ blocks (Good, 2001, p. 124).
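As a rough illustration of the blocked permutation test just described, the sketch below computes S as the sum of the control observations over all blocks and approximates the permutation distribution by shuffling labels within each block only. The function names, the Monte Carlo approximation, the one-sided direction (large S counts as evidence), and the example data are all assumptions made for illustration; they are not taken from Good (2001).

```python
import random


def block_statistic(blocks):
    """S = sum over blocks of the sum of the control observations."""
    return sum(sum(control) for control, _treated in blocks)


def block_permutation_test(blocks, n_resamples=2000, seed=0):
    """Monte Carlo p-value for S, relabeling within each block only.

    `blocks` is a list of (control, treated) pairs of lists.
    Assumption: one-sided test, large S is evidence against the null.
    """
    rng = random.Random(seed)
    observed = block_statistic(blocks)
    hits = 0
    for _ in range(n_resamples):
        s = 0.0
        for control, treated in blocks:
            pooled = control + treated      # new list; inputs stay untouched
            rng.shuffle(pooled)             # rearrange labels inside this block
            s += sum(pooled[:len(control)])
        # Small tolerance guards against floating-point summation order.
        if s >= observed - 1e-9:
            hits += 1
    # Count the observed arrangement itself among the rearrangements.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical data: two blocks, each a (control, treated) pair.
example_blocks = [
    ([12.1, 11.4, 13.0], [9.8, 10.2, 9.5]),
    ([15.2, 14.8], [13.9, 14.1, 13.5]),
]
p_value = block_permutation_test(example_blocks)
print("S =", block_statistic(example_blocks), "p =", p_value)
```

Because labels never cross block boundaries, differences between blocks (here, the higher overall level of the second block) cannot masquerade as a treatment effect.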

Blocking can also be used after the fact if you suspect the existence of confounding variables and if you measured the values of these variables as you were gathering data.⁹

Always be sure your choice of statistic is optimal against the alternative hypotheses of interest for the appropriate loss function.

To avoid using an inferior, less sensitive, and possibly inaccurate statistical procedure, pay heed to another admonition from George Dyke [1997]: The availability of user-friendly statistical software has caused authors to become increasingly careless about the logic of interpreting their results, and to rely uncritically on computer output, often using the default option when something a little different (usually, but not always, a little more complicated) is correct, or at least more appropriate.

⁹ This recommendation applies only to a test of efficacy for all groups (blocks) combined. p values for subgroup analyses performed after the fact are still suspect; see Chapter 1.

CHAPTER 5 TESTING HYPOTHESES: CHOOSING A TEST STATISTIC 71

MULTIPLE TESTS

When we perform multiple tests in a study, there may not be journal room (nor interest) to report all the results, but we do need to report the total number of statistical tests performed so that readers can draw their own conclusions as to the significance of the results that are reported.

We may also wish to correct the reported significance levels by using one of the standard correction methods for independent tests (e.g., Bonferroni; for resampling methods, see Westfall and Young, 1993).
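A minimal sketch of the Bonferroni correction mentioned above: each p value is multiplied by the number of tests performed (capped at 1) before being compared with the significance level. The function name and the example p values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p values and the reject decisions.

    Each p value is multiplied by the number of tests m and capped at 1.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


# Hypothetical p values from four tests performed in one study.
raw = [0.004, 0.020, 0.030, 0.300]
adjusted, reject = bonferroni(raw)
for p, p_adj, r in zip(raw, adjusted, reject):
    print(f"raw={p:.3f} adjusted={p_adj:.3f} reject={r}")
```

Note that the adjustment depends on the total number of tests performed, which is why that number has to be reported even when the individual results are not.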

Several statistical packages (SAS is a particular offender) print out the results of several dependent tests performed on the same set of data, for example, the t test and the Wilcoxon. We are not free to pick and choose. Before we view the printout, we must decide which test we will employ.

Let $W_\alpha$ denote the event that the Wilcoxon test rejects a hypothesis at the $\alpha$ significance level. Let $P_\alpha$ denote the event that a permutation test based on the original observations and applied to the same set of data rejects a hypothesis at the $\alpha$ significance level. Let $T_\alpha$ denote the event that a t test applied to the same set of data rejects a hypothesis at the $\alpha$ significance level.

It is possible that $W_\alpha$ may be true when $P_\alpha$ and $T_\alpha$ are not, and so forth. As $\Pr\{W_\alpha \text{ or } P_\alpha \text{ or } T_\alpha \mid H\} \ge \Pr\{W_\alpha \mid H\} = \alpha$, where $H$ is the null hypothesis, we will have inflated the Type I error by picking and choosing after the fact which test to report. Vice versa, if our intent was to conceal a side effect by reporting that the results were not significant, we will inflate the Type II error and deflate the power $\beta$ of our test by an after-the-fact choice, as $\beta = \Pr\{W_\alpha \text{ and } P_\alpha \text{ and } T_\alpha \mid K\} \le \Pr\{W_\alpha \mid K\}$, where $K$ is the alternative hypothesis.
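The effect of picking a test after the fact can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the author's procedure: it uses only two of the three tests (an exact one-sided permutation test on the raw observations and its rank-based counterpart, which plays the role of the Wilcoxon test), draws normal samples of five observations each, and uses a shift of 1.5 for the alternative. All names and settings are hypothetical.

```python
import itertools
import random


def exact_upper_p(values, n_first):
    """Exact one-sided permutation p-value for the sum of the first sample."""
    observed = sum(values[:n_first])
    count = total = 0
    for idx in itertools.combinations(range(len(values)), n_first):
        total += 1
        if sum(values[i] for i in idx) >= observed - 1e-12:
            count += 1
    return count / total


def ranks(values):
    """Ranks 1..N of the values (continuous data assumed, ties not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def rejection_rates(shift, n_datasets=300, n=5, alpha=0.05, seed=1):
    """Fraction of simulated data sets on which each reporting rule rejects.

    shift = 0 simulates the null hypothesis; shift > 0 an alternative.
    """
    rng = random.Random(seed)
    counts = {"wilcoxon": 0, "permutation": 0, "any": 0, "all": 0}
    for _ in range(n_datasets):
        first = [rng.gauss(shift, 1.0) for _ in range(n)]
        second = [rng.gauss(0.0, 1.0) for _ in range(n)]
        data = first + second
        w = exact_upper_p(ranks(data), n) <= alpha   # rank-based test rejects
        p = exact_upper_p(data, n) <= alpha          # raw-data test rejects
        counts["wilcoxon"] += w
        counts["permutation"] += p
        counts["any"] += (w or p)    # report whichever test is significant
        counts["all"] += (w and p)   # report "significant" only if both agree
    return {name: c / n_datasets for name, c in counts.items()}


under_h = rejection_rates(shift=0.0)   # rejection rate = Type I error
under_k = rejection_rates(shift=1.5)   # rejection rate = power
print("null:       ", under_h)
print("alternative:", under_k)
```

By construction, the "any" rate can never fall below either single-test rate, and the "all" rate can never exceed either one, which mirrors the two inequalities in the text.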

To repeat, we are not free to pick and choose among tests; any such conduct is unethical. Both the comparison and the test statistic must be specified in advance of examining the data.

BEFORE YOU DRAW CONCLUSIONS

Before you draw conclusions, be sure you have accounted for all missing data, interviewed nonresponders, and determined whether the data were missing at random or were specific to one or more subgroups.

During the Second World War, a group was studying planes returning from bombing Germany. They drew a rough diagram showing where the bullet holes were and recommended those areas be reinforced. A statistician, Abraham Wald [1980],¹⁰ pointed out that essential data were missing from the sample they were studying. What about the planes that didn't return from Germany?

72 PART II HYPOTHESIS TESTING AND ESTIMATION

When we think along these lines, we see that the two areas of the plane that had almost no bullet holes (where the wings and where the tail joined the fuselage) are crucial. Bullet holes in a plane are likely to be at random, occurring over the entire plane. Their absence in those two areas in returning bombers was diagnostic. Do the data missing from your experiments and surveys also have a story to tell?

Induction

"Behold! human beings living in an underground den, which has a mouth open towards the light and reaching all along the den; here they have been from their childhood, and have their legs and necks chained so that they cannot move, and can only see before them, being prevented by the chains from turning round their heads. Above and behind them a fire is blazing at a distance, and between the fire and the prisoners there is a raised way; and you will see, if you look, a low wall built along the way, like the screen which marionette players have in front of them, over which they show the puppets."

"And they see only their own shadows, or the shadows of one another, which the fire throws on the opposite wall of the cave."
