# Common Errors in Statistics and How to Avoid Them - Good P.I


CHAPTER 2 HYPOTHESES: THE WHY OF YOUR RESEARCH 17

TABLE 2.1 Survival Rates of Men and Women⁸

| | Survived | Died | Total |
|---|---|---|---|
| Men | 10 | 0 | 10 |
| Women | 3 | 11 | 14 |
| Total | 13 | 11 | 24 |

| | Survived | Died | Total |
|---|---|---|---|
| Men | 8 | 2 | 10 |
| Women | 5 | 9 | 14 |
| Total | 13 | 11 | 24 |

⁸ In terms of the relative survival rates of the two sexes, the first of these tables is more extreme than our original table. The second is less extreme.

and women profit the same from treatment if we had observed a table of the following form?

| | Survived | Died | Total |
|---|---|---|---|
| Men | 0 | 10 | 10 |
| Women | 13 | 1 | 14 |
| Total | 13 | 11 | 24 |

Of course, we would! In determining the significance level in the present example, we must add together the total number of tables that lie in either of the two extremes or tails of the permutation distribution.
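The tail sums described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's own computation, and it rests on three assumptions: that the original table had 9 men surviving (the only count lying between the two tables of Table 2.1), that the one-tailed test looks at tables in which at least as many men survive, and that the two-tailed test adds up every table no more probable than the observed one, which is one common convention among several. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def table_probabilities(men, women, survived):
    """Exact probability of every 2 x 2 table with the given margins.

    Keys are the number of men who survived; the margins are held fixed,
    so this one count determines the whole table.
    """
    total = comb(men + women, survived)
    lo = max(0, survived - women)   # fewest men who could have survived
    hi = min(men, survived)         # most men who could have survived
    return {
        x: Fraction(comb(men, x) * comb(women, survived - x), total)
        for x in range(lo, hi + 1)
    }


def fisher_p_values(observed_men_survived, men, women, survived):
    """Return (one-tailed, two-tailed) significance levels as exact fractions."""
    probs = table_probabilities(men, women, survived)
    p_obs = probs[observed_men_survived]
    # One tail: tables at least as extreme in the observed direction
    # (assumed here to be "more men survive").
    one_tailed = sum(p for x, p in probs.items() if x >= observed_men_survived)
    # Both tails: every table no more probable than the one observed.
    two_tailed = sum(p for p in probs.values() if p <= p_obs)
    return one_tailed, two_tailed


# Margins from Table 2.1: 10 men, 14 women, 13 survivors.
# The observed count of 9 is an assumption (see the text above).
one_tailed, two_tailed = fisher_p_values(9, men=10, women=14, survived=13)
print(f"one-tailed p = {float(one_tailed):.4f}")
print(f"two-tailed p = {float(two_tailed):.4f}")
```

Under these assumptions the two-tailed level is slightly larger than the one-tailed level, because the tables with 0 or 1 surviving men in the opposite tail are also counted.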

The critical values and significance levels are quite different for one-tailed and two-tailed tests; all too often, the wrong test has been employed in published work. McKinney et al. [1989] reviewed some 70-plus articles that appeared in six medical journals. In over half of these articles, Fisher’s exact test was applied improperly. Either a one-tailed test had been used when a two-tailed test was called for, or the authors of the paper simply hadn’t bothered to state which test they had used.

Of course, unless you are submitting the results of your analysis to a regulatory agency, no one will know whether you originally intended a one-tailed test or a two-tailed test and subsequently changed your mind. No one will know whether your hypothesis was conceived before you started or only after you’d examined the data. All you have to do is lie. Just recognize that if you test an after-the-fact hypothesis without identifying it as such, you are guilty of scientific fraud.

When you design an experiment, decide at the same time whether you wish to test your hypothesis against a two-sided or a one-sided alternative.

18 PART I FOUNDATIONS

A two-sided alternative dictates a two-tailed test; a one-sided alternative dictates a one-tailed test.

As an example, suppose we decide to do a follow-on study of the cancer registry to confirm our original finding that men diagnosed as having tumors live significantly longer than women similarly diagnosed. In this follow-on study, we have a one-sided alternative. Thus, we would analyze the results using a one-tailed test rather than the two-tailed test we applied in the original study.

Determine beforehand whether your alternative hypotheses are ordered or unordered.

Ordered or Unordered Alternative Hypotheses?

When testing qualities (number of germinating plants, crop weight, etc.) from k samples of plants taken from soils of different composition, it is often routine to use the F-ratio of the analysis of variance. For contingency tables, many routinely use the chi-square test to determine if the differences among samples are significant. But the F-ratio and the chi-square are what are termed omnibus tests, designed to be sensitive to all possible alternatives. As such, they are not particularly sensitive to ordered alternatives such as “more fertilizer, more growth” or “more aspirin, faster relief of headache.” Tests for such ordered responses at k distinct treatment levels should properly use the Pitman correlation described by Frank, Trzos, and Good [1978] when the data are measured on a metric scale (e.g., weight of the crop). Tests for ordered responses in 2 × C contingency tables (e.g., number of germinating plants) should use the trend test described by Berger, Permutt, and Ivanova [1998]. We revisit this topic in more detail in the next chapter.
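To see why a test aimed at an ordered alternative can be sharper than an omnibus test, consider a small permutation test in the spirit of the Pitman correlation. This is a sketch under stated assumptions: the statistic (the sum of treatment level times response) is an assumed form rather than one taken from Frank, Trzos, and Good [1978], the function name is hypothetical, and the crop weights are invented purely for illustration.

```python
from fractions import Fraction
from itertools import permutations


def ordered_trend_p_value(levels, responses):
    """Exact one-sided permutation p-value for an increasing trend.

    Statistic (an assumed form): sum of level * response. Under the null
    hypothesis every assignment of responses to levels is equally likely,
    so we count the rearrangements whose statistic is at least as large
    as the one observed.
    """
    observed = sum(l * y for l, y in zip(levels, responses))
    count = total = 0
    for perm in permutations(responses):
        total += 1
        if sum(l * y for l, y in zip(levels, perm)) >= observed:
            count += 1
    return Fraction(count, total)


# Hypothetical data: three fertilizer levels, two plots per level.
levels = [1, 1, 2, 2, 3, 3]
crop_weight = [10, 12, 14, 15, 19, 21]

p = ordered_trend_p_value(levels, crop_weight)
print(f"p = {p} (about {float(p):.4f})")
```

Full enumeration is practical only for very small samples (six observations give 720 rearrangements); for larger samples one would draw random rearrangements instead.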

DEDUCTION AND INDUCTION

When we determine a p value as we did in the example above, we apply a set of algebraic methods and deductive logic to deduce the correct value. The deductive process is used to determine the appropriate size of resistor to use in an electric circuit, to determine the date of the next eclipse of the moon, and to establish the identity of the criminal (perhaps from the fact that the dog did not bark on the night of the crime). Find the formula, plug in the values, turn the crank, and out pops the result (or it does for Sherlock Holmes,⁴ at least).

⁴ See “Silver Blaze” by A. Conan Doyle, Strand Magazine, December 1892.

When we assert that for a given population a percentage of samples will have a specific composition, this also is a deduction. But when we make an inductive generalization about a population based upon our analysis of a sample, we are on shakier ground. It is one thing to assert that if an observation comes from a normal distribution with mean zero, the probability is one-half that it is positive. It is quite another if, on observing that half the observations in the sample are positive, we assert that half of all the possible observations that might be drawn from that population will be positive also.
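The asymmetry between the two directions of reasoning can be made concrete with a short calculation. The sketch below assumes a sample of 10 independent observations (the sample size and the candidate proportions are chosen only for illustration) and asks how probable it is to see exactly half of them positive when the true proportion of positive values in the population is p.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k positives among n independent observations."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, k = 10, 5  # hypothetical sample: 5 of 10 observations are positive
for p in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(f"P({k} positive of {n} | p = {p}) = {binomial_pmf(k, n, p):.3f}")
```

The deduction runs from an assumed p to the probability of the sample. The induction would have to run the other way, and a sample split half and half turns out to be nearly as probable when p is 0.4 or 0.6 as when p is exactly one-half.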

Newton’s Law of gravitation provided an almost exact fit (apart from measurement error) to observed astronomical data for several centuries; consequently, there was general agreement that Newton’s generalization from observation was an accurate description of the real world. Later, as improvements in astronomical measuring instruments extended the range of the observable universe, scientists realized that Newton’s Law was only a generalization and not a property of the universe at all. Einstein’s Theory of Relativity gives a much closer fit to the data, a fit that has not been contradicted by any observations in the century since its formulation. But this still does not mean that relativity provides us with a complete, correct, and comprehensive view of the universe.
