CHAPTER 2 HYPOTHESES: THE WHY OF YOUR RESEARCH 15
The problem is that because of the variation inherent in the disease process, each and every one of the possible outcomes could occur regardless of which hypothesis is true. Of course, some outcomes are more likely if the principal hypothesis is true (for example, 50 cases of pneumonia in the placebo group and 48 in the vaccine group), and others are more likely if the alternative hypothesis is true (for example, 38 cases of pneumonia in the placebo group and 20 in the vaccine group).
Following Neyman and Pearson, we order each of the possible outcomes in accordance with the ratio of its probability or likelihood when the alternative hypothesis is true to its probability when the principal (or null) hypothesis is true. When this likelihood ratio is large, we shall say the outcome rules in favor of the alternative hypothesis. Working downwards from the outcomes with the highest values, we continue to add outcomes to the rejection region of the test (so called because these are the outcomes for which we would reject the principal hypothesis) until the total probability of the rejection region under the principal hypothesis is equal to some predesignated significance level.
To see that we have done the best we can do, suppose we replace one of the outcomes we assigned to the rejection region with one we did not. The probability that this new outcome would occur if the principal hypothesis is true must be less than or equal to the probability that the outcome it replaced would occur if the principal hypothesis is true. Otherwise, we would exceed the significance level. Because of how we assigned outcomes to the rejection region, the likelihood ratio of the new outcome is no larger than the likelihood ratio of the old outcome. Thus the probability that the new outcome would occur if the alternative hypothesis is true must be less than or equal to the probability that the outcome it replaced would occur if the alternative hypothesis is true. That is, by swapping outcomes we have reduced, or at best left unchanged, the power of our test. By following the method of Neyman and Pearson and maximizing the likelihood ratio, we obtain the most powerful test at a given significance level.
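The construction just described can be sketched in a few lines of Python. This is only an illustration under assumptions that do not come from the text: the outcome is a count out of n = 10 trials, the principal hypothesis is a success probability of 0.5, the alternative is 0.7, and the significance level is 0.06. The function names are hypothetical. Because the outcomes are discrete, the total probability of the region usually cannot equal the significance level exactly, so the sketch stops adding outcomes just before the level would be exceeded.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def np_rejection_region(n, p_null, p_alt, alpha):
    """Build a rejection region by descending likelihood ratio.

    All parameter values passed in are illustrative assumptions.
    """
    # Order outcomes by P(outcome | alternative) / P(outcome | principal).
    ranked = sorted(
        range(n + 1),
        key=lambda k: binom_pmf(k, n, p_alt) / binom_pmf(k, n, p_null),
        reverse=True,
    )
    region, size = [], 0.0
    for k in ranked:
        p0 = binom_pmf(k, n, p_null)
        if size + p0 > alpha:
            break  # adding this outcome would exceed the significance level
        region.append(k)
        size += p0
    # Power: probability of landing in the region when the alternative is true.
    power = sum(binom_pmf(k, n, p_alt) for k in region)
    return region, size, power


region, size, power = np_rejection_region(n=10, p_null=0.5, p_alt=0.7, alpha=0.06)
print("rejection region:", region)
print("size under principal hypothesis:", round(size, 4))
print("power under alternative:", round(power, 4))
```

Swapping any outcome in the returned region for one outside it, and recomputing the power, is a quick way to see the argument of the preceding paragraph in action.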
To take advantage of Neyman and Pearson’s finding, we need to have an alternative hypothesis or alternatives firmly in mind when we set up a test. Too often in published research, such alternative hypotheses remain unspecified or, worse, are specified only after the data are in hand. We must specify our alternatives before we commence an analysis, preferably at the same time we design our study.
Are our alternatives one-sided or two-sided? Are they ordered or unordered? The form of the alternative will determine the statistical procedures we use and the significance levels we obtain.
Decide beforehand whether you wish to test against a one-sided or a two-sided alternative.
One-Sided or Two-Sided
Suppose that, on examining the cancer registry in a hospital, we uncover the following data, which we put in the form of a 2 × 2 contingency table.
16 PART I FOUNDATIONS
         Survived   Died   Total
Men          9        1      10
Women        4       10      14
Total       13       11      24
The 9 denotes the number of males who survived, the 1 denotes the number of males who died, and so forth. The four marginal totals or marginals are 10, 14, 13, and 11. The total number of men in the study is 10, while 14 denotes the total number of women, and so forth.
The marginals in this table are fixed because, indisputably, there are 11 dead bodies among the 24 persons in the study and 14 women. Suppose that before completing the table, we lost the subject IDs so that we could no longer identify which subject belonged in which category. Imagine you are given two sets of 24 labels. The first set has 14 labels with the word “woman” and 10 labels with the word “man.” The second set of labels has 11 labels with the word “dead” and 13 labels with the word “alive.” Under the null hypothesis, you are allowed to distribute the labels to subjects independently of one another. One label from each of the two sets per subject, please.
There are a total of $\binom{24}{10}$ ways you could hand out the “man” and “woman” labels. Of these assignments, $\binom{13}{9}\binom{11}{1}$ result in tables that are as extreme as our original table (that is, in which 90% of the men survive) and $\binom{13}{10}\binom{11}{0}$ in tables that are more extreme (100% of the men survive). This is a very small fraction of the total, $(7865 + 286)/1961256 \approx 0.004$, so we conclude that a difference in survival rates of the two sexes as extreme as the difference we observed in our original table is very unlikely to have occurred by chance alone. We reject the hypothesis that the survival rates for the two sexes are the same and accept the alternative hypothesis that, in this instance at least, males are more likely to profit from treatment (Table 2.1).
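The label-counting argument can be checked numerically. The sketch below assumes one particular way of counting: the “alive” and “dead” labels are treated as already in place, and we count the ways of choosing which 10 of the 24 subjects receive a “man” label. The function name and its parameters are hypothetical, introduced only for this illustration.

```python
from math import comb


def one_sided_tail(men_survived, men, women, survived):
    """Count label assignments at least as extreme as the observed table.

    Returns (favorable, total): the number of assignments in which at
    least `men_survived` men survive, and the total number of assignments.
    """
    subjects = men + women
    died = subjects - survived
    total = comb(subjects, men)  # ways to pick which subjects are men
    favorable = 0
    for k in range(men_survived, min(men, survived) + 1):
        # k men among the survivors, the remaining men among the dead
        favorable += comb(survived, k) * comb(died, men - k)
    return favorable, total


favorable, total = one_sided_tail(men_survived=9, men=10, women=14, survived=13)
fraction = favorable / total
print(favorable, "of", total, "assignments; fraction =", round(fraction, 6))
```

With the marginals of the table above, the fraction comes out at roughly 0.4%, which is the sense in which the observed table is “very unlikely to have occurred by chance alone.”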
In the preceding example, we tested the hypothesis that survival rates do not depend on sex against the alternative that men diagnosed with cancer are likely to live longer than women similarly diagnosed. We rejected the null hypothesis because only a small fraction of the possible tables were as extreme as the one we observed initially. This is an example of a one-tailed test. But is it the correct test? Is this really the alternative hypothesis we would have proposed if we had not already seen the data? Wouldn’t we have been just as likely to reject the null hypothesis that men