“To them, I said, the truth would be literally nothing but the shadows of the images.”
The Allegory of the Cave (Plato, The Republic, Book VII).
Never assign probabilities to the true state of nature, but only to the validity of your own predictions.
A p value does not tell us the probability that a hypothesis is true, nor does a significance level apply to any specific sample; the latter is a characteristic of our testing in the long run. Likewise, if all assumptions are satisfied, a confidence interval will in the long run contain the true value of the parameter a certain percentage of the time. But we cannot say with certainty in any specific case that the parameter does or does not belong to that interval (Neyman, 1961, 1977).
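The long-run reading of a confidence interval can be checked by simulation. The sketch below is only an illustration under assumed conditions that are not taken from the text: normally distributed observations, a known standard deviation, and arbitrary values for the true mean, sample size, and number of trials. The function name `coverage` is hypothetical.

```python
import random
from statistics import NormalDist, mean


def coverage(true_mean=10.0, sigma=2.0, n=25, level=0.95, trials=20000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean.

    Assumes normal data with KNOWN sigma (an illustrative simplification),
    so each interval is: sample mean +/- z * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal for the chosen level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = mean(sample)
        # Any single interval either contains the true mean or it does not;
        # only the proportion over many repetitions approaches the level.
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"long-run coverage: {coverage():.3f}")
```

With all assumptions satisfied, the reported proportion should settle near the nominal level as the number of trials grows. No individual interval in the loop carries a 95% probability of its own: each one simply covers the true mean or misses it.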
When we determine a p value, we apply a set of algebraic methods and deductive logic to deduce the correct value. The deductive process is used to determine the appropriate size of resistor to use in an electric circuit, to determine the date of the next eclipse of the moon, and to establish the identity of the criminal (perhaps from the fact the dog did not bark on the night of the crime). Find the formula, plug in the values, turn the crank, and out pops the result (or it does for Sherlock Holmes,11 at least).
10 This reference may be hard to obtain. Alternatively, see Mangel and Samaniego.
CHAPTER 5 TESTING HYPOTHESES: CHOOSING A TEST STATISTIC 73
When we assert that, for a given population, a certain percentage of samples will have a specific composition, this too is a deduction. But when we make an inductive generalization about a population based upon our analysis of a sample, we are on shakier ground. Newton’s Law of gravitation provided an exact fit to observed astronomical data for several centuries; consequently, there was general agreement that Newton’s generalization from observation was an accurate description of the real world. Later, as improvements in astronomical measuring instruments extended the range of the observable universe, scientists realized that Newton’s Law was only a generalization and not a property of the universe at all. Einstein’s Theory of Relativity gives a much closer fit to the data, a fit that has not been contradicted by any observations in the century since its formulation. But this still does not mean that relativity provides us with a complete, correct, and comprehensive view of the universe.
In our research efforts, the only statements we can make with God-like certainty are of the form “our conclusions fit the data.” The true nature of the real world is unknowable. We can speculate, but never conclude.
The gap between the sample and the population will always require a leap of faith. We understand only in so far as we are capable of understanding [Lonergan, 1992].
Know your objectives in testing. Know your data’s origins. Know the assumptions you feel comfortable with. Never assign probabilities to the true state of nature, but only to the validity of your own predictions. Collecting more and better data may be your best alternative.
TO LEARN MORE
For commentary on the use of wrong or inappropriate statistical methods, see Avram et al., Badrick and Flatman, Berger et al., Bland and Altman, Cherry, Dar, Serlin, and Omer, Elwood, Felson, Cupples, and Meenan, Fienberg, Gore, Jones, and Rytter, Lieberson, MacArthur and Jackson, McGuigan, McKinney et al., Miller, Padaki, Welch and Gabbe, Westgard and Hunt, White, and Yoccoz.
11 See “Silver Blaze” by A. Conan-Doyle, Strand Magazine, December 1892.
74 PART II HYPOTHESIS TESTING AND ESTIMATION
Guidelines for reviewers are provided by Altman [1998a], Bacchetti, Finney, Gardner, Machin, and Campbell, George, Goodman, Altman, and George, International Committee of Medical Journal Editors, Light and Pillemer, Mulrow, Murray, Schor and Karten, and Vaisrub.
For additional comments on the effects of the violation of assumptions, see Box and Anderson, Friedman, Gastwirth and Rubin, Glass, Peckham, and Sanders, and Pettitt and Siskind.
For the details of testing for equivalence, see Dixon. For a review of the appropriate corrections for multiple tests, see Tukey.
For procedures with which to analyze factorial and other multifactor experimental designs, see Chapter 8 of Pesarin.
Most of the problems with parametric tests reported here extend to and are compounded by multivariate analysis. For some solutions, see Chapter 5 of Good and Chapter 6 of Pesarin.
For a contrary view on the need for adjustments of p values in multiple comparisons, see Rothman [1990a].
Venn and Reichenbach are among those who’ve attempted to construct a mathematical bridge between what we observe and the reality that underlies our observations. To the contrary, extrapolation from the sample to the population is not a matter of applying Holmes-like deductive logic but entails a leap of faith. A careful reading of Locke, Berkeley, Hume, and Lonergan is an essential prerequisite to the application of statistics.