For more on the contemporary view of induction, see Berger  and Sterne, Smith, and Cox . The former notes that, “Dramatic illustration of the non-frequentist nature of p-values can be seen from the applet available at http://www.stat.duke.edu/~berger. The applet assumes one faces a series of situations involving normal data with unknown mean θ and known variance, and tests of the form H: θ = 0 versus K: θ ≠ 0. The applet simulates a long series of such tests, and records how often H is true for p-values in given ranges.”
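The kind of simulation Berger describes can be sketched in a few lines of Python. This is not the applet itself, only a minimal illustration under assumptions of our own: H is true in half of the tests, θ is drawn from a normal distribution with standard deviation `tau` when H is false, and each test rests on a single observation with variance 1. The function name and all parameter values are hypothetical.

```python
import math
import random

def simulate_tests(n_tests=200_000, prior_h=0.5, tau=2.0, lo=0.04, hi=0.05, seed=1):
    """Return (fraction of tests with lo < p <= hi in which H was true, number of such tests)."""
    rng = random.Random(seed)
    in_range = 0
    h_true_in_range = 0
    for _ in range(n_tests):
        h_true = rng.random() < prior_h                   # assumed: H holds in half the tests
        theta = 0.0 if h_true else rng.gauss(0.0, tau)    # assumed distribution of theta under K
        x = rng.gauss(theta, 1.0)                         # one observation, known variance 1
        p = math.erfc(abs(x) / math.sqrt(2.0))            # two-sided p-value for H: theta = 0
        if lo < p <= hi:
            in_range += 1
            h_true_in_range += h_true
    return h_true_in_range / in_range, in_range

fraction, count = simulate_tests()
print(f"{count} tests had 0.04 < p <= 0.05; H was true in {fraction:.1%} of them")
```

The proportion reported depends heavily on the assumed share of true hypotheses and on the assumed spread of θ under K, so it is worth rerunning the sketch with other values of `prior_h` and `tau`.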
Strengths and Limitations of Some Miscellaneous Statistical Procedures
The greatest error associated with the use of statistical procedures is to assume that a single statistical methodology can suffice for all applications.
From time to time, a new statistical procedure will be introduced or an old one revived along with the assertion that at last the definitive solution has been found. As is so often the case with religions, at first the new methodology is reviled, even persecuted, until it grows in the number of its adherents, at which time it can begin to attack and persecute the adherents of other, more established dogma in its turn.
During the preparation of this text, an editor of a statistics journal rejected an article of one of the authors on the sole grounds that it made use of permutation methods.
“I’m amazed that anybody is still doing permutation tests . . .” wrote the anonymous reviewer, “There is probably nothing wrong technically with the paper, but I personally would reject it on grounds of irrelevance to current best statistical practice.” To which the editor saw fit to add, “The reviewer is interested in estimation of interaction or main effects in the more general semiparametric models currently studied in the literature. It is well known that permutation tests preserve the significance level but that is all they do is answer yes or no.”1
But one methodology can never be better than another, nor can estimation replace hypothesis testing or vice versa. Every methodology has a proper domain of application and another set of applications for which it fails. Every methodology has its drawbacks and its advantages, its assumptions, and its sources of error. Let us seek the best from each statistical procedure.
1 A double untruth. First, permutation tests also yield interval estimates; see, for example, Garthwaite . Second, semiparametric methods are not appropriate for use with small-sample experimental designs, the topic of the submission.
The balance of this chapter is devoted to exposing the frailties of four of the “new” (and revived) techniques: bootstrap, Bayesian methods, meta-analysis, and permutation tests.
Many of the procedures discussed in this chapter fall victim to the erroneous perception that one can get more out of a sample or series of samples than one actually puts in. One bootstrap expert learned he was being considered for a position because management felt, “your knowledge of the bootstrap will help us to reduce the cost of sampling.”
Michael Chernick, author of Bootstrap Methods: A Practitioner’s Guide, Wiley, 1999, has documented six myths concerning the bootstrap:
1. Allows you to reduce your sample size requirements by replacing real data with simulated data—Not.
2. Allows you to stop thinking about your problem, the statistical design and probability model—Not.
3. No assumptions necessary—Not.
4. Can be applied to any problem—Not.
5. Only works asymptotically—Necessary sample size depends on the context.
6. Yields exact significance levels—Never.
Of course, the bootstrap does have many practical applications, as witnessed by its appearance in six of the chapters in this book.2
As always, to use the bootstrap or any other statistical methodology effectively, one has to be aware of its limitations. The bootstrap is of value in any situation in which the sample can serve as a surrogate for the population.
If the sample is not representative of the population because the sample is small or biased, not selected at random, or its constituents are not independent of one another, then the bootstrap will fail.
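The failure with a biased sample is easy to demonstrate. The sketch below is an illustration under assumptions of our own, not an example from the literature: the population is exponential with mean 1, the biased sample never records values of 1.0 or more, and the interval is the simple percentile bootstrap interval for the mean. The function name and settings are hypothetical.

```python
import random
import statistics

def bootstrap_mean_interval(sample, n_boot=2000, alpha=0.05, rng=None):
    """Percentile bootstrap interval for the mean (one of several bootstrap intervals)."""
    rng = rng or random.Random()
    n = len(sample)
    # Resample with replacement n_boot times and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot))
    lower = means[int(n_boot * alpha / 2)]
    upper = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

rng = random.Random(42)
TRUE_MEAN = 1.0  # mean of the assumed exponential(1) population

# Random sample: every member of the population is equally likely to be drawn.
random_sample = [rng.expovariate(1.0) for _ in range(200)]

# Biased sample: values of 1.0 or more are never recorded (a hypothetical selection effect).
biased_sample = []
while len(biased_sample) < 200:
    x = rng.expovariate(1.0)
    if x < 1.0:
        biased_sample.append(x)

random_interval = bootstrap_mean_interval(random_sample, rng=rng)
biased_interval = bootstrap_mean_interval(biased_sample, rng=rng)
print("random sample interval:", random_interval)
print("biased sample interval:", biased_interval, "(population mean is", TRUE_MEAN, ")")
```

Because every value in the biased sample is below 1.0, every resample mean is below 1.0 as well, and the interval cannot cover the population mean however many resamples are drawn. Resampling reproduces the sample, bias included.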
Canty et al.  also list data outliers, inconsistency of the bootstrap method, incorrect resampling model, wrong or inappropriate choice of statistic, nonpivotal test statistics, nonlinearity of the test statistic, and discreteness of the resample statistic as potential sources of error.
2 If you’re counting, we meet the bootstrap again in Chapters 10 and 11.
One of the first proposed uses of the bootstrap, illustrated in Chapter 4, was in providing an interval estimate for the sample median. Because the median or 50th percentile is in the center of the sample, virtually every element of the sample contributes to its determination. As we move out into the tails of a distribution, to determine the 20th percentile or the 90th, fewer and fewer elements of the sample are of assistance in making the estimate.
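One rough way to see this is to count how many distinct sample values ever turn up as the bootstrap estimate of a given percentile. The sketch below assumes a sample of 25 standard normal observations and takes each percentile as a single order statistic, without interpolation; both choices, and the function names, are ours and serve only as illustration.

```python
import random

def order_stat_percentile(values, q):
    """q-th percentile taken as a single order statistic (no interpolation)."""
    ordered = sorted(values)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]

def bootstrap_replicates(sample, q, n_boot=2000, rng=None):
    """Bootstrap replicates of the q-th percentile, resampling with replacement."""
    rng = rng or random.Random()
    n = len(sample)
    return [order_stat_percentile(rng.choices(sample, k=n), q) for _ in range(n_boot)]

rng = random.Random(7)
sample = [rng.gauss(0.0, 1.0) for _ in range(25)]  # assumed sample for illustration

results = {}
for q in (0.5, 0.9):
    reps = bootstrap_replicates(sample, q, rng=rng)
    results[q] = reps
    print(f"percentile {q:.0%}: {len(set(reps))} of {len(sample)} sample values "
          "appear among the bootstrap replicates")
```

Every replicate is necessarily one of the original observations, so the bootstrap distribution of a percentile is discrete. The further into the tail the percentile lies, the fewer observations are available to serve as replicates, and the coarser the resulting interval estimate tends to be.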