# Common Errors in Statistics and How to Avoid Them - P. I. Good


Limitations in the measuring instrument such as censoring at either end of the scale can result in biased estimates. Current methods of estimating cloud optical depth from satellite measurements produce biased results that depend strongly on satellite viewing geometry. In this and in similar cases in the physical sciences, absent the appropriate nomograms and conversion tables, interpretation is impossible.
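A toy simulation makes the censoring effect concrete. The numbers below (a normal measurement with mean 10 that the instrument cannot record above 12) are purely illustrative:

```python
import random
import statistics

random.seed(0)
true_mean, limit = 10.0, 12.0
sample = [random.gauss(true_mean, 3.0) for _ in range(100_000)]
# The instrument reads at most `limit`; larger values are recorded as `limit`.
censored = [min(x, limit) for x in sample]

naive = statistics.fmean(censored)
# The naive mean of the censored readings is biased downward.
print(f"true mean {true_mean:.2f}, naive estimate {naive:.2f}")
```

Here the bias is systematic, not sampling error: no amount of additional censored data will pull the naive estimate back to the true mean.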

Over- and underreporting plague meta-analysis (discussed in Chapter 6). Positive results are submitted for publication; negative findings are suppressed or ignored. Medical records are known to underemphasize conditions (such as arthritis) for which no treatment is immediately available while overemphasizing the disease of the day. (See, for example, Callaham et al. [1998].)

Collaboration between the statistician and the domain expert is essential if all sources of bias are to be detected and corrected for, because many biases are specific to a given application area. In the measurement of price indices, for example, the three principal sources are substitution bias, quality change bias, and new product bias.10

Two distinct kinds of statistical bias effects arise with astronomical distance indicators (DIs), depending on the method used.11

Publisher's Note:

Permission to reproduce this text online was not granted by the copyright holder. Readers are kindly requested to refer to the printed version of this article.

10 Otmar Issing in a speech at the CEPR/ECB Workshop on issues in the measurement of price indices, Frankfurt am Main, 16 November 2001.

11 These next paragraphs are taken with minor changes from Willick [1999, Section 9].

CHAPTER 7 REPORTING YOUR RESULTS

“A second sort of bias comes into play because some galaxies are too faint or small to be in the sample; in effect, the large-distance tail of P(d|r) is cut off. It follows that the typical inferred distances are smaller than those expected at a given true distance r. As a result, the peculiar velocity model that allows true distance to be estimated as a function of redshift is tricked into returning shorter distances. This bias goes in the same sense as Malmquist bias, but is fundamentally different.” It results not from volume/density effects, but from the same sort of sample selection effects that were discussed earlier in this section.

Selection bias can be minimized by working in the “inverse direction.” Rather than predicting absolute magnitude (Y) from a value of the velocity width parameter (X), one instead fits a line by regressing the widths X on the magnitudes Y.
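The advantage of the inverse fit can be sketched with simulated data. The linear relation, scatter, and magnitude cutoff below are arbitrary illustrative choices; the point is only that a selection cut acting on Y shifts the Y-on-X slope but leaves the X-on-Y slope essentially unchanged:

```python
import random

random.seed(1)

def slope(us, vs):
    """Ordinary least-squares slope of v regressed on u."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    return (sum((u - mu) * (v - mv) for u, v in zip(us, vs))
            / sum((u - mu) ** 2 for u in us))

# Toy relation: "magnitude" y depends on "width" x with scatter.
data = [(x, 2.0 * x + random.gauss(0, 0.5))
        for x in (random.gauss(0, 1) for _ in range(100_000))]
# Magnitude-limited sample: "faint" objects (large y) never enter the catalog.
sel = [(x, y) for x, y in data if y < 1.0]

xf, yf = zip(*data)   # full population
xs, ys = zip(*sel)    # selected sample

print(slope(xf, yf), slope(xs, ys))   # forward slope shifts under the cut
print(slope(yf, xf), slope(ys, xs))   # inverse slope is stable under the cut
```

Because the selection depends only on Y, the conditional distribution of X given Y is untouched by the cut, so the X-on-Y regression remains a consistent estimate of the same line.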

Finally, bias can result from grouping or averaging data. Feng et al. [1996] reported bias when group-randomized trials are analyzed without correcting for cluster effects; see Chapter 5. The use of averaged rather than end-of-period data in financial research yields biased estimates of the variance, covariance, and autocorrelation of first- as well as higher-order changes. Such biases can be both time varying and persistent (Wilson, Jones, and Lundstrum, 2001).
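A seeded simulation (all parameters illustrative) shows the averaging artifact. Differences of end-of-period values of a random walk are uncorrelated, but differences of within-period averages show spurious positive lag-1 autocorrelation (approaching 0.25 for fine sampling):

```python
import random
import statistics

random.seed(2)
# A random-walk "price" sampled 20 times per period over 2,000 periods.
steps_per, periods = 20, 2000
p, path = 0.0, []
for _ in range(steps_per * periods):
    p += random.gauss(0, 1)
    path.append(p)

end = [path[(i + 1) * steps_per - 1] for i in range(periods)]
avg = [statistics.fmean(path[i * steps_per:(i + 1) * steps_per])
       for i in range(periods)]

def lag1_autocorr(levels):
    """Lag-1 autocorrelation of the first differences of a series."""
    d = [b - a for a, b in zip(levels, levels[1:])]
    m = statistics.fmean(d)
    num = sum((d[i] - m) * (d[i + 1] - m) for i in range(len(d) - 1))
    den = sum((x - m) ** 2 for x in d)
    return num / den

print(lag1_autocorr(end))  # near 0: true random-walk increments
print(lag1_autocorr(avg))  # spuriously positive, roughly 0.25
```

An analyst who fed the averaged series into a model assuming independent increments would mistake this artifact for genuine momentum.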

REPORTING POWER

Statisticians are routinely forced to guess at the values of population parameters in order to make the power calculations needed to determine sample size. Once the data are in hand, it’s tempting to redo these same power calculations. Don’t. Post hoc calculations invariably inflate the actual power of the test (Zumbo and Hubley, 1998).

Post hoc power calculations can be of value in designing follow-up studies, but should not be used in reports.
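The legitimate, a priori use of power can be sketched with a normal-approximation calculation for a two-sided two-sample test. The effect size and standard deviation below are the kind of pre-study guesses the text describes, not values estimated from collected data:

```python
from math import sqrt, erf

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power_two_sample(delta, sigma, n_per_group, z_alpha=1.959964):
    """Approximate power of a two-sided two-sample z-test at alpha = 0.05."""
    se = sigma * sqrt(2.0 / n_per_group)
    z = delta / se
    return phi(z - z_alpha) + phi(-z - z_alpha)

# Pre-study guesses: difference of 5 units, standard deviation of 10.
for n in (32, 64, 128):
    print(n, round(power_two_sample(5.0, 10.0, n), 3))
```

With these guesses, 64 subjects per group yield power of roughly 0.8, the conventional target; the calculation guides the choice of n before the data arrive, which is exactly where it belongs.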

DRAWING CONCLUSIONS

Found data (nonrandom samples) can be very useful in suggesting models and hypotheses for further exploration. But without a randomized study, formal inferential statistical analyses are not supported (Greenland, 1990; Rothman, 1990b). The concepts of significance level, power, p value, and confidence interval apply only to data that have arisen from carefully designed and executed experiments and surveys.

A vast literature has grown up around the unease researchers feel in placing too much reliance on p values. Examples include Selvin [1957], Yoccuz [1991], Badrick and Flatman [1999], Feinstein [1998], Johnson [1999], Jones and Tukey [2000], McBride, Loftis, and Adkins [1993], Nester [1996], Parkhurst [2001], and Suter [1996].

The vast majority of such cautions are unnecessary provided that we treat p values as merely one part of the evidence to be used in decision making. They need to be viewed and interpreted in the light of all the surrounding evidence, past and present. No computer should be allowed to make decisions for you.

A failure to reject may result from insensitive or inappropriate measurements, or too small a sample size.

A difference that is statistically significant may be of no practical interest. Take a large enough sample and we will always reject the null hypothesis; take too small a sample and we will always accept—to say nothing of “significant” results that arise solely because their authors chose to test a “null” hypothesis rather than one of practical interest. (See Chapter 4.)
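The sample-size point can be demonstrated directly. In the seeded simulation below (all values illustrative), the true mean differs from the null by a practically trivial amount, yet the p value is driven to zero simply by enlarging the sample:

```python
import random
from math import sqrt, erf

random.seed(3)

def pvalue_two_sided(xs, mu0=0.0):
    """Normal-approximation two-sided test of mean == mu0."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
    z = (m - mu0) / sqrt(s2 / n)
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))

# True mean differs from the null by a practically trivial 0.02 units.
results = {}
for n in (100, 10_000, 1_000_000):
    xs = [random.gauss(0.02, 1.0) for _ in range(n)]
    results[n] = pvalue_two_sided(xs)
    print(n, results[n])
```

Nothing about the underlying effect changes between the runs; only n does. The "significance" at n = 1,000,000 says nothing about whether a 0.02-unit shift matters in practice.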
