# Common Errors in Statistics and How to Avoid Them - Good, P.I.


Definitions and a further discussion of the interrelation between power and significance level may be found in Lehmann [1986], Casella and Berger [1990], and Good [2001]. You’ll also find discussions of optimal statistical procedures and their assumptions.

Shuster [1993] offers sample size guidelines for clinical trials. A detailed analysis of bootstrap methodology is provided in Chapters 3 and 7.

For further insight into the principles of experimental design, light on mathematics and complex formulas but rich in insight, study the lessons of the masters: Fisher [1925, 1935] and Neyman [1952]. If formulas are what you desire, see Thompson and Seber [1996], Rosenbaum [2002], Jennison and Turnbull [1999], and Toutenburg [2002].

Among the many excellent texts on survey design are Fink and Kosecoff [1988], Rea, Parker, and Shrader [1997], and Cochran [1977]. For tips on formulating survey questions, see Converse and Presser [1986], Fowler and Fowler [1995], and Schroeder [1987]. For tips on improving the response rate, see Bly [1990, 1996].


Part II

HYPOTHESIS TESTING AND ESTIMATION

Chapter 4

Estimation

Accurate, reliable estimates are essential to effective decision making. In this chapter, we review preventive measures and list the properties to look for in an estimation method. Several robust semiparametric estimators are considered along with one method of interval estimation, the bootstrap.
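Before turning to prevention, a minimal sketch of the percentile bootstrap may make the interval-estimation idea concrete. The helper name and the toy data below are illustrative, not taken from the text:

```python
import random

def bootstrap_percentile_ci(data, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary estimator."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and re-apply the estimator each time.
    stats = sorted(
        estimator([rng.choice(data) for _ in range(n)]) for _ in range(n_boot)
    )
    # Read the interval endpoints off the sorted bootstrap distribution.
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

sample = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4, 5.8, 5.0]
mean = lambda xs: sum(xs) / len(xs)
ci = bootstrap_percentile_ci(sample, mean)
```

The same routine works unchanged for a median, a trimmed mean, or any other estimator one passes in, which is why the bootstrap pairs naturally with the robust estimators discussed in this chapter.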

PREVENTION

The vast majority of errors in estimation stem from a failure to measure what one wanted to measure or what one thought one was measuring. Misleading definitions, inaccurate measurements, errors in recording and transcription, and confounding variables plague results.

To forestall such errors, review your data collection protocols and procedure manuals before you begin, run several preliminary trials, record potential confounding variables, monitor data collection, and review the data as they are collected.

DESIRABLE AND NOT-SO-DESIRABLE ESTIMATORS

“The method of maximum likelihood is, by far, the most popular technique for deriving estimators” (Casella and Berger [1990, p. 289]). The proper starting point for the selection of the “best” method of estimation is with the objectives of our study: What is the purpose of our estimate?
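As a reminder of what maximum likelihood actually does, here is a hedged sketch for one simple case, an exponential model, where the maximizer of the likelihood has a closed form; the data are invented for illustration:

```python
import math

def exp_log_likelihood(rate, data):
    # Log-likelihood of an exponential model with the given rate parameter.
    return len(data) * math.log(rate) - rate * sum(data)

data = [0.5, 1.2, 0.3, 0.8, 2.1, 0.9, 1.5]

# Closed-form maximum likelihood estimate of the rate: 1 / sample mean.
mle_closed_form = len(data) / sum(data)

# A crude grid search over candidate rates recovers the same answer,
# since the log-likelihood is concave in the rate.
grid = [i / 1000 for i in range(1, 5000)]
mle_grid = max(grid, key=lambda rate: exp_log_likelihood(rate, data))
```

Note that nothing in this calculation refers to the losses we incur when the estimate misses the true value, which is precisely the criticism raised next.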

If our estimate is θ* and the actual value of the unknown parameter is θ, what losses will we be subject to? It is difficult to understand the popularity of the method of maximum likelihood and other estimation procedures that do not take these losses into consideration.

The majority of losses will be monotone nondecreasing in nature; that is, the further apart the estimate θ* and the true value θ, the larger our losses are likely to be. Typical forms of the loss function are the absolute deviation |θ* − θ|, the squared deviation (θ* − θ)², and the jump: no loss if |θ* − θ| < δ, and a large loss otherwise. Or the loss function may resemble the squared deviation but take the form of a step function increasing in discrete increments.

Desirable estimators share the following properties: impartiality, consistency, efficiency, robustness, and minimum loss.

Impartiality

Estimation methods should be impartial. Decisions should not depend on the accidental and quite irrelevant labeling of the samples. Nor should decisions depend on the units in which the measurements are made.

Suppose we have collected data from two samples with the object of estimating the difference in location of the two populations involved. Suppose further that the first sample includes the values a, b, c, d, and e; the second sample includes the values f, g, h, i, j, and k; and our estimate of the difference is θ*. If the observations are completely reversed, that is, if the first sample includes the values f, g, h, i, j, and k and the second sample the values a, b, c, d, and e, our estimation procedure should declare the difference to be −θ*.

The units we use in our observations should not affect the resulting estimates. We should be able to take a set of measurements in feet, convert to inches, make our estimate, convert back to feet, and get absolutely the same result as if we’d worked in feet throughout. Similarly, where we locate the zero point of our scale should not affect the conclusions.
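Both impartiality requirements, sign reversal under relabeling and invariance under a change of units, are easy to verify for a concrete estimator. The sketch below uses the difference of sample medians on invented data:

```python
def median(xs):
    s = sorted(xs)
    n = len(s)
    return (s[n // 2 - 1] + s[n // 2]) / 2 if n % 2 == 0 else s[n // 2]

def location_diff(first, second):
    # Difference in location estimated by the difference of sample medians.
    return median(first) - median(second)

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
estimate = location_diff(a, b)

# Relabeling the samples flips the sign of the estimate.
assert location_diff(b, a) == -estimate

# Converting feet to inches, estimating, and converting back changes nothing.
rescaled = location_diff([12 * x for x in a], [12 * x for x in b]) / 12
assert abs(rescaled - estimate) < 1e-12
```

An estimator that failed either check would be letting an arbitrary bookkeeping choice, which sample we happened to call "first," or which units the technician preferred, drive the conclusions.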

Finally, if our observations are independent of the time of day, the season, and the day on which they were recorded (facts that ought to be verified before proceeding further), then our estimators should be independent of the order in which the observations were collected.

Consistency

Estimators should be consistent; that is, the larger the sample, the greater the probability the resultant estimate will be close to the true population value.
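Consistency can be seen directly in simulation: the average error of the sample mean shrinks as the sample grows. The population values and sample sizes below are arbitrary choices for illustration:

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(42)
true_value = 10.0

avg_abs_error = {}
for n in (10, 100, 1000):
    # Average absolute error of the sample mean over 200 simulated samples
    # of size n drawn from a normal population with the known true mean.
    errors = [
        abs(mean([rng.gauss(true_value, 5.0) for _ in range(n)]) - true_value)
        for _ in range(200)
    ]
    avg_abs_error[n] = mean(errors)
```

Each tenfold increase in the sample size cuts the typical error by roughly a factor of √10, the familiar 1/√n behavior of a consistent estimator.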

Efficiency

One consistent estimator certainly is to be preferred to another if the first consistent estimator can provide the same degree of accuracy with fewer observations. To simplify comparisons, most statisticians focus on the asymptotic relative efficiency (ARE), defined as the limit with increasing sample size of the ratio of the number of observations required for each of two consistent statistical procedures to achieve the same degree of accuracy.
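Relative efficiency can be approximated by comparing the variances of the sampling distributions of two estimators at a fixed sample size. The simulation below compares the median to the mean for normal data; the sample size and trial count are arbitrary:

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    n = len(s)
    return (s[n // 2 - 1] + s[n // 2]) / 2 if n % 2 == 0 else s[n // 2]

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(1)
n, trials = 101, 2000

# Sampling distributions of the mean and the median for normal data.
means = [mean([rng.gauss(0, 1) for _ in range(n)]) for _ in range(trials)]
medians = [median([rng.gauss(0, 1) for _ in range(n)]) for _ in range(trials)]

# Relative efficiency of the median: var(mean) / var(median).
# For normal data this approaches 2/pi (about 0.64) as n grows.
efficiency = variance(means) / variance(medians)
```

In words: for normally distributed observations the median needs roughly 1/0.64 ≈ 1.57 times as many observations as the mean to achieve the same accuracy, though the ranking reverses for heavy-tailed populations, which is where robustness enters.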
