# Common Errors in Statistics and How to Avoid Them - Good P.I


Deaths and disabling accidents and diseases, whether or not directly related to the condition being treated, are common in long-term trials in the elderly and high-risk populations. In highly mobile populations, individuals may simply be lost to sight (“no forwarding address”).

Lang and Secic [1997, p. 22] suggest a chart such as that depicted in Figure 3.1 as the most effective way to communicate all the information regarding missing data. Censored and off-scale measurements should be described separately and their numbers indicated in the corresponding tables.

TABLES

Is text, a table, or a graph the best means of presenting results? Dyke [1997] would argue, “Tables with appropriate marginal means are often the best method of presenting results, occasionally replaced (or supplemented) by diagrams, usually graphs or histograms.” Van Belle [2002] warns that aberrant values often can be more apparent in graphical form. One argument in favor of using ActivStats® for exploratory analysis is that one can easily go back and forth between viewing the table and viewing the graph.

A sentence structure should be used for displaying two to five numbers, as in “The blood type of the population of the United States is approximately 45% O, 40% A, 11% B, and 4% AB.”3 Note that the blood types are ordered by frequency.

Marginal means may be omitted only if they have already appeared in other tables.4 Sample sizes should always be specified.
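As an illustration of a table that carries both marginal means and sample sizes, here is a minimal Python sketch. The yields, the factor names, and the helper functions `pooled`, `build_table`, and `render` are all hypothetical, invented for this example. The margins are computed from the pooled raw observations, which is only one possible choice when cell sizes are unequal.

```python
from statistics import mean

# Hypothetical yields (illustrative numbers only), keyed by (variety, fertilizer).
cells = {
    ("A", "low"): [4.1, 3.9, 4.4],
    ("A", "high"): [5.0, 5.3],
    ("B", "low"): [3.2, 3.6, 3.1],
    ("B", "high"): [4.8, 4.5, 4.9],
}
rows = ["A", "B"]
cols = ["low", "high"]


def pooled(keys):
    """Return (mean, n) over all raw observations in the given cells."""
    values = [v for k in keys for v in cells[k]]
    return mean(values), len(values)


def build_table():
    """Build rows of (mean, n) pairs, with a final margin column and margin row."""
    table = []
    for r in rows:
        line = [pooled([(r, c)]) for c in cols]
        line.append(pooled([(r, c) for c in cols]))  # row margin
        table.append((r, line))
    bottom = [pooled([(r, c) for r in rows]) for c in cols]  # column margins
    bottom.append(pooled(list(cells)))  # grand mean
    table.append(("All", bottom))
    return table


def render(table):
    """Format the table as text, showing the sample size beside every mean."""
    lines = [f"{'':6}" + "".join(f"{c:>16}" for c in cols + ["All"])]
    for label, line in table:
        texts = [f"{m:.2f} (n={n})" for m, n in line]
        lines.append(f"{label:6}" + "".join(f"{t:>16}" for t in texts))
    return "\n".join(lines)


table = build_table()
print(render(table))
```

Printing the sample size beside each mean lets the reader see at a glance which cells rest on fewer observations.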

Among our own worst offenses is the failure to follow van Belle’s advice to “Use the table heading to convey critical information. Do not stint. The more informative the heading, the better the table.”5

Consider adding a row (or column, or both) of contrasts; “for example, if the table has only two rows we could add a row of differences, row 1 minus row 2: if there are more than two rows, some other contrast might be useful, perhaps ‘mean haploid minus mean diploid’, or ‘linear component of effect of N-fertilizer’.”6 Indicate the variability of these contrasts.
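A row of differences and its variability might be computed along the following lines. This is a sketch only: the data and the names `haploid`, `diploid`, and `difference_row` are hypothetical, and the standard error shown assumes that the two rows are independent samples, so that the variances of the two means add. Other measures of variability may suit a given table better.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical observations (illustrative only): two rows, two columns.
haploid = {"site 1": [12.1, 11.4, 12.9, 12.3], "site 2": [10.2, 10.9, 9.8, 10.5]}
diploid = {"site 1": [10.0, 10.6, 9.7, 10.3], "site 2": [9.9, 10.1, 10.4, 9.6]}


def difference_row(row1, row2):
    """Per-column difference of means (row 1 minus row 2) with its standard error.

    Assumes independent samples in the two rows, so that
    Var(mean1 - mean2) = s1^2 / n1 + s2^2 / n2.
    """
    out = {}
    for col in row1:
        a, b = row1[col], row2[col]
        diff = mean(a) - mean(b)
        se = sqrt(variance(a) / len(a) + variance(b) / len(b))
        out[col] = (diff, se)
    return out


contrasts = difference_row(haploid, diploid)
for col, (diff, se) in contrasts.items():
    print(f"{col}: haploid minus diploid = {diff:.2f} (SE {se:.2f})")
```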

3 van Belle [2002, p. 154].

4 Dyke [1997]. Reprinted with permission from Elsevier Science.

5 van Belle [2002, p. 154].

6 Dyke [1997]. Reprinted with permission from Elsevier Science.

94 PART II HYPOTHESIS TESTING AND ESTIMATION

Tables dealing with two-factor arrays are straightforward, provided that confidence limits, least standard deviations, and standard errors are clearly associated with the correct set of figures. Tables involving three or more factors are not always immediately clear to the reader and are best avoided.

Are the results expressed in appropriate units? For example, are parts per thousand more natural in a specific case than percentages? Have we rounded off to the correct degree of precision, taking account of what we know about the variability of the results and of how the reader is likely to use them, perhaps by multiplying by a constant factor or by another variate (for example, % dry matter)?
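One way to tie the reported precision to the variability of a result is sketched below in Python. The rule used here, keeping two significant digits of the measure of spread and rounding the estimate to the same decimal place, is a common convention that we assume for illustration; it is not taken from Dyke or van Belle. The function name `round_to_uncertainty` is hypothetical.

```python
from math import floor, log10


def round_to_uncertainty(estimate, spread, sig=2):
    """Round `estimate` to the decimal place of the `sig`-th significant
    digit of `spread` (for example, a standard deviation).

    The two-significant-digit default is an assumed convention.
    """
    if spread <= 0:
        raise ValueError("spread must be positive")
    # Number of decimal places that keeps `sig` significant digits of `spread`.
    decimals = sig - 1 - floor(log10(spread))
    return round(estimate, decimals), round(spread, decimals)


print(round_to_uncertainty(12.34567, 0.0432))
print(round_to_uncertainty(1234.567, 4.32))
```

If the reader is likely to multiply the result by another quantity, carrying one additional digit (`sig=3`) guards against compounding the rounding error.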

Dyke [1997] also advises us that “Residuals should be tabulated and presented as part of routine analysis; any [statistical] package that does not offer this option was probably produced by someone out of touch with research workers, certainly with those working with field crops.” Best of all is a display of residuals aligned in rows and columns as the plots were aligned in the field.

A table of residuals (or tables, if there are several strata) can alert us to the presence of outliers and may also reveal patterns in the data not considered previously.
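A minimal Python sketch of such a display follows. It assumes one observation per plot and a simple additive model of row and column effects, so that each residual is the observation minus its row mean, minus its column mean, plus the grand mean. The yields and the name `additive_residuals` are hypothetical, and an aberrant value has been planted deliberately.

```python
from statistics import mean

# Hypothetical plot yields laid out as in the field (rows x columns).
field = [
    [5.1, 5.4, 5.0, 5.3],
    [4.8, 5.2, 4.9, 5.1],
    [5.0, 7.9, 5.1, 5.2],  # 7.9 is a planted aberrant value
]


def additive_residuals(grid):
    """Residuals from an additive row-plus-column fit, in field layout."""
    grand = mean([v for row in grid for v in row])
    row_means = [mean(row) for row in grid]
    col_means = [mean(col) for col in zip(*grid)]
    return [
        [y - row_means[i] - col_means[j] + grand for j, y in enumerate(row)]
        for i, row in enumerate(grid)
    ]


residuals = additive_residuals(field)
for row in residuals:
    # Print each row of residuals in the same position as its plot.
    print("  ".join(f"{r:+6.2f}" for r in row))
```

Because the residuals keep the positions of the plots, a large value or a run of same-signed values along one edge of the field stands out immediately.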

STANDARD ERROR

One of the most egregious errors in statistics (one encouraged, if not insisted upon, by the editors of journals in the biological and social sciences) is the use of the notation “mean ± standard error” to report the results of a set of observations.

Presumably, the editors of these journals (and the reviewers they select) have three objectives in mind: To communicate some idea of

1. The “correct” result

2. The precision of the estimate of the correct result

3. The dispersion of the distribution from which the observations were drawn

Let’s see to what degree any or all of these objectives might be realized in real life by the editors’ choice.
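It may help first to recall what the two quantities measure. The standard deviation describes the dispersion of the observations; the standard error of the mean is the standard deviation divided by the square root of the sample size and describes the precision of the mean. The Python sketch below uses hypothetical values, and the larger sample is built artificially by repeating the smaller one, purely to hold the dispersion roughly constant while the sample size grows.

```python
from math import sqrt
from statistics import stdev

# Hypothetical measurements (illustrative only).
base = [92, 105, 98, 110, 101, 87, 113, 95]


def sd_and_se(sample):
    """Return the sample standard deviation and the standard error of the mean."""
    sd = stdev(sample)
    return sd, sd / sqrt(len(sample))


sd_small, se_small = sd_and_se(base)      # n = 8
sd_large, se_large = sd_and_se(base * 8)  # n = 64, same values repeated

print(f"n=8:  SD = {sd_small:.2f}, SE = {se_small:.2f}")
print(f"n=64: SD = {sd_large:.2f}, SE = {se_large:.2f}")
```

The standard deviation barely moves between the two samples, while the standard error falls sharply, so a reported standard error says little about dispersion unless the sample size is reported with it.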

For small samples of three to five observations, summary statistics are virtually meaningless. Reproduce the actual observations instead; this is easier to do and more informative.

CHAPTER 7 REPORTING YOUR RESULTS 95

For many variables, regardless of sample size, the arithmetic mean can be very misleading. For example, the mean income in most countries is far in excess of the median income or 50th percentile to which most of us can relate. When the arithmetic mean is meaningful, it is usually equal to or close to the median. Consider reporting the median in the first place.
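The gap between mean and median in a skewed distribution is easy to demonstrate. The incomes in the following Python sketch are hypothetical, chosen so that a single large value dominates the total.

```python
from statistics import mean, median

# Hypothetical annual incomes (illustrative only): most modest, one very large.
incomes = [18_000, 22_000, 25_000, 27_000, 30_000, 34_000, 41_000, 52_000, 950_000]

avg = mean(incomes)
mid = median(incomes)
below_mean = sum(1 for x in incomes if x < avg)

print(f"mean   = {avg:,.0f}")
print(f"median = {mid:,.0f}")
print(f"{below_mean} of {len(incomes)} incomes fall below the mean")
```

Here all but one of the incomes lie below the arithmetic mean, whereas the median sits in the middle of the values most of the group actually receive.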

The geometric mean is more appropriate than the arithmetic in three sets of circumstances:

1. When losses or gains can best be expressed as a percentage rather than a fixed value.

2. When rapid growth is involved as in the development of a bacterial or viral population.
