Even the mean of observations taken from a mixture of distributions (males and females, tall Zulu and short Bantu)—visualize a distribution curve resembling a camel with multiple humps—will have a normal distribution if the sample size is large enough. Of course, this mean (or even the median) conceals the fact that the sample was taken from a mixture of distributions.
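The claim above can be checked with a small simulation. The sketch below is illustrative only: the two-humped population (two normal components centered at 160 and 180), the sample size of 50, and all function names are assumptions chosen for the example, not values from the text. It compares the fraction of values falling within one standard deviation of the mean, which is about 68% for a normal distribution.

```python
import random
import statistics


def draw_mixture(rng):
    """One observation from a hypothetical two-humped mixture."""
    if rng.random() < 0.5:
        return rng.gauss(160.0, 5.0)
    return rng.gauss(180.0, 5.0)


def sample_means(n, reps, seed=0):
    """Means of `reps` independent samples of size `n` from the mixture."""
    rng = random.Random(seed)
    return [
        statistics.fmean([draw_mixture(rng) for _ in range(n)])
        for _ in range(reps)
    ]


def fraction_within_one_sd(values):
    """Share of values within one standard deviation of their mean."""
    center = statistics.fmean(values)
    spread = statistics.stdev(values)
    return sum(abs(v - center) <= spread for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(1)
    raw = [draw_mixture(rng) for _ in range(4000)]
    means = sample_means(n=50, reps=4000, seed=2)
    print("raw observations:", round(fraction_within_one_sd(raw), 3))
    print("sample means    :", round(fraction_within_one_sd(means), 3))
```

The raw observations should fall noticeably short of the 68% mark, while the sample means should come close to it, even though nothing in the means reveals the two humps.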
If the underlying distribution is not symmetric, the use of the ± SE notation can be deceptive because it suggests a nonexistent symmetry. For samples from nonsymmetric distributions of size 6 or less, tabulate the minimum, the median, and the maximum. For samples of size 7 and up, consider using a box and whiskers plot as in Figure 7.3. For samples of size 16 and up, the bootstrap (described in Chapters 4 and 5) may provide the answer you need.
As in Chapters 4 and 5, we would treat the original sample as a stand-in for the population and resample from it repeatedly, 1000 times or so, with replacement, computing the sample statistic each time to obtain a distribution similar to that depicted in Figure 7.4. To provide an interpretation compatible with that given the standard error when used with a sample from a normally distributed population, we would want to report the values of the 16th and 84th percentiles of the bootstrap distribution along with the sample statistic.
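A minimal sketch of this procedure follows. The function names, the simple rank-based percentile rule, and the example data are assumptions made for illustration; other percentile conventions would give slightly different endpoints.

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, n_resamples=1000, seed=0):
    """Sorted values of `statistic` over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )


def percentile(sorted_values, p):
    """Value at the p-th percentile of an already sorted list (rank-based)."""
    index = round(p / 100 * (len(sorted_values) - 1))
    return sorted_values[index]


def bootstrap_16_84(sample, statistic=statistics.median,
                    n_resamples=1000, seed=0):
    """Return (sample statistic, 16th percentile, 84th percentile)."""
    dist = bootstrap_distribution(sample, statistic, n_resamples, seed)
    return statistic(sample), percentile(dist, 16), percentile(dist, 84)


if __name__ == "__main__":
    # Hypothetical right-skewed sample of size 20.
    data = [0.4, 0.7, 0.9, 1.0, 1.1, 1.3, 1.4, 1.6, 1.8, 2.0,
            2.3, 2.7, 3.1, 3.8, 4.6, 5.9, 7.5, 9.8, 14.2, 23.0]
    estimate, low, high = bootstrap_16_84(data)
    print(f"median = {estimate}, 16th = {low}, 84th = {high}")
```

Unlike the ± SE notation, the two reported percentiles need not sit at equal distances from the estimate, so any asymmetry in the sampling distribution remains visible.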
When the estimator is other than the mean, we cannot count on the Central Limit Theorem to ensure a symmetric sampling distribution. We recommend you use the bootstrap whenever you report an estimate of a ratio or dispersion.
If you possess some prior knowledge of the shape of the population distribution, you should take advantage of that knowledge by using a parametric bootstrap (see Chapter 4). The parametric bootstrap is particularly recommended for use in determining the precision of percentiles in the tails (P20, P10, P90, and so forth).
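As an illustration, the sketch below applies a parametric bootstrap to the 90th percentile. It assumes, purely for the example, that the population is known to be exponential; the function names and the data are hypothetical. The model is fitted to the sample, new samples of the same size are generated from the fitted model rather than from the data, and P90 is recomputed each time.

```python
import math
import random
import statistics


def sample_p90(data):
    """Sample 90th percentile: the last of the nine decile cut points."""
    return statistics.quantiles(data, n=10)[-1]


def fitted_p90(sample):
    """P90 of the exponential model fitted to the sample: mean * ln(10)."""
    return statistics.fmean(sample) * math.log(10)


def parametric_bootstrap_p90(sample, n_resamples=2000, seed=0):
    """Return (sample P90, 16th percentile, 84th percentile).

    Assumption: the population is exponential, so the fitted rate is
    1 / sample mean and resamples are drawn from that fitted model.
    """
    rng = random.Random(seed)
    n = len(sample)
    rate = 1.0 / statistics.fmean(sample)
    replicates = sorted(
        sample_p90([rng.expovariate(rate) for _ in range(n)])
        for _ in range(n_resamples)
    )
    low = replicates[round(0.16 * (n_resamples - 1))]
    high = replicates[round(0.84 * (n_resamples - 1))]
    return sample_p90(sample), low, high


if __name__ == "__main__":
    # Hypothetical waiting times.
    times = [0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 1.1, 1.4, 1.7, 2.0,
             2.4, 2.9, 3.5, 4.3, 5.6, 7.9]
    print(parametric_bootstrap_p90(times))
```

With only a handful of observations beyond P90, resampling the data themselves gives a coarse picture of the tail; drawing from a fitted model smooths that picture, at the cost of depending on the model being right.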
Before interpreting and commenting on p values, it’s well to remember that in contrast to the significance level, the p value is a random variable that varies from sample to sample. There may be highly significant differences between two populations and yet the samples taken from those populations and the resulting p value may not reveal that difference. Consequently, it is not appropriate for us to compare the p values from two distinct experiments, or from tests on two variables measured in the same experiment, and declare that one is more significant than the other.
If in advance of examining the data we agree that we will reject the hypothesis if the p value is less than 5%, then our significance level is 5%. Whether our p value proves to be 4.9% or 1% or 0.001%, we will come to the same conclusion. One set of results is not more significant than another; it is only that the difference we uncovered was measurably more extreme in one set of samples than in another.
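The sample-to-sample variability of the p value is easy to see by simulation. The sketch below repeats the same hypothetical experiment many times: two normal populations whose means truly differ by half a standard deviation, 20 observations from each, and a two-sided z test with known standard deviation. All of these settings and names are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(x, y, sigma):
    """Two-sided p value for equal means, with known common sigma."""
    se = sigma * (1 / len(x) + 1 / len(y)) ** 0.5
    z = (fmean(x) - fmean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_p_values(delta=0.5, sigma=1.0, n=20, reps=1000, seed=0):
    """p values from `reps` repetitions of the same two-sample experiment."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(reps):
        x = [rng.gauss(delta, sigma) for _ in range(n)]
        y = [rng.gauss(0.0, sigma) for _ in range(n)]
        p_values.append(z_test_p_value(x, y, sigma))
    return p_values


if __name__ == "__main__":
    ps = simulate_p_values()
    print("smallest p:", min(ps))
    print("largest p :", max(ps))
    print("share below 0.05:", sum(p < 0.05 for p in ps) / len(ps))
```

The populations never change, yet the p values range from very small to quite large, which is why ranking two experiments by their p values says little about which effect is stronger.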
p values need not reflect the strength of a relationship. Duggan and Dean reviewed 45 articles that had appeared in sociology journals between 1955 and 1965 in which the chi-square statistic and distribution had been employed in the analysis of 3 x 3 contingency tables and compared the resulting p values with association as measured by Goodman and Kruskal’s gamma. Table 7.1 summarizes their findings.
p values derived from tables are often crude approximations, particularly for small samples and tests based on a specific distribution. They and the stated significance level of our test may well be in error.
The vast majority of p values produced by parametric tests based on the normal distribution are approximations. If the data are “almost” normal, the associated p values will be almost correct. As noted in Chapter 6, the stated significance values for Student’s t are very close to exact. Of course a stated p value of 4.9% might really prove to be 5.1% in practice. The significance values associated with the F statistic can be completely inaccurate for non-normal data (1% rather than 10%). And the p values derived from the chi-square distribution for use with contingency tables also can be off by an order of magnitude.

TABLE 7.1 p-Value and Association

                     gamma
p value     <.30    .30-.70    >.70
<.01          8        11        5
.05           7         0        0
>.10          8         0        0
The good news is that there exists a class of tests, the permutation tests described in Chapter 5, for which the significance levels are exact if the observations are independent and identically distributed under the null hypothesis or their labels are otherwise exchangeable.
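For small samples the permutation distribution can be enumerated completely. The sketch below is one simple instance, a two-sided, two-sample comparison of means; the function name and the choice of test statistic are assumptions for illustration, not a prescription from Chapter 5.

```python
from itertools import combinations


def permutation_p_value(x, y):
    """Exact two-sided permutation p value for a difference in means.

    Every way of relabeling the pooled observations into groups of the
    original sizes is enumerated, so this is practical only for small samples.
    """
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    total = sum(pooled)
    observed = abs(sum(x) / n_x - sum(y) / n_y)
    as_extreme = 0
    n_relabelings = 0
    for indices in combinations(range(len(pooled)), n_x):
        sum_x = sum(pooled[i] for i in indices)
        diff = abs(sum_x / n_x - (total - sum_x) / n_y)
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            as_extreme += 1
        n_relabelings += 1
    return as_extreme / n_relabelings


if __name__ == "__main__":
    # Hypothetical measurements from two small groups.
    treated = [12.1, 14.3, 15.0, 16.8, 18.2]
    control = [9.4, 10.2, 11.7, 12.5, 13.9]
    print(permutation_p_value(treated, control))
```

Because the p value is a count over equally likely relabelings, it does not rest on an assumed distributional form, only on the exchangeability of the labels under the null hypothesis.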
If p values are misleading, what are we to use in their place? Jones [1955, p. 407] was among the first to suggest that “an investigator would be misled less frequently and would be more likely to obtain the information he seeks were he to formulate his experimental problems in terms of the estimation of population parameters, with the establishment of confidence intervals about the estimated values, rather than in terms of a null hypothesis against all possible alternatives.” See also Gardner and Altman and Poole.