# Common Errors in Statistics and How to Avoid Them - Good P.I


The limiting distribution for very large samples of a sample statistic such as the mean or the number of events in a large number of very small intervals often tends to a distribution of known form such as the Gaussian for the mean or the Poisson for the number of events.

Be wary of choosing a statistical procedure that is optimal only for a limiting distribution and not when applied to a small sample. For a small sample, the empirical distribution may be a better guide.
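
The warning above can be checked by simulation. The following is a minimal sketch, not taken from the text: it assumes Exponential(1) data (a skewed distribution chosen only for illustration), and the function name `tail_frequencies`, the sample sizes, and the seed are all hypothetical choices. It counts how often the standardized sample mean falls beyond the Gaussian 5% cutoffs in each tail.

```python
import math
import random

def tail_frequencies(n, reps=10_000, z=1.645, seed=0):
    """Fraction of standardized sample means of Exponential(1) data that fall
    below -z and above +z. The Gaussian limit predicts about 0.05 for each."""
    rng = random.Random(seed)
    lower = upper = 0
    for _ in range(reps):
        mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        # Exponential(1) has mean 1 and standard deviation 1,
        # so the sample mean has standard error 1 / sqrt(n).
        standardized = (mean - 1.0) * math.sqrt(n)
        lower += standardized < -z
        upper += standardized > z
    return lower / reps, upper / reps

small = tail_frequencies(n=5)     # small sample (illustrative size)
large = tail_frequencies(n=200)   # larger sample (illustrative size)
print("n=5:   lower, upper tail frequencies =", small)
print("n=200: lower, upper tail frequencies =", large)
```

With the small sample the two tail frequencies should differ noticeably from each other and from 0.05, whereas with the larger sample both should be close to the Gaussian value. A procedure calibrated on the limiting distribution would therefore misstate its error rates for the small sample.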

HYPOTHESIS, NULL HYPOTHESIS, ALTERNATIVE

The dictionary definition of a hypothesis is a proposition, or set of propositions, put forth as an explanation for certain phenomena.

For statisticians, a simple hypothesis would be that the distribution from which an observation is drawn takes a specific form. For example, F [x] is N(0,1). In the majority of cases, a statistical hypothesis will be compound rather than simple—for example, that the distribution from which an observation is drawn has a mean of zero.

Often, it is more convenient to test a null hypothesis—for example, that there is no or null difference between the parameters of two populations.

There is no point in performing an experiment or conducting a survey unless one also has one or more alternate hypotheses in mind.

PARAMETRIC, NONPARAMETRIC, AND SEMIPARAMETRIC MODELS

Models can be subdivided into two components, one systematic and one random. The systematic component can be a function of certain predetermined parameters (a parametric model), be parameter-free (nonparametric), or be a mixture of the two types (semiparametric). The definitions in the following section apply to the random component.

188 GLOSSARY, GROUPED BY RELATED BUT DISTINCT TERMS

PARAMETRIC, NONPARAMETRIC, AND SEMIPARAMETRIC STATISTICAL PROCEDURES

Parametric statistical procedures concern the parameters of distributions of a known form. One may want to estimate the variance of a normal distribution or the number of degrees of freedom of a chi-square distribution. Student t, the F ratio, and maximum likelihood are typical parametric procedures.

Nonparametric procedures concern distributions whose form is unspecified. One might use a nonparametric procedure like the bootstrap to obtain an interval estimate for a mean or a median or to test that the distributions of observations drawn from two different populations are the same. Nonparametric procedures are often referred to as distribution-free, though not all distribution-free procedures are nonparametric in nature.
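
As an illustration of a bootstrap interval estimate for a median, here is a minimal sketch of the simple percentile method, which is only one of several bootstrap interval constructions. The function name `bootstrap_percentile_interval`, the 90% level, the number of resamples, and the data values are all hypothetical and made up for this example.

```python
import random
import statistics

def bootstrap_percentile_interval(data, stat=statistics.median,
                                  reps=2000, level=0.90, seed=0):
    """Percentile bootstrap interval: resample the data with replacement,
    recompute the statistic each time, and read off the empirical quantiles."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(reps))
    alpha = (1.0 - level) / 2.0
    low = estimates[int(alpha * reps)]
    high = estimates[int((1.0 - alpha) * reps) - 1]
    return low, high

# Made-up observations, used only to exercise the function.
data = [2.1, 3.4, 1.9, 5.6, 2.8, 4.1, 3.3, 9.7, 2.5, 3.0, 4.4, 2.2]
low, high = bootstrap_percentile_interval(data)
print("sample median:", statistics.median(data))
print("90% percentile bootstrap interval:", (low, high))
```

No distributional form is assumed anywhere in the computation. The only input is the sample itself, which is what makes the procedure nonparametric.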

Semiparametric statistical procedures concern the parameters of distributions whose form is not specified. Permutation methods and U statistics are typically employed in a semiparametric context.
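
A permutation method of this kind can be sketched as follows. This is an assumed, minimal two-sample version that compares means by random relabeling. The function name `permutation_p_value`, the number of rearrangements, and the toy samples are hypothetical and not drawn from the text.

```python
import random
import statistics

def permutation_p_value(x, y, reps=5000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.
    Under the hypothesis that both samples come from the same distribution,
    every relabeling of the pooled observations is equally likely."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
        extreme += diff >= observed
    # Count the observed arrangement itself so the p value is never zero.
    return (extreme + 1) / (reps + 1)

# Toy samples, made up for illustration.
p_same = permutation_p_value([1, 2, 3], [1, 2, 3])
p_far = permutation_p_value([1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16])
print("identical samples:", p_same)
print("well-separated samples:", p_far)
```

The test concerns a parameter (the difference in means) while leaving the form of the underlying distribution unspecified.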

SIGNIFICANCE LEVEL AND p VALUE

The significance level is the probability of making a Type I error. It is a characteristic of a statistical procedure.

The p value is a random variable that depends both upon the sample and the statistical procedure that is used to analyze the sample.

If one repeatedly applies a statistical procedure at a specific significance level to distinct samples taken from the same population when the hypothesis is true and all assumptions are satisfied, then the p value will be less than or equal to the significance level with the frequency given by the significance level.
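
The repeated-sampling statement above can be verified by simulation. The sketch below assumes a two-sided z test of a zero mean with known unit variance, applied to normal samples of size 20. The test, the sample size, and the names `z_test_p_value` and `rejection_frequency` are illustrative assumptions rather than anything prescribed in the text.

```python
import math
import random
from statistics import NormalDist

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p value for the hypothesis mean = mu0 when sigma is known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def rejection_frequency(true_mean, alpha=0.05, n=20, reps=10_000, seed=0):
    """Share of independent samples whose p value is at most alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        rejections += z_test_p_value(sample) <= alpha
    return rejections / reps

# The hypothesis (mean = 0) is true and all assumptions hold.
observed = rejection_frequency(true_mean=0.0)
print("frequency of p <= 0.05 under the hypothesis:", observed)
```

The printed frequency should lie close to 0.05, differing only by simulation error. Each individual p value varies from sample to sample, which is the sense in which it is a random variable while the significance level is a fixed property of the procedure.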

TYPE I AND TYPE II ERROR

A Type I error is made when the hypothesis is rejected although it is true. A Type II error is made when the hypothesis is accepted although an alternative hypothesis is true. Thus, the probability of a Type II error depends on the alternative.


TYPE II ERROR AND POWER

The power of a test for a given alternative hypothesis is the probability of rejecting the original hypothesis when the alternative is true. A Type II error is made when the original hypothesis is accepted even though the alternative is true. Thus, power is one minus the probability of making a Type II error.
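
The relation between power and Type II error can be illustrated numerically. The sketch below assumes a one-sided z test of a zero mean with known unit variance and one specific alternative (a true mean of 0.5 with 20 observations). These values and the name `accepts_hypothesis` are hypothetical choices made for the example.

```python
import math
import random
from statistics import NormalDist

ALPHA, N, SHIFT = 0.05, 20, 0.5            # illustrative choices
CRITICAL = NormalDist().inv_cdf(1.0 - ALPHA)

def accepts_hypothesis(sample):
    """One-sided z test of mean = 0 against mean > 0 with known sigma = 1.
    Returns True when the original hypothesis is accepted."""
    z = sum(sample) / len(sample) * math.sqrt(len(sample))
    return z <= CRITICAL

rng = random.Random(0)
reps = 10_000
# Every sample is drawn under the alternative, so each acceptance is a Type II error.
type_ii = sum(
    accepts_hypothesis([rng.gauss(SHIFT, 1.0) for _ in range(N)])
    for _ in range(reps)
) / reps
simulated_power = 1.0 - type_ii

# For this test the power has a closed form: 1 - Phi(critical - shift * sqrt(n)).
analytic_power = 1.0 - NormalDist().cdf(CRITICAL - SHIFT * math.sqrt(N))
print("estimated Type II error probability:", type_ii)
print("simulated power:", simulated_power, "analytic power:", analytic_power)
```

Changing `SHIFT` changes both quantities, which shows that power and the Type II error probability are defined only relative to a particular alternative.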


