Similarly, one doesn’t just mail out several thousand copies of a survey before performing an initial pilot study to weed out or correct ambiguous and misleading questions.
The following groups are unlikely to yield identically distributed observations: the first to respond to a survey, those who respond only after being offered an inducement, and nonresponders.
Statisticians have found three ways of coping with individual-to-individual and observer-to-observer variation:
1. Controlling. Making the environment for the study—the subjects, the manner in which the treatment is administered, the manner in which the observations are obtained, the apparatus used to make the measurements, and the criteria for interpretation—as uniform and homogeneous as possible.
2. Blocking. A clinician might stratify the population into subgroups based on such factors as age, sex, race, and the severity of the condition, and restrict comparisons to individuals who belong to the same subgroup. An agronomist would want to stratify on the basis of soil composition and environment.
3. Randomizing. Randomly assigning patients to treatment within each subgroup so that the innumerable factors that can neither be controlled nor observed directly are as likely to influence the outcome of one treatment as another.
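Blocking and randomizing can be combined in a few lines of code. The following is a minimal sketch, not a prescribed procedure: the function name `block_randomize`, the patient identifiers, and the two blocking factors (sex and age group) are all hypothetical, chosen only for illustration. Subjects are shuffled separately within each block and then dealt out to the treatments in turn, so that each block contributes nearly equal numbers to every arm.

```python
import random
from collections import defaultdict

def block_randomize(subjects, block_of, treatments=("control", "treatment"), seed=None):
    """Randomly assign subjects to treatments separately within each block.

    subjects   -- iterable of hashable subject identifiers
    block_of   -- function mapping a subject to its block (stratum) label
    treatments -- names of the arms; subjects are dealt to them in rotation
    seed       -- optional seed so that an allocation can be reproduced
    """
    rng = random.Random(seed)

    # Group the subjects by block.
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of(s)].append(s)

    # Shuffle within each block, then deal the shuffled subjects to the arms.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        for i, s in enumerate(shuffled):
            assignment[s] = treatments[i % len(treatments)]
    return assignment

# Hypothetical patients, each labeled with (sex, age group).
patients = {
    "p01": ("F", "under 50"), "p02": ("F", "under 50"),
    "p03": ("F", "under 50"), "p04": ("F", "under 50"),
    "p05": ("M", "50 and over"), "p06": ("M", "50 and over"),
    "p07": ("M", "50 and over"), "p08": ("M", "50 and over"),
}

allocation = block_randomize(list(patients), lambda p: patients[p], seed=1)
for p in sorted(allocation):
    print(p, patients[p], allocation[p])
```

Because the shuffle is driven by a random number generator rather than by the experimenter or the subject, neither has any say in who receives which treatment.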
Steps 1 and 2 are trickier than they appear at first glance. Do the phenomena under investigation depend upon the time of day as with body temperature and the incidence of mitosis? Do they depend upon the day of the week as with retail sales and the daily mail? Will the observations be affected by the sex of the observer? Primates (including you) and hunters (tigers, mountain lions, domestic cats, dogs, wolves, and so on) can readily detect the observer’s sex.4
Blocking may be mandatory because even a randomly selected sample may not be representative of the population as a whole. For example, if a minority comprises less than 10% of a population, then a jury of 12 persons selected at random from that population will fail to contain a single member of that minority at least 28% of the time.
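The 28% figure can be checked directly. The sketch below assumes the population is large enough that the 12 jurors can be treated as independent draws, each with the same chance of belonging to the minority; the function name `prob_no_minority` is ours, introduced only for this illustration.

```python
def prob_no_minority(minority_share, jury_size=12):
    """Probability that a randomly drawn jury contains no minority member.

    Assumes independent draws from a large population, so each juror
    is a non-member with probability (1 - minority_share).
    """
    return (1 - minority_share) ** jury_size

# With a minority share of exactly 10%, the chance is 0.9 ** 12.
print(round(prob_no_minority(0.10), 4))  # 0.2824
```

A smaller minority share only makes the exclusion more likely, which is why the text says "at least" 28%.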
Groups to be compared may differ in other important ways even before any intervention is applied. These baseline imbalances cannot be attributed to the interventions, but they can interfere with and overwhelm the comparison of the interventions.
One good after-the-fact solution is to break the sample itself into strata (men, women, Hispanics) and to extrapolate separately from each stratum to the corresponding subpopulation from which the stratum is drawn.
The size of the sample we take from each block or stratum need not, and in some instances should not, reflect the block’s proportion in the population. The exception arises when we wish to obtain separate estimates for each subpopulation. For example, suppose we are studying the health of Marine recruits and wish to obtain separate estimates for male and female Marines as well as for Marines as a group. If we want to establish the incidence of a relatively rare disease, we will need to oversample female recruits to ensure that we obtain a sufficiently large number. To obtain a rate $R$ for all Marines, we would then take the weighted average $R = p_F R_F + p_M R_M$ of the separate rates for each gender, where the proportions $p_M$ and $p_F$ are those of males and females in the entire population of Marine recruits.
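The weighted average is easy to compute once the separate rates are in hand. In the sketch below, every number is invented purely for illustration (the sample sizes, the case counts, and the population shares of 8% and 92% are not data from any study), and the function name `overall_rate` is hypothetical.

```python
def overall_rate(stratum_rates, population_shares):
    """Combine per-stratum rates into a population rate.

    Each stratum's rate is weighted by that stratum's share of the
    population, not by its share of the (deliberately unbalanced) sample.
    """
    if set(stratum_rates) != set(population_shares):
        raise ValueError("rates and shares must cover the same strata")
    if abs(sum(population_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(population_shares[k] * stratum_rates[k] for k in stratum_rates)

# Invented example: equal-sized samples of 500, so women are oversampled.
cases = {"F": 10, "M": 5}            # made-up case counts
sampled = {"F": 500, "M": 500}       # made-up sample sizes
rates = {k: cases[k] / sampled[k] for k in cases}   # R_F = 0.02, R_M = 0.01

shares = {"F": 0.08, "M": 0.92}      # made-up population proportions p_F, p_M

weighted = overall_rate(rates, shares)                    # p_F*R_F + p_M*R_M
pooled = sum(cases.values()) / sum(sampled.values())      # ignores oversampling

print(round(weighted, 4))  # 0.0108
print(round(pooled, 4))    # 0.015
```

With these invented figures, simply pooling the two samples would overstate the overall rate, because the oversampled stratum counts for far more in the sample than it does in the population.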
In the next few sections on experimental design, we may well be preaching to the choir, for which we apologize. But there is no principle of experimental design, however obvious, and however intuitive, that someone will not argue can be ignored in his or her special situation:
• Physicians feel they should be allowed to select the treatment that will best affect their patient’s condition (but who is to know in advance what this treatment is?).
4 The hair follicles of redheads—genuine, not dyed—are known to secrete a prostaglandin similar to an insect pheromone.
• Scientists eject us from their laboratories when we suggest that only the animal caretakers be permitted to know which cage houses the control animals.
• Engineers at a firm that specializes in refurbishing medical devices objected when Dr. Good suggested that they purchase and test some new equipment for use as controls. “But that would cost a fortune.”
The statistician’s lot is not a happy one. The opposite sex ignores us because we are boring,5 and managers hate us because all our suggestions seem to require an increase in the budget. But controls will save money in the end. Blinding is essential if our results are to have credence, and care in treatment allocation is mandatory if we are to avoid bias.
Permitting treatment allocation by either experimenter or subject will introduce bias.
To guard against the unexpected, as many or more patients should be assigned to the control regimen as are assigned to the experimental one. This sounds expensive and is. But shit happens. You get the flu. You get a headache or the runs. You have a series of colds that blend one into the other until you can’t remember the last time you were well. So you blame your silicone implants. Or, if you are part of a clinical trial, you stop taking the drug. It’s in these and similar instances that experimenters are grateful they’ve included controls. This is because when the data are examined, experimenters learn that as many of the control patients came down with the flu as those who were on the active drug, and they also learn that those women without implants had exactly the same incidence of colds and headaches as those who had implants.