With the bootstrap, the sample acts as a surrogate for the population. Each time we draw a pair of bootstrap samples from the original sample, we compute the difference in means. After drawing a succession of such samples, we'll have some idea of what the distribution of the difference in means would be were we to take repeated pairs of samples from the population itself.
As a general rule, resampling should reflect the null hypothesis, according to Young and to Hall and Wilson. Thus, in contrast to the bootstrap procedure used in estimation (see Chapter 3), each pair of bootstrap samples should be drawn from the combined sample taken from
4 For a discussion of these latter, see Brockwell and Davis .
the two treatment groups. Under the null hypothesis, this will not affect the results; under an alternative hypothesis, the two bootstrap sample means will be closer together than they would if drawn separately from the two populations. The difference in means between the two samples that were drawn originally should stand out as an extreme value.
Hall and Wilson also recommend that the bootstrap be applied only to statistics that, for very large samples, will have distributions that do not depend on any unknowns.⁵ In the present example, Hall and Wilson recommend the use of the t statistic, rather than the simple difference of means, as leading to a test that is both closer to exact and more powerful.
Suppose we draw several hundred such bootstrap samples with replacement from the combined sample and compute the t statistic each time. We would then compare the original value of the test statistic, Student's t in this example, with the resulting bootstrap distribution to determine what decision to make.
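The procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' own implementation: the function names `student_t` and `pooled_bootstrap_p_value`, the default of 999 resamples, the two-sided comparison, and the "plus one" p-value convention are all choices made here for the example.

```python
import math
import random
import statistics


def student_t(x, y):
    """Pooled-variance two-sample t statistic; None if the pooled variance is zero."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    if pooled_var == 0:
        return None  # degenerate sample: every value identical
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(
        pooled_var * (1 / nx + 1 / ny))


def pooled_bootstrap_p_value(x, y, n_boot=999, seed=0):
    """Two-sided bootstrap p-value, resampling both groups from the combined sample."""
    rng = random.Random(seed)
    combined = list(x) + list(y)  # resampling reflects the null hypothesis
    t_obs = student_t(x, y)
    if t_obs is None:
        raise ValueError("observed samples have zero pooled variance")
    extreme = valid = 0
    for _ in range(n_boot):
        # Draw each bootstrap group with replacement from the combined sample.
        t_boot = student_t(rng.choices(combined, k=len(x)),
                           rng.choices(combined, k=len(y)))
        if t_boot is None:
            continue  # skip degenerate resamples
        valid += 1
        if abs(t_boot) >= abs(t_obs):
            extreme += 1
    # Count the observed statistic itself among the resampled values (assumption).
    return (extreme + 1) / (valid + 1)
```

A small p-value indicates that the original t statistic stands out as an extreme value relative to the bootstrap distribution; the result would be compared with a significance level fixed in advance.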
Pairwise Dependence. If the covariances are the same for each pair of observations, then the permutation test described previously is an exact test if the observations are normally distributed (Lehmann, 1986) and is almost exact otherwise.
Even if the covariances are not equal, if the covariance matrix is nonsingular, we may use the inverse of this covariance matrix to transform the original (dependent) variables to independent (and hence exchangeable) variables. After this transformation, the assumptions are satisfied so that a permutation test can be applied. This result holds even if the variables are collinear. Let R denote the rank of the covariance matrix in the singular case. Then there exists a projection onto an R-dimensional subspace in which R normal random variables are independent. So if we have an N-dimensional (N > R) correlated and singular multivariate normal distribution, there exists a set of R linear combinations of the original N variables such that the R linear combinations are each univariate normal and independent.
The preceding is only of theoretical interest unless we have some independent source from which to obtain an estimate of the covariance matrix. If we use the data at hand to estimate the covariances, the estimates will be interdependent and so will the transformed observations.
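For the nonsingular case, the transformation can be illustrated as follows. This sketch assumes the covariance matrix is known from an independent source, as the caveat above requires, and it uses the Cholesky factor L (with covariance = L Lᵀ) as one common way to build the transform; the text does not prescribe a particular factorization, and the helper names `cholesky` and `whiten` are hypothetical.

```python
import math


def cholesky(sigma):
    """Lower-triangular L with L * L^T == sigma (sigma symmetric, positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def whiten(x, L):
    """Solve L z = x by forward substitution, so that z = L^{-1} x."""
    z = []
    for i in range(len(x)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((x[i] - s) / L[i][i])
    return z


# Assumed (known) covariance matrix of a pair of dependent observations.
sigma = [[4.0, 2.0],
         [2.0, 3.0]]
L = cholesky(sigma)
z = whiten([1.0, 2.0], L)  # transformed observation vector
```

If x has covariance matrix L Lᵀ, then z = L⁻¹x has the identity matrix as its covariance, so the components of z are uncorrelated and, under normality, independent.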
Moving Average or Autoregressive Process. These cases are best treated by the same methods, and are subject to the same caveats, as described in Part 3 of this text.
5 Such statistics are termed asymptotically pivotal.
Group Randomized Trials.⁶ Group randomized trials (GRTs) in public health research typically use a small number of randomized groups with a relatively large number of participants per group. Typically, some naturally occurring groups are targeted: work sites, schools, clinics, neighborhoods, even entire towns or states. A group can be assigned to either the intervention or control arm but not both; thus, the group is nested within the treatment. This contrasts with the approach used in multicenter clinical trials, in which individuals within groups (treatment centers) may be assigned to any treatment.
GRTs are characterized by a positive correlation of outcomes within a group, along with a small number of groups. There is positive intraclass correlation (ICC) between the individuals' target-behavior outcomes within the same group. This can be due in part to the differences in characteristics between groups, to the interaction between individuals within the same group, or (in the presence of interventions) to commonalities of the intervention experienced by an entire group. Although the size of the ICC in GRTs is usually very small (e.g., in the Working Well Trial, between 0.01 and 0.03 for the four outcome variables at baseline), its impact on the design and analysis of GRTs is substantial.
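The sampling variance of the average response in a group of n individuals is $(\sigma^2/n)[1 + (n - 1)\rho]$, and that of the treatment average with k groups and n individuals per group is $(\sigma^2/(nk))[1 + (n - 1)\rho]$, not the traditional $\sigma^2/n$ and $\sigma^2/(nk)$, respectively, for uncorrelated data. Here $\sigma^2$ denotes the variance of an individual response and $\rho$ the ICC.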
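To see why even a very small ICC matters, the variance inflation factor 1 + (n - 1) × ICC can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, and the group size of 100, the 10 groups, and the unit variance are assumed values chosen for the example; only the ICC of 0.02 falls within the range quoted above.

```python
def design_effect(n, icc):
    """Variance inflation factor 1 + (n - 1) * icc for groups of size n."""
    return 1.0 + (n - 1) * icc


def var_group_mean(sigma2, n, icc):
    """Sampling variance of the mean response in one group of n individuals."""
    return sigma2 / n * design_effect(n, icc)


def var_treatment_mean(sigma2, n, k, icc):
    """Sampling variance of a treatment mean over k groups of n individuals each."""
    return sigma2 / (n * k) * design_effect(n, icc)


# Assumed values: 10 groups of 100 individuals, unit variance, ICC = 0.02.
inflation = design_effect(100, 0.02)
naive = 1.0 / (100 * 10)                         # variance if data were uncorrelated
actual = var_treatment_mean(1.0, 100, 10, 0.02)  # variance allowing for the ICC
```

Under these assumed values the inflation factor is about 2.98, so an analysis that ignored the within-group correlation would understate the variance of the treatment mean by nearly a factor of three.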