“The factor 1 + (n - 1)a is called the variance inflation factor (VIF), or design effect. Although a in GRTs is usually quite small, the VIFs could still be quite large because VIF is a function of the product of the correlation and group size n.”
“For example, in the Working Well Trial, with a = 0.03 for daily number of fruit and vegetable servings, and an average of 250 workers per work site, VIF = 8.5. In the presence of this deceivingly small ICC, an 8.5-fold increase in the number of participants is required in order to maintain the same statistical power as if there were no positive correlation. Ignoring the VIF in the analysis would lead to incorrect results: variance estimates for group averages that are too small.”
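The arithmetic behind the quoted figure can be checked with a minimal sketch. The function names are illustrative and do not come from the source. The effective-sample-size helper simply divides by the VIF, which assumes equal group sizes.

```python
def variance_inflation_factor(group_size: float, icc: float) -> float:
    """Design effect 1 + (n - 1) * ICC for groups of n members each."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return 1.0 + (group_size - 1.0) * icc


def effective_sample_size(total_participants: float, group_size: float, icc: float) -> float:
    """Number of independent observations carrying the same information
    (assumes all groups have the same size)."""
    return total_participants / variance_inflation_factor(group_size, icc)


# Values quoted in the text: ICC = 0.03, about 250 workers per work site.
vif = variance_inflation_factor(250, 0.03)
print(round(vif, 2))  # 8.47, which the text rounds to 8.5
```

Even an ICC of 0.03 leaves each 250-person work site worth only about 30 independent observations, which is why an analysis that ignores the VIF understates the variance.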
To be appropriate, an analysis method for GRTs needs to acknowledge both the ICC and the relatively small number of groups. Three primary approaches are used (Table 5.2):
1. Generalized Linear Mixed Models (GLMM). This approach, implemented in SAS Macro GLIMMIX and SAS PROC MIXED, relies on an assumption of normality.
6 This section has been abstracted (with permission from Annual Reviews) from Feng et al., from whom all quotes in this section are taken.
2. Generalized Estimating Equations (GEE). Again, this approach assumes asymptotic normality for conducting inference, a good approximation only when the number of groups is large.
3. Randomization-Based Inference. Unequal-sized groups will result in unequal variances of treatment means resulting in misleading p values. To be fair, “Gail et al. demonstrate that in GRTs, the permutation test remains valid (exact or near exact in nominal levels) under almost all practical situations, including unbalanced group sizes, as long as the number of groups are equal between treatment arms or equal within each block if blocking is used.”
The drawbacks of all three methods, including randomization-based inference if corrections are made for covariates, are the same as those for other methods of regression as detailed in Chapters 8 and 9.
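Randomization-based inference for a matched-pair GRT can be sketched as an exact sign-flip test on within-pair differences of group-level summaries. This is a minimal illustration, not the SAS program used by Feng et al. The function name and the data are hypothetical, and the sketch makes no covariate adjustment.

```python
from itertools import product


def paired_permutation_test(differences):
    """Exact two-sided sign-flip permutation test.

    differences: one value per matched pair, e.g. (treated work-site mean)
    minus (control work-site mean). Under the null hypothesis the treatment
    labels within a pair are exchangeable, so each difference is equally
    likely to carry either sign.
    """
    observed = abs(sum(differences))
    extreme = 0
    total = 0
    # Enumerate all 2**K relabelings; K = 13 pairs gives 8192 of them.
    for signs in product((1, -1), repeat=len(differences)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, differences)))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return extreme / total


# Hypothetical within-pair differences for 13 matched pairs of work sites.
diffs = [-0.12, -0.05, 0.03, -0.20, -0.08, 0.10, -0.15,
         -0.02, 0.04, -0.11, -0.07, 0.01, -0.09]
print(paired_permutation_test(diffs))
```

Because the unit being permuted is the pair of work sites rather than the individual participant, the reference distribution reflects the small number of groups and not the much larger number of participants.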
TABLE 5.2 Comparison of Different Analysis Methods for Inference on Treatment Effect βᵃ

Method                       10² β̂ (10² SE)    p Value    ρ
GLIM (independent)           -6.9 (2.0)        0.0006
GEE (exchangeable)           -6.8 (2.4)        0.0052     0.0048
GLMM (random intercept)      -6.7 (2.6)        0.023      0.0077
  df = 12ᵇ
Permutation                  -6.1 (3.4)        0.095
t test (group level)         -6.1 (3.4)        0.098
Permutation (residual)       -6.3 (2.9)        0.052

GLIM (independent)           -7.8 (12)         0.53
GEE (exchangeable)           -6.2 (20)         0.76       0.0185
GLMM (random intercept)      -13 (21)          0.55       0.020
  df = 12ᵇ
Permutation                  -12 (27)          0.66
t test (group level)         -12 (27)          0.66
Permutation (residual)       -13 (20)          0.53
a Using Seattle 5-a-day data with 26 work sites (K = 13) and an average of 87 (n, ranges from 47 to 105) participants per work site. The dependent variables are ln(daily servings of fruit and vegetable + 1) and smoking status. The study design is matched pair, with two cross-sectional surveys at baseline and 2-year follow-up. Pairs identification, work sites nested within treatment, intervention indicator, and baseline work-site mean fruit-and-vegetable intake are included in the model. Pairs and work sites are random effects in GLMM (generalized linear mixed models). We used SAS PROC GENMOD for GLIM (linear regression and generalized linear models) and GEE (generalized estimating equations) (logistic model for smoking data) and SAS PROC MIXED (for fruit/vegetable data) or GLIMMIX (logistic regression for smoking data) for GLMM; permutation tests (logit for smoking data) were programmed in SAS.
b Degrees of freedom (df) = 2245 in SAS output if work site is not defined as being nested within treatment.
Source: Reprinted with permission from the Annual Review of Public Health Volume 22, © 2001 by Annual Reviews. Feng et al.
CHAPTER 5 TESTING HYPOTHESES: CHOOSING A TEST STATISTIC 59
Nonsystematic Dependence. If the observations are interdependent and fall into none of the preceding categories, then the experiment is fatally flawed. Your efforts would be best expended on the design of a cleaner experiment. Or, as J. W. Tukey remarked on more than one occasion, “If a thing is not worth doing, it is not worth doing well.”
Testing for the equality of the variances of two populations is a classic problem with many not-quite-exact, not-quite-robust, not-quite-powerful-enough solutions. Sukhatme lists four alternative approaches and adds a fifth of his own; Miller lists 10 alternatives and compares four of these with a new test of his own; Conover, Johnson, and Johnson list and compare 56 tests; and Balakrishnan and Ma list and compare nine tests with one of their own.
None of these tests proves satisfactory in all circumstances, because each requires that two or more of the following four conditions be satisfied:
1. The observations are normally distributed.