Learning about the parameters of a mixture
Figure 4.1.1 Two histogram representations of a set of 300 observations from a three-component normal mixture
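Table 4.1.1 Means and standard deviations of the three components
Component             1       2       3
Mean                 -2       0       1
Standard deviation    1       0.25    0.50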
54 Statistical analysis of finite mixture distributions
weights were equal and the means and standard deviations are given in Table 4.1.1.
The first thing we might look for is evidence of multimodality. Of course, the strength of such evidence depends on the fineness of the partitioning of the data by the histogram intervals. In the first histogram, in Figure 4.1.1(a), there are three modes, which happens to reflect well the form of the generating density. The number of modes doubles, however, when the interval length is halved (Figure 4.1.1(b)), despite the fact that the second histogram might be generally accepted as a reasonable compromise grouping which is neither too coarse nor too fine. It is clear that the modality of a histogram may not be a reliable guide as to the true number of components, or even modes (recall the discussion of Section 3.3.1!). Murphy (1964) tantalizes his readers with various samples of size 50 from a normal distribution. Several of them might be thought to show evidence of two or even three modes.
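The sensitivity of the mode count to the interval length is easy to explore by simulation. The following is a minimal sketch, not a reconstruction of the data behind Figure 4.1.1: the component means and standard deviations are taken as read from Table 4.1.1, while the equal weights, the seed, the interval boundaries and the helper `count_modes` are assumptions made purely for illustration.

```python
import numpy as np

def count_modes(counts):
    """Count local maxima in a sequence of histogram counts.

    Runs of equal neighbouring counts are collapsed first, so that a
    flat-topped peak is counted once.
    """
    c = np.asarray(counts)
    keep = np.concatenate(([True], np.diff(c) != 0))
    c = c[keep]
    # Pad with -1 so that a peak in the first or last interval is detected.
    padded = np.concatenate(([-1], c, [-1]))
    is_peak = (padded[1:-1] > padded[:-2]) & (padded[1:-1] > padded[2:])
    return int(np.sum(is_peak))

rng = np.random.default_rng(0)           # arbitrary seed (assumption)
means = np.array([-2.0, 0.0, 1.0])       # as read from Table 4.1.1
sds = np.array([1.0, 0.25, 0.50])        # as read from Table 4.1.1
weights = np.array([1.0, 1.0, 1.0]) / 3  # assumed for illustration only

# Draw a component label for each of the 300 observations, then the value.
labels = rng.choice(3, size=300, p=weights)
x = rng.normal(means[labels], sds[labels])

for width in (0.5, 0.25):
    edges = np.arange(-6.75, 5.25 + width / 2, width)
    counts, _ = np.histogram(x, bins=edges)
    print(f"interval width {width}: {count_modes(counts)} modes")
```

Rerunning with other seeds gives a feel for how unstable the apparent number of modes is at this sample size.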
These considerations serve to underline a lesson learned in Section 3.2. To obtain reliable inferences, particularly about k, either the components have to be very well separated or we shall need very large samples indeed. In the work to be reviewed in this section, the general underlying principle seems to be to search for areas in the sample space where the mixture density behaves like a pure component.
Figure 4.1.2 Plots of first differences (×------×) and second differences (•------•) based on Figure 4.1.1(a)
In the method of Tanner (1962), the characterization of modes as local maxima (first derivative zero, second derivative negative) and of antimodes as local minima (first derivative zero, second derivative positive) is exploited. First and second differences of histogram counts are used instead of the derivatives, as shown in Figure 4.1.2, constructed from the histogram in Figure 4.1.1(a).
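The differencing step can be sketched as follows. This is an illustrative implementation under stated assumptions rather than Tanner's own procedure: the function names are hypothetical, zeros are located by linear interpolation between neighbouring differences, and only strict sign changes are reported. First differences are placed at the interior interval boundaries and second differences at the interior interval midpoints.

```python
import numpy as np

def sign_changes(x, d):
    """Return (x0, value_before) for each strict sign change of d along x."""
    out = []
    for i in range(len(d) - 1):
        a, b = d[i], d[i + 1]
        if a * b < 0:
            # Linear interpolation between (x[i], a) and (x[i+1], b).
            x0 = x[i] + (x[i + 1] - x[i]) * a / (a - b)
            out.append((float(x0), float(a)))
    return out

def difference_landmarks(counts, edges):
    """Crude modes, antimodes and inflection points from histogram counts."""
    counts = np.asarray(counts, dtype=float)
    edges = np.asarray(edges, dtype=float)
    centers = 0.5 * (edges[:-1] + edges[1:])
    d1 = np.diff(counts)        # first differences, at interior boundaries
    d2 = np.diff(counts, n=2)   # second differences, at interior midpoints
    turning = sign_changes(edges[1:-1], d1)
    modes = [x0 for x0, a in turning if a > 0]      # + to -: local maximum
    antimodes = [x0 for x0, a in turning if a < 0]  # - to +: local minimum
    inflections = [x0 for x0, _ in sign_changes(centers[1:-1], d2)]
    return modes, antimodes, inflections

# Toy single-peaked histogram (made-up counts, not the data of Figure 4.1.1).
toy_counts = [0, 1, 4, 9, 12, 9, 4, 1, 0]
toy_edges = np.arange(0, 10)
modes, antimodes, inflections = difference_landmarks(toy_counts, toy_edges)
print(modes, antimodes, inflections)
```

For a well-isolated component, half the distance between the two inflection points on either side of a mode gives a rough standard deviation.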
The three zeros of the first-differences plot, which are marked ‘M’, give crude estimates for the modes, with approximate values (-1.9, 0, 0.9).
The distances A2B2 and A3B3 similarly estimate the distances between points of inflection on the up- and downslopes of the second and third component densities. They estimate, therefore, twice the standard deviations, giving standard deviation estimates of about
For the flatter first component the picture is not so clear. Which of the distances A11B1 and A12B1 should we look at? The two estimates of the first standard deviation are
of which the former happens to be closer to the true value.
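The reasoning behind halving these distances can be written out explicitly. For a single normal component with mean $\mu$ and standard deviation $\sigma$ (symbols introduced here for illustration), the second derivative of the density vanishes exactly one standard deviation either side of the mean:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\},
\qquad
f''(x) = f(x)\left\{\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right\},
\qquad
f''(x) = 0 \iff x = \mu \pm \sigma .
```

The two points of inflection are therefore a distance $2\sigma$ apart. In a mixture this holds only approximately, and only where the neighbouring components contribute little.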
The other two zeros of the first-differences plot lead to a partitioning of the sample and thus to estimates of the mixing weights. These are at P1 and P2, with values of about -1.2 and 0.8. It is then reasonable to ‘allocate’ all frequencies in the first seven intervals to component 1, along with a proportionate amount from the eighth interval. The obvious linearly based proportion is
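$$\frac{-1.2 - (-1.25)}{-0.75 - (-1.25)} = 0.1.$$
This gives, for $\pi_1$, an estimate of
$$\hat{\pi}_1 = (1 + 3 + 7 + 10 + 16 + 22 + 16 + 0.1\,n_8)/300 = 0.255,$$
where $n_8$ denotes the frequency in the eighth interval. Correspondingly, we obtain
$$\hat{\pi}_2 = 0.522 \quad\text{and}\quad \hat{\pi}_3 = 0.223.$$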
Biases are, of course, incurred with these crude estimators as a result of the overlap.
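The allocation rule can be sketched in a few lines. The function `allocate_weights` is a hypothetical helper, and the small histogram used to exercise it is made up; only the linear proportion for the eighth interval uses values quoted in the text.

```python
import numpy as np

def allocate_weights(counts, edges, cuts):
    """Estimate mixing weights by splitting a histogram at the cut points.

    An interval that straddles a cut point is shared between the two
    neighbouring components in proportion to the lengths on either side.
    """
    counts = np.asarray(counts, dtype=float)
    edges = np.asarray(edges, dtype=float)
    widths = np.diff(edges)
    bounds = np.concatenate(([edges[0]], np.sort(cuts), [edges[-1]]))
    totals = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # Length of each interval that falls inside [lo, hi].
        overlap = np.clip(np.minimum(edges[1:], hi) - np.maximum(edges[:-1], lo),
                          0.0, None)
        totals.append(np.sum(counts * overlap / widths))
    return np.array(totals) / counts.sum()

# Proportion of the eighth interval [-1.25, -0.75] lying below the cut at -1.2.
share = (-1.2 - (-1.25)) / (-0.75 - (-1.25))   # about 0.1

# Made-up example: three unit-width intervals, one cut half-way through the second.
toy_weights = allocate_weights([10, 20, 10], [0, 1, 2, 3], [1.5])
print(share, toy_weights)
```

Because the shares of each interval sum to one across components, the estimated weights always sum to one.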
A less crude approach, also based on the histogram data, is that described by, among others, Bhattacharya (1967). The method relies on two facts.
(a) The logarithm of a normal density is a concave quadratic in the variable, so that its derivative is linear, with negative slope.
(b) When there is a lot of data and the grouping imposed by the histogram is quite fine, the histogram heights are roughly proportional to the density.
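Fact (a) can be made explicit. Writing $\mu$ and $\sigma$ for the mean and standard deviation of a normal component (notation assumed here), we have:

```latex
\log f(x) = -\log\!\left(\sigma\sqrt{2\pi}\right) - \frac{(x-\mu)^2}{2\sigma^2},
\qquad
\frac{d}{dx}\log f(x) = -\frac{x-\mu}{\sigma^2}.
```

The derivative is a straight line with slope $-1/\sigma^2$ that crosses zero at $x = \mu$, so that both parameters can in principle be read off from the line.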
Figure 4.1.3 Bhattacharya plot for data in Figure 4.1.1(b). Lines corresponding to three components were fitted ‘by eye’
Thus, a plot of first differences of the logarithms of the histogram frequencies from data from a mixture of well-separated normal components should display a sequence of negatively sloping ‘linear’ plots, one corresponding to each component. Figure 4.1.3 displays the Bhattacharya plot from the histogram in Figure 4.1.1(b).
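A rough sketch of how such a plot might be computed and used is given below. It is based on the identity $\log f(x + h/2) - \log f(x - h/2) = -(h/\sigma^2)(x - \mu)$, which holds exactly for a normal density, and it ignores any correction for grouping, so it should not be read as Bhattacharya's own estimator. The function names, the least-squares fit (in place of fitting ‘by eye’) and the test values are all assumptions for illustration.

```python
import numpy as np

def log_difference_points(counts, edges):
    """First differences of log counts, placed at the shared interval boundaries."""
    counts = np.asarray(counts, dtype=float)
    edges = np.asarray(edges, dtype=float)
    # Empty intervals have no logarithm; mark them as missing.
    logc = np.log(np.where(counts > 0, counts, np.nan))
    dlog = np.diff(logc)
    ok = np.isfinite(dlog)
    return edges[1:-1][ok], dlog[ok]

def fit_component(x, dlog, h, lo, hi):
    """Fit a line to the points with lo <= x <= hi and convert it to (mean, sd)."""
    sel = (x >= lo) & (x <= hi)
    slope, intercept = np.polyfit(x[sel], dlog[sel], 1)
    if slope >= 0:
        raise ValueError("a normal component requires a negatively sloping line")
    mean = -intercept / slope   # where the fitted line crosses zero
    sd = np.sqrt(-h / slope)    # slope of the line is -h / sd**2
    return float(mean), float(sd)

# Check on idealized 'counts' proportional to a single normal density
# (mean 0.5, standard deviation 0.8; values chosen arbitrarily).
h = 0.25
edges = np.arange(-3.0, 3.0 + h / 2, h)
centers = 0.5 * (edges[:-1] + edges[1:])
ideal_counts = 1000 * np.exp(-(centers - 0.5) ** 2 / (2 * 0.8 ** 2))
x_pts, dlog_pts = log_difference_points(ideal_counts, edges)
est_mean, est_sd = fit_component(x_pts, dlog_pts, h, -1.0, 2.0)
print(est_mean, est_sd)
```

With real histogram frequencies the points scatter about the line, and the choice of the range `lo` to `hi` for each component plays the role of the judgement exercised when lines are drawn by eye.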
Although the picture is not too clear (inevitably), there is some evidence for the presence of three normal components, indicated by the somewhat optimistically drawn lines. Clearly, the positions and orientations of the lines contain information which can be used to provide crude parameter estimates. Under the assumption that a data set arises from an $N(\mu, \sigma^2)$ distribution and that the histogram interval is of width $h$, Bhattacharya (1967) derives the following