Learning about the parameters of a mixture 57
estimation procedure for $\mu$ and $\sigma$ from a line fitted as in Figure 4.1.1 (see also Oka, 1954):
$$\hat{\mu} = \lambda + h/2,$$
$$\hat{\sigma}^2 = [d\,h\cot(\theta)/b] - h^2/12,$$
where $d$ and $b$ are the relative scales on the x and y axes, $\lambda$ is the intercept of the line on the x axis, and $\theta$ is the angle between the line and the negative direction of the x axis. For Figure 4.1.3 this gives the following crude estimates ($h = 0.25$):
Component            1        2        3
$\hat{\mu}$        -1.45     0.25     1.20
$\hat{\sigma}$      1.00     0.33     0.45
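The two formulas above can be sketched in code. The following is a minimal illustration, not Bhattacharya's own procedure: it assumes the differences of log-frequencies between adjacent classes are plotted against the midpoint of the lower class, that a single straight run has already been picked out, and that $d\cot(\theta)/b$ equals $-1/\text{slope}$ when the slope is measured in data units. The function names (`fit_line`, `bhattacharya_estimates`, `normal_cdf`) are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


def bhattacharya_estimates(midpoints, freqs, h):
    """Estimate (mu, sigma^2) from one straight run of log-frequency differences.

    midpoints: class midpoints; freqs: matching positive frequencies;
    h: class width.
    """
    xs = midpoints[:-1]
    ys = [math.log(freqs[i + 1]) - math.log(freqs[i])
          for i in range(len(freqs) - 1)]
    slope, intercept = fit_line(xs, ys)
    lam = -intercept / slope  # intercept of the fitted line on the x axis
    mu_hat = lam + h / 2.0
    # Assumption: d * cot(theta) / b == -1 / slope in data units.
    sigma2_hat = -h / slope - h ** 2 / 12.0
    return mu_hat, sigma2_hat
```

In practice each component contributes its own roughly linear run with negative slope, and the function would be applied to each run separately.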
For estimating the mixing weights, Bhattacharya (1967) suggests various methods based on least-squares fitting, to the observed histogram frequencies, of expected frequencies which are calculated under the assumption that the component parameters are correctly estimated by the above method. Explicit estimates are available, as described later in Section 4.5, for general problems when only the mixing weights are unknown. Bhattacharya (1967) includes some ‘quick’ variations which obviate the matrix inversion necessary for full-blooded least squares, but it seems unlikely that much improvement is to be obtained over the method of Tanner (1962) described earlier, particularly in view of the unreliability of the estimates of the means and variances.
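As an illustration of the least-squares idea, the sketch below fits mixing weights to histogram frequencies with the component means and standard deviations held fixed. It is a generic unconstrained least-squares fit under stated assumptions, not a reproduction of any of Bhattacharya's specific variants: the weights are not forced to be non-negative or to sum to one, and all function names are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bin_probabilities(edges, mu, sigma):
    """Probability that N(mu, sigma^2) falls in each class [edges[i], edges[i+1])."""
    return [normal_cdf(edges[i + 1], mu, sigma) - normal_cdf(edges[i], mu, sigma)
            for i in range(len(edges) - 1)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (m[r][k] - tail) / m[r][r]
    return x


def least_squares_weights(edges, freqs, components):
    """Unconstrained least-squares mixing weights for fixed (mu, sigma) pairs."""
    n = sum(freqs)
    cols = [bin_probabilities(edges, mu, sigma) for mu, sigma in components]
    k = len(cols)
    # Normal equations: (P^T P) w = P^T (freqs / n).
    gram = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(p * f / n for p, f in zip(cols[i], freqs)) for i in range(k)]
    return solve_linear(gram, rhs)
```

The small linear solve here is the matrix inversion that the 'quick' variations are designed to avoid.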
Overlapping of the components clearly biases the estimates. In the Bhattacharya method, as in others, it is sometimes possible to subtract the frequencies likely to have originated from the 'outside' components and to replot the remaining data, from which less-biased estimates can be obtained. In doing this, the degree of overlap is assessed by, say, using $(\hat{\mu}_1, \hat{\sigma}_1)$ and $(\hat{\mu}_3, \hat{\sigma}_3)$ to estimate the frequencies in the overlap region which come from components 1 and 3. These can then be subtracted from the observed frequencies and can also be counted into the estimates of the first and third mixing weights. Admittedly this detracts somewhat from the 'quick, graphical' character of the basic method.
Informal successive subtraction of the components after fitting quadratics to the logarithms of the extreme sets of frequencies is also described by Buchanan Wollaston and Hodgson (1928). If three frequencies are used, one quadratic fits perfectly. Otherwise a best quadratic may be fitted, by least squares, say. Suppose we obtain the quadratic
$$g(x) = ax^2 + bx + c.$$
The $j$th contributing hump of the mixture density, scaled to match an n-sample histogram, is of the form
$$n_j f_j(x) = [n_j/\sqrt{2\pi\sigma_j^2}]\,\exp[-(x - \mu_j)^2/2\sigma_j^2],$$
Statistical analysis of finite mixture distributions
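where $n_j = \pi_j n$, and its logarithm is
$$-(x - \mu_j)^2/2\sigma_j^2 + \log[n_j/\sqrt{2\pi\sigma_j^2}].$$
We may thus identify
$-1/(2\sigma_j^2)$ with $a$
$\mu_j/\sigma_j^2$ with $b$
$-\mu_j^2/(2\sigma_j^2) + \log n_j - \log\sqrt{2\pi\sigma_j^2}$ with $c$,
which suggests estimates
$$\hat{\sigma}_j^2 = -1/(2a), \qquad \hat{\mu}_j = -b/(2a),$$
$$\log\hat{\pi}_j = c - b^2/(4a) + \log[n^{-1}\sqrt{-\pi/a}].$$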
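A minimal sketch of this identification follows, for the case where exactly three log-frequencies are used so that one quadratic fits perfectly. It assumes the frequencies are on the density scale of the hump formula (no class-width factor) and that $a < 0$. Completing the square in $g(x)$ gives a peak height of $c - b^2/(4a)$, and that is the quantity used for the weight. The function names are hypothetical.

```python
import math


def quadratic_through(points):
    """Coefficients (a, b, c) of the quadratic through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    # Newton divided differences.
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)
    b = d01 - a * (x0 + x1)
    c = y0 - a * x0 ** 2 - b * x0
    return a, b, c


def component_from_quadratic(a, b, c, n):
    """Recover (mu, sigma^2, pi) of one hump from g(x) = a x^2 + b x + c.

    n is the total sample size; a must be negative for a proper hump.
    """
    if a >= 0:
        raise ValueError("quadratic must open downwards (a < 0)")
    sigma2 = -1.0 / (2.0 * a)
    mu = -b / (2.0 * a)
    # Peak height of g is c - b^2/(4a) = log(n_j / sqrt(2 pi sigma^2)).
    log_pi = c - b * b / (4.0 * a) + math.log(math.sqrt(-math.pi / a) / n)
    return mu, sigma2, math.exp(log_pi)
```

With more than three frequencies, `quadratic_through` would be replaced by a least-squares quadratic fit; the recovery step is unchanged.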
As before, overlaps are bound to cause a problem, and subtraction of components from the extremes is likely to be helpful. Tanaka (1962) outlines the practicalities of carrying this out using a set of quadratic templates of varying curvature (corresponding, as equations (4.1.1) indicate, to different variances). A template is chosen to fit a mode of the log-frequencies as well as possible and the remaining two parameters are fitted as above. This leads to a set of expected frequencies for that component, by which the overall frequencies may be reduced before fitting the next component. As in all these procedures there is some danger of small negative net frequencies after subtraction but, given the crudeness of the method, this is not worth worrying about.
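The subtraction step can be sketched as follows. This is an illustration under stated assumptions, not Tanaka's template procedure: the component is taken to be normal with already-estimated parameters, expected class frequencies are computed from the normal distribution function, and any negative net frequencies are optionally floored at zero (a convenience added here, not something the text prescribes). Function names are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def expected_frequencies(edges, n_j, mu, sigma):
    """Expected count in each class for a component contributing n_j observations."""
    return [n_j * (normal_cdf(edges[i + 1], mu, sigma) - normal_cdf(edges[i], mu, sigma))
            for i in range(len(edges) - 1)]


def subtract_component(freqs, expected, floor_at_zero=True):
    """Net frequencies after removing one fitted component.

    Small negative values can arise; flooring them at zero is an assumption
    made here for convenience.
    """
    net = [f - e for f, e in zip(freqs, expected)]
    if floor_at_zero:
        return [max(0.0, v) for v in net]
    return net
```

The net frequencies returned by `subtract_component` are then used to fit the next component.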
4.1.2 Methods based on the cumulative distribution function
The alternative to plotting an estimate of the density function is to plot the empirical distribution function and see whether it shows evidence of a mixture. When investigating the possibility of a mixture of normals it is natural to use normal probability paper, which leads to a normal quantile-quantile (Q-Q) plot. This plot can be described as a plot of an estimate of $F^{-1}(p)$ against $\Phi^{-1}(p)$ $(0 < p < 1)$, where $F(\cdot)$ is the cumulative distribution function of the mixture and $\Phi(\cdot)$ is that of the standard normal. A sample from a single normal distribution should produce a linear plot, the kinds of plots that are likely from various mixtures of normals being illustrated in Figure 4.1.4. Certain deviations from linearity are characteristic of certain types of mixture, although, as usual, there has to be a fair amount of 'separation' for the pattern to be clear. The four cases depicted in Figure 4.1.4 are as follows:
(a) equally weighted mixture of two normal densities with similar variances but quite different means;
(b) as (a) but with unequal mixing weights;
Figure 4.1.4 ‘Normal’ probability plots for some normal mixtures (schematic)
(c) equally weighted mixture of three normal densities which differ mainly in their means;
(d) equal mixture of two normal densities with the same mean but different variances.
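The coordinates of such a plot can be computed as in the sketch below, which pairs the ordered sample values (an estimate of $F^{-1}(p)$) with standard normal quantiles $\Phi^{-1}(p)$. The plotting positions $(i - 0.5)/n$ are one common convention and are an assumption here, as are the function names and the use of a correlation coefficient as a rough numerical summary of linearity.

```python
import math
import random
from statistics import NormalDist


def normal_qq(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    std = NormalDist()
    # Plotting positions (i - 0.5)/n for i = 1..n (assumed convention).
    return [(std.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]


def correlation(pairs):
    """Pearson correlation of the plotted points; near 1 for a linear plot."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def draw_mixture(rng, n, weights, components):
    """Draw n values from a normal mixture; components are (mean, sd) pairs."""
    out = []
    for _ in range(n):
        mu, sigma = rng.choices(components, weights=weights)[0]
        out.append(rng.gauss(mu, sigma))
    return out
```

For instance, `normal_qq(draw_mixture(random.Random(0), 500, [0.5, 0.5], [(-3, 1), (3, 1)]))` gives points for a well-separated instance of case (a), which can be passed to any plotting tool.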