L(ψ) = Σ_{i=0}^∞ n_i log[π₁ θ₁^i e^{−θ₁} + (1 − π₁) θ₂^i e^{−θ₂}].
As expected, the Newton algorithm either failed, by wandering too far from the solution, or converged quickly, in 8 to 11 iterations. The point of convergence was π̂₁ = 0.3599, θ̂₁ = 1.2561, θ̂₂ = 2.6634, slightly different from that obtained by Hasselblad using the EM algorithm (probably because the EM algorithm was stopped ‘too early’). From Table 4.3.1, the slow but comparatively sure properties of EM are only too evident. The numbers of iterations required in order that π̂₁ be ‘correct’ to two decimal places and to four decimal places are given. The difference between the two is, in each case, about 1000 iterations! Indeed, it requires a further 740 EM steps to take Hasselblad’s π̂₁ (= 0.3590) up to 0.3599!
The maximum likelihood Poisson mixture fits very well, with χ² = 1.27 on 5 degrees of freedom. The posited real-life justification of the mixture model is that there could be different patterns of deaths in winter and summer.
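The EM iteration for this two-component Poisson mixture can be sketched as follows. The death-count data themselves are not reproduced here, so a small synthetic sample is drawn near the fitted values; the function and variable names are illustrative, not the authors'. Note that the Poisson factorials i! cancel in the E-step ratio and so are omitted throughout.

```python
import math
import random

def rpois(lam):
    """Knuth's Poisson sampler (adequate for small means)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def em_poisson_mixture(xs, pi1, th1, th2, tol=1e-6, max_iter=10000):
    """EM for pi1*Poisson(th1) + (1-pi1)*Poisson(th2)."""
    for it in range(max_iter):
        # E-step: posterior probability each count came from component 1
        # (the i! factors cancel between numerator and denominator)
        w = []
        for x in xs:
            p1 = pi1 * th1 ** x * math.exp(-th1)
            p2 = (1 - pi1) * th2 ** x * math.exp(-th2)
            w.append(p1 / (p1 + p2))
        # M-step: weighted proportion and weighted means
        sw = sum(w)
        new_pi1 = sw / len(xs)
        new_th1 = sum(wi * x for wi, x in zip(w, xs)) / sw
        new_th2 = sum((1 - wi) * x for wi, x in zip(w, xs)) / (len(xs) - sw)
        if abs(new_pi1 - pi1) < tol:  # stopping rule on successive weights
            return new_pi1, new_th1, new_th2, it + 1
        pi1, th1, th2 = new_pi1, new_th1, new_th2
    return pi1, th1, th2, max_iter

random.seed(1)
# synthetic counts, loosely mimicking the fitted 0.36/1.26/2.66 mixture
xs = [rpois(1.26 if random.random() < 0.36 else 2.66) for _ in range(300)]
pi_hat, th1_hat, th2_hat, iters = em_poisson_mixture(xs, 0.5, 1.0, 3.0)
print(round(pi_hat, 4), round(th1_hat, 4), round(th2_hat, 4), iters)
```

With poorly separated component means like these, the iteration count is typically large, consistent with the slow behaviour reported in Table 4.3.1.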
Example 4.3.6 Mixture of two known normals
A sample of 300 values was generated from the mixture
p(x) = 0.8φ(x | −1, 1) + 0.2φ(x | 1, 1).
Table 4.3.2 Results for Example 4.3.6
Sample size   π̂       Estimated standard error   Î⁻¹     Iterations required
                                                          EM      NR
25            0.775    0.116                      0.311   14      4
50            0.774    0.088                      0.398   16      4
100           0.813    0.061                      0.378   19      4
300           0.821    0.032                      0.310   16      3
The EM and NR algorithms were used to find the maximum likelihood estimate of the mixing weight, assuming that all other parameters were known. Both algorithms were started off with an initial estimate of 0.5, and samples of sizes 25, 50, 100, and 300 were extracted from the total set.
Table 4.3.2 displays the results, based on a stopping rule |π^(m+1) − π^(m)| < 10⁻⁵, where π^(m) and π^(m+1) are consecutive iterates. Also shown are the estimated standard errors of the estimates and the corresponding estimate of Î⁻¹, where I is the Fisher information per observation. This should be compared with the theoretical value of I⁻¹ = 0.327, evaluated by the methods of Section 3.2.
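Since only the mixing weight is unknown in Example 4.3.6, the EM update has a particularly simple form: the new weight is the average posterior probability of component 1. A minimal sketch, with an illustrative synthetic sample in place of the original one:

```python
import math
import random

def phi(x, mu):
    """Normal density with unit standard deviation."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def em_weight(xs, pi0=0.5, tol=1e-5, max_iter=10000):
    """EM for the weight in pi*N(-1,1) + (1-pi)*N(1,1), components known."""
    pi = pi0
    for it in range(max_iter):
        # E- and M-steps collapse: new weight = mean posterior probability
        # that each observation came from the N(-1,1) component
        new_pi = sum(pi * phi(x, -1) / (pi * phi(x, -1) + (1 - pi) * phi(x, 1))
                     for x in xs) / len(xs)
        if abs(new_pi - pi) < tol:  # the stopping rule used in the text
            return new_pi, it + 1
        pi = new_pi
    return pi, max_iter

random.seed(2)
xs = [random.gauss(-1, 1) if random.random() < 0.8 else random.gauss(1, 1)
      for _ in range(300)]
pi_hat, iters = em_weight(xs)
print(round(pi_hat, 3), iters)
```

The iteration counts this produces are of the same modest order as the EM column of Table 4.3.2, since the single-parameter problem is well behaved.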
Redner and Walker (1984) apply the EM algorithm to normal mixtures. They discover that, although ‘convergence’ typically takes very many iterations, it does not take long to get ‘high up’ on the likelihood surface. Usually, 95 per cent of the change in the log-likelihood is achieved in the first five iterations. This suggests that a composite algorithm, in which a few EM iterations are followed by one or two further iterations using MS or NR, would be worth investigation.
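One way such a composite algorithm might look, sketched here for the single unknown weight of Example 4.3.6 (the sample and all names are illustrative, and this is not Redner and Walker's own implementation): a handful of EM sweeps to climb the likelihood surface, followed by scalar Newton–Raphson steps on the score.

```python
import math
import random

def phi(x, mu):
    """Normal density with unit standard deviation."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def composite_fit(xs, pi0=0.5, em_steps=5, tol=1e-10, max_nr=50):
    """A few EM sweeps, then Newton-Raphson, for the weight pi in
    pi*N(-1,1) + (1-pi)*N(1,1) with both components known."""
    pi = pi0
    for _ in range(em_steps):   # EM phase: cheap, reliable uphill moves
        pi = sum(pi * phi(x, -1) / (pi * phi(x, -1) + (1 - pi) * phi(x, 1))
                 for x in xs) / len(xs)
    for _ in range(max_nr):     # NR phase: quadratic convergence near the top
        # score l'(pi) and curvature: l''(pi) = -sum of squared terms
        terms = [(phi(x, -1) - phi(x, 1)) /
                 (pi * phi(x, -1) + (1 - pi) * phi(x, 1)) for x in xs]
        score = sum(terms)
        info = sum(t * t for t in terms)
        step = score / info     # Newton step is -l'/l'' = score/info
        if abs(step) < tol:
            break
        pi += step
    return pi

random.seed(4)
xs = [random.gauss(-1, 1) if random.random() < 0.8 else random.gauss(1, 1)
      for _ in range(300)]
pi_hat = composite_fit(xs)
print(round(pi_hat, 6))
```

The EM phase makes the Newton phase safe by delivering a starting value already high on the likelihood surface, addressing exactly the wandering failure mode noted for pure Newton iteration.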
The idea of speeding up EM itself was raised by Peters and Walker (1978a, 1978b). Suppose we write the EM iteration as

ψ^(m+1) = G(ψ^(m)),

and we define

ψ^(m+1) = ψ^(m) + a[G(ψ^(m)) − ψ^(m)],

reminiscent of (4.3.8) and (4.3.9), with a = 1 corresponding to EM. Peters and Walker look at versions of Examples 4.3.1 and 4.3.2. They find that strongly consistent maximum likelihood estimators ψ̂ for ψ exist and that, provided initial estimates are chosen close enough to ψ̂, ‘local’ convergence occurs for 0 < a < 2. Asymptotically optimal rates of convergence occur for some a* with 1 < a* < 2. Typically, if the component densities are well separated, a* is near 1 and convergence is fast. Otherwise, a* is near 2 and convergence is slow.
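The relaxation can be sketched directly on top of any EM map G: treat one EM sweep as a fixed-point map and over-step by the factor a. The example below reuses the known-components normal mixture; the sample and names are illustrative.

```python
import math
import random

def phi(x, mu):
    """Normal density with unit standard deviation."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def G(pi, xs):
    """One EM sweep for the weight in pi*N(-1,1) + (1-pi)*N(1,1)."""
    return sum(pi * phi(x, -1) / (pi * phi(x, -1) + (1 - pi) * phi(x, 1))
               for x in xs) / len(xs)

def accelerated_em(xs, a, pi0=0.5, tol=1e-8, max_iter=10000):
    """psi(m+1) = psi(m) + a*[G(psi(m)) - psi(m)]; a = 1 is plain EM."""
    pi = pi0
    for it in range(max_iter):
        new_pi = pi + a * (G(pi, xs) - pi)  # over-step the EM move by factor a
        if abs(new_pi - pi) < tol:
            return new_pi, it + 1
        pi = new_pi
    return pi, max_iter

random.seed(3)
xs = [random.gauss(-1, 1) if random.random() < 0.8 else random.gauss(1, 1)
      for _ in range(200)]
results = {a: accelerated_em(xs, a) for a in (1.0, 1.5)}
print(results)
```

Both choices of a converge to the same fixed point; how the iteration counts compare depends, as the text notes, on how well separated the components are.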
Incorporation of a crude one-dimensional search technique into the EM algorithm for exponential mixtures is described by Wilson and Sargent (1979).
4.3.3 Theoretical considerations
Two important questions are raised by the above discussion.
(a) Do maximum likelihood estimators exist with the familiar, desirable asymptotic properties?
(b) What convergence properties can be established for the various computational methods available, particularly the recently proposed EM algorithm?
The reader may well be surprised to have just been informed that the answer to (a) seems to be ‘yes’ for the normal mixture of Example 4.3.2, in spite of all its singularity problems. However, the singularities occur at certain points on the boundary of the parameter space as the variances tend to zero. In most examples, provided we keep away from these boundaries we seem to find a sensible local maximum on the likelihood surface. The following useful summary of the theoretical results available is abstracted from Redner and Walker (1984), to whom the reader should refer for full regularity conditions.
Suppose ψ' denotes the parameters (π, θ), with the redundant π_k omitted, and suppose all π_j > 0, j = 1, …, k. Suppose also that the class of mixtures is identifiable. The likelihood equations are, therefore,
D_{ψ'} L(ψ') = 0.    (4.3.11)
Theorem 4.3.1 Given the existence of, and certain boundedness conditions on, derivatives of f(x | ψ), of orders up to 3, and given that I(ψ'₀), the Fisher information matrix evaluated at the true ψ'₀, is well defined and positive definite, there is a unique solution ψ̂'ₙ of (4.3.11) in a certain small neighbourhood of ψ'₀ for all sufficiently large sample sizes, n. This ψ̂'ₙ locally maximizes the likelihood and