# Statistical analysis of finite mixture distributions - Smith A.F.M.

ISBN 0-470-90763-4


It is easy to see that the M step for $\theta_j$ reduces to the maximization of

$$\sum_{i=1}^{n} w_{ij}(\psi^{(m)}) \left[ \theta_j^{\mathrm{T}} t(x_i) - a(\theta_j) \right].$$

This gives, for the mean-value parameters $\phi_j = E[t(X)\,|\,\theta_j]$,

$$\phi_j^{(m+1)} = \left[ n_j(\psi^{(m)}) \right]^{-1} \sum_{i=1}^{n} w_{ij}(\psi^{(m)})\, t(x_i), \qquad j = 1, \ldots, k,$$


where $n_j(\psi^{(m)}) = \sum_{i=1}^{n} w_{ij}(\psi^{(m)}) = n \pi_j^{(m+1)}$.

Note the 'weighted average' nature of $\phi_j^{(m+1)}$. Note also that $n_j(\psi^{(m)})$ corresponds to a pseudo sample size associated with the $j$th subpopulation, with the observations allocated, by the fractions defined by the weights, to the various subpopulations.
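As a concrete numerical sketch (not from the text), the weighted-average M step can be written out for a two-component Poisson mixture, where $t(x) = x$ and the mean-value parameter $\phi_j$ is simply the Poisson mean $\lambda_j$. The data and the current parameter values $\psi^{(m)}$ below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical data from a two-component Poisson mixture; here t(x) = x
x = np.concatenate([rng.poisson(2.0, 60), rng.poisson(7.0, 40)]).astype(float)
n = x.size

# current parameter values psi^(m): mixing weights pi_j and Poisson means lambda_j
pi_m = np.array([0.5, 0.5])
lam_m = np.array([1.0, 9.0])

# E step: w_ij proportional to pi_j * lambda_j^x_i * exp(-lambda_j)
# (the common factor 1/x_i! cancels when the weights are normalized)
log_w = np.log(pi_m) + x[:, None] * np.log(lam_m) - lam_m
w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)

# M step: pseudo sample sizes n_j(psi^(m)) and weighted-average updates
n_j = w.sum(axis=0)                            # n_j(psi^(m)) = n * pi_j^(m+1)
pi_next = n_j / n
phi_next = (w * x[:, None]).sum(axis=0) / n_j  # weighted average of the t(x_i)

print(pi_next, phi_next)
```

By construction, `pi_next @ phi_next` equals the sample mean of the data, illustrating the stationarity property discussed in the text.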

For this example, the likelihood equations themselves take an interesting form. We can write the equation for $\pi_j$ (cf. Example 4.3.1) as

$$\pi_j = n^{-1} \sum_{i=1}^{n} w_{ij}(\psi).$$

Differentiation with respect to $\theta_j$ similarly leads to the stationarity condition

$$\phi_j = \left[ n_j(\psi) \right]^{-1} \sum_{i=1}^{n} w_{ij}(\psi)\, t(x_i),$$

inevitably reminiscent of the above M step. It follows therefore that, for stationarity,

$$\sum_{j=1}^{k} \pi_j \phi_j = n^{-1} \sum_{j=1}^{k} \sum_{i=1}^{n} w_{ij}(\psi)\, t(x_i) = n^{-1} \sum_{i=1}^{n} t(x_i).$$

Thus, for the minimal sufficient statistics $t(x)$, the maximum likelihood estimator of $E[t(X)\,|\,\psi]$ is given by the corresponding sample mean. Manifestations of this, with varying degrees of generality, have been noticed by Behboodian (1970a), Fryer and Holt (1970), and Wilson and Sargent (1979).

An appealing feature of the EM algorithm is that the approximations generated maintain this relationship, in that

$$\sum_{j=1}^{k} \pi_j^{(m)} \phi_j^{(m)} = n^{-1} \sum_{i=1}^{n} t(x_i)$$

for all m except possibly m = 0.

Having discussed the EM algorithm and stationarity conditions for the exponential family model, we can safely leave simple special cases as exercises for the reader. These include mixtures of Poissons, exponentials, binomials, geometrics, and normals, univariate or multivariate, with or without equal variances or covariance matrices. The inclusion of fully categorized data can be coped with by specifying 'degenerate' weights in the iterations. For the time being, we shall confine our treatment of special cases to normal mixtures.
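For instance (an illustrative fragment with hypothetical sizes, not from the book), fully categorized observations simply enter the iterations with fixed 0-1 weight rows, while the remaining rows are refreshed by the E step:

```python
import numpy as np

k, n = 2, 6
w = np.full((n, k), 1.0 / k)   # uncategorized observations: E-step weights
w[0] = [1.0, 0.0]              # observation 0 known to be from the first
w[1] = [0.0, 1.0]              # component, observation 1 from the second:
                               # 'degenerate' weights, held fixed over iterations

# in the M step every row, degenerate or not, is treated identically
n_j = w.sum(axis=0)            # pseudo sample sizes
print(n_j)
```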

Example 4.3.2 (continued) Mixture of two univariate normals

Straightforward application of the EM algorithm gives, writing $n_j^{(m)} = \sum_{i=1}^{n} w_{ij}(\psi^{(m)})$, for $j = 1, 2$,

$$\pi_j^{(m+1)} = n^{-1} n_j^{(m)},$$

$$\mu_j^{(m+1)} = \left( n_j^{(m)} \right)^{-1} \sum_{i=1}^{n} w_{ij}^{(m)} x_i,$$

$$\left( \sigma_j^2 \right)^{(m+1)} = \left( n_j^{(m)} \right)^{-1} \sum_{i=1}^{n} w_{ij}^{(m)} \left( x_i - \mu_j^{(m+1)} \right)^2, \qquad (4.3.7)$$

where $w_{ij}^{(m)} = w_{ij}(\psi^{(m)})$. Note the superscript $(m+1)$ with $\mu_j$ on the right-hand side of (4.3.7) (so that the recursions in Hosmer, 1973a, and Leytham, 1984, are, therefore, not quite EM), and note also the now-familiar analogy with the maximum likelihood formulae for the case of fully categorized data.
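These recursions can be sketched numerically (hypothetical data and starting values; an illustration, not the book's code). The point about the $(m+1)$ superscript is made concrete: the variance update uses the freshly computed means:

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical sample from a two-component univariate normal mixture
x = np.concatenate([rng.normal(-2.0, 1.0, 70), rng.normal(3.0, 1.5, 30)])
n = x.size

prop = np.array([0.5, 0.5])            # pi_j^(0)
mu = np.array([-1.0, 1.0])             # mu_j^(0)
var = np.array([1.0, 1.0])             # (sigma_j^2)^(0)

for m in range(50):
    # E step: posterior weights w_ij(psi^(m))
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    w = prop * dens
    w /= w.sum(axis=1, keepdims=True)
    n_j = w.sum(axis=0)                # pseudo sample sizes
    # M step (4.3.7); the variance update uses mu^(m+1), not mu^(m)
    prop = n_j / n
    mu = (w * x[:, None]).sum(axis=0) / n_j
    var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / n_j

print(prop, mu, var)
```

Using `mu` from the previous iteration in the variance line would reproduce the not-quite-EM recursions of Hosmer (1973a) and Leytham (1984) mentioned above.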

Extension of both this and Example 4.3.1 to the case of mixtures of $k$ component densities ($k > 2$) is straightforward. For the latter, Peters and Coberly (1976) establish the concavity of the log-likelihood function; see also Kazakos (1977).

Example 4.3.4 Mixture of two multivariate normals with equal covariance matrices (Day, 1969; O'Neill, 1978; Ganesalingam and McLachlan, 1981)

Denote the two mean vectors by $\mu_1, \mu_2$ and the common covariance matrix by $\Sigma$. Then $p(x\,|\,\psi) = \pi_1 \phi_d(x\,|\,\mu_1, \Sigma) + \pi_2 \phi_d(x\,|\,\mu_2, \Sigma)$, where $\phi_d$ denotes the $d$-dimensional normal density function. The likelihood equations can be rearranged in the form (cf. Example 4.3.2)

$$\pi_l = n^{-1} \sum_{i=1}^{n} w_{il}(\psi), \qquad l = 1, 2,$$

$$\mu_l = \left[ n_l(\psi) \right]^{-1} \sum_{i=1}^{n} w_{il}(\psi)\, x_i, \qquad l = 1, 2,$$

$$\Sigma = n^{-1} \sum_{l=1}^{2} \sum_{i=1}^{n} w_{il}(\psi) (x_i - \mu_l)(x_i - \mu_l)^{\mathrm{T}},$$

where

$$n_l(\psi) = \sum_{i=1}^{n} w_{il}(\psi), \qquad l = 1, 2,$$

and

$$w_{il}(\psi) = \pi_l \phi_d(x_i\,|\,\mu_l, \Sigma) / p(x_i\,|\,\psi), \qquad \text{for each } i, l.$$

A drawback to the direct EM algorithm, which is, effectively, a successive approximations iteration based on the above equations, is the need to calculate $(\Sigma^{(m)})^{-1}$ at each stage in order to update the $\{w_{il}\}$. However, Day (1969) proposes an elegant way of avoiding this problem. In parallel with Example 4.3.3, it turns out that the maximum likelihood estimators for the mean $\mu$ and covariance matrix, $V$, of the mixture density are given by

$$\hat{\mu} = \bar{x} = n^{-1} \sum_{i=1}^{n} x_i$$

and

$$\hat{V} = n^{-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\mathrm{T}}.$$

88

Statistical analysis of finite mixture distributions

Furthermore, for each $i$,

$$w_{i1}(\psi) = \left[ 1 + \exp(\delta^{\mathrm{T}} x_i + \beta) \right]^{-1},$$

where

$$\delta = \Sigma^{-1}(\mu_2 - \mu_1) = V^{-1}(\mu_2 - \mu_1) \big/ \left[ 1 - \pi_1 \pi_2 (\mu_2 - \mu_1)^{\mathrm{T}} V^{-1} (\mu_2 - \mu_1) \right]$$

and

$$\beta = \tfrac{1}{2} \left( \mu_1^{\mathrm{T}} \Sigma^{-1} \mu_1 - \mu_2^{\mathrm{T}} \Sigma^{-1} \mu_2 \right) + \log(\pi_2/\pi_1) = -\tfrac{1}{2} \delta^{\mathrm{T}} (\mu_1 + \mu_2) + \log(\pi_2/\pi_1).$$

Given $\mu$, $V$, $\delta$, and $\beta$, all the original parameters $\psi$ can be evaluated. This permits an iterative procedure that requires only the initial inversion of $\hat{V}$: at stage $m$, given $\psi^{(m)}$, we may compute $\delta^{(m+1)}$, $\beta^{(m+1)}$, and an updated set of $\{w_{il}\}$'s. Substitution of these in the original likelihood equations gives $\psi^{(m+1)}$. Ganesalingam and McLachlan (1981) generalize this to incorporate fully categorized data. (Note that the new parameterization includes, directly, the parameters $(\delta, \beta)$ of the linear discriminant function; see also Section 5.7.)
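A minimal sketch of this scheme (hypothetical bivariate data; an illustration, not the book's code), assuming $\mu$ and $V$ are fixed at their sample values and writing $\pi_2 = 1 - \pi_1$. The estimate $\hat{V}$ is inverted once, and each pass recomputes $\delta$, $\beta$, the weights, and then the component parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 400, 2
# hypothetical bivariate sample: two normal components, common covariance
x = np.vstack([rng.normal([-1.5, 0.0], 1.0, (240, d)),
               rng.normal([1.5, 1.0], 1.0, (160, d))])

# mu-hat and V-hat are fixed at their sample values; invert V-hat ONCE
xbar = x.mean(axis=0)
V = (x - xbar).T @ (x - xbar) / n
Vinv = np.linalg.inv(V)

pi1 = 0.5                        # pi_2 = 1 - pi_1 throughout
mu1, mu2 = xbar - 0.5, xbar + 0.5
for m in range(100):
    u = mu2 - mu1
    # delta and beta computed from V without inverting Sigma at each stage
    delta = Vinv @ u / (1.0 - pi1 * (1.0 - pi1) * (u @ Vinv @ u))
    beta = -0.5 * delta @ (mu1 + mu2) + np.log((1.0 - pi1) / pi1)
    w1 = 1.0 / (1.0 + np.exp(x @ delta + beta))      # w_i1(psi^(m))
    # substitute the weights into the original likelihood equations
    pi1 = w1.mean()
    mu1 = (w1[:, None] * x).sum(axis=0) / w1.sum()
    mu2 = ((1.0 - w1)[:, None] * x).sum(axis=0) / (1.0 - w1).sum()

print(pi1, mu1, mu2)
```

Note that $\pi_1 \mu_1 + \pi_2 \mu_2 = \bar{x}$ holds exactly after every pass, consistent with $\hat{\mu} = \bar{x}$.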

So far we have laid great emphasis on the EM algorithm for the calculation of maximum likelihood estimates. It is important, however, to bear in mind the existence of competing numerical methods, of which the most familiar are

(a) Newton-Raphson (NR);

(b) the Method of Scoring (MS).

Suppose we write the log-likelihood of interest as $\mathscr{L}(\psi)$. Then the iterative step for the NR algorithm is defined by

$$\psi^{(m+1)} = \psi^{(m)} - a_m \left[ D^2 \mathscr{L}(\psi^{(m)}) \right]^{-1} D \mathscr{L}(\psi^{(m)}), \qquad m = 0, 1, \ldots, \qquad (4.3.8)$$
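As a toy scalar illustration (not from the text), (4.3.8) with $a_m = 1$ applied to the log-likelihood of a hypothetical Poisson sample, for which $D\mathscr{L}(\lambda) = \sum_i x_i/\lambda - n$ and $D^2\mathscr{L}(\lambda) = -\sum_i x_i/\lambda^2$:

```python
import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])   # hypothetical counts
n, s = x.size, x.sum()

lam = 1.0      # psi^(0)
a_m = 1.0      # step length; a_m = 1 gives the pure Newton-Raphson step
for m in range(25):
    d1 = s / lam - n             # first derivative  D L(psi^(m))
    d2 = -s / lam ** 2           # second derivative D^2 L(psi^(m))
    lam = lam - a_m * d1 / d2    # the step (4.3.8)

print(lam)   # converges to the MLE, the sample mean
```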
