# Statistical analysis of mixture distributions - Smith A.F.M.

ISBN 0-470-90763-4


1   0.25 (33.01)    0.51 (25.12)    0.34 (46.71)    0.51 (63.11)    0.42 (25.00)    0.51 (43.39)
2   7.29 (22.05)    10.08 (17.74)   9.36 (25.73)    10.08 (16.26)   10.51 (16.28)   10.08 (14.51)
3   31.41 (19.57)   35.92 (23.54)   35.13 (43.91)   35.92 (29.63)   36.78 (29.01)   35.92 (23.46)

quite informative, despite the fact that an iterative technique will be required to obtain estimates of parameters. Figure 5.7.1 shows the discriminant boundaries obtained with the two-dimensional sepal data from the Iris versicolour and Iris setosa samples of Fisher (1936). In the diagram, taken from O'Neill (1978), the boundaries that result from taking the data as 'fully categorized' and 'uncategorized' are remarkably close together. The two samples are, however, very well separated.
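The iterative technique in question can be sketched as an EM fit of a two-component normal mixture with a common covariance matrix, treating every observation as uncategorized. The following Python sketch uses simulated data (illustrative values only, not Fisher's iris measurements), and the initial partition on the first coordinate plays the role of the preliminary partition mentioned for Figure 5.7.1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for two well-separated bivariate samples
# (illustrative values only, not Fisher's iris measurements).
n_per = 100
x = np.vstack([
    rng.multivariate_normal([5.0, 3.4], 0.1 * np.eye(2), n_per),
    rng.multivariate_normal([6.0, 2.8], 0.1 * np.eye(2), n_per),
])

def em_two_normals(x, n_iter=200):
    """EM for a two-component normal mixture with a common covariance
    matrix, treating every observation as uncategorized."""
    n = len(x)
    pi = 0.5
    # Crude preliminary partition on the first coordinate, used only
    # to provide initial iterates for EM.
    lo = x[:, 0] < np.median(x[:, 0])
    mu1, mu2 = x[lo].mean(axis=0), x[~lo].mean(axis=0)
    cov = np.cov(x.T)
    for _ in range(n_iter):
        inv = np.linalg.inv(cov)

        def dens(mu):
            # Normal density up to a constant common to both components;
            # the constant cancels in the posterior weights below.
            z = x - mu
            return np.exp(-0.5 * np.einsum('ij,jk,ik->i', z, inv, z))

        # E-step: posterior probability that each case came from component 1.
        w = pi * dens(mu1)
        w = w / (w + (1 - pi) * dens(mu2))
        # M-step: update mixing weight, means, and pooled covariance.
        pi = w.mean()
        mu1 = (w[:, None] * x).sum(axis=0) / w.sum()
        mu2 = ((1 - w)[:, None] * x).sum(axis=0) / (1 - w).sum()
        z1, z2 = x - mu1, x - mu2
        cov = (np.einsum('i,ij,ik->jk', w, z1, z1)
               + np.einsum('i,ij,ik->jk', 1 - w, z2, z2)) / n
    return pi, mu1, mu2, cov

pi, mu1, mu2, cov = em_two_normals(x)
print(pi, mu1, mu2)
```

Because the covariance matrix is common to both components, the implied boundary between the components (posterior odds equal to one) is linear in x, which is why the discriminants in Figure 5.7.1 are straight lines.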

Further empirical results are given by Titterington (1976) and Makov (1980a) using the sequential updating methods described in Chapter 6. In spite of the comparative ease of implementation of the sequential procedures, the use of maximum likelihood estimates, when practicable, is clearly desirable. Han (1978) considers the situation where there are some fully categorized cases and a further sample, all of which come from the same, but unknown, component population. (The underlying model is assumed to be a mixture of two multivariate normals with equal covariance matrices.) The problem of updating a logistic discriminant function is mentioned in Anderson (1979), who suggests the use of maximum likelihood estimates from M1 data. Of course, the logistic approach is based on the diagnostic paradigm and it would appear that, on the basis of previous remarks, the uncategorized cases have nothing to offer.

However, Anderson (1979) mixes the two parameterizations, using the $\pi$ in (5.7.3) and $p$ in (5.7.4) as basic parameters. Thus $\gamma$ in (5.7.4) is expressed as a function of $p$ and $\pi$, and the uncategorized data are therefore worth incorporating into the discriminant rule. ...
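The linearity underlying the logistic approach can be checked directly: for two equal-covariance normal components, the log posterior odds are exactly of the form $b_0 + b^{\mathsf{T}}x$, with the mixing weight entering only through the intercept, which is why information about the mixing weight from uncategorized data feeds back into the rule. A small Python check, with made-up parameter values (all numbers here are illustrative, not from the text):

```python
import numpy as np

# Illustrative (made-up) parameters for two equal-covariance
# bivariate normal components.
pi1 = 0.3
mu1 = np.array([0.0, 0.0])
mu2 = np.array([2.0, 1.0])
cov = np.array([[1.0, 0.3],
                [0.3, 1.0]])
inv = np.linalg.inv(cov)

# Linear logistic form: log{p(1|x)/p(2|x)} = b0 + b @ x.
b = inv @ (mu1 - mu2)
b0 = np.log(pi1 / (1 - pi1)) - 0.5 * (mu1 @ inv @ mu1 - mu2 @ inv @ mu2)

def logdens(x, mu):
    """Log normal density up to a constant common to both components."""
    z = x - mu
    return -0.5 * z @ inv @ z

x0 = np.array([1.0, 0.5])
direct = (np.log(pi1) + logdens(x0, mu1)) - (np.log(1 - pi1) + logdens(x0, mu2))
linear = b0 + b @ x0
print(direct, linear)   # the two computations agree
```

Note that the mixing weight pi1 appears only in the intercept b0; the slope b depends on the component means and the common covariance alone.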

Ganesalingam and McLachlan (1979b) compare, empirically, the discriminant rule from maximum likelihood with one estimated using the 'cluster analysis' approach of Section 4.3.4.

[Figure 5.7.1 appears here: the two samples plotted against sepal length and sepal width, with three linear boundaries marked: the MLE from unclassified observations, the MLE from classified observations, and the preliminary partition boundary on which initial iterates for the EM algorithm were based.]

Figure 5.7.1 Linear discriminants for Iris setosa and Iris versicolour data. Reproduced by permission of the American Statistical Association from O'Neill (1978)

McLachlan (1975, 1977) discusses a method which lies somewhere between

these two approaches and which is used for M2 data, containing both fully categorized and uncategorized cases. The method is based on an iterative procedure somewhat reminiscent of, but not equivalent to, the EM algorithm.

In general, these parametric procedures rely either on maximum likelihood estimation or on some approximation thereto, possibly involving sequential incorporation of the uncategorized cases. Given consistency, it should pay, on average, to incorporate the uncategorized cases if an ‘imitator’ of the optimal discriminant rule is then used. As is clear from the above, most of the detailed work has been concentrated on mixtures of two multivariate normals with equal covariance matrices.

More general procedures are possible if the assumption of a parametric model is dropped. Several ad hoc suggestions are made by Murray and Titterington (1978) based on methods described in Example 4.3.9. Also mentioned in Example 4.3.9, in the context of density estimation, was the penalized likelihood method. It can be used to obtain non-parametric estimates of the density ratio itself, thus providing a direct non-parametric version of the linear logistic approach. Suppose $k = 2$, $d = 1$, and we have a set of mixture data along with a set of observations from the first component. The two sample sizes are $n$ and $n_1$, respectively, and the densities are $p(x)$ and $f_1(x)$. The objective is to estimate

$$v(x) = \frac{p(x)}{f_1(x)} = \pi_1 + (1 - \pi_1)\,\frac{f_2(x)}{f_1(x)}.$$

A plot of $v(x)$ against $x$ is approximately constant, at level $\pi_1$, in regions where $f_2(x)/f_1(x)$ is small. If $n$ and $n_1$ are reasonably large, we may regard the data as realizations of two independent inhomogeneous Poisson processes and, if an estimate $\hat\mu(x)$ can be made of $\mu(x)$, the ratio of the intensity of the mixture process to that of the pure process, an estimate of $v(x)$ is given by

$$\hat v(x) = (n_1/n)\,\hat\mu(x).$$
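A hypothetical numerical sketch of this estimate: the intensity of the mixture process is estimated by $n$ times a density estimate from the mixture sample, that of the pure process by $n_1$ times a density estimate from the pure sample, and their ratio gives $\hat\mu(x)$. The Gaussian kernel estimator, the component densities, and all sample sizes below are our own choices, not prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical set-up: component 1 is N(0,1), component 2 is N(3,1),
# mixing weight pi_1 = 0.6; n mixture cases and n1 'pure' cases.
pi1, n, n1 = 0.6, 4000, 2000
f1 = lambda t: np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)
f2 = lambda t: np.exp(-0.5 * (t - 3)**2) / np.sqrt(2 * np.pi)
p = lambda t: pi1 * f1(t) + (1 - pi1) * f2(t)

mix = np.where(rng.random(n) < pi1, rng.normal(0.0, 1.0, n), rng.normal(3.0, 1.0, n))
pure = rng.normal(0.0, 1.0, n1)

def kde(data, x, h=0.3):
    """Gaussian kernel estimate of the density of `data` at points `x`."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

x = np.linspace(-1.5, 1.5, 7)                      # region covered by both samples
mu_hat = (n * kde(mix, x)) / (n1 * kde(pure, x))   # estimated intensity ratio
v_hat = (n1 / n) * mu_hat                          # estimate of v(x) = p(x)/f1(x)
print(np.c_[x, v_hat, p(x) / f1(x)])
```

Note that the sample-size factors cancel, so $\hat v(x)$ reduces to the ratio of the two density estimates; the printed third column is the true $v(x)$ for comparison.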

Suppose the combined order statistic is $z_1, \ldots, z_N$, where $N = n + n_1$. Then, so far as the intensity ratio is concerned, we may restrict our attention (Silverman, 1978) to the conditional log-likelihood

$$\mathscr{L}(\alpha) = \sum_{i=1}^{N} \bigl( \varepsilon_i \alpha(z_i) - \log\{1 + \exp[\alpha(z_i)]\} \bigr),$$

where $\alpha(z) = \log[\mu(z)]$ and $\varepsilon_i = 1$ if $z_i$ comes from the mixture data, $\varepsilon_i = 0$ otherwise.
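To make this log-likelihood concrete, the following Python sketch evaluates $\mathscr{L}(\alpha)$ on simulated data and fits a restricted linear form $\alpha(z) = a + bz$, in which case the maximization is ordinary logistic regression of the indicator $\varepsilon$ on $z$. The data, the linear restriction, and the plain gradient-ascent fit are all our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated combined sample: n mixture observations (eps_i = 1) and
# n1 observations from the first component (eps_i = 0); values made up.
n, n1 = 500, 500
mix = np.where(rng.random(n) < 0.5, rng.normal(0.0, 1.0, n), rng.normal(3.0, 1.0, n))
pure = rng.normal(0.0, 1.0, n1)
z = np.concatenate([mix, pure])
eps = np.concatenate([np.ones(n), np.zeros(n1)])

def loglik(alpha):
    """Conditional log-likelihood: sum of eps_i*alpha_i - log(1 + e^alpha_i)."""
    return np.sum(eps * alpha - np.log1p(np.exp(alpha)))

# Restricting alpha(z) to the linear form a + b*z turns the maximization
# into ordinary logistic regression, fitted here by plain gradient ascent.
a, b = 0.0, 0.0
for _ in range(5000):
    alpha = a + b * z
    resid = eps - 1.0 / (1.0 + np.exp(-alpha))   # gradient of L in alpha_i
    a += 0.01 * resid.mean()
    b += 0.01 * (resid * z).mean()

print(a, b, loglik(a + b * z))
```

The fitted slope is positive here because the mixture sample carries extra mass at larger $z$; without any restriction on $\alpha(\cdot)$ the same criterion degenerates, as the text explains next.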

Since we may maximize $\mathscr{L}$ without any parametric restriction, we obtain the degenerate solution $\alpha(z_i) = \infty$ if $\varepsilon_i = 1$ and $\alpha(z_i) = -\infty$ otherwise. To avoid this, a roughness penalty is imposed on the form of $\alpha(\cdot)$. Specifically, we maximize, for some constant $K$,
