# Statistical analysis of finite mixture distributions - Smith A.F.M.

ISBN 0-470-90763-4


The following modification of the procedures considered so far in this section, which is due to Hall (1981), actually requires data from all the component distributions as well as from the mixture. The purpose is to estimate the mixing weights alone, with no interest in the component distributions and without the need to specify parametric models for them.

Example 4.5.2 Mixture of k unknown distributions

Suppose data sets of sizes $n, n_1, \ldots, n_k$ are available from the mixture and the $k$ components, respectively, yielding empirical distribution functions $F_n(\cdot), F_{n_1}(\cdot), \ldots, F_{n_k}(\cdot)$. Then the mixing weights, $\eta$, are estimated by minimizing

$$\delta\bigg[F_n(\cdot),\; \sum_{j=1}^{k} \eta_j F_{n_j}(\cdot)\bigg]$$

for some distance measure $\delta(\cdot,\cdot)$. Hall (1981) considers the case of univariate continuous data with the quadratic-based distance measures $\delta_{LA}$ and $\delta_{WLA}$. As a result, the calculation of the optimal $\eta$ can, of course, be carried out exactly. If, as earlier, we let

$$J_{jl} = \int \big[F_{n_j}(x) - F_{n_k}(x)\big]\big[F_{n_l}(x) - F_{n_k}(x)\big]\, w(x)\, dx, \qquad j, l = 1, \ldots, k-1,$$

and

$$e_j = \int \big[F_n(x) - F_{n_k}(x)\big]\big[F_{n_j}(x) - F_{n_k}(x)\big]\, w(x)\, dx, \qquad j = 1, \ldots, k-1,$$

then

$$J\hat\eta = e.$$

If the mixture is identifiable then, asymptotically, $J^{-1}$ exists, so $\hat\eta$ is uniquely defined. For the case of uniform $w(\cdot)$, and assuming data in the form of Hosmer's (1973a) model M2, Hall (1981) proves consistency and asymptotic normality and derives the asymptotic covariance structure for $\hat\eta$.
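For a uniform $w(\cdot)$, the linear system $J\hat\eta = e$ can be assembled directly from the empirical distribution functions and solved exactly. The following is a minimal sketch, not Hall's original procedure: the function names, the quadrature grid, and the simulated two-component normal data are all illustrative assumptions.

```python
# Sketch of the quadratic-distance estimator J eta = e for the mixing
# weights, with uniform w(x) and the integrals approximated on a grid.
import numpy as np

def ecdf(sample, grid):
    """Empirical distribution function of `sample` evaluated on `grid`."""
    sample = np.sort(sample)
    return np.searchsorted(sample, grid, side="right") / len(sample)

def estimate_weights(mix_sample, comp_samples, grid):
    """Solve J eta = e for the first k-1 weights; the k-th is 1 minus their sum."""
    k = len(comp_samples)
    Fn = ecdf(mix_sample, grid)
    Fc = [ecdf(s, grid) for s in comp_samples]
    dx = np.diff(grid, prepend=grid[0])        # crude quadrature weights
    # differences against the k-th component, as in the text
    D = [Fc[j] - Fc[k - 1] for j in range(k - 1)]
    J = np.array([[np.sum(D[j] * D[l] * dx) for l in range(k - 1)]
                  for j in range(k - 1)])
    e = np.array([np.sum((Fn - Fc[k - 1]) * D[j] * dx)
                  for j in range(k - 1)])
    eta = np.linalg.solve(J, e)
    return np.append(eta, 1.0 - eta.sum())

# illustrative data: a 0.7/0.3 mixture of N(0,1) and N(3,1)
rng = np.random.default_rng(0)
comp = [rng.normal(0, 1, 500), rng.normal(3, 1, 500)]
mix = np.where(rng.random(2000) < 0.7,
               rng.normal(0, 1, 2000), rng.normal(3, 1, 2000))
grid = np.linspace(-5, 8, 1000)
print(estimate_weights(mix, comp, grid))   # roughly [0.7, 0.3]
```

With $k = 2$ the system is a single scalar equation, but the same code handles any number of components.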

If empirical versions of density functions are to be used, as in $\delta_{WLB}$, for instance, and the data are not discrete, smoothed density estimates, such as those based on kernel functions, will be required. Titterington (1983) considers the use of $\delta_{LB}$, $\delta_{WLB}$, and $\delta_{ALB}$ with unsmoothed and smoothed discrete data and with smoothed continuous data.

## 4.5.3 Problems with non-explicit estimators

The success of the treatments of Example 4.5.1 discussed above was founded on the linearity of the mixture density or distribution function in terms of the unknown parameters, and on the linear or quadratic nature of the criterion to be minimized. However, most finite mixtures give rise to densities which are highly non-linear in the unknown parameters, so that explicit results of any kind are rarely available and numerical optimization will normally be essential. Usually, the Newton-Raphson method or the Method of Scoring has been used to solve the associated stationarity equations. The case of $\delta_{KL}[F_n(\cdot), F(\cdot|\psi)]$ has effectively already been dealt with in our discussion of maximum likelihood in Section 4.3. As in (4.5.2), write $D\delta(\psi)$ for the vector of first derivatives of the criterion with respect to $\psi$, and let $D^2\delta(\psi)$ denote the matrix of second derivatives. Then, having chosen initial approximations, $\psi^{(0)}$, we generate $\{\psi^{(r)}\}$ according to

$$\psi^{(r+1)} = \psi^{(r)} - a_r \big[D^2\delta(\psi^{(r)})\big]^{-1} D\delta(\psi^{(r)}), \qquad r = 0, 1, \ldots, \tag{4.5.9}$$

where $a_r$ may be taken to be 1 for all $r$.

By analogy with the Method of Scoring, we might replace the matrix of second derivatives by its expected value, conditional on $\psi = \psi^{(r)}$.
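Iteration (4.5.9) can be sketched generically. The sketch below obtains the derivatives by central finite differences and applies the iteration to a toy quadratic criterion; the helper names, step sizes, and test function are illustrative assumptions, not anything prescribed in the text.

```python
# Sketch of Newton-Raphson iteration (4.5.9) on a smooth criterion
# delta(psi), with gradient and Hessian from central finite differences.
import numpy as np

def grad(f, psi, h=1e-5):
    """Central-difference gradient of f at psi."""
    g = np.zeros_like(psi)
    for i in range(len(psi)):
        e = np.zeros_like(psi)
        e[i] = h
        g[i] = (f(psi + e) - f(psi - e)) / (2 * h)
    return g

def hessian(f, psi, h=1e-4):
    """Central-difference Hessian (symmetrized) of f at psi."""
    n = len(psi)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        H[:, i] = (grad(f, psi + e, h) - grad(f, psi - e, h)) / (2 * h)
    return 0.5 * (H + H.T)

def newton_raphson(f, psi0, max_iter=50, tol=1e-8):
    psi = np.asarray(psi0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hessian(f, psi), grad(f, psi))
        psi = psi - step                   # a_r = 1 for all r, as in the text
        if np.linalg.norm(step) < tol:
            break
    return psi

# toy criterion with minimum at (1, 2), standing in for delta(psi)
f = lambda p: (p[0] - 1.0) ** 2 + 2.0 * (p[1] - 2.0) ** 2
print(newton_raphson(f, [0.0, 0.0]))       # converges to [1.0, 2.0]
```

For a real mixture criterion the same loop applies, with $D\delta$ and $D^2\delta$ supplied analytically or, as in the Scoring variant, with the Hessian replaced by its conditional expectation.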

A particular version of (4.5.9) is the program ROKE described by I. Clark (1977), in which grouped data from a mixture of univariate normals, possibly with unequal variances, are assessed on the basis of the discrete version of $\delta_{LA}$, that is, $\delta_{LA}[p(\psi), r]$. For the same type of mixture, a modified Newton-Raphson method is used by Mundry (1972) to minimize a slightly different criterion,

$$\delta_M(p, r) = \sum_{l=1} [\,\cdots\,],$$

where $\Phi^{-1}(\cdot)$ is the inverse of the standard normal distribution function.

For an extensive treatment of the practical aspects of decomposing normal mixtures using quadratic distance measures, see Macdonald and Pitcher (1979). The distance measure

$$\delta_{LA}[F_n(\cdot), F(\cdot|\psi)] = n^{-1} \sum_{i=1}^{n} \big[F(y_i|\psi) - i/n\big]^2, \tag{4.5.10}$$

with $y_1 \le \cdots \le y_n$ the ordered observations, which was used in Example 4.5.1, was investigated in detail from a theoretical point of view by Choi (1969a). He established, under the usual sort of regularity conditions, several of the asymptotic properties that might be hoped for.

(a) With probability one, there is a neighbourhood of $\psi_0$, the true value of $\psi$, such that, for large enough $n$, the criterion has a unique minimizing point therein; $\hat\psi_n$, say.

(b) With probability one, $\hat\psi_n \to \psi_0$ as $n \to \infty$.

(c) Asymptotically, $\hat\psi_n$ is multivariate normal.

Furthermore, the asymptotic covariance matrix is derived. The method is illustrated by Choi (1969b) and Choi and Bulgren (1968) in the contexts of empirical Bayes and more general mixtures.

If the roles of $F_n(\cdot)$ and $F(\cdot|\psi)$ are reversed as arguments in the measure (4.5.10), then the criterion may be identified as the Cramér-von Mises statistic

$$n^{-1} \sum_{i=1}^{n} \big[F(y_i|\psi) - (i - \tfrac{1}{2})/n\big]^2 + (12n^2)^{-1}.$$

The only essential difference is the replacement of the term $i/n$ in (4.5.10) by $(i - \frac{1}{2})/n$. This change does not affect the asymptotic results, nor the quadratic programming nature of the problem when only the mixing weights are unknown. As might be expected, however, it leads to a reduction in bias in small-sample cases, in a similar way to the analogous modification of probability-plotting procedures (see, for instance, Barnett, 1975). Empirical evidence for this is indicated by Macdonald (1971); see also Macdonald (1969).
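The two criteria differ only in the plotting positions $i/n$ versus $(i-\frac{1}{2})/n$. The following sketch computes both for a two-component normal mixture with known components and a single unknown weight, recovered here by a simple grid search; the data, component parameters, and grid are illustrative assumptions.

```python
# Sketch comparing criterion (4.5.10), which uses i/n, with its
# Cramer-von Mises variant using (i - 1/2)/n.
import numpy as np
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def mixture_cdf(x, w):
    """CDF of w*N(0,1) + (1-w)*N(4,1); components assumed known."""
    return w * norm_cdf(x, 0.0, 1.0) + (1.0 - w) * norm_cdf(x, 4.0, 1.0)

def delta_LA(y, w):
    """n^{-1} sum_i [F(y_i|psi) - i/n]^2 over the ordered sample."""
    n = len(y)
    i = np.arange(1, n + 1)
    F = np.array([mixture_cdf(v, w) for v in y])
    return np.mean((F - i / n) ** 2)

def cramer_von_mises(y, w):
    """Same sum with (i - 1/2)/n, plus the constant (12 n^2)^{-1}."""
    n = len(y)
    i = np.arange(1, n + 1)
    F = np.array([mixture_cdf(v, w) for v in y])
    return np.mean((F - (i - 0.5) / n) ** 2) + 1.0 / (12 * n * n)

# illustrative data: a 0.6/0.4 mixture of N(0,1) and N(4,1)
rng = np.random.default_rng(1)
y = np.sort(np.where(rng.random(1000) < 0.6,
                     rng.normal(0, 1, 1000), rng.normal(4, 1, 1000)))
ws = np.linspace(0.01, 0.99, 99)
w_la = ws[np.argmin([delta_LA(y, w) for w in ws])]
w_cvm = ws[np.argmin([cramer_von_mises(y, w) for w in ws])]
print(w_la, w_cvm)   # both near the true weight 0.6
```

At this sample size the two minimizers are essentially indistinguishable, consistent with the remark that the modification matters only in small samples.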
