# Statistical Analysis of Finite Mixture Distributions - Smith A.F.M.

ISBN 0-470-90763-4


$$E(X^j) = \frac{1}{n}\sum_{i=1}^{n} x_i^j, \qquad j = 1, \ldots, r,$$

subject to possible constraints imposed on the parameters to be estimated, where the number $r$ is chosen such that a unique solution for all unknown parameters is guaranteed. For example, suppose the mixture consists of two normal densities with means $\theta_1$ and $0$, respectively, and equal variances $\theta_2$, with mixing parameter $\pi$. If $\pi$, $\theta_1$, and $\theta_2$ are unknown, we solve the $r = 3$ equations:

$$E(X) = \pi\theta_1 = \sum_i x_i / n;$$

$$E(X^2) = \pi\theta_1^2 + \theta_2 = \sum_i x_i^2 / n;$$

$$E(X^3) = \pi(\theta_1^3 + 3\theta_1\theta_2) = \sum_i x_i^3 / n.$$
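As an illustrative sketch (not from the book), the three moment equations above can be reduced by hand: eliminating $\pi$ and $\theta_2$ leaves a quadratic in $\theta_1$, so the system can be solved directly. All names and simulated data below are hypothetical.

```python
# Hypothetical sketch: method of moments for the two-component mixture
#   pi*N(theta1, theta2) + (1 - pi)*N(0, theta2).
# Eliminating pi and theta2 from
#   pi*theta1                       = m1
#   pi*theta1^2 + theta2            = m2
#   pi*(theta1^3 + 3*theta1*theta2) = m3
# leaves the quadratic  m1*theta1^2 - 3*m1^2*theta1 + (3*m1*m2 - m3) = 0,
# whose two roots exhibit the multiple-solution problem discussed below.
import math
import random

def mm_two_normal(xs):
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    m3 = sum(x ** 3 for x in xs) / n
    a, b, c = m1, -3.0 * m1 ** 2, 3.0 * m1 * m2 - m3
    disc = b * b - 4.0 * a * c
    roots = []
    for sign in (+1.0, -1.0):
        theta1 = (-b + sign * math.sqrt(disc)) / (2.0 * a)
        pi = m1 / theta1                 # from the first moment equation
        theta2 = m2 - m1 * theta1        # from the second
        roots.append((pi, theta1, theta2))
    return roots

random.seed(0)
true_pi, true_theta1, true_sd = 0.4, 3.0, 1.0
xs = [random.gauss(true_theta1 if random.random() < true_pi else 0.0, true_sd)
      for _ in range(200000)]
for pi, t1, t2 in mm_two_normal(xs):
    print(f"pi={pi:.3f}  theta1={t1:.3f}  theta2={t2:.3f}")
```

One of the two roots recovers parameters close to the truth; the other typically violates $0 \le \pi \le 1$, which is why constraints on the parameters matter.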

Sequential problems and procedures

If we consider, for example, the implied equation for $\theta_1$, it can be shown that for $\theta_1 \neq 0$ a unique solution exists only if $\pi = 2/3$; otherwise the problem will have multiple solutions. A unique solution is obtainable only if more information on the unknown parameters is available, or if higher moments are used. For further details see Fu (1968).

Provided the moment equations are relatively simple, the MM is computationally attractive and can be handled sequentially using stochastic-approximation-type equations (Fu, 1968). If the conditions for attaining a unique solution are satisfied, the method can be shown to converge to the true parameters with probability one. However, the method can be painfully inefficient.
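A minimal sketch of the sequential idea (these are illustrative running-moment updates, not Fu's actual equations): the sample moments feeding the moment equations can themselves be revised one observation at a time.

```python
# Hedged sketch: update running sample moments m = [m1, m2, m3]
# recursively as each new observation arrives,
#   m_j(n+1) = m_j(n) + (x^j - m_j(n)) / (n + 1),
# so the moment equations can be re-solved after every observation.
def update_moments(m, n, x):
    """Incorporate observation x into running moments m seen over n points."""
    return [mj + (x ** (j + 1) - mj) / (n + 1) for j, mj in enumerate(m)]

m, n = [0.0, 0.0, 0.0], 0
for x in [1.0, 2.0, 3.0]:
    m = update_moments(m, n, x)
    n += 1
print(m)  # after three points: [2.0, 14/3, 12.0]
```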

A maximum likelihood approach, leading to a stochastic approximation algorithm, was proposed by Young and Coraluppi (1970).

6.4.2 A general recursion for parameter estimation using incomplete data

As we remarked in Section 3.2, one way of viewing observations from a finite mixture distribution is to regard them as incomplete data, the incompleteness referring to the absence of the indicator vectors which would identify the actual category or component membership of each observation.
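The incomplete-data view can be made concrete with a small simulation (names and parameter values here are purely illustrative): each complete observation is a pair of an indicator and a value, and the mixture sample is what remains when the indicators are discarded.

```python
# Sketch of the 'incomplete data' view of a finite mixture: the complete
# datum is (z, x), where the indicator z records component membership;
# the observed (incomplete) datum is x alone.
import random

def draw_complete(pi, mu1, mu0, sd):
    z = 1 if random.random() < pi else 0          # indicator (unobserved)
    x = random.gauss(mu1 if z == 1 else mu0, sd)  # the observation itself
    return z, x

random.seed(1)
complete = [draw_complete(0.4, 3.0, 0.0, 1.0) for _ in range(5)]
incomplete = [x for _, x in complete]   # what we actually get to see
print(incomplete)
```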

In this section (based closely on Titterington, 1984), we shall begin by considering a general form of recursive estimation algorithm, together with its asymptotic properties. We shall then note the difficulties that arise in implementing this recursion in the multiparameter case with incomplete data (and, in particular, in the case of finite mixtures). Finally, exploiting links between the complete and incomplete data cases or, in the language of earlier sections in this chapter, between ‘supervised learning’ and ‘unsupervised learning’, we then suggest modified recursions, which are shown to have close connections with the EM algorithm. We recall that, as noted at the end of Section 6.3, although it is convenient to examine such general recursions under the heading of case C, the results apply, a fortiori, to appropriate case A and case B problems. Indeed, several of the recursions already discussed in previous sections will be seen to be special cases of, or closely related to, the recursions to be discussed below. Moreover, as we remarked at the beginning of Section 6.1.1, the algorithms can also be used as computational devices for non-sequential problems.

We suppose that $x_1, x_2, \ldots$ are independent observations, each with underlying probability density function $p(x|\psi)$, where $\psi \in \Psi \subseteq \mathbb{R}^s$, for some $s$. Let $S(x, \psi)$ denote the vector of scores,

$$S_j(x, \psi) = \frac{\partial}{\partial \psi_j} \log p(x|\psi), \qquad j = 1, \ldots, s.$$

Let $D_2(x, \psi)$ denote the matrix of second derivatives of $\log p(x|\psi)$ and let $I(\psi)$ denote the Fisher information matrix corresponding to one observation.


It is assumed that all derivatives and expected values exist and that

$$E_\psi S(x, \psi) = \int S(x, \psi)\, p(x|\psi)\, dx = 0,$$

$$I(\psi) = E_\psi\!\left[ S(x, \psi)\, S^T(x, \psi) \right] = -E_\psi D_2(x, \psi).$$
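These two identities can be checked numerically in the simplest possible case, $p(x|\psi) = N(\psi, 1)$, where $S(x, \psi) = x - \psi$, $D_2(x, \psi) = -1$, and $I(\psi) = 1$. The Monte Carlo check below is a sketch, not part of the book.

```python
# Hedged numerical check of the score identities for p(x|psi) = N(psi, 1):
# here S(x, psi) = x - psi and D2(x, psi) = -1, so we expect
#   E_psi S = 0   and   E_psi[S^2] = -E_psi[D2] = I(psi) = 1.
import random

random.seed(2)
psi = 1.5
xs = [random.gauss(psi, 1.0) for _ in range(100000)]
scores = [x - psi for x in xs]
mean_score = sum(scores) / len(scores)       # should be near 0
mean_sq = sum(s * s for s in scores) / len(scores)  # should be near 1
print(mean_score, mean_sq)
```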

Consider the recursion

$$\psi^{(n+1)} = \psi^{(n)} + \{(n+1)\, I(\psi^{(n)})\}^{-1} S(x_{n+1}, \psi^{(n)}), \qquad (6.4.1)$$

which is recognizable as a stochastic approximation procedure. Under regularity conditions over and above those implicitly assumed thus far, as $n \to \infty$,

$$n^{1/2}(\psi^{(n)} - \psi_0) \to N(0,\, I(\psi_0)^{-1}) \qquad (6.4.2)$$

in distribution, where $\psi_0$ denotes the true parameter value. This result appears in Sacks (1958), Fabian (1968), Nevel’son and Has’minskii (1973, Chapter 8), and Fabian (1978).

The following conditions are required for the most useful version of the result in Fabian (1978).

C1: Continuity

(a) $\int [S(x, \theta) - S(x, \psi)]^T [S(x, \theta) - S(x, \psi)]\, p(x|\psi)\, dx \to 0$ as $\theta \to \psi$ in $\Psi$; $\qquad (6.4.3)$

(b) if, as $n \to \infty$, $\psi^{(n)} \to \psi_0$, then $I(\psi^{(n)})^{-1} \to I(\psi_0)^{-1}$.

C2: Definiteness

$$-(\theta - \psi)^T I(\psi)^{-1} E_\psi S(x, \theta) > 0 \quad \text{for } \theta \neq \psi. \qquad (6.4.4)$$

C3: Boundedness

$$E_\psi \| S(x, \theta) \|^2 \leq C(1 + \| \theta - \psi \|^2), \qquad (6.4.5)$$

where $\| u \|^2 = u^T u$ and $C$ is independent of $\theta$.

One further comment should be made. Theoretical results are based on the assumption that $\psi^{(n)} \in \Psi$ for all $n$. In practice, (6.4.1) may have to be modified slightly to ensure that this condition holds. For instance, if $\psi$ reduces to a mixing weight, an additional constraint should be added, such as $\epsilon \leq \psi^{(n)} \leq 1 - \epsilon$, for all $n$ and some small positive $\epsilon$. Given these conditions and modifications, (6.4.2) is guaranteed.

If (6.4.2) holds for (6.4.1) then it will also hold for a modified form of the recursion, a particularly elegant one in that it provides the exact recursion obeyed by maximum likelihood estimates in exponential family models (exercise for the reader; or see Titterington, 1984).
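A sketch of recursion (6.4.1) in the one-parameter case the text singles out, a mixing weight, follows. The component densities, starting values, and the replacement of the exact $I(\pi)$ (which has no simple closed form here) by a running average of squared scores are all illustrative assumptions, not the book's prescription; the $\epsilon$-truncation is the modification described above.

```python
# Hedged sketch of recursion (6.4.1) for a single mixing weight pi in
#   p(x|pi) = pi*N(3,1) + (1-pi)*N(0,1),   components assumed known.
# The Fisher information I(pi) is approximated by a running average of
# squared scores (an illustrative substitution, not the book's method).
import math
import random

def pdf_norm(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def score(x, pi):
    # d/dpi log p(x|pi) = (f1(x) - f0(x)) / (pi*f1(x) + (1-pi)*f0(x))
    f1, f0 = pdf_norm(x, 3.0), pdf_norm(x, 0.0)
    return (f1 - f0) / (pi * f1 + (1.0 - pi) * f0)

random.seed(3)
true_pi, eps = 0.3, 1e-3
pi_hat, info = 0.5, 1.0                 # illustrative starting values
for n in range(1, 20001):
    x = random.gauss(3.0 if random.random() < true_pi else 0.0, 1.0)
    s = score(x, pi_hat)
    info += (s * s - info) / n           # running estimate of I(pi)
    pi_hat += s / ((n + 1) * info)       # the (6.4.1)-type update
    pi_hat = min(max(pi_hat, eps), 1.0 - eps)  # epsilon-truncation
print(round(pi_hat, 2))
```

The truncation step keeps every iterate inside $\Psi = [\epsilon, 1 - \epsilon]$, exactly the kind of modification the text says is needed before the asymptotic result (6.4.2) can be invoked.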
