# Statistical analysis of mixture distribution - Smith A.F.M

ISBN 0-470-90763-4





Statistical analysis of finite mixture distributions

sensitivity to the data. Makov therefore suggests modifying $L_K(n)$ by incorporating into it additional parameters, $\alpha$ and $\beta$, to produce the gain function

This gain function takes account of both the value of $n$ and the S/N ratio, and also guarantees asymptotic efficiency. Unlike $L_K(n)$, it allows the gain to be reduced through the choice of $(\alpha, \beta)$, a reduction which diminishes as more observations become available.

A typical (simulated) example where such a modification proves useful is shown in Figure 6.2.2, where the first fifty estimates of $\pi$ are plotted using $L_{QB}$, $L_K$ and $L_{MK}$. Here $\mu/\sigma = 0.5/1$, the true value of $\pi$ is 0.75, and $\pi^{(0)} = 0.5 = 5/(5+5)$. While the K procedure fluctuates considerably for the first thirty-five observations, the modified scheme is smoother and closer to the true value of $\pi$. The QB procedure is smoother still and, in this particular example, is the most accurate procedure of all over this initial sequence of observations. Thus, in general, it would seem that a scheme based on $L_{MK}(\pi)$ offers a useful way of achieving both short-term and asymptotic efficiency, at the obvious cost of computing the gain function after each observation.

U,kln,~(cc + fi)/n+l'

[Plot: estimates of $\pi$ (vertical axis) against the number of observations $n$, from 0 to 50 (horizontal axis).]

Figure 6.2.2 Estimates of $\pi$ versus the number of observations. Reproduced with permission from Makov (1980). Copyright © 1980 IEEE
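The quasi-Bayes curve in an experiment of this kind can be sketched for the two-class case as below. This is a minimal sketch under stated assumptions, not Makov's code: only the quasi-Bayes gain is implemented, assumed here to be $1/(\alpha + \beta + n)$ with starting value $\pi^{(0)} = \alpha/(\alpha+\beta)$; the two known components are assumed to be $N(\mu, \sigma^2)$ and $N(-\mu, \sigma^2)$ with $\mu/\sigma = 0.5/1$; and the helper names (`quasi_bayes_two_class`, `simulate`) and the random seed are hypothetical.

```python
import math
import random


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def quasi_bayes_two_class(xs, alpha, beta, mu, sigma):
    """Recursive estimates of pi for pi*N(mu, sigma^2) + (1 - pi)*N(-mu, sigma^2).

    Assumed quasi-Bayes form: pi_n = pi_{n-1} + (w_n - pi_{n-1}) / (alpha + beta + n),
    where w_n is the probability that x_n came from the first component,
    evaluated at the current estimate pi_{n-1}.
    """
    pi_hat = alpha / (alpha + beta)
    estimates = []
    for n, x in enumerate(xs, start=1):
        f1 = normal_pdf(x, mu, sigma)
        f2 = normal_pdf(x, -mu, sigma)
        w = pi_hat * f1 / (pi_hat * f1 + (1.0 - pi_hat) * f2)
        # Convex combination of pi_hat and w, so the estimate stays in [0, 1].
        pi_hat += (w - pi_hat) / (alpha + beta + n)
        estimates.append(pi_hat)
    return estimates


def simulate(n_obs, true_pi, mu, sigma, seed=0):
    """Hypothetical data generator: draw each x from the first component with probability true_pi."""
    rng = random.Random(seed)
    return [rng.gauss(mu if rng.random() < true_pi else -mu, sigma) for _ in range(n_obs)]


xs = simulate(50, true_pi=0.75, mu=0.5, sigma=1.0)
est = quasi_bayes_two_class(xs, alpha=5.0, beta=5.0, mu=0.5, sigma=1.0)
print([round(e, 3) for e in est[::10]])
```

With $\alpha = \beta = 5$ the recursion starts at 0.5, matching the text, and the decreasing gain makes later observations move the estimate less and less, which is the smoothing behaviour attributed to the QB procedure.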

Sequential problems and procedures

6.2.4 The k-class problem: a quasi-Bayes procedure

We consider the situation in which, conditional on $\pi = (\pi_1, \pi_2, \ldots, \pi_k)$ and density functions $f_1, f_2, \ldots, f_k$, we may assume that the random variables $X_1, X_2, \ldots$ are independent, with probability densities

$$p(x_n \mid \pi) = \pi_1 f_1(x_n) + \pi_2 f_2(x_n) + \cdots + \pi_k f_k(x_n), \tag{6.2.22}$$

where the $\pi_i$'s are non-negative and sum to unity. The density $f_i$ specifies the probability distribution of the observation, given that it belongs to population $H_i$, and $\pi_i$ denotes the probability of this latter event. We further assume that the $f_i$ are known and the $\pi_i$ unknown.
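Equation (6.2.22) can be evaluated directly once the component densities are fixed. The sketch below assumes, purely for illustration, normal components built by a hypothetical `normal` helper; any known densities $f_i$ could be passed instead. It also checks the two constraints stated in the text: non-negative weights that sum to unity.

```python
import math
from typing import Callable, Sequence

Density = Callable[[float], float]


def mixture_density(x: float, weights: Sequence[float], components: Sequence[Density]) -> float:
    """Evaluate p(x | pi) = pi_1 f_1(x) + ... + pi_k f_k(x), as in (6.2.22)."""
    if len(weights) != len(components):
        raise ValueError("need one mixing weight per component density")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("mixing weights must be non-negative and sum to one")
    return sum(w * f(x) for w, f in zip(weights, components))


def normal(mean: float, sd: float) -> Density:
    """Hypothetical known component: the N(mean, sd^2) density."""
    return lambda x: math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


fs = [normal(-1.0, 1.0), normal(0.0, 1.0), normal(2.0, 0.5)]
print(mixture_density(0.3, [0.2, 0.5, 0.3], fs))
```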

As in the two-class case, the formal Bayes solution to the problem of learning about $\pi$ and classifying the observations is deceptively straightforward.

We suppose that $p(\pi)$ denotes a prior density for $\pi$, $p(\pi \mid x^n) = p(\pi \mid x_1, x_2, \ldots, x_n)$ denotes the resulting posterior density for $\pi$ given $x_1, x_2, \ldots, x_n$, and $p_i(\pi \mid x^n)$ denotes the posterior density for $\pi$ if, in addition to $x_1, x_2, \ldots, x_n$, it were also known that the $n$th observation came from $H_i$.

By Bayes' theorem, we have, for $n \ge 1$,

$$p(\pi \mid x^n) \propto p(x_n \mid \pi)\, p(\pi \mid x^{n-1}). \tag{6.2.23}$$
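For $k = 2$ the recursion (6.2.23) can be carried out numerically by discretizing $\pi_1$ on a grid, which makes the exact sequential update concrete before any approximation is introduced. This is a sketch under assumptions: the grid discretization, the normal components and the function names are illustrative choices, not part of the text. Renormalizing after each step supplies the proportionality constant.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def grid_posterior(xs, f1, f2, prior, grid_size=201):
    """Sequential Bayes update (6.2.23) for k = 2 on a grid of pi_1 values in [0, 1]."""
    grid = [j / (grid_size - 1) for j in range(grid_size)]
    post = [prior(p) for p in grid]
    total = sum(post)
    post = [q / total for q in post]
    for x in xs:
        a, b = f1(x), f2(x)
        # Multiply by the likelihood p(x_n | pi) = pi_1 f_1(x_n) + (1 - pi_1) f_2(x_n) ...
        post = [q * (p * a + (1.0 - p) * b) for p, q in zip(grid, post)]
        # ... then renormalize so the grid masses sum to one.
        total = sum(post)
        post = [q / total for q in post]
    return grid, post


def posterior_mean(grid, post):
    """Discrete approximation to E[pi_1 | x^n]."""
    return sum(p * q for p, q in zip(grid, post))


def f1(x):  # hypothetical known component densities
    return normal_pdf(x, 1.0, 1.0)


def f2(x):
    return normal_pdf(x, -1.0, 1.0)


grid, post = grid_posterior([0.8, 1.5, -0.2, 2.1], f1, f2, prior=lambda p: 1.0)
print(round(posterior_mean(grid, post), 3))
```

The grid approach is exact up to discretization for $k = 2$, but the number of grid points grows exponentially with $k$, which is one reason recursive approximations are attractive.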

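If we now define random variables $Y_1, Y_2, \ldots, Y_n$ such that $Y_n = i$ if and only if $x_n$ belongs to $H_i$, $i = 1, 2, \ldots, k$, then, from (6.2.22) and (6.2.23),

$$p(\pi \mid x^n) = \sum_{i=1}^{k} p(Y_n = i \mid x^n)\, p_i(\pi \mid x^n), \tag{6.2.24}$$

where, for $i = 1, 2, \ldots, k$,

$$p(Y_n = i \mid x^n) \propto f_i(x_n)\, \hat{\pi}_i^{(n-1)}(x^{n-1}) \tag{6.2.25}$$

and

$$\hat{\pi}_i^{(n-1)}(x^{n-1}) = \int \cdots \int \pi_i\, p(\pi \mid x^{n-1})\, d\pi. \tag{6.2.26}$$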

Sequential learning about $\pi$ takes place through $p(\pi \mid x^n)$, and classification of the successive observations on the basis of $p(Y_n = i \mid x^n)$, $i = 1, 2, \ldots, k$. We note, incidentally, that the forms of $p(Y_n = i \mid x^n)$ are the same as would be obtained for known $\pi_i$, except that the latter are estimated by their expected values, given $x_1, x_2, \ldots, x_{n-1}$.
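The classification rule (6.2.25) is just a plug-in Bayes rule: multiply each $f_i(x_n)$ by the current posterior mean $\hat{\pi}_i^{(n-1)}$ and normalize. A minimal sketch, assuming the densities at $x_n$ and the posterior means have already been computed (the numbers in the usage line are hypothetical):

```python
from typing import List, Sequence


def classification_probs(f_values: Sequence[float], pi_means: Sequence[float]) -> List[float]:
    """p(Y_n = i | x^n) from (6.2.25), normalized over i = 1, ..., k.

    f_values[i] is f_i(x_n); pi_means[i] is the posterior mean of pi_i
    given x_1, ..., x_{n-1}, as defined in (6.2.26).
    """
    if len(f_values) != len(pi_means):
        raise ValueError("need one posterior mean per component")
    raw = [f * m for f, m in zip(f_values, pi_means)]
    total = sum(raw)
    if total <= 0.0:
        raise ValueError("x_n has zero density under the current estimates")
    return [r / total for r in raw]


probs = classification_probs([0.1, 0.3, 0.6], [0.5, 0.25, 0.25])
print([round(p, 4) for p in probs])
```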

As we pointed out in previous sections, the implementation of this learning procedure raises some serious computational problems because of the mixture form which appears in (6.2.22). Successive computation of (6.2.23), or (6.2.24), introduces an ever-expanding linear combination of component posterior densities, each of which corresponds to an updating based upon a particular choice of previous classifications. At the $n$th stage, there are implicitly $k^n$ such possible classifications, and calculation quickly becomes prohibitive.
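The expansion can be seen explicitly if the posterior is carried as a weighted sum of Dirichlet terms, as happens with the Dirichlet prior (6.2.27). The sketch relies on the identity $\pi_i D(\pi; a) = (a_i / \sum_j a_j)\, D(\pi; a + e_i)$, so each existing term splits into $k$ new terms, one per possible classification of $x_n$. The density values fed in are hypothetical stand-ins for $f_i(x_n)$, and no identical terms are merged, so the list length counts classification paths.

```python
from typing import List, Sequence, Tuple

Term = Tuple[float, Tuple[float, ...]]  # (weight, Dirichlet parameters)


def exact_update(terms: List[Term], f_values: Sequence[float]) -> List[Term]:
    """One exact Bayes step for a posterior that is a weighted sum of Dirichlet terms."""
    new_terms = []
    for weight, alphas in terms:
        a0 = sum(alphas)
        for i, f in enumerate(f_values):
            bumped = list(alphas)
            bumped[i] += 1.0  # this branch classifies x_n into H_i
            new_terms.append((weight * f * alphas[i] / a0, tuple(bumped)))
    total = sum(w for w, _ in new_terms)
    return [(w / total, a) for w, a in new_terms]


terms: List[Term] = [(1.0, (1.0, 1.0, 1.0))]  # k = 3, uniform Dirichlet prior
for f_vals in [(0.2, 0.5, 0.3), (0.6, 0.1, 0.3), (0.3, 0.3, 0.4), (0.1, 0.2, 0.7)]:
    terms = exact_update(terms, f_vals)
print(len(terms))  # 81 = 3**4 classification paths
```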

As in the two-class case, there are many forms of ad hoc recursive procedure that one might adopt. Kazakos and Davisson (1980) consider a decision-directed approach; Kazakos (1977) extends the Newton-Raphson type algorithm to the k-class situation; and Smith and Makov (1978) extend the quasi-Bayes solution. We shall not review all these generalizations in detail. Instead, we shall outline the form of the quasi-Bayes solution and then make some general remarks about asymptotic properties.

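We begin by considering the formal Bayesian solution to the problem, assuming that $p(\pi)$, the prior density for $\pi$, has the form of a Dirichlet density,

$$p(\pi) = \frac{\Gamma\left(\alpha_1^{(0)} + \alpha_2^{(0)} + \cdots + \alpha_k^{(0)}\right)}{\Gamma\left(\alpha_1^{(0)}\right)\Gamma\left(\alpha_2^{(0)}\right)\cdots\Gamma\left(\alpha_k^{(0)}\right)} \prod_{i=1}^{k} \pi_i^{\alpha_i^{(0)} - 1}, \tag{6.2.27}$$

which we denote by $D(\pi; \alpha_1^{(0)}, \alpha_2^{(0)}, \ldots, \alpha_k^{(0)})$, where $\alpha_i^{(0)} > 0$, $i = 1, 2, \ldots, k$. Such a form might arise, for example, from a multinomially distributed training sample whose correct classifications were known.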
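Evaluating (6.2.27) is easiest on the log scale, using log-gamma to avoid overflow for large parameters. A minimal sketch; the function names are hypothetical. The helper `dirichlet_mean` returns $\alpha_i / \sum_j \alpha_j$, the prior expected value of $\pi_i$, which is the quantity that enters (6.2.25) at the first step.

```python
import math
from typing import List, Sequence


def dirichlet_log_density(pi: Sequence[float], alphas: Sequence[float]) -> float:
    """log D(pi; alpha_1, ..., alpha_k) from (6.2.27), for pi in the open simplex."""
    if len(pi) != len(alphas):
        raise ValueError("pi and alphas must have the same length")
    if any(a <= 0 for a in alphas):
        raise ValueError("Dirichlet parameters must be positive")
    if any(p <= 0 for p in pi) or not math.isclose(sum(pi), 1.0):
        raise ValueError("pi must have positive entries summing to one")
    log_norm = math.lgamma(sum(alphas)) - sum(math.lgamma(a) for a in alphas)
    return log_norm + sum((a - 1.0) * math.log(p) for p, a in zip(pi, alphas))


def dirichlet_mean(alphas: Sequence[float]) -> List[float]:
    """E[pi_i] = alpha_i / (alpha_1 + ... + alpha_k)."""
    a0 = sum(alphas)
    return [a / a0 for a in alphas]


print(math.exp(dirichlet_log_density([0.3, 0.7], [2.0, 2.0])))
```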

It follows from (6.2.24) that, after observing $x_1$,

$$p(\pi \mid x_1) = \sum_{i=1}^{k} p(Y_1 = i \mid x_1)\, D\left(\pi; \alpha_1^{(0)} + \delta_{i1}, \alpha_2^{(0)} + \delta_{i2}, \ldots, \alpha_k^{(0)} + \delta_{ik}\right), \tag{6.2.28}$$

where $p(Y_1 = i \mid x_1) \propto f_i(x_1)\, \alpha_i^{(0)}$, and $\delta_{ij} = 0$ if $i \ne j$, $\delta_{ij} = 1$ if $i = j$. It is natural to consider approximating (6.2.28) by a suitable Dirichlet density. The form of $p(\pi \mid x_1, x_2)$ would then be a linear combination of $k$ terms, as in (6.2.28), and could itself then be approximated by a Dirichlet density; and so on. Proceeding in this way, we could keep the necessary computation within reasonable bounds.
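One simple Dirichlet approximation, in the spirit of the quasi-Bayes solution mentioned above, is assumed in the sketch below: add the classification probabilities to the Dirichlet parameters, $\alpha_i^{(1)} = \alpha_i^{(0)} + p(Y_1 = i \mid x_1)$. This particular update is an illustrative assumption, not a statement of the method developed in the text. The check confirms that the single Dirichlet has the same mean vector as the exact $k$-term mixture (6.2.28), since both equal $(\alpha_j^{(0)} + p(Y_1 = j \mid x_1))/(\sum_i \alpha_i^{(0)} + 1)$.

```python
from typing import List, Sequence


def quasi_bayes_step(alphas: Sequence[float], f_values: Sequence[float]) -> List[float]:
    """Replace the k-term mixture (6.2.28) by one Dirichlet (assumed update rule)."""
    raw = [f * a for f, a in zip(f_values, alphas)]  # p(Y = i | x) is proportional to f_i(x) * alpha_i
    total = sum(raw)
    probs = [r / total for r in raw]
    return [a + p for a, p in zip(alphas, probs)]


def exact_mixture_mean(alphas: Sequence[float], f_values: Sequence[float]) -> List[float]:
    """Mean vector of the exact mixture (6.2.28), for comparison."""
    raw = [f * a for f, a in zip(f_values, alphas)]
    total = sum(raw)
    weights = [r / total for r in raw]
    a0 = sum(alphas)
    return [
        sum(w * (alphas[j] + (1.0 if i == j else 0.0)) for i, w in enumerate(weights)) / (a0 + 1.0)
        for j in range(len(alphas))
    ]


alphas = [2.0, 1.0, 1.0]
f_vals = [0.4, 0.1, 0.5]  # hypothetical values of f_i(x_1)
new_alphas = quasi_bayes_step(alphas, f_vals)
print(new_alphas)
```

Repeating `quasi_bayes_step` for each new observation keeps the posterior as a single Dirichlet with $k$ parameters, instead of the $k^n$ terms of the exact solution.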
