# Statistical analysis of finite mixture distributions - Smith A.F.M.

ISBN 0-470-90763-4


We assume that, if the source of an observation is known, then the probability distribution of that observation is completely specified by the known density, $f_1$ for $H_1$ or $f_2$ for $H_2$, but we assume that the prior probability for source $H_1$ is unknown. Denoting this prior probability by $\pi$ (so that $1 - \pi$ is the prior probability for $H_2$), we suppose the observations $x_n$ to be independent, given $\pi$, with common density

$$
p(x_n \mid \pi) = \pi f_1(x_n) + (1 - \pi) f_2(x_n). \qquad (6.2.1)
$$
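As a concrete illustration of the mixture form (6.2.1), the sketch below evaluates $p(x \mid \pi)$ for two component densities. The choice of unit-variance normal components and the particular means are assumptions for the example only, not part of the text:

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Normal density, used here purely as an illustrative component density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_density(x, pi, f1, f2):
    """p(x | pi) = pi * f1(x) + (1 - pi) * f2(x), as in (6.2.1)."""
    return pi * f1(x) + (1.0 - pi) * f2(x)

# Hypothetical components: H1 ~ N(0, 1), H2 ~ N(3, 1).
f1 = lambda x: normal_pdf(x, 0.0)
f2 = lambda x: normal_pdf(x, 3.0)
p0 = mixture_density(0.0, 1.0, f1, f2)     # pi = 1 recovers f1 alone
p_mid = mixture_density(1.5, 0.5, f1, f2)  # midpoint between the two means
```

With $\pi = 1$ the mixture reduces to $f_1$, and at the midpoint of two symmetric components the equally weighted mixture equals either component's value there.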

We denote by $p_0(\pi)$ the density for $\pi$ prior to obtaining any observations, and we write $p(\pi \mid x_1, \ldots, x_n) = p(\pi \mid \mathbf{x}^{(n)})$, $n \geq 1$, for the posterior density conditional on having observed $x_1, \ldots, x_n$, but without knowing their correct sources. By Bayes' theorem, we have

$$
p(\pi \mid \mathbf{x}^{(n)}) = \frac{p_0(\pi) \prod_{i=1}^{n} p(x_i \mid \pi)}{\displaystyle\int_0^1 p_0(\pi) \prod_{i=1}^{n} p(x_i \mid \pi) \, d\pi} \qquad (6.2.2)
$$

$$
= \frac{p(x_n \mid \pi) \, p(\pi \mid \mathbf{x}^{(n-1)})}{\displaystyle\int_0^1 p(x_n \mid \pi) \, p(\pi \mid \mathbf{x}^{(n-1)}) \, d\pi}. \qquad (6.2.3)
$$

The form (6.2.2) reveals how the order of complexity increases with n as a result of the mixture form (6.2.1).

The decision regarding the source of an observation $x_n$ depends on the quantity $w_n = \Pr(x_n \in H_1 \mid \mathbf{x}^{(n)})$, which is given by

$$
w_n = \int_0^1 \Pr(x_n \in H_1 \mid \pi, x_n) \, p(\pi \mid \mathbf{x}^{(n)}) \, d\pi \qquad (6.2.4)
$$

$$
= \int_0^1 \frac{\pi f_1(x_n)}{p(x_n \mid \pi)} \, p(\pi \mid \mathbf{x}^{(n)}) \, d\pi. \qquad (6.2.5)
$$

Writing $\pi^{(n-1)} = \int_0^1 \pi \, p(\pi \mid \mathbf{x}^{(n-1)}) \, d\pi$ and using (6.2.3), we obtain

$$
w_n = \frac{f_1(x_n) \, \pi^{(n-1)}}{f_1(x_n) \, \pi^{(n-1)} + f_2(x_n) \, (1 - \pi^{(n-1)})}, \qquad (6.2.6)
$$

which has the same form as in the case of known $\pi$, but with $\pi$ replaced by its posterior mean conditional on $\mathbf{x}^{(n-1)}$. Assuming a zero-one loss function, we would decide that $x_n \in H_1$ if $w_n > \tfrac{1}{2}$ and that $x_n \in H_2$ if $w_n < \tfrac{1}{2}$.
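The rule (6.2.6) together with the zero-one loss decision can be sketched in a few lines; the unit-variance normal components centred at 0 and 3 are illustrative assumptions, not part of the development:

```python
import math

def w_n(x, pi_mean, f1, f2):
    """Posterior source probability (6.2.6): the known-pi classification
    rule with pi replaced by its current posterior mean."""
    num = f1(x) * pi_mean
    return num / (num + f2(x) * (1.0 - pi_mean))

def classify(x, pi_mean, f1, f2):
    """Zero-one loss decision: source 1 if w_n > 1/2, else source 2."""
    return 1 if w_n(x, pi_mean, f1, f2) > 0.5 else 2

# Hypothetical unit-variance normal components (normalizing constants
# cancel in the ratio, so unnormalized densities suffice).
f1 = lambda x: math.exp(-0.5 * x ** 2)
f2 = lambda x: math.exp(-0.5 * (x - 3.0) ** 2)
w_mid = w_n(1.5, 0.5, f1, f2)       # equidistant point, equal prior mean
label = classify(0.0, 0.5, f1, f2)  # observation at the centre of H1
```

At the point equidistant from both components, with posterior mean $\tfrac{1}{2}$, the rule is indifferent ($w_n = \tfrac{1}{2}$); elsewhere it assigns the nearer source.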

Sequential problems and procedures

The above development is deceptively straightforward and conceals the fact that the computation of the successive $p(\pi \mid \mathbf{x}^{(n)})$, and hence of $w_n$, increases in complexity as $n$ increases, as a result of the mixture form of $p(x_n \mid \pi)$. The computation involves weighted-average forms with an exponentially increasing number of terms, each requiring updating at each stage. We therefore need to seek approximations which avoid this ever-increasing computational complexity.

We begin by considering the formal Bayesian solution in the particular situation where the prior density p0(n) for the unknown parameter n is taken to have the form of a beta density

$$
p_0(\pi) = \frac{\Gamma(\alpha_0 + \beta_0)}{\Gamma(\alpha_0)\Gamma(\beta_0)} \, \pi^{\alpha_0 - 1} (1 - \pi)^{\beta_0 - 1}, \qquad (6.2.7)
$$

which we denote by $B(\pi; \alpha_0, \beta_0)$, with $\alpha_0 > 0$, $\beta_0 > 0$. It follows from (6.2.2) that

$$
p(\pi \mid x_1) = w_1 B(\pi; \alpha_0 + 1, \beta_0) + (1 - w_1) B(\pi; \alpha_0, \beta_0 + 1), \qquad (6.2.8)
$$

where

$$
w_1 = \frac{\dfrac{\alpha_0}{\alpha_0 + \beta_0} f_1(x_1)}{\dfrac{\alpha_0}{\alpha_0 + \beta_0} f_1(x_1) + \dfrac{\beta_0}{\alpha_0 + \beta_0} f_2(x_1)}, \qquad (6.2.9)
$$

and it is easily seen that, in general, the $p(\pi \mid \mathbf{x}^{(n)})$ build up as weighted averages of beta densities.
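The build-up can be made concrete by carrying the posterior as a list of weighted beta components and applying one Bayes update per observation: each update doubles the number of components, which is exactly the expanding form just described. A minimal sketch, in which the component density values $f_1(x_n)$, $f_2(x_n)$ are supplied as plain numbers and the identity $\pi \, B(\pi; a, b) = \tfrac{a}{a+b} \, B(\pi; a+1, b)$ does the work:

```python
def update_exact(components, f1x, f2x):
    """One exact Bayes update of a beta-mixture posterior for pi.

    components: list of (weight, a, b), each standing for a Beta(a, b) term.
    Multiplying by pi*f1(x) + (1-pi)*f2(x) doubles the number of terms.
    """
    new = []
    for w, a, b in components:
        # pi * Beta(a, b) = (a/(a+b)) * Beta(a+1, b)
        new.append((w * f1x * a / (a + b), a + 1.0, b))
        # (1 - pi) * Beta(a, b) = (b/(a+b)) * Beta(a, b+1)
        new.append((w * f2x * b / (a + b), a, b + 1.0))
    total = sum(w for w, _, _ in new)
    return [(w / total, a, b) for w, a, b in new]

# Three updates starting from a Beta(1, 1) prior, with hypothetical
# density values for each observation: 1 -> 2 -> 4 -> 8 components.
components = [(1.0, 1.0, 1.0)]
for f1x, f2x in [(0.8, 0.2), (0.3, 0.7), (0.5, 0.5)]:
    components = update_exact(components, f1x, f2x)
```

After $n$ observations the exact posterior has $2^n$ beta components, which is why an approximation is needed.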

To avoid the expanding form for $p(\pi \mid \mathbf{x}^{(n)})$, it is natural to consider approximating (6.2.8) by a suitable beta density. At the next step, the resulting form for $p(\pi \mid x_1, x_2)$ would then be merely a linear combination of two beta terms and could itself be approximated by a beta density. Proceeding in this way, the necessary computation can be kept within reasonable limits.
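One standard way to carry out such a collapse (an illustrative choice here, not a procedure prescribed in the text) is moment matching: replace the beta mixture by the single beta density with the same first two moments.

```python
def collapse_to_beta(components):
    """Moment-match a beta mixture [(weight, a, b), ...] to one Beta(a*, b*).

    Uses the Beta(a, b) moments E[pi] = a/(a+b) and
    E[pi^2] = a(a+1)/((a+b)(a+b+1)).
    """
    m1 = sum(w * a / (a + b) for w, a, b in components)
    m2 = sum(w * a * (a + 1.0) / ((a + b) * (a + b + 1.0))
             for w, a, b in components)
    var = m2 - m1 * m1
    s = m1 * (1.0 - m1) / var - 1.0   # matched pseudo-sample size a* + b*
    return m1 * s, (1.0 - m1) * s

# A one-component "mixture" is returned unchanged ...
a_star, b_star = collapse_to_beta([(1.0, 2.0, 3.0)])
# ... and an equal mix of Beta(2,1) and Beta(1,2), which is exactly the
# uniform density, collapses to Beta(1, 1).
u_a, u_b = collapse_to_beta([(0.5, 2.0, 1.0), (0.5, 1.0, 2.0)])
```

Alternating `update_exact` with a collapse of this kind keeps the representation at a single $(a, b)$ pair per step, at the cost of the approximation error the text goes on to discuss.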

Having just observed $x_1$, consider the first step. If we were informed of the true source of $x_1$, the distribution of $\pi$ would be independent of $x_1$ and would be given by

$$
B(\pi; \alpha_0 + \Delta_{11}, \beta_0 + \Delta_{12}), \qquad (6.2.10)
$$

where $\Delta_{1j}$ is 1 if $x_1 \in H_j$ and 0 otherwise, $j = 1, 2$. Since we are not informed of the true source (a situation which is referred to in the engineering literature as 'learning without a teacher'; see, for example, Agrawala, 1973), we need, in effect, to 'estimate' the unknown $\Delta_{1j}$. Viewed from this perspective, several of the ad hoc solutions proposed in the literature can be examined more systematically.

(a) Decision-directed learning (DD)

This is the general name given to procedures which assign $\Delta_{ij}$ a value (0 or 1) on the basis of some decision rule. Thus, for example, Davisson and Schwartz (1970) set $\Delta_{11}$ equal to one if $w_1 > \tfrac{1}{2}$ and to zero otherwise, with $\Delta_{12} = 1 - \Delta_{11}$. This approach, in effect, assumes that the most likely source is, in fact, the true one. If we write

$$
\Delta_{n1} = \begin{cases} 1, & \text{if } w_n > \tfrac{1}{2}, \\ 0, & \text{otherwise}, \end{cases}
$$

the (DD) estimator for $\pi$ is

$$
\pi^{(n)} = \frac{\alpha_0 + \sum_{i=1}^{n} \Delta_{i1}}{\alpha_0 + \beta_0 + n}.
$$

This seemingly intuitive ad hoc procedure, which has been widely used, was investigated in detail by Davisson and Schwartz (1970), who demonstrated that the approach does not necessarily guarantee asymptotic unbiasedness and can also lead to problems of 'runaways', in that there are problems where the true $\pi$ satisfies $0 < \pi < 1$, but $\pi^{(n)}$ can converge to 0 or 1.
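The DD recursion can be sketched as follows: each observation is classified by the current estimate and the hard assignment then increments one of the beta pseudo-counts, so every step is O(1). The well-separated normal components are illustrative assumptions, not part of the procedure:

```python
import math

def decision_directed(xs, a0, b0, f1, f2):
    """DD sketch: classify each observation by the current estimate of pi,
    then update the beta pseudo-counts with the hard assignment Delta_n1."""
    a, b = a0, b0
    for x in xs:
        pi_hat = a / (a + b)                                  # pi^(n-1)
        w = f1(x) * pi_hat / (f1(x) * pi_hat + f2(x) * (1.0 - pi_hat))
        delta = 1 if w > 0.5 else 0                           # hard decision
        a, b = a + delta, b + (1 - delta)
    return a / (a + b)                                        # pi^(n)

# Hypothetical well-separated components centred at 0 and 10, so each
# decision is effectively unambiguous.
f1 = lambda x: math.exp(-0.5 * x ** 2)
f2 = lambda x: math.exp(-0.5 * (x - 10.0) ** 2)
pi_dd = decision_directed([0.0, 0.0, 10.0], 1.0, 1.0, f1, f2)
```

When the components overlap, however, early misclassifications feed back into later decisions, which is the mechanism behind the 'runaway' behaviour described above.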

Katopis and Schwartz (1972) proposed a modified decision-directed learning procedure (MDD) which avoids the problems associated with DD, but at the cost of requiring numerical integration at each step in the recursive sequence of estimates $\pi^{(n)}$. Kazakos and Davisson (1980) have found a fully efficient modification of the MDD approach, but their procedure is even more computationally demanding.

(b) Learning with a probabilistic teacher (PT)

This is the name given to the approach which makes a randomized choice for $\Delta_{11}$, setting it equal to one with probability $w_1$ and setting it to zero otherwise, with $\Delta_{12} = 1 - \Delta_{11}$. The details of the approach are given in Agrawala (1970) and its asymptotic properties are investigated in Silverman (1979), who shows that the method has little to commend it from the efficiency point of view.
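The PT variant differs from DD only in that the assignment is sampled rather than thresholded; a sketch under the same illustrative assumptions as before (well-separated normal components, a seeded generator for reproducibility):

```python
import math
import random

def probabilistic_teacher(xs, a0, b0, f1, f2, rng):
    """PT sketch: draw the hard assignment Delta_n1 at random with
    probability w_n, instead of thresholding w_n as in DD."""
    a, b = a0, b0
    for x in xs:
        pi_hat = a / (a + b)
        w = f1(x) * pi_hat / (f1(x) * pi_hat + f2(x) * (1.0 - pi_hat))
        delta = 1 if rng.random() < w else 0   # randomized 'teacher'
        a, b = a + delta, b + (1 - delta)
    return a / (a + b)

# Hypothetical well-separated components: w_n is then essentially 0 or 1,
# so the randomization is almost surely resolved like a hard decision.
f1 = lambda x: math.exp(-0.5 * x ** 2)
f2 = lambda x: math.exp(-0.5 * (x - 10.0) ** 2)
pi_pt = probabilistic_teacher([0.0] * 4, 1.0, 1.0, f1, f2, random.Random(0))
```

With overlapping components the sampled assignments inject extra noise into the pseudo-counts, which is consistent with the poor efficiency reported by Silverman (1979).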
