# Statistical analysis of finite mixture distributions - Smith A.F.M.

ISBN 0-470-90763-4


where $Z(X_{n-1}) = Y(X_{n-1}) - M(X_{n-1})$, and $Y(X_{n-1})$ is a random variable such that

$$E[Y(X_{n-1}) \mid X_1, \ldots, X_{n-1}] = M(X_{n-1}).$$

The following theorems typify the kinds of conditions under which the convergence and asymptotic normality of (6.2.18) can be established.

Theorem 6.2.1 (Convergence)

In the recursion (6.2.18), if $\theta$ is such that $M(\theta) = \alpha$, and

(a) $\sum a_n = \infty$, $\sum a_n^2 < \infty$,

(b) $\inf_{\varepsilon < |x - \theta| < \varepsilon^{-1}} (x - \theta)[M(x) - \alpha] > 0$, $\forall \varepsilon > 0$,

(c) $\exists d$ such that $\forall x$, $E[Y^2(x)] \leq d(1 + x^2)$,

then the sequence $X_1, X_2, \ldots$ tends to $\theta$ almost surely.

Proof: See Gladyshev (1965).
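To make the conditions concrete, here is a minimal sketch (not from the text) of a recursion of the Robbins–Monro type discussed above, assuming a regression function $M(x) = 2x$, level $\alpha = 0$, and gains $a_n = 1/n$, which satisfy conditions (a)-(c):

```python
import random

def robbins_monro(m_func, alpha, x0, n_steps, noise_sd=1.0, seed=0):
    """Sketch of a Robbins-Monro recursion X_n = X_{n-1} - a_n [Y(X_{n-1}) - alpha],
    where Y(x) = M(x) + noise, so that E[Y(x) | past] = M(x)."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        y = m_func(x) + rng.gauss(0.0, noise_sd)   # unbiased observation of M(x)
        x -= (1.0 / n) * (y - alpha)               # a_n = 1/n: sum = inf, sum of squares < inf
    return x

# M(x) = 2x crosses alpha = 0 at theta = 0; the iterates home in on that root.
print(robbins_monro(lambda x: 2.0 * x, alpha=0.0, x0=5.0, n_steps=20000))
```

The choice $a_n = 1/n$ is the canonical one; any gain sequence satisfying (a) behaves similarly in the limit.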

Theorem 6.2.2 (Asymptotic normality)

In the recursion (6.2.18), if

(a) $\sum a_n = \infty$, $\sum a_n^2 < \infty$,

(b) $M(\theta) = \alpha$, $(x - \theta)[M(x) - \alpha] > 0$, $\forall x \neq \theta$,

(c) $\exists K > 0$ such that $|M(x) - \alpha| \leq K|x - \theta|$, $\forall x$,

(d) $\forall x$, $M(x) = \alpha + \alpha_1(x - \theta) + o(|x - \theta|)$, with $\alpha_1 > 0$,

(e) $\sup_x E[Z^2(x)] < \infty$,

(f) $\lim_{x \to \theta} E[Z^2(x)] = S(\theta) < \infty$,

(g) $Z(X_n)$ are independent, $\forall n$,

then, if $a_n = An^{-1}$, where $A\alpha_1 > \tfrac{1}{2}$,

$$n^{1/2}(X_n - \theta) \to N(0, A^2 S(\theta)[2A\alpha_1 - 1]^{-1})$$

in distribution.

Proof: See Sacks (1958).
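As an illustrative check (not from the text), the limiting variance in Theorem 6.2.2 can be observed empirically. For $M(x) = 2x$, $\alpha = 0$, $a_n = n^{-1}$ (so $A = 1$, $\alpha_1 = 2$) and unit-variance noise (so $S(\theta) = 1$), the predicted variance of $n^{1/2}(X_n - \theta)$ is $A^2 S(\theta)[2A\alpha_1 - 1]^{-1} = 1/3$:

```python
import random
import statistics

def rm_final(n_steps, rng, slope=2.0, noise_sd=1.0):
    """One run of X_n = X_{n-1} - (1/n) Y(X_{n-1}) with M(x) = slope * x, alpha = 0."""
    x = 1.0
    for n in range(1, n_steps + 1):
        x -= (1.0 / n) * (slope * x + rng.gauss(0.0, noise_sd))
    return x

rng = random.Random(42)
n = 2000
scaled = [n ** 0.5 * rm_final(n, rng) for _ in range(400)]
print(statistics.pvariance(scaled))   # theory predicts a value near 1/3
```

With 400 replications the sample variance of the scaled iterates should sit close to the theoretical $1/3$, up to Monte Carlo error.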

Theorem 6.2.1 is used by Makov and Smith (1977a) to establish the convergence of the quasi-Bayes procedure, and Theorem 6.2.2 is used by Kazakos (1977), who shows that, for the recursive procedure defined by (6.2.17),

$$n^{1/2}(\hat{\pi}^{(n)} - \pi) \to N(0, L^2(\pi) I(\pi) [2L(\pi) I(\pi) - 1]^{-1}), \qquad (6.2.19)$$

where

$$I(\pi) = \int [f_1(x) - f_2(x)]^2 [\pi f_1(x) + (1 - \pi) f_2(x)]^{-1} \, dx$$

is the Fisher information for a single observation from $p(x \mid \pi) = \pi f_1(x) + (1 - \pi) f_2(x)$.
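For intuition, $I(\pi)$ can be evaluated numerically. The sketch below (assumptions: unit-variance normal components with means $\pm m$, a simple trapezoidal quadrature) also illustrates the disjoint-support limit $I(\pi) \to [\pi(1-\pi)]^{-1}$ that figures in the discussion of efficiency:

```python
import math

def fisher_info(pi, m, sd=1.0, lo=-12.0, hi=12.0, n=4000):
    """Trapezoidal estimate of I(pi) = integral of (f1 - f2)^2 / [pi f1 + (1-pi) f2]
    for components f1 = N(m, sd^2), f2 = N(-m, sd^2)."""
    def phi(x, mu):
        return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        f1, f2 = phi(x, m), phi(x, -m)
        w = 0.5 if i in (0, n) else 1.0     # trapezoid end-point weights
        total += w * (f1 - f2) ** 2 / (pi * f1 + (1 - pi) * f2)
    return total * h

# Heavy overlap (small m) shrinks I(pi); well-separated components approach
# the disjoint-support value 1/(pi (1 - pi)).
print(fisher_info(0.25, m=0.5), fisher_info(0.25, m=5.0), 1.0 / (0.25 * 0.75))
```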

Minimization of the variance term in (6.2.19) leads to the optimal choice of $L(\pi)$. This is found to be

$$L_K(\pi) = I^{-1}(\pi)$$

and forms the basis of the procedure advocated by Kazakos, which we shall denote by K. With this choice, the asymptotic variance is given by $V_K = I^{-1}(\pi)$, the Cramér-Rao lower bound, and so the K procedure is fully efficient. Thus a simple recursive procedure is shown to achieve the same asymptotic performance as the full, intractable maximum likelihood approach.
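The optimality of this gain can be verified by direct differentiation of the variance term in (6.2.19) as a function of $L$:

```latex
v(L) = \frac{L^2 I(\pi)}{2L I(\pi) - 1},
\qquad
\frac{dv}{dL}
= \frac{2L I(\pi)\,\bigl[L I(\pi) - 1\bigr]}{\bigl[2L I(\pi) - 1\bigr]^2} = 0
\;\Longrightarrow\;
L_K(\pi) = I^{-1}(\pi),
\qquad
v\bigl(L_K(\pi)\bigr) = I^{-1}(\pi).
```

The stationary point $L I(\pi) = 1$ is a minimum on the admissible region $2L I(\pi) - 1 > 0$, and substituting it back gives the Cramér-Rao bound as the attained variance.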

In order to establish the rate of convergence of the QB procedure defined by (6.2.13), we first note that this recursion is a special case of (6.2.17) with a gain function

$$L_{QB}(\pi) = \frac{n\pi(1 - \pi)}{\alpha + \beta + n + 1} \approx \pi(1 - \pi), \quad \text{for large } n. \qquad (6.2.20)$$

Thus, using (6.2.19) and (6.2.20), we see that the asymptotic variance of the QB procedure is given by

$$V_{QB} = \frac{\pi^2 (1 - \pi)^2 I(\pi)}{2\pi(1 - \pi) I(\pi) - 1}, \qquad (6.2.21)$$

provided the denominator is greater than zero. The efficiency of the QB procedure is therefore equal to $[V_{QB} I(\pi)]^{-1}$. Clearly, the K procedure is asymptotically superior to the QB procedure. Note, however, that for $f_i$'s whose domains of support are disjoint, $I^{-1}(\pi) = \pi(1 - \pi)$ and thus $L_K(\pi)/L_{QB}(\pi) \to 1$, in which case the QB procedure is fully efficient. Clearly, the two procedures differ only in the choice of their gain functions. $L_K(\pi)$ is influenced by the value of $\pi$, the mixing parameter, and by the degree of overlap between $f_1$ and $f_2$; $L_{QB}(\pi)$ is asymptotically influenced by $\pi$ only. However, its failure to take the overlap into account has only a limited effect. This makes the QB procedure very attractive from the computational point of view, as it does not require integration for each $n$ (a criticism that can be levelled, for example, against the K procedure).
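To illustrate the comparison, the following sketch runs both gain choices on the same simulated data. The details are illustrative assumptions, not the book's exact procedures: unit-variance normal components with means $\pm m$, both recursions written in terms of the posterior probability $w(x) = \hat{\pi} f_1(x) / p(x \mid \hat{\pi})$ (using $w - \hat{\pi} = \hat{\pi}(1-\hat{\pi}) \times \text{score}$), and a crude quadrature for $I(\pi)$:

```python
import math
import random

def phi(x, mu):
    """Unit-variance normal density at x with mean mu."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def fisher_info(pi, m, h=0.05, lim=10.0):
    """Crude rectangle-rule quadrature for I(pi) with components N(+m,1), N(-m,1)."""
    s, x = 0.0, -lim
    while x <= lim:
        f1, f2 = phi(x, m), phi(x, -m)
        s += (f1 - f2) ** 2 / (pi * f1 + (1 - pi) * f2) * h
        x += h
    return s

def simulate(true_pi=0.25, m=2.0, n_obs=3000, seed=7):
    """Run the QB-style and K-style recursions side by side on one data stream."""
    rng = random.Random(seed)
    pi_qb = pi_k = 0.9            # common starting value, as in the text's example
    a0b0 = 0.9 + 0.1              # prior parameters alpha = 0.9, beta = 0.1
    for n in range(1, n_obs + 1):
        mu = m if rng.random() < true_pi else -m
        x = rng.gauss(mu, 1.0)
        # QB step: gain (alpha + beta + n + 1)^{-1}, i.e. L_QB ~ pi(1 - pi) in score units
        w = pi_qb * phi(x, m) / (pi_qb * phi(x, m) + (1 - pi_qb) * phi(x, -m))
        pi_qb = pi_qb + (w - pi_qb) / (a0b0 + n + 1)
        # K step: gain L_K = 1/I(pi) in score units; requires an integration each step
        w = pi_k * phi(x, m) / (pi_k * phi(x, m) + (1 - pi_k) * phi(x, -m))
        step = (w - pi_k) / (pi_k * (1 - pi_k) * fisher_info(pi_k, m) * n)
        pi_k = min(max(pi_k + step, 0.05), 0.95)   # guard against early overshooting
    return pi_qb, pi_k

pi_qb, pi_k = simulate()
print(pi_qb, pi_k)
```

Note how the K step re-evaluates $I(\hat{\pi})$ at every observation, which is exactly the per-$n$ integration the QB procedure avoids.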

For the purpose of illustration, we consider here the case of a known bipolar signal with $f_1$, $f_2$ both normal, with means $m$ and $-m$, respectively, and equal variances $\sigma^2$. Simulation studies reported in Makov and Smith (1977a) have shown that, while for medium to high S/N (signal-to-noise) ratios ($m/\sigma$), corresponding to a low degree of overlap of the components, the procedures are very similar, for small S/N the K procedure is generally faster. Figure 6.2.1 shows the first fifty estimates of $\pi$ (true value 0.25), using simulated data with $m = 1$, $\sigma^2 = 1$, and prior parameters $\alpha = 0.9$, $\beta = 0.1$, which give the starting value $\hat{\pi}^{(0)} = 0.9$. The marginal superiority of the K procedure demonstrated here is typical of results observed in a range of simulation studies. For very small S/N ratios, however, the resulting increase in $L_K(\pi)$ can be counterproductive, as it makes the scheme initially too responsive to the data. Although it certainly converges, it can fluctuate considerably over the first few observations, offering inferior short-run performance relative to similar schemes whose gain is somewhat reduced.

Figure 6.2.1 $\hat{\pi}^{(n)}$ versus the number of observations. Reproduced with permission from Makov (1980). Copyright © 1980 IEEE

Makov (1980b) proposed a modified version of $L_K(\pi)$ which checks fluctuations of this sort. To motivate the proposed modification, let us clarify the role of $\alpha$ and $\beta$ in $L_{QB}(\pi)$ (see equation 6.2.20). For small values of $n$, $L_{QB}(\pi)$ is obviously influenced by the choice of $\alpha$ and $\beta$, whose values can drastically alter the sensitivity of the recursion to the data. In fact, one can choose an infinite number of pairs $(\alpha, \beta)$ which all correspond to the same starting point $\hat{\pi}^{(0)}$ and, from a Bayesian viewpoint, any such choice is determined by a particular prior distribution on $\pi$. Large $(\alpha, \beta)$ result in a more concentrated beta density function, which implies stronger belief in the initial estimate of $\pi$ and thus in less initial
