# Statistical analysis of finite mixture distributions - Smith A.F.M.

ISBN 0-470-90763-4


$$\hat{\theta}_{n+1} = \hat{\theta}_n - n^{-1} G(\hat{\theta}_n)\, y(\hat{\theta}_n, x_{n+1}), \qquad (6.3.1)$$

where, given $\theta$, the observations $X_n$ are independently distributed with common density $p(x|\theta)$, $\hat{\theta}_{n+1}$ is the estimate of $\theta$ based on $x_1, \ldots, x_{n+1}$, $G(\hat{\theta}_n)$ is an adjustable gain function whose explicit form will be discussed later, and $y(\theta, x)$ is defined by

$$y(\theta, x) = -\frac{\partial}{\partial \theta} \log p(x|\theta). \qquad (6.3.2)$$

The stochastic approximation recursion (6.3.1) is clearly aimed at finding the root of $y(\cdot)$ or, equivalently, through (6.3.2), the extremum of $\log p(x|\theta)$, which would coincide with the maximum likelihood estimator of $\theta$. Patrick (1972) gives a general discussion of such recursions and we have already come across similar forms in our discussion of case A.
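As an illustration, the following is a minimal sketch of recursion (6.3.1) for the simplest special case, a unit-variance normal density $p(x|\theta)$ with unknown mean, for which $y(\theta, x) = \theta - x$ and $I(\theta) = 1$. The function names and the particular gain choice are ours, not from the text.

```python
import random

def sa_estimate(data, y, gain, theta0=0.0):
    """Stochastic-approximation recursion (6.3.1):
    theta_{n+1} = theta_n - (1/n) * G(theta_n) * y(theta_n, x_{n+1})."""
    theta = theta0
    for n, x in enumerate(data, start=1):
        theta = theta - (1.0 / n) * gain(theta) * y(theta, x)
    return theta

# Unit-variance normal with unknown mean theta:
#   y(theta, x) = -(d/dtheta) log p(x|theta) = theta - x,
# and the efficient gain is G = [I(theta)]^{-1} = 1 since I(theta) = 1.
random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(2000)]
est = sa_estimate(data, y=lambda t, x: t - x, gain=lambda t: 1.0)
```

With this gain the recursion reduces to the running-mean update, so `est` tracks the sample mean of the data.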

Defining

$$g(z, \theta) = E_\theta\, y(z, X), \qquad (6.3.3)$$

where the expectation is with respect to $p(x|\theta)$, we note the following.

Lemma 6.3.1

$g(z, \theta)$ has the following properties:

(a) $g(\theta, \theta) = 0$;

(b) there exists $\Theta' \subseteq \Theta$, a neighbourhood of $\theta$, such that

$$\inf\,(z - \theta)\, g(z, \theta) > 0 \quad \text{for } z \in \Theta',\ z \neq \theta. \qquad (6.3.4)$$

Proof: We note that

$$g(z, \theta) = \frac{\partial}{\partial z} J(z, \theta),$$

where

$$J(z, \theta) = \int \log\!\left[\frac{p(x|\theta)}{p(x|z)}\right] p(x|\theta)\, dx$$

is the Kullback–Leibler directed divergence between $p(x|\theta)$ and $p(x|z)$; (a) and (b) follow immediately since it is well known that $J(z, \theta) \geq 0$, with equality if and only if $z = \theta$.
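For concreteness, $J(z, \theta)$ can be evaluated numerically; for unit-variance normal densities it has the closed form $(z - \theta)^2/2$, which gives a quick check that $J(z, \theta) \geq 0$ with equality only at $z = \theta$. The sketch below (the function name and quadrature settings are our own choices) uses a simple trapezoidal rule:

```python
import math

def kl_normal_unit(theta, z, lo=-10.0, hi=10.0, n=20001):
    """Numerically evaluate the directed divergence
    J(z, theta) = integral of log[p(x|theta)/p(x|z)] p(x|theta) dx
    for unit-variance normal densities, by the trapezoidal rule."""
    def p(x, m):
        return math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += w * math.log(p(x, theta) / p(x, z)) * p(x, theta)
    return total * h
```

For example, `kl_normal_unit(0.0, 1.0)` agrees with the closed form $(1 - 0)^2/2 = 0.5$ to quadrature accuracy, and the divergence vanishes when `z == theta`.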

Suppose we now assume that

A1: $\theta \in (-M, M)$ for some known $M > 0$, such that $(-M, M) \cap \Theta' \neq \emptyset$;

A2: $z \in (-M, M) \cap \Theta'$;

A3: $\sup_z E[y^2(z, X)|\theta] < \infty$, $\theta \in \Theta'$;

A4: $G(z)$ is positive and bounded, with a bounded first derivative,

and define

$$U(z, \theta) = -G(z)\, g(z, \theta) \qquad (6.3.5)$$

and

$$R(z, x) = G(z)[y(z, x) - g(z, \theta)]. \qquad (6.3.6)$$

We can then establish the following properties.

Lemma 6.3.2

If assumptions A1 to A4 are satisfied, the quantities defined by (6.3.5) and (6.3.6) satisfy:

(a) $U(\theta, \theta) = 0$ and $(z - \theta)\, U(z, \theta) < 0$ for all $z \neq \theta$;

(b) $|U(z, \theta)| \leq k|z - \theta|$, for some $k > 0$, and $\inf |U(z, \theta)| > 0$, for $\varepsilon < |z - \theta| < \varepsilon^{-1}$ and for all $\varepsilon > 0$ such that assumption A2 is satisfied;

(c) $U(z, \theta) = \alpha(z - \theta) + o(|z - \theta|)$, for some $\alpha < 0$;

(d) (i) $\sup_z E[R^2(z, X)|z] < \infty$,

(ii) $\lim_{z \to \theta} E[R^2(z, X)|z] = S(\theta) < \infty$;

(e) given $z$, the $R(z, x_i)$, $i = 1, 2, \ldots$, are identically distributed.

Proof: Part (a) follows immediately from (6.3.5) and Lemma 6.3.1(b).

To establish (b), we use the Taylor expansion

$$U(z, \theta) = U(\theta, \theta) + (z - \theta)\, U'(z^*, \theta) = (z - \theta)\, U'(z^*, \theta),$$

where $z^*$ lies between $z$ and $\theta$ and where $U'(z^*, \theta)$ is the derivative of $U(z, \theta)$ with respect to $z$, evaluated at $z^*$. We note that our assumptions ensure that $|U'(z^*, \theta)|$ is uniformly bounded, and the second part of (b) follows from the remarks made in the proof of Lemma 6.3.1.

For (c), we use the expansion

$$U(z, \theta) = -[(z - \theta)\, g'(\theta, \theta) + o(|z - \theta|)][G(\theta) + O(|z - \theta|)] = [-G(\theta)\, g'(\theta, \theta)](z - \theta) + o(|z - \theta|),$$


and note that

$$-g'(\theta, \theta) = \int \frac{\partial^2}{\partial \theta^2} \log p(x|\theta)\, p(x|\theta)\, dx = -I(\theta),$$

where $[I(\theta)]^{-1} > 0$ is the Cramér–Rao lower bound for a single observation from $p(x|\theta)$. The choice $\alpha = -G(\theta)\, I(\theta) < 0$ satisfies (c).
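The identity used here, $E[(\partial \log p(X|\theta)/\partial \theta)^2] = I(\theta)$, is easy to check numerically for the unit-variance normal, where the score is $x - \theta$ and $I(\theta) = 1$, so the Cramér–Rao bound $[I(\theta)]^{-1}$ equals 1. A small Monte Carlo sketch (the function name, sample size, and seed are our own choices):

```python
import random

def fisher_info_mc(theta, n=200_000, seed=1):
    """Monte Carlo estimate of I(theta) = E[(d/dtheta log p(X|theta))^2]
    for a unit-variance normal, whose score is simply x - theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        total += (x - theta) ** 2
    return total / n
```

The estimate should be close to 1 for any value of `theta`, reflecting the fact that the Fisher information of a unit-variance normal mean does not depend on the mean.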

To establish (d), we note that $E[R(z, X)|z] = 0$, and so

$$E[R^2(z, X)|z] = G^2(z)\{E[y^2(z, X)|z] - g^2(z, \theta)\},$$

which is bounded by virtue of assumptions A1 to A4. We note that

$$E[y^2(z, X)|z] = E\!\left[\left(\frac{\partial}{\partial z} \log p(X|z)\right)^{\!2} \,\Big|\, z\right],$$

and hence, taking the limit as $z$ tends to $\theta$ in the above, we see from the expression for $-g'(\theta, \theta)$ that

$$\lim_{z \to \theta} E[R^2(z, X)|z] = G^2(\theta)\, I(\theta).$$

The choice $S(\theta) = G^2(\theta)\, I(\theta) < \infty$ satisfies (d).

The final property, (e), follows straightforwardly from the assumed independence, given $z$, of the $X_i$, $i = 1, 2, \ldots$.

The following lemma will be used to establish the asymptotic properties of (6.3.1).

Lemma 6.3.3

Suppose that the conditions of Lemma 6.3.2 are satisfied and that $|\alpha| > \tfrac{1}{2}$. Then, for the recursion defined by (6.3.1), $n^{1/2}(\hat{\theta}_n - \theta)$ is asymptotically normally distributed with zero mean and variance

$$V = S(\theta)(2|\alpha| - 1)^{-1}.$$

Proof: This follows from an application of Theorem 6.2.2.

The main result is now the following.

Theorem 6.3.1

If in the recursion defined by (6.3.1) $G(\hat{\theta}_n) > [2I(\hat{\theta}_n)]^{-1}$, and assumptions A1 to A4 are satisfied, then $n^{1/2}(\hat{\theta}_n - \theta)$ is asymptotically normally distributed, with zero mean and variance

$$V = \frac{G^2(\theta)\, I(\theta)}{2G(\theta)\, I(\theta) - 1}. \qquad (6.3.7)$$

Proof: Truncation does not, of course, affect convergence properties (see, for example, Davisson, 1970) and the result follows immediately from Lemmas 6.3.2 and 6.3.3.


Corollary 6.3.1

A fully asymptotically efficient procedure corresponds to the choice $G(z) = [I(z)]^{-1}$.

Proof: With this choice, (6.3.7) reduces to $V_{\mathrm{opt}} = [I(\theta)]^{-1}$.

Corollary 6.3.2

Given A1 to A4, the relative asymptotic efficiency of $\hat{\theta}_n$ with constant gain $G(z) = c > [2I(z)]^{-1}$ is given by

$$\mathrm{Eff}(c) = \frac{V_{\mathrm{opt}}}{V} = \frac{2cI(\theta) - 1}{[cI(\theta)]^2}.$$
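Combining (6.3.7) with Corollary 6.3.1, a constant gain $c$ gives $V = c^2 I(\theta)/[2cI(\theta) - 1]$, which can be compared against $V_{\mathrm{opt}} = [I(\theta)]^{-1}$ numerically. The following sketch (function names are ours) shows that the efficiency equals 1 at $c = [I(\theta)]^{-1}$ and falls below 1 elsewhere:

```python
def asymptotic_variance(c, info):
    """Asymptotic variance (6.3.7) for constant gain G(z) = c;
    requires c > 1/(2*info) for the denominator to be positive."""
    return c ** 2 * info / (2.0 * c * info - 1.0)

def relative_efficiency(c, info):
    """V_opt / V, where V_opt = 1/info (Corollary 6.3.1)."""
    return (1.0 / info) / asymptotic_variance(c, info)
```

For example, with $I(\theta) = 1$, the gain $c = 2$ yields $V = 4/3$ and hence a relative efficiency of $3/4$.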

We shall now use these properties of the general recursion (6.3.1) to examine a number of specific important practical case B problems and some proposed sequential estimation procedures.

6.3.2 Unsupervised learning for signal versus noise

We shall consider the case of a sequence of observations, $x_1, x_2, \ldots, x_n$, each of which is either a signal, assumed to have a normal distribution with unit variance and unknown mean $\theta$, or noise, assumed to have a normal distribution with unit variance and zero mean. The corresponding normal densities will be denoted by $f_1(x|\theta)$ and $f_2(x|\theta) = f_2(x)$, respectively. The a priori probabilities of signal and noise will be assumed constant and known, and are denoted by $\pi_1$ and $\pi_2$ $(= 1 - \pi_1)$. Given $\theta$, $\pi_1$, $\pi_2$, observations will be assumed independent, with common mixture density
