# Statistical analysis of finite mixture distributions - Smith A.F.M.

ISBN 0-470-90763-4


At the opposite end of the scale of difficulty is the estimation of mixing weights using quadratic distance measures. The simplicity of the measures, in conjunction with the linearity, in $\pi$, of $p(x|\psi)$ and $F(x|\psi)$, accounts for the comparative ease of solution, as indicated in the next subsection.

4.5.2 Estimation of mixing weights based on quadratic distances

Example 4.5.1 Mixtures of known component densities

As an example, consider the distance measure adopted by Choi and Bulgren (1968):

$$\delta[F_n(\cdot), F(\cdot|\psi)] = n^{-1}\sum_{i=1}^{n}\left[F(x_i|\psi) - F_n(x_i)\right]^2 = n^{-1}\sum_{i=1}^{n}\left[\sum_{j=1}^{k}\pi_j F_j(x_i) - i/n\right]^2. \qquad (4.5.3)$$

There are two general ways of writing (4.5.3):

(a) $n^{-1}(B\pi - d)^T V(B\pi - d)$, (4.5.4)

where, without loss of generality, $x_1 \leq x_2 \leq \cdots \leq x_n$,

$B_{ij} = F_j(x_i)$, $i = 1, \ldots, n$, $j = 1, \ldots, k$,

$V$ = the $n \times n$ identity matrix, and $d_i = i/n$, $i = 1, \ldots, n$.

(b) $n^{-1}(\pi^T A\pi - 2\pi^T b + c)$, (4.5.5)

where $A = B^T VB$, $b = B^T Vd$, and $c = d^T Vd$.

It can be shown that $A$ is positive definite (Hardy, Littlewood, and Pólya, 1952, p. 16), so that the unique unconstrained minimum occurs at

$$\hat{\pi} = A^{-1}b.$$

The resulting function $\sum_{j=1}^{k}\hat{\pi}_j F_j(x)$ $(x \in \mathcal{X})$ will, subject to sufficient regularity conditions, provide a (consistent, as $n \to \infty$) approximation to $F(\cdot|\psi)$, but it may not itself, for any given sample, be a distribution function.
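As a concrete sketch of the unconstrained solution $\hat{\pi} = A^{-1}b$, the following builds $B$ and $d$ as in (4.5.4) and solves the normal equations. The two normal components, the sample sizes, and the seed are illustrative assumptions, not from the text.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical mixture of two *known* components, N(0,1) and N(3,1),
# with true weights (0.6, 0.4); only the weights are to be estimated.
rng = np.random.default_rng(0)
x = np.sort(np.concatenate([rng.normal(0, 1, 60), rng.normal(3, 1, 40)]))
n = len(x)

# (4.5.4): B[i, j] = F_j(x_i), V = I, d_i = i/n for ordered x_1 <= ... <= x_n.
B = np.column_stack([norm.cdf(x, 0, 1), norm.cdf(x, 3, 1)])
d = np.arange(1, n + 1) / n

A = B.T @ B                       # A = B^T V B
b = B.T @ d                       # b = B^T V d
pi_hat = np.linalg.solve(A, b)    # unconstrained minimum pi_hat = A^{-1} b
print(pi_hat)                     # need not sum to 1 nor be non-negative
```

As the text warns, nothing forces `pi_hat` to be a probability vector; the constrained versions below address that.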

To guarantee the latter, we need to minimize (4.5.4) subject to

$$\pi_1 + \cdots + \pi_k = 1, \qquad \pi_1 \geq 0, \ldots, \pi_k \geq 0. \qquad (4.5.6)$$

This represents a quadratic programming problem, and terminating simplex algorithms are available for its solution (see, for instance, Walsh, 1975, Chapter 2). If we are prepared to compromise and risk the possibility of negative $\hat{\pi}_j$'s, while insisting only on the summation constraint in (4.5.6), we may again write down the solution explicitly as

$$\hat{\pi} = A^{-1}b + A^{-1}\mathbf{1}(1 - \mathbf{1}^T A^{-1}b)/\mathbf{1}^T A^{-1}\mathbf{1}, \qquad (4.5.7)$$

where $\mathbf{1}$ is a vector of ones and the equality constraint in (4.5.6) has been incorporated using a Lagrange multiplier.
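A minimal numeric sketch of (4.5.7), again under an assumed setup of two known normal components (data and seed hypothetical): the Lagrange correction makes the weights sum to one exactly, though negative entries remain possible.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical data: mixture of known N(0,1) and N(3,1) with weights (0.7, 0.3).
rng = np.random.default_rng(1)
x = np.sort(np.concatenate([rng.normal(0, 1, 70), rng.normal(3, 1, 30)]))
n = len(x)
B = np.column_stack([norm.cdf(x, 0, 1), norm.cdf(x, 3, 1)])
d = np.arange(1, n + 1) / n
A, b = B.T @ B, B.T @ d

# Equation (4.5.7): pi_hat = A^{-1}b + A^{-1}1 (1 - 1^T A^{-1}b) / (1^T A^{-1}1)
ones = np.ones(2)
Ainv_b = np.linalg.solve(A, b)
Ainv_1 = np.linalg.solve(A, ones)
pi_hat = Ainv_b + Ainv_1 * (1 - ones @ Ainv_b) / (ones @ Ainv_1)
print(pi_hat, pi_hat.sum())   # the weights now sum to exactly 1
```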

This solution does represent a constrained minimum (see, for instance, Fletcher, 1971, Macdonald, 1975, and Macdonald and Pitcher, 1979). Alternatively, $\pi_k$ may be eliminated using (4.5.6) and the resulting quadratic function of $(\pi_1, \ldots, \pi_{k-1})^T = \eta$ minimized, without further constraints, at

$$\hat{\eta} = A^{*-1}b^*,$$

where

$$A^*_{jl} = \sum_i [F_j(x_i) - F_k(x_i)][F_l(x_i) - F_k(x_i)], \qquad j, l = 1, \ldots, k-1,$$

$$b^*_j = \sum_i [F_j(x_i) - F_k(x_i)][i/n - F_k(x_i)], \qquad j = 1, \ldots, k-1.$$

It is easy to check that this solution is equivalent to (4.5.7).
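The claimed equivalence can be checked numerically: eliminating $\pi_k$ via the summation constraint and solving the reduced system reproduces the Lagrange solution (4.5.7). The components and data below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical equal mixture of known N(0,1) and N(3,1) components (k = 2).
rng = np.random.default_rng(2)
x = np.sort(np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 50)]))
n = len(x)
F = np.column_stack([norm.cdf(x, 0, 1), norm.cdf(x, 3, 1)])   # F_j(x_i)
d = np.arange(1, n + 1) / n

# Reduced problem in eta = (pi_1, ..., pi_{k-1}):
#   A*_{jl} = sum_i [F_j - F_k][F_l - F_k],  b*_j = sum_i [F_j - F_k][i/n - F_k]
G = F[:, :-1] - F[:, -1:]                 # columns F_j(x_i) - F_k(x_i)
A_star = G.T @ G
b_star = G.T @ (d - F[:, -1])
eta = np.linalg.solve(A_star, b_star)
pi_hat = np.append(eta, 1 - eta.sum())    # recover pi_k from the constraint

# Lagrange solution (4.5.7) for comparison.
A, b, ones = F.T @ F, F.T @ d, np.ones(2)
Ainv_b, Ainv_1 = np.linalg.solve(A, b), np.linalg.solve(A, ones)
pi_lag = Ainv_b + Ainv_1 * (1 - ones @ Ainv_b) / (ones @ Ainv_1)
print(np.allclose(pi_hat, pi_lag))
```

Both routes minimize the same strictly convex quadratic over the same affine set, so they agree up to rounding error.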

Several other distance measures lead to similar quadratic criteria. For instance, $d_{WH}(\psi; F_n, F)$ can be written in the form

$$d_{WH}(\psi; F_n, F) = \int [\mathrm{d}F_n(x) - \mathrm{d}F(x|\psi)]^2 \, \mathrm{d}H(x);$$

see Bartlett and Macdonald (1968).

Although, as remarked by Macdonald (1975), the right-hand side is infinite if $H(\cdot)$ is differentiable, formal minimization shows that $\hat{\psi}$ would satisfy the vector equation

$$n^{-1}\sum_{i=1}^{n} w(x_i)\,\frac{\partial p(x_i|\psi)}{\partial \psi} = \int w(x)\,\frac{\partial p(x|\psi)}{\partial \psi}\, p(x|\psi)\, \mathrm{d}x, \qquad (4.5.8)$$

where $w(x) = \mathrm{d}H(x)/\mathrm{d}x$. In our example, with $\psi = \eta$, we have

$$\frac{\partial p(x|\eta)}{\partial \pi_j} = f_j(x) - f_k(x), \qquad j = 1, \ldots, k-1, \text{ for all } x.$$

Equations (4.5.8) may therefore be written

$$T = J\eta + R,$$

where

$$T_j = n^{-1}\sum_{i=1}^{n} w(x_i)[f_j(x_i) - f_k(x_i)], \qquad j = 1, \ldots, k-1,$$

$$J_{jl} = \int [f_j(x) - f_k(x)][f_l(x) - f_k(x)]\, w(x)\, \mathrm{d}x, \qquad j, l = 1, \ldots, k-1,$$

and

$$R_j = \int [f_j(x) - f_k(x)]\, f_k(x)\, w(x)\, \mathrm{d}x, \qquad j = 1, \ldots, k-1.$$

Thus

$$\hat{\eta} = J^{-1}(T - R).$$

It is easily shown (Macdonald, 1975) that $\hat{\eta}$ is unbiased and

$$\mathrm{cov}(\hat{\eta}) = n^{-1}\left[J^{-1}HJ^{-1} - (\eta + J^{-1}R)(\eta + J^{-1}R)^T\right],$$

where

$$H_{jl} = \int [f_j(x) - f_k(x)][f_l(x) - f_k(x)]\, w^2(x)\, p(x|\eta)\, \mathrm{d}x, \qquad j, l = 1, \ldots, k-1.$$
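For intuition, here is a sketch of the estimator $\hat{\eta} = J^{-1}(T - R)$ with $k = 2$ and the simplest weight $w(x) = 1$ (i.e. $H(x) = x$), with the integrals $J$ and $R$ approximated by Riemann sums on a grid. The normal components, sample, and grid are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical mixture 0.6 N(0,1) + 0.4 N(4,1); with k = 2, eta is a scalar.
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 600), rng.normal(4, 1, 400)])

grid = np.linspace(-8.0, 12.0, 4001)
dx = grid[1] - grid[0]
diff = norm.pdf(grid, 0, 1) - norm.pdf(grid, 4, 1)    # f_1 - f_2 on the grid

T = np.mean(norm.pdf(x, 0, 1) - norm.pdf(x, 4, 1))    # T_1 with w = 1
J = np.sum(diff * diff) * dx                          # J_11 = int (f_1 - f_2)^2 dx
R = np.sum(diff * norm.pdf(grid, 4, 1)) * dx          # R_1 = int (f_1 - f_2) f_2 dx
eta_hat = (T - R) / J                                 # eta_hat = J^{-1}(T - R)
print(eta_hat)   # estimate of pi_1; true value is 0.6 in this simulation
```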

The discrete version of this distance measure is better defined in the form

$$\sum_i w_i [r_i - p_i(\psi)]^2,$$

where the $r_i$ are the relative frequencies and

$$p_i(\psi) = \sum_{j=1}^{k} \pi_j f_{ij}, \qquad i = 1, 2, \ldots,$$

say. If only $\pi$ is unknown, the criterion takes the form (4.5.4) with

$$B_{ij} = f_{ij}, \qquad i = 1, 2, \ldots, \quad j = 1, \ldots, k,$$

$$V = \mathrm{diag}(w_1, w_2, \ldots)$$

and

$$d_i = r_i, \qquad i = 1, 2, \ldots;$$

see Macdonald, 1975.


In practice, the sample space will be made finite, possibly as a result of grouping, before evaluating relative frequencies and weights.
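A grouped-data sketch of this discrete criterion: weighted least squares of the relative frequencies on the component mass functions, i.e. (4.5.4) with $B_{ij} = f_{ij}$, $V = \mathrm{diag}(w_1, w_2, \ldots)$, and the relative frequencies as target. The Poisson components, the cell range, the unit weights, and the seed are illustrative assumptions.

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical mixture 0.3 Poisson(1) + 0.7 Poisson(5) on a finite cell range.
rng = np.random.default_rng(4)
comp1 = rng.poisson(1, 1000)
comp2 = rng.poisson(5, 1000)
data = np.where(rng.random(1000) < 0.3, comp1, comp2)

cells = np.arange(0, 15)                                  # finite sample space
r = np.array([(data == c).mean() for c in cells])         # relative frequencies
B = np.column_stack([poisson.pmf(cells, 1),
                     poisson.pmf(cells, 5)])              # B_ij = f_ij
V = np.diag(np.ones(len(cells)))                          # w_i = 1 for simplicity

A = B.T @ V @ B
b = B.T @ V @ r
pi_hat = np.linalg.solve(A, b)    # minimizes sum_i w_i [r_i - p_i(pi)]^2
print(pi_hat)                     # roughly (0.3, 0.7) under this setup
```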

The attraction of being able to obtain explicit results might lead one in some applications to prefer the modified chi-squared criterion $\delta_{MC}(\mathbf{p}(\pi), \mathbf{r})$ to the chi-squared criterion $\delta_{C}(\mathbf{p}(\pi), \mathbf{r})$, particularly in view of the asymptotic equivalence of the two resulting estimators (Rao, 1965, Chapter 5).

Whereas the distance measures discussed above led to quadratic programming problems, if we use the sup-norm $\delta_S(\cdot)$ associated with the Kolmogorov-Smirnov test we are led to a linear programming problem.

Suppose the sample space, $\mathcal{X}$, is continuous. Then, for any $\pi$, $\sup_x |F(x|\pi) - F_n(x)|$ is attained at one of the data points $x_1, \ldots, x_n$. As before, we assume $x_1 \leq \cdots \leq x_n$. Our objective is, therefore, to minimize $\pi_0$ subject to

$$\left|\sum_{j=1}^{k} \pi_j F_j(x_i) - i/n\right| \leq \pi_0, \qquad \left|\sum_{j=1}^{k} \pi_j F_j(x_i) - (i-1)/n\right| \leq \pi_0, \qquad i = 1, \ldots, n,$$

$$\pi_1 + \cdots + \pi_k = 1, \qquad \pi_0, \pi_1, \ldots, \pi_k \geq 0.$$

Since all the above constraints can be written as ordinary inequalities, linear in $\pi_0, \pi_1, \ldots, \pi_k$, the linear programming interpretation is clear. Using a generalization of this, involving a sequence of such linear programmes, Deely and Kruse (1968) developed a method for estimating a general mixing distribution. Their approach was extended by Blum and Susarla (1977).
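The linear programme above can be sketched directly with an off-the-shelf LP solver: minimize $\pi_0$, with each absolute-value constraint split into a pair of linear inequalities. The two known normal components and the data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import norm

# Hypothetical mixture of known N(0,1) and N(3,1) with weights (0.6, 0.4).
rng = np.random.default_rng(5)
x = np.sort(np.concatenate([rng.normal(0, 1, 60), rng.normal(3, 1, 40)]))
n = len(x)
F = np.column_stack([norm.cdf(x, 0, 1), norm.cdf(x, 3, 1)])   # F_j(x_i)
i_over_n = np.arange(1, n + 1) / n
im1_over_n = np.arange(0, n) / n

# Decision vector z = (pi_0, pi_1, pi_2); objective: minimize pi_0.
c = np.array([1.0, 0.0, 0.0])
neg1 = -np.ones((n, 1))
A_ub = np.vstack([
    np.hstack([neg1,  F]),    #  sum_j pi_j F_j(x_i) - pi_0 <=  i/n
    np.hstack([neg1, -F]),    # -sum_j pi_j F_j(x_i) - pi_0 <= -i/n
    np.hstack([neg1,  F]),    #  sum_j pi_j F_j(x_i) - pi_0 <=  (i-1)/n
    np.hstack([neg1, -F]),    # -sum_j pi_j F_j(x_i) - pi_0 <= -(i-1)/n
])
b_ub = np.concatenate([i_over_n, -i_over_n, im1_over_n, -im1_over_n])
A_eq, b_eq = np.array([[0.0, 1.0, 1.0]]), np.array([1.0])     # weights sum to 1

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 3)
print(res.x[0], res.x[1:])    # minimized sup distance and the fitted weights
```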
