# Statistical Analysis of Finite Mixture Distributions - Smith A.F.M.

ISBN 0-470-90763-4


for pc = 0, …. The case pc = 0 corresponds to M0, and the inverses of the percentages in that column give the M0 equivalent of a single fully categorized observation, as defined in (3.2.3). The final row of the table gives the efficiency of the simple estimator of π₁ obtained from the 'complete' part of the M2 data. This value is, of course, pc, and allows us to judge the extra worth of the uncategorized data.

The unbalanced (π₁ = 0.1) mixture produces the lower efficiencies and, if pc and Δ are small, the uncategorized data do not contribute very much information. In the worst case considered (pc = 0, Δ = 0.4), one fully categorized observation gives as much information as 1/0.015 ≈ 67 uncategorized ones.

Example 3.2.3 Mixture of two univariate normals

p(x | ψ) = π₁ φ(x | μ₁, σ₁²) + (1 − π₁) φ(x | μ₂, σ₂²),

which we shall, for notational convenience, abbreviate to

π₁ f₁(x) + π₂ f₂(x).
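As a concrete illustration, the density above is easy to evaluate numerically. A minimal sketch in Python using SciPy's `norm.pdf`; all parameter values here are illustrative assumptions, not values from the text:

```python
import numpy as np
from scipy.stats import norm

def mixture_pdf(x, pi1=0.3, mu1=0.0, s1=1.0, mu2=2.0, s2=1.5):
    """p(x | psi) = pi1*phi(x | mu1, s1^2) + (1 - pi1)*phi(x | mu2, s2^2)."""
    return pi1 * norm.pdf(x, mu1, s1) + (1 - pi1) * norm.pdf(x, mu2, s2)
```

Being a convex combination of two densities, `mixture_pdf` integrates to one for any admissible parameter choice.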


If we define ψ = (μ₁, μ₂, π₁, σ₁², σ₂²), then

I_C(ψ) = diag[π₁/σ₁², π₂/σ₂², (π₁π₂)⁻¹, 2π₁/σ₁⁴, 2π₂/σ₂⁴],

so that, for example,

I₁(ψ) = diag(1/σ₁², 0, 0, 2/σ₁⁴, 0).

If we further define

v(x) = ((x − μ₁)/σ₁², (x − μ₂)/σ₂², (π₁π₂)⁻¹, [(x − μ₁)² − σ₁²]/2σ₁⁴, [(x − μ₂)² − σ₂²]/2σ₂⁴)ᵀ

and define

I_M(ψ) = ∫_{−∞}^{∞} v(x)v(x)ᵀ π₁π₂ [f₁(x)f₂(x)/p(x | ψ)] dx,    (3.2.6)

then

I₀(ψ) = I_C(ψ) − I_M(ψ).

Note that (3.2.4) is just the special form which this takes for Example 3.2.1. Note also that, as is to be expected, I_C(ψ) − I₀(ψ) is non-negative definite. As usual, numerical integration is required in (3.2.6). Behboodian (1972a) outlines several procedures and provides a set of tables from which I₀(ψ) can be calculated, approximately, for a wide range of values of the parameters. The tables, given in terms of the standardized parameters π₁, D₁ = |μ₂ − μ₁|/[2√(σ₁σ₂)], and r₁ = σ₁/σ₂, provide a standardized information matrix, Ī₀(ψ) say, from which we may calculate I₀(ψ) = W Ī₀(ψ) W, where W = diag(σ₁⁻¹, σ₂⁻¹, 1, σ₁⁻², σ₂⁻²).

Information matrices for two special cases can be obtained easily from I₀(ψ), or from I_C(ψ) of (3.2.1):

(a) μ₁ ≠ μ₂, σ₁ = σ₂ = σ. Add the last two rows of I₀(ψ) and then the last two columns.

(b) μ₁ = μ₂ = μ, σ₁ ≠ σ₂. Replace 'last' by 'first' in (a).

For case (a), Tan and Chang (1972a) tabulate I₀(ψ) over a range of values of π₁ (between 0 and 0.5), σ, μ̄, and d, where μ̄ = ½(μ₁ + μ₂) and d = ½(μ₁ − μ₂).
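The row-and-column additions in case (a) amount to collapsing the two σ² coordinates through a merging matrix T, so the reduced information is TᵀI₀T. A minimal sketch, assuming the parameter ordering (μ₁, μ₂, π₁, σ₁², σ₂²):

```python
import numpy as np

def collapse_equal_variances(I):
    """Case (a), sigma1 = sigma2: merge the two sigma^2 coordinates of a
    5x5 information matrix (order mu1, mu2, pi1, sigma1^2, sigma2^2) by
    adding the last two rows and then the last two columns."""
    T = np.zeros((5, 4))
    T[0, 0] = T[1, 1] = T[2, 2] = 1.0
    T[3, 3] = T[4, 3] = 1.0  # both sigma^2 coordinates map to one
    return T.T @ I @ T
```

Case (b) is the same construction with the first two coordinates (the means) merged instead of the last two.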

Detailed comparison of data structures M0, M1, and M2 can only be carried out numerically. Unless only one parameter is of interest, it is a matter of comparing the 'sizes' of non-negative definite matrices, usually in terms of a real-valued function of the eigenvalues. Hosmer and Dick (1977) use the criteria of trace and determinant of the inverses of information matrices, having the interpretations, respectively, of total and generalized variances. A range of two-component normal mixtures was considered and, qualitatively, the results obtained from the two criteria were roughly the same. Table 3.2.2 displays some of their results with the total variance criterion. The values quoted are the asymptotic efficiencies of M1 and M2, relative to M0. Of the M1 schemes they consider, the results in Table 3.2.2 correspond to equal sample sizes for the fully categorized data (p₁ = p₂ = ½(1 − p₀)). For M1 and M2, p₀ was chosen from 0.1 (0.2) 0.9; (μ₁, σ₁, σ₂) was fixed at (0, 1, 1.5); the values chosen for π₁ were 0.1, 0.3, and 0.5; and the values chosen for μ₂ were 1, 3, and 5.

Table 3.2.2 Asymptotic relative efficiency of M1 and M2 to M0 using the total variance criterion. Reproduced by permission of Gordon and Breach, Science Publishers, Inc., from Hosmer and Dick (1977)

| μ₂ | p₀ | M1 (π₁ = 0.1) | M2 (π₁ = 0.1) | M1 (π₁ = 0.3) | M2 (π₁ = 0.3) | M1 (π₁ = 0.5) | M2 (π₁ = 0.5) |
|----|-----|------|------|------|------|-----|-----|
| 1 | 0.9 | 33.3 | 13.2 | 4.7 | 5.6 | 2.8 | 3.9 |
| 1 | 0.7 | 50.5 | 28.4 | 8.2 | 9.1 | 4.7 | 5.8 |
| 1 | 0.5 | 73.1 | 41.3 | 9.9 | 11.6 | 6.0 | 7.1 |
| 1 | 0.3 | 74.6 | 52.4 | 10.6 | 13.4 | 6.7 | 8.1 |
| 1 | 0.1 | 58.9 | 62.1 | 8.8 | 14.9 | 6.0 | 8.9 |
| 3 | 0.9 | 6.8 | 4.2 | 2.5 | 3.0 | 2.6 | 3.0 |
| 3 | 0.7 | 11.0 | 6.9 | 4.0 | 4.6 | 4.3 | 4.7 |
| 3 | 0.5 | 13.5 | 8.7 | 5.0 | 5.6 | 5.5 | 5.8 |
| 3 | 0.3 | 15.1 | 10.1 | 5.0 | 6.4 | 6.3 | 6.6 |
| 3 | 0.1 | 15.5 | 11.3 | 5.7 | 7.0 | 6.5 | 7.3 |
| 5 | 0.9 | 1.8 | 1.2 | 1.1 | 1.1 | 1.1 | 1.1 |
| 5 | 0.7 | 2.6 | 1.5 | 1.3 | 1.3 | 1.3 | 1.3 |
| 5 | 0.5 | 2.9 | 1.7 | 1.4 | 1.5 | 1.5 | 1.5 |
| 5 | 0.3 | 3.0 | 1.9 | 1.5 | 1.6 | 1.6 | 1.6 |
| 5 | 0.1 | 3.0 | 2.0 | 1.4 | 1.7 | 1.6 | 1.7 |
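The total variance criterion itself is straightforward to compute: with per-observation information I, n observations give asymptotic covariance I⁻¹/n, so the number of M0 observations matching one observation of another scheme is tr(I_M0⁻¹)/tr(I_M1⁻¹). A minimal sketch; the matrices below are illustrative stand-ins, not the matrices underlying Table 3.2.2:

```python
import numpy as np

# Hypothetical per-observation information matrices (illustrative only).
I_M0 = np.diag([0.30, 0.70, 4.76, 0.18, 0.98])
I_M1 = 5.0 * I_M0  # pretend one M1 observation is uniformly 5x as informative

def total_variance(I):
    """Trace of the inverse information: the 'total variance' criterion."""
    return np.trace(np.linalg.inv(I))

def generalized_variance(I):
    """Determinant of the inverse information: the 'generalized variance'."""
    return np.linalg.det(np.linalg.inv(I))

# Asymptotic relative efficiency of M1 to M0 under the trace criterion:
# the M0 sample size equivalent to one M1 observation.
are_trace = total_variance(I_M0) / total_variance(I_M1)
```

Under the determinant (generalized variance) criterion the same toy comparison gives a ratio of 5⁵ rather than 5, which illustrates why the two criteria can rank schemes similarly yet on very different scales.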

Since the trace criterion is used, these figures give a rough but direct indication of the M0 sample sizes equivalent to one M1 or M2 observation. For small π₁, μ₂, and p₀, these can clearly be very large. Given (π₁, μ₂, p₀), the monotonic trend, in p₀, of the M2 values is inevitable. Since the fully categorized part of an M1 sample tells us nothing about π₁, there tends to be a fall-off in efficiency as p₀ gets small. This is even more dramatic with the generalized-variance criterion. Note that M1 is sometimes better than M2 and sometimes not. For small π₁ and μ₂ even 10 per cent fully categorized data clearly adds a tremendous amount of information. Hosmer and Dick (1977) also show that, if it is possible to obtain fully categorized data at all, then it is best to use M1 with all the fully categorized part of the data taken from the component density with the smaller proportion, π₁, if π₁ ≤ 0.3, p₀ ≤ 0.7, and
