# Statistical analysis of mixture distribution - Smith A.F.M

ISBN 0-470-90763-4


$$
F(x) = \begin{cases}
0.21x, & 0 \le x < 0.79, \\
0.45x - 0.19, & 0.79 \le x < 2.25, \\
0.25x + 0.25, & 2.25 \le x < 3.0, \\
0, & \text{otherwise.}
\end{cases}
$$

Of course, as we indicated in Section 3.1, we run into identifiability problems with mixtures of uniforms and the above F(x) could equally well represent the mixture

$$0.17 \times \mathrm{Un}(0, 0.79) + 0.64 \times \mathrm{Un}(0.79, 2.25) + 0.19 \times \mathrm{Un}(2.25, 3).$$
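As a rough numerical check, the distribution function of this three-component uniform mixture can be compared with the piecewise-linear F(x) above. The sketch below is illustrative only: the function names and the evaluation grid are assumptions, and since the printed coefficients are rounded to two decimals the agreement can only be approximate.

```python
def mixture_cdf(x, components):
    """CDF at x of a finite mixture of uniforms; components = [(weight, a, b), ...]."""
    total = 0.0
    for weight, a, b in components:
        if x >= b:
            total += weight                      # component lies entirely below x
        elif x > a:
            total += weight * (x - a) / (b - a)  # partial mass of Un(a, b)
    return total


def piecewise_F(x):
    """The piecewise-linear F(x) quoted in the text (rounded coefficients)."""
    if 0 <= x < 0.79:
        return 0.21 * x
    if 0.79 <= x < 2.25:
        return 0.45 * x - 0.19
    if 2.25 <= x < 3.0:
        return 0.25 * x + 0.25
    return 0.0


components = [(0.17, 0.0, 0.79), (0.64, 0.79, 2.25), (0.19, 2.25, 3.0)]
grid = [i * 0.01 for i in range(300)]  # hypothetical grid on [0, 3)
max_gap = max(abs(mixture_cdf(x, components) - piecewise_F(x)) for x in grid)
print(f"largest discrepancy on the grid: {max_gap:.4f}")
```

The discrepancy stays at the level expected from two-decimal rounding, which is consistent with the two descriptions being interchangeable.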

4.1.3 Methods for mixtures of discrete and multivariate distributions

In the discrete case, the following plots could be used to indicate deviation from a pure component distribution.

(a) Binomial

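If $p(x) = \binom{N}{x}\theta^{x}(1-\theta)^{N-x}$, $x = 0, \ldots, N$, then

$$\frac{p(x+1)}{p(x)} \cdot \frac{x+1}{N-x} = \frac{\theta}{1-\theta}, \qquad x = 0, \ldots, N-1.$$

An appropriate 'null' plot, therefore, is that of $p(x+1)/p(x)$ against $(x+1)/(N-x)$.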

(b) Poisson

If $p(x) = e^{-\theta}\theta^{x}/x!$, $x = 0, 1, \ldots$, then

$$p(x+1)/p(x) = \theta/(x+1), \qquad x = 0, 1, \ldots$$

One could plot $p(x+1)/p(x)$ against $(x+1)^{-1}$ or $p(x)/p(x+1)$ against $x$.

Such plots could be used to decompose mixtures of binomials and mixtures of Poissons, although there will be the problem of lack of sensitivity unless there are quite a number of large frequencies.
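The Poisson ratio plot can be illustrated with a small sketch. It uses exact probabilities rather than observed relative frequencies (which is what one would have in practice), and the parameter values 3.0, 1.0 and 6.0, the equal mixing weights, and the helper names are all assumptions chosen for illustration. For a pure Poisson, p(x)/p(x + 1) is linear in x, so its second differences vanish; for a two-component mixture they do not.

```python
import math


def poisson_pmf(x, theta):
    """Poisson probability e^{-theta} theta^x / x!."""
    return math.exp(-theta) * theta ** x / math.factorial(x)


def pure_pmf(x):
    return poisson_pmf(x, 3.0)  # assumed single-component example


def mixed_pmf(x):
    # assumed equal-weight mixture of two well-separated Poissons
    return 0.5 * poisson_pmf(x, 1.0) + 0.5 * poisson_pmf(x, 6.0)


def ratio_points(pmf, xs):
    """Points (x, p(x)/p(x+1)) of the Poisson null plot."""
    return [(x, pmf(x) / pmf(x + 1)) for x in xs]


def max_second_difference(points):
    """Largest absolute second difference of the plotted ratios (0 for a straight line)."""
    ys = [y for _, y in points]
    return max(abs(ys[i + 1] - 2 * ys[i] + ys[i - 1]) for i in range(1, len(ys) - 1))


xs = range(0, 10)
pure_curv = max_second_difference(ratio_points(pure_pmf, xs))
mixed_curv = max_second_difference(ratio_points(mixed_pmf, xs))
print(f"pure Poisson: {pure_curv:.2e}, mixture: {mixed_curv:.3f}")
```

With sample frequencies in place of exact probabilities the ratios become noisy wherever the counts are small, which is the lack of sensitivity noted above.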

In the multivariate case, the search for graphical methods is more awkward and we shall limit ourselves to the normal case. In principle, we can simply extend the idea of using a plot or plots and detecting suitable deviations from a null pattern that corresponds to multivariate normality. However, some care may be necessary. In particular, it is not sufficient to check univariate normality for each variable in order to be able to reject the possibility of a mixture. For instance, a 'cross-shaped' mixture of two bivariate normals with the same mean vector, the same marginal variances, and correlation coefficients equal in magnitude but opposite in sign, yields normal marginals (Figure 4.1.10). Some useful plots pertaining to multivariate normality are described by Gnanadesikan (1977) and Everitt (1978). Everitt and Hand (1981) suggest using the chi-squared probability plot of the generalized distances of the observations from the sample mean vector. With mixtures, S-shaped deviations from the null linear plot should be apparent.

Figure 4.1.10 Sketch contours (not to scale) of a mixture of bivariate normals which has normal marginals:

$$p(\mathbf{x}) = \left[4\pi\sqrt{1-\rho^{2}}\right]^{-1}\left[\exp\left(-\tfrac{1}{2}\mathbf{x}^{T}\Sigma_{1}^{-1}\mathbf{x}\right) + \exp\left(-\tfrac{1}{2}\mathbf{x}^{T}\Sigma_{2}^{-1}\mathbf{x}\right)\right],$$

where

$$\Sigma_{j} = \begin{pmatrix} 1 & (-1)^{j}\rho \\ (-1)^{j}\rho & 1 \end{pmatrix}, \qquad j = 1, 2.$$
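A small simulation can make both points concrete: the marginals of a cross-shaped mixture look normal, while the chi-squared probability plot of generalized distances does not follow the null line. This is a sketch under stated assumptions, not the procedure of any of the cited authors: the correlation magnitude 0.9, the sample size, the seed, and all function names are illustrative choices. For two variables the chi-squared quantile function has the closed form -2 log(1 - p), which keeps the example free of external libraries.

```python
import math
import random


def sample_cross_mixture(n, rho, rng):
    """Draw n points from an equal mixture of two bivariate normals with
    zero means, unit variances and correlations +rho and -rho."""
    data = []
    for _ in range(n):
        r = rho if rng.random() < 0.5 else -rho
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        data.append((z1, r * z1 + math.sqrt(1.0 - r * r) * z2))
    return data


def excess_kurtosis(values):
    """Sample excess kurtosis; close to 0 for normal data."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def generalized_distances(data):
    """Sorted squared generalized distances from the sample mean,
    using the inverse of the 2 x 2 sample covariance matrix."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    return sorted(
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in data
    )


def chi2_quantile_2df(p):
    """Quantile function of chi-squared with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - p)


rng = random.Random(0)                       # fixed seed, illustrative only
data = sample_cross_mixture(20000, 0.9, rng)
marginal_kurtosis = [excess_kurtosis([pt[k] for pt in data]) for k in (0, 1)]
d2 = generalized_distances(data)
n = len(d2)
# Probability-plot points: (theoretical quantile, ordered generalized distance).
plot_points = [(chi2_quantile_2df((i + 0.5) / n), d2[i]) for i in range(n)]
print("marginal excess kurtosis:", [round(k, 3) for k in marginal_kurtosis])
print("median point:", plot_points[n // 2], "99% point:", plot_points[int(0.99 * n)])
```

In this sketch the ordered distances fall below the null line near the median and above it in the upper tail, while each marginal shows excess kurtosis near zero.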

A potentially more versatile plot is the Andrews curve (Andrews, 1972; Gnanadesikan, 1977). Each observation is depicted as a curve which, for a d-variate observation x, is

$$f_{\mathbf{x}}(t) = x_{1}/\sqrt{2} + x_{2}\sin t + x_{3}\cos t + x_{4}\sin 2t + \cdots \text{ to } d \text{ terms} \qquad (-\pi < t < \pi).$$

Learning about the parameters of a mixture

It is helpful if the variables $x_{1}, x_{2}, \ldots, x_{d}$ are arranged in decreasing order of informativeness, and sometimes a preliminary transformation to principal component scores is made.

One interpretation of a set of Andrews curves from a data set is as an infinity (as $t$ varies) of sets of univariate projections of the data. The intersection of the set of Andrews curves with $t = t_{0}$ gives a set of realizations of the linear combination

$$\mathbf{h}(t_{0})^{T}\mathbf{x},$$

where

$$\mathbf{h}(t) = \left(\tfrac{1}{\sqrt{2}}, \sin t, \cos t, \sin 2t, \ldots\right)^{T}.$$
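The weight vector and the curve itself are straightforward to compute. The following is a minimal sketch; the function names `andrews_weights` and `andrews_curve` are hypothetical and not taken from any library.

```python
import math


def andrews_weights(t, d):
    """First d entries of (1/sqrt(2), sin t, cos t, sin 2t, cos 2t, ...)."""
    h = [1.0 / math.sqrt(2.0)]
    k = 1
    while len(h) < d:
        h.append(math.sin(k * t))
        if len(h) < d:
            h.append(math.cos(k * t))
        k += 1
    return h


def andrews_curve(x, t):
    """Value at t of the Andrews curve of the observation x (a sequence of d numbers)."""
    return sum(hi * xi for hi, xi in zip(andrews_weights(t, len(x)), x))


observation = [1.0, 2.0, 3.0, 4.0]                        # made-up 4-variate point
t_grid = [-math.pi + 2 * math.pi * i / 8 for i in range(1, 8)]
curve = [andrews_curve(observation, t) for t in t_grid]   # one curve per observation
```

Evaluating every observation at a common value of t gives the univariate projection described above, so a set of curves can be read as a family of projections indexed by t.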

An informal test of multivariate normality is to assess univariate normality simultaneously for all $t$. (Since the form of $\mathbf{h}(t)$ obviously restricts the scope of the linear combinations considered, it is not a secure test of multivariate normality, but it should certainly be better than just looking at the $d$ univariate marginal plots.) With a large set of data the conglomeration of Andrews curves looks impossible to disentangle, and Gnanadesikan (1977) outlines a workable procedure in the quantile contour plot. Here the values of a few sample percentiles are evaluated for a large number of $t$ values, giving, say, five contour curves. A useful refinement is first to standardize the corresponding univariate data, for each $t$. If the original data were multivariate normal, the resulting standardized quantile contours should be roughly horizontal straight-line plots at levels indicated by standard normal quantiles. Figure 4.1.11 shows the resulting plot from a set of 1000 independent five-dimensional normal random vectors corresponding to percentiles 5, 25, 50, 75, and 95.

Figure 4.1.11 Quantile contour plots from Andrews curves of 1000 multivariate normal observations, standardized and compared with true percentiles. $\Phi^{-1}(0.75) = 0.675$; $\Phi^{-1}(0.95) = 1.645$
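The quantile contour computation can be sketched as follows for simulated multivariate normal data. This is an illustration under assumptions rather than a reproduction of Gnanadesikan's procedure or of the figure: the grid of 59 values of t, the seed, the simple order-statistic quantile rule, and the function names are all choices made here.

```python
import math
import random
from statistics import NormalDist


def andrews_weights(t, d):
    """First d entries of (1/sqrt(2), sin t, cos t, sin 2t, cos 2t, ...)."""
    h = [1.0 / math.sqrt(2.0)]
    k = 1
    while len(h) < d:
        h.append(math.sin(k * t))
        if len(h) < d:
            h.append(math.cos(k * t))
        k += 1
    return h


def quantile_contours(data, t_grid, probs):
    """For each t: project every observation on h(t), standardize the
    projections, and record the sample quantile for each probability."""
    d = len(data[0])
    n = len(data)
    contours = {p: [] for p in probs}
    for t in t_grid:
        h = andrews_weights(t, d)
        proj = [sum(hi * xi for hi, xi in zip(h, x)) for x in data]
        m = sum(proj) / n
        s = math.sqrt(sum((v - m) ** 2 for v in proj) / n)
        z = sorted((v - m) / s for v in proj)
        for p in probs:
            contours[p].append(z[min(n - 1, int(p * n))])  # crude order-statistic quantile
    return contours


rng = random.Random(1)                                     # fixed seed, illustrative only
data = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(1000)]
t_grid = [-math.pi + 2 * math.pi * i / 60 for i in range(1, 60)]
probs = (0.05, 0.25, 0.50, 0.75, 0.95)
contours = quantile_contours(data, t_grid, probs)
levels = {p: NormalDist().inv_cdf(p) for p in probs}       # null horizontal levels
max_dev = max(abs(v - levels[p]) for p in probs for v in contours[p])
print(f"largest departure from the null levels: {max_dev:.3f}")
```

Each entry of `contours` is one contour curve over the grid of t values; for normal data the curves stay close to the horizontal levels in `levels`, and plotting them against `t_grid` gives the quantile contour plot.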

Deviation from this null plot indicates deviation from multivariate normality. If there is an underlying normal mixture then, for at least some $t$ values, the univariate projection should show a univariate normal mixture clearly, particularly if the component densities differ in location. Use of about five contours as in Figure 4.1.11 should be enough to detect systematic deviation from multivariate normality but, to show evidence of a normal mixture, the number of contours will have to be increased. In Figure 4.1.12, nineteen contours,
