# Statistical Analysis of Finite Mixture Distributions - Smith A.F.M.

ISBN 0-470-90763-4


Example 2.2.4 Cluster analysis and latent structure models

The objective of cluster analysis (Gordon, 1981) is to construct a classification of a set of n observations, possibly vectors, into ‘interesting’ subsets. In particular, we may try to discover whether there are a certain number of well-defined groups, or to derive ‘optimal’ groupings of the observations into a specified number of clusters. One approach for the case of k clusters is to assume a mixture model, with k components, for the overall distribution of the data. We then try to estimate the mixture density and perhaps to assign observations to component densities. We might also wish to assess the suitability of the assumption of k components.

Seminal papers, using multivariate normal components, include Wolfe (1970), Scott and Symons (1971), Marriott (1975), and Symons (1981). A general discussion of the approach appears in Everitt (1980). Binder (1978a) applies Bayesian methods to the problem (see also Sections 4.3 and 4.4).

The problem of deciding upon the number of components (clusters) involved is typically very difficult, and we shall review in detail, in Chapter 5, some of the proposals that have been made.

A closely related application of finite mixture distributions occurs in latent structure analysis, for which there is a large literature, particularly in publications devoted to applications of statistics in the social sciences. Useful recent references are Goodman (1974) and Fielding (1977). For some k, the density of the random variable or vector of interest is assumed to have the form

$$p(x \mid \psi) = \sum_{j=1}^{k} \pi_j f_j(x \mid \theta_j) \qquad (x \in \mathbb{R}^d),$$

where the mixture usually has the special feature that the components themselves are independence models. Thus, if x has d components, $x^{(1)}, \ldots, x^{(d)}$,

$$f_j(x \mid \theta_j) = \prod_{i=1}^{d} f_{ij}(x^{(i)}), \qquad j = 1, \ldots, k. \tag{2.2.3}$$

The model therefore assumes that there are k hypothetical latent classes, which may or may not have real physical meaning, and that all correlations among the components of x are caused by ignorance about the identity of the latent class. The statistical problem is to discover how many latent classes are required to explain the covariance structure observed in the data. There is an obvious resemblance, in spirit, to factor analysis and, indeed, latent structure analysis has most frequently been used as an analogue of factor analysis for categorical data (see, also, Bartholomew, 1980). As in factor analysis, some physical interpretation is often sought for the latent classes.
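The latent-class mechanism just described can be sketched in code. The parameters below are hypothetical, and binary components are assumed for concreteness: within each class the items are drawn independently, yet marginally they are correlated because class membership is unobserved.

```python
import random

def simulate_latent_classes(weights, cond_probs, n, seed=0):
    """Draw n observations from a latent class model of the form (2.2.3):
    pick a latent class j with probability weights[j], then draw each of
    the d binary components independently with the class-specific success
    probabilities cond_probs[j]."""
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        data.append([1 if rng.random() < q else 0 for q in cond_probs[j]])
        labels.append(j)
    return data, labels

# Two hypothetical latent classes over d = 3 binary items.  All
# association among the items is induced by the hidden class label.
data, labels = simulate_latent_classes(
    weights=[0.4, 0.6],
    cond_probs=[[0.9, 0.8, 0.7], [0.1, 0.2, 0.3]],
    n=500,
)
```

Fitting such a model to real categorical data would then amount to estimating the weights and conditional probabilities with the class labels unobserved.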

An interesting example is given by Skene (1978), who extends the method to deal simultaneously with m sets of data, each from a different group of patients. He assumes that the component densities are the same for all groups but that the sets of mixing weights may differ from group to group. As an example, Skene (1978) considers patients who form two groups, those who die and those who survive subsequent to severe head injury. Three latent classes were found to be adequate in this case. One of them was strongly associated with death, another with survival, and the third was mildly associated with both. The need for this third class reflects the fact that very clear diagnosis is not always possible between these two groups. Another illustration is given in Dawid and Skene (1979). Skene (1980) notes that missing data are fairly easily dealt with, given the independence model (2.2.3).
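Skene's point about missing data can be sketched as follows, assuming (hypothetically) binary components: under the independence model (2.2.3), a missing component simply drops out of the within-class product, so the observation's likelihood is still available in closed form.

```python
import math

def log_lik(x, weights, cond_probs):
    """Log-likelihood of one observation under a binary latent class
    model of the form (2.2.3).  Missing components are coded as None
    and are simply omitted from the within-class product."""
    comp = []
    for j, pi in enumerate(weights):
        lp = math.log(pi)
        for q, xi in zip(cond_probs[j], x):
            if xi is None:
                continue  # missing entry: integrates out under independence
            lp += math.log(q if xi == 1 else 1 - q)
        comp.append(lp)
    # log-sum-exp over the latent classes, for numerical stability
    m = max(comp)
    return m + math.log(sum(math.exp(c - m) for c in comp))
```

With every component missing the likelihood reduces to the sum of the mixing weights, i.e. one, so the log-likelihood is zero, as a quick sanity check confirms.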

Example 2.2.5 Modelling prior densities

A feature of many Bayesian statistical analyses is the adoption of a prior density from a so-called conjugate family (see, for example, DeGroot, 1970). Suppose θ (∈ Ω) is the parameter of interest and that the experiment yields data x. Inference then proceeds on the basis of Bayes’ theorem,

$$\pi(\theta \mid x) \propto p(x \mid \theta)\,\pi(\theta) \qquad (\theta \in \Omega),$$

where π(· | x) denotes the posterior density and π(·) the prior. In many situations, e.g. when p(x | θ) belongs to the exponential family, if π(·) is chosen from an appropriate family of distributions, then typically π(· | x) also belongs to that family, with the prior-to-posterior transformation simply described in terms of the sufficient statistics. For instance, if x represents the results of a sequence of independent Bernoulli trials, with success probability θ, it is easily seen that the beta densities form such a (conjugate) family. However, an obvious extension is possible, in that, if π(·) is chosen from the class of k-component mixtures of betas, then π(· | x) also belongs to that class. A far richer class of priors can therefore be considered without any real increase in the difficulty of the prior-to-posterior calculations.

Diaconis and Ylvisaker (1985) contrast coin tossing with coin spinning and argue that, if θ is the probability of ‘head’, then a beta prior centred on ½ might be suitable for tossed coins, whereas spun coins tend to give ‘head’ with relative frequency either near ⅓ or near ⅔, so that a much more suitable prior can be chosen from the class of two-component beta mixtures. The posterior density would also be such a mixture, most likely reinforcing one or other of the ‘modes’. In similar spirit to remarks made in Example 2.2.3, Diaconis and Ylvisaker (1985) comment that any prior density on (0, 1) can be arbitrarily well approximated by a finite mixture of betas, and they generalize the discussion to exponential family distributions (see also Dalal and Hall, 1983, and other papers referenced by Diaconis and Ylvisaker, 1985).
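The prior-to-posterior calculation for a mixture-of-betas prior can be sketched concretely (component parameters below are hypothetical): each beta component updates conjugately on the success/failure counts, and the mixing weights are reweighted in proportion to each component's marginal likelihood, B(aⱼ + s, bⱼ + f) / B(aⱼ, bⱼ).

```python
import math

def log_beta(a, b):
    # log of the Beta function B(a, b), via log-gamma for stability
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def update_beta_mixture(weights, params, s, f):
    """Posterior of a k-component mixture-of-betas prior after observing
    s successes and f failures in Bernoulli trials.  Each component
    (a_j, b_j) becomes (a_j + s, b_j + f); its weight is multiplied by
    the marginal likelihood B(a_j + s, b_j + f) / B(a_j, b_j)."""
    new_params = [(a + s, b + f) for a, b in params]
    log_w = [math.log(w) + log_beta(a + s, b + f) - log_beta(a, b)
             for w, (a, b) in zip(weights, params)]
    m = max(log_w)                      # normalize in log space
    unnorm = [math.exp(lw - m) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm], new_params

# Hypothetical bimodal 'spun coin' prior: equal-weight components with
# modes on either side of 1/2.  Ten spins giving 7 heads shift almost
# all the weight onto the heads-leaning component.
w, p = update_beta_mixture([0.5, 0.5], [(2, 4), (4, 2)], s=7, f=3)
```

The posterior weights here work out to exactly (¼, ¾): both components share the same prior normalizer B(2, 4) = B(4, 2), and B(11, 5) / B(9, 7) = 3, so the data reinforce the ‘mode’ above ½, as described in the text.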
