Statistical analysis of finite mixture distributions
Figure 2.1.4 Electrophoresis diagram of anti-egg albumin rabbit serum, showing peaks corresponding to albumin, α-, β-, and γ-globulin (horizontal axis: scale reading, mm). Reproduced from Tiselius and Kabat, The Journal of Experimental Medicine, 1939, 69, 127, by copyright permission of the Rockefeller University Press
In this example, the problem is that of decomposing a given curve into its components and mixing weights. More detailed discussion of methods for doing this is given in Section 4.7. Similar problems occur with diffusion patterns, results from ultracentrifuges, chromatographic scannings, and absorption spectroscopy (see Noble, Hayes, and Eden, 1959, and Fraser and Suzuki, 1966).
Example 2.1.4 Switching regressions
Goldfeld and Quandt (1973) discuss the following model for a housing market in disequilibrium. In any month, the relationship between explanatory variables, x, and the number of houses whose construction is started, y, takes the form
$y = x^{T}\theta + \epsilon,$
where the first component of $x$ is a dummy variable taking the value 1 and $\epsilon$ is a random variable with mean zero and variance $\sigma^2$.
Applications of finite mixture models
It is thought that $\theta$ takes one of two unknown values, $\theta_1$ and $\theta_2$, depending on whether the market is in a supply phase or a demand phase. If we regard the phase in operation in any given month to be the underlying categorical variable, we have a two-component mixture model. In practice it may not be clear which phase is in operation at any given month. We might then consider the phases in different months to be independent, which gives the basic mixture model or, perhaps more realistically, we might consider a model incorporating serial dependence from month to month. One such is the Markov model discussed in Example 4.3.10. A model incorporating even more complicated dependence or a model involving change points might also be considered.
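The basic (independent-phase) version of this model can be sketched in a few lines of Python. The sketch is illustrative only and rests on assumptions not stated in the text: the error term is taken to be normally distributed, the two phases share a common standard deviation `sigma`, and the supply phase occurs with probability `pi1`. All function and parameter names are hypothetical.

```python
import math


def normal_pdf(y, mean, sigma):
    """Density of a normal distribution (normal errors are an assumption here)."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def dot(x, theta):
    """Inner product x^T theta for plain Python sequences."""
    return sum(xi * ti for xi, ti in zip(x, theta))


def switching_density(y, x, theta1, theta2, sigma, pi1):
    """Two-component mixture density of y given x, with independent phases.

    pi1 is the assumed probability that a month is in phase 1.
    """
    return (pi1 * normal_pdf(y, dot(x, theta1), sigma)
            + (1.0 - pi1) * normal_pdf(y, dot(x, theta2), sigma))


def phase_posterior(y, x, theta1, theta2, sigma, pi1):
    """Posterior probability that an observation (x, y) arose in phase 1."""
    numerator = pi1 * normal_pdf(y, dot(x, theta1), sigma)
    return numerator / switching_density(y, x, theta1, theta2, sigma, pi1)


def log_likelihood(data, theta1, theta2, sigma, pi1):
    """Log-likelihood of a list of (x, y) pairs under the mixture model."""
    return sum(
        math.log(switching_density(y, x, theta1, theta2, sigma, pi1))
        for x, y in data
    )
```

Here `phase_posterior` gives a month-by-month assessment of which phase is in operation once parameter values are supplied; estimating the parameters themselves (for example by maximizing `log_likelihood`) is a separate step not shown.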
The switching regression model has been proposed for a variety of problems in economics, as indicated in Table 2.1.3.
Example 2.1.5 Medical diagnosis and prognosis
As we shall see, much of the literature on mixtures is restricted, at least so far as the practical details are concerned, to the univariate case. However, the data available on each observational unit are often multivariate. In particular this is usually the case with medical data, where the mixture data come from patients whose disease category is unconfirmed but on whom several, and often very many, clinical tests have been made.
This is the case, for example, with a set of patients suffering from the hypertensive condition known as Conn’s syndrome. Further subclassification of the patients can be made into those for whom the major cause is an adenoma, which is a benign tumour, and those for whom this is not the case. However, this part of the diagnosis, which is critical as far as choice of treatment is concerned, is non-trivial medically, so that there are often cases of Conn’s syndrome in the databank which are uncategorized. The type of data available on Conn’s patients is indicated in Table 1.6 of Aitchison and Dunsmore (1975). An important feature is the high dimensionality. Multivariate normality is assumed for the logarithms of the data and an approximate procedure for incorporation of unconfirmed cases is discussed by Titterington (1976); see also Section 5.7 and Example 6.1.3. For similar examples, see Skene (1978), Makov (1980a), and Table 2.1.3.
Example 2.1.6 Tracking in a multitarget environment
The process of tracking a target in a multitarget environment involves the reception of signals over noisy channels. The nature of each particular observation is therefore uncertain and could be any one of the following: (a) noise alone (cut-off communication link); (b) false alarm (thermal or process noise, or clutter from a target not being tracked); or (c) the target actually being tracked (see Chang and Srinath, 1976, and a review paper by Bar-Shalom, 1978).
Sources (a) and (b) are often both referred to as ‘noise’ and source (c) as ‘signal’; hence the terminology ‘signal versus noise’ commonly discussed in the engineering literature. Typically, such problems require the sequential processing of observations. Bayesian and non-Bayesian approaches to such problems are discussed in Section 6.3.2.
Example 2.1.7 Remote sensing
Artificial satellites are used to generate data from which estimates can be made of the relative acreages allocated to various crops. Energy is recorded in four spectral wave bands for each square pixel of ground observed from the satellite. Usually, areas of categorized pixels, covered by known crops, are used to provide estimates of the underlying component densities, which can then be used as a basis for estimating the mixing weights associated with the set of unknown pixels. These weights give the required relative acreages (Tubbs and Coberly, 1976). Sometimes the objective is that of ‘image segmentation’, which seeks to identify areas of contiguous pixels that are devoted to the same crop. This pattern-recognition activity is strongly related to cluster analysis, and mixture models are often used in this context (see Example 2.2.4, Section 4.3.4, and Sclove, 1983, who discusses these problems in the context of Landsat data). Issue 12 of Volume A5 of Communications in Statistics is largely devoted to remote sensing; see also Nagy (1972) and Example 6.1.1 later.
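The step of estimating mixing weights when the component densities are treated as known can be illustrated with a short Python sketch. It uses the standard EM iteration for mixing weights, which is not necessarily the procedure of the references cited above, and for brevity it works with a single measurement per pixel rather than four spectral bands. The function names, the normal component densities, and the fixed iteration count are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Univariate normal density, used here as a stand-in component density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def estimate_mixing_weights(values, densities, n_iter=200):
    """EM estimate of mixing weights with the component densities held fixed.

    values    : observations on the unclassified pixels (one number each here)
    densities : list of callables, one known density per crop
    n_iter    : fixed number of EM iterations (no convergence check)
    """
    k = len(densities)
    weights = [1.0 / k] * k  # start from equal weights
    # Component densities do not change, so evaluate them once per pixel.
    dens = [[f(v) for f in densities] for v in values]
    for _ in range(n_iter):
        totals = [0.0] * k
        for row in dens:
            mix = sum(w * d for w, d in zip(weights, row))
            for j in range(k):
                # Posterior probability that this pixel belongs to crop j.
                totals[j] += weights[j] * row[j] / mix
        # New weight = average posterior membership probability.
        weights = [t / len(values) for t in totals]
    return weights
```

A call such as `estimate_mixing_weights(pixel_values, [crop_a_density, crop_b_density])`, with the two densities previously estimated from categorized pixels, would return estimated proportions that play the role of the relative acreages. The sketch assumes every pixel has positive density under at least one component; otherwise the division by `mix` fails.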