CHAPTER 6 SEQUENTIAL PROBLEMS AND PROCEDURES
6.1 Introduction to unsupervised learning problems
6.1.1 The problem and its Bayesian solution
6.1.2 Computational constraints and the need for approximations
6.2 Approximate solutions: unknown mixing parameters
6.2.1 The two-class problem: Bayesian and related procedures
6.2.2 The two-class problem: a maximum likelihood related procedure
6.2.3 Asymptotic and finite-sample comparisons of the quasi-Bayes and Kazakos procedures
6.2.4 The k-class problem: a quasi-Bayes procedure
6.3 Approximate solutions: unknown component distribution parameters
6.3.1 A general recursive procedure for a one-parameter mixture
6.3.2 Unsupervised learning for signal versus noise
6.3.3 A quasi-Bayes sequential procedure for the contaminated normal distribution
6.3.4 A quasi-Bayes sequential procedure for bipolar signal detection and related problems
6.3.5 Problems with several unknown parameters
6.4 Approximate solutions: unknown mixing and component parameters
6.4.1 A review of some pragmatic approaches
6.4.2 A general recursion for parameter estimation using incomplete data
6.4.3 Illustrations of the general recursion
6.4.4 Connections with the EM algorithm
6.5 Approximate solutions: dynamic linear models
6.5.1 Dynamic linear models and finite mixture Kalman filters
6.5.2 An outline of suggested approximation procedures
Finite mixture distributions have been used as models throughout the history of modern statistics. We are currently on the point of celebrating the centenary of Newcomb’s (1886) application of normal mixtures as models for outliers and Pearson’s (1894) classic paper on the decomposition of normal mixtures by the method of moments will soon similarly come of age. The ensuing century has revealed a multitude of fields of application which exemplify features that demand the use of mixture models: measurements are available from experimental units which are known to belong to one of a set of classes, but whose individual class-memberships are unavailable. Typical examples come from fisheries research (fish lengths are provided but their sexual identities are not), sedimentology (the grain size distribution of a sample of sand is known but its constitution in terms of different minerals is not) and medical diagnosis (clinical measurements are available for a set of patients whose disease classifications, however, are not).
Statistical analysis of such data has proved not to be straightforward, for two main reasons. Firstly, explicit formulae generally do not exist for estimators of the various parameters, so that numerical methods are required. This in itself discouraged the treatment of any but the simplest problems before the age of computers. Secondly, theoretical difficulties which arise in certain aspects of the statistical analysis reveal some common mixture problems to be ‘non-standard’. As a result, detailed investigation of the analysis of finite mixture problems offers more than just a catalogue of straightforward applications of standard methods to a particular class of statistical models: our statistical approach to sand-sifting will indeed reveal a few special nuggets.
The monograph offers a systematic treatment of the structure of finite mixture distributions, an account of the wide range of applications, a detailed description of the attempts to apply various statistical methodologies to the analysis of data from mixture distributions, and a large, up-to-date bibliography. A special feature is the final chapter, where methods are described for accommodating data sequentially and where the connection is made with the engineering literature on problems involving 'unsupervised learning'.
The book should be of interest to research workers in statistics and to engineers involved in pattern recognition, as well as to investigators in the many fields of application. Although the monograph has not been designed as a textbook, the material could form the basis of a specialized postgraduate course in finite mixtures.
Thanks are due to Professor P. D. M. Macdonald and Professor A. S. Paulson, who contributed valuable information and advice at an early stage in the writing of the book. Grateful acknowledgement is made in the text for permission to use material published elsewhere, and deep appreciation is expressed to Miss E. M. Nisbet, Mrs M. F. Smith and Miss K. Glowczewska for their excellent work in preparing the typescript.
1.1 BASIC DEFINITIONS AND CONCEPTS
Suppose that a random variable or vector, X, takes values in a sample space, $\mathcal{X}$, and that its distribution can be represented by a probability density function (or mass function, in the case of discrete X) of the form

$$p(x) = \pi_1 f_1(x) + \cdots + \pi_k f_k(x) \qquad (x \in \mathcal{X}), \tag{1.1.1}$$

where

$$\pi_j > 0, \quad j = 1, \ldots, k; \qquad \pi_1 + \cdots + \pi_k = 1.$$
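As a minimal illustration of the density in (1.1.1), the following sketch evaluates a two-component normal mixture at a point. The particular parameter values (weights 0.3 and 0.7, component means 0 and 3, standard deviations 1 and 2) are arbitrary choices for the example, not values taken from the text:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, components):
    """Finite mixture density p(x) = pi_1 f_1(x) + ... + pi_k f_k(x),
    with pi_j > 0 and pi_1 + ... + pi_k = 1, as in (1.1.1)."""
    assert all(w > 0 for w in weights)
    assert abs(sum(weights) - 1.0) < 1e-12
    return sum(w * f(x) for w, f in zip(weights, components))

# Illustrative two-component normal mixture (k = 2).
weights = [0.3, 0.7]
components = [
    lambda x: normal_pdf(x, 0.0, 1.0),  # f_1: N(0, 1)
    lambda x: normal_pdf(x, 3.0, 4.0 ** 0.5),  # f_2: N(3, 4)
]
p = mixture_pdf(1.0, weights, components)
```

The weights must satisfy the constraints below the equation (positivity and summation to one) for p to be a proper density, which the assertions check.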