Generally, all the information contained in the spectra can be used for modeling; such approaches are called full-spectrum methods. In many cases, however, preprocessing the experimental spectra by WT compression offers advantages over full-spectrum methods.
A procedure combining WT compression and PLS is illustrated in Figure 5.62. It includes the following steps:
1. The measured signals X, such as spectra, are transformed into the wavelet domain, where they are represented by the wavelet coefficients W.
2. The columns of W are sorted according to their contribution to the data variance, giving a matrix Wsorted. Because many wavelet coefficients in W or Wsorted are usually very small, only a limited number of columns of Wsorted are needed to represent the signal X. Wsorted can therefore be divided into two submatrices, Ws and Wn, containing the significant (information) and insignificant (noise) coefficients, respectively. In many cases this sorting step can be skipped, because sorting changes the relative positions of the coefficients and thereby distorts the positional information they carry.
application of wavelet transform in chemistry
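[Figure 5.62 diagram: X is transformed into W; the sorted matrix is split as Wsorted = [Ws Wn]; the full model y = [Ws Wn][Bs; Bn], with y (m × 1), Wsorted (m × n) and B (n × 1), is reduced to y = Ws b, with Ws (m × n′).]
Figure 5.62. A diagram showing the procedure of PLS coupled with WT compression.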
3. The submatrix Ws can be determined by different criteria:
3a. We may simply use the criteria discussed for WT compression, but in that case only the advantage of WT compression is exploited.
3b. Other methods can also be employed for this purpose, such as the relevant component extraction (RCE) PLS approach described in Walczak's book, Wavelets in Chemistry. In this method, as illustrated in Figure 5.62, PLS is employed to calculate the regression coefficients b. The "leave-one-out" cross-validation procedure yields one vector b for each left-out sample, and stacking these vectors gives a matrix of regression coefficients. Then the stability of regression coefficient i, defined by
$$\text{stability}_i = \frac{\operatorname{mean}(b_i)}{\operatorname{std}(b_i)}$$
can be calculated. Using the maximal stability of the noisy variables as a threshold,
$$\text{Threshold} = \max\left(\left|\text{stability}_{\text{noise}}\right|\right) \qquad (5.82)$$
we can eliminate the coefficients in b whose absolute stability does not exceed this threshold, together with the corresponding wavelet coefficients (columns) in W or Wsorted.
4. With the submatrix Ws, build the PLS model from the training set.
5. Finally, the model can be used for prediction. Note that new experimental data must be processed in exactly the same way as the training set used to build the model. If step 3a was used for compression, the new data must be compressed with the same criteria; if step 3b was used to determine Ws, the wavelet coefficients must be kept at the same positions.
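Steps 1, 2 and 5 can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the RCE-PLS implementation: it assumes a Haar wavelet, spectra whose length is a power of 2, and a hypothetical cut-off that keeps the fewest top-variance columns reaching 99% of the total variance. The function names and the toy spectra are hypothetical, and the PLS model of step 4 is not shown.

```python
import math
from statistics import pvariance


def haar_dwt(signal):
    """Full orthonormal Haar decomposition of a signal whose length is a power of 2.

    Returns the final approximation coefficient followed by the detail
    coefficients, ordered from the coarsest to the finest level.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of 2")
    approx, details = list(signal), []
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) / root2 for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / root2 for i in pairs]
        details.insert(0, detail)  # coarser levels are placed first
    return approx + [c for level in details for c in level]


def to_wavelet_domain(X):
    """Step 1: transform every spectrum (row of X) into wavelet coefficients W."""
    return [haar_dwt(row) for row in X]


def select_significant(W, fraction=0.99):
    """Step 2: rank the columns of W by variance and keep the top ones (Ws).

    Hypothetical criterion: keep the fewest top-variance columns whose
    cumulative variance reaches `fraction` of the total. The kept indices are
    returned in their ORIGINAL order, so coefficient positions are unchanged.
    """
    variances = [pvariance(col) for col in zip(*W)]
    order = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    total, running, kept = sum(variances), 0.0, []
    for j in order:
        if running >= fraction * total:
            break
        kept.append(j)
        running += variances[j]
    return sorted(kept)


def project(X, kept):
    """Step 5: apply the same transform and the same kept positions to any data."""
    return [[w[j] for j in kept] for w in to_wavelet_domain(X)]


# Synthetic toy spectra (8 points each), for illustration only.
X_train = [
    [0.1, 0.2, 0.9, 1.0, 0.9, 0.2, 0.1, 0.0],
    [0.1, 0.3, 1.8, 2.0, 1.8, 0.3, 0.1, 0.0],
    [0.1, 0.2, 1.2, 1.4, 1.2, 0.2, 0.1, 0.0],
]
kept = select_significant(to_wavelet_domain(X_train))
Ws_train = project(X_train, kept)  # input to the PLS model of step 4
X_new = [[0.1, 0.25, 1.5, 1.7, 1.5, 0.25, 0.1, 0.0]]
Ws_new = project(X_new, kept)      # same positions as the training set
```

Because the Haar transform used here is orthonormal, the total variance of W equals that of X, so ranking columns by variance simply redistributes the same information into fewer coefficients. Returning the kept indices in their original order is one way to honor the requirement of step 5 that new data be reduced at exactly the same positions.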
This method has been used successfully to analyze NIR spectra of gasoline samples; examples can be found in the book Wavelets in Chemistry.
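The stability-based selection of step 3b can be sketched as follows. This minimal version assumes that the leave-one-out PLS regression vectors have already been computed and stacked row-wise, and that the indices of the noisy variables are known; the helper names and the toy coefficient matrix are hypothetical, and the PLS fitting itself is not shown.

```python
import math
from statistics import mean, stdev


def stability(B):
    """Stability of every regression coefficient: mean(b_i) / std(b_i).

    B holds one regression vector per leave-one-out PLS model
    (rows = models, columns = coefficients). Requires at least two rows.
    """
    out = []
    for col in zip(*B):
        m, s = mean(col), stdev(col)
        if s == 0:  # coefficient identical in every model
            out.append(0.0 if m == 0 else math.copysign(math.inf, m))
        else:
            out.append(m / s)
    return out


def select_by_stability(B, noise_idx):
    """Keep the coefficients whose |stability| exceeds the threshold of Eq. (5.82).

    `noise_idx` lists the columns that correspond to the noisy variables;
    the threshold is the largest absolute stability among them.
    """
    c = stability(B)
    threshold = max(abs(c[j]) for j in noise_idx)
    noise = set(noise_idx)
    kept = [j for j, cj in enumerate(c) if j not in noise and abs(cj) > threshold]
    return kept, threshold


# Hypothetical leave-one-out regression vectors: 4 models x 5 coefficients,
# where columns 3 and 4 stand for the noisy variables.
B = [
    [1.00, 0.10, -2.0, 0.3, -0.2],
    [1.10, -0.20, -2.1, -0.4, 0.1],
    [0.90, 0.15, -1.9, 0.2, 0.3],
    [1.05, -0.05, -2.0, -0.1, 0.2],
]
kept, threshold = select_by_stability(B, noise_idx=[3, 4])
```

In this toy example the threshold comes from column 4 (about 0.46), so coefficients 0 and 2, whose signs are consistent across all models, are kept, while coefficient 1, which fluctuates around zero, is discarded along with its wavelet coefficient.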
5.5.2. Combined Method for Classification and Pattern Recognition
Generally, pattern recognition refers to the ability to assign an object to one of several possible categories according to the values of some measured parameters, and classification is one of its principal goals. Because of their importance in chemical studies, many methods have been proposed for classification and pattern recognition. Combined methods of WT for classification and pattern recognition involve two main steps: (1) compression or feature selection is performed on the original dataset, using WT as a preprocessing technique; then (2) classification or pattern recognition is carried out in the wavelet domain by classifiers such as artificial neural networks (ANN), soft independent modeling of class analogy (SIMCA), or k-nearest neighbors (KNN).
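The two-step scheme can be illustrated with a minimal sketch. It assumes Haar approximation coefficients as the compressed features for step (1) and a k-nearest-neighbor vote with Euclidean distance as the classifier for step (2); an ANN or SIMCA model could be substituted in step (2). The function names and the toy spectra are hypothetical.

```python
import math
from collections import Counter


def haar_approx(signal, levels):
    """Compress a signal to its Haar approximation coefficients after `levels` steps.

    Each level halves the length, so the feature vector is 2**levels times shorter.
    """
    a = list(signal)
    for _ in range(levels):
        if len(a) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        a = [(a[i] + a[i + 1]) / math.sqrt(2.0) for i in range(0, len(a), 2)]
    return a


def knn_predict(train_feats, train_labels, query, k=3):
    """Assign `query` to the majority class among its k nearest training samples."""
    nearest = sorted(
        zip(train_feats, train_labels),
        key=lambda item: math.dist(item[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Step (1): compress synthetic toy "spectra" (8 points -> 2 wavelet features).
train = [
    ([0, 1, 3, 1, 0, 0, 0, 0], "A"),
    ([0, 2, 4, 2, 0, 0, 0, 0], "A"),
    ([1, 1, 3, 1, 0, 0, 0, 0], "A"),
    ([0, 0, 0, 0, 1, 3, 1, 0], "B"),
    ([0, 0, 0, 0, 2, 4, 2, 0], "B"),
    ([0, 0, 0, 0, 1, 3, 1, 1], "B"),
]
feats = [haar_approx(x, levels=2) for x, _ in train]
labels = [label for _, label in train]

# Step (2): classify new signals in the wavelet domain.
pred_1 = knn_predict(feats, labels, haar_approx([0, 1, 2, 1, 0, 0, 0, 0], levels=2))
pred_2 = knn_predict(feats, labels, haar_approx([0, 0, 0, 0, 0, 2, 3, 1], levels=2))
```

With `levels=2` each 8-point signal shrinks to 2 features; the first query is assigned to class A and the second to class B. The classifier only ever sees the compressed wavelet-domain features, which is what makes training cheaper than on the full signals.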
There have been several successful applications of the combined method of WT and conventional classifiers to the classification of analytical signals. One of them was reported by Bos and Vrielink in Chemometrics and Intelligent Laboratory Systems [23:115-122 (1994)], who studied the identification of mono- and disubstituted benzenes from IR spectra using WT and several classifiers. The aim of their work was to determine whether the localization property of WT in both position and scale could be used to extract the relevant information in a concentrated form that effectively captures the salient features of an IR spectrum. The coefficients obtained from the WT of the IR spectra were used as inputs to an identification process based on linear or nonlinear neural network classifiers. Using the concentrated form instead of the full spectra greatly reduces the time needed to develop the classifiers. Moreover, the quality of the classifiers is expected to improve when they are derived from smaller datasets that still contain all the relevant information. Bos and Vrielink concluded that WT with Daubechies wavelet functions is a feature-extraction method that can reduce IR spectral data more than 20-fold while significantly improving the classification.