
Feature selection and reduction

One can, and sometimes must, reduce the number of features. One way is to combine the original variables into a smaller number of latent variables such as principal components or PLS functions. This is called feature reduction.
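As an illustration, a minimal sketch of PCA-based feature reduction with scikit-learn; the data matrix X is synthetic and the number of retained components is an arbitrary choice, not a value from the text:

```python
# Minimal sketch: reduce 100 original variables to 5 latent ones (PCA scores).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 100))      # 30 objects, 100 original variables (synthetic)

pca = PCA(n_components=5)           # combine the variables into 5 principal components
scores = pca.fit_transform(X)       # (30, 5) matrix of PC scores = reduced features
print(scores.shape, pca.explained_variance_ratio_)
```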

The combination of PCA and LDA is often applied, in particular for ill-posed data (data where the number of variables exceeds the number of objects), e.g. Ref. [46]. One first extracts a certain number of principal components, deleting the higher-order ones and thereby reducing the noise to some degree, and then carries out the LDA. One should, however, be careful not to eliminate too many PCs, since information important for the discrimination might be lost in this way. A method in which both are merged in one step, and which sometimes yields better results than the two-step procedure, is reflected discriminant analysis. The Fourier transform is also sometimes used [14], as is the wavelet transform (see Chapter 40) [13,16]. In that case the information is included in the first few Fourier coefficients or in a restricted number of wavelet coefficients.
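A minimal sketch of the two-step procedure described above (PCA followed by LDA), again with scikit-learn and synthetic data; the choice of 10 retained components is illustrative only:

```python
# Sketch: PCA followed by LDA for ill-posed data (more variables than objects).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 200))      # 40 objects, 200 variables: ill-posed for plain LDA
y = np.repeat([0, 1], 20)           # two classes of 20 objects each
X[y == 1] += 0.5                    # shift class 1 so there is something to discriminate

# Keep only the first PCs (discarding higher-order, noisier ones), then do LDA.
clf = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
clf.fit(X, y)
print(clf.score(X, y))              # fraction of training objects classified correctly
```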

In feature selection one selects from the m variables a subset that seems the most discriminating. Feature selection is thus a means of choosing an optimally discriminating set of variables; when these variables are the results of analytical tests, it amounts, in fact, to selecting an optimal combination of analytical tests or procedures.

One way of selecting discriminating features is to compare the means and the variances of the different variables. Variables with widely different means for the classes and small intraclass variance should be of value and, for a binary discrimination, one therefore selects those variables for which the expression

(x̄₁ − x̄₂)² / (s₁² + s₂²)

is largest, where x̄₁ and x̄₂ are the means of the variable in the two classes and s₁², s₂² are the corresponding intraclass variances (the Fisher weight).
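A short sketch, under the assumption that the criterion is the Fisher weight written above; the function fisher_weights and the synthetic classes X1, X2 are hypothetical illustrations:

```python
# Sketch: rank variables of a binary problem by their Fisher weight.
import numpy as np

def fisher_weights(X1, X2):
    """Per-variable Fisher weight: squared difference of the class means
    divided by the sum of the intraclass variances."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    v1, v2 = X1.var(axis=0, ddof=1), X2.var(axis=0, ddof=1)
    return (m1 - m2) ** 2 / (v1 + v2)

rng = np.random.default_rng(2)
X1 = rng.normal(0.0, 1.0, size=(25, 8))   # class 1: 25 objects, 8 variables
X2 = rng.normal(0.0, 1.0, size=(25, 8))   # class 2
X2[:, 0] += 2.0                           # variable 0 discriminates; it should rank first

w = fisher_weights(X1, X2)
print(np.argsort(w)[::-1])                # variables ranked by discriminating power
```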

Most supervised pattern recognition procedures permit stepwise selection, i.e. the selection first of the most important feature, then of the second most important, etc. One way to do this is by prediction using e.g. cross-validation (see next section): one first selects the variable that best classifies objects of known classification that are not part of the training set, then the variable that most improves the classification already obtained with the first selected variable, etc. For the linear discriminant analysis of the EU/HYPER classification of Section 33.2.1, a selectivity of 91.4% is obtained with all 5 variables or with 4 variables, and 88.6% with 3 or 2 variables [2]. Selectivity is used here as a measure of classification success; it is applied in the sense of Chapter ...
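A sketch of such stepwise (forward) selection by cross-validated prediction, using scikit-learn's SequentialFeatureSelector with an LDA classifier; the data are synthetic and do not reproduce the EU/HYPER example:

```python
# Sketch: forward stepwise feature selection for LDA, scored by cross-validation.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 10))             # 60 objects, 10 candidate variables
y = np.repeat([0, 1], 30)
X[y == 1, :2] += 1.0                      # only the first two variables carry class information

lda = LinearDiscriminantAnalysis()
# Add one variable at a time, each time keeping the variable that most
# improves the cross-validated classification of left-out objects.
sfs = SequentialFeatureSelector(lda, n_features_to_select=2,
                                direction="forward", cv=5)
sfs.fit(X, y)
print(sfs.get_support(indices=True))      # indices of the selected variables
```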

