
Feature selection

The objects (or events) of a database are characterized by a set of measurements (features). For pattern recognition purposes, the number of features should be as small as possible, for two reasons. [Pg.106]

First, features that are not relevant to the classification problem should be eliminated, because they may disturb the classification or at least increase the computational effort. For the same reason, correlations between features should be eliminated. [Pg.106]

Second, the number (n) of patterns must be much greater than the number (d) of features. Otherwise, a classifier may be found that separates even randomly selected classes of the training set. A ratio of n/d ≥ 3 is acceptable; n/d ≥ 10 is desirable (Chapter 1.6, details in Chapter 10.4). This second reason forces the user of pattern recognition methods into feature selection. In almost all chemical applications of pattern recognition the number of original raw features is too large, and a reduction of the dimensionality is necessary. [Pg.106]
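To illustrate the point about chance classification, here is a minimal sketch (synthetic data; NumPy and scikit-learn assumed, sample sizes arbitrary) showing that when d greatly exceeds n, a linear classifier can separate even randomly assigned classes perfectly on the training set.

```python
# Minimal sketch: with many more features (d) than patterns (n), even random
# class labels are separated perfectly on the training set. Data, classifier
# and sizes are illustrative assumptions, not taken from the source.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 20, 200                      # far too few patterns for this many features
X = rng.normal(size=(n, d))         # pure noise "measurements"
y = rng.integers(0, 2, size=n)      # randomly assigned, meaningless classes

clf = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)
print("training accuracy on random classes:", clf.score(X, y))   # typically 1.0
```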

The problem of feature selection is to find the optimum combination of features. Most methods only look for the best individual features. An exhaustive search [175, 388] for the best subset of features is only possible for very small data sets. [Pg.106]
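As an illustration of why exhaustive search scales so badly, the following sketch enumerates all subsets of a deliberately small feature set; the data set, classifier and cross-validation criterion are arbitrary choices. With m candidate features there are 2^m - 1 non-empty subsets to evaluate.

```python
# Sketch of an exhaustive search over all feature subsets; feasible only when
# the number of candidate features is very small (2^m - 1 subsets).
from itertools import combinations
import numpy as np
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
X = X[:, :8]                                    # restrict to 8 features: 255 subsets

best_score, best_subset = -np.inf, None
for k in range(1, X.shape[1] + 1):
    for subset in combinations(range(X.shape[1]), k):
        score = cross_val_score(LinearDiscriminantAnalysis(),
                                X[:, list(subset)], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print("best subset:", best_subset, "CV accuracy:", round(best_score, 3))
```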

Another strategy is the generation of a group of new features (e.g. linear combinations of the original features), as described in Chapter 9.5. Features which are essential for a classification are often called 'intrinsic features'. [Pg.106]
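A minimal sketch of this second strategy, using principal component analysis as one common way of forming linear combinations of the original features (the data set is an arbitrary example):

```python
# New features formed as linear combinations of the original ones (here: PCA scores).
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)
scores = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(X))
print(scores.shape)    # 178 objects now described by 3 latent features instead of 13
```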

Where more than one component is used, the values are for models of each dimensionality. The number of components used for this is given in brackets. [Pg.159]

LDA models may also be used to identify important variables, but here it should be remembered that discriminant functions are not unique solutions. Thus, the use of LDA for variable selection may be misleading. Whatever form of supervised learning method is used for the identification... [Pg.159]

This chapter has described some of the more commonly used supervised learning methods for the analysis of data: discriminant analysis and its relatives for classified dependent data, and variants of regression analysis for continuous dependent data. Supervised methods have the advantage that they produce predictions, but they have the disadvantage that they can suffer from chance effects. Careful selection of variables and of test/training sets, the use of more than one technique where possible, and the application of common sense will all help to ensure that the results obtained from supervised learning are useful. [Pg.160]

Dillon, W. R. and Goldstein, M. (1984). Multivariate Analysis: Methods and Applications, pp. 360-93. Wiley, New York. [Pg.160]

Edlund, U., Hellberg, S., and Gasteiger, J. (1984). Quantitative Structure-Activity Relationships, 3, 134-7. [Pg.160]


Feature selection, i.e. the selection of variables that are meaningful for the classification and the elimination of those that have no discriminating (or, for certain techniques, no modelling) power. This step is discussed further in Section 33.3. [Pg.207]

In Section 33.2.2 we showed how LDA classification can be described as a regression problem with class variables. As a regression model, LDA is subject to the problems described in Chapter 10. For instance, the number of variables should not exceed the number of objects. One solution is to apply feature selection or... [Pg.232]

In feature selection one selects from the m variables a subset that seems to be the most discriminating. Feature selection therefore constitutes a means of choosing sets of optimally discriminating variables; if these variables are the results of analytical tests, this amounts, in fact, to selecting an optimal combination of analytical tests or procedures. [Pg.236]
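A minimal sketch of this idea, scoring each variable by a between-class ANOVA F-test and keeping the k most discriminating ones (the scoring function, k, and the data set are illustrative assumptions, not the only possible choices):

```python
# Univariate feature selection: rank variables by their discriminating power
# (ANOVA F-test between classes) and retain the k best.
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_wine(return_X_y=True)
selector = SelectKBest(score_func=f_classif, k=4).fit(X, y)
print("most discriminating variables:", selector.get_support(indices=True))
```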

It should be stressed here that feature selection is not only a data manipulation operation, but may have economic consequences. For instance, one could decide on the basis of the results described above to reduce the number of different tests for a EU/HYPO discrimination problem to only two. A less straightforward problem with which the decision maker is confronted is to decide how many tests to carry out for a EU/HYPER discrimination. One loses some 3% in selectivity by eliminating one test. The decision maker must then compare the economic benefit of carrying out one test less with the loss contained in a somewhat smaller diagnostic success. In fact, he carries out a cost-benefit analysis. This is only one of the many instances where an analytical (or clinical) chemist may be confronted with such a situation. [Pg.237]

M. Forina, G. Drava and G. Contarini, Feature selection and validation of SIMCA models: a case study with a typical Italian cheese. Analusis, 21 (1993) 133-147. [Pg.241]

The high sensitivity of the sensors makes it possible to use low-capacity sensors of active particles that feature selective generation in the study of heterogeneous processes. Thus, we are in a position to eliminate the influence of gaseous phase on the surface properties and to study in succession the interaction between the surface and certain constituents of an excited gaseous phase. [Pg.342]

Johnson, K.J., Synovec, R.E. (2002). Pattern recognition of jet fuels: comprehensive GC×GC with ANOVA-based feature selection and principal component analysis. Chemom. Intell. Lab. Syst. 60, 225-237. [Pg.32]

Closely related to the creation of regression models by OLS is the problem of variable selection (feature selection). This topic is therefore presented in Section 4.5, although variable selection is also highly relevant for other regression methods and for classification. [Pg.119]

Leardi, R. J. Chemom. 8, 1994, 65-79. Application of a genetic algorithm for feature selection under full validation conditions and to outlier detection. [Pg.206]

Nadler, B., Coifman, R. R. J. Chemom. 19, 2005, 107-118. The prediction error in CLS and PLS: the importance of feature selection prior to multivariate calibration. [Pg.206]

Rosipal, R., Kramer, N. in Saunders, C., Grobelnik, M., Gunn, S. R., Shawe-Taylor, J. (Eds.), Subspace, Latent Structure and Feature Selection Techniques. Lecture Notes in Computer Science, Vol. 3940, Springer, Berlin, Germany, 2006, pp. 34-51. Overview and recent advances in partial least squares. [Pg.207]

Feature selection by genetic algorithms for mass spectral classifiers. [Pg.208]

D. Coomans, M.P. Derde, D.L. Massart and I. Broeckaert, Potential methods in pattern recognition. Part 3: Feature selection with ALLOC. Anal. Chim. Acta, 133, 241-250 (1981). [Pg.486]

Feature selection is the process by which the data or variables important for class assignment are determined. In this step of a pattern recognition study the various methods differ considerably. In the hyperplane methods, the strategy is to begin with a block of variables for the classes, calculate a classification function, and test it for classification of the training set. In this initial phase, generally many more variables are included than are necessary. Variables are then deleted in a stepwise process and a new rule is derived and tested. This process is repeated until a set of variables is obtained that will give an acceptable level of classification. [Pg.247]

This approach to feature selection leads to a set of descriptors that are optimal for class discrimination. These variables may or may not contain information that describes the classes. [Pg.247]
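The stepwise deletion strategy described above can be sketched as follows; the classifier, data set and acceptance threshold are illustrative choices, not the original procedure:

```python
# Backward stepwise deletion: start from all variables, repeatedly drop the
# variable whose removal hurts training-set classification least, re-derive
# the rule, and stop when accuracy falls below an acceptable level.
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_wine(return_X_y=True)
kept = list(range(X.shape[1]))
acceptable = 0.98                     # required training-set classification rate

while len(kept) > 1:
    trials = []
    for drop in kept:
        reduced = [v for v in kept if v != drop]
        rule = LinearDiscriminantAnalysis().fit(X[:, reduced], y)
        trials.append((rule.score(X[:, reduced], y), drop))
    best_score, best_drop = max(trials)      # least harmful deletion
    if best_score < acceptable:
        break
    kept.remove(best_drop)

print("variables retained:", kept)
```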

In SIMCA, a class modeling method, a parameter called modeling power is used as the basis of feature selection. This variable is defined in Equation 4, where is the standard deviation of a vari-... [Pg.247]
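The excerpt is truncated, so the exact form of Equation 4 is not reproduced here. For orientation, a commonly quoted textbook form of the SIMCA modelling power of variable j is sketched below; this reconstruction is an assumption and may differ in detail from the source's Equation 4.

```latex
% Commonly quoted form of the SIMCA modelling power (reconstruction, not Equation 4 verbatim):
% s_{j,\mathrm{resid}} is the residual standard deviation of variable j after
% fitting the class model, and s_{j,\mathrm{total}} is its total standard deviation.
MP_j \;=\; 1 \;-\; \frac{s_{j,\mathrm{resid}}}{s_{j,\mathrm{total}}}
```

On this definition, a variable with MP_j close to 1 is well described by the class model and is retained, while a variable with MP_j near 0 carries little modelling power and is a candidate for elimination.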

After the feature selection process has been carried out once by the SIMCA method, it is necessary to refine the model because the model may shift slightly. This refining of the model leads to an optimal set of descriptors with optimal mathematical structure. [Pg.247]

Electronic descriptors were calculated for the ab initio optimized (RHF/STO-3G) structures. In addition, log P as a measure of hydrophobicity and different topological indices were also calculated as additional descriptors. A nonlinear model was constructed using an ANN with back-propagation. A genetic algorithm (GA) was used as the feature selection method. The best ANN model was utilized to predict the log BB of 23 external molecules. The RMSE of the test set was only... [Pg.110]
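A compact sketch of GA-based wrapper feature selection in this spirit is given below. The synthetic data, the small scikit-learn MLP standing in for the back-propagation ANN, and all GA settings (population size, crossover, mutation rate) are illustrative assumptions, not the published workflow.

```python
# Genetic-algorithm feature selection: individuals are binary masks over the
# descriptors; fitness is the cross-validated R^2 of a small ANN trained on
# the selected descriptors.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 25))                          # 60 molecules, 25 descriptors (synthetic)
y = X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.1, size=60)

def fitness(mask):
    if mask.sum() == 0:
        return -np.inf
    ann = MLPRegressor(hidden_layer_sizes=(5,), max_iter=1000, random_state=0)
    return cross_val_score(ann, X[:, mask.astype(bool)], y, cv=3).mean()

pop_size, n_genes, n_generations = 12, X.shape[1], 8
population = rng.integers(0, 2, size=(pop_size, n_genes))

for _ in range(n_generations):
    scores = np.array([fitness(ind) for ind in population])
    parents = population[np.argsort(scores)[::-1][: pop_size // 2]]   # truncation selection
    children = []
    while len(children) < pop_size - len(parents):
        i, j = rng.choice(len(parents), size=2, replace=False)
        cut = rng.integers(1, n_genes)                                 # one-point crossover
        child = np.concatenate([parents[i][:cut], parents[j][cut:]])
        flip = rng.random(n_genes) < 0.05                              # bit-flip mutation
        child[flip] = 1 - child[flip]
        children.append(child)
    population = np.vstack([parents, np.array(children)])

best = population[np.argmax([fitness(ind) for ind in population])]
print("selected descriptor indices:", np.where(best == 1)[0])
```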

Recursive partitioning is a feature selection method. As such it shares the deficiencies of other feature selection methods such as stepwise or subset regression. The major deficiencies are as follows ... [Pg.324]

Once we have generated the forest of trees, the next question is what to do with the information it contains. To explore this, recall that the analysis has two possible purposes: prediction and feature selection. Prediction is the problem that arises if we have a new compound whose activity is unknown and we wish to predict its activity on the basis of the relationships seen in the calibration sample. A good way to use the forest to make such a prediction is 'bagging'. [Pg.325]
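A minimal sketch of bagging as a way of turning a forest of trees into a prediction; the data set, number of trees, and tree depth are illustrative assumptions:

```python
# Bagging: fit each tree to a bootstrap resample of the calibration set and
# predict a new compound by majority vote over the forest.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(0)

forest = []
for _ in range(50):
    idx = rng.integers(0, len(y), size=len(y))             # bootstrap resample
    forest.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))

x_new = X[:1]                                              # stand-in for a new compound
votes = np.array([tree.predict(x_new)[0] for tree in forest])
print("bagged prediction (majority vote):", np.bincount(votes).argmax())
```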



Related entries



Chemometrics feature selection

Embedded methods, feature selection

Feature Selection and Extraction

Feature Selection and Ranking

Feature Selection by Using Data Statistics

Feature Selection with the Learning Machine

Feature selection Subject

Feature selection and reduction

Feature selection comparison

Feature selection in wavelet domain

Feature selection supervised learning

Feature selection with latent variable methods

Features of Selected Safety Net Programs

Key Features of Selected Online Submission Systems

Mass Spectroscopic Features of Selected Substance Classes

Multivariate data feature selection

Partitioning feature selection

Pattern recognition methods feature selection

Selective heterogeneous catalysts features

Selectivity feature

Unit-selection synthesis, features

Unit-selection synthesis, features base types

Unit-selection synthesis, features feature types

Variance feature selection
