Pattern recognition training sets

Discriminant analysis (Figure 31) [41, 487, 577-581] separates objects with different properties, e.g. active and inactive compounds, by deriving a linear combination of some other features (e.g. of different physicochemical properties) that leads to the best separation of the individual classes. Discriminant analysis is also appropriate for semiquantitative data and for data sets where activities are only characterized in qualitative terms. As in pattern recognition, training sets are used to derive a model, and its stability and predictive ability are checked with the help of different test sets. [Pg.100]
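As a rough illustration of deriving a linear combination from a training set and checking it on a test set, here is a minimal sketch of a two-class Fisher discriminant in plain Python. The two features, all data values, and the function names (`fit_lda`, `predict`) are assumptions made up for this example; they do not come from the cited work.

```python
# Two-class Fisher linear discriminant for two features, standard library only.
# All numbers below are invented for illustration.

def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def pooled_scatter(groups):
    # Within-class scatter matrix S_w (2x2), summed over both classes
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in groups:
        m = mean_vec(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    return s

def fit_lda(active, inactive):
    m1, m0 = mean_vec(active), mean_vec(inactive)
    s = pooled_scatter([active, inactive])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Discriminant direction w = S_w^{-1} (m1 - m0)
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold: projection of the midpoint between the two class means
    mid = [(m1[0] + m0[0]) / 2, (m1[1] + m0[1]) / 2]
    c = w[0] * mid[0] + w[1] * mid[1]
    return w, c

def predict(w, c, x):
    return "active" if w[0] * x[0] + w[1] * x[1] > c else "inactive"

# Hypothetical training set: two physicochemical properties per compound
train_active = [(2.0, 3.1), (2.5, 3.6), (3.0, 3.0), (2.8, 4.0)]
train_inactive = [(0.5, 1.0), (1.0, 0.8), (0.8, 1.6), (1.2, 1.2)]
# Hypothetical test set, kept out of the model fit
test_set = [((2.7, 3.4), "active"), ((0.9, 1.1), "inactive")]

w, c = fit_lda(train_active, train_inactive)
hits = sum(predict(w, c, x) == label for x, label in test_set)
accuracy = hits / len(test_set)
print("weights:", w, "threshold:", c, "test accuracy:", accuracy)
```

The midpoint threshold assumes equal class priors and equal misclassification costs; a real study would choose it to suit the data.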

Most of the supervised pattern recognition procedures permit stepwise selection, i.e. the selection first of the most important feature, then of the second most important, etc. One way to do this is by prediction using e.g. cross-validation (see next section), i.e. we first select the variable that best classifies objects of known classification that are not part of the training set, then the variable that most improves the classification already obtained with the first selected variable, etc. For the linear discriminant analysis of the EU/HYPER classification of Section 33.2.1, a selectivity of 91.4% is obtained with all 5 or 4 variables, and 88.6% with 3 or 2 variables [2]. Selectivity is used here as a measure of classification success. It is applied in the sense of Chapter... [Pg.236]
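The stepwise procedure can be sketched as forward selection scored by leave-one-out cross-validation. This is only an illustration: a nearest-centroid rule stands in for linear discriminant analysis to keep the code short, and the data set and function names are hypothetical, not the EU/HYPER data.

```python
# Forward stepwise variable selection scored by leave-one-out cross-validation.
# Assumption: a nearest-centroid classifier replaces LDA for brevity.

def nearest_centroid_predict(train, x, features):
    # train: list of (feature_vector, label); only the chosen indices are used
    sums, counts = {}, {}
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += vec[f]
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((x[f] - acc[k] / counts[label]) ** 2
                   for k, f in enumerate(features))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(data, features):
    # Each object is classified by a model built without it
    hits = 0
    for i, (vec, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += int(nearest_centroid_predict(train, vec, features) == label)
    return hits / len(data)

def forward_select(data, n_features, max_selected):
    selected, history = [], []
    while len(selected) < max_selected:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(loo_accuracy(data, selected + [f]), f) for f in candidates]
        # Highest accuracy wins; ties go to the lower variable index
        best_score, best_f = max(scored, key=lambda t: (t[0], -t[1]))
        selected.append(best_f)
        history.append((best_f, best_score))
    return history

# Invented data: variable 0 separates the classes, the others mostly do not
data = [
    ((1.0, 5.0, 2.0), "A"), ((1.2, 1.0, 2.5), "A"),
    ((0.8, 3.0, 1.5), "A"), ((1.1, 4.0, 3.5), "A"),
    ((3.0, 2.0, 3.0), "B"), ((3.2, 5.0, 4.0), "B"),
    ((2.9, 1.0, 2.4), "B"), ((3.1, 4.0, 3.8), "B"),
]

history = forward_select(data, n_features=3, max_selected=2)
for variable, score in history:
    print("added variable", variable, "cross-validated accuracy", score)
```

Scoring each candidate on objects left out of the fit is what distinguishes this from selecting on training-set performance alone.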

Feature selection is the process by which the data or variables important for class assignment are determined. In this step of a pattern recognition study the various methods differ considerably. In the hyperplane methods, the strategy is to begin with a block of variables for the classes, calculate a classification function, and test it for classification of the training set. In this initial phase, generally many more variables are included than are necessary. Variables are then deleted in a stepwise process and a new rule is derived and tested. This process is repeated until a set of variables is obtained that will give an acceptable level of classification. [Pg.247]
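The loop described here (start with many variables, derive a rule, test it on the training set, drop a variable, repeat) might look like the following sketch. A two-class nearest-centroid rule is used as a stand-in because its decision boundary is a hyperplane; the data, the acceptance level `min_accuracy`, and the function names are assumptions for illustration.

```python
# Backward stepwise elimination, scored by classification of the training set.
# Assumption: nearest-centroid rule as a simple hyperplane classifier.

def centroid(rows, features):
    return [sum(r[f] for r in rows) / len(rows) for f in features]

def training_accuracy(data, features):
    labels = sorted({label for _, label in data})
    cents = {lab: centroid([v for v, l in data if l == lab], features)
             for lab in labels}
    hits = 0
    for vec, label in data:
        pred = min(labels, key=lambda lab: sum(
            (vec[f] - c) ** 2 for f, c in zip(features, cents[lab])))
        hits += int(pred == label)
    return hits / len(data)

def backward_eliminate(data, n_features, min_accuracy):
    features = list(range(n_features))
    while len(features) > 1:
        # Try deleting each variable in turn and re-derive the rule
        trials = [(training_accuracy(data, [g for g in features if g != f]), f)
                  for f in features]
        best_acc, drop = max(trials, key=lambda t: (t[0], t[1]))
        if best_acc < min_accuracy:
            break  # any further deletion gives unacceptable classification
        features.remove(drop)
    return features

# Invented data: only variable 0 carries class information
data = [
    ((1.0, 7.0, 0.3), "A"), ((1.5, 2.0, 0.9), "A"), ((2.0, 9.0, 0.1), "A"),
    ((4.0, 3.0, 0.8), "B"), ((4.5, 8.0, 0.2), "B"), ((5.0, 1.0, 0.6), "B"),
]

kept = backward_eliminate(data, n_features=3, min_accuracy=1.0)
print("variables kept:", kept)
```

Because the rule is tested on the same objects used to derive it, the accuracy is optimistic; the sketch follows the passage in that respect rather than recommending it.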

Fig. 5. A plot of the two largest principal components of the training set developed from the 312 compounds and 15 molecular descriptors identified by the pattern-recognition GA. The plane defined by the two largest principal components accounts for 35% of the total cumulative variance. Circles are the musks; inverted triangles are the nonmusks; M = musks from the prediction set projected onto the principal component plot; N = nonmusks from the prediction set projected onto the principal component plot.

Perform unsupervised pattern recognition on the entire training set to see if the classes are overlapped (PCA and/or HCA). [Pg.75]

Supervised pattern recognition methods are used for predicting the class of unknown samples given a training set of samples with known class membership. Two methods are discussed in Section 4.3, KNN and SIMCA,... [Pg.95]
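To make the idea of predicting class membership from a training set concrete, here is a minimal k-nearest-neighbour (KNN) sketch in plain Python. The training patterns, the choice k = 3, and the function name `knn_predict` are assumptions for illustration; SIMCA is not shown.

```python
import math
from collections import Counter

def knn_predict(training_set, x, k=3):
    # training_set: list of (pattern, class_label) with known membership
    neighbors = sorted(training_set,
                       key=lambda item: math.dist(item[0], x))[:k]
    # Majority vote among the k closest training samples
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Invented training set with two measurements per sample
training_set = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"),
    ((3.0, 3.2), "B"), ((3.1, 2.9), "B"), ((2.8, 3.0), "B"),
]

unknown_1 = knn_predict(training_set, (1.1, 1.0))
unknown_2 = knn_predict(training_set, (2.9, 3.1))
print(unknown_1, unknown_2)
```

Euclidean distance is used here, so in practice the measurements would normally be scaled first so that no single variable dominates the distance.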

The set of samples for which the property of interest and measurements are known is called the training set, whereas the set of measurements that describe each sample in the data set is called a pattern. The determination of the property of interest by assigning a sample to its respective class is called recognition, hence the term pattern recognition. ... [Pg.340]

The choice of the training set is important in any pattern-recognition study. Each class must be well represented in the training set. Experimental variables must be controlled or otherwise accounted for by the selection of suitable samples that take into account all sources of variability in the data, for example, lot-to-lot variability. Experimental artifacts such as instrumental drift or sloping baseline must be minimized. Features containing information about differences in the source profile of each class must be present in the data. Otherwise, the classifier is likely to discover rules that do not work well on test samples, i.e., samples that are not part of the original data. [Pg.354]
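One simple way to keep every class represented when partitioning data into training and test sets is a stratified split, sketched below. This addresses only class representation, not the other sources of variability mentioned; the sample counts, `test_fraction`, `seed`, and the function name are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_split(samples, test_fraction=0.25, seed=0):
    # samples: list of (pattern, class_label)
    # Assumes every class has at least two samples
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)
    rng = random.Random(seed)
    train, test = [], []
    for members in by_class.values():
        members = members[:]          # copy before shuffling
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_fraction))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Invented data set: 8 samples of class "A" and 4 of class "B"
samples = [((float(i),), "A") for i in range(8)] + \
          [((float(i),), "B") for i in range(4)]

train, test = stratified_split(samples)
train_counts = Counter(label for _, label in train)
test_counts = Counter(label for _, label in test)
print("training:", dict(train_counts), "test:", dict(test_counts))
```

Splitting within each class preserves the class proportions in both sets, which a purely random split of a small data set does not guarantee.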

The SCS was subsequently applied to the same data set to provide a robust method of classification based more on overall pattern recognition than on intensity ratios. Before commencing the multivariate analysis, the spectra were partitioned into a training set (33 BPH, 13 tumours) and a test set. Preprocessing of the data using ORS GA (4) selected six optimally discriminatory regions,... [Pg.93]

Big Chemical Encyclopedia
