Pattern recognition prediction

Increased trust in pattern recognition. The active user involvement in the data mining process can lead to a deeper understanding of the data and increase the trust in the resulting patterns. In contrast, "black box" systems often lead to a higher uncertainty, because the user usually does not know, in detail, what happened during the data analysis process. This may lead to a more difficult data interpretation and/or model prediction. [Pg.475]

Often the goal of a data analysis problem requires more than simple classification of samples into known categories. It is very often desirable to have a means to detect outliers and to derive an estimate of the level of confidence in a classification result. These are things that go beyond strictly nonparametric pattern recognition procedures. Also of interest is the ability to empirically model each category so that it is possible to make quantitative correlations and predictions with external continuous properties. As a result, a modeling and classification method called SIMCA has been developed to provide these capabilities (29-31). [Pg.425]

Most of the supervised pattern recognition procedures permit stepwise selection, i.e. the selection first of the most important feature, then of the second most important, etc. One way to do this is by prediction using e.g. cross-validation (see next section), i.e. we first select the variable that best classifies objects of known classification that are not part of the training set, then the variable that most improves the classification already obtained with the first selected variable, etc. The result for the linear discriminant analysis of the EU/HYPER classification of Section 33.2.1 is that with all 5 or 4 variables a selectivity of 91.4% is obtained and for 3 or 2 variables 88.6% [2]. As a measure of classification success, selectivity is used here. It is applied in the sense of Chapter... [Pg.236]
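The stepwise procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a simple nearest-centroid classifier stands in for linear discriminant analysis, leave-one-out cross-validation supplies the prediction on objects outside the training set, and the data in `X` and `y` are made-up toy values, not the EU/HYPER data. The function names are hypothetical.

```python
from statistics import mean

def nearest_centroid_predict(train_X, train_y, x, features):
    """Predict the class whose centroid (on the chosen features) is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [mean(r[f] for r in rows) for f in features]
        dist = sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, features):
    """Leave-one-out estimate: each object is predicted by a model built without it."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i], features) == y[i]
    return hits / len(X)

def forward_select(X, y, n_features):
    """Greedily add the variable that most improves the cross-validated accuracy."""
    selected, history = [], []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_features:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        # Ties are broken in favour of the lower variable index.
        best_score, best_f = max(scored, key=lambda t: (t[0], -t[1]))
        selected.append(best_f)
        remaining.remove(best_f)
        history.append((best_f, best_score))
    return history

# Toy data (assumed for illustration): variable 0 separates the classes,
# variable 1 is noise, variable 2 separates them only partly.
X = [
    [1.0, 5.0, 0.2], [1.2, 1.0, 0.1], [0.8, 3.0, 0.3], [1.1, 4.0, 0.2],
    [3.0, 4.5, 0.4], [3.2, 1.5, 0.5], [2.9, 3.5, 0.3], [3.1, 2.0, 0.6],
]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

history = forward_select(X, y, n_features=2)
for variable, score in history:
    print(f"added variable {variable}: cross-validated accuracy {score:.3f}")
```

Each entry of `history` records which variable was added and the cross-validated accuracy reached at that step, so the point at which extra variables stop helping can be read off directly.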

In the following sections we propose typical methods of unsupervised learning and pattern recognition, the aim of which is to detect patterns in chemical, physicochemical and biological data, rather than to make predictions of biological activity. These inductive methods are useful in generating hypotheses and models which are to be verified (or falsified) by statistical inference. Cluster analysis has... [Pg.397]

While principal components models are used mostly in an unsupervised or exploratory mode, models based on canonical variates are often applied in a supervisory way for the prediction of biological activities from chemical, physicochemical or other biological parameters. In this section we discuss briefly the methods of linear discriminant analysis (LDA) and canonical correlation analysis (CCA). Although there has been an early awareness of these methods in QSAR [7,50], they have not been widely accepted. More recently they have been superseded by the successful introduction of partial least squares analysis (PLS) in QSAR. Nevertheless, the early pattern recognition techniques have prepared the minds for the introduction of modern chemometric approaches. [Pg.408]

As a result of machine learning a model is produced of the characteristic exhibition of a property (for instance, the formation of a particular type of chemical compound) which corresponds to a distribution pattern of this property in the multidimensional representative space of the properties of the elements. The subsequent pattern recognition corresponds to a criterion for the classification of the known compounds and for the prediction of those still unknown. Examples of this approach reported by Savitskii are the prediction of the formation of Laves phases, of CaCu5 type phases, of compounds XY2Z4 (X, Y any of the elements, Z = O, S, Se, Te), etc. (Data on the electronic structures of the components were selected as... [Pg.308]

An emerging type of resistance testing known as virtual phenotype predicts phenotypic resistance from the genotype by using pattern recognition that is applied to large relational databases of genotypes and phenotypes. [Pg.463]

Pattern recognition methods have been used for the description of air pollution in the industrialized region at the estuary of the river Rhine near Rotterdam. A selection of about eight chemical and physical-meteorological features offers a possibility for a description that accounts for about 70% of the information that is comprised in these features with two parameters only. Prediction of noxious air situations sometimes succeeds for a period of at most four hours in advance. Sometimes, however, no prediction can be made. Investigations pertaining to the correlation between air composition and complaints about bad smell by inhabitants of the area show that, apart from physical and chemical descriptors, other features are also involved that depend on human perception and behaviour. [Pg.93]

Pattern recognition offers a useful tool for the description of air pollution in industrialized areas. Depending on the weather conditions, sometimes even a prediction of situations with bad-smelling air may be obtained. However, when the weather conditions are unstable, no valid prediction is possible. Apart from physical, meteorological and chemical features, other factors must be accounted for to predict the burden felt by people living in the area. [Pg.105]

Similarly, quantitative structure-metabolism relationships (QSMR) have been studied [42]. QSAR tools, such as pattern recognition analysis, have been used, e.g., to predict phase II conjugation of substituted benzoic acids in the rat [53]. [Pg.138]

This paper subscribes to the third viewpoint, and is based on an empirical approach that involves coordination number pattern recognition (CNPR). It is a simplistic approach, yet it apparently accommodates most if not all carborane and borane structures. For compounds that are still controversial and for compounds that have not yet been discovered or characterized, the CNPR thesis frequently predicts different structures, or at least fewer candidates, than do any of the theoretical treatments. [Pg.68]

A prediction set of 19 compounds (see Table 2) was used to assess the predictive ability of the 15 molecular descriptors identified by the pattern recognition GA. We chose to map the 19 compounds directly onto the principal component plot defined by the 312 compounds and 15 descriptors. Figure 5 shows the prediction set samples projected onto the principal component map. Each projected compound lies in a region of the map with compounds that bear the same class label. Evidently, the pattern-recognition GA can identify molecular descriptors that are correlated to musk odor quality. [Pg.419]

Fig. 5. A plot of the two largest principal components of the training set developed from the 312 compounds and 15 molecular descriptors identified by the pattern-recognition GA. The plane defined by the two largest principal components accounts for 35% of the total cumulative variance. Circles are the musks; inverted triangles are the nonmusks; M = musks from the prediction set projected onto the principal component plot; N = nonmusks from the prediction set projected onto the principal component plot.
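Projecting a prediction set onto a principal component map built from the training set alone can be illustrated with the following minimal sketch. It makes several assumptions: the data are only mean-centered (the excerpt does not say which preprocessing was used), the two leading components are found by simple power iteration with deflation rather than by a library routine, and the numbers in `train` and `prediction_set` are invented toy values, not the musk data. All function names are hypothetical.

```python
import math

def column_means(X):
    return [sum(col) / len(X) for col in zip(*X)]

def center(X, means):
    return [[v - m for v, m in zip(row, means)] for row in X]

def covariance(Xc):
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
            for i in range(p)]

def mat_vec(C, v):
    return [sum(c * x for c, x in zip(row, v)) for row in C]

def leading_eigenpair(C, iters=500):
    """Power iteration; assumes the start vector is not orthogonal to the target."""
    p = len(C)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = mat_vec(C, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, mat_vec(C, v)))
    return eigval, v

def fit_pca(train, n_components=2):
    """Return the training means and the leading principal component loadings."""
    means = column_means(train)
    C = covariance(center(train, means))
    components = []
    for _ in range(n_components):
        eigval, v = leading_eigenpair(C)
        components.append(v)
        # Deflation: remove the variance already explained by this component.
        C = [[C[i][j] - eigval * v[i] * v[j] for j in range(len(v))]
             for i in range(len(v))]
    return means, components

def project(samples, means, components):
    """Scores of samples on a map defined by the training means and loadings."""
    return [[sum(a * b for a, b in zip(row, comp)) for comp in components]
            for row in center(samples, means)]

# Toy descriptor values (assumed for illustration only).
train = [[2.0, 1.9, 0.5], [1.0, 1.1, 0.4], [3.0, 3.2, 0.6],
         [4.0, 3.9, 0.5], [5.0, 5.1, 0.4]]
prediction_set = [[2.5, 2.4, 0.5], [4.5, 4.6, 0.4]]

means, components = fit_pca(train)
train_scores = project(train, means, components)
prediction_scores = project(prediction_set, means, components)
print(prediction_scores)
```

The key design point is that the prediction set never enters `fit_pca`: it is centered with the training means and multiplied by the training loadings, so where its points fall on the map is a genuine test of the descriptors.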

In contrast to unsupervised methods, supervised pattern-recognition methods (Section 4.3) use class membership information in the calculations. The goal of these methods is to construct models using analytical measurements to predict class membership of future samples. Class location and sometimes shape are used in the calibration step to construct the models. In prediction, these models are applied to the analytical measurements of unknown samples to predict class membership. [Pg.36]

Supervised pattern recognition methods are used for predicting the class of unknown samples given a training set of samples with known class membership. Two methods are discussed in Section 4.3, KNN and SIMCA,... [Pg.95]
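As a rough illustration of the first of these methods, a k-nearest-neighbour (KNN) classifier assigns an unknown sample to the class most common among its k closest training samples. The sketch below assumes Euclidean distance and k = 3; the measurements and class labels are invented toy values, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    distances = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy training set with known class membership (assumed for illustration).
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
           [3.0, 3.0], [3.2, 2.9], [2.8, 3.1]]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, [1.1, 1.0]))  # nearest neighbours are all class A
print(knn_predict(train_X, train_y, [3.0, 2.8]))  # nearest neighbours are all class B
```

An odd k avoids tied votes in a two-class problem; with more classes a tie-breaking rule would have to be chosen explicitly.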

Habits 5 and 6 are not described because PCA is not used in this section as a predictive tool. The supervised pattern-recognition technique, SIMCA, uses PCA for class prediction and the details of Habits 5 and 6 for SIMCA are presented in Section 4.3.2.1. [Pg.233]

Dias and coworkers utilized an array of potentiometric sensors for the classification of honey samples from different Portuguese regions with respect to the predominant pollen type (Erica, Echium, Lavandula). PCA and LDA were employed for the pattern recognition (see Fig. 2.25), after having verified that the variables followed a normal distribution. Cross-validation was applied for evaluating the classification rules, obtaining satisfactory prediction abilities for two classes (about 80%) and poor results for the third one (about 50%) (Dias et al., 2008). [Pg.106]
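Per-class prediction abilities of the kind quoted above are simply the fraction of cross-validated samples of each class that were assigned to their true class. A minimal sketch follows; the label lists are invented placeholders and do not reproduce the honey results, and the function name is hypothetical.

```python
from collections import defaultdict

def per_class_prediction_ability(true_labels, predicted_labels):
    """Fraction of samples of each true class that were predicted correctly."""
    totals, hits = defaultdict(int), defaultdict(int)
    for truth, predicted in zip(true_labels, predicted_labels):
        totals[truth] += 1
        hits[truth] += truth == predicted
    return {label: hits[label] / totals[label] for label in totals}

# Placeholder cross-validation outcome (assumed for illustration only).
true_labels      = ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C"]
predicted_labels = ["A", "A", "A", "B", "B", "B", "B", "B", "C", "A"]

abilities = per_class_prediction_ability(true_labels, predicted_labels)
print(abilities)
```

Reporting the rate per class, rather than one overall figure, is what exposes a weak class that a good overall accuracy would hide.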
