Supervised method

If the membership of objects to particular clusters is known in advance, the methods of supervised pattern recognition can be used. In this section, the following methods are explained: the linear learning machine (LLM), discriminant analysis, k-nearest neighbors (k-NN), the soft independent modeling of class analogies (SIMCA) method, and support vector machines (SVMs). [Pg.184]

Supervised methods for recognizing patterns can also be based on multivariate modeling methods, for example, by use of PLS as discussed in Section 6.2.2. The method is termed discriminant analysis-partial least squares (DA-PLS), where the input feature data form the X matrix and the assignment to a class is described in the Y matrix. To avoid imposing a ranking on the classes, class membership is not coded in a single classification vector (for example, classes 1-6) but is described column-wise by ones and zeros in the Y matrix. [Pg.184]
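As a concrete illustration of the column-wise coding described above, the following minimal Python sketch builds such a Y matrix from a vector of class labels; the labels and variable names are hypothetical and not taken from the source.

```python
import numpy as np

# Hypothetical class labels for six samples drawn from three classes.
labels = np.array([1, 2, 3, 1, 2, 3])
classes = np.unique(labels)                # -> array([1, 2, 3])

# Column-wise 1/0 coding: one column per class, so no ranking is implied.
Y = (labels[:, None] == classes[None, :]).astype(float)
print(Y)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```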

The first analytical application of a pattern recognition method dates back to 1969, when classification of mass spectra with respect to certain molecular mass classes was attempted with the LLM. The basis for classification with the LLM is a discriminant function that divides the n-dimensional space into category regions that can then be used to predict the category membership of a test sample. [Pg.184]

To find a decision boundary that separates the two groups, the data vectors have to be augmented by adding an (n + 1)th component equal to 1.0. This ensures that the boundary separating the classes passes through the origin of the augmented space. If more than two categories... [Pg.184]

(Table excerpt: hair sample, iodine content (ppm), augmented component.) [Pg.185]
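The following Python sketch illustrates the idea of a linear learning machine on augmented data vectors. It uses a simple perceptron-style error-correction update rather than the classical weight-vector reflection, and all function names, labels, and parameters are illustrative assumptions rather than material from the source.

```python
import numpy as np

def train_llm(X, y, n_epochs=200):
    """Train a two-class linear learning machine (perceptron-style).

    X : (n_samples, n_features) feature matrix
    y : class labels coded as +1 / -1
    """
    # Augment each pattern with an (n + 1)th component equal to 1.0 so that
    # the separating hyperplane w.x = 0 passes through the origin of the
    # augmented space while still allowing an offset in the original space.
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xa.shape[1])
    for _ in range(n_epochs):
        corrected = False
        for xi, yi in zip(Xa, y):
            if yi * (w @ xi) <= 0:       # pattern on the wrong side
                w = w + yi * xi          # error-correction step
                corrected = True
        if not corrected:                # all training patterns separated
            break
    return w

def classify(w, x_new):
    """The sign of the discriminant function decides the category."""
    return 1 if w @ np.append(x_new, 1.0) > 0 else -1
```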

In complex systems where the number of groups to be separated during classification becomes larger, the performance of simple unsupervised methods (Section 3) degrades, requiring the use of more sophisticated supervised chemometric techniques. Additionally, in fields such as process NMR where there is a need for quantifying a component, the use of supervised methods becomes necessary. The different supervised methods described in the sections below have all been utilized in the chemometric analysis of NMR data for classification and/or quantitation. Examples utilizing these different techniques are discussed in Section 5. [Pg.60]

Principal components regression (PCR) is one of the supervised methods commonly employed to analyze NMR data. This method is typically used for developing a quantitative model. In simple terms, PCR can be thought of as PCA followed by a regression step. In PCR, the scores matrix (T) obtained in PCA (Section 3.1) is related to an external variable in a least squares sense. Recall that the data matrix can be reconstructed or estimated using a limited number of factors (N_fact), such that only the first k = 1, ..., N_fact PCA loadings (l_k) are required to describe the data matrix. Eq. (15) can then be rewritten as [Pg.61]
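The truncated-factor form referred to above is not reproduced in this excerpt; a minimal reconstruction from the surrounding definitions (scores T, loadings l_k, N_fact retained factors) would read as follows, with the exact notation of Eq. (15) being an assumption:

```latex
\mathbf{D} \;\approx\; \mathbf{T}\,\mathbf{L}^{\mathsf{T}}
          \;=\; \sum_{k=1}^{N_{\mathrm{fact}}} \mathbf{t}_{k}\,\mathbf{l}_{k}^{\mathsf{T}}
```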

In PCR, a least squares solution can be found that relates the vector containing the values of an external variable (C), with dimensions n × 1 where n is the number of samples, to the scores matrix T. [Pg.61]

The vector C can be any parameter that is appropriate to model through D. Examples of C include species concentration, time of data collection, length of hydrocarbon chain, or other performance parameters. The estimated relation between C and T is called the regression vector, B. Using a calibration set of data in which the concentration is known for each NMR spectrum, B is estimated from the T matrix obtained in the PCA step. [Pg.61]
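The least squares relation and the resulting estimate of the regression vector are not shown in this excerpt; written in the notation used here, they take the standard form below (an assumed reconstruction, not copied from the source):

```latex
\mathbf{C} \;\approx\; \mathbf{T}\,\mathbf{B},
\qquad
\hat{\mathbf{B}} \;=\; \left(\mathbf{T}^{\mathsf{T}}\mathbf{T}\right)^{-1}
                        \mathbf{T}^{\mathsf{T}}\mathbf{C}
```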

Once B is estimated using calibration data, the value C_unknown for an unknown data vector (spectrum) can be estimated from the scores of that unknown spectrum. [Pg.61]
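Putting the two steps together, the sketch below implements a minimal PCR calibration and prediction in Python with NumPy only; the function and variable names (pcr_fit, pcr_predict, n_fact) are illustrative assumptions, not identifiers from the source.

```python
import numpy as np

def pcr_fit(D, c, n_fact):
    """Calibrate a PCR model: PCA on the mean-centred data matrix D,
    then least-squares regression of the scores T onto the property c."""
    d_mean = D.mean(axis=0)
    U, S, Vt = np.linalg.svd(D - d_mean, full_matrices=False)
    T = (U * S)[:, :n_fact]          # scores for the first n_fact factors
    L = Vt[:n_fact]                  # corresponding loadings (rows)
    B, *_ = np.linalg.lstsq(T, c, rcond=None)   # regression vector
    return d_mean, L, B

def pcr_predict(spectrum, d_mean, L, B):
    """Estimate c_unknown = t_unknown . B from the scores of an
    unknown spectrum projected onto the calibration loadings."""
    t_unknown = (spectrum - d_mean) @ L.T
    return t_unknown @ B
```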

For high-throughput profiling applications under discussion, the most interesting question is whether the molecular profiles can provide predictive information... [Pg.422]

For example, for discrimination of cancer versus healthy, the space of y contains two possible values. The components of x are the quantitative values of the peptide ions. [Pg.424]

In the multivariate case, the challenge is to select a small number of peptides as a signature for classification. In this manner, prior information can be incorporated, which is crucial for interpretation as well as for constructing a classifier. [Pg.424]
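One simple way to arrive at such a signature is to rank the peptide ions by a univariate group-separation statistic and keep only the top few. The sketch below is a generic illustration of this idea, not the procedure used in the source, and all names and parameters are assumptions.

```python
import numpy as np

def select_signature(X, y, n_peptides=10):
    """Rank peptide ions by a two-sample t-like statistic and return the
    indices of the n_peptides best-separating ions.

    X : (n_samples, n_ions) quantitative peptide-ion values
    y : 0/1 class labels (e.g., healthy vs. cancer)
    """
    g0, g1 = X[y == 0], X[y == 1]
    t_stat = np.abs(g0.mean(axis=0) - g1.mean(axis=0)) / np.sqrt(
        g0.var(axis=0) / len(g0) + g1.var(axis=0) / len(g1) + 1e-12)
    return np.argsort(t_stat)[::-1][:n_peptides]
```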


Like the unsupervised methods, the supervised methods discussed in this book are based on the assumption that samples that are chemically or physically similar will be near each other in measurement (row) space. [Pg.61]

The conclusion drawn from this analysis is that classes A and C are separated, while class B may overlap with classes A and/or C. For illustrative purposes, assume that additional information is available that confirms our assertion that the unusual class B samples are actually mislabeled class A samples. These unusual class B samples will henceforth be labeled as belonging to class A. Caution: do not make class assignments based solely on the score plots. Known class information drives the supervised methods and, therefore, it is very important that the class designations... [Pg.78]

Two supervised methods are examined in this chapter, KNN and SIMCA. KNN models are constructed using the physical closeness of samples in space. It is a simple method that does not rely on many assumptions about the data. SIMCA models are based on more assumptions and define classes using the position and shape of the object formed by the samples in row space. A multidimensional box is constructed for each class using PCA, and the classification of future samples is performed by determining within which box the sample belongs. [Pg.274]
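A minimal KNN classifier of the kind described above can be written in a few lines of Python. The sketch below uses Euclidean distance in row space and majority voting, with all names being illustrative; a corresponding SIMCA sketch would additionally require a separate PCA model per class and is omitted here.

```python
import numpy as np

def knn_classify(X_train, y_train, x_new, k=3):
    """Assign x_new to the class most common among its k nearest
    training samples (Euclidean distance in row space)."""
    dist = np.linalg.norm(X_train - x_new, axis=1)
    nearest = y_train[np.argsort(dist)[:k]]
    classes, counts = np.unique(nearest, return_counts=True)
    return classes[np.argmax(counts)]
```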

Classification methods, that is, methods for dividing data into groups, can be broadly of two types: supervised and unsupervised. The primary difference is that, for supervised methods, prior information about the classes into which the data fall is known and representative samples from these classes are available. The supervised and unsupervised approaches loosely lend themselves to problems that have prior hypotheses and to those in which discovery of the classes of data may be needed, respectively. The division is purely for organizational purposes; in many applications, a combination of both methods can be very powerful. In general, biomedical data analysis will require multiple spectral features and will have stochastic variations. Hence, the field of statistical pattern recognition [88] is of primary importance, and we use the term recognition in our descriptions of learning and classification methods below. [Pg.191]

Supervised methods rely on some prior training of the system with objects known to belong to the class they define. Such methods can be of the discriminant or modeling types [11]. Discriminant methods split the pattern space into as many regions as there are classes encompassed by the training set and establish boundaries that are shared by these regions. These methods always assign an unknown sample to a specific class. The most common discriminant methods include discriminant analysis (DA) [12], the K-nearest neighbor... [Pg.366]

Reasonable noise in the spectral data does not affect the clustering process. In this respect, cluster analysis is much more stable than other methods of multivariate analysis, such as principal component analysis (PCA), in which an increasing amount of noise is accumulated in the less relevant clusters. The mean cluster spectra can be extracted and used for the interpretation of the chemical or biochemical differences between clusters. HCA, per se, is ill-suited for a diagnostic algorithm. We have used the spectra from clusters to train artificial neural networks (ANNs), which may serve as supervised methods for final analysis. This process, which requires hundreds or thousands of spectra from each spectral class, is presently ongoing, and validated and blinded analyses, based on these efforts, will be reported. [Pg.194]

It is, at this point, important to understand the difference between unsupervised methods and supervised methods. With the former, there is no indication given to the model creation program (e.g. PCA, self-organising maps) of where any of... [Pg.106]

The remainder of this section deals with supervised methods. [Pg.107]

The vector of means (x̄1, x̄2, ..., x̄p), the vector of standard deviations (s1, s2, ..., sp), and the matrices of covariances S = (s_ij) and correlations R = (r_ij) can be calculated. For this data matrix, the most used non-supervised methods are Principal Components Analysis (PCA) and/or Factorial Analysis (FA), in an attempt to reduce the dimensions of the data and study the interrelation between variables and observations, and Cluster Analysis (CA), to search for clusters of observations or variables (Krzanowski 1988; Cela 1994; Afifi and Clark 1996). Before applying these techniques, the variables are usually first standardised to achieve a mean of 0 and unit variance. [Pg.694]
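The standardisation (autoscaling) step mentioned above can be sketched as follows; a minimal illustration in Python, with the function name assumed.

```python
import numpy as np

def autoscale(X):
    """Column-wise standardisation to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```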

Data mining methods can be generally divided into two types, unsupervised and supervised. Whereas unsupervised methods seek informative patterns, which directly display the interesting relationships among the data, supervised methods discover predictive patterns, which can later be used to predict one or more attributes from the rest. [Pg.66]

Ringner, M.; Peterson, C.; Khan, J. Analyzing array data using supervised methods. Pharmacogenomics 2002, 3, 403-415. [Pg.2799]

For the example in Fig. 2, the Fourier-transformed NMR spectra (the variables or descriptors being intensity as a function of frequency) were utilized for the creation of the data matrix D. It should be noted that many different descriptors can be used to create D, with the descriptor selection depending on the analysis method and the information to be extracted. For example, in the spectral resolution methods (Section 6), the desired end result is the determination of the true or pure component spectra and relative concentrations present within the samples or mixtures [Eq. (4)]. For this case, the unmodified real spectra I_j(ω) are commonly used for the chemometric analysis. In contrast, for the non-supervised and supervised methods described in Sections 3 and 4, the classification of a sample into different categories is the desired outcome. For these types of non-supervised and supervised methods, the original NMR spectrum can be manipulated or transformed to produce new descriptors, including... [Pg.46]

Fig. 5. An example of a scores plot as one might obtain in a principal components analysis. Distinct clustering or grouping of NMR spectra is observed in this type of plot, where the discrimination results from the metric used in the analysis (e.g., principal components). The distance between samples within groups is used by many supervised methods to further describe and improve class or group separation. Different chemometric techniques can be used to identify outliers or to provide a group assignment.
QSAR methods can be classified in several ways. One approach is to look at the nature of the method, supervised versus unsupervised: supervised methods use the activity values to create a predictive model from the descriptors, whereas unsupervised methods model molecular similarity from descriptors but do not use the activity values in the derivation of the model. Another way is to look at the nature of the relationship between activity and descriptors: categorical versus continuous, or linear versus non-linear (Figure 23.1). [Pg.492]

Cluster significance analysis (CSA) is a related, supervised method that can be used to determine subsets of properties that cause active compounds to cluster together. ... [Pg.501]

