Principal component model

In the following sections we present typical methods of unsupervised learning and pattern recognition, whose aim is to detect patterns in chemical, physicochemical and biological data rather than to predict biological activity. These inductive methods are useful for generating hypotheses and models, which are then to be verified (or falsified) by statistical inference. Cluster analysis has... [Pg.397]

Hansch et al. [39] have reported a table of correlations between seven physicochemical substituent parameters for 90 substituent groups. The parameters are lipophilicity (log P), molar refractivity (MR), molecular weight (MW), two Hammett electronic parameters (σ constants), and the field and resonance parameters of Swain and Lupton (F and R). [Pg.398]

PCA can be used in QSAR as a preliminary step to Hansch analysis, in order to determine which parameters should be entered into the equation. Principal components are uncorrelated by definition and therefore do not pose the problem of multicollinearity. Instead of defining a Hansch model in terms of the original physicochemical parameters, it is often more appropriate to use principal components regression (PCR), which was discussed in Section 35.6. An alternative approach is partial least squares (PLS) regression, which is covered in more detail in Section 37.4. [Pg.398]
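As a rough illustration of PCR, the sketch below autoscales a descriptor matrix, computes principal components with NumPy's SVD and regresses the activity on the first k score vectors by ordinary least squares with an intercept. The data are simulated, the helper names `pcr_fit` and `pcr_predict` are hypothetical, and this is not presented as the exact procedure of Section 35.6. The printed correlation matrix of the scores shows why multicollinearity disappears: its off-diagonal elements are zero to rounding error, even though two of the simulated descriptors are almost collinear.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal components regression: regress y on the first k PC scores of autoscaled X."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std                          # autoscale the descriptors
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:k].T                                  # loadings of the first k components
    T = Z @ P                                     # scores; columns are mutually uncorrelated
    design = np.column_stack([np.ones(len(y)), T])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)   # intercept + one coefficient per score
    return mean, std, P, coef

def pcr_predict(model, X_new):
    """Project new compounds onto the stored loadings and apply the regression coefficients."""
    mean, std, P, coef = model
    T_new = ((X_new - mean) / std) @ P
    return coef[0] + T_new @ coef[1:]

rng = np.random.default_rng(0)
# hypothetical data: 20 compounds x 5 descriptors, descriptor 4 nearly collinear with descriptor 0
X = rng.normal(size=(20, 5))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=20)
y = 1.5 * X[:, 0] - 0.8 * X[:, 2] + 0.1 * rng.normal(size=20)

model = pcr_fit(X, y, k=3)
mean, std, P, coef = model
scores = ((X - mean) / std) @ P
corr = np.corrcoef(scores, rowvar=False)
print(np.round(corr, 6))            # identity matrix up to rounding
print(pcr_predict(model, X[:3]))    # predicted activities of the first three compounds
```

In practice the number of components k would typically be chosen by cross-validation rather than fixed in advance.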

Principal components analysis can also be used when the compounds are characterized by multiple activities rather than by the single activity required by the Hansch or Free-Wilson models. This leads to the multivariate bioassay analysis developed by Mager [9]. As an illustration, we consider the physicochemical and biological data reported by Schmutz [41] on six oxazepines... [Pg.398]

Correlations between 7 physicochemical substituent parameters obtained from 90 substituent groups [39] [Pg.399]

S. Wold, Cross-validatory estimation of the number of components in factor and principal components models. Technometrics, 20 (1978) 397-405. [Pg.160]

S. Wold, Pattern recognition by means of disjoint principal components models. Pattern Recogn., 8 (1976) 127-139. [Pg.240]

While principal components models are used mostly in an unsupervised or exploratory mode, models based on canonical variates are often applied in a supervised way to predict biological activities from chemical, physicochemical or other biological parameters. In this section we briefly discuss linear discriminant analysis (LDA) and canonical correlation analysis (CCA). Although these methods were recognized early in QSAR [7,50], they have not been widely accepted, and more recently they have been superseded by the successful introduction of partial least squares (PLS) analysis in QSAR. Nevertheless, these early pattern recognition techniques prepared the ground for the introduction of modern chemometric approaches. [Pg.408]

Musumarra et al. [44] also identified miconazole and other drugs by principal components analysis of standardized thin-layer chromatographic data in four eluent systems and of retention indexes on SE 30. The principal components analysis of standardized Rf values in four eluent systems, namely ethyl acetate-methanol-30% ammonia (85:10:15), cyclohexane-toluene-diethylamine (65:25:10), ethyl acetate-chloroform (50:50), and acetone with plates dipped in potassium hydroxide solution, together with gas chromatographic retention indexes on SE 30 for 277 compounds, provided a two-component model that explains 82% of the total variance. The scores plot allowed identification of unknowns, or restriction of the range of inquiry to very few candidates. Comparison of these candidates with those selected from another principal components model, derived from thin-layer chromatographic data only, allowed identification of the drug in all the examined cases. [Pg.44]
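The identification step can be sketched as a nearest-neighbour search in the space of the first two principal component scores. This is a minimal NumPy sketch under stated assumptions: the library of 30 compounds, its five variables and the "unknown" are simulated, candidates are ranked by plain Euclidean distance between scores, and `candidates` is a hypothetical helper; the source only reports that the scores plot was used.

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical reference library: 30 compounds x 5 variables
# (standing in for four TLC Rf values and one GC retention index), driven by two latent factors
latent = rng.normal(size=(30, 2))
X_ref = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(30, 5))

mean, std = X_ref.mean(axis=0), X_ref.std(axis=0, ddof=1)
Z = (X_ref - mean) / std                        # standardize each variable
_, s, Vt = np.linalg.svd(Z, full_matrices=False)
P = Vt[:2].T                                    # two-component model
explained = (s[:2] ** 2).sum() / (s ** 2).sum() # fraction of total variance explained
T_ref = Z @ P                                   # scores of the library compounds

def candidates(x_unknown, n=3):
    """Indices of the n library compounds closest to the unknown in score space."""
    t = ((x_unknown - mean) / std) @ P          # project with the library's scaling and loadings
    d = np.linalg.norm(T_ref - t, axis=1)
    return np.argsort(d)[:n]

unknown = X_ref[7] + 0.001 * rng.normal(size=5)  # a simulated replicate measurement of compound 7
print(f"variance explained by 2 PCs: {explained:.2f}")
print(candidates(unknown))
```

The short candidate list, rather than a single answer, mirrors the idea of restricting the range of inquiry to a few compounds that are then checked by other means.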

H. Wold, O. H. J. Christie, Extraction of mass spectral information by a combination of autocorrelation and principal components models. Anal. Chim. Acta, 165 (1984) 51-59. [Pg.306]

This is a principal components model in which the loading of peak i in a given term is a peak-specific quantity, and the score t of object k in that term is an object- (or sample-) specific quantity. The variation about the mean, m_i, can be random or systematic. Random variation can be due to measurement error, and this variation can be used in quality... [Pg.204]
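A minimal sketch of such a model, assuming NumPy and a simulated peak table with objects in rows and peaks in columns. Because the original symbols are not legible, the notation here is introduced for illustration only: x_ki = m_i + Σ_a t_ka p_ia + e_ki, with m_i the peak mean, p_ia the loading of peak i in term a, t_ka the score of object k in term a, and e_ki the residual. The mean is removed first, the terms come from an SVD, and E holds the variation about the mean that the A terms do not describe.

```python
import numpy as np

def pc_model(X, A):
    """Fit X[k, i] = m[i] + sum_a T[k, a] * P[i, a] + E[k, i] (objects k in rows, peaks i in columns)."""
    m = X.mean(axis=0)                      # peak-specific mean m_i
    U, s, Vt = np.linalg.svd(X - m, full_matrices=False)
    P = Vt[:A].T                            # peak-specific loadings p_ia, one column per term
    T = U[:, :A] * s[:A]                    # object-specific scores t_ka, one column per term
    E = X - m - T @ P.T                     # variation about the mean not captured by A terms
    return m, P, T, E

rng = np.random.default_rng(3)
# hypothetical peak table: 12 samples x 8 peaks with two systematic sources of variation plus noise
X = 5 + rng.normal(size=(12, 2)) @ rng.normal(size=(2, 8)) + 0.02 * rng.normal(size=(12, 8))

for A in (1, 2, 3):
    m, P, T, E = pc_model(X, A)
    # residual sum of squares should fall to roughly the noise level once A = 2
    print(A, round(float((E ** 2).sum()), 4))
```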

Both resemble principal components models, but are derived so as to simultaneously minimize … and … in the least-squares sense, while yielding … that optimize the... [Pg.210]

The data were modeled by a principal components model with three components. The statistical results obtained by the method of refs. 25 and 31 are presented in Tables IV and V; Table IV also includes the measured total PCB concentration. One of the three sets of two-dimensional plots (Theta 1 vs Theta 2) is presented in Figure 10. Individual samples of a given Aroclor were distributed regularly in these plots and were ordered according to concentration. The sum of squares decreased from 4,360 to 52.4 (Table V), and approximately 88 percent of the standard deviation was explained by the three-component model. [Pg.216]
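The two sums of squares quoted above can be turned into explained fractions with a line of arithmetic. This sketch assumes that the "standard deviation" scales as the square root of the sum of squares with no degrees-of-freedom correction; under that assumption about 98.8% of the sum of squares and about 89% of the standard deviation are explained, close to the approximately 88 percent reported. The original calculation may include a degrees-of-freedom correction that is not stated here.

```python
import math

ss_total = 4360.0    # sum of squares before modelling (as quoted from Table V)
ss_resid = 52.4      # residual sum of squares after three components

frac_ss = 1 - ss_resid / ss_total              # fraction of the sum of squares explained
frac_sd = 1 - math.sqrt(ss_resid / ss_total)   # fraction of the standard deviation explained,
                                               # assuming sd is proportional to sqrt(SS)
print(f"{frac_ss:.3f} {frac_sd:.3f}")          # prints "0.988 0.890"
```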

The similarity of samples within a class can be assessed by their proximity to each other in plots derived from principal components models. The statistical technique of cross-validation (17) was used to... [Pg.4]

The principal components model of the Aroclor samples (Table I) preserves more than 95% of the sample variance of the entire data set. From the 3-D sample score plot (Figure 3) one can make the following observations: PCB mixtures of two Aroclors form a straight line, three-Aroclor mixtures form a plane, and possible mixtures of the four Aroclors are bounded by the intersection of the four planes. Samples not bounded by or inside the volume formed by the intersection of the four planes may... [Pg.9]
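Why mixtures fall on lines and planes can be seen from the fact that projecting onto a mean-centred PC model is an affine map: if a mixture's PCB profile is the composition-weighted sum of the pure Aroclor profiles, its scores are the same weighted average of the pure-component scores. The NumPy sketch below assumes exactly that linear mixing; the four "pure" profiles and the training mixtures are random, hypothetical stand-ins rather than real Aroclor data.

```python
import numpy as np

rng = np.random.default_rng(2)
pure = rng.uniform(0, 1, size=(4, 12))           # hypothetical profiles of four Aroclors over 12 peaks

def mix(weights):
    """Profile of a mixture, assuming linear mixing of the pure profiles."""
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ pure                  # normalized composition times pure profiles

# training set: the four pure Aroclors plus random four-component mixtures
X = np.array([pure[i] for i in range(4)] + [mix(rng.uniform(size=4)) for _ in range(20)])
m = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - m, full_matrices=False)
P = Vt[:3].T                                     # three components span all four-way mixtures

def scores(x):
    return (x - m) @ P

# binary mixtures of Aroclors 0 and 1 lie on the straight line between the two pure scores
tA, tB = scores(pure[0]), scores(pure[1])
for f in (0.25, 0.5, 0.75):
    print(np.allclose(scores(mix([1 - f, f, 0, 0])), tA + f * (tB - tA)))   # True
```

The same argument with three non-zero weights places ternary mixtures in the plane through three pure-component scores.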

In SIMCA-3B, modeling power is defined as a measure of the importance of each variable in the principal component terms of the class model (18). The modeling power approaches its maximum value of one (1.0) when the variable is well described by the principal components model. Variables with a modeling power of less than 0.2 can be eliminated from the data without a major loss of information (18). [Pg.10]
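A sketch of one common way to compute modeling power, assuming the SIMCA-style definition MP_i = 1 − s_i/s_i0, where s_i is the residual standard deviation of variable i after A components and s_i0 is its standard deviation about the mean. The degrees of freedom used here (N − A − 1 and N − 1) are an assumption, since implementations differ, and the data are simulated: three variables follow one latent property and the fourth is pure noise.

```python
import numpy as np

def modeling_power(X, A):
    """Modeling power of each variable (column) of X for an A-component class model."""
    N = X.shape[0]
    Xc = X - X.mean(axis=0)                               # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    E = Xc - (U[:, :A] * s[:A]) @ Vt[:A]                  # residuals after A components
    s_res = np.sqrt((E ** 2).sum(axis=0) / (N - A - 1))   # residual sd of each variable
    s_tot = np.sqrt((Xc ** 2).sum(axis=0) / (N - 1))      # sd of each variable about its mean
    return 1 - s_res / s_tot                              # near 1: well modelled; near 0 or below: not

rng = np.random.default_rng(5)
t = rng.normal(size=40)                                   # one hypothetical latent property
X = np.column_stack([2 * t, -t, t, rng.normal(size=40)]) + 0.05 * rng.normal(size=(40, 4))

mp = modeling_power(X, A=1)
keep = mp >= 0.2                                          # the 0.2 cut-off quoted in the text
print(np.round(mp, 2), keep)
```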

Table V. Weighted Centers of Class Principal Component Models...

The objective of principal components modeling is to approximate the systematic class structure by a model of the form of Equation 2. This is shown diagrammatically below in Equation 3. Here X is the... [Pg.246]

Loadings Plot (Model and Variable Diagnostic) The loadings plot in Figure 4.64 reveals that the first and second loadings have nonrandom features, while the third is random in nature. This suggests a two-component principal component model, consistent with the percent variance explained, the residuals plots, and the mSECV PCA results... [Pg.254]

While the chemical mass balance receptor model is easily derivable from the source model and the elements of its solution system are fairly easy to present, this is not the case for multivariate receptor models. Watson (9) has carried through the calculations of the source-receptor model relationship for the correlation and principal components models in forty-three equation-laden pages. [Pg.94]

PLS is related to principal components analysis (PCA) (20). PCA is a method for projecting the X-block matrix in order to obtain a general survey of the distribution of the objects in molecular space. PCA is recommended as an initial step before other multivariate analysis techniques, to help identify outliers and delineate classes. The data are randomly divided into a training set and a test set. Once the principal components model has been calculated on the training set, the test set can be used to check the validity of the model. PCA differs most obviously from PLS in that it is optimized with respect to the variance of the descriptors rather than their covariance with the response. [Pg.104]
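The training/test check described above might look like the following NumPy sketch. It assumes simulated descriptors, a random 30/10 split and a two-component model fitted on the training set only; the test objects are projected with the training mean and loadings, and the fraction of their sum of squares reproduced by the model serves as a simple validity check. Other diagnostics, such as residual-based outlier limits, would be equally reasonable.

```python
import numpy as np

rng = np.random.default_rng(4)
# hypothetical X-block: 40 molecules x 6 descriptors built from 2 latent properties
X = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(40, 6))

idx = rng.permutation(len(X))                 # random division into training and test sets
X_train, X_test = X[idx[:30]], X[idx[30:]]

mean = X_train.mean(axis=0)
_, s, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
P = Vt[:2].T                                  # loadings estimated from the training set only

def explained(Xs):
    """Fraction of the sum of squares (about the training mean) reproduced by the model."""
    Xc = Xs - mean
    resid = Xc - Xc @ P @ P.T
    return 1 - (resid ** 2).sum() / (Xc ** 2).sum()

print(f"training: {explained(X_train):.3f}  test: {explained(X_test):.3f}")
```

A test-set value far below the training-set value would suggest that the model does not generalize, or that the test set contains outliers.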

S. Wold, Cross-validatory estimation of the number of components in factor analysis and principal component models. Technometrics, 20 (1978) 397-406. [Pg.325]

When the data belong to different known categories, a principal component model can be calculated for each category. This technique is used in the SIMCA method for classification, and quantitative correlations between the model parameters (axes) and external properties can also be established (ref. 9,10). [Pg.55]
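A minimal SIMCA-style classifier can be sketched as one disjoint principal component model per class plus a comparison of residual standard deviations. This sketch assumes NumPy, simulated two-class data, two components per class, and one common degrees-of-freedom convention for the pooled class residual sd s0 and the object residual sd s; the helper names are hypothetical, and full SIMCA typically adds statistical (F-test) limits for class membership, which are omitted here.

```python
import numpy as np

def fit_class(X, A):
    """Disjoint PC model of one class: mean, A loadings and pooled residual sd s0."""
    N, M = X.shape
    m = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - m, full_matrices=False)
    P = Vt[:A].T
    E = (X - m) - (X - m) @ P @ P.T
    s0 = np.sqrt((E ** 2).sum() / ((N - A - 1) * (M - A)))
    return m, P, s0

def residual_sd(x, model):
    """Residual sd of a new object x after projection onto a class model."""
    m, P, _ = model
    e = (x - m) - (x - m) @ P @ P.T
    return np.sqrt((e ** 2).sum() / (len(x) - P.shape[1]))

def classify(x, models):
    """Assign x to the class whose model fits it best (smallest s / s0)."""
    ratios = {name: residual_sd(x, mod) / mod[2] for name, mod in models.items()}
    return min(ratios, key=ratios.get), ratios

rng = np.random.default_rng(7)
L_a, L_b = rng.normal(size=(2, 8)), rng.normal(size=(2, 8))   # class-specific 2-D structure
off_a, off_b = np.zeros(8), np.full(8, 3.0)

def sample(L, off, k):
    # hypothetical objects: k x 8 variables varying along one class's subspace
    return rng.normal(size=(k, 2)) @ L + off + 0.05 * rng.normal(size=(k, 8))

models = {"A": fit_class(sample(L_a, off_a, 25), 2),
          "B": fit_class(sample(L_b, off_b, 25), 2)}
x_new = sample(L_b, off_b, 1)[0]
label, ratios = classify(x_new, models)
print(label, {name: round(float(r), 2) for name, r in ratios.items()})
```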

SIMCA EMX, P.O. Box 336, S-95125 Lulea, Sweden 2200. Multivariate data analysis by SIMCA (principal component models of classes) and PLS (partial least squares) (ref. 20). [Pg.63]

FIGURE 4.2 Diagram of a principal component model for the chromatographic-spectroscopic data set shown in Figure 4.1. [Pg.73]

To construct the principal component model described by Equation 4.4, we define V̄ and T according to Equations 4.8 and 4.9, where V̄ contains the selected k columns of V. [Pg.74]
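One way to obtain V̄ and T is sketched below with NumPy, under the assumption that V holds the eigenvectors of Z = AᵀA and that the data are not mean-centred: sort the eigenvectors by decreasing eigenvalue, keep the first k columns as V̄, and compute the scores as T = AV̄, so that A ≈ TV̄ᵀ. The function name `pc_model_eig` is hypothetical, and the bilinear "chromatographic-spectroscopic" matrix is simulated from two Gaussian elution profiles and random spectra.

```python
import numpy as np

def pc_model_eig(A_mat, k):
    """k-component model A ~ T @ Vbar.T built from the eigenvectors of Z = A.T @ A."""
    Z = A_mat.T @ A_mat
    evals, V = np.linalg.eigh(Z)              # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1]           # re-sort from largest to smallest
    evals, V = evals[order], V[:, order]
    Vbar = V[:, :k]                           # the selected k columns of V
    T = A_mat @ Vbar                          # scores
    return T, Vbar, evals

rng = np.random.default_rng(8)
t = np.arange(30.0)                           # hypothetical elution times
C = np.column_stack([np.exp(-((t - 10) / 3) ** 2), np.exp(-((t - 16) / 3) ** 2)])
S = rng.uniform(0.2, 1.0, size=(20, 2))       # hypothetical spectra at 20 wavelengths
A_mat = C @ S.T + 0.001 * rng.normal(size=(30, 20))

T, Vbar, evals = pc_model_eig(A_mat, k=2)
rel_err = np.linalg.norm(A_mat - T @ Vbar.T) / np.linalg.norm(A_mat)
print(f"relative residual with k = 2: {rel_err:.4f}")
```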

The singular-value decomposition (SVD) is a computational method for simultaneously calculating the complete set of column-mode eigenvectors, row-mode eigenvectors, and singular values of any real data matrix. These eigenvectors and singular values can be used to build a principal component model of a data set. [Pg.76]

In Equation 4.13 we seek the k columns of U that are the column-mode eigenvectors of A. These are the columns associated with the k largest diagonal elements of S, which are the square roots of the eigenvalues of Z = AᵀA. The corresponding k rows of Vᵀ are the row-mode eigenvectors of A. The following equations describe the relationship between the singular-value decomposition model and the principal component model. [Pg.76]

The SVD is generally accepted to be the most numerically accurate and stable technique for calculating the principal components of a data matrix. MATLAB has an implementation of the SVD that gives the singular values and the row and column eigenvectors sorted in order of decreasing singular value. Its use is shown in Example 4.3. We will use the SVD from now on whenever we need to compute a principal component model of a data set. [Pg.76]
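NumPy offers a comparable routine, `numpy.linalg.svd`, which also returns the singular values in decreasing order (and returns Vᵀ rather than V). The sketch below uses random data, k = 2 and variable names chosen for illustration to check the relationships stated above: the squared singular values equal the eigenvalues of Z = AᵀA, and the scores U_k S_k equal the projection of A onto the first k row-mode eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(6)
A_mat = rng.normal(size=(15, 6))                       # hypothetical data matrix

U, s, Vt = np.linalg.svd(A_mat, full_matrices=False)   # s is sorted largest to smallest
k = 2
T = U[:, :k] * s[:k]                # scores: first k columns of U scaled by the singular values
Vk = Vt[:k].T                       # first k row-mode eigenvectors (loadings)

# squared singular values are the eigenvalues of Z = A^T A
evals = np.sort(np.linalg.eigvalsh(A_mat.T @ A_mat))[::-1]
print(np.allclose(s ** 2, evals))   # True
print(np.allclose(T, A_mat @ Vk))   # True: same scores as projecting A onto the loadings
```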

Big Chemical Encyclopedia
