Principal score prediction

The procedure is as follows: first, the principal components for X and Y are calculated separately (cf. Section 9.4.4). The scores of the matrix X are then used in a regression model to predict the scores of Y, which in turn are used to predict Y. [Pg.449]
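The procedure can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data are synthetic, the number of components (two for each block) is chosen by hand, and the helper names `pca` and `predict` are hypothetical rather than taken from the cited text.

```python
import numpy as np

def pca(M, n_components):
    """Return column means, loadings (variables x A) and scores (samples x A)."""
    mean = M.mean(axis=0)
    _, _, Vt = np.linalg.svd(M - mean, full_matrices=False)
    loadings = Vt[:n_components].T
    scores = (M - mean) @ loadings
    return mean, loadings, scores

# Synthetic data (assumption): X and Y are both driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(30, 6))
Y = latent @ rng.normal(size=(2, 3)) + 0.01 * rng.normal(size=(30, 3))

# Step 1: principal components of X and Y, calculated separately.
x_mean, P, T = pca(X, n_components=2)
y_mean, Q, U = pca(Y, n_components=2)

# Step 2: regress the Y scores on the X scores (both are mean-centered).
C, *_ = np.linalg.lstsq(T, U, rcond=None)

def predict(X_new):
    """Predict Y from X via X scores -> Y scores -> Y."""
    T_new = (X_new - x_mean) @ P   # X scores of the new samples
    U_hat = T_new @ C              # predicted Y scores
    return U_hat @ Q.T + y_mean    # back-transform to the original Y variables

Y_hat = predict(X)
rmse = float(np.sqrt(np.mean((Y - Y_hat) ** 2)))
```

Here `rmse` is a fit error on the calibration data only; an honest estimate of prediction error would need samples that were not used to build the model.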

The disadvantages of the principal component space are that it is more difficult to understand and explain to others, and it is more complex to implement in prediction. Instead of simply using individual X-variable intensities from a prediction sample's response profile (xp), one must first project these intensities onto the A significant PCs in order to obtain the prediction sample's PCA scores (tp) ... [Pg.287]
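The extra projection step is short in practice. The sketch below assumes a mean-centered PCA model built by SVD on synthetic calibration data; the variable names (`X_cal`, `x_p`, `t_p`) mirror the symbols in the text but are otherwise illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X_cal = rng.normal(size=(20, 5))          # calibration data (synthetic, assumption)

# Build the PCA model: center, decompose, keep A components.
x_mean = X_cal.mean(axis=0)
U, s, Vt = np.linalg.svd(X_cal - x_mean, full_matrices=False)
A = 2
P = Vt[:A].T                              # loadings, shape (variables, A)
T_cal = U[:, :A] * s[:A]                  # calibration scores

# Project a prediction sample's response profile onto the A PCs.
x_p = rng.normal(size=5)
t_p = (x_p - x_mean) @ P                  # PCA scores of the prediction sample
```

The same centering vector and loadings that were estimated from the calibration data must be reused for every prediction sample.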

The KNN method is probably the simplest classification method to understand. It is most commonly applied to a principal component space. In this case, calibration is achieved by simply constructing a PCA model using the calibration data, and choosing the optimal number of PCs (A) to use in the model. Prediction of an unknown sample is then done by calculating the PC scores for that sample (Equation 8.57), followed by application of the classification rule. [Pg.289]
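A minimal sketch of KNN in a principal component space follows. The two-class synthetic data, the choices A = 2 and k = 3, Euclidean distance, and majority voting are all assumptions made for illustration; the cited text does not specify them here.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)

# Synthetic calibration data (assumption): two well-separated classes.
X_cal = np.vstack([rng.normal(0.0, 1.0, size=(15, 4)),
                   rng.normal(5.0, 1.0, size=(15, 4))])
labels = ["A"] * 15 + ["B"] * 15

# Calibration: build the PCA model and keep A components.
x_mean = X_cal.mean(axis=0)
_, _, Vt = np.linalg.svd(X_cal - x_mean, full_matrices=False)
A = 2
P = Vt[:A].T
T_cal = (X_cal - x_mean) @ P

def knn_classify(x_new, k=3):
    """Project x_new into PC space, then vote among the k nearest calibration scores."""
    t_new = (x_new - x_mean) @ P
    distances = np.linalg.norm(T_cal - t_new, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

class_near_a = knn_classify(np.zeros(4))
class_near_b = knn_classify(np.full(4, 5.0))
```

An odd k avoids tied votes in the two-class case; with more classes a tie-breaking rule would be needed.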

Partial least squares (PLS) is similar to MLR in that it also assumes a linear relationship between a vector x and a target property y. However, it avoids the problems of collinear descriptors by calculating the principal components for the molecular descriptors and target property separately. The scores for the molecular descriptors are used as the feature vector x and are also used to predict the scores for the target property, which can in turn be used to predict y. An important consideration in PLS is the appropriate number of principal components to be used for the QSAR model. This is usually determined by using cross-validation methods like fivefold cross validation and leave-one-out. PLS has been applied to the prediction of carcinogenicity [19], fathead minnow toxicity [20], Tetrahymena pyriformis toxicity [21], mammalian toxicity [22], and Daphnia magna toxicity [23],... [Pg.219]
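Choosing the number of components by leave-one-out cross-validation can be sketched as follows. The code uses a NIPALS-style PLS1 for a single target property, which is one common formulation and not necessarily the variant the excerpt describes; the data are synthetic and the function names `pls1_fit` and `loo_press` are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, A):
    """NIPALS-style PLS1 with A components; y is predicted as (x - x_mean) @ b + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(A):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # score vector
        tt = t @ t
        p = Xk.T @ t / tt               # X loading
        qa = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - qa * t                # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector in the original variables
    return x_mean, y_mean, b

def loo_press(X, y, max_components):
    """Leave-one-out PRESS for models with 1..max_components components."""
    n = len(y)
    press = np.zeros(max_components)
    for i in range(n):
        mask = np.arange(n) != i
        for A in range(1, max_components + 1):
            x_mean, y_mean, b = pls1_fit(X[mask], y[mask], A)
            y_hat = (X[i] - x_mean) @ b + y_mean
            press[A - 1] += (y[i] - y_hat) ** 2
    return press

# Synthetic descriptors and target property (assumption).
rng = np.random.default_rng(4)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -0.5, 0.0, 2.0]) + 0.1 * rng.normal(size=30)

press = loo_press(X, y, max_components=4)
best_A = int(np.argmin(press)) + 1      # number of components with the lowest PRESS
```

Fivefold cross-validation follows the same pattern, with groups of samples left out instead of single samples.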

Assume that a principal components model with A components (p_1, p_2, ..., p_A) has been determined. The model can be used in two ways: (1) For a new compound "r", the corresponding score values can be determined by projecting the descriptors of "r" down to the hyperplane spanned by the components. (2) It is then possible to predict the original descriptors of "r" from the scores and the loading vectors. If the model is good, the predicted value, x̂_ri, of a descriptor should be close to the observed value, x_ri. The difference, x_ri - x̂_ri, is the prediction error. (The letter/will be used to... [Pg.364]
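Both uses of the model, projection and back-prediction, can be sketched as below. The descriptor data are synthetic, A = 2 is chosen by hand, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic descriptor matrix (assumption): two latent factors plus noise.
mixing = rng.normal(size=(2, 6))
X = rng.normal(size=(25, 2)) @ mixing + 0.05 * rng.normal(size=(25, 6))

# Principal components model with A components.
x_mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
A = 2
P = Vt[:A].T                               # loading vectors as columns

# New compound "r" generated from the same latent structure.
x_r = rng.normal(size=2) @ mixing + 0.05 * rng.normal(size=6)

# (1) Project the descriptors of "r" onto the hyperplane spanned by the components.
t_r = (x_r - x_mean) @ P

# (2) Predict the original descriptors from the scores and the loading vectors.
x_r_hat = t_r @ P.T + x_mean

# Prediction error: observed minus predicted descriptor values.
error_r = x_r - x_r_hat
```

The error vector is orthogonal to the loading vectors by construction, so its length measures how far the compound lies from the model hyperplane.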

It is important to know how many principal components (factors) should be retained to accurately describe the data matrix D in Eq. (15), and still reduce the amount of noise. A common method is the cross-validation technique, which provides a pseudo-predictive way to estimate the number of factors to retain. The cross-validation technique leaves a percentage of the data (y %) out at a time. Using this reduced data set, PCA is again carried out to provide new loadings and scores. These are then used to predict the deleted data, and the predictions are used to calculate the ensuing error defined by... [Pg.56]
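One way to carry out such a cross-validation is sketched below. The error formula in the excerpt is truncated, so this is a swapped-in scheme and not the source's: groups of rows are left out, and each variable of a left-out row is predicted from scores estimated with the remaining variables, so that a deleted value never takes part in its own prediction. The data, fold layout, and the function name `pca_press` are assumptions.

```python
import numpy as np

def pca_press(D, max_components, n_folds=5):
    """Cross-validated PRESS for PCA models with 1..max_components factors."""
    n, m = D.shape
    folds = np.arange(n) % n_folds
    press = np.zeros(max_components)
    for f in range(n_folds):
        train, test = D[folds != f], D[folds == f]
        mean = train.mean(axis=0)
        # PCA on the reduced data set gives new loadings.
        _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
        test_c = test - mean
        for A in range(1, max_components + 1):
            P = Vt[:A].T                              # loadings, shape (m, A)
            for j in range(m):
                keep = np.arange(m) != j
                # Scores of the left-out rows, estimated without variable j.
                T, *_ = np.linalg.lstsq(P[keep], test_c[:, keep].T, rcond=None)
                predicted_j = P[j] @ T                # predicted values of variable j
                press[A - 1] += np.sum((test_c[:, j] - predicted_j) ** 2)
    return press

# Synthetic data matrix (assumption): two real factors plus noise.
rng = np.random.default_rng(5)
D = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(40, 8))

press = pca_press(D, max_components=5)
n_factors = int(np.argmin(press)) + 1     # candidate number of factors to retain
```

The number of factors tried must stay below the number of variables, otherwise the scores cannot be estimated once one variable is removed.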

The prediction of Y-data of unknown samples is based on a regression method where the X-data are correlated to the Y-data. The multivariate methods usually used for such a calibration are principal component regression (PCR) and partial least squares regression (PLS). Both methods are based on the assumption of linearity and can deal with co-linear data. The problem of co-linearity is solved in the same way as in the formation of a PCA plot: the X-variables are combined into latent variables (score vectors). These vectors are independent since they are orthogonal to each other, and they can therefore be used to create a calibration model. [Pg.7]
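A minimal PCR sketch shows how orthogonal score vectors sidestep co-linearity. The synthetic X-data below are deliberately co-linear (six variables built from two latent factors); the component count and the function names `pcr_fit` and `pcr_predict` are illustrative assumptions.

```python
import numpy as np

def pcr_fit(X, y, A):
    """Principal component regression: regress centered y on the first A score vectors."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    P = Vt[:A].T                          # loadings
    T = (X - x_mean) @ P                  # orthogonal score vectors
    b, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return x_mean, y_mean, P, T, b

def pcr_predict(X_new, x_mean, y_mean, P, b):
    """Project new X-data onto the loadings, then apply the score regression."""
    return (X_new - x_mean) @ P @ b + y_mean

# Co-linear synthetic X-data (assumption): six variables from two latent factors.
rng = np.random.default_rng(6)
latent = rng.normal(size=(30, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(30, 6))
y = latent @ np.array([1.0, -2.0]) + 0.01 * rng.normal(size=30)

x_mean, y_mean, P, T, b = pcr_fit(X, y, A=2)
y_hat = pcr_predict(X, x_mean, y_mean, P, b)
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Orthogonality of the score vectors: T'T has no off-diagonal terms.
gram = T.T @ T
off_diagonal = gram - np.diag(np.diag(gram))
```

Because the score vectors are orthogonal, the regression on them is well conditioned even though the original X-variables are nearly linearly dependent.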

Katritzky et al. <89JA7> have developed a principal component analysis (PCA) system to determine aromaticity. Various characteristics consisting of geometrical, energetic, and magnetic data for a series of compounds are assembled and subjected to PCA. Values for the characteristics of individual compounds recalculated from the scores were found to exhibit good agreement with those used in the treatment. Scores can then be estimated to predict values of characteristics for other compounds where there is limited data available. [Pg.478]

Using Principal Component Scores and Artificial Neural Networks in Predicting Water Quality Index... [Pg.271]

The objective of this study is to use the PCA method to classify predictor variables according to their interrelation, and to obtain a parsimonious prediction model (i.e., a model that depends on as few variables as necessary) for WQI, with other physico-chemical and biological data as predictor variables, to model the water quality of the Langat river. For this purpose, principal component scores of 23 physico-chemical and biological water quality parameters were generated and selected appropriately as input variables in ANN models for predicting WQI. [Pg.273]

Two different types of ANN models were developed. In the first type, prediction was performed based on the original PCs. In the second type of ANNs developed, scores of rotated (varimax rotation) PCs (ANN-RPCs) with eigenvalues greater than 1 were selected as input. For this model, prediction of WQI was performed using two to six rotated principal components separately. [Pg.275]
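The selection of components with eigenvalues greater than 1 can be sketched as follows. The example assumes PCA on the correlation matrix (standardized variables), which is the setting where this threshold is normally applied; the data are synthetic with six variables, not the study's 23 parameters. Varimax rotation and the ANN itself are not shown.

```python
import numpy as np

# Synthetic water-quality-like data (assumption): six variables, two underlying factors.
rng = np.random.default_rng(7)
X = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(50, 6))

# Standardize, then decompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues

# Sort components by decreasing eigenvalue.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep only components whose eigenvalue exceeds 1.
keep = eigvals > 1.0
scores = Z @ eigvecs[:, keep]             # candidate inputs for a downstream model
n_kept = int(keep.sum())
```

The eigenvalues of a correlation matrix sum to the number of variables, so a component with an eigenvalue above 1 explains more variance than any single standardized variable.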

Visual inspection should be possible from plots of predicted versus measured concentrations, from principal component plots of loadings and scores in the case of soft modeling techniques, and by plotting the standard error of calibration (SEC) or the standard error of prediction from cross-validation (SEP_CV, Eq. (6.68)) as a function of the number of eigenvalues or of principal components. [Pg.247]
