
Principal components analysis cross-validation

Principal component analysis is central to many of the more popular multivariate data analysis methods in chemistry. For example, a classification method based on principal component analysis called SIMCA [69, 70] is by far the most popular method for describing the class structure of a data set. In SIMCA (soft independent modeling by class analogy), a separate principal component analysis is performed on each class in the data set, and a sufficient number of principal components are retained to account for most of the variation within each class. The number of principal components retained for each class is usually determined directly from the data by a method called cross-validation [71] and is often different for each class model. [Pg.353]
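
A minimal sketch of this per-class scheme, assuming hypothetical data and using a simple K-fold reconstruction-error criterion in place of the specific cross-validation procedure of [71]:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

def n_components_by_cv(X, max_pc=5, n_splits=5):
    """Pick the number of PCs minimising cross-validated reconstruction error."""
    errors = []
    for k in range(1, max_pc + 1):
        fold_err = []
        for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
            pca = PCA(n_components=k).fit(X[train])
            X_hat = pca.inverse_transform(pca.transform(X[test]))
            fold_err.append(np.mean((X[test] - X_hat) ** 2))
        errors.append(np.mean(fold_err))
    return int(np.argmin(errors)) + 1

# SIMCA-style modelling: one PCA per class (X, y are hypothetical training data)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = np.repeat([0, 1, 2], 20)

class_models = {}
for c in np.unique(y):
    Xc = StandardScaler().fit_transform(X[y == c])   # scale within each class
    k = n_components_by_cv(Xc)                       # may differ from class to class
    class_models[c] = PCA(n_components=k).fit(Xc)
```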

Principal component analysis of the response matrix afforded one significant component (by cross-validation), which described 82% of the total variance in Y. As all responses are of the same kind (percentage yield), the data were not autoscaled prior to analysis. The response y11 was deleted as it did not vary. The scores and loadings are also given in Table 14. The score values were used to fit a second-order... [Pg.50]

Principal components analysis of the data in Table 15A1 showed that a two-component model was significant according to cross-validation and accounted for... [Pg.375]

A first PLS model was established from 124 reaction systems. To ensure that this set of reaction systems was not selected in such a way that the descriptor variables were correlated, a principal component analysis was made of the variation of the eight descriptors over the set. This analysis afforded eight significant principal components according to cross-validation. This showed that the variance-covariance matrix of the descriptors was a full-rank matrix and that there were no severe collinearities among the descriptors. [Pg.481]
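
A rough illustration of this kind of check, assuming a hypothetical 124 × 8 descriptor matrix and using the explained-variance profile and numerical rank in place of a formal cross-validation criterion:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
D = rng.normal(size=(124, 8))            # hypothetical 124 x 8 descriptor matrix

Z = StandardScaler().fit_transform(D)    # autoscale the descriptors
pca = PCA().fit(Z)

print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
print("numerical rank:", np.linalg.matrix_rank(Z))
# If all eight components carry appreciable variance (and the rank is 8),
# there are no severe collinearities among the descriptors.
```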

WJ Krzanowski. Cross-validation choice in principal component analysis. Biometrics, 43:575–584, 1987. [Pg.288]

G Scarponi, I Moret, G Capodaglio, M Romanazzi. Cross-validation, influential observations and selection of variables in chemometric studies of wines by principal component analysis. Journal of Chemometrics, 4:217–240, 1990. [Pg.365]

One way to try to alleviate the problem of correlated descriptors is to perform a principal components analysis (see Section 9.13). Those principal components which explain (say) 90% of the variance may be retained for the subsequent calculations. Alternatively, those principal components for which the associated eigenvalue exceeds unity may be chosen, or the principal components may be selected using more complex approaches based on cross-validation (see Section 12.12.3). It may be important to scale the descriptors (e.g. using autoscaling) prior to calculating the principal components. However, unless each principal component is largely associated with one particular descriptor, it can be difficult to interpret the physical meaning of any subsequent results. ... [Pg.681]
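
The two simpler selection rules mentioned above (a variance threshold and an eigenvalue-greater-than-one cutoff) can be sketched as follows, assuming a hypothetical autoscaled descriptor matrix:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 12))            # hypothetical descriptor matrix

Z = StandardScaler().fit_transform(X)    # autoscale before PCA
pca = PCA().fit(Z)

# Rule 1: retain enough components to explain (say) 90% of the variance
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_90 = int(np.searchsorted(cum_var, 0.90)) + 1

# Rule 2: retain components whose eigenvalue exceeds unity; with autoscaled
# data these are (up to an n/(n-1) factor) eigenvalues of the correlation matrix
n_kaiser = int(np.sum(pca.explained_variance_ > 1.0))
```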

A rapid head-space analysis instrument for the analysis of the volatile fractions of 105 extra virgin olive oils coming from five different Mediterranean areas was put forward by Cerrato-Oliveros and co-workers. The raw information collected by this system was unravelled and interpreted with well-known multivariate techniques of display (principal component analysis), feature selection (stepwise linear discriminant analysis), and classification (linear discriminant analysis). Of the samples, 93.4% were correctly classified and 90.5% were correctly predicted by the cross-validation procedure, whilst 80.0% of an external test set, aimed at full validation of the classification rule, were correctly assigned. [Pg.177]

Principal component analysis (PCA) and principal component regression (PCR) were used to analyze the data [39,40]. PCR was used to construct calibration models to predict Ang II dose from spectra of the aortas. A cross-validation routine was used with the NIR spectra to assess the statistical significance of the prediction of Ang II dose and collagen/elastin in mouse aortas. The accuracy of the PCR method in predicting Ang II dose from NIR spectra was determined by the F-test and the standard error of performance (SEP) calculated from the validation samples. [Pg.659]
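
A minimal PCR-with-cross-validation sketch along these lines, assuming hypothetical spectra and dose values and computing SEP as the root-mean-square error of the held-out predictions (not necessarily the authors' exact protocol):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict, KFold

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 200))           # hypothetical NIR spectra (samples x wavelengths)
y = rng.uniform(0.0, 3.0, size=40)       # hypothetical Ang II doses

pcr = make_pipeline(PCA(n_components=5), LinearRegression())
y_cv = cross_val_predict(pcr, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# standard error of performance: RMSE of the held-out (validation) predictions
sep = np.sqrt(np.mean((y - y_cv) ** 2))
print(f"SEP = {sep:.3f}")
```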

Principal components analysis was used to reduce the number of parameters needed to represent the variance in the spectral data set. The principal components were then used to generate a linear discriminant model. All three tissue classes were successfully discriminated, as shown in Figure 4.12. The classification model was tested using a leave-one-out cross-validation, in which all but one spectrum were used to build the model. This model was then used to predict the remaining spectrum. This was repeated for all 498 spectra. Of 498 tissue spectra, 492 were correctly classified as normal, invasive carcinoma or CIN. The cross-validation misclassified six spectra, two of which were normal samples assigned as invasive carcinoma. The other four were invasive carcinoma or CIN spectra misclassified as CIN or invasive carcinoma, respectively. Importantly, no abnormal samples were classified as normal. Based on the cross-validation results, sensitivity and specificity values were calculated as 99.5% and 100% respectively for normal tissue, 99% and 99.2% respectively for CIN, and 98.5% and 99% respectively for invasive carcinoma. [Pg.126]
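
A sketch of that leave-one-out procedure, assuming hypothetical spectra and class labels: PCA scores feed a linear discriminant model, each spectrum is predicted by a model built on all the others, and per-class sensitivity and specificity are read off the resulting confusion matrix:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
X = rng.normal(size=(90, 300))           # hypothetical tissue spectra
y = rng.integers(0, 3, size=90)          # 0 = normal, 1 = CIN, 2 = invasive carcinoma

model = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())   # leave-one-out predictions

cm = confusion_matrix(y, y_pred)
for c in range(3):
    tp = cm[c, c]
    fn = cm[c].sum() - tp
    fp = cm[:, c].sum() - tp
    tn = cm.sum() - tp - fn - fp
    print(f"class {c}: sensitivity {tp / (tp + fn):.3f}, specificity {tn / (tn + fp):.3f}")
```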

To construct the reference model, the interpretation system required routine process data collected over a period of several months. Cross-validation was applied to detect and remove outliers. Only data corresponding to normal process operations (that is, when top-grade product is made) were used in the model development. As stated earlier, the system ultimately involved two analysis approaches, both reduced-order models that capture dominant directions of variability in the data. A PLS analysis using two loadings explained about 60% of the variance in the measurements. A subsequent PCA on the residuals showed that five principal components explained 90% of the residual variability. [Pg.85]
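
A sketch of this two-stage reduced-order model, assuming hypothetical process data: a two-component PLS model is fitted first, and PCA is then run on the X-residuals left over after removing what the PLS loadings capture:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 20))           # hypothetical routine process measurements
Y = rng.normal(size=(500, 2))            # hypothetical quality variables

Xc = X - X.mean(axis=0)                  # centre the blocks ourselves so the
Yc = Y - Y.mean(axis=0)                  # residual below lives in the same space
pls = PLSRegression(n_components=2, scale=False).fit(Xc, Yc)

T = pls.transform(Xc)                    # scores for the two PLS loadings
X_res = Xc - T @ pls.x_loadings_.T       # part of X the PLS model does not explain

pca = PCA().fit(X_res)
cum = np.cumsum(pca.explained_variance_ratio_)
n_res = int(np.searchsorted(cum, 0.90)) + 1   # components for ~90% of residual variability
print(n_res)
```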

Initially, an optimised model was built from the data collected as outlined above by constructing a principal component (PC)-fed linear discriminant analysis (LDA) model (described elsewhere) [7, 89]. The linear discriminant function was calculated for maximal group separation, and each individual spectral measurement was projected onto the model (using leave-one-out cross-validation) to obtain a score. The scores for each individual spectrum projected onto the model, colour-coded for consensus pathology, are shown in Fig. 13.3. The simulation experiments used this optimised model as a baseline against which to compare the performance of models with spectral perturbations applied to them. The optimised model training performance achieved 93% accuracy overall for the three groups. [Pg.324]

There is an approach in QSRR in which principal components extracted from analysis of large tables of structural descriptors of analytes are regressed against the retention data in a multiple regression, i.e., principal component regression (PCR). Also, the partial least squares (PLS) approach with cross-validation [29] finds application in QSRR. Recommendations for reporting the results of PCA have been published [130]. [Pg.519]

The number of principal components that should be used for a PLS-1 or PLS-2 analysis is usually determined by first calculating the root-mean-square error of cross-validation (RMSECV) using one principal component (PC). The process is repeated using 2, 3, 4, and so on, PCs. The RMSECV, which is sometimes called... [Pg.217]
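
A sketch of that procedure, assuming hypothetical calibration data; the RMSECV is computed for an increasing number of PLS components so that the resulting curve can be inspected for its minimum or first levelling-off:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict, KFold

rng = np.random.default_rng(6)
X = rng.normal(size=(60, 150))           # hypothetical spectra
y = rng.normal(size=60)                  # hypothetical reference values (PLS-1: single y)

cv = KFold(n_splits=6, shuffle=True, random_state=0)
rmsecv = []
for k in range(1, 11):                   # 1, 2, 3, ... components
    y_cv = cross_val_predict(PLSRegression(n_components=k), X, y, cv=cv)
    rmsecv.append(np.sqrt(np.mean((y - y_cv.ravel()) ** 2)))

print(np.round(rmsecv, 3))               # inspect the RMSECV curve
```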

Figure 8: PLS analysis derives vectors u and t from the Y block (or y vector; BAi = logarithms of relative affinities or other biological activities) and the X block (Sij = steric field variable of molecule i in grid point j; Eij = electrostatic field variable of molecule i in grid point j) that are related to principal components. These latent variables are skewed within their confidence hyperboxes to achieve a maximum intercorrelation (diagram). SAMPLS is a PLS modification which first derives the covariance matrix of the X block and then the PLS result from this covariance matrix. Especially in cross-validation (see below), SAMPLS analysis is much faster than ordinary PLS analysis.
