Big Chemical Encyclopedia


Cross-validation technique

In PAT, one is often faced with the task of building, optimizing, evaluating, and deploying a model based on a limited set of calibration data. In such a situation, model validation and cross-validation techniques can be used to perform two of these functions: to optimize the model by determining the optimal model complexity, and to perform a preliminary evaluation of the model's performance before it is deployed. Several validation methods are commonly used in PAT applications, and some of these are discussed below. [Pg.408]
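
As a minimal illustration of these two uses, the sketch below (hypothetical data and variable names, not taken from the text) first selects the model complexity by cross-validation and then reports the cross-validated error at the chosen complexity as a preliminary performance estimate; a PLS calibration model is assumed purely for concreteness.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical calibration data: spectra X and reference values y.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 30))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=40)

cv = KFold(n_splits=5, shuffle=True, random_state=1)

# 1) Optimise model complexity: score each candidate number of PLS factors.
scores = {a: cross_val_score(PLSRegression(n_components=a), X, y, cv=cv,
                             scoring="neg_root_mean_squared_error").mean()
          for a in range(1, 11)}
best_a = max(scores, key=scores.get)

# 2) Preliminary performance estimate at the chosen complexity (an RMSECV).
print(best_a, -scores[best_a])
```

Note that using the same folds both to pick the complexity and to quote the error is slightly optimistic; an independent test set, as discussed later in this section, gives a more honest estimate.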

In this example, two principal components are arbitrarily selected. More or fewer may be necessary, and this is a function of a predetermined stopping rule for extraction of principal components from X. In the SIMCA method, a cross-validation technique (2) is used. [Pg.246]

Schmoor C, Sauerbrei W, Schumacher M (2000). Sample size considerations for the evaluation of prognostic factors in survival analysis. Statistics in Medicine 19: 441-452.
Schumacher M, Hollander N, Sauerbrei W (1997). Resampling and cross-validation techniques: a tool to reduce bias caused by model building? Statistics in Medicine 16: 2813-2827. [Pg.193]

Burden, F.R., Brereton, R.G. and Walsh, P.T. (1997). A Comparison of Cross-Validation and non-Cross-Validation Techniques: Application to Polycyclic Aromatic Hydrocarbons Electronic Absorption Spectra. The Analyst, 122, 1015-1022. [Pg.545]

It is important to know how many principal components (factors) should be retained to accurately describe the data matrix D in Eq. (15) while still reducing the amount of noise. A common approach is the cross-validation technique, which provides a pseudo-predictive method to estimate the number of factors to retain. The cross-validation technique leaves a percentage of the data (y %) out at a time. Using this reduced data set, PCA is again carried out to provide new loadings and scores. These are then used to predict the deleted data and to calculate the ensuing error defined by... [Pg.56]
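
The error expression itself is not reproduced in this excerpt. As a rough sketch of the idea, the code below (hypothetical function and variable names) leaves out a fraction of the rows of D at a time, refits PCA on the remainder, and accumulates the squared error of reconstructing the deleted rows for each candidate number of factors. This row-wise variant is a simplification; the next excerpt explains why element-wise leave-out is preferred when strictly independent estimates are needed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

def pca_cv_press(D, max_factors=10, n_splits=5, random_state=0):
    """PRESS versus number of PCA factors, using row-wise leave-out."""
    D = np.asarray(D, dtype=float)
    press = np.zeros(max_factors)          # max_factors must not exceed the rank of D
    for train_idx, test_idx in KFold(n_splits, shuffle=True,
                                     random_state=random_state).split(D):
        for a in range(1, max_factors + 1):
            pca = PCA(n_components=a).fit(D[train_idx])
            # Project the deleted rows onto the loadings and reconstruct them.
            recon = pca.inverse_transform(pca.transform(D[test_idx]))
            press[a - 1] += np.sum((D[test_idx] - recon) ** 2)
    return press   # the first minimum (or levelling-off) suggests the number of factors
```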

In order to obtain independent model estimates in component models, a different cross-validation technique has to be used. Instead of leaving out a complete sample/row, it is possible to leave out only one (or a few) elements. Using an algorithm that handles missing data, it is possible to estimate the relevant component model without the left-out element. The estimate of this element is obtained from the model of the whole data array (Σ_r t_ir p_jr in the case of leaving out element x_ij in PCA), so that there is no dependence between the left-out element and the model. This is the basis for the cross-validation routines in several papers [Louwerse et al. 1999]. [Pg.149]
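
A minimal sketch of this element-wise approach, assuming an iterative (EM-style) missing-data PCA in which the left-out element is repeatedly re-estimated from the current model; the function names are illustrative and not taken from the cited papers, and the double loop is written for clarity rather than speed.

```python
import numpy as np
from sklearn.decomposition import PCA

def predict_left_out_element(X, i, j, n_components, n_iter=50):
    """Estimate X[i, j] from a PCA model fitted with that element treated as missing."""
    X_work = np.array(X, dtype=float)
    # Initialise the "missing" element with its column mean (excluding itself).
    X_work[i, j] = np.delete(X_work[:, j], i).mean()
    for _ in range(n_iter):
        pca = PCA(n_components=n_components).fit(X_work)
        recon = pca.inverse_transform(pca.transform(X_work))
        X_work[i, j] = recon[i, j]   # replace the element by its model estimate (sum_r t_ir * p_jr)
    return X_work[i, j]

def elementwise_press(X, n_components):
    """Accumulate squared prediction errors over every left-out element."""
    X = np.asarray(X, dtype=float)
    return sum((X[i, j] - predict_left_out_element(X, i, j, n_components)) ** 2
               for i in range(X.shape[0]) for j in range(X.shape[1]))
```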

Cross-validation estimates model robustness and predictivity to avoid overfitting in QSAR [27]. In 3D-QSAR models, PLS and NN model complexity are established by testing the significance of adding a new dimension to the current QSAR, i.e., a PLS component or a hidden neuron, respectively. The optimal number of PLS components or hidden neurons is usually chosen from the analysis with the highest q² (cross-validated r²) value, Eq. (3). The most popular cross-validation technique is leave-one-out (LOO), where each compound is left out of the model once and only once, yielding reproducible results. An extremely fast LOO method, SAMPLS [42], which evaluates only the covariance matrix, allows the end user to rapidly estimate the robustness of 3D-QSAR models. Randomly repeated cross-validation rounds using leave-20%-out (L5O) or leave-50%-out (L2O) are routinely used to check internal... [Pg.574]
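
SAMPLS itself is not reproduced here; as a plain (and slower) illustration of a leave-one-out q², the sketch below refits an ordinary PLS model once per left-out compound, with hypothetical arrays X (descriptor fields) and y (activities).

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut

def loo_q2(X, y, n_components):
    """Leave-one-out q2 = 1 - PRESS / SS for a PLS model."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    press, ss = 0.0, np.sum((y - y.mean()) ** 2)
    for train, test in LeaveOneOut().split(X):
        model = PLSRegression(n_components=n_components).fit(X[train], y[train])
        press += np.sum((y[test] - model.predict(X[test]).ravel()) ** 2)
    return 1.0 - press / ss

# The number of PLS components giving the highest q2 is typically retained.
```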

An alternative approach to using traditional parametric statistical methods to calculate the significance of fitted correlations would be to assess the model directly on its ability to predict, rather than merely on how well it fits the training set. When the quality of the model is assessed by the prediction of a test set, rather than by the fit of the model to its training set, a statistic related to r or r² can be defined, denoted q or q², to indicate that the quality measure is assessed in prediction. A q² may be calculated by internal cross-validation techniques, or from the quality of predictions of an independent test set, in which case an upper-case Q² is used. The equation to calculate q² (or Q²) is shown in equation 9.3. [Pg.248]
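
Equation 9.3 is not reproduced in this excerpt. For reference, the conventional definition used in the QSAR literature is (the book's exact notation may differ slightly):

$$
q^2 \;=\; 1 \;-\; \frac{\sum_i \left(y_i^{\mathrm{obs}} - y_i^{\mathrm{pred}}\right)^2}
                       {\sum_i \left(y_i^{\mathrm{obs}} - \bar{y}^{\mathrm{obs}}\right)^2}
      \;=\; 1 \;-\; \frac{\mathrm{PRESS}}{\mathrm{SS}}
$$

where the predictions come from internal cross-validation for q² and from an independent test set for Q².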

Correlation of experimental and calculated activities assesses the quality of 3D-QSAR models. The squared correlation coefficient (r²) yielded by this statistic is a measure of the goodness of fit. The robustness of the model is tested via cross-validation techniques (leave-x%-out), indicating the goodness of prediction (q²). Models with q² > 0.4-0.5 are considered to yield reasonable predictions for hypo-... [Pg.1179]

To assess the predictive ability of a QSAR in the framework of the MTD method, the cross-validation technique is used, in which one supposes that one or more of the known experimental values are in fact "unknown". The analysis is repeated, excluding the temporarily "unknown" compounds. The resulting equations are used to predict the experimental measurements for the omitted compound(s), and the resulting individual squared errors of prediction are accumulated. The cross-validation cycle is repeated, leaving out one (LOO) or more (LMO) different compound(s), until each compound has been excluded and predicted exactly once. The result of cross-validation is the predictive discrepancy sum of squares, sometimes called PRESS (Predictive REsidual Sum of Squares)... [Pg.360]
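
The PRESS expression itself is cut off in this excerpt. As an illustration only, the sketch below accumulates PRESS over a LOO (or LMO) cycle with an ordinary least-squares regression standing in for the MTD equations; the array names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, KFold

def press_cv(X, y, cv=None):
    """PREdictive REsidual Sum of Squares accumulated over a cross-validation cycle."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    if cv is None:
        cv = LeaveOneOut()             # LOO by default; pass e.g. KFold(5) for LMO
    press = 0.0
    for train, test in cv.split(X):
        # Refit with the temporarily "unknown" compounds excluded (least squares with intercept).
        coef, *_ = np.linalg.lstsq(np.c_[np.ones(len(train)), X[train]],
                                   y[train], rcond=None)
        y_hat = np.c_[np.ones(len(test)), X[test]] @ coef
        press += np.sum((y[test] - y_hat) ** 2)
    return press
```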

Several cross-validation techniques are readily implemented in GOLPE (Generating Optimal Linear PLS Estimations). This program performs chemometric analyses on GRID and CoMFA fields, and it can be used to further refine the PLS model. In the two-random-groups cross-validation procedure,... [Pg.154]
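
The excerpt breaks off before describing the procedure, and GOLPE itself is not reproduced here. Purely as an illustration of one common reading of a "two random groups" scheme, the sketch below randomly splits the compounds into two halves, predicts each half from a PLS model built on the other, and repeats the split several times; all names and defaults are hypothetical.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def two_random_groups_q2(X, y, n_components, n_rounds=20, seed=0):
    """Repeated two-random-groups cross-validation (illustrative, not GOLPE itself)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    ss = np.sum((y - y.mean()) ** 2)
    q2_values = []
    for _ in range(n_rounds):
        idx = rng.permutation(len(y))
        g1, g2 = idx[: len(y) // 2], idx[len(y) // 2 :]
        press = 0.0
        for test, train in ((g1, g2), (g2, g1)):
            model = PLSRegression(n_components=n_components).fit(X[train], y[train])
            press += np.sum((y[test] - model.predict(X[test]).ravel()) ** 2)
        q2_values.append(1.0 - press / ss)
    return float(np.mean(q2_values)), float(np.std(q2_values))
```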

In practice, the soft-margin versions of the standard SVM (also known as C-SVM) described in the previous sections often suffer from the following problems. Firstly, there is the problem of how to determine the error penalty parameter C: although a cross-validation technique can be used to determine this parameter, its value is still hard to interpret. Secondly, the time taken for a support vector classifier to compute the class of a new sample is proportional to the number of support vectors, so if that number is large, the computation is time-consuming. [Pg.51]
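
In practice the penalty parameter C is usually chosen by a grid search combined with k-fold cross-validation. The sketch below shows this with scikit-learn on a small synthetic data set; the candidate C values and the data are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Hypothetical two-class calibration set.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

param_grid = {"C": [0.01, 0.1, 1, 10, 100]}             # candidate penalty values
search = GridSearchCV(SVC(kernel="rbf", gamma="scale"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)          # chosen C and its CV accuracy
```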

The quality of the developed PLS model was evaluated by a cross-validation technique. The values of the root mean square error of cross-validation obtained are relatively low, 1.68% and 1.32% (volume/volume) for corn oil and sunflower oil, respectively. Based on this result, the method developed has a good ability to estimate the percentage of corn oil and sunflower oil as adulterants in virgin coconut oil samples. [Pg.150]
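
The spectra from this study are not available here; the sketch below only shows how a root mean square error of cross-validation (RMSECV) of this kind is typically computed for a PLS model, with hypothetical arrays X (spectra) and y (adulterant concentration, % v/v).

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_predict

def rmsecv(X, y, n_components, n_splits=10):
    """Root mean square error of cross-validation for a PLS model."""
    y = np.asarray(y, dtype=float)
    y_cv = cross_val_predict(PLSRegression(n_components=n_components), X, y,
                             cv=KFold(n_splits, shuffle=True, random_state=0)).ravel()
    return np.sqrt(np.mean((y - y_cv) ** 2))
```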

