Big Chemical Encyclopedia


Leave-one-out approach

The alternative to the leave-one-out approach to the jackknife (JKK) is the grouped or blocked JKK (18). Here there are g blocks of size s. The grouped JKK can save time by executing the PM procedure on the g blocks rather than on every individual observation. Here again, one has an estimator of θ, θ̂₋ᵢ, which is the estimate of θ with the ith block eliminated. Next, a pseudo-value is calculated as follows ... [Pg.403]
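A minimal sketch of the grouped (blocked) jackknife described above, using the sample mean as an illustrative estimator. The pseudo-value formula pᵢ = g·θ̂ − (g − 1)·θ̂₋ᵢ is the standard grouped-jackknife definition (the excerpt truncates it); the function names are assumptions.

```python
import statistics

def grouped_jackknife(blocks, estimator):
    """Grouped (blocked) jackknife: one pseudo-value per block.

    blocks    -- list of g blocks of observations
    estimator -- function mapping a flat list of observations to a number
    """
    g = len(blocks)
    full = [x for b in blocks for x in b]
    theta_hat = estimator(full)                      # estimate from all data
    pseudo = []
    for i in range(g):
        rest = [x for j, b in enumerate(blocks) if j != i for x in b]
        theta_minus_i = estimator(rest)              # estimate with block i removed
        pseudo.append(g * theta_hat - (g - 1) * theta_minus_i)
    # the jackknife estimate is the mean of the pseudo-values
    return statistics.mean(pseudo), pseudo
```

For the mean as estimator, the jackknife estimate reproduces the full-sample mean, which is a quick sanity check for the bookkeeping.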

For this purpose the "leave one out" approach (also known as the jack-knife method (13,20)) was used: the activity of each compound was predicted by the equation obtained from linear regression analysis on the sample leaving out the compound in question. The calculated standard deviation of the difference between the calculated and measured values is, of course, larger than that obtained in the usual way. Its increase, however, brings plausible information on the predictive power of the regression equation, as well as on which compound's removal would increase the significance of the equation most. Also informative is which parameters are considered in the equations obtained in the "leave one out" approach, and how frequently; their continuous and systematic appearance also suggests a "real" correlation. [Pg.178]

The availability of many different similarity measures necessitates methods for comparing their effectiveness. This is normally done using an approach that is based on the similar-property principle. This principle states that structurally similar molecules are expected to exhibit similar properties (or activities). If the principle applies, then the effectiveness of a structurally based similarity procedure can be determined by the extent to which the similarities resulting from its use mirror similarities in property, this being effected by means of simulated property prediction experiments based on a leave-one-out approach. [Pg.2754]

The maximum number of latent variables is the smaller of the number of x values or the number of molecules. However, there is an optimum number of latent variables in the model beyond which the predictive ability of the model does not increase. A number of methods have been proposed to decide how many latent variables to use. One approach is to use a cross-validation method, which involves adding successive latent variables. Both leave-one-out and the group-based methods can be applied. As the number of latent variables increases, the cross-validated R² will first increase and then either reach a plateau or even decrease. Another parameter that can be used to choose the appropriate number of latent variables is the standard deviation of the error of the predictions, s_PRESS ... [Pg.725]
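The plateau rule above can be sketched as a small helper: given the cross-validated R² obtained with 1, 2, 3, ... latent variables, stop adding components once the improvement falls below a tolerance. The function name and the tolerance value are illustrative assumptions, not part of the source.

```python
def choose_n_latent(q2_by_k, tol=0.01):
    """Pick the number of latent variables at which cross-validated R^2
    stops improving by more than `tol` (the plateau rule).

    q2_by_k -- list where q2_by_k[k-1] is the cross-validated R^2
               of the model with k latent variables
    """
    best_k = 1
    for k in range(2, len(q2_by_k) + 1):
        if q2_by_k[k - 1] - q2_by_k[best_k - 1] > tol:
            best_k = k          # still improving: accept one more component
        else:
            break               # plateau (or decrease) reached: stop
    return best_k
```

With q² values of 0.55, 0.70, 0.74, 0.745, 0.73 for 1–5 latent variables, the rule stops at three components, since the fourth adds less than the tolerance.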

This procedure is known as "leave one out" cross-validation. This is not the only way to do cross-validation. We could apply this approach by leaving out all permutations of any number of samples from the training set. The only constraint is the size of the training set itself. Nonetheless, whenever the term cross-validation is used, it almost always refers to "leave one out" cross-validation. [Pg.108]
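The leave-one-out procedure itself can be sketched in a few lines: refit the model with each sample held out in turn and predict that sample from the remaining ones. Here a simple least-squares line stands in for whatever model is being validated; the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def loo_residuals(xs, ys):
    """Leave-one-out cross-validation: refit with sample i held out,
    then predict the held-out sample; returns the CV residuals."""
    res = []
    for i in range(len(xs)):
        xt, yt = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        a, b = fit_line(xt, yt)
        res.append(ys[i] - (a + b * xs[i]))  # cross-validated residual
    return res
```

For perfectly linear data every leave-one-out fit recovers the same line, so all cross-validated residuals vanish, which makes a convenient check.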

The predictive quality of the models is judged according to the cross-validated R2, known as q2, obtained using the leave-one-out (LOO) approach, which is calculated as follows ... [Pg.486]
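The excerpt truncates the q² formula. The standard definition is q² = 1 − PRESS/SS, where PRESS is the sum of squared leave-one-out prediction errors and SS is the total sum of squares about the mean of the observed values; a minimal sketch under that assumption:

```python
def q2(y_obs, y_pred_loo):
    """Cross-validated R^2 (q^2) = 1 - PRESS / SS.

    y_obs      -- observed values
    y_pred_loo -- leave-one-out predictions, y_pred_loo[i] obtained
                  from the model fitted without sample i
    """
    mean_y = sum(y_obs) / len(y_obs)
    press = sum((yo - yp) ** 2 for yo, yp in zip(y_obs, y_pred_loo))
    ss = sum((yo - mean_y) ** 2 for yo in y_obs)
    return 1 - press / ss
```

Perfect LOO predictions give q² = 1, while predicting the mean for every sample gives q² = 0; negative values indicate a model that predicts worse than the mean.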

The most reliable approach would be an exhaustive search among all possible variable subsets. Since each variable could enter the model or be omitted, this gives 2^m − 1 possible models for a total number of m available regressor variables. For 10 variables, there are about 1000 possible models, for 20 about one million, and for 30 variables one ends up with more than one billion possibilities—and we are still not in the range for m that is standard in chemometrics. Since the goal is best possible prediction performance, one would also have to evaluate each model in an appropriate way (see Section 4.2). This makes it clear that an expensive evaluation scheme like repeated double CV is not feasible within variable selection, and thus mostly only fit-criteria (AIC, BIC, adjusted R², etc.) or fast evaluation schemes (leave-one-out CV) are used for this purpose. It is essential to use performance criteria that consider the number of used variables; for instance, simply R² is not appropriate, because this measure usually increases with increasing number of variables. [Pg.152]
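The subset counts quoted above follow directly from 2^m − 1 (each of the m regressors is in or out, minus the empty model), which a one-liner reproduces:

```python
def n_subsets(m):
    """Number of candidate models when each of m regressors may be
    included or excluded, excluding the empty model: 2**m - 1."""
    return 2 ** m - 1

# counts for the three cases mentioned in the text
counts = {m: n_subsets(m) for m in (10, 20, 30)}
```

This gives 1023 models for m = 10, 1 048 575 for m = 20, and 1 073 741 823 for m = 30, matching the "about 1000 / about one million / more than one billion" figures in the passage.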

A common approach to cross-validation is called "leave-one-out" cross-validation. Here one sample is left out, a PC model with a given number of factors is calculated using the remaining samples, and then the residual of the sample left out is computed. This is repeated for each sample and for models with 1 to n PCs. The result is a set of cross-validation residuals for a given number of PCs. The residuals as a function of the number of PCs can be examined graphically as discussed above to determine the inherent dimensionality. In practice, the cross-validation residuals are summarized into a single number termed the Root Mean Squared Error of Cross Validation for PCA (RMSECV PCA), calculated as follows ... [Pg.230]
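The RMSECV formula is truncated in the excerpt; the usual definition is the square root of the mean squared cross-validation residual, pooled over all left-out samples for a given number of PCs. A minimal sketch under that assumption:

```python
import math

def rmsecv(residuals):
    """Root Mean Squared Error of Cross Validation: the square root of
    the mean squared cross-validation residual for one model size."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))
```

Computed for each candidate number of PCs, the RMSECV values condense the residual plots described above into one curve whose minimum (or knee) suggests the inherent dimensionality.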

The software requires the following information: the concentration and spectral data, the preprocessing selections, the maximum number of factors to estimate, and the validation approach used to choose the optimal number of factors. The maximum rank selected is 10 for constructing the model to predict the caustic concentration. The validation technique is leave-one-out cross-validation where an entire design point is left out. That is, there are 12 cross-validation steps, and all spectra for each standard (at various temperatures) are left out of the model building phase at each step. [Pg.341]
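The leave-one-design-point-out scheme above can be sketched as a splitting function: every sample carries a label (the calibration standard it belongs to), and all samples sharing a label are held out together. The function and variable names are illustrative assumptions.

```python
def leave_group_out_splits(labels):
    """Cross-validation splits where all samples sharing a label
    (e.g. all spectra of one calibration standard) are held out together.

    labels -- one group label per sample, in sample order
    Returns a list of (train_indices, test_indices) pairs, one per group.
    """
    splits = []
    for g in sorted(set(labels)):
        train = [i for i, lab in enumerate(labels) if lab != g]
        test = [i for i, lab in enumerate(labels) if lab == g]
        splits.append((train, test))
    return splits
```

With 12 standards this yields exactly 12 splits, matching the 12 cross-validation steps described in the passage, regardless of how many temperature replicates each standard has.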

The cross-validation approach can also be used to estimate the predictive ability of a calibration model. One method of cross-validation is leave-one-out cross-validation (LOOCV). Leave-one-out cross-validation is performed by estimating n calibration models, where each of the n calibration samples is left out one at a time in turn. The resulting calibration models are then used to estimate the sample left out, which acts as an independent validation sample and provides an independent prediction of each y value, ŷ(i), where the notation (i) indicates that the ith sample was left out during model estimation. This process of leaving a sample out is repeated until all of the calibration samples have been left out. The predictions ŷ(i) can be used in Equation 5.10 to estimate the RMSECV. However, LOOCV has been shown to select models that overfit (too many parameters are included) [7, 8]. The same is true for v-fold cross-validation, where the calibration set is split into... [Pg.115]
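The v-fold variant mentioned at the end of the passage partitions the calibration set into v groups rather than leaving out single samples; each fold is held out once while the rest form the calibration set. A sketch of such a splitter (the interleaved fold assignment is one common choice, not prescribed by the source):

```python
def vfold_splits(n, v):
    """Partition sample indices 0..n-1 into v folds for v-fold
    cross-validation; each fold is held out once in turn.

    Returns a list of (train_indices, test_indices) pairs.
    """
    folds = [list(range(i, n, v)) for i in range(v)]  # interleaved assignment
    return [([j for j in range(n) if j not in f], f) for f in folds]
```

Setting v = n recovers leave-one-out cross-validation as a special case, which is why the two schemes share the overfitting tendency noted above.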

The validation of N NMR prediction is best performed by comparing the predicted shifts for compounds not in the database with the experimental shifts available in the literature or measured directly. ACD/Labs have reported [42] a statistical analysis of their N NMR prediction. Using a classical leave-one-out (LOO) approach they predicted the N shifts for >8300 individual chemical structures contained within the ACD/NNMR v 8.08 NNMR program database. The resulting analysis gave a correlation coefficient of 0.97 over 21 244 points. The distribution in deviations between the experimental values and the predicted values using this LOO approach is shown in Figure 14.5. [Pg.420]







© 2024 chempedia.info