Big Chemical Encyclopedia


Leave one out

The maximum number of latent variables is the smaller of the number of x variables or the number of molecules. However, there is an optimum number of latent variables in the model beyond which the predictive ability of the model does not increase. A number of methods have been proposed to decide how many latent variables to use. One approach is cross-validation, in which latent variables are added successively. Both leave-one-out and group-based methods can be applied. As the number of latent variables increases, the cross-validated r² will first increase and then either reach a plateau or even decrease. Another parameter that can be used to choose the appropriate number of latent variables is the standard deviation of the error of the predictions, S_PRESS ... [Pg.725]
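The behaviour described above, with the cross-validated error falling as latent variables are added and then levelling off or rising, can be sketched with a small numpy-only example. Principal component regression stands in here for any latent-variable model; the function names (`pcr_predict`, `loo_press_curve`) and the synthetic two-factor data are our own illustration, not from the text.

```python
import numpy as np

def pcr_predict(X_train, y_train, X_test, n_comp):
    """Principal component regression using n_comp latent variables."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    # latent variables = leading right singular vectors of the centred X
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_comp].T
    T = Xc @ V                                          # scores
    b, *_ = np.linalg.lstsq(T, y_train - y_train.mean(), rcond=None)
    return y_train.mean() + (X_test - mu) @ V @ b

def loo_press_curve(X, y, max_comp):
    """PRESS for 1..max_comp latent variables via leave-one-out CV."""
    n = len(y)
    press = []
    for a in range(1, max_comp + 1):
        s = 0.0
        for i in range(n):
            mask = np.arange(n) != i
            pred = pcr_predict(X[mask], y[mask], X[i:i + 1], a)[0]
            s += (y[i] - pred) ** 2
        press.append(s)
    return press

# synthetic data generated from exactly two underlying factors
rng = np.random.default_rng(0)
n, m = 25, 6
T_true = rng.normal(size=(n, 2))
X = T_true @ rng.normal(size=(2, m)) + 0.01 * rng.normal(size=(n, m))
y = T_true @ np.array([1.5, -2.0]) + 0.05 * rng.normal(size=n)
curve = loo_press_curve(X, y, 4)
```

With two true factors, PRESS drops sharply from one to two latent variables and gains little afterwards, which is exactly the plateau the excerpt describes.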

The second task discussed is the validation of the regression models with the aid of cross-validation (CV) procedures. Both the leave-one-out (LOO) and the leave-many-out CV methods are used to evaluate the predictive capabilities of QSAR models. In the case of noisy and/or heterogeneous data, the LM method is shown to substantially outperform the LS one with respect to the suitability of the regression models built. Especially noticeable distinctions between the LS and LM methods are demonstrated with the use of the LOO CV criterion. [Pg.22]

Return to Step 2, above. Add the new PRESS value calculated in Step 3 to the PRESS values calculated so far. Continue this process until PRESS values for all combinations of "leave one out" have been computed and summed. [Pg.108]

This procedure is known as "leave one out" cross-validation. It is not the only way to do cross-validation: we could equally leave out all permutations of any number of samples from the training set. The only constraint is the size of the training set itself. Nonetheless, whenever the term cross-validation is used, it almost always refers to "leave one out" cross-validation. [Pg.108]
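The summed-PRESS loop described in the two excerpts above can be written out directly. This is a minimal sketch with ordinary least squares as the model; `loo_press` is an invented name and the data are synthetic.

```python
import numpy as np

def loo_press(X, y):
    """Sum of squared leave-one-out prediction errors (PRESS) for an OLS model."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        # fit on the n-1 remaining samples ...
        coef, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        # ... predict the held-out sample and accumulate its squared error
        press += (y[i] - X[i] @ coef) ** 2
    return press

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=20)
press = loo_press(X, y)
```

On exactly linear data PRESS is (numerically) zero, since every held-out point is predicted perfectly; with noise it measures how well each point is predicted by a model that never saw it.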

Many people use the term PRESS to refer to the result of leave-one-out cross-validation. This usage is especially common among statisticians. For this reason, the terms PRESS and cross-validation are sometimes used interchangeably. However, there is nothing innate in the definition of PRESS that need restrict it to a particular set of predictions. As a result, many in the chemometrics community use the term PRESS more generally, applying it to predictions other than just those produced during cross-validation. [Pg.168]

When applied to QSAR studies, the activity of molecule u is calculated simply as the average activity of the K nearest neighbors of molecule u. An optimal K value is selected by optimization, either through the classification of a test set of samples or by leave-one-out cross-validation. Many variations of the kNN method have been proposed in the past, and new and fast algorithms have continued to appear in recent years. The automated variable selection kNN QSAR technique optimizes the selection of descriptors to obtain the best models [20]. [Pg.315]
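Selecting K by leave-one-out cross-validation, as described above, might look like the following numpy sketch. All names (`knn_predict`, `loo_select_k`) and the one-descriptor synthetic data are illustrative, not from the cited work.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Predicted activity of x = mean activity of its k nearest neighbours."""
    dist = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argsort(dist)[:k]].mean()

def loo_select_k(X, y, k_values):
    """Return the K (and its error) minimising the LOO squared prediction error."""
    n = len(y)
    best_k, best_err = None, np.inf
    for k in k_values:
        err = 0.0
        for i in range(n):
            mask = np.arange(n) != i           # leave molecule i out
            err += (y[i] - knn_predict(X[mask], y[mask], X[i], k)) ** 2
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 1))           # one synthetic descriptor
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
best_k, best_err = loo_select_k(X, y, [1, 3, 5, 7])
```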

In QSAR equations, n is the number of data points, r is the correlation coefficient between observed values of the dependent variable and the values predicted from the equation, r² is the square of the correlation coefficient and represents the goodness of fit, q² is the cross-validated r² (a measure of the quality of the QSAR model), and s is the standard deviation. The cross-validated r² (q²) is obtained by using the leave-one-out (LOO) procedure [33]. Q is the quality factor (quality ratio), where Q = r/s. Chance correlation, due to an excessive number of parameters (which also increases the r and s values), can ... [Pg.47]

It is usual to have the coefficient of determination, r², and the standard deviation or RMSE reported for such QSPR models, where the latter two are essentially identical. The r² value indicates how well the model fits the data: given an r² value close to 1, most of the variation in the original data is accounted for. However, even an r² of 1 provides no indication of the predictive properties of the model. Therefore, leave-one-out tests of predictivity are often reported with a QSAR, in which sequentially all but one compound are used to generate a model and the remaining one is predicted. The analogous statistical measures resulting from such leave-one-out cross-validation are often denoted as q² and S_PRESS. Nevertheless, care must be taken even with respect to such predictivity measures, because they can be considerably misleading if clusters of similar compounds are present in the dataset. [Pg.302]

Fig. 36.10. Prediction error (RMSPE) as a function of model complexity (number of factors) obtained from leave-one-out cross-validation using PCR (o) and PLS ( ) regression.
Further analysis yielded new models for each of the chemical classes with improved statistical significance. The final model for the nonaromatics contained six descriptors and had an r² of 0.932 (leave-one-out 0.878), the final model for the aromatics contained 21 descriptors and had an r² of 0.942 (leave-one-out 0.823), and the final model for the heteroaromatics contained 13 descriptors and had an r² of 0.863 (leave-one-out 0.758). These statistical results were considered reliable enough for the models to be regarded as predictive. The analysis did yield some interesting insights into the impact of various structural fragments on human oral bioavailability. However, these observations were based on the sign of the coefficients and so must be treated with some caution. [Pg.450]

The predictive quality of the models is judged according to the cross-validated R², known as q², obtained using the leave-one-out (LOO) approach, which is calculated as follows ... [Pg.486]
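The formula itself is truncated in this excerpt; the conventional leave-one-out definition of q², which is presumably what the source goes on to give, is

```latex
q^2 \;=\; 1 - \frac{\mathrm{PRESS}}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    \;=\; 1 - \frac{\sum_{i=1}^{n} \bigl( y_i - \hat{y}_{(i)} \bigr)^2}
                   {\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

where ŷ₍ᵢ₎ is the value predicted for compound i by the model fitted with compound i left out, and ȳ is the mean observed value.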

Here, ŷ denotes the y values estimated using OLS for all n observations. Therefore, for estimating the prediction error with full CV, it suffices to perform a single OLS regression with all n observations. This yields the same estimate of the prediction error as leave-one-out CV, but saves a lot of computing time. [Pg.143]
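The shortcut mentioned here can be checked numerically. The standard identity behind it is that, for OLS, each leave-one-out residual equals the full-model residual divided by 1 − hᵢᵢ, where hᵢᵢ is the i-th diagonal element of the hat matrix, so a single full-data fit suffices. A sketch (function names and data are ours):

```python
import numpy as np

def press_explicit(X, y):
    """PRESS by refitting OLS n times, leaving out one observation each time."""
    n = len(y)
    total = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        total += (y[i] - X[i] @ coef) ** 2
    return total

def press_single_fit(X, y):
    """Same quantity from one full-data fit: e_(i) = e_i / (1 - h_ii)."""
    H = X @ np.linalg.pinv(X)          # hat matrix X (X'X)^{-1} X'
    resid = y - H @ y                  # ordinary residuals
    return np.sum((resid / (1 - np.diag(H))) ** 2)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 3))])
y = X @ np.array([1.0, 0.5, -1.0, 2.0]) + rng.normal(size=30)
p1 = press_explicit(X, y)
p2 = press_single_fit(X, y)
```

The two routes agree to machine precision, but the second needs only one regression instead of n.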

The most reliable approach would be an exhaustive search among all possible variable subsets. Since each variable can either enter the model or be omitted, there are 2^m - 1 possible models for a total number of m available regressor variables. For 10 variables there are about 1000 possible models, for 20 about one million, and for 30 variables one ends up with more than one billion possibilities, and we are still not in the range for m that is standard in chemometrics. Since the goal is the best possible prediction performance, one would also have to evaluate each model in an appropriate way (see Section 4.2). This makes clear that an expensive evaluation scheme like repeated double CV is not feasible within variable selection, and thus mostly only fit criteria (AIC, BIC, adjusted R², etc.) or fast evaluation schemes (leave-one-out CV) are used for this purpose. It is essential to use performance criteria that consider the number of variables used; for instance, R² alone is not appropriate, because this measure usually increases with an increasing number of variables. [Pg.152]
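For small m, the exhaustive search described above is easy to write down. This hypothetical sketch uses adjusted R² as the fit criterion, since, unlike plain R², it penalizes the number of variables in the subset:

```python
import numpy as np
from itertools import combinations

def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS model (intercept added internally)."""
    n, p = X.shape
    Xi = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    resid = y - Xi @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def best_subset(X, y):
    """Exhaustive search over all 2^m - 1 non-empty variable subsets."""
    m = X.shape[1]
    best, best_score = None, -np.inf
    for size in range(1, m + 1):
        for cols in combinations(range(m), size):
            score = adjusted_r2(X[:, cols], y)
            if score > best_score:
                best, best_score = cols, score
    return best, best_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))                   # m = 6: 2^6 - 1 = 63 candidate models
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=60)
subset, score = best_subset(X, y)
```

With m = 6 this loop visits 63 models; the same loop for m = 30 would visit over a billion, which is the point the paragraph above makes.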

For each chromosome (variable subset), a so-called fitness (response, objective function) has to be determined, which in the case of variable selection is a performance measure of the model created from this variable subset. In most GA applications, only fit criteria that consider the number of variables (AIC, BIC, adjusted R², etc.) are used, together with fast OLS regression and fast leave-one-out CV (see Section 4.3.2). Only rarely are more powerful evaluation schemes applied (Leardi 1994). [Pg.157]
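A toy version of such a GA, with adjusted R² (a fit criterion that penalizes the number of variables) as the fitness, one-point crossover, and bit-flip mutation, might look as follows. Population size, mutation rate, and the truncation-selection scheme are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Adjusted R^2 of the OLS model built from the selected variables."""
    if not mask.any():
        return -np.inf                          # empty subset: invalid chromosome
    n = len(y)
    Xs = np.column_stack([np.ones(n), X[:, mask]])
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    p = mask.sum()
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05):
    """Genetic algorithm over boolean chromosomes = variable subsets."""
    m = X.shape[1]
    pop = rng.random((pop_size, m)) < 0.5       # random initial chromosomes
    for _ in range(generations):
        scores = np.array([fitness(c, X, y) for c in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]   # truncation selection (top half)
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, m)            # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(m) < p_mut      # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(c, X, y) for c in pop])
    return pop[np.argmax(scores)]

X = rng.normal(size=(80, 8))
y = 3.0 * X[:, 0] + 2.0 * X[:, 3] + 0.1 * rng.normal(size=80)
best = ga_select(X, y)                          # boolean mask of selected variables
```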







© 2024 chempedia.info