Regression cross model validation

L. Gidskehaug, E. Anderssen and B.K. Alsberg, Cross model validation and optimisation of bilinear regression models, Chemom. Intell. Lab. Syst., 93, 1-10 (2008). [Pg.438]

F. Westad, N.K. Afseth and R. Bro, Finding relevant spectral regions between spectroscopic techniques by use of cross-model validation and partial least squares regression, Anal. Chim. Acta, 595, 323-327 (2007). [Pg.185]

The literature of the past three decades has witnessed a tremendous explosion in the use of computed descriptors in QSAR. This has, however, exacerbated another problem: rank deficiency, which occurs when the number of independent variables (descriptors) exceeds the number of observations. Stepwise regression and similar approaches, which are popular when the data are rank deficient, often yield overly optimistic and statistically invalid predictive models; such models fail to predict the properties of future, untested compounds similar to those used to develop them. If subset selection is performed, it must be done within the model validation step rather than outside it, so that validation provides an honest measure of the model's predictive ability, i.e., the true q² [39,40,68,69]. Unfortunately, many published QSAR studies perform subset selection first and validate the model afterwards, which yields a naive q² that inflates the predictive ability of the model. The following steps outline the proper sequence of events for descriptor thinning and LOO cross-validation, e.g.,... [Pg.492]
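The difference between a naive and a true q² can be illustrated with a small simulation. The sketch below is not the procedure of refs [39,40,68,69]; it assumes a hypothetical thinning rule (keep the k descriptors most correlated with the response) and plain least squares. The descriptors and response are pure noise, so any apparent predictive ability is spurious: selecting descriptors once from all data and then running LOO (naive) should report a much higher q² than repeating the selection inside every fold (true).

```python
import numpy as np


def select_top_k(X, y, k):
    """Hypothetical descriptor-thinning rule: keep the k columns of X with
    the largest absolute Pearson correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(corr)[::-1][:k]


def ols_predict(X_train, y_train, X_test):
    """Fit ordinary least squares with an intercept and predict X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def loo_q2(X, y, k, select_inside):
    """LOO q2 = 1 - PRESS / SS_tot. If select_inside is False, descriptors
    are chosen once from all data (naive q2); otherwise the selection is
    repeated in every fold without the left-out sample (true q2)."""
    n = len(y)
    cols_all = select_top_k(X, y, k)
    y_pred = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        cols = select_top_k(X[train], y[train], k) if select_inside else cols_all
        y_pred[i] = ols_predict(X[train][:, cols], y[train], X[i:i + 1, cols])[0]
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(0)
n, p, k = 30, 200, 5
X = rng.normal(size=(n, p))   # 200 pure-noise "descriptors" for 30 compounds
y = rng.normal(size=n)        # response unrelated to X
print("naive q2:", round(loo_q2(X, y, k, select_inside=False), 2))
print("true q2: ", round(loo_q2(X, y, k, select_inside=True), 2))
```

Only the placement of `select_top_k` differs between the two runs, which is the point made in the text: the left-out compound must not influence which descriptors are kept.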

Spectrophotometric monitoring with the aid of chemometrics has also been applied to more complex mixtures. To resolve mixtures of the corticosteroid dexamethasone sodium phosphate and vitamins B6 and B12, multivariate calibration by partial least-squares regression is used, and the model is evaluated by cross-validation on a number of synthetic mixtures. The compensation method and orthogonal-function and difference spectrophotometry are applied to the direct determination of omeprazole, lansoprazole, and pantoprazole in gastroresistant formulations. Inverse least squares and PCA techniques have been proposed for the spectrophotometric analysis of metamizol, acetaminophen, and caffeine without prior separation. Ternary and quaternary mixtures have also been resolved using iterative algorithms. [Pg.4519]
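The inverse least squares (ILS) idea, regressing concentrations directly on absorbances at a few wavelengths, can be sketched on simulated data. This is not the published method for metamizol, acetaminophen, and caffeine: the three Gaussian "pure spectra", the six wavelengths, and the noise level are all hypothetical. The model is assessed with a leave-one-out RMSECV over synthetic ternary mixtures.

```python
import numpy as np

rng = np.random.default_rng(1)
wl = np.linspace(220, 320, 101)                   # hypothetical wavelength grid (nm)


def band(center, width):
    """Gaussian absorption band used as a stand-in pure-component spectrum."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


# Hypothetical absorptivity spectra of three analytes (one per row)
S = np.vstack([band(245, 12), band(270, 15), band(290, 10)])
# ILS uses only a few wavelengths: fewer than the number of calibration mixtures
idx = [int(np.argmin(np.abs(wl - w))) for w in (240, 250, 265, 275, 285, 295)]


def make_mixtures(n):
    """Synthetic mixtures obeying Beer's law plus small measurement noise."""
    C = rng.uniform(0.1, 1.0, size=(n, 3))
    A = C @ S + rng.normal(scale=0.005, size=(n, wl.size))
    return C, A


def ils_fit(A, C):
    """Inverse model C = A[:, idx] @ B, solved by least squares."""
    B, *_ = np.linalg.lstsq(A[:, idx], C, rcond=None)
    return B


def loo_rmsecv(A, C):
    """Leave-one-out root-mean-square error of cross-validation per analyte."""
    n = len(C)
    pred = np.empty_like(C)
    for i in range(n):
        keep = np.arange(n) != i
        pred[i] = A[i, idx] @ ils_fit(A[keep], C[keep])
    return np.sqrt(np.mean((pred - C) ** 2, axis=0))


C_cal, A_cal = make_mixtures(25)
print("RMSECV per analyte:", np.round(loo_rmsecv(A_cal, C_cal), 4))
```

Because ILS inverts a least-squares problem over the selected wavelengths, it needs more calibration mixtures than wavelengths; full-spectrum methods such as PLS remove that restriction.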

The second task discussed is the validation of the regression models by cross-validation (CV) procedures. Both leave-one-out (LOO) and leave-many-out CV are used to evaluate the predictive capability of QSAR models. For noisy and/or heterogeneous data, the LM method is shown to clearly outperform the LS method in terms of the suitability of the resulting regression models. The differences between the LS and LM methods are especially noticeable under the LOO CV criterion. [Pg.22]
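A minimal sketch of LOO and leave-many-out (LMO) cross-validated q² for a linear QSAR model is shown below. It uses ordinary least squares only; the LM method from the text is not reproduced, although the hypothetical `fit` argument shows where a different fitting criterion could be plugged in. The descriptors, coefficients, group size m = 8, and the 20 repetitions are illustrative assumptions.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares (LS) fit with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def q2_cv(X, y, folds, fit=ols_fit):
    """Cross-validated q2 = 1 - PRESS / SS_tot. `folds` is a list of index
    arrays; each group is predicted by a model built without it."""
    y_pred = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        coef = fit(X[train], y[train])
        y_pred[test] = coef[0] + X[test] @ coef[1:]
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)


def loo_folds(n):
    """Leave-one-out: every sample forms its own group."""
    return [np.array([i]) for i in range(n)]


def lmo_folds(n, m, rng):
    """Leave-many-out: random partition into groups of roughly m samples."""
    return np.array_split(rng.permutation(n), int(np.ceil(n / m)))


rng = np.random.default_rng(2)
n = 40
X = rng.normal(size=(n, 3))                      # hypothetical descriptors
y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.3, size=n)

q2_loo = q2_cv(X, y, loo_folds(n))
# Leave-many-out with m = 8, averaged over 20 random partitions
q2_lmo = np.mean([q2_cv(X, y, lmo_folds(n, 8, rng)) for _ in range(20)])
print(f"LOO q2 = {q2_loo:.3f}, LMO q2 = {q2_lmo:.3f}")
```

LMO is usually repeated over several random partitions, as here, because a single split can give a noticeably optimistic or pessimistic q².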

Fig. 36.10. Prediction error (RMSPE) as a function of model complexity (number of factors) obtained from leave-one-out cross-validation using PCR (o) and PLS ( ) regression.
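The data behind Fig. 36.10 are not available here, but the kind of curve it shows can be sketched for PCR on simulated data. The sketch assumes three hypothetical latent factors generating both X and y; RMSPE is computed by leave-one-out cross-validation for an increasing number of factors and should fall steeply until the true number of factors is reached.

```python
import numpy as np


def pcr_fit(X, y, a):
    """Principal component regression with a factors on mean-centred data."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:a].T                                  # loadings of the first a PCs
    T = Xc @ P                                    # scores
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return x_mean, y_mean, P @ q                  # regression vector in X space


def pcr_predict(model, X):
    x_mean, y_mean, b = model
    return y_mean + (X - x_mean) @ b


def loo_rmspe(X, y, max_factors):
    """RMSPE from leave-one-out CV for 1..max_factors factors."""
    n = len(y)
    sq_err = np.zeros(max_factors)
    for i in range(n):
        keep = np.arange(n) != i
        for a in range(1, max_factors + 1):
            y_hat = pcr_predict(pcr_fit(X[keep], y[keep], a), X[i:i + 1])[0]
            sq_err[a - 1] += (y[i] - y_hat) ** 2
    return np.sqrt(sq_err / n)


rng = np.random.default_rng(4)
T_true = rng.normal(size=(30, 3))                 # three hypothetical latent factors
X = T_true @ rng.normal(size=(3, 50)) + 0.05 * rng.normal(size=(30, 50))
y = T_true @ np.array([1.0, -1.0, 0.5]) + 0.05 * rng.normal(size=30)

rmspe = loo_rmspe(X, y, max_factors=6)
for a, e in enumerate(rmspe, start=1):
    print(a, round(float(e), 3))
```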

Like ANNs, SVMs can be useful when the x-y relationships are highly nonlinear and poorly understood. Several parameters must be optimized, including the severity of the cost penalty, the threshold fit error, and the nature of the nonlinear kernel. However, if these parameters are carefully optimized by cross-validation (Section 12.4.3) or similar methods, SVMs are less susceptible to overfitting than ANNs. Furthermore, SVMs are relatively simple to deploy compared with other nonlinear modeling alternatives (such as local regression, ANNs, and nonlinear variants of PLS), because the model can be expressed completely in terms of a relatively small number of support vectors. More details on SVMs can be found in several references [70-74]. [Pg.389]
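Tuning SVM parameters by cross-validation can be sketched with scikit-learn's support vector regressor. The mapping is an assumption: the cost penalty is taken to correspond to `C`, the threshold fit error to the width `epsilon` of the error-insensitive tube, and the kernel choice to `kernel`. The simulated nonlinear data, the parameter grid, the feature scaling step, and the 5-fold split are illustrative choices, not the settings of the cited work.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(80, 2))
# Hypothetical nonlinear x-y relationship with a little noise
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + rng.normal(scale=0.1, size=80)

pipe = make_pipeline(StandardScaler(), SVR())
param_grid = {
    "svr__C": [0.1, 1.0, 10.0, 100.0],       # severity of the cost penalty
    "svr__epsilon": [0.01, 0.1, 0.5],        # assumed "threshold fit error"
    "svr__kernel": ["rbf", "poly"],          # nature of the nonlinear kernel
}
search = GridSearchCV(
    pipe,
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

best_svr = search.best_estimator_.named_steps["svr"]
n_sv = len(best_svr.support_)                # the model is defined by these points
print(search.best_params_, "CV MSE:", round(-search.best_score_, 4),
      "support vectors:", n_sv)
```

The number of support vectors printed at the end is what the deployed model has to store, which is the deployment advantage the text refers to.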

Each of the regression models is evaluated for prediction ability, typically using cross-validation. [Pg.424]

The optimal number of components from the prediction point of view can be determined by cross-validation (10). This method compares the predictive power of several models and chooses the optimal one; in our case, the models differ in the number of components. The predictive power is calculated by a leave-one-out technique, so that each sample is predicted once by a model in whose calculation it did not participate. This technique can also be used to determine the number of underlying factors in the predictor matrix, although if the factors are highly correlated, their number will be underestimated. Unlike the least squares solution, PLS can estimate the regression coefficients even for underdetermined systems. In this case, it introduces some bias in exchange for avoiding the (infinite) variance of the least squares solution. [Pg.275]
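A minimal sketch of choosing the number of PLS components by leave-one-out cross-validation follows. It assumes a single response (PLS1 computed by NIPALS) and simulated data with two hypothetical latent factors and more variables than samples (p = 100, n = 20), so ordinary least squares is underdetermined while PLS still returns a regression vector. The optimum is taken as the number of components with the smallest PRESS.

```python
import numpy as np


def pls1_fit(X, y, a):
    """PLS1 (one response) by NIPALS; returns means and regression vector b."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(a):
        w = E.T @ f
        w /= np.linalg.norm(w)                    # weight vector
        t = E @ w                                 # score vector
        tt = t @ t
        p, q = E.T @ t / tt, f @ t / tt           # X loading, y loading
        E, f = E - np.outer(t, p), f - q * t      # deflation
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    b = W @ np.linalg.solve(P.T @ W, np.array(Q))  # b = W (P'W)^-1 q
    return x_mean, y_mean, b


def pls1_predict(model, X):
    x_mean, y_mean, b = model
    return y_mean + (X - x_mean) @ b


def loo_press(X, y, max_comp):
    """PRESS for 1..max_comp components; each sample is predicted once by a
    model in whose calculation it did not participate."""
    n = len(y)
    press = np.zeros(max_comp)
    for i in range(n):
        keep = np.arange(n) != i
        for a in range(1, max_comp + 1):
            y_hat = pls1_predict(pls1_fit(X[keep], y[keep], a), X[i:i + 1])[0]
            press[a - 1] += (y[i] - y_hat) ** 2
    return press


rng = np.random.default_rng(5)
n, p = 20, 100                                    # more variables than samples
T = rng.normal(size=(n, 2)) * np.array([3.0, 1.0])
X = T @ rng.normal(size=(2, p)) + 0.05 * rng.normal(size=(n, p))
y = T @ np.array([0.2, 1.0]) + 0.05 * rng.normal(size=n)

press = loo_press(X, y, max_comp=5)
n_opt = int(np.argmin(press)) + 1
print("PRESS:", np.round(press, 3), "optimal components:", n_opt)
```

When the number of components equals the rank of a well-conditioned X, this PLS1 regression vector coincides with the least squares solution; using fewer components is what trades variance for bias.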

Pioneering work on bioavailability modeling dates back to 2000. Andrews and coworkers [57] developed a regression model to predict bioavailability for 591 compounds. Compared with Lipinski's "Rule of Five," the model reduced false negative predictions from 5% to 3% and false positive predictions from 78% to 53%. The model achieved a relatively good correlation (r² = 0.71) for the training set, but when 80/20 cross-validation was applied, the correlation decreased to q² = 0.58. [Pg.114]

In another work, Parra and coworkers proposed a method based on chemically modified voltammetric electrodes for identifying adulterated wine samples, i.e., samples to which a number of forbidden adulterants frequently used in the wine industry to improve the organoleptic characteristics of wines (for example, tartaric acid, tannic acid, sucrose, and acetaldehyde) had been added (Parra et al., 2006b). The patterns identified via PCA allowed efficient detection of the wine samples that had been artificially modified. In the same study, PLS regression was applied for quantitative prediction of the added substances, and model performance was evaluated by cross-validation. [Pg.99]

S. Gourvenec, J.A. Fernandez-Pierna, D.L. Massart and D.N. Rutledge, An evaluation of the PoLiSh smoothed regression and the Monte Carlo cross-validation for the determination of the complexity of a PLS model, Chemom. Intell. Lab. Syst., 68, 41-51 (2003). [Pg.238]
