Robustness cross-validation

Noteworthy are the articles from Altria et al. and Assi et al. describing the robustness and validation of the determination of potassium as a counterion. An intercompany cross-validation of the determination of sodium in an acidic drug salt was also published. [Pg.338]

A widely used approach to establish model robustness is the randomization of response [25] (i.e., in our case, of activities). It consists of repeating the calculation procedure with randomized activities and subsequently assessing the probability of the resulting statistics. It is frequently used along with cross-validation. Sometimes, models based on the randomized data have high q² values, which can be explained by a chance correlation or structural redundancy [26]. If all QSAR models obtained in the Y-randomization test have relatively high values for both R² and LOO q², it implies that an acceptable QSAR model cannot be obtained for the given dataset by the current modeling method. [Pg.439]
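The randomization test described above can be sketched in a few lines. This is a minimal illustration, not the procedure of the cited work: it assumes a single descriptor, an ordinary least-squares line as the "model", and LOO q² = 1 − PRESS/TSS as the statistic. The function names, the toy data, and the number of trials are all hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def q2_loo(x, y):
    """Leave-one-out q2 = 1 - PRESS/TSS (TSS taken about the overall mean)."""
    n = len(x)
    my = sum(y) / n
    press = 0.0
    for i in range(n):
        # Refit without compound i, then predict it.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    tss = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / tss

def y_randomization(x, y, n_trials=200, seed=0):
    """Compare q2 of the real model with q2 of models built on shuffled activities."""
    rng = random.Random(seed)
    q2_real = q2_loo(x, y)
    q2_random = []
    for _ in range(n_trials):
        y_perm = y[:]
        rng.shuffle(y_perm)
        q2_random.append(q2_loo(x, y_perm))
    # Empirical probability that a randomized model does at least as well.
    p = (1 + sum(q >= q2_real for q in q2_random)) / (n_trials + 1)
    return q2_real, q2_random, p

# Toy data (assumed): activity depends linearly on one descriptor plus small noise.
x = [float(i) for i in range(12)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1, -0.2, 0.2]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]

q2_real, q2_random, p = y_randomization(x, y)
print(f"q2 (real) = {q2_real:.3f}, max q2 (randomized) = {max(q2_random):.3f}, p = {p:.3f}")
```

If the randomized models regularly reached q² values close to that of the real model, the real model's apparent quality would be suspect, which is the situation the passage warns about.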

For a well-behaved sensor array, only a small subset k of the n available PCs is sufficient to characterize the matrix. Once again, Principal Component Regression (PCR) is a data reduction tool. The robustness of the selection of k can be tested by cross-validation, in which data subsets are randomly selected and the error matrix H xn is calculated. [Pg.323]

The classification model was cross-validated using a leave-one-spectrum-out scheme to test the robustness of the model. By doing so the individual spectrum is... [Pg.331]

PCs are ranked according to the fraction of variance of the dataset that they explain. The first PC is the most important (it explains the largest fraction of variance), and so forth. Selecting the correct number of PCs is crucial. Too few PCs will leave important information out of the model, but too many PCs will include noise and decrease the model's robustness (if R J, the PCA is pointless). Each time you make a new PCA model, you should examine the residuals matrix E. If the residuals are structured, it means that some information has been left out. You can also decide on the correct number of PCs by performing a cross-validation (see below), or by examining the percentage of the variance explained by the model. [Pg.260]
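The explained-variance criterion mentioned above can be illustrated with a short sketch. It covers only that criterion, not the cross-validation route, and it assumes NumPy is available. The function names, the 95% threshold, and the synthetic two-factor data are hypothetical choices made for the example.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance explained by each PC, from the SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

def n_components_for(X, threshold=0.95):
    """Smallest number of PCs whose cumulative explained variance reaches the threshold."""
    cum = np.cumsum(explained_variance_ratio(X))
    k = int(np.searchsorted(cum, threshold)) + 1
    return min(k, len(cum))  # guard against round-off when threshold is close to 1

# Synthetic data (assumed): 50 samples, 6 variables driven by 2 latent factors plus weak noise.
rng = np.random.default_rng(0)
scores = rng.normal(size=(50, 2))
loadings = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X = scores @ loadings + 0.01 * rng.normal(size=(50, 6))

ratios = explained_variance_ratio(X)
k = n_components_for(X, threshold=0.95)
print("cumulative explained variance:", np.round(np.cumsum(ratios), 3))
print("PCs needed for 95% of the variance:", k)
```

In this constructed case the remaining PCs carry only noise, so adding them would raise the explained variance marginally while making the model less robust.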

The fourth recommendation of the OECD experts concerns the appropriate measurement and reporting of the goodness-of-fit, robustness, and predictivity of the model. The main intention was to distinguish clearly whether a measure was derived only from the training set, from internal validation (i.e., cross-validation, where the same chemicals are used for training and validation, but not at the same time), or from validation with an external set of compounds not previously used in model optimization and/or calibration (external validation). A widely applied measure of fit is the squared correlation coefficient R² = 1 − (RSS/TSS), where RSS is the residual sum of squares and TSS is the total sum of squares... [Pg.205]
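As a small worked example of the formula R² = 1 − (RSS/TSS), the sketch below computes it for four observed and fitted values. The numbers and the function name are made up for illustration.

```python
def r_squared(y_obs, y_pred):
    """R2 = 1 - RSS/TSS, with TSS taken about the mean of the observed values."""
    mean_y = sum(y_obs) / len(y_obs)
    rss = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))   # residual sum of squares
    tss = sum((o - mean_y) ** 2 for o in y_obs)              # total sum of squares
    return 1.0 - rss / tss

y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]   # fitted values from some model (assumed)

r2 = r_squared(y_obs, y_pred)
print(round(r2, 2))  # RSS = 0.10, TSS = 5.0, so R2 = 0.98
```

The same expression applied to cross-validated predictions (with RSS replaced by PRESS) gives q², which is why reports need to state which set of predictions a statistic was computed from.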

The correct reclassification rate of the discriminant function (Equation 8.3) amounts to 91.1% (Class 1: 93.3%; Class 2: 88.5%), with a fairly stable cross-validation (all compounds: 80.4%; Class 1: 76.7%; Class 2: 84.6%). Cross-validation is a tool for assessing the robustness of the model; it is performed by constructing a model on two thirds of the compounds and checking the ability of the model to predict the activity of the remaining one third correctly. [Pg.188]
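The two-thirds/one-third scheme described above might be sketched as follows. This is an illustration under several assumptions: a nearest-centroid classifier stands in for the discriminant function (which is not reproduced here), the split is repeated over random partitions and averaged, and the two-class data are synthetic. All names are hypothetical.

```python
import random

def fit_centroids(X, y):
    """Compute the mean vector of each class in the training data."""
    centroids = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def holdout_accuracy(X, y, train_fraction=2 / 3, n_repeats=50, seed=0):
    """Train on a random two thirds, score on the remaining third, average over repeats."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_train = round(train_fraction * len(X))
    accuracies = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)

# Synthetic data (assumed): two well-separated classes, 15 compounds each, 2 descriptors.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(15)]
X += [[data_rng.gauss(5.0, 1.0), data_rng.gauss(5.0, 1.0)] for _ in range(15)]
y = [1] * 15 + [2] * 15

accuracy = holdout_accuracy(X, y)
print(f"mean held-out accuracy: {accuracy:.3f}")
```

A held-out accuracy well below the reclassification rate on the full data set, as in the figures quoted above, suggests the model fits the training compounds better than it generalizes.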

When additional data are available, a QSAR model should be validated by predicting the activity of other chemicals not used in the training set, but whose activities are known (i.e., the test set). This is called external validation. The major difference between cross-validation and external validation is that the chemicals selected in the latter case are, in a sense, random. This provides a more robust evaluation of the model's predictive capability for untested chemicals than cross-validation. We feel strongly that confidence in a model's predictive capability can be tested and validated when robust prediction has been demonstrated with an external test set. Further details regarding a formal framework for the validation of QSARs are provided in Chapter 20. [Pg.307]

The R-RMSECV values are rather time-consuming to compute because, for every choice of k, they require the whole RPCR procedure to be performed n times. Faster algorithms for cross-validation are described in [80]. They avoid the complete recomputation of resampling methods, such as the MCD, when one observation is removed from the data set. Alternatively, one could also compute a robust R² value [61]. For q = 1 it equals ... [Pg.199]

Engelen, S. and Hubert, M., Fast cross-validation for robust PCA, Proc. COMPSTAT 2004, J. Antoch, Ed., Springer-Verlag, Heidelberg, 989-996, 2004. [Pg.214]

A rapid, robust ICP-MS method was described for the determination of I in food for human consumption as well as for pets [34]. The sample preparation was made by alkaline hydrolysis with TMAH using either MW heating or the high-pressure asher. Method validation was carried out using seven food CRMs with certified I content and by cross-validation with GC, neutron activation analysis (NAA) and colorimetry. The method has been proven to give accurate and repeatable results for a range of fortified food commodities for human and animal consumption. [Pg.27]

The LOO cross-validated q²(LOO) values for the initial models were 0.875 using the water probe and 0.850 using the methyl probe. The application of the SRD/FFD variable selection resulted in an improvement of the significance of both models. The analysis yielded a cross-validated correlation coefficient q²(LOO) of 0.937 for the water probe and 0.923 for the methyl probe. In addition, we tested the reliability of the models by applying leave-20%-out and leave-50%-out cross-validation. Both models are also robust, as indicated by high cross-validated correlation coefficients of 0.910 (water probe, SDEP = 0.409) and 0.895 (methyl probe, SDEP = 0.440) obtained by using the leave-50%-out cross-validation procedure. The statistical results gave confidence that the derived model could also be used for the prediction of novel compounds. [Pg.163]
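Leave-20%-out and leave-50%-out cross-validation, together with the SDEP statistic, can be illustrated with a minimal sketch. It assumes the common convention that the compounds are divided at random into groups, each group is left out once, q² = 1 − PRESS/TSS, and SDEP = sqrt(PRESS/n), averaged over repeated random groupings. The one-descriptor linear model, the toy data, and all names are hypothetical and unrelated to the probes discussed above.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def leave_group_out(x, y, n_groups, n_repeats=100, seed=0):
    """Return (mean q2, mean SDEP) over repeated random assignments to n_groups groups."""
    rng = random.Random(seed)
    n = len(x)
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    q2_values, sdep_values = [], []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        press = 0.0
        for g in range(n_groups):
            left_out = set(idx[g::n_groups])          # every compound is left out exactly once
            train = [i for i in range(n) if i not in left_out]
            a, b = fit_line([x[i] for i in train], [y[i] for i in train])
            press += sum((y[i] - (a + b * x[i])) ** 2 for i in left_out)
        q2_values.append(1.0 - press / tss)
        sdep_values.append(math.sqrt(press / n))
    return sum(q2_values) / n_repeats, sum(sdep_values) / n_repeats

# Toy data (assumed): 10 compounds, linear activity trend with small noise.
x = [float(i) for i in range(10)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]

q2_l20, sdep_l20 = leave_group_out(x, y, n_groups=5)   # leave-20%-out
q2_l50, sdep_l50 = leave_group_out(x, y, n_groups=2)   # leave-50%-out
print(f"L20%O: q2 = {q2_l20:.3f}, SDEP = {sdep_l20:.3f}")
print(f"L50%O: q2 = {q2_l50:.3f}, SDEP = {sdep_l50:.3f}")
```

Leaving out half of the data is a harsher test than leave-one-out, so a q² that stays high under leave-50%-out is a stronger indication of robustness.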

In summary, this first report of a classification strategy specifically tailored to the classification of biomedical spectral data shows that reliable and robust classification must satisfy the following criteria: adequate data set size; proper data reduction; proper data classification (balanced training set with cross-validation, e.g. LOO); and the use of several classifiers and choice of appropriate consensus classification. [Pg.86]

Big Chemical Encyclopedia
