
Cross-validation procedure

When not enough examples are available to make an independent monitoring set, the cross-validation procedure can be applied (see Chapter 10). The data set is split into C different parts and each part is used once as the monitoring set. The network is trained and tested C times, and the results of the C test sessions give an indication of the performance of the network. It is strongly advised to validate the network that has been trained by the above procedure with a second, independent test set (see Section 44.5.10). [Pg.677]
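As a minimal sketch of this C-fold scheme, assuming scikit-learn and synthetic placeholder data (the network settings and data are illustrative assumptions, not the source's implementation):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))                      # hypothetical descriptor matrix
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=60)

C = 5                                             # number of parts / monitoring sets
scores = []
for train_idx, monitor_idx in KFold(n_splits=C, shuffle=True, random_state=0).split(X):
    net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
    net.fit(X[train_idx], y[train_idx])           # train on the other C-1 parts
    scores.append(net.score(X[monitor_idx], y[monitor_idx]))  # test on the held-out part

print(f"mean R^2 over the {C} monitoring sets: {np.mean(scores):.3f}")
```

A second, fully independent test set would still be needed to validate the final network, as the excerpt advises.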

E.P.P.A. Derks, M.L.M. Beckers, W.J. Melssen and L.M.C. Buydens, A parallel cross-validation procedure for artificial neural networks. Computers Chem., 20 (1995) 439-448. [Pg.696]

Figure 3.13 shows the result of this procedure for groups 3 and 4 from the glass vessels data from Section 1.5.3 (n = 20, m = 13; Janssen et al. 1998). The cross-validation procedure with four segments is repeated 100 times, resulting in 100... [Pg.90]

Most QSAR-modeling methods implement the leave-one-out (LOO) or leave-some-out cross-validation procedure. The outcome of this procedure is a... [Pg.438]

The raw data are discussed in detail in Section 5.2.2.2. In the ICLS and MIR applications the data are split into calibration and validation sets. For this PLS analysis, the 95 spectra from the 12 design points are all used to construct the model using a leave-one-out cross-validation procedure. [Pg.341]

In another work, Parra and coworkers proposed a method based on chemically modified voltammetric electrodes for identifying adulterations of wine samples by a number of forbidden adulterants frequently used in the wine industry to improve the organoleptic characteristics of wines, such as tartaric acid, tannic acid, sucrose, and acetaldehyde (Parra et al., 2006b). The patterns identified via PCA allowed an efficient detection of the wine samples that had been artificially modified. In the same study, PLS regression was applied for a quantitative prediction of the substances added. Model performances were evaluated by means of a cross-validation procedure. [Pg.99]

Oliveri et al. (2009) presented the development of an artificial tongue based on cyclic voltammetry at Pt microdisk electrodes for the classification of olive oils according to their geographical origin; the measurements are made directly in the oil samples, previously mixed with a proper quantity of a room-temperature ionic liquid (RTIL). The pattern recognition techniques applied were PCA for data exploration and k-NN for classification, validating the results by means of a cross-validation procedure with five cancellation groups. [Pg.107]

Schedule of the leave-one-out cross-validation scheme. Any cross-validation procedure performs in the same way, albeit with more samples in each validation segment. [Pg.206]

The leave-one-out method is among the simplest to use because it requires no input parameters. Because it runs N cross-validation procedures, however, it can take a rather long time when N is large, depending on the processing speed of the computer. The concern about representative validation samples is usually not an issue with this method because it tests... [Pg.272]

Thus, PLS (but not MR) assumes that the data X may contain structure irrelevant to the relation with Y. This is the philosophical difference between MR and PLS. In general, PLS does not fit more dimensions to a set of data than those that improve the predictive ability of the model. This is ensured by the cross-validation procedure (see Appendix A). [Pg.304]
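A hedged sketch of how the number of PLS dimensions might be chosen by cross-validation, assuming scikit-learn and synthetic data (the selection rule shown, keeping the dimensionality with the best cross-validated score, is one common variant, not the source's exact procedure):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))                     # synthetic X with irrelevant structure
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.2 * rng.normal(size=40)

q2 = [cross_val_score(PLSRegression(n_components=a), X, y, cv=5).mean()
      for a in range(1, 8)]                       # cross-validated R^2 per dimensionality
best_a = int(np.argmax(q2)) + 1                   # do not fit dimensions that do not help
print(f"selected {best_a} PLS components, q^2 = {q2[best_a - 1]:.3f}")
```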

In the NN method, the property F of the target compound is calculated as an average (or weighted average) of that property for its nearest neighbors in the space of the descriptors selected for the model. Different metrics (Euclidean distances, Tanimoto similarity coefficients, etc.) can be used to identify the neighbors. Their number, k, is optimized using a cross-validation procedure on the training set. [Pg.325]
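An illustrative sketch of the neighbor-averaging idea, assuming NumPy (function names and the choice of Euclidean distance here are illustrative, not the source's code):

```python
import numpy as np

def tanimoto_sim(a, b):
    """Tanimoto similarity for binary fingerprint vectors."""
    both = np.logical_and(a, b).sum()
    return both / (a.sum() + b.sum() - both)

def knn_predict(x, X_train, y_train, k):
    """Property of x as the average over its k nearest training compounds."""
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # Euclidean distances in descriptor space
    neighbours = np.argsort(d)[:k]                 # k most similar training compounds
    return y_train[neighbours].mean()              # simple (unweighted) average of F

# k itself would be chosen by cross-validation on the training set, e.g. the k
# giving the lowest leave-one-out prediction error.
```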

We should, however, remark that in reality the data analyst will look for more samples, will probably try several cross-validation procedures, will test the classification functions using independent test sets, and so on. [Pg.195]

Sometimes the question arises whether it is possible to find an optimum regression model by a feature selection procedure. The usual way is to select the model which gives the minimum predictive residual error sum of squares, PRESS (see Section 5.7.2), from a series of calibration sets. Commonly these series are created by so-called cross-validation procedures applied to one and the same set of calibration experiments. In the same way, PRESS may be calculated for different sets of features, which enables one to find the optimum set. [Pg.197]
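As a sketch of this feature-selection criterion, assuming scikit-learn and synthetic data: PRESS is computed by leave-one-out over the calibration set for each candidate feature subset, and the subset with the smaller PRESS would be preferred (the subsets and data are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(2)
X = rng.normal(size=(25, 6))
y = X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=25)

def press(X, y):
    res = []
    for tr, te in LeaveOneOut().split(X):
        model = LinearRegression().fit(X[tr], y[tr])
        res.append((y[te] - model.predict(X[te])) ** 2)
    return float(np.sum(res))                    # PRESS = sum of squared prediction residuals

for features in [[0, 2], [0, 1, 2, 3, 4, 5]]:    # two candidate feature sets
    print(features, "PRESS =", round(press(X[:, features], y), 3))
```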

Figure 13.8 An extensive cross-validation procedure to validate the Decision Tree classification model. In this method, the NCTR dataset was divided into 2 groups, 2/3 for training and 1/3 for testing. The process was repeated 2000 times. Concordance was calculated based on the misclassifications divided by the number of training chemicals for the training models and the misclassifications divided by the testing chemicals for prediction.
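A hedged sketch of the repeated-split scheme the caption describes, assuming scikit-learn; the synthetic data and the generic decision tree stand in for the NCTR set and model, and far fewer repeats are run here:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 4))                     # synthetic stand-in data
y = (X[:, 0] + X[:, 1] > 0).astype(int)

acc = []
for rep in range(200):                            # 2000 repeats in the original study
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=1/3, random_state=rep)
    clf = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)
    acc.append((clf.predict(Xte) == yte).mean())  # concordance on the test third

print(f"mean test concordance over {len(acc)} repeats: {np.mean(acc):.3f}")
```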
CM 18.1 to CM 18.3 were assessed in terms of their Cooper statistics, which define an upper limit to predictive performance. In addition, cross-validated Cooper statistics, which provide a more realistic indication of a model's capacity to predict the classifications of independent data, were obtained by applying the threefold cross-validation procedure to the best-sized CTs. In the threefold cross-validation procedure, the data set is randomly divided into three approximately equal parts, the CT is re-parameterized using two-thirds of the data, and predicted classifications are made for the remaining third of the data. The cross-validated Cooper statistics are the mean values of the usual Cooper statistics, taken over the three iterations of the cross-validation procedure. The Cooper statistics for CM 18.1 to CM 18.3 are summarized in Table 18.6. [Pg.406]
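A minimal sketch of cross-validated Cooper statistics (sensitivity, specificity, concordance) averaged over three folds, assuming scikit-learn; the data and the generic tree classifier are illustrative, not the CM 18.x models:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(90, 5))                      # synthetic stand-in data
y = (X[:, 0] - X[:, 3] > 0).astype(int)

stats = []
for tr, te in StratifiedKFold(n_splits=3, shuffle=True, random_state=0).split(X, y):
    pred = DecisionTreeClassifier(random_state=0).fit(X[tr], y[tr]).predict(X[te])
    tp = np.sum((pred == 1) & (y[te] == 1))
    fn = np.sum((pred == 0) & (y[te] == 1))
    tn = np.sum((pred == 0) & (y[te] == 0))
    fp = np.sum((pred == 1) & (y[te] == 0))
    stats.append((tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(te)))

sens, spec, conc = np.mean(stats, axis=0)         # means over the three iterations
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}, concordance {conc:.2f}")
```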

The results with PCR and PLS regression include the number of PCs obtained by the leave-one-out cross-validation procedure, the values of the regression coefficients for the X variables, the value of R, the root mean square error of calibration (RMSEC), and the root mean square error of prediction by the cross-validation procedure... [Pg.708]

Figure 6.5 Relationships between the sensitivity [TP/(TP + FN)] (shown by the curve), specificity [TN/(TN + FP)] and accuracy (concordance) [(TP + TN)/(TP + FP + TN + FN)] as functions of the False Positive Rate [FP/(TN + FP)]. The estimations were obtained by PASS 2007 in a leave-one-out cross-validation procedure for antineoplastic activity.
The CoLiBRI models were developed using a standard leave-one-out cross-validation procedure, as follows ... [Pg.313]

The LOO cross-validated q²LOO values for the initial models were 0.875 using the water probe and 0.850 using the methyl probe. The application of the SRD/FFD variable selection resulted in an improvement of the significance of both models: the analysis yielded a correlation coefficient with a cross-validated q²LOO of 0.937 for the water probe and 0.923 for the methyl probe. In addition, we tested the reliability of the models by applying leave-20%-out and leave-50%-out cross-validation. Both models are also robust, as indicated by the high correlation coefficients of q² = 0.910 (water probe, SDEP = 0.409) and 0.895 (methyl probe, SDEP = 0.440) obtained using the leave-50%-out cross-validation procedure. The statistical results gave confidence that the derived model could also be used for the prediction of novel compounds. [Pg.163]
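For reference, the statistics quoted above follow the standard definitions q² = 1 − PRESS/SS and SDEP = √(PRESS/n); a small sketch, assuming NumPy (the function name is illustrative):

```python
import numpy as np

def q2_and_sdep(y_obs, y_pred_cv):
    """Cross-validated q^2 and SDEP from observed activities and the
    predictions made for each compound while it was left out."""
    y_obs, y_pred_cv = np.asarray(y_obs), np.asarray(y_pred_cv)
    press = np.sum((y_obs - y_pred_cv) ** 2)       # predictive residual sum of squares
    ss = np.sum((y_obs - y_obs.mean()) ** 2)       # total sum of squares about the mean
    return 1 - press / ss, np.sqrt(press / len(y_obs))
```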

Validate this HDP by a standard cross-validation procedure, which generates the cross-validated (or q²) value for the kNN-QSAR model built by use of this HDP. The standard leave-one-out procedure has been implemented as follows: (i) Eliminate a compound from the training set. (ii) Calculate the activity of the eliminated compound, which is treated as an unknown, as the average activity of the k most similar compounds found among the remaining molecules (k is set to 1 initially). The similarities between compounds are calculated using only the selected descriptors (i.e., the current trial HDP) instead of the whole set of descriptors. (iii) Repeat this procedure until every compound in the training set has been eliminated and predicted once. (iv)... [Pg.63]
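A sketch of steps (i)-(iii) as quoted, assuming NumPy; Euclidean distance on the selected descriptor columns (`sel`, standing in for the trial HDP) is an assumption, and the function name is hypothetical:

```python
import numpy as np

def loo_knn_q2(X, y, sel, k=1):
    """Leave-one-out kNN prediction using only the selected descriptors."""
    Xs, n = X[:, sel], len(y)
    pred = np.empty(n)
    for i in range(n):                             # (i) eliminate compound i
        d = np.sqrt(((Xs - Xs[i]) ** 2).sum(axis=1))
        d[i] = np.inf                              # exclude the compound itself
        nn = np.argsort(d)[:k]                     # k most similar remaining compounds
        pred[i] = y[nn].mean()                     # (ii) average their activities
    press = np.sum((y - pred) ** 2)                # (iii) after every compound is predicted
    return 1 - press / np.sum((y - y.mean()) ** 2)  # cross-validated q^2
```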

Comparative analysis of the performance of various algorithms has been carried out in the past (Kabsch and Sander, 1983). However, this task can be deceptive if factors such as the selection of proteins for the testing set and the choice of the scoring index are not handled properly. The present work aims to provide an updated evaluation of several predictive methods with a testing set whose size permits more accurate statistics, which in turn can measure the usefulness of the information gathered by those methods and identify trends that characterize the behavior of individual algorithms. Further, we present a uniform testing of these methods vis-à-vis the size of the datasets, the measure of accuracy, and proper cross-validation procedures. [Pg.783]

The simplest and most general cross-validation procedure is the leave-one-out (LOO) technique, where each object is taken away, one at a time. In this case, given n objects, n reduced models have to be calculated. This technique is particularly important because the deletion scheme is unique, so the predictive ability of different models can be compared accurately. However, in several cases the predictive ability obtained is too optimistic, particularly when the number of objects is quite large, because the data are perturbed too little when only one object is left out. [Pg.462]

When the number of objects is not too small, more realistic predictive abilities are obtained by deleting more than one object at each step. To apply this cross-validation procedure, called the leave-more-out (LMO) technique, the user defines the number of cancellation groups, i.e. the number of blocks into which the data are divided; at each step, all the objects belonging to one block are left out of the calculation of the model. [Pg.462]
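A minimal sketch of the LMO scheme with user-defined cancellation groups, assuming scikit-learn and an ordinary regression model as placeholder (the function name and model choice are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def lmo_press(X, y, n_groups, seed=0):
    """PRESS from leave-more-out with n_groups cancellation groups."""
    order = np.random.default_rng(seed).permutation(len(y))
    blocks = np.array_split(order, n_groups)       # the cancellation groups
    press = 0.0
    for block in blocks:                           # leave one whole block out per step
        train = np.setdiff1d(order, block)
        model = LinearRegression().fit(X[train], y[train])
        press += np.sum((y[block] - model.predict(X[block])) ** 2)
    return press

# n_groups equal to the number of objects reproduces leave-one-out; fewer,
# larger blocks perturb the data more and give the more realistic estimate.
```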

The NIPALS algorithm can tolerate missing data. It is therefore possible to compute a principal components model when data are left out of the data matrix during the modelling process. This can be used to determine whether or not a new component is significant, by examining how well the expanded model with the new component predicts the left-out data compared with the model without the new component. If the new component does not improve the predictions, it is considered not significant. The cross-validation procedure can be summarized as follows ...
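A heavily simplified sketch of this idea, assuming NumPy: one NIPALS-style component is fitted to a matrix with entries deleted (set to NaN), and the first component is judged significant if it predicts the left-out entries better than the centred mean alone. The function and synthetic data are illustrative, not the full procedure (which deflates the matrix and repeats per component):

```python
import numpy as np

def nipals_component(X, n_iter=200):
    """First principal component (scores t, loadings p) of a centred matrix
    that may contain NaN; regressions use observed cells only."""
    obs = ~np.isnan(X)
    Xf = np.where(obs, X, 0.0)                     # zeros contribute nothing to sums
    t = Xf[:, 0].copy()
    for _ in range(n_iter):
        p = (Xf.T @ t) / (obs.T @ (t ** 2))        # column loadings from observed cells
        p /= np.linalg.norm(p)
        t = (Xf @ p) / (obs @ (p ** 2))            # row scores from observed cells
    return t, p

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 1)) @ rng.normal(size=(1, 8)) + 0.1 * rng.normal(size=(30, 8))
X -= X.mean(axis=0)                                # column-centred data

Xcv = X.copy()
mask = rng.random(X.shape) < 0.1                   # leave out ~10% of the entries
Xcv[mask] = np.nan

t, p = nipals_component(Xcv)
press_with = np.sum((X[mask] - np.outer(t, p)[mask]) ** 2)
press_without = np.sum(X[mask] ** 2)               # prediction without the new component
print("first component significant:", press_with < press_without)
```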

