
K-fold cross validation

Leaving out one object at a time represents only a small perturbation to the data when the number (n) of observations is not too low. The popular LOO procedure therefore has a tendency to lead to overfitting, giving models that have too many factors and an RMSPE that is optimistically biased. Another approach is k-fold cross-validation, in which one applies k calibration steps (5 < k < 15), each time setting a different subset of (approximately) n/k samples aside. For example, with a total of 58 samples one may form 8 subsets (2 subsets of 8 samples and 6 of 7), each subset tested with a model derived from the remaining 50 or 51 samples. In principle, one may repeat this k-fold cross-validation a number of times using a different splitting [20]. [Pg.370]
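The fold-size arithmetic in this example is easy to verify; the following Python sketch (our own, purely illustrative) reproduces the 8/7 split and the resulting training-set sizes:

```python
# 58 samples split into 8 subsets gives 2 folds of 8 and 6 folds of 7,
# so each model is trained on the remaining 50 or 51 samples.
n, k = 58, 8
base, extra = divmod(n, k)                  # base fold size, number of larger folds
fold_sizes = [base + 1] * extra + [base] * (k - extra)
print(fold_sizes)                           # [8, 8, 7, 7, 7, 7, 7, 7]
print([n - s for s in fold_sizes])          # training sizes: [50, 50, 51, 51, ...]
```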

Cross-validation is used to estimate the generalization error of a model or to compare the performance of different models. K-fold cross-validation divides a data set into k disjoint subsets of (approximately) equal size. The validation procedure comprises k runs and applies a round-robin approach: during each run, one of the k subsets is left out and used as the test set while the remaining subsets are used for training the model. Leave-one-out cross-validation results when k equals the sample size (i.e., each subset contains exactly one case). The choice between leave-one-out and k-fold cross-validation depends on the situation: the former is preferred for continuous error functions, whereas the latter is preferred for determining the number of misclassified cases. A frequent choice for k-fold cross-validation is k = 10. [Pg.420]
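The round-robin procedure described above can be written down in a few lines. The following NumPy sketch (our own illustration; the polynomial model and toy data are arbitrary choices) estimates the generalization error by k-fold CV and recovers LOO as the special case k = n:

```python
import numpy as np

def kfold_mse(x, y, k, degree=1):
    """Round-robin k-fold CV: each of the k subsets serves once as the
    test set while the remaining subsets train the model. Setting
    k = len(x) gives leave-one-out cross-validation as a special case."""
    n = len(x)
    folds = np.array_split(np.arange(n), k)       # k subsets of (nearly) equal size
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)  # fit on k-1 subsets
        pred = np.polyval(coeffs, x[test_idx])                   # predict held-out subset
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return np.mean(errors)                         # CV estimate of generalization error

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 60)
y = 2.0 * x + rng.normal(0, 1, 60)
print(kfold_mse(x, y, k=10))        # 10-fold estimate
print(kfold_mse(x, y, k=len(x)))    # leave-one-out (k = n)
```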

The WEKA software suite [23] was used to carry out the experiments. The results were evaluated using Accuracy (Acc). For the training and validation steps, we used k-fold cross-validation with k = 10; cross-validation is a robust validation method for variable selection [24]. Repeated cross-validation (as calculated by the WEKA environment) allows robust statistical tests. We also use the measure provided automatically by WEKA, Coverage of cases (0.95 level). [Pg.277]
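WEKA computes these figures internally; as a rough open-source analogue of the k = 10 setting (not the authors' actual pipeline, and the data set and classifier below are stand-ins), scikit-learn gives the same kind of 10-fold accuracy estimate:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)             # stand-in data set
clf = DecisionTreeClassifier(random_state=0)  # stand-in classifier

# 10-fold cross-validated accuracy, analogous to WEKA's k = 10 setting
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"Acc = {scores.mean():.3f} +/- {scores.std():.3f}")
```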

A general method for testing the suitability of a particular ANN architecture with respect to its approximation capability is cross-validation, more precisely k-fold cross-validation (k > 2). The same method can equally well be used to test the suitability of any other ANN property that has to be selected from various possibilities, e.g., a particular choice of the activation function. Cross-validation is in fact a general method for choosing parameters and other properties of statistical models (Hand, 1997; Berthold and Hand, 2002). In the context of ANNs trained with catalytic data, the method proceeds as follows ... [Pg.135]
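The original step-by-step recipe is truncated above, but the general idea, selecting an ANN property such as the activation function by k-fold CV, can be sketched as follows (a generic scikit-learn illustration, not the book's procedure; the data set is a stand-in for the catalytic data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)   # stand-in for the catalytic data

# Compare candidate activation functions by their k-fold CV score (k = 5)
for activation in ("logistic", "tanh", "relu"):
    net = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(10,), activation=activation,
                      max_iter=2000, random_state=0),
    )
    scores = cross_val_score(net, X, y, cv=5)
    print(f"{activation:>8}: mean CV accuracy = {scores.mean():.3f}")
```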

It has been demonstrated that, although these parameters are all based on the goodness-of-fit measure RSS, they are related to the prediction ability of a model as it is usually estimated from validation techniques. For example, in the case of the v-fold cross-validation parameter PRESS(k, v), where v is the size of the subsets of objects deleted in turn from the training set, PRESS(k, v) is asymptotically equivalent, for n → ∞, to FPE with... [Pg.643]
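Read literally, PRESS sums the squared prediction errors over subsets of v objects deleted in turn. A minimal least-squares sketch (our notation and toy data; the source's exact definition is truncated above):

```python
import numpy as np

def press(X, y, v):
    """PRESS via v-fold deletion: groups of v objects are removed in turn,
    the model is refit on the rest, and the squared prediction errors on
    the deleted objects are summed (a minimal least-squares sketch)."""
    n = len(y)
    total = 0.0
    for start in range(0, n, v):
        deleted = np.arange(start, min(start + v, n))   # objects deleted this round
        kept = np.setdiff1d(np.arange(n), deleted)
        beta, *_ = np.linalg.lstsq(X[kept], y[kept], rcond=None)
        residuals = y[deleted] - X[deleted] @ beta
        total += np.sum(residuals ** 2)
    return total

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(40), rng.normal(size=(40, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 0.3, 40)
print(press(X, y, v=5))
```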

Cross-validation is a leave-one-out or leave-some-out validation technique in which part of the data set is reserved for validation. Essentially, it is a data-splitting technique; the distinction lies in the manner of the split and the number of data sets evaluated. In the strict sense, a k-fold cross-validation involves the division of the available data into k subsets of approximately equal size. Models are built k times, each time leaving one of the subsets out of the build. The k models are evaluated and compared as described previously, and a final model is defined based on the complete data set. Again, this technique, like all validation strategies, offers flexibility in its application. Mandema et al. successfully utilized a cross-validation strategy for a population pharmacokinetic analysis of oxycodone in which a portion of the data was reserved for an evaluation of predictive performance. Although not strictly a cross-validation, it illustrates the spirit of the approach. [Pg.341]
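The workflow described here, building and evaluating k models and then defining the final model on the complete data set, might look as follows (a scikit-learn sketch with toy data of our own):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 3))                 # stand-in data
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.2, 80)

# Build and evaluate k models, each leaving one subset out of the build ...
model = LinearRegression()
cv_scores = cross_val_score(model, X, y, cv=KFold(n_splits=5), scoring="r2")
print("CV R^2 per fold:", np.round(cv_scores, 3))

# ... then define the final model on the complete data set
final_model = model.fit(X, y)
print("final coefficients:", np.round(final_model.coef_, 3))
```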

Both k and v can adversely or positively affect this estimate of the true error. For example, a larger v (e.g., v = 10) results in a smaller proportion of the data in the test set and thus a higher proportion in the training set, which decreases the bias. An additional way to decrease the bias inherent in this estimate is to perform repeated v-fold cross-validation. Here the entire process is repeated v times, each time initially assigning the observations to different partitions, implementing v-fold CV as described above and subsequently averaging the v error estimates. [Pg.229]
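A sketch of repeated v-fold cross-validation as described, each repetition using a different random partition and the resulting error estimates being averaged (scikit-learn, with toy data; the regression model is an arbitrary stand-in):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.5, -1.0]) + rng.normal(0, 0.5, 100)

v = 10
estimates = []
for repeat in range(v):
    # A different random partition into v folds on each repetition
    cv = KFold(n_splits=v, shuffle=True, random_state=repeat)
    mse = -cross_val_score(LinearRegression(), X, y, cv=cv,
                           scoring="neg_mean_squared_error").mean()
    estimates.append(mse)

# Averaging the v repeated estimates stabilizes the CV error estimate
print(f"repeated {v}-fold MSE: {np.mean(estimates):.4f} "
      f"(sd over repeats {np.std(estimates):.4f})")
```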

Let k < m. In k-fold cross-validation (CV), m is randomly partitioned into k subsets of... [Pg.226]

Camacho J, Ferrer A. Cross-validation in PCA models with the element-wise k-fold (ekf) algorithm: theoretical aspects. J Chemometr 2012;26:361-73. [Pg.137]

