Big Chemical Encyclopedia


Cross-Validation and Bootstrapping

Cross-validation and bootstrap techniques can be applied for a statistically based estimation of the optimum number of PCA components. The idea is to randomly split the data into training and test sets. PCA is then applied to the training data, and the observations from the test data are reconstructed using 1 to m PCs. The prediction error relative to the actual test data can be computed. Repeating this procedure many times indicates the distribution of the prediction errors when using 1 to m components, which then allows deciding on the optimal number of components. For more details see Section 3.7.1. [Pg.78]
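The repeated-split procedure described above can be sketched in a few lines. This is a minimal illustration, not the source's implementation: the function name, the toy data, and all parameter choices are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_cv_errors(X, max_pc, n_splits=20, test_frac=0.3):
    """Monte-Carlo cross-validation for PCA: repeatedly split the rows
    into training and test sets, fit the loadings on the training part,
    and record the mean squared reconstruction error of the test rows
    when projected onto the first 1..max_pc components."""
    n = X.shape[0]
    n_test = int(test_frac * n)
    errors = np.zeros((n_splits, max_pc))
    for s in range(n_splits):
        idx = rng.permutation(n)
        test, train = X[idx[:n_test]], X[idx[n_test:]]
        mu = train.mean(axis=0)
        # loadings from the SVD of the centred training data
        _, _, Vt = np.linalg.svd(train - mu, full_matrices=False)
        for k in range(1, max_pc + 1):
            V = Vt[:k].T                          # p x k loading matrix
            recon = mu + (test - mu) @ V @ V.T    # project and back-transform
            errors[s, k - 1] = np.mean((test - recon) ** 2)
    return errors.mean(axis=0)

# toy data: 3 informative directions embedded in 6 noisy variables
rng_data = np.random.default_rng(42)
scores = rng_data.normal(size=(100, 3)) * np.array([5.0, 3.0, 2.0])
X = scores @ rng_data.normal(size=(3, 6)) + 0.1 * rng_data.normal(size=(100, 6))
err = pca_cv_errors(X, max_pc=6)   # error curve flattens near the true rank
```

Averaging the error over many random splits, as the text suggests, gives the distribution of prediction errors per component count; the flattening of the averaged curve indicates where additional components stop paying for themselves.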

Cross-validation is an alternative to the split-sample method of estimating prediction accuracy (5). Molinaro et al. describe and evaluate many variants of cross-validation and bootstrap re-sampling for classification problems where the number of candidate predictors vastly exceeds the number of cases (13). The cross-validated prediction error is an estimate of the prediction error associated with application of the algorithm for model building to the entire dataset. [Pg.334]

Very often a test population of data is not available or would be prohibitively expensive to obtain. In that case, internal validation must be considered. The methods of internal PM model validation include data splitting, resampling techniques (cross-validation and bootstrapping) (9,26-30), and the posterior predictive check (PPC) (31-33). Of note, the jackknife is not considered a model validation technique; it may only be used to correct for bias in parameter estimates and to compute the uncertainty associated with parameter estimation. Cross-validation, bootstrapping, and the posterior predictive check are addressed in detail in Chapter 15. [Pg.237]

The resampling approaches of cross-validation (CV) and bootstrapping do not have the drawback of data splitting, in that all available data are used for model development, so the model provides an adequate description of the information contained in the gathered data. Cross-validation and bootstrapping are addressed in Chapter 15. One problem with CV deserves attention: repeated CV has been demonstrated to be inconsistent in that if one validates a model by CV and then randomly shuffles the data, the model may fail to validate after shuffling. [Pg.238]
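The shuffle sensitivity noted above can be demonstrated directly: the same data and the same model give different K-fold CV error estimates under two different random orderings of the rows. A minimal sketch, with illustrative names and toy data that are not from the source:

```python
import numpy as np

def kfold_mse(X, y, k, order):
    """K-fold CV mean squared error of ordinary least squares, with the
    folds taken from the row ordering given in `order`."""
    folds = np.array_split(np.asarray(order), k)
    mses = []
    for fold in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[fold] = False                      # leave this fold out
        beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        mses.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return float(np.mean(mses))

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(40), rng.normal(size=(40, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=40)

m1 = kfold_mse(X, y, 5, np.arange(40))        # original ordering
m2 = kfold_mse(X, y, 5, rng.permutation(40))  # same data, shuffled
```

Because the fold membership changes with the ordering, `m1` and `m2` differ; a model that passes a CV-based acceptance criterion under one ordering may not pass it under another.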

To alleviate this biased estimation, resampling methods such as cross-validation and bootstrapping can be employed to estimate prediction error more accurately. In the next sections, these techniques are described, as well as the implications of their use in the framework of model selection and performance assessment. [Pg.224]

Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In International joint conference on artificial intelligence (pp. 1137-1143).
Schramm, S. (2011). Methode zur Berechnung der Feldeffektivität integraler Fußgängerschutzsysteme. Dissertation, Technische Universität München. [Pg.141]

If PCA is used for dimension reduction and the creation of uncorrelated variables, the optimum number of components is crucial. This value can be estimated from a scree plot showing the accumulated variance of the scores as a function of the number of components used. More laborious but safer methods use cross-validation or bootstrap techniques. [Pg.114]
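The accumulated-variance quantity behind a scree plot is simple to compute. A minimal sketch (function name, toy data, and the 95 % threshold are illustrative assumptions, not from the source):

```python
import numpy as np

def cumulative_variance(X):
    """Fraction of total variance explained by the first 1..p principal
    components, i.e. the quantity a scree plot accumulates."""
    Xc = X - X.mean(axis=0)
    var = np.linalg.svd(Xc, compute_uv=False) ** 2   # squared singular values
    return np.cumsum(var) / var.sum()

rng = np.random.default_rng(2)
# two informative directions hidden in five noisy variables
scores = rng.normal(size=(200, 2)) * np.array([4.0, 2.0])
X = scores @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))
cv = cumulative_variance(X)
n_comp = int(np.searchsorted(cv, 0.95) + 1)   # smallest k reaching 95 %
```

Reading the number of components off such a curve is quick but subjective, which is why the text recommends the more laborious cross-validation or bootstrap approaches when a defensible choice is needed.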

A second possibility is to use some estimate of the variance of the loadings. This can be done by the jackknife method due to Quenouille and Tukey (see [37]) or by Efron's bootstrap method [38] (the colourful terminology stems from the expressions "jack of all trades and master of none" and "lifting yourself up by your own bootstraps"). The use of the bootstrap to estimate the variance of the loadings in PCA has been described [39] and will not be elaborated upon further. The jackknife method is used partly because it is a natural by-product of cross-validation and therefore computationally undemanding, and partly because the jackknife estimate of variance is used later in conjunction with PLS. [Pg.329]
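A jackknife estimate of loading variance can be sketched as follows. This is an illustrative leave-one-out implementation for the first loading vector only (all names and the toy data are assumptions); note that the sign ambiguity of SVD loadings must be resolved before pooling the replicates:

```python
import numpy as np

def first_loading(X):
    """First PCA loading vector of column-centred X (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def jackknife_loading_var(X):
    """Leave-one-out jackknife variance of the first loading vector.
    Each replicate is sign-aligned with the full-data loading before
    the deviations are computed."""
    n = X.shape[0]
    full = first_loading(X)
    reps = np.empty((n, X.shape[1]))
    for i in range(n):
        v = first_loading(np.delete(X, i, axis=0))
        reps[i] = v if v @ full >= 0 else -v     # fix SVD sign ambiguity
    mean = reps.mean(axis=0)
    # jackknife variance: (n-1)/n times the sum of squared deviations
    return (n - 1) / n * ((reps - mean) ** 2).sum(axis=0)

rng = np.random.default_rng(3)
# rank-1 toy data: one latent factor observed in four noisy variables
X = rng.normal(size=(60, 1)) * 3.0 @ np.ones((1, 4)) + 0.2 * rng.normal(size=(60, 4))
var = jackknife_loading_var(X)
```

Because each leave-one-out fit is already produced during leave-one-out cross-validation, the jackknife variance comes almost for free, which is the computational argument made in the text.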

The jackknife, cross-validation, and the bootstrap are the methods referred to as resampling techniques. Though not strictly classified as a resampling technique, the posterior predictive check is also covered in this chapter, as it has several characteristics similar to resampling methods. [Pg.401]

Although this approach is still used, it is undesirable for statistical reasons: error calculations underestimate the true uncertainty associated with the equations (17, 21). A better approach is to use the equations developed for one set of lakes to infer chemistry values from counts of taxa from a second set of lakes (i.e., cross-validation). The extra time and effort required to develop the additional data for the test set is a major limitation of this approach. Computer-intensive techniques, such as jackknifing or bootstrapping, can produce error estimates from the original training set (53), without having to collect data for additional lakes. [Pg.30]
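The idea of getting an error estimate from the training set alone can be illustrated with an out-of-bag bootstrap: fit the model on each bootstrap resample and evaluate it on the rows that resample happened to leave out. A minimal sketch using ordinary least squares (the function name, toy data, and number of resamples are illustrative assumptions):

```python
import numpy as np

def bootstrap_rmsep(X, y, n_boot=200, rng=None):
    """Out-of-bag bootstrap estimate of prediction error: fit ordinary
    least squares on each bootstrap resample and evaluate the root mean
    squared error on the rows left out of that resample."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(y)
    errs = []
    for _ in range(n_boot):
        bs = rng.integers(0, n, n)               # sample rows with replacement
        oob = np.setdiff1d(np.arange(n), bs)     # out-of-bag rows
        if oob.size == 0:
            continue
        beta, *_ = np.linalg.lstsq(X[bs], y[bs], rcond=None)
        errs.append(np.sqrt(np.mean((y[oob] - X[oob] @ beta) ** 2)))
    return float(np.mean(errs))

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(scale=0.3, size=50)
rmsep = bootstrap_rmsep(X, y, rng=rng)   # should sit near the noise level
```

No second set of lakes is needed: every observation serves as a test case in the resamples that exclude it, which is precisely why these computer-intensive techniques are attractive when collecting additional data is expensive.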

Cramer, R.D., Bunce, J.D., Patterson, D.E. and Frank, I.E., Cross-validation, bootstrapping, and partial least squares compared with multiple regression in conventional QSAR studies, Quant. Struct.-Act. Relat., 7, 18-25, 1988. [Pg.179]

MR spectra from 33 patients with breast cancer with vascular invasion and 52 without were subjected to the SCS-based analysis. Maximally discriminatory subregions were 0.47-0.55, 0.57-0.62, 0.86-0.92, 1.00-1.03, 1.69-1.71, 1.99-2.05, 2.55-2.56 and 2.63-2.72 ppm for the first derivatives of the spectra, and 0.75-0.81, 0.90-0.94, 1.03-1.12, 1.21-1.24, 1.59-1.63, 2.00-2.04, 2.24-2.27 and 2.70-2.74 ppm for rank-ordered spectra. Using LDA and bootstrap-based cross-validation, two separate classifiers, A (using the optimal regions from the first derivatives of the spectra) and B (using the optimal regions from the rank-ordered spectra), were developed. The final classifier was the Wolpert-combined A + B classifier [61]. [Pg.102]

A variety of procedures are available to assess a model's true expected performance: split-sample validation, cross-validation, jackknifing, and bootstrapping. [Pg.420]

B. Efron and G. Gong, A leisurely look at the bootstrap, the jackknife and cross-validation. Am Statistician 37 36-48 (1983). [Pg.244]

Many-fit (multiple data sets) diagnostics are used when several data sets are available for describing the same problem, or when the data set is so large that it can be split into several sets. In the first case, test-set validation is possible; in the latter, the theory of resampling is applicable, making possible bootstrapping, jackknifing, cross-validation, split-half analyses, etc. [Pg.146]

Cross-validation is an internal resampling method much like the older jackknife and bootstrap methods [Efron 1982, Efron & Gong 1983, Efron & Tibshirani 1993, Wehrens et al. 2000]. The principle of cross-validation goes back to Stone [1974] and Geisser [1974], and the basic idea is simple ...
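The basic idea credited to Stone and Geisser, namely that every observation serves exactly once as a test case, can be written down in a few lines. A minimal sketch with illustrative names, not taken from the source:

```python
def kfold_indices(n, k):
    """Partition range(n) into k consecutive folds and return (train, test)
    pairs so that every observation appears in the test set exactly once."""
    # distribute the remainder: the first n % k folds get one extra row
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    idx = list(range(n))
    splits, start = [], 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        splits.append((train, test))
        start += size
    return splits

splits = kfold_indices(10, 3)   # fold sizes 4, 3, 3
```

The model is refitted k times, once per (train, test) pair, and the pooled test-set errors estimate the prediction error; leave-one-out CV is the special case k = n, which is where the connection to the jackknife mentioned above becomes apparent.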



