Cross-validation, description

Data splitting is fairly straightforward and covered in detail in the next section on validation. It simply implies that data to be modeled are partitioned based on differences in sampling (i.e., windows where the suspect parameters (θ) are believed to be constant). The most common data splits to explore pharmacokinetic time dependencies would be single-dose, chronic non-steady-state, and steady-state conditions. Data subsets are modeled individually, and all parameters and variability estimates, along with any relevant covariate expressions, are compared in a manner similar to a validation procedure (see next section). Data can be combined in a leave-one-out strategy (see cross-validation description) to examine the uniformity of data windows. ... [Pg.335]
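
The leave-one-out comparison of data windows can be sketched as follows. This is a minimal illustration, not the procedure of the cited text: the clearance values are hypothetical, and "modeling" a window is reduced to taking a mean, whereas a real analysis would fit a full pharmacokinetic model to each subset.

```python
from statistics import mean

# Hypothetical clearance estimates (L/h) per sampling window; illustrative values only.
windows = {
    "single_dose": [5.1, 4.9, 5.3, 5.0],
    "chronic_non_steady_state": [5.2, 5.0, 4.8, 5.1],
    "steady_state": [6.4, 6.6, 6.3, 6.5],
}

def leave_one_window_out(data):
    """For each window, estimate the parameter from the other windows and
    report how far the held-out window's own mean lies from that estimate."""
    shifts = {}
    for held_out, values in data.items():
        pooled = [v for name, vals in data.items() if name != held_out for v in vals]
        shifts[held_out] = mean(values) - mean(pooled)
    return shifts

shifts = leave_one_window_out(windows)
for name, shift in shifts.items():
    print(f"{name}: held-out mean minus pooled mean = {shift:+.2f}")
```

A window whose shift is much larger than the others (here the steady-state window, by construction) would suggest that the parameter is not constant across windows.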

For partial least-squares (PLS) or principal component regression (PCR), the infrared spectra were transferred to a DEC VAX 11/750 computer via the NIC-COM software package from Nicolet. This package also provided utility routines used to put the spectra into files compatible with the PLS and PCR software. The PLS and PCR program with cross-validation was provided by David Haaland of Sandia National Laboratory. A detailed description of the program and the procedures used in it has been given (5). [Pg.47]

The goodness of fit of PLS models is calculated as an error of the prediction, in a manner similar to the description in ordinary least squares methods. Using the so-called cross-validation test one can determine the number of significant vectors in U and T and also the error of prediction. [Pg.200]
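
One common way to write the cross-validated prediction error is given below. The notation is an assumption introduced here, not taken from the excerpt: $n$ is the number of objects, $a$ the number of latent vectors, and $\hat{y}_{(i),a}$ the prediction for object $i$ from a model with $a$ vectors built without object $i$.

```latex
\mathrm{PRESS}(a) = \sum_{i=1}^{n} \bigl( y_i - \hat{y}_{(i),a} \bigr)^2,
\qquad
\mathrm{RMSECV}(a) = \sqrt{\frac{\mathrm{PRESS}(a)}{n}}
```

The number of significant vectors is then often taken as the value of $a$ at which $\mathrm{RMSECV}(a)$ reaches its minimum, or beyond which it no longer decreases appreciably.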

The resampling approaches of cross-validation (CV) and bootstrapping do not have the drawback of data splitting: all available data are used for model development, so the model provides an adequate description of the information contained in the gathered data. Cross-validation and bootstrapping are addressed in Chapter 15. One problem with CV deserves attention. Repeated CV has been demonstrated to be inconsistent: if one validates a model by CV and then randomly shuffles the data, the model may no longer be validated after shuffling. [Pg.238]
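
The dependence of a CV result on the row order can be sketched with a small k-fold example. Everything here is an assumption made for illustration: the data are invented, the model is a one-predictor least-squares line, and the helper names `fit_line` and `kfold_rmse` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def kfold_rmse(xs, ys, k, seed):
    """Shuffle the row order with `seed`, split into k folds, return the CV RMSE."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in fold)
    return (sq_err / len(xs)) ** 0.5

# Invented data: a roughly linear trend with irregular noise.
xs = [float(x) for x in range(1, 13)]
ys = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9, 7.4, 7.7, 9.5, 9.8, 11.6, 11.9]

# Same data and same model; only the shuffle seed changes between runs.
scores = [kfold_rmse(xs, ys, k=3, seed=s) for s in range(5)]
print("CV RMSE per shuffle:", [round(s, 3) for s in scores])
print("spread:", round(max(scores) - min(scores), 3))
```

If validation is declared by comparing the CV error with a fixed threshold, a spread of this kind means the verdict can change from one shuffle to the next.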

The main goal of the data analysis is usually to find X, but the residual E can give important clues to the quality of this model. Possibly, residuals obtained from a test-set or from cross-validation can be used instead of fitted residuals. Random noise or some symmetrical type of distribution for the elements of E is normally expected and this can be verified from plotting the residuals and by the use of diagnostics. A good description of the use of residuals in three-way analysis is given by Kroonenberg [1983],... [Pg.167]

For the bread data, the results are a little different. First of all the difference between cross-validated and fitted description (not shown) of X is more distinct because the number of samples is smaller, making the model more prone to overfit and because the data are... [Pg.287]

Cross-validation, in which objects are eliminated and only the excluded objects are predicted from the resulting model to check its stability and validity (see chapter 5.3 for a detailed description), seems too crude an instrument to decide (automatically) on the validity of a QSAR regression equation. Cross-validation may be applied to relatively large data sets. But if only a few compounds are included in the QSAR equation, if a certain parameter is mainly based on a single data point, or if the compounds have been selected according to a rational design procedure, e.g. a D-optimal design (chapter 6), cross-validation may incorrectly indicate a lack of validity of the QSAR model. [Pg.99]
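
The case of a parameter resting mainly on a single data point can be illustrated with a small leave-one-out sketch. The five "compounds" and their values are invented, the model is a one-descriptor least-squares line, and q² is computed with the common convention q² = 1 − PRESS / Σ(yᵢ − ȳ)², which is an assumption here rather than something the excerpt specifies.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def r2_and_q2(xs, ys):
    """Return fitted r^2 and leave-one-out q^2 for a straight-line model."""
    my = sum(ys) / len(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict only compound i.
        s, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (s * xs[i] + b)) ** 2
    return 1 - ss_res / ss_tot, 1 - press / ss_tot

# Invented data: the slope is determined almost entirely by the last compound.
descriptor = [1.0, 2.0, 3.0, 4.0, 10.0]
activity = [1.0, 1.2, 0.9, 1.1, 3.0]

r2, q2 = r2_and_q2(descriptor, activity)
print(f"fitted r2 = {r2:.3f}, leave-one-out q2 = {q2:.3f}")
```

The fitted r² is high while q² is negative, because removing the one extreme compound removes nearly all information about the slope. Whether the equation is actually invalid cannot be decided from q² alone in such a data set.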

In Figure 4 the root mean square error (RMS) and the root mean square error of cross-validation (RMSECV) of different data processing methods and parameters are shown. As expected, the RMSECV is larger than the RMS for each method. The larger errors of the IHM are due to the imperfect description of the pure spectra. Interestingly, CPR shows the lowest errors for a set of ranks (the number of components used to describe the spectra) and power coefficients. In this example of a mixture of water and oil, this is attributed to the fact that CPR considers not only the correlation but also the variance with a power coefficient. [Pg.54]
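
Why the RMSECV is expected to exceed the RMS can be made explicit for the simplest case. The identity below holds for ordinary least squares with leave-one-out cross-validation; the methods named in the excerpt (IHM, CPR) are not ordinary least squares, so this is offered only as motivation. The symbols are assumptions introduced here: $e_i$ is the fitted residual of sample $i$, $e_{(i)}$ its leave-one-out residual, and $h_{ii}$ its leverage (the $i$-th diagonal element of the hat matrix).

```latex
e_{(i)} = y_i - \hat{y}_{(i)} = \frac{e_i}{1 - h_{ii}},
\qquad 0 \le h_{ii} < 1
\;\Longrightarrow\;
\lvert e_{(i)} \rvert \ge \lvert e_i \rvert
\;\Longrightarrow\;
\sqrt{\frac{1}{n}\sum_{i=1}^{n} e_{(i)}^2}
\;\ge\;
\sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}
```

The left-hand root is the RMSECV and the right-hand root is the RMS, both taken over the same $n$ samples.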

CoMFA is attractive because of its combination of understandable molecular description, statistical analysis, and graphic display of results in a computer program that is unambiguous in its application. Molecules are described with molecular interaction fields similar to those computed by GRID, statistics are computed by PLS and cross-validation, and... [Pg.205]

Finally - and equally important - Jens's contribution to the formal treatment of GOS based on the polarization propagator method and Bethe sum rules has been shown to provide a correct quantum description of the excitation spectra and momentum transfer in the study of the stopping cross section within the Bethe-Bloch theory. Of particular interest is the correct description of the mean excitation energy within the polarization propagator for atomic and molecular compounds. This motivated the study of the GOS in the RPA approximation and in the presence of a static electromagnetic field to ensure the validity of the sum rules. [Pg.365]

Big Chemical Encyclopedia
