PRESS value, cross-validation

Indicator functions have the advantage that they can be used on data sets for which no concentration values (y-data) are available. However, cross-validation, and PRESS in particular, can often provide more reliable guidance. [Pg.103]

Fig. 31.16. Two of the four masks that are used in the calculation of a PRESS value in cross-validation. The remaining two masks are obtained by a parallel shift of the diagonal lines that represent data points to be deleted.

It can be calculated as usual for the SEP (see Eq. (6.98)) by use of test samples. It is also possible to estimate the PRESS value on the basis of standard samples only, applying cross-validation by means of the so-called hat matrix H (Faber and Kowalski [1997a, b]; Frank and Todeschini [1994]) ... [Pg.189]
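The hat-matrix route mentioned above can be illustrated with a minimal sketch. It assumes an ordinary least-squares straight-line calibration with a single predictor, which is a simplification chosen here and not the model of the quoted text. For such a fit the leave-one-out residual equals the ordinary residual divided by (1 - h_ii), where h_ii is the i-th diagonal element of H, so PRESS follows from one fit. The function names `fit_line` and `press_hat` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def press_hat(x, y):
    """PRESS from a single fit, using the leverages h_ii (diagonal of H).

    Assumption: straight-line OLS model with intercept, for which
    h_ii = 1/n + (x_i - x_mean)^2 / Sxx.
    """
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b0, b1 = fit_line(x, y)
    press = 0.0
    for xi, yi in zip(x, y):
        h_ii = 1.0 / n + (xi - x_mean) ** 2 / sxx  # leverage of sample i
        residual = yi - (b0 + b1 * xi)              # ordinary residual
        press += (residual / (1.0 - h_ii)) ** 2     # leave-one-out residual, squared
    return press
```

The shortcut avoids refitting the model n times; for this model class it should agree with an explicit leave-one-out loop up to rounding error.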

The PRESS value of cross-validation is given by the sum of all the k variations... [Pg.189]

Now we come to the Standard Error of Estimate and the PRESS statistic, which show interesting behavior indeed. Compare the values of these statistics in Tables 25-1B and 25-1C. Note that the value in Table 25-1C is lower than the value in Table 25-1B. Thus, using either of these as a guide, an analyst would prefer the model of Table 25-1C to that of Table 25-1B. But we know a priori that the model in Table 25-1C is the wrong model. Therefore we come to the inescapable conclusion that in the presence of error in the X variable, the use of SEE, or even cross-validation, as an indicator is worse than useless, since it actively misleads us as to the correct model to use to describe the data. [Pg.124]

In Main_PCR.m, the PRESS values for all 1 to ne eigenvectors used in U, S and V to compute the predicted qualities q_s,cross are stored in a vector PRESS, which is displayed in Figure 5-65. The figure does not show a clear minimum. In Figure 5-66 we show the results of the cross-validation for ne = 12; this number has already been used for the calibration in Figure 5-60. [Pg.305]

A simple and classical method is Wold's criterion [39], which resembles the well-known F-test and is defined as the ratio between two successive values of PRESS (obtained by cross-validation). The optimum dimensionality is set as the number of factors for which the ratio does not exceed unity (at that point the residual error for a model containing A components becomes larger than that for a model with only A - 1 components). The adjusted Wold's criterion limits the upper ratio to 0.90 or 0.95 [35]. Figure 4.17 depicts how this criterion behaves when applied to the calibration data set of the working example developed to determine Sb in natural waters. This plot shows that the third pair (formed by the third and fourth factors) yields a PRESS ratio that is slightly lower than one, so the best number of factors to include in the model would probably be three or four. [Pg.208]
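One possible reading of this criterion is sketched below. It assumes that `press[a - 1]` holds the cross-validated PRESS of the model with `a` factors, and that factors are added as long as the ratio PRESS(a + 1) / PRESS(a) stays below a threshold (1.0 for the plain criterion, 0.90 or 0.95 for the adjusted one). The function name `wold_components` and the numbers in the usage lines are invented for illustration; they are not the Sb data of the working example.

```python
def wold_components(press, threshold=1.0):
    """Number of factors chosen by a Wold-style PRESS ratio rule.

    press[a - 1] is the cross-validated PRESS of the model with a factors.
    A further factor is accepted only while PRESS(a + 1) / PRESS(a) < threshold.
    """
    n_factors = 1
    for a in range(1, len(press)):
        ratio = press[a] / press[a - 1]  # PRESS(a + 1) / PRESS(a)
        if ratio >= threshold:
            break                        # the extra factor no longer pays off
        n_factors = a + 1
    return n_factors


# Hypothetical PRESS values for models with 1..5 factors.
press_values = [10.0, 6.0, 4.0, 3.9, 4.2]
plain = wold_components(press_values)                     # threshold 1.0
adjusted = wold_components(press_values, threshold=0.95)  # adjusted criterion
```

With these made-up values the fourth factor lowers PRESS only slightly, so the plain rule keeps four factors while the stricter threshold stops at three, the same kind of ambiguity the excerpt describes.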

Another criterion that is based on the predictive ability of PCA is the predicted sum of squares (PRESS) statistic. To compute the (cross-validated) PRESS value at a certain k, we remove the ith observation from the original data set (for i = 1, ..., n), estimate the center and the k loadings of the reduced data set, and then compute the fitted value of the ith observation following Equation 6.16, now denoted as x̂_{-i}. Finally, we set... [Pg.193]

Although cross-validation is always performed on the preprocessed data, the RSS and PRESS values are always calculated on the x block in the original units, as discussed in Chapter 4, Section 4.33.2. The reason for this relates to rather complex problems that occur when standardising a column after one sample has been removed. There are, of course, many other possible approaches. When performing cross-validation, the only output available involves error analysis. [Pg.452]

The PRESS value is determined by leave-one-out cross-validation [15]. Basically, one spectrum at a time is removed from the set of calibration spectra, a calibration model is built from the remaining spectra, and the concentration for the excluded spectrum is estimated. The squared differences between these estimates and their respective known concentrations c_j are summed up as PRESS. The set of wavelengths that minimizes the PRESS value is deemed best. [Pg.34]
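The leave-one-out procedure just described can be sketched as follows. To keep the example short it assumes a straight-line calibration on a single wavelength and searches over individual wavelengths only; the quoted text searches over sets of wavelengths with a model that is not specified here. The names `loo_press` and `best_wavelength` and the layout `spectra[j][w]` are assumptions made for illustration.

```python
def loo_press(signal, conc):
    """Leave-one-out PRESS for the calibration conc = b0 + b1 * signal."""
    press = 0.0
    for j in range(len(signal)):
        # Remove spectrum j and calibrate on the remaining ones.
        s_train = signal[:j] + signal[j + 1:]
        c_train = conc[:j] + conc[j + 1:]
        m = len(s_train)
        s_mean = sum(s_train) / m
        c_mean = sum(c_train) / m
        sxx = sum((s - s_mean) ** 2 for s in s_train)
        sxy = sum((s - s_mean) * (c - c_mean) for s, c in zip(s_train, c_train))
        b1 = sxy / sxx
        b0 = c_mean - b1 * s_mean
        # Estimate the excluded concentration and accumulate the squared error.
        c_hat = b0 + b1 * signal[j]
        press += (c_hat - conc[j]) ** 2
    return press


def best_wavelength(spectra, conc):
    """Return (index of the wavelength with lowest PRESS, list of all PRESS values).

    spectra[j][w] is the signal of calibration sample j at wavelength index w.
    """
    n_wavelengths = len(spectra[0])
    scores = [loo_press([row[w] for row in spectra], conc)
              for w in range(n_wavelengths)]
    best = min(range(n_wavelengths), key=lambda w: scores[w])
    return best, scores
```

Extending the search to combinations of wavelengths would replace the straight-line fit by a multivariate regression and the loop over `w` by a loop over candidate subsets.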

Compared with using PRESS based on cross-validation as the optimization criterion, using the SEP for an independent set improves the optimization speed by several hundred percent. To avoid selecting wavelength combinations specific to the prediction set, it is necessary to validate the predictive ability of the selected wavelengths by using additional prediction sets. In addition, the PRESS value for the calibration spectra should also be acceptable. [Pg.53]

Another conceptually different approach is cross-validation. In Equation (2.19), X̂ is regarded as a model for X, and as such the model should be able to predict the values of X. This can be checked by performing a cross-validation scheme in which parts of X are left out of the calculations and kept apart; the model is then built and used to predict the left-out entries. The sum of squared differences between the predicted and the real entries serves as a measure of discrepancy. All data in X are left out once, and the squared differences are summed in a so-called PRESS statistic (PRediction Error Sum of Squares). The model that gives the lowest PRESS is selected, and the pseudo-rank of X is defined as the number of components in that model. [Pg.27]

When cross-validation is used for selecting the number of components, the number yielding a minimum PRESS value is often chosen. In practice, a minimum in the PRESS values may not always be present or sometimes the minimum is only marginally better than simpler models. In such cases, it becomes more difficult to decide on the number of components... [Pg.148]

The whole procedure can be repeated for different values of P, Q and R, and then the Tucker3 model with the lowest PRESS can be selected. By progressing through steps (i) to (iii), the block or element indicated in Figure 7.1 (shaded box) is never part of any model. Hence, this part can be independently predicted, which is crucial for proper cross-validation. It is a matter of choice how to leave out slices: one slice at a time, several slices at a time, etc. This can also be different for the different modes of X. When the array is preprocessed (e.g. centered and/or scaled), this preprocessing has to be repeated at every step; that is, the data are left out from the original array. [Pg.150]

Let DPLS(X, y, k) be cross-validation VS-DPLS that selects the k largest wj at each factor. The optimal selection of k (i.e. kopt) is the one with lowest PRESS value ... [Pg.375]

A suitable criterion function for regression analysis should reflect how well the response values are predicted. In the adaptive wavelet algorithm, the criterion function considered for regression is based on the PRESS statistic and is then converted to a leave-one-out cross-validated R-squared measure as follows... [Pg.452]
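The formula itself is elided in the excerpt. A commonly used conversion, given here only as an assumption and not necessarily the one in the quoted book, is R²(cv) = 1 - PRESS / SS_tot, where SS_tot is the sum of squared deviations of the responses from their mean. The function name `r2_cv` is hypothetical.

```python
def r2_cv(press, y):
    """Leave-one-out cross-validated R-squared from a PRESS value.

    Assumed definition: 1 - PRESS / SS_tot, with SS_tot the total sum of
    squares of the responses y about their mean.
    """
    y_mean = sum(y) / len(y)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / ss_tot
```

Under this definition a value close to 1 indicates that the left-out responses are predicted well, and the value becomes negative when the cross-validated predictions are worse than simply predicting the mean.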

Big Chemical Encyclopedia
