Validation in Chemometrics and PAT

A further underlying assumption is that this procedure yields information about the future performance of the full N-object model, but this is an equally flawed assumption. The procedure patently has no link to any new data set, that is, to new measurements generated after the model has been established and cross-validated. In reality, all that cross-validation delivers is a measure of internal training set stability with respect to sub-setting (sequential exclusion of one object, or one segment). [Pg.77]
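The sub-setting described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: a univariate least-squares line stands in for the multivariate calibration model, and the names fit_line and rmse_loo_cv are hypothetical helpers, not taken from the source. It shows that every number entering the leave-one-out error comes from the same N training objects.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit y = a + b*x (a simple stand-in for a PLS/PCR model)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def rmse_loo_cv(xs, ys):
    """Leave-one-out cross-validation error computed on the training set only."""
    squared_errors = []
    for i in range(len(xs)):
        # Exclude object i, refit on the remaining N-1 objects.
        x_sub = xs[:i] + xs[i + 1:]
        y_sub = ys[:i] + ys[i + 1:]
        a, b = fit_line(x_sub, y_sub)
        # Predict the excluded object; it still belongs to the same data set.
        squared_errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Note that no object drawn after the model was built ever enters this loop, which is the point made in the text: the result reflects internal stability of the training set, not performance on future measurements.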

The essential characteristic of a proper test set is that it represents a new drawing from the population, realized as a new, independent [X,Y] data set specifically not used in the modeling. It is evident that any N-object data set constitutes only one specific realization of an N-tuple of individual TSE materializations. It takes a completely new ensemble of objects, the test set, to secure a second manifestation. All new measurements, for example when a PAT model is used for the purpose of automated prediction, constitute precisely such a new drawing/sampling. All new measurement situations are therefore to be likened to a test set - and this is exactly what is missing in all forms of cross-validation. [Pg.77]

While the training set is used only for modeling, the test set is used only for performance testing. In testing prediction performance, for example, the test set provides new X-data used as input to the training set model, which then predicts the corresponding Y-values (y_i,pred), to be compared with the test set values. [Pg.77]
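The division of labor between training set and test set can be sketched as follows. This is a minimal sketch under stated assumptions: a univariate least-squares line again stands in for the actual calibration model, and the function names fit_line and rmsep (root mean square error of prediction, a common chemometric summary not named in the passage above) are illustrative choices, not the source's.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit y = a + b*x (a simple stand-in for a PLS/PCR model)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def rmsep(train_x, train_y, test_x, test_y):
    """Test set validation: model on the training set, test on new data only."""
    # The training set is used only for modeling.
    a, b = fit_line(train_x, train_y)
    # The test set X-data are fed to the training set model ...
    predictions = [a + b * x for x in test_x]
    # ... and the predicted Y-values are compared with the test set values.
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, test_y)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

As a usage example with made-up numbers: training on x = [1, 2, 3, 4], y = [3, 5, 7, 9] and testing on a new drawing x = [5, 6], y = [11.5, 13.5] gives an error of 0.5, reflecting an offset in the new data that no amount of sub-setting of the training set could have revealed.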

From the above discussion it follows that all types, variants or schemes of the cross-validation type must by necessity be inferior, indeed unscientific [27,28], precisely because no TSE contributions from any future data set are ever validated. Only test set validation stands up to the logical demands of the validation imperative itself. Cross-validation is therefore logically and scientifically discredited and must never be used as a replacement for a true test set. If cross-validation is used, this must always be accompanied by appropriate warnings [27,28]. [Pg.77]

Admittedly, there do exist a few rare situations in which test set validation is not possible (historical data, very small data sets, other ...). In such cases cross-validation finds its only legitimate application area (NB: none of these situations must result from voluntary decisions made by the data analyst, however). With historical data the option of resampling simply does not exist. With small data sets this option might have existed but perhaps went unused because of negligence, or the small sample case may be fully legitimate. [Pg.77]
