Test set selection

Golbraikh, A., Tropsha, A. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection. [Pg.455]
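Diversity-based selection picks training compounds that spread out over descriptor space, so the test set falls inside the region the model has seen. The sketch below uses Kennard-Stone max-min selection, one widely used diversity-sampling scheme. It is swapped in here for illustration and is not necessarily the procedure of the cited paper. Euclidean distance on raw descriptor vectors and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Plain Euclidean distance between two descriptor vectors (assumed metric).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(X: List[Sequence[float]], n_train: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: return (train_indices, test_indices) by Kennard-Stone selection."""
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")
    # Seed the training set with the two most distant compounds.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: euclidean(X[p[0]], X[p[1]]),
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    # Distance from each remaining compound to its nearest selected compound.
    min_dist = {k: min(euclidean(X[k], X[s]) for s in selected) for k in remaining}
    while len(selected) < n_train:
        # Add the compound farthest from everything already selected (max-min rule).
        nxt = max(remaining, key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_dist[nxt]
        for k in remaining:
            min_dist[k] = min(min_dist[k], euclidean(X[k], X[nxt]))
    return selected, remaining
```

For example, `kennard_stone([[0.0], [1.0], [2.0], [10.0]], 3)` keeps the two extremes and the point at 2.0 for training, leaving the interior point at 1.0 as the test compound.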

For training and test set selection, the same criteria apply as for any other QSAR method, and the reader is referred to the literature covering this topic. In the 3D QSAR context, the most critical steps are 3 and 4, because they can have a dramatic impact on the results of a study even when the same initial data set is used for model generation. These steps are therefore discussed in detail in the following sections. [Pg.589]

Sometimes the final test will show that the functional form must be redefined and the force field subsequently reparameterized. If the test data that pinpointed the failure are of a type that can be used in refinement, they should be transferred to the reference set. In any case, the test set should be discarded and a new test set selected. The reason is that the force field is no longer independent of the initial test set, since that set has been used to influence a design decision. One can therefore argue that reapplying the same set tests only internal, not external, predictivity. [Pg.31]

Despite its poorer fit and internal predictivity compared with equation (1), the validity of this model is demonstrated by its excellent test set (compounds 13-22) predictivity (r²pred = 0.909, sPRESS = 0.406). The differences between the two models, especially in their test set predictivity, provide striking evidence of how strongly the choice of training and test sets influences the results. A careful selection of the training set molecules is therefore of utmost importance: the training set should cover a broad variety of structural features to allow reliable predictions for the test set compounds. [Pg.451]
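The two test set statistics quoted above can be computed from observed and predicted test activities. The sketch below assumes the common CoMFA convention r²pred = 1 − PRESS/SD, where SD is the sum of squared deviations of the test activities from the mean training set activity. Normalizing sPRESS by the number of test compounds is also an assumption; other texts divide by a degrees-of-freedom term instead.

```python
import math
from typing import Sequence, Tuple


def predictive_stats(
    y_obs: Sequence[float], y_pred: Sequence[float], y_train_mean: float
) -> Tuple[float, float]:
    """Hypothetical helper: return (r2_pred, s_press) for an external test set."""
    # PRESS: sum of squared prediction errors over the test compounds.
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    # SD: squared deviations of test activities from the TRAINING set mean (assumed convention).
    sd = sum((o - y_train_mean) ** 2 for o in y_obs)
    r2_pred = 1.0 - press / sd
    # Assumed normalization: root mean squared prediction error over n_test.
    s_press = math.sqrt(press / len(y_obs))
    return r2_pred, s_press
```

A model that predicts no better than the training mean gets r²pred near 0; perfect predictions give 1.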

The problems of inadequate training and test set selection have already been discussed in detail in Section 3. The same problem also arises in cross-validation. In well-designed training sets, where a small number of objects is selected to explore the parameter space with a minimum number of experiments, cross-validation fails: the eliminated objects cannot be predicted by models derived from objects that do not contain all the structural features of the excluded objects. [Pg.456]
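The failure mode described above can be checked before running leave-one-out cross-validation. The sketch assumes the compounds are encoded as a 0/1 Free-Wilson indicator matrix (one column per structural feature) and flags every compound that carries a feature absent from all remaining compounds. It tests only this structural-coverage condition, not whether enough degrees of freedom remain to fit the model. The function name is hypothetical.

```python
from typing import List, Sequence


def unpredictable_in_loo(indicators: List[Sequence[int]]) -> List[int]:
    """Hypothetical helper: indices of compounds whose LOO model lacks one of their features."""
    n_feat = len(indicators[0])
    flagged = []
    for i, row_i in enumerate(indicators):
        others = [row for j, row in enumerate(indicators) if j != i]
        for f in range(n_feat):
            # Feature f occurs in compound i but in no remaining training compound,
            # so its contribution cannot be estimated when i is left out.
            if row_i[f] == 1 and all(row[f] == 0 for row in others):
                flagged.append(i)
                break
    return flagged
```

In a minimal design such as `[[0, 0], [1, 0], [0, 1]]`, each substituent occurs exactly once, so compounds 1 and 2 are flagged.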

A simple example illustrates how critical the selection of training and test sets is. Cramer et al. correlated the binding affinities of 31 steroids (e.g., 17 and 18) to CBG (corticosteroid-binding globulin; training set 1-21, test set 22-31) using comparative molecular field analysis (CoMFA). The same set of compounds can easily be described by a one-parameter Free-Wilson equation (equation 28), in which the indicator variable 4,5-C=C encodes the presence or absence of a cycloaliphatic 4,5-double bond in ring A of the steroids. ... [Pg.2318]
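A one-parameter Free-Wilson model has the form activity = a + b·I, where I is 1 if the structural feature is present and 0 otherwise. For a single binary indicator, the least-squares solution reduces to group means: a is the mean activity of compounds without the feature and b is the difference between the two group means. The sketch below uses hypothetical names and no data from the cited study. It raises an error when the training set contains only one value of I, which is exactly the situation in which the feature's contribution cannot be estimated.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fit_free_wilson_1p(indicator: Sequence[int], activity: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: least-squares fit of activity = a + b * I for one 0/1 indicator."""
    y0 = [y for i, y in zip(indicator, activity) if i == 0]
    y1 = [y for i, y in zip(indicator, activity) if i == 1]
    if not y0 or not y1:
        # b is not estimable if the training set lacks one of the two classes.
        raise ValueError("indicator must take both values 0 and 1 in the training set")
    a = mean(y0)      # intercept = mean activity without the feature
    b = mean(y1) - a  # feature contribution = difference of group means
    return a, b


def predict(a: float, b: float, indicator: Sequence[int]) -> List[float]:
    # Apply the fitted equation to new (e.g. test set) compounds.
    return [a + b * i for i in indicator]
```

With the made-up values `fit_free_wilson_1p([0, 0, 1, 1], [5.0, 6.0, 7.0, 8.0])`, the fit gives a = 5.5 and b = 2.0.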
