Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...


Leave-multiple-out

The most frequently used technique for model validation is no doubt cross-validation (CV). It can be applied in several variants: leave-one-out (LOO), leave-multiple-out... [Pg.161]
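
The difference between the two variants named above comes down to how many samples are held out per round. The following is a minimal sketch, not taken from the cited source: the function name `leave_group_out_splits` and the choice of contiguous, non-overlapping groups are assumptions made for illustration.

```python
def leave_group_out_splits(n_samples, group_size):
    """Yield (train_indices, test_indices) pairs.

    group_size=1 gives leave-one-out; group_size>1 gives a simple
    leave-multiple-out scheme with contiguous, non-overlapping groups
    (an illustrative assumption, other groupings are possible).
    """
    if group_size < 1 or group_size > n_samples:
        raise ValueError("group_size must be between 1 and n_samples")
    indices = list(range(n_samples))
    for start in range(0, n_samples, group_size):
        test = indices[start:start + group_size]
        train = indices[:start] + indices[start + group_size:]
        yield train, test


# Six samples: leave-one-out needs six rounds, leave-two-out needs three.
loo_splits = list(leave_group_out_splits(6, 1))
lmo_splits = list(leave_group_out_splits(6, 2))
```

In both cases every sample is held out exactly once; only the number of model refits changes.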

Controlling the complexity of a model is called regularization. To this end, hold-out data is important. In order to benefit from a training set that is as large as possible and still be able to measure the performance on unseen data, cross-validation is used. It performs multiple iterations of training and testing on different partitionings of the data. Leave-one-out is certainly the most prominent concept here [154]; however, other ways to partition are in use as well. [Pg.76]
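
The repeated train-and-test loop described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the cited source: the model is a one-variable least-squares line, folds are assigned by index modulo `n_folds`, and the names `fit_line` and `cross_validated_rmse` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_rmse(xs, ys, n_folds):
    """Train on all folds but one, test on the held-out fold, repeat."""
    n = len(xs)
    squared_errors = []
    for fold in range(n_folds):
        # Assumed partitioning: sample i belongs to fold i % n_folds.
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        for i in test_idx:
            squared_errors.append((ys[i] - (slope * xs[i] + intercept)) ** 2)
    return math.sqrt(sum(squared_errors) / n)


# Made-up, exactly linear data: every held-out point is predicted perfectly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]
rmse_3_fold = cross_validated_rmse(xs, ys, 3)
rmse_loo = cross_validated_rmse(xs, ys, len(xs))  # n_folds = n is leave-one-out
```

Setting `n_folds` equal to the number of samples reduces the loop to leave-one-out; smaller values hold out several samples at a time.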

Figures 11 and 12 illustrate the performance of the pR2 compared with several of the currently popular criteria on a specific data set resulting from one of the drug hunting projects at Eli Lilly. This data set has IC50 values for 1289 molecules. There were 2317 descriptors (or covariates), and a multiple linear regression model was used with forward variable selection; the linear model was trained on half the data (selected at random) and evaluated on the other (hold-out) half. The root mean squared error of prediction (RMSE) for the test hold-out set is minimized when the model has 21 parameters. Figure 11 shows the model size chosen by several criteria applied to the training set in a forward selection; for example, the pR2 chose 22 descriptors, the Bayesian Information Criterion chose 49, Leave One Out cross-validation chose 308, the adjusted R2 chose 435, and the Akaike Information Criterion chose 512 descriptors in the model. Although the pR2 criterion selected considerably fewer descriptors than the other methods, it had the best prediction performance. Also, only pR2 and BIC had better prediction on the test data set than the null model.
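
For orientation, three of the criteria compared above can be written down in their common textbook forms for a least-squares model with Gaussian errors. This is a sketch under assumptions: additive constants are dropped from AIC and BIC, `k` counts fitted parameters including the intercept, and the pR2 criterion is not implemented because the excerpt does not define it. The function names are hypothetical.

```python
import math


def adjusted_r2(rss, tss, n, p):
    """Adjusted R^2 for n samples and p predictors (plus an intercept)."""
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))


def aic(rss, n, k):
    """Akaike Information Criterion, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k


def bic(rss, n, k):
    """Bayesian Information Criterion, up to an additive constant."""
    return n * math.log(rss / n) + k * math.log(n)


# Hypothetical numbers, chosen only to show the size of the penalties.
n_samples, n_params, rss_value = 100, 6, 40.0
aic_value = aic(rss_value, n_samples, n_params)
bic_value = bic(rss_value, n_samples, n_params)
penalty_gap = bic_value - aic_value  # equals k * (ln n - 2)
```

Because ln(n) exceeds 2 once n is larger than about 7, BIC charges more per parameter than AIC on any data set of realistic size, which is consistent with BIC choosing the smaller model in the comparison above.
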
This procedure led to a predictive four-component PLS model for 72 VolSurf descriptors and 51 thrombin inhibitors. A cross-validated r2(cv) value of 0.599 after leave-one-out cross-validation and a conventional r2 value of 0.812 were obtained. Statistical validation using leave-two-out and leave-multiple-groups-out cross-validation procedures underscores the significance of the final model. The graph of experimental versus calculated log(ESA) permeability values is shown in Figure 8 on the left. The overall model quality corresponds to the model reported by Sugano et al. (2000). [Pg.432]
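
A cross-validated r2 of the kind quoted above is commonly computed as one minus the ratio of the predictive residual sum of squares (PRESS) to the total sum of squares. The sketch below assumes that common definition, with the mean taken over all observed values; the cited study may use a variant, and the function name `q_squared` is hypothetical.

```python
def q_squared(observed, cv_predicted):
    """Cross-validated r2: 1 - PRESS / total sum of squares.

    cv_predicted[i] must be the prediction for sample i from a model
    that was fitted without sample i (or without its group).
    """
    mean_obs = sum(observed) / len(observed)
    press = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, cv_predicted))
    total_ss = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - press / total_ss


# Made-up values: perfect predictions give 1, predicting the mean gives 0.
q2_perfect = q_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
q2_mean_only = q_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Unlike the conventional r2, this quantity can become negative when the held-out predictions are worse than simply predicting the mean.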

Hou fitness function. A parameter which combines the multiple correlation coefficient R and the leave-one-out Q [Hou, Wang et al., 1999] ... [Pg.646]

We have applied kNN (Zheng and Tropsha 2000) and simulated annealing-partial least squares (SA-PLS) (Cho et al. 1998) QSAR approaches to a dataset of 48 chemically diverse functionalized amino acids (FAAs) with anticonvulsant activity that were synthesized previously, and successful QSAR models of FAA anticonvulsants have been developed (Shen et al. 2002). Both methods utilized multiple descriptors such as molecular connectivity indices or atom-pair descriptors, which are derived from two-dimensional molecular topology. QSAR models with high internal accuracy were generated, with leave-one-out cross-validated (q2) values rang-... [Pg.1324]

Aptula et al. used multiple linear regression to investigate the toxicity of 200 phenols to the ciliated protozoan Tetrahymena pyriformis. Using their MLR model, they then predicted the toxicity of another 50 phenols. Here we present a comparative study for the entire set of 250 phenols, using multiple linear regression, artificial neural networks, and SVM regression methods. Before computing the SVM model, the input vectors were scaled to zero mean and unit variance. The prediction power of the QSAR models was tested with complete cross-validation: leave-5%-out (L5%O), leave-10%-out (L10%O), leave-20%-out (L20%O), and leave-25%-out (L25%O). The capacity parameter C was optimized for each SVM model. [Pg.363]
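
The scaling step mentioned above (zero mean, unit variance) can be sketched for a single descriptor column as follows. This is an illustration, not the authors' code: the function name `standardize` is hypothetical, and the use of the population variance (dividing by n rather than n - 1) is an assumption, since the excerpt does not say which was used.

```python
import math


def standardize(column):
    """Scale one descriptor column to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    # Assumption: population variance (divide by n).
    variance = sum((v - mean) ** 2 for v in column) / n
    std = math.sqrt(variance)
    if std == 0.0:
        raise ValueError("constant column cannot be scaled to unit variance")
    return [(v - mean) / std for v in column]


# Made-up descriptor values for four compounds.
scaled = standardize([1.0, 2.0, 3.0, 4.0])
scaled_mean = sum(scaled) / len(scaled)
scaled_variance = sum(v * v for v in scaled) / len(scaled)
```

Scaling matters for SVM regression because kernel distances are otherwise dominated by whichever descriptor has the largest numeric range.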

When possible, cancel out the units, leaving only mL. Step 5. Solve the problem by multiplication. Cancel out the numbers when possible ... [Pg.44]

Liquid-liquid extraction (also called solvent extraction) is the transfer of a substance (a consolute) dissolved in one liquid to a second liquid (the solvent) that is immiscible with the first liquid or miscible to a very limited degree. This operation is commonly used in fine chemicals manufacture (1) to wash out impurities from a contaminated solution to a solvent in order to obtain a pure solution (raffinate) from which the pure substance will be isolated, and (2) to pull out a desired substance from a contaminated liquid into the solvent, leaving impurities in the first liquid. The former operation is typically employed when an organic phase is to be depleted of impurities which are soluble in acidic, alkaline, or neutral aqueous solutions. Water or a diluted aqueous solution is then used as the solvent. The pure raffinate is then appropriately processed (e.g. by distillation) to isolate the desired consolute. In the latter version of extraction, impurities remain in the first phase. The extract that has become rich in the desired consolute is then appropriately processed to isolate the consolute. Extraction can also be used to fractionate multiple consolutes. [Pg.252]

A final observation is in order: the quantitative application of the equilibrium thermodynamical formalism to living systems, and especially to ecosystems, is generally inadequate, since they are complex in their organisation, involving many interactions and feedback loops; several hierarchical levels may have to be considered; and the sources and types of energy involved can be multiple. Furthermore, they are out-of-equilibrium open flow systems and need to be maintained in such condition, since equilibrium is death. Leaving aside very simple cases, in the present state of the art we are, therefore, limited to general semiquantitative statements or descriptions (e.g. ecosystem narratives). [Pg.123]



© 2024 chempedia.info