Training sets compounds selected

These considerations provide an impetus for the development of fast, nonlinear, variable-selection QSAR methods that can avoid the aforementioned problems of linear QSAR. Several nonlinear QSAR methods have been proposed in recent years. Most of these methods are based on either artificial neural network (ANN) (50, 61, 137-142) or machine learning (65, 143-145) techniques. Because these techniques involve the optimization of many parameters, the analysis is relatively slow. More recently, Hirst reported a simple and fast nonlinear QSAR method (146), in which the activity surface was generated from the activities of the training set compounds based on a predefined mathematical function. [Pg.62]
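The activity-surface idea can be illustrated with a small sketch: each training compound contributes to a smooth surface in descriptor space, and a query compound is scored from that surface. The Gaussian form and the `width` parameter used here are purely illustrative assumptions; the functional form of Hirst's method (146) is not reproduced from the excerpt above.

```python
import numpy as np

def activity_surface(x_query, X_train, y_train, width=1.0):
    """Predict activity at x_query from a surface built over the training set.

    Each training compound contributes a Gaussian bump centred on its
    descriptor vector; the prediction is the activity-weighted average.
    The Gaussian kernel and `width` are illustrative assumptions, not the
    published functional form.
    """
    d2 = np.sum((X_train - x_query) ** 2, axis=1)   # squared descriptor distances
    w = np.exp(-d2 / (2.0 * width ** 2))            # kernel weights
    return np.dot(w, y_train) / np.sum(w)           # weighted mean activity

# toy usage: five training compounds described by three descriptors
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 3))
y_train = rng.normal(size=5)
print(activity_surface(np.zeros(3), X_train, y_train))
```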

Eleven compounds that were not included in the training set were selected as a test data set to validate the QSAR models. All of the test compounds were well predicted. The mean and standard deviation of the prediction errors were 0.28 and 0.005 for the CoMFA model, and only 0.33 and 0.011 for the CoMSIA model. The predictive r2, which is analogous to the cross-validated correlation coefficient q2, was 0.883 for the CoMFA model and 0.908 for the CoMSIA model, suggesting a high reliability of these models. [Pg.330]
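A minimal sketch of how such external-validation statistics can be computed. It uses one common definition of the predictive r2 (1 minus PRESS over the sum of squared deviations of the test activities from the training-set mean); whether this matches the exact formula used for the CoMFA/CoMSIA models above is an assumption, and the numbers in the usage example are made up.

```python
import numpy as np

def external_validation(y_test, y_pred, y_train_mean):
    """Predictive r2 plus mean and standard deviation of the prediction errors.

    r2_pred = 1 - PRESS / SS, where PRESS is the sum of squared prediction
    errors over the test set and SS is the sum of squared deviations of the
    test activities from the training-set mean (a common, assumed definition).
    """
    y_test, y_pred = np.asarray(y_test, float), np.asarray(y_pred, float)
    errors = y_test - y_pred
    press = np.sum(errors ** 2)
    ss = np.sum((y_test - y_train_mean) ** 2)
    return 1.0 - press / ss, errors.mean(), errors.std()

# toy usage with made-up activities
r2_pred, mean_err, std_err = external_validation(
    y_test=[6.1, 5.4, 7.2], y_pred=[5.9, 5.6, 7.0], y_train_mean=6.0)
print(r2_pred, mean_err, std_err)
```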

A data set can be split into a training set and a test set randomly or according to a specific rule. The 1293 compounds were divided into a training set of 741 compounds and a test set of 552 compounds, based on their distribution in a Kohonen map. From each occupied neuron, one compound was selected for the training set, and the remaining compounds were put into the test set. This selection ensured that both the training set and the test set contained as much information as possible and covered the chemical space as widely as possible. [Pg.500]
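The selection rule can be sketched as follows. As a simplification, the Kohonen map is replaced here by k-means clustering, with each cluster playing the role of an occupied neuron; the map type, grid size, and choice of representative compound are assumptions, only the "one compound per occupied cell into the training set, the rest into the test set" rule follows the description above.

```python
import numpy as np
from sklearn.cluster import KMeans

def map_based_split(X, n_cells=50, random_state=0):
    """Split compounds so every occupied cell contributes one training compound.

    k-means is used as a stand-in for the Kohonen map: from each occupied
    cluster, the compound closest to the cluster centre goes into the
    training set; all remaining compounds form the test set.
    """
    km = KMeans(n_clusters=n_cells, n_init=10, random_state=random_state)
    labels = km.fit_predict(X)
    train_idx = []
    for cell in np.unique(labels):
        members = np.where(labels == cell)[0]
        d = np.linalg.norm(X[members] - km.cluster_centers_[cell], axis=1)
        train_idx.append(members[np.argmin(d)])   # representative compound
    train_idx = np.array(train_idx)
    test_idx = np.setdiff1d(np.arange(len(X)), train_idx)
    return train_idx, test_idx

# toy usage: 200 compounds described by 10 descriptors
X = np.random.default_rng(1).normal(size=(200, 10))
train_idx, test_idx = map_based_split(X, n_cells=40)
print(len(train_idx), len(test_idx))
```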

Aqueous solubility is selected to demonstrate the E-state application in QSPR studies. Huuskonen et al. modeled the aqueous solubility of 734 diverse organic compounds with multiple linear regression (MLR) and artificial neural network (ANN) approaches [27]. The set of structural descriptors comprised 31 E-state atomic indices and three indicator variables for pyridine, aliphatic hydrocarbons, and aromatic hydrocarbons, respectively. The data set of 734 chemicals was divided into a training set (n = 675), a validation set (n = 38), and a test set (n = 21). A comparison of the MLR results (training, r2 = 0.94, s = 0.58; validation, r2 = 0.84, s = 0.67; test, r2 = 0.80, s = 0.87) and the ANN results (training, r2 = 0.96, s = 0.51; validation, r2 = 0.85, s = 0.62; test, r2 = 0.84, s = 0.75) indicates a small improvement for the neural network model with five hidden neurons. These QSPR models may be used for a fast and reliable computation of the aqueous solubility of diverse organic compounds. [Pg.93]
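A sketch of this kind of MLR-versus-ANN comparison using scikit-learn in place of the original software. The descriptor count (31 E-state indices plus 3 indicators), the 675/38/21 split, and the five-hidden-neuron topology come from the text; the random placeholder data, the scaling step, and the solver settings are illustrative assumptions, and s is approximated here by the RMSE.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_squared_error

def fit_and_report(model, X_tr, y_tr, splits):
    """Fit one model and print r2 and s (approximated by RMSE) per subset."""
    model.fit(X_tr, y_tr)
    for name, (X, y) in splits.items():
        pred = model.predict(X)
        print(f"{name}: r2={r2_score(y, pred):.2f}, "
              f"s={np.sqrt(mean_squared_error(y, pred)):.2f}")

# placeholder data standing in for the 734-compound E-state descriptor table
rng = np.random.default_rng(0)
X = rng.normal(size=(734, 34))               # 31 E-state indices + 3 indicators
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=734)

X = StandardScaler().fit_transform(X)
X_tr, y_tr = X[:675], y[:675]                # training set (n = 675)
splits = {"validation": (X[675:713], y[675:713]),   # n = 38
          "test": (X[713:], y[713:])}               # n = 21

fit_and_report(LinearRegression(), X_tr, y_tr, splits)
fit_and_report(MLPRegressor(hidden_layer_sizes=(5,), max_iter=5000,
                            random_state=0), X_tr, y_tr, splits)
```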

Two models of practical interest using quantum chemical parameters were developed by Clark et al. [26, 27]. Both studies were based on 1085 molecules and 36 descriptors calculated with the AM1 method following structure optimization and electron density calculation. An initial set of descriptors was selected with a multiple linear regression model and further optimized by trial-and-error variation. The second study reported a standard error of 0.56 for the 1085 compounds; it also estimated the reliability of the neural network predictions by analyzing the standard deviation of the errors for an ensemble of 11 networks trained on different randomly selected subsets of the initial training set [27]. [Pg.385]
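The ensemble idea can be sketched as follows: train several networks on different random subsets of the training set and use the spread of their predictions as a per-compound reliability estimate. The network architecture, subset fraction, and other settings below are illustrative assumptions, not those of ref. [27].

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def ensemble_prediction(X_train, y_train, X_query, n_nets=11,
                        subset_frac=0.8, seed=0):
    """Train n_nets networks on random subsets; return mean prediction and spread."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for i in range(n_nets):
        idx = rng.choice(n, size=int(subset_frac * n), replace=False)
        net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=3000,
                           random_state=i)
        net.fit(X_train[idx], y_train[idx])
        preds.append(net.predict(X_query))
    preds = np.array(preds)                       # shape (n_nets, n_query)
    return preds.mean(axis=0), preds.std(axis=0)  # mean and reliability estimate

# toy usage: 36 descriptors per compound, as in the text; data are random placeholders
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 36))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=300)
mean_pred, spread = ensemble_prediction(X[:250], y[:250], X[250:])
print(mean_pred[:3], spread[:3])
```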

Abraham's data set of 57 compounds was selected as the training set for log BB prediction. The test set contained the 13 compounds used by Clark and Liu et al. A three-component model was built from the atom type descriptors; it estimated the data set of 57 compounds with r2 = 0.897, q2 = 0.504, and RMSEE = 0.259. The relatively low q2 resulted from the small size of the data set. In total, 94 different atom types were identified for the 57 compounds, and half of these atom types occurred only once or twice in the whole data set. When the compounds containing these atom types were left out during cross-validation, the contribution of these atom types could not be predicted accurately, since they did not appear in the training set. After... [Pg.539]
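The rare-atom-type problem described above can be checked programmatically: any atom type present in only one or two compounds will disappear from the training data whenever those compounds are left out during cross-validation. The count-matrix layout and the `min_occurrences` threshold in this sketch are hypothetical; the actual descriptor table of the study is not reproduced here.

```python
import numpy as np

def flag_rare_atom_types(counts, min_occurrences=3):
    """Identify atom-type descriptors too rare to be cross-validated reliably.

    `counts` is a (n_compounds, n_atom_types) matrix of atom-type occurrence
    counts per compound (hypothetical layout). Atom types present in fewer
    than `min_occurrences` compounds cannot be estimated when those compounds
    are left out, which is the effect described in the text above.
    """
    present_in = (counts > 0).sum(axis=0)            # compounds containing each type
    rare_types = np.where(present_in < min_occurrences)[0]
    affected = np.where((counts[:, rare_types] > 0).any(axis=1))[0]
    return rare_types, affected

# toy usage: 57 compounds, 94 atom types, sparse random counts
rng = np.random.default_rng(3)
counts = (rng.random((57, 94)) > 0.97).astype(int) * rng.integers(1, 4, (57, 94))
rare, compounds = flag_rare_atom_types(counts)
print(len(rare), "rare atom types affect", len(compounds), "compounds")
```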
