QSPR models

Besides these LFER-based models, approaches have been developed using whole-molecule descriptors and learning algorithms other than multiple linear regression (see Section 10.1.2). [Pg.494]

Breindl et al. published a model based on semi-empirical quantum mechanical descriptors and back-propagation neural networks [14]. The training data set consisted of 1085 compounds, and 36 descriptors were derived from AM1 and PM3 calculations describing electronic and spatial effects. The best results, with a standard deviation of 0.41, were obtained with the AM1-based descriptors and a 16-25-1 net architecture, corresponding to 451 adjustable parameters and a ratio of 2.17 to the number of input data. For a test data set a standard deviation of 0.53 was reported, which is quite close to that of the training model. [Pg.494]
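The figure of 451 adjustable parameters can be checked with a short calculation. The sketch below assumes a fully connected feed-forward net with one bias per hidden and output neuron; that assumption is not stated in the excerpt, but it reproduces the quoted number. The function name is hypothetical.

```python
def count_parameters(layers):
    """Count weights and biases of a fully connected feed-forward net.

    `layers` lists the number of neurons per layer, input layer first.
    Assumes one bias per neuron in every layer except the input layer.
    """
    total = 0
    for n_in, n_out in zip(layers, layers[1:]):
        total += n_in * n_out  # connection weights between the two layers
        total += n_out         # one bias per receiving neuron
    return total


n_params = count_parameters([16, 25, 1])
print(n_params)  # 16*25 + 25 + 25*1 + 1 = 451
```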

These few examples are of course a small and arbitrarily chosen set of methods for the calculation of log P values. Nevertheless, it is hoped that they demonstrate some basic principles in the prediction of a physicochemical property. [Pg.494]

As another example, we shall consider the influence of the number of descriptors on the quality of learning. Lucic et al. [3] performed a study on QSPR models employing connectivity indices as descriptors. The dataset contained 18 isomers of octane. The physical property modeled was the boiling point. The authors were among those who introduced the technique of orthogonalization of descriptors. [Pg.207]

Molecular dipole moments are often used as descriptors in QSPR models. They are calculated reliably by most quantum mechanical techniques, not least because they are part of the parameterization data for semi-empirical MO techniques. Higher multipole moments are especially easily available from semi-empirical calculations using the natural atomic orbital-point charge (NAO-PC) technique [40], but can also be calculated reliably using ab-initio or DFT methods. They have been used for some QSPR models. [Pg.392]

The molecular electronic polarizability is one of the most important descriptors used in QSPR models. Paradoxically, although it is an electronic property, it is often easier to calculate the polarizability by an additive method (see Section 7.1) than quantum mechanically. Ab-initio and DFT methods need very large basis sets before they give accurate polarizabilities. Accurate molecular polarizabilities are available from semi-empirical MO calculations very easily using a modified version of a simple variational technique proposed by Rivail and co-workers [41]. The molecular electronic polarizability correlates quite strongly with the molecular volume, although there are many cases where both descriptors are useful in QSPR models. [Pg.392]

Furthermore, QSPR models for the prediction of free-energy based properties that are based on multilinear regression analysis are often referred to as LFER models, especially in the wide field of quantitative structure-activity relationships (QSAR). [Pg.489]

D descriptors), the 3D structure, or the molecular surface (3D descriptors) of a structure. Which kind of descriptors should or can be used depends primarily on the size of the data set to be studied and the required accuracy; for example, if a QSPR model is intended to be used for hundreds of thousands of compounds, a somewhat reduced accuracy will probably be acceptable for the benefit of short processing times. Chapter 8 gives a detailed introduction to the calculation methods for molecular descriptors. [Pg.490]

In order to develop a proper QSPR model for solubility prediction, the first task is to select appropriate input descriptors that are highly correlated with solubility. Clearly, many factors influence solubility - to name but a few, the size of a molecule, the polarity of the molecule, and the ability of molecules to participate in hydrogen bonding. For a large diverse data set, some indicators for describing the differences in the molecules are also important. [Pg.498]

We know that every QSPR model is limited by the data set that is used for building the model. In order to examine the diversity of this data set (the Huuskonen... [Pg.500]

Building a QSPR model consists of three steps: descriptor calculation, descriptor analysis and optimization, and establishment of a mathematical relationship between descriptors and property. [Pg.512]

Shen M, Xiao Y, Golbraikh A, Gombar VK, Tropsha A. Development and validation of k-nearest-neighbor QSPR models of metabolic stability of drug candidates. J Med Chem 2003 46 3013-20. [Pg.375]

Eros D, Keri G, Kovesdi I, Szantai-Kis C, Meszaros G and Orfi L. Comparison of predictive ability of water solubility QSPR models generated by MLR, PLS and ANN methods. Mini Rev Med Chem 2004 4 167-77. [Pg.508]

Aqueous solubility is selected to demonstrate the E-state application in QSPR studies. Huuskonen et al. modeled the aqueous solubility of 734 diverse organic compounds with multiple linear regression (MLR) and artificial neural network (ANN) approaches [27]. The set of structural descriptors comprised 31 E-state atomic indices and three indicator variables for pyridine, aliphatic hydrocarbons and aromatic hydrocarbons, respectively. The dataset of 734 chemicals was divided into a training set (n=675), a validation set (n=38) and a test set (n=21). A comparison of the MLR results (training, r²=0.94, s=0.58; validation, r²=0.84, s=0.67; test, r²=0.80, s=0.87) and the ANN results (training, r²=0.96, s=0.51; validation, r²=0.85, s=0.62; test, r²=0.84, s=0.75) indicates a small improvement for the neural network model with five hidden neurons. These QSPR models may be used for a fast and reliable computation of the aqueous solubility for diverse organic compounds. [Pg.93]

The process of defining any QSPR model involves three fundamental components: (i) a set of descriptors, (ii) a method to select the most appropriate descriptors, and (iii) the experimental data to train and test the model. It is important to note here that none of these components are unique and many models can be... [Pg.301]

It is usual to have the coefficient of determination, r², and the standard deviation or RMSE reported for such QSPR models, where the latter two are essentially identical. The r² value indicates how well the model fits the data. Given an r² value close to 1, most of the variation in the original data is accounted for. However, even an r² of 1 provides no indication of the predictive properties of the model. Therefore, leave-one-out tests of the predictivity are often reported with a QSAR, where sequentially all but one compound are used to generate a model and the remaining one is predicted. The analogous statistical measures resulting from such leave-one-out cross-validation are often denoted as q² and s_PRESS. Nevertheless, care must be taken even with respect to such predictivity measures, because they can be considerably misleading if clusters of similar compounds are in the dataset. [Pg.302]

A problem of all such linear QSPR models is the fact that, by definition, they cannot account for the nonlinear behavior of a property. Therefore, they are much less successful for log S than they are for all kinds of logarithmic partition coefficients. [Pg.302]

Neural networks (NN) can be used to select descriptors and to produce a QSPR model. Since NN models can take into account nonlinearity, these models tend to perform better for log S prediction than those refined using MLR and PLS. However, training nonlinear behavior requires significantly more training data than training linear behavior. Another disadvantage is their black-box character, i.e. they provide no insight into how each descriptor contributes to the solubility. [Pg.302]

Refinement of a QSPR model requires experimental solubilities to train the model. Several models have used the dataset of Huuskonen [44] who sourced experimental data from the AQUASOL [45] and PHYSPROP [46] databases. The original set had a small number of duplicates, which have been removed in most subsequent studies using this dataset, leaving 1290 compounds. When combined, the log Sw... [Pg.302]

From the results described above it is clear that a different QSPR model can be obtained depending on what data is used to train the model and on the method used to derive the model. This state of affairs is not so much a problem if, when using the model to predict the solubility of a compound, it is clear which model is appropriate to use. The large disparity between models also highlights the difficulty in extrapolating any physical significance from the models. Common to all models described above is the influence of H-bonding, a feature that does at least have a physical interpretation in the process of aqueous solvation. [Pg.304]

The question of selecting the most appropriate method for any one compound has been addressed recently by Kühne et al. [52]. Initially several different methods are used to predict the solubility of a reference library of compounds. A subset of compounds from this reference library that are most similar to the compound of interest is identified and the method with the smallest sum of errors in the predicted solubility for this subset is chosen to predict the solubility. Dearden [3] considered whether a consensus approach could improve prediction over any one method. While the predictions from certain pairs of methods could be combined with improved results, some combinations led to poorer performance than either method alone. Chen et al. [53] were able to achieve improved correlation with their QSPR model using different QSPRs for different classes of compounds. Thus, while each QSPR used the same set of eight descriptors, the contribution of each descriptor changed according to the compound type. Each group had 82-101 compounds and achieved an r² of 0.86-0.92. [Pg.304]
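The selection scheme described for the reference library can be sketched as follows. This is only an illustration of the idea, not the published procedure: similarity is taken here as Euclidean distance in a hypothetical two-descriptor space, the error is summed as absolute error, and the library entries and method names are invented.

```python
from math import dist


def select_method(query, library, k=3):
    """Pick the method with the smallest summed absolute error on the
    k library compounds most similar to the query compound."""
    neighbours = sorted(library, key=lambda c: dist(query, c["descriptors"]))[:k]
    methods = neighbours[0]["predicted"].keys()
    errors = {
        m: sum(abs(c["predicted"][m] - c["observed"]) for c in neighbours)
        for m in methods
    }
    return min(errors, key=errors.get)


# Invented reference library: descriptor vector, observed value, and the
# value predicted by each of two hypothetical methods "A" and "B".
library = [
    {"descriptors": (0.0, 0.0), "observed": -1.0, "predicted": {"A": -1.1, "B": -2.0}},
    {"descriptors": (0.5, 0.2), "observed": -1.5, "predicted": {"A": -1.4, "B": -0.5}},
    {"descriptors": (9.0, 9.5), "observed": -4.0, "predicted": {"A": -2.5, "B": -4.1}},
    {"descriptors": (10.0, 10.0), "observed": -4.5, "predicted": {"A": -6.0, "B": -4.4}},
]

print(select_method((0.1, 0.1), library, k=2))    # neighbours favour method "A"
print(select_method((9.8, 9.9), library, k=2))    # neighbours favour method "B"
```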

Beck, B., Breindl, A., Clark, T. QM/NN QSPR models with error estimation: vapor pressure and log P. J. Chem. Inf. Comput. Sci. 2000, 40, 1046-1051. [Pg.403]

We also note that many ADME, QSAR or QSPR models, based on experimental or computed parameters, use a combination of log P and partial charges and/or fraction ionized at a given pH, as independent variables, rather than the potentially more physiological log or log values. This tendency may reflect a perceived superiority and accuracy of the logP values, whether computed or experimentally determined, and may also be reflected by the nature of the data stored observed among different industrial settings. [Pg.413]

Multiple linear regression (MLR) is a classic mathematical multivariate regression analysis technique [39] that has been applied to quantitative structure-property relationship (QSPR) modeling. However, when using MLR there are some aspects, with respect to statistical issues, that the researcher must be aware of ... [Pg.398]

Basak, S. C., Mills, D. Use of mathematical structural invariants in the development of QSPR models. MATCH (Commun. Math. Comput. Chem.) 2001, 44, 15-30. [Pg.499]

A from the center of a positive ionizable group was identified. However, its predictive performance on a test set consisting of eight structurally similar compounds was relatively poor. To achieve a computational model with greater predictability, a descriptor-based QSPR model was also developed. Descriptors related to molecular hydrophobicity as well as hydrogen bond donor, shape and charge features helped to explain the hOCT1 inhibitor properties of the analyzed compounds. [Pg.390]

QSPR models have been developed by six multivariate calibration methods as described in the previous sections. We focus on demonstration of the use of these methods but not on GC aspects. Since the number of variables is much larger than the number of observations, OLS and robust regression cannot be applied directly to the original data set. These methods could only be applied to selected variables or to linear combinations of the variables. [Pg.187]
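Why OLS fails when variables outnumber observations can be seen from the rank of the normal-equation matrix. The sketch below uses an invented 3 x 5 data matrix and a small exact-arithmetic rank routine (a hypothetical helper, written here to avoid external libraries): the 5 x 5 matrix XᵀX can have rank at most 3, so it is singular and the regression coefficients are not uniquely determined.

```python
from fractions import Fraction


def rank(matrix):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# Invented data matrix: 3 observations (rows), 5 variables (columns).
X = [
    [1, 2, 0, 1, 3],
    [0, 1, 1, 2, 1],
    [2, 0, 1, 1, 0],
]
n_vars = len(X[0])

# Normal-equation matrix X^T X (5 x 5).
XtX = [[sum(row[i] * row[j] for row in X) for j in range(n_vars)]
       for i in range(n_vars)]

print(rank(X), rank(XtX), n_vars)  # rank 3 in both cases, but 5 unknowns
```

Selecting a few variables, or replacing them by a small number of linear combinations, reduces the number of columns below the number of observations and makes the problem solvable again.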

Sivaraman N, Srinivasan TG, Vasudeva Rao PR, Natarajan R (2001) QSPR modelling for solubility of fullerene (C60) in organic solvents. J. Chem. Inf. Comput. Sci. 41 1067-1074. [Pg.336]

New Approach to QSPR Modeling of Fullerene C60 Solubility in Organic Solvents ... [Pg.337]

Optimal descriptor used for the QSPR modeling of the C60 solubility is expressed as ... [Pg.341]

Castro EA, Toropov AA, Nesterova AI, Nabiev OM (2004) QSPR modeling aqueous solubility of polychlorinated biphenyls by optimization of correlation weights of local and global graph invariants. CEJC 2 500-523. [Pg.349]
