Predictive QSAR models model validation

Shen M, Beguin C, Golbraikh A, Stables JP, Kohn H, Tropsha A. Application of predictive QSAR models to database mining: identification and experimental validation of novel anticonvulsant compounds. J Med Chem 2004; 47(9): 2356-64. [Pg.317]

Building Predictive QSAR Models The Importance of Validation 439... [Pg.439]

Figure 10.1 Flowchart of predictive QSAR modeling framework based on the validated QSAR models.

The correspondence between in vivo and in vitro predictions of neuropathic potential suggests that valid predictive QSAR models may be based on the in vitro approach. Adoption of this in vitro system would result in reducing experimental animal use, lowering costs, accelerating data production, and enabling standardization of a biochemically based risk assessment of the neuropathic potential of OP compounds. [Pg.286]

Selecting an optimum group of descriptors is both an important and time-consuming phase in developing a predictive QSAR model. Frohlich, Wegner, and Zell introduced the incremental regularized risk minimization procedure for SVM classification and regression models, and they compared it with recursive feature elimination and with the mutual information procedure. Their first experiment considered 164 compounds that had been tested for their human intestinal absorption, whereas the second experiment modeled the aqueous solubility prediction for 1297 compounds. Structural descriptors were computed by those authors with JOELib and MOE, and full cross-validation was performed to compare the descriptor selection methods. The incremental... [Pg.374]

The abbreviation QSAR stands for quantitative structure-activity relationships. QSPR means quantitative structure-property relationships. As the properties of an organic compound usually cannot be predicted directly from its molecular structure, an indirect approach is used to overcome this problem. In the first step, numerical descriptors encoding information about the molecular structure are calculated for a set of compounds. Secondly, statistical methods and artificial neural network models are used to predict the property or activity of interest, based on these descriptors or a suitable subset. A typical QSAR/QSPR study comprises the following steps: structure entry (or start from an existing structure database), descriptor calculation, descriptor selection, model building, model validation. [Pg.432]

In QSAR equations, n is the number of data points, r is the correlation coefficient between observed values of the dependent variable and the values predicted from the equation, r² is the square of the correlation coefficient and represents the goodness of fit, q² is the cross-validated r² (a measure of the quality of the QSAR model), and s is the standard deviation. The cross-validated r² (q²) is obtained by using the leave-one-out (LOO) procedure [33]. Q is the quality factor (quality ratio), where Q = r/s. Chance correlation, due to the excessive number of parameters (which increases the r and s values also), can. [Pg.47]
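The statistics named above can be computed directly for a small model. The following is a minimal sketch, not the cited authors' procedure: it assumes a single-descriptor linear model, takes s as the standard error of the estimate with n - 2 degrees of freedom, and defines q² as 1 - PRESS/SS_tot about the full-data mean. The names `fit_line` and `qsar_statistics` are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = a + b*x for a single descriptor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def qsar_statistics(x, y):
    """Return n, r, r2, s, LOO q2 and Q = r/s for a one-descriptor model.

    x and y are plain Python lists of floats of equal length.
    """
    n = len(x)
    a, b = fit_line(x, y)
    my = sum(y) / n
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / ss_tot
    # For least squares with an intercept, r between observed and
    # fitted values equals the square root of r2.
    r = math.sqrt(max(r2, 0.0))
    # Assumption: s is the standard error of the estimate
    # (one descriptor plus intercept, hence n - 2).
    s = math.sqrt(ss_res / (n - 2))
    # Leave-one-out: refit without compound i, then predict compound i.
    press = 0.0
    for i in range(n):
        a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a_i + b_i * x[i])) ** 2
    q2 = 1.0 - press / ss_tot
    # Q is undefined for a perfect fit (s == 0); not handled here.
    return {"n": n, "r": r, "r2": r2, "s": s, "q2": q2, "Q": r / s}
```

For an ordinary least-squares model, q² computed this way cannot exceed r², because each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual.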

QSAR model validation is an essential task in developing a statistically valid and predictive model, because the real utility of a QSAR model is in its ability to predict accurately the modeled property for new compounds. The following approaches have been used for the validation of QSAR Eqs. 1-20 ... [Pg.69]

ACD/Tox Suite is a collection of software modules that predict probabilities for basic toxicity endpoints. Predictions are made from chemical structure and based upon large validated databases and QSAR models, in combination with expert knowledge of organic chemistry and toxicology. ToxSuite modules for Acute Toxicity, Genotoxicity, Skin Irritation, and Aquatic Toxicity have been used. [Pg.197]

The literature of the past three decades has witnessed a tremendous explosion in the use of computed descriptors in QSAR. But it is noteworthy that this has exacerbated another problem: rank deficiency. This occurs when the number of independent variables is larger than the number of observations. Stepwise regression and other similar approaches, which are popularly used when there is a rank deficiency, often result in overly optimistic and statistically incorrect predictive models. Such models would fail in predicting the properties of future, untested cases similar to those used to develop the model. It is essential that subset selection, if performed, be done within the model validation step as opposed to outside of the model validation step, thus providing an honest measure of the predictive ability of the model, i.e., the true q² [39,40,68,69]. Unfortunately, many published QSAR studies involve subset selection followed by model validation, thus yielding a naive q², which inflates the predictive ability of the model. The following steps outline the proper sequence of events for descriptor thinning and LOO cross-validation, e.g.,... [Pg.492]
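The difference between a naive and an honest q² can be illustrated with a small sketch. This is not the sequence of steps the excerpt goes on to list (that text is truncated); it is a simplified stand-in that assumes "subset selection" means picking the single descriptor most correlated with the response and fitting a one-descriptor line. All function names are hypothetical.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv) if suu > 0 and svv > 0 else 0.0

def fit_line(x, y):
    """Least-squares fit of y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def best_descriptor(X, y, rows):
    """Column index with the largest |correlation| to y on the given rows."""
    ys = [y[j] for j in rows]
    return max(range(len(X[0])),
               key=lambda k: abs(pearson([X[j][k] for j in rows], ys)))

def loo_q2(X, y, select_inside):
    """LOO q2 with descriptor selection inside (honest) or outside (naive) the loop."""
    n = len(y)
    rows = list(range(n))
    my = sum(y) / n
    ss_tot = sum((yi - my) ** 2 for yi in y)
    # Naive: choose the descriptor once, using all compounds.
    fixed = None if select_inside else best_descriptor(X, y, rows)
    press = 0.0
    for i in rows:
        train = [j for j in rows if j != i]
        # Honest: repeat the selection without the held-out compound.
        k = best_descriptor(X, y, train) if select_inside else fixed
        a, b = fit_line([X[j][k] for j in train], [y[j] for j in train])
        press += (y[i] - (a + b * X[i][k])) ** 2
    return 1.0 - press / ss_tot

if __name__ == "__main__":
    # Pure-noise data: 20 compounds, 50 descriptors unrelated to y.
    rng = random.Random(0)
    X = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(20)]
    y = [rng.gauss(0, 1) for _ in range(20)]
    print("naive q2: ", loo_q2(X, y, select_inside=False))
    print("honest q2:", loo_q2(X, y, select_inside=True))
```

On data with many descriptors and few compounds, the naive value would typically come out more optimistic than the honest one, since the held-out compound has already influenced which descriptor was chosen.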

Basak, S. C., Mills, D., Hawkins, D. M., Kraker, J. J. Proper statistical modeling and validation in QSAR: A case study in the prediction of rat fat air partitioning. In Computation in Modern Science and Engineering, Proceedings of the International Conference on Computational Methods in Science and Engineering 2007 (ICCMSE 2007), Simos, T. E., Maroulis, G., Eds., American Institute of Physics, Melville, New York, 2007, pp. 548-551. [Pg.501]

This procedure assessed whether some of the different descriptors used by different equations were intercorrelated and, therefore, interchangeable [59]. The remaining diverse QSAR equations were further classified by size (number of descriptors they include). The best equations of each encountered size were kept for final validation with the VS molecules and for further analysis. Consensus models featuring average predictions over these equations were also generated and validated. We focus here on the discussion of the minimalist overlay-independent and overlay-based QSAR models, each including only six descriptors, and refer to the optimal consensus model of the overlay-based QSAR approach families for comparative purposes. [Pg.125]

Typically, the final part of QSAR model development is the model validation [17, 18], when the predictive power of the model is tested on an independent set of compounds. In essence, predictive power is one of the most important characteristics of QSAR models. It can be defined as the ability of a model to predict accurately the target property (e.g., biological activity) of compounds that were not used for model development. The typical problem of QSAR modeling is that at the time of the model development a researcher only has, essentially, training set molecules, so predictive ability can only be characterized by statistical characteristics of the training set model and not by true external validation. [Pg.438]

Thus, it is still uncommon to test QSAR models (characterized by a reasonably high q²) for their ability to predict accurately biological activities of compounds not included in the training set. In contrast to such expectations, it has been shown that if a test set with known values of biological activities is available for prediction, there exists no correlation between the LOO cross-validated q² and the correlation coefficient between the predicted and observed activities for the test set (Figure 16.1). In our experience [17, 28], this phenomenon is characteristic of many datasets and is independent of the descriptor types and optimization techniques used to develop training set models. In a recent review, we emphasized the importance of external validation in developing reliable models [18]. [Pg.440]
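External validation of this kind can be sketched as follows, assuming a one-descriptor linear model fitted on the training set only. The function reports two test-set figures: the squared correlation between predicted and observed activities, as described above, and a predictive R² measured against the training-set mean. The second metric and the name `external_validation` are assumptions added for illustration, not taken from the source.

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def external_validation(x_train, y_train, x_test, y_test):
    """Fit on the training set, then score predictions for the test set.

    Returns (r2_corr, r2_pred). Constant predictions or constant
    observed values (zero variance) are not handled.
    """
    a, b = fit_line(x_train, y_train)
    pred = [a + b * xi for xi in x_test]

    # Squared Pearson correlation between predicted and observed.
    n = len(pred)
    mp, mo = sum(pred) / n, sum(y_test) / n
    spo = sum((p - mp) * (o - mo) for p, o in zip(pred, y_test))
    spp = sum((p - mp) ** 2 for p in pred)
    soo = sum((o - mo) ** 2 for o in y_test)
    r2_corr = spo ** 2 / (spp * soo)

    # Assumed second metric: 1 - (test-set squared error) /
    # (squared deviation of test activities from the training mean).
    m_train = sum(y_train) / len(y_train)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, y_test))
    ss_ref = sum((o - m_train) ** 2 for o in y_test)
    r2_pred = 1.0 - ss_res / ss_ref
    return r2_corr, r2_pred
```

Reporting both figures is a design choice: the correlation alone is blind to a constant offset between predicted and observed values, so a model can score a perfect correlation on the test set while every prediction is wrong by the same amount.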

It needs to be emphasized that no matter how robust, significant, and validated a QSAR may be, it cannot be expected to reliably predict the modeled property for the entire universe of chemicals. Therefore, before a QSAR model is put into use for screening chemicals, its domain of application must be defined and predictions for only those chemicals that fall in this domain should be considered reliable. Some approaches that aid in defining the applicability domain are described below. [Pg.441]
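One simple distance-based way to define such a domain, not necessarily among the approaches the source goes on to describe, is to accept a query compound only if it lies close enough to some training compound in descriptor space. The sketch below assumes descriptors are already scaled to comparable ranges and uses a cutoff of mean + z × standard deviation of the training set's nearest-neighbor distances. The value z = 0.5 and all function names are illustrative assumptions.

```python
import math

def euclid(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def domain_threshold(train, z=0.5):
    """Cutoff distance from nearest-neighbor distances within the training set.

    train: list of descriptor vectors (at least two compounds).
    z: illustrative strictness parameter; smaller means a tighter domain.
    """
    nn = [min(euclid(p, q) for j, q in enumerate(train) if j != i)
          for i, p in enumerate(train)]
    mean = sum(nn) / len(nn)
    std = math.sqrt(sum((d - mean) ** 2 for d in nn) / len(nn))
    return mean + z * std

def in_domain(query, train, threshold):
    """True if the query's nearest training compound is within the cutoff."""
    return min(euclid(query, p) for p in train) <= threshold
```

A prediction for a compound where `in_domain` returns False would then be flagged as unreliable rather than reported alongside the others.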
