Problems in QSAR

One of the most important problems in QSAR analysis is establishing the domain of applicability for each model. Without an applicability domain restriction, any model can formally predict the activity of any compound, even one whose structure is completely different from those in the training set. Thus, omitting the applicability domain as a mandatory component of a QSAR model would lead to unjustified extrapolation of the model in chemistry space and, as a result, to a high likelihood of inaccurate predictions. In our research we have always paid particular attention to this issue (12, 20-27). A good overview of commonly used applicability domain definitions can be found in reference (28). [Pg.116]
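One widely used way to make the applicability domain operational is a distance criterion in descriptor space. The sketch below is a minimal illustration under stated assumptions: Euclidean distance, k = 1 nearest neighbour, and a cutoff D = mean + Z·sd with Z = 0.5, where the mean and standard deviation are taken over the training compounds' nearest-neighbour distances. These choices and all function names are hypothetical; they are not necessarily the definitions used in references (12, 20-27) or surveyed in (28).

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_knn_distance(query, points, k):
    """Mean distance from `query` to its k nearest neighbours among `points`."""
    dists = sorted(euclidean(query, p) for p in points)
    return sum(dists[:k]) / k


def ad_cutoff(train, k=1, z=0.5):
    """Hypothetical cutoff D = mean + z * sd of the training k-NN distances.

    Each training compound is compared only with the *other* training
    compounds, so its zero distance to itself is not counted.
    """
    knn = [mean_knn_distance(x, train[:i] + train[i + 1:], k)
           for i, x in enumerate(train)]
    return statistics.mean(knn) + z * statistics.stdev(knn)


def in_domain(query, train, cutoff, k=1):
    """True if the query's mean k-NN distance to the training set is within the cutoff."""
    return mean_knn_distance(query, train, k) <= cutoff


if __name__ == "__main__":
    # Toy two-descriptor training set (invented values).
    train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    cutoff = ad_cutoff(train)
    print(in_domain([0.5, 0.5], train, cutoff))    # True: interpolation
    print(in_domain([10.0, 10.0], train, cutoff))  # False: extrapolation
```

Raising Z (or k) widens the domain and admits more compounds at the cost of less reliable predictions; the appropriate trade-off depends on the data set.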

Gordeeva, E.V., Molchanova, M.S. and Zefirov, N.S. (1990). General Methodology and Computer Program for the Exhaustive Restoring of Chemical Structures by Molecular Connectivity Indices. Solution of the Inverse Problem in QSAR/QSPR. Tetrahedron Comput. Methodol., 3, 389-415. [Pg.573]

Skvortsova, M.I., Baskin, I.I., Slovokhotova, O.L., Palyulin, V.A. and Zefirov, N.S. (1993). Inverse Problem in QSAR/QSPR Studies for the Case of Topological Indices Characterizing Molecular Shape (Kier Indices). J. Chem. Inf. Comput. Sci., 33, 630-634. [Pg.647]

Zefirov, N.S., Palyulin, V.A., Skvortsova, M.I. and Baskin, I.I. (1995). Inverse Problems in QSAR. In QSAR and Molecular Modelling: Concepts, Computational Tools and Biological Applications (Sanz, F., Giraldo, J. and Manaut, F., eds.), Prous Science, Barcelona (Spain), pp. 40-41. [Pg.666]

Ortho-substituents pose special problems because their σ values include a steric contribution. Furthermore, as compared to meta- and para-substituents, many ortho-substituents cause conformational changes, which are sometimes favorable and sometimes very unfavorable for binding. Although the problem of ortho-substituent parametrization has been discussed in detail (e.g. [41, 296]), it is (and will remain) a difficult problem in QSAR studies. [Pg.45]

Outliers, i.e. data that cannot be explained by the model, constitute a serious problem in QSAR studies. Most often they are omitted from the data set without further comment, which is not good practice. Much information might be derived from careful inspection and consideration of the residuals of a multiple regression analysis (e.g. [574]) and of so-called outliers (e.g. [575, 576]). [Pg.99]
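As a minimal illustration of residual inspection, the sketch below fits a one-descriptor least-squares line and flags compounds whose standardized residual (residual divided by the residual standard error) exceeds 2 in absolute value. The single descriptor, the threshold of 2, and the function names are illustrative assumptions; flagged compounds are candidates for closer chemical and experimental scrutiny, not for silent deletion.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def standardized_residuals(x, y):
    """Residuals divided by the residual standard error s = sqrt(SSE / (n - 2))."""
    a, b = fit_line(x, y)
    res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in res) / (len(x) - 2))
    return [r / s for r in res]


def flag_outliers(x, y, threshold=2.0):
    """Indices of compounds whose |standardized residual| exceeds the threshold."""
    return [i for i, z in enumerate(standardized_residuals(x, y))
            if abs(z) > threshold]


if __name__ == "__main__":
    x = list(range(10))                 # hypothetical descriptor values
    y = [2.0 * xi + 1.0 for xi in x]    # activities on an exact line...
    y[5] += 10.0                        # ...except one compound the line cannot explain
    print(flag_outliers(x, y))          # [5]
```

Because a large outlier also inflates the residual standard error, studentized or deleted residuals are often preferred in practice; the simpler version is used here only to keep the example short.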

So do we know of something more important than searching for additional molecular descriptors, improving the interpretation of our models, and similar questions? Are there some giant problems in QSAR that deserve our attention? And if there are, what are these problems? One may expect that there are at least several highly important problems in any living discipline of science because, if there were not, the discipline would be dead rather than alive. To find them may not, however, be easy. [Pg.18]

A challenging task in materials science as well as in pharmaceutical research is to custom-tailor a compound's properties. George S. Hammond stated that the most fundamental and lasting objective of synthesis is not production of new compounds, but production of properties (Norris Award Lecture, 1968). The molecular structure of an organic or inorganic compound determines its properties. Nevertheless, methods for the direct prediction of a compound's properties from its molecular structure are usually not available (Figure 8-1). Therefore, the establishment of Quantitative Structure-Property Relationships (QSPRs) and Quantitative Structure-Activity Relationships (QSARs) takes an indirect approach to this problem. In the first step, numerical descriptors encoding information about the molecular structure are calculated for a set of compounds. In the second step, statistical and artificial neural network models are used to predict the property or activity of interest from these descriptors or a suitable subset of them. [Pg.401]

The ability of partial least squares to cope with data sets containing very many x variables is considered by its proponents to make it particularly suited to modern-day problems, in which it is very easy to compute an extremely large number of descriptors for each compound (as in CoMFA). This contrasts with the traditional situation in QSAR, where measuring the required properties could be time-consuming or the analysis was restricted to traditional substituent constants. [Pg.727]
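The reason PLS tolerates more descriptors than compounds can be seen in a one-component PLS1 fit, sketched below on mean-centred data. The weight vector is proportional to Xᵀy, the scores are t = Xw, and y is regressed on t, so XᵀX (singular whenever descriptors outnumber compounds) is never inverted. Using a single latent variable and the names `pls1_fit` and `pls1_predict` are simplifying assumptions; practical CoMFA-type analyses extract several components and choose their number by cross-validation.

```python
import math


def pls1_fit(X, y):
    """One-component PLS1 on mean-centred data.

    X: n rows of p descriptor values (p may exceed n); y: n activities.
    Returns (x_means, y_mean, coef) such that
    y_hat = y_mean + sum(coef[j] * (x[j] - x_means[j])).
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weights w proportional to X^T y: direction of maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = X w: one latent-variable value per compound.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Inner regression of y on t: q = (t . y) / (t . t).
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    # With one component the descriptor coefficients are simply q * w.
    return x_means, y_mean, [q * wj for wj in w]


def pls1_predict(model, x):
    """Predict the activity of one compound from its descriptor vector."""
    x_means, y_mean, coef = model
    return y_mean + sum(c * (xj - m) for c, xj, m in zip(coef, x, x_means))


if __name__ == "__main__":
    # 3 compounds, 5 descriptors: more columns than rows, so X^T X is singular.
    v = [1.0, 2.0, 3.0, 4.0, 5.0]
    X = [[s * vj for vj in v] for s in (1.0, 2.0, 3.0)]
    y = [10.0, 20.0, 30.0]
    model = pls1_fit(X, y)
    print(round(pls1_predict(model, [4.0 * vj for vj in v]), 6))  # 40.0
```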

Computational chemists in the pharmaceutical industry also expanded beyond their academic upbringing by acquiring an interest in force field methods, QSAR, and statistics. Computational chemists responsible for work on pharmaceuticals came to appreciate that it was too limiting to confine one's work to just one approach to a problem. To solve research problems in industry, one had to use the best available technique, and this did not mean going to a larger basis set or a higher level of quantum mechanical theory. It meant using molecular mechanics or QSAR or whatever else worked. [Pg.14]

The method of PCA can be used in QSAR as a preliminary step to Hansch analysis in order to determine the relevant parameters that must be entered into the equation. Principal components are by definition uncorrelated and, hence, do not pose the problem of multicollinearity. Instead of defining a Hansch model in terms of the original physicochemical parameters, it is often more appropriate to use principal components regression (PCR), which has been discussed in Section 35.6. An alternative approach is partial least squares (PLS) regression, which will be covered more fully below (Section 37.4). [Pg.398]
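The statement that principal components are uncorrelated can be checked numerically. For two descriptors the covariance matrix is 2×2, and its eigenvectors follow from a rotation angle θ with tan 2θ = 2b/(a - c), where a and c are the two variances and b their covariance. The minimal sketch below uses invented descriptor values (an assumption for illustration) and shows that the original descriptors are strongly correlated while the two score vectors have zero covariance up to rounding; in PCR such scores would replace the original parameters as regressors.

```python
import math


def covariance(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def pca_2d(x1, x2):
    """Principal component scores for two descriptors (closed-form 2x2 case)."""
    a, b, c = covariance(x1, x1), covariance(x1, x2), covariance(x2, x2)
    # Angle that diagonalises [[a, b], [b, c]]: tan(2*theta) = 2b / (a - c).
    # atan2 picks the branch for which the first score has the larger variance.
    theta = 0.5 * math.atan2(2 * b, a - c)
    ct, st = math.cos(theta), math.sin(theta)
    m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
    pc1 = [ct * (u - m1) + st * (v - m2) for u, v in zip(x1, x2)]
    pc2 = [-st * (u - m1) + ct * (v - m2) for u, v in zip(x1, x2)]
    return pc1, pc2


if __name__ == "__main__":
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical descriptor 1
    x2 = [1.2, 1.9, 3.2, 3.8, 5.1]  # hypothetical descriptor 2, collinear with 1
    pc1, pc2 = pca_2d(x1, x2)
    print(round(covariance(x1, x2), 3))       # 2.425: strongly correlated inputs
    print(abs(covariance(pc1, pc2)) < 1e-9)   # True: scores are uncorrelated
```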

Despite the existence of several databases for certain substances, the physicochemical and/or toxicological parameters needed to assess risk are not available for all substances. The lack of data is one of the main problems in risk assessment, especially for emerging pollutants. One solution to this problem is the use of QSAR or other estimation tools. QSAR models correlate the structure of a substance with its activities (physicochemical properties, environmental fate, and/or toxicological properties). [Pg.104]

The literature of the past three decades has witnessed a tremendous explosion in the use of computed descriptors in QSAR. It is noteworthy, however, that this has exacerbated another problem: rank deficiency. This occurs when the number of independent variables is larger than the number of observations. Stepwise regression and similar approaches, which are popular when there is rank deficiency, often result in overly optimistic and statistically incorrect predictive models. Such models would fail to predict the properties of future, untested cases similar to those used to develop the model. It is essential that subset selection, if performed, be done within the model validation step rather than outside it, thus providing an honest measure of the predictive ability of the model, i.e., the true q² [39,40,68,69]. Unfortunately, many published QSAR studies perform subset selection followed by model validation, thus yielding a naive q², which inflates the predictive ability of the model. The following steps outline the proper sequence of events for descriptor thinning and LOO cross-validation, e.g.,... [Pg.492]
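Independently of the specific steps referred to above, the contrast between the true and the naive q² can be sketched in a few lines. The example below is a minimal illustration under strong simplifying assumptions: subset selection is reduced to keeping the single descriptor with the largest |r| to activity, the model is a one-variable least-squares line, and q² = 1 - PRESS/SS, with SS the sum of squared deviations of the activities from their mean. With `select_inside=True` the selection is repeated within every leave-one-out fold (true q²); with `select_inside=False` it is done once on all compounds before cross-validation (naive q²). The function names and the random data are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def abs_corr(x, y):
    """|Pearson r| between two sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def select_best(X, y):
    """Subset selection in its simplest form: the column with the largest |r|."""
    cols = list(zip(*X))
    return max(range(len(cols)), key=lambda j: abs_corr(cols[j], y))


def loo_q2(X, y, select_inside):
    """LOO q2 = 1 - PRESS/SS for a one-descriptor model.

    select_inside=True:  selection redone in every fold (true q2).
    select_inside=False: selection done once on all data first (naive q2).
    """
    n = len(y)
    fixed = None if select_inside else select_best(X, y)
    press = 0.0
    for i in range(n):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        j = select_best(X_tr, y_tr) if select_inside else fixed
        a, b = fit_line([row[j] for row in X_tr], y_tr)
        press += (y[i] - (a + b * X[i][j])) ** 2  # held-out compound only
    y_bar = sum(y) / n
    ss = sum((v - y_bar) ** 2 for v in y)
    return 1.0 - press / ss


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 15, 50  # more descriptors than compounds
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # activity unrelated to X
    print("naive q2 :", round(loo_q2(X, y, select_inside=False), 3))
    print("honest q2:", round(loo_q2(X, y, select_inside=True), 3))
```

On descriptors that are pure noise, the naive value is typically, though not for every random draw, clearly higher than the honest one, which is the inflation described in the text.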

Typically, the final part of QSAR model development is model validation [17, 18], in which the predictive power of the model is tested on an independent set of compounds. In essence, predictive power is one of the most important characteristics of QSAR models. It can be defined as the ability of a model to accurately predict the target property (e.g., biological activity) of compounds that were not used for model development. The typical problem of QSAR modeling is that, at the time of model development, a researcher has essentially only training-set molecules, so predictive ability can be characterized only by statistical characteristics of the training-set model and not by true external validation. [Pg.438]
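External validation is often summarized by a predictive R² on the test set. A minimal sketch, assuming the common convention R²_pred = 1 - Σ(y - ŷ)² / Σ(y - ȳ_train)², where both sums run over the test compounds and ȳ_train is the mean activity of the training set; other variants (for example using the test-set mean) exist, and the function name and numbers are hypothetical.

```python
def r2_pred(y_test, y_hat, y_train_mean):
    """Predictive R^2 for an external test set.

    Uses the training-set mean in the denominator (one common convention).
    """
    press = sum((yt - yh) ** 2 for yt, yh in zip(y_test, y_hat))
    ss = sum((yt - y_train_mean) ** 2 for yt in y_test)
    return 1.0 - press / ss


if __name__ == "__main__":
    y_train = [1.0, 2.0, 3.0, 4.0]  # hypothetical training activities
    y_test = [1.5, 2.5, 3.5]        # compounds not used for model development
    y_hat = [1.4, 2.7, 3.3]         # hypothetical model predictions
    print(round(r2_pred(y_test, y_hat, sum(y_train) / len(y_train)), 3))  # 0.955
```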

QSARs are useful in the design of pesticides and medicinal drugs, and in environmental problems such as the prediction of toxicity and biodegradability. An empirical relationship can properly be used only for interpolation, whereas one based solidly on well-established theory can be used, at least to some extent, for extrapolation as well. It therefore seems of real importance to determine the nature and significance of steric and bulk parameters in QSAR. [Pg.249]

By dividing the problem this way, we translate it from an abstract problem in catalysis into one of relating one multidimensional space to another. This is still an abstract problem, but the advantage is that we can now quantify the relationship between spaces B and C using QSAR and QSPR models. Note that space B contains molecular descriptor values rather than structures. These values, however, are directly related to the structures (8). [Pg.263]

Benigni, R. and Richard, A.M., QSARs of mutagens and carcinogens: two case studies illustrating problems in the construction of models for noncongeneric chemicals, Mutation Research, 371(1-2), 29-46, 1996. [Pg.181]

An advantage of defining the problem in this manner is that the partition coefficient has become a central property in quantitative structure-activity relationships (QSAR), and a large database of P values is available in the medicinal chemistry literature (22-24). In particular, if a correlation (Equation 15) between the polymer-water and octanol-water partition coefficients can be established for a series of solutes, it becomes possible to use the log P (octanol/water) value as a reference point from which to calculate the polymer-water value. [Pg.61]
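Using log P (octanol/water) as a reference point amounts to fitting one scale against the other and then applying the fit to new solutes. Equation 15 is not reproduced in this excerpt, so the sketch below simply assumes a linear relation of the Collander type, log K(polymer/water) = a·log P(octanol/water) + b, fitted by ordinary least squares; the functional form, the function names, and the data are assumptions for illustration.

```python
def fit_collander(log_p_ow, log_k_pw):
    """Least-squares a, b in log K_pw = a * log P_ow + b (assumed linear form)."""
    n = len(log_p_ow)
    mx = sum(log_p_ow) / n
    my = sum(log_k_pw) / n
    sxx = sum((x - mx) ** 2 for x in log_p_ow)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_p_ow, log_k_pw))
    a = sxy / sxx
    return a, my - a * mx


def predict_log_k(a, b, log_p_ow):
    """Polymer-water value estimated from the octanol-water reference value."""
    return a * log_p_ow + b


if __name__ == "__main__":
    # Hypothetical solutes: log P(octanol/water) and measured log K(polymer/water).
    log_p = [1.0, 2.0, 3.0, 4.0]
    log_k = [1.2, 2.1, 2.8, 3.9]
    a, b = fit_collander(log_p, log_k)        # a = 0.88, b = 0.3 for these values
    print(round(predict_log_k(a, b, 3.5), 3))  # 3.38
```

The prediction is made inside the range of fitted log P values; as with any empirical relationship, extrapolating far beyond that range is less trustworthy.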

Another important consideration in the selection of a test set is to ensure that the chemicals in the data set relate to the real problem in question. It should be emphasized that the QSAR models developed in our project are used primarily to predict the activity of environmental chemicals, mostly pesticides and industrial chemicals. A data set reported by Nishihara et al. (2000) was also selected as a test set. This data set contains 517 chemicals tested with the yeast two-hybrid assay, of which over 86% are pesticides and industrial chemicals. After structure processing, only 463 chemicals were used for this validation study. Only 62 chemicals were categorized as active, on the basis of having an activity greater than 10% of that of 10⁻⁷ M E2 (17β-estradiol), as defined in the original paper (Nishihara et al., 2000). The majority of the chemicals were inactive, which is similar to the real-world situation, where inactive chemicals are expected to make up a large proportion of those in the environment. [Pg.309]
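The consequences of this imbalance can be made concrete with the counts given above (62 actives among 463 chemicals, hence 401 inactives). The short sketch below, with a hypothetical helper name, evaluates a trivial classifier that labels every chemical inactive: it reaches about 86.6% accuracy while recovering none of the actives, which is why accuracy alone can be misleading on such a test set.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # fraction of actives recovered
    specificity = tn / (tn + fp)  # fraction of inactives recovered
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    n_active, n_total = 62, 463        # counts from the text
    n_inactive = n_total - n_active    # 401
    # Trivial "model" that calls every chemical inactive.
    acc, sens, spec = confusion_metrics(tp=0, fn=n_active, tn=n_inactive, fp=0)
    print(round(acc, 3), sens, spec)   # 0.866 0.0 1.0
```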

The use of QSARs for the regulatory assessment of chemicals is very limited, mainly because there is widespread disagreement on the possible applications of QSARs and the extent to which QSAR predictions can be relied upon. To address the credibility problem of QSARs, it will be necessary to obtain international agreement on acceptability criteria for the development and validation of QSARs, and to apply the criteria in the context of a formal framework that guarantees independence in the selection, confirmation, and validation of QSARs. [Pg.439]

Big Chemical Encyclopedia
