The nonlinear character of log S has rarely been discussed. Nevertheless, Jorgensen and Duffy [26] argued for a nonlinear contribution to their log S regression: a product of the H-bond donor capacity and the square root of the H-bond acceptor capacity, divided by the surface area. Indeed, for the example above their QikProp method partially accounts for this nonlinearity by predicting a much smaller solubility increase for the indole to benzimidazole mutation (0.45 versus 1.82 [39, 40]). Abraham and Le [41] introduced a similar nonlinearity in the form of a product of H-bond donor and H-bond acceptor capacity, whereas all logarithmic partition coefficients are linear regressions with respect to their solvation parameters. Nevertheless, Abraham's model fails to reflect the test case described above: it yields changes of 1.8 (1.5) and 1.7 (1.7) [42] for the mutations described above. [Pg.301]
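The two nonlinear terms mentioned above can be written down directly. The following is only a minimal sketch: it assumes the donor capacity, acceptor capacity, and surface area are already available as plain numbers. The function names and input values are hypothetical, and the regression coefficients that would multiply each term are not given in the text, so they are left out.

```python
from math import sqrt

def jorgensen_duffy_term(donor, acceptor, surface_area):
    """Donor capacity times the square root of acceptor capacity, divided by surface area."""
    return donor * sqrt(acceptor) / surface_area

def abraham_le_term(donor, acceptor):
    """Plain product of donor and acceptor capacity."""
    return donor * acceptor

# Hypothetical inputs, chosen only to show that the terms are not additive
# in the individual descriptors.
base = jorgensen_duffy_term(donor=1.0, acceptor=4.0, surface_area=400.0)
more_acceptor = jorgensen_duffy_term(donor=1.0, acceptor=16.0, surface_area=400.0)
print(base, more_acceptor)          # quadrupling the acceptor capacity only doubles the term
print(abraham_le_term(0.0, 4.0))    # the product vanishes when there is no donor
```

Because each term couples two descriptors, its effect on the predicted log S depends on the values of both, which a purely additive linear model cannot express.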

The concept of functional groups in chemistry is pivotal to our understanding of physicochemical behavior. It is not surprising, then, that we might hope to describe various properties of molecules on the basis of this concept. From this simple premise the world of QSPRs emerges. This field of study cannot be adequately reviewed here and, further, Dearden [3] has recently reviewed the important features of deriving a QSPR for the prediction of aqueous solubility. The most critical requirement of any QSPR is that it be predictive; it is not sufficient that a QSPR be able to reproduce the training data. To this end, it is very important that the QSPR not be over-fit. Like many physicochemical properties, solubility tends to be an information-deficient property. Thus, the number of compounds whose solubility is accurately determined experimentally and published is limited to at most a few thousand, and several of these data are questionable for the reasons mentioned above, i.e. unreported protonation and tautomeric states, polymorphism, etc. Often the number of compounds used to derive a QSPR is limited to a few hundred. It is, therefore, imperative that the number of descriptors used in the QSPR be restricted to a very small number so that the predictive nature of the model is retained, even at the cost of a more precise model. Although there is no formal ratio of observations to descriptors needed to ensure a predictive model, it is not unusual to expect ratios of at least 10 to 20. [Pg.301]
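The rule of thumb on observations per descriptor translates into a simple upper bound. The helper below is a hypothetical illustration, not a formal criterion; the function name and the example of 300 compounds are assumptions chosen to match the "few hundred" compounds mentioned above.

```python
def max_descriptors(n_compounds, min_ratio=10):
    """Largest descriptor count that keeps at least `min_ratio` compounds per descriptor."""
    if min_ratio <= 0:
        raise ValueError("min_ratio must be positive")
    return n_compounds // min_ratio

# Hypothetical training set of 300 compounds, checked at both ends of the 10-20 range.
for ratio in (10, 20):
    print(ratio, max_descriptors(300, ratio))
```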

It is usual to have the coefficient of determination, r², and the standard deviation or RMSE reported for such QSPR models, where the latter two are essentially identical. The r² value indicates how well the model fits the data. Given an r² value close to 1, most of the variation in the original data is accounted for. However, even an r² of 1 provides no indication of the predictive properties of the model. Therefore, leave-one-out tests of the predictivity are often reported with a QSAR, where sequentially all but one compound are used to generate a model and the remaining one is predicted. The analogous statistical measures resulting from such leave-one-out cross-validation are often denoted q² and S_PRESS. Nevertheless, care must be taken even with respect to such predictivity measures, because they can be considerably misleading if clusters of similar compounds are in the dataset. [Pg.302]
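The fit statistics and their leave-one-out counterparts can be sketched in a few lines. This example assumes a one-descriptor linear model fitted by ordinary least squares and leaves out one compound at a time. The data are invented for illustration, and normalizing the cross-validated error by the number of compounds is one convention among several.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def total_sum_of_squares(y):
    my = sum(y) / len(y)
    return sum((yi - my) ** 2 for yi in y)

def fit_statistics(x, y):
    """r^2 and RMSE of the model fitted to all compounds."""
    a, b = fit_line(x, y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / total_sum_of_squares(y), sqrt(ss_res / len(y))

def loo_statistics(x, y):
    """Cross-validated q^2 and RMSE: each compound is predicted by a model built without it."""
    press = 0.0
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    return 1.0 - press / total_sum_of_squares(y), sqrt(press / len(y))

# Invented descriptor values and log S values, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
log_s = [-1.1, -1.9, -3.2, -3.8, -5.1, -6.0]

r2, rmse = fit_statistics(descriptor, log_s)
q2, rmse_loo = loo_statistics(descriptor, log_s)
print(f"r2={r2:.3f} rmse={rmse:.3f} q2={q2:.3f} rmse_loo={rmse_loo:.3f}")
```

For a least-squares model the cross-validated q² can never exceed r², so a large gap between the two is a warning sign of over-fitting. As noted above, a small gap is not proof of predictivity when the dataset contains clusters of similar compounds.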

A problem of all such linear QSPR models is the fact that, by definition, they cannot account for the nonlinear behavior of a property. Therefore, they are much less successful for log S than they are for all kinds of logarithmic partition coefficients. [Pg.302]

NN can be used to select descriptors and to produce a QSPR model. Since NN models can take into account nonlinearity, these models tend to perform better for log S prediction than those refined using MLR and PLS. However, training nonlinear behavior requires significantly more training data than training linear behavior. Another disadvantage is their black-box character, i.e. they provide no insight into how each descriptor contributes to the solubility. [Pg.302]

All the techniques described above can be used to calculate molecular structures and energies. Which other properties are important for chemoinformatics? Most applications have used semi-empirical theory to calculate properties or descriptors, but ab-initio and DFT are equally applicable. In the following, we describe some typical properties and descriptors that have been used in quantitative structure-activity (QSAR) and structure-property (QSPR) relationships. [Pg.390]

Molecular dipole moments are often used as descriptors in QSPR models. They are calculated reliably by most quantum mechanical techniques, not least because they are part of the parameterization data for semi-empirical MO techniques. Higher multipole moments are especially easily available from semi-empirical calculations using the natural atomic orbital-point charge (NAO-PC) technique [40], but can also be calculated reliably using ab-initio or DFT methods. They have been used for some QSPR models. [Pg.392]

To know what QSAR and QSPR are, and the steps in QSAR/QSPR. [Pg.401]

The method of building predictive models in QSPR/QSAR can also be applied to the modeling of materials without a unique, clearly defined structure. Instead of the connection table, physicochemical data as well as spectra reflecting the compound's structure can be used as molecular descriptors for model building. [Pg.402]

The QSPR/QSAR methodology can also be applied to materials and mixtures where no structural information is available. Instead of descriptors derived from the compound's structure, various physicochemical properties, including spectra, can be used. In particular, spectra are valuable in this context as they reflect the structure in a sensitive way. [Pg.433]

Two approaches to quantify/fQ, i.e., to establish a quantitative relationship between the structural features of a compound and its properties, are described in this section: quantitative structure-property relationships (QSPR) and linear free energy relationships (LFER) (cf. Section 3.4.2.2). The LFER approach is important for historical reasons because it contributed the first attempt to predict the property of a compound from an analysis of its structure. LFERs can be established only for congeneric series of compounds, i.e., sets of compounds that share the same skeleton and only have variations in the substituents attached to this skeleton. As examples of a QSPR approach, currently available methods for the prediction of the octanol/water partition coefficient, log P, and of aqueous solubility, log S, of organic compounds are described in Section 10.1.4 and Section 10.15, respectively. [Pg.488]

Furthermore, QSPR models for the prediction of free-energy-based properties that rely on multilinear regression analysis are often referred to as LFER models, especially in the wide field of quantitative structure-activity relationships (QSAR). [Pg.489]

The general procedure in a QSPR approach consists of three steps: structure representation, descriptor analysis, and model building (see also Chapter X, Section 1.2 of the Handbook). [Pg.489]

Descriptors have to be found representing the structural features which are related to the target property. This is the most important step in QSPR, and the development of powerful descriptors is of central interest in this field. Descriptors can range from simple atom- or functional group counts to quantum chemical descriptors. They can be derived on the basis of the connectivity (topological or [Pg.489]

2D descriptors), the 3D structure, or the molecular surface (3D descriptors) of a structure. Which kind of descriptors should or can be used depends primarily on the size of the data set to be studied and on the required accuracy; for example, if a QSPR model is intended to be used for hundreds of thousands of compounds, a somewhat reduced accuracy will probably be acceptable for the benefit of short processing times. Chapter 8 gives a detailed introduction to the calculation methods for molecular descriptors. [Pg.490]

Figure 10.1-1. Flow chart for the general model building process in QSPR studies.

Building a QSPR model consists of three steps: descriptor calculation, descriptor analysis and optimization, and establishment of a mathematical relationship between descriptors and property. [Pg.512]
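The three steps can be illustrated with a deliberately small sketch. Everything specific here is an assumption made for illustration: compounds are represented only by element counts, the two descriptors are simple atom counts, descriptor analysis is reduced to picking the descriptor with the largest absolute Pearson correlation, and the property values are invented.

```python
from math import sqrt

# Step 1: descriptor calculation from a (hypothetical) element-count representation.
def calculate_descriptors(atom_counts):
    heavy = sum(n for element, n in atom_counts.items() if element != "H")
    hetero = sum(n for element, n in atom_counts.items() if element not in ("C", "H"))
    return {"heavy_atoms": float(heavy), "hetero_atoms": float(hetero)}

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant column carries no information
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sqrt(sxx * syy)

# Step 2: descriptor analysis, here reduced to keeping the best-correlated descriptor.
def select_descriptor(rows, y):
    return max(rows[0], key=lambda name: abs(pearson_r([r[name] for r in rows], y)))

# Step 3: mathematical relationship between descriptor and property (one-variable least squares).
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

# Invented compounds and property values, for illustration only.
compounds = [
    {"C": 2, "H": 6, "O": 1},
    {"C": 4, "H": 10, "O": 1},
    {"C": 6, "H": 14, "O": 2},
    {"C": 8, "H": 18, "O": 1},
    {"C": 10, "H": 22, "O": 2},
]
property_values = [0.9, 0.2, -0.9, -1.4, -2.3]

rows = [calculate_descriptors(c) for c in compounds]
best = select_descriptor(rows, property_values)
intercept, slope = fit_line([r[best] for r in rows], property_values)
print(best, round(intercept, 3), round(slope, 3))
```

A real study would replace each step with something far richer (computed 2D or 3D descriptors, systematic descriptor selection, MLR, PLS, or NN models), but the flow of data between the steps is the same.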


QSPR methods have yielded the most accurate results. Most often, they use large expansions of parameters obtainable from semiempirical calculations along with other less computationally intensive properties. This is often the method of choice for small molecules. [Pg.114]

When the property being described is a physical property, such as the boiling point, this is referred to as a quantitative structure-property relationship (QSPR). When the property being described is a type of biological activity, such as drug activity, this is referred to as a quantitative structure-activity relationship (QSAR). Our discussion will first address QSPR. All the points covered in the QSPR section are also applicable to QSAR, which is discussed next. [Pg.243]

The first step in developing a QSPR equation is to compile a list of compounds for which the experimentally determined property is known. Ideally, this list should be very large. Often, thousands of compounds are used in a QSPR study. If there are fewer compounds on the list than parameters to be fitted in the equation, then the curve fit will fail. If the same number exists for both, then an exact fit will be obtained. This exact fit is misleading because it fits the equation to all the anomalies in the data; it does not necessarily reflect all the correct trends necessary for a predictive method. In order to ensure that the method will be predictive, there should ideally be 10 times as many test compounds as fitted parameters. The choice of compounds is also important. For [Pg.243]
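The exact-fit case described above is easy to reproduce. In this sketch a two-parameter equation (intercept and slope) is fitted to exactly two compounds. The data points are invented, and the held-out third compound is an assumption added to show how the exact fit behaves on unseen data.

```python
def line_through(p, q):
    """Two-parameter model (intercept, slope) determined exactly by two data points."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return y1 - slope * x1, slope

# Invented (descriptor, property) pairs: two for fitting, one held out.
training = [(1.0, 2.3), (2.0, 1.9)]
held_out = (3.0, 3.1)

intercept, slope = line_through(*training)

# As many compounds as parameters: the residuals vanish, noise included.
residuals = [y - (intercept + slope * x) for x, y in training]
# The held-out compound shows that the exact fit says nothing about predictivity.
prediction_error = held_out[1] - (intercept + slope * held_out[0])
print(residuals, prediction_error)
```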

The process described in the preceding paragraphs has seen widespread use. This is partly because it has been automated very well in the more sophisticated QSPR programs. [Pg.246]

The development of group additivity methods is very similar to the development of a QSPR method. Group additivity methods can be useful for properties that are additive by nature, such as the molecular volume. For most properties, QSPR is superior to group additivity techniques. [Pg.246]
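A group additivity estimate is simply a sum of per-group contributions weighted by how often each group occurs. The sketch below assumes hypothetical contribution values in arbitrary units; they are placeholders, not fitted parameters from any published scheme.

```python
# Hypothetical group contributions to molecular volume (arbitrary units).
GROUP_VOLUMES = {"CH3": 20.0, "CH2": 15.0, "OH": 10.0}

def group_additive_property(group_counts, contributions):
    """Sum of contribution * count over all groups present in the molecule."""
    return sum(contributions[group] * count for group, count in group_counts.items())

ethanol = {"CH3": 1, "CH2": 1, "OH": 1}
propanol = {"CH3": 1, "CH2": 2, "OH": 1}

print(group_additive_property(ethanol, GROUP_VOLUMES))
print(group_additive_property(propanol, GROUP_VOLUMES))
```

Each additional CH2 group changes the estimate by the same fixed amount, which is why the approach suits additive properties such as molecular volume and is less suited to properties with strong interactions between groups.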

© 2019 chempedia.info