# QSPR

This is the domain of establishing Structure-Property or Structure-Activity Relationships (SPR or SAR), or even of finding such relationships in a quantitative manner (QSPR or QSAR). [c.3]

All the techniques described above can be used to calculate molecular structures and energies. Which other properties are important for chemoinformatics? Most applications have used semi-empirical theory to calculate properties or descriptors, but ab-initio and DFT are equally applicable. In the following, we describe some typical properties and descriptors that have been used in quantitative structure-activity (QSAR) and structure-property (QSPR) relationships. [c.390]

The molecular electronic polarizability is one of the most important descriptors used in QSPR models. Paradoxically, although it is an electronic property, it is often easier to calculate the polarizability by an additive method (see Section 7.1) than quantum mechanically. Ab-initio and DFT methods need very large basis sets before they give accurate polarizabilities. Accurate molecular polarizabilities are available from semi-empirical MO calculations very easily using a modified version of a simple variational technique proposed by Rivail and co-workers [41]. The molecular electronic polarizability correlates quite strongly with the molecular volume, although there are many cases where both descriptors are useful in QSPR models. [c.392]

The MEP at the molecular surface has been used for many QSAR and QSPR applications. Quantum mechanically calculated MEPs are more detailed and accurate at the important areas of the surface than those derived from net atomic charges and are therefore usually preferable [1]. However, any of the techniques based on MEPs calculated from net atomic charges can be used for full quantum mechanical calculations, and vice versa. The best-known descriptors based on the statistics of the MEP at the molecular surface are those introduced by Murray and Politzer [44]. These were originally formulated for DFT calculations using an isodensity surface. They have also been used very extensively with semi-empirical MO techniques and solvent-accessible surfaces [1, 2]. The charged polar surface area (CPSA) descriptors proposed by Stanton and Jurs [45] are also based on charges derived from semi-empirical MO calculations. [c.393]

To know what QSAR and QSPR are, and the steps in QSAR/QSPR. [c.401]

Figure 8-1. The general QSPR/QSAR problem.

The QSPR/QSAR methodology can also be applied to materials and mixtures where no structural information is available. Instead of descriptors derived from the compound's structure, various physicochemical properties, including spectra, can be used. In particular, spectra are valuable in this context as they reflect the structure in a sensitive way. [c.433]

Two approaches to establishing a quantitative relationship between the structural features of a compound and its properties are described in this section: quantitative structure-property relationships (QSPR) and linear free energy relationships (LFER) (cf. Section 3.4.2.2). The LFER approach is important for historical reasons because it contributed the first attempt to predict the property of a compound from an analysis of its structure. LFERs can be established only for congeneric series of compounds, i.e., sets of compounds that share the same skeleton and only have variations in the substituents attached to this skeleton. As examples of a QSPR approach, currently available methods for the prediction of the octanol/water partition coefficient, log P, and of aqueous solubility, log S, of organic compounds are described in Section 10.1.4 and Section 10.1.5, respectively. [c.488]

Furthermore, QSPR models for the prediction of free-energy based properties that are based on multilinear regression analysis are often referred to as LFER models, especially in the wide field of quantitative structure-activity relationships (QSAR). [c.489]

Quantitative Structure-Property Relationships (QSPR) [c.489]

The general procedure in a QSPR approach consists of three steps: structure representation, descriptor analysis, and model building (see also Chapter X, Section 1.2 of the Handbook). [c.489]

Descriptors have to be found representing the structural features which are related to the target property. This is the most important step in QSPR, and the development of powerful descriptors is of central interest in this field. Descriptors can range from simple atom- or functional group counts to quantum chemical descriptors. They can be derived on the basis of the connectivity (topological or [c.489]

Figure 10.1-1. Flow chart for the general model building process in QSPR studies.

The establishment of QSAR/QSPR models. This process is explained in more detail in Chapter 8. Good QSAR/QSPR models should be interpretable and guide the further development of a new drug. The computer system PASS (prediction of activity spectra for substances) allows the simultaneous prediction of more than 500 biological activities. Among these activities are pharmacological main and side effects, mechanism of action, mutagenicity, carcinogenicity, teratogenicity, and embryotoxicity [19]. [c.605]

Quantitative structure-property relationships (QSPR) and, when applied to biological activity, quantitative structure-activity relationships (QSAR) are methods for determining properties that arise from very sophisticated mechanisms purely by a curve fit of that property to aspects of the molecular structure. This allows a property to be predicted without complete knowledge of its origin. For example, drug activity can be predicted without knowing the nature of the binding site for that drug. QSPR is covered in more detail in Chapter 30. [c.108]

It is important to realize that many important processes, such as retention times in a given chromatographic column, are not just a simple aspect of a molecule. These are actually statistical averages of all possible interactions of that molecule and another. These sorts of processes can only be modeled on a molecular level by obtaining many results and then using a statistical distribution of those results. In some cases, group additivities or QSPR methods may be substituted. [c.110]

QSPR methods have yielded the most accurate results. Most often, they use large expansions of parameters obtainable from semiempirical calculations along with other less computationally intensive properties. This is often the method of choice for small molecules. [c.114]

When the property being described is a physical property, such as the boiling point, this is referred to as a quantitative structure-property relationship (QSPR). When the property being described is a type of biological activity, such as drug activity, this is referred to as a quantitative structure-activity relationship (QSAR). Our discussion will first address QSPR. All the points covered in the QSPR section are also applicable to QSAR, which is discussed next. [c.243]

The first step in developing a QSPR equation is to compile a list of compounds for which the experimentally determined property is known. Ideally, this list should be very large. Often, thousands of compounds are used in a QSPR study. If there are fewer compounds on the list than parameters to be fitted in the equation, then the curve fit will fail. If the same number exists for both, then an exact fit will be obtained. This exact fit is misleading because it fits the equation to all the anomalies in the data; it does not necessarily reflect all the correct trends necessary for a predictive method. In order to ensure that the method will be predictive, there should ideally be 10 times as many test compounds as fitted parameters. The choice of compounds is also important. For [c.243]
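The danger of an exact fit can be illustrated with a minimal sketch. The data below are hypothetical (a roughly linear trend with scatter), and the helper names `fit_line`, `rms_residual`, and `enough_compounds` are illustrative, not taken from any QSPR package. A two-parameter line fitted to only two compounds reproduces them exactly but predicts the remaining compounds poorly.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (two fitted parameters)."""
    n = len(xs)
    if n < 2:
        raise ValueError("fewer compounds than fitted parameters")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rms_residual(xs, ys, slope, intercept):
    """Root-mean-square deviation between fitted and measured values."""
    squares = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squares) / len(xs)) ** 0.5


def enough_compounds(n_compounds, n_parameters, ratio=10):
    """Rule of thumb from the text: about 10 compounds per fitted parameter."""
    return n_compounds >= ratio * n_parameters


# HYPOTHETICAL descriptor values and "measured" properties (trend plus scatter)
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
measured = [2.3, 3.8, 6.4, 7.7, 10.2, 11.9]

# Two compounds, two parameters: the fit is exact ...
slope2, intercept2 = fit_line(descriptor[:2], measured[:2])
exact_rms = rms_residual(descriptor[:2], measured[:2], slope2, intercept2)

# ... but it has absorbed the scatter and predicts the other compounds badly.
held_out_rms = rms_residual(descriptor[2:], measured[2:], slope2, intercept2)

# Fitting all six compounds gives a nonzero but more honest residual.
slope6, intercept6 = fit_line(descriptor, measured)
full_rms = rms_residual(descriptor, measured, slope6, intercept6)

print(exact_rms, held_out_rms, full_rms)
```

With real data the same comparison is usually made against an external test set rather than the leftover training compounds.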

The process described in the preceding paragraphs has seen widespread use. This is partly because it has been automated very well in the more sophisticated QSPR programs. [c.246]

It is possible to use nonlinear curve fitting (i.e., exponents of best fit). Nonlinear fitting is done by using a steepest-descent algorithm to minimize the deviation between the fitted and correct values. The drawback is the possibility of falling into a local minimum, thus necessitating the use of global optimization algorithms. Automated algorithms for determining which descriptors to include in a nonlinear fit are possible, but there is not yet a consensus as to what technique is best. This approach can yield a closer fit to the data than multiple linear techniques. However, it is less often used due to the large amount of manual trial-and-error work necessary. Automated nonlinear fitting algorithms are expected to be included in future versions of QSPR software packages. [c.246]
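As a minimal sketch of fitting an exponent by steepest descent, the example below fits the single parameter b in y = x^b to synthetic data by following the negative gradient of the summed squared deviation. The function name `fit_exponent`, the starting value, the fixed step size, and the data are all assumptions chosen for illustration. This one-parameter problem has a single minimum, so it does not show the local-minimum difficulty mentioned above.

```python
import math


def fit_exponent(xs, ys, b_start=1.0, step=0.001, n_iter=500):
    """Steepest-descent fit of b in y = x**b, minimizing sum((x**b - y)**2)."""
    b = b_start
    for _ in range(n_iter):
        # d/db of (x**b - y)**2 is 2 * (x**b - y) * x**b * ln(x)
        gradient = sum(
            2.0 * (x ** b - y) * (x ** b) * math.log(x) for x, y in zip(xs, ys)
        )
        b -= step * gradient  # move against the gradient
    return b


# Synthetic data generated with a known exponent of 1.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [x ** 1.5 for x in xs]

b_fit = fit_exponent(xs, ys)
print(b_fit)
```

A fixed step size is the simplest choice. Too large a step makes the iteration diverge, which is one source of the trial-and-error work the text describes.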

The development of group additivity methods is very similar to the development of a QSPR method. Group additivity methods can be useful for properties that are additive by nature, such as the molecular volume. For most properties, QSPR is superior to group additivity techniques. [c.246]

As another example, we shall consider the influence of the number of descriptors on the quality of learning. Lucic et al. [3] performed a study on QSPR models employing connectivity indices as descriptors. The dataset contained 18 isomers of octane. The physical property modeled was the boiling point. The authors were among those who introduced the technique of orthogonalization of descriptors. [c.207]
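One common way to orthogonalize descriptors is to center each one and then replace it by its residual after removing the parts explained by the descriptors before it (a Gram-Schmidt style procedure). The sketch below shows that idea only. Whether it matches the exact procedure of the cited authors is an assumption, and the descriptor values are hypothetical.

```python
def center(values):
    """Subtract the mean so that descriptors are compared as deviations."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(descriptors):
    """Replace each centered descriptor by its residual w.r.t. earlier ones."""
    basis = []
    for column in descriptors:
        residual = center(column)
        for earlier in basis:
            coeff = dot(residual, earlier) / dot(earlier, earlier)
            residual = [r - coeff * e for r, e in zip(residual, earlier)]
        basis.append(residual)
    return basis


# HYPOTHETICAL, strongly intercorrelated descriptor values for five compounds
d1 = [1.0, 2.0, 3.0, 4.0, 5.0]
d2 = [1.1, 2.3, 2.9, 4.2, 4.8]

o1, o2 = orthogonalize([d1, d2])
print(dot(o1, o2))  # close to zero: the new descriptors are uncorrelated
```

The result depends on the order in which descriptors are processed, because each descriptor keeps only the information not already carried by the earlier ones.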

The final group of methods used to calculate net atomic charges does not derive them from the electron density, but rather from the electrostatic potential around the molecule. These molecular-electrostatic-potential (MEP) derived charges are calculated by least-squares fitting of a set of net atomic charges so that they reproduce the calculated MEPs at a grid of points around the molecule as closely as possible. The CHELP [36] and RESP [37] techniques are well known for ab-initio and DFT calculations and MNDO-ESP [38] or VESPA [39] charges can be derived from semi-empirical calculations. Because MEP-derived charges are designed to reproduce the electrostatic properties of molecules as well as possible, they are inherently attractive for describing physical properties. However, in practice the simple Coulson or Mulliken charges have been used more frequently. MEP-derived charges, however, do occur in many QSPR models as the sums of all the MEP-derived charges on atoms of a given element in the molecule. [c.392]
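The least-squares idea can be sketched for a two-atom toy system in atomic units, where the potential at a grid point is the sum of q_j / r_ij over the atoms. The geometry, grid points, and reference charges below are hypothetical, and the function name `fit_two_charges` is illustrative. Production schemes such as CHELP or RESP additionally constrain the total charge (and RESP adds restraints), which this unconstrained sketch omits.

```python
import math


def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fit_two_charges(atoms, grid, potentials):
    """Unconstrained least-squares fit of two point charges to grid potentials.

    Solves the 2x2 normal equations (A^T A) q = A^T V, with A[i][j] = 1 / r_ij.
    """
    rows = [[1.0 / distance(point, atom) for atom in atoms] for point in grid]
    s11 = sum(r[0] * r[0] for r in rows)
    s12 = sum(r[0] * r[1] for r in rows)
    s22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * v for r, v in zip(rows, potentials))
    b2 = sum(r[1] * v for r, v in zip(rows, potentials))
    det = s11 * s22 - s12 * s12
    return (b1 * s22 - b2 * s12) / det, (s11 * b2 - s12 * b1) / det


# HYPOTHETICAL diatomic: two atoms 2 bohr apart on the z axis
atoms = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
grid = [(3.0, 0.0, 0.0), (-3.0, 0.0, 1.0), (0.0, 3.0, 2.0),
        (0.0, -3.0, -1.0), (0.0, 0.0, 5.0), (0.0, 0.0, -3.0)]

# Stand-in for a quantum mechanically calculated MEP: the potential generated
# by known charges, so that the fit can be checked against them.
reference = (0.4, -0.4)
potentials = [
    sum(q / distance(point, atom) for q, atom in zip(reference, atoms))
    for point in grid
]

q1, q2 = fit_two_charges(atoms, grid, potentials)
print(q1, q2)
```

In a real calculation the potentials come from the wavefunction, so the fitted charges reproduce them only approximately and the residual of the fit becomes a quality measure.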

Molecular dipole moments are often used as descriptors in QSPR models. They are calculated reliably by most quantum mechanical techniques, not least because they are part of the parameterization data for semi-empirical MO techniques. Higher multipole moments are especially easily available from semi-empirical calculations using the natural atomic orbital-point charge (NAO-PC) technique [40], but can also be calculated reliably using ab-initio or DFT methods. They have been used for some QSPR models. [c.392]

In general, a QSPR/QSAR study starts from a structure database. The molecular structure of each compound is entered and stored, providing information about at least the molecule's topology (suitable formats are discussed in Sections 2.4 and 2.9). If molecular descriptors are derived from the compound's 3D structure, both experimental and calculated geometries are used. Calculated geometries are submitted to a conformational analysis in order to restrict the study to low-energy conformations. Based on the structure database, a variety of descriptors can be calculated. Optional descriptor subsets are selected. Statistical methods like multilinear regression analysis, or artificial neural networks such as backpropagation neural networks, are applied to build models. These models relate the descriptors with the property or activity of interest. Finally, the models are validated with an external data set which has not been used for the construction of the model. The steps of a typical QSPR/QSAR study are summarised as [c.402]

Quantum chemical descriptors such as atomic charges, HOMO and LUMO energies, HOMO and LUMO orbital energy differences, atom-atom polarizabilities, super-delocalizabilities, molecular polarizabilities, dipole moments, and energies such as the heat of formation, ionization potential, electron affinity, and energy of protonation are applicable in QSAR/QSPR studies. A review is given by Karelson et al. [45]. [c.427]

The abbreviation QSAR stands for quantitative structure-activity relationships. QSPR means quantitative structure-property relationships. As the properties of an organic compound usually cannot be predicted directly from its molecular structure, an indirect approach is used to overcome this problem. In the first step, numerical descriptors encoding information about the molecular structure are calculated for a set of compounds. Secondly, statistical methods and artificial neural network models are used to predict the property or activity of interest, based on these descriptors or a suitable subset. A typical QSAR/QSPR study comprises the following steps: structure entry (or start from an existing structure database), descriptor calculation, descriptor selection, model building, and model validation. [c.432]
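The sequence of steps can be sketched on a deliberately tiny problem: predicting the boiling point of an n-alkane from a single descriptor, the number of carbon atoms. The boiling points are approximate literature values in degrees Celsius, the choice of descriptor and the training/external split are assumptions made for illustration, and a real study would use many compounds and descriptors.

```python
# Step 1: structure entry. Each entry is (name, carbon count, approximate
# normal boiling point in deg C); the carbon count stands in for a structure.
training_set = [
    ("butane", 4, -0.5),
    ("pentane", 5, 36.1),
    ("hexane", 6, 68.7),
    ("octane", 8, 125.6),
]
external_set = [("heptane", 7, 98.4)]  # never used for model building


# Steps 2 and 3: descriptor calculation and selection (one descriptor here).
def descriptor(entry):
    return float(entry[1])


# Step 4: model building by simple linear regression.
def build_model(entries):
    xs = [descriptor(e) for e in entries]
    ys = [e[2] for e in entries]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, entry):
    slope, intercept = model
    return slope * descriptor(entry) + intercept


# Step 5: validation on the external data set.
model = build_model(training_set)
errors = [predict(model, e) - e[2] for e in external_set]
print(model, errors)
```

The external error is the figure of merit here, because the training error says nothing about how the model behaves on compounds it has not seen.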

2D descriptors), the 3D structure, or the molecular surface (3D descriptors) of a structure. Which kind of descriptors should or can be used is primarily dependent on the size of the data set to be studied and the required accuracy; for example, if a QSPR model is intended to be used for hundreds of thousands of compounds, a somewhat reduced accuracy will probably be acceptable for the benefit of short processing times. Chapter 8 gives a detailed introduction to the calculation methods for molecular descriptors. [c.490]

Two methods should be mentioned here as examples. Schaper and Samitier reported a model using topological descriptors (see Section 8.3) and back-propagation neural networks (see Section 9.5.7) [13]. Atoms excluding hydrogen were described by an algorithm called "canonical numbering", taking the number of bonds and the mass of the atoms into account. Using this information a set of indicator variables was derived, which stated whether an atom as described by the algorithm was found in a molecule or not. This descriptor set was reduced by simply eliminating the variables that were zero for the entire data set. The remaining 147 descriptors for the training data set were then used for the training of a three-layered back-propagation network, resulting in a standard deviation for the best net of 0.25 (147-3-1 architecture), whereas the test data set of 50 similar structures yielded a standard deviation of 0.66. This indicated that the trained network had not generalized the structural features related to log P. The ratio of the training data to the number of adjustable parameters was unfavorably low (about 0.6) but should have been about 2.0 (see Section 9.5). On the other hand, the descriptors may not have encoded the important features; after all, the descriptor optimization process was quite rudimentary. This is thus a good example of problems that one may encounter in QSPR studies. [c.494]
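The number of adjustable parameters in such a network can be counted directly from the layer sizes. The sketch below assumes a fully connected network and, by default, one bias per non-input neuron. Whether the cited study counted biases is not stated in the text, so both counts are shown, and the function names are illustrative.

```python
import math


def count_parameters(layer_sizes, with_biases=True):
    """Count weights (and optionally biases) of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        if with_biases:
            total += n_out  # one bias per receiving neuron
    return total


def compounds_needed(n_parameters, ratio=2.0):
    """Training-set size required to reach a given data-to-parameter ratio."""
    return math.ceil(ratio * n_parameters)


architecture = [147, 3, 1]  # the 147-3-1 network from the text
n_with_biases = count_parameters(architecture)
n_weights_only = count_parameters(architecture, with_biases=False)
needed = compounds_needed(n_with_biases)
print(n_with_biases, n_weights_only, needed)
```

Almost all of the parameters sit between the 147 inputs and the hidden layer, which is why reducing the number of descriptors is the most effective way to improve the ratio.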

Jurs et al. developed QSPR models for the prediction of solubility using multiple linear regression analysis (MLRA) and computational neural networks (CNN) (mainly back-propagation neural networks), relating it to the structures of a diverse set of 332 compounds [21]. A series of topological, geometric, and electronic descriptors were calculated. Genetic algorithm and simulated annealing routines, in conjunction with MLRA and CNN, were used to select subsets of descriptors that relate accurately to aqueous solubility. Nine descriptors, including four topological, one geometric, one electronic, and three polar surface area ones, were selected. The model had root mean square (RMS) errors of 0.394, 0.358, and 0.343 for the training set, cross-validation set, and test set, respectively. [c.497]

Recently, several QSPR solubility prediction models based on a fairly large and diverse data set were generated. Huuskonen developed the models using MLRA and back-propagation neural networks (BPG) on a data set of 1297 diverse compounds [22]. The compounds were described by 24 atom-type E-state indices and six other topological indices. For the 413 compounds in the test set, MLRA gave r² = 0.88 and s = 0.71 and neural network provided [c.497]

In order to develop a proper QSPR model for solubility prediction, the first task is to select appropriate input descriptors that are highly correlated with solubility. Clearly, many factors influence solubility: to name but a few, the size of a molecule, the polarity of the molecule, and the ability of molecules to participate in hydrogen bonding. For a large diverse data set, some indicators for describing the differences in the molecules are also important. [c.498]

We know that every QSPR model is limited by the data set that is used for building the model. In order to examine the diversity of this data set (the Huuskonen [c.500]

To characterize the complete arrangement of atoms in a molecule, the entire molecule can be regarded as a connectivity graph where the edges represent the bonds and the nodes represent the atoms. By adding the number of bonds or the sum of bond lengths between all pairs of atoms, it is possible to calculate a descriptor that defines the constitution of a molecule independently of conformational changes. The resulting descriptor is not restricted regarding the number of atoms. Clerc and Terkovics [8] used this method based on the number of bonds for the investigation of quantitative structure-property relationships (QSPR). [c.516]
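A minimal sketch of the bond-count variant: the molecule is stored as an adjacency list of its non-hydrogen atoms, and a breadth-first search from every atom gives the number of bonds on the shortest path to every other atom. The function name and the two example graphs are illustrative. Treating the hydrogen-suppressed graph is an assumption, and the exact descriptor used by the cited authors may differ in detail.

```python
from collections import deque


def bond_count_sum(adjacency):
    """Sum of shortest-path bond counts over all pairs of atoms."""
    total = 0
    for start in adjacency:
        bonds_from_start = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from this atom
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in bonds_from_start:
                    bonds_from_start[neighbor] = bonds_from_start[atom] + 1
                    queue.append(neighbor)
        total += sum(bonds_from_start.values())
    return total // 2  # each pair was counted from both ends


# Carbon skeletons of two C4H10 isomers (hydrogen atoms omitted)
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(bond_count_sum(n_butane), bond_count_sum(isobutane))
```

The two isomers have the same atoms but different sums, so the descriptor distinguishes their constitution while staying unchanged under any conformational change.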

A quantitative structure-activity relationship (QSAR) relates numerical properties of the molecular structure to its activity by a mathematical model. The term quantitative structure-property relationship (QSPR) is also used, particularly when some property other than biological activity is concerned. In drug design, QSAR methods have often been used to consider qualities beyond in vitro potency. The most potent enzyme inhibitor is of little use as a drug if it cannot reach its target. The in vivo activity of a molecule is often a composite of many factors. A structure-activity study can help to decide which features of a molecule give rise to its overall activity and help to make modified compounds with enhanced properties. The relationship between these numerical properties and the activity is often described by an equation of the general form [c.711]

A significant amount of research has focused on deriving methods for predicting log P, where P is the octanol-water partition coefficient. Other solubility and adsorption properties are generally computed from the log P value. There are some group additivity methods for predicting log P, some of which have extremely complex rules. QSPR techniques are reliably applicable to the widest range of compounds. Neural network based methods are very accurate so long as the unknown can be considered an interpolation between compounds in the training set. Database techniques are very accurate for organic compounds. The solvation methods discussed in Chapter 24 can also be used. [c.115]

A similar technique is to derive a group additivity method. In this method, a contribution for each functional group must be determined. The contributions for the functional groups composing the molecule are then added. This is usually done from computations on a whole list of molecules using a fitting technique, similar to that employed in QSPR. [c.208]
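Once the contributions are known, applying a group additivity method is a weighted sum over the groups in the molecule. The sketch below shows only that summation step. The contribution values are hypothetical placeholders in arbitrary units, not fitted or published parameters, and the group names and function name are illustrative.

```python
# HYPOTHETICAL group contributions in arbitrary units, for illustration only.
# In practice these values come from a fit to many reference molecules.
GROUP_CONTRIBUTIONS = {"CH3": 20.0, "CH2": 15.0, "OH": 10.0}


def additive_property(group_counts, contributions=GROUP_CONTRIBUTIONS):
    """Sum the contribution of every functional group in the molecule."""
    return sum(contributions[group] * count
               for group, count in group_counts.items())


# Molecules described by how often each group occurs
ethanol = {"CH3": 1, "CH2": 1, "OH": 1}
propan_1_ol = {"CH3": 1, "CH2": 2, "OH": 1}

print(additive_property(ethanol), additive_property(propan_1_ol))
```

A group that is missing from the table raises a `KeyError`, which mirrors a real limitation of the approach: a molecule containing an unparameterized group cannot be predicted.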

Floppy molecules present some additional difficulty in applying QSAR/QSPR. They are also much more difficult to work with in 3D QSAR. With QSAR/QSPR, this problem can be avoided by using only descriptors that do not depend on the conformation, but the accuracy of results may suffer. For more accurate QSPR, the lowest-energy conformation is usually what should be used. For QSAR or 3D QSAR, the conformation most closely matching a rigid molecule in the test set should be used. If all the molecules are floppy, finding the lowest-energy conformer for all and looking for some commonality in the majority might be the best option. [c.249]

QSPR and QSAR are useful techniques for predicting properties that would be very difficult to predict by any other method. This is a somewhat empirical or indirect calculation that ultimately limits the accuracy and amount of information which can be obtained. When other means of computational prediction are not available, these techniques are recommended for use. There are a variety of algorithms in use that are not equivalent. An examination of published results and tests of several techniques are recommended. [c.249]
