The next step is to obtain geometries for the molecules. Crystal structure geometries can be used; however, it is better to use theoretically optimized geometries. When theoretical geometries are used consistently, any systematic errors in the computation cancel out. Furthermore, using theoretical geometries allows the method to predict properties of as-yet unsynthesized compounds. Some of the simpler methods require connectivity only. [Pg.244]

Molecular descriptors must then be computed. Any numerical value that describes the molecule could be used. Many descriptors are obtained from molecular mechanics or semiempirical calculations; energies, population analyses, and vibrational frequency analyses with their associated thermodynamic quantities are often obtained this way. Ab initio results can be used reliably but are often avoided because of the large amount of computation required. Most descriptors, however, are easily determined values such as molecular weights, topological indexes, and moments of inertia. Table 30.1 lists some of the descriptors that have been found useful in previous studies; these are discussed in more detail in the review articles listed in the bibliography. [Pg.244]
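As an illustration of a topological index that needs connectivity only, the sketch below computes the Wiener index (the sum of shortest-path bond distances over all pairs of heavy atoms) of a hydrogen-suppressed molecular graph. The adjacency-dictionary input format and the function name `wiener_index` are assumptions made for this example, not the interface of any particular QSPR program.

```python
from collections import deque


def wiener_index(adjacency):
    """Wiener index of a hydrogen-suppressed molecular graph.

    adjacency: dict mapping each atom index to a list of bonded atom indices
    (a hypothetical input format chosen for this sketch).
    """
    total = 0
    for start in adjacency:
        # Breadth-first search gives topological (bond-count) distances
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in dist:
                    dist[neighbour] = dist[atom] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
    # Every unordered pair was counted twice (once from each end)
    return total // 2


# n-butane (C1-C2-C3-C4) and isobutane (central C bonded to three C atoms)
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print(wiener_index(butane), wiener_index(isobutane))  # 10 9
```

The more compact, branched isomer has the smaller index, so this descriptor encodes branching from the connection table alone.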

Once the descriptors have been computed, it is necessary to decide which ones will be used. This is usually done by computing correlation coefficients, which measure how closely two quantities (here, a descriptor and the property) follow a linear relationship. If a descriptor has a correlation coefficient of 1 (or -1), it describes the property exactly through a linear relation; a correlation coefficient of zero means the descriptor has no linear relationship to the property. The descriptors with the largest correlation coefficients (in absolute value) are used in the curve fit to create a property prediction equation. There is no rigorous way to determine how large a correlation coefficient must be to be acceptable. [Pg.244]
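The screening step can be sketched as follows: compute the Pearson correlation coefficient r between each descriptor and the property, then rank descriptors by |r|, since a strongly negative correlation is as informative as a strongly positive one. The function names and the small data set are hypothetical, chosen only to make the ranking easy to check by hand.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant descriptor carries no linear information (r undefined)
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_descriptors(descriptors, prop):
    """Sort descriptors by |r| with the property, most relevant first.

    descriptors: dict mapping a descriptor name to its values, one per molecule.
    """
    scored = [(name, pearson(values, prop)) for name, values in descriptors.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


prop = [1.0, 2.0, 3.0, 4.0]
descriptors = {
    "a": [2.0, 4.0, 6.0, 8.0],  # exactly linear in prop, r = 1
    "b": [4.0, 3.0, 2.0, 1.0],  # exactly anti-linear, r = -1
    "c": [1.0, 3.0, 2.0, 4.0],  # partly related, r = 0.8
}
for name, r in rank_descriptors(descriptors, prop):
    print(name, round(r, 3))
```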

Intercorrelation coefficients, computed between pairs of descriptors, are then used to tell when one descriptor is redundant with another. Using redundant descriptors increases the amount of fitting work, does not improve the results, and makes the fitting calculations unstable; they can fail completely (due to division by zero or a similar numerical error). From a pair of redundant descriptors, the one with the lower correlation coefficient to the property is usually discarded. [Pg.244]
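A minimal sketch of this pruning rule, assuming a greedy scheme: descriptors are visited from most to least correlated with the property, and a descriptor is dropped if its |intercorrelation| with an already kept one reaches a cut-off. The cut-off of 0.9 is an arbitrary assumption (the text notes there is no rigorous threshold), and the names and data are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def drop_redundant(descriptors, prop, threshold=0.9):
    """Greedily keep descriptors, discarding the less relevant member of any
    pair whose |intercorrelation| reaches `threshold` (an arbitrary cut-off)."""
    relevance = {name: abs(pearson(vals, prop)) for name, vals in descriptors.items()}
    kept = []
    # Most relevant first, so in a redundant pair the lower-|r| one is dropped
    for name in sorted(descriptors, key=relevance.get, reverse=True):
        if all(abs(pearson(descriptors[name], descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


prop = [1.0, 2.0, 3.0, 4.0, 5.0]
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 10.5],  # nearly a multiple of d1: redundant
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(drop_redundant(descriptors, prop))  # ['d1', 'd3']
```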

The process described in the preceding paragraphs has seen widespread use. This is partly because it has been automated very well in the more sophisticated QSPR programs. [Pg.246]
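In the linear case, the curve fit at the end of this workflow is a multiple linear regression of the property on the selected descriptors. The pure-Python sketch below solves the least-squares normal equations by Gaussian elimination; the function names are hypothetical, and real QSPR software would normally rely on a numerical library (typically a QR- or SVD-based solver). It also shows the failure mode mentioned earlier: exactly collinear descriptors make the normal-equation matrix singular, so the elimination would divide by zero.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Happens when descriptors are exactly redundant (collinear)
            raise ValueError("singular system: redundant descriptors?")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(rows, y):
    """Least-squares fit of y = c0 + c1*d1 + ... + ck*dk.

    rows: one list of descriptor values per molecule.
    Returns [c0, c1, ..., ck].
    """
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 gives the intercept
    p = len(design[0])
    if len(design) < p:
        raise ValueError("fewer molecules than fitted parameters")
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear(xtx, xty)


rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [1 + 2 * d1 - 0.5 * d2 for d1, d2 in rows]
print(fit_mlr(rows, y))  # approximately [1.0, 2.0, -0.5]
```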

It is possible to use nonlinear curve fitting (i.e., fitting exponents as well as coefficients). Nonlinear fitting is done by using a steepest-descent algorithm to minimize the deviation between the fitted and correct values. The drawback is that the minimization can become trapped in a local minimum, which necessitates the use of global optimization algorithms. Automated algorithms for determining which descriptors to include in a nonlinear fit are possible, but there is not yet a consensus as to which technique is best. This approach can yield a closer fit to the data than multiple linear techniques. However, it is used less often because of the large amount of manual trial-and-error work necessary. Automated nonlinear fitting algorithms are expected to be included in future versions of QSPR software packages. [Pg.246]
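The sketch below fits a single exponent, y = x**b, by steepest descent on the sum of squared deviations, with a backtracking (Armijo) line search to pick each step length. As a stand-in for a real global optimizer it uses a simple multistart: run the local search from several starting exponents and keep the lowest-loss result. The model, data, and function names are hypothetical and chosen for illustration only.

```python
import math


def fit_exponent(xs, ys, start, max_iter=500, tol=1e-10):
    """Fit y = x**b by steepest descent on the sum of squared deviations,
    using a backtracking (Armijo) line search to choose each step length."""

    def loss(b):
        return sum((x ** b - y) ** 2 for x, y in zip(xs, ys))

    def grad(b):
        # d/db of (x**b - y)**2 is 2 (x**b - y) x**b ln x
        return sum(2 * (x ** b - y) * x ** b * math.log(x) for x, y in zip(xs, ys))

    b = start
    for _ in range(max_iter):
        g = grad(b)
        if abs(g) < tol:
            break
        step, f_b = 1.0, loss(b)
        # Halve the step until the loss decreases sufficiently
        while step > 1e-15 and loss(b - step * g) > f_b - 1e-4 * step * g * g:
            step *= 0.5
        b -= step * g
    return b, loss(b)


def multistart_fit(xs, ys, starts):
    """Crude global search: run the local fit from several starting exponents
    and keep the result with the lowest loss."""
    return min((fit_exponent(xs, ys, s) for s in starts), key=lambda res: res[1])


xs = [1.0, 2.0, 3.0, 4.0]
ys = [x ** 1.5 for x in xs]
b, sse = multistart_fit(xs, ys, starts=[-2.0, 0.5, 3.0])
print(round(b, 6))  # 1.5
```

With these data, the run started at 3.0 takes a long first step onto the flat region at large negative exponents, where the gradient vanishes, and stalls there with a large residual; keeping the best of several starts still recovers b = 1.5.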

The real test of a prediction equation is its performance in predicting properties of molecules that were not included in the parameterization set. Equations that do well on the parameterization set may perform poorly for other molecules for several reasons. One mistake is using too limited a selection of molecules in the parameterization set; for example, an equation parameterized with organic molecules may perform very poorly when predicting the properties of inorganic molecules. Another mistake is having nearly as many fitted parameters as molecules in the parameterization set, so that the equation fits anomalies in the data rather than physical trends. [Pg.246]
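One common way to estimate performance on molecules outside the fit is leave-one-out cross-validation: each molecule is predicted by an equation fitted without it, and the cross-validated q^2 = 1 - PRESS / SS_total is compared with the ordinary R^2. The sketch below uses a one-descriptor linear model for brevity; the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loo_q2(x, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total.

    Each point is predicted by a line fitted without it, so q^2 measures
    predictive rather than descriptive performance."""
    press = 0.0
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    my = sum(y) / len(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_total


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(loo_q2(x, [3.0 + 2.0 * xi for xi in x]))  # noise-free data: q^2 close to 1
print(loo_q2(x, [2.0, 4.0, 5.0, 4.0, 5.0]))     # scattered data: q^2 about -0.21
```

For the scattered data the ordinary fit gives R^2 = 0.6, yet q^2 is negative: the equation describes its own parameterization set moderately well but predicts left-out points worse than simply using the mean.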

The development of group additivity methods is very similar to the development of a QSPR method. Group additivity methods can be useful for properties that are additive by nature, such as the molecular volume. For most properties, QSPR is superior to group additivity techniques. [Pg.246]
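Group additivity reduces to a sum of group counts times group increments. The increments below are made-up placeholder numbers in arbitrary units, not published values; in practice they would be fitted to experimental data, much as a QSPR equation is.

```python
# Hypothetical group increments in arbitrary units. These are placeholders
# for illustration only, NOT published group-additivity values.
GROUP_INCREMENTS = {"CH3": 10.0, "CH2": 8.0, "OH": 5.0}


def additive_property(group_counts, increments=GROUP_INCREMENTS):
    """Group additivity: property = sum over groups of count * increment."""
    missing = set(group_counts) - set(increments)
    if missing:
        raise KeyError(f"no increment for groups: {sorted(missing)}")
    return sum(count * increments[group] for group, count in group_counts.items())


# 1-propanol decomposed as CH3-CH2-CH2-OH
print(additive_property({"CH3": 1, "CH2": 2, "OH": 1}))  # 31.0
```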

All the techniques described above can be used to calculate molecular structures and energies. Which other properties are important for chemoinformatics? Most applications have used semi-empirical theory to calculate properties or descriptors, but ab-initio and DFT methods are equally applicable. In the following, we describe some typical properties and descriptors that have been used in quantitative structure-activity (QSAR) and structure-property (QSPR) relationships. [Pg.390]

Molecular dipole moments are often used as descriptors in QSPR models. They are calculated reliably by most quantum mechanical techniques, not least because they are part of the parameterization data for semi-empirical MO techniques. Higher multipole moments are especially easily obtained from semi-empirical calculations using the natural atomic orbital-point charge (NAO-PC) technique [40], but can also be calculated reliably using ab-initio or DFT methods. They have been used in some QSPR models. [Pg.392]
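As a simple illustration of the quantity involved, the sketch below computes a dipole moment from atomic point charges, mu = sum_i q_i r_i, and converts it to debye. This is a plain point-charge approximation (for example, using charges from a population analysis), not the NAO-PC technique and not the full-wavefunction dipole a quantum chemistry program reports; the function name and the charges are hypothetical.

```python
import math

# 1 e*Angstrom expressed in debye (1 D is about 0.2082 e*Angstrom)
E_ANGSTROM_TO_DEBYE = 4.80320


def dipole_moment(charges, positions):
    """Point-charge dipole moment: mu = sum_i q_i * r_i.

    charges: partial charges in units of e; positions: (x, y, z) in Angstrom.
    Returns (vector in debye, magnitude in debye). The result is independent
    of the coordinate origin only if the charges sum to zero."""
    mu = [0.0, 0.0, 0.0]
    for q, r in zip(charges, positions):
        for k in range(3):
            mu[k] += q * r[k]
    mu = [E_ANGSTROM_TO_DEBYE * m for m in mu]
    return mu, math.sqrt(sum(m * m for m in mu))


# Hypothetical diatomic: +0.2 e and -0.2 e separated by 1.0 Angstrom along z
vec, magnitude = dipole_moment([0.2, -0.2], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
print(round(magnitude, 3))  # 0.961
```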

To understand what QSAR and QSPR are and the steps involved in QSAR/QSPR. [Pg.401]

The method of building predictive models in QSPR/QSAR can also be applied to modeling materials without a unique, clearly defined structure. Instead of the connection table, physicochemical data as well as spectra reflecting the compound's structure can be used as molecular descriptors for model building. [Pg.402]

The QSPR/QSAR methodology can also be applied to materials and mixtures for which no structural information is available. Instead of descriptors derived from the compound's structure, various physicochemical properties, including spectra, can be used. Spectra are particularly valuable in this context because they reflect the structure in a sensitive way. [Pg.433]
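Spectra must be turned into fixed-length vectors before they can serve as descriptors in a regression. One simple way to do this, assumed here for illustration rather than taken from the text, is to sum peak intensities into equal-width bins over a fixed range; the function name and the peak list are hypothetical.

```python
def bin_spectrum(peaks, lo, hi, n_bins):
    """Turn a peak list into a fixed-length descriptor vector.

    peaks: (position, intensity) pairs, e.g. IR wavenumbers in cm^-1.
    Intensities falling into the same bin of [lo, hi) are summed; peaks
    outside the range are ignored."""
    width = (hi - lo) / n_bins
    vector = [0.0] * n_bins
    for position, intensity in peaks:
        if lo <= position < hi:
            # min() guards against floating-point rounding at the upper edge
            index = min(int((position - lo) // width), n_bins - 1)
            vector[index] += intensity
    return vector


# Hypothetical IR peak list for one sample
peaks = [(1000.0, 0.5), (1650.0, 1.0), (2950.0, 0.8), (4200.0, 0.3)]
print(bin_spectrum(peaks, lo=400.0, hi=4000.0, n_bins=4))  # [0.5, 1.0, 0.8, 0.0]
```

Every sample then yields a vector of the same length, which can be screened and fitted exactly like structure-derived descriptors.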

Two approaches to quantify f, i.e., to establish a quantitative relationship between the structural features of a compound and its properties, are described in this section: quantitative structure-property relationships (QSPR) and linear free energy relationships (LFER) (cf. Section 3.4.2.2). The LFER approach is important for historical reasons because it was the first attempt to predict the property of a compound from an analysis of its structure. LFERs can be established only for congeneric series of compounds, i.e., sets of compounds that share the same skeleton and differ only in the substituents attached to this skeleton. As examples of the QSPR approach, currently available methods for predicting the octanol/water partition coefficient, log P, and the aqueous solubility, log S, of organic compounds are described in Section 10.1.4 and Section 10.1.5, respectively. [Pg.488]
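The classic example of an LFER for a congeneric series is the Hammett equation, log10(k/k0) = rho * sigma, where sigma characterizes a substituent and rho characterizes the reaction. The sketch fits rho by least squares through the origin, using commonly tabulated sigma_para constants; the log10(k/k0) values are synthetic (generated with rho = 2.0) and the function names are hypothetical.

```python
# Commonly tabulated Hammett sigma_para constants for a few substituents
SIGMA_PARA = {"H": 0.00, "CH3": -0.17, "OCH3": -0.27, "Cl": 0.23, "NO2": 0.78}


def fit_rho(sigmas, log_ratios):
    """Least-squares slope through the origin for log10(k/k0) = rho * sigma."""
    return sum(s * y for s, y in zip(sigmas, log_ratios)) / sum(s * s for s in sigmas)


def predict_rate(k0, rho, sigma):
    """Rate constant for a substituent from the fitted LFER."""
    return k0 * 10 ** (rho * sigma)


# Synthetic log10(k/k0) values for a para-substituted congeneric series
subs = ["H", "CH3", "OCH3", "Cl", "NO2"]
sigmas = [SIGMA_PARA[s] for s in subs]
log_ratios = [2.0 * s for s in sigmas]
rho = fit_rho(sigmas, log_ratios)
print(round(rho, 6))  # 2.0
```

Because sigma is only defined for substituents on a common skeleton, the fitted rho applies only within that series, which is the congeneric restriction described above.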

Furthermore, QSPR models based on multilinear regression analysis that predict free-energy-based properties are often referred to as LFER models, especially in the wide field of quantitative structure-activity relationships (QSAR). [Pg.489]

The general procedure in a QSPR approach consists of three steps: structure representation, descriptor analysis, and model building (see also Chapter X, Section 1.2 of the Handbook). [Pg.489]

Descriptors have to be found that represent the structural features related to the target property. This is the most important step in QSPR, and the development of powerful descriptors is of central interest in this field. Descriptors can range from simple atom or functional group counts to quantum chemical descriptors. They can be derived on the basis of the connectivity (topological or 2D descriptors), the 3D structure, or the molecular surface (3D descriptors) of a structure. Which kind of descriptors should or can be used depends primarily on the size of the data set to be studied and the required accuracy; for example, if a QSPR model is intended to be used for hundreds of thousands of compounds, a somewhat reduced accuracy will probably be acceptable in exchange for short processing times. Chapter 8 gives a detailed introduction to the calculation methods for molecular descriptors. [Pg.489-490]

Figure 10.1-1. Flow chart for the general model building process in QSPR studies.

Building a QSPR model consists of three steps: descriptor calculation, descriptor analysis and optimization, and establishment of a mathematical relationship between the descriptors and the property. [Pg.512]

Quantitative Structure-Property Relationships (QSPR) 3, 96, 392, 401ff, 488ff, 494, 516, 605 [Pg.644]

QCISD (quadratic CISD) 113, 117, 119 QSAR (quantitative structure-activity relationships) 695-706, 710, 711 cross-validation 701 deriving equation 698-70 discriminant analysis 703-5 interpreting equation 702 neural networks 703-5 principal components regression 706 -property relationship 695, 702 selecting compounds for analysis 697-8 QSPR (quantitative structure-property relationship) 695, 702 quadratic region 283-4 quadrupole 76, 181, 183, 185-6, 196 quantitative structure-activity see QSAR quantum mechanics future role 160-1 [Pg.756]

QSPR methods have yielded the most accurate results. Most often, they use large expansions of parameters obtainable from semiempirical calculations along with other less computationally intensive properties. This is often the method of choice for small molecules. [Pg.114]

The first step in developing a QSPR equation is to compile a list of compounds for which the experimentally determined property is known. Ideally, this list should be very large; often, thousands of compounds are used in a QSPR study. If there are fewer compounds on the list than parameters to be fitted in the equation, the curve fit will fail because the parameters are underdetermined. If the numbers are equal, an exact fit will be obtained. This exact fit is misleading: it fits the equation to all the anomalies in the data and does not necessarily reflect the correct trends needed for a predictive method. To ensure that the method will be predictive, there should ideally be 10 times as many compounds as fitted parameters. The choice of compounds is also important. For [Pg.243]
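These rules of thumb are simple arithmetic on the number of fitted parameters (descriptor coefficients plus, usually, an intercept). The helper below is a hypothetical sketch that encodes them; the 10:1 ratio is the guideline from the text, exposed as a parameter.

```python
def training_set_status(n_compounds, n_descriptors, intercept=True, ratio=10):
    """Classify a training-set size against the number of fitted parameters,
    following the rules of thumb in the text (ratio=10 is the suggested ideal)."""
    n_params = n_descriptors + (1 if intercept else 0)
    if n_compounds < n_params:
        return "underdetermined: the fit fails"
    if n_compounds == n_params:
        return "exact fit: reproduces noise, not predictive"
    if n_compounds < ratio * n_params:
        return f"below the {ratio}:1 guideline"
    return "ok"


# 5 descriptors plus an intercept = 6 parameters, so ideally >= 60 compounds
for n in (5, 6, 30, 60):
    print(n, training_set_status(n, 5))
```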

© 2019 chempedia.info