Statistics and QSAR

An essential component of calculations is to calibrate new methods, and to use the results of calculations to predict or rationalize the outcome of experiments. Both of these types of investigation compare two sets of data, and the interest is in characterizing how well one set of data can represent or predict the other. Unfortunately, one or both sets of data usually contain noise, and it may be difficult to decide whether a poor correlation is due to noisy data or to a fundamental lack of connection. Statistics is a tool for quantifying such relationships. We will start with some philosophical considerations and move into elementary statistical measures, before embarking on more advanced tools. [Pg.547]
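
As a minimal sketch of the kind of elementary measure mentioned above, the following Python snippet compares a set of calculated values with reference values using the mean signed error, the mean absolute error, and the root-mean-square error. The function name `error_statistics` and all numbers are illustrative assumptions, not taken from the source.

```python
import math

def error_statistics(calculated, reference):
    """Return (mean signed error, mean absolute error, RMSE)."""
    if len(calculated) != len(reference) or not calculated:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [c - r for c, r in zip(calculated, reference)]
    n = len(errors)
    mean_signed = sum(errors) / n                       # systematic bias
    mean_abs = sum(abs(e) for e in errors) / n          # typical deviation
    rmse = math.sqrt(sum(e * e for e in errors) / n)    # penalizes large errors
    return mean_signed, mean_abs, rmse

# Made-up values, for illustration only
calc = [1.0, 2.5, 3.0, 4.5]
ref = [1.5, 2.0, 3.5, 4.0]
mean_signed, mean_abs, rmse = error_statistics(calc, ref)
```

A mean signed error near zero combined with a sizeable RMSE, as in this toy case, points to scatter rather than a systematic offset.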

[Figure: Model, Parameters, Computational implementation, Results, Reality; with the examples Hartree-Fock, basis set, various cutoffs, total energies, atomization energy] [Pg.547]

Introduction to Computational Chemistry, Second Edition. Frank Jensen. © 2007 John Wiley & Sons, Ltd [Pg.547]

A challenging task in materials science as well as in pharmaceutical research is to custom-tailor a compound's properties. George S. Hammond stated that "the most fundamental and lasting objective of synthesis is not production of new compounds, but production of properties" (Norris Award Lecture, 1968). The molecular structure of an organic or inorganic compound determines its properties. Nevertheless, methods for the direct prediction of a compound's properties based on its molecular structure are usually not available (Figure 8-1). Therefore, the establishment of Quantitative Structure-Property Relationships (QSPRs) and Quantitative Structure-Activity Relationships (QSARs) uses an indirect approach to tackle this problem. In the first step, numerical descriptors encoding information about the molecular structure are calculated for a set of compounds. In the second step, statistical and artificial neural network models are used to predict the property or activity of interest based on these descriptors or a suitable subset. [Pg.401]
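
The two-step approach described above can be sketched as follows. This is a toy illustration only: the descriptor (`carbon_count`), the training structures, and the property values are all made up, and a one-variable least-squares line stands in for the statistical or neural network models that a real study would use.

```python
def carbon_count(smiles):
    """Toy descriptor: number of 'C' characters in a SMILES string.

    Only meaningful for simple unbranched alkanes such as "CCC";
    it is an assumption for illustration, not a real descriptor package.
    """
    return smiles.count("C")

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Step 1: calculate descriptors for a training set (made-up data)
training_smiles = ["C", "CC", "CCC", "CCCC"]
training_property = [2.1, 3.9, 6.2, 7.8]
descriptors = [carbon_count(s) for s in training_smiles]

# Step 2: fit a model on the descriptors and predict a new compound
slope, intercept = fit_line(descriptors, training_property)
predicted = slope * carbon_count("CCCCC") + intercept
```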

The results of the 4D-QSAR case study are interesting and provide large amounts of data about the system of interest, and, unlike static 3D-QSAR methods (CoMFA and SOMFA), 4D-QSAR is able to provide the exact locations of statistically important interaction pharmacophore elements. The ability of this method to overcome the question of "What conformation to use?" and to predict the bioactive conformation is impressive and a major reason to use the software. Yet it is the ability to construct manifold models and examine several models for the same alignment that is the true benefit of this method. Add to the list the ability to determine the best alignment scheme (based on statistical and experimental results) and this method will provide more information than one could imagine. This abundance of information is key when troubleshooting results that are not in agreement with current beliefs. [Pg.203]

The key to any successful QSAR study is attention to detail. Special care must be taken not to bias the results in any manner: a small indiscretion in the preparation of the molecules or bioactivities at the beginning of the study can lead to a large issue at the end. Remember that QSAR is based on statistics, and like any science based on an art, it can be fickle. [Pg.204]

Statistical and computational methods have been used to quantify structure-activity relationships, leading to quantitative structure-activity relationships (QSAR). The concept of QSAR can be dated back to the work of Crum Brown and Fraser from 1868 to 1869, and Richardson, also in 1869. Many notable papers were published in the period leading up to the twentieth century by men such as Berthelot and Jungfleisch in 1872, Nernst in 1891, Overton in 1897 and Meyer in 1899 (7). Professor Corwin Hansch is now regarded by many as the father of QSAR, because of his work in the development of new and innovative techniques for QSAR. He and his co-workers produced a paper that was to be known as the birth of QSAR, and was entitled "Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients" (2). [Pg.100]

Some pharmacophore searching programs are designed to provide 3D-QSAR models that are capable of predicting the activity from a quantitative point of view. Apex-3D [34], Catalyst HypoGen [9] and Phase [17] are examples. Consequently, such a model should have correct statistics and abide by the common QSAR validation approaches. Box 3 describes some of them very briefly as they have been reviewed elsewhere (see [35-41]). [Pg.333]

One of the most useful tools to spot and eliminate errors is a spreadsheet, such as Excel or QuattroPro. QSAR modelers very frequently use spreadsheets to organize data into columns and rows of standardized values of the independent and dependent parameters. Spreadsheets allow easy sorting and filtering — two important functions used to find problem data and duplicates and other errors. In addition, spreadsheets have search and replace routines, plotting, and correlation functions, which allow the data to be reviewed in various comprehensive ways. The data can also be exported to other file types, which allow analysis by other software for statistics and any types of quantitative and qualitative relationships that may exist. It cannot be emphasized enough that the typical spreadsheet functions (including graphing functions) are excellent tools to find and eliminate erroneous or questionable values, duplicates, and other problem entries. [Pg.39]
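
The same sorting and filtering checks can also be scripted. The sketch below is one possible way to flag duplicate compound names and missing activity values; the function name `find_problem_rows`, the row layout, and the data are hypothetical and not part of the source.

```python
from collections import defaultdict

def find_problem_rows(rows):
    """Flag duplicate names and missing activities.

    rows: list of (compound_name, activity) tuples.
    Returns (duplicates, missing), where duplicates maps a normalized
    name to the row indices sharing it, and missing lists the row
    indices whose activity is None.
    """
    by_name = defaultdict(list)
    missing = []
    for index, (name, activity) in enumerate(rows):
        # Normalize case and stray whitespace before comparing names
        by_name[name.strip().lower()].append(index)
        if activity is None:
            missing.append(index)
    duplicates = {name: idx for name, idx in by_name.items() if len(idx) > 1}
    return duplicates, missing

# Made-up rows, for illustration only
rows = [("Phenol", 4.2), ("aniline", 3.1), ("phenol ", 4.3), ("toluene", None)]
duplicates, missing = find_problem_rows(rows)
```

Flagged rows still need a human decision: two entries for the same compound with different activities may be a typing error or two legitimate measurements.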

We discuss how the size of a library can be drastically reduced without loss of information or decreases in the chances of finding a lead compound. The approach is based on the use of statistical molecular design (SMD) for the selection of library compounds to synthesise and test, followed by the use of quantitative structure-activity relationships (QSARs) for the evaluation of the resulting test data. The use of SMD and QSAR is, in turn, critically dependent on an appropriate translation of the molecular structure to numerical descriptors, the recognition of inhomogeneities (clusters) in both the structural... [Pg.197]

For an early discussion of the use of this type of descriptor combined with statistical experimental design, see Hellberg et al. [17]. More recently, for the design of peptide libraries and QSAR modelling, principal properties have been described for a total of 87 natural and unnatural amino acids [18-20]... [Pg.203]

In organic chemistry, decomposition of molecules into substituents and molecular frameworks is a natural way to characterize molecular structures. In QSAR, both the Hansch-Fujita and the Free-Wilson classical approaches are based on this decomposition, but only the second one explicitly accounts for the presence or absence of substituent(s) attached to the molecular framework at a certain position. While the multiple linear regression technique was associated with the Free-Wilson method, recent modifications of this approach involve more sophisticated statistical and machine-learning approaches, such as principal component analysis and neural networks. ... [Pg.9]

Simplistic and heuristic similarity-based approaches can hardly produce predictive models as good as those of modern statistical and machine learning methods, which are able to quantitatively assess biological or physicochemical properties. QSAR-based virtual screening consists of direct assessment of activity values (numerical or binary) of all compounds in the database, followed by selection of hits possessing the desired activity. Mathematical methods used for model building can be subdivided into classification and regression approaches. The former decide whether a given compound is active, whereas the latter numerically evaluate the activity values. Classification approaches that assess the probability of decisions are called probabilistic. [Pg.25]
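
As a rough sketch of this workflow, the snippet below scores every compound in a small database with a regression-type model, selects hits above an activity threshold, and shows a classification-type model that returns only active or inactive. The models, their coefficients, the compound names, and the descriptor values are all hypothetical placeholders.

```python
def regression_model(descriptor):
    """Hypothetical regression model: returns a numerical activity."""
    return 0.5 * descriptor + 1.0

def classification_model(descriptor, cutoff=6.0):
    """Hypothetical classifier: returns True if predicted active."""
    return regression_model(descriptor) >= cutoff

def screen(database, predict, threshold):
    """Score all compounds and return hits at or above the threshold, best first."""
    scored = [(name, predict(desc)) for name, desc in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Made-up database: compound name -> single descriptor value
database = {"cmpd_A": 4.0, "cmpd_B": 12.0, "cmpd_C": 10.0}

hits = screen(database, regression_model, threshold=6.0)
labels = {name: classification_model(desc) for name, desc in database.items()}
```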

To obtain a statistically sound QSAR, it is important that certain caveats be kept in mind. One needs to be cognizant of collinearity between variables and of chance correlations. Inspecting a correlation matrix helps to verify that the variables of significance and/or interest are close to orthogonal to each other. With the rapid proliferation of parameters, caution must be exercised against amassing too many variables for a QSAR analysis. Topliss has elegantly demonstrated that there is a high risk of ending up with a chance correlation when too many variables are tested (62). [Pg.10]
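
A collinearity check of this kind can be sketched in a few lines of Python. The snippet computes pairwise Pearson correlation coefficients between descriptor columns and flags pairs above a cutoff. The cutoff of 0.9, the descriptor names, and the values are assumptions chosen for illustration, not recommendations from the source.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(descriptors, threshold=0.9):
    """Return (name_i, name_j, r) for every pair with |r| above threshold."""
    names = list(descriptors)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson_r(descriptors[names[i]], descriptors[names[j]])
            if abs(r) > threshold:
                flagged.append((names[i], names[j], r))
    return flagged

# Made-up descriptor table: one list of values per descriptor
descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0],
    "molar_refractivity": [2.0, 4.1, 5.9, 8.0],
    "sigma": [1.0, -1.0, -1.0, 1.0],
}
flagged = collinear_pairs(descriptors)
```

In this toy table only the first two descriptors are flagged, so one of them would normally be dropped before fitting a regression.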

Benigni, R. and Giuliani, A. (1991). What Kind of Statistics for QSAR Research? Quant. Struct.-Act. Relat., 10, 99-100. [Pg.537]

QSAR, statistical, and computational methods are used to determine the possibility that a material is a sensitizer and the potential severity of sensitization. In vivo methods are useful to diagnose skin disorders such as drug eruptions, contact dermatitis, immediate contact reactions (contact urticaria), and more. Allergic Contact Dermatitis (ACD) is an inflammatory skin disease, marked by a delayed skin response following skin contact with an allergenic chemical. Test groups must be very large to assess this effect. To test for ACD, the test article or sample(s) must first be applied to the same skin site/area (induction phase). After a rest period of a week or more (others say over... [Pg.2647]
