Structural analyses, molecular descriptors

A single index can absorb only a limited amount of structural information and often needs to be combined with information that other indices carry. Wiener used two descriptors, W (the Wiener number) and P (the number of paths of length 3), to correlate selected thermodynamic properties of alkanes and of structurally related amides, alcohols, and fatty acids. Selection of topological indices for regression analysis and other studies is always somewhat arbitrary, just as the selection of a coordinate system is in solving mathematical problems in calculus. The solution of a problem, naturally, does not depend on the choice of coordinates, but the ease of solving it may. Similarly, in structure-property-activity studies the relative magnitude of the computed result for two structures is not necessarily sensitive to the choice of descriptors, but the possibility of a simple interpretation of the results may depend on that choice. It is therefore desirable to use descriptors that are structurally related, as this facilitates comparison and interpretation of the results obtained for different molecules. We will review here several sets of structurally related molecular descriptors. [Pg.3022]
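As an illustration of the two descriptors named in this excerpt, the following minimal Python sketch computes W and P for a hydrogen-suppressed carbon skeleton. It assumes that W is taken as the sum of shortest-path distances over all pairs of carbon atoms and that P counts the pairs of atoms separated by exactly three bonds (for acyclic alkanes this equals the number of paths of length 3). The adjacency-list input format and the function names are illustrative assumptions, not part of the excerpt.

```python
from collections import deque


def bfs_distances(adjacency, start):
    """Return shortest-path distances (in bonds) from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def wiener_descriptors(adjacency):
    """Return (W, P) for a connected hydrogen-suppressed molecular graph.

    W: sum of topological distances over all unordered atom pairs.
    P: number of unordered atom pairs exactly three bonds apart.
    """
    atoms = sorted(adjacency)
    w = 0
    p = 0
    for i, a in enumerate(atoms):
        dist = bfs_distances(adjacency, a)
        for b in atoms[i + 1:]:
            w += dist[b]
            if dist[b] == 3:
                p += 1
    return w, p


# Carbon skeletons written as adjacency lists (atom label -> bonded atoms).
n_butane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
isobutane = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}

print(wiener_descriptors(n_butane))   # (10, 1)
print(wiener_descriptors(isobutane))  # (9, 0)
```

The two butane isomers have the same atom count but different (W, P) pairs, which is the kind of structural information that a pair of related indices can carry.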

For example, the objects may be chemical compounds. The individual components of a data vector are called features and may, for example, be molecular descriptors (see Chapter 8) specifying the chemical structure of an object. For statistical data analysis, these objects and features are represented by a matrix X which has a row for each object and a column for each feature. In addition, each object will have one or more properties that are to be investigated, e.g., a biological activity of the structure or a class membership. This property or these properties are collected in a matrix Y. Thus, the data matrix X contains the independent variables whereas the matrix Y contains the dependent ones. Figure 9-3 shows a typical multivariate data matrix. [Pg.443]
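The layout described above (one row per object, one column per feature in X, and the investigated properties in Y) can be sketched in a few lines of Python. The compound identifiers, descriptor names, and numeric values below are hypothetical placeholders chosen only to show the shape of the matrices.

```python
# Hypothetical objects: identifiers, descriptor names, and values are
# placeholders, not data from the source.
objects = [
    {"id": "cpd-1", "features": {"desc_a": 1.0, "desc_b": 0.2}, "properties": {"active": 1}},
    {"id": "cpd-2", "features": {"desc_a": 0.4, "desc_b": 0.9}, "properties": {"active": 0}},
    {"id": "cpd-3", "features": {"desc_a": 0.7, "desc_b": 0.5}, "properties": {"active": 1}},
]


def build_matrices(objects, feature_names, property_names):
    """Return (X, Y) as lists of rows, one row per object.

    X holds the independent variables (features), Y the dependent ones.
    """
    x = [[obj["features"][name] for name in feature_names] for obj in objects]
    y = [[obj["properties"][name] for name in property_names] for obj in objects]
    return x, y


X, Y = build_matrices(objects, ["desc_a", "desc_b"], ["active"])
print(len(X), len(X[0]))  # number of objects, number of features
print(len(Y), len(Y[0]))  # number of objects, number of properties
```

Fixing the column order through explicit name lists keeps X and Y aligned row by row, so any later analysis can assume that row i of X and row i of Y describe the same object.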

Multivariate data analysis usually starts with generating a set of spectra and the corresponding chemical structures as a result of a spectrum similarity search in a spectrum database. The peak data are transformed into a set of spectral features and the chemical structures are encoded into molecular descriptors [80]. A spectral feature is a property that can be automatically computed from a mass spectrum. Typical spectral features are the peak intensity at a particular mass/charge value, or logarithmic intensity ratios. The goal of transformation of peak data into spectral features is to obtain descriptors of spectral properties that are more suitable than the original peak list data. [Pg.534]
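The two kinds of spectral feature named above, the peak intensity at a particular mass/charge value and a logarithmic intensity ratio, can be sketched as follows. This is a minimal illustration under stated assumptions: the peak list is a mapping from integer m/z to intensity, intensities are normalized to a base peak of 100, and a floor value of 1.0 replaces missing or very small peaks before the logarithm is taken. The function names, the floor, and the example peak list are hypothetical, not taken from the source.

```python
import math


def normalize_to_base_peak(peaks):
    """Scale intensities so that the largest peak has intensity 100."""
    base = max(peaks.values())
    return {mz: 100.0 * intensity / base for mz, intensity in peaks.items()}


def intensity_at(peaks, mz):
    """Feature: intensity at one m/z value (0.0 if the peak is absent)."""
    return peaks.get(mz, 0.0)


def log_intensity_ratio(peaks, mz_a, mz_b, floor=1.0):
    """Feature: natural log of the intensity ratio of two m/z values.

    Intensities below `floor` (an assumed threshold) are raised to `floor`
    so that absent peaks do not produce log(0) or a division by zero.
    """
    a = max(peaks.get(mz_a, 0.0), floor)
    b = max(peaks.get(mz_b, 0.0), floor)
    return math.log(a / b)


# Placeholder peak list: m/z -> raw intensity.
raw_peaks = {43: 500.0, 57: 1000.0, 71: 250.0}
peaks = normalize_to_base_peak(raw_peaks)

feature_vector = [
    intensity_at(peaks, 57),
    intensity_at(peaks, 99),
    log_intensity_ratio(peaks, 43, 71),
]
print(feature_vector)
```

Each spectrum then contributes one fixed-length feature vector, regardless of how many peaks its original peak list contained.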

Spectral features and their corresponding molecular descriptors are then analyzed with mathematical techniques of multivariate data analysis, such as principal component analysis (PCA) for exploratory data analysis or multivariate classification for the development of spectral classifiers [84-87]. Principal component analysis results in a scatter plot that exhibits spectra-structure relationships by clustering objects that are similar in their spectral and/or structural features [88, 89]. [Pg.534]

Chemoinformatics (or cheminformatics) deals with the storage, retrieval, and analysis of chemical and biological data. Specifically, it involves the development and application of software systems for the management of combinatorial chemical projects, rational design of chemical libraries, and analysis of the obtained chemical and biological data. The major research topics of chemoinformatics involve QSAR and diversity analysis. Researchers in the field need to address several important issues. First, chemical structures should be characterized by calculable molecular descriptors that provide a quantitative representation of chemical structures. Second, special measures should be developed on the basis of these descriptors in order to quantify structural similarities between pairs of molecules. Finally, adequate computational methods should be established for the efficient sampling of the huge combinatorial structural space of chemical libraries. [Pg.363]
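One widely used example of a descriptor-based similarity measure is the Tanimoto coefficient on binary fingerprints. The excerpt does not name a specific measure, so this choice is an assumption made for illustration. In the sketch below each fingerprint is represented as a set of "on" bit positions; the bit positions are arbitrary placeholders, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits.

    T = |A & B| / |A | B|, ranging from 0.0 (no shared bits) to 1.0
    (identical fingerprints). Two empty fingerprints give 0.0 by the
    convention assumed in this sketch.
    """
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0
    return len(bits_a & bits_b) / union


# Placeholder fingerprints: sets of on-bit indices.
fp_query = {1, 2, 3, 4}
fp_candidate = {3, 4, 5, 6}

print(tanimoto(fp_query, fp_candidate))  # 2 shared bits out of 6 distinct bits
print(tanimoto(fp_query, fp_query))      # 1.0
```

Ranking a library by this coefficient against a query fingerprint is one simple way to quantify structural similarity between pairs of molecules.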

Schuffenhauer A, Gillet VJ, Willett P. Similarity searching in files of 3D chemical structures: analysis of the BIOSTER database using 2D fingerprints and molecular field descriptors. J Chem Inf Comput Sci 2000;40:295-307. [Pg.208]

In general, the described techniques provide an effective, flexible, and relatively fast solution for library design based on analysis of bioscreening data. The quantitative relationships, based on the assessment of contribution values of various molecular descriptors, not only permit the estimation of potential biological activity of candidate compounds before synthesis but also provide information concerning the modification of the structural features necessary for this activity. Usually these techniques are applied in the form of computational filters for constraining the size of virtual combinatorial libraries and... [Pg.365]

The concept of property space, which was coined to quantitatively describe phenomena in the social sciences [11, 12], has found many applications in computational chemistry to characterize chemical space, i.e. the range in structure and properties covered by a large collection of different compounds [13]. The usual approach to a quantitative description of chemical space is first to calculate a number of molecular descriptors for each compound and then to use multivariate analyses such as principal component analysis (PCA) to build a multidimensional hyperspace where each compound is characterized by a single set of coordinates. [Pg.10]
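The two-step procedure described above (descriptors first, then PCA to obtain coordinates for each compound) can be sketched in plain Python. This is a simplified illustration, not a full PCA: it extracts only the first principal component, using power iteration on the covariance matrix, and the descriptor values are placeholders. A production analysis would normally scale the descriptors and use a linear-algebra library to obtain all components.

```python
import math


def center(rows):
    """Subtract the column mean from every descriptor column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered data (rows = compounds)."""
    n = len(centered)
    dims = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration.

    Starts from a vector of ones; this fails only in the unlikely case
    that the start vector is orthogonal to the leading eigenvector.
    """
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        nxt = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


def project(centered, component):
    """Coordinate (score) of every compound along the given component."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]


# Placeholder descriptor table: one row per compound, two descriptors.
descriptors = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered = center(descriptors)
pc1 = first_component(covariance(centered))
scores = project(centered, pc1)
print(pc1)
print(scores)
```

In this toy table the second descriptor is exactly twice the first, so a single component captures all of the variance and each compound is located by one coordinate along it.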

On the other hand, there is considerable interest in quantifying the similarities between different molecules, in particular in pharmacology [7]. For instance, the search for a new drug may include a comparative analysis of an active molecule with a large molecular library by using combinatorial chemistry. A computational comparison based on the similarity of empirical data (structural parameters, molecular surfaces, thermodynamical data, etc.) is often used as a prescreening step. Because the DFT reactivity descriptors measure intrinsic properties of a molecular moiety, they are in fact chemical fingerprints of molecules. These descriptors establish a useful scale of similarity between the members of a large molecular family (see in particular Chapter 15) [18-21]... [Pg.332]

Menziani, M.C., Montorsi, M., De Benedetti, P.G. and Karelson, M. (1999) Relevance of theoretical molecular descriptors in quantitative structure-activity relationship analysis of alpha1-adrenergic receptor antagonists. Bioorganic & Medicinal Chemistry, 7, 2437-2451. [Pg.192]

In our study we compare two diversity-driven design methods (uniform cell coverage and clustering), two analysis methods motivated by similarity (cell-based analysis and cluster-classification), and two descriptor sets (BCUT and constitutional). Thus, our study addresses some of the many questions arising in a sequential screen: how to choose the initial screen, how to analyze the structure-activity data, and what molecular descriptor set to use. The study is limited to one assay and thus cannot be definitive, but it at least provides preliminary insights and reveals some trends. [Pg.308]

Partial least squares regression analysis (PLS) has been used to predict intensity of sweet odour in volatile phenols. This is a relatively new multivariate technique, which has been of particular use in the study of quantitative structure-activity relationships. In recent pharmacological and toxicological studies, PLS has been used to predict activity of molecular structures from a set of physico-chemical molecular descriptors. These techniques will aid understanding of natural flavours and the development of synthetic ones. [Pg.100]

Bodor et al. [42] compare the use of artificial neural networks with regression analysis techniques for the development of predictive solubility models. They report that the performance of the neural network model is superior to that of the regression-based model. Their study is based on a training set of 331 compounds. The model requires a diverse set of molecular descriptors to account for the structural variety in the training compounds. [Pg.128]

The QSAR models can be used to estimate the treatability of organic pollutants by SCWO. For two chemical classes, aliphatic and aromatic compounds, the best correlation exists between the kinetic rate constants and the EHOMO descriptor (the energy of the highest occupied molecular orbital). The QSAR models are compiled in Table 10.13. By analyzing the dependence of the kinetic parameters on molecular descriptors, it is possible to establish a QSAR model for predicting SCWO degradation rate constants of organic compounds with similar molecular structure. This analysis may provide an insight into the kinetic mechanism that operates in this technology. [Pg.433]
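A single-descriptor QSAR of the kind described above can be sketched as an ordinary least-squares line. The sketch assumes a linear relation between the logarithm of the rate constant and EHOMO; the excerpt does not state the functional form, and the actual models are those compiled in Table 10.13 of the source. The numbers below are synthetic, exactly linear placeholders used only to exercise the code. They are not measured values.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x_new):
    """Predicted response for a new descriptor value."""
    return slope * x_new + intercept


# SYNTHETIC placeholder data (not measurements): descriptor values and
# log rate constants constructed to lie exactly on ln k = 2 * E + 20.
e_homo = [-10.0, -9.5, -9.0, -8.5]
ln_k = [0.0, 1.0, 2.0, 3.0]

slope, intercept = fit_line(e_homo, ln_k)
print(slope, intercept)
print(predict(slope, intercept, -9.25))
```

Such a fitted line should only be applied to compounds from the same chemical class as the training set, in keeping with the restriction to similar molecular structures stated above.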
