Descriptor chemical/structural

Over and beyond the representations of chemical structures presented so far, there are others for specific applications. Some of the representations discussed in this section, e.g., fragment coding or hash coding, can also be seen as structure descriptors, but this is a more philosophical question. Structure descriptors are introduced in Chapter 8. [Pg.70]

A particularly good selection of physical properties may be spectra, because they are known to depend strongly on the chemical structure. In fact, different types of spectra carry different kinds of structural information: NMR spectra, for instance, characterize individual carbon atoms in their molecular environment. They therefore correspond quite closely to fragment-based descriptors, as underlined by the success of approaches that predict NMR spectra from fragment codes (see Section 10.2.3). [Pg.431]

For example, the objects may be chemical compounds. The individual components of a data vector are called features and may, for example, be molecular descriptors (see Chapter 8) specifying the chemical structure of an object. For statistical data analysis, these objects and features are represented by a matrix X, which has a row for each object and a column for each feature. In addition, each object will have one or more properties that are to be investigated, e.g., a biological activity of the structure or a class membership. This property or these properties are collected into a matrix Y. Thus, the data matrix X contains the independent variables, whereas the matrix Y contains the dependent ones. Figure 9-3 shows a typical multivariate data matrix. [Pg.443]
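As a minimal sketch of this layout, assuming plain Python lists rather than any particular matrix library, the objects (compounds) form the rows of both X and Y, while descriptors and properties form the columns. All names and numbers below are hypothetical placeholders, not data from the text.

```python
# Hypothetical compounds (objects) and descriptors (features); values are placeholders.
compounds = ["cpd_A", "cpd_B", "cpd_C", "cpd_D"]
descriptor_names = ["d1", "d2", "d3"]
property_names = ["activity", "class"]

# X: one row per object, one column per feature (independent variables).
X = [
    [0.12, 3.0, 1.5],
    [0.40, 2.0, 0.7],
    [0.33, 5.0, 2.1],
    [0.05, 1.0, 0.2],
]

# Y: same row order as X, one column per investigated property (dependent variables);
# here a continuous activity value and a class membership coded as 0/1.
Y = [
    [5.2, 1],
    [4.1, 0],
    [6.3, 1],
    [3.8, 0],
]

def column(M, j):
    """Return column j of a row-major matrix as a list."""
    return [row[j] for row in M]

n_objects, n_features = len(X), len(X[0])
assert len(Y) == n_objects, "X and Y must have one row per object"
print(n_objects, n_features, column(Y, 1))  # 4 3 [1, 0, 1, 0]
```

Keeping X and Y as separate matrices with a shared row order is what allows a model to be fitted from the columns of X to a column of Y.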

The data analysis module of ELECTRAS has two parts. One part was designed for general statistical analysis of numerical data; the second offers a module for analyzing chemical data. The difference between the two is that the module for pure statistics applies the statistical methods or neural networks directly to the input data, whereas the module for chemical data analysis also contains methods for calculating descriptors for chemical structures (cf. Chapter 8). Descriptors, and thus structure codes, are calculated for the input structures, and the statistical methods and neural networks are then applied to these codes. [Pg.450]

As explained in Chapter 8, descriptors are used to represent a chemical structure and, thus, to provide a coding that allows electronic processing of chemical data. The example given here shows how a GA is used to find an optimal set of descriptors for classification with a Kohonen neural network. The chromosomes of the GA serve as a means of selecting the descriptors: they indicate which descriptors are used and which are rejected ... [Pg.471]
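The chromosome-as-mask idea can be sketched as follows. This is a generic binary genetic algorithm written under stated assumptions: the population size, mutation rate, one-point crossover, and "better half survives" selection are hypothetical choices, and `toy_fitness` merely stands in for the Kohonen-network classification quality used in the text.

```python
import random

def selected_columns(chromosome):
    """Indices of descriptors that are switched on (gene == 1)."""
    return [i for i, gene in enumerate(chromosome) if gene == 1]

def apply_mask(X, chromosome):
    """Keep only the descriptor columns selected by the chromosome."""
    cols = selected_columns(chromosome)
    return [[row[j] for j in cols] for row in X]

def crossover(parent_a, parent_b, rng):
    """One-point crossover of two equal-length binary chromosomes."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def evolve(fitness, n_genes, pop_size=20, generations=30, rate=0.05, seed=0):
    """Generational GA: the better half survives and breeds the other half."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)

# Hypothetical stand-in for classification quality: descriptors 0 and 2 are
# "useful"; every selected descriptor costs one unit, which favours small subsets.
def toy_fitness(chromosome):
    return 2 * (chromosome[0] + chromosome[2]) - sum(chromosome)

best = evolve(toy_fitness, n_genes=6)
print(best, selected_columns(best))
```

In a real application, the fitness call would train and evaluate a classifier on `apply_mask(X, chromosome)`, which is by far the most expensive step of each generation.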

Finding an adequate descriptor for representing chemical structures is one of the basic problems in chemical data analysis. Several methods for describing molecules, including their chemical or physicochemical properties, have been developed in recent decades [1]. [Pg.515]

Further prerequisites depend on the chemical problem to be solved. Some chemical effects have an undesired influence on the structure descriptor if the experimental data to be processed do not account for them. A typical example is the conformational flexibility of a molecule, which has a profound influence on a 3D descriptor based on Cartesian coordinates. In particular, for the application of structure descriptors with structure-spectrum correlation problems in... [Pg.517]

Multivariate data analysis usually starts with generating a set of spectra and the corresponding chemical structures as the result of a spectrum similarity search in a spectrum database. The peak data are transformed into a set of spectral features, and the chemical structures are encoded into molecular descriptors [80]. A spectral feature is a property that can be automatically computed from a mass spectrum. Typical spectral features are the peak intensity at a particular mass/charge value or logarithmic intensity ratios. The goal of transforming peak data into spectral features is to obtain descriptors of spectral properties that are more suitable than the original peak list data. [Pg.534]
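The two feature types named above can be sketched in a few lines. The peak list is hypothetical, and the small `eps` offset that avoids taking the logarithm of zero for missing peaks is an assumption of this sketch, not part of the method described in the text.

```python
import math

# Hypothetical centroided peak list: m/z -> relative intensity (base peak = 100).
peaks = {41: 35.0, 43: 100.0, 57: 20.0, 58: 60.0, 71: 5.0}

def intensity_at(peaks, mz):
    """Spectral feature: peak intensity at one m/z value (0.0 if no peak there)."""
    return peaks.get(mz, 0.0)

def log_intensity_ratio(peaks, mz1, mz2, eps=1e-3):
    """Spectral feature: log10(I(mz1) / I(mz2)); eps is an assumed guard against log(0)."""
    return math.log10((intensity_at(peaks, mz1) + eps) / (intensity_at(peaks, mz2) + eps))

# A fixed, hypothetical list of features turns every spectrum into a vector of equal length.
features = [
    intensity_at(peaks, 43),
    intensity_at(peaks, 58),
    log_intensity_ratio(peaks, 43, 58),
]
print(features)
```

Because every spectrum is mapped onto the same list of features, spectra with different numbers of peaks become comparable rows of a data matrix.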

With the development of accurate computational methods for generating 3D conformations of chemical structures, QSAR approaches that employ 3D descriptors have been developed to address the limitations of 2D QSAR techniques, e.g., their inability to distinguish stereoisomers. Examples of 3D QSAR include molecular shape analysis (MSA) [34], distance geometry [35,36], and Voronoi techniques [37]. [Pg.359]

Chemoinformatics (or cheminformatics) deals with the storage, retrieval, and analysis of chemical and biological data. Specifically, it involves the development and application of software systems for the management of combinatorial chemistry projects, the rational design of chemical libraries, and the analysis of the resulting chemical and biological data. The major research topics of chemoinformatics include QSAR and diversity analysis, and researchers must address several important issues. First, chemical structures should be characterized by calculable molecular descriptors that provide a quantitative representation of chemical structures. Second, similarity measures should be developed on the basis of these descriptors to quantify structural similarity between pairs of molecules. Finally, adequate computational methods should be established for efficiently sampling the huge combinatorial structural space of chemical libraries. [Pg.363]

The importance of methods for predicting log P from chemical structure was described in Chapter 14, which focused on fragment- and atom-based approaches. This chapter reviews property-based approaches, which fall into two main categories: (i) methods that use a three-dimensional (3D) structure representation and (ii) methods based on topological descriptors. [Pg.381]

In addition to looking for data trends in physical property space using PCA and PLS, trends in chemical structure space can be delineated by viewing nonlinear maps (NLM) of two-dimensional structure descriptors such as Unity Fingerprints or topological atom pairs, using tools such as Benchware DataMiner [42]. Two-dimensional NLM plots provide an overview of chemical structure space, and biological activity or molecular properties can be mapped onto a third and/or fourth dimension to look for trends in the dataset. [Pg.189]
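A classic NLM method is Sammon's nonlinear mapping, which places points in two dimensions so that their pairwise map distances approximate the distances in descriptor space. The sketch below is a plain gradient-descent version under stated assumptions: Euclidean distances on real-valued descriptor vectors (a fingerprint-based map would typically use a Tanimoto-derived distance instead), a random start, and a simple accept-if-better step-size rule. It is not the algorithm used by Benchware DataMiner, whose implementation is not described here.

```python
import math
import random

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def sammon_stress(D, Y):
    """Sammon stress between original distances D and 2D map coordinates Y."""
    num = norm = 0.0
    for i in range(len(Y)):
        for j in range(i + 1, len(Y)):
            if D[i][j] > 0:  # pairs of identical objects are skipped
                d = euclid(Y[i], Y[j])
                num += (D[i][j] - d) ** 2 / D[i][j]
                norm += D[i][j]
    return num / norm

def sammon_map(X, n_iter=300, step=1.0, seed=0):
    """Return 2D coordinates and the stress history (first entry = random start)."""
    n = len(X)
    D = [[euclid(X[i], X[j]) for j in range(n)] for i in range(n)]
    norm = sum(D[i][j] for i in range(n) for j in range(i + 1, n))
    rng = random.Random(seed)
    Y = [[rng.random(), rng.random()] for _ in range(n)]
    history = [sammon_stress(D, Y)]
    for _ in range(n_iter):
        # Gradient of the stress with respect to each 2D coordinate.
        grad = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j or D[i][j] == 0:
                    continue
                d = max(euclid(Y[i], Y[j]), 1e-12)
                f = -2.0 * (D[i][j] - d) / (norm * D[i][j] * d)
                for k in range(2):
                    grad[i][k] += f * (Y[i][k] - Y[j][k])
        trial = [[Y[i][k] - step * grad[i][k] for k in range(2)] for i in range(n)]
        e = sammon_stress(D, trial)
        if e < history[-1]:  # accept the move and try a larger step next time
            Y, step = trial, step * 1.2
            history.append(e)
        else:  # reject the move and shrink the step
            step *= 0.5
    return Y, history
```

The returned 2D coordinates can then be plotted, with activity or a property shown as colour or symbol size to supply the third or fourth dimension mentioned above.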

Wajima and coauthors offer an alternative approach that uses animal VD data to predict human VD [13]. Several compound descriptors, including both chemical structural elements and animal VDss values, were subjected to multiple linear regression and partial least squares analyses, with human VDss as the dependent variable to be predicted, using a dataset of 64 drugs. Methods derived in this manner were compared with simple allometry for overall accuracy. Their analyses yielded the following regressions ... [Pg.478]

Cross-compound correlation including chemical descriptors: human VD is related to animal VD and chemical structure; inputs are VDss in rat and dog and computed chemical parameters [13]... [Pg.486]

At the low end of the hierarchy are the TS descriptors. This is the simplest of the four classes: molecular structure is viewed only in terms of atom connectivity, not as a chemical entity, and thus no chemical information is encoded. Examples include path length descriptors [13], path or cluster connectivity indices [13,14], and number of circuits. The TC descriptors are more complex in that they encode chemical information, such as atom and bond type, in addition to information about how the atoms are connected within the molecule. Examples of TC descriptors include neighborhood complexity indices [23], valence path connectivity indices [13], and electrotopological state indices [17]. The TS and TC descriptors are two-dimensional descriptors, collectively referred to as TIs (Section 31.2.1). They are straightforward to derive, uncomplicated by conformational assumptions, and can be calculated very quickly and inexpensively. The 3-D descriptors encode 3-D aspects of molecular structure. At the upper end of the hierarchy are the QC descriptors, which encode electronic aspects of chemical structure. As mentioned previously, QC descriptors may be obtained using either semiempirical or ab initio calculation methods; the latter can require prohibitive calculation times, especially for large molecules. [Pg.485]
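The TS/TC distinction can be illustrated with the first-order connectivity index, computed from the hydrogen-suppressed graph as a sum over bonds of (δi δj)^(-1/2). In this sketch, plain vertex degrees give a purely connectivity-based index, while Kier-Hall-style valence deltas (δv = Zv − h for second-row atoms) add atom-type information. These two formulas are chosen as examples; they are not claimed to be exactly the indices of refs [13,14].

```python
from collections import defaultdict

def vertex_degrees(bonds):
    """Number of heavy-atom neighbours of each atom in a hydrogen-suppressed graph."""
    deg = defaultdict(int)
    for i, j in bonds:
        deg[i] += 1
        deg[j] += 1
    return deg

def first_order_connectivity(bonds, delta=None):
    """chi1 = sum over bonds (i, j) of (delta_i * delta_j) ** -0.5.

    delta=None uses plain vertex degrees, so only connectivity is encoded (TS-like).
    Passing per-atom valence deltas encodes atom type as well (TC-like).
    """
    if delta is None:
        delta = vertex_degrees(bonds)
    return sum((delta[i] * delta[j]) ** -0.5 for i, j in bonds)

# Hydrogen-suppressed graphs as bond lists (atom indices are arbitrary).
butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (1, 2), (1, 3)]
ethanol = [(0, 1), (1, 2)]             # C-C-O, same skeleton as propane
ethanol_valence = {0: 1, 1: 2, 2: 5}   # delta_v = Z_v - h: CH3 1, CH2 2, OH 5

print(round(first_order_connectivity(butane), 4))                    # 1.9142
print(round(first_order_connectivity(isobutane), 4))                 # 1.7321
print(round(first_order_connectivity(ethanol), 4))                   # 1.4142 (same as propane)
print(round(first_order_connectivity(ethanol, ethanol_valence), 4))  # 1.0233
```

The plain index distinguishes the branched from the unbranched skeleton but cannot tell ethanol from propane, because both are three-atom chains. The valence variant can, which mirrors the difference between the TS and TC classes described above.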

As illustrated in the next section, the use of biological fingerprints, such as from a BioPrint profile, provides a way to characterize, differentiate, and cluster compounds that is more relevant in terms of the biological activity of the compounds. The data also show that different in silico descriptors based on the chemical structure can produce quite different results. Thus, the selection of the in silico descriptor to be used, which can range from structural fragments (e.g. MACCS keys), through structural motifs (Daylight keys), to pharmacophore/shape keys (based on both the 2D structure via connectivity and actual 3D conformations), is very important, and some form of validation for the problem at hand should be performed. [Pg.33]

The goal of differentiation, such as via clustering using descriptors derived from chemical structures, is to produce an end result as close as possible to the result that... [Pg.37]

A classical Hansch approach and an artificial neural network approach were applied to a training set of 32 substituted phenylpiperazines characterized by their affinity for the 5-HT1A receptor (5-HT1A-R) and the generic α1-adrenergic receptor (α1-AR) [91]. The study aimed to evaluate the structural requirements for 5-HT1A/α1 selectivity. Each chemical structure was described by six physicochemical parameters and three indicator variables. The field and resonance constants of Swain and Lupton were used as electronic descriptors, and van der Waals (vdW) volumes were employed as steric parameters. The hydrophobic effects exerted by the ortho and meta substituents were measured using the Hansch π-ortho and π-meta constants [91]. The resulting models provided a significant correlation of electronic, steric, and hydrophobic parameters with the biological affinities. Moreover, it was inferred that the... [Pg.169]

It has been stated that the global LSER equation (eq. 1.55) simultaneously takes into consideration the descriptors of the analyte and the composition of the binary mobile phase, and that it can be employed more easily than the traditional local LSER model [79]. A prerequisite for applying LSER calculations is exact knowledge of the chemical structure and physicochemical characteristics of the analytes to be separated. Synthetic dyes as pollutants in wastewater and sludge meet these requirements; therefore, in these cases LSER calculations can be used to facilitate the development of an optimal separation strategy. [Pg.27]

In many cases of practical interest, no theoretically based mathematical equations exist for the relationships between x and y; we sometimes know, but often only assume, that relationships exist. Examples include modeling the boiling point or the toxicity of chemical compounds by variables derived from the chemical structure (molecular descriptors). Investigating quantitative structure-property or structure-activity relationships (QSPR/QSAR) by this approach requires multivariate calibration methods. For such purely empirical models, often with many variables, the... [Pg.117]

This example belongs to the area of quantitative structure-property relationships (QSPR), in which chemical-physical properties of chemical compounds are modeled by chemical structure data, mostly with multivariate calibration methods as described in this chapter and using molecular descriptors (Todeschini and Consonni... [Pg.186]

A set of n = 209 polycyclic aromatic compounds (PAC) was used in this example. The chemical structures were drawn manually with structure editor software; approximate 3D structures including all H atoms were generated by the software Corina (Corina 2004), and the software Dragon, version 5.3 (Dragon 2004), was applied to compute 1630 molecular descriptors. These descriptors cover a great diversity of chemical structures, and therefore many of them are irrelevant for a selected class of compounds such as the PACs in this example. In a simple variable selection, descriptors that are constant or almost constant (all but at most five values constant) and descriptors with a correlation coefficient >0.95 to another descriptor were eliminated. The remaining m = 467 descriptors were used as x-variables. The y-variable to be modeled is the Lee retention index (Lee et al. 1979), which is based on the reference values 200, 300, 400, and 500 for the compounds naphthalene, phenanthrene, chrysene, and picene, respectively. [Pg.187]
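The two filters described above can be sketched as follows. This is an illustration under assumptions, not the procedure actually run in the example: "almost constant" is read as "at most five values differ from the most frequent value", the correlation test uses |r| (the text only gives >0.95), and columns are scanned in order, so the first descriptor of a highly correlated pair is the one kept.

```python
def nearly_constant(column, max_deviating=5):
    """True if at most `max_deviating` values differ from the most frequent value."""
    mode_count = max(column.count(v) for v in set(column))
    return len(column) - mode_count <= max_deviating

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def select_descriptors(X, max_deviating=5, r_max=0.95):
    """Return indices of the descriptor columns that survive both filters."""
    columns = [[row[j] for row in X] for j in range(len(X[0]))]
    kept = []
    for j, col in enumerate(columns):
        if nearly_constant(col, max_deviating):
            continue  # constant or almost constant descriptor (also avoids zero variance)
        if any(abs(pearson(col, columns[k])) > r_max for k in kept):
            continue  # highly correlated with a descriptor already kept
        kept.append(j)
    return kept

# Tiny synthetic demo (12 objects, 5 descriptors); values are made up.
c0 = list(range(12))
c1 = [1.0] * 12                              # constant
c2 = [2 * v + 1 for v in c0]                 # r = 1 with c0
c3 = [5, 1, 4, 2, 8, 0, 7, 3, 9, 6, 11, 10]  # moderately correlated with c0
c4 = [0] * 10 + [1, 2]                       # only two values deviate
X_demo = [list(r) for r in zip(c0, c1, c2, c3, c4)]
print(select_descriptors(X_demo))  # [0, 3]
```

The greedy order matters: a different column order can keep a different member of a correlated group, so in practice one might prefer to keep the descriptor that is easier to interpret.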

Chemical structures are often characterized by binary vectors in which each vector component (with value 0 or 1) indicates absence or presence of a certain substructure (binary substructure descriptors). An appropriate and widely used similarity measure for such binary vectors is the Tanimoto index (Willett 1987), also called Jaccard similarity coefficient (Vandeginste et al. 1998). Let xA and xB be binary vectors with m components for two chemical structures A and B, respectively. The Tanimoto index fAB is given by... [Pg.269]
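For binary vectors, the Tanimoto index is conventionally computed from three counts: a, the number of components equal to 1 in both xA and xB; b, the number equal to 1 only in xA; and c, the number equal to 1 only in xB. The index is then a / (a + b + c), i.e., the number of shared substructures divided by the number present in either structure. Below is a minimal sketch in Python; the handling of two all-zero vectors is an assumption, since the index is undefined in that case.

```python
def tanimoto(x_a, x_b):
    """Tanimoto (Jaccard) index of two equal-length binary vectors: a / (a + b + c)."""
    if len(x_a) != len(x_b):
        raise ValueError("vectors must have the same number of components m")
    a = sum(1 for u, v in zip(x_a, x_b) if u == 1 and v == 1)  # substructure in both
    b = sum(1 for u, v in zip(x_a, x_b) if u == 1 and v == 0)  # only in A
    c = sum(1 for u, v in zip(x_a, x_b) if u == 0 and v == 1)  # only in B
    if a + b + c == 0:
        return 1.0  # assumed convention: two all-zero vectors count as identical
    return a / (a + b + c)

# Hypothetical 8-bit substructure descriptors for structures A and B.
x_A = [1, 1, 0, 1, 0, 0, 1, 0]
x_B = [1, 0, 0, 1, 1, 0, 1, 0]
print(tanimoto(x_A, x_B))  # 0.6
```

Components that are 0 in both vectors do not enter the index, which suits sparse substructure descriptors, where most substructures are absent from both molecules.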

Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...