Structure search with descriptors

RI is a program that supports the development and execution of rule scripts written in the rule description language (RDL) [25]. RDL combines substructure search with descriptor-oriented selection, incorporates Boolean logic, and allows the construction of a tree-like decision structure. RI was written in Delphi for MS Windows and was derived from the OASIS SAR system [26,27]. [Pg.56]
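RDL scripts are not reproduced in this text, so the following is only a minimal Python sketch of the general idea described above: Boolean combinations of substructure tests and descriptor ranges arranged as a decision tree. It is not RDL syntax or RI code. The `Molecule` record, the substructure names, the logP window, and the leaf labels are hypothetical. Substructure matches are assumed to be precomputed flags rather than searched on the molecular graph.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Union


@dataclass
class Molecule:
    """Hypothetical record: precomputed substructure flags and numeric descriptors."""
    name: str
    substructures: dict[str, bool] = field(default_factory=dict)
    descriptors: dict[str, float] = field(default_factory=dict)


Predicate = Callable[[Molecule], bool]


def has(substructure: str) -> Predicate:
    """Substructure test on a precomputed flag (no graph matching is done here)."""
    return lambda m: m.substructures.get(substructure, False)


def in_range(descriptor: str, lo: float, hi: float) -> Predicate:
    """Descriptor-oriented selection: true if lo <= value <= hi."""
    return lambda m: descriptor in m.descriptors and lo <= m.descriptors[descriptor] <= hi


def all_of(*tests: Predicate) -> Predicate:  # Boolean AND
    return lambda m: all(t(m) for t in tests)


def any_of(*tests: Predicate) -> Predicate:  # Boolean OR
    return lambda m: any(t(m) for t in tests)


def negate(test: Predicate) -> Predicate:  # Boolean NOT
    return lambda m: not test(m)


@dataclass
class Node:
    """Decision node: each branch is either another Node or a leaf label (str)."""
    test: Predicate
    yes: Union[Node, str]
    no: Union[Node, str]


def classify(node: Union[Node, str], mol: Molecule) -> str:
    """Walk the tree from the root until a leaf label is reached."""
    while isinstance(node, Node):
        node = node.yes if node.test(mol) else node.no
    return node


# Hypothetical rule: nitroaromatic without sulfonate, then check a logP window.
tree = Node(
    test=all_of(has("nitroaromatic"), negate(has("sulfonate"))),
    yes=Node(test=in_range("logP", 1.0, 5.0), yes="flag", no="review"),
    no="pass",
)
print(classify(tree, Molecule("example", {"nitroaromatic": True}, {"logP": 2.3})))  # flag
```

Representing each test as a small predicate keeps substructure tests, descriptor ranges, and Boolean operators freely composable, which is what makes a tree of rules easy to assemble.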

There are two schools of thought in model building. One method is to search for correlations amongst a vast array of physicochemical and structural variables, with no preconceived notion of mechanism. This approach avoids bias and may detect an unexpected relationship, but there is a danger of finding chance relationships that can be misleading (Topliss and Edwards, 1979). The alternative method is to suggest a physical model and make a choice of appropriate descriptors to test that model. [Pg.103]
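The risk of chance correlations can be illustrated with a small simulation (a sketch, not the analysis of Topliss and Edwards): a random "activity" for a handful of compounds is correlated with increasing numbers of purely random descriptors, and the best absolute correlation coefficient is recorded. The compound count, descriptor counts, and seed are arbitrary choices.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_chance_correlation(n_compounds, n_descriptors, seed=0):
    """Largest |r| between a random 'activity' and purely random descriptors."""
    rng = random.Random(seed)
    activity = [rng.gauss(0, 1) for _ in range(n_compounds)]
    best = 0.0
    for _ in range(n_descriptors):
        descriptor = [rng.gauss(0, 1) for _ in range(n_compounds)]
        best = max(best, abs(pearson_r(descriptor, activity)))
    return best


for n_desc in (1, 10, 100, 1000):
    print(n_desc, round(best_chance_correlation(10, n_desc), 2))
```

Because every run reuses the same seeded random stream, each larger search contains the smaller ones, so the best |r| can only grow. With only ten compounds it typically becomes large even though no descriptor carries any information.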

Similarity Search. A type of "fuzzy" structure searching in which molecules are compared by the degree of overlap they share in topological and/or physicochemical properties. Topological descriptors usually consist of substructure keys or fingerprints, in which case a similarity coefficient such as the Tanimoto coefficient is computed. For calculated properties, a simple correlation coefficient may be used. The similarity coefficient used in a similarity search can also serve in various types of cluster analysis to group similar structures. [Pg.410]
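A minimal sketch of both cases, assuming fingerprints are stored as Python sets of "on" bit positions and calculated properties as equal-length lists. The toy fingerprints and the 0.5 threshold are illustrative, not taken from any real key set or search system.

```python
from __future__ import annotations

import math


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient c / (a + b - c) for fingerprints given as sets of on-bits."""
    common = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - common
    return common / union if union else 0.0  # two empty fingerprints -> 0.0 (a convention)


def pearson(x: list[float], y: list[float]) -> float:
    """Correlation coefficient between two molecules' calculated-property vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity_search(query: set[int], database: dict[str, set[int]], threshold: float = 0.5):
    """Return (name, Tanimoto) for entries at or above threshold, most similar first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in database.items())
    return sorted((hit for hit in scored if hit[1] >= threshold), key=lambda h: h[1], reverse=True)


# Toy fingerprints (bit positions are arbitrary, not from a real key set).
database = {"A": {1, 4, 7, 9}, "B": {1, 4, 8}, "C": {1, 4, 7}}
print(similarity_search({1, 4, 7, 9}, database))  # [('A', 1.0), ('C', 0.75)]
```

For the correlation-coefficient variant, properties on very different numerical scales would normally be standardized first; otherwise the property with the largest range dominates the result.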

Many numerical values, such as the number of aromatic rings or rotatable bonds or the calculated logP, can be derived from the chemical structure. Like 1D descriptors, these numerical descriptors are often used for fast database filtering, for example, to remove molecules with unwanted lipophilicity. Because they are independent of the molecular conformation, 2D descriptors are usually well suited for similarity searches as well as for machine learning. [Pg.64]
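Filtering on such numerical descriptors reduces to a range check per descriptor. The sketch below assumes the descriptor values have already been computed by some cheminformatics toolkit and are simply hard-coded; the compound names, values, and cutoffs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    clogp: float  # calculated logP
    aromatic_rings: int
    rotatable_bonds: int


def passes_filter(r: Record, clogp_range=(-1.0, 5.0), max_aromatic_rings=4,
                  max_rotatable_bonds=10) -> bool:
    """Keep a record only if every numerical descriptor lies inside its window."""
    lo, hi = clogp_range
    return (lo <= r.clogp <= hi
            and r.aromatic_rings <= max_aromatic_rings
            and r.rotatable_bonds <= max_rotatable_bonds)


library = [
    Record("cmpd-1", 2.1, 1, 4),
    Record("cmpd-2", 6.3, 3, 5),   # too lipophilic
    Record("cmpd-3", 1.4, 2, 14),  # too flexible
]
kept = [r.name for r in library if passes_filter(r)]
print(kept)  # ['cmpd-1']
```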

Third, there is a need to integrate the basic searching facilities with the more sophisticated routines for three-dimensional structure matching described in this review. An obvious related area is the use of three-dimensional structures to derive descriptors for quantitative structure-activity relationships. This could be either an extension of the widely used two-dimensional fragments for substructural analysis studies to three-dimensional fragments, or the automatic generation of data for potency prediction, e.g., the recent work of Cramer et al. ... [Pg.254]

Figure 3 Structural alignments with discrete properties. Methods are based on discrete properties using the DG algorithm (I) or clique detection (II), as implemented in distance comparisons (DISCO) and Apex-3D. The structure representation, based on discrete properties, relies on either a single atomic descriptor (I), usually the atom type, or multiple atomic or site descriptors (II). In the first method (I), the conformational analysis is restricted to generating molecular geometries that allow a common arrangement of selected pharmacophoric moieties present in a rigid compound used as a template. In the second method (II), the conformational analysis may involve a systematic enumeration of all possible conformations of each ligand. The similarity search is directed either towards confirming a predefined pharmacophore, postulated by the modeler or derived from classical SAR as in the active analog approach (I), or towards the automated identification of pharmacophores and bioactive conformations (II)...
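The clique-detection approach mentioned in the caption can be sketched generically: build a correspondence graph whose nodes pair same-type features of two molecules and whose edges connect pairs with matching interfeature distances, then take a maximum clique as the largest consistent set of matched features. This minimal Python sketch uses hypothetical feature types, coordinates (arbitrary length units), and tolerance, and applies the Bron-Kerbosch algorithm; it is not the implementation used in DISCO or Apex-3D. It also handles only one fixed conformation per molecule, so conformational analysis is out of scope.

```python
import itertools
import math

# A pharmacophore feature is (feature_type, (x, y, z)) for one fixed conformation.


def correspondence_graph(feats_a, feats_b, tol=0.5):
    """Nodes pair same-type features of A and B; an edge joins two pairs whose
    intramolecular distances agree within tol (same units as the coordinates)."""
    nodes = [(i, j)
             for i, (type_a, _) in enumerate(feats_a)
             for j, (type_b, _) in enumerate(feats_b)
             if type_a == type_b]
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in itertools.combinations(nodes, 2):
        if i == k or j == l:
            continue  # a feature may be matched only once
        d_a = math.dist(feats_a[i][1], feats_a[k][1])
        d_b = math.dist(feats_b[j][1], feats_b[l][1])
        if abs(d_a - d_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one maximum clique as a set of nodes."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = r
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


# Hypothetical features: B repeats A's 3-4-5 triangle elsewhere, plus a decoy acceptor.
mol_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (0, 4, 0))]
mol_b = [("donor", (1, 1, 1)), ("acceptor", (1, 4, 1)), ("aromatic", (5, 1, 1)),
         ("acceptor", (10, 10, 10))]
print(sorted(max_clique(correspondence_graph(mol_a, mol_b))))  # [(0, 0), (1, 1), (2, 2)]
```

Because only distances are compared, a pharmacophore and its mirror image match equally well; a real alignment would still need a superposition step (for example, least-squares fitting of the matched points) to overlay the molecules.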

Stereochemistry. Some systems represent stereochemistry explicitly throughout all levels of Figure 2; some represent it only graphically or only with text descriptors. Some systems do not treat the stereochemistry of double bonds. Few systems treat noncarbon stereochemistry. Some systems allow structure and substructure searching with stereochemistry in query structures; others do not. Different systems that represent stereochemistry handle relative and absolute stereochemistry in different ways. No system perceives the stereochemistry implicit in the biphenyl system 1, although it can be represented graphically. [Pg.33]

Multivariate data analysis usually starts from a set of spectra and their corresponding chemical structures, obtained as the result of a similarity search in a spectral database. The peak data are transformed into a set of spectral features, and the chemical structures are encoded as molecular descriptors [80]. A spectral feature is a property that can be computed automatically from a mass spectrum. Typical spectral features are the peak intensity at a particular mass/charge value or logarithmic intensity ratios. The goal of transforming peak data into spectral features is to obtain descriptors of spectral properties that are more suitable for multivariate analysis than the original peak lists. [Pg.534]
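The transformation from a peak list to spectral features might look like the following sketch, which computes intensities normalized to the base peak at selected m/z values and log10 ratios of selected intensity pairs. The peak list, the chosen m/z values, and the floor used for absent peaks are assumptions for illustration; the feature set actually used in [80] may differ.

```python
import math


def spectral_features(peaks, masses, ratio_pairs, floor=1.0):
    """peaks: dict m/z -> intensity.
    Returns intensities normalized to the base peak (= 100) at the chosen m/z values,
    plus log10 intensity ratios for the chosen (m/z, m/z) pairs.
    Absent peaks are raised to 'floor' inside ratios so the logarithm stays finite
    (an assumption of this sketch)."""
    base = max(peaks.values())
    norm = {mz: 100.0 * i / base for mz, i in peaks.items()}
    features = {}
    for mz in masses:
        features[f"I{mz}"] = norm.get(mz, 0.0)
    for a, b in ratio_pairs:
        ia = max(norm.get(a, 0.0), floor)
        ib = max(norm.get(b, 0.0), floor)
        features[f"log(I{a}/I{b})"] = math.log10(ia / ib)
    return features


peaks = {51: 200, 77: 500, 105: 1000}  # hypothetical peak list: m/z -> abundance
print(spectral_features(peaks, masses=[77, 105, 91], ratio_pairs=[(77, 105)]))
```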

On the other hand, there is considerable interest in quantifying the similarities between different molecules, in particular in pharmacology [7]. For instance, the search for a new drug may include a comparative analysis of an active molecule against a large molecular library generated by combinatorial chemistry. A computational comparison based on the similarity of empirical data (structural parameters, molecular surfaces, thermodynamic data, etc.) is often used for prescreening. Because the DFT reactivity descriptors measure intrinsic properties of a molecular moiety, they act as chemical fingerprints of molecules. These descriptors establish a useful similarity scale among the members of a large molecular family (see in particular Chapter 15) [18-21]... [Pg.332]

Chemical Information, Irvine, CA; Tripos, Inc., St. Louis, MO), similarity searching can be carried out around a well-defined compound class using local descriptors such as atom pairs [46, 47] or topomeric shape [48, 49]. Also, ligand-based pharmacophore searches can identify follow-up compounds that are less obvious and more diverse than those found by similarity searches [30, 50-54]. The problem with the latter methods is defining the molecular shape or pharmacophore specifically enough to be useful when there are few hits within a compound class and they cannot be reliably aligned (as is often the case for NMR hits in the absence of detailed structural information). [Pg.399]
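Atom-pair descriptors record, for every pair of atoms, their two atom types and the number of bonds on the shortest path between them. The sketch below follows that idea under simplifying assumptions: molecules are given as hydrogen-depleted graphs (element list plus adjacency lists), atom types are just element plus heavy-atom degree (the published atom-pair typing is richer), and similarity is a Tanimoto variant on counts. None of this reproduces the software cited above.

```python
from collections import Counter, deque


def bond_distances(adjacency, start):
    """Topological (bond-count) distances from one atom, by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def atom_pairs(elements, adjacency):
    """Count (type_i, distance, type_j) triples over all heavy-atom pairs.
    Simplified typing (assumption): element symbol + number of heavy neighbours."""
    types = [f"{el}{len(adjacency[i])}" for i, el in enumerate(elements)]
    pairs = Counter()
    for i in range(len(elements)):
        dist = bond_distances(adjacency, i)
        for j in range(i + 1, len(elements)):
            if j in dist:  # skip atoms in disconnected fragments
                a, b = sorted((types[i], types[j]))
                pairs[(a, dist[j], b)] += 1
    return pairs


def count_tanimoto(c1, c2):
    """Tanimoto on count vectors: sum of minima over sum of maxima."""
    keys = c1.keys() | c2.keys()
    num = sum(min(c1[k], c2[k]) for k in keys)
    den = sum(max(c1[k], c2[k]) for k in keys)
    return num / den if den else 0.0


# Hydrogen-depleted graphs: ethanol C-C-O and 1-propanol C-C-C-O.
ethanol = atom_pairs(["C", "C", "O"], [[1], [0, 2], [1]])
propanol = atom_pairs(["C", "C", "C", "O"], [[1], [0, 2], [1, 3], [2]])
print(round(count_tanimoto(ethanol, propanol), 3))  # 0.286
```

Atom types are sorted inside each triple so that a pair is counted the same way regardless of which atom is visited first.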

To demonstrate the use of binary substructure descriptors and Tanimoto indices for cluster analysis of chemical structures, we consider the 20 standard amino acids (Figure 6.3) and characterize each molecular structure by eight binary variables describing the presence or absence of eight substructures (Figure 6.4). Note that in most practical applications, for instance the evaluation of results from searches in structure databases, more diverse molecular structures have to be handled, and usually several hundred different substructures are considered. Table 6.1 contains the binary substructure descriptors (variables), with value 0 if the substructure is absent and 1 if it is present in the amino acid; these numbers form the A-matrix. The binary substructure descriptors were calculated with the software SubMat (Scsibrany and Varmuza 2004), which takes as input the molecular structures in one file and the substructures in another file; all structures are in Molfile format (Gasteiger and Engel 2003), and the output is an ASCII file containing the binary descriptors. [Pg.270]
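A small Python sketch of the procedure described above: Tanimoto indices between binary descriptor rows, converted to distances 1 - T, followed by agglomerative clustering. The three substructure columns and four amino acids are illustrative choices, not the eight substructures of Figure 6.4 or the values of Table 6.1, and average linkage is an assumption; the source text here does not state which linkage is used.

```python
from itertools import combinations


def tanimoto(x, y):
    """Tanimoto index of two 0/1 vectors: bits set in both / bits set in either."""
    both = sum(a & b for a, b in zip(x, y))
    either = sum(a | b for a, b in zip(x, y))
    return both / either if either else 1.0  # two all-zero rows treated as identical


def cluster_distance(members_p, members_q, rows):
    """Average-linkage distance using d = 1 - Tanimoto."""
    total = sum(1.0 - tanimoto(rows[i], rows[j]) for i in members_p for j in members_q)
    return total / (len(members_p) * len(members_q))


def average_linkage(names, rows):
    """Agglomerative clustering; returns (cluster, cluster, distance) for each merge."""
    clusters = {i: [i] for i in range(len(rows))}
    merges = []
    while len(clusters) > 1:
        p, q = min(combinations(clusters, 2),
                   key=lambda pq: cluster_distance(clusters[pq[0]], clusters[pq[1]], rows))
        d = cluster_distance(clusters[p], clusters[q], rows)
        merges.append((sorted(names[i] for i in clusters[p]),
                       sorted(names[i] for i in clusters[q]), round(d, 3)))
        clusters[p] = clusters[p] + clusters.pop(q)
    return merges


# Illustrative columns (not those of Figure 6.4): aromatic ring, side-chain OH, side-chain SH.
names = ["Phe", "Tyr", "Ser", "Cys"]
rows = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
for merge in average_linkage(names, rows):
    print(merge)
```

Ties are broken by list order, so Phe and Tyr merge before Tyr and Ser even though both pairs are at distance 0.5.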

For this task, easily accessible properties of mixtures or pure metabolites are compared with literature data. This may be the biological activity spectrum against a variety of test organisms. Also widely used is the comparison of UV [90] or MS data and HPLC retention times with appropriate reference data collections, a method that needs only minimal amounts of material and gives reliable results. Finally, there are databases in which substructures, NMR or UV data, and a variety of other molecular descriptors can be searched by computer [91]. The most comprehensive data collection of natural compounds is the Dictionary of Natural Products (DNP) [92], which compiles metabolites from all natural sources, including plants. More appropriate for the dereplication of microbial products, however, is our own data collection (AntiBase [93]), which allows rapid identification using combined structural features and spectroscopic data, tools that are not available in the DNP. [Pg.228]
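A combined retention-time and mass lookup against a reference collection can be sketched as follows. The reference entries, their values, and the tolerances (0.2 min, 10 ppm) are hypothetical, and the mass is assumed to be the observed m/z of the molecular ion. AntiBase and the DNP provide their own search tools, which this does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    name: str
    retention_min: float  # HPLC retention time in minutes
    mz: float             # observed m/z of the molecular ion (assumption)


def dereplicate(rt, mz, references, rt_tol=0.2, ppm_tol=10.0):
    """Return names of references matching both the retention-time window (absolute)
    and the m/z window (in ppm), best m/z agreement first."""
    hits = []
    for ref in references:
        ppm = abs(mz - ref.mz) / ref.mz * 1e6
        if abs(rt - ref.retention_min) <= rt_tol and ppm <= ppm_tol:
            hits.append((ppm, ref.name))
    return [name for _, name in sorted(hits)]


# Hypothetical reference collection.
references = [
    Reference("compound-A", 12.4, 301.1071),
    Reference("compound-B", 12.5, 301.1410),
    Reference("compound-C", 18.0, 301.1072),
]
print(dereplicate(12.45, 301.1075, references))  # ['compound-A']
```

Requiring both criteria is what separates compound-A from compound-C here: their masses agree within a fraction of a ppm, but their retention times do not.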

Big Chemical Encyclopedia
