Substructure search with descriptors

RI is a program that supports the development and execution of rule scripts written in rule description language (RDL) [25]. RDL combines substructure search with descriptor-oriented selection, incorporates Boolean logic, and allows the generation of a tree-like decision structure. RI was written in Delphi under MS Windows and is derived from the OASIS SAR system [26,27]. [Pg.56]

Rule Description Language (RDL) is a language developed for the Rule Interpreter (RI) software that combines substructure search with descriptor-oriented selection. [Pg.58]
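
The combination of substructure tests, descriptor thresholds, and Boolean logic in a tree-like decision structure can be illustrated with a short sketch. This is not RDL syntax and not the RI implementation; the `Molecule` class, the substructure labels, the `logP` key, and the threshold are all hypothetical, and the substructures are assumed to have been perceived beforehand.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Molecule:
    """Hypothetical record: pre-perceived substructure labels plus descriptors."""
    name: str
    substructures: Set[str] = field(default_factory=set)
    descriptors: Dict[str, float] = field(default_factory=dict)


Predicate = Callable[[Molecule], bool]


def has_substructure(label: str) -> Predicate:
    """Substructure test: true if the label was perceived in the molecule."""
    return lambda mol: label in mol.substructures


def descriptor_below(key: str, limit: float) -> Predicate:
    """Descriptor-oriented test: true if the descriptor value is below the limit."""
    return lambda mol: mol.descriptors[key] < limit


def all_of(*preds: Predicate) -> Predicate:
    """Boolean AND over any number of predicates."""
    return lambda mol: all(p(mol) for p in preds)


def classify(mol: Molecule) -> str:
    """Two-level decision tree; labels and threshold are illustrative only."""
    aromatic_nitro = all_of(has_substructure("nitro"),
                            has_substructure("aromatic_ring"))
    if aromatic_nitro(mol):
        if descriptor_below("logP", 3.0)(mol):
            return "branch 1"
        return "branch 2"
    return "no rule fired"
```

For example, `classify(Molecule("m1", {"nitro", "aromatic_ring"}, {"logP": 1.9}))` follows the first branch, while a molecule lacking the nitro label falls through to the default.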

Stereochemistry. Some systems represent stereochemistry explicitly throughout all levels of Figure 2; some represent it only graphically, or only with text descriptors. Some systems do not treat the stereochemistry of double bonds. Few systems treat noncarbon stereochemistry. Some systems allow structure and substructure searching with stereochemistry in query structures; others do not. Different systems that represent stereochemistry have different ways of handling relative and absolute stereochemistry. No systems perceive the stereochemistry implicit in the biphenyl system 1, although it can be represented graphically. [Pg.33]

A similarity search provides a way forward by retrieving the structures that are similar, but not identical, to a lead compound (94). It therefore overcomes some limitations of substructure search: for example, it does not require specific knowledge about the substructures responsible for activity, and it can rank the output structures according to their overall similarity. The search query usually involves a set of descriptors that collectively specify the whole structure of the lead compound. This set of descriptors is compared with the corresponding set of descriptors for... [Pg.67]
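
A minimal sketch of such a ranked search follows. It assumes that each structure has already been reduced to a set of integer feature identifiers and that the Tanimoto coefficient is the similarity measure; the excerpt names neither, so both are assumptions, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two feature sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty descriptor sets
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def rank_by_similarity(lead: FrozenSet[int],
                       database: Dict[str, FrozenSet[int]]
                       ) -> List[Tuple[float, str]]:
    """Score every database entry against the lead, most similar first."""
    scored = [(tanimoto(lead, feats), name) for name, feats in database.items()]
    # Sort by descending score; ties are broken alphabetically by name.
    return sorted(scored, key=lambda item: (-item[0], item[1]))
```

Unlike a substructure search, nothing here requires the query to specify which part of the lead matters: every database entry receives a score, and the caller decides where to cut the ranked list.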

Fingerprints (FPs) originate from molecular structure database software, where they are used to speed up substructure searches. For this reason, routines to calculate FPs are implemented in many commercial software packages and play a historically important role as a first attempt to describe diversity. Even though their applicability as descriptors is limited, the use of FPs in library design is still widespread (see Section 9.7 for examples and comparison with other descriptors). In particular, similarity searches in huge databases can be performed very quickly using FPs. [Pg.575]
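
How a fingerprint speeds up a substructure search can be sketched as a screening step: every bit set for the query must also be set for a candidate molecule, so a single bitwise test discards most non-matching structures before any atom-by-atom matching. The sketch assumes fragments are given as strings and hashed onto a 64-bit mask with CRC32; the fragment strings, the bit width, and the function names are illustrative assumptions, not the scheme of any particular package.

```python
import zlib
from typing import Iterable

N_BITS = 64  # assumed fingerprint length, kept small for illustration


def fingerprint(fragments: Iterable[str]) -> int:
    """Hash each fragment string onto one bit of an integer bitmask."""
    fp = 0
    for frag in fragments:
        fp |= 1 << (zlib.crc32(frag.encode("utf-8")) % N_BITS)
    return fp


def may_contain(query_fp: int, mol_fp: int) -> bool:
    """Screen: True if every query bit is also set in the molecule.

    False means the molecule cannot contain the query. True only means it
    might (bit collisions are possible), so survivors still need a full
    atom-by-atom substructure match.
    """
    return (query_fp & mol_fp) == query_fp
```

The screen never rejects a true match, because a molecule that contains the query also contains all of the query's fragments and therefore sets all of its bits.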

Static atomic properties are helpful to simplify interpretation rules for RDF descriptors. The product p_i p_j in Equation 5.13 for a given atom pair can be easily calculated, and the relations between the heights of individual peaks can be predicted. This approach is valuable for structure or substructure search in a database of descriptors. If a descriptor is calculated for a query molecule and if molecules with similar skeleton structures exist in the database, they will be found due to the unique... [Pg.125]
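
Equation 5.13 is not reproduced in this excerpt. The sketch below assumes the commonly used radial distribution function form g(r) = sum over atom pairs i < j of p_i p_j exp(-B (r - r_ij)^2), without a scaling factor; the function name and the default smoothing parameter B are hypothetical. It shows why peak heights are predictable: at r equal to an interatomic distance, an isolated peak has a height equal to the property product of that atom pair.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rdf_descriptor(coords: Sequence[Point],
                   props: Sequence[float],
                   r_values: Sequence[float],
                   B: float = 100.0) -> List[float]:
    """Evaluate the assumed RDF form at each distance in r_values.

    coords : 3D atom coordinates
    props  : one static atomic property p_i per atom
    B      : smoothing parameter; larger values give narrower peaks
    """
    if len(coords) != len(props):
        raise ValueError("coords and props must have the same length")
    descriptor = []
    for r in r_values:
        g = 0.0
        for i, j in combinations(range(len(coords)), 2):
            r_ij = math.dist(coords[i], coords[j])  # interatomic distance
            g += props[i] * props[j] * math.exp(-B * (r - r_ij) ** 2)
        descriptor.append(g)
    return descriptor
```

Two database entries can then be compared by sampling the function on the same grid of r values and comparing the resulting vectors.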

Descriptors based on pattern functions are helpful tools for a quick recognition of substructures. A pattern-search algorithm based on binary pattern descriptors can then be used for substructure search. However, patterns and other characteristics of descriptors that seem to indicate unique features should be investigated carefully. With these descriptors 3D similarity searches for complete structures or substructures in large databases are possible and computationally very efficient. In addition, descriptors can serve as the basis for a measure for the diversity of compounds in large data sets, a topic that is of high interest in combinatorial chemistry. [Pg.162]

CHEMLINE (CHEMical dictionary on-LINE) is a file of chemical descriptors created by NLM's Toxicology Information Program in collaboration with Chemical Abstracts Service (CAS). This file contains nearly 500,000 chemical substance names representing over 246,000 unique substances. Because of CHEMLINE's unique file design, it has capabilities which support both full structure and substructure searching. [Pg.58]

To demonstrate the use of binary substructure descriptors and Tanimoto indices for cluster analysis of chemical structures, we consider the 20 standard amino acids (Figure 6.3) and characterize each molecular structure by eight binary variables describing presence/absence of eight substructures (Figure 6.4). Note that in most practical applications (for instance, evaluation of results from searches in structure databases) more diverse molecular structures have to be handled and usually several hundred different substructures are considered. Table 6.1 contains the binary substructure descriptors (variables) with value 0 if the substructure is absent and 1 if the substructure is present in the amino acid; these numbers form the A-matrix. Binary substructure descriptors have been calculated by the software SubMat (Scsibrany and Varmuza 2004), which requires as input the molecular structures in one file and the substructures in another file. All structures are in Molfile format (Gasteiger and Engel 2003); output is an ASCII file with the binary descriptors. [Pg.270]
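
The step from a binary descriptor matrix to the input of a cluster analysis can be sketched as follows. The rows used in the example are invented toy data, not the values of Table 6.1, and the function names are hypothetical; the distance is taken as one minus the Tanimoto index, which is a common choice but an assumption here.

```python
from typing import Dict, Sequence, Tuple


def tanimoto_binary(x: Sequence[int], y: Sequence[int]) -> float:
    """Tanimoto index of two 0/1 vectors: shared 1s / positions with any 1."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 0.0


def distance_matrix(rows: Dict[str, Sequence[int]]
                    ) -> Dict[Tuple[str, str], float]:
    """Pairwise Tanimoto distances (1 - index) for all rows of the matrix."""
    names = list(rows)
    return {(p, q): 1.0 - tanimoto_binary(rows[p], rows[q])
            for p in names for q in names}


# Toy matrix: three hypothetical structures, eight substructure variables.
toy = {
    "S1": [1, 1, 0, 0, 1, 0, 0, 0],
    "S2": [1, 1, 0, 0, 0, 0, 0, 0],
    "S3": [0, 0, 1, 1, 0, 0, 0, 1],
}
distances = distance_matrix(toy)
```

A hierarchical clustering routine would take `distances` as its input; in the toy data S1 and S2 share two of three occupied positions and would merge first, while S3 shares none with either.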

Searches were carried out as described previously, and the results are shown in Table 4. Inspection of these results reveals the general superiority of the circular substructure descriptors (with the notable exception of FCFP_2), with the ECFP_4 fingerprints being the best for virtual screening of the sort advocated here. [Pg.142]

For this task, easily accessible properties of mixtures or pure metabolites are compared with literature data. This may be the biological activity spectrum against a variety of test organisms. Also widely used is the comparison of UV [90] or MS data and HPLC retention times with appropriate reference data collections, a method which needs only minimal amounts and affords reliable results. Finally, there are databases in which substructures, NMR or UV data, and a variety of other molecular descriptors can be searched by computer [91]. The most comprehensive data collection of natural compounds is the Dictionary of Natural Products (DNP) [92], which compiles metabolites from all natural sources, including plants. More appropriate for dereplication of microbial products, however, is our own data collection (AntiBase [93]), which allows rapid identification using combined structural features and spectroscopic data, tools that are not available in the DNP. [Pg.228]

Similarity Search. A type of "fuzzy" structure searching in which molecules are compared with respect to the degree of overlap they share in terms of topological and/or physicochemical properties. Topological descriptors usually consist of substructure keys or fingerprints, in which case a similarity coefficient like the Tanimoto coefficient is computed. In the case of calculated properties, a simple correlation coefficient may be used. The similarity coefficient used in a similarity search can also be used in various types of cluster analysis to group similar structures. [Pg.410]
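
For the calculated-property case, the correlation coefficient can be sketched as below. The excerpt does not say which correlation coefficient is meant; the Pearson coefficient is assumed here, the function name is hypothetical, and the property vectors are assumed to list the same properties in the same order.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long property vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (spread_x * spread_y)
```

Values near 1 indicate property profiles that rise and fall together. Because properties on very different scales can dominate the result, the vectors are usually scaled before comparison.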

Third, there is a need to be able to integrate the basic searching facilities with the more sophisticated routines for three-dimensional structure matching that have been described in this review. An obvious related area is the use of three-dimensional structures to derive descriptors for quantitative structure-activity relationships. This could be either an extension of the wide use of two-dimensional fragments for substructural analysis studies to three-dimensional fragments, or the automatic generation of data for a prediction of potency, e.g., the recent work of Cramer et al. ... [Pg.254]

The team decided that the system software should be selected from a source that was well-established in the field of technical information. In addition, if one software and one command language could be used for the entire system, i.e., both text and chemical structure, it would be advantageous to the technical community. Because the thesaurus, a hierarchical list of controlled terms, was the key to the text or document file, thesaurus software was also necessary. Search software for chemicals had to have the capability to search by substructure or full structure, by name, by compound number, by molecular formula, and by class descriptor. Continuity in both systems support and staff was a very important consideration. Another criterion was that the system be kept up-to-date with enhancements resulting from ongoing research in information science. [Pg.146]
