Substructure Searching Process

To obtain an effective algorithm for substructure searching, the factorial complexity of the brute-force algorithm has to be drastically decreased. In the next sections we discuss several approaches whose combination leads to a much more effective and applicable approach for substructure searching. In the process of searching for an isomorphism between Gq and a substructure of Gx, partial mappings Gq → Gx can be used as well. In these cases, not all atoms of Gq are mapped, and for those that are not, the array value Mj is set to 0. [Pg.297]
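The partial-mapping idea can be sketched as a simple backtracking search. This is a minimal illustration under stated assumptions, not the specific algorithm of the source. Molecules are hypothetical dicts from 1-based atom indices to element symbols, plus a set of bonds. Atoms are compared by element only, and bond orders are ignored. The list M plays the role of the mapping array described above: M[i] holds the Gx atom assigned to query atom i, and 0 marks an atom that is not yet mapped.

```python
def adjacency(bonds, n):
    """Neighbour sets for atoms 1..n (atoms assumed numbered contiguously)."""
    adj = {i: set() for i in range(1, n + 1)}
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def substructure_match(q_atoms, q_bonds, x_atoms, x_bonds):
    """Return [M1, ..., Mn] mapping query atoms onto Gx atoms, or None."""
    q_adj = adjacency(q_bonds, len(q_atoms))
    x_adj = adjacency(x_bonds, len(x_atoms))
    n = len(q_atoms)
    M = [0] * (n + 1)  # M[0] unused; M[i] == 0 means query atom i is unmapped

    def consistent(i, x):
        # Same element, and the Gx atom is not already used by another query atom...
        if q_atoms[i] != x_atoms[x] or x in M:
            return False
        # ...and every already-mapped neighbour of i maps to a neighbour of x.
        return all(M[k] in x_adj[x] for k in q_adj[i] if M[k] != 0)

    def extend(i):
        if i > n:
            return True  # all query atoms mapped: a substructure hit
        for x in x_atoms:
            if consistent(i, x):
                M[i] = x
                if extend(i + 1):
                    return True
                M[i] = 0  # backtrack: atom i is unmapped again
        return False

    return M[1:] if extend(1) else None
```

For example, substructure_match({1: "C", 2: "O"}, {(1, 2)}, {1: "C", 2: "C", 3: "O"}, {(1, 2), (2, 3)}) maps the query C-O onto atoms 2 and 3 of ethanol and returns [2, 3]. Each call to extend() grows the partial mapping by one atom. It abandons the mapping as soon as a bond of Gq cannot be reproduced in Gx, and this early rejection is what prunes the factorial search space in practice.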

Pre-processing concepts are a more recent development in substructure searching systems. These approaches have become popular since the mid-1980s, when the cost of storage devices (hard disks and CD-ROMs) decreased. [Pg.298]

One early step in the medicinal chemist's workflow is to search computationally for compounds similar to known actives that are available either in the internal inventory or commercially somewhere in the world, that is, to perform similarity and substructure searches on worldwide databases of available compounds. It is in the interest of every drug discovery program to develop a formal process for finding such compounds and submitting them to bioassays, both for lead generation and for analog-based lead optimization. To this end, various similarity search algorithms (both 2D and 3D) should be implemented and delivered directly to the medicinal chemist. These algorithms often prove complementary to one another in terms of the chemical diversity of the compounds they retrieve [8]. [Pg.307]

We will highlight this system by first giving a brief overview of its architecture, followed by practical examples that cover several common tasks in the drug discovery process. The goal is not to give a detailed account of the methods employed, but rather to illustrate how the system functions in practice. As examples, we present some of the most widely used chemoinformatics applications: customized database access, similarity and substructure searching, reactant selection, and library design. [Pg.67]

The algorithmic developments that support these system processes are summarized, and sample algorithms in the appendix illustrate the supporting processes for registration, substructure searching, and interconversions. [Pg.129]

With the variety of chemical substance representations (fragment codes, systematic nomenclature, linear notations, and connection tables), a diversity of approaches and techniques is used for substructure searching. Whereas unique, unambiguous representations are essential for some registration processes, this uniqueness often cannot be exploited in substructure searching. With connection tables, there is no assurance that the atoms cited in the substructure will be cited in the same order as the corresponding atoms in the structure. With nomenclature or notation representation systems, a substructural unit may be described by different terms or... [Pg.135]
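The atom-order problem can be seen with two connection tables for ethanol that differ only in numbering. In the sketch below, the helper names are hypothetical, atoms are a list of element symbols, and bonds are 0-based index pairs. The raw tables differ. A brute-force canonical form, defined here as the lexicographically smallest table over all renumberings, makes them identical, which is the property registration relies on. A substructure query is a different, smaller graph, however, so no renumbering lines its atoms up with the structure positionally, and an atom-by-atom mapping search is still needed. The O(n!) canonicalization is for illustration only.

```python
from itertools import permutations


def raw_table(atoms, bonds):
    """Connection table exactly as numbered: element tuple plus sorted bond pairs."""
    return tuple(atoms), tuple(sorted(tuple(sorted(b)) for b in bonds))


def canonical_table(atoms, bonds):
    """Lexicographically smallest connection table over all atom renumberings.
    Brute force, O(n!), so usable only for very small molecules."""
    n = len(atoms)
    best = None
    for p in permutations(range(n)):  # p[old] = new index of atom `old`
        new_atoms = [None] * n
        for old, new in enumerate(p):
            new_atoms[new] = atoms[old]
        table = raw_table(new_atoms, [(p[i], p[j]) for i, j in bonds])
        if best is None or table < best:
            best = table
    return best


# Ethanol numbered two ways: C0-C1-O2 and O0-C1-C2.
ethanol_a = (["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = (["O", "C", "C"], [(0, 1), (1, 2)])
```

Here raw_table(*ethanol_a) and raw_table(*ethanol_b) compare unequal, while the two canonical tables are equal. Practical systems use much faster canonical numbering schemes than enumerating permutations.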

Index. A secondary data field generated from one or more primary data fields to enhance the searching and retrieval of the primary data. An index in a chemical database may be a general feature of the database system, such as an Oracle index, or a chemistry-specific index, such as a tree index for substructure searching. Indexes require extra space, and they typically must be created and maintained by some administrative process in the database. [Pg.405]
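As an illustration of a chemistry-specific index, the sketch below builds a simple inverted index from fragment keys to compound identifiers. This is an assumed, simplified structure, not the tree index mentioned above, and the fragment keys are hypothetical precomputed strings. It shows both costs named in the definition: the index occupies extra space, and add_compound() must be called whenever a compound is registered to keep the index current.

```python
from collections import defaultdict


def add_compound(index, cid, keys):
    """Maintenance step: register one compound's (hypothetical) fragment keys."""
    for key in keys:
        index[key].add(cid)


def build_index(compound_keys):
    """compound_keys: dict mapping compound id -> set of fragment keys."""
    index = defaultdict(set)
    for cid, keys in compound_keys.items():
        add_compound(index, cid, keys)
    return index


def candidates(index, query_keys, all_ids):
    """Ids of compounds that contain every query key. This is a necessary
    condition for a substructure hit, not a sufficient one."""
    result = set(all_ids)
    for key in query_keys:
        result &= index.get(key, set())
    return result
```

The lookup touches only the index entries for the query's keys, instead of reading every compound record.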

Another crucial aspect of the validation process is testing how well the molecule is described and represented in the map of chemical toxicity space that the regression equation represents. If a substructural key of the test compound does not occur in the database used to build the model, the compound's toxicity is unlikely to be estimated accurately. In addition, if no compounds similar to the test compound exist, a comparison like the one made above cannot be conducted, and the performance of the model on compounds similar to the test material cannot be measured. This type of validation requires a large database and a substructure search algorithm, and it should be included in a QSAR estimate. [Pg.142]
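The first check, whether the test compound's substructural keys occur in the training database at all, can be sketched as below. The function name, the set-of-keys representation, and the min_count threshold are assumptions for illustration. The source does not specify how the check is implemented.

```python
from collections import Counter


def unsupported_keys(test_keys, training_keys, min_count=1):
    """Return {key: count} for each substructural key of the test compound that
    occurs in fewer than `min_count` training compounds (0 means never seen)."""
    # Count each key at most once per training compound.
    counts = Counter(key for keys in training_keys for key in set(keys))
    return {key: counts[key] for key in test_keys if counts[key] < min_count}
```

An empty result means every key is represented at least min_count times. A non-empty result flags the estimate as likely unreliable.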

However, a unique enumeration does not solve the problem of substructure search. Superimposing substructures on structures would require trying every possible mapping between the molecular graphs to find a graph isomorphism; this is a tedious and time-consuming process. Because search speed is always one of the most important limitations of database applications, substructure search should incorporate additional preprocessing steps that restrict the number of molecules to be compared in an atom-by-atom matching algorithm. [Pg.64]
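A common form of such preprocessing is a fingerprint screen, sketched below under stated assumptions. Each molecule carries a bit string (here a Python int) in which bit k is set when the molecule contains fragment k of a small, hypothetical key dictionary. A molecule can contain the query only if its fingerprint contains every bit of the query fingerprint. The bitwise test (query_fp & fp) == query_fp therefore discards most non-matching molecules cheaply. Survivors are only candidates and must still go through atom-by-atom matching.

```python
# Hypothetical fragment dictionary: fragment key -> bit position.
FRAGMENT_BITS = {"C": 0, "N": 1, "O": 2, "C-O": 3, "C-N": 4, "C-C": 5}


def fingerprint(keys):
    """Set one bit per fragment key present in the molecule."""
    fp = 0
    for key in keys:
        fp |= 1 << FRAGMENT_BITS[key]
    return fp


def screen(query_fp, db_fps):
    """Keep molecules whose fingerprint contains every query bit.
    Passing is necessary, not sufficient, for a substructure match."""
    return [mid for mid, fp in db_fps.items() if (query_fp & fp) == query_fp]
```

Because the screen is a handful of integer operations per molecule, it can run over a whole file before the expensive graph matching is attempted.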

Graph-theoretical algorithms and data structures provide the basis for all modern 2-D chemical information systems, which offer three main types of searching facility. Structure search involves the search of a file of compounds for the presence or absence of a specified query compound. Such a search is required when there is a need to retrieve data associated with some compound or when a new molecule is to be added to a database and one needs to establish that it is not already present (a process that is normally referred to as registration). Substructure search involves the search of a file of compounds for all molecules containing some specified query substructure, irrespective of the environment in which the query substructure occurs. Finally, similarity search involves the search of a file of compounds for those molecules that are most similar to an input query molecule, using some quantitative definition of structural similarity. These three types of retrieval mechanism are considered now. [Pg.471]
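Similarity search needs a concrete similarity measure. The sketch below assumes one widely used choice, the Tanimoto coefficient on sets of fingerprint keys, |A ∩ B| / |A ∪ B|. It ranks a hypothetical database by that score, and other coefficients or descriptors could be substituted.

```python
def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of fingerprint keys."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # two empty sets: treated as 0.0


def similarity_search(query, database, k=3):
    """Return the k (score, compound id) pairs most similar to `query`,
    highest score first; ties are broken by id for a stable order."""
    scored = [(tanimoto(query, keys), cid) for cid, keys in database.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:k]
```

Unlike structure or substructure search, the result is a ranked list rather than a yes/no answer per compound.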

Big Chemical Encyclopedia