Searching Files of Two-Dimensional Chemical Structures

The first solution uses an algorithm that transforms any connection table of a molecule into a unique, canonical form. The best known of these, the Morgan algorithm, chooses the atom numbering based on the numbers and properties of the neighbors of each atom of the structure. It is the basis of the Chemical Abstracts Service Chemical Registry System. There is also a canonicalization scheme for the SMILES notation of a chemical structure. [Pg.220]
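The neighbor-based idea behind the Morgan algorithm can be illustrated with a minimal sketch of its first stage only, the iterative computation of "extended connectivity" values. The function name `morgan_ec_values` and the adjacency-dictionary input are assumptions made for this illustration; atom and bond properties, tie-breaking, and the final numbering step of the full algorithm are omitted.

```python
def morgan_ec_values(adjacency):
    """Return an extended-connectivity (EC) value for each atom.

    adjacency maps each atom index to a list of neighbor indices
    (hydrogen-suppressed graph; illustrative input format).
    """
    # Start from the number of neighbors of each atom.
    ec = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    n_classes = len(set(ec.values()))
    while True:
        # Each atom's new value is the sum of its neighbors' current values.
        new_ec = {atom: sum(ec[n] for n in nbrs)
                  for atom, nbrs in adjacency.items()}
        new_classes = len(set(new_ec.values()))
        # Stop as soon as an iteration no longer separates more atoms,
        # and keep the values from the previous iteration.
        if new_classes <= n_classes:
            return ec
        ec, n_classes = new_ec, new_classes


# n-Pentane as a five-atom chain: 0-1-2-3-4
pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(morgan_ec_values(pentane))  # {0: 2, 1: 3, 2: 4, 3: 3, 4: 2}
```

In this example the central atom receives the highest value and symmetry-equivalent atoms tie, which is the information a canonical numbering would then build on.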

An alternative, and highly efficient, structure search uses a hashing function, a computational procedure that takes some data record, such as a connection table or unique SMILES, and converts it to the computer address at which that record is stored. Although a hashing function may map more than one record to the same address, detailed molecular comparisons then need to be carried out only for those few molecules. Because of canonicalization and hashing functions, identity searching is generally very fast. [Pg.220]
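A rough sketch of hash-based identity searching is given below. It assumes the input strings are already canonical (unique) SMILES, uses a bucket number as a stand-in for the storage address, and uses plain string equality in place of the detailed molecular comparison. The class `StructureIndex`, its methods, and the choice of SHA-1 are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict


class StructureIndex:
    """Toy identity-search index keyed on canonical SMILES strings."""

    def __init__(self, n_buckets=1024):
        self.n_buckets = n_buckets
        self.buckets = defaultdict(list)  # address -> [(smiles, record_id)]

    def _address(self, canonical_smiles):
        # Hash the record and fold it into the available address range.
        digest = hashlib.sha1(canonical_smiles.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.n_buckets

    def add(self, canonical_smiles, record_id):
        address = self._address(canonical_smiles)
        self.buckets[address].append((canonical_smiles, record_id))

    def find(self, canonical_smiles):
        # Only records sharing the query's address are compared in detail.
        candidates = self.buckets.get(self._address(canonical_smiles), [])
        return [rid for smi, rid in candidates if smi == canonical_smiles]


# A single bucket forces every record to collide, so the
# in-bucket comparison is what separates the two molecules.
index = StructureIndex(n_buckets=1)
index.add("CCO", "ethanol")
index.add("CC(=O)O", "acetic acid")
print(index.find("CCO"))  # ['ethanol']
```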

Usually the canonicalization and hashing procedures do not distinguish the stereoisomers of a molecule. However, Wipke and Dyott describe the SEMA (Stereochemically Extended Morgan Algorithm), which incorporates information relating to double bonds and tetrahedral stereocenters into the Morgan algorithm. [Pg.220]

Screens traditionally denote the presence or absence of predefined atom-, bond-, or ring-centered substructural fragments. However, one may also use subgraphs of the molecules to generate a set of molecular fingerprints to use as the screen. The screen search checks each structure for those screens present in the query substructure. For maximum effectiveness, the fragments included should occur independently and with equal frequency in the database. [Pg.221]
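As a small illustration of a fragment-based screen, the sketch below assigns one bit to each fragment and keeps only those structures whose screens contain every fragment of the query. The fragment dictionary `FRAGMENT_BITS`, the function names, and the four-molecule database are invented for this example; a real system would derive the fragments from the structures themselves and would follow the screen with an atom-by-atom match.

```python
# Hypothetical fragment dictionary: one bit position per fragment.
FRAGMENT_BITS = {"C=O": 0, "O-H": 1, "C-N": 2, "aromatic ring": 3}


def make_screen(fragments):
    """Encode a collection of fragment names as an integer bit map."""
    bits = 0
    for fragment in fragments:
        bits |= 1 << FRAGMENT_BITS[fragment]
    return bits


def screen_search(query_fragments, database):
    """Return names of structures that contain all query screens."""
    query = make_screen(query_fragments)
    return [name for name, fragments in database.items()
            if (make_screen(fragments) & query) == query]


database = {
    "acetic acid": ["C=O", "O-H"],
    "acetamide": ["C=O", "C-N"],
    "phenol": ["O-H", "aromatic ring"],
    "benzamide": ["C=O", "C-N", "aromatic ring"],
}
print(screen_search(["C=O", "C-N"], database))  # ['acetamide', 'benzamide']
```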

Recently, Wipke and Rogers described superstructure searching, which retrieves those molecules that are contained within the query structure, rather than the inverse. The same bit maps are used; the difference is that for a substructure search the bit screen of the hit must contain all of the query bits, whereas for a superstructure search the query must contain all of the bits present in the hit. This type of search has applications in computer-aided design of synthetic pathways and in building three-dimensional structures from three-dimensional fragments. [Pg.221]
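The two containment conditions can be written as a pair of bitwise tests. The following is only a sketch of the screening step, with bit maps held as Python integers; the function names and the example bit patterns are assumptions made for illustration, not taken from the work cited.

```python
def passes_substructure_screen(query_bits, hit_bits):
    """The hit's bit map must contain every bit set in the query."""
    return (query_bits & hit_bits) == query_bits


def passes_superstructure_screen(query_bits, hit_bits):
    """The query's bit map must contain every bit set in the hit."""
    return (query_bits & hit_bits) == hit_bits


query = 0b0110
database = {"a": 0b0111, "b": 0b0010, "c": 0b0110, "d": 0b1000}

sub = [k for k, bits in database.items()
       if passes_substructure_screen(query, bits)]
sup = [k for k, bits in database.items()
       if passes_superstructure_screen(query, bits)]
print(sub)  # ['a', 'c']
print(sup)  # ['b', 'c']
```

A structure with a bit map identical to the query's passes both screens, as entry "c" does here.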
