Similarity searching precision

Structure and substructure searching are very powerful ways of accessing a database, but they do assume that the searcher knows precisely the information that is needed, that is, a specific molecule or a specific class of molecules, respectively. The third approach to database searching, similarity searching, is less precise in nature because it searches the database for molecules that are similar to the user's query, without formally defining exactly how the molecules should be related (Fig. 8.3). [Pg.193]

One element of database generation that is a key consideration is whether to expand the representative compounds to include alternative tautomers, protonated and deprotonated forms of the molecule, and also to enumerate stereochemistry fully if not specified in the input. Depending on the molecules in question and the options considered, these can lead to a 10-fold increase in the size of the database to be explored. However, such an expansion is necessary if methods are used that are sensitive to such chemical precision (e.g., docking). For 3D similarity searching, it is sometimes more efficient to consider various modifications to the query, leading to multiple searches against a smaller database. [Pg.92]

It is inconvenient to have to specify two measures, i.e., recall and precision, to quantify the effectiveness of a search. The Merck group have made extensive use of the enrichment factor, i.e., the number of actives retrieved relative to the number that would have been retrieved if compounds had been picked from the database at random (12). Thus, using the notation of Table 1, the enrichment factor at some point, n, in the ranking resulting from a similarity search is given by... [Pg.55]
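The prose definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the formula from the source is elided and Table 1 is not reproduced here, so the function names, the argument `ranked_is_active` (a list of booleans in ranked order, best match first), and the cutoff `n` are hypothetical notation rather than the notation of the original text.

```python
def recall_precision(ranked_is_active, n):
    """Recall and precision for the top n compounds of a ranked list.

    ranked_is_active: list of booleans, best-ranked compound first
    (hypothetical input format chosen for this sketch).
    """
    total_actives = sum(ranked_is_active)
    if not 0 < n <= len(ranked_is_active) or total_actives == 0:
        raise ValueError("need 0 < n <= list length and at least one active")
    found = sum(ranked_is_active[:n])
    return found / total_actives, found / n


def enrichment_factor(ranked_is_active, n):
    """Actives retrieved in the top n, relative to random selection.

    Random picking of n compounds is expected to retrieve
    n * (total actives / database size) actives.
    """
    total = len(ranked_is_active)
    total_actives = sum(ranked_is_active)
    if not 0 < n <= total or total_actives == 0:
        raise ValueError("need 0 < n <= list length and at least one active")
    found = sum(ranked_is_active[:n])
    expected_at_random = n * total_actives / total
    return found / expected_at_random


# Toy ranking: 10 compounds, 3 of them active (made-up data).
ranking = [True, True, False, True, False,
           False, False, False, False, False]
print(recall_precision(ranking, 2))
print(enrichment_factor(ranking, 2))
```

An enrichment factor above 1 indicates that the ranking places actives near the top more often than random selection would; a single number of this kind is what makes it more convenient than quoting recall and precision separately.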

The two similarity searching procedures discussed thus far in this section make no attempt to align the target molecule with each of the database structures; the elimination of this step is one of the main reasons why these programs are so fast in operation. The remaining procedures, called DOCK, CLIX, and SPERM, all consider the precise alignment of the target structure in the calculation of similarity. [Pg.490]

Substructure and 3D pharmacophore searching involve the specification of a precise query, which is then used to search a database in order to identify molecules for screening. In such an approach, either a molecule matches the query or it does not. Similarity searching offers a complementary approach, in that the query is typically an entire molecule. This query molecule is compared to all molecules in the database and a similarity coefficient calculated. The top-scoring database molecules (based on the similarity coefficient) are the hits from the search. In a typical scenario the query molecule would be known to possess some desirable activity and the objective would be to identify molecules which will hopefully show the same activity. We therefore require some method for deciding how to compute the similarity between two molecules. In order to achieve this we need to choose a set of molecular descriptors for the compounds. These descriptors are then used to compute the similarity coefficient. [Pg.668]
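The workflow just described (choose descriptors, compute a similarity coefficient against every database molecule, keep the top-scoring hits) might be sketched as follows. The assumptions are loud ones: each molecule is represented as a set of integer fragment or bit identifiers, the Tanimoto coefficient is used as the similarity coefficient, and the names `tanimoto`, `similarity_search`, and the toy database are all invented for this sketch. No particular cheminformatics toolkit is implied.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two descriptor sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Two empty descriptor sets share nothing; report 0.0 rather than divide by zero.
    return len(a & b) / union if union else 0.0


def similarity_search(query, database, top_k=3):
    """Rank every database entry against the query and return the top_k hits.

    database: dict mapping a molecule name to its descriptor set
    (hypothetical layout for this sketch).
    """
    scored = [(tanimoto(query, descriptors), name)
              for name, descriptors in database.items()]
    # Highest coefficient first; ties broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:top_k]]


# Made-up descriptor sets standing in for real fingerprints.
database = {
    "mol_A": {1, 2, 3, 4},
    "mol_B": {1, 2, 5},
    "mol_C": {7, 8},
}
query = {1, 2, 3, 4}
print(similarity_search(query, database, top_k=2))
```

Unlike a substructure search, every database molecule receives a score here, so the result is a ranking rather than a match or no-match decision; where to cut the ranked list is left to the searcher.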

While we portray the structure of matter in a hierarchical scheme, we also recognise that searching the structures is done at various levels of precision: exact match, non-exact match with false positives allowed, non-exact match with false negatives allowed, fuzzy searches, heuristic searches, and similarity searches. Similarity searches themselves can be done at many adjustable levels, and similarity itself is multidimensional. [Pg.12]

CAS has developed a molecular property similarity search capability that will allow the rapid scanning of millions of substances with good precision. The technique can be combined with other techniques such as topological similarity searching to provide more of a structure focus or it can act as a screening step for a subsequent, more computationally intensive technique to accomplish, for example, electrostatic similarity. [Pg.303]

Basic conditions for efficient structure elucidation are the collection of spectroscopic data in centralized databases and easy access by all spectroscopic laboratories. Besides the classical similarity searches for structures and spectra, the prediction of data is of increasing importance. Database-supported spectrum predictions not only have the advantage of being very precise but they also enhance their precision automatically when new data are introduced. Another application of structure-oriented spectral databases is their use in partially or fully automated structure elucidation software. [Pg.2632]

Whatever the precise fragment definition that is used, the resulting molecular representations are input to dictionary- or fingerprint-based procedures analogous to those described previously for 2D similarity searching, with the final global similarity measure being obtained by means of a Tanimoto-like calculation. [Pg.2752]

A difficulty arises in describing the precise chemical nature of many inhibitor formulations that are actually used in practice. With the advancing technology of inhibitor applications there are an increasing number of formulations that are marketed under trade names. The compositions of these are, for various reasons, frequently not disclosed. A similar problem arises in describing the composition of many inhibitor formulations used in the former Soviet Union. Here the practice is to use an abbreviated classification system and it is often difficult to trace the actual composition, although in many cases a judicious literature search will provide the required information. [Pg.785]

Because of the high precision with which the frequencies of the interstellar lines can be measured (better than 1 part in 10s) there remains usually little doubt about the positive identification of the molecular species, despite the fact that only a few transitions out of the whole rotational spectrum of any one given molecule have been observed to date in the radio frequency range. Confirmation is obtained from observations of other rotational transitions, or from the detection of possible fine-structure components, or from observations of corresponding transitions of isotopically substituted species. However, some uncertainty still remains in the identification of formic acid, HCOOH, whose 1₁₀-1₁₁ transition is located in between two 18OH resonances. An independent search for the 1₀₁-0₀₀ transition for formic acid was negative (Snyder and Buhl, 1972). Similarly the identification of H2S and H2O still rests on only one observed interstellar radio transition and awaits further confirmation by the detection of other transitions. [Pg.39]

The ability of MS/MS to search for classes of compounds in a mixture will be as valuable in food and flavor analyses as it is in other complex mixture analyses, such as the pharmaceutical or environmental fields. Flavors are complex mixtures, but often consist of groups of chemically similar compounds. It is precisely the identification of these groups for which parent ion and neutral loss MS/MS experiments are particularly adept. This is a characteristic that is patently not available with GC/MS, which has been the usual method of analyses of these mixtures. [Pg.137]

As Hall and Campbell (1986) have stressed, about one-third of all human metastatic breast carcinomas regress in response to some form of endocrine therapy; yet, despite much research, there is still no reliable way of identifying this group prior to treatment. One approach has been to search for milk proteins, particularly α-lactalbumin, within breast tumors or serum. Despite much effort, Hall and collaborators were unable to find α-lactalbumin being expressed in any of the breast tumors examined (see, e.g., Hall et al., 1981). However, they did find a peptide that was similar, but not identical, to pre-α-lactalbumin. It is to be hoped that the precise nature of the peptide will be determined. [Pg.298]

Big Chemical Encyclopedia
