Database similarity search

The workflow with WODCA starts with entering a target structure, the reaction product. The software automatically performs an identity search in the database to identify suitable starting materials. If no starting materials are found, the user can start a similarity search in the database. Similarity searches include 40 different criteria, such as the following ... [Pg.234]

Figure 8.10. Database similarity search on the World Wide Web. The figure illustrates the use of the NCBI BLAST Web front end. The query sequence should be pasted from the clipboard into the large text field (where the sequence of U43746 is shown in this figure). Other essential elements of the search are the name of the search program and the database, both of which may be selected from drop-down lists. Additional optional parameters may be set if desired. In addition to this "Advanced BLAST" form, there is also a "Basic BLAST" form in which the advanced options are hidden. In either case, simply press the Submit Query button to begin the search.

The protein sequence database is also a text-numeric database with bibliographic links. It is the largest public domain protein sequence database. The current PIR-PSD release 75.04 (March, 2003) contains more than 280 000 entries of partial or complete protein sequences with information on functionalities of the protein, taxonomy (description of the biological source of the protein), sequence properties, experimental analyses, and bibliographic references. Queries can be started as a text-based search or a sequence similarity search. PIR-PSD contains annotated protein sequences with a superfamily/family classification. [Pg.261]

Similarity searching is the database implementation of the similarity concept. Some of the steps involved in similarity searching are overviewed next, in the context of chemoinformatics. [Pg.310]

Following the "similar structure - similar property principle", high-ranked structures in a similarity search are likely to have physicochemical and biological properties similar to those of the target structure. Accordingly, similarity searches play a pivotal role in database searches related to drug design. Some frequently used distance and similarity measures are illustrated in Section 8.2.1. [Pg.405]

Multivariate data analysis usually starts with generating a set of spectra and the corresponding chemical structures as a result of a spectrum similarity search in a spectrum database. The peak data are transformed into a set of spectral features and the chemical structures are encoded into molecular descriptors [80]. A spectral feature is a property that can be automatically computed from a mass spectrum. Typical spectral features are the peak intensity at a particular mass/charge value, or logarithmic intensity ratios. The goal of transformation of peak data into spectral features is to obtain descriptors of spectral properties that are more suitable than the original peak list data. [Pg.534]
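The two kinds of spectral feature named above can be sketched in a few lines. This is a minimal illustration, not the method of the cited work: the peak-list representation (a dict mapping integer mass/charge values to intensities), the function names, and the `floor` parameter that guards against taking the logarithm of zero are all assumptions made for the example.

```python
import math

def intensity_at(peaks, mz):
    """Peak intensity at a given mass/charge value (0.0 if no peak is listed)."""
    return peaks.get(mz, 0.0)

def log_intensity_ratio(peaks, mz_a, mz_b, floor=1.0):
    """Base-10 logarithm of the intensity ratio of two peaks.

    Intensities below `floor` (an assumed guard value) are raised to `floor`
    so that absent peaks do not produce log(0) or a division by zero.
    """
    a = max(peaks.get(mz_a, 0.0), floor)
    b = max(peaks.get(mz_b, 0.0), floor)
    return math.log10(a / b)

# Hypothetical peak list: m/z -> intensity
example_peaks = {43: 100.0, 57: 10.0, 71: 5.0}
features = [
    intensity_at(example_peaks, 43),
    log_intensity_ratio(example_peaks, 43, 57),
]
```

Features computed this way form a fixed-length vector per spectrum, which is the form most multivariate methods expect, whereas raw peak lists vary in length from spectrum to spectrum.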

Downs G M, P Willett and W Fisanick 1994. Similarity Searching and Clustering of Chemical Structure Databases using Molecular Property Data. Journal of Chemical Information and Computer Sciences 34 1094-1102. [Pg.523]

Downs G M and Peter Willett 1995. Similarity Searching in Databases of Chemical Structures. In Lipkowitz K B and D B Boyd (Editors) Reviews in Computational Chemistry Volume 7. New York, VCH Publishers, pp. 1-66. [Pg.735]

Structure and substructure searching are very powerful ways of accessing a database, but they do assume that the searcher knows precisely the information that is needed, that is, a specific molecule or a specific class of molecules, respectively. The third approach to database searching, similarity searching, is less precise in nature because it searches the database for molecules that are similar to the user's query, without formally defining exactly how the molecules should be related (Fig. 8.3). [Pg.193]

Similarity searching requires the specification of an entire molecule, called the target structure or reference structure, rather than the partial structure that is required for substructure searching. The target molecule is characterized by a set of structural features, and this set is compared with the corresponding sets of features for each of the database structures. Each such comparison enables the calculation of a measure of similarity between the... [Pg.193]
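The comparison of feature sets described above can be sketched as follows. The excerpt does not say which similarity measure is used; the Tanimoto coefficient (shared features divided by total distinct features) is assumed here because it is a common choice for set-based comparisons. The feature labels, molecule names, and function names are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    a, b = set(features_a), set(features_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def rank_by_similarity(target_features, database):
    """Score every database structure against the target, most similar first.

    `database` maps a structure name to its set of structural features.
    """
    scored = [(name, tanimoto(target_features, feats))
              for name, feats in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical feature sets for a target and three database structures
target = {"aromatic_ring", "carboxylic_acid", "hydroxyl"}
database = {
    "mol_1": {"aromatic_ring", "carboxylic_acid"},
    "mol_2": {"aromatic_ring", "amine"},
    "mol_3": {"alkyl_chain"},
}
ranking = rank_by_similarity(target, database)
```

The output is a ranked list rather than a yes/no match, which is what distinguishes a similarity search from a structure or substructure search.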

Schuffenhauer A, Gillet VJ, Willett P. Similarity searching in files of 3D chemical structures: analysis of the BIOSTER database using 2D fingerprints and molecular field descriptors. J Chem Inf Comput Sci 2000 40 295-307. [Pg.208]

One early step in the workflow of the medicinal chemist is to computationally search for compounds similar to known actives that are either available in internal inventory or commercially available somewhere in the world, that is, to perform similarity and substructure searches on the worldwide databases of available compounds. It is in the interest of all drug discovery programs to develop a formal process to search for such compounds and place them into the bioassays for both lead generation and analog-based lead optimization. To this end, various similarity search algorithms (both 2D and 3D) should be implemented and delivered directly to the medicinal chemist. These algorithms often prove complementary to each other in terms of the chemical diversity of the resulting compounds [8]. [Pg.307]

Downs GM, Willett P. Similarity searches in databases of chemical structures. Rev Comput Chem 1995 7 1-66. [Pg.370]

Cramer RD, Jilek RJ, Andrews KM. Dbtop: topomer similarity searching of conventional structure databases. J Mol Graph Model 2002 20 447-62. [Pg.371]

The E-state indices may define chemical spaces that are relevant in similarity/diversity searches in chemical databases. This similarity search is based on atom-type E-state indices computed for the query molecule [55]. Each E-state index is converted to a z score, $z_i = (x_i - \mu_i)/\sigma_i$, where $x_i$ is the ith E-state atomic index, $\mu_i$ is its mean and $\sigma_i$ is its standard deviation in the entire database. The similarity was computed with the Euclidean distance and with the cosine index, and the database used was the Pomona MedChem database, which contains 21000 chemicals. Tests performed for the anti-inflammatory drug prednisone and the antimalarial drug mefloquine as query molecules demonstrated that the chemical space defined by E-state indices is efficient in identifying similar compounds from drug and drug-like databases. [Pg.103]
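The z-score standardization and the two comparison measures mentioned above can be sketched as below. This is only an illustration under stated assumptions: each molecule is represented by a plain list of index values, the standard deviation is taken as the population form (dividing by the number of database entries), and all function names and numbers are hypothetical rather than taken from the cited study.

```python
import math

def column_stats(rows):
    """Per-index mean and population standard deviation over the database."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(dims)]
    return means, stds

def z_scores(vec, means, stds):
    """Convert each index to (value - mean) / std; constant columns map to 0."""
    return [(x - m) / s if s > 0 else 0.0
            for x, m, s in zip(vec, means, stds)]

def euclidean_distance(u, v):
    """Smaller values mean more similar molecules."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_index(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical database of two molecules with two index values each
db = [[1.0, 2.0], [3.0, 6.0]]
means, stds = column_stats(db)
z_query = z_scores(db[1], means, stds)
z_other = z_scores(db[0], means, stds)
distance = euclidean_distance(z_query, z_other)
cosine = cosine_index(z_query, z_other)
```

Standardizing first keeps an index with a large numerical range from dominating the Euclidean distance.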
