Structure/substructure retrieval

Most corporate databases of chemical compounds (libraries) are of the 2D type. The databases are managed using software that allows fast registration of new structures, fast retrieval of previously stored compounds, and fast substructure searching. (For more information about chemical database management software, see www.mdl.com or www.daylight.com.)... [Pg.362]

InfoChem cooperates with several well-known scientific publishing houses in the concept development and implementation of Internet/Intranet versions of major printed chemistry reference works (MRW). InfoChem applications provide standard text search as well as tools to search by structure, substructure, and reaction. Additionally, InfoChem's global Major Reference Work application (gMRW) allows the simultaneous retrieval of structures, reactions, and text in several major reference works. Key capabilities and offerings ... [Pg.157]

John Wiley, Science of Synthesis from Thieme Verlag, Comprehensive Asymmetric Catalysis - CAC from Springer) has been developed by InfoChem and allows the retrieval of structures, reactions, and text. InfoChem's global Major Reference Works application (gMRW) makes the individual MRWs available in one application that enables simultaneous global searches over the various MRW databases. Currently, scientists can perform structure, substructure, and reaction retrieval across approximately 250,000 reactions. [Pg.159]

II. Product Summary. Jubilant Biosys has created some content products in the bioinformatics and chemoinformatics area. These products leverage Jubilant's curation services to combine extensive curated databases with structured query modules and front-ends for data retrieval. The content is for the drug discovery process, specifically in the areas of target prioritization and lead identification. The databases are available in Oracle, SD, and ISIS/Base DB formats and can be exported. The databases can be queried across text, structure, substructure, and sequences with built-in query modules. Some of the key parameters on which information is curated are ... [Pg.164]

The chemicals stored in the inventory can be searched by exact structure, substructure, or similarity [26]. Similarity searching aims at retrieving compounds that are similar to a query compound by one or more measures of similarity. A set of structural features of the target molecule is compared with those of each chemical in the database, generating a similarity measure by a chosen metric such as the Tanimoto coefficient [27]. More details about chemical similarity are given below in relation to the chemical similarity tool. [Pg.761]
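The feature comparison described above can be sketched in a few lines. This is a minimal illustration, assuming each compound has already been reduced to a set of structural features; the feature names, the `inventory` contents, and the 0.5 threshold are hypothetical and not taken from the source. The Tanimoto coefficient itself is the standard ratio of shared features to features present in either set.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared features / features in either set."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty feature sets
    return len(a & b) / len(a | b)


def similarity_search(query, database, threshold=0.5):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    scored = [(name, tanimoto(query, feats)) for name, feats in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical feature sets; a real system would derive them from structures.
inventory = {
    "phenol": frozenset({"aromatic_ring", "hydroxyl"}),
    "benzoic_acid": frozenset({"aromatic_ring", "carboxyl", "hydroxyl", "carbonyl"}),
    "ethanol": frozenset({"hydroxyl", "alkyl"}),
}
query = frozenset({"aromatic_ring", "hydroxyl", "alkyl"})

results = similarity_search(query, inventory, threshold=0.5)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

Lowering the threshold widens the hit list at the cost of retrieving less closely related compounds.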

ChemSmart, and PSIDOM, offer the capability not only to draw and store chemical structures, but also to retrieve records via structure, substructure, and text search techniques. The graphic quality of the records, system expertise, and price vary considerably among these products, giving the user a wide selection to choose from. [Pg.29]

The Registry File (2D) may be searched via such services as STN (Science and Technology Network), using a variety of retrieval modes including exact structure, substructure, and similarity search. [Pg.2782]

The performance of SESAMI benefits greatly from information-rich substructures provided by INFERCNMR. Since SESAMI incorporates each of the predicted substructures in every candidate structure generated, if any one of the substructures is invalid, every candidate will be incorrect. Thus, SESAMI requires predicted substructures that are high in both information content and accuracy. Within a limited range (up to approximately 3.0 ppm), the number and size of valid substructures retrieved increases as the width of the matching tolerance increases; however, the proportion of false-positive results also increases (i.e., accuracy decreases). To strike an acceptable compromise between the two conflicting objectives, the wider tolerance was retained and a step was added to distinguish between valid and invalid predictions. [Pg.2791]

Computer-based systems like SESAMI benefit from high information content as well as high accuracy of the input to the structure generator. Network performance was therefore also measured in terms of information retrievability, in this study at an accuracy of 90% (R90 values), i.e., the percentage of the valid substructures retrieved at a prediction accuracy of 90%. A value of Y is selected (solid line) such that 90% of all predictions with output values equal to or greater than... [Pg.2793]
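One way to read the retrievability measure is sketched below. The excerpt is truncated, so the interpretation is an assumption: the cutoff Y is taken to be the lowest output value at which at least 90% of the predictions with output at or above Y are valid, and the reported figure is the percentage of all valid substructures that fall at or above that cutoff. The function name and the sample data are hypothetical.

```python
def retrievability_at_accuracy(predictions, target_accuracy=0.90):
    """predictions: list of (output_value, is_valid) pairs.

    Returns (cutoff, percent_of_valid_retrieved), or (None, 0.0) when no
    cutoff reaches the target accuracy.
    """
    total_valid = sum(1 for _, valid in predictions if valid)
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    best = (None, 0.0)
    kept = kept_valid = 0
    for i, (output, valid) in enumerate(ranked):
        kept += 1
        kept_valid += int(valid)
        # Evaluate only at the end of a run of tied outputs, so that the
        # rule "output >= cutoff" selects exactly the items counted so far.
        if i + 1 < len(ranked) and ranked[i + 1][0] == output:
            continue
        if total_valid and kept_valid / kept >= target_accuracy:
            # Lower cutoffs never retrieve fewer valid items, so the last
            # qualifying cutoff gives the highest retrievability.
            best = (output, 100.0 * kept_valid / total_valid)
    return best


# Hypothetical network outputs paired with whether the prediction was valid.
sample = [
    (0.99, True), (0.95, True), (0.90, True), (0.85, True), (0.80, True),
    (0.75, True), (0.70, True), (0.65, True), (0.60, True), (0.55, False),
    (0.50, True), (0.45, False), (0.40, True),
]
cutoff, percent_retrieved = retrievability_at_accuracy(sample)
print(cutoff, round(percent_retrieved, 1))
```

Because accuracy is not monotonic in the cutoff, the sketch scans every cutoff rather than stopping at the first failure.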

Besides structure and substructure searches, Gmelin provides a special search strategy for coordination compounds that is found in no other database: the ligand search system. This search method gives access to coordination compounds from a completely different point of view: it is possible to retrieve all coordination compounds with the same ligand environment, independently of the central atom or the empirical formula of the compound. [Pg.249]

The next abstraction level of reaction retrieval is a so-called reaction substructure search, in which both query structures are considered as substructures. In the case of a reaction substructure search, no hydrogen atoms are added internally during the execution of the search. Atoms whose valencies are not completely saturated are considered open sites, where any kind of element could be bonded. [Pg.265]

A useful empirical method for the prediction of chemical shifts and coupling constants relies on the information contained in databases of structures with the corresponding NMR data. Large databases with hundreds of thousands of chemical shifts are commercially available and are linked to predictive systems, which basically rely on database searching [35]. Protons are internally represented by their structural environments, usually their HOSE codes [9]. When a query structure is submitted, a search is performed to find the protons belonging to similar (overlapping) substructures. These are the protons with the same HOSE codes as the protons in the query molecule. The prediction of the chemical shift is calculated as the average chemical shift of the retrieved protons. [Pg.522]
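The lookup-and-average step can be sketched as follows. This is an illustration only: the keys `ENV_A` and `ENV_B` are placeholders standing in for HOSE code strings, not real codes, and the shift values are invented. The function also reports the number of retrieved protons and the spread of their shifts, which is an addition made here to give a rough sense of how consistent the retrieved values are.

```python
from statistics import mean, pstdev


def predict_shift(environment_code, reference):
    """Average the shifts of database protons that share the query's code."""
    shifts = reference.get(environment_code, [])
    if not shifts:
        return None  # no proton with this environment in the database
    return {
        "shift_ppm": mean(shifts),
        "n_protons": len(shifts),
        "spread_ppm": pstdev(shifts),
    }


# Hypothetical reference data: environment code -> observed shifts in ppm.
reference = {
    "ENV_A": [7.20, 7.30, 7.25],
    "ENV_B": [1.20, 1.30],
}

prediction = predict_shift("ENV_A", reference)
missing = predict_shift("ENV_C", reference)
print(prediction, missing)
```

A dictionary keyed by environment code keeps the search step a constant-time lookup, whatever the size of the reference database.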

The similarity of the retrieved protons to those of the query structure, and the distribution of chemical shifts among protons with the same HOSE codes, can be used as measures of prediction reliability. When common substructures cannot be found for a given proton (within a predefined number of bond spheres), interpolations are applied to obtain a prediction; proprietary methods are often used in commercial programs. [Pg.522]

To search for available starting materials, similarity searches, substructure searches, and some classical retrieval methods such as full structure searches, name searches, empirical formula searches, etc., have been integrated into the system. All searches can be applied to a number of catalogs of available fine chemicals (e.g., Fluka [54]). In addition, compound libraries such as in-house catalogs can easily be integrated. [Pg.579]

Auto Search: This button initiates, from a structure query, two or three automated series of searches: exact and substructure searches in local desktop versions; exact, substructure, and similarity searches in the network version (under ISIS/Host). All the result lists are saved in CHIRBASE under the names exact-auto, SSS-auto, and SIMXX %-auto, where XX is the highest similarity search value (from 80 % to 40 %) that retrieves hits in CHIRBASE. The records in all the lists are unique: the SSS-auto list does not include records that are in the exact-auto list, and the SIMXX %-auto list does not include records that are in the exact-auto and SSS-auto lists. [Pg.104]
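The list bookkeeping described above can be sketched as follows. This is not CHIRBASE or ISIS code: the three search callables are hypothetical stand-ins, the 10 % step between similarity levels is an assumption (the source gives only the 80 % to 40 % range), and a similarity level is treated as retrieving hits only if it returns at least one record not already in an earlier list.

```python
def auto_search(query, exact_search, substructure_search,
                similarity_search=None, sim_levels=(80, 70, 60, 50, 40)):
    """Build mutually exclusive result lists in the order exact, SSS, SIM."""
    lists = {}

    exact = list(dict.fromkeys(exact_search(query)))  # drop duplicates, keep order
    lists["exact-auto"] = exact
    seen = set(exact)

    sss = [r for r in dict.fromkeys(substructure_search(query)) if r not in seen]
    lists["SSS-auto"] = sss
    seen.update(sss)

    if similarity_search is not None:  # network version only
        for level in sim_levels:       # try the highest similarity first
            hits = [r for r in dict.fromkeys(similarity_search(query, level))
                    if r not in seen]
            if hits:
                lists[f"SIM{level} %-auto"] = hits
                break
    return lists


# Hypothetical record IDs returned by each kind of search.
records = {
    "exact": [101],
    "sss": [101, 102, 103],
    "sim": {80: [], 70: [101, 103], 60: [101, 103, 104]},
}
result = auto_search(
    "query",
    lambda q: records["exact"],
    lambda q: records["sss"],
    lambda q, level: records["sim"].get(level, []),
)
print(result)
```

Passing `similarity_search=None` reproduces the two-list behavior described for local desktop versions.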

As shown by the first prompt, there are four types of search, of which we will discuss two: exact and substructure (SSS). In an exact search, only information regarding exactly the structure given will be retrieved, but even so there may well be several answers, because CA treats stereoisomers and isotopically substituted compounds as separate answers. At the conclusion of the search the system gives the number of answers (e.g., 4). We may now look at the four answers by using the display command. As in the CA File, there is a choice of display formats, but if we choose SUB we will get (1) the Registry Number, (2) the approved CA index name, (3) other names that have appeared in CA for that compound, (4) a structural formula, and (5) the number of CA references since 1967, along with a notation as to... [Pg.1635]

Although an exact search can be useful, in most cases it does not give any more information than can be obtained from the printed CA. Substructure searches (SSS) are far more important, because there is no other way to get this information. If we do a substructure search on 4 in Figure A.1, we not only get all the answers we would get in an exact search, but all substances that contain, anywhere within their structure, the arrangement of atoms and bonds shown in 4. For example, 5, 6, 7, and 8 would all be retrieved in this search, but 9 would not be. SSS searches typically retrieve from tens to hundreds of times as many answers as exact searches of the same structure. Furthermore, the scope can be widened by the use of variable nodes. For example, the symbol X means any halogen, the symbol M any metal, and the symbol G allows the user to specify his or her own variable at that point (e.g., G = Cl or NO2 or Ph). As with an exact search, each answer can be displayed as described above. [Pg.1636]
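A substructure search with a variable node can be sketched as a small backtracking graph match. This is a toy illustration, not the algorithm used by CAS or STN: molecules are plain dictionaries of heavy atoms and bonds (hydrogens omitted), only the X node (any halogen) is implemented, and all molecule definitions are hypothetical examples unrelated to the numbered structures in the text.

```python
HALOGENS = {"F", "Cl", "Br", "I"}


def atom_matches(query_symbol, target_symbol):
    """'X' is the variable node for any halogen; other symbols must be equal."""
    if query_symbol == "X":
        return target_symbol in HALOGENS
    return query_symbol == target_symbol


def bond_order(mol, i, j):
    """Bond order between atoms i and j, or None if they are not bonded."""
    return mol["bonds"].get((min(i, j), max(i, j)))


def contains_substructure(query, target):
    """True if the query's atoms and bonds occur anywhere in the target."""
    n = len(query["atoms"])

    def extend(mapping):
        q = len(mapping)  # index of the next query atom to place
        if q == n:
            return True
        for t, symbol in enumerate(target["atoms"]):
            if t in mapping or not atom_matches(query["atoms"][q], symbol):
                continue
            # Each query bond to an already placed atom must exist in the
            # target with the same order; extra target bonds are allowed.
            ok = all(
                bond_order(query, q, p) is None
                or bond_order(query, q, p) == bond_order(target, t, mapping[p])
                for p in range(q)
            )
            if ok and extend(mapping + [t]):
                return True
        return False

    return extend([])


# Bond keys are (lower_index, higher_index); values are bond orders.
c_halogen = {"atoms": ["C", "X"], "bonds": {(0, 1): 1}}
library = {
    "chloroethane": {"atoms": ["C", "C", "Cl"], "bonds": {(0, 1): 1, (1, 2): 1}},
    "ethanol": {"atoms": ["C", "C", "O"], "bonds": {(0, 1): 1, (1, 2): 1}},
    "bromomethane": {"atoms": ["C", "Br"], "bonds": {(0, 1): 1}},
}
hits = [name for name, mol in library.items()
        if contains_substructure(c_halogen, mol)]
print(hits)
```

The M and G nodes could be handled the same way by adding further cases to `atom_matches`; production systems also screen candidates with fingerprints before attempting the atom-by-atom match.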

Current chemical information systems offer three principal types of search facility. Structure search involves the search of a file of compounds for the presence or absence of a specified query compound, for example, to retrieve physicochemical data associated with a particular substance. Substructure search involves the search of a file of compounds for all molecules containing some specified query substructure of interest. Finally, similarity search involves the search of a file of compounds for those molecules that are most similar to an input query molecule, using some quantitative definition of structural similarity. [Pg.189]

Structural data are readily available from the database, but it must be assumed that they were not originally collected with the requirements of the particular correlation in mind. It is essential that the substructure investigated be tightly defined, and that this is confirmed by careful checks of structures in the data set retrieved. (This requirement is squarely at odds with that for statistical respectability, for which the largest possible data set is generally desirable.) A last resort, if the question is important enough, is to collect new data. The structures examined can then be designed to answer specific questions. The answers to the questions will not, however, be available for months or even years. [Pg.92]

After the spectral matching process has been completed, the compounds with the top matching daughter spectra are identified and retrieved for each daughter spectrum of the reference compound. The molecular structures of the compounds with the best matching spectra are drawn and compared for common substructures. The common substructures yield candidate spectrum/substructure correlations. Additional compounds are then tested to confirm or modify each correlation. Once the daughter spectrum is correlated with one or more substructures, this daughter spectrum is stored in the spectrum data base and is linked to the associated substructures stored in the structure data base. [Pg.328]

Hagadone, T.R. Molecular substructure similarity searching: efficient retrieval in two-dimensional structure databases. [Pg.138]
