Query molecule

In order to perform a database search, the structural key of the query molecule or substructure is compared with the stored structural keys of the database entries. This implies that each array element in the structural key has to be defined initi-... [Pg.403]
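The comparison of structural keys can be illustrated with a minimal sketch. It assumes that each key is a fixed-length bit array, stored here as a Python int, in which bit k is set when a hypothetical fragment k is present. For a substructure search, an entry can only contain the query if every bit set in the query key is also set in the entry key. Entries that fail this screen are discarded before any expensive atom-by-atom matching. The fragment dictionary and the database entries below are invented for illustration.

```python
# Hypothetical fragment dictionary: bit position k -> structural feature k.
FRAGMENTS = ["C=O", "OH", "NH2", "benzene ring", "Cl"]


def make_key(features):
    """Build a structural key (an int used as a bit array) from a set of features."""
    key = 0
    for bit, fragment in enumerate(FRAGMENTS):
        if fragment in features:
            key |= 1 << bit
    return key


def passes_screen(query_key, entry_key):
    """True if every bit set in the query key is also set in the entry key."""
    return (query_key & entry_key) == query_key


# Invented database entries, each reduced to its structural key.
database = {
    "entry_1": make_key({"C=O", "OH", "benzene ring"}),
    "entry_2": make_key({"NH2", "Cl"}),
    "entry_3": make_key({"C=O", "OH"}),
}

query_key = make_key({"C=O", "OH"})
candidates = [name for name, key in database.items() if passes_screen(query_key, key)]
print(candidates)  # ['entry_1', 'entry_3']
```

The screen can only rule entries out. The candidates that pass it still need a full substructure (subgraph) match. The screen also only works if every key uses the same bit-to-fragment assignment.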

A useful empirical method for the prediction of chemical shifts and coupling constants relies on the information contained in databases of structures with the corresponding NMR data. Large databases with hundreds of thousands of chemical shifts are commercially available and are linked to predictive systems, which basically rely on database searching [35]. Protons are internally represented by their structural environments, usually their HOSE codes [9]. When a query structure is submitted, a search is performed to find the protons belonging to similar (overlapping) substructures, that is, the protons with the same HOSE codes as the protons in the query molecule. The predicted chemical shift is calculated as the average chemical shift of the retrieved protons. [Pg.522]
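The averaging step can be sketched briefly. This assumes the database has already been reduced to a mapping from HOSE code strings to the observed shifts (in ppm) of protons with that environment. The codes "HOSE_A" and similar are placeholders, not real HOSE strings, and the shift values are illustrative only. Real systems may also fall back to shorter codes (fewer spheres) when no exact match exists. That refinement is not shown here.

```python
from statistics import mean

# Hypothetical database: placeholder HOSE code -> observed 1H shifts (ppm).
shift_db = {
    "HOSE_A": [7.26, 7.30, 7.28],
    "HOSE_B": [2.10, 2.14],
}


def predict_shift(hose_code, db):
    """Predict a proton's shift as the mean of all stored shifts with the same HOSE code.

    Returns None when no proton in the database shares the code.
    """
    shifts = db.get(hose_code)
    if not shifts:
        return None
    return mean(shifts)


# Protons of a query structure, each already encoded by its (placeholder) HOSE code.
query_protons = {"H1": "HOSE_A", "H2": "HOSE_B", "H3": "HOSE_C"}
predictions = {h: predict_shift(code, shift_db) for h, code in query_protons.items()}
print(predictions)
```

H3 has no database match, so its prediction is None rather than a guessed value.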

Current chemical information systems offer three principal types of search facility. Structure search involves the search of a file of compounds for the presence or absence of a specified query compound, for example, to retrieve physicochemical data associated with a particular substance. Substructure search involves the search of a file of compounds for all molecules containing some specified query substructure of interest. Finally, similarity search involves the search of a file of compounds for those molecules that are most similar to an input query molecule, using some quantitative definition of structural similarity. [Pg.189]
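The third search type, similarity search (the kind shown in Figure 8.3), can be sketched as a nearest-neighbour ranking. This minimal example represents each fingerprint as a Python set of "on" bit positions and uses the Tanimoto coefficient as the quantitative similarity measure. The compound names and bit sets are invented for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of 'on' bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def similarity_search(query_fp, library, k):
    """Return (score, name) for the k library entries most similar to the query, best first."""
    scored = [(tanimoto(query_fp, fp), name) for name, fp in library.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Invented fingerprints for a query and a small compound file.
query = {1, 2, 3, 4, 5}
library = {
    "cpd_A": {1, 2, 3, 4, 5},
    "cpd_B": {1, 2, 3, 4, 6},
    "cpd_C": {1, 2, 7, 8},
    "cpd_D": {1, 2, 3},
    "cpd_E": {9, 10},
}

for score, name in similarity_search(query, library, k=3):
    print(f"{name} {score:.2f}")
# cpd_A 1.00
# cpd_B 0.67
# cpd_D 0.60
```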

Figure 8.3 Example of a 2D similarity search, showing a query molecule and five of its nearest neighbors. The similarity measure for the search is based on 2D fragment bit-strings and the Tanimoto coefficient.

The E-state indices may define chemical spaces that are relevant in similarity/diversity searches of chemical databases. Such a similarity search is based on the atom-type E-state indices computed for the query molecule [55]. Each E-state index is converted to a z score, $Z_i = (S_i - \mu_i)/\sigma_i$, where $S_i$ is the $i$th atom-type E-state index, $\mu_i$ is its mean and $\sigma_i$ is its standard deviation over the entire database. Similarity was computed with the Euclidean distance and with the cosine index, and the database used was the Pomona MedChem database, which contains 21,000 chemicals. Tests performed with the anti-inflammatory drug prednisone and the antimalarial drug mefloquine as query molecules demonstrated that the chemical space defined by E-state indices is efficient in identifying similar compounds in drug and drug-like databases. [Pg.103]
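The z-score normalization and the two similarity measures can be sketched as follows. This assumes each molecule is described by a vector of atom-type E-state indices in a fixed atom-type order. The three-component vectors and molecule names are made-up numbers, not values from the Pomona MedChem database. The population standard deviation is also an assumption; a real descriptor vector would have one component per atom type.

```python
import math
from statistics import mean, pstdev

# Hypothetical database: molecule -> atom-type E-state indices (fixed atom-type order).
database = {
    "mol_1": [10.0, 2.0, 0.0],
    "mol_2": [12.0, 4.0, 1.0],
    "mol_3": [8.0, 0.0, 5.0],
    "mol_4": [14.0, 6.0, 2.0],
}


def column_stats(vectors):
    """Mean and population standard deviation of each E-state index over the database."""
    columns = list(zip(*vectors))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def z_scores(vector, means, sds):
    """Z_i = (S_i - mu_i) / sigma_i; an index that never varies is mapped to 0."""
    return [(s - m) / sd if sd > 0 else 0.0 for s, m, sd in zip(vector, means, sds)]


def euclidean(u, v):
    """Euclidean distance (smaller means more similar)."""
    return math.dist(u, v)


def cosine(u, v):
    """Cosine index (larger means more similar)."""
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / norm if norm > 0 else 0.0


means, sds = column_stats(list(database.values()))
z_db = {name: z_scores(vec, means, sds) for name, vec in database.items()}
query_z = z_scores([11.0, 3.0, 1.0], means, sds)  # hypothetical query molecule

ranked_euclidean = sorted(z_db, key=lambda name: euclidean(query_z, z_db[name]))
ranked_cosine = sorted(z_db, key=lambda name: cosine(query_z, z_db[name]), reverse=True)
print("Euclidean:", ranked_euclidean)
print("Cosine:   ", ranked_cosine)
```

With these toy numbers, the two measures disagree about the nearest neighbour (mol_2 by Euclidean distance, mol_1 by cosine). The choice of measure can therefore change which compounds are retrieved.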

Atom-based methods [40, 57-68] break molecules down to single atoms and commonly do not apply correction rules. According to Eq. (9), they work by summing, over all atom types $i$, the contribution of atom type $i$ multiplied by the number of times it occurs in the query molecule ... [Pg.371]
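The summation can be written as $P = \sum_i n_i a_i$, where $n_i$ is the number of atoms of type $i$ in the query molecule and $a_i$ is the contribution of that atom type. The sketch below implements only this sum. The atom-type labels and contribution values are hypothetical placeholders, not a published parameter set. It also assumes that atom typing of the real structure, usually the hard part of such methods, has already been done.

```python
from collections import Counter

# Hypothetical contribution a_i per atom type (placeholder numbers, not a real parameter set).
CONTRIBUTIONS = {
    "C_aromatic": 0.25,
    "C_aliphatic": 0.5,
    "O_hydroxyl": -0.75,
    "N_amine": -1.0,
}


def atom_based_estimate(atom_types):
    """Sum over atom types i of n_i * a_i, with no correction terms."""
    counts = Counter(atom_types)  # n_i: occurrences of each atom type in the query molecule
    missing = set(counts) - set(CONTRIBUTIONS)
    if missing:
        raise KeyError(f"no contribution defined for atom types: {sorted(missing)}")
    return sum(n * CONTRIBUTIONS[t] for t, n in counts.items())


# A query molecule already reduced to a list of typed heavy atoms.
query = ["C_aromatic"] * 6 + ["O_hydroxyl"]
print(atom_based_estimate(query))  # 0.75
```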

The pragmatic beauty of the chemical fingerprint is that the more features two molecules have in common, the more bits they have set in common. The mathematical approach used to translate the fingerprint comparison into a measure of similarity shapes the outcome of the molecular comparison [5]. The Tanimoto similarity index works well when a relatively sparse fingerprint is used and when the molecules being compared are broadly comparable in size and complexity [5]. If the Tanimoto index does not suit the nature of the molecules or the comparison desired, many other indices are available to the researcher. For example, the Daylight software offers the user over ten similarity metrics, and Pipeline Pilot as distributed offers at least three. Some of these metrics (e.g., Tversky, cosine) behave better when the query molecule is significantly smaller than the molecule it is compared with. [Pg.94]
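The effect of a small query can be made concrete by writing the three indices in terms of bit counts: $a$ bits set in the query, $b$ bits set in the target, and $c$ bits set in both. The Tversky weights $\alpha = 0.9$ and $\beta = 0.1$ are an assumed choice that penalizes the target's extra bits only lightly. They are not a setting of Daylight or Pipeline Pilot.

```python
import math


def counts(query, target):
    """Return (a, b, c): on-bits in query, on-bits in target, on-bits in common."""
    return len(query), len(target), len(query & target)


def tanimoto(query, target):
    """c / (a + b - c)"""
    a, b, c = counts(query, target)
    return c / (a + b - c) if (a + b - c) else 0.0


def cosine(query, target):
    """c / sqrt(a * b), the cosine index for binary fingerprints."""
    a, b, c = counts(query, target)
    return c / math.sqrt(a * b) if a and b else 0.0


def tversky(query, target, alpha=0.9, beta=0.1):
    """Tversky index; alpha weights bits only in the query, beta bits only in the target."""
    a, b, c = counts(query, target)
    denom = alpha * (a - c) + beta * (b - c) + c
    return c / denom if denom else 0.0


small_query = set(range(4))     # 4 on-bits
large_target = set(range(16))   # 16 on-bits, containing all of the query's bits

for name, fn in [("Tanimoto", tanimoto), ("Cosine", cosine), ("Tversky", tversky)]:
    print(f"{name:9s}{fn(small_query, large_target):.2f}")
```

Consider a 4-bit query fully contained in a 16-bit target. This prints 0.25 for Tanimoto, 0.50 for cosine and 0.77 for Tversky. Tanimoto heavily penalizes the size difference, whereas the other two indices largely reward the fact that the whole query is present.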

As stated before, PGVL is too large to be fully enumerated in practice. Our strategy is therefore to focus, in a just-in-time manner, on much smaller sub-regions of PGVL (on the order of $10^4$ compounds), which are then enumerated on the fly and subjected to a standard similarity search against the same query molecule. A virtual compound space built from parallel synthesis reaction protocols evidently has an inherent array structure: it consists of implicit arrays of related compounds that can be enumerated just in time, and this structure can be exploited even before the molecular structures of those compounds have been enumerated. [Pg.256]

Step 1. Automatically scan all PGVL reactions for retrosynthetic feasibility of the incoming query molecule, and disassemble the query molecule into combinations of virtual reactants. [Pg.257]

Step 4. Perform standard similarity searches against those explicit virtual molecules using the query molecule. [Pg.257]

Identify the suitable reactants most similar to the corresponding virtual reactants obtained in step 1, in order to focus on the most relevant sub-regions. After step 1 alone, the disconnection does not necessarily yield bona fide, known and available starting materials. Consider as an example a two-component reaction for which PGVL has M suitable bona fide reactants for the first reaction component and N suitable bona fide reactants for the second. In this step, two similarity searches select m (out of M) and n (out of N) reactants, using as seeds the two virtual reactants that arose from the exact disconnection of the query molecule. In most cases, M and N are on the order of $10^3$, and m and n are on the order of $10^2$. Extra search parameters need to be specified and/or optimized here for each reaction component. [Pg.258]
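The reactant-selection step for a two-component reaction can be sketched as follows. Each virtual reactant from the disconnection seeds a similarity search over the bona fide reactants for its component. The top m and top n hits are kept, and their Cartesian product is the sub-region to enumerate just in time. With M and N of order $10^3$ and m and n of order $10^2$, this shrinks the combinatorial space from about $10^6$ to about $10^4$ products. Tanimoto on bit sets is an assumed choice of metric. The reactant names and fingerprints are invented, and the function names are hypothetical, not part of LEAP.

```python
import heapq
from itertools import product


def tanimoto(a, b):
    """Tanimoto coefficient for two sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_reactants(seed_fp, reactants, m):
    """Pick the m bona fide reactants most similar to a virtual-reactant seed."""
    return heapq.nlargest(m, reactants, key=lambda name: tanimoto(seed_fp, reactants[name]))


def focused_subregion(seed_1, pool_1, m, seed_2, pool_2, n):
    """Cartesian product of the selected reactants: the sub-region to enumerate just in time."""
    chosen_1 = select_reactants(seed_1, pool_1, m)
    chosen_2 = select_reactants(seed_2, pool_2, n)
    return list(product(chosen_1, chosen_2))


# Invented reactant pools (M = N = 3 here) and virtual-reactant seeds from a disconnection.
pool_1 = {"R1_a": {1, 2, 3}, "R1_b": {1, 2, 9}, "R1_c": {7, 8}}
pool_2 = {"R2_a": {10, 11}, "R2_b": {10, 11, 12}, "R2_c": {20}}
seed_1, seed_2 = {1, 2, 3}, {10, 11, 12}

subregion = focused_subregion(seed_1, pool_1, 2, seed_2, pool_2, 1)
print(subregion)  # [('R1_a', 'R2_b'), ('R1_b', 'R2_b')]
```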

Fig. 13.3. Comparison of symmetric and asymmetric similarity scores. A virtual product from VRXN-2-00051 is used as a query molecule. The two corresponding Basis Products are VRXN-2-00051 A 1 and VRXN-2-00051 B 1. In reference to the query molecule, their corresponding similarity scores are listed under SS and AS (see equations [1] and...

Search a database of Basis Products using an asymmetric similarity measure. Here the search is done using the query molecule against a database of roughly $10^6$ explicitly enumerated Basis Products. The asymmetric similarity search in the BP database is implemented using MDL keys fingerprints (24) with ISIS host technology (25). [Pg.262]

The output is a set of Basis Products with high asymmetric similarity (AS) values (the default cutoff value is set to 90%) when they are mapped into the query molecule. The reaction schemes and reactants encoded by those Basis Products are then extracted, ranked, and used to form sub-regions of PGVL for subsequent just-in-time enumeration and symmetric similarity search against the query molecule. [Pg.263]
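The contrast between symmetric and asymmetric scores can be sketched as follows. The source's equations [1] and [2] are not reproduced here, so the forms below are assumptions. The symmetric score (SS) is taken as Tanimoto. The asymmetric score (AS) is taken as the fraction of a Basis Product's bits that are also set in the query, which is one way to measure how well the Basis Product "maps into" the query. Plain bit sets stand in for MDL keys, and the Basis Product names are invented. The 90% default cutoff comes from the text.

```python
def symmetric_similarity(query, bp):
    """Assumed SS: Tanimoto, shared bits over all bits set in either fingerprint."""
    union = len(query | bp)
    return len(query & bp) / union if union else 0.0


def asymmetric_similarity(query, bp):
    """Assumed AS: fraction of the Basis Product's bits that are also set in the query."""
    return len(query & bp) / len(bp) if bp else 0.0


def filter_basis_products(query, basis_products, cutoff=0.90):
    """Keep Basis Products whose AS to the query meets the cutoff, highest AS first."""
    hits = [(asymmetric_similarity(query, fp), name) for name, fp in basis_products.items()]
    return sorted((h for h in hits if h[0] >= cutoff), reverse=True)


# Invented fingerprints: a larger query and three smaller Basis Products.
query = set(range(20))
basis_products = {
    "BP_A": set(range(10)),               # fully contained in the query
    "BP_B": set(range(8)) | {30, 31},     # 80% contained
    "BP_C": set(range(19)) | {40},        # 95% contained
}

kept = filter_basis_products(query, basis_products)
for as_score, name in kept:
    ss_score = symmetric_similarity(query, basis_products[name])
    print(f"{name}: AS={as_score:.2f} SS={ss_score:.2f}")
# BP_A: AS=1.00 SS=0.50
# BP_C: AS=0.95 SS=0.90
```

BP_A shows why an asymmetric measure suits this step. A small Basis Product can map completely into the query (AS = 1.00) while its symmetric score stays low (SS = 0.50) because of the size difference.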

Perform a standard similarity search using the original query molecule against the enumerated products from the sub-region(s) obtained in step 2. This is identical to step 4 of LEAP1... [Pg.263]

Given a set of molecules known to be inside PGVL as query molecules, what is the success rate for returning the expected molecules identical to the query molecules (100% similarity threshold)? This is by definition a baseline test that a validated search strategy must pass. [Pg.263]

Given a sub-region of PGVL that can be enumerated explicitly and a query molecule, compare search results obtained by a LEAP search with the reference sets obtained by the exhaustive search against the fully enumerated... [Pg.263]
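One simple way to score such a comparison is the fraction of the exhaustive-search reference hits that the focused search also returns. This metric is assumed here rather than taken from the source. In the sketch, both searches are reduced to {compound_id: similarity} mappings and compared on their top k hits. The compound IDs and scores are invented.

```python
def top_k(scores, k):
    """IDs of the k highest-scoring compounds from a {compound_id: similarity} mapping."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def recall_against_reference(search_hits, reference_hits):
    """Fraction of the reference (exhaustive-search) hits also returned by the focused search."""
    reference = set(reference_hits)
    if not reference:
        return 1.0
    return len(reference & set(search_hits)) / len(reference)


# Invented scores: exhaustive search over the fully enumerated sub-region ...
exhaustive = {"P1": 0.95, "P2": 0.90, "P3": 0.85, "P4": 0.60, "P5": 0.40}
# ... and the focused search, which only enumerated some of those products.
focused = {"P1": 0.95, "P3": 0.85, "P4": 0.60}

reference_top = top_k(exhaustive, 3)
focused_top = top_k(focused, 3)
recall = recall_against_reference(focused_top, reference_top)
print(reference_top, focused_top, round(recall, 2))  # ['P1', 'P2', 'P3'] ['P1', 'P3', 'P4'] 0.67
```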

Given a set of drug-like molecules as query molecules, what is the success rate of a search method in returning interesting search results not only similar to the input queries but also pertinent to lead optimization and/or lead hopping ... [Pg.264]

Test Three. For the third validation test, we selected 24 known drugs on the market as query molecules (see Fig. 13.5). This is a very realistic and challenging set in terms of the diversity of their molecular structures and the complexity required for their synthesis. The 10 virtual compounds most similar to each query molecule were identified and plotted as colored dots in Fig. 13.6 for both LEAP1 and LEAP2. [Pg.266]

For every query molecule, LEAP2 returns its top 10 molecules, with best similarity scores ranging from 0.4 for sertraline to 0.9 for celecoxib. Five of the 24 searches return PGVL hits 80% or more similar to the query molecules, suitable for meaningful follow-up. If the threshold is relaxed to 0.7, then 11 of 24... [Pg.266]

Fig. 13.6. Results from the third validation study. The y-axis represents the Tanimoto similarity score of the returned hits with respect to their corresponding query molecule, calculated from FCFP4 molecular fingerprints (31). The x-axis lists the drug molecules in Fig. 13.5. Search hits are color coded by the PGVL reaction (VRXN) from which they originate.

Big Chemical Encyclopedia
