Molecular fingerprinting Tanimoto similarity

The pragmatic beauty of the chemical fingerprint is that the more features two molecules have in common, the more bits they have set in common. The mathematical approach used to translate the fingerprint comparison into a measure of similarity tunes the molecular comparison [5]. The Tanimoto similarity index works well when a relatively sparse fingerprint is used and when the molecules to be compared are broadly comparable in size and complexity [5]. If the Tanimoto index does not suit the nature of the molecules or the comparison desired, several other indices are available to the researcher. For example, the Daylight software offers the user over ten similarity metrics, and Pipeline Pilot as distributed offers at least three. Some of these metrics (e.g., Tversky, Cosine) behave better if the query molecule is significantly smaller than the molecule it is compared with. [Pg.94]
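The contrast between these indices for a small query can be illustrated with a minimal Python sketch. It assumes each fingerprint is represented as a set of on-bit positions, which is a choice made here for illustration and not the Daylight or Pipeline Pilot implementation. The function names, the example fingerprints, and the Tversky weights `alpha` and `beta` are likewise assumptions.

```python
from math import sqrt

def tanimoto(a, b):
    """Common on-bits divided by on-bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def tversky(query, target, alpha=1.0, beta=0.0):
    """Asymmetric index: alpha weights query-only bits, beta target-only bits."""
    common = len(query & target)
    denom = common + alpha * len(query - target) + beta * len(target - query)
    return common / denom if denom else 0.0

def cosine(a, b):
    """Common on-bits divided by the geometric mean of the two bit counts."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

# Hypothetical fingerprints: a small query fully contained in a larger target.
query = set(range(10))    # 10 on-bits
target = set(range(40))   # 40 on-bits, including all 10 query bits

print(tanimoto(query, target))  # 10 / 40 = 0.25
print(tversky(query, target))   # 10 / (10 + 0 + 0) = 1.0
print(cosine(query, target))    # 10 / sqrt(10 * 40) = 0.5
```

With these weights the Tversky index ignores bits found only in the larger molecule, so a query that is a complete substructure of the target scores 1.0 even though its Tanimoto score is only 0.25.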

Because of the numerous choices for molecular descriptors, weighting factors, and similarity coefficients, there are many ways in which the similarities between pairs of molecules can be calculated. The most used molecular descriptors for defining similarity are probably the 2D fingerprints (22). The bit strings of the molecular fingerprints are used to calculate similarity coefficients. Table 2.3 lists several selected similarity coefficients that can be used with various 2D fingerprints (23). The Tanimoto coefficient is the most popular one (22). [Pg.38]

Fig. 13.6. Results from the third validation study. The y-axis represents the Tanimoto similarity score of returned hits with respect to their corresponding query molecule, calculated from the FCFP4 molecular fingerprints (31). The x-axis lists the drug molecules in Fig. 13.5. Search hits are color coded by the PGVL reactions (VRXN) from which they originate.

For the library design, we have also used the Tanimoto coefficient (11) computed based on the molecular fingerprints from SciTegic Pipeline Pilot (14) as the measure of molecular similarity. [Pg.326]

Our approach to selecting a diverse subset is based on utilizing a minimum similarity between each molecule and all other molecules in the virtual library. For the 2-D fingerprints, the similarity is measured by a Tanimoto coefficient (20), which measures similarity on a pair-wise basis. A Tanimoto coefficient for any pair of molecular structures lies in the range of zero (dissimilar) to one (similar). It is defined as the number of common bits (in this case molecular fragments) set in both molecules divided by the number of bits set in either. [Pg.229]
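The definition above, and one way a similarity limit could drive subset selection, can be sketched as follows. This is a greedy illustration only and not the procedure used by the authors. The `max_sim` threshold, the function names, and the representation of each molecule as a set of fragment identifiers are all assumptions.

```python
def tanimoto(a, b):
    """Bits (fragments) set in both molecules divided by bits set in either."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def select_diverse(fingerprints, max_sim=0.5):
    """Keep a molecule only if it is less similar than max_sim
    to every molecule already kept. Returns the kept indices."""
    selected = []
    for idx, fp in enumerate(fingerprints):
        if all(tanimoto(fp, fingerprints[j]) < max_sim for j in selected):
            selected.append(idx)
    return selected

# Hypothetical virtual library, each molecule a set of fragment identifiers.
library = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},    # 3 common / 5 in either = 0.6 with molecule 0: rejected
    {6, 7, 8},       # shares nothing with molecule 0: kept
    {6, 7, 9, 10},   # 2 common / 5 in either = 0.4 with molecule 2: kept
]

print(select_diverse(library))  # [0, 2, 3]
```

The result of a greedy pass like this depends on the order of the library, which is one reason real diversity selections often use more elaborate schemes.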

As is well known, the Tanimoto similarity coefficient, which is the most widely used similarity measure, exhibits size-dependent behavior [5, 92-95] that can significantly influence the results of similarity searches. A significant part of the problem can be traced to the terms in the denominator of the Tanimoto function that counts the number of elements that are common to both molecular fingerprints. Thus, when molecules of widely varying sizes are treated, the number of elements in fingerprint... [Pg.360]
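The size dependence can be made explicit with the usual bit-count notation. The symbols are introduced here for illustration and are not taken from the source: a and b are the numbers of bits set in the two fingerprints, and c is the number of bits set in both.

```latex
T = \frac{c}{a + b - c},
\qquad
c \le \min(a, b)
\;\Longrightarrow\;
T \le \frac{\min(a, b)}{\max(a, b)}
```

For instance, a fingerprint with 10 bits set compared with one with 40 bits set cannot score above 10/40 = 0.25, even if every bit of the smaller fingerprint is also set in the larger one.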

FIGURE 15.7 3D projections of PCA-based chemical spaces generated from a set of 2250 compounds obtained from nine datasets of 250 compounds each using four different molecular fingerprints (Atom pairs, MACCS keys, TGD, and piDAPH4) and the Tanimoto similarity function (see text for further details). For color details, please see color plate section. [Pg.381]

As described before, the vast number of possible structures means that one can never get an adequate sample of chemical space. One question is whether the entire chemical space is relevant for finding pharmacologically active compounds and how to predict this for future targets [19]. Another question is how to sample a part of chemical space in a uniform, systematic fashion. Often, the answer is considered to be a diverse selection. However, what is diversity [20]? The usual method to describe diversity is to determine Tanimoto distances. These coefficients are calculated by comparing the number of shared and unique fingerprint bits within a pair of structures. Usually, compounds with a Tanimoto coefficient >0.85 are considered to be similar. The lower the Tanimoto coefficients in a compound set are, the more structurally diverse the set can be... [Pg.101]
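A rough way to summarize the diversity of a compound set along these lines is sketched below. It assumes the common convention that the Tanimoto distance is one minus the Tanimoto coefficient, and it represents each fingerprint as a set of on-bit positions. The function names and example compounds are hypothetical.

```python
from itertools import combinations

def tanimoto(a, b):
    """Shared on-bits divided by on-bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def diversity_summary(fingerprints, threshold=0.85):
    """Return (mean pairwise Tanimoto distance, number of pairs above threshold)."""
    if len(fingerprints) < 2:
        raise ValueError("need at least two fingerprints")
    sims = [tanimoto(a, b) for a, b in combinations(fingerprints, 2)]
    mean_distance = sum(1.0 - s for s in sims) / len(sims)
    similar_pairs = sum(s > threshold for s in sims)
    return mean_distance, similar_pairs

# Hypothetical set: two near-duplicates and one unrelated compound.
compounds = [
    set(range(20)),        # 20 on-bits
    set(range(19)),        # 19 of the same bits: coefficient 19/20 = 0.95
    set(range(100, 110)),  # no bits in common with the other two
]

mean_distance, similar_pairs = diversity_summary(compounds)
print(round(mean_distance, 3), similar_pairs)  # 0.683 1
```

A higher mean distance and fewer pairs above the 0.85 cutoff both point to a more structurally diverse set.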

While the first application of this technique involved continuous molecular descriptors, the programs were later extended to include other molecular representations and molecular similarity metrics such as substructure keys, hashed fingerprints, Tanimoto coefficients, etc. ... [Pg.759]

A molecular similarity kernel, the Tanimoto similarity kernel, was used by Lind and Maltseva in SVM regression (SVMR) to predict the aqueous solubility of three sets of organic compounds. The Tanimoto similarity kernel was computed from molecular fingerprints. The RMSE and q² cross-validation statistics for the three sets show a good performance of SVMR with the Tanimoto kernel: set 1 (883 compounds), RMSE = 0.62 and q² = 0.88; set 2 (412 compounds), RMSE = 0.77 and q² = 0.86; and set 3 (411 compounds), RMSE = 0.57 and q² = 0.88. An SVMR model was trained on set 1 and then tested on set 2 with good results, i.e., RMSE = 0.68 and q² = 0.89. [Pg.377]
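How a Tanimoto kernel can be computed from fingerprints is sketched below in plain Python. This is an illustration under assumptions, not the software used by Lind and Maltseva: fingerprints are sets of on-bit positions, and the function names and example data are hypothetical.

```python
def tanimoto(a, b):
    """Shared on-bits divided by on-bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def tanimoto_kernel_matrix(rows, cols):
    """Kernel (Gram) matrix: entry [i][j] is the Tanimoto similarity
    between rows[i] and cols[j]."""
    return [[tanimoto(r, c) for c in cols] for r in rows]

# Hypothetical training fingerprints.
train = [{1, 2, 3}, {2, 3, 4}, {7, 8}]

K = tanimoto_kernel_matrix(train, train)
for row in K:
    print(row)
# [1.0, 0.5, 0.0]
# [0.5, 1.0, 0.0]
# [0.0, 0.0, 1.0]
```

A matrix of this form could then be handed to a regression routine that accepts a precomputed kernel, for example scikit-learn's `SVR(kernel="precomputed")`. For prediction, the rows would be the test fingerprints and the columns the training fingerprints.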

Similarity: Comparison of molecules using molecular descriptors and a measure of similarity, for example a 2D fingerprint and the Tanimoto coefficient... [Pg.32]

Analysis of molecular similarity is based on the quantitative determination of the overlap between the fingerprint of the query structure and those of all database members. Because the descriptors of a given molecule can be regarded as a vector of real or binary attributes, most similarity measures are derived as vectorial distances. Tanimoto and Cosine coefficients are the most popular measures of similarity. Definitions of similarity metrics are collected in Table 3. [Pg.4017]
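The vector view can be sketched in a few lines of Python. Table 3 is not reproduced here, so the formulas below are the commonly used vector forms of the two coefficients and may differ in notation from the table. The function names and example vectors are assumptions.

```python
from math import sqrt

def dot(x, y):
    """Inner product of two equal-length attribute vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def tanimoto_vec(x, y):
    """Vector form of Tanimoto: x.y / (x.x + y.y - x.y)."""
    xy = dot(x, y)
    denom = dot(x, x) + dot(y, y) - xy
    return xy / denom if denom else 0.0

def cosine_vec(x, y):
    """Cosine coefficient: x.y / sqrt(x.x * y.y)."""
    norm = sqrt(dot(x, x) * dot(y, y))
    return dot(x, y) / norm if norm else 0.0

# Hypothetical binary attribute vectors (1 = feature present).
x = [1, 0, 1, 1]
y = [1, 1, 1, 0]

print(tanimoto_vec(x, y))  # 2 / (3 + 3 - 2) = 0.5
print(cosine_vec(x, y))    # 2 / sqrt(3 * 3), about 0.667
```

For 0/1 vectors these expressions reduce to the bit-count forms (common bits over bits set in either, and common bits over the geometric mean of the two bit counts), while the same code also accepts real-valued descriptors.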

Godden, J.W., Xue, L. and Bajorath, J. (2000). Combinatorial Preferences Affect Molecular Similarity/Diversity Calculations Using Binary Fingerprints and Tanimoto Coefficients. J. Chem. Inf. Comput. Sci., 40, 126-134. [Pg.572]

Big Chemical Encyclopedia