Tanimoto similarity index

The pragmatic beauty of the chemical fingerprint is that the more features two molecules have in common, the more bits they share. The mathematical approach used to translate the fingerprint comparison into a measure of similarity shapes the outcome of the molecular comparison [5]. The Tanimoto similarity index works well when a relatively sparse fingerprint is used and when the molecules being compared are broadly similar in size and complexity [5]. If the Tanimoto index does not suit the nature of the molecules or the desired comparison, several other indices are available to the researcher. For example, the Daylight software offers the user more than ten similarity metrics, and Pipeline Pilot as distributed offers at least three. Some of these metrics (e.g., Tversky, Cosine) behave better when the query molecule is significantly smaller than the molecule it is compared with. [Pg.94]
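To make the size effect concrete, here is a minimal Python sketch (an illustration, not the Daylight or Pipeline Pilot implementation). It represents each fingerprint as a set of on-bit positions and compares a small query whose bits all occur in a much larger molecule. The bit positions are hypothetical.

```python
import math


def tanimoto(query: set[int], target: set[int]) -> float:
    """Common bits divided by bits set in either fingerprint."""
    union = len(query | target)
    # Convention assumed here: two empty fingerprints score 0.0.
    return len(query & target) / union if union else 0.0


def cosine(query: set[int], target: set[int]) -> float:
    """Common bits divided by the geometric mean of the two bit counts."""
    if not query or not target:
        return 0.0
    return len(query & target) / math.sqrt(len(query) * len(target))


# Hypothetical fingerprints: all 3 query bits also occur in a 12-bit target.
query = {1, 2, 3}
target = set(range(1, 13))

print(tanimoto(query, target))  # 0.25
print(cosine(query, target))    # 0.5
```

The Tanimoto denominator grows with every bit set only in the larger molecule. The same complete overlap therefore scores 0.25 here, while the cosine scores 0.5 because it divides by the geometric mean of the bit counts instead.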

Chemical structures are often characterized by binary vectors in which each component (with value 0 or 1) indicates the absence or presence of a certain substructure (binary substructure descriptors). An appropriate and widely used similarity measure for such binary vectors is the Tanimoto index (Willett 1987), also called the Jaccard similarity coefficient (Vandeginste et al. 1998). Let $x_A$ and $x_B$ be binary vectors with $m$ components for two chemical structures A and B, respectively. The Tanimoto index $t_{AB}$ is given by... [Pg.269]

The Tanimoto index is a similarity measure; a corresponding distance measure, $d_{Tani}$, also called the asymmetric binary distance, is... [Pg.269]
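The excerpts break off before the formulas. For reference, the standard definition for binary vectors can be written as follows. The notation $t_{AB}$ for the index and $d_{Tani}$ for the distance is assumed and may not match the source's exact typography. Here $a$ and $b$ are the numbers of bits set in $x_A$ and $x_B$, and $c$ is the number set in both:

```latex
t_{AB} = \frac{\sum_{j=1}^{m} \left( x_{Aj} \wedge x_{Bj} \right)}
              {\sum_{j=1}^{m} \left( x_{Aj} \vee x_{Bj} \right)}
       = \frac{c}{a + b - c},
\qquad
d_{Tani} = 1 - t_{AB}
```

For example, $x_A = (1, 1, 0, 1, 0)$ and $x_B = (1, 0, 0, 1, 1)$ give $a = 3$, $b = 3$, $c = 2$, so $t_{AB} = 2/4 = 0.5$ and $d_{Tani} = 0.5$.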

It is now universally accepted practice to measure the similarity of two molecules using the Tanimoto index of their database keys or fingerprints. There is no standard fingerprint, and the similarity values can consequently vary with the fingerprint used. Most fingerprints are dependent upon which functional groups are... [Pg.190]
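A small illustration of this dependence applies two hypothetical key dictionaries to the same pair of molecules. Each molecule is described here by an assumed set of fragment names rather than by a real substructure search. The choice of keys alone changes the Tanimoto value.

```python
def key_fingerprint(fragments: set[str], dictionary: list[str]) -> list[int]:
    """One bit per dictionary key: 1 if the molecule contains that fragment."""
    return [1 if key in fragments else 0 for key in dictionary]


def tanimoto(x: list[int], y: list[int]) -> float:
    """Bits set in both vectors divided by bits set in either."""
    common = sum(a & b for a, b in zip(x, y))
    either = sum(a | b for a, b in zip(x, y))
    return common / either if either else 0.0


# Hypothetical fragment content of two molecules.
mol_1 = {"benzene ring", "carboxylic acid", "methyl"}
mol_2 = {"benzene ring", "amide", "methyl"}

# Two hypothetical key dictionaries.
keys_functional = ["carboxylic acid", "amide", "ester", "amine"]
keys_broad = ["benzene ring", "methyl", "carboxylic acid", "amide", "ester", "amine"]

for keys in (keys_functional, keys_broad):
    fp_1 = key_fingerprint(mol_1, keys)
    fp_2 = key_fingerprint(mol_2, keys)
    print(round(tanimoto(fp_1, fp_2), 3))
# 0.0
# 0.5
```

With functional-group keys only, the two molecules share no set bits. Adding ring and methyl keys makes them look half-similar.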

Figure 5.14 Similarity/dissimilarity comparison of two molecules using the Tanimoto index.

...Tanimoto index > 0.9 but may be very different in terms of activity (chemically similar, biologically diverse), while completely different structures are known to have the same biological activity (chemically diverse, biologically similar). This intrinsic drawback of the computational screening of virtual libraries should always be considered when interpreting the screening results of a computationally designed library, and real data should be used to refine any virtual SAR information based on chemical similarity or dissimilarity. [Pg.189]

These relate to bias in the data sets arising from the presence of closely related analogs, which by their nature have high 2D substructural similarities, and to the way the 3D pharmacophoric descriptors were generated (single conformation only) and used (bin setting, Tanimoto index). [Pg.211]

The Tanimoto index is the most common similarity index implemented in a number of structure-searchable interfaces, where one compound is compared with another on the basis of fingerprints. The structure (most commonly, the 2D structure) of a molecule is encoded as a pattern of bits set within a bit string (fingerprint): if a particular fragment is present at least once, the corresponding bit is set. [Pg.765]
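The bit-setting rule can be sketched as a simplified hashed fingerprint. The sketch assumes the fragments have already been enumerated as strings. The fragment labels are hypothetical, and the CRC32-modulo mapping is an illustrative choice rather than the scheme of any particular search interface. Each distinct fragment sets one bit, so a fragment that occurs several times sets the same bit as one that occurs once.

```python
import zlib


def hashed_fingerprint(fragments: list[str], n_bits: int = 64) -> int:
    """Return an n_bits-wide bit string stored in a Python int.

    A bit is set if at least one fragment hashes to it; counts are ignored.
    """
    fp = 0
    for fragment in fragments:
        # zlib.crc32 is deterministic across runs, unlike Python's built-in
        # hash() of str, which is salted per process.
        position = zlib.crc32(fragment.encode("utf-8")) % n_bits
        fp |= 1 << position
    return fp


# Hypothetical fragment lists for one molecule.
once = hashed_fingerprint(["C-C", "C=O", "C-O"])
repeated = hashed_fingerprint(["C-C", "C-C", "C=O", "C-O", "C-O"])
print(once == repeated)  # True: presence, not frequency, sets a bit
print(format(once, "064b"))
```

Different fragments can also collide on the same bit. This is one reason Tanimoto values depend on fingerprint length and design.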

A series of different indexes describes various parts of the similarity, and a single number is unable to capture all the possible information. The similarity between partial orders is therefore a multidimensional problem, and any one-dimensional representation as a single number will discard information. The principle of the modified Tanimoto index as a similarity index, T, and its links to other concepts are shown by Sorensen et al. (2003). [Pg.267]

To sort a compound database according to similarity to a query structure, the fingerprint of each compound in the database is compared with the fingerprint of the query structure. Similarity is thus searched at a higher level of abstraction and can be expressed, for example, by the Tanimoto index [Eq. (2)]. [Pg.1777]
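A minimal sketch of such a similarity-sorted search follows. It assumes fingerprints are stored as Python integers used as bit strings. The compound identifiers and bit patterns are hypothetical.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    union = popcount(fp_a | fp_b)
    # Assumed convention: two empty fingerprints have similarity 0.0.
    return popcount(fp_a & fp_b) / union if union else 0.0


def rank_by_similarity(query: int, database: dict[str, int]) -> list[tuple[str, float]]:
    """Score every database fingerprint against the query, most similar first."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical query and database fingerprints.
query = 0b1111
database = {
    "cpd1": 0b1111,      # identical bits
    "cpd2": 0b0111,      # 3 of the 4 query bits
    "cpd3": 0b11110000,  # no bits in common
    "cpd4": 0b111111,    # all query bits plus two extra
}

for cid, score in rank_by_similarity(query, database):
    print(cid, round(score, 3))
# cpd1 1.0
# cpd2 0.75
# cpd4 0.667
# cpd3 0.0
```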

Figure 15.6 "Chemical space" plots illustrating the chemical diversity of a screening library. In the chemical space plots, each point represents a compound, and the proximity of two points is indicative of the structural similarity (as defined by two-dimensional fingerprints and a Tanimoto index [69]) between the corresponding compounds. In (a), 13 hit series, in which active compounds were identified, are highlighted. Three of these are circled, corresponding to series 8, 11, and 13, which are analyzed in more detail in Figure 15.7. The...

Fig. 1. Two hydrogen-suppressed graphs, G1 and G2, an example of a common substructure CS(G1, G2), and the maximum common substructure MCS(G1, G2) are shown above. The Tanimoto similarity index and the distance between the two chemical graphs are computed below.

Fig. 4. (A) The other asymmetric Tversky similarity index, S VC, has a value of 0.69. Exchanging the roles of the query and target molecules (Q<=>T) gives (B), which shows that smaller target molecules are more likely to be retrieved from a large query structure using the asymmetric Tversky similarity index than the Tanimoto similarity index.
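The asymmetry can be reproduced with the usual Tversky form, $S = c / (c + \alpha |Q \setminus T| + \beta |T \setminus Q|)$, sketched here on sets of on-bits. The weights $\alpha = 0.1$ and $\beta = 0.9$ and the bit sets are assumptions for illustration. They are not the parameters behind the 0.69 value in the figure. With $\alpha = \beta = 1$ the expression reduces to the Tanimoto index.

```python
def tversky(query: set[int], target: set[int], alpha: float, beta: float) -> float:
    """Tversky similarity of target to query.

    alpha weights bits found only in the query, beta bits found only in the target.
    """
    common = len(query & target)
    denom = common + alpha * len(query - target) + beta * len(target - query)
    return common / denom if denom else 0.0


large = set(range(12))  # hypothetical 12-bit fingerprint
small = {0, 1, 2}       # hypothetical 3-bit fingerprint contained in it

# Large query, small target: query bits missing from the target cost little.
print(round(tversky(large, small, alpha=0.1, beta=0.9), 3))  # 0.769
# Roles exchanged (Q <=> T): the same pair now scores much lower.
print(round(tversky(small, large, alpha=0.1, beta=0.9), 3))  # 0.27
# alpha = beta = 1 recovers the symmetric Tanimoto index.
print(tversky(large, small, alpha=1.0, beta=1.0))            # 0.25
```

Setting $\alpha = \beta = 0.5$ gives the Dice coefficient instead. A small $\alpha$ is what lets a small target score highly against a large query.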

Fig. 3. Coverage of chemistry space by four overlapping sublibraries. (A) Different diversity libraries cover similar chemistry space but show little overlap. Shown are three libraries chosen using different dissimilarity measures, so that they act as different representations of the available chemistry space. The compounds from these libraries are placed in this representation by first calculating the intermolecular similarity of each compound to all of the other compounds, using fingerprint descriptors and the Tanimoto similarity index. Principal component analysis was then performed on the similarity matrix to reduce it to a series of principal components, which allow the chemistry space to be displayed in three dimensions.
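The procedure in the caption, all-against-all Tanimoto similarities followed by PCA of the similarity matrix, can be sketched with NumPy. The random fingerprints stand in for real fingerprint descriptors. PCA is computed here from a singular value decomposition of the mean-centered matrix, which is one standard approach. The original study's software and settings are not stated.

```python
import numpy as np


def tanimoto_matrix(fps: np.ndarray) -> np.ndarray:
    """All-pairs Tanimoto similarity for an (n_compounds, n_bits) 0/1 array."""
    bits = fps.astype(bool)
    common = (bits[:, None, :] & bits[None, :, :]).sum(axis=2)
    either = (bits[:, None, :] | bits[None, :, :]).sum(axis=2)
    # Pairs of empty fingerprints get 0.0 (an assumed convention).
    return np.divide(common, either, out=np.zeros(common.shape), where=either > 0)


def pca_scores(matrix: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project the rows of matrix onto its first principal components."""
    centered = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
fps = rng.integers(0, 2, size=(20, 64))  # 20 hypothetical 64-bit fingerprints

similarity = tanimoto_matrix(fps)
coords = pca_scores(similarity, n_components=3)
print(similarity.shape, coords.shape)  # (20, 20) (20, 3)
```

Each row of `coords` is one compound's position in the three-dimensional plot. Compounds with similar similarity profiles end up close together.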

Any type of selected descriptor will provide a more or less complex characterization of each virtual library component. The use of similarity indices offers a straightforward method to evaluate similarities between virtual compounds. These indices use a bit-string representation for any descriptor (distances, fingerprints, pharmacophores, and so on) and, by simply counting the presence or the absence of specific bins and comparing the bit strings of virtual compounds, provide a numerical similarity index. The formula for the commonly used Tanimoto similarity index (71, 43), which can readily be transformed into the complementary diversity index, is reported in Fig. 5.14... [Pg.183]
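The bin-counting idea can be sketched for a distance-type descriptor. Each occupied distance bin sets one bit. The Tanimoto index and its complement, taken here as the diversity index $1 - T$, then follow from counting shared and total bins. The bin edges (in ångströms) and the distance lists are hypothetical. The occupied bins are stored as a set of bit positions.

```python
from bisect import bisect_right

# Hypothetical distance bins: [0, 2), [2, 4), [4, 6), [6, 8), [8, 10) angstroms.
BIN_EDGES = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]


def binned_bits(distances: list[float]) -> set[int]:
    """Positions of the bins occupied by at least one distance."""
    bits = set()
    for d in distances:
        index = bisect_right(BIN_EDGES, d) - 1
        if 0 <= index < len(BIN_EDGES) - 1:  # drop values outside the binned range
            bits.add(index)
    return bits


def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def diversity(bits_a: set[int], bits_b: set[int]) -> float:
    """Complementary (dissimilarity) index, taken here as 1 - Tanimoto."""
    return 1.0 - tanimoto(bits_a, bits_b)


a = binned_bits([1.5, 3.2, 3.9, 7.1])  # occupies bins {0, 1, 3}
b = binned_bits([1.4, 5.0, 7.5])       # occupies bins {0, 2, 3}
print(tanimoto(a, b), diversity(a, b))  # 0.5 0.5
```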

There are many different types of similarity indexes, including the association coefficients (e.g., Tanimoto coefficient [27], Jaccard coefficient [38], Hodgkin-Richards coefficient [39,40]), the correlation coefficients or cosine-like indexes, and the distance coefficients or dissimilarity indexes (e.g., Hamming distance) [26]... [Pg.765]
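Two of the coefficients named above can be sketched in a few lines. The first is the Hodgkin-Richards coefficient in its usual form, $2\sum xy / (\sum x^2 + \sum y^2)$. It works on real-valued descriptors and reduces to $2c/(a + b)$ on binary ones. The second is the Hamming distance, which counts the positions where two binary vectors differ. The example vectors are hypothetical.

```python
def hodgkin_richards(x: list[float], y: list[float]) -> float:
    """2 * sum(x*y) / (sum(x^2) + sum(y^2)); 1.0 for identical non-zero vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    denom = sum(v * v for v in x) + sum(v * v for v in y)
    return 2 * sum(a * b for a, b in zip(x, y)) / denom if denom else 0.0


def hamming(x: list[int], y: list[int]) -> int:
    """Number of positions at which two equal-length binary vectors differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(x, y))


x_a = [1, 1, 1, 0, 0]
x_b = [1, 1, 0, 1, 0]
print(round(hodgkin_richards(x_a, x_b), 3))      # 0.667 (2 common bits, 3 + 3 set)
print(hamming(x_a, x_b))                         # 2
print(hodgkin_richards([0.5, 1.0], [1.0, 0.5]))  # 0.8
```

The Tanimoto index of the same binary pair is $2/4 = 0.5$. The association coefficients therefore rank pairs on related but different scales, while the Hamming distance measures dissimilarity directly.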

Fligner, M.A., Verducci, J.S. and Blower, P.E. (2002) A modification of the Jaccard–Tanimoto similarity index for diverse selection of chemical compounds using binary strings. Technometrics, 44, 110–119. [Pg.1038]

A third definition, with a number of practical advantages, was suggested by a quantity utilised in statistical analysis. Our Tanimoto-like similarity index takes the form ... [Pg.100]

The variation of 7 (n) with < b( ) is illustrated in Fig. 5, from which it is evident that the Tanimoto-like index is somewhat more discriminating than the Hodgkin-like index when dealing with high values (i.e. with very similar systems). Of course, this conclusion is not specific to p-space indices. [Pg.100]
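The curve in Fig. 5 is not reproduced here, and the source's p-space formulas are elided. As a hedged illustration, assume the Hodgkin-like index $H$ and the Tanimoto-like index $T$ take their usual forms in terms of the same overlap quantities $Z_{AB}$, $Z_{AA}$, $Z_{BB}$ (hypothetical notation). Then $T$ is a fixed function of $H$:

```latex
H = \frac{2 Z_{AB}}{Z_{AA} + Z_{BB}}, \qquad
T = \frac{Z_{AB}}{Z_{AA} + Z_{BB} - Z_{AB}}
\;\Longrightarrow\;
T = \frac{H}{2 - H}, \qquad
\frac{dT}{dH} = \frac{2}{(2 - H)^{2}}
```

The slope rises from 1/2 at $H = 0$ to 2 at $H = 1$. A given difference in $H$ between two nearly identical systems is therefore roughly doubled in $T$. For example, $H = 0.90$ and $H = 0.95$ map to $T \approx 0.818$ and $T \approx 0.905$. The relation is purely algebraic, so it holds for any pair of indices of these forms. This is consistent with the remark that the conclusion is not specific to p-space indices.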

Big Chemical Encyclopedia
