Tanimoto function similarity

The SQL statement above computes the Tanimoto similarity between all pairs of compounds using fingerprint bitstrings stored in the column gfp. The tanimoto function is described in Chapter 8 and shown in the Appendix. The statement uses a CASE conditional clause to avoid computing elements unnecessarily, because the matrix of similarities is symmetric and its diagonal elements are exactly 1. The sqlQuery R function reads the rows of the similarity matrix into an R data.frame named tani. This is coerced into a matrix with the correct number of rows and columns using the matrix function, and then coerced into an R distance object, which holds the lower half of a symmetric distance matrix. Since the Tanimoto similarity is used, the distance (or dissimilarity) is represented by 1.0 minus the tanimoto... [Pg.147]
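The same idea can be sketched in Python instead of SQL and R. This is an illustration only, not the book's code: it assumes fingerprints are held as Python integers, and the names tanimoto_bits, similarity_matrix and lower_triangle_distances are hypothetical. Only the pairs below the diagonal are computed; the diagonal is set to 1 and the upper half is filled by symmetry.

```python
from itertools import combinations


def tanimoto_bits(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integers."""
    either = bin(a | b).count("1")          # bits set in at least one fingerprint
    common = bin(a & b).count("1")          # bits set in both fingerprints
    return common / either if either else 0.0  # assumed convention for empty input


def similarity_matrix(fps):
    """Full symmetric matrix, computing each off-diagonal pair only once."""
    n = len(fps)
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = tanimoto_bits(fps[i], fps[j])
    return sim


def lower_triangle_distances(sim):
    """Distances (1 - similarity) for the lower triangle, stored column by column."""
    n = len(sim)
    return [1.0 - sim[i][j] for j in range(n) for i in range(j + 1, n)]
```

The column-by-column ordering of the lower triangle is the layout that an R dist object uses, which is why it is chosen here.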

The tanimoto function is used to compute the Tanimoto similarity of two bitstrings. The input bitstrings would have been computed with the public166keys function or another equivalent fragment key or fingerprint function. [Pg.176]
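A minimal Python sketch of such a function, assuming the fingerprints arrive as equal-length strings of '0' and '1' characters. The name tanimoto mirrors the function mentioned above, but this is not the book's implementation, and returning 0.0 when neither fingerprint has a bit set is an assumed convention.

```python
def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto similarity of two equal-length bitstrings such as '1011'."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    # int('', 2) raises ValueError, so empty strings are treated as zero.
    a = int(fp_a, 2) if fp_a else 0
    b = int(fp_b, 2) if fp_b else 0
    common = bin(a & b).count("1")   # bits set in both fingerprints
    either = bin(a | b).count("1")   # bits set in at least one fingerprint
    return common / either if either else 0.0
```

For example, tanimoto("1100", "1010") shares one bit out of three set in either string, giving 1/3.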

As is well known, the Tanimoto similarity coefficient, which is the most widely used similarity measure, exhibits size-dependent behavior [5, 92-95] that can significantly influence the results of similarity searches. A significant part of the problem can be traced to the terms in the denominator of the Tanimoto function that counts the number of elements that are common to both molecular fingerprints. Thus, when molecules of widely varying sizes are treated, the number of elements in fingerprint... [Pg.360]
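The size dependence can be seen in a small numerical sketch. The fingerprints below are made-up sets of on-bit positions chosen only for illustration; they do not come from the cited work.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


small = set(range(10))       # 10 bits set
large = set(range(100))      # 100 bits set, containing every bit of `small`
medium = set(range(5, 15))   # 10 bits set, sharing only 5 bits with `small`

contained = tanimoto(small, large)    # 10 / 100 = 0.1
partial = tanimoto(small, medium)     # 5 / 15, about 0.33
```

Although every bit of the small fingerprint is present in the large one, the large denominator drives the score below that of a same-sized fingerprint sharing only half its bits.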

The Jaccard similarity coefficient is then computed with eq. (30.13), where m is now the number of attributes for which one of the two objects has a value of 1. This similarity measure is sometimes called the Tanimoto similarity. The Tanimoto similarity has been used in combinatorial chemistry to describe the similarity of compounds, e.g. based on the functional groups they have in common [9]. Unfortunately, the names of similarity coefficients are not standard, so that it can happen that the same name is given to different similarity measures or more than one name is given to a certain similarity measure. This is the case for the Tanimoto coefficient (see further). [Pg.65]

When molecules are represented by high-dimensional descriptors such as 2D fingerprints or several hundred topological indices, then the diversity of a library of compounds is usually calculated using a function based on the pairwise (dis)similarities of the molecules. Pairwise similarity can be quantified using a similarity or distance coefficient. The Tanimoto coefficient is most often used with binary fingerprints and is given by the formula below ... [Pg.340]
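The excerpt is cut off before its formula. For reference, the form in which the Tanimoto coefficient for binary fingerprints is commonly written is given here; the symbols a, b and c are chosen for this note and are not taken from the source.

```latex
% a = number of bits set in fingerprint A
% b = number of bits set in fingerprint B
% c = number of bits set in both A and B
S_{AB} = \frac{c}{a + b - c}
```

The value is 1 when the two fingerprints set exactly the same bits and 0 when they share none.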

It is now universally acceptable to measure the similarity of two molecules using the Tanimoto Index of their database keys or fingerprints. There is no standard fingerprint and the similarity measures consequently can vary. Most fingerprints are dependent upon which functional groups are... [Pg.190]

Given a table of redundant properties, one could calculate dissimilarities as Euclidean distances and use MDS instead of PCA. Whereas the results would be similar, this is usually wasteful, because the number of molecules is typically much greater than the number of properties. However, if the similarities are best calculated from a nonlinear function of the properties, such as Tanimoto coefficients computed from two bit strings, the results would not be similar and nonlinear MDS should be used. One then gets back a set of latent properties (dimensions) for which Euclidean distance approximates the desired similarities. Thus, a simple rule of thumb is: for redundant properties as input use PCA, for metric similarities as input use classical MDS, and for nonmetric similarities use numerically refined MDS. [Pg.79]
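The classical MDS case in the rule of thumb can be sketched with NumPy as follows. This is a textbook double-centering formulation written for illustration, with the hypothetical name classical_mds; it is not taken from the source, and the numerically refined (nonmetric) variant is not shown.

```python
import numpy as np


def classical_mds(dist: np.ndarray, n_dims: int = 2) -> np.ndarray:
    """Embed n objects in n_dims dimensions from an n x n distance matrix."""
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the n_dims largest.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)
```

When the input distances are exactly Euclidean, the pairwise distances between the returned coordinates reproduce them; for distances such as 1 minus Tanimoto the embedding is only an approximation.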

Both exact structure- and substructure-searching options are available. Functional groups or atoms can be locked to prevent substitution or ring fusion. The answer sets are determined by the Tanimoto similarity metric. A reaction search for a typical Diels-Alder reaction between a diene and a dienophile having a cyano functional group yielded 253 reactions (Fig. 6.51). [Pg.361]

Willett, Barnard, and Downs provide an extensive listing of many types of similarity functions [61]. Five similarity functions will be considered here to illustrate how the current formulation provides a unified description: Tanimoto (Tan), Cosine (Cos), Dice, and the related pair Max and Min. Table 15.3 summarizes the various forms. Importantly, the formulas apply to all three types of sets (classical, fuzzy, and multisets), providing a unity to the set-based approach. [Pg.358]
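Table 15.3 is not reproduced in this excerpt. The sketch below uses the commonly cited set-based forms of these five functions, which all share the intersection size as numerator; the function name set_similarities and the "max"/"min" labels (named after the denominator used) are assumptions made for illustration. Python's Counter serves as a multiset, so the same code covers classical sets and multisets; fuzzy sets are not handled.

```python
from collections import Counter
from math import sqrt


def _size(ms: Counter) -> int:
    """Cardinality of a multiset: the sum of its element counts."""
    return sum(ms.values())


def set_similarities(a, b) -> dict:
    """Five set-based similarities for two non-empty sets or multisets."""
    a, b = Counter(a), Counter(b)
    size_a, size_b = _size(a), _size(b)
    if size_a == 0 or size_b == 0:
        raise ValueError("both inputs must be non-empty")
    inter = _size(a & b)   # Counter '&' keeps the minimum count per element
    union = _size(a | b)   # Counter '|' keeps the maximum count per element
    return {
        "tanimoto": inter / union,
        "dice": inter / ((size_a + size_b) / 2),   # arithmetic mean of sizes
        "cosine": inter / sqrt(size_a * size_b),   # geometric mean of sizes
        "max": inter / max(size_a, size_b),        # larger size in denominator
        "min": inter / min(size_a, size_b),        # smaller size in denominator
    }
```

Passing plain sets gives the classical case; passing lists with repeated elements, such as ["C", "C", "O"], gives the multiset case.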

All of the set-based similarity functions in Table 15.3 are symmetric, have identical numerators, and are bounded by 0 and 1. Except for the Tanimoto similarity function, the denominators of and 5 are aggregation functions [50, 90] that... [Pg.360]

Vector- and function-based similarity functions also have ordering properties similar to those given in Equation 15.4.16 for set-based functions [99], although Tanimoto similarity can be an outlier in some cases. There is, however, a caveat,... [Pg.363]

How does this example apply to the use of multiple similarity methods? Each of the similarity methods can be considered to be equivalent to an independent judge, since none of the values produced by the other methods have an explicit impact on the value produced by a given method. This may not always be the case, for example, if two methods use MACCS key fingerprints, but one uses the Tanimoto (Jaccard) and the other a closely related similarity function (see Table 15.3). As shown by Gower [76], some molecular similarity functions are monotonically related. Thus, comparisons of these functions based on the same molecular representation will produce linear correlations of the values computed by the two functionally related similarity functions. Hence, only one of the functions should be used. [Pg.374]
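One familiar instance of such a monotonic relationship is the pair Tanimoto and Dice. The short derivation below uses the commonly cited binary forms of both coefficients; the symbols a, b and c are chosen for this note and are not taken from the source.

```latex
% a, b = number of bits set in fingerprints A and B; c = number of bits set in both
T = \frac{c}{a + b - c}, \qquad D = \frac{2c}{a + b}

% Eliminating a + b, using a + b = c\,(1 + T)/T for c > 0:
D = \frac{2T}{1 + T}, \qquad
\frac{dD}{dT} = \frac{2}{(1 + T)^{2}} > 0 \quad \text{for } 0 \le T \le 1
```

Because D increases strictly with T, ranking a set of compounds by either coefficient against the same query and fingerprint gives the same order, so the second coefficient adds no independent information.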

Big Chemical Encyclopedia
