Tanimoto coefficient

Compute similarity coefficients (Tanimoto, Dice, Ochiai, Hamming) between the active site fingerprint and the fingerprint for each molecule in the Catalyst database. The top N% of compounds can be selected by ranking the compound collection in descending order based on the similarity coefficient. [Pg.199]
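The ranking step described above can be sketched as follows. This is a minimal illustration, not the Catalyst workflow itself: fingerprints are assumed to be Python sets of on-bit indices, the names `tanimoto` and `top_percent` are hypothetical, and the toy bit indices are made up.

```python
import math

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def top_percent(query_fp, library, percent):
    """Rank library entries by similarity to the query, best first,
    and keep the top `percent` % (at least one entry)."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    keep = max(1, math.ceil(len(scored) * percent / 100))
    return scored[:keep]

# Hypothetical toy data standing in for an active site fingerprint
# and a small compound collection.
site_fp = {1, 2, 3, 4}
library = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 7},
    "mol_c": {8, 9},
    "mol_d": {1, 2, 3, 9},
}
hits = top_percent(site_fp, library, 50)
```

Any of the other coefficients named above could be swapped in for `tanimoto` without changing the ranking logic.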

Consequently, we can construct a similarity measure intuitively in the following way: all matches (c + d) relative to all possibilities, i.e., matches plus mismatches, (c + d) + (a + b), yields (c + d) / (a + b + c + d), which is called the simple matching coefficient [18]; equal weight is given to matches and mismatches. (Normalized similarity measures are called similarity indices or coefficients; see, e.g., Ref. [19].) When absence of a feature in both objects is deemed to convey no information, then d should not occur in a similarity measure. Omitting d from the above similarity measure, one obtains the Tanimoto (alias Jaccard) similarity measure (Eq. (8); see Ref. [16] and the citations therein) ... [Pg.304]
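A small sketch of the two measures in terms of the four counts may help. It assumes the usual reading of the notation in this excerpt: c counts features present in both objects, d counts features absent in both, and a and b count features present in only the first or only the second object. The function names are hypothetical.

```python
def contingency(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 sequences.

    Assumed meaning: a = 1 in x only, b = 1 in y only,
    c = 1 in both, d = 0 in both.
    """
    a = b = c = d = 0
    for p, q in zip(x, y):
        if p and q:
            c += 1
        elif p:
            a += 1
        elif q:
            b += 1
        else:
            d += 1
    return a, b, c, d

def simple_matching(a, b, c, d):
    """Matches (c + d) over matches plus mismatches."""
    total = a + b + c + d
    return (c + d) / total if total else 0.0

def tanimoto_from_counts(a, b, c):
    """Simple matching with the joint absences d omitted."""
    denom = a + b + c
    return c / denom if denom else 0.0

x = [1, 1, 0, 0, 1, 0, 0, 0]
y = [1, 0, 1, 0, 1, 0, 0, 0]
a, b, c, d = contingency(x, y)
sm = simple_matching(a, b, c, d)
tan = tanimoto_from_counts(a, b, c)
```

For these toy vectors the four joint absences raise the simple matching value well above the Tanimoto value, which is the effect the paragraph describes.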

If the binary descriptors for the objects s and t are substructure keys, the Hamming distance (Eq. (6)) gives the number of different substructures in s and t (components that are 1 in either s or t but not in both). On the other hand, the Tanimoto coefficient (Eq. (7)) is a measure of the number of substructures that s and t have in common (i.e., the frequency a) relative to the total number of substructures they could share (given by the number of components that are 1 in either s or t). [Pg.407]
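The contrast between the two measures can be sketched with Python integers used as bitstrings, one bit per substructure key. This representation and the function names are assumptions made for illustration only.

```python
def popcount(bits):
    """Number of 1 bits in a non-negative integer."""
    return bin(bits).count("1")

def hamming_distance(s, t):
    """Keys set in either s or t but not in both (bitwise XOR)."""
    return popcount(s ^ t)

def tanimoto(s, t):
    """Keys set in both (AND) relative to keys set in either (OR)."""
    either = popcount(s | t)
    return popcount(s & t) / either if either else 0.0

# Hypothetical 6-bit substructure keys.
s = 0b101101
t = 0b100111
dist = hamming_distance(s, t)
sim = tanimoto(s, t)
```

Here the two keys differ in two positions and share three of the five substructures present in at least one of them.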

Tanimoto similarity coefficient Also known as the Jaccard c. E =1 W/B ... [Pg.693]

Another important feature of the Tanimoto coefficient when used with bitstring data is that small molecules, which tend to have fewer bits set, will have only a small number of bits in common and so can tend to give inherently low similarity values. This can be important when selecting dissimilar compounds, as a bias towards small molecules can result. [Pg.693]
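One way to see this size effect: the intersection of two bit sets can be no larger than the smaller set, and the union no smaller than the larger set, so the Tanimoto value is bounded above by the ratio of the two bit counts. The sketch below uses made-up bit sets and hypothetical function names.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def tanimoto_upper_bound(n_a, n_b):
    """Largest Tanimoto value possible for fingerprints with n_a and n_b bits set."""
    larger = max(n_a, n_b)
    return min(n_a, n_b) / larger if larger else 0.0

# Toy data: a "small molecule" whose 5 bits are all contained
# in a "large molecule" with 50 bits set.
small = set(range(5))
large = set(range(50))
sim_small_large = tanimoto(small, large)
bound = tanimoto_upper_bound(len(small), len(large))

# Two large fingerprints that share 40 of their 50 bits each.
other_large = set(range(10, 60))
sim_large_large = tanimoto(large, other_large)
```

Even though every bit of the small fingerprint is matched, its similarity to the large one cannot exceed 0.1, so a dissimilarity-based selection would tend to favor it.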

The similarity matrices are constructed by an in-house program developed inside CHIRBASE using the application development kit of ISIS. They contain the similarity coefficients as expressed by the Tanimoto method. In ISIS, the Tanimoto coefficients are calculated from a set of binary descriptors or molecular keys coding the structural fragments of the molecules. [Pg.113]

These structural key descriptors incorporate a remarkable amount of pertinent molecular arrangements covering each type of interaction involved in ligand-receptor binding [26]. Every structure in a database is represented by one or more of the 960 key codes available in ISIS. If two molecules include A and B key codes, respectively, then the Tanimoto coefficient is given by ... [Pg.113]

Figure 8.3 Example of a 2D similarity search, showing a query molecule and five of its nearest neighbors. The similarity measure for the search is based on 2D fragment bit-strings and the Tanimoto coefficient.

Clustering is the process of dividing a collection of objects into groups (or clusters) so that the objects within a cluster are highly similar whereas objects in different clusters are dissimilar [41]. When applied to databases of compounds, clustering methods require the calculation of all the pairwise similarities of the compounds with similarity measures such as those described previously, for example, 2D fingerprints and the Tanimoto coefficient. [Pg.200]
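The all-pairs calculation that clustering methods rely on can be sketched as below. Fingerprints are assumed to be sets of on-bit indices, the toy data are made up, and `similarity_matrix` is a hypothetical helper, not part of any package mentioned in the text.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def similarity_matrix(fps):
    """Symmetric n x n matrix of pairwise Tanimoto similarities."""
    n = len(fps)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        # Only the upper triangle is computed; the result is mirrored.
        for j in range(i + 1, n):
            value = tanimoto(fps[i], fps[j])
            matrix[i][j] = value
            matrix[j][i] = value
    return matrix

# Hypothetical 2D fingerprints for three compounds.
fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
sim = similarity_matrix(fps)
```

The number of comparisons grows as n(n - 1)/2, which is why pairwise similarity calculation becomes costly for large compound databases.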

The Jaccard similarity coefficient is then computed with eq. (30.13), where m is now the number of attributes for which one of the two objects has a value of 1. This similarity measure is sometimes called the Tanimoto similarity. The Tanimoto similarity has been used in combinatorial chemistry to describe the similarity of compounds, e.g. based on the functional groups they have in common [9]. Unfortunately, the names of similarity coefficients are not standard, so that it can happen that the same name is given to different similarity measures or more than one name is given to a certain similarity measure. This is the case for the Tanimoto coefficient (see further). [Pg.65]

The first of these two is also called the Tanimoto coefficient by some authors. It can be verified that, since distance = 1 - similarity, this is equal to the simple matching coefficient. Clearly, confusion is possible and authors using a certain distance or similarity measure should always define it unambiguously. [Pg.66]

Similarity Comparison of molecules using molecular descriptors and a measure of similarity, for example a 2D fingerprint and the Tanimoto coefficient... [Pg.32]

Chemical structures are often characterized by binary vectors in which each vector component (with value 0 or 1) indicates absence or presence of a certain substructure (binary substructure descriptors). An appropriate and widely used similarity measure for such binary vectors is the Tanimoto index (Willett 1987), also called Jaccard similarity coefficient (Vandeginste et al. 1998). Let xA and xB be binary vectors with m components for two chemical structures A and B, respectively. The Tanimoto index fAB is given by... [Pg.269]

Several other approaches with the goal of simultaneous optimization of several criteria have been reported. One such approach is the generation of a library that is both focused and diverse via the dual fingerprint metric described by Bajorath [94]. In this method, individual compounds are randomly generated and their similarity to a known inhibitor is evaluated by comparison of their minifingerprints [95] using the Tanimoto coefficient. Those molecules that are above a similarity threshold are then... [Pg.184]

Table 1 MDDR Activity Classes used in this Study. MPS is the mean pair-wise similarity, computed using the Tanimoto coefficient and Unity 2D fingerprints, averaged over all of the molecules in an activity class.

The most widely used similarity measure by far is the Tanimoto similarity coefficient STan, which is given in set-theoretic language as (cf. Eq. 2.13 for the graph-theoretical case)... [Pg.11]

Distances in these spaces should be based upon an L1 or city-block metric (see Eq. 2.18) and not the L2 or Euclidean metric typically used in many applications. The reasons for this are the same as those discussed in Subheading 2.2.1 for binary vectors. Set-based similarity measures can be adapted from those based on bit vectors using an ansatz borrowed from fuzzy set theory (41,42). For example, the Tanimoto similarity coefficient becomes... [Pg.17]
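The excerpt breaks off before the formula. One commonly used fuzzy-set form, assumed here rather than taken from the source, replaces set intersection by the component-wise minimum and set union by the component-wise maximum. The sketch below shows that form together with a city-block distance; the function names and the toy count vectors are hypothetical.

```python
def fuzzy_tanimoto(x, y):
    """Sum of component-wise minima over sum of component-wise maxima
    (an assumed min/max generalization for non-negative vectors)."""
    denom = sum(max(p, q) for p, q in zip(x, y))
    return sum(min(p, q) for p, q in zip(x, y)) / denom if denom else 0.0

def city_block(x, y):
    """L1 (city-block) distance between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(x, y))

# Toy count vectors, e.g. fragment occurrence counts.
x = [2, 0, 1, 3]
y = [1, 1, 1, 0]
sim = fuzzy_tanimoto(x, y)
dist = city_block(x, y)

# For 0/1 vectors the min/max form reduces to the binary Tanimoto coefficient.
binary_sim = fuzzy_tanimoto([1, 1, 0], [1, 0, 1])
```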

The inner-product terms ... Gi is the labeled graph corresponding to the ith basis fragment, GA is the labeled graph corresponding to molecule A, and STan(Gi, GA) is the chemical graph-theoretical Tanimoto similarity coefficient. [Pg.26]
