The Tanimoto coefficient, or Jaccard similarity coefficient, is a statistic used for comparing the similarity and dissimilarity of structures [74]. It is one of the most commonly used similarity coefficients in chemoinformatics, because its simple form, free of complex mathematical operators, allows rapid calculation. In general, the complement of the Tanimoto coefficient does not obey the triangle inequality. The Tanimoto coefficient is calculated as follows [Pg.53]

Tanimoto similarity coefficient: also known as the Jaccard coefficient. [Pg.693]

A weighted version of the Jaccard-Tanimoto association measure is the Tversky similarity coefficient, given as [Tversky, 1977] [Pg.698]
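The equation itself is not reproduced in this excerpt. As a minimal sketch, assuming the commonly cited set form of the Tversky index, S = |A∩B| / (|A∩B| + α|A−B| + β|B−A|), the weighting can be illustrated in Python. The function name `tversky`, the feature labels, and the handling of two empty sets are illustrative assumptions, not taken from the source.

```python
def tversky(a: set, b: set, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Tversky index of two feature sets.

    alpha weights the features found only in `a`,
    beta weights the features found only in `b`.
    """
    common = len(a & b)   # features shared by both objects
    only_a = len(a - b)   # features unique to a
    only_b = len(b - a)   # features unique to b
    denom = common + alpha * only_a + beta * only_b
    # Assumed convention: two empty feature sets count as identical.
    return 1.0 if denom == 0 else common / denom


# Hypothetical substructure labels, for illustration only.
a = {"C=O", "OH", "NH2", "ring6"}
b = {"C=O", "OH", "Cl"}

print(tversky(a, b))            # alpha = beta = 1: Jaccard-Tanimoto
print(tversky(a, b, 0.5, 0.5))  # alpha = beta = 0.5: Dice coefficient
```

Setting α = β = 1 recovers the unweighted Jaccard-Tanimoto measure, while unequal weights make the index asymmetric in the two objects.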

There are many different types of similarity indexes, including the association coefficients (e.g., Tanimoto coefficient [27], Jaccard coefficient [38], Hodgkin-Richards coefficient [39,40]), the correlation coefficients or cosine-like indexes, and the distance coefficients or dissimilarity indexes (e.g., Hamming distance) [26]. [Pg.765]

However, the limitation of this measure, the simple matching coefficient (SMC), is that it counts presences and absences equally (Tan et al., 2006). For sparse data, the SMC result will be dominated by the number of 0-0 matches. As a result, the Jaccard coefficient [Eq. (5.6)] was proposed to calculate the similarity based on the number of shared attributes while ignoring 0-0 matches [Pg.94]
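The effect of 0-0 matches on sparse data can be seen in a small sketch. The helper names (`match_counts`, `smc`, `jaccard`) and the two example vectors are assumptions chosen for illustration; the code assumes non-empty binary vectors of equal length.

```python
def match_counts(x, y):
    """Count the four match types between two equal-length binary vectors."""
    n11 = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)  # present in both
    n10 = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)  # present in x only
    n01 = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)  # present in y only
    n00 = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)  # absent in both
    return n11, n10, n01, n00


def smc(x, y):
    """Simple matching coefficient: all matches over all attributes."""
    n11, n10, n01, n00 = match_counts(x, y)
    return (n11 + n00) / (n11 + n10 + n01 + n00)


def jaccard(x, y):
    """Jaccard coefficient: shared presences, ignoring 0-0 matches."""
    n11, n10, n01, _ = match_counts(x, y)
    denom = n11 + n10 + n01
    # Assumed convention: two all-zero vectors count as identical.
    return 1.0 if denom == 0 else n11 / denom


# Two sparse vectors that share no attribute at all.
x = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]

print(smc(x, y))      # 0.7, driven entirely by the seven 0-0 matches
print(jaccard(x, y))  # 0.0, since no attribute is shared
```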

Chemical structures are often characterized by binary vectors in which each vector component (with value 0 or 1) indicates the absence or presence of a certain substructure (binary substructure descriptors). An appropriate and widely used similarity measure for such binary vectors is the Tanimoto index (Willett 1987), also called the Jaccard similarity coefficient (Vandeginste et al. 1998). Let xA and xB be binary vectors with m components for two chemical structures A and B, respectively. The Tanimoto index fAB is given by [Pg.269]
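The formula is missing from this excerpt. A minimal sketch, assuming the usual definition for binary vectors (the number of components set in both vectors divided by the number set in at least one), is given below. Packing each descriptor vector into a Python integer, the function name `tanimoto_bits`, and the two example fingerprints are illustrative assumptions.

```python
def tanimoto_bits(fp_a: int, fp_b: int) -> float:
    """Tanimoto index of two binary descriptors stored as integer bitsets."""
    both = bin(fp_a & fp_b).count("1")    # substructures present in A and B
    either = bin(fp_a | fp_b).count("1")  # substructures present in A or B
    # Assumed convention: two all-zero descriptors count as identical.
    return 1.0 if either == 0 else both / either


# Hypothetical 8-bit substructure descriptors for structures A and B.
fp_a = 0b10110010
fp_b = 0b10010110

print(tanimoto_bits(fp_a, fp_b))  # 3 shared bits / 5 bits set in total = 0.6
```

Bitwise AND and OR keep the calculation cheap, which is consistent with the index being favored for large collections of structures.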

Consequently, we can construct a similarity measure intuitively in the following way: all matches, c + d, relative to all possibilities, i.e., matches plus mismatches, (c + d) + (a + b), yields (c + d) / (a + b + c + d), which is called the simple matching coefficient [18]; equal weight is given to matches and mismatches. (Normalized similarity measures are called similarity indices or coefficients; see, e.g., Ref. [19].) When absence of a feature in both objects is deemed to convey no information, then d should not occur in a similarity measure. Omitting d from the above similarity measure, one obtains the Tanimoto (alias Jaccard) similarity measure (Eq. (8); see Ref. [16] and the citations therein) [Pg.304]
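Written out, the two measures described in this passage can be compared side by side. The symbols S_SM and S_T are notation introduced here for illustration; a and b are read as the two kinds of mismatch (feature present in only one object), c as joint presence, and d as joint absence, as the passage implies.

```latex
S_{\mathrm{SM}} = \frac{c + d}{a + b + c + d},
\qquad
S_{\mathrm{T}} = \frac{c}{a + b + c}

% Worked example with assumed counts a = 1, b = 2, c = 3, d = 14:
S_{\mathrm{SM}} = \frac{3 + 14}{1 + 2 + 3 + 14} = \frac{17}{20} = 0.85,
\qquad
S_{\mathrm{T}} = \frac{3}{1 + 2 + 3} = \frac{3}{6} = 0.5
```

With many jointly absent features, the simple matching coefficient is pulled toward 1, whereas the Tanimoto measure reflects only the features actually observed in at least one object.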

© 2019 chempedia.info