Jaccard similarity coefficient

The Jaccard similarity coefficient is then computed with eq. (30.13), where m is now the number of attributes for which one of the two objects has a value of 1. This similarity measure is sometimes called the Tanimoto similarity. The Tanimoto similarity has been used in combinatorial chemistry to describe the similarity of compounds, e.g. based on the functional groups they have in common [9]. Unfortunately, the names of similarity coefficients are not standard, so that it can happen that the same name is given to different similarity measures or more than one name is given to a certain similarity measure. This is the case for the Tanimoto coefficient (see further). [Pg.65]

Chemical structures are often characterized by binary vectors in which each vector component (with value 0 or 1) indicates the absence or presence of a certain substructure (binary substructure descriptors). An appropriate and widely used similarity measure for such binary vectors is the Tanimoto index (Willett 1987), also called the Jaccard similarity coefficient (Vandeginste et al. 1998). Let xA and xB be binary vectors with m components for two chemical structures A and B, respectively. The Tanimoto index fAB is given by... [Pg.269]
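The excerpt's own formula is cut off, so the following is only a minimal sketch of the commonly stated definition for binary vectors: the number of components that are 1 in both vectors, divided by the number that are 1 in at least one of them. The function name `tanimoto`, the example vectors, and the convention for two all-zero vectors are assumptions made for illustration.

```python
def tanimoto(x_a, x_b):
    """Tanimoto (Jaccard) index of two equal-length binary vectors."""
    if len(x_a) != len(x_b):
        raise ValueError("vectors must have the same number of components")
    # Components set to 1 in both vectors (shared substructures).
    both = sum(1 for a, b in zip(x_a, x_b) if a == 1 and b == 1)
    # Components set to 1 in at least one vector.
    either = sum(1 for a, b in zip(x_a, x_b) if a == 1 or b == 1)
    # Assumed convention: two all-zero vectors count as identical.
    return both / either if either else 1.0


x_a = [1, 1, 0, 1, 0, 0]  # hypothetical substructure descriptor of structure A
x_b = [1, 0, 0, 1, 1, 0]  # hypothetical substructure descriptor of structure B
print(tanimoto(x_a, x_b))  # 2 shared / 4 present in either = 0.5
```

Components that are 0 in both vectors do not enter the calculation at all.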

The Tanimoto coefficient or Jaccard similarity coefficient is a statistic used for comparing the similarity and dissimilarity of structures [74]. It is one of the most commonly used similarity coefficients in chemoinformatics, because it allows rapid calculation due to its simple nature and absence of complex mathematical operators. In general, the complement of the Tanimoto coefficient does not follow the triangular inequality. The Tanimoto coefficient is calculated as follows ... [Pg.53]

Consequently, we can construct a similarity measure intuitively in the following way: all matches c + d relative to all possibilities, i.e., matches plus mismatches (c + d) + (a + b), yields (c + d) / (a + b + c + d), which is called the simple matching coefficient [18], and equal weight is given to matches and mismatches. (Normalized similarity measures are called similarity indices or coefficients; see, e.g., Ref. [19].) When absence of a feature in both objects is deemed to convey no information, then d should not occur in a similarity measure. Omitting d from the above similarity measure, one obtains the Tanimoto (alias Jaccard) similarity measure (Eq. (8); see Ref. [16] and the citations therein) ... [Pg.304]
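Written out in the excerpt's notation, the two measures can be placed side by side. This assumes that c counts features present in both objects, d counts features absent from both, and a and b count the two kinds of mismatch; the symbols S_SM and S_T are labels introduced here, and the second expression is what omitting d yields, not a quotation of the elided Eq. (8).

```latex
S_{\mathrm{SM}} = \frac{c + d}{a + b + c + d},
\qquad
S_{\mathrm{T}} = \frac{c}{a + b + c}
```

For example, with a = 1, b = 1, c = 2, d = 6, the simple matching coefficient is 8/10 = 0.8, while the Tanimoto measure is 2/4 = 0.5.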

Tanimoto similarity coefficient Also known as the Jaccard c. E =1 W/B ... [Pg.693]

A weighted version of the Jaccard-Tanimoto association measure is the Tversky similarity coefficient, given as [Tversky, 1977]... [Pg.698]
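The excerpt's formula is elided, so the sketch below uses the form of the Tversky index as it is usually stated: shared features divided by shared features plus weighted counts of the features unique to each object. The function name `tversky`, the weights `alpha` and `beta`, and the feature sets are illustrative assumptions.

```python
def tversky(a, b, alpha=1.0, beta=1.0):
    """Tversky index of two feature collections, treated as sets."""
    a, b = set(a), set(b)
    common = len(a & b)   # features present in both objects
    only_a = len(a - b)   # features present only in the first object
    only_b = len(b - a)   # features present only in the second object
    denom = common + alpha * only_a + beta * only_b
    # Assumed convention: a zero denominator (e.g. two empty sets) gives 1.0.
    return common / denom if denom else 1.0


# Hypothetical functional-group sets for two compounds.
features_a = {"hydroxyl", "carbonyl", "amine"}
features_b = {"hydroxyl", "carbonyl", "phenyl", "nitro"}

print(tversky(features_a, features_b))            # alpha = beta = 1: 2 / (2 + 1 + 2) = 0.4
print(tversky(features_a, features_b, 0.5, 0.5))  # alpha = beta = 0.5: 2 / 3.5, about 0.571
```

With alpha = beta = 1 the index reduces to the Jaccard-Tanimoto measure, and with alpha = beta = 0.5 it reduces to the Dice coefficient; unequal weights make the measure asymmetric.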

However, the limitation of this measure is that it accounts for both presences and absences equally (Tan et al., 2006). For sparse data, the similarity result from SMC will be dominated by the number of 0-0 matches. As a result, the Jaccard coefficient [Eq. (5.6)] was proposed to calculate the similarity coefficient based on the number of shared attributes while ignoring 0-0 matches ... [Pg.94]
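The effect of 0-0 matches on sparse data can be illustrated with a small numerical sketch. The helper names (`match_counts`, `smc`, `jaccard`) and the two 100-component vectors are made up for this example, and the vectors are assumed to be non-empty and of equal length.

```python
def match_counts(x, y):
    """Return (1-1 matches, 0-0 matches, mismatches) for two binary vectors."""
    n11 = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    n00 = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    mismatches = len(x) - n11 - n00
    return n11, n00, mismatches


def smc(x, y):
    """Simple matching coefficient: all matches over all attributes."""
    n11, n00, mismatches = match_counts(x, y)
    return (n11 + n00) / (n11 + n00 + mismatches)


def jaccard(x, y):
    """Jaccard coefficient: 1-1 matches over attributes set in either vector."""
    n11, _, mismatches = match_counts(x, y)
    denom = n11 + mismatches
    # Assumed convention: two all-zero vectors count as identical.
    return n11 / denom if denom else 1.0


# Two sparse vectors: only the first few of 100 attributes are ever set.
x = [1, 1, 0, 0] + [0] * 96
y = [0, 1, 1, 0] + [0] * 96

print(smc(x, y))      # (1 + 97) / 100 = 0.98
print(jaccard(x, y))  # 1 / (1 + 2), about 0.333
```

The two objects share only one of the three attributes that either of them has, yet the SMC is close to 1 because 97 of the 100 positions are 0-0 matches.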

There are many different types of similarity indexes, including the association coefficients (e.g., Tanimoto coefficient [27], Jaccard coefficient [38], Hodgkin-Richards coefficient [39,40]), the correlation coefficients or cosine-like indexes, and the distance coefficients or dissimilarity indexes (e.g., Hamming distance) [26],... [Pg.765]
