Jaccard coefficient

For binary descriptors, the most commonly used distance function is the Tanimoto (or Jaccard) coefficient given in (5). Here x and y are two binary sets (encoded molecules). AND is the bitwise and operation (a bit in the result is set if both corresponding bits in the two operands are set), and IOR is the bitwise inclusive or operation (a bit in the result is set if either of the corresponding bits in the two operands is set). The result, T, is a measure of the number of features shared by the two molecules relative to the ones they could have in common. [Pg.139]
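Equation (5) is not reproduced in this excerpt. The following is a minimal sketch, assuming it has the usual form T = N(x AND y) / N(x IOR y), where N counts the set bits. The function name `tanimoto_bits`, the example fingerprints, and the handling of two empty fingerprints are illustrative assumptions, not taken from the source.

```python
def tanimoto_bits(x: int, y: int) -> float:
    """Tanimoto coefficient of two bit strings stored as Python integers."""
    common = bin(x & y).count("1")  # bits set in both operands (AND)
    either = bin(x | y).count("1")  # bits set in at least one operand (inclusive OR)
    if either == 0:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return common / either


# Hypothetical 8-bit fingerprints of two encoded molecules
x = 0b10110100
y = 0b10010110

# 3 bits are shared, 5 bits are set in at least one operand
print(tanimoto_bits(x, y))  # 0.6
```

Storing a fingerprint as a single integer keeps the AND and IOR steps to one operation each, which is one reason the coefficient is cheap to evaluate.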

Derived from the Jaccard coefficient applied to 3D distributed properties in the form ... [Pg.402]

There are many different types of similarity indexes, including the association coefficients (e.g., Tanimoto coefficient [27], Jaccard coefficient [38], Hodgkin-Richards coefficient [39,40]), the correlation coefficients or cosine-like indexes, and the distance coefficients or dissimilarity indexes (e.g., Hamming distance) [26], ... [Pg.765]

Tanimoto similarity coefficient. Also known as the Jaccard coefficient. Its complement equals the Soergel distance for dichotomous data ... [Pg.677]

However, the limitation of this measure is that it accounts for both presences and absences equally (Tan et al., 2006). For sparse data, the similarity result from the simple matching coefficient (SMC) will be dominated by the number of 0-0 matches. As a result, the Jaccard coefficient [Eq. (5.6)] was proposed to calculate the similarity coefficient based on the number of shared attributes while ignoring 0-0 matches ... [Pg.94]
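The effect of 0-0 matches on sparse data can be illustrated with a short sketch. Eq. (5.6) is not shown in this excerpt, so the code assumes the standard definitions: the SMC is the fraction of positions where the two vectors agree, and the Jaccard coefficient is the number of 1-1 matches divided by the number of positions where at least one vector holds a 1. The function names and the example vectors are hypothetical.

```python
def smc(u, v):
    """Simple matching coefficient: fraction of positions where u and v agree."""
    matches = sum(1 for p, q in zip(u, v) if p == q)
    return matches / len(u)


def jaccard(u, v):
    """Jaccard coefficient: 1-1 matches over positions where at least one value is 1."""
    both = sum(1 for p, q in zip(u, v) if p == 1 and q == 1)
    either = sum(1 for p, q in zip(u, v) if p == 1 or q == 1)
    return both / either if either else 1.0  # assumed convention for all-zero input


# Two sparse vectors of length 1000, each with three attributes, one of them shared
u = [0] * 1000
v = [0] * 1000
u[0] = u[1] = u[2] = 1
v[2] = v[3] = v[4] = 1

print(smc(u, v))      # 0.996, driven almost entirely by the 995 shared zeros
print(jaccard(u, v))  # 0.2, one shared attribute out of five present overall
```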

Figure 30.1 Cluster analysis on presence-absence of genera: UPGMA, Jaccard coefficient used. Data from Potter and Boucot (1992, table I) and from the Bardahessiagh Formation (Candela, 2000); see Potter and Boucot (1992) for key to the assemblage names.

Figure 30.5 Q-mode (collections) and R-mode (genera) cluster analysis, using the Jaccard coefficient, of abundance data from Patzkowsky's (1995) field data and field data from the Mitchell Siltstone Member (Candela, 2000); key for taxa as on Fig. 30.2.

Figure 36.2 Dendrograms of Sakmarian and Roadian-Wordian OGUs derived from UPGMA, respectively based on the Simpson index and on the Jaccard coefficient. The cophenetic correlation value of the Sakmarian dendrogram is 0.828, whereas that of the Roadian-Wordian dendrogram is 0.899. Dissimilarity values are reported on the scale bars.

Consequently, we can construct a similarity measure intuitively in the following way: all matches, c + d, relative to all possibilities, i.e., matches plus mismatches, (c + d) + (a + b), yields (c + d) / (a + b + c + d), which is called the simple matching coefficient [18]; equal weight is given to matches and mismatches. (Normalized similarity measures are called similarity indices or coefficients; see, e.g., Ref. [19].) When absence of a feature in both objects is deemed to convey no information, then d should not occur in a similarity measure. Omitting d from the above similarity measure, one obtains the Tanimoto (alias Jaccard) similarity measure (Eq. (8); see Ref. [16] and the citations therein) ... [Pg.304]
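The construction from the four counts can be sketched as follows. Eq. (8) is not reproduced in this excerpt; the code assumes that dropping d from the simple matching coefficient gives c / (a + b + c). The excerpt only states that a and b count mismatches, so assigning a to "present only in the first object" and b to "present only in the second" is an assumption, as are the function names and example vectors.

```python
def contingency(u, v):
    """Return the counts (a, b, c, d) for two equal-length 0/1 sequences.

    a: feature present only in u (mismatch, assumed assignment)
    b: feature present only in v (mismatch, assumed assignment)
    c: feature present in both (match)
    d: feature absent in both (match)
    """
    a = sum(1 for p, q in zip(u, v) if p == 1 and q == 0)
    b = sum(1 for p, q in zip(u, v) if p == 0 and q == 1)
    c = sum(1 for p, q in zip(u, v) if p == 1 and q == 1)
    d = sum(1 for p, q in zip(u, v) if p == 0 and q == 0)
    return a, b, c, d


def simple_matching(a, b, c, d):
    """All matches relative to matches plus mismatches."""
    return (c + d) / (a + b + c + d)


def tanimoto(a, b, c, d):
    """Same construction with the joint absences d left out (d is ignored)."""
    return c / (a + b + c)


u = [1, 1, 0, 1, 0, 0, 0, 0]
v = [1, 0, 1, 1, 0, 0, 0, 0]

counts = contingency(u, v)
print(counts)                    # (1, 1, 2, 4)
print(simple_matching(*counts))  # 0.75
print(tanimoto(*counts))         # 0.5
```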

The Jaccard similarity coefficient is then computed with eq. (30.13), where m is now the number of attributes for which one of the two objects has a value of 1. This similarity measure is sometimes called the Tanimoto similarity. The Tanimoto similarity has been used in combinatorial chemistry to describe the similarity of compounds, e.g. based on the functional groups they have in common [9]. Unfortunately, the names of similarity coefficients are not standard, so that it can happen that the same name is given to different similarity measures or more than one name is given to a certain similarity measure. This is the case for the Tanimoto coefficient (see further). [Pg.65]

Chemical structures are often characterized by binary vectors in which each vector component (with value 0 or 1) indicates absence or presence of a certain substructure (binary substructure descriptors). An appropriate and widely used similarity measure for such binary vectors is the Tanimoto index (Willett 1987), also called Jaccard similarity coefficient (Vandeginste et al. 1998). Let xA and xB be binary vectors with m components for two chemical structures A and B, respectively. The Tanimoto index fAB is given by... [Pg.269]

A weighted version of the Jaccard-Tanimoto association measure is the Tversky similarity coefficient, given as [Tversky, 1977]... [Pg.698]
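The formula itself is elided in this excerpt. As a hedged sketch, the code below uses the commonly cited form of the Tversky index, |A ∩ B| / (|A ∩ B| + α|A − B| + β|B − A|), which may differ in notation from the one in the source. The function name, the weights, and the feature sets are hypothetical.

```python
def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Tversky index of two feature sets with weights alpha and beta."""
    common = len(a & b)   # features shared by both sets
    only_a = len(a - b)   # features found only in a
    only_b = len(b - a)   # features found only in b
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 1.0  # assumed convention for empty input


# Hypothetical functional-group sets for two compounds
mol_a = {"OH", "C=O", "NH2", "phenyl"}
mol_b = {"OH", "C=O", "Cl"}

print(tversky(mol_a, mol_b, 1.0, 1.0))  # alpha = beta = 1 reduces to Jaccard/Tanimoto
print(tversky(mol_a, mol_b, 0.5, 0.5))  # alpha = beta = 0.5 reduces to the Dice coefficient
print(tversky(mol_a, mol_b, 1.0, 0.0))  # unequal weights make the measure asymmetric
print(tversky(mol_b, mol_a, 1.0, 0.0))
```

With unequal weights the result depends on which object is treated as the reference, which is the point of the weighting.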

It must be noted that, comparing distances for binary and continuous variables, the Hamming distance coincides with the Manhattan distance, square root Hamming distance is the Euclidean distance, Tanimoto distance coincides with average Manhattan distance and squared Tanimoto with the average Euclidean distance. Moreover, the Watson nonmetric distance corresponds to the Lance-Williams distance and is the complement of the Sorenson coefficient; the Soergel binary distance corresponds to the Soergel distance and is the complement of the Jaccard/Tanimoto coefficient. [Pg.700]
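Three of these correspondences can be checked numerically on 0/1 vectors. This is a sketch under assumed textbook definitions (Soergel distance as the sum of absolute differences divided by the sum of element-wise maxima, Jaccard coefficient as the sum of minima divided by the sum of maxima); the function names and example vectors are hypothetical, and the remaining correspondences in the excerpt are not covered.

```python
import math


def hamming(u, v):
    """Number of positions where the two vectors differ."""
    return sum(1 for p, q in zip(u, v) if p != q)


def manhattan(u, v):
    """Sum of absolute differences."""
    return sum(abs(p - q) for p, q in zip(u, v))


def euclidean(u, v):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


def soergel(u, v):
    """Soergel distance: absolute differences over element-wise maxima."""
    return sum(abs(p - q) for p, q in zip(u, v)) / sum(max(p, q) for p, q in zip(u, v))


def jaccard(u, v):
    """Jaccard/Tanimoto coefficient: element-wise minima over element-wise maxima."""
    return sum(min(p, q) for p, q in zip(u, v)) / sum(max(p, q) for p, q in zip(u, v))


u = [1, 1, 0, 1, 0, 0]
v = [1, 0, 1, 1, 0, 1]

print(hamming(u, v), manhattan(u, v))  # 3 3
print(math.isclose(euclidean(u, v), math.sqrt(hamming(u, v))))  # True
print(math.isclose(soergel(u, v), 1 - jaccard(u, v)))           # True
```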

Figure 8. Concentration dependence of the distribution coefficient for NH3, HF, and NH4F after Jaccard and Levi (84). Two values, for 2.5 X lO M HF and NH4F, respectively, measured by G. W. Gross, are shown for comparison. The one for NH4F lies tolerably close to the extension of Jaccard and Levi's curve; the HF point is about three orders of magnitude lower. See text...

Distribution Coefficient Constant. Investigations by Jaccard and Levi (84) and by G. W. Gross (68) indicate that k is more or less strongly concentration-dependent (Figure 8). Since D is also dependent on the concentration, the effects cannot be separated. At sufficiently high freezing rates, however, a quasi-steady interface concentration is established quickly so that k and D may be considered as changing slowly. [Pg.49]

The Tanimoto coefficient or Jaccard similarity coefficient is a statistic used for comparing the similarity and dissimilarity of structures [74]. It is one of the most commonly used similarity coefficients in chemoinformatics, because it allows rapid calculation due to its simple nature and absence of complex mathematical operators. In general, the complement of the Tanimoto coefficient does not follow the triangular inequality. The Tanimoto coefficient is calculated as follows ... [Pg.53]

Two distance measures were used in this study, based on the work of Holliday et al. [29]: Euclidean and Soergel. The Soergel distance is the complement of the Tanimoto (or Jaccard) association coefficient [30, 31]. [Pg.148]
