Tanimoto Dissimilarity

Sub-structure diversity is most easily defined using metrics such as the Tanimoto Dissimilarity Index. These metrics are based on linear bitmaps (fingerprints) generated from the molecular fragments or compound sub-structures (Figure 3). This approach has been developed extensively by Daylight Chemical Information Systems. ... [Pg.119]

Figure 9.2 shows plots of the structural dissimilarities, ΔS, and biological activity differences, ΔA, between the seeds and the array members for three arrays in Project A. The shape of an array plot depends on (1) the choice of the descriptor/fingerprint and (2) the choice of the seed compound used for the dissimilarity and activity difference comparisons. In this case, pairwise structural dissimilarities were calculated using Tanimoto dissimilarity and MolPrint 2D fingerprints [14] using Pipeline Pilot [15], and the corresponding activity differences are in the primary assay pIC values. [Pg.182]

Since the denominators, which normalize the similarity and dissimilarity values, in Eqs. (1.8) and (1.21), respectively, are the same for both coefficients, it is their numerators that provide the interpretation for these coefficients. In the case of Tanimoto similarity, the numerator, N_{i,j}, gives the number of features in common to both molecules, while the numerator for Tanimoto dissimilarity gives the number of features unique to M_i, N_i - N_{i,j}, and the number of features unique to M_j, ... [Pg.14]

It can also be shown that Tanimoto dissimilarity formally satisfies the three properties of an abstract distance [59]. In fact, the numerator is identical to the Hamming distance between two finite, classical sets [60], and the denominator ensures that the dissimilarity values lie between 0 and 1, as required by Eq. (1.20). [Pg.14]
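The relationship described above can be illustrated with a minimal Python sketch. It assumes fingerprints are represented as sets of feature indices (a representation chosen here for illustration, not taken from the source), and the function name and the handling of two empty fingerprints are likewise assumptions.

```python
def tanimoto_dissimilarity(fp_a, fp_b):
    """Hamming distance between two feature sets divided by the size of their union."""
    union = fp_a | fp_b
    if not union:
        # Two empty fingerprints are treated as identical here.
        # This is a convention assumed for the sketch, not taken from the source.
        return 0.0
    hamming = len(fp_a ^ fp_b)  # features present in exactly one of the two sets
    return hamming / len(union)


# Toy fingerprints (hypothetical feature indices).
a = {1, 2, 3, 4}
b = {3, 4, 5}
d = tanimoto_dissimilarity(a, b)  # 3 differing features out of 5 in the union
```

Because the numerator can never exceed the denominator, every value returned falls between 0 (identical feature sets) and 1 (no shared features).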

Vector-based dissimilarity coefficients can also be defined in analogy to those given in general for FP-based dissimilarities in Eq. (1.19). Tanimoto dissimilarities are... [Pg.21]

Since the set of cells in a cell-based CS is analogous to binary structural FPs, other similarity measures such as those based on the Tanimoto or Dice similarity coefficients given in Eqs. (1.8) and (1.9) can be used. Alternatively, the corresponding dissimilarity coefficients given in Eqs. (1.21) and (1.22) also can be used. As noted in Sect. 1.2.1.3, the numerator of the Tanimoto dissimilarity coefficient is just the Hamming distance, which is a measure of the number of differences between the two DB FPs. [Pg.58]

We should mention here that using just similarity or dissimilarity in a similarity measure might be misleading. Therefore, some composite measures using both similarity and dissimilarity have been developed. These are the Hamann and the Yule measures (Table 6-2). A simple product of (1 - Tanimoto) and squared Eucli-... [Pg.304]

Asymmetry in a similarity measure is the result of asymmetrical weighting of a dissimilarity component: multiplication is commutative by definition, difference is not. By weighting a and b, one obtains asymmetric similarity measures, including the Tversky similarity measure c/(αa + βb + c), where α and β are user-defined constants. The Tversky measure can be regarded as a generalization of the Tanimoto and Dice similarity measures; like them, it does not consider the absence matches d. A particular case is c/(a + c), which measures the number of common features relative to all the features present in A, and gives zero weight to b. [Pg.308]
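A short Python sketch may make the Tversky measure concrete. It assumes that a and b count the features present only in A and only in B, that c counts the features common to both, and that fingerprints are sets of feature indices. The function name and the zero-denominator convention are assumptions made for illustration.

```python
def tversky(fp_a, fp_b, alpha, beta):
    """Tversky similarity c / (alpha*a + beta*b + c) for two feature sets."""
    a = len(fp_a - fp_b)  # features only in A
    b = len(fp_b - fp_a)  # features only in B
    c = len(fp_a & fp_b)  # features common to A and B
    denom = alpha * a + beta * b + c
    # Returning 0.0 for an empty denominator is a convention assumed here.
    return c / denom if denom else 0.0


A = {1, 2, 3, 4}
B = {3, 4, 5}

tanimoto_like = tversky(A, B, 1.0, 1.0)  # alpha = beta = 1 gives c / (a + b + c)
dice_like = tversky(A, B, 0.5, 0.5)      # alpha = beta = 0.5 gives 2c / (a + b + 2c)
a_side = tversky(A, B, 1.0, 0.0)         # c / (a + c): zero weight on b
b_side = tversky(B, A, 1.0, 0.0)         # swapping the arguments changes the value
```

With unequal weights the result depends on argument order, which is the asymmetry the passage refers to.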

Another important feature of the Tanimoto coefficient when used with bitstring data is that small molecules, which tend to have fewer bits set, will have only a small number of bits in common and so can tend to give inherently low similarity values. This can be important when selecting dissimilar compounds, as a bias towards small molecules can result. [Pg.693]
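The size effect can be seen in a toy Python example. The fingerprints below are hypothetical sets of bit positions invented for illustration; the helper function and its treatment of empty fingerprints are assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Number of common bits divided by the number of bits set in either fingerprint."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


small = set(range(3))            # 3 bits set, all of them also set in `large`
large = set(range(30))           # 30 bits set
other_large = set(range(3, 33))  # 30 bits set, 27 shared with `large`

small_vs_large = tanimoto(small, large)        # 3 / 30
large_vs_large = tanimoto(large, other_large)  # 27 / 33
```

Even though every bit of the small fingerprint is matched, its similarity to the large one is low, so a selection that favours low similarity would tend to pick it.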

Clustering is the process of dividing a collection of objects into groups (or clusters) so that the objects within a cluster are highly similar whereas objects in different clusters are dissimilar [41]. When applied to databases of compounds, clustering methods require the calculation of all the pairwise similarities of the compounds with similarity measures such as those described previously, for example, 2D fingerprints and the Tanimoto coefficient. [Pg.200]

Fig. 3. Coverage of chemistry space by four overlapping sublibraries. (A) Different diversity libraries cover similar chemistry space but show little overlap. This shows three libraries chosen using different dissimilarity measures to act as different representations of the available chemistry space. The compounds from these libraries are presented in this representation by first calculating the intermolecular similarity of each of the compounds to all of the other compounds using fingerprint descriptors and the Tanimoto similarity index. Principal component analysis was then conducted on the similarity matrix to reduce it to a series of principal components that allow the chemistry space to be presented in three dimensions.

D-score: Dissimilarity score based on Tanimoto-like similarity among compounds in the set. This will ensure that compounds from different structural classes are prioritized higher. [Pg.115]

The D-score is computed using the maximum dissimilarity algorithm of Lajiness (20). This method utilizes a Tanimoto-like similarity measure defined on a 360-bit fragment descriptor used in conjunction with the Cousin/ChemLink system (21). The important feature of this method is that it starts with the selection of a seed compound with subsequent compounds selected based on the maximum diversity relative to all compounds already selected. Thus, the most obvious seed to use in the current scenario is the compound that has the best profile based on the already computed scores. Thus, one needs to compute a preliminary consensus score based on the Q-score and the B-score using weights as defined previously. To summarize this, one needs to... [Pg.121]
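The seed-then-grow idea can be sketched in Python as follows. This is not the Lajiness implementation or the Cousin/ChemLink descriptor: it assumes toy fingerprints stored as sets, plain Tanimoto dissimilarity, and a "MaxMin" reading of "maximum diversity relative to all compounds already selected" (pick the candidate whose nearest selected neighbour is farthest away). All names are hypothetical.

```python
def tanimoto_dissimilarity(fp_a, fp_b):
    """1 - Tanimoto similarity for two feature sets."""
    union = fp_a | fp_b
    return len(fp_a ^ fp_b) / len(union) if union else 0.0


def max_dissimilarity_selection(fingerprints, seed, n_select):
    """Greedy selection starting from `seed` (an index into `fingerprints`)."""
    selected = [seed]
    remaining = [i for i in range(len(fingerprints)) if i != seed]
    while remaining and len(selected) < n_select:
        # Score each candidate by its distance to the closest selected compound
        # (MaxMin criterion, assumed here), then take the best-scoring candidate.
        best = max(
            remaining,
            key=lambda i: min(
                tanimoto_dissimilarity(fingerprints[i], fingerprints[j])
                for j in selected
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical fingerprints; index 0 plays the role of the seed compound.
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {3, 7, 8, 9}]
picked = max_dissimilarity_selection(fps, seed=0, n_select=3)
```

In this toy set the second pick is the compound sharing no features with the seed, and the third is the one least similar to both compounds already chosen.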

In order to apply the SA protocol, one of the keys is to design a mathematical function that adequately measures the diversity of a subset of selected molecules. Because each molecule is represented by molecular descriptors, geometrically it is mapped to a point in a multidimensional space. The distance between two points, such as Euclidean distance, Tanimoto distance, and Mahalanobis distance, then measures the dissimilarity between any two molecules. Thus, the diversity function to be designed should be based on all pairwise distances between molecules in the subset. One of the functions is as follows ... [Pg.382]

Our approach to selecting a diverse subset is based on utilizing a minimum similarity between each molecule and all other molecules in the virtual library. For the 2-D fingerprints, the similarity is measured by a Tanimoto coefficient [20], which measures similarity on a pair-wise basis. A Tanimoto coefficient for any pair of molecular structures lies in the range of zero (dissimilar) to one (similar). It is defined as the ratio of the number of common bits (in this case molecular fragments) set in two molecules divided by the number of bits set in either. [Pg.229]
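The definition above translates directly into a few lines of Python. This sketch assumes each fingerprint is held as a Python integer whose binary digits are the fingerprint bits; the function name and the value returned for two empty fingerprints are assumptions.

```python
def tanimoto_coefficient(bits_a, bits_b):
    """Common bits divided by bits set in either fingerprint (integers used as bitstrings)."""
    common = bin(bits_a & bits_b).count("1")  # bits set in both molecules
    either = bin(bits_a | bits_b).count("1")  # bits set in at least one molecule
    # Returning 0.0 when no bits are set at all is a convention assumed here.
    return common / either if either else 0.0


# Two hypothetical 6-bit fingerprints.
mol_a = 0b101101
mol_b = 0b100111
similarity = tanimoto_coefficient(mol_a, mol_b)  # 3 common bits, 5 bits set in either
```

Identical fingerprints give 1 and fingerprints with no bits in common give 0, matching the stated range.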

Brown [22], following earlier work by Willett [23], published results indicating that 85% of compounds having a Tanimoto coefficient of 0.85 to any active compound are themselves active. Taylor [24], in a simulation study, adopted a Tanimoto coefficient of 0.80 as a threshold to distinguish similar from dissimilar compounds; this threshold was more recently adopted in related studies by Delaney [25]... [Pg.231]

The similarity, or dissimilarity, of a pair of compounds is quantified using a similarity or distance coefficient that is applied to the descriptor representation of the molecules. As mentioned, the Tanimoto coefficient is commonly used when the molecules are represented by binary bitstrings. The Tanimoto similarity between compounds A and B, S_AB, is ... [Pg.350]

Figure 5.14 Similarity/dissimilarity comparison of two molecules: the Tanimoto index.

Tanimoto index > 0.9 but may be very different in terms of activity (chemically similar, biologically diverse), while completely different structures are known to have the same biological activity (chemically diverse, biologically similar). This intrinsic drawback to the computational screening of virtual libraries should always be considered when interpreting screening results of a computationally designed library, and real data should be used to refine any virtual SAR information based on chemical similarity or dissimilarity. [Pg.189]

Equation 1 Calculation of Tanimoto coefficients for similarity and dissimilarity... [Pg.120]

With the reference compound as the starting point, pair-wise comparison of every compound with the reference structure identifies the most dissimilar compound, which is added to the collection. The software calculates the mean Tanimoto coefficient from the growing diverse collection (x̄ = Σx/n, the sum of all coefficients divided by the number of compounds in the growing collection), and the compound with the coefficient furthest from that mean is added. This process is repeated until the collection either reaches the desired size or no further compounds can be added without, for example, exceeding some lower threshold for the difference in coefficients. [Pg.120]
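The passage leaves several details open, so the following Python sketch is only one possible reading of it. It assumes that each compound's coefficient x is its Tanimoto similarity to the reference, that the mean is taken over the coefficients of the compounds already collected (including the reference itself), and that `min_gap` is the lower threshold on the difference from the mean. The function names, parameters, and toy fingerprints are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Common features divided by features present in either set."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


def grow_collection(fingerprints, reference, size, min_gap=0.0):
    """Grow a collection from `reference` by picking coefficients far from the running mean."""
    # Coefficient of every compound relative to the reference (assumed reading).
    coeff = [tanimoto(fp, fingerprints[reference]) for fp in fingerprints]
    selected = [reference]
    remaining = [i for i in range(len(fingerprints)) if i != reference]

    # First addition: the compound most dissimilar to the reference.
    if remaining and size > 1:
        first = min(remaining, key=lambda i: coeff[i])
        selected.append(first)
        remaining.remove(first)

    while remaining and len(selected) < size:
        mean = sum(coeff[i] for i in selected) / len(selected)  # mean = sum(x) / n
        best = max(remaining, key=lambda i: abs(coeff[i] - mean))
        if abs(coeff[best] - mean) < min_gap:
            break  # no candidate is far enough from the mean
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical fingerprints; index 0 is the reference compound.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6, 7}, {8, 9}, {1, 9, 10, 11}]
by_size = grow_collection(fps, reference=0, size=4)
by_threshold = grow_collection(fps, reference=0, size=5, min_gap=0.3)
```

The two calls show the two stopping conditions: the first ends when the requested size is reached, the second when no remaining compound differs from the mean by at least `min_gap`.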

There are many different types of similarity indexes, including the association coefficients (e.g., Tanimoto coefficient [27], Jaccard coefficient [38], Hodgkin-Richards coefficient [39,40]), the correlation coefficients or cosine-like indexes, and the distance coefficients or dissimilarity indexes (e.g., Hamming distance) [26]... [Pg.765]

Big Chemical Encyclopedia
