Tanimoto metric

Although for binary data other distance metrics are in general more appropriate (e.g., Tanimoto metrics), for simplicity we can compute the standardized (to the mean and standard deviation of the distribution) Pearson correlation matrix, which contains the correlation coefficients between each of the five assays. These data can then be used to cluster the chemicals based on their correlation as a metric of similarity. The groupings depicted in Fig. 6-14(b)... [Pg.332]

The pragmatic beauty of the chemical fingerprint is that the more features two molecules have in common, the more bits they have set in common. The mathematical approach used to translate the fingerprint comparison data into a measure of similarity tunes the molecular comparison [5]. The Tanimoto similarity index works well when a relatively sparse fingerprint is used and when the molecules to be compared are broadly comparable in size and complexity [5]. If the nature of the molecules or the comparison desired is not adequately met by the Tanimoto index, multiple other indices are available to the researcher. For example, the Daylight software offers the user over ten similarity metrics, and the Pipeline Pilot as distributed offers at least three. Some of these metrics (e.g., Tversky, Cosine) offer better behavior if the query molecule is significantly smaller than the molecule compared to it. [Pg.94]
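The effect of query size on these indices can be sketched in a few lines. This is a minimal illustration, not the Daylight or Pipeline Pilot implementation: fingerprints are modeled as Python sets of on-bit positions, the function names and the example bit positions are hypothetical, and returning 0.0 for two empty fingerprints is an assumed convention.

```python
import math


def tanimoto(a: set, b: set) -> float:
    """Bits set in both fingerprints divided by bits set in either."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # assumed convention for empty input


def cosine(a: set, b: set) -> float:
    """Common bits divided by the geometric mean of the two bit counts."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def tversky(a: set, b: set, alpha: float = 1.0, beta: float = 0.0) -> float:
    """Asymmetric index: alpha weights bits unique to a, beta bits unique to b."""
    common = len(a & b)
    denom = alpha * len(a - b) + beta * len(b - a) + common
    return common / denom if denom else 0.0


# Hypothetical fingerprints: a small query fully contained in a larger target.
query = {1, 4, 7}
target = {1, 4, 7, 9, 12, 15, 20, 22, 31}

print(round(tanimoto(query, target), 3))  # 0.333
print(round(cosine(query, target), 3))    # 0.577
print(tversky(query, target))             # 1.0
```

With alpha = 1 and beta = 0, the Tversky index ignores bits that occur only in the larger molecule, so a small query contained in a large target still scores 1.0 while its Tanimoto score stays low.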

Several other approaches with the goal of simultaneous optimization of several criteria have been reported. One such approach is the generation of a library that is both focused and diverse via the dual fingerprint metric described by Bajorath [94]. In this method, individual compounds are randomly generated and their similarity to a known inhibitor is evaluated by comparison of their minifingerprints [95] using the Tanimoto coefficient. Those molecules that are above a similarity threshold are then... [Pg.184]

Distances in these spaces should be based upon an L1 or city-block metric (see Eq. 2.18) and not the L2 or Euclidean metric typically used in many applications. The reasons for this are the same as those discussed in Subheading 2.2.1 for binary vectors. Set-based similarity measures can be adapted from those based on bit vectors using an ansatz borrowed from fuzzy set theory (41,42). For example, the Tanimoto similarity coefficient becomes... [Pg.17]

One or more lead molecules may be used as a focusing target. Similarity metrics include Daylight fingerprint Tanimoto similarity. The penalty score for each compound in the library is defined as the distance between it and the most similar lead molecule. The penalty score for the library is the average of the individual compound penalty scores. QSAR predictions and docking scores can also be used in this term. [Pg.385]
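The penalty term described above can be sketched as follows. This is an illustration under stated assumptions rather than the cited method's code: fingerprints are sets of on-bit positions, the "distance" is taken to be 1 minus the Tanimoto similarity, and all names and bit positions are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Bits set in both fingerprints divided by bits set in either."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def compound_penalty(fp: set, leads: list) -> float:
    """Distance to the most similar lead (assumed here to be 1 - Tanimoto)."""
    return min(1.0 - tanimoto(fp, lead) for lead in leads)


def library_penalty(library: list, leads: list) -> float:
    """Average of the individual compound penalty scores."""
    return sum(compound_penalty(fp, leads) for fp in library) / len(library)


# Hypothetical lead and library fingerprints.
leads = [{1, 2, 3, 4}, {10, 11, 12}]
library = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 20}]

for fp in library:
    print(sorted(fp), round(compound_penalty(fp, leads), 3))
print("library penalty:", round(library_penalty(library, leads), 3))
```

A compound identical to a lead contributes a penalty of zero, so a library penalty close to zero indicates a library tightly focused on the leads.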

In the NN method, the property F of the target compound is calculated as an average (or weighted average) of the values for its nearest neighbors (NN) in the space of descriptors selected for the model. Different metrics (Euclidean distances, Tanimoto similarity coefficients, etc.) can be used to identify the neighbors. Their number k is optimized using a cross-validation procedure for the training set. [Pg.325]
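A minimal sketch of such a nearest-neighbor prediction, assuming binary fingerprints stored as sets of on-bit positions and Tanimoto similarity as the neighbor metric. The function `knn_predict`, the use of the similarity itself as the weight, and the training data are hypothetical choices for illustration; the cross-validation of k is not shown.

```python
def tanimoto(a: set, b: set) -> float:
    """Bits set in both fingerprints divided by bits set in either."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(query: set, training: list, k: int = 3, weighted: bool = False) -> float:
    """Predict a property as the (optionally similarity-weighted) mean over the k nearest neighbors."""
    # Rank training compounds by decreasing similarity to the query and keep the top k.
    neighbors = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    values = [value for _, value in neighbors]
    if weighted:
        weights = [tanimoto(query, fp) for fp, _ in neighbors]
        total = sum(weights)
        if total > 0:
            return sum(w * v for w, v in zip(weights, values)) / total
    # Unweighted mean; also the fallback when every neighbor has zero similarity.
    return sum(values) / len(values)


# Hypothetical training set of (fingerprint, property value) pairs.
training = [({1, 2, 3}, 1.0), ({1, 2, 4}, 2.0), ({7, 8, 9}, 10.0)]
query = {1, 2, 3, 4}

print(knn_predict(query, training, k=2))                 # 1.5
print(knn_predict(query, training, k=2, weighted=True))  # 1.5
```

In practice, k would be chosen by repeating this prediction for held-out training compounds over a range of k values and keeping the value with the lowest error.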

The values of these similarity coefficients range from zero (i.e., no overlap, hence no similarity) to one (i.e., complete overlap, hence identical or very similar molecules). In chemoinformatics, the most widely used metric is the Tanimoto coefficient. [Pg.8]
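For reference, the usual binary form of the Tanimoto coefficient can be written as below. The symbols are assumptions introduced here, not taken from the excerpt: a and b are the numbers of bits set in fingerprints A and B, and c is the number of bits set in both.

```latex
T(A, B) = \frac{c}{a + b - c}, \qquad 0 \le T(A, B) \le 1
```

T is 0 when the fingerprints share no bits (c = 0) and 1 when they are identical (a = b = c).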

For presence or absence of features in the molecules, represented by binary bit strings x and y as descriptors, the Tanimoto coefficient is a popular metric for similarity ... [Pg.82]

Now consider d(a, b) to be a generic distance metric of which Tanimoto, Euclidean, and Mahalanobis are three cases. Then, the distance between molecule a and the set of molecules B is defined as follows,... [Pg.82]

Sub-structure diversity is most easily defined using metrics such as the Tanimoto Dissimilarity Index. These metrics are based on linear bitmaps (fingerprints) generated from the molecular fragments or compound sub-structures (Figure 3). This approach has been developed extensively by Daylight Chemical Information Systems. ... [Pg.119]

With respect to metric properties, the Tanimoto coefficient obeys all four properties if dichotomous variables are used. The complement of the Dice coefficient does not obey the triangle inequality. [Pg.139]
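The triangle-inequality statement can be checked on a small counterexample. The sketch below assumes that each similarity coefficient S is turned into a distance as 1 - S and that fingerprints are sets of on-bit positions. The function names and the three fingerprints are hypothetical.

```python
def tanimoto_distance(a: set, b: set) -> float:
    """Complement of the Tanimoto coefficient, 1 - |A & B| / |A | B|."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0


def dice_distance(a: set, b: set) -> float:
    """Complement of the Dice coefficient, 1 - 2|A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 1.0 - 2.0 * len(a & b) / total if total else 0.0


def violates_triangle(dist, x: set, y: set, z: set, tol: float = 1e-12) -> bool:
    """True if going directly from x to z is longer than the detour via y."""
    return dist(x, z) > dist(x, y) + dist(y, z) + tol


# Hypothetical fingerprints: y overlaps both x and z, while x and z are disjoint.
x, y, z = {1}, {1, 2}, {2}

print(violates_triangle(dice_distance, x, y, z))      # True:  1 > 1/3 + 1/3
print(violates_triangle(tanimoto_distance, x, y, z))  # False: 1 <= 1/2 + 1/2
```

A single counterexample is enough to show that the Dice complement is not a metric. The Tanimoto result here is only consistent with the metric property and does not prove it.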

Analysis of molecular similarity is based on the quantitative determination of the overlap between fingerprints of the query structure and all database members. As descriptors of a given molecule can be considered as a vector of real or binary attributes, most of the similarity measures are derived as vectorial distances. Tanimoto and Cosine coefficients are the most popular measures of similarity. Definitions of similarity metrics are collected in Table 3. [Pg.4017]

The chemicals stored in the inventory can be searched by exact structure, substructure, or similarity [26]. Similarity searching aims at retrieving compounds that are similar to a query compound by one or more measures of similarity. A set of structural features of the target molecule is compared with those of each chemical in the database, generating a similarity measure by a chosen metric such as the Tanimoto coefficient [27]. More details about chemical similarity are given below in relation to the chemical similarity tool. [Pg.761]

Given a table of redundant properties, one could calculate dissimilarities as Euclidean distances and use MDS instead of PCA. Whereas the results would be similar, this is usually wasteful, because the number of molecules is typically much greater than the number of properties. However, if the similarities are best calculated from a nonlinear function of the properties, such as Tanimoto coefficients computed from two bit strings, the results would not be similar and nonlinear MDS should be used. One then gets back a set of latent properties (dimensions) for which Euclidean distance approximates the desired similarities. Thus, a simple rule of thumb is: for redundant properties as input, use PCA; for metric similarities as input, use classical MDS; for nonmetric similarities, use numerically refined MDS. [Pg.79]

Big Chemical Encyclopedia