Tanimoto similarity analysis

Fig. 3. Coverage of chemistry space by four overlapping sublibraries. (A) Different diversity libraries cover similar chemistry space but show little overlap. This shows three libraries chosen using different dissimilarity measures to act as different representations of the available chemistry space. The compounds from these libraries are presented in this representation by first calculating the intermolecular similarity of each of the compounds to all of the other compounds using fingerprint descriptors and the Tanimoto similarity index. Principal component analysis was then conducted on the similarity matrix to reduce it to a series of principal components that allow the chemistry space to be presented in three dimensions.
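The procedure described in this caption (pairwise Tanimoto similarities from fingerprints, followed by PCA on the similarity matrix to obtain three plotting coordinates) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the random binary fingerprints, the function names `tanimoto_matrix` and `pca_project`, and the SVD-based PCA are all assumptions made for the example.

```python
import numpy as np

def tanimoto_matrix(fps: np.ndarray) -> np.ndarray:
    """Pairwise Tanimoto similarity for binary fingerprints (one row per compound)."""
    fps = fps.astype(float)
    common = fps @ fps.T                  # bits set in both fingerprints
    counts = fps.sum(axis=1)              # bits set in each fingerprint
    union = counts[:, None] + counts[None, :] - common
    # Assumption: two all-zero fingerprints (empty union) are treated as identical.
    safe_union = np.where(union > 0, union, 1.0)
    return np.where(union > 0, common / safe_union, 1.0)

def pca_project(matrix: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project the rows of `matrix` onto its first principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Hypothetical data: 20 compounds with 64-bit random fingerprints.
rng = np.random.default_rng(0)
fingerprints = rng.integers(0, 2, size=(20, 64))

similarity = tanimoto_matrix(fingerprints)   # 20 x 20 similarity matrix
coords = pca_project(similarity)             # 20 x 3 coordinates for a 3D plot
```

Each row of `coords` gives one compound's position in the reduced space; compounds from different libraries could then be coloured separately in a 3D scatter plot.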

A principal component analysis (PCA) using simple molecular descriptors showed that the training and test sets overlapped. Focusing on compounds with a Tanimoto similarity greater than 0.7 resulted in a test set of 28 compounds, which had Matthews correlation coefficient and concordance statistics that... [Pg.332]

Fig. 1.9 Depictions of CSs generated from Tanimoto similarity coefficients computed with respect to binary FPs associated with four different types of descriptors: APF, MACCS key, TGD, and piDAPH4. (Adapted from Medina-Franco and Maggiora, Molecular Similarity Analysis [10])...

If the donor and acceptor molecules are unable to rotate in the solvent during the donor fluorescence time, the value of R0 considered above is too large. Steinberg [139] has analysed Förster kinetics in this limit. Allinger and Blumen [153] have developed a more detailed analysis of dipole-dipole energy transfer from excited donors to acceptors in liquids and obtained essentially similar results to those of Yokota and Tanimoto. [Pg.85]

The choice of representation, of similarity measure and of selection method are not independent of each other. For example, some types of similarity measure (specifically the association coefficients as exemplified by the well-known Tanimoto coefficient) seem better suited than others (such as Euclidean distance) to the processing of fingerprint data [12]. Again, the partition-based methods for compound selection that are discussed below can only be used with low-dimensionality representations, thus precluding the use of fingerprint representations (unless some drastic form of dimensionality reduction is performed, as advocated by Agrafiotis [13]). Thus, while this chapter focuses upon selection methods, the reader should keep in mind the representations and the similarity measures that are being used; recent, extended reviews of these two important components of diversity analysis are provided by Brown [14] and by Willett et al. [15]. [Pg.116]

Similarity Search. A type of "fuzzy" structure searching in which molecules are compared with respect to the degree of overlap they share in terms of topological and/or physicochemical properties. Topological descriptors usually consist of substructure keys or fingerprints, in which case a similarity coefficient like the Tanimoto coefficient is computed. In the case of calculated properties, a simple correlation coefficient may be used. The similarity coefficient used in a similarity search can also be used in various types of cluster analysis to group similar structures. [Pg.410]
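A similarity search of the kind defined above can be illustrated with a small sketch that ranks database entries by their Tanimoto coefficient to a query. Fingerprints are represented here as sets of "on" bit positions; the toy database, the names `mol_a` to `mol_c`, the `threshold` and `top_k` parameters, and the function names are hypothetical choices for illustration only.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient c / (a + b - c) for two sets of 'on' bits."""
    common = len(a & b)
    union = len(a) + len(b) - common
    # Assumption: two empty fingerprints are treated as identical.
    return common / union if union else 1.0

def similarity_search(query, database, threshold=0.0, top_k=5):
    """Return up to top_k (name, score) pairs with score >= threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_k]

# Hypothetical fingerprints, given as sets of bit positions that are set.
database = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 3, 9}),
    "mol_c": frozenset({7, 8}),
}
query = frozenset({1, 2, 3, 4})

hits = similarity_search(query, database, threshold=0.5)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

With this toy data, `mol_a` matches the query exactly, `mol_b` shares three of five distinct bits, and `mol_c` shares none and falls below the threshold.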

Analysis of molecular similarity is based on the quantitative determination of the overlap between fingerprints of the query structure and all database members. As descriptors of a given molecule can be considered as a vector of real or binary attributes, most of the similarity measures are derived as vectorial distances. Tanimoto and Cosine coefficients are the most popular measures of similarity. Definitions of similarity metrics are collected in Table 3. [Pg.4017]
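Since Table 3 is not reproduced in this excerpt, the sketch below uses the commonly quoted vector forms of the two coefficients: Tanimoto as x·y / (x·x + y·y − x·y) and Cosine as x·y / (|x| |y|). Whether these match the definitions in the source's table is an assumption, and the function names and example vectors are illustrative only.

```python
import numpy as np

def tanimoto_vec(x, y) -> float:
    """Tanimoto coefficient for real-valued or binary attribute vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dot = x @ y
    denom = x @ x + y @ y - dot
    # Assumption: two all-zero vectors are treated as identical.
    return float(dot / denom) if denom else 1.0

def cosine_vec(x, y) -> float:
    """Cosine coefficient: the cosine of the angle between the two vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    norm = np.linalg.norm(x) * np.linalg.norm(y)
    # Assumption: similarity to an all-zero vector is defined as 0.
    return float(x @ y / norm) if norm else 0.0

# Hypothetical binary fingerprints: 3 bits set in each, 2 bits in common.
x = [1, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1]

print(tanimoto_vec(x, y))  # 2 / (3 + 3 - 2)
print(cosine_vec(x, y))    # 2 / sqrt(3 * 3)
```

For binary vectors the Tanimoto form reduces to the number of shared bits divided by the number of bits set in either fingerprint, which is why the same function covers both the real-valued and binary cases mentioned in the text.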

A third definition, with a number of practical advantages, was suggested by a quantity utilised in statistical analysis. Our Tanimoto-like similarity index takes the form ... [Pg.100]

Molecular similarity. The degree of similarity between molecules, although quantitatively measurable, very much depends on what molecular features are used to establish the degree of similarity. One of the many comparators is the electron density of a pair of molecules. Other comparators include electrostatic potentials, reactivity indices, hydrophobicity potentials, molecular geometry such as distances and angles between key atoms, solvent accessible surface area, etc. It is an open question as to how much or what part(s) of the molecular structure is to be compared. The Tanimoto coefficient, which compares dissimilarity to similarity, is often used in molecular diversity analysis. [Pg.759]

The analysis in the Appendix provides at least some rationale for why the IDF weighting scheme should give better results than a simple binary weighting of the query screens in a similarity searching system based on fragment co-occurrence data. However, the experimental results that we have obtained do not provide unequivocal support for this finding. The most widely used similarity measure is the Binary/Tanimoto combination; this performed well in both sets of analyses and we have accordingly chosen to use this for the more extended studies reported in the remainder of this paper. [Pg.414]
