Binary fingerprints, molecular similarity

A special case of cell-based methods is a diversity measure proposed for binary fingerprints. Unlike continuous descriptors, binary descriptors such as structural keys and hashed fingerprints can be compared using fast binary operations to give rapid estimates of molecular similarity, diversity, and complementarity. The most common example of a diversity measure applied to binary descriptors is the binary union (inclusive OR). This can be exploited in a number of different ways; elegant examples can be found in the following references. ... [Pg.142]
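As a minimal sketch of the binary union idea, the snippet below stores each fingerprint as a Python integer and counts the bits set in the inclusive OR of a subset. The function names and the toy 8-bit fingerprints are hypothetical, and the bit count of the union is only one simple way to turn the union into a diversity score.

```python
def union_bit_count(fingerprints):
    """Count the distinct bits set across a collection of fingerprints."""
    union = 0
    for fp in fingerprints:
        union |= fp  # inclusive OR accumulates every bit seen so far
    return bin(union).count("1")


def new_bits(fingerprints, candidate):
    """Count the bits a candidate would add to the union (complementarity)."""
    union = 0
    for fp in fingerprints:
        union |= fp
    return bin(candidate & ~union).count("1")


# Toy 8-bit fingerprints (hypothetical data, not from the source).
subset_a = [0b00001111, 0b00001110, 0b00000111]  # heavily overlapping
subset_b = [0b00001111, 0b11110000, 0b10101010]  # spread over more bits

coverage_a = union_bit_count(subset_a)
coverage_b = union_bit_count(subset_b)
added_by_candidate = new_bits(subset_a, 0b11000001)
```

Under this scoring, a subset whose union covers more bits is treated as more diverse, and a candidate that contributes many new bits is treated as complementary to the subset.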

Analysis of molecular similarity is based on the quantitative determination of the overlap between the fingerprint of the query structure and those of all database members. Because the descriptors of a given molecule can be treated as a vector of real or binary attributes, most similarity measures are derived as vector distances. The Tanimoto and cosine coefficients are the most popular measures of similarity. Definitions of similarity metrics are collected in Table 3. [Pg.4017]
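Table 3 is not reproduced in this excerpt. For binary fingerprints, the standard forms of the two coefficients named above can be sketched as follows, where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. The database contents and molecule names are hypothetical, and returning 0.0 for an empty fingerprint is a convention assumed here.

```python
import math


def popcount(x):
    """Number of bits set in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(fp1, fp2):
    """Binary Tanimoto coefficient: c / (a + b - c)."""
    c = popcount(fp1 & fp2)
    denom = popcount(fp1) + popcount(fp2) - c
    return c / denom if denom else 0.0  # assumed convention for empty inputs


def cosine(fp1, fp2):
    """Binary cosine coefficient: c / sqrt(a * b)."""
    a, b = popcount(fp1), popcount(fp2)
    if a == 0 or b == 0:
        return 0.0  # assumed convention for empty inputs
    return popcount(fp1 & fp2) / math.sqrt(a * b)


# Hypothetical query and database fingerprints.
query = 0b11110000
database = {"mol_1": 0b11110000, "mol_2": 0b11000011, "mol_3": 0b00001111}

# Rank database members by decreasing Tanimoto similarity to the query.
ranked = sorted(database, key=lambda name: tanimoto(query, database[name]),
                reverse=True)
```

A similarity search then amounts to scoring every database member against the query and keeping the top of the ranked list.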

In addition to binary descriptors, molecular holograms are also useful for similarity searches. Like a fingerprint, a hologram is a vector that contains numerical... [Pg.4017]

Godden, J.W., Xue, L. and Bajorath, J. (2000). Combinatorial Preferences Affect Molecular Similarity/Diversity Calculations Using Binary Fingerprints and Tanimoto Coefficients. J. Chem. Inf. Comput. Sci., 40, 126-134. [Pg.572]

Godden JW, Xue L, Bajorath J. Combinatorial preferences affect molecular similarity/diversity calculations using binary fingerprints and Tanimoto coefficients. J Chem Inf Comput Sci 2000; 40: 163-166. [Pg.394]

The first application of MDS in molecular diversity analysis was introduced by a group at Chiron as a means of reducing the enormous dimensionality of binary chemical descriptors. They found that 2048-bit Daylight fingerprints associated with 721 commercially available primary amines could be reduced to only five dimensions that reproduced all 260,000 original dissimilarities with a standard deviation of only 10%. Similarly, only seven dimensions were required to reduce the 642,000 pairwise similarities among a set of 1133 carboxylic acids and acid chlorides to the same standard deviation. [Pg.150]
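The pair counts quoted above appear to be rounded values of n(n - 1)/2, the number of unordered pairs among n compounds. A short check, assuming that interpretation:

```python
def pair_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


amine_pairs = pair_count(721)   # 259560, about 260,000
acid_pairs = pair_count(1133)   # 641278, close to the quoted 642,000
```

The quadratic growth of this count is what makes a low-dimensional embedding attractive: five or seven coordinates per compound stand in for hundreds of thousands of stored dissimilarities.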

Figure 5.4. Three- and four-point (triplet/quartet) pharmacophore fingerprint creation. Assignment is often binary (on or off), although a count can be kept, and has been used in more recent studies. The large difference in bin numbers between three- and four-point pharmacophores provides additional shape information, thus increasing molecular separation in similarity and diversity studies.

Molecular holograms are thus very similar to hashed fingerprints, but rather than using a binary bit string containing either 0 or 1 in each bin, the bins of molecular holograms contain information about the number of fragments hashed to each bin. [Pg.766]
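The difference between the two representations can be sketched in a few lines. The fragment strings, the bin count, and the use of CRC32 as the hash function are all assumptions made for illustration; real fingerprint and hologram software uses its own fragment enumeration and hashing schemes.

```python
import zlib


def bin_index(fragment, n_bins):
    """Map a fragment string to a bin with a deterministic hash (CRC32 here)."""
    return zlib.crc32(fragment.encode("utf-8")) % n_bins


def hashed_fingerprint(fragments, n_bins):
    """Binary fingerprint: a bin is 1 if any fragment hashes to it."""
    bits = [0] * n_bins
    for fragment in fragments:
        bits[bin_index(fragment, n_bins)] = 1
    return bits


def hologram(fragments, n_bins):
    """Hologram: each bin counts the fragments that hash to it."""
    counts = [0] * n_bins
    for fragment in fragments:
        counts[bin_index(fragment, n_bins)] += 1
    return counts


# Hypothetical fragment list for one molecule; "C-C" occurs three times.
fragments = ["C-C", "C-C", "C-O", "C=O", "C-C"]
fp = hashed_fingerprint(fragments, n_bins=16)
holo = hologram(fragments, n_bins=16)
```

Both vectors have the same nonzero positions; the hologram additionally records that the bin for "C-C" received at least three fragments, which the bit string cannot express.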

Cluster analysis was considered in our discussion of conformational analysis (see Section 9.13); for compound selection one would typically want to select a representative molecule or molecules from each cluster. A practical consideration when deciding which cluster analysis method to use is that, for large numbers of molecules, some algorithms may not be feasible because they require an excessive amount of memory or have a long execution time. Another consideration with cluster analysis (and with some of the other methods that we will discuss) is the need to calculate the distance between each pair of molecules from the vector of descriptors (or from their scaled derivatives or from a set of principal components, if these are being used). For binary descriptors such as molecular fingerprints this distance is often given by 1 − S, where S is the similarity coefficient (Table 12.3). [Pg.682]
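A minimal sketch of the 1 − S distance follows, assuming S is the binary Tanimoto coefficient (Table 12.3 is not reproduced here, and other coefficients could be substituted). The fingerprints are hypothetical integers, and the full square matrix is built only to make the memory cost visible.

```python
def tanimoto(fp1, fp2):
    """Binary Tanimoto similarity between two integer-encoded fingerprints."""
    common = bin(fp1 & fp2).count("1")
    denom = bin(fp1).count("1") + bin(fp2).count("1") - common
    return common / denom if denom else 0.0  # assumed convention for empty inputs


def distance_matrix(fingerprints):
    """Symmetric matrix of pairwise distances d = 1 - S."""
    n = len(fingerprints)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - tanimoto(fingerprints[i], fingerprints[j])
            dist[i][j] = dist[j][i] = d  # fill both triangles
    return dist


# Three toy 4-bit fingerprints (hypothetical data).
fps = [0b1111, 0b0111, 0b1000]
dist = distance_matrix(fps)
```

Both the number of similarity evaluations and the storage grow as the square of the number of molecules, which is the practical limit on clustering algorithms that need the whole matrix in memory.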

As mentioned earlier, there are a number of similarity coefficients and distance metrics. Most of the coefficients can be calculated by two different formulas: one is used for continuous variables, whereas the other is used for binary (dichotomous) variables. Similarity can be better defined when continuous variables are used as descriptors rather than the on/off bits of fingerprints. The descriptors, on the other hand, are basically molecular properties, which have a wide range of values, so they are normalized to the range of zero to one. [Pg.53]
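The two-formula idea can be illustrated with the Tanimoto coefficient, chosen here as an example; the source does not say which coefficients it tabulates. The continuous form uses sums of products, the binary form uses bit counts, and the two agree on 0/1 vectors. The min-max scaling shown is one common way to bring property values into the zero-to-one range, and the sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of property values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def tanimoto_continuous(x, y):
    """Continuous form: sum(xy) / (sum(x^2) + sum(y^2) - sum(xy))."""
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    denom = xx + yy - xy
    return xy / denom if denom else 0.0


def tanimoto_binary(x, y):
    """Binary form: c / (a + b - c), with counts of bits set."""
    c = sum(1 for a, b in zip(x, y) if a and b)
    denom = sum(x) + sum(y) - c
    return c / denom if denom else 0.0


# On 0/1 vectors the two formulas give the same value.
bits_x = [1, 1, 0, 1]
bits_y = [1, 0, 0, 1]
s_binary = tanimoto_binary(bits_x, bits_y)
s_continuous = tanimoto_continuous(bits_x, bits_y)

# Hypothetical property column (for example, a molecular weight) rescaled to [0, 1].
scaled = min_max_scale([100.0, 200.0, 300.0])
```

Scaling each property before computing the coefficient keeps a descriptor with a large numeric range from dominating the sums of products.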

Clearly, Equation 15.5.1 only strictly applies to set-based representations, although closely related asymmetric similarity functions can also be defined for graph-, vector-, and function-based representations (see Table 15.3 and Table 15.4 and the associated discussions in Sections 15.4.2 and 15.4.4). Because most applications employ set-based similarity functions and binary molecular fingerprints, the discussion in this section focuses on this category of MSA. Equivalent analyses can, however, be carried out with respect to other similarity measures (see e.g., [46]). [Pg.366]

Big Chemical Encyclopedia
