Similarity index distance

The technology of proximity indices has been available and in use for some time. There are two general types of proximity indices (Jain and Dubes, 1988) that can be distinguished based on how changes in similarity are reflected. The more closely two patterns resemble each other, the larger their similarity index (e.g., correlation coefficient) and the smaller their dissimilarity index (e.g., Euclidean distance). A proximity index between the ith and jth patterns is denoted by D(i, j) and obeys the following three relations ... [Pg.59]

Fig. 1. An example of two hydrogen-suppressed graphs G1 and G2; a common substructure CS(G1, G2) and the maximum common substructure MCS(G1, G2) are shown above. The Tanimoto similarity index and the distance between the two chemical graphs are computed below.

Any type of selected descriptor will provide a more or less complex characterization of each virtual library component. The use of similarity indices offers a straightforward method to evaluate similarities between virtual compounds. These indices use a bit-string representation for any descriptor (distances, fingerprints, pharmacophores, and so on) and, by simply counting the presence or the absence of specific bins and comparing the bit strings of virtual compounds, provide a numerical similarity index. The formula for the commonly used Tanimoto similarity index (71, 43), which can readily be transformed into the complementary diversity index, is reported in Fig. 5.14... [Pg.183]
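The bit-counting idea described above can be sketched in a few lines. This is a minimal illustration, not the formula from Fig. 5.14 (which is not reproduced here): it assumes the usual definition of the Tanimoto index as the number of bits set in both strings divided by the number of bits set in at least one, and takes the complementary diversity index to be one minus that value. The function names and the string encoding of fingerprints are hypothetical.

```python
def tanimoto(a: str, b: str) -> float:
    """Tanimoto index for two equal-length bit strings such as '1101'."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    # Bits set in both strings.
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    # Bits set in at least one of the strings.
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    # Convention (an assumption): two all-zero strings count as identical.
    return both / either if either else 1.0


def diversity(a: str, b: str) -> float:
    """Complementary diversity index, assumed here to be 1 - Tanimoto."""
    return 1.0 - tanimoto(a, b)


def hamming(a: str, b: str) -> int:
    """Hamming distance: number of positions where the bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    fp1, fp2 = "1101", "1001"  # made-up fingerprints for illustration
    print(tanimoto(fp1, fp2), diversity(fp1, fp2), hamming(fp1, fp2))
```

The Hamming distance is included only for contrast: it counts mismatches, including shared zeros as agreement, whereas the Tanimoto index ignores positions where both bits are absent.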

There are many different types of similarity indexes, including the association coefficients (e.g., Tanimoto coefficient [27], Jaccard coefficient [38], Hodgkin-Richards coefficient [39,40]), the correlation coefficients or cosine-like indexes, and the distance coefficients or dissimilarity indexes (e.g., Hamming distance) [26],... [Pg.765]

Kotani, T. and Higashiura, K. (2002) Rapid evaluation of molecular shape similarity index using pairwise calculation of the nearest atomic distances. [Pg.1095]

For a given hexapeptide, similar peptides are generated by allowing for all possible substitutions of amino acids within a similarity distance threshold of 0.3 according to Figure 17.1 a. To obtain a measure for the similarity between two hexapeptides, we define the rms difference of the six distances between amino acids in respective positions. This rms distance ranges between 0.0 for identical peptides and the threshold distance if all amino acids are replaced by congeners at the threshold limit. For practical purposes we define a peptide similarity index which is 1.0 for identical or nearly identical peptides (rms distances <0.1) and decreases linearly to 0.0 as the rms distance increases to the threshold limit. [Pg.691]
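The mapping from rms distance to peptide similarity index described above can be sketched as follows. This is only one reading of the text: it assumes the linear decrease starts at an rms distance of 0.1 (the "nearly identical" limit) and reaches 0.0 at the threshold of 0.3. The function names, and the idea of passing the six per-position amino acid distances as a plain list, are hypothetical.

```python
import math


def rms_distance(distances):
    """Root-mean-square of the per-position amino acid distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def peptide_similarity(distances, threshold=0.3, plateau=0.1):
    """Similarity index: 1.0 up to `plateau`, then linear down to 0.0 at `threshold`.

    Assumption: the linear ramp begins at `plateau`; the source only states
    the two end values.
    """
    rms = rms_distance(distances)
    if rms <= plateau:
        return 1.0
    if rms >= threshold:
        return 0.0
    return (threshold - rms) / (threshold - plateau)


if __name__ == "__main__":
    # Made-up distances for the six positions of two hexapeptides.
    print(peptide_similarity([0.0, 0.0, 0.2, 0.0, 0.3, 0.0]))
```

Under these assumptions a hexapeptide pair in which every position sits at a distance of 0.2 would land halfway down the ramp.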

Peptide | Similar peptides | From protein | Peptide distance | Similarity index | Secondary structure | Folding pattern | Pattern weight | Fold score... [Pg.693]

Molecular Similarity and QSAR. - In a first contribution on the design of a practical, fast and reliable molecular similarity index, Popelier [107] proposed a measure operating in an abstract space spanned by properties evaluated at BCPs, called BCP space. Molecules are believed to be represented compactly and reliably in BCP space, as this space extracts the relevant information from the molecular ab initio wave functions. Typical problems of continuous quantum similarity measures are thereby avoided. The practical use of this novel method is illustrated via the Hammett equation for para- and meta-substituted benzoic acids. On the basis of the author's definition of distances between molecules in BCP space, the experimental sequence of acidities determined by the well-known σ constant of a set of substituted congeners is reproduced. Moreover, the approach points out where the common reactive centre of the molecules is. The generality and feasibility of this method will enable predictions in medically related Quantitative Structure Activity Relationships (QSAR). This contribution combines the historically disparate fields of molecular similarity and QSAR. [Pg.150]

It is clear that the larger index(X, Y) is, the smaller the Hellinger distance between X and Y will be. Therefore, index(X, Y) directly reflects their similarity. For example, Anise #2 (9) and Anise #4 (10) have many volatile compounds in common (Table II). The similarity index between these two spices is approximately 0.96, indicating a high degree of similarity. On the contrary, when the volatile compounds of Anise #2 are compared with those of coriander #1 (11), as shown in Table III, an index value of 0 is obtained, indicating no similarity between these two spices... [Pg.214]
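The inverse relation between the index and the Hellinger distance can be illustrated with a short sketch. The excerpt does not give the definition of index(X, Y), so the code below assumes it is the Bhattacharyya coefficient of the two normalized composition profiles, for which the Hellinger distance equals sqrt(1 - index). That choice is an assumption made for illustration; the compound names and abundances are made up, and each profile is assumed to have a positive total.

```python
import math


def normalize(profile):
    """Scale a {compound: abundance} profile so its values sum to 1."""
    total = sum(profile.values())  # assumed to be positive
    return {name: value / total for name, value in profile.items()}


def similarity_index(x, y):
    """Assumed index: Bhattacharyya coefficient of the normalized profiles."""
    p, q = normalize(x), normalize(y)
    shared = p.keys() & q.keys()  # only shared compounds contribute
    return sum(math.sqrt(p[name] * q[name]) for name in shared)


def hellinger_distance(x, y):
    """Hellinger distance, related to the assumed index by H = sqrt(1 - index)."""
    return math.sqrt(max(0.0, 1.0 - similarity_index(x, y)))


if __name__ == "__main__":
    # Hypothetical volatile-compound profiles (relative peak areas).
    sample_a = {"compound_1": 80.0, "compound_2": 15.0, "compound_3": 5.0}
    sample_b = {"compound_1": 75.0, "compound_2": 20.0, "compound_4": 5.0}
    sample_c = {"compound_5": 60.0, "compound_6": 40.0}
    print(similarity_index(sample_a, sample_b), hellinger_distance(sample_a, sample_b))
    print(similarity_index(sample_a, sample_c), hellinger_distance(sample_a, sample_c))
```

With this definition, two samples that share no compounds give an index of 0 and the maximum distance of 1, which matches the qualitative behaviour described above.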

Indices for measuring the extent of isostructurality of two or more organic crystal structures can be applied not only to homomolecular crystals but also to molecular associates, such as inclusion compounds. A qualitative summary of these descriptors follows, and the reader is referred to a recent account for explicit mathematical definitions. Earlier descriptors of isostructurality included the "degree of isostructurality" Ii(n) (based on the distance differences ΔRj between the crystal coordinates of identical nonhydrogen atoms within the same section of the asymmetric units of two or more related structures), the packing coefficient increment Δ(pc), and the unit cell similarity index... [Pg.768]

Following the distance measure of molecular (dis)similarity, another important similarity index was introduced. Consider two vectors a and b. The classic scalar product between two vectors is calculated as... [Pg.136]

This index is not, however, very convenient for the direct calculation of a similarity index, because the values calculated using eq. (138) are not invariant with respect to the distance and the mutual position of the corresponding molecules. This noninvariance arises from the presence of generally multicenter integrals (139) in which the integration is performed over the orbitals centered on the molecules A and B... [Pg.115]

We will use x(HD) as a similarity index on the 35 C9H20 nonane isomers. In Table 9.4, we show listed lexicographically the ordered row sums of the Hamming distance matrices for all 35 nonane isomers, which can be viewed as nine-component vectors. A way to characterize similarities is to consider the absolute differences of the corresponding terms in two ordered row sums. In this way one finds that among 595 pairs of nonanes the most similar are 2,6-dimethylheptane and 2,5-dimethylheptane. Their nine-component vectors differ only in the last component by one. In... [Pg.256]

This concept could be extended to any other linear and nonlinear QSAR relationships, by calculating either n x n distance matrices D (especially suited for nonlinear relationships) or n x n covariance matrices C as similarity measures. For this purpose, all or only several relevant properties of the compounds are used to calculate the corresponding similarity matrices. No superposition of the molecules is necessary. If a distance matrix D is calculated from the X matrix of explanatory physicochemical properties (n rows, m columns), then all Xij values must be normalized beforehand, i.e., mean-value-centered and standardized, column by column. The great advantage of distance similarity index matrices is that no special models need to be defined in the case of nonlinear relationships; on the other hand, problems may arise from significant intercorrelations between the different columns of the similarity matrices. [Pg.2319]
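The normalization and distance-matrix steps described above can be sketched with the standard library alone. The excerpt does not say which distance is used, so Euclidean distance is an assumption here, as is the use of the population standard deviation for standardization; the function names and the property values are hypothetical. A column with zero variance would cause a division by zero and would have to be dropped first.

```python
import math
from statistics import mean, pstdev


def standardize_columns(X):
    """Mean-center and scale each column of X (n rows, m columns) to unit variance."""
    columns = list(zip(*X))
    means = [mean(col) for col in columns]
    # Population standard deviation (an assumption); must be non-zero.
    sds = [pstdev(col) for col in columns]
    return [
        [(value - mu) / sd for value, mu, sd in zip(row, means, sds)]
        for row in X
    ]


def distance_matrix(X):
    """n x n matrix of Euclidean distances between the standardized rows of X."""
    Z = standardize_columns(X)
    return [[math.dist(a, b) for b in Z] for a in Z]


if __name__ == "__main__":
    # Made-up property table: 3 compounds (rows) x 2 properties (columns).
    X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    for row in distance_matrix(X):
        print([round(d, 3) for d in row])
```

Standardizing column by column keeps a property measured on a large numerical scale from dominating the distances, which is why the text requires it before D is computed.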

Full structure search can be developed by using similar approaches to those employed in the case of 2D structure search. Thus, some topological indices can be modified in such a way that they include geometrical information. For example, the global index given by Eq. (4) can be modified to Eq. (11), where are real interatomic distances. [Pg.314]
