Similarity measures

To become familiar with the basics of chemical structure similarity, similarity measures, and different approaches exploited within the similarity search process. [Pg.291]

Similarity is often used as a general term to encompass either similarity or dissimilarity or both (see Section 6.4.3, on similarity measures, below). The terms "proximity" and "distance" are used in statistical software packages, but have not gained wide acceptance in the chemical literature. Similarity and dissimilarity can in principle lead to different rankings. [Pg.303]

Usually, the denominator, if present in a similarity measure, is just a normalizer; it is the numerator that indicates whether similarity or dissimilarity is being estimated, or both. The characteristics chosen for the description of the objects being compared are interchangeably called descriptors, properties, features, attributes, qualities, observations, measurements, calculations, etc. In the formulations above, the terms "matches" and "mismatches" refer to qualitative characteristics, e.g., binary ones (those which take one of two values, 1 (present) or 0 (absent)), while the terms "overlap" and "difference" refer to quantitative characteristics, e.g., those whose values can be arranged in order of magnitude along a one-dimensional axis. [Pg.303]

In order to compare two chemical (or any other) objects, e.g., two molecules, we need a measure. Plenty of similarity measures have been proposed; they are listed in Table 6-2. Generally speaking, these measures can be divided into two cases: one of qualitative characteristics, and the other of quantitative characteristics. Here we consider these two cases. [Pg.304]

Following Bradshaw [17], we can give the definition of a similarity measure as follows. Consider two objects A and B: a is the number of features (characteristics) present in A and absent in B, b is the number of features absent in A and present in B, c is the number of features common to both objects, and d is the number of features absent from both objects. Thus, c and d measure the present and the absent matches, respectively, i.e., similarity, while a and b measure the corresponding mismatches, i.e., dissimilarity. The total number of features is n = a + b + c + d. [Pg.304]
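The four counts defined above can be obtained from two binary fingerprints as in the following minimal sketch. The function name `feature_counts` and the two example fingerprints are illustrative assumptions, not part of the source.

```python
def feature_counts(fp_a, fp_b):
    """Return (a, b, c, d) for two equal-length binary fingerprints.

    a: features present in A and absent in B
    b: features absent in A and present in B
    c: features present in both
    d: features absent from both
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    pairs = list(zip(fp_a, fp_b))
    a = sum(1 for x, y in pairs if x == 1 and y == 0)
    b = sum(1 for x, y in pairs if x == 0 and y == 1)
    c = sum(1 for x, y in pairs if x == 1 and y == 1)
    d = sum(1 for x, y in pairs if x == 0 and y == 0)
    return a, b, c, d


# Hypothetical 8-bit fingerprints, for illustration only.
fp_a = [1, 1, 0, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 1, 0, 0, 0]
a, b, c, d = feature_counts(fp_a, fp_b)
print(a, b, c, d, a + b + c + d)  # the last value is n, the fingerprint length
```

Here a + c is the number of bits set in A and b + c the number set in B, so the counts can be cross-checked against `sum(fp_a)` and `sum(fp_b)`.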

The total number of bits set on A is a + c, and the total number of bits set on B is b + c. These totals form the basis of an alternative notation that uses a instead of a + c, and b instead of b + c [16]. This notation, however, lumps together similarity and dissimilarity "components" - a disadvantage when interpreting a similarity measure. [Pg.304]

Consequently, we can construct a similarity measure intuitively in the following way: all matches, c + d, relative to all possibilities, i.e., matches plus mismatches, (c + d) + (a + b), yields (c + d) / (a + b + c + d), which is called the simple matching coefficient [18], and equal weight is given to matches and mismatches. (Normalized similarity measures are called similarity indices or coefficients; see, e.g., Ref. [19].) When absence of a feature in both objects is deemed to convey no information, then d should not occur in a similarity measure. Omitting d from the above similarity measure, one obtains the Tanimoto (alias Jaccard) similarity measure (Eq. (8); see Ref. [16] and the citations therein) ... [Pg.304]
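As a rough illustration of the construction just described, the sketch below computes the simple matching coefficient and the form obtained by dropping d from it, c / (a + b + c). The function names and the example counts are assumptions made here, and returning 0.0 for an empty denominator is a convention chosen for the sketch, not something the source specifies.

```python
def simple_matching(a, b, c, d):
    """All matches (c + d) relative to all features (a + b + c + d)."""
    n = a + b + c + d
    return (c + d) / n if n else 0.0


def tanimoto(a, b, c):
    """Simple matching with the absent matches d omitted: c / (a + b + c)."""
    denom = a + b + c
    return c / denom if denom else 0.0


# Hypothetical counts: 2 features only in A, 1 only in B, 2 shared, 3 absent from both.
a, b, c, d = 2, 1, 2, 3
print(simple_matching(a, b, c, d))  # (2 + 3) / 8
print(tanimoto(a, b, c))            # 2 / 5
```

The two values differ because the three shared absences raise the simple matching coefficient but are ignored by the Tanimoto form.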

For examples of different types of similarity measures, see Table 6-2. The Tanimoto similarity measure is monotonic with that of Dice (alias Sorensen, Czekanowski), which uses an arithmetic-mean normalizer, and gives double weight to the present matches. Russell/Rao (Table 6-2) add the matching absences to the normalizer in Tanimoto; the cosine similarity measure [19] (alias Ochiai) uses a geometric-mean normalizer. [Pg.304]
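The relationships stated above can be checked numerically with a short sketch. The formulas are the commonly used a, b, c, d forms of these coefficients (Table 6-2 itself is not reproduced in this excerpt), the function names are illustrative, and the counts are hypothetical; the sketch assumes at least one bit is set in each fingerprint.

```python
import math


def tanimoto(a, b, c):
    return c / (a + b + c)


def dice(a, b, c):
    # Arithmetic mean of the bit totals (a + c) and (b + c) as normalizer.
    return c / (0.5 * ((a + c) + (b + c)))


def russell_rao(a, b, c, d):
    # Tanimoto normalizer with the matching absences d added.
    return c / (a + b + c + d)


def cosine(a, b, c):
    # Geometric mean of the bit totals as normalizer (alias Ochiai).
    return c / math.sqrt((a + c) * (b + c))


a, b, c, d = 2, 1, 2, 3  # hypothetical counts
t = tanimoto(a, b, c)
print(t, dice(a, b, c), russell_rao(a, b, c, d), cosine(a, b, c))

# Dice is a monotonic function of Tanimoto: D = 2T / (1 + T),
# so both coefficients rank a set of structures identically.
print(abs(dice(a, b, c) - 2 * t / (1 + t)) < 1e-12)
```

Rewriting the Dice normalizer gives 2c / (a + b + 2c), which makes the double weight on the present matches explicit.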

We should mention here that using just similarity or dissimilarity in a similarity measure might be misleading. Therefore, some composite measures using both similarity and dissimilarity have been developed. These are the Hamann and the Yule measures (Table 6-2). A simple product of (1 - Tanimoto) and squared Eucli-... [Pg.304]

Asymmetry in a similarity measure is the result of asymmetrical weighting of a dissimilarity component - multiplication is commutative by definition, difference is not. By weighting a and b, one obtains asymmetric similarity measures, including the Tversky similarity measure c / (αa + βb + c), where α and β are user-defined constants. The Tversky measure can be regarded as a generalization of the Tanimoto and Dice similarity measures; like them, it does not consider the absence matches d. A particular case is c / (a + c), which measures the number of common features relative to all the features present in A, and gives zero weight to b. [Pg.308]
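A minimal sketch of the Tversky measure, written as c / (alpha * a + beta * b + c) with user-chosen weights alpha and beta, shows how particular weight choices recover the symmetric coefficients and how unequal weights introduce asymmetry. The function name and the counts are illustrative assumptions.

```python
def tversky(a, b, c, alpha, beta):
    """Tversky similarity; a and b are the mismatch counts, c the shared features."""
    return c / (alpha * a + beta * b + c)


a, b, c = 2, 1, 2  # hypothetical counts for the pair (A, B)

print(tversky(a, b, c, 1.0, 1.0))  # alpha = beta = 1 gives Tanimoto, c / (a + b + c)
print(tversky(a, b, c, 0.5, 0.5))  # alpha = beta = 0.5 gives Dice, 2c / (a + b + 2c)
print(tversky(a, b, c, 1.0, 0.0))  # c / (a + c): shared fraction of A's features

# Comparing B with A swaps the roles of a and b, so the result changes
# whenever alpha != beta.
print(tversky(b, a, c, 1.0, 0.0))  # c / (b + c): shared fraction of B's features
```

With alpha = 1 and beta = 0 the measure answers "how much of A is contained in B", which is why the order of the two objects matters.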

The resulting similarity measures are overlap-like, S_AB = ∫ ρ_A(r) ρ_B(r) dr, Coulomb-like, etc. The Carbo similarity coefficient is obtained after geometric-mean normalization, S_AB / sqrt(S_AA S_BB) (cosine), while the Hodgkin-Richards similarity coefficient uses arithmetic-mean normalization, S_AB / (0.5 (S_AA + S_BB)) (Dice). The Cioslowski [18] similarity measure NOEL - Number of Overlapping Electrons (Eq. (10)) - uses reduced first-order density matrices (one-matrices) rather than density functions to characterize A and B. No normalization is necessary, since NOEL has a direct interpretation at the Hartree-Fock level of theory. [Pg.308]
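The difference between the two normalizations can be seen in a toy calculation. The sketch below is only an illustration under strong assumptions: the "densities" are one-dimensional Gaussians sampled on a grid, the overlap integral of the two densities is approximated by a simple Riemann sum, and all names are hypothetical. Real applications use three-dimensional electron densities.

```python
import math


def overlap(rho_a, rho_b, dx):
    """Riemann-sum approximation of the overlap integral of two sampled densities."""
    return sum(p * q for p, q in zip(rho_a, rho_b)) * dx


def carbo(rho_a, rho_b, dx):
    """Overlap normalized by the geometric mean of the self-overlaps."""
    s_ab = overlap(rho_a, rho_b, dx)
    return s_ab / math.sqrt(overlap(rho_a, rho_a, dx) * overlap(rho_b, rho_b, dx))


def hodgkin_richards(rho_a, rho_b, dx):
    """Overlap normalized by the arithmetic mean of the self-overlaps."""
    s_ab = overlap(rho_a, rho_b, dx)
    return s_ab / (0.5 * (overlap(rho_a, rho_a, dx) + overlap(rho_b, rho_b, dx)))


# Toy data: the same Gaussian shape, with B scaled to twice the magnitude of A.
dx = 0.01
grid = [-10.0 + i * dx for i in range(2001)]
rho_a = [math.exp(-x * x) for x in grid]
rho_b = [2.0 * value for value in rho_a]

print(carbo(rho_a, rho_b, dx))             # close to 1: insensitive to overall scale
print(hodgkin_richards(rho_a, rho_b, dx))  # close to 0.8: penalizes the magnitude difference
```

Because a geometric mean never exceeds the corresponding arithmetic mean, the Carbo value is at least as large as the Hodgkin-Richards value whenever the overlap is non-negative.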

In order to apply the similarity measures to the objects, the latter must be described by some characteristics. [Pg.309]

In general, different similarity measures yield different rankings, except when they are monotonic. Improved results are obtained by using data fusion methods to combine the rankings resulting from different coefficients. [Pg.312]
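One simple way to combine rankings is to sum, for each structure, its rank position under every coefficient and then sort by that sum. The sketch below assumes this rank-sum rule; the source does not say which fusion rule is meant, and the structure identifiers and scores are hypothetical.

```python
def rank_positions(scores):
    """Map each item to its rank (1 = most similar) for one coefficient."""
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return {item: position for position, item in enumerate(ordered, start=1)}


def fuse_by_rank_sum(*score_dicts):
    """Order items by the sum of their ranks over all coefficients (lower is better)."""
    rankings = [rank_positions(scores) for scores in score_dicts]
    items = list(score_dicts[0])
    totals = {item: sum(ranking[item] for ranking in rankings) for item in items}
    return sorted(items, key=lambda item: (totals[item], item))


# Hypothetical similarity scores of three database structures to one target,
# computed with two coefficients that are not monotonic with each other.
scores_1 = {"m1": 0.80, "m2": 0.55, "m3": 0.60}
scores_2 = {"m1": 0.70, "m2": 0.90, "m3": 0.40}

print(fuse_by_rank_sum(scores_1, scores_2))
```

Ties in the score or in the rank sum are broken alphabetically here, which is an arbitrary choice made to keep the output deterministic.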

3D similarity search methods are quite well developed. Methods which attempt to find overlapping parts (atoms and functional groups) of the molecular moieties studied were reported first [31]. As discussed above for the case of 2D searching, these methods are of combinatorial complexity. To reduce this complexity, some field-based methods have been introduced. In this case, the overlap of the fields of two structures is considered as a similarity measure. [Pg.314]

Following the "similar structure - similar property principle", high-ranked structures in a similarity search are likely to have similar physicochemical and biological properties to those of the target structure. Accordingly, similarity searches play a pivotal role in database searches related to drug design. Some frequently used distance and similarity measures are illustrated in Section 8.2.1. [Pg.405]

As the scalar product of two vectors is related to the cosine of the angle included by these vectors by Eq. (4), a frequently used similarity measure is the cosine coefficient (Eq. (5)). [Pg.406]
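Eqs. (4) and (5) are not reproduced in this excerpt, so the sketch below uses the standard definition: the scalar product of two descriptor vectors divided by the product of their lengths, which equals the cosine of the angle between them. The function name and the example vectors are illustrative assumptions.

```python
import math


def cosine_coefficient(x, y):
    """Cosine of the angle between two real-valued descriptor vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine coefficient is undefined for a zero vector")
    return dot / (norm_x * norm_y)


print(cosine_coefficient([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors
print(cosine_coefficient([1.0, 0.0], [0.0, 1.0]))            # orthogonal vectors
```

The coefficient depends only on the direction of the vectors, so scaling all descriptors of one structure by a positive constant leaves it unchanged.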

Jarvis R A and E A Patrick 1973. Clustering Using a Similarity Measure Based on Shared Near Neighbours. IEEE Transactions in Computers C-22:1025-1034. [Pg.523]

Karfunkel H R, B Rohde, F J J Leusen, R J Gdanitz, and G Rihs 1993. Continuous Similarity Measure Between Nonoverlapping X-ray Powder Diagrams of Different Crystal Modifications. Journal of Computational Chemistry 14:1125-1135. [Pg.523]

Bradshaw J 1997. Introduction to Tversky Similarity Measure. At http://www.daylight.com/meetings/mug97/Bradshaw/MUG97/tv_tversky.html. [Pg.737]

A, J S Mason and I M McLay 1997. Similarity Measures for Rational Set Selection and Analysis of Combinatorial Libraries: The Diverse Property-Derived (DPD) Approach. Journal of Chemical Information and Computer Science 37:599-614. [Pg.740]

The standardized variable (the z statistic) requires only the probability level to be specified. It measures the deviation from the population mean in units of standard deviation. Y is 0.399 for the most probable value, μ. In the absence of any other information, the normal distribution is assumed to apply whenever repetitive measurements are made on a sample, or a similar measurement is made on different samples. [Pg.194]
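The value 0.399 quoted above can be reproduced from the standard normal density, as in the following minimal sketch. The function names and the example measurement are illustrative assumptions.

```python
import math


def z_statistic(x, mean, std_dev):
    """Deviation of x from the population mean in units of standard deviation."""
    return (x - mean) / std_dev


def standard_normal_density(z):
    """Ordinate Y of the standard normal curve at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Hypothetical measurement: x = 10.4 from a population with mean 10.0, std. dev. 0.2.
print(z_statistic(10.4, 10.0, 0.2))
# At the mean (z = 0) the density is 1 / sqrt(2 * pi), about 0.399.
print(round(standard_normal_density(0.0), 3))
```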

Accurate mass measurement on a molecular ion of any substance gives directly the molecular formula; for fragment ions, similar measurement gives their elemental compositions. [Pg.416]

Surfactants aid dewatering of filter cakes after the cakes have formed and have very little observed effect on the rate of cake formation. Equations describing the effect of a surfactant show that dewatering is enhanced by lowering the capillary pressure of water in the cake rather than by a kinetic effect. The amount of residual water in a filter cake is related to the capillary forces holding the liquids in the cake. Laplace's equation relates the capillary pressure (P) to surface tension (σ), contact angle of air and liquid on the solid (θ), which is a measure of wettability, and capillary radius (r), or a similar measure applicable to filter cakes. [Pg.21]

The rapid development of microelectronics has enabled many similar measurements to be made with data collecting systems and then stored electronically. The raw data can then be downloaded to the data processing installation, where they can be plotted and evaluated at any time [1]. This applies particularly to monitoring measurements on pipelines; for intensive measurements, see Section 3.7. Figure 3-1 shows an example of a computer-aided data storage system. [Pg.79]


Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...