Distance and Similarity Measures

Following the similar-structure, similar-property principle, high-ranked structures in a similarity search are likely to have physicochemical and biological properties similar to those of the target structure. Accordingly, similarity searches play a pivotal role in database searches related to drug design. Some frequently used distance and similarity measures are illustrated in Section 8.2.1. [Pg.405]

The similarity between compounds is estimated in terms of a distance measure between two different objects $s$ and $t$. The objects $s$ and $t$ are described by the vectors $\mathbf{x}_s = (x_{s1}, x_{s2}, \ldots, x_{sm})$ and $\mathbf{x}_t = (x_{t1}, x_{t2}, \ldots, x_{tm})$, where $m$ denotes the number of real variables and $x_{sj}$ and $x_{tj}$ are the $j$th elements of the corresponding vectors. To calculate the distance and similarity of two compounds, the variables $x_j$ should have comparable magnitudes; otherwise, the variables have to be scaled or normalized. Two of the most prominent distance measures are given by Eqs. (1) and (2). [Pg.405]
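Eqs. (1) and (2) are not reproduced in this excerpt. Assuming, as is common, that they are the Euclidean and the city-block (Manhattan) distances, the following sketch computes both for two real-valued descriptor vectors. It also includes a hypothetical `autoscale` helper that brings each variable to zero mean and unit variance, which is one common way to obtain the comparable magnitudes the text asks for.

```python
import math
from statistics import mean, pstdev

def euclidean(xs, xt):
    """Euclidean distance: square root of the summed squared differences."""
    if len(xs) != len(xt):
        raise ValueError("vectors must have the same length m")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, xt)))

def manhattan(xs, xt):
    """City-block (Manhattan) distance: sum of absolute differences."""
    if len(xs) != len(xt):
        raise ValueError("vectors must have the same length m")
    return sum(abs(a - b) for a, b in zip(xs, xt))

def autoscale(rows):
    """Hypothetical helper: scale each column (variable) to mean 0, std 1.

    Uses the population standard deviation; constant columns become 0.
    """
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in rows
    ]

# Two compounds described by m = 2 variables of very different magnitude
xs, xt = [1.0, 200.0], [2.0, 100.0]
print(euclidean(xs, xt))   # about 100.005: dominated by the second variable
print(manhattan(xs, xt))   # 101.0
ss, st = autoscale([xs, xt])
print(euclidean(ss, st))   # after scaling, both variables contribute equally
```

In practice the means and standard deviations would be taken over the whole data set rather than over a single pair of compounds.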

As the scalar product of two vectors is related to the cosine of the angle enclosed by these vectors (Eq. (4)), a frequently used similarity measure is the cosine coefficient (Eq. (5)). [Pg.406]
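Eqs. (4) and (5) are not reproduced here. The standard form of the cosine coefficient divides the scalar product by the product of the vector lengths, $\cos\theta = \mathbf{x}_s \cdot \mathbf{x}_t / (\|\mathbf{x}_s\|\,\|\mathbf{x}_t\|)$. The following minimal sketch assumes that standard form:

```python
import math

def cosine_coefficient(xs, xt):
    """Cosine of the angle between vectors xs and xt (assumed standard form).

    Returns a value in [-1, 1]; 1 means the vectors point in the same
    direction, regardless of their lengths.
    """
    if len(xs) != len(xt):
        raise ValueError("vectors must have the same length m")
    dot = sum(a * b for a, b in zip(xs, xt))
    norm_s = math.sqrt(sum(a * a for a in xs))
    norm_t = math.sqrt(sum(b * b for b in xt))
    if norm_s == 0 or norm_t == 0:
        raise ValueError("the cosine is undefined for a zero vector")
    return dot / (norm_s * norm_t)

# Proportional vectors give a value of (essentially) 1
print(cosine_coefficient([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Because the cosine ignores vector length, two compounds whose descriptor vectors are proportional are treated as maximally similar. This behavior is a deliberate difference from the distance measures.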

The calculation of a distance measure for two objects $s$ and $t$ represented by binary descriptors $\mathbf{x}_s$ and $\mathbf{x}_t$ with $m$ binary values is based on the frequencies of common and different components. For this purpose we define the frequencies $a$, $b$, $c$, and $d$ as follows ... [Pg.406]

The frequencies $a$ and $d$ reflect the similarity between two objects $s$ and $t$, whereas $b$ and $c$ provide information about their dissimilarity (Eqs. (6)-(8)). [Pg.406]
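The frequency definitions and Eqs. (6)-(8) are not reproduced in this excerpt. The following sketch uses the conventional definitions, which are an assumption here:

* $a$ counts positions where both vectors are 1.
* $b$ counts positions where only $\mathbf{x}_s$ is 1.
* $c$ counts positions where only $\mathbf{x}_t$ is 1.
* $d$ counts positions where both are 0.

These frequencies satisfy $a+b+c+d=m$. From them, the sketch computes three widely used measures: the Tanimoto (Jaccard) coefficient $a/(a+b+c)$, the simple matching coefficient $(a+d)/m$, and the Hamming distance $b+c$. The excerpt does not confirm whether these correspond to the source's Eqs. (6)-(8).

```python
def binary_frequencies(xs, xt):
    """Return (a, b, c, d) for two equal-length 0/1 vectors.

    a: both 1; b: xs is 1, xt is 0; c: xs is 0, xt is 1; d: both 0.
    """
    if len(xs) != len(xt):
        raise ValueError("vectors must have the same length m")
    a = b = c = d = 0
    for u, v in zip(xs, xt):
        if u not in (0, 1) or v not in (0, 1):
            raise ValueError("descriptors must be binary (0 or 1)")
        if u and v:
            a += 1
        elif u:
            b += 1
        elif v:
            c += 1
        else:
            d += 1
    return a, b, c, d

def tanimoto(xs, xt):
    a, b, c, _ = binary_frequencies(xs, xt)
    # Convention (assumption): two all-zero vectors are treated as identical.
    return 1.0 if a + b + c == 0 else a / (a + b + c)

def simple_matching(xs, xt):
    a, b, c, d = binary_frequencies(xs, xt)
    return (a + d) / (a + b + c + d)

def hamming(xs, xt):
    _, b, c, _ = binary_frequencies(xs, xt)
    return b + c

xs = [1, 1, 0, 0, 1]
xt = [1, 0, 1, 0, 1]
print(binary_frequencies(xs, xt))  # (2, 1, 1, 1)
print(tanimoto(xs, xt))            # 0.5
print(simple_matching(xs, xt))     # 0.6
print(hamming(xs, xt))             # 2
```

The Tanimoto coefficient ignores $d$, the shared absences, while the simple matching coefficient counts them. For sparse fingerprints, where most bits are 0 in both vectors, the two measures can therefore rank pairs very differently.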

Because of the continuous nature of the vector components described in this section, other types of distance and similarity measures have been used. [Pg.20]

The most important relationships between distance and similarity measures are given below. [Pg.695]

When variables are represented by binary descriptors, that is, variables whose values are either zero or one, different distance and similarity measures, appropriate to binary data, must be used. [Pg.697]

To learn about methods for data preprocessing and for calculating distances and similarity measures... [Pg.135]

Spectra and chemical structure searches are based on distance and similarity measures as introduced in Section 5.2. Different strategies are known: sequential search, search based on inverted lists, and hierarchical search trees. The strategies are explained for the search of spectra. [Pg.286]
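Sequential search is the simplest of the three strategies: the query is compared with every library entry, and the entries are ranked by their distance. The following minimal sketch assumes that spectra have already been encoded as equal-length numeric vectors and uses the Euclidean distance as an interchangeable measure. The library names and intensity values are hypothetical.

```python
import heapq
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def sequential_search(query, library, k=3, distance=euclidean):
    """Compare query with every entry of library (a dict name -> vector)
    and return the k nearest entries as (distance, name) pairs, closest first."""
    scored = ((distance(query, vec), name) for name, vec in library.items())
    return heapq.nsmallest(k, scored)

# Hypothetical library of spectra encoded as 4-bin intensity vectors
library = {
    "compound_A": [0.9, 0.1, 0.0, 0.3],
    "compound_B": [0.1, 0.8, 0.5, 0.0],
    "compound_C": [0.8, 0.2, 0.1, 0.4],
}
print(sequential_search([0.88, 0.12, 0.02, 0.32], library, k=2))
```

The cost of each query grows linearly with the library size because every entry is visited. This cost is what motivates the inverted-list and hierarchical-tree strategies.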

Balopoulos, V., Hatzimichailidis, A. G., & Papadopoulos, B. K. (2007). Distance and similarity measures for fuzzy operators. Information Sciences, 177, 2336-2348. [Pg.339]

The first of these two is also called the Tanimoto coefficient by some authors. It can be verified that, since distance = 1 - similarity, this is equal to the simple matching coefficient. Clearly, confusion is possible and authors using a certain distance or similarity measure should always define it unambiguously. [Pg.66]

A fundamental idea in multivariate data analysis is to regard the distance between objects in the variable space as a measure of the similarity of the objects. Distance and similarity are inverse: a large distance means a low similarity. Two objects are considered to belong to the same category or to have similar properties if their distance is small. The distance between objects depends on the selected distance definition, the variables used, and the scaling of the variables. Distance measures in high-dimensional space are extensions of distance measures in two dimensions (Table 2.3). [Pg.58]
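The Minkowski family of distances illustrates the extension from two to many dimensions. The city-block ($p=1$) and Euclidean ($p=2$) distances are special cases of this family. Table 2.3 is not reproduced here, so this family is chosen only as an assumption for illustration. In two dimensions the Euclidean case reduces to the familiar $\sqrt{\Delta x^2 + \Delta y^2}$; the same formula simply sums over all $m$ variables.

```python
def minkowski(xs, xt, p=2.0):
    """Minkowski distance of order p >= 1 between two equal-length vectors.

    p = 1 gives the city-block distance, p = 2 the Euclidean distance,
    and p = float('inf') the Chebyshev (maximum-coordinate) distance.
    """
    if len(xs) != len(xt):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for a metric")
    diffs = [abs(a - b) for a, b in zip(xs, xt)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

# The same formula in 2 and in 5 dimensions
print(minkowski([0, 0], [3, 4]))                    # 5.0
print(minkowski([0, 0, 0, 0, 0], [1, 1, 1, 1, 1]))  # sqrt(5), about 2.236
print(minkowski([0, 0], [3, 4], p=1))               # 7.0
```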

Depending on the kind of variables (continuous, binary, ranks, angles, etc.), several different measures of distance and similarity have been defined. [Pg.695]

Before delving into the specific similarity calculation, we start our discussion with the characteristics of attributes in multidimensional data objects. The attributes can be quantitative or qualitative, continuous or binary, nominal or ordinal, which determines the corresponding similarity calculation (Xu and Wunsch, 2005). Typically, distance-based similarity measures are used to measure continuous features, while matching-based similarity measures are more suitable for categorical variables. [Pg.90]

Distance-Based Similarity Measures. Similarity measures determine the proximity (or distance) between two data objects. Multidimensional objects can be formalized as numerical vectors $O_i = \{o_{ij} \mid 1 \le j \le p\}$, where $o_{ij}$ is the value of the $j$th dimension for the $i$th data object and $p$ is the number of dimensions for the data object $O_i$. Figure 5.1 provides an intuitive view of multidimensional data. The similarity between two objects can be measured by a distance function of the corresponding vectors $O_i$ and $O_j$. ... [Pg.90]

Matching-Based Similarity Measures. For categorical attributes, distance-based similarity measures cannot be applied directly. The most straightforward approach is to compare objects attribute by attribute for each pair of categorical attributes (Zhao and Karypis, 2005). For two objects that contain simple binary attributes, ... [Pg.92]
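For categorical attributes, the simplest matching-based measure, sometimes called the overlap measure, is the fraction of attributes on which two objects take the same value. The following sketch assumes that objects are given as equal-length tuples of category labels. It is one common choice, not necessarily the one the source goes on to define.

```python
def overlap_similarity(obj1, obj2):
    """Fraction of categorical attributes whose values match exactly."""
    if len(obj1) != len(obj2) or not obj1:
        raise ValueError("objects must be non-empty and of equal length")
    matches = sum(1 for u, v in zip(obj1, obj2) if u == v)
    return matches / len(obj1)

# Hypothetical objects with three nominal attributes: (color, shape, state)
print(overlap_similarity(("red", "round", "solid"), ("red", "square", "solid")))
```

Here 2 of the 3 attributes match. For purely binary attributes this measure reduces to the simple matching coefficient, because it counts both shared presences and shared absences.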

In the simplest case, a discriminant analysis is performed to check whether an unknown sample belongs to a particular class (a yes/no decision), e.g. in a purity/quality check or a substance identification. A sample may equally well be assigned to one of several classes (e.g., quality levels) if a corresponding series of mathematical models has been established. The models are based on a series of test spectra, which must completely cover the variations of the particular substances in the particular chemical classes. From this series of test spectra, classes of similar objects are formed by means of so-called discriminant functions. The model is optimized with respect to the separation among the classes. The assignment of objects to the classes of an established model is evaluated by statistically backed distance and scattering measures. [Pg.1048]
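The excerpt does not name the "statistically backed distance" it uses. One common choice is the Mahalanobis distance, which measures how far an unknown lies from a class mean in units of that class's scatter (covariance). The following minimal NumPy sketch works under that assumption:

* Each class is modeled by the mean and covariance of its training spectra.
* An unknown sample is assigned to the class with the smallest Mahalanobis distance.

The class names and data are hypothetical. When raw spectra have more wavelengths than training samples, their covariance matrix is singular, so in practice dimensionality is often reduced first.

```python
import numpy as np

def fit_class_models(training):
    """training: dict mapping class name -> array of shape (n_samples, n_features).

    Returns dict mapping class name -> (mean vector, inverse covariance matrix)."""
    models = {}
    for name, X in training.items():
        X = np.asarray(X, dtype=float)
        mu = X.mean(axis=0)
        cov = np.atleast_2d(np.cov(X, rowvar=False))
        models[name] = (mu, np.linalg.inv(cov))  # raises if cov is singular
    return models

def mahalanobis(x, mu, cov_inv):
    """Mahalanobis distance of x from a class with mean mu and inverse covariance cov_inv."""
    diff = np.atleast_1d(np.asarray(x, dtype=float) - mu)
    return float(np.sqrt(diff @ cov_inv @ diff))

def classify(x, models):
    """Assign x to the class with the smallest Mahalanobis distance."""
    distances = {name: mahalanobis(x, mu, ci) for name, (mu, ci) in models.items()}
    return min(distances, key=distances.get), distances

# Hypothetical training data: two quality levels described by two spectral features
rng = np.random.default_rng(0)
training = {
    "grade_1": rng.normal(loc=[0.0, 0.0], scale=[1.0, 0.2], size=(50, 2)),
    "grade_2": rng.normal(loc=[3.0, 1.0], scale=[0.5, 0.5], size=(50, 2)),
}
models = fit_class_models(training)
label, dists = classify([0.1, 0.05], models)
print(label, dists)
```

For the yes/no affiliation check described in the text, the smallest distance could be compared with a threshold chosen by the analyst.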

The effectiveness of the BNB and NBN measures was assessed by simulated property-prediction experiments. These experiments involved the QSAR data sets studied previously by Pepperrell and Willett for the evaluation of distance-based similarity measures, and a large set of 6-deoxyhexopyranose carbohydrates that had previously been classified into 14 shape classes using numerical clustering methods based on torsional dissimilarity coefficients. The comparison encompassed the Bemis-Kuntz and Lederle measures, including not just the atom-triplet but also the atom-pair and atom-quadruplet versions of the former measure. The results were equivocal, in that it was impossible to... [Pg.36]

The distance between objects is considered as a measure for their similarity. Distance and similarity are reciprocal. The distance between two objects depends on (a) the chosen features, (b) scaling and normalization, and (c) the mathematical definition used for the distance. A distance $d_{ab}$ between the objects... [Pg.349]

Similarity is often used as a general term to encompass similarity, dissimilarity, or both (see Section 6.4.3, on similarity measures, below). The terms "proximity" and "distance" are used in statistical software packages but have not gained wide acceptance in the chemical literature. In principle, similarity and dissimilarity can lead to different rankings. [Pg.303]

A similarity measure is required for the quantitative comparison of one structure with another, and as such it must be defined before the analysis can commence. Structural similarity is often measured by a root-mean-square distance (RMSD) between two conformations. In Cartesian coordinates the RMS distance $d_{ij}$ between conformation $i$ and conformation $j$ of a given molecule is defined as the minimum of the functional... [Pg.84]
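The functional itself is elided in this excerpt. The usual reading is that the RMS deviation is minimized over rigid-body superpositions (translation and rotation) of the two conformations. The following minimal NumPy sketch works under that assumption, with unweighted atoms, and uses the Kabsch algorithm to find the optimal rotation.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Minimum RMSD between conformations P and Q, each an (N, 3) array of
    Cartesian coordinates for the same N atoms in the same order.

    Both are centred on their centroids, then the optimal rotation is found
    by the Kabsch algorithm (SVD of the 3x3 covariance matrix)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    P = P - P.mean(axis=0)                   # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                              # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# A rotated and translated copy of a conformation has an RMSD of about zero
rng = np.random.default_rng(1)
P = rng.normal(size=(10, 3))
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 deg about z
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])
print(kabsch_rmsd(P, Q))  # close to zero (floating-point round-off)
```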

It is up to the researcher to decide whether to use a Cartesian similarity measure or a dihedral measure and which elements to include in the summation [29]. It should be stressed that while RMS distances perform well and are often used, nothing precludes the use of other similarity measures. For example, similarity measures that emphasize chemical interactions, hydrophobicity, or the relative orientation of large molecular domains rather than local geometry may serve well if appropriately used. [Pg.84]
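A dihedral measure must respect the periodicity of torsion angles: 179° and -179° differ by 2°, not 358°. The following minimal sketch computes an RMS dihedral distance. It assumes that angles are in degrees and that both conformations list the same torsions in the same order. As the text notes, the choice of which torsions to include is left to the researcher.

```python
import math

def angle_difference(a, b):
    """Smallest signed difference a - b between two angles in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def rms_dihedral_distance(phi_i, phi_j):
    """RMS of the wrapped torsion-angle differences between two conformations."""
    if len(phi_i) != len(phi_j) or not phi_i:
        raise ValueError("need the same non-empty list of torsions for both")
    sq = [angle_difference(a, b) ** 2 for a, b in zip(phi_i, phi_j)]
    return math.sqrt(sum(sq) / len(sq))

print(angle_difference(179.0, -179.0))                        # -2.0
print(rms_dihedral_distance([179.0, 60.0], [-179.0, 60.0]))   # sqrt(2), about 1.414
```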

The distance matrix A, which holds the relative distances (by whatever similarity measure) between the individual conformations, is rarely informative by itself. For example, when sampling along a molecular dynamics trajectory, the A matrix can have a block-diagonal form, indicating that the trajectory has moved from one conformational basin to another. Even in this case, however, the matrix itself does not give reliable information about the size and shape of the respective basins. In general, the distance matrix requires further processing. [Pg.85]
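Once a pairwise measure has been chosen, building the distance matrix is mechanical. The following sketch uses a hypothetical `distance_matrix` helper that fills a symmetric matrix for any measure passed in, for example an RMS distance between conformations. The toy one-dimensional "trajectory" jumps between two basins and reproduces the block-diagonal pattern described in the text. The data are invented purely for illustration.

```python
def distance_matrix(conformations, distance):
    """Symmetric matrix of pairwise distances; the diagonal is zero."""
    n = len(conformations)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = distance(conformations[i], conformations[j])
    return D

# Toy trajectory: one coordinate that stays near 0, then jumps to near 10
trajectory = [0.0, 0.2, -0.1, 10.0, 10.3, 9.8]
D = distance_matrix(trajectory, lambda a, b: abs(a - b))
for row in D:
    print(" ".join(f"{d:5.1f}" for d in row))
```

The printout shows two blocks of small values along the diagonal and large values elsewhere. As the text points out, the matrix alone still says little about how large each basin is or what shape it has.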

Another principal difficulty is that the precise effect of local dynamics on the NOE intensity cannot be determined from the data. The dynamic correction factor [85] describes the ratio of the effects of distance and angular fluctuations. Theoretical studies based on NOE intensities extracted from molecular dynamics trajectories [86,87] help to clarify the detailed relationship between NMR parameters and local dynamics and may lead to structure-dependent corrections. Implicitly, an estimate of the dynamic correction factor has been used in an ensemble relaxation matrix refinement by including order parameters for proton-proton vectors derived from molecular dynamics calculations [72]. One remaining challenge is to incorporate data describing the local dynamics of the molecule directly into the refinement, so that an order parameter computed from the calculated ensemble agrees with the measured order parameter. [Pg.270]
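To compare calculated and measured order parameters, the angular order parameter of an internuclear vector is commonly estimated from an ensemble (or trajectory) as $S^2 = \tfrac{3}{2}\sum_{\alpha,\beta}\langle \mu_\alpha\mu_\beta\rangle^2 - \tfrac{1}{2}$. Here $\boldsymbol{\mu}$ is the unit vector along the proton-proton vector, and $\alpha, \beta$ run over $x, y, z$. This expression is not given in the excerpt. It also ignores fluctuations of the vector length, which the dynamic correction factor additionally accounts for. The following sketch is an illustration under those assumptions.

```python
import math

def order_parameter(vectors):
    """Angular order parameter S^2 from an ensemble of 3D vectors.

    Each vector is normalised to unit length; S^2 = 1.5 * sum <mu_a mu_b>^2 - 0.5.
    Gives 1 for a rigid vector and 0 for an isotropic distribution."""
    units = []
    for x, y, z in vectors:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0:
            raise ValueError("zero-length vector")
        units.append((x / r, y / r, z / r))
    if not units:
        raise ValueError("the ensemble is empty")
    n = len(units)
    total = 0.0
    for a in range(3):
        for b in range(3):
            avg = sum(u[a] * u[b] for u in units) / n  # ensemble average <mu_a mu_b>
            total += avg * avg
    return 1.5 * total - 0.5

# A vector wobbling slightly about the z axis: S^2 just below 1
print(order_parameter([(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (-0.1, 0.0, 1.0)]))
```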

Price (1968) made similar measurements but placed a monolith directly over the bed. This eliminated the radial components of the flow velocity after the fluid left the bed. Price also operated at a particle Reynolds number $Re_p$ an order of magnitude higher than Schwartz and Smith, and he concluded that the maximum lies half a pellet diameter from the wall. Vortmeyer and Schuster (1983) investigated the problem by variational calculation and found a steep maximum near the wall inside the bed. This maximum was considerably steeper than the profiles measured experimentally above the bed. [Pg.17]

In TLC, plate efficiency is measured in a manner similar to that used in GC and LC, employing a similar expression. However, the retention distance from the sampling point to the center of the spot replaces the retention volume, time, or distance, and the spot width replaces the peak base width, viz. ... [Pg.450]
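The expression itself is elided. Assume it takes the same base-width form as the column expression $n = 16\,(t_R/w_b)^2$. The TLC version then replaces the retention time by the migration distance $d$ of the spot center and the peak base width by the spot width $w$, giving $n = 16\,(d/w)^2$, with both lengths in the same unit. The following minimal sketch works under that assumption:

```python
def tlc_plate_number(migration_distance, spot_width):
    """Plate number n = 16 * (d / w)^2 (assumed base-width form).

    migration_distance: distance from the sampling point to the spot centre
    spot_width: width of the spot along the direction of development
    (both in the same length unit, e.g. mm)."""
    if migration_distance <= 0 or spot_width <= 0:
        raise ValueError("distances must be positive")
    return 16.0 * (migration_distance / spot_width) ** 2

print(tlc_plate_number(50.0, 4.0))  # 16 * 12.5**2 = 2500.0
```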

Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...