Manhattan distance measure

Manhattan distances can also be used for continuous variables, but this is rarely done, because Euclidean distances are preferred in that case. Figure 30.6 compares the Euclidean and Manhattan distances for two variables. While the Euclidean distance between i and i′ is measured along a straight line connecting the two points, the Manhattan distance is the sum of the distances parallel to the axes. The equations for both types of distances are very similar in appearance. In fact, they both belong to the Minkowski distances given by ... [Pg.67]

FIGURE 2.10 Euclidean distance and city block distance (Manhattan distance) between objects represented by vectors or points xA and xB. The cosine of the angle between the object vectors is a similarity measure and corresponds to the correlation coefficient of the vector... [Pg.59]

Distance measures were already discussed in Section 2.4. The most widely used distance measure for cluster analysis is the Euclidean distance. The Manhattan distance is less dominated by far-outlying objects, since it is based on absolute rather than squared differences. The Minkowski distance is a generalization of both measures, and it allows adjusting the power of the distances along the coordinates. None of these distance measures is scale invariant. This means that variables with a larger scale will have more influence on the distance measure than variables with a smaller scale. If this effect is not wanted, the variables need to be scaled to equal variance. [Pg.268]
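The scale dependence described above can be illustrated with a minimal Python sketch. The data values and the helper names `manhattan` and `autoscale` are hypothetical, and scaling to unit sample standard deviation is only one way to obtain equal variance.

```python
import statistics

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def autoscale(rows):
    """Scale each column to zero mean and unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Hypothetical data: column 0 varies by units, column 1 by hundreds.
data = [[1.0, 1000.0], [2.0, 1100.0], [3.0, 1500.0]]

# Unscaled: the second variable contributes 100 of the total 101.
raw = manhattan(data[0], data[1])

# Scaled: both variables contribute on a comparable footing.
scaled = autoscale(data)
d_scaled = manhattan(scaled[0], scaled[1])
```

After scaling, the first variable contributes 1.0 to the distance between the first two objects and the second contributes less than 0.4, whereas before scaling the second variable dominated almost completely.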

The USR (Ultrafast Shape Recognition) Method. This method was reported by Ballester and Richards (53) for compound database search on the basis of molecular shape similarity. It was reportedly capable of screening billions of compounds for similar shapes on a single computer. The method is based on the notion that the relative position of the atoms in a molecule is completely determined by inter-atomic distances. Instead of using all inter-atomic distances, USR uses a subset of distances, reducing the computational cost. Specifically, the distances from all atoms of a molecule to each of four strategic points are calculated. Each set of distances forms a distribution, and three moments (mean, variance, and skewness) of each of the four distributions are calculated. Thus, for each molecule, 12 USR descriptors are calculated. The inverse of the translated and scaled Manhattan distance between two shape descriptors is used to measure the similarity between the two molecules. A value of 1 corresponds to maximum similarity and a value of 0 corresponds to minimum similarity. [Pg.124]
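The similarity score can be sketched in Python as follows. The excerpt does not spell out the formula, so this is one reading of "translated and scaled": the Manhattan distance is divided by the number of descriptors (scaled), 1 is added (translated), and the result is inverted. The function name `usr_similarity` and the descriptor values are hypothetical.

```python
def usr_similarity(desc_a, desc_b):
    """Similarity in (0, 1] between two equal-length descriptor vectors.

    Assumed form: 1 / (1 + Manhattan distance / number of descriptors).
    """
    if len(desc_a) != len(desc_b):
        raise ValueError("descriptor vectors must have the same length")
    n = len(desc_a)
    mean_abs_diff = sum(abs(a - b) for a, b in zip(desc_a, desc_b)) / n
    return 1.0 / (1.0 + mean_abs_diff)

# Hypothetical 12-component descriptors (not taken from real molecules).
query = [4.0, 2.5, 0.3, 5.1, 3.0, -0.2, 6.2, 3.8, 0.9, 5.9, 3.5, 0.4]
same = usr_similarity(query, query)                        # identical shapes
shifted = usr_similarity(query, [v + 1.0 for v in query])  # each component off by 1
```

Under this reading, identical descriptors give a similarity of exactly 1, and the score approaches 0 only as the descriptors grow arbitrarily far apart.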

In a subsequent study, we examined the influence of seven similarity indices on the enrichment of actives using the topological CATS descriptor and the 12 COBRA datasets [31]. In particular, we evaluated to what extent different similarity measures complement each other in terms of the retrieved active compounds. Retrospective screening experiments were carried out with seven similarity measures: Manhattan distance, Euclidean distance, Tanimoto coefficient, Soergel distance, Dice coefficient, cosine coefficient, and spherical distance. Apart from the GPCR dataset, considerable enrichments were achieved. Enrichment factors for the same datasets but different similarity measures differed only slightly. For most of the datasets the Manhattan and the Soergel distance... [Pg.60]

Note that the Minkowski distance represents a family of distance measures, for which the higher the value of r, the greater the importance given to large differences. For r = 1, the Minkowski distance is the Manhattan distance; for r = 2, it is the Euclidean distance; and for r → ∞, it is the Lagrange distance. [Pg.696]
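As a minimal sketch of this family, the Python function below computes the Minkowski distance of order r and treats the limiting case of infinite r as the largest absolute coordinate difference. The function name `minkowski` and the example points are illustrative choices, not taken from the cited text.

```python
import math

def minkowski(x, y, r):
    """Minkowski distance of order r between equal-length sequences x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(r):
        # Limit r -> infinity: only the largest difference survives.
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1.0 / r)

x = [0.0, 0.0]
y = [3.0, 4.0]
manhattan = minkowski(x, y, 1)        # 3 + 4
euclidean = minkowski(x, y, 2)        # sqrt(9 + 16)
maximum = minkowski(x, y, math.inf)   # max(3, 4)
```

For these two points the three distances decrease from 7 to 5 to 4 as r grows, which reflects the increasing weight placed on the single largest coordinate difference.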

Analogous to the Euclidean distance matrix are the data distance matrices obtained using different distance measures, such as Manhattan distance, Canberra distance, Lagrange distance, and so on. Moreover, a Euclidean-distance map matrix was defined, which encodes information about graphs used to describe proteomics maps. [Pg.704]

In the case of full spectra or of other analytical signal curves, both the Euclidean distance and the Manhattan distance are used as similarity measures. In the case of the Manhattan distance, the absolute differences between the unknown and the library spectrum are summed. As a result of the comparison, a hit list ranked according to distances or similarities of spectra is again obtained. [Pg.288]
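The library search described above can be sketched in Python as follows. The spectra, the compound names, and the helper `hit_list` are hypothetical, and the sketch assumes that all spectra are sampled at the same points.

```python
def manhattan(x, y):
    """Sum of absolute point-by-point differences between two spectra."""
    return sum(abs(a - b) for a, b in zip(x, y))

def hit_list(unknown, library):
    """Return (name, distance) pairs sorted by increasing Manhattan distance."""
    scored = [(name, manhattan(unknown, spectrum))
              for name, spectrum in library.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical four-point "spectra" on a common axis.
library = {
    "compound_A": [0.1, 0.8, 0.3, 0.0],
    "compound_B": [0.1, 0.7, 0.4, 0.1],
    "compound_C": [0.9, 0.1, 0.0, 0.5],
}
unknown = [0.1, 0.7, 0.3, 0.1]

hits = hit_list(unknown, library)
for name, distance in hits:
    print(f"{name}: {distance:.2f}")
```

The smallest distance appears first, so the top of the list is the best match under this measure.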

Essentially, the similarity between compounds is estimated in terms of a distance measure between two different objects, described by vectors. Scaling of the variables is advisable if they do not have comparable magnitude. The most prominent distance measures are the Euclidean distance, the average Euclidean distance, and the Manhattan distance. A comprehensive overview of methods for chemical similarity searching has been published by Willett et al. Apart from similarity searches among compounds in databases, such similarity measures are frequently applied in the design and analysis of combinatorial libraries. ... [Pg.217]

An ordinal variable is a nominal attribute with multiple states ordered in a meaningful sequence. Consider an attribute that measures the degree of suicide risk on the scale low, moderate, high. Obviously, the values of the ordinal attribute can be mapped to successive integers. The dissimilarity between two objects X and Y with ordinal attributes is measured as the Manhattan distance [Eq. (5.2)] divided by the number of variables for both objects (Kaufman and Rousseeuw, 1990). [Pg.95]
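A minimal Python sketch of this dissimilarity follows. The integer mapping `LEVELS` and the example objects are hypothetical, and the sketch assumes that every ordinal variable shares the same three-level scale; Eq. (5.2) itself is not reproduced in the excerpt.

```python
# Assumed mapping of the ordered states to successive integers.
LEVELS = {"low": 1, "moderate": 2, "high": 3}

def ordinal_dissimilarity(x, y, levels=LEVELS):
    """Manhattan distance between mapped ranks, divided by the number of variables."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of variables")
    ranks_x = [levels[v] for v in x]
    ranks_y = [levels[v] for v in y]
    total = sum(abs(a - b) for a, b in zip(ranks_x, ranks_y))
    return total / len(ranks_x)

# Two hypothetical objects described by three ordinal variables.
obj_x = ["low", "high", "moderate"]
obj_y = ["high", "high", "low"]
d = ordinal_dissimilarity(obj_x, obj_y)  # (2 + 0 + 1) / 3
```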

Figure 16.5 (a) Illustration of the Manhattan distance between two feature combinations. The distance is calculated separately for each feature and the totals added, coming in this case to 2.1 + 1.0 = 3.1. (b) Illustration of the Euclidean distance between the same two features. In this case, the shortest line between the two points is found and its length measured. For the same feature differences, the distance is √(2.1² + 1.0²) ≈ 2.33. [Pg.489]

In the case of k = 2, Eq. [8] corresponds to the well-known Euclidean distance of which Eq. [7] is an integral version. In the case of k = 1, we find the Manhattan or city-block distance. The choice of Euclidean distance is a computationally interesting choice, but it is by no means the only one possible. In this context, it is appropriate to mention the four requirements that should be associated with a true distance measure ... [Pg.135]

In this matrix the most similar pair of (different) objects is (1,4), while the most divergent pair is (3,4). Apart from the classical Euclidean distance defined by Equation 8.2, some further relevant measures exist, such as the Mahalanobis or Manhattan distance. The Mahalanobis distance, for instance, which is important in classification (see Chapter 3.10), is computed according to... [Pg.53]

In the first step of HCA, a distance matrix is calculated that contains the complete set of interspectral distances. The distance matrix is symmetric along its diagonal and has the dimension n × n, with n as the number of patterns. Spectral distance can be obtained in different ways depending on how the similarity of two patterns is calculated. Popular distance measures are Euclidean distances, including the city-block distance (Manhattan block distance), Mahalanobis distance, and so-called differentiation indices (D-values, see also Appendix B). [Pg.211]

To construct dissimilarity measures, one uses mismatches. Here a + b is the Hamming (Manhattan, taxi-cab, city-block) distance, and √(a + b) is the Euclidean distance. [Pg.304]
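For binary variables, every mismatch contributes exactly 1 to both the absolute and the squared difference, which is why the Euclidean distance is the square root of the mismatch count. The Python sketch below illustrates this under the assumption that a and b count the two kinds of mismatch (1 against 0, and 0 against 1); the excerpt does not define them, and the example vectors are hypothetical.

```python
import math

def mismatch_counts(x, y):
    """Count the two kinds of mismatch between binary vectors x and y."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    b = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    return a, b

x = [1, 0, 1, 1, 0]
y = [0, 0, 1, 0, 1]

a, b = mismatch_counts(x, y)
hamming = a + b                 # Hamming / Manhattan distance
euclid = math.sqrt(a + b)       # Euclidean distance for binary data

# Cross-check against the general definitions.
direct_manhattan = sum(abs(p - q) for p, q in zip(x, y))
direct_euclid = math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))
```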

For those variables that are measured on a scale of integer values consisting of more than two levels, one uses the Manhattan or city-block distance. This is also referred to as the L1-norm. It is given for variable j by ... [Pg.66]

Distances with C = 1 are especially useful in the classification of local data as simple as in Fig. 5-12, where simply d(1, 2) = a + b. They are also known as Manhattan, city block, or taxi driver metrics. These distances describe an absolute distance and may be easily understood. With C = 2 the distance of Eq. 5-7, the Euclidean distance, is obtained. As C approaches infinity (C = ∞), the maximum metric is obtained, in which the measurement pairs with the greatest difference have the greatest weight. This metric is, therefore, suitable for outlier recognition. [Pg.154]

Bicego used the similarity-based representation of electronic nose measurements for odor classification with the SVM method. In the similarity-based representation, the raw data from sensors are transformed into pairwise (dis)similarities, i.e., distances between objects in the dataset. The electronic nose is an array of eight carbon black-polymer detectors. The system was tested for the recognition of 2-propanol, acetone, and ethanol, with 34 experiments for each compound. Two series of 102 experiments were performed, the first one with data recorded after 10 minutes of exposure, whereas in the second group of experiments, the data were recorded after 1 second of exposure. The one-versus-one cross-validation accuracy of the first group of experiments was 99% for similarity computed using the Euclidean metric. For the second group of experiments, the accuracy was 79% for the Euclidean metric and 80% for the Manhattan metric. [Pg.383]

Big Chemical Encyclopedia