Distance measures city-block

The distance between object points is considered as an inverse similarity of the objects. This similarity depends on the variables used and on the distance measure applied. The distances between the objects can be collected in a distance matrix. The most widely used measure is the Euclidean distance, which is the familiar geometric distance extended to more than two or three dimensions. Other distance measures (city block distance, correlation coefficient) can also be applied. Of special importance is the Mahalanobis distance, which considers the spatial distribution of the object points (the correlation between the variables). Based on the Mahalanobis distance, multivariate outliers can be identified. The Mahalanobis distance is based on the covariance matrix of X; this matrix plays a central role in multivariate data analysis and should be estimated by appropriate methods, and in most cases robust methods are adequate. [Pg.71]
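As a rough illustration of why the Mahalanobis distance accounts for the correlation between variables, the following sketch computes it for two-dimensional data. The function names and the five data points are hypothetical, and the classical sample covariance is used only for brevity; the excerpt recommends robust estimators, which are not shown here.

```python
import math

def mean_and_cov_2d(points):
    """Return the mean vector and the classical sample covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of x from mean for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    q = (dx * (inv[0][0] * dx + inv[0][1] * dy)
         + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return math.sqrt(q)

# Hypothetical, positively correlated data (not taken from the source).
data = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
mean, cov = mean_and_cov_2d(data)

# Both test points have the same Euclidean distance from the mean (3, 3).
d_along = mahalanobis_2d((5, 5), mean, cov)   # lies along the correlation trend
d_across = mahalanobis_2d((5, 1), mean, cov)  # lies across the correlation trend
print(d_along, d_across)
```

The point lying across the trend of the data receives the larger Mahalanobis distance even though both points are equally far from the mean in the Euclidean sense, which is the property that makes the measure useful for flagging multivariate outliers.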

FIGURE 2.10 Euclidean distance and city block distance (Manhattan distance) between objects represented by vectors or points xA and xB. The cosine of the angle between the object vectors is a similarity measure and corresponds to the correlation coefficient of the vector... [Pg.59]
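The measures named in this caption can be sketched in a few lines. The helper names and the example vectors are hypothetical. The sketch also shows one common way in which the cosine relates to the correlation coefficient: the Pearson correlation equals the cosine of the angle between the mean-centered vectors.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def city_block(u, v):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors drawn from the origin."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson_correlation(u, v):
    """Pearson r computed as the cosine of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])

x_a, x_b = (1.0, 2.0), (4.0, 6.0)  # hypothetical object points
print(euclidean(x_a, x_b), city_block(x_a, x_b), cosine_similarity(x_a, x_b))
```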

To construct dissimilarity measures, one uses mismatches. Here a + b is the Hamming (Manhattan, taxi-cab, city-block) distance, and $\sqrt{a + b}$ is the Euclidean distance. [Pg.304]
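A small sketch may make the mismatch counts concrete. It assumes, since the excerpt does not define them, that a counts the positions where the first binary vector has 1 and the second has 0, and b counts the reverse case. Under that assumption the city-block distance between 0/1 vectors equals a + b, and the Euclidean distance is its square root. The vectors are hypothetical.

```python
import math

def mismatch_counts(x, y):
    """Return (a, b): counts of 1/0 mismatches and of 0/1 mismatches."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    b = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    return a, b

# Hypothetical binary descriptions of two objects.
x = [1, 0, 1, 1, 0, 0]
y = [1, 1, 0, 1, 0, 1]

a, b = mismatch_counts(x, y)
hamming = sum(abs(xi - yi) for xi, yi in zip(x, y))
euclid = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(a + b, hamming)             # both count the mismatching positions
print(math.sqrt(a + b), euclid)   # both give the Euclidean distance
```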

For those variables that are measured on a scale of integer values consisting of more than two levels, one uses the Manhattan or city-block distance. This is also referred to as the $L_1$-norm. It is given for variable j by ... [Pg.66]

Similarity and Distance. Two sequences of subgraphs m and n such as those in Table 1 have the property that there is a built-in one-to-one correspondence between the elements of one sequence ($m_i$) and those of the other ($n_i$). Accordingly, it is straightforward to calculate various well-known (17) measures of the distance d between the sequences, e.g. Euclidean distance $[\sum_i (m_i - n_i)^2]^{1/2}$, city block distance... [Pg.170]

Bit vectors live in an n-dimensional, discrete hypercubic space, where each vertex of the hypercube corresponds to a set. Figure 2 provides an example of sets with three elements. Distances between two bit vectors, vA and vB, measured in this space correspond to Hamming distances, which are based on the city-block $L_1$ metric... [Pg.11]
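The correspondence between sets and hypercube vertices can be sketched as follows. The universe of three elements and both helper names are hypothetical. The Hamming distance between two bit vectors counts the elements that belong to exactly one of the two sets, that is, the size of their symmetric difference.

```python
def set_to_bits(subset, universe):
    """Encode a subset as a 0/1 vector with one position per universe element."""
    return [1 if item in subset else 0 for item in universe]

def hamming(u, v):
    """City-block (L1) distance between two 0/1 vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(u, v))

universe = ["a", "b", "c"]           # hypothetical three-element universe
set_a, set_b = {"a", "c"}, {"b", "c"}

v_a = set_to_bits(set_a, universe)   # vertex [1, 0, 1] of the cube
v_b = set_to_bits(set_b, universe)   # vertex [0, 1, 1] of the cube

# The two numbers agree: edge steps between the vertices and the
# number of elements found in exactly one of the sets.
print(hamming(v_a, v_b), len(set_a ^ set_b))
```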

Distances in these spaces should be based upon an $L_1$ or city-block metric (see Eq. 2.18) and not the $L_2$ or Euclidean metric typically used in many applications. The reasons for this are the same as those discussed in Subheading 2.2.1 for binary vectors. Set-based similarity measures can be adapted from those based on bit vectors using an ansatz borrowed from fuzzy set theory (41,42). For example, the Tanimoto similarity coefficient becomes... [Pg.17]

Distances with C = 1 are especially useful in the classification of local data as simple as those in Fig. 5-12, where simply d(1, 2) = a + b. They are also known as Manhattan, city block, or taxi driver metrics. These distances describe an absolute distance and are easily understood. With C = 2, the distance of Eq. 5-7, the Euclidean distance, is obtained. As C approaches infinity (C = ∞), the maximum metric is obtained, in which the measurement pairs with the greatest difference have the greatest weight. This metric is, therefore, suitable for outlier recognition. [Pg.154]
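The role of the exponent can be illustrated with a short sketch. It assumes the usual general form d = (Σ |x_i − y_i|^C)^(1/C), which the excerpt refers to but does not reproduce. The function name and the example points are hypothetical.

```python
import math

def minkowski(x, y, c):
    """Distance with exponent c; c = math.inf gives the maximum metric."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(c):
        # In the limit only the largest coordinate difference survives.
        return max(diffs)
    return sum(d ** c for d in diffs) ** (1.0 / c)

p, q = (0.0, 0.0), (3.0, 4.0)      # hypothetical measurement pairs
print(minkowski(p, q, 1))          # city block: 3 + 4
print(minkowski(p, q, 2))          # Euclidean: sqrt(9 + 16)
print(minkowski(p, q, math.inf))   # maximum metric: max(3, 4)
```

As the exponent grows, the result moves from the sum of the differences toward the single largest difference, which is why the maximum metric emphasizes the most deviating measurement pair.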

When m = 1, Equation (4) defines the city-block metric, and if m = 2 then the Euclidean distance is defined. Figure 5 illustrates these measures on two-dimensional data. [Pg.100]

Figure 5.2 Demonstration of using different distance measures. The dataset used for this demonstration is the subset of genes from the microarray study by Bhattacharjee et al. (2001). The clustering method used is the K-means algorithm. Different distance measures result in different clustering solutions: (a) Euclidean, (b) city block, (c) Pearson, and (d) cosine distances. (See color insert.)...

In the case of k = 2, Eq. [8] corresponds to the well-known Euclidean distance of which Eq. [7] is an integral version. In the case of k = 1, we find the Manhattan or city-block distance. The choice of Euclidean distance is a computationally interesting choice, but it is by no means the only one possible. In this context, it is appropriate to mention the four requirements that should be associated with a true distance measure ... [Pg.135]

In the first step of HCA, a distance matrix is calculated that contains the complete set of interspectral distances. The distance matrix is symmetric about its diagonal and has the dimension n × n, with n as the number of patterns. Spectral distance can be obtained in different ways depending on how the similarity of two patterns is calculated. Popular distance measures are the Euclidean distance, the city-block distance (Manhattan block distance), the Mahalanobis distance, and so-called differentiation indices (D-values, see also Appendix B). [Pg.211]
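The first step described above can be sketched as follows, using the city-block distance as the example measure. The three short "spectra" and the function names are hypothetical, and real spectra would have many more points. Only the upper triangle is computed, and each value is mirrored, which is what makes the matrix symmetric with a zero diagonal.

```python
def city_block(x, y):
    """Sum of absolute differences between two equally long patterns."""
    return sum(abs(a - b) for a, b in zip(x, y))

def distance_matrix(patterns, dist=city_block):
    """Return the full n x n matrix of pairwise distances."""
    n = len(patterns)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compute each pair once and mirror it across the diagonal.
            d[i][j] = d[j][i] = float(dist(patterns[i], patterns[j]))
    return d

# Hypothetical patterns, one row per spectrum.
spectra = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
for row in distance_matrix(spectra):
    print(row)
```

Passing a different function as `dist` swaps in another measure without changing the matrix construction.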

The first three searches compared the performance of different similarity measures with the full INBS frequency data. When the performance of the frequency Tanimoto measure and the city block metric was compared, it was found that the hits ranked to the top using the Tanimoto measure were not as similar to the query structure as those ranked top by the city block metric. Generally, the city block and Euclidean distance measures ranked the hits equally well; however, in two cases the city block ranking was better than the Euclidean ranking, so use of the city block metric may be preferred. The fourth search used only the paths from the most connected atom in the structure in a city block metric calculation. A comparison of the performance of the INBS search for individual query structures using the city block metric for all paths and for the Morgan root atom alone is discussed below. [Pg.370]

Figure 5 Mass spectra can be considered as points or vectors in a multidimensional spectral space. For simplicity only two mass numbers (43, 58) have been selected in this example. $A_m$, abundance (peak height in %B) at mass m; U, spectrum from unknown; R1, reference spectrum of propanal; R2, reference spectrum of acetone. Measures for spectral similarity are the Euclidean distance (d), the city block distance ($\Delta_{43} + \Delta_{58}$) or the cosine of angle α (equivalent to S, in Eqn [2]).

Big Chemical Encyclopedia
