City block metrics

The same idea can be developed for a non-Euclidean metric such as the city-block metric or L1-norm (Section 31.6.1). Here we find that the trajectories traced out by the variable coefficient kj are curvilinear rather than linear. Markers between equidistant values on the original scales of the columns of X are usually not equidistant on the corresponding curvilinear trajectories of the nonlinear biplot (Fig. 31.17b). Although the curvilinear trajectories intersect at the origin of space, the latter does not necessarily coincide with the centroid of the row-points of X. We briefly describe here the basic steps of the algorithm and refer to the original work of Gower [53,54] for a formal proof. [Pg.152]

Distances in these spaces should be based upon an L1 or city-block metric (see Eq. 2.18) and not the L2 or Euclidean metric typically used in many applications. The reasons for this are the same as those discussed in Subheading 2.2.1 for binary vectors. Set-based similarity measures can be adapted from those based on bit vectors using an ansatz borrowed from fuzzy set theory (41,42). For example, the Tanimoto similarity coefficient becomes... [Pg.17]

When m = 1, Equation (4) defines the city-block metric, and when m = 2 it defines the Euclidean distance. Figure 5 illustrates these measures on two-dimensional data. [Pg.100]
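Equation (4) itself is not reproduced in this excerpt. Assuming it has the standard Minkowski form, it can be written as below; the symbols (objects i and j, variables k = 1, ..., p, data values x) are assumptions chosen for illustration, not the source's notation.

```latex
d_{ij} = \left( \sum_{k=1}^{p} \lvert x_{ik} - x_{jk} \rvert^{m} \right)^{1/m}
\qquad
\begin{cases}
m = 1: & d_{ij} = \sum_{k=1}^{p} \lvert x_{ik} - x_{jk} \rvert \quad \text{(city-block)} \\
m = 2: & d_{ij} = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^{2}} \quad \text{(Euclidean)}
\end{cases}
```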

Figure 5 The Euclidean and city-block metrics for two-dimensional data...

All INBSs with a city block metric similarity measure (3) ... [Pg.368]

The first three searches compared the performance of different similarity measures with the full INBS frequency data. When the performance of the frequency Tanimoto measure and the city block metric was compared, it was found that the hits ranked to the top using the Tanimoto measure were not as similar to the query structure as those ranked top by the city block metric. Generally it was found that the city block and Euclidean distance measures ranked the hits equally well; however, in two cases the city block ranking was better than the Euclidean ranking, so use of the city block metric may be preferred. The fourth search used only the paths from the most connected atom in the structure in a city block metric calculation. A comparison of the performance of the INBS search for individual query structures using the city block metric for all paths and for the Morgan root atom alone is discussed below. [Pg.370]
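The following is a minimal sketch of how city block and Euclidean rankings of the same hits can differ. The frequency vectors, the names `query` and `database`, and the helper `rank_hits` are made up for illustration and are not INBS data or the authors' code. The frequency Tanimoto measure is left out because its form is not given in this excerpt.

```python
import math

def city_block(u, v):
    """L1 distance: sum of absolute differences between frequency counts."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """L2 distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_hits(query, database, distance):
    """Return entry names sorted from most to least similar (smallest distance first)."""
    return sorted(database, key=lambda name: distance(query, database[name]))

# Hypothetical frequency vectors (toy data, two descriptors per structure).
query = (4, 4)
database = {"a": (7, 7), "b": (4, 9), "c": (5, 4)}

city_block_ranking = rank_hits(query, database, city_block)
euclidean_ranking = rank_hits(query, database, euclidean)
print(city_block_ranking, euclidean_ranking)
```

In this toy case entry "a" differs moderately in both descriptors while "b" differs strongly in one, so the two metrics order them differently. Squaring in the Euclidean distance penalizes one large difference more heavily than several small ones.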

It was found that the best method currently tested is that which compares the incremental rates of change in the distribution of the INBSs in a structure using the city block metric similarity measure. The performance of this measure did differ slightly for each of the chosen query structures, which were selected to represent a variety of molecular sizes and structural patterns. However, the success of the pattern search appears to be on the same level as the current 2-D similarity search available in the CSD System based on bit-screens. Interestingly, those structures which are considered most similar to the query are not the same in the two searches, presumably because they are based on different attribute sets. [Pg.371]

Non-linear PCA can be obtained in many different ways. Some methods make use of higher order terms of the data (e.g. squares, cross-products), non-linear transformations (e.g. logarithms), metrics that differ from the usual Euclidean one (e.g. city-block distance) or specialized applications of neural networks [50]. The objective of these methods is to increase the amount of variance in the data that is explained by the first two or three components of the analysis. We only provide a brief outline of the various approaches, with the exception of neural networks for which the reader is referred to Chapter 44. [Pg.149]

The chemical constitution of a molecule or an ensemble of molecules (EM) of n atoms is representable by a symmetric n × n BE-matrix and corresponds accordingly to a point P in ℝ^(n(n+1)/2), an n(n+1)/2-dimensional Euclidean space, the Dugundji space of the FIEM(A). The "city block distance" of two points P1 and P2 is twice the number of electrons that are involved in the interconversion EM1 ⇄ EM2 of those EM that belong to the points P1 and P2. This chemical metric on the EM of an FIEM provides not only a formalism for constitutional chemistry, but also allows us to use the properties of Euclidean spaces in expressing the logical structure of the FIEM, and thus of constitutional chemistry (refs. 3e, 32c). [Pg.35]

Bit vectors live in an n-dimensional, discrete hypercubic space, where each vertex of the hypercube corresponds to a set. Figure 2 provides an example of sets with three elements. Distances between two bit vectors, vA and vB, measured in this space correspond to Hamming distances, which are based on the city-block L1 metric... [Pg.11]
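A short sketch may make the link between Hamming and city-block distances concrete. The function names and the three-element example vectors are illustrative assumptions, not taken from the source.

```python
def city_block_bits(u, v):
    """L1 distance between two equal-length bit vectors given as 0/1 sequences."""
    return sum(abs(x - y) for x, y in zip(u, v))

def hamming_bits(u, v):
    """Hamming distance: number of positions at which the bits differ."""
    return sum(1 for x, y in zip(u, v) if x != y)

def hamming_int(a, b):
    """Hamming distance for bit vectors packed into integers (XOR, then count set bits)."""
    return bin(a ^ b).count("1")

# Two sets over three elements, i.e. two vertices of the 3-dimensional hypercube.
v_a = [1, 0, 1]
v_b = [0, 1, 1]

print(city_block_bits(v_a, v_b), hamming_bits(v_a, v_b), hamming_int(0b101, 0b011))
```

Because each coordinate is 0 or 1, every absolute difference is itself 0 or 1, so summing absolute differences is the same as counting mismatched positions.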

A simple City Block or L1 metric has been found to be fairly successful in comparing candidate splice points. One simply computes... [Pg.466]

Distances with C = 1 are especially useful in the classification of local data as simple as in Fig. 5-12, where simply d(1, 2) = a + b. They are also known as Manhattan, city block, or taxi driver metrics. These distances describe an absolute distance and may be easily understood. With C = 2 the distance of Eq. 5-7, the EUCLIDean distance, is obtained. As C approaches infinity (C = ∞), the maximum metric is obtained, in which the measurement pairs with the greatest difference have the greatest weight. This metric is, therefore, suitable in outlier recognition. [Pg.154]
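The three cases can be illustrated with a small sketch. It assumes that Eq. 5-7, which is not reproduced here, is the usual Minkowski distance with exponent C. The function name `minkowski` and the example points are hypothetical.

```python
import math

def minkowski(u, v, c):
    """Minkowski distance with exponent c; c = math.inf gives the maximum metric."""
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if math.isinf(c):
        # Limit C -> infinity: only the largest coordinate difference remains.
        return max(diffs)
    return sum(d ** c for d in diffs) ** (1.0 / c)

# Two points whose coordinate differences are a = 3 and b = 4.
p1 = (0.0, 0.0)
p2 = (3.0, 4.0)

d_city = minkowski(p1, p2, 1)         # a + b
d_eucl = minkowski(p1, p2, 2)         # sqrt(a^2 + b^2)
d_max = minkowski(p1, p2, math.inf)   # max(a, b)
print(d_city, d_eucl, d_max)
```

With C = 1 both differences contribute equally. As C grows, the larger difference dominates, which is why the maximum metric responds most strongly to a single outlying measurement.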

Figure 7.1 Authentication of monovarietal virgin olive oils: results of applying clustering analysis to volatile compounds. The Manhattan (city block) distance metric and Ward's amalgamation methods were used in (a) the squared Euclidean distance and (b) complete linkage amalgamation methods. Note: A, cv. Arbequina (6); C, cv. Coratina (6); K, cv. Koroneiki (6); P, cv. Picual (6); 1, harvest 1991; 2, harvest 1992. Olives were harvested at three levels of maturity (unripe, normal, overripe) (source: SEXIA Group-Instituto de la Grasa, Seville, Spain).

Big Chemical Encyclopedia
