Similarity measures Manhattan distance

In a subsequent study, we examined the influence of seven similarity indices on the enrichment of actives using the topological CATS descriptor and the 12 COBRA datasets [31]. In particular, we evaluated to what extent different similarity measures complement each other in terms of the retrieved active compounds. Retrospective screening experiments were carried out with seven similarity measures: Manhattan distance, Euclidean distance, Tanimoto coefficient, Soergel distance, Dice coefficient, cosine coefficient, and spherical distance. Apart from the GPCR dataset, considerable enrichments were achieved. Enrichment factors for the same datasets but different similarity measures differed only slightly. For most of the datasets the Manhattan and the Soergel distance... [Pg.60]
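
As a rough illustration of how such indices differ on the same pair of descriptor vectors, here is a minimal Python sketch. It uses the common continuous-valued forms of the Tanimoto, Dice, cosine, and Soergel measures for non-negative vectors; the excerpt does not give the exact definitions used in the study, so these forms, the function names, and the toy vectors are assumptions. The spherical distance is left out because its definition is not stated here.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def manhattan(a, b):
    # Sum of absolute coordinate differences (a distance: 0 means identical).
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def tanimoto(a, b):
    # Continuous-valued Tanimoto coefficient (a similarity: 1 means identical).
    ab = _dot(a, b)
    return ab / (_dot(a, a) + _dot(b, b) - ab)

def dice(a, b):
    return 2.0 * _dot(a, b) / (_dot(a, a) + _dot(b, b))

def cosine(a, b):
    return _dot(a, b) / math.sqrt(_dot(a, a) * _dot(b, b))

def soergel(a, b):
    # Manhattan distance normalized by the sum of coordinate-wise maxima;
    # assumes non-negative entries such as counts.
    return manhattan(a, b) / sum(max(x, y) for x, y in zip(a, b))

# Hypothetical count-like descriptor vectors, for illustration only.
query = [1.0, 0.0, 2.0]
candidate = [1.0, 1.0, 0.0]
for measure in (manhattan, euclidean, tanimoto, dice, cosine, soergel):
    print(measure.__name__, round(measure(query, candidate), 4))
```

Note that the distances and the coefficients run in opposite directions, so a ranking by Manhattan or Soergel distance is sorted in ascending order while a ranking by Tanimoto, Dice, or cosine is sorted in descending order.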

Manhattan distances can also be used for continuous variables, but this is rarely done, because one prefers Euclidean distances in that case. Figure 30.6 compares the Euclidean and Manhattan distances for two variables. While the Euclidean distance between i and i′ is measured along a straight line connecting the two points, the Manhattan distance is the sum of the distances parallel to the axes. The equations for both types of distances are very similar in appearance. In fact, they both belong to the Minkowski distances given by ... [Pg.67]
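
The relationship between the two distances can be sketched in a few lines of Python. The function below uses the standard textbook form of the Minkowski distance, in which the exponent p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance; the function name and the example points are illustrative assumptions, not taken from the excerpt.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

# Two hypothetical points described by two variables.
point_i = [0.0, 0.0]
point_j = [3.0, 4.0]

manhattan = minkowski(point_i, point_j, 1)  # sum of axis-parallel steps: 3 + 4
euclidean = minkowski(point_i, point_j, 2)  # straight line: sqrt(9 + 16)
print(manhattan, euclidean)
```

For any pair of points the Manhattan distance is at least as large as the Euclidean distance, and the two are equal only when the points differ in at most one variable.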

FIGURE 2.10 Euclidean distance and city block distance (Manhattan distance) between objects represented by vectors or points xA and xB. The cosine of the angle between the object vectors is a similarity measure and corresponds to the correlation coefficient of the vector... [Pg.59]

The USR (Ultrafast Shape Recognition) Method. This method was reported by Ballester and Richards (53) for compound database search on the basis of molecular shape similarity. It was reportedly capable of screening billions of compounds for similar shapes on a single computer. The method is based on the notion that the relative position of the atoms in a molecule is completely determined by inter-atomic distances. Instead of using all inter-atomic distances, USR uses a subset of distances, reducing the computational costs. Specifically, the distances between all atoms of a molecule to each of four strategic points are calculated. Each set of distances forms a distribution, and the three moments (mean, variance, and skewness) of the four distributions are calculated. Thus, for each molecule, 12 USR descriptors are calculated. The inverse of the translated and scaled Manhattan distance between two shape descriptors is used to measure the similarity between the two molecules. A value of 1 corresponds to maximum similarity and a value of 0 corresponds to minimum similarity. [Pg.124]
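
The procedure described above can be sketched in plain Python. The excerpt does not name the four reference points, so the choice below (molecular centroid, atom closest to the centroid, atom farthest from the centroid, and atom farthest from that atom) is an assumption based on how USR is commonly described. The moment definitions (population variance, standardized skewness) are also assumptions, and the coordinates are made up.

```python
import math

def _dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def _moments(values):
    # Mean, variance, and skewness of one distance distribution.
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return [mean, 0.0, 0.0]
    skew = sum((v - mean) ** 3 for v in values) / n / var ** 1.5
    return [mean, var, skew]

def usr_descriptor(coords):
    """12 shape descriptors from a list of (x, y, z) atom positions."""
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))  # centroid
    cst = min(coords, key=lambda c: _dist(c, ctd))  # closest to centroid
    fct = max(coords, key=lambda c: _dist(c, ctd))  # farthest from centroid
    ftf = max(coords, key=lambda c: _dist(c, fct))  # farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([_dist(c, ref) for c in coords]))
    return descriptor

def usr_similarity(d1, d2):
    # Inverse of the translated (+1) and scaled (divided by the number of
    # descriptors) Manhattan distance; 1 means identical descriptors.
    manhattan = sum(abs(a - b) for a, b in zip(d1, d2))
    return 1.0 / (1.0 + manhattan / len(d1))

# Hypothetical coordinates, for illustration only.
mol_a = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 0.9, 0.0), (0.0, 0.0, 1.5)]
mol_b = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
print(usr_similarity(usr_descriptor(mol_a), usr_descriptor(mol_b)))
```

Because the descriptor is a fixed-length vector of 12 numbers, comparing two molecules costs only 12 subtractions and one division, which is consistent with the screening speed reported for the method.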

In the case of full spectra or of other analytical signal curves, both the Euclidean distance and the Manhattan distance are used as similarity measures. In the case of the Manhattan distance, the differences between the unknown and the library spectrum are summed. As a result of comparison, a hit list ranked according to distances or similarities of spectra is again obtained. [Pg.288]
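
A minimal sketch of such a library search is shown below. It assumes that all spectra are sampled on the same axis and stored as equal-length lists of intensities, and that the summed differences are absolute differences; the library entries, their labels, and the function names are hypothetical.

```python
def manhattan(a, b):
    # Sum of absolute point-by-point differences between two spectra.
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def hit_list(unknown, library, distance=manhattan):
    """Return (distance, name) pairs sorted from best to worst match."""
    return sorted((distance(unknown, spectrum), name)
                  for name, spectrum in library.items())

# Toy three-point "spectra", for illustration only.
library = {
    "A": [0.1, 0.7, 0.3],
    "B": [0.9, 0.1, 0.0],
    "C": [0.2, 0.8, 0.5],
}
unknown = [0.1, 0.8, 0.3]

for dist, name in hit_list(unknown, library):
    print(name, round(dist, 3))
```

Passing `distance=euclidean` to `hit_list` ranks the same library by Euclidean distance instead; the two rankings need not agree when spectra differ in a few large peaks versus many small ones.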

Essentially, the similarity between compounds is estimated in terms of a distance measure between two different objects, described by vectors. Scaling of the variables is advisable if they do not have comparable magnitude. The most prominent distance measures are the Euclidean distance, the average Euclidean distance, and the Manhattan distance. A comprehensive overview of methods for chemical similarity searching has been published by Willett et al. Apart from similarity searches among compounds in databases, such similarity measures are frequently applied in the design and analysis of combinatorial libraries. ... [Pg.217]

In this matrix the most similar pair of (different) objects is (1,4), while the most divergent pair is (3,4). Apart from the classical Euclidean distance defined by Equation 8.2, some further relevant measures exist such as Mahalanobis or Manhattan distance. The Mahalanobis distance, for instance, which is important in classification (see Chapter 3.10), is computed according to... [Pg.53]

Bicego used the similarity-based representation of electronic nose measurements for odor classification with the SVM method. In the similarity-based representation, the raw data from sensors are transformed into pairwise (dis)similarities, i.e., distances between objects in the dataset. The electronic nose is an array of eight carbon black-polymer detectors. The system was tested for the recognition of 2-propanol, acetone, and ethanol, with 34 experiments for each compound. Two series of 102 experiments were performed, the first one with data recorded after 10 minutes of exposure, whereas in the second group of experiments, the data were recorded after 1 second of exposure. The one-versus-one cross-validation accuracy of the first group of experiments was 99% for similarity computed using the Euclidean metric. For the second group of experiments, the accuracy was 79% for the Euclidean metric and 80% for the Manhattan metric. [Pg.383]

In the first step of HCA, a distance matrix is calculated that contains the complete set of interspectral distances. The distance matrix is symmetric along its diagonal and has the dimension n × n, with n as the number of patterns. Spectral distance can be obtained in different ways depending on how the similarity of two patterns is calculated. Popular distance measures are Euclidean distances, including the city-block distance (Manhattan block distance), Mahalanobis distance, and so-called differentiation indices (D-values, see also Appendix B). [Pg.211]
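
The first step described above can be sketched as follows. The code builds the symmetric n × n matrix with a city-block metric and then looks up the closest pair of patterns, which is the pair an agglomerative clustering scheme would typically merge first. The function names and the toy patterns are assumptions for illustration; the linkage and the later merging steps are not shown.

```python
def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_matrix(patterns, metric=city_block):
    """Symmetric n x n matrix of pairwise distances with a zero diagonal."""
    n = len(patterns)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compute each pair once and mirror it across the diagonal.
            d[i][j] = d[j][i] = metric(patterns[i], patterns[j])
    return d

def closest_pair(d):
    """Indices (i, j), i < j, of the two most similar different patterns."""
    n = len(d)
    pairs = [(d[i][j], i, j) for i in range(n) for j in range(i + 1, n)]
    _, i, j = min(pairs)
    return i, j

# Three hypothetical two-point patterns.
patterns = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
d = distance_matrix(patterns)
print(d)
print(closest_pair(d))
```

Only the n(n − 1)/2 entries above the diagonal need to be computed, since the matrix is symmetric and the distance of a pattern to itself is zero.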

Big Chemical Encyclopedia
