Similarity and distance

To be able to cluster objects, one must measure their similarity. From our introduction, it is clear that distance may be such a measure. However, many types of similarity coefficients may be applied. While the terms similarity and dissimilarity have no unique definitions, the definition of distance is much clearer [6]. A dissimilarity between two objects i and i′ is a distance if [Pg.60]

Equation (30.1) shows that distances are zero or positive; eq. (30.2) shows that they are symmetric. Equation (30.3), where a is another object, is called the metric inequality. It states that the sum of the distances from any object a to objects i and i′ can never be smaller than the distance between i and i′. [Pg.60]
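The three conditions described above can be checked numerically. The sketch below is illustrative only: the equations themselves are not reproduced in this excerpt, so the forms used in the comments are inferred from the prose, and the function name `is_distance` and the tolerance `tol` are assumptions.

```python
from itertools import product


def is_distance(D, tol=1e-12):
    """Check a square dissimilarity matrix D (list of lists) against the
    three conditions described in the text. Hypothetical helper."""
    idx = range(len(D))
    # Non-negativity, as described for eq. (30.1): D[i][j] >= 0
    if any(D[i][j] < -tol for i, j in product(idx, idx)):
        return False
    # Symmetry, as described for eq. (30.2): D[i][j] == D[j][i]
    if any(abs(D[i][j] - D[j][i]) > tol for i, j in product(idx, idx)):
        return False
    # Metric inequality, as described for eq. (30.3):
    # D[i][a] + D[j][a] >= D[i][j] for every other object a
    return all(D[i][a] + D[j][a] >= D[i][j] - tol
               for i, j, a in product(idx, idx, idx))


D_ok = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
D_bad = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]  # 1 + 1 < 5 violates the inequality
print(is_distance(D_ok), is_distance(D_bad))
```

The triple loop is cubic in the number of objects, which is acceptable for a small sanity check but not for large data sets.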

Similarity and Distance. Two sequences of subgraphs m and n such as those in Table 1 have the property that there is a built-in one-to-one correspondence between the elements of one sequence ($m_i$) and those of the other ($n_i$). Accordingly, it is straightforward to calculate various well-known (17) measures of the distance d between the sequences, e.g. Euclidean distance $[\sum_i (m_i - n_i)^2]^{1/2}$, city block distance... [Pg.170]
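As a minimal sketch of the two measures named above, assuming the sequences are given as equal-length lists of numbers whose elements correspond position by position (the function names and the sample values are illustrative, not taken from Table 1):

```python
import math


def euclidean(m, n):
    """Euclidean distance: square root of the summed squared differences."""
    if len(m) != len(n):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(m, n)))


def city_block(m, n):
    """City block distance: sum of the absolute element-wise differences."""
    if len(m) != len(n):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - b) for a, b in zip(m, n))


# Made-up example sequences
m = [0, 0]
n = [3, 4]
print(euclidean(m, n))   # 5.0
print(city_block(m, n))  # 7
```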

In order to find structures in a data set or to reveal similarities of samples, organisms, ... which in the following are called objects, one first of all needs a similarity measure. The simplest similarity measure can be derived from geometry. Without proof one intuitively accepts that similarity and distance are complementary in nature and remembers the law of PYTHAGORAS about the distance d of two points O1 and O2 in a rectangular system of two axes y and x ... [Pg.153]

Comparison and ranking of sites according to chemical composition or toxicity is done by multivariate nonparametric or parametric statistical methods; however, only descriptive methods, such as multidimensional scaling (MDS), principal component analysis (PCA), and factor analysis (FA), show similarities and distances between different sites. Toxicity can be evaluated by testing the environmental sample (as an undefined complex mixture) against a reference sample and analyzing by inferential statistics, for example, t-test or analysis of variance (ANOVA). [Pg.145]

However, these descriptor-based similarity definitions present only one class of available similarity and distance measures. Approaches to molecular... [Pg.126]

The fundamental difference between similarity and distance measures is that the latter... [Pg.202]

Similarity and distance measures form the basis for most of the analysis and selection methods described in the next section and the reader is referred to the reviews by Willett et al. (2, and references therein) for a fuller discussion of the characteristics and specific properties of these measures. [Pg.202]

Similarity and distance (or dissimilarity) measures provide the means for converting the attributes of the objects into a relevant numerical score. [Pg.134]

Similarity and distance between objects are complementary concepts for which there is no single formal definition. In practice, distance as a measure of dissimilarity is a much more clearly defined quantity and is more extensively used in cluster analysis. [Pg.96]

Virtual screening techniques require the definition of a chemistry space in order that the similarity (and distance) between compounds within the space can be quantified. Once a space has been defined, a diverse subset is one that covers the chemistry space well, whereas a focused subset is one that is restricted to a localized region within the space. A chemistry space is defined through the use of numerical descriptors, which can be calculated for molecules, as shown schematically in Fig. 1. The similarity (and... [Pg.618]
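The contrast between a diverse and a focused subset can be sketched in code. The example below assumes each compound is already represented as a tuple of numerical descriptors and that Euclidean distance is used in the space. The greedy MaxMin selection shown for the diverse subset is one common heuristic, swapped in here for illustration and not necessarily the method the source has in mind. All function names are hypothetical.

```python
import math


def diverse_subset(points, k):
    """Greedy MaxMin selection (assumed heuristic): start from the first
    point, then repeatedly add the point whose nearest already-chosen
    neighbour is farthest away. Returns indices into `points`."""
    if k <= 0 or not points:
        return []
    chosen = [0]
    while len(chosen) < min(k, len(points)):
        candidates = (i for i in range(len(points)) if i not in chosen)
        best = max(
            candidates,
            key=lambda i: min(math.dist(points[i], points[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen


def focused_subset(points, centre, k):
    """Indices of the k points closest to a reference location `centre`."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], centre))
    return order[:k]


# Made-up two-descriptor space
pts = [(0, 0), (0.1, 0), (5, 5), (0, 0.1), (10, 0)]
print(diverse_subset(pts, 3))          # indices spread across the space
print(focused_subset(pts, (0, 0), 3))  # indices clustered near the origin
```

`math.dist` requires Python 3.8 or later. The diverse selection depends on the starting point, so a different seed compound can give a different subset.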

Distance is complementary to similarity. Distance coefficients have been discussed briefly in the previous sections. The complementary relationship between the similarity and distance coefficients allows the calculation of one from the value provided for the other by subtracting it from one, that is, distance = 1 − similarity. [Pg.54]
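A short sketch of this complementary relationship follows, assuming coefficients scaled to the range 0 to 1. The Tanimoto (Jaccard) coefficient on feature sets is used only as a convenient example of such a similarity coefficient; it is an assumption of this sketch, not something the passage specifies, and the function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets, in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(union)


def complement(value):
    """Convert a similarity in [0, 1] to a distance, or a distance back to
    a similarity, by subtracting it from one."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("coefficient must lie in [0, 1]")
    return 1.0 - value


s = tanimoto({1, 2, 3}, {2, 3, 4})  # 2 shared features out of 4 in total
d = complement(s)
print(s, d)  # 0.5 0.5
```

The subtraction only makes sense for coefficients bounded by one; an unbounded distance such as the Euclidean distance would need to be rescaled first.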

It is generally accepted without proof that similarity and distance are complementary: objects close together in multidimensional space are more alike than those further apart. Most of the similarity measures used in practice are based on some distance function, and whilst many such functions are referenced in the literature, the most common is the simple Euclidean distance metric. [Pg.584]

In simplistic terms, the concept of CS can be considered to be a multidimensional extension of the concept of a congeneric series. However, an important distinction between the two is that CS involves a pairwise relation that specifies the relationship of the molecules to each other, generally in terms of a molecular similarity or CS-distance function. A set of objects and a pairwise relation among them are the basic ingredients of a mathematical space. In the present case, the objects are molecules and the pairwise relation characterizes the similarity or distance of separation of each pair of molecules in the CS. Similarity and distance are inversely related: the more similar a pair of molecules, the closer they are in CS, and vice versa. [Pg.4]

Big Chemical Encyclopedia
