Similarity Measures and Data Preprocessing

In order to find structures in a data set or to reveal similarities of samples, organisms, etc., which in the following are called objects, one first needs a similarity measure. The simplest similarity measure can be derived from geometry. Without proof, one intuitively accepts that similarity and distance are complementary, and recalls the law of PYTHAGORAS for the distance d of two points O1 = (x1, y1) and O2 = (x2, y2) in a rectangular system of two axes y and x: [Pg.153]

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
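As a minimal illustration of the two-dimensional case, the Pythagorean distance of two points can be computed directly from the two legs along the x and y axes. The function name and the point coordinates below are hypothetical, chosen only for the example.

```python
import math


def pythagoras_distance(o1, o2):
    """Distance of two points o1 = (x1, y1) and o2 = (x2, y2) in a rectangular x-y system."""
    dx = o2[0] - o1[0]  # leg along the x axis
    dy = o2[1] - o1[1]  # leg along the y axis
    return math.sqrt(dx ** 2 + dy ** 2)


# Hypothetical points whose legs are 3 and 4 (the classic 3-4-5 triangle).
print(pythagoras_distance((1.0, 2.0), (4.0, 6.0)))  # 5.0
```

The standard library also offers `math.hypot(dx, dy)`, which computes the same quantity.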

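This situation is shown in Fig. 5-12. Extending this law to more than two dimensions, i.e. to the spatial law of PYTHAGORAS, leads to the EUCLIDean distance of any two objects Oi and Ok described by m features, which in the following we will simply write as d(i, k); here x_ij denotes the value of feature j for object Oi: [Pg.154]

$$d(i,k) = \sqrt{\sum_{j=1}^{m} (x_{ij} - x_{kj})^2} \qquad (5\text{-}7)$$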
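The EUCLIDean distance of objects with m features, and the table of d(i, k) for all object pairs, can be sketched in plain Python as follows. The helper names and the three objects with m = 4 features are hypothetical; note that Python indices start at 0, so d[0][1] corresponds to d(1, 2) in the notation of the text.

```python
import math


def euclidean(obj_i, obj_k):
    """EUCLIDean distance d(i, k) of two objects described by the same m features."""
    if len(obj_i) != len(obj_k):
        raise ValueError("objects must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(obj_i, obj_k)))


def distance_matrix(objects):
    """Symmetric n x n matrix of d(i, k) for all pairs of objects (zeros on the diagonal)."""
    n = len(objects)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            d[i][k] = d[k][i] = euclidean(objects[i], objects[k])
    return d


# Three hypothetical objects with m = 4 features each.
objects = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 3.0, 4.0],
    [1.0, 0.0, 3.0, 0.0],
]
for row in distance_matrix(objects):
    print([round(v, 3) for v in row])
```

Only the upper triangle is computed, because d(i, k) = d(k, i) and d(i, i) = 0.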

Clearly, for more than m = 3 features the distance can no longer be visualized. [Pg.154]

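Eq. 5-7 appears as a special case of the so-called MINKOWSKI metric, where m still denotes the dimension of the space spanned by the m features, x_ij is the value of feature j for object Oi, and C is an adjustable parameter: [Pg.154]

$$d(i,k) = \left( \sum_{j=1}^{m} \lvert x_{ij} - x_{kj} \rvert^{C} \right)^{1/C}$$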
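The MINKOWSKI metric can be sketched as a single function of the parameter C. In this sketch C = math.inf is treated as the limiting maximum metric, and values C < 1 are rejected because the formula then no longer satisfies the triangle inequality. The function name and the two sample objects are hypothetical.

```python
import math


def minkowski(obj_i, obj_k, c):
    """MINKOWSKI distance of two objects with m features; c >= 1, or math.inf for the maximum metric."""
    if len(obj_i) != len(obj_k):
        raise ValueError("objects must have the same number of features")
    diffs = [abs(a - b) for a, b in zip(obj_i, obj_k)]
    if math.isinf(c):
        return max(diffs)  # limit C -> infinity: largest single feature difference
    if c < 1:
        raise ValueError("C must be >= 1 for the distance to be a metric")
    return sum(d ** c for d in diffs) ** (1.0 / c)


# Hypothetical objects with feature differences 1, 2 and 4.
o1 = [0.0, 0.0, 0.0]
o2 = [1.0, 2.0, 4.0]
for c in (1, 2, 4, 10, math.inf):
    print(c, round(minkowski(o1, o2, c), 4))
```

The printed distances shrink from 7 (C = 1) toward 4, the largest single difference, as C grows.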

Distances with C = 1 are especially useful in the classification of local data; in a case as simple as Fig. 5-12 one obtains simply d(1, 2) = a + b. They are also known as Manhattan, city-block, or taxi-driver metrics. Because they add up absolute differences, these distances are easy to understand. With C = 2, the EUCLIDean distance of Eq. 5-7 is obtained. In the limit C → ∞, the maximum metric, the measurement pair with the greatest difference receives the greatest weight; the distance then equals the largest single feature difference between the two objects. This metric is therefore suitable for outlier recognition. [Pg.154]
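The three special cases C = 1, C = 2, and C → ∞ can be compared on the same data. The sketch below first reproduces d(1, 2) = a + b for a right triangle with hypothetical legs a = 3 and b = 4, then contrasts an object that deviates slightly in every feature with one that deviates strongly in a single feature. All function names and measurement values are hypothetical.

```python
def manhattan(u, v):
    """C = 1: city-block distance, the sum of absolute feature differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """C = 2: EUCLIDean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def maximum(u, v):
    """C -> infinity: maximum metric, the largest single feature difference."""
    return max(abs(a - b) for a, b in zip(u, v))


# Right triangle with legs a = 3 and b = 4 (hypothetical values).
p1, p2 = (0.0, 0.0), (3.0, 4.0)
print(manhattan(p1, p2), euclidean(p1, p2), maximum(p1, p2))  # 7.0 5.0 4.0

# Object B deviates slightly in every feature; object C matches the
# reference except for one strongly deviating (outlying) feature.
reference = [10.0, 10.0, 10.0, 10.0, 10.0]
obj_b = [11.0, 11.0, 11.0, 11.0, 11.0]
obj_c = [10.0, 10.0, 10.0, 10.0, 13.0]
for name, f in (("C=1", manhattan), ("C=2", euclidean), ("C=inf", maximum)):
    print(name, round(f(reference, obj_b), 3), round(f(reference, obj_c), 3))
```

With C = 1, object B is the more distant one (5 vs. 3), whereas the maximum metric rates object C three times farther away (3 vs. 1), because only the single largest deviation counts. This is why the maximum metric singles out an outlying measurement.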
