Similarity measures for multidimensional objects

Before delving into specific similarity calculations, we start with the characteristics of attributes in multidimensional data objects. The attributes can be quantitative or qualitative, continuous or binary, nominal or ordinal, and these characteristics determine the appropriate similarity calculation (Xu and Wunsch, 2005). Typically, distance-based similarity measures are used for continuous features, while matching-based similarity measures are more suitable for categorical variables. [Pg.90]
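
As an illustration of the matching-based case, the sketch below computes a simple matching coefficient, i.e. the fraction of attributes on which two categorical objects agree. The excerpt does not name a specific matching measure, so this choice, the function name `simple_matching_similarity`, and the sample attribute values are assumptions made only for the example.

```python
def simple_matching_similarity(a, b):
    """Fraction of attribute positions at which two objects agree.

    Assumed illustration of a matching-based measure; the source text
    does not specify which matching coefficient it has in mind.
    """
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    if len(a) == 0:
        raise ValueError("objects must have at least one attribute")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Hypothetical categorical objects (colour, shape, size).
sample_a = ("red", "round", "small")
sample_b = ("red", "oval", "small")

# Two of the three attributes agree.
similarity = simple_matching_similarity(sample_a, sample_b)
```

A value of 1 means the two objects agree on every attribute, and 0 means they agree on none.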

Distance-Based Similarity Measures. Similarity measures determine the proximity (or distance) between two data objects. Multidimensional objects can be formalized as numerical vectors $O_i = (o_{i1}, o_{i2}, \ldots, o_{ip})$, where $i$ denotes the $i$-th data object and $p$ is the number of dimensions of the data object $O_i$. Figure 5.1 provides an intuitive view of multidimensional data. The similarity between two objects can be measured by a distance function of the corresponding vectors $O_i$ and $O_j$. ... [Pg.90]

It is generally accepted without proof that similarity and distance are complementary: objects close together in multidimensional space are more alike than those farther apart. Most of the similarity measures used in practice are based on some distance function, and whilst many such functions are referenced in the literature, the most common is the simple Euclidean distance metric. [Pg.584]
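
A minimal sketch of the Euclidean distance between two numerical vectors is given below. The conversion from distance to similarity, 1 / (1 + d), is only one common convention and is an assumption here; the excerpt does not prescribe a particular mapping. The function names and sample vectors are likewise hypothetical.

```python
import math


def euclidean_distance(o_i, o_j):
    """Euclidean distance between two equal-length numerical vectors."""
    if len(o_i) != len(o_j):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(o_i, o_j)))


def distance_to_similarity(d):
    """Map a non-negative distance to (0, 1]; an assumed convention."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + d)


# Hypothetical three-dimensional objects.
o_1 = (1.0, 2.0, 3.0)
o_2 = (4.0, 6.0, 3.0)

d = euclidean_distance(o_1, o_2)   # sqrt(3^2 + 4^2 + 0^2) = 5.0
s = distance_to_similarity(d)      # 1 / (1 + 5) = 1/6
```

With this mapping, identical objects (distance zero) have similarity 1, and the similarity decreases towards 0 as the distance grows.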

On the other hand, factor analysis (FA) involves other manipulations of the eigenvectors and aims to gain insight into the structure of a multidimensional data set. The technique was first proposed for biological structure-activity relationship (SAR) studies and illustrated with an analysis of the activities of 21 diphenylaminopropanol derivatives in 11 biological tests [116-119, 289]. It has more commonly been used to determine the intrinsic dimensionality of experimentally determined chemical properties, that is, the number of fundamental factors required to account for the variance. One of the best FA techniques is the Q-mode, which groups a multivariate data set according to the data structure defined by the similarity between samples [1, 313-316]. It is devoted exclusively to interpreting the inter-object relationships in a data set, rather than the inter-variable (or covariance) relationships explored with R-mode factor analysis. The measure of similarity used is the cosine theta matrix, i.e., the matrix whose elements are the cosines of the angles between all sample pairs [1, 313-316]. [Pg.269]
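
The cosine theta matrix described above can be sketched as follows, assuming each sample is a row of numerical measurements. The function name `cosine_theta_matrix` and the sample data are hypothetical, and any scaling or centring that a full Q-mode analysis might apply beforehand is not shown.

```python
import math


def cosine_theta_matrix(samples):
    """Matrix of cosines of the angles between all pairs of sample vectors."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in samples]
    if any(n == 0.0 for n in norms):
        raise ValueError("cosine is undefined for an all-zero sample")
    size = len(samples)
    return [
        [
            sum(a * b for a, b in zip(samples[i], samples[j]))
            / (norms[i] * norms[j])
            for j in range(size)
        ]
        for i in range(size)
    ]


# Hypothetical samples: the first and third point in the same direction,
# the second is orthogonal to both.
samples = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
cos_theta = cosine_theta_matrix(samples)
```

Because the cosine depends only on the angle between two sample vectors, samples with the same proportions but different overall magnitudes are treated as identical.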

Cluster analysis (Everitt et al. 2001) is a tool for grouping various objects on the basis of their distance in a multidimensional space. In chemistry, cluster analysis is used for the interpretation of analytical results. For example, in food or drink samples, the concentrations of many chemicals are measured, and the question is which of the samples are similar on the basis of the analytical results. The first step is always the transformation of the raw measurement data into a distance matrix. The general features of a distance matrix are that the diagonal elements are zero (everything is at zero distance from itself), all matrix elements are non-negative (negative distance cannot be interpreted) and the matrix is symmetrical (to and from distances are identical). It is clear that the distance matrix defined by Eq. (8.29) fulfils these requirements. [Pg.328]
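
The three properties of a distance matrix listed above can be checked with a short sketch. Eq. (8.29) is not reproduced in this excerpt, so the Euclidean distance is used here as a stand-in assumption; the concentration values and the function names are invented for illustration only.

```python
import math


def distance_matrix(objects):
    """Pairwise Euclidean distances between rows of measurements.

    Euclidean distance is an assumption; the source's Eq. (8.29)
    is not shown in this excerpt and may differ.
    """
    return [
        [math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) for b in objects]
        for a in objects
    ]


def is_valid_distance_matrix(d, tol=1e-9):
    """Check zero diagonal, non-negativity and symmetry of a square matrix."""
    n = len(d)
    if any(len(row) != n for row in d):
        return False
    for i in range(n):
        if abs(d[i][i]) > tol:              # zero distance from itself
            return False
        for j in range(n):
            if d[i][j] < 0.0:               # no negative distances
                return False
            if abs(d[i][j] - d[j][i]) > tol:  # to and from distances agree
                return False
    return True


# Made-up concentrations of three chemicals in three drink samples.
measurements = [
    [0.12, 3.4, 15.0],
    [0.10, 3.9, 14.2],
    [0.45, 1.1, 22.5],
]
dist = distance_matrix(measurements)
ok = is_valid_distance_matrix(dist)
```

In practice the raw concentrations would often be scaled first so that no single chemical dominates the distances; that step is omitted here.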
