Euclidean distance measure problem

Spectral similarity search is a routine method for identification of compounds, and is similar to k-NN classification. For molecular spectra (IR, MS, NMR), more complicated, problem-specific similarity measures are used than criteria based on the Euclidean distance (Davies 2003; Robien 2003; Thiele and Salzer 2003). If the unknown is contained in the database used (spectral library), identification is often possible; for compounds not present in the database, k-NN classification may give hints as to which compound classes the unknown belongs. [Pg.231]
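As a rough illustration of the library-search idea, the sketch below ranks library entries by plain Euclidean distance and takes a majority vote over the k nearest ones. The toy library, its four-channel "spectra", the class labels and the function names are all hypothetical. As the excerpt notes, real spectral searches use more specialised similarity measures, so Euclidean distance is only a simple stand-in here.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length intensity vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_search(unknown, library, k=3):
    """Return the k library entries closest to the unknown spectrum."""
    ranked = sorted(library, key=lambda e: euclidean(unknown, e["spectrum"]))
    return ranked[:k]

def knn_class(unknown, library, k=3):
    """Majority vote over the compound classes of the k nearest entries."""
    votes = Counter(e["cls"] for e in knn_search(unknown, library, k))
    return votes.most_common(1)[0][0]

# Hypothetical toy library: made-up intensities, not real spectra.
LIBRARY = [
    {"name": "A1", "cls": "alcohol", "spectrum": [0.9, 0.1, 0.0, 0.2]},
    {"name": "A2", "cls": "alcohol", "spectrum": [0.8, 0.2, 0.1, 0.2]},
    {"name": "K1", "cls": "ketone", "spectrum": [0.1, 0.9, 0.3, 0.0]},
    {"name": "K2", "cls": "ketone", "spectrum": [0.2, 0.8, 0.4, 0.1]},
]

unknown = [0.88, 0.12, 0.0, 0.2]
hits = knn_search(unknown, LIBRARY, k=3)     # nearest entries, closest first
predicted = knn_class(unknown, LIBRARY, k=3)  # class hint from the vote
```

The closest hit plays the role of an identification candidate, while the vote gives the class hint that remains useful when the unknown itself is not in the library.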

HCA is a common tool that is used to determine the natural grouping of objects, based on their multivariate responses [75]. In PAT, this method can be used to determine natural groupings of samples or variables in a data set. Like the classification methods discussed above, HCA requires the specification of a space and a distance measure. However, unlike those methods, HCA does not involve the development of a classification rule, but rather a linkage rule, as discussed below. For a given problem, the selection of the space (e.g., original x variable space, PC score space) and distance measure (e.g., Euclidean, Mahalanobis) depends on the specific information that the user wants to extract. For example, for a spectral data set, one can choose PC score space with Mahalanobis distance measure to better reflect separation that originates from both strong and weak spectral effects. [Pg.405]
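To make the roles of the distance measure and the linkage rule concrete, here is a minimal agglomerative clustering sketch. It assumes Euclidean distance and single linkage (cluster distance is the smallest pairwise sample distance). Both are illustrative choices, not the ones the excerpt prescribes, and the sample coordinates are invented.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Cluster-to-cluster distance is the smallest pairwise sample distance
    (single linkage). Returns clusters as sorted lists of sample indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Hypothetical samples in a two-variable space (e.g. two PC scores).
samples = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
groups = single_linkage(samples, n_clusters=2)
```

Swapping `euclidean` for another distance function, or `min` for `max` (complete linkage), changes the grouping behaviour without touching the rest of the loop.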

Problem 4.8 Classification of Pottery from Pre-classical Sites in Italy, Using Euclidean and Mahalanobis Distance Measures... [Pg.261]

The manner in which sample-to-sample resemblance is defined is a key difference between the various hierarchical clustering techniques. Sample analyses may be similar to one another in a variety of ways and reflect interest in drawing attention to different underlying processes or properties. The selection of an appropriate measure of similarity is dependent, therefore, on the objectives of the research as set forth in the problem definition. Examples of different similarity measures or coefficients that have been used in compositional studies are average Euclidean distance, correlation, and cosine. Many others that could be applied are discussed in the literature dealing with cluster analysis (15, 18, 19, 36, 37). [Pg.70]
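The three coefficients named above can be compared on a small example. The sketch below uses their usual textbook definitions (root mean squared difference, Pearson correlation, cosine of the angle between vectors). The two composition vectors are invented for illustration.

```python
import math

def average_euclidean(a, b):
    """Root of the mean squared difference over the p variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def cosine(a, b):
    """Cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def correlation(a, b):
    """Pearson correlation: the cosine of the mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])

# Hypothetical compositions: sample_b has the same profile at twice the level.
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [2.0, 4.0, 6.0, 8.0]

d_avg = average_euclidean(sample_a, sample_b)
s_cos = cosine(sample_a, sample_b)
s_cor = correlation(sample_a, sample_b)
```

Here cosine and correlation both report essentially perfect similarity because the profiles have the same shape, while the average Euclidean distance is clearly non-zero because the magnitudes differ. Which behaviour is wanted depends on the research objective.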

The most important distance measures are the Euclidean distance and the average Euclidean distance. However, depending on the considered problem, other distance measures can be legitimately used. Some are listed below, where p is the number of real variables and x_sj and x_tj are the values of the jth element (variable, attribute, descriptor) representing objects s and t, respectively; x_s and x_t are the p-dimensional descriptor vectors of the two objects. If the objects are chemical compounds, x_ij are the values of the molecular descriptors chosen for their representation, such as topological indices, physico-chemical properties, and molecular fingerprints. [Pg.396]
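For reference, the two measures named first can be written out with the symbols defined above. These are the standard forms, given here as an assumption, since the excerpt does not reproduce the source's own list of formulas.

```latex
% Euclidean distance between objects s and t over p variables
d_{st} = \sqrt{\sum_{j=1}^{p} \left( x_{sj} - x_{tj} \right)^{2}}

% Average Euclidean distance: the same sum divided by the number of variables
\bar{d}_{st} = \sqrt{\frac{1}{p} \sum_{j=1}^{p} \left( x_{sj} - x_{tj} \right)^{2}}
```

Dividing by p makes distances comparable between data sets that use different numbers of descriptors.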

Perhaps a more useful means of quantifying structural data is to use a similarity measurement. These are reviewed by Ludwig and Reynolds (1988) and form the basis of multivariate clustering and ordination. Similarity measures can compare the presence of species in two sites or compare a site to a predetermined set of species derived from historical data or as an artificial set comprised of measurement endpoints from the problem formulation of an ecological risk assessment. The simplest similarity measures are binary in nature, but others can accommodate the number of individuals in each set. Related to similarity measurements are distance metrics. Distance measurements, such as Euclidean distance, have the drawbacks of being sensitive to outliers, scale, transformations, and magnitudes. Distance measures form the basis of many classification and clustering techniques. [Pg.324]
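The sensitivity to scale can be seen in a small sketch. The three sites and their two variables (an abundance count and a pH value) are invented. The only point is that re-expressing one variable in different units changes which site counts as the nearest neighbour.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical sites described by (abundance count, pH).
sites = [(1000.0, 6.0), (1010.0, 8.0), (1200.0, 6.1)]

# With raw counts, the count column dominates the distance.
raw_01 = euclidean(sites[0], sites[1])  # about 10.2
raw_02 = euclidean(sites[0], sites[2])  # about 200

# Same data with the count expressed in thousands of individuals.
rescaled = [(count / 1000.0, ph) for count, ph in sites]
rescaled_01 = euclidean(rescaled[0], rescaled[1])  # about 2.0
rescaled_02 = euclidean(rescaled[0], rescaled[2])  # about 0.22

nearest_raw = 1 if raw_01 < raw_02 else 2
nearest_rescaled = 1 if rescaled_01 < rescaled_02 else 2
```

Site 1 is the nearest neighbour of site 0 in raw units, but site 2 takes its place after the unit change. This is why variables are commonly standardised before a Euclidean-based classification or clustering.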

The spacing is a measure of the relative distance between consecutive (nearest neighbor) solutions in the non-dominated set. The maximum spread is the length of the diagonal of the hyper-box formed by the extreme function values in the non-dominated set. For two-objective problems, this metric refers to the Euclidean distance between the two extreme solutions in the f-space. It is given by... [Pg.111]
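The verbal definition of the maximum spread translates directly into code. The sketch below follows that description only (the excerpt's own formula is elided and is not reconstructed here), and the three-point front is a made-up example.

```python
import math

def maximum_spread(front):
    """Diagonal of the hyper-box spanned by the extreme objective values.

    `front` is a list of objective vectors, one per non-dominated solution.
    """
    objectives = list(zip(*front))  # one tuple of values per objective
    return math.sqrt(sum((max(vals) - min(vals)) ** 2 for vals in objectives))

# Hypothetical two-objective non-dominated set (both objectives minimised).
front = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
spread = maximum_spread(front)

# For two objectives this is the distance between the two extreme solutions.
extreme_distance = math.sqrt((4.0 - 1.0) ** 2 + (1.0 - 5.0) ** 2)
```

In this example both values equal 5, which matches the two-objective interpretation given in the text.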

MVU differs from other spectral techniques in that rather than constructing a feature matrix from measurable properties (i.e. covariance, Euclidean distance), it directly learns the feature matrix by solving a convex optimisation problem. Once the feature matrix has been learnt, however, MVU fits in with other spectral techniques, as the low-dimensional embedding is given by the top eigenvectors of Eq. (2.1). [Pg.14]

The previous discussion subtly shifted between molecular similarity and molecular properties. It is important to elucidate the relationship between the two. If each of the molecular properties can be treated as a separate dimension in a Euclidean property space, and dissimilarity can be equated with distance between property vectors, similarity/diversity problems can be solved using analytical geometry. A set of vectors (chemical structures) in property space can be converted to a matrix of pairwise dissimilarities simply by applying the Pythagorean theorem. This operation is like measuring the distances between all pairs of cities from their coordinates on a map. [Pg.78]
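The conversion from property vectors to a pairwise dissimilarity matrix can be sketched in a few lines. The property vectors below are hypothetical and chosen so that the distances are easy to check by hand.

```python
import math

def dissimilarity_matrix(vectors):
    """Symmetric matrix of Euclidean distances between all vector pairs."""
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Pythagorean theorem generalised to any number of properties.
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(vectors[i], vectors[j])))
            matrix[i][j] = matrix[j][i] = d
    return matrix

# Hypothetical compounds described by two properties each.
compounds = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = dissimilarity_matrix(compounds)
```

The diagonal is zero and the matrix is symmetric, just like a table of distances between cities.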
