Interpoint distance

In a typical application of hierarchical cluster analysis, measurements are made on the samples and used to calculate interpoint distances using an appropriate distance metric. The general distance is given by... [Pg.422]
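
The excerpt breaks off before the formula. As a minimal sketch, assuming the "general distance" is the Minkowski metric (an assumption, since the source does not show it), the interpoint distance matrix could be computed as follows. The function names and the sample values are hypothetical.

```python
def minkowski(a, b, k=2):
    """Minkowski distance of order k between two equal-length samples."""
    return sum(abs(x - y) ** k for x, y in zip(a, b)) ** (1.0 / k)


def distance_matrix(samples, k=2):
    """Symmetric matrix of interpoint distances between all sample pairs."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = minkowski(samples[i], samples[j], k)
    return d


# Three made-up samples with two measurements each.
samples = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = distance_matrix(samples)  # k=2 gives the Euclidean distance
```

With k=1 the same function gives the city-block distance, so the choice of k is one way the "appropriate distance metric" could be varied.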

This technique functions by taking observed measures of similarity or dissimilarity between every pair of M objects, then finding a representation of the objects as points in Euclidean space so that the interpoint distances in some sense match the observed similarities or dissimilarities by means of weighting constants. [Pg.948]

Referring back to the main clusters, it can be seen that the samples within cluster 2 are connected by vertical lines with small distance values relative to the other clusters. This is an indication that samples within this cluster are more similar to each other (i.e., the interpoint distances are smaller) than are... [Pg.41]

It is of interest to study the relationship between the samples in the row space; the distances between samples are used to define similarities and differences. In mathematical terms, the goal of PCA is to describe the interpoint distances (spread or variation) using as few axes or dimensions as possible. This is accomplished by constructing PC axes that align with the data. [Pg.225]

The HCA technique examines the interpoint distances between the samples in a data set and represents that information in the form of a two-dimensional plot called a dendrogram. The HCA method is an excellent tool for preliminary data analysis. It is useful for examining data sets for expected or unexpected clusters, including the presence of outliers. It is informative to examine the dendrogram in conjunction with PCA because they give similar information in different forms. [Pg.239]
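
To illustrate how interpoint distances turn into the merge heights drawn in a dendrogram, here is a minimal sketch using single-linkage agglomeration. The linkage rule is an assumption (the excerpt does not name one), and the function name and data are hypothetical.

```python
import math


def single_linkage(points):
    """Return the merges as (cluster_a, cluster_b, distance) tuples.

    Clusters are frozensets of sample indices; at every step the two
    clusters with the smallest nearest-member distance are joined.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Made-up data: two tight groups and one distant sample (a likely outlier).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (20.0, 0.0)]
merges = single_linkage(points)
heights = [d for _, _, d in merges]
```

The distant sample joins last and at the largest height, which is how an outlier typically shows up in a dendrogram.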

A better approach to validating the prediction is to compare the distance from the unknown sample to the predicted class relative to an expected distance for known members of that class. From the last example, the distance from Z to its nearest neighbor in class B is much larger than the distances between the samples within that class. This can be flagged by calculating a measure of expected interpoint distances for samples in each class. These distances are then compared to the distance of the unknown to the different classes to validate class membership. One algorithmic approach is discussed below illustrating the classification of unknown Z with respect to class B. [Pg.241]
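
The source's own algorithm is not reproduced in this excerpt. The sketch below shows one possible version of the idea, assuming the "expected interpoint distance" is the mean nearest-neighbor distance within the class. That choice, the function names, and the data are all hypothetical.

```python
import math


def nearest_neighbor_distance(point, others):
    """Distance from point to the closest sample in others."""
    return min(math.dist(point, o) for o in others)


def expected_nn_distance(class_samples):
    """Mean distance from each class member to its nearest classmate."""
    total = 0.0
    for i, s in enumerate(class_samples):
        rest = class_samples[:i] + class_samples[i + 1:]
        total += nearest_neighbor_distance(s, rest)
    return total / len(class_samples)


def distance_ratio(unknown, class_samples):
    """Unknown-to-class distance relative to the expected within-class distance."""
    return (nearest_neighbor_distance(unknown, class_samples)
            / expected_nn_distance(class_samples))


# Made-up class B and unknown Z.
class_b = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
z = (4.0, 5.0)
ratio = distance_ratio(z, class_b)  # far above 1: Z is not a typical member
```

A ratio near or below 1 would support membership; where to place the cutoff is a judgment call that this sketch does not settle.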

The two unsupervised methods examined are HCA and PCA. HCA calculates the interpoint distances between all of the rows and represents that information in the form of a two-dimensional plot called a dendrogram. PCA calculates a new axis system that maximally describes the variation in the data set. Our recommendation is to use both of the methods when they are available. HCA gives a broader view of the data and PCA can be used to further investigate samples and clusters that are highlighted in HCA. [Pg.274]

DISCO considers three-dimensional conformations of compounds not as coordinates but as sets of interpoint distances, an approach similar to a distance geometry conformational search. Points are calculated between the coordinates of heavy atoms labeled with interaction functions such as HBD, HBA or hydrophobes. One atom can carry more than one label. The atom types are considered as far as they determine which interaction type the respective atom would be engaged in. The points of the hypothetical locations of the interaction counterparts in the receptor macromolecule also participate in the distance matrix. These are calculated from the idealized projections of the lone pairs of participating heavy atoms or H-bond forming hydrogens. The hydrophobic points are handled in such a way that the hydrophobic matches are limited to, e.g., only one atom in a hydrophobic chain, and there is a differentiation between aliphatic and aromatic hydrophobes. A minimum constraint on pharmacophore points of a certain type can be set, e.g., if a certain feature is known to be required for activity [53, 54]. [Pg.26]

The Euclidean distance is the best choice for a distance metric in hierarchical clustering because interpoint distances between the samples can be computed directly (see Figure 9.6). However, there is a problem with using the Euclidean distance, which arises from inadvertent weighting of the variables in the analysis that occurs... [Pg.349]
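
The excerpt stops before naming the cause of the inadvertent weighting. Assuming it refers to variables measured on very different scales (an assumption), the following sketch shows the effect and one common remedy, autoscaling each variable to zero mean and unit standard deviation. The data and function name are hypothetical.

```python
import math
import statistics


def autoscale(samples):
    """Scale every variable (column) to zero mean and unit standard deviation."""
    columns = list(zip(*samples))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.stdev(c) for c in columns]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stdevs))
            for row in samples]


# Made-up data: variable 1 spans 0-2, variable 2 spans 0-2000.
samples = [(0.0, 0.0), (2.0, 0.0), (0.0, 2000.0), (2.0, 2000.0)]
raw_01 = math.dist(samples[0], samples[1])   # samples differ only in variable 1
raw_02 = math.dist(samples[0], samples[2])   # samples differ only in variable 2
scaled = autoscale(samples)
scaled_01 = math.dist(scaled[0], scaled[1])
scaled_02 = math.dist(scaled[0], scaled[2])
```

Before scaling, the second variable dominates the distance by a factor of 1000; after scaling, both variables contribute equally.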

The earliest methods for generating Cartesian coordinates from distance information were reliable only in the case of complete and precise distances [19]. A more robust method was proposed by Crippen [20], subsequently revised and comprehensively described [7-21], and dubbed the embed algorithm [22]. The method can be understood by first considering the case where every interpoint distance is known before introducing the approximations necessary to handle real NMR data. First, a matrix D can be constructed containing the distance between every pair of points. Next, the distance from every point to the center of mass, indicated by the subscript O, can be calculated from... [Pg.147]
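
The excerpt ends before the formula. For an unweighted center of mass, a standard identity gives d_iO^2 = (1/N) sum_j d_ij^2 - (1/N^2) sum_{j<k} d_jk^2; whether the source states it in exactly this form is an assumption. The sketch below evaluates the identity from D alone and checks it against coordinates. The function name and test points are hypothetical.

```python
import math


def squared_distances_to_centroid(D):
    """Squared distance of every point to the unweighted centroid, from D alone."""
    n = len(D)
    pair_sum = sum(D[j][k] ** 2 for j in range(n) for k in range(j + 1, n))
    return [sum(D[i][j] ** 2 for j in range(n)) / n - pair_sum / n ** 2
            for i in range(n)]


# Check against coordinates that the distance-only calculation never sees.
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
D = [[math.dist(p, q) for q in points] for p in points]
from_distances = squared_distances_to_centroid(D)

centroid = tuple(sum(c) / len(points) for c in zip(*points))
from_coordinates = [math.dist(p, centroid) ** 2 for p in points]
```

Both lists agree to rounding error, which shows that the distances to the center of mass are recoverable without knowing any coordinates.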

Like PCA, NLM, or multi-dimensional scaling, is a method for visualizing relationships between objects, which in a medicinal chemistry context often are compounds, but could equally be a number of measured activities. It is an iterative minimization procedure which attempts to preserve interpoint distances in multi-dimensional space in a 2D or 3D representation. Unlike PCA, however, the axes are not orthogonal and are not clearly interpretable with respect to the original variables. However, it can be valuable in cases where the first two or three PCs are influenced by outliers (extreme data points) or only explain a small percentage of the original data. NLM has been used to cluster aromatic and aliphatic substituents, for example. [Pg.501]

In the framework of the DG method, each ligand molecule is represented as a collection of points in space, each corresponding to an atom or group of atoms, and the conformation of the molecule is described in terms of Euclidean distances between points. The matrix containing Euclidean distances between all possible pairs of points is the geometry matrix of the molecule when each point corresponds to a single atom. To account for molecular flexibility, a matrix of lower bounds on the interpoint distances and a matrix of upper bounds are also defined; fixed spatial distances are represented by equal values in these matrices. [Pg.208]
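
As a rough illustration of the lower- and upper-bound matrices, the sketch below tests whether a set of point coordinates satisfies every bound. The three-point "molecule", its bound values, and the function name are made up for the example.

```python
import math


def within_bounds(points, lower, upper, tol=1e-9):
    """True if every interpoint distance lies between its lower and upper bound."""
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d < lower[i][j] - tol or d > upper[i][j] + tol:
                return False
    return True


# Hypothetical three-point molecule: the 0-1 distance is fixed at 1.5
# (equal bounds), while the other two distances may vary with conformation.
lower = [[0.0, 1.5, 2.0],
         [1.5, 0.0, 1.0],
         [2.0, 1.0, 0.0]]
upper = [[0.0, 1.5, 3.0],
         [1.5, 0.0, 2.0],
         [3.0, 2.0, 0.0]]

allowed = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0)]
stretched = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 4.0, 0.0)]
```

Here `within_bounds(allowed, lower, upper)` holds, while the stretched geometry violates the upper bound on the 1-2 distance.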

Another quantitative measure of the pattern's spatial characteristics is the length distribution function, p(r). The function indicates the distribution of all the interpoint distances, r_ij, in the pattern. Mathematically speaking, this new function p(r) is defined by... [Pg.204]

In a computer program, I use a Monte Carlo approach to compute p(r), because actually computing all the interpoint distances is computationally expensive. Instead, one can randomly select pairs of points in the structure and produce the final p(r) curve after 500,000 pairs are cataloged. In the C programming language, the Monte Carlo process looks like ... [Pg.204]
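
The author's C listing is not part of this excerpt. The following Python sketch shows the same general idea under stated assumptions: p(r) is estimated as a normalized histogram of distances between randomly chosen pairs of points. The bin count, the bounding-box upper limit, and all names are hypothetical choices, not the author's.

```python
import math
import random


def length_distribution(points, n_pairs=500_000, n_bins=50, r_max=None, seed=0):
    """Estimate p(r) as a normalized histogram of sampled interpoint distances."""
    rng = random.Random(seed)
    if r_max is None:
        # The bounding-box diagonal is an upper limit on any interpoint distance.
        lo = [min(c) for c in zip(*points)]
        hi = [max(c) for c in zip(*points)]
        r_max = math.dist(lo, hi)
    counts = [0] * n_bins
    for _ in range(n_pairs):
        a, b = rng.sample(points, 2)  # two different entries of the point list
        r = math.dist(a, b)
        counts[min(int(r / r_max * n_bins), n_bins - 1)] += 1
    return [c / n_pairs for c in counts]


# Made-up pattern: 200 random points in the unit square. Fewer pairs than
# the 500,000 quoted in the text keep the demonstration quick.
rng = random.Random(1)
pattern = [(rng.random(), rng.random()) for _ in range(200)]
p = length_distribution(pattern, n_pairs=20_000)
```

Sampling pairs keeps the cost proportional to the number of pairs drawn rather than to the square of the number of points.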

The interpoint distance of a registered spectrum is a very important parameter. An absorption spectrum obtained by a spectrophotometer possesses a digital structure, which is the result of the construction of the monochromator and the manner of registration. Spectra registered with a large interpoint distance are averaged and flat, without many spectral details. [Pg.265]

The most popular nonlinear display method was proposed by Sammon [40] and is called nonlinear mapping (NLM). The technique seeks to conserve interpoint distances. Let be... [Pg.100]

Minimization of this error function results in a two-dimensional display of the data set in which the distances between points are such that they best represent the distances between points in N-space. The significance of the power term, p, will be discussed later in this section; it serves to alter the emphasis on the relative importance of large versus small N-space interpoint distances. [Pg.82]
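
The error function itself is not shown in this excerpt. The sketch below assumes a form E = sum over pairs of (d_N - d_2)^2 / d_N^p, where d_N is the N-space distance and d_2 the distance in the display; this form may differ from the source's definition. It only evaluates the error for a given layout and does not perform the minimization. Names and data are hypothetical.

```python
import math


def mapping_error(high_dim, low_dim, p=2):
    """Sum over pairs of (d_N - d_2)^2 / d_N^p (assumed form of the error).

    Coincident N-space points (d_N = 0) are not handled in this sketch.
    """
    n = len(high_dim)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_n = math.dist(high_dim[i], high_dim[j])  # N-space distance
            d_2 = math.dist(low_dim[i], low_dim[j])    # display distance
            total += (d_n - d_2) ** 2 / d_n ** p
    return total


# Made-up data: the layout misplaces the middle point by one unit, so the
# small distance (1) and the large distance (9) are each off by 1.
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
low = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
equal_weight = mapping_error(high, low, p=0)   # both errors count the same
small_favored = mapping_error(high, low, p=2)  # the small-distance error dominates
```

Under this assumed form, raising p shifts the emphasis toward preserving small N-space distances, and p = 0 weights all pairs equally.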

Can change the emphasis on the preservation of interpoint distances. Can view multivariate data in two (or three) dimensions... [Pg.88]

Set up pharmacophore parameters: interpoint distance tolerances, number of points required in pharmacophore, minimum number of structures that must match a given superposition, etc. [Pg.89]

Several computational methods can identify conformations that can be superimposed with a reasonable overlap of user-supplied pharmacophoric points in each molecule. In the active analog approach, several potent, structurally diverse ligands are submitted to a systematic conformational search while pharmacophore interpoint distances are recorded for each energetically allowed conformation. Intersection of the distance maps of these molecules identifies sets of conformations with common distances between the pharmacophoric points. [Pg.187]
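
The intersection of distance maps can be sketched as a set intersection. In this illustration each conformation is reduced to a tuple of binned interpoint distances between its pharmacophoric points, listed in a fixed order; the binning scheme, the bin width, and all names are assumptions, and real programs handle tolerances more carefully.

```python
import math
from itertools import combinations


def distance_key(conformation, bin_width=0.5):
    """Binned interpoint distances between the pharmacophoric points."""
    return tuple(round(math.dist(a, b) / bin_width)
                 for a, b in combinations(conformation, 2))


def common_distance_keys(molecules, bin_width=0.5):
    """Intersect the distance maps (sets of keys) of all molecules."""
    maps = [{distance_key(c, bin_width) for c in conformers}
            for conformers in molecules]
    return set.intersection(*maps)


# Made-up data: two molecules, two allowed conformations each, three
# pharmacophoric points per conformation.
molecule_1 = [
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)],
    [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
]
molecule_2 = [
    [(1.0, 1.0, 1.0), (4.0, 1.0, 1.0), (1.0, 5.0, 1.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
]
common = common_distance_keys([molecule_1, molecule_2])
```

Only the 3-4-5 geometry survives the intersection, because it is the one set of interpoint distances both molecules can adopt.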

We can convert this into a scalar-valued expression by taking the inner product with its reverse. The resulting Gramian can be Laplace expanded to a polynomial in the inner products of the four vectors, which in turn can be converted into a polynomial in the squared interpoint distances as above. The vanishing of this polynomial is our first syzygy among the squared distances. [Pg.726]
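
As a numerical illustration, assume the four vectors are the differences between four points and a fifth reference point in three-dimensional space (an assumption about the elided context). Their Gramian can then be written purely in squared interpoint distances through a_i . a_j = (d_0i^2 + d_0j^2 - d_ij^2) / 2, and its determinant vanishes. The function names and coordinates below are hypothetical.

```python
from fractions import Fraction


def squared_distance(p, q):
    """Exact squared distance between two points with rational coordinates."""
    return sum((Fraction(a) - Fraction(b)) ** 2 for a, b in zip(p, q))


def gram_from_distances(points):
    """Gram matrix of the vectors points[i] - points[0], from squared distances only."""
    d2 = [[squared_distance(p, q) for q in points] for p in points]
    n = len(points)
    return [[(d2[0][i] + d2[0][j] - d2[i][j]) / 2 for j in range(1, n)]
            for i in range(1, n)]


def determinant(m):
    """Exact determinant by Gaussian elimination on Fractions."""
    m = [row[:] for row in m]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


five_points_3d = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1)]
four_points_3d = five_points_3d[:4]
```

`determinant(gram_from_distances(five_points_3d))` is exactly zero, whereas the four-point case gives 36, the squared volume of the parallelepiped spanned by the three difference vectors.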

It is this similarity matrix that distinguishes various spectral dimensionality reduction techniques. For example, F could measure the covariance of X as in Principal Components Analysis [1], or the geodesic interpoint distances as in Isomap [2]. [Pg.8]
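
To make the phrase "geodesic interpoint distances" concrete, the sketch below approximates them as shortest paths through a neighborhood graph, which is the first stage of Isomap. A fixed-radius neighborhood and the Floyd-Warshall algorithm are used for brevity; both are choices made for this illustration, and the data are made up.

```python
import math


def geodesic_distances(points, radius):
    """Shortest-path distances through a graph linking points closer than radius."""
    n = len(points)
    g = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        g[i][i] = 0.0
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d <= radius:
                g[i][j] = g[j][i] = d
    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][k] + g[k][j] < g[i][j]:
                    g[i][j] = g[i][k] + g[k][j]
    return g


# Made-up data: 21 points along a half circle of radius 1.
points = [(math.cos(math.pi * t / 20), math.sin(math.pi * t / 20))
          for t in range(21)]
G = geodesic_distances(points, radius=0.2)
straight = math.dist(points[0], points[-1])  # chord across the half circle
along = G[0][-1]                             # path that follows the curve
```

The straight-line distance between the end points is 2, while the graph distance is close to the arc length pi, which is the quantity a geodesic similarity matrix would record.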

Big Chemical Encyclopedia
