Distance measures problem

Hierarchical cluster analysis (HCA) is a common tool for determining the natural grouping of objects based on their multivariate responses [75]. In PAT, this method can be used to determine natural groupings of samples or variables in a data set. Like the classification methods discussed above, HCA requires the specification of a space and a distance measure. However, unlike those methods, HCA does not involve the development of a classification rule, but rather a linkage rule, as discussed below. For a given problem, the selection of the space (e.g., original x variable space, PC score space) and distance measure (e.g., Euclidean, Mahalanobis) depends on the specific information that the user wants to extract. For example, for a spectral data set, one can choose PC score space with the Mahalanobis distance measure to better reflect separation that originates from both strong and weak spectral effects. [Pg.405]
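The role of the distance measure and the linkage rule can be sketched in a few lines of Python. This is a minimal illustration rather than the method of the cited text: the function name `agglomerate`, the sample values, and the default Euclidean distance are assumptions made here, and a Mahalanobis distance function could be passed in its place.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def agglomerate(points, n_clusters, distance=dist, linkage=min):
    """Merge the two closest clusters until n_clusters remain.

    linkage=min gives single linkage; linkage=max gives complete linkage.
    Returns clusters as sorted lists of object indices.
    """
    clusters = [[i] for i in range(len(points))]  # start: one object per cluster

    def cluster_distance(a, b):
        # Linkage rule: reduce all pairwise object distances to one number.
        return linkage(distance(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


# Hypothetical two-variable responses for five samples.
samples = [(1.0, 1.1), (1.2, 0.9), (5.0, 5.2), (5.1, 4.8), (9.0, 1.0)]
groups = agglomerate(samples, n_clusters=3)
print(groups)
```

Changing `linkage` or `distance` changes which clusters are merged first, which is why the choice of space and distance measure depends on the information the user wants to extract.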

Many workers have previously devoted attention to the contribution of errors in measurements to the problem of building trees from distances, as summarized in the contribution by Marshall [25]. By contrast, we have not been concerned with this relatively minor source of error. Instead, our concern has been with a bigger source of error, the calibration error, which reflects the uncertainty in the relationship of distance measured with an indirect method to that measured with a more direct method. This aspect has not been addressed by previous workers. To illustrate the magnitude that this problem can assume, we note that DNA hybridization led to an estimate of 3.3% sequence divergence between the mitochondrial DNAs of two flies (Drosophila yakuba and D. teissieri) [26]. Restriction analysis done on the whole mitochondrial genome, in contrast, led to an estimate of 0.22% [27]. Sequencing of one-seventh of these fly mitochondrial DNAs produced an estimate of 0.3% [27], similar to the latter indirect estimate but in dramatic contrast to the estimate from hybridization. [Pg.152]

Problem 4.8 Classification of Pottery from Pre-classical Sites in Italy, Using Euclidean and Mahalanobis Distance Measures... [Pg.261]

The most important distance measures are the Euclidean distance and the average Euclidean distance. However, depending on the problem considered, other distance measures can legitimately be used. Some are listed below, where p is the number of real variables and x_sj and x_tj are the values of the jth element (variable, attribute, descriptor) representing objects s and t, respectively; x_s and x_t are the p-dimensional descriptor vectors of the two objects. If the objects are chemical compounds, x_ij are the values of the molecular descriptors chosen for their representation, such as topological indices, physico-chemical properties, and molecular fingerprints. [Pg.396]
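As a rough sketch of the two measures named above, the following Python functions compute the Euclidean distance and the average Euclidean distance between two descriptor vectors. The average form is taken here to be the square root of the mean squared difference over the p variables. The Manhattan distance is added only as one familiar alternative. The function names and descriptor values are illustrative assumptions.

```python
from math import sqrt


def euclidean(x_s, x_t):
    """Square root of the summed squared differences over all p variables."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x_s, x_t)))


def average_euclidean(x_s, x_t):
    """Euclidean distance with the squared sum divided by p before the root."""
    p = len(x_s)
    return sqrt(sum((a - b) ** 2 for a, b in zip(x_s, x_t)) / p)


def manhattan(x_s, x_t):
    """Sum of absolute differences (one common alternative measure)."""
    return sum(abs(a - b) for a, b in zip(x_s, x_t))


# Hypothetical descriptor vectors for two objects s and t (p = 4).
x_s = (1.0, 2.0, 3.0, 4.0)
x_t = (3.0, 4.0, 5.0, 6.0)
print(euclidean(x_s, x_t), average_euclidean(x_s, x_t), manhattan(x_s, x_t))
```

Dividing by p makes distances comparable between data sets described by different numbers of variables, which the plain Euclidean distance is not.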

Perhaps a more useful means of quantifying structural data is to use a similarity measurement. These are reviewed by Ludwig and Reynolds (1988) and form the basis of multivariate clustering and ordination. Similarity measures can compare the presence of species in two sites or compare a site to a predetermined set of species derived from historical data or as an artificial set comprised of measurement endpoints from the problem formulation of an ecological risk assessment. The simplest similarity measures are binary in nature, but others can accommodate the number of individuals in each set. Related to similarity measurements are distance metrics. Distance measurements, such as Euclidean distance, have the drawbacks of being sensitive to outliers, scale, transformations, and magnitudes. Distance measures form the basis of many classification and clustering techniques. [Pg.324]
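A binary similarity measure of the kind described can be illustrated with the Jaccard coefficient, which compares species presence at a site against a reference set. The choice of Jaccard, the function name, and the species lists are assumptions made for illustration; the passage does not single out a particular coefficient.

```python
def jaccard(site_a, site_b):
    """Shared species divided by all species present in either set."""
    a, b = set(site_a), set(site_b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical presence lists: a sampled site and a reference species set.
site = {"mayfly", "caddisfly", "stonefly", "midge"}
reference = {"mayfly", "caddisfly", "stonefly", "dragonfly"}
similarity = jaccard(site, reference)
print(similarity)
```

Because only presence or absence enters the calculation, this measure ignores the number of individuals, unlike the abundance-based measures also mentioned above.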

Clustering problems can have numerous formulations depending on the choices for data structure, similarity/distance measure, and internal clustering criterion. This section first describes a very general formulation, then details special cases that correspond to two popular classes of clustering algorithms: partitional and hierarchical. [Pg.135]

There are some simple wave relationships that are useful to bear in mind, corollaries so to speak, that can often reduce otherwise imposing problems to trivial exercises. First, we must often define the relative phases at some point of two or more waves traveling through space in order to know how they combine. We can do this in a straightforward manner if we know the relative distance, measured in units of λ, that the waves have traveled from their... [Pg.85]
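The link between distance travelled and relative phase can be sketched as follows, assuming two waves of the same wavelength and taking path lengths in units of that wavelength. The function names are illustrative and not taken from the excerpt.

```python
from math import cos, pi, sqrt


def phase(path_in_wavelengths):
    """Phase angle in radians for a path given in units of the wavelength."""
    return (2 * pi * path_in_wavelengths) % (2 * pi)


def combined_amplitude(a1, a2, path_difference_in_wavelengths):
    """Amplitude of the sum of two equal-wavelength waves."""
    delta = 2 * pi * path_difference_in_wavelengths  # relative phase
    squared = a1 ** 2 + a2 ** 2 + 2 * a1 * a2 * cos(delta)
    return sqrt(max(0.0, squared))  # guard against tiny negative rounding errors


print(combined_amplitude(1.0, 1.0, 0.0))   # in phase
print(combined_amplitude(1.0, 1.0, 0.5))   # half a wavelength apart
```

A path difference of a whole number of wavelengths gives full reinforcement, and a difference of half a wavelength gives cancellation for equal amplitudes.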

For two-class problems (the most common ones), classification parameters can be defined using binary distance measures, based on the frequencies a, b, c, and d, which in this case may be interpreted as true positive (TP), false negative (FN), false positive (FP), and true negative (TN), respectively. [Pg.144]
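A few widely used classification parameters can be computed directly from the four frequencies. The sketch below names them a, b, c, and d in the order TP, FN, FP, TN; the function name and the counts are hypothetical, and all denominators are assumed to be non-zero.

```python
def classification_parameters(a, b, c, d):
    """Parameters from a 2x2 table with a=TP, b=FN, c=FP, d=TN."""
    return {
        "sensitivity": a / (a + b),          # true positive rate
        "specificity": d / (d + c),          # true negative rate
        "precision": a / (a + c),            # positive predictive value
        "accuracy": (a + d) / (a + b + c + d),
    }


# Hypothetical counts for 100 objects.
params = classification_parameters(a=40, b=10, c=5, d=45)
print(params)
```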

Todeschini, R., Consonni, V. and Pavan, M. (2004d) Distance measure between models: a tool for model similarity/diversity analysis, in Designing Drugs and Crop Protectants: Processes, Problems and Solutions (eds M. Ford, D.J. Livingstone, J.C. Dearden and H. van de Waterbeemd), Blackwell, Oxford, UK, pp. 467-469. [Pg.1183]

Structure comparison methods are a way to compare three-dimensional structures. They are important for at least two reasons. First, they allow for inferring a similarity or distance measure to be used for the construction of structural classifications of proteins. Second, they can be used to assess the success of prediction procedures by measuring the deviation from a given standard-of-truth, usually given via the experimentally determined native protein structure. Formally, the problem of structure superposition is given as two sets of points in 3D space, each connected as a linear chain. The objective is to provide a maximum number of point pairs, one from each of the two sets, such that an optimal translation and rotation of one of the point sets (structural superposition) minimizes the rms (root mean square) deviation between the matched points. Obviously, there are two contrary criteria to be optimized: the rms is to be minimized and the number of matched residues is to be maximized. Clearly, a smaller number of residue pairs can be superposed with a smaller rms and, clearly, a larger number of equivalent residues with a certain rms is more indicative of significant overall structural similarity. [Pg.263]
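The trade-off between rms deviation and the number of matched points can be seen in a small Python sketch. It only evaluates the rms deviation for point pairs that are assumed to be already matched and superposed; it does not search for the optimal rotation and translation. The function name and coordinates are hypothetical.

```python
from math import dist, sqrt


def rmsd(points_a, points_b):
    """Root mean square deviation between matched 3D point pairs."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("need two non-empty point lists of equal length")
    squared = sum(dist(p, q) ** 2 for p, q in zip(points_a, points_b))
    return sqrt(squared / len(points_a))


# Hypothetical matched positions; the fourth pair deviates by 2 units.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 2.0, 0.0)]

print(rmsd(chain_a, chain_b))          # all four pairs
print(rmsd(chain_a[:3], chain_b[:3]))  # dropping the poorly matched pair
```

Dropping the deviating pair lowers the rms deviation but also reduces the number of matched residues, which is the conflict between the two criteria described above.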

The major technical problems with the GC mode are the need for an independent distance measurement and the absence of a well-defined mass transport rate in some situations. These are linked in the sense that when mass transport is well defined, the distance dependence of the tip signal can be used to calibrate the tip-to-surface separation as well as to quantify the flux of analyte. One method of overcoming these problems is therefore the use of a microfabricated substrate where the enzymatic reaction is confined to a small disk-shaped region. The concentration profile of the products of the... [Pg.459]

In general then, NOE measurements will tend to underestimate rather than overestimate distances in such cases, which can be problematic for structure calculations. These problems are most severe in the case of small molecules, where extensive conformational averaging is to be expected, and detailed structure calculations based on quantitative distance measurements for flexible... [Pg.303]
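The bias toward short distances can be illustrated numerically, assuming the NOE reflects a population-weighted average of r^-6 over interconverting conformers. This averaging model, the function name, and the conformer distances are assumptions made for the sketch, since the excerpt does not state them.

```python
def noe_effective_distance(distances, populations):
    """Apparent distance from a population-weighted average of r**-6."""
    total = sum(populations)
    mean_inv6 = sum(w * r ** -6 for r, w in zip(distances, populations)) / total
    return mean_inv6 ** (-1 / 6)


# Hypothetical proton pair: two equally populated conformers (distances in angstroms).
distances = [2.5, 5.0]
populations = [0.5, 0.5]

apparent = noe_effective_distance(distances, populations)
arithmetic_mean = sum(r * w for r, w in zip(distances, populations))
print(round(apparent, 2), arithmetic_mean)
```

Under this model the apparent distance is about 2.8, well below the arithmetic mean of 3.75, because the short-distance conformer dominates the average.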

One of the major problems facing distance determination by pulsed EPR on spin labeled membrane proteins is the short relaxation time (generally around 1 μs). Solvent deuteration is routinely used (10-20 vol% deuterated glycerol as cryoprotectant for membrane-embedded or detergent-solubilized samples, or deuterated buffer) to slow down the nitroxide relaxation, thus extending the range of distance measurement and sensitivity. However, the problems connected to the too fast relaxation time called for a systematic analysis of all factors modifying the... [Pg.143]

In returning to the example problem, a geometric interpretation is presented first. Figure 1 is a plot of steam consumption vs. degree-days from Table 3. The regression coefficient, b, is represented by the slope of the least-squares line. It is the tangent of the angle θ. The e's whose squares are to be summed to a minimum are distances measured in the Y direction from the points to the line. They are illustrated by typical distances, and... [Pg.2269]
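The geometric picture can be reproduced with a short least-squares sketch. The data below are hypothetical stand-ins, since Table 3 is not reproduced here, and the function name is illustrative.

```python
from math import atan, degrees


def least_squares_line(x, y):
    """Return intercept, slope b, and the vertical residuals e."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b = s_xy / s_xx                      # slope of the least-squares line
    intercept = y_mean - b * x_mean
    # Residuals: distances in the Y direction from each point to the line.
    residuals = [yi - (intercept + b * xi) for xi, yi in zip(x, y)]
    return intercept, b, residuals


# Hypothetical degree-days (x) and steam consumption (y).
degree_days = [10.0, 20.0, 30.0, 40.0]
steam = [25.0, 45.0, 62.0, 88.0]

intercept, b, residuals = least_squares_line(degree_days, steam)
theta = degrees(atan(b))  # angle whose tangent is the slope, in plot units
print(intercept, b, residuals, theta)
```

The residuals sum to zero for a line fitted with an intercept, and no other line gives a smaller sum of their squares.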
