Distance metric

In a typical appHcation of hierarchical cluster analysis, measurements are made on the samples and used to calculate interpoint distances using an appropriate distance metric. The general distance, is given by... [Pg.422]

In the original kNN method, an unknown object (molecule) is classified according to the majority of the class memberships of its K nearest neighbors in the training set (Fig. 13.4). The nearness is measured by an appropriate distance metric (a molecular similarity measure as applied to the classification of molecular structures). It is implemented simply as follows ... [Pg.314]

Similarity of solubility parameters can be measured using a Enclidean distance metric. For two molecnles A and B, the distance between their solnbility parameters is given by ... [Pg.281]

Here, L2 represents the chemical property space corresponding to the VCL, d(x, rf) represents an appropriate distance metric between the lead compound ry and the compound , p xf is the relative weight that can be attached to compound (if all compounds are of equal importance, then the weights p(xy) = for each i), and Mis typically much smaller than N. That is, this problem seeks a subset of M lead compounds ry in a descriptor space such that the average distance of a compound from its nearest lead compound is minimized. Alternatively, this problem can also be formulated as finding an optimal partition of the descriptor space co into M clusters Ay and assigning to each cluster a lead compound rj such that the following cost function is minimized ... [Pg.73]

Of all local motions, v(r), of an interface that pass the same amount of volume from one side to the other, the motion that is normal to the interface with magnitude proportional to the weighted mean curvature, v f) oc /c7n, increases the interfacial energy the fastest. However, fastest depends on how distance is measured. How this distance metric alters the variational principles that generate the kinetic equations is discussed elsewhere [14]. [Pg.611]

FIGURE 1 Multivariate approaches for omics data integration. (A) The RV coefficient is a correlation measure between datasets that can be used as distance metric. (B) The 02PLS method dissects gene expression and metabolomics datasets for shared and data type-specific variation. (C) The N-way approach accommodates experimental factors in a multidimensional block. Tucker3 is used to study intradataset covariation and NPLS analyzes between-block covariation. Panel (B) Reproduced from Bylesjo et al. (23). Panel (C) Reproduced from Conesa et al. (24). [Pg.449]

Table 12 Distance Metrics Used in Unsupervised Pattern Recognition (i.e., Clustering)...

Parent distance metric where a user chooses a value for k to calculate the distance between a and b... [Pg.543]

The Euclidean distance is the best choice for a distance metric in hierarchical clustering because interpoint distances between the samples can be computed directly (see Figure 9.6). However, there is a problem with using the Euclidean distance, which arises from inadvertent weighting of the variables in the analysis that occurs... [Pg.349]

The Mahalanobis distance metric (13) is designed to determine an in-class or out-of-class value from the spectrum of an unknown sample. The metric is trained using spectra of samples known to be in the target class. In its simplest rendition, the Mahalanobis distance is the Euclidean distance of a target spectrum from the average spectrum of the training set, i.e.,... [Pg.288]

In more advanced formulation of the Mahalanobis distance metric (13), the difference spectrum diff between the target and the average is calculated and stored as a vector with /coordinates corresponding to the/frequencies. The Mahalanobis distance is given by the equation... [Pg.288]

In the preceding description of the Mahalanobis distance, the number of coordinates in the distance metric is equal to the number of spectral frequencies. As discussed earlier in the section on principal component analysis, the intensities at many frequencies are dependent, and by using the full spectrum, we fit the noise in addition to the real information. In recent years, Mahalanobis distance has been defined with PCA or PLS scores instead of the spectral frequencies because these techniques eliminate or at least reduce most of the overfitting problem. The overall application of the Mahalanobis distance metric is the same except that the rt intensity values are replaced by the scores from PCA or PLS. An example of a Mahalanobis distance calculation on a set of Raman spectra for 25 carbohydrates is shown in Fig. 5-11. The 25 spectra were first subjected to PCA, and it was found that the first three principal components could account for most of the variance in the spectra. It was first assumed that all 25 spectra belonged to the same class because they were all carbohydrates. However, as shown in the three-dimensional plot in Fig. 5-11, the spectra can be clearly divided into three separate classes, with two of the spectra almost equal distance from each of the three classes. Most of the components in the upper left class in the two-dimensional plot were sugars however, some sugars were found in the other two classes. For unknowns, scores have to be calculated from the principal components and processed in the same way as the spectral intensities. [Pg.289]

Figure 7.1 Authentication of monovarietal virgin olive oils results of applying clustering analysis to volatile compounds. The Mahattan (city block) distance metric and Ward s amalgamation methods were used in (a) the Squared Euclidean distance and (b) complete linkage amalgamation methods. Note A, cv. Arbequina (6) C, cv. Coratina (6) K, cv. Koroneiki (6) P, cv. Picual (6) 1, harvest 1991 2, harvest 1992. Olives were harvested at three levels of maturity (unripe, normal, overripe) (source SEXIA Group-Instituto de la Grasa, Seville, Spain).

Now consider d(a, b) to be a generic distance metric of which Tanimoto, Euclidean, and Mahalanobis are three cases. Then, the distance between molecule a and the set of molecules B is defined as follows,... [Pg.82]

Using aU available structural conformations, the pharmacophores represented in a single molecule can also be encoded in much the same way as the fingerprints described above for dissimilarity searching (Figure 5) using distance metrics. ... [Pg.122]

Two popular distance metrics are the Manhattan distance metric (r = 1), which represents the sum of the absolute descriptor differences, and the ultrametric (r = °o), which represents the maximum absolute descriptor difference. Both the Manhattan and Euclidean distance metrics obey all four metric properties. [Pg.138]

Other metrics include the Hamming distance metric, given by (6). XOR is the bitwise exclusive or operation (a bit in the result is set if the corresponding bits in the two operands are different), and N the number of bits in each set. The Dice coefficient is defined by (7). [Pg.139]

Figure 1 SPE maps (see Section 4) showing subset selections using the nearest-neighbor distance metric for a diverse selection (top), and a similar selection (middle). A random selection is shown (bottom) for comparison. AH selections are 100 compounds from a 10,000-member library...

Perhaps a more useful means of quantifying structural data is to use a similarity measurement. These are reviewed by Ludwig and Reynolds (1988) and form the basis of multivariate clustering and ordination. Similarity measures can compare the presence of species in two sites or compare a site to a predetermined set of species derived from historical data or as an artificial set comprised of measurement endpoints from the problem formulation of an ecological risk assessment. The simplest similarity measures are binary in nature, but others can accommodate the number of individuals in each set. Related to similarity measurements are distance metrics. Distance measurements, such as Euclidean distance, have the drawbacks of being sensitive to outliers, scale, transformations, and magnitudes. Distance measures form the basis of many classification and clustering techniques. [Pg.324]

See also in sourсe #XX -- [ Pg.150 ]

See also in sourсe #XX -- [ Pg.51 , Pg.58 ]

See also in sourсe #XX -- [ Pg.143 , Pg.145 , Pg.148 , Pg.153 , Pg.175 , Pg.179 , Pg.184 , Pg.198 , Pg.242 , Pg.260 , Pg.328 , Pg.329 , Pg.330 , Pg.333 , Pg.337 , Pg.344 , Pg.352 ]

Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...