Euclidean distance cluster analysis

FIGURE 3.24 Dendrogram of fatty acid concentration data from mummies and reference samples. Hierarchical cluster analysis (complete linkage) with Euclidean distances has been applied. [Pg.109]

Distance measures were already discussed in Section 2.4. The most widely used distance measure for cluster analysis is the Euclidean distance. The Manhattan distance would be less dominated by far-outlying objects, since it is based on absolute rather than squared differences. The Minkowski distance is a generalization of both measures, and it allows adjusting the power of the distances along the coordinates. None of these distance measures is scale invariant. This means that variables with a larger scale will have more influence on the distance measure than variables with a smaller scale. If this effect is not wanted, the variables need to be scaled to equal variance. [Pg.268]
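The distances in this family can be computed side by side, together with the scale effect. The following is a minimal sketch that assumes NumPy. The function name `minkowski` and the three-object data set are hypothetical illustrations and are not taken from Section 2.4. The order `p` selects the member of the family: `p=1` gives the Manhattan distance and `p=2` the Euclidean distance.

```python
import numpy as np


def minkowski(x, y, p):
    """Minkowski distance of order p: p=1 is Manhattan, p=2 is Euclidean."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


a, b = np.array([0.0, 0.0]), np.array([3.0, 4.0])
manhattan = minkowski(a, b, p=1)  # sum of absolute differences: 3 + 4
euclidean = minkowski(a, b, p=2)  # square root of summed squares: sqrt(9 + 16)

# Hypothetical data: variable 1 varies in the hundreds, variable 2 in single units.
X = np.array([[1000.0, 1.0],
              [1100.0, 9.0],
              [1050.0, 5.0]])

# Unscaled: the distance between objects 0 and 1 is driven almost entirely by variable 1.
raw_d = minkowski(X[0], X[1], p=2)

# Autoscaling: centre each column and divide by its standard deviation (unit variance).
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scaled_d = minkowski(Xs[0], Xs[1], p=2)  # both variables now contribute equally
```

Without scaling, the 100-unit difference in variable 1 swamps the 8-unit difference in variable 2. After autoscaling, the two variables contribute equally.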

A cluster analysis of the amino acid structures by PCA of the A-matrix is shown in Figure 6.5a; note that PCA optimally represents the Euclidean distances. The score plot for the first two principal components (preserving 27.1% and 20.5% of the total variance) shows some clustering of similar structures. Four structure pairs have identical variables: 1 (Ala) and 8 (Gly), 5 (Cys) and 13 (Met), 10 (Ile) and 11 (Leu), and 16 (Ser) and 17 (Thr). Objects with identical variables of course have identical scores, but for better visibility the pairs have been artificially... [Pg.271]
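The statement that PCA represents Euclidean distances can be checked numerically. The sketch below assumes NumPy and uses random hypothetical data, not the amino acid matrix. Scores on all components are a rotation of the centred data, so they reproduce every Euclidean distance exactly. A two-component score plot is a projection, so its distances can only shrink. Objects with identical variables also receive identical scores.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # hypothetical: 6 objects x 4 variables
X[5] = X[0]                  # objects 0 and 5 have identical variables

Xc = X - X.mean(axis=0)                       # PCA works on mean-centred data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = Xc @ Vt.T                                 # scores on all principal components
explained = s ** 2 / np.sum(s ** 2)           # fraction of total variance per PC


def distance_matrix(A):
    """All pairwise Euclidean distances between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


D_data = distance_matrix(Xc)
D_all_pcs = distance_matrix(T)         # equals D_data: the rotation preserves distances
D_two_pcs = distance_matrix(T[:, :2])  # score-plot distances: never larger than D_data
```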

Initially, cluster analysis defines a measure of similarity given by a distance, a correlation, or the information content. Distance can be measured as the Euclidean, Mahalanobis, or Minkowski distance. Objects separated by a short distance are recognized as very similar, while objects separated by a great distance are dissimilar. The overall result of cluster analysis is reported as a dendrogram of the similarities, which can be obtained by many procedures. [Pg.130]
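The sequence of distance matrix, agglomeration, and dendrogram can be sketched with SciPy. This version assumes complete linkage, the method named in Figure 3.24, and uses six hypothetical two-variable objects. `dendrogram(..., no_plot=True)` returns the tree layout without drawing it. `fcluster` cuts the tree into a chosen number of groups.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical objects with two measured variables each.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],   # group near (1, 1)
              [5.0, 5.2], [5.1, 4.9],               # group near (5, 5)
              [9.0, 0.5]])                          # isolated object

d = pdist(X, metric="euclidean")                 # condensed matrix of pairwise distances
Z = linkage(d, method="complete")                # agglomerate with complete linkage
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 clusters
tree = dendrogram(Z, no_plot=True)               # leaf order etc., without drawing
print(labels, tree["ivl"])
```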

Fig. 3.20 Hierarchical cluster analysis with Euclidean distance of the autoscaled variables, applied to voltammetric parameters recorded for the mineral and pigment specimens studied here. From data in Table 3.2: (a) including greenish natural umber and (b) excluding this pigment [139]...

Figure 4. Dendrogram showing the sub-division of the main chemical group according to cluster analysis (Euclidean distance, Ward's method).

Etowah Mound, Ohio, colorants in archaeological textiles, 44-77; Euclidean distance for chemical grouping by cluster analysis, 407; Eygin Gol Necropolis, Northern Mongolia, biological relations from burial context, determination, 80-81... [Pg.561]

Prior to analysis, the Raman shift axes of the spectra were calibrated using the Raman spectrum of 4-acetamidophenol. Pretreatment of the raw spectra, such as vector normalization and calculation of derivatives, was done using Matlab (The MathWorks, Inc.) or OPUS (Bruker) software. OPUS NT software (Bruker, Ettlingen, Germany) was used to perform the HCA. The first derivatives of the spectra were used over the range from 380 cm⁻¹ to 1700 cm⁻¹. To calculate the distance matrix, Euclidean distances were used, and for clustering, Ward's algorithm was applied [59]. [Pg.80]
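The workflow above can be sketched in Python with NumPy and SciPy in place of Matlab/OPUS. This is a rough sketch, not a reproduction of the authors' software. The order "normalize, then differentiate" is an assumption, since the excerpt does not state it. The `preprocess` helper and the synthetic two-species spectra are hypothetical. Euclidean distances are paired with Ward linkage, as described in the text.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def preprocess(spectra, wavenumbers, lo=380.0, hi=1700.0):
    """Hypothetical helper: vector-normalize, differentiate, keep the lo..hi window."""
    # Vector normalization: scale each spectrum to unit Euclidean norm.
    normed = spectra / np.linalg.norm(spectra, axis=1, keepdims=True)
    # First derivative along the wavenumber axis (central differences).
    deriv = np.gradient(normed, wavenumbers, axis=1)
    window = (wavenumbers >= lo) & (wavenumbers <= hi)
    return deriv[:, window]


# Hypothetical spectra: two "species" with different band positions.
wn = np.linspace(200.0, 1800.0, 801)
rng = np.random.default_rng(1)


def band(center, width=20.0):
    return np.exp(-0.5 * ((wn - center) / width) ** 2)


def noisy(profile, factor):
    # The intensity factor mimics varying signal strength; normalization removes it.
    return factor * (profile + 0.002 * rng.normal(size=wn.size))


species_a = [noisy(band(1000) + 0.5 * band(1450), 1 + i) for i in range(3)]
species_b = [noisy(band(1150) + 0.8 * band(600), 1 + i) for i in range(3)]
spectra = np.vstack(species_a + species_b)

features = preprocess(spectra, wn)
Z = linkage(features, method="ward", metric="euclidean")  # Euclidean distances + Ward
labels = fcluster(Z, t=2, criterion="maxclust")
```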

Fig. 4.3. Dendrogram resulting from cluster analysis of 91 spectra from 15 tree species (see also Table 4.2). Cluster analysis was done on first derivatives over the spectral range 380 cm⁻¹ to 1700 cm⁻¹. The distance matrix was calculated using Euclidean distance, and Ward's algorithm was applied for clustering. Spectra were measured after decomposition of carotenoid molecules with 633 nm irradiation. Example spectra of each species are shown in Fig. 4.1. Reprinted with permission from [52]...

The Euclidean distance is the best choice for a distance metric in hierarchical clustering because interpoint distances between the samples can be computed directly (see Figure 9.6). However, there is a problem with using the Euclidean distance, which arises from inadvertent weighting of the variables in the analysis that occurs... [Pg.349]
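The excerpt breaks off before it explains the weighting problem. The sketch below assumes the problem is the dependence on measurement units and scale, as discussed for Section 2.4. It uses hypothetical data and assumes NumPy and SciPy. It computes the interpoint distance matrix directly and shows that rewriting one variable in different units changes which sample is the nearest neighbour.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def nearest_neighbours(X):
    """Index of each sample's nearest other sample under Euclidean distance."""
    D = squareform(pdist(X, metric="euclidean"))  # full symmetric distance matrix
    np.fill_diagonal(D, np.inf)                   # ignore each sample's zero self-distance
    return D.argmin(axis=1)


# Hypothetical samples: column 0 is a concentration in mg/L, column 1 a pH value.
X_mg = np.array([[1.0, 7.0],
                 [1.5, 4.0],
                 [3.0, 7.2]])
# The same samples with the concentration expressed in µg/L instead.
X_ug = X_mg * np.array([1000.0, 1.0])

nn_mg = nearest_neighbours(X_mg)  # sample 0 pairs with sample 2 (similar pH)
nn_ug = nearest_neighbours(X_ug)  # sample 0 now pairs with sample 1 (concentration dominates)
```

The chemistry of the samples is the same in both runs. Only the units changed, yet the clustering of sample 0 changes with them.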

FIGURE 9.6 Euclidean distance between two data points in a two-dimensional measurement space defined by the measurement variables x1 and x2. (Adapted from Massart, D.L. and Kaufman, L., The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis, John Wiley & Sons, New York, 1983. With permission.)... [Pg.350]
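In the two-dimensional space of Figure 9.6, the distance between two points follows from Pythagoras. The labels a and b for the two points are introduced here only for illustration. The general form extends the sum to n measurement variables:

```latex
d_{ab} = \sqrt{(x_{1a} - x_{1b})^2 + (x_{2a} - x_{2b})^2},
\qquad
d_{ab} = \Bigl(\sum_{j=1}^{n} (x_{ja} - x_{jb})^2\Bigr)^{1/2}
```

For example, points a = (1, 2) and b = (4, 6) are sqrt(3² + 4²) = 5 units apart.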

Figure 7.1 Authentication of monovarietal virgin olive oils: results of applying cluster analysis to volatile compounds. The Manhattan (city block) distance metric and Ward's amalgamation method were used in (a); the squared Euclidean distance and complete linkage amalgamation method were used in (b). Note: A, cv. Arbequina (6); C, cv. Coratina (6); K, cv. Koroneiki (6); P, cv. Picual (6); 1, harvest 1991; 2, harvest 1992. Olives were harvested at three levels of maturity (unripe, normal, overripe) (source: SEXIA Group-Instituto de la Grasa, Seville, Spain).
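City block, Euclidean, and squared Euclidean distances can rank the same pairs of objects differently. A small sketch with three hypothetical objects, assuming SciPy's `pdist` metric names, shows how. Under city block distance, object c is a's nearest neighbour. Under Euclidean distance, b is. The squared Euclidean distance keeps the Euclidean ordering but stretches large distances more than small ones.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Three hypothetical objects a, b, c with two variables each.
X = np.array([[0.0, 0.0],   # a
              [3.0, 3.0],   # b
              [5.0, 0.0]])  # c

# Condensed order of the pairs: (a, b), (a, c), (b, c).
city = pdist(X, metric="cityblock")       # [6, 5, 5]
eucl = pdist(X, metric="euclidean")       # [4.24..., 5, 3.60...]
sq_eucl = pdist(X, metric="sqeuclidean")  # [18, 25, 13]
```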

Whereas the results in this section could probably be obtained fairly easily by inspecting the original data, numerical values of class membership have been obtained which can be converted into probabilities, assuming that the measurement error is normally distributed. In most real situations, there will be a much larger number of measurements, and discrimination (e.g. by spectroscopy) is not easy to visualise without further data analysis. Statistics such as %CC can readily be obtained from the data, and this approach can also be used to classify unknowns or validation samples, as discussed in Section 4.5.1. Many chemometricians use the Mahalanobis distance as defined above, but the normal Euclidean distance or a wide range of other measures can also be employed, if justified by the data, just as in cluster analysis. [Pg.240]
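The definition referred to as "above" is not part of this excerpt, so the sketch below assumes the common form with a pooled within-class covariance matrix, as in linear discriminant analysis. An unknown is assigned to the class with the nearest centroid. The two training classes are hypothetical, and NumPy is assumed. Setting the covariance matrix to the identity reduces the measure to the ordinary Euclidean distance.

```python
import numpy as np

# Hypothetical training data: two classes, two correlated variables.
class1 = np.array([[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2]])
class2 = np.array([[1.0, 3.0], [2.0, 4.1], [3.0, 4.9], [4.0, 6.0]])
classes = [class1, class2]

centroids = [c.mean(axis=0) for c in classes]
# Pooled within-class covariance matrix (assumed definition, as in LDA).
n_total = sum(len(c) for c in classes)
pooled = sum((len(c) - 1) * np.cov(c, rowvar=False) for c in classes) / (n_total - len(classes))
pooled_inv = np.linalg.inv(pooled)


def mahalanobis(x, centroid, cov_inv):
    d = np.asarray(x, dtype=float) - centroid
    return float(np.sqrt(d @ cov_inv @ d))


def classify(x):
    """Assign x to the class whose centroid is nearest in Mahalanobis distance."""
    dists = [mahalanobis(x, m, pooled_inv) for m in centroids]
    return int(np.argmin(dists)), dists


label, dists = classify([2.5, 2.5])
# With an identity covariance matrix, the Mahalanobis distance reduces to the Euclidean one.
eucl_check = mahalanobis([2.5, 2.5], centroids[0], np.eye(2))
```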

By k-means cluster analysis with Euclidean distance in the segmental Q-coordinates, we divided the structure ensemble of the MD unfolding trajectories into nine clusters [25]. The clustering was performed using all data obtained for the authentic and recombinant proteins, and the clusters were numbered in order of their distance from the native structure. Figure 2.10(c)-(f) shows protein structures in four representative clusters (Clusters 1, 4, 5, and 9). Cluster 1 is almost identical to the native structure, with all 17 Q-coordinates close to unity, whereas Cluster 9, which has lost 84% of its native contacts, represents the unfolded state. [Pg.29]
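The two steps, k-means clustering and then renumbering clusters by distance from the native state, can be sketched in Python with NumPy. This is a simplified version, not the study's code. It uses 3 hypothetical Q-coordinates and k = 3 instead of 17 coordinates and nine clusters. The native state is taken to be the all-ones vector, because native-like structures have every Q close to unity. The farthest-first seeding is an assumption chosen so that the result is deterministic.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with Euclidean distance and farthest-first seeding."""
    # Deterministic seeding (an assumption, not k-means++): start from X[0], then
    # repeatedly add the point farthest from all centroids chosen so far.
    idx = [0]
    for _ in range(1, k):
        d = np.min(np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=-1), axis=1)
        idx.append(int(np.argmax(d)))
    centroids = X[idx].copy()
    for _ in range(n_iter):
        # Assignment step: nearest centroid in Euclidean distance.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


rng = np.random.default_rng(4)
centres = np.array([[1.0, 1.0, 1.0],   # hypothetical native-like state
                    [1.0, 0.5, 0.2],   # hypothetical partially unfolded state
                    [0.1, 0.1, 0.1]])  # hypothetical unfolded state
Q = np.vstack([np.clip(c + 0.03 * rng.normal(size=(10, 3)), 0.0, 1.0) for c in centres])

labels, centroids = kmeans(Q, k=3)
# Renumber clusters by the distance of their centroid from the native state (all Q = 1).
native = np.ones(Q.shape[1])
order = np.argsort(np.linalg.norm(centroids - native, axis=1))
rank = np.empty_like(order)
rank[order] = np.arange(len(order))
cluster_number = rank[labels] + 1  # 1 = closest to native
```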

The problem lies in the model. The Euclidean distance calculation is inappropriate for use with correlated variables because it is based only on pairwise comparisons, without regard to the elongation of data point swarms along particular axes. In effect, Euclidean distance imposes a spherical constraint on the data set (18). When correlation has been removed from the data (by derivation of standardized characteristic vectors), Euclidean distance and average-linkage cluster analysis return the three groups. [Pg.66]
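This sketch reads "standardized characteristic vectors" as PCA scores divided by the square root of their eigenvalues, a process often called whitening. That reading is an assumption about the cited method. The data are hypothetical, and NumPy is assumed. After whitening, the point swarm is spherical. The plain Euclidean distance between two whitened objects then equals their Mahalanobis distance in the original variables.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical correlated data: an elongated, non-spherical point swarm.
cov_true = np.array([[4.0, 3.0],
                     [3.0, 4.0]])
X = rng.multivariate_normal([0.0, 0.0], cov_true, size=200)

Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)       # sample covariance matrix
evals, evecs = np.linalg.eigh(S)   # characteristic values and vectors of S
W = (Xc @ evecs) / np.sqrt(evals)  # standardized scores: uncorrelated, unit variance

S_inv = np.linalg.inv(S)
diff = X[0] - X[1]
d_mahalanobis = float(np.sqrt(diff @ S_inv @ diff))
d_whitened = float(np.linalg.norm(W[0] - W[1]))  # plain Euclidean distance after whitening
d_raw = float(np.linalg.norm(diff))              # Euclidean distance on the raw data
```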

The manner in which sample-to-sample resemblance is defined is a key difference between the various hierarchical clustering techniques. Sample analyses may be similar to one another in a variety of ways, and the choice of measure reflects an interest in drawing attention to different underlying processes or properties. The selection of an appropriate measure of similarity therefore depends on the objectives of the research as set forth in the problem definition. Examples of different similarity measures or coefficients that have been used in compositional studies are average Euclidean distance, correlation, and cosine. Many others that could be applied are discussed in the literature dealing with cluster analysis (15, 18, 19, 36, 37). [Pg.70]
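The three named coefficients can be written in a few lines of NumPy. This sketch assumes that "average Euclidean distance" means the Euclidean distance divided by the square root of the number of variables, that is, a root-mean-square difference. The source does not define the term here. The two compositions are hypothetical.

```python
import numpy as np


def average_euclidean_distance(x, y):
    """Euclidean distance divided by sqrt(n): root-mean-square difference over n variables."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))


def correlation(x, y):
    """Pearson correlation between two samples' variable profiles."""
    return float(np.corrcoef(x, y)[0, 1])


def cosine(x, y):
    """Cosine of the angle between the two sample vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


# Hypothetical compositions: sample q is sample p diluted by a factor of 2.
p = [2.0, 4.0, 6.0, 8.0]
q = [1.0, 2.0, 3.0, 4.0]
print(average_euclidean_distance(p, q), correlation(p, q), cosine(p, q))
```

Dilution leaves the correlation and the cosine at 1 but produces a nonzero distance. Which behaviour counts as "similar" depends on the research question.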

Before proceeding with a more detailed examination of clustering techniques, we can now compare correlation and distance metrics as suitable measures of similarity for cluster analysis. A simple example serves to illustrate the main points. In Table 4, three objects (A, B, and C) are characterized by five variates. The correlation matrix and Euclidean distance matrix are given in Tables 5 and... [Pg.101]
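The values of Table 4 are not reproduced in this excerpt. The sketch below therefore uses three hypothetical objects with five variates each and assumes NumPy and SciPy. The two matrices disagree about A's closest neighbour. By correlation, A is most similar to B, which has the same profile at ten times the magnitude. By Euclidean distance, A is most similar to C, which has similar magnitudes but a shuffled profile.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical objects A, B, C described by five variates (not the values of Table 4).
X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],       # A
              [10.0, 20.0, 30.0, 40.0, 50.0],  # B: same profile as A, ten times larger
              [2.0, 1.0, 4.0, 3.0, 5.0]])      # C: similar magnitudes, shuffled profile

R = np.corrcoef(X)                            # correlation matrix between objects (rows)
D = squareform(pdist(X, metric="euclidean"))  # Euclidean distance matrix between objects
```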

Table 7 A simple bivariate data set for cluster analysis (a), from Zupan, and the corresponding Euclidean distance matrix (b)...

Other classical unsupervised cluster analysis methods rely on mathematical indicators, such as distances, to quantify the similarity among pixel spectra. Thus, each pixel can be viewed as a point in the space of the original wavenumbers or in other spaces, for example PC space. The coordinates of a pixel can be the spectral readings at the different wavenumbers (in the original image space) or the scores (in the PC space). Similar pixels should be close in the reference space, and therefore distance measurements, such as the Euclidean distance, can be used to assess this proximity ... [Pg.81]
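To treat pixels as points, the image cube is unfolded into a pixels-by-wavenumbers matrix. The sketch below assumes NumPy and uses a hypothetical random 4 x 5 image with 50 wavenumbers. The reference pixel and the choice of three principal components are illustrative assumptions. The sketch computes each pixel's Euclidean distance to the reference pixel in the original wavenumber space and again in PC space, then folds each result back into an image-shaped map.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical hyperspectral image: 4 x 5 pixels, 50 wavenumbers per pixel spectrum.
n_rows, n_cols, n_wn = 4, 5, 50
cube = rng.normal(size=(n_rows, n_cols, n_wn))

pixels = cube.reshape(-1, n_wn)  # unfold: one row per pixel, one column per wavenumber
ref = 7                          # hypothetical reference pixel (row 1, column 2)

# Distances in the original wavenumber space, refolded into an image-shaped map.
dist_map = np.linalg.norm(pixels - pixels[ref], axis=1).reshape(n_rows, n_cols)

# The same idea in PC space: use the scores on the first 3 principal components.
centred = pixels - pixels.mean(axis=0)
_, _, Vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ Vt[:3].T
dist_map_pc = np.linalg.norm(scores - scores[ref], axis=1).reshape(n_rows, n_cols)
```

Distances in a truncated PC space can never exceed the original-space distances, because the scores are a projection. Noise confined to the discarded components therefore no longer affects the proximity measure.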

Big Chemical Encyclopedia
