Standardized Euclidean distance

It can be shown that the standardized Euclidean distance is the Euclidean distance of the autoscaled values of X (see further Section 30.2.2.3). One should also note that in this context the standard deviation is obtained by dividing by n, instead of... [Pg.61]
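The equivalence stated above can be checked numerically. The sketch below computes the standardized Euclidean distance directly and then as a plain Euclidean distance on autoscaled data, using the divide-by-n standard deviation the passage mentions. The data matrix and the helper names are hypothetical, chosen only for illustration.

```python
import math

def column_sds(X):
    """Standard deviation of each column, dividing by n rather than n - 1."""
    n = len(X)
    sds = []
    for col in zip(*X):
        mean = sum(col) / n
        sds.append(math.sqrt(sum((v - mean) ** 2 for v in col) / n))
    return sds

def standardized_euclidean(x, y, sds):
    """Euclidean distance with each coordinate difference divided by its column SD."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, sds)))

def autoscale(X):
    """Column-center each variable and divide it by its (divide-by-n) SD."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    sds = column_sds(X)
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

# Hypothetical data: 4 objects, 2 variables on very different scales
X = [[1.0, 200.0], [2.0, 180.0], [4.0, 260.0], [3.0, 240.0]]
sds = column_sds(X)
Z = autoscale(X)

d_std = standardized_euclidean(X[0], X[2], sds)
d_auto = math.dist(Z[0], Z[2])      # plain Euclidean distance on autoscaled values
print(math.isclose(d_std, d_auto))  # True
```

Centering cancels in the difference of two autoscaled rows, which is why only the division by the standard deviation matters for the distance.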

Although Euclidean and Mahalanobis distances are the ones most commonly used in analytical chemistry applications, there are other distance measures that might be more appropriate for specific applications. For example, there are standardized Euclidean distances, where each of the dimensions is inversely weighted by the standard deviation of that dimension in the calibration data (standard deviation-standardized), or the range of that dimension in the calibration data (range-standardized). [Pg.288]
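The two weighting schemes differ only in the per-dimension scale taken from the calibration data. A minimal sketch, assuming the population (divide-by-n) standard deviation for the SD-standardized form; the function names and the calibration values are hypothetical.

```python
import math
import statistics

def sd_scales(calib):
    """SD of each dimension in the calibration data (population form, an assumption)."""
    return [statistics.pstdev(col) for col in zip(*calib)]

def range_scales(calib):
    """Range (max - min) of each dimension in the calibration data."""
    return [max(col) - min(col) for col in zip(*calib)]

def scaled_distance(x, y, scales):
    """Euclidean distance with each dimension weighted by 1/scale."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, scales)))

# Hypothetical calibration set: absorbance-like values and temperatures in K
calib = [[0.1, 290.0], [0.3, 330.0], [0.2, 310.0]]
x, y = [0.1, 290.0], [0.3, 330.0]

# Each coordinate difference here spans the full calibration range
print(scaled_distance(x, y, range_scales(calib)))
print(scaled_distance(x, y, sd_scales(calib)))
```

Either way, a variable measured in large units no longer dominates the distance merely because of its units.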

For any type of substrate, whether ordered or disordered, it is convenient to measure distances numerically (for example, end-to-end distances of self-avoiding walks, SAWs) in a metric that takes into account the topology of the structure's connecting paths. This so-called topological or chemical metric (the space in which it is defined is referred to as ℓ-space) is the natural metric of the structure: the distance between two substrate points equals the length of the shortest path on the structure connecting them. An example illustrating the difference between the ℓ-distance and the ordinary Euclidean distance (in r-space) for a disordered structure is shown in Fig. 1. [Pg.198]
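The chemical distance is a shortest-path length on the structure, so it can be found with a breadth-first search. The sketch below assumes a hypothetical square lattice in which "." marks an accessible site and "#" a blocked one, with steps only between nearest-neighbor accessible sites; it contrasts that path length with the straight-line distance between the same two sites.

```python
import math
from collections import deque

def chemical_distance(grid, start, goal):
    """Length (in lattice bonds) of the shortest path between two open sites,
    moving only between nearest-neighbor open sites. None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == "." and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None

# Hypothetical disordered structure: a wall forces a detour
grid = [
    ".....",
    "####.",
    ".....",
]
start, goal = (0, 0), (2, 0)
print(chemical_distance(grid, start, goal))  # 10
print(math.dist(start, goal))                # 2.0
```

The two sites are close in r-space but far apart along the structure, which is exactly the difference the chemical metric is meant to capture.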

Euclidean distances (ordinary or standardized) are used very often for clustering; Mahalanobis distances are not. An application of Mahalanobis distances can be found in Ref. [16]. [Pg.62]

Here ȳ is the average and σ is the standard deviation of the Euclidean distances of the k nearest neighbors of each compound in the training set in the chemical descriptor space, and Z is an empirical parameter that controls the significance level, with a default value of 0.5. If the distance from an external compound to its nearest neighbor in the training set is above D_c, we label its prediction unreliable. [Pg.443]
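The excerpt defines ȳ, σ and Z but not the cutoff formula itself. The sketch below assumes the cutoff takes the form D_c = ȳ + Zσ, with Z = 0.5 as the default given above, and uses the population standard deviation; the helper names and the descriptor vectors are hypothetical.

```python
import math
import statistics

def knn_distances(train, k):
    """Distances from every training compound to its k nearest training neighbors."""
    out = []
    for i, x in enumerate(train):
        d = sorted(math.dist(x, y) for j, y in enumerate(train) if j != i)
        out.extend(d[:k])
    return out

def cutoff(train, k, z=0.5):
    """Assumed form D_c = mean + z * SD of the k-nearest-neighbor distances."""
    d = knn_distances(train, k)
    return statistics.mean(d) + z * statistics.pstdev(d)

def is_reliable(query, train, dc):
    """False (unreliable) if the nearest training compound is farther than D_c."""
    return min(math.dist(query, x) for x in train) <= dc

# Hypothetical 2-D descriptor vectors for a training set
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dc = cutoff(train, k=1)
print(dc)                                   # 1.0
print(is_reliable((0.5, 0.5), train, dc))   # True
print(is_reliable((5.0, 5.0), train, dc))   # False
```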

The problem lies in the model. Euclidean distance is inappropriate for correlated variables because it is based only on pairwise comparisons, without regard to the elongation of data point swarms along particular axes. In effect, Euclidean distance imposes a spherical constraint on the data set (18). When correlation has been removed from the data (by derivation of standardized characteristic vectors), Euclidean distance and average-linkage cluster analysis recover the three groups. [Pg.66]
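Removing correlation and then measuring Euclidean distance is equivalent to using the Mahalanobis distance. The sketch below swaps in Cholesky whitening for the standardized characteristic vectors (eigenvectors) mentioned above; both transformations remove the correlation and give the same distances. It is limited to two variables so the matrix algebra can be written out by hand, and the point swarm is hypothetical.

```python
import math

def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def whiten(p, mean, cov):
    """Uncorrelated, unit-variance coordinates z = L^-1 (p - mean), where
    cov = L L^T is the Cholesky factorization."""
    sxx, sxy, syy = cov
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 ** 2)
    z1 = (p[0] - mean[0]) / l11
    z2 = (p[1] - mean[1] - l21 * z1) / l22
    return (z1, z2)

def mahalanobis(p, q, cov):
    """sqrt(d^T S^-1 d) using the explicit inverse of the 2x2 covariance S."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - q[0], p[1] - q[1]
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)

# Hypothetical swarm elongated along the diagonal
pts = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (1, 0.8), (3, 2.9)]
mean, cov = mean_and_cov(pts)

a, along, across = (2, 2), (3, 3), (1.5, 2.5)
for b in (along, across):
    euclid_white = math.dist(whiten(a, mean, cov), whiten(b, mean, cov))
    print(round(math.dist(a, b), 3), round(euclid_white, 3), round(mahalanobis(a, b, cov), 3))
```

In raw coordinates the point across the swarm looks closer than the point along it; after whitening the order reverses, because a step across the narrow direction of the swarm is far less typical.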

We define D = (d_ij) to be the distance matrix. Since D is Euclidean, its elements automatically satisfy the three standard conditions for a metric space: identity, symmetry, and the triangle inequality. In the case of perfect correlation between two variables, the triangle inequality is violated, a situation that may be remedied (without dire consequences) by adding a small value, ε = 1 × 10^…, to the distance between them. The particular metric defined by eq. (7.3) is a measure of independence between two variables: if the correlation between two variables is small, the distance between them is large. [Pg.73]

The means and standard deviations of E_ij according to Eq. (5.2) were calculated with a Monte Carlo simulation program. For these simulations, the means and standard deviations of Sy and 5 y were first calculated from the multiple spectral descriptors of each cluster. Next, the mean of each cluster and its standard deviation (assuming normally distributed errors) were entered into the program. Finally, 10,000 iterations were performed to calculate the mean and standard deviation of a given E_ij. Of the many available approaches to cluster analysis, we selected analysis of Euclidean distances because it provides information about both the distance between clusters and the spread of each cluster. Further, although it is possible to calculate Euclidean distances on raw spectra, we performed PCA first to reduce the noise in the data. [Pg.106]
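Eq. (5.2) is not reproduced in this excerpt, so the sketch below only illustrates the general Monte Carlo idea under a stated assumption: E_ij is taken to be the Euclidean distance between clusters i and j in PC-score space, with each coordinate of each cluster drawn from an independent normal distribution with that cluster's mean and standard deviation. The function name and the cluster summaries are hypothetical.

```python
import math
import random
import statistics

def monte_carlo_distance(mean_i, sd_i, mean_j, sd_j, iterations=10_000, seed=0):
    """Mean and SD of the Euclidean distance between two clusters whose
    coordinates are drawn from independent normal distributions (assumed model)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(iterations):
        a = [rng.gauss(m, s) for m, s in zip(mean_i, sd_i)]
        b = [rng.gauss(m, s) for m, s in zip(mean_j, sd_j)]
        samples.append(math.dist(a, b))
    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical cluster summaries in the space of the first two PC scores
mean_e, sd_e = monte_carlo_distance([0.0, 0.0], [0.1, 0.1], [3.0, 4.0], [0.1, 0.1])
print(f"E_ij = {mean_e:.2f} +/- {sd_e:.2f}")
```

The seeded generator makes a run reproducible; the iteration count matches the 10,000 iterations reported above.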

Figure 5.10 provides plots of the Euclidean distances calculated between the spectral descriptors of two representative 96-microreactor arrays. The largest Euclidean distances indicate the best conditions for material differentiation. The best inter-array conditions were found to be a 6-L/min flow rate of inert gas and a 20-min dwell time. The best intra-array conditions were a catalyst concentration of 2 to 4 equivalents combined with an A/B ratio of 1.2 to 1.4 (Figure 5.10B). Results for the reaction variability of these representative microreactor arrays are presented in Figure 5.11. The smallest relative standard deviation (RSD) of spectral features indicates the best reaction reproducibility. This figure illustrates that...
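Selecting conditions by the smallest RSD amounts to computing 100 × SD / mean for the replicate values of a spectral feature under each condition and taking the minimum. A minimal sketch with hypothetical condition names and replicate values; the sample (n - 1) standard deviation is an assumption.

```python
import statistics

def rsd_percent(values):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical replicate values of one spectral feature under three conditions
replicates = {
    "condition A": [1.02, 0.98, 1.01, 0.99],
    "condition B": [1.10, 0.85, 1.05, 0.95],
    "condition C": [0.97, 1.05, 0.92, 1.08],
}
rsd = {name: rsd_percent(v) for name, v in replicates.items()}
best = min(rsd, key=rsd.get)
print(best)  # condition A
```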

Cluster method: Between-groups linkage. Interval measure: Euclidean distance. Transform values: Standardize, Range 0 to 1. [Pg.286]
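These settings describe agglomerative clustering in which the distance between two clusters is the average Euclidean distance over all pairs of members (between-groups or average linkage), applied after each variable has been rescaled to the range 0 to 1. The sketch below is a small, unoptimized illustration of that combination, not the software's own implementation; the data values are hypothetical.

```python
import itertools
import math

def minmax_scale(X):
    """Rescale each column to the range 0 to 1."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in X]

def average_linkage(X, n_clusters):
    """Repeatedly merge the two clusters with the smallest mean pairwise
    Euclidean distance between their members."""
    clusters = [[i] for i in range(len(X))]

    def between(a, b):
        return sum(math.dist(X[i], X[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        p, q = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda pq: between(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters

# Hypothetical samples with two variables on very different scales
X = [[1.0, 100.0], [1.2, 110.0], [5.0, 500.0], [5.1, 520.0], [9.0, 100.0]]
print(average_linkage(minmax_scale(X), n_clusters=3))  # [[0, 1], [2, 3], [4]]
```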

Because nearest-neighbor methods are based on similarity measured by some distance metric, variable scaling and the units used to characterize the data can influence the results. Variables with the largest amount of scatter (greatest variance) contribute most strongly to the Euclidean distance, so in practice it may be advisable to standardize the variables before performing classification analysis. [Pg.586]
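The effect of scaling on nearest-neighbor assignment can be shown with two variables on different scales. In this hypothetical example the concentration column, with its large variance, decides the nearest neighbor on raw data; after autoscaling (using the reference data's mean and sample SD for both the data and the query), the pH difference counts as well and the answer changes.

```python
import math
import statistics

def nearest(query, data):
    """Index of the row in `data` closest to `query` in Euclidean distance."""
    return min(range(len(data)), key=lambda i: math.dist(query, data[i]))

def autoscale_with(data, query):
    """Autoscale columns with the mean and sample SD of `data`; apply the same
    transform to `query` so both live in the same standardized space."""
    cols = list(zip(*data))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, sds)]

    return [scale(r) for r in data], scale(query)

# Hypothetical samples: (pH, concentration in mg/L)
data = [[3.0, 510.0], [7.0, 505.0], [3.1, 400.0], [7.1, 390.0]]
query = [3.05, 503.0]

print(nearest(query, data))                     # 1: concentration dominates
scaled_data, scaled_query = autoscale_with(data, query)
print(nearest(scaled_query, scaled_data))       # 0: pH now contributes too
```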

Puchert et al. [20] extended the use of PCA with the principal component score distance analysis approach. In their study, they calculated the Euclidean distance between successive spectra using the first three factors. A moving block standard deviation was then applied to the distance values. This extends the work of Storme-Paris et al., who considered each principal component independently [21]. In addition, Puchert et al. used a SIMCA-like approach, building a successive PCA model from only the most stable spectra in the calibration set. This gave them better resolution when projecting new samples. [Pg.44]
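The two computational steps described above (distance between successive spectra on the first three factors, then a moving block standard deviation) can be sketched as follows. The PCA step itself is omitted: the score values are invented for illustration, and the block size of 3 is an assumption rather than a value from the cited study.

```python
import math
import statistics

def successive_distances(scores, n_factors=3):
    """Euclidean distance between each spectrum and the next, using only the
    first `n_factors` principal component scores."""
    return [math.dist(a[:n_factors], b[:n_factors]) for a, b in zip(scores, scores[1:])]

def moving_block_sd(values, block=3):
    """Sample standard deviation of every run of `block` consecutive values."""
    return [statistics.stdev(values[i:i + block]) for i in range(len(values) - block + 1)]

# Invented PC scores (4 factors) for 7 spectra recorded over time
scores = [
    [5.0, 2.0, 1.0, 0.3],
    [3.0, 1.0, 0.5, 0.2],
    [1.5, 0.5, 0.2, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.55, 0.22, 0.1, 0.0],
    [0.56, 0.21, 0.1, 0.1],
    [0.55, 0.2, 0.1, 0.0],
]
d = successive_distances(scores)   # the 4th score is ignored
sd = moving_block_sd(d, block=3)
print([round(v, 3) for v in sd])
```

Small values at the end of the series indicate that consecutive spectra have stopped changing appreciably in the space of the first three factors.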

Specifically, we have used Maple 17 software to calculate the mean and standard deviation exactly for multinomial distributions with 3 to 10 categories. In varying the underlying probabilities we used the standard deviation of the probabilities, as this is related to the Euclidean distance of the vector of probabilities from the vector of equal probabilities. Our standard deviations range from 0 to 0.5. We give two figures below to illustrate the analysis. The overall conclusions made are based on the more comprehensive investigation. [Pg.1898]
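The link between the standard deviation of the probabilities and the distance from equal probabilities can be made exact. Because k probabilities sum to 1, their mean is 1/k, so the squared Euclidean distance from the vector (1/k, ..., 1/k) is k times the population variance of the probabilities, giving d = √k · σ (with the n - 1 form of the SD it becomes √(k - 1) · s). The excerpt does not say which SD convention was used; the check below assumes the population form, and the probability vector is hypothetical.

```python
import math
import statistics

def distance_from_uniform(p):
    """Euclidean distance between probability vector p and the equal-probability vector."""
    k = len(p)
    return math.dist(p, [1.0 / k] * k)

p = [0.5, 0.3, 0.2]
k = len(p)
sigma = statistics.pstdev(p)   # population SD of the probabilities
print(math.isclose(distance_from_uniform(p), math.sqrt(k) * sigma))  # True
```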

Calculating Euclidean distance. This is a standard straight-line distance between two sets of values (i.e. weights) calculated as follows ... [Pg.58]
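The calculation is cut off above. The standard definition is the square root of the summed squared differences between corresponding values, d(a, b) = √Σ(a_i − b_i)². A minimal sketch with hypothetical weight vectors; Python's built-in math.dist computes the same quantity.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical weight vectors
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0
```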

Big Chemical Encyclopedia
