Distance measures Mahalanobis

The Mahalanobis distance measures the degree to which data fit the calibration model. It is defined as... [Pg.55]

HCA is a common tool that is used to determine the natural grouping of objects, based on their multivariate responses [75]. In PAT, this method can be used to determine natural groupings of samples or variables in a data set. Like the classification methods discussed above, HCA requires the specification of a space and a distance measure. However, unlike those methods, HCA does not involve the development of a classification rule, but rather a linkage rule, as discussed below. For a given problem, the selection of the space (e.g., original x variable space, PC score space) and distance measure (e.g., Euclidean, Mahalanobis) depends on the specific information that the user wants to extract. For example, for a spectral data set, one can choose PC score space with the Mahalanobis distance measure to better reflect separation that originates from both strong and weak spectral effects. [Pg.405]

Figure 12.24 Dendrograms obtained from hierarchical cluster analysis (HCA) of the NIR spectra of the poly(urethane) foam samples (shown in Figure 12.16), (A) using the first two PCA scores as input, (B) using the first five PCA scores as input. In both cases, the Mahalanobis distance measure and the nearest-neighbor linkage rule were used.

Problem 4.8 Classification of Pottery from Pre-classical Sites in Italy, Using Euclidean and Mahalanobis Distance Measures... [Pg.261]

The mathematical basis behind the Mahalanobis distance measurement is really quite simple, but it is much easier to understand when it is explained graphically. Consider a set of spectra of different samples of the same material as shown in Fig. 4. [Pg.171]

Applications to Quantitative Analysis. There are two aspects to this application of Mahalanobis distance measures. The first is the direct application of calculating Mahalanobis distances to performing quantitative analysis, and it is discussed in Section 15.2.3.4.1. The second is the calculation of Mahalanobis distances as an adjunct to ordinary regression analysis, and various aspects of this application are discussed in Section 15.2.3.4.2 to Section 15.2.3.4.4. [Pg.324]

So far we have been considering leverage with respect to a point's Euclidean distance from an origin. But this is not the only measure of distance, nor is it necessarily the optimum measure of distance in this context. Consider the data set shown in Figure E4. Points C and D are located at approximately equal Euclidean distances from the centroid of the data set. However, while point C is clearly a typical member of the data set, point D may well be an outlier. It would be useful to have a measure of distance which relates more closely to the similarity/difference of a data point to/from a set of data points than simple Euclidean distance. The various Mahalanobis distances are one such family of measures of distance. Thus, while the Euclidean distances of points C and D from the centroid of the data set are equal, the various Mahalanobis distances from the centroid of the data set are larger for point D than for point C. [Pg.185]
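The contrast between points C and D can be illustrated with a small numerical sketch. The data set, the seed, and the helper names `euclidean` and `mahalanobis` below are assumptions made for illustration only; they are not taken from Figure E4.

```python
import numpy as np

# Hypothetical 2-D data set with strongly correlated variables,
# so the point cloud is elongated along the diagonal.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, t + 0.1 * rng.normal(size=200)])

center = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

def euclidean(p):
    """Euclidean distance of point p from the centroid."""
    return float(np.linalg.norm(p - center))

def mahalanobis(p):
    """Mahalanobis distance of point p from the centroid."""
    d = p - center
    return float(np.sqrt(d @ cov_inv @ d))

# C lies along the long axis of the cloud, D across it,
# both at the same Euclidean distance r from the centroid.
r = 1.0
C = center + r * np.array([1.0, 1.0]) / np.sqrt(2)
D = center + r * np.array([1.0, -1.0]) / np.sqrt(2)

print("Euclidean:  ", euclidean(C), euclidean(D))
print("Mahalanobis:", mahalanobis(C), mahalanobis(D))
```

Both points are equally far from the centroid in the Euclidean sense, but D receives a much larger Mahalanobis distance because it lies in a direction in which the data show little variation.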

It is a distance measure that accounts for the covariance structure, here estimated by the sample covariance matrix C. Clearly, one could also take a robust covariance estimator. The Mahalanobis distance can also be computed from each observation to the data center, and the formula changes to... [Pg.60]
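Because the excerpt breaks off before the formula, the commonly used textbook form is given here for reference. The symbols $\mathbf{x}_k$, $\mathbf{x}_l$, $\mathbf{x}_i$ and $\bar{\mathbf{x}}$ are notation assumed for this sketch and may differ from the source; $\mathbf{C}$ is the sample covariance matrix mentioned above.

```latex
% Distance between two observations x_k and x_l
d(\mathbf{x}_k, \mathbf{x}_l)
  = \sqrt{(\mathbf{x}_k - \mathbf{x}_l)^{\mathsf{T}} \, \mathbf{C}^{-1} \, (\mathbf{x}_k - \mathbf{x}_l)}

% Distance from observation x_i to the data center (mean vector)
d(\mathbf{x}_i)
  = \sqrt{(\mathbf{x}_i - \bar{\mathbf{x}})^{\mathsf{T}} \, \mathbf{C}^{-1} \, (\mathbf{x}_i - \bar{\mathbf{x}})}
```

With $\mathbf{C}$ replaced by the identity matrix, both expressions reduce to the ordinary Euclidean distance.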

Points with a constant Euclidean distance from a reference point (like the center) are located on a hypersphere (in two dimensions, on a circle); points with a constant Mahalanobis distance to the center are located on a hyperellipsoid (in two dimensions, on an ellipse) that envelops the cluster of object points (Figure 2.11). That means the Mahalanobis distance depends on the direction. Mahalanobis distances are used in classification methods, by measuring the distances of an unknown object to prototypes (centers, centroids) of object classes (Chapter 5). A drawback of the Mahalanobis distance is that it requires the inverse of the covariance matrix, which cannot be calculated when the variables are highly correlated. A similar approach without this drawback is the classification method SIMCA, based on PCA (Section 5.3.1; Brereton 2006; Eriksson et al. 2006). [Pg.60]
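The problem with the inverse can be seen in the limiting case of perfectly correlated variables. The following is a minimal sketch with made-up data, in which the third variable is an exact linear combination of the first two; the tolerance value is an arbitrary choice for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = 2.0 * x1 - x2            # exact linear combination of x1 and x2

X = np.column_stack([x1, x2, x3])
C = np.cov(X, rowvar=False)   # 3 x 3 sample covariance matrix

# The covariance matrix has only two independent directions,
# so it is singular and has no usable inverse.
rank = int(np.linalg.matrix_rank(C, tol=1e-10))
cond = float(np.linalg.cond(C))

print("rank of C:", rank)             # fewer than 3 variables' worth
print("condition number of C:", cond)
```

With real spectra the correlation is rarely exact, but a very large condition number signals the same difficulty: the inverse exists numerically yet amplifies noise so strongly that the resulting distances are unreliable.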

The distance between object points is considered as an inverse similarity of the objects. This similarity depends on the variables used and on the distance measure applied. The distances between the objects can be collected in a distance matrix. The Euclidean distance is used most often; it is the everyday notion of distance, extended to more than two or three dimensions. Other distance measures (city block distance, correlation coefficient) can be applied; of special importance is the Mahalanobis distance, which considers the spatial distribution of the object points (the correlation between the variables). Based on the Mahalanobis distance, multivariate outliers can be identified. The Mahalanobis distance is based on the covariance matrix of X; this matrix plays a central role in multivariate data analysis and should be estimated by appropriate methods; in most cases robust methods are adequate. [Pg.71]

In the literature the above diagnostic measures are known under different names. Instead of the score distance from Equation 3.27, which measures the deviation of each observation within the PCA space, often the Hotelling T² test is considered. Using this test a confidence boundary can be constructed, and objects falling outside this boundary can be considered as outliers in the PCA space. It can be shown that this concept is analogous to the concept of the score distance. Moreover, the score distances are in fact Mahalanobis distances within the PCA space. This is easily... [Pg.94]

To address this issue, another type of distance measure, called the Mahalanobis distance, has been used. This distance is defined as ... [Pg.390]

Although Euclidean and Mahalanobis distances are the ones most commonly used in analytical chemistry applications, there are other distance measures that might be more appropriate for specific applications. These are discussed in reference [75]. [Pg.391]

In order to apply the SA protocol, one of the keys is to design a mathematical function that adequately measures the diversity of a subset of selected molecules. Because each molecule is represented by molecular descriptors, geometrically it is mapped to a point in a multidimensional space. The distance between two points, such as Euclidean distance, Tanimoto distance, and Mahalanobis distance, then measures the dissimilarity between any two molecules. Thus, the diversity function to be designed should be based on all pairwise distances between molecules in the subset. One of the functions is as follows ... [Pg.382]

To generate the dendrogram, HCA methods form clusters of samples based on their nearness in row space. A common approach is to initially treat every sample as a cluster and join the closest clusters together. This process is repeated until only one cluster remains. Variations of HCA use different approaches to measure distances between clusters (e.g., single vs. centroid linking, Euclidean vs. Mahalanobis distance). The two methods discussed below use single and centroid linking with Euclidean distances. [Pg.216]
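The agglomerative procedure described above can be sketched in a few lines for the single-linkage, Euclidean-distance case. The function name `single_linkage`, the numbering of merged clusters, and the five sample points are assumptions made for this illustration; production work would normally rely on an established clustering library.

```python
import numpy as np

def single_linkage(X):
    """Agglomerative clustering with single linkage and Euclidean distance.

    Returns the merge history as a list of (cluster_a, cluster_b, distance).
    Original samples are clusters 0..n-1; each merge creates a new cluster id.
    """
    n = len(X)
    # Pairwise Euclidean distances between all samples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = {i: [i] for i in range(n)}   # every sample starts as a cluster
    merges = []
    next_id = n
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single linkage: distance between the two closest members.
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[2]:
                    best = (a, b, float(d))
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges

# Five hypothetical samples in a two-variable row space.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.5], [11.0, 0.0]])
merges = single_linkage(X)
for a, b, d in merges:
    print(f"join clusters {a} and {b} at distance {d}")
```

The merge distances are the heights at which branches join in the dendrogram. Swapping the `.min()` for a distance between cluster centroids would give centroid linking instead.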

Initially, cluster analysis defines a measure of similarity given by a distance, a correlation, or the information content. Distance can be measured as Euclidean distance, Mahalanobis distance, or Minkowski distance. Objects separated by a short distance are recognized as very similar, while objects separated by a great distance are dissimilar. The overall result of cluster analysis is reported as a dendrogram of the similarities obtained by many procedures. [Pg.130]

Although Euclidean and Mahalanobis distances are the ones most commonly used in analytical chemistry applications, there are other distance measures that might be more appropriate for specific applications. For example, there are standardized Euclidean distances, where each of the dimensions is inversely weighted by the standard deviation of that dimension in the calibration data (standard deviation-standardized), or the range of that dimension in the calibration data (range-standardized). [Pg.288]
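The two standardized variants can be sketched as follows. The calibration values and the helper name `standardized_euclidean` are made up for illustration, and the sample standard deviation (n - 1 in the denominator) is an assumption, since the excerpt does not say which estimator is meant.

```python
import numpy as np

# Hypothetical calibration data: two variables on very different scales.
calib = np.array([[1.0, 100.0],
                  [2.0, 300.0],
                  [3.0, 200.0],
                  [4.0, 400.0]])

std_scale = calib.std(axis=0, ddof=1)                  # per-variable standard deviation
range_scale = calib.max(axis=0) - calib.min(axis=0)    # per-variable range

def standardized_euclidean(a, b, scale):
    """Euclidean distance after dividing each dimension by its scale."""
    return float(np.sqrt((((a - b) / scale) ** 2).sum()))

a, b = calib[0], calib[3]
d_raw = float(np.linalg.norm(a - b))
d_std = standardized_euclidean(a, b, std_scale)
d_range = standardized_euclidean(a, b, range_scale)

print("raw Euclidean:        ", d_raw)
print("std-standardized:     ", d_std)
print("range-standardized:   ", d_range)
```

The raw distance is dominated almost entirely by the second variable, whereas both standardized versions give the two variables equal weight. Unlike the Mahalanobis distance, neither accounts for correlation between the variables.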

The following statistical measures are those most commonly found in software packages. First we mention HOTELLING'S T² for the 2-class case, which is based on a generalized distance measure, the MAHALANOBIS distance D², and from which an F-test can be derived ... [Pg.187]

The leverage, h_i, of the ith calibration sample is the ith diagonal element of the hat matrix, H. The leverage is a measure of how far the ith calibration sample lies from the other n - 1 calibration samples in X-space. The matrix H is called the hat matrix because it is a projection matrix that projects the vector y into the space spanned by the X matrix, thus producing y-hat. Notice the similarity between leverage and the Mahalanobis distance described in Chapter 4. [Pg.128]
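The similarity can be made concrete under one specific set of assumptions: X is mean-centered, the model has no separate intercept column, and the Mahalanobis distance uses the sample covariance matrix with n - 1 in the denominator. In that case the leverage equals the squared Mahalanobis distance to the centroid divided by n - 1. The data below are simulated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)
mix = np.array([[1.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 0.8]])
X = rng.normal(size=(30, 3)) @ mix     # 30 hypothetical calibration samples
n = X.shape[0]
Xc = X - X.mean(axis=0)                # mean-centered X (assumption)

# Hat matrix and its diagonal (the leverages).
H = Xc @ np.linalg.inv(Xc.T @ Xc) @ Xc.T
leverage = np.diag(H)

# Squared Mahalanobis distance of each sample to the centroid.
C = np.cov(X, rowvar=False)
d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(C), Xc)

print(np.allclose(leverage, d2 / (n - 1)))
```

If an intercept column is included in X instead of centering, the leverages are shifted by 1/n, so the exact relationship depends on how the calibration model is set up.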

Mahalanobis distance. This method is popular with many chemometricians and, whilst superficially similar to the Euclidean distance, it takes into account that some variables may be correlated and so measure more or less the same properties. The distance between objects k and l is best defined in matrix terms by... [Pg.227]

An important difficulty with using this distance is that the number of objects must be significantly larger than the number of measurements. Consider the case of Mahalanobis distance being used to determine within-group distances. If there are J measurements, then there must be at least J + 2 objects for there to be any discrimination. If there... [Pg.237]
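One way to see why too few objects cause trouble is to look at the rank of the covariance matrix, which cannot exceed the number of objects minus one. The sketch below uses simulated data; the helper name `cov_rank`, the choice of J = 10, and the tolerance are assumptions for illustration, and it demonstrates only the rank argument, not the J + 2 threshold itself.

```python
import numpy as np

rng = np.random.default_rng(2)
J = 10   # number of measurements (variables)

def cov_rank(n_objects):
    """Rank of the J x J sample covariance matrix from n_objects random objects."""
    X = rng.normal(size=(n_objects, J))
    C = np.cov(X, rowvar=False)
    return int(np.linalg.matrix_rank(C, tol=1e-10))

rank_few = cov_rank(6)     # fewer objects than measurements
rank_many = cov_rank(50)   # many more objects than measurements

print("6 objects:  rank", rank_few, "of", J)
print("50 objects: rank", rank_many, "of", J)
```

With only 6 objects the 10 x 10 covariance matrix is rank deficient and cannot be inverted, so the Mahalanobis distance is undefined without variable reduction.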

Whereas the results in this section could probably be obtained fairly easily by inspecting the original data, numerical values of class membership have been obtained which can be converted into probabilities, assuming that the measurement error is normally distributed. In most real situations, there will be a much larger number of measurements, and discrimination (e.g. by spectroscopy) is not easy to visualise without further data analysis. Statistics such as %CC can readily be obtained from the data, and it is also possible to classify unknowns or validation samples as discussed in Section 4.5.1 by this means. Many chemometricians use the Mahalanobis distance as defined above, but the normal Euclidean distance or a wide range of other measures can also be employed, if justified by the data, just as in cluster analysis. [Pg.240]

Instead of using raw data, it is possible to use the PCs of the data. This acts as a form of variable reduction, but also simplifies the distance measures, because the variance-covariance matrix will contain nonzero elements only on the diagonal. The expressions for Mahalanobis distance and linear discriminant functions simplify dramatically. [Pg.242]
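The simplification can be checked numerically. In the sketch below (simulated data; all variable names are assumptions), the scores are obtained from an eigendecomposition of the covariance matrix, and the Mahalanobis distance in score space reduces to a sum of squared scores, each divided by the variance of its PC.

```python
import numpy as np

rng = np.random.default_rng(3)
mix = np.array([[2.0, 0.0, 0.0],
                [1.2, 0.7, 0.0],
                [0.5, 0.3, 0.4]])
X = rng.normal(size=(100, 3)) @ mix.T   # correlated hypothetical data

Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)
eigvals, P = np.linalg.eigh(C)          # columns of P are the PC loadings
T = Xc @ P                              # PC scores

# Covariance matrix of the scores: nonzero only on the diagonal.
C_scores = np.cov(T, rowvar=False)

# Mahalanobis distance in the original space (needs the full inverse) ...
d_full = np.sqrt(np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(C), Xc))
# ... and in score space (only a division by each PC's variance).
d_scores = np.sqrt((T ** 2 / eigvals).sum(axis=1))

print(np.allclose(C_scores, np.diag(eigvals)))
print(np.allclose(d_full, d_scores))
```

When all PCs are retained, the two distances agree. Keeping only the first few PCs drops the low-variance terms from the sum, which is the variable reduction referred to above.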

Chapter 5, we will not give a numerical example in this chapter, but if the number of variables is fairly large, approaches such as Mahalanobis distance are not effective unless there is variable reduction either by first using PCA or simply selecting some of the measurements, so the approach discussed in this section is worth trying. [Pg.249]
