
Classification, distance measurement

Specifying a distance measure. Once the classification space has been defined, it is necessary to define the distance that will be used to assess the proximity of samples in that space. The most straightforward distance that can be used for this purpose is the Euclidean distance between two vectors, which is defined as ... [Pg.390]
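
In standard notation, with p variables spanning the classification space, the Euclidean distance between two sample vectors $\mathbf{x}_A$ and $\mathbf{x}_B$ is

$$
d_{AB} = \sqrt{\sum_{j=1}^{p} \left( x_{Aj} - x_{Bj} \right)^{2}}
$$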

The KNN method [77] is probably the simplest classification method to understand. Once the model space and distance measure are defined, its classification rule involves rather simple logic ... [Pg.393]
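
A minimal sketch of this rule, assuming Euclidean distance and a majority vote among the k nearest calibration samples (function and variable names are illustrative, not from the source):

```python
import numpy as np
from collections import Counter

def knn_classify(unknown, X_train, y_train, k=3):
    """Assign `unknown` to the majority class among its k nearest
    calibration samples (Euclidean distance; illustrative sketch).
    X_train: (n, p) array of known samples, y_train: length-n labels."""
    # Euclidean distance from the unknown to every calibration sample
    dists = np.sqrt(((X_train - unknown) ** 2).sum(axis=1))
    # Indices of the k closest calibration samples
    nearest = np.argsort(dists)[:k]
    # Majority vote among the k nearest neighbours
    return Counter(np.asarray(y_train)[nearest]).most_common(1)[0][0]
```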

HCA is a common tool that is used to determine the natural grouping of objects, based on their multivariate responses [75]. In PAT, this method can be used to determine natural groupings of samples or variables in a data set. Like the classification methods discussed above, HCA requires the specification of a space and a distance measure. However, unlike those methods, HCA does not involve the development of a classification rule, but rather a linkage rule, as discussed below. For a given problem, the selection of the space (e.g., original x variable space, PC score space) and distance measure (e.g., Euclidean, Mahalanobis) depends on the specific information that the user wants to extract. For example, for a spectral data set, one can choose PC score space with the Mahalanobis distance measure to better reflect separation that originates from both strong and weak spectral effects. [Pg.405]
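
In standard form, the Mahalanobis distance of a sample $\mathbf{x}$ from a class with mean vector $\boldsymbol{\mu}$ and covariance matrix $\mathbf{S}$ is

$$
d_M(\mathbf{x}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}}\, \mathbf{S}^{-1}\, (\mathbf{x} - \boldsymbol{\mu})}
$$

Because $\mathbf{S}^{-1}$ rescales each direction by its within-class variance, directions carrying weak but consistent spectral effects contribute on an equal footing with strong ones.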

Goodness Value Plot (Model and Sample Diagnostic). In prediction, a unanimous classification does not guarantee that an unknown is close to the samples in the predicted class, even if the classes were found to be well separated (refer back to Figure 4.44). Therefore, the goodness value in Equation 4.4 is used to evaluate the quality of the classification using a relative distance measure. The approach for validating the prediction is to evaluate the distance of the unknown to the predicted class relative to an internal measure of how diffuse the samples are in that class. [Pg.243]

Once the classification space is defined and a distance measure selected, a classification rule can be developed. At this time, the calibration data, which contain analytical profiles for samples of known class, are used to define the classification rule. Classification rules vary widely depending on the specific classification method chosen, but they essentially contain two components ... [Pg.289]

Once the classification space and the distance measure are defined, one must also define a linkage rule to be used in the HCA algorithm. A linkage rule refers to the specific means by which the distance between different clusters is calculated. Some examples of these are provided below ... [Pg.307]
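
Three widely used linkage rules (standard definitions, assumed here rather than quoted from the source) are single, complete, and average linkage. For two clusters A and B, with d the underlying sample-to-sample distance:

$$
\begin{aligned}
d_{\mathrm{single}}(A,B) &= \min_{a \in A,\, b \in B} d(a,b) \\
d_{\mathrm{complete}}(A,B) &= \max_{a \in A,\, b \in B} d(a,b) \\
d_{\mathrm{average}}(A,B) &= \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a,b)
\end{aligned}
$$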

Problem 4.8 Classification of Pottery from Pre-classical Sites in Italy, Using Euclidean and Mahalanobis Distance Measures... [Pg.261]

Perhaps a more useful means of quantifying structural data is to use a similarity measurement. These measures are reviewed by Ludwig and Reynolds (1988) and form the basis of multivariate clustering and ordination. Similarity measures can compare the presence of species at two sites, or compare a site to a predetermined set of species derived from historical data or to an artificial set composed of measurement endpoints from the problem formulation of an ecological risk assessment. The simplest similarity measures are binary in nature, but others can accommodate the number of individuals in each set. Related to similarity measures are distance measures. Distance measures, such as Euclidean distance, have the drawback of being sensitive to outliers, scale, transformations, and magnitudes. Distance measures form the basis of many classification and clustering techniques. [Pg.324]
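
A minimal sketch of a binary (presence/absence) similarity measure of this kind, here the Jaccard coefficient (the function and species data are illustrative assumptions):

```python
def jaccard_similarity(site_a, site_b):
    """Jaccard similarity between two presence/absence species sets:
    shared species divided by total distinct species."""
    a, b = set(site_a), set(site_b)
    return len(a & b) / len(a | b)

# Species observed at two hypothetical sites
print(jaccard_similarity({"mayfly", "caddisfly", "stonefly"},
                         {"mayfly", "midge"}))  # 1 shared / 4 total = 0.25
```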

For two-class problems (the most common ones), classification parameters can be defined using binary distance measures, based on the frequencies a, b, c, and d, which in this case may be interpreted as true positive (TP), false negative (FN), false positive (FP), and true negative (TN), respectively. [Pg.144]
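
For example, two standard classification parameters built from these frequencies (definitions assumed, not quoted from the source) are sensitivity and specificity:

$$
\mathrm{sensitivity} = \frac{a}{a+b} = \frac{TP}{TP+FN}, \qquad
\mathrm{specificity} = \frac{d}{c+d} = \frac{TN}{TN+FP}
$$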

Structure comparison methods are a way to compare three-dimensional structures. They are important for at least two reasons. First, they allow for inferring a similarity or distance measure to be used for the construction of structural classifications of proteins. Second, they can be used to assess the success of prediction procedures by measuring the deviation from a given standard of truth, usually given by the experimentally determined native protein structure. Formally, the problem of structure superposition is given as two sets of points in 3D space, each connected as a linear chain. The objective is to find a maximum number of point pairs, one from each of the two sets, such that an optimal translation and rotation of one of the point sets (structural superposition) minimizes the rms (root mean square deviation) between the matched points. Obviously, there are two contrary criteria to be optimized: the rms, to be minimized, and the number of matched residues, to be maximized. Clearly, a smaller number of residue pairs can be superposed with a smaller rms, and, equally clearly, a larger number of equivalent residues at a given rms is more indicative of significant overall structural similarity. [Pg.263]
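
For a fixed pairing of points, the optimal translation and rotation can be computed in closed form; below is a minimal sketch of the standard Kabsch (SVD-based) procedure (function name and array layout are illustrative assumptions):

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Minimum RMSD between two (n, 3) point sets with a fixed
    one-to-one pairing, after optimal translation and rotation
    (Kabsch algorithm)."""
    # Optimal translation: centre both point sets on their centroids
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Reflection correction keeps the result a proper rotation
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # RMSD between matched points after superposition
    return np.sqrt(((P @ R.T - Q) ** 2).sum() / len(P))
```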

Fig. 2.1. Outline of the hybrid algorithm. The unstructured array of sensors is clustered using multi-dimensional scaling (MDS) with a mutual information (MI) based distance measure. Then Vector Quantization (VQ) is used to partition the sensors into correlated groups. Each such group provides input to one module of an associative memory layer. VQ is used again to provide each module unit with a specific receptive field, i.e. to become a feature detector. Finally, classification is done by means of BCPNN.
Penny D (1982) Toward a basis for classification: the incompleteness of distance measures, incompatibility analysis and phenetic classification. J Theoret Biol 96:129-142. [Pg.69]

This method can be used for multicategory classifications; however, application to binary encoded infrared spectra gave poorer results than distance measurements to centres of gravity [356]. [Pg.23]

Classification by distance measurements to centres of gravity requires compact clusters, which are often not present in chemical applications. Usually, other classification methods give better results. Nevertheless, this simple and intuitive method serves as a standard for comparisons with more sophisticated pattern recognition methods. [Pg.29]
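
A minimal sketch of this rule, the nearest-centroid classifier, assuming Euclidean distance (names are illustrative):

```python
import numpy as np

def nearest_centroid(unknown, X_train, y_train):
    """Assign `unknown` to the class whose centre of gravity
    (mean pattern) is nearest in Euclidean distance."""
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    # Centre of gravity of each class in the pattern space
    centroids = np.array([X_train[y_train == c].mean(axis=0)
                          for c in classes])
    dists = np.sqrt(((centroids - unknown) ** 2).sum(axis=1))
    return classes[np.argmin(dists)]
```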

The learning machine was tested for many chemical applications of pattern recognition; in particular, the interpretation of mass spectra was investigated in detail [128]. The results are usually somewhat better than for classification by distance measurement. Some caution is necessary with results reported in early publications, because the restrictions of this method have not always been fully considered. The learning machine is advantageously used if the clusters are linearly separable and if the whole training set can be stored in the memory of the computer (the patterns are used more than once and in different sequences). [Pg.41]
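
A minimal sketch of the error-correction training typically used by such a learning machine, assuming two linearly separable classes labelled ±1 (all names and details are illustrative assumptions, not taken from the cited work):

```python
import numpy as np

def train_learning_machine(X, y, max_passes=100):
    """Error-correction training of a linear decision plane.
    X: (n, p) patterns, y: labels in {-1, +1}. Converges only
    if the two clusters are linearly separable."""
    # Augment each pattern with a constant term for the offset
    Xa = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xa.shape[1])
    for _ in range(max_passes):  # patterns are reused in each pass
        errors = 0
        for xi, yi in zip(Xa, y):
            if yi * (w @ xi) <= 0:   # pattern misclassified
                w += yi * xi         # correct the weight vector
                errors += 1
        if errors == 0:              # whole training set classified
            return w
    return w
```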

A KNN classification is almost identical to the interpretation of spectra by a library search. In a library search, an unknown spectrum (pattern) is compared with all spectra of known compounds collected in a spectral library. A similarity criterion or a dissimilarity criterion (equivalent to a distance measurement) between two spectra must be defined. To find the most similar spectra in the library, this criterion must be calculated for each library spectrum. [Pg.69]
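
A minimal sketch of such a library search, here using Euclidean distance as the dissimilarity criterion (the library layout and names are assumptions):

```python
import numpy as np

def library_search(unknown, library, names, n_hits=5):
    """Return the names of the `n_hits` library spectra most similar
    to the unknown spectrum (smallest Euclidean distance).
    library: (n, p) array, one spectrum per row; names: length-n list."""
    # Dissimilarity criterion evaluated against every library spectrum
    dists = np.sqrt(((library - unknown) ** 2).sum(axis=1))
    order = np.argsort(dists)[:n_hits]
    return [(names[i], dists[i]) for i in order]
```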

The following section gives examples of use cases that require a distance measure: clustering and classification of impedance spectra, investigation of systematic changes of a system, modeling of impedance spectra, and the comparison of impedance spectra on different frequency grids. [Pg.8]

Haddad et al. [8] developed a metric for odorant comparison based on a chemical space constructed from 1664 molecular descriptors. A refined version of this metric was devised following the elimination of redundant descriptors. The study included a comparison with models previously reported for nine datasets. The final, so-called multidimensional metric, based on Euclidean distances measured in a 32-descriptor space, was more efficient at classifying odorants than the previously reported reference models. Thus, this study demonstrated the use of structural similarity for the classification of odors in multidimensional space. [Pg.105]

