Unsupervised Pattern Recognition Cluster Analysis

The chemist also wishes to relate samples in a similar manner. Can protein sequences from different animals be related, and does this tell us about the molecular basis of evolution? Can the chemical fingerprint of wines be related, and does this tell us about the origins and taste of a particular wine? Unsupervised pattern recognition employs a number of methods, primarily cluster analysis, to group different samples (or objects) using chemical measurements. [Pg.224]

The first step is to determine the similarity between objects. Table 4.16 consists of six objects, 1-6, and five measurements, A-E. What are the similarities between the objects? Each object has a relationship to the remaining five objects. How can a numerical value of similarity be defined? A similarity matrix can be obtained, in which the similarity between each pair of objects is calculated using a numerical indicator. Note that it is possible to preprocess data prior to calculation of a number of these measures (see Section 4.3.6). [Pg.224]

Four of the most popular ways of determining how similar objects are to each other are as follows. [Pg.225]

Correlation coefficient between samples. A correlation coefficient of 1 implies that samples have identical characteristics, which all objects have with themselves. Some workers use the square or absolute value of a correlation coefficient, and it depends on the precise physical interpretation as to whether negative correlation coefficients imply similarity or dissimilarity. In this text we assume that the more negative is the correlation coefficient, the less similar are the objects. The correlation matrix is presented in Table 4.17. Note that the top right-hand side is not presented as it is the same as the bottom left-hand side. The higher is the correlation coefficient, the more similar are the objects. [Pg.225]
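As a rough illustration of the correlation-based similarity matrix described above, the following sketch computes the lower triangle of a correlation matrix between objects. The data values are hypothetical (Table 4.16 is not reproduced in this excerpt), and the helper names `pearson` and `correlation_matrix` are assumptions introduced only for this example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(objects):
    """Lower triangle (including diagonal) of correlations between objects (rows).

    The upper triangle is omitted because the matrix is symmetric.
    """
    return [[pearson(objects[i], objects[j]) for j in range(i + 1)]
            for i in range(len(objects))]

# Hypothetical data: 3 objects x 5 measurements (NOT the values of Table 4.16).
data = [
    [0.9, 0.5, 0.2, 1.6, 1.5],
    [1.8, 1.0, 0.4, 3.2, 3.0],  # exactly 2 x object 1, so their correlation is 1
    [1.5, 1.6, 0.2, 0.5, 0.9],
]

sim = correlation_matrix(data)
for row in sim:
    print([round(r, 3) for r in row])
```

Each diagonal entry equals 1 because every object is perfectly correlated with itself. Object 2 is a scaled copy of object 1, so that pair also has a correlation of 1 even though the raw values differ, which is one reason correlation and distance measures can rank similarities differently.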

Euclidean distance. The distance between two samples k and l is defined by [Pg.225]
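The formula itself is missing from this excerpt. The standard definition of the Euclidean distance, which is presumably what is meant, can be written as below. The symbols are assumptions chosen for illustration: x_{kj} is the value of measurement j for sample k, and J is the number of measurements.

```latex
d_{kl} = \sqrt{\sum_{j=1}^{J} \left( x_{kj} - x_{lj} \right)^{2}}
```

In contrast to the correlation coefficient, a smaller Euclidean distance indicates greater similarity, and the distance of any sample to itself is 0.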

Analytical results are often represented in a data table, e.g., a table of the fatty acid compositions of a set of olive oils. Such a table is called a two-way multivariate data table. Because some olive oils may originate from the same region and others from a different one, the complete table has to be studied as a whole instead of as a collection of individual samples, i.e., the results of each sample are interpreted in the context of the results obtained for the other samples. For example, one may ask for natural groupings of the samples in clusters with a common property, namely a similar fatty acid composition. This is the objective of cluster analysis (Chapter 30), which is one of the techniques of unsupervised pattern recognition. The results of the clustering do not depend on the way the results have been arranged in the table, i.e., the order of the objects (rows) or the order of the fatty acids (columns). In fact, the order of the variables or objects has no particular meaning. [Pg.1]

Inhomogeneities in data can be studied by cluster analysis. By means of cluster analysis both structures of objects and variables can be found without any pre-information on type and number of groupings (unsupervised learning, unsupervised pattern recognition). [Pg.256]

The answers to these questions will usually be given by so-called unsupervised learning or unsupervised pattern recognition methods. These methods may also be called grouping methods or automatic classification methods because they search for classes of similar objects (see cluster analysis) or classes of similar features (see correlation analysis, principal components analysis, factor analysis). [Pg.16]

A more formal method of treating samples is unsupervised pattern recognition, mainly consisting of cluster analysis. Many methods have their origins in numerical taxonomy. [Pg.183]

Unsupervised pattern recognition differs from exploratory data analysis in that the aim of the methods is to detect similarities, whereas using EDA there is no particular prejudice as to whether or how many groups will be found. Cluster analysis is described in more detail in Section 4.4. [Pg.184]

In the narrow sense, cluster analysis should not be confused with classification methods, where unknown objects are assigned to existing classes. Cluster analyses belong to the methods of unsupervised learning or unsupervised pattern recognition. [Pg.172]

One possibility to speed up the search is preliminary sorting of the data sets. Here, the methods of unsupervised pattern recognition are used, for example, principal component and factor analysis, cluster analysis, or neural networks (cf. Sections 5.2 and 8.2). The unknown spectrum is then compared with every class separately. [Pg.288]

Unsupervised pattern recognition normally consists of cluster analysis. Using a couple of dozen descriptors of molecules, it is possible to see which activities of molecules are most similar and draw a picture of these similarities, called a dendrogram, in which more closely related activities are closer to each other... [Pg.191]

In the following sections we propose typical methods of unsupervised learning and pattern recognition, the aim of which is to detect patterns in chemical, physicochemical and biological data, rather than to make predictions of biological activity. These inductive methods are useful in generating hypotheses and models which are to be verified (or falsified) by statistical inference. Cluster analysis has... [Pg.397]

Two examples of unsupervised classical pattern recognition methods are hierarchical cluster analysis (HCA) and principal components analysis (PCA). Unsupervised methods attempt to discover natural clusters within data sets. Both HCA and PCA cluster data. [Pg.112]

The two pattern recognition techniques used in this work are among those usually used for unsupervised learning. The results will be examined for the clusters which arise from the analysis of the data. On the other hand, the number of classes and a rule for assigning compounds to each had already been determined by the requirements of the mixture analysis problem. One might suppose that a supervised approach would be more suitable. In our case, this is not so because our aim is not to develop a classifier. Instead, we wish to examine the data base of FTIR spectra and the metric to see if they are adequate to help solve a more difficult problem, that of analyzing complex mixtures by class. [Pg.161]

Hierarchical cluster analysis (HCA) is an unsupervised technique that examines the interpoint distances between all of the samples and represents that information in the form of a two-dimensional plot called a dendrogram. These dendrograms present the data from high-dimensional row spaces in a form that facilitates the use of human pattern-recognition abilities. [Pg.216]
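The idea of building a dendrogram from interpoint distances can be sketched as follows. This is a minimal illustration, not the method of any of the cited texts: it assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members), which is only one of several possible linkage criteria. The sample coordinates and the function name `single_linkage` are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def single_linkage(points):
    """Agglomerative clustering with single linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    which is the information a dendrogram displays.
    """
    clusters = [(i,) for i in range(len(points))]  # start: one cluster per sample
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical samples described by two measurements each.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]

merges = single_linkage(samples)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Each merge corresponds to one junction in the dendrogram, and the recorded distance is the height at which the two branches join. The exhaustive pair search keeps the sketch short but scales poorly; for real data sets a library implementation would normally be preferred.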

Pattern recognition can be classified according to several parameters. Below we discuss only the supervised/unsupervised dichotomy because it represents two different ways of analyzing hyperspectral data cubes. Unsupervised methods (cluster analysis) classify image pixels without calibration and with spectra only, in contrast to supervised classifications. Feature extraction methods [21] such as PCA or wavelet compression are often applied before cluster analysis. [Pg.418]

We have employed a variety of unsupervised and supervised pattern recognition methods such as principal component analysis, cluster analysis, k-nearest neighbour method, linear discriminant analysis, and logistic regression analysis, to study such reactivity spaces. We have published a more detailed description of these investigations. As a result of this, functions could be developed that use the values of the chemical effects calculated by the methods mentioned in this paper. These functions allow the calculation of the reactivity of each individual bond of a molecule. [Pg.354]

Statistical pattern recognition is based on the statistical nature of signals, and extracted features are represented as probability density functions (Schalkoff, 1992). It therefore requires knowledge of a priori probabilities of measurements and features. Statistical approaches include linear discriminant functions, Bayesian functions and cluster analysis, and may be unsupervised or supervised. Supervised classifiers require a set of exemplars for each class to be recognized; they are used to train the system. Unsupervised learning, on the other hand, does not require an exemplar set. [Pg.90]

Big Chemical Encyclopedia