Unsupervised cluster analysis

Another useful method for sample selection is cluster analysis-based selection.3 4,67 In this method, it is typical to start with a compressed PCA representation of the calibration data. An unsupervised cluster analysis (Section 8.6.3.1) is then performed, and the algorithm is stopped once a specified number of clusters has been obtained. A single sample is then selected from each cluster to represent it in the final calibration data set. This selection is often based on the maximum distance from the overall data mean, but distances to the individual cluster means can be used instead. [Pg.313]
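The selection procedure can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the method of the cited references: the PCA is computed by an SVD of the mean-centred data, the clustering is assumed to be average-linkage agglomerative clustering stopped at the requested number of clusters (the source only points to Section 8.6.3.1), and each cluster's representative is its member farthest from the overall mean in score space. All function names are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components):
    """Mean-centre X and project it onto its first n_components principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the loadings (principal directions), ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def agglomerate(T, n_clusters):
    """Average-linkage agglomerative clustering, stopped once n_clusters remain (assumed choice)."""
    clusters = [[i] for i in range(len(T))]
    D = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the two closest clusters
        del clusters[b]
    return clusters

def select_representatives(X, n_select, n_components=2):
    """Return one sample index per cluster: the member farthest from the overall mean."""
    T = pca_scores(X, n_components)
    centre = T.mean(axis=0)  # overall mean in score space (zero after centring)
    chosen = []
    for members in agglomerate(T, n_select):
        dist = np.linalg.norm(T[members] - centre, axis=1)
        chosen.append(members[int(np.argmax(dist))])
    return sorted(chosen)
```

Swapping `centre` for each cluster's own mean gives the alternative variant mentioned in the text; the rest of the procedure is unchanged.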

Other classical unsupervised cluster analysis methods rely on mathematical indicators, such as distances, to quantify the similarity among pixel spectra. Thus, each pixel can be viewed as a point in the space of the original wavenumbers or in other spaces, for example the PC space. The coordinates of a pixel are then its spectral readings at the different wavenumbers (in the original image space) or its scores (in the PC space). Similar pixels should lie close together in the reference space, and therefore distance measures, such as the Euclidean distance, can be used to assess this proximity ... [Pg.81]
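For pixels i and j with coordinates x_ik (the reading at wavenumber k, or the k-th PC score), the Euclidean distance is d(i, j) = sqrt(sum over k of (x_ik - x_jk)^2). The sketch below, a minimal NumPy illustration with hypothetical function names, unfolds an image cube of shape (rows, cols, wavenumbers) into one spectrum per pixel and computes all pairwise distances in both spaces.

```python
import numpy as np

def unfold(cube):
    """Reshape an image cube (rows, cols, wavenumbers) into a pixels x wavenumbers matrix."""
    r, c, w = cube.shape
    return cube.reshape(r * c, w)

def euclidean_distances(X):
    """Pairwise Euclidean distances between the rows (pixel spectra or score vectors) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def pc_scores(X, n_components):
    """PCA scores of the mean-centred pixel spectra, computed via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

If all components are kept (`n_components = min(X.shape)`), the distances in PC space equal those in the original wavenumber space, because mean-centring is a translation and the projection is a rotation; keeping fewer components gives an approximation that discards the low-variance directions.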

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one that assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of that distribution are assumed by the underlying statistical method. A non-parametric technique does not rely on the assumption of any particular distribution. A supervised learning method is one that uses information about the dependent variable to derive the model; an unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]

Analytical results are often represented in a data table, e.g., a table of the fatty acid compositions of a set of olive oils. Such a table is called a two-way multivariate data table. Because some olive oils may originate from the same region and others from different ones, the table has to be studied as a whole rather than as a collection of individual samples; that is, the results for each sample are interpreted in the context of the results obtained for the other samples. For example, one may look for natural groupings of the samples into clusters sharing a common property, namely a similar fatty acid composition. This is the objective of cluster analysis (Chapter 30), which is one of the techniques of unsupervised pattern recognition. The results of the clustering do not depend on how the data are arranged in the table, i.e., on the order of the objects (rows) or of the fatty acids (columns). In fact, the order of the variables or objects has no particular meaning. [Pg.1]
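For distance-based clustering, the order independence follows from the distance matrix itself, as the short NumPy sketch below illustrates. The fatty acid values are invented for illustration only and are not measurements; the function name is hypothetical.

```python
import numpy as np

# Hypothetical fatty acid table: rows = oils, columns = % palmitic, oleic, linoleic.
# The values are invented for illustration only.
table = np.array([
    [11.0, 75.0, 8.0],
    [12.0, 73.5, 9.0],
    [15.5, 65.0, 14.0],
    [16.0, 64.0, 15.0],
])

def distance_matrix(X):
    """Euclidean distances between all pairs of rows (objects) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

D = distance_matrix(table)

# Reordering the fatty acids (columns) leaves every inter-sample distance unchanged.
col_perm = [2, 0, 1]
D_cols = distance_matrix(table[:, col_perm])

# Reordering the oils (rows) only relabels the entries of the distance matrix.
row_perm = [3, 1, 0, 2]
D_rows = distance_matrix(table[row_perm])

print(np.allclose(D, D_cols))                              # True
print(np.allclose(D[np.ix_(row_perm, row_perm)], D_rows))  # True
```

Any method that works only from these distances therefore sees the same information whatever the table layout; in practice, tie-breaking or order-dependent initialisation in a particular implementation can still introduce minor order effects.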

In the following sections we present typical methods of unsupervised learning and pattern recognition, whose aim is to detect patterns in chemical, physicochemical and biological data rather than to predict biological activity. These inductive methods are useful for generating hypotheses and models, which are then to be verified (or falsified) by statistical inference. Cluster analysis has... [Pg.397]

Two examples of classical unsupervised pattern recognition methods are hierarchical cluster analysis (HCA) and principal components analysis (PCA). Unsupervised methods attempt to discover natural clusters within data sets, and both HCA and PCA can be used to reveal such clusters. [Pg.112]

Inhomogeneities in data can be studied by cluster analysis. Cluster analysis can reveal groupings of both objects and variables without any prior information on the type and number of groups (unsupervised learning, unsupervised pattern recognition). [Pg.256]

Figure 20.13 Unsupervised hierarchical cluster analysis of nine FFPE leiomyomas from 1990-2002 and one FFPE sarcoma from 1980. Reproduced with permission from Reference 22.

Unsupervised hierarchical cluster analysis showed clear separation between the sarcoma and the leiomyomas but did not reveal associations among the leiomyomas based on storage time, possibly indicating that individual differences exceeded any differences caused by storage (Fig. 20.13). [Pg.361]

Hierarchical cluster analysis (HCA) is an unsupervised technique that examines the interpoint distances between all of the samples and represents that information in the form of a two-dimensional plot called a dendrogram. These dendrograms present the data from high-dimensional row spaces in a form that facilitates the use of human pattern-recognition abilities. [Pg.216]
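The information a dendrogram displays is the sequence of merges and the distance (height) at which each merge occurs. The sketch below computes that sequence from all interpoint distances; average linkage is an assumed choice, since the text does not name a linkage rule, and `hca_merges` is a hypothetical name. In practice, `scipy.cluster.hierarchy.linkage` and `scipy.cluster.hierarchy.dendrogram` provide the same computation and the plot.

```python
import numpy as np

def hca_merges(X):
    """Average-linkage agglomerative clustering of the rows of X.

    Returns (members_a, members_b, height) tuples in merge order; the heights
    are the vertical positions of the joins in the corresponding dendrogram.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # all interpoint distances
    clusters = [(i,) for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance between members of the two clusters.
                h = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merges.append((clusters[a], clusters[b], float(h)))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```

For the one-dimensional samples 0.0, 0.1, 5.0 and 5.2, the first two pairs join at low heights and the final join, between the two pairs, occurs at a much greater height; that large gap is what the eye picks up in the dendrogram.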

Pattern recognition methods can be classified according to several criteria. Below we discuss only the supervised/unsupervised dichotomy, because it represents two different ways of analyzing hyperspectral data cubes. In contrast to supervised classification, unsupervised methods (cluster analysis) classify image pixels using only their spectra, without calibration. Feature extraction methods [21] such as PCA or wavelet compression are often applied before cluster analysis. [Pg.418]
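A minimal sketch of this workflow: unfold the cube, extract features with PCA, cluster the pixels, and fold the labels back into an image. The choice of k-means with a deterministic farthest-point initialisation is an assumption made for illustration (the text does not prescribe a clustering algorithm), and the function names are hypothetical.

```python
import numpy as np

def kmeans(T, k, n_iter=100):
    """Plain k-means on the rows of T with deterministic farthest-point initialisation."""
    idx = [0]
    for _ in range(1, k):
        # Distance from every row to its nearest already-chosen seed.
        d = np.linalg.norm(T[:, None, :] - T[idx][None, :, :], axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = T[idx].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(T[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)  # nearest centroid for each pixel
        new = np.array([T[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

def segment_cube(cube, k, n_components=3):
    """Cluster the pixels of a (rows, cols, wavenumbers) cube; returns a label image."""
    r, c, w = cube.shape
    X = cube.reshape(r * c, w)            # unfold: one spectrum per pixel
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = Xc @ Vt[:n_components].T          # feature extraction: PCA scores
    return kmeans(T, k).reshape(r, c)     # cluster label for every pixel
```

No reference spectra or calibration enter the computation; the resulting label image must still be interpreted afterwards, for example by inspecting the mean spectrum of each cluster.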

Unsupervised learning methods:
- cluster analysis
- display methods
- nonlinear mapping (NLM)
- minimal spanning tree (MST)
- principal components analysis (PCA)
Finding structures/similarities (groups, classes) in the data... [Pg.7]

The answers to these questions will usually be given by so-called unsupervised learning or unsupervised pattern recognition methods. These methods may also be called grouping methods or automatic classification methods because they search for classes of similar objects (see cluster analysis) or classes of similar features (see correlation analysis, principal components analysis, factor analysis). [Pg.16]

Analysis of variance generally serves as a statistical test of the influence of random or systematic factors on measured data (test for random or fixed effects). Here one wants to test whether the feature mean values of two or more classes differ. The classes of objects or clusters of data may be given a priori (supervised learning) or found in the course of a learning process (unsupervised learning; see Section 5.3, cluster analysis). In the first case, analysis of variance is used to confirm the class pattern. [Pg.182]
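For a single feature and fixed classes, the one-way ANOVA statistic compares the spread of the class means with the spread inside the classes. The pure-Python sketch below computes F = MS_between / MS_within; the function name is hypothetical, and `scipy.stats.f_oneway` returns the same statistic together with a p-value.

```python
def one_way_anova_f(groups):
    """F statistic for equality of class means of one feature (one-way ANOVA, fixed effects)."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-class sum of squares: spread of class means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-class sum of squares: spread of values around their own class mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```

The result is compared with the critical value of the F distribution with (k - 1, n - k) degrees of freedom. When the classes were themselves found by clustering on the same feature, the test tends to overstate significance, because the grouping was chosen to separate the means.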

The principle of unsupervised learning consists in partitioning a data set into small groups that reflect groupings not known in advance [YARMUZA, 1980] (see also Section 5.3). The results obtained with methods of hierarchical agglomerative cluster analysis (see also [HENRION et al., 1987]) were representative of the wide range of mathematical algorithms available for cluster analysis. [Pg.256]

In this passage we demonstrate that comparable results may also be obtained when other methods of unsupervised learning are applied to the environmental data shown above, e.g. the non-hierarchical cluster algorithm CLUPOT [COOMANS and MASSART, 1981] or the computation of the minimal spanning tree [LEBART et al., 1984], a procedure closely related to cluster analysis. [Pg.256]
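The minimal spanning tree connects all objects with the smallest possible total edge length. The sketch below builds it with Prim's algorithm on Euclidean distances and shows its link to cluster analysis: deleting the n_clusters - 1 longest tree edges leaves connected components that coincide with single-linkage clusters (ties aside). CLUPOT is not reproduced here, and the function names are hypothetical.

```python
import math

def minimal_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, length) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known edge from the growing tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_clusters(points, n_clusters):
    """Cut the n_clusters - 1 longest MST edges; label each point by its component."""
    edges = sorted(minimal_spanning_tree(points), key=lambda e: e[2])
    keep = edges[: len(points) - n_clusters]
    label = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while label[i] != i:
            label[i] = label[label[i]]
            i = label[i]
        return i

    for i, j, _ in keep:
        label[find(i)] = find(j)
    return [find(i) for i in range(len(points))]
```

For example, `mst_clusters([(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)], 3)` places the first two points together, the next two together, and leaves the last point on its own.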

The preliminary analyses carried out on the spectral datasets were functional group mapping and/or hierarchical cluster analysis (HCA). The latter method, which is well described in the literature,4,9 is an unsupervised approach that does not require any reference datasets. Like most multivariate methods, HCA is based on the correlation matrix Cut for all spectra in the dataset. This matrix, defined by Equation (9.1),... [Pg.193]
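Since Equation (9.1) is not reproduced here, the sketch below assumes the Pearson correlation between every pair of spectra and converts it into a distance, 1 - correlation, which can then be fed to an agglomerative clustering routine. This is an illustrative assumption; the source's own definition may differ, and the function names are hypothetical.

```python
import numpy as np

def spectral_correlation_matrix(spectra):
    """Pearson correlation between every pair of spectra (rows = spectra, columns = wavenumbers)."""
    return np.corrcoef(spectra)  # rowvar=True by default: each row is treated as one variable

def correlation_distance(spectra):
    """Distance in [0, 2]: 0 for perfectly correlated spectra, 2 for perfectly anticorrelated ones."""
    return 1.0 - spectral_correlation_matrix(spectra)
```

A useful property of this choice is that a spectrum and a scaled, offset copy of it (for example, a thicker sample region with a baseline shift) have correlation 1 and therefore distance 0, so the clustering responds to spectral shape rather than intensity.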

Use of an unsupervised algorithm to do the job of a supervised algorithm. For example, a cluster analysis or self-organizing map is used in combination with a post hoc analysis to do prediction. [Pg.101]

A more formal method of treating samples is unsupervised pattern recognition, mainly consisting of cluster analysis. Many methods have their origins in numerical taxonomy. [Pg.183]

Unsupervised pattern recognition differs from exploratory data analysis in that the aim of the methods is to detect similarities, whereas with EDA there is no particular prejudice as to whether or how many groups will be found. Cluster analysis is described in more detail in Section 4.4. [Pg.184]

The chemist also wishes to relate samples in a similar manner. Can protein sequences from different animals be related, and does this tell us about the molecular basis of evolution? Can the chemical fingerprints of wines be related, and does this tell us about the origins and taste of a particular wine? Unsupervised pattern recognition employs a number of methods, primarily cluster analysis, to group different samples (or objects) using chemical measurements. [Pg.224]

Big Chemical Encyclopedia
