Supervised principal-component analysis

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of that distribution are assumed in the underlying statistical method. A non-parametric technique does not rely upon the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model. An unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]
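The distinction can be illustrated with a minimal sketch. Everything in it is assumed for illustration only: the data are synthetic, and the names `X`, `y`, `true_coef` and `beta` are not taken from the excerpt. The regression step uses the dependent variable `y`, whereas the principal components step sees only `X`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): 50 samples, 3 descriptors.
X = rng.normal(size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 0.1 * rng.normal(size=50)

# Supervised: multiple linear regression uses y to fit the coefficients.
A = np.column_stack([np.ones(len(X)), X])        # add an intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)     # beta[0] is the intercept

# Unsupervised: PCA looks only at X; y never enters the calculation.
Xc = X - X.mean(axis=0)                          # centre each column
_, sing_vals, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                               # principal component scores
explained = sing_vals**2 / np.sum(sing_vals**2)  # variance fraction per component
```

Here `beta[1:]` estimates the regression coefficients, while `scores` and `explained` describe the structure of `X` alone.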

Remote sensing techniques have been successfully applied for the identification of rocks in the Cape Smith fold belt region. Principal Component Analysis is very effective for the separation of gabbro, metabasalt and peridotite. Band Ratio was helpful for the preliminary identification of peridotite. The Supervised Classification approach is taken to verify the results obtained by Principal Component Analysis and Band Ratio. It is also useful for remapping the unknown regions once the results are verified. [Pg.488]

Cluster analysis is far from an automatic technique; each stage of the process requires many decisions and therefore close supervision by the analyst. It is imperative that the procedure be as interactive as possible. Therefore, for this study, a menu-driven interactive statistical package was written for PDP-11 and VAX (VMS and UNIX) series computers, which includes adequate computer graphics capabilities. The graphical output includes a variety of histograms and scatter plots based on the raw data or on the results of principal-components analysis or canonical-variates analysis (14). Hierarchical cluster trees are also available. All of the methods mentioned in this study were included as an integral part of the package. [Pg.126]

A generalised structure of an electronic nose is shown in Fig. 15.9. The sensor array may be QMB, conducting polymer, MOS or MS-based sensors. The data generated by each sensor are processed by a pattern-recognition algorithm and the results are then analysed. The ability to characterise complex mixtures without the need to identify and quantify individual components is one of the main advantages of such an approach. The pattern-recognition methods may be divided into non-supervised (e.g. principal component analysis, PCA) and supervised (e.g. artificial neural network, ANN) methods; a combination of both can also be used. [Pg.330]

Reasonable noise in the spectral data does not affect the clustering process. In this respect, cluster analysis is much more stable than other methods of multivariate analysis, such as principal component analysis (PCA), in which an increasing amount of noise is accumulated in the less relevant clusters. The mean cluster spectra can be extracted and used for the interpretation of the chemical or biochemical differences between clusters. HCA, per se, is ill-suited for a diagnostic algorithm. We have used the spectra from clusters to train artificial neural networks (ANNs), which may serve as supervised methods for final analysis. This process, which requires hundreds or thousands of spectra from each spectral class, is presently ongoing, and validated and blinded analyses, based on these efforts, will be reported. [Pg.194]

The vectors of means $\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)$ and deviations $s = (s_1, s_2, \ldots, s_p)$, and the matrices of covariances $S = (s_{ij})$ and correlations $R = (r_{ij})$, can be calculated. For this data matrix, the most used non-supervised methods are Principal Components Analysis (PCA) and/or Factorial Analysis (FA), in an attempt to reduce the dimensions of the data and study the interrelation between variables and observations, and Cluster Analysis (CA), to search for clusters of observations or variables (Krzanowski 1988; Cela 1994; Afifi and Clark 1996). Before applying these techniques, variables are usually first standardised to achieve a mean of 0 and unit variance. [Pg.694]
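The quantities named above can be computed in a few lines. The following is a minimal sketch on synthetic data, not the cited authors' procedure. The function names `summarise_and_standardise` and `pca_from_correlation` are hypothetical, and the data matrix is assumed to hold n observations in rows and p variables in columns.

```python
import numpy as np

def summarise_and_standardise(X):
    """Return means, standard deviations, covariance S, correlation R and
    the standardised matrix Z for an (n observations x p variables) array."""
    means = X.mean(axis=0)
    sds = X.std(axis=0, ddof=1)          # sample standard deviations
    S = np.cov(X, rowvar=False)          # p x p covariance matrix
    R = np.corrcoef(X, rowvar=False)     # p x p correlation matrix
    Z = (X - means) / sds                # each column: mean 0, unit variance
    return means, sds, S, R, Z

def pca_from_correlation(R):
    """Eigendecomposition of R, sorted by decreasing eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Synthetic data (assumed): four variables on very different scales.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4)) * np.array([1.0, 10.0, 100.0, 0.1])

means, sds, S, R, Z = summarise_and_standardise(X)
eigvals, loadings = pca_from_correlation(R)
scores = Z @ loadings                    # PCA scores of the standardised data
```

After standardisation the covariance matrix of `Z` equals the correlation matrix `R` of the original data, which is why PCA on standardised variables is often described as PCA on the correlation matrix.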

Fig. 5. An example of a scores plot as one might obtain in a principal components analysis. Distinct clustering or grouping of NMR spectra is observed in this type of plot, where the discrimination results from the analyzed metric used (e.g., principal components). The distance between samples (r ) within groups is used by many supervised methods to further describe and improve class or group separation. There are different chemometric techniques that can be used to identify outliers, or to provide a group assignment.

A whole spectrum of statistical techniques has been applied to the analysis of DNA microarray data [26-28]. These include clustering analysis (hierarchical, K-means, self-organizing maps), dimension reduction (singular value decomposition, principal component analysis, multidimensional scaling, or correspondence analysis), and supervised classification (support vector machines, artificial neural networks, discriminant methods, or between-group analysis) methods. More recently, a number of Bayesian and other probabilistic approaches have been employed in the analysis of DNA microarray data [11]. Generally, the first phase of microarray data analysis is exploratory data analysis. [Pg.129]

Various chemometrics methods are used in those works, according to the aim of the studies. Generally speaking, chemometrics methods can be divided into two types: unsupervised and supervised methods (Mariey et al., 2001). The objective of unsupervised methods is to extrapolate the odor fingerprinting data without prior knowledge about the bacteria studied. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) are major examples of unsupervised methods. Supervised methods, on the other hand, require prior knowledge of the sample identity. With a set of well-characterized samples, a model can be trained so that it can predict the identity of unknown samples. Discriminant analysis (DA) and artificial neural network (ANN) analysis are major examples of supervised methods. [Pg.206]
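The supervised workflow described here (train on well-characterized samples, then predict unknowns) can be sketched with a two-class Fisher linear discriminant, one simple form of discriminant analysis. This is only an illustration under stated assumptions: the two-dimensional data are synthetic, and the helper names `fit_lda` and `predict_lda` are hypothetical rather than drawn from the cited work.

```python
import numpy as np

def fit_lda(X, labels):
    """Two-class Fisher discriminant: returns weight vector and threshold."""
    X0, X1 = X[labels == 0], X[labels == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)     # discriminant direction
    threshold = w @ (m0 + m1) / 2.0      # midpoint of the projected class means
    return w, threshold

def predict_lda(X, w, threshold):
    """Assign class 1 when the projection exceeds the threshold, else class 0."""
    return (X @ w > threshold).astype(int)

# Synthetic training set with known identities (assumed for illustration).
rng = np.random.default_rng(2)
class0 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(40, 2))
class1 = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(40, 2))
X_train = np.vstack([class0, class1])
y_train = np.array([0] * 40 + [1] * 40)

w, threshold = fit_lda(X_train, y_train)

# Predict the identity of two samples whose class is treated as unknown.
unknown = np.array([[0.2, -0.1], [2.9, 3.2]])
predicted = predict_lda(unknown, w, threshold)
```

The class labels in `y_train` are the prior knowledge that an unsupervised method such as PCA or HCA would not use.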

We have employed a variety of unsupervised and supervised pattern recognition methods, such as principal component analysis, cluster analysis, the k-nearest neighbour method, linear discriminant analysis, and logistic regression analysis, to study such reactivity spaces. We have published a more detailed description of these investigations. As a result, functions could be developed that use the values of the chemical effects calculated by the methods mentioned in this paper. These functions allow the calculation of the reactivity of each individual bond of a molecule. [Pg.354]

In our applications in combinatorial chemistry, independent variables are molecular descriptors, and their values may be taken either from the building blocks or from the compounds in the library themselves. Important methods of non-supervised learning are principal component analysis (PCA) and cluster analysis. [Pg.295]

The first two parts of this section describe supervised learning methods which may be used for the analysis of classified data. One technique, discriminant analysis, is related to regression while the other, SIMCA, has similarities with principal component analysis (PCA). The final part of this section discusses some of the conditions which data should meet when analysed by discriminant techniques. [Pg.139]
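The similarity between SIMCA and PCA can be made concrete with a simplified sketch: a separate PCA model is fitted to each class, and a new sample is assigned to the class whose model leaves the smallest residual. This is an assumption-laden illustration, not the full method. SIMCA as usually described also applies statistical limits to the residuals, which are omitted here, and the data and the names `fit_class_model`, `residual_distance` and `classify` are hypothetical.

```python
import numpy as np

def fit_class_model(X, n_components):
    """PCA model for one class: class mean plus leading loadings."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def residual_distance(x, model):
    """Distance from x to the class's principal-component subspace."""
    mean, P = model
    centred = x - mean
    reconstructed = (centred @ P.T) @ P   # projection onto the class model
    return np.linalg.norm(centred - reconstructed)

def classify(x, models):
    """Return the best-fitting class name and all residual distances."""
    distances = {name: residual_distance(x, m) for name, m in models.items()}
    return min(distances, key=distances.get), distances

# Synthetic classes (assumed): each lies close to a different line in 3-D.
rng = np.random.default_rng(3)
t = rng.normal(scale=2.0, size=(30, 1))
class_a = t * np.array([1.0, 0.0, 0.0]) + 0.05 * rng.normal(size=(30, 3))
u = rng.normal(scale=2.0, size=(30, 1))
class_b = (np.array([0.0, 0.0, 5.0]) + u * np.array([0.0, 1.0, 0.0])
           + 0.05 * rng.normal(size=(30, 3)))

models = {"A": fit_class_model(class_a, 1), "B": fit_class_model(class_b, 1)}
label, distances = classify(np.array([2.0, 0.0, 0.05]), models)
```

Because each class has its own principal-component model, the classes may differ in both location and direction of variation, which a single global PCA would not capture.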

Mass spectrometry and chemometric methods cover very diverse fields. Different origins of enzymes can be disclosed with LC-MS and multivariate analysis [45]. Pyrolysis mass spectrometry and chemometrics have been applied for quality control of paints [46] and food analysis [47]. Olive oils can be classified by analyzing volatile organic hydrocarbons (of benzene type) with headspace-mass spectrometry and CA as well as PCA [48]. Differentiation and classification of wines can similarly be solved with headspace-mass spectrometry using unsupervised and supervised principal component analyses (SIMCA = soft independent modeling of class analogy) [49]. Early prediction of wheat quality is possible using mass spectrometry and multivariate data analysis [50]... [Pg.163]

The supervised pattern recognition methods include K nearest neighbor method (KNN), principal component analysis (PCA), Fisher... [Pg.191]

Big Chemical Encyclopedia
