PCA and cluster analysis

Cluster analysis (which is covered extensively in Chapter 30) can be performed on the factor scores of a data table using a reduced number of factors (Section 31.1.4) rather than on the data table itself. This way, one can apply cluster analysis on the structural information only, while disregarding the noise or artefacts in the data. The number of structural factors may be determined by means of internal [Pg.156]
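As a rough illustration of clustering on factor scores rather than on the raw table, the sketch below projects a small synthetic data table onto its first two principal components and then clusters the scores. Everything here is an assumption made for the example, not the method of the cited text: the synthetic data, the power-iteration eigenvector routine, the choice of two factors, and the use of k-means with a farthest-point start. All function names are hypothetical.

```python
import math
import random

def centre(data):
    """Subtract each column mean so that every variable has mean zero."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]

def covariance_matrix(X):
    """Sample covariance matrix of an already centred table X."""
    n, p = len(X), len(X[0])
    return [[sum(r[i] * r[j] for r in X) / (n - 1) for j in range(p)]
            for i in range(p)]

def matvec(C, v):
    return [sum(c * x for c, x in zip(row, v)) for row in C]

def leading_eigenvectors(C, k, iters=500):
    """Approximate the k leading eigenvectors of a symmetric matrix
    by power iteration with deflation (assumed adequate for small tables)."""
    C = [row[:] for row in C]
    p = len(C)
    rng = random.Random(0)
    vectors = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = matvec(C, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(C, v)))
        vectors.append(v)
        # Deflate: remove the component just found before the next search.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return vectors

def kmeans(points, k, iters=50):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points,
                           key=lambda q: min(math.dist(q, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(q, centres[c]))
                  for q in points]
        for c in range(k):
            members = [q for q, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Synthetic table: three groups of five samples, four variables, small noise.
rng = random.Random(1)
group_centres = [(0, 0, 0, 0), (4, 4, 0, 0), (0, 0, 4, 0)]
data = [[g + rng.gauss(0, 0.1) for g in gc]
        for gc in group_centres for _ in range(5)]

X = centre(data)
loadings = leading_eigenvectors(covariance_matrix(X), k=2)
# Factor scores: coordinates of each sample on the two retained factors.
scores = [[sum(x * v for x, v in zip(row, vec)) for vec in loadings]
          for row in X]
labels = kmeans(scores, k=3)
```

Only the two retained score columns enter the clustering step, so variation confined to the discarded components (here, mostly the added noise) does not influence the grouping.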

Since the PCA and cluster analysis results were similar for the three sites and since one emission source has been suggested (12) as the source of many of the species detected in Western Washington rain, an analysis of the regional similarities in composition was appropriate. [Pg.42]

Throughout this chapter, reference will be made to techniques and approaches described elsewhere in this book, and a certain familiarity with these topics will be assumed: methods of representing molecular conformation and different coordinate systems (Chapter 1), ways of dealing with symmetry aspects (Chapter 2), data retrieval from the Cambridge Structural Database (CSD; Chapter 3) [3], and multivariate statistical techniques such as principal component analysis (PCA) and cluster analysis (CA; Chapter 4). [Pg.338]

To meet the first objective, the water sample data were subjected to principal components analysis (PCA) and cluster analysis (CA) (see Appendix 2). These calculations were used to group formation waters into water types. A water type is defined in this paper as a formation water with a distinctive composition in terms of mole ratios of dissolved ions (rather than absolute concentrations of dissolved ions). Eight water types were identified and Table 3 shows the water type of each sampled well in the study area. [Pg.290]

Despite the success of these techniques with other spectroscopic data, very little has been published on their use with Raman data. The aforementioned work on postconsumer plastic identification by Allen et al. [43] utilized KNN for their analysis, although they present little of the actual classification results. Similarly, Krizova et al. [54] simply state that the SIMCA analysis of Norway spruce needles gave results similar to those of PCA and cluster analysis studies. More detail was given by Daniel et al. [52] when comparing KNN and ANN for analysis of explosive materials. [Pg.311]

The multivariate techniques which reveal underlying factors, such as principal component factor analysis (PCA), soft independent modeling of class analogy (SIMCA), partial least squares (PLS), and cluster analysis, work optimally if each measurement or parameter is normally distributed in the measurement space. Frequency histograms should be calculated to check the normality of the data to be analyzed. Skewed distributions are often observed in atmospheric studies due to the process of mixing of plumes with ambient air. [Pg.36]
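A quick normality check of the kind described above might look like the following sketch. The frequency histogram follows the text; the moment-based skewness coefficient is an added assumption (the excerpt mentions only histograms), and the two synthetic samples and all function names are hypothetical.

```python
import random

def histogram(values, bins=10):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    return counts

def skewness(values):
    """Moment-based skewness: near 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)
symmetric = [rng.gauss(0.0, 1.0) for _ in range(2000)]      # roughly normal sample
skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]  # long right tail

for name, sample in (("symmetric", symmetric), ("skewed", skewed)):
    print(name, histogram(sample), round(skewness(sample), 2))
```

A histogram that piles up in the first few bins with a long tail to the right, together with a clearly positive skewness, suggests that the variable departs from normality.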

Principal component analysis (PCA), factor analysis (FA) and cluster analysis (CA) are some of the most widely used multivariate analysis techniques applied to... [Pg.167]

Preference mapping can be accomplished with projection techniques such as multidimensional scaling and cluster analysis, but the following discussion focuses on principal components analysis (PCA) [69] because of the interpretability of the results. A PCA represents a multivariate data table, e.g., N rows (molecules) and K columns (properties), as a projection onto a low-dimensional table so that the original information is condensed into usually 2-5 dimensions. The principal components scores are calculated by forming linear combinations of the original variables (i.e., properties). These are the coordinates of the objects (molecules) in the new low-dimensional model plane (or hyperplane) and reveal groups of similar... [Pg.332]

The vectors of means $\bar{\mathbf{x}} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)$ and deviations $\mathbf{s} = (s_1, s_2, \ldots, s_p)$, and matrices of covariances $S = (s_{ij})$ and correlations $R = (r_{ij})$ can be calculated. For this data matrix, the most used non-supervised methods are Principal Components Analysis (PCA), and/or Factorial Analysis (FA) in an attempt to reduce the dimensions of the data and study the interrelation between variables and observations, and Cluster Analysis (CA) to search for clusters of observations or variables (Krzanowski 1988; Cela 1994; Afifi and Clark 1996). Before applying these techniques, variables are usually first standardised to achieve a mean of 0 and unit variance. [Pg.694]
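The quantities named above can be computed in a few lines. The sketch below is illustrative only: the small data matrix is invented, the function names are hypothetical, and sample (n - 1) estimators are assumed for the deviations and covariances.

```python
from statistics import mean, stdev

def covariance(x, y):
    """Sample covariance of two equally long columns (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def summarise(data):
    """Return column means, standard deviations, covariance matrix S and correlation matrix R."""
    cols = [list(c) for c in zip(*data)]
    p = len(cols)
    means = [mean(c) for c in cols]
    devs = [stdev(c) for c in cols]
    S = [[covariance(cols[i], cols[j]) for j in range(p)] for i in range(p)]
    R = [[S[i][j] / (devs[i] * devs[j]) for j in range(p)] for i in range(p)]
    return means, devs, S, R

def standardise(data):
    """Scale every variable to mean 0 and unit variance."""
    means, devs, _, _ = summarise(data)
    return [[(v - m) / s for v, m, s in zip(row, means, devs)] for row in data]

# Invented example: 4 observations (rows) of 2 variables (columns).
data = [[1.0, 10.0], [2.0, 14.0], [3.0, 15.0], [4.0, 21.0]]
means, devs, S, R = summarise(data)
Z = standardise(data)
z_means, z_devs, S_z, _ = summarise(Z)
```

After standardisation the covariance matrix of the scaled variables coincides with the correlation matrix R of the original ones, which is why PCA on standardised data is equivalent to PCA on R.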

The contents of Sr, Cu, Mg and Zn in the serum of patients with coronary heart disease and of normal persons were determined by ICP-AES [19]. The data were evaluated by using ordinary principal component analysis, cluster analysis and stepwise discrimination analysis. It was found that ordinary principal component analysis and cluster analysis could not give satisfactory results, with four samples misclassified. There were five samples misclassified in stepwise discrimination analysis. These data sets were treated by PP PCA and SVD. The PC1-PC2 plot of PP classification shown in Figure 8 has only two samples misclassified. The results further demonstrate that PP PCA is preferable to the traditional SVD algorithm. [Pg.176]

Exploratory data analysis is a common preliminary step in all QSAR/QSPR studies. In particular, Principal Component Analysis (PCA) and clustering methods... [Pg.1251]

To establish a correlation between the concentrations of different kinds of nucleosides in a complex metabolic system and normal or abnormal states of human bodies, computer-aided pattern recognition methods are required (15, 16). Different kinds of pattern recognition methods based on multivariate data analysis such as principal component analysis (PCA) (8), partial least squares (16), stepwise discriminant analysis, and canonical discriminant analysis (10, 11) have been reported. Linear discriminant analysis (17, 18) and cluster analysis were also investigated (19,20). Artificial neural network (ANN) is a branch of chemometrics that resolves regression or classification problems. The applications of ANN in separation science and chemistry have been reported widely (21-23). For pattern recognition analysis in clinical study, ANN was also proven to be a promising method (8). [Pg.244]

Among the mathematical tools to investigate patterns and clustering behaviour in data sets, two techniques are widely established, namely principal component analysis and cluster analysis. Both can be used to reduce the dimensionality of a problem; for example, cluster analysis can be used for variable or descriptor selection from a larger set. On the other hand, cluster analysis may be used to investigate similarity among compounds. Cluster analysis is often used as a complement to PCA. [Pg.365]

The profiles of hydrocarbons, PCBs, organochlorine pesticides, and sterols in sediments from the northwestern Mediterranean Sea were studied by PCA, hierarchical cluster analysis, and positive matrix factorization [70]. Three sources could be distinguished: anthropogenic, consisting mostly of PAHs; rivers, containing mostly n-alkanes, pesticides, and sterols; and an unspecified background source, containing just some n-alkanes. [Pg.84]

British and Italian researchers have reported on the use of an electronic nose for the detection of moulds in libraries and archives [38]. The aim was to ascertain whether the device could be suitable for detecting mould activity on paper. It was found that it was possible to discriminate in vitro between mould-affected and unaffected paper samples at both 100% and 75% relative humidity by measuring the odor fingerprint. Three different species of actively growing fungi were detected, and cluster analysis allowed differentiation between specific species. However, PCA indicated that only samples analyzed at 100% RH could be separated, suggesting that further research is required before electronic nose technology could be applied. [Pg.184]

The next step consists of applying multivariate statistical methods to find key features involving molecules, descriptors and anticancer activity. The methods include principal component analysis (PCA), hierarchical cluster analysis (HCA), the K-nearest neighbor method (KNN), the soft independent modeling of class analogy method (SIMCA) and stepwise discriminant analysis (SDA). The analyses were performed on a data matrix of dimension 25 rows (molecules) x 1700 columns (descriptors), not shown for convenience. For further study of the methodology applied, standard books are available, such as (Varmuza & Filzmoser, 2009) and (Manly, 2004). [Pg.188]

In our applications in combinatorial chemistry, independent variables are molecular descriptors, and their values may be taken either from the building blocks or from the compounds in the library themselves. Important methods of non-supervised learning are principal component analysis (PCA) and cluster analysis. [Pg.295]
