Principal component analysis description

PCR is a combination of PCA and MLR, which are described in Sections 9.4.4 and 9.4.3 respectively. First, a principal component analysis is carried out, which yields a loading matrix P and a scores matrix T as described in Section 9.4.4. For the ensuing MLR, only the PCA scores are used for modeling Y. The PCA scores are inherently uncorrelated, so they can be employed directly for MLR. A more detailed description of PCR is given in Ref. [5. ... [Pg.448]
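The two PCR steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the procedure of the cited sections: the function names `pcr_fit` and `pcr_predict`, the synthetic data, and the choice of SVD to obtain P and T are all assumptions made for the example.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: PCA on X, then least squares on the scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # SVD of the centered data; the leading right singular vectors are the loadings P.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T           # loading matrix, shape (variables, components)
    T = Xc @ P                        # scores matrix, shape (samples, components)
    # MLR step: regress centered y on the (uncorrelated) scores only.
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return {"x_mean": x_mean, "y_mean": y_mean, "P": P, "T": T, "q": q}

def pcr_predict(model, X):
    """Project new samples onto the loadings and apply the regression coefficients."""
    T_new = (X - model["x_mean"]) @ model["P"]
    return T_new @ model["q"] + model["y_mean"]

# Synthetic example (assumed data): six collinear variables driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = np.array([[1.0, 0.5, -1.0, 2.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0, -0.5, 2.0, -1.0]])
X = latent @ mixing + 0.01 * rng.normal(size=(50, 6))
y = latent[:, 0] - 2.0 * latent[:, 1]

model = pcr_fit(X, y, n_components=2)
gram = model["T"].T @ model["T"]      # off-diagonal entries are numerically zero
```

Because `gram` is diagonal up to rounding error, the regression on the scores does not suffer from the collinearity that would affect MLR on the original variables.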

The field points must then be fitted to predict the activity. There are generally far more field points than known compound activities to be fitted. The least-squares algorithms used in QSAR studies do not function for such an underdetermined system. A partial least squares (PLS) algorithm is used for this type of fitting. This method starts with matrices of field data and activity data. These matrices are then used to derive two new matrices containing a description of the system and the residual noise in the data. Earlier studies used a similar technique, called principal component analysis (PCA). PLS is generally considered to be superior. [Pg.248]

To reduce intensity effects, the data were normalized by reducing the area under each spectrum to a value of 1 [42]. Principal component analysis (PCA) was applied to the normalized data. This method is well suited to optimize the description of the fluorescence data sets by extracting the most useful data and rejecting the redundant ones [43]. From a data set, PCA assesses principal components and their corresponding spectral pattern. The principal components are used to draw maps that describe the physical and chemical variations observed between the samples. Software for PCA has been written by D. Bertrand (INRA Nantes) and is described elsewhere [44]. [Pg.283]
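The normalization and PCA steps described above might look as follows. This is a sketch on synthetic two-band spectra, not the cited software: the band positions, the helper names `trapezoid_area` and `normalize_area`, and the use of the trapezoidal rule for the area are assumptions.

```python
import numpy as np

def trapezoid_area(spectra, wavelengths):
    """Area under each row of `spectra` by the trapezoidal rule."""
    dx = np.diff(wavelengths)
    return ((spectra[:, 1:] + spectra[:, :-1]) / 2.0 * dx).sum(axis=1)

def normalize_area(spectra, wavelengths):
    """Scale each spectrum so that the area under it equals 1."""
    return spectra / trapezoid_area(spectra, wavelengths)[:, None]

# Synthetic spectra (assumed): two emission bands mixed in varying proportions,
# with an overall intensity factor that differs from sample to sample.
wavelengths = np.linspace(300.0, 500.0, 201)
band1 = np.exp(-((wavelengths - 340.0) / 15.0) ** 2)
band2 = np.exp(-((wavelengths - 420.0) / 20.0) ** 2)
fraction = np.linspace(0.1, 0.9, 8)[:, None]
intensity = np.linspace(1.0, 5.0, 8)[:, None]
spectra = intensity * (fraction * band1 + (1.0 - fraction) * band2)

normalized = normalize_area(spectra, wavelengths)

# PCA by SVD of the mean-centered, normalized spectra.
centered = normalized - normalized.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * S                        # sample coordinates used to draw the maps
patterns = Vt                         # spectral pattern of each principal component
explained = S**2 / np.sum(S**2)       # fraction of variance per component
```

In this constructed case the intensity factor disappears after normalization, so a single principal component (the band ratio) carries essentially all of the remaining variance.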

Chapter 3 starts with the first and probably most important multivariate statistical method, principal component analysis (PCA). PCA is mainly used for mapping or summarizing the data information. Many ideas presented in this chapter, like the selection of the number of principal components (PCs), or the robustification of PCA, apply in a similar way to other methods. Section 3.8 discusses briefly related methods for summarizing and mapping multivariate data. The interested reader may consult extended literature for a more detailed description of these methods. [Pg.18]

All of the compounds measured in the monitoring program are listed in the report by Thrane (VI). Table I lists the compounds which were selected as variables for the cluster analysis. Feature (i.e. attribute) selection for the cluster analysis was partially based upon the results of a principal component analysis (Henry, 12). Additional features were included if (1) the compound occurred in relatively large concentrations, or (2) if a compound was known to have adverse health effects. Wind direction, wind speed, and temperature were recorded as ordered variables. The chemical measurements were taken at five locations. The sites and the methods and techniques used to collect the data are described in detail in the report by Thrane. [Pg.139]

Basic Concepts. The goal of factor and components analysis is to simplify the quantitative description of a system by determining the minimum number of new variables necessary to reproduce various attributes of the data. Principal components analysis attempts to maximally reproduce the variance in the system while factor analysis tries to maximally reproduce the matrix of correlations. These procedures reduce the original data matrix from one having m variables necessary to describe the n samples to a matrix with p components or factors (p < m)... [Pg.26]

There are apparently many multivariate statistical methods partly overlapping in scope [11]. For most problems occurring in practice, we have found the use of two methods sufficient, as discussed below. The first method is called principal component analysis (PCA) and the second is the partial least-squares projection to latent structures (PLS). A detailed description of the methods is given in Appendix A. In the following, a brief description is presented. [Pg.300]

Principal components analysis. There are innumerable excellent descriptions of the mathematical basis of PCA [26-30] and this article will provide only a general overview. It is important, first, not to confuse algorithms, which are a means to an end, with the end in itself. There are several PCA algorithms, of which NIPALS (described in Appendix A2.1) and SVD are two of the most common. If correctly applied, they will both lead to the same answer (within computer precision), the best approach depending on factors such as computing power and the number of components to be calculated. [Pg.9]
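The agreement between the two algorithms can be checked numerically. The following is a textbook-style NIPALS sketch written for this illustration, not the version given in the cited appendix; the function name `nipals_pca`, the starting vector, the tolerance, and the test data are assumptions. Scores and loadings are compared in absolute value because the sign of each component is arbitrary.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-12, max_iter=1000):
    """PCA by NIPALS: extract one component at a time and deflate the residual."""
    E = X - X.mean(axis=0)                        # residual matrix, starts as centered X
    scores, loadings = [], []
    for _ in range(n_components):
        t = E[:, np.argmax(E.var(axis=0))].copy() # start from the highest-variance column
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)                 # project residual onto current scores
            p /= np.linalg.norm(p)                # normalize the loading vector
            t_new = E @ p                         # update the scores
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        E = E - np.outer(t, p)                    # remove the extracted component
        scores.append(t)
        loadings.append(p)
    return np.column_stack(scores), np.column_stack(loadings)

# Assumed test data: four variables with clearly different variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4)) * np.array([10.0, 3.0, 1.0, 0.3])

T_nipals, P_nipals = nipals_pca(X, n_components=3)

# Reference solution from the singular value decomposition.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
T_svd = (U * S)[:, :3]
P_svd = Vt[:3].T
```

NIPALS computes only as many components as requested, which is the practical reason to prefer it when a few components of a large matrix are needed; SVD returns all of them at once.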

Because of the limited space we focus on a user-oriented description of basic aspects of principal component analysis (PCA). PCA is an excellent tool for exploratory data analysis in chemistry. A number of surveys on the subject have already been published and it is strongly recommended to refer to a selection of them (ref. 1-7). [Pg.44]

A principal component analysis is reasonable only when the intrinsic dimensionality is much smaller than the dimensionality of the original data. This is the case for features related by high absolute values of the correlation coefficients. Whenever the correlation between features is small, a significant direction of maximum variance cannot be found (Fig. 3.7); all principal components participate in the description of the data structure, hence a reduction of data by principal component analysis is not possible. [Pg.54]
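A small numerical experiment illustrates the point. It is only a sketch on simulated data: the helper `explained_variance_ratio`, the sample size, and the noise level are assumptions chosen for the example.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component of autoscaled X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    s = np.linalg.svd(Z, compute_uv=False)
    return s**2 / np.sum(s**2)

rng = np.random.default_rng(1)
n = 500

# Five highly correlated features: one common factor plus a little noise.
common = rng.normal(size=(n, 1))
correlated = common + 0.1 * rng.normal(size=(n, 5))

# Five independent features: no dominant direction of variance.
uncorrelated = rng.normal(size=(n, 5))

ratio_corr = explained_variance_ratio(correlated)
ratio_uncorr = explained_variance_ratio(uncorrelated)
```

For the correlated set the first component carries nearly all of the variance, so one score can replace five features. For the independent set each component carries roughly one fifth, and dropping any of them discards a comparable share of the information.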

In the preceding description of the Mahalanobis distance, the number of coordinates in the distance metric is equal to the number of spectral frequencies. As discussed earlier in the section on principal component analysis, the intensities at many frequencies are dependent, and by using the full spectrum, we fit the noise in addition to the real information. In recent years, Mahalanobis distance has been defined with PCA or PLS scores instead of the spectral frequencies because these techniques eliminate or at least reduce most of the overfitting problem. The overall application of the Mahalanobis distance metric is the same except that the intensity values are replaced by the scores from PCA or PLS. An example of a Mahalanobis distance calculation on a set of Raman spectra for 25 carbohydrates is shown in Fig. 5-11. The 25 spectra were first subjected to PCA, and it was found that the first three principal components could account for most of the variance in the spectra. It was first assumed that all 25 spectra belonged to the same class because they were all carbohydrates. However, as shown in the three-dimensional plot in Fig. 5-11, the spectra can be clearly divided into three separate classes, with two of the spectra almost equal distance from each of the three classes. Most of the components in the upper left class in the two-dimensional plot were sugars; however, some sugars were found in the other two classes. For unknowns, scores have to be calculated from the principal components and processed in the same way as the spectral intensities. [Pg.289]
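A Mahalanobis distance on PCA scores can be sketched as below. The data are simulated and do not reproduce the carbohydrate spectra of the example; the function names and the choice of three components are assumptions. Because PCA scores are uncorrelated, their covariance matrix is diagonal and the distance reduces to a sum of squared scores, each divided by its variance.

```python
import numpy as np

def fit_pca_mahalanobis(X, n_components):
    """Return the mean, loadings, and score variances of a PCA model of X."""
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:n_components].T
    score_var = S[:n_components] ** 2 / (X.shape[0] - 1)
    return mean, P, score_var

def mahalanobis_from_scores(X_new, mean, P, score_var):
    """Project samples onto the loadings and compute the distance in score space."""
    T = (np.atleast_2d(X_new) - mean) @ P
    return np.sqrt(np.sum(T**2 / score_var, axis=1))

# Simulated "spectra" (assumed): 25 samples, 100 channels, three underlying factors.
rng = np.random.default_rng(2)
latent = rng.normal(size=(25, 3))
profiles = rng.normal(size=(3, 100))
X = latent @ profiles + 0.05 * rng.normal(size=(25, 100))

mean, P, score_var = fit_pca_mahalanobis(X, n_components=3)
d_train = mahalanobis_from_scores(X, mean, P, score_var)

# An unknown placed ten standard deviations along the first component.
x_out = mean + 10.0 * np.sqrt(score_var[0]) * P[:, 0]
d_out = mahalanobis_from_scores(x_out, mean, P, score_var)
```

The unknown is handled exactly as the text describes: its scores are computed from the stored loadings before the distance is evaluated, so only three coordinates enter the metric instead of 100 intensities.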

Comparison and ranking of sites according to chemical composition or toxicity is done by multivariate nonparametric or parametric statistical methods; however, only descriptive methods, such as multidimensional scaling (MDS), principal component analysis (PCA), and factor analysis (FA), show similarities and distances between different sites. Toxicity can be evaluated by testing the environmental sample (as an undefined complex mixture) against a reference sample and analyzing by inference statistics, for example, t-test or analysis of variance (ANOVA). [Pg.145]

There are several books on pattern recognition and multivariate analysis. An introduction to several of the main techniques is provided in an edited book [19]. For more in-depth statistical descriptions of principal components analysis, books by Jolliffe [20] and Mardia and co-authors [21] should be read. An early but still valuable book by Massart and Kaufmann covers more than just its title theme, cluster analysis [22], and provides clear introductory material. [Pg.11]

Multivariate curve resolution is the main topic of Malinowski's book [23]. The author is a physical chemist and so the book is oriented towards that particular audience, and especially relates to the spectroscopy of mixtures. It is well known because the first edition (in 1980) was one of the first major texts in chemometrics to contain formal descriptions of many common algorithms such as principal components analysis. [Pg.11]

Principal component analysis appears to be a good tool for finding strong correlations between geometrical descriptors, such as the correlation between N—C and C=C bond lengths in the enamine core, which is significant for the description of a... [Pg.158]

The basic ideas of principal component analysis are uncomplicated and easily understood from a geometric description. This is presented first. A brief account of the mathematics involved then follows. [Pg.35]

Statistical analyses. Three-way analyses of variance treating judges as a random effect were performed on each descriptive term using SAS Institute Inc. JMP 3.1 (Cary, North Carolina). Principal component analysis of the correlation matrix of the mean intensity ratings was performed with Varimax rotation. Over 200 GC peaks... [Pg.16]

The resulting matrix is then unfolded into a one-dimensional vector, which can be merged with the shape description, and is suitable for multivariate statistical analysis such as principal component analysis (PCA) and partial least squares (PLS). [Pg.108]

Multivariate statistical techniques are commonly employed in near-IR quantitative and qualitative analysis because these approaches have been proven useful for extracting desired information from near-IR spectra, which often contain up to 1200 wavelengths of observation per spectrum. Principal component analysis/principal component regression (PCA/PCR) is one such multivariate approach. Descriptions of this... [Pg.88]

The principles behind principal components analysis are most easily explained by means of a geometrical description of the method. From such a description it will then be evident how a principal components (PC) model can be used to simplify the problem of which test compounds should be selected. The data of the Lewis acids in Table 15.1 will be used to give an example of such selections, after the general presentation of the method which follows. [Pg.342]

Mathematical description of Factor Analysis and Principal Components Analysis... [Pg.354]

Big Chemical Encyclopedia
