Principal component analysis example

The dimensionality of a data set is the number of variables that are used to describe each object. For example, a conformation of a cyclohexane ring might be described in terms of the six torsion angles in the ring. However, it is often found that there are significant correlations between these variables. Under such circumstances, a cluster analysis is often facilitated by reducing the dimensionality of a data set to eliminate these correlations. Principal components analysis (PCA) is a commonly used method for reducing the dimensionality of a data set. [Pg.513]
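
As a rough illustration of this idea, the sketch below builds a toy data set of six ring torsion angles that all follow one underlying puckering coordinate, then estimates how much of the total variance the first principal component carries. The data are synthetic, and the function names (`synthetic_torsions`, `covariance_matrix`, `first_component`) are hypothetical. Power iteration is used only to keep the example free of external libraries; a real analysis would normally call a linear-algebra package.

```python
import math
import random


def synthetic_torsions(n_conformers=100, seed=0):
    """Toy data (not real conformations): six torsion angles in degrees that
    alternate in sign and share one puckering coordinate t, plus small noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_conformers):
        t = rng.uniform(-60.0, 60.0)
        rows.append([(1 if j % 2 == 0 else -1) * t + rng.gauss(0.0, 2.0)
                     for j in range(6)])
    return rows


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=200):
    """Power iteration for the largest eigenvalue and its unit eigenvector."""
    p = len(cov)
    vec = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        vec = [x / norm for x in w]
    value = sum(vec[i] * cov[i][j] * vec[j] for i in range(p) for j in range(p))
    return value, vec


rows = synthetic_torsions()
cov = covariance_matrix(rows)
eigenvalue, direction = first_component(cov)
explained = eigenvalue / sum(cov[j][j] for j in range(len(cov)))
print(f"PC1 carries {explained:.1%} of the total variance")
```

Because the six toy variables are almost perfectly correlated, a single new variable (the score along `direction`) retains nearly all of the information, which is the sense in which the dimensionality has been reduced from six to one.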

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of the distribution are assumed in the underlying statistical method. A non-parametric technique does not rely upon the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model. An unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]

A method of resolution that makes very few a priori assumptions is based on principal components analysis. The various forms of this approach are based on the self-modeling curve resolution developed in 1971 (55). The method requires a data matrix comprised of spectroscopic scans obtained from a two-component system in which the concentrations of the components are varying over the sample set. Such a data matrix could be obtained, for example, from a chromatographic analysis where spectroscopic scans are obtained at several points in time as an overlapped peak elutes from the column. [Pg.429]
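
A minimal sketch of why principal components suit this kind of data: if every scan is a mixture of two component spectra, the data matrix has essentially two significant principal components, however many wavelengths are recorded. The elution profiles, spectra and noise level below are invented for illustration, and the helper names are hypothetical. The sketch covers only the component-counting step, not the self-modeling curve resolution procedure itself.

```python
import math
import random


def gaussian(x, center, width):
    """Unnormalised Gaussian band shape."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def simulated_scans(n_times=15, n_channels=20, noise=0.001, seed=0):
    """Rows are scans in time; each is c1(t)*s1 + c2(t)*s2 plus small noise."""
    rng = random.Random(seed)
    s1 = [gaussian(w, 6, 3) for w in range(n_channels)]   # spectrum, component 1
    s2 = [gaussian(w, 13, 3) for w in range(n_channels)]  # spectrum, component 2
    data = []
    for t in range(n_times):
        c1, c2 = gaussian(t, 6, 2), gaussian(t, 8, 2)     # overlapped elution
        data.append([c1 * a + c2 * b + rng.gauss(0.0, noise)
                     for a, b in zip(s1, s2)])
    return data


def top_eigenvalues(data, k=3, iterations=300):
    """Largest k eigenvalues of data^T data (no mean-centering), found by
    power iteration with deflation."""
    p = len(data[0])
    gram = [[sum(row[i] * row[j] for row in data) for j in range(p)]
            for i in range(p)]
    values = []
    for _ in range(k):
        vec = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero start
        for _ in range(iterations):
            w = [sum(gram[i][j] * vec[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            vec = [x / norm for x in w]
        value = sum(vec[i] * gram[i][j] * vec[j]
                    for i in range(p) for j in range(p))
        values.append(value)
        # Deflate: remove the component just found before seeking the next.
        gram = [[gram[i][j] - value * vec[i] * vec[j] for j in range(p)]
                for i in range(p)]
    return values


eigenvalues = top_eigenvalues(simulated_scans())
print("three largest eigenvalues:", eigenvalues)
```

In this toy case the first two eigenvalues stand well above the third, which sits at the noise level; that gap is what signals a two-component system.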

Principal component analysis has been used in combination with spectroscopy in other types of multicomponent analyses. For example, compatible and incompatible blends of polyphenylene oxides and polystyrene were distinguished using Fourier-transform-infrared spectra (59). Raman spectra of sulfuric acid/water mixtures were used in conjunction with principal component analysis to identify different ions, compositions, and hydrates (60). The identity and number of species present in binary and tertiary mixtures of polycyclic aromatic hydrocarbons were determined using fluorescence spectra (61). [Pg.429]

How does principal component analysis work? Consider, for example, the two-dimensional distribution of points shown in Figure 7a. This distribution clearly has a strong linear component and is closer to a one-dimensional distribution than to a full two-dimensional distribution. However, from the one-dimensional projections of this distribution on the two orthogonal axes X and Y you would not know that. In fact, you would probably conclude, based only on these projections, that the data points are homogeneously distributed in two dimensions. A simple axes rotation is all it takes to reveal that the data points... [Pg.86]
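
The axes rotation can be sketched in a few lines of plain Python. The points below are hypothetical (scattered tightly around the line y = x, standing in for the distribution in the figure, which is not reproduced here), and `principal_axis_rotation` is an illustrative name. For two variables the rotation angle has a closed form, so no eigenvalue routine is needed.

```python
import math
import random


def principal_axis_rotation(points):
    """Return (theta, (var_x, var_y), (var_1, var_2)) for 2-D points, where
    theta rotates the X axis onto the direction of largest variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Closed-form angle that diagonalises a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Coordinates of every point along the rotated axes.
    along = [cos_t * x + sin_t * y for x, y in centered]
    across = [-sin_t * x + cos_t * y for x, y in centered]
    var_1 = sum(a * a for a in along) / (n - 1)
    var_2 = sum(b * b for b in across) / (n - 1)
    return theta, (sxx, syy), (var_1, var_2)


# Hypothetical data: points scattered tightly around the line y = x.
rng = random.Random(0)
points = []
for _ in range(200):
    t = rng.gauss(0.0, 1.0)
    points.append((t + rng.gauss(0.0, 0.1), t + rng.gauss(0.0, 0.1)))

theta, original, rotated = principal_axis_rotation(points)
print("rotation angle (degrees):", round(math.degrees(theta), 1))
print("variance along X and Y:", original)
print("variance along rotated axes:", rotated)
```

The variances along X and Y come out similar, which is why the two projections look uninformative, whereas after rotation almost all of the variance lies along the first axis. The total variance is unchanged by the rotation.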

We now consider a type of analysis in which the data (which may consist of solvent properties or of solvent effects on rates, equilibria, and spectra) again are expressed as a linear combination of products as in Eq. (8-81), but now the statistical treatment yields estimates of both a_i and x_i. This method is called principal component analysis or factor analysis. A key difference between multiple linear regression analysis and principal component analysis (in the chemical setting) is that regression analysis adopts chemical models a priori, whereas in factor analysis the chemical significance of the factors emerges (if desired) as a result of the analysis. We will not explore the statistical procedure, but will cite some results. We have already encountered examples in Section 8.2 on the classification of solvents and in the present section in the form of the Swain et al. treatment leading to Eq. (8-74). [Pg.445]

Because of the relatively small number of experiments done on commercial-scale equipment before submission, and the often very narrow factor ranges (Hi/Lo might differ by only 5-10%), if conditions are not truly under control, high-level models (multi-variate regressions, principal components analysis, etc.) will pick up spurious signals due to noise and unrecognized drift. For example, Fig. 4.43 summarizes the yields achieved for... [Pg.303]

Clustering or cluster analysis is used to classify objects, characterized by the values of a set of variables, into groups. It is therefore an alternative to principal component analysis for describing the structure of a data table. Let us consider an example. [Pg.57]

A special type of data pre-treatment is the transformation of data into a smaller number of new variables. Principal components analysis is a natural example and we have treated it in Section 36.2.3 as PCR. Another way to summarize a spectrum in a few terms is through Fourier analysis. McClure [29] has shown how a NIR... [Pg.373]

Two examples of unsupervised classical pattern recognition methods are hierarchical cluster analysis (HCA) and principal components analysis (PCA). Unsupervised methods attempt to discover natural clusters within data sets. Both HCA and PCA cluster data. [Pg.112]

Chapters 3-6 deal with direct mass spectrometric analysis highlighting the suitability of the various techniques in identifying organic materials using only a few micrograms of samples. Due to the intrinsic variability of artefacts produced in different places with more or less specific raw materials and technologies, complex spectra are acquired. Examples of chemometric methods such as principal components analysis (PCA) are thus discussed to extract spectral information for identifying materials. [Pg.515]

A sample may be characterized by the determination of a number of different analytes. For example, a hydrocarbon mixture can be analysed by use of a series of UV absorption peaks. Alternatively, in a sediment sample a range of trace metals may be determined. Collectively, these data represent patterns characteristic of the samples, and similar samples will have similar patterns. Results may be compared by vectorial presentation of the variables, when the variables for similar samples will form clusters. Hence the term cluster analysis. Where only two variables are studied, clusters are readily recognized in a two-dimensional graphical presentation. For more complex systems with more variables, i.e. n, the clusters will be in n-dimensional space. Principal component analysis (PCA) explores the interdependence of pairs of variables in order to reduce the number to certain principal components. A practical example could be drawn from the sediment analysis mentioned above. Trace metals are often attached to sediment particles by sorption on to the hydrous oxides of Al, Fe and Mn that are present. The Al content could be a principal component to which the other metal contents are related. Factor analysis is a more sophisticated form of principal component analysis. [Pg.22]

The authors did not attempt to address this issue. Although construct validity is important, it does not guarantee taxometric validity, so both issues must be examined, especially in the case of a null finding. For example, Franklin et al. could have performed a principal component analysis and examined loadings of the three indicators on the first unrotated component. As mentioned previously, these loadings can give a sense of indicator validity, and if the INTR failed to load sufficiently, this would indicate a measurement problem. [Pg.153]
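
A check of this kind might look like the following sketch: standardize the indicators, extract the first unrotated principal component of their correlation matrix, and read off each indicator's loading (eigenvector element times the square root of the eigenvalue). The three simulated indicators, including a deliberately weak one named `intr`, are invented for illustration and are not the Franklin et al. data; the function name is hypothetical.

```python
import math
import random


def first_component_loadings(columns, iterations=200):
    """Loadings of each variable on the first unrotated principal component
    of the correlation matrix (found here by power iteration)."""
    n, p = len(columns[0]), len(columns)
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        z.append([(v - mean) / sd for v in col])  # standardized scores
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    vec = [1.0] * p  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        vec = [x / norm for x in w]
    value = sum(vec[i] * corr[i][j] * vec[j]
                for i in range(p) for j in range(p))
    if sum(vec) < 0:  # the sign of a component is arbitrary; fix it for reading
        vec = [-x for x in vec]
    return [x * math.sqrt(value) for x in vec]


# Simulated indicators: two track a common factor, the third barely does.
rng = random.Random(0)
factor = [rng.gauss(0.0, 1.0) for _ in range(1000)]
ind_a = [f + rng.gauss(0.0, 0.5) for f in factor]
ind_b = [f + rng.gauss(0.0, 0.5) for f in factor]
intr = [0.1 * f + rng.gauss(0.0, 1.0) for f in factor]

loadings = first_component_loadings([ind_a, ind_b, intr])
print("loadings (ind_a, ind_b, intr):", [round(x, 2) for x in loadings])
```

In this simulation the two strong indicators load heavily on the first component while the weak one does not, which is the pattern that would point to a measurement problem with that indicator.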

In the previous examples and figures we indicated that functions for two independent variables can be selected. When three (or more) independent variables occur, advanced analysis tools, such as experimental design (see Section 2.4) or principal component analysis (Jackson, 1991), are required to determine the structure of the model. [Pg.55]

The constrained least-square method is developed in Section 5.3 and a numerical example treated in detail. Efficient specific algorithms taking errors into account have been developed by Provost and Allegre (1979). Literature abounds in alternative methods. Wright and Doherty (1970) use linear programming methods that are fast and offer an easy implementation of linear constraints but the structure of the data is not easily perceived and error assessment inefficiently handled. Principal component analysis (Section 4.4) is more efficient when the end-members are unknown. [Pg.9]

In a rare example which demonstrates the possibilities of the approach, Bürgi and Dubler-Steudler (1988a) have recently combined structure and reactivity data in a detailed study of the ring-inversion reaction of a homogeneous set of organometallic compounds. The reaction is the automerization of zircocene and hafnocene complexes [73 M = Zr or Hf, X = C or O], known from temperature-dependent NMR measurements to undergo the equilibration [73] ⇌ [73']. Principal-component analysis of... [Pg.135]

In some diseases a simple ordinal scale or a VAS scale cannot describe the full spectrum of the disease. There are many examples of this including depression and erectile dysfunction. Measurement in such circumstances involves the use of multiple ordinal rating scales, often termed items. A patient is scored on each item and the summation of the scores on the individual items represents an overall assessment of the severity of the patient's disease status at the time of measurement. Considerable amounts of work have to be done to ensure the validity of these complex scales, including investigations of their reproducibility and sensitivity to measuring treatment effects. It may also be important in international trials to assess to what extent there is cross-cultural uniformity in the use and understanding of the scales. Complex statistical techniques such as principal components analysis and factor analysis are used as part of this process and one of the issues that need to be addressed is whether the individual items should be given equal weighting. [Pg.280]

How is dimension reduction of chemical spaces achieved? There are a number of different concepts and mathematical procedures to reduce the dimensionality of descriptor spaces with respect to a molecular dataset under investigation. These techniques include, for example, linear mapping, multidimensional scaling, factor analysis, or principal component analysis (PCA), as reviewed in ref. 8. Essentially, these techniques either try to identify those descriptors among the initially chosen ones that are most important to capture the chemical information encoded in a molecular dataset or, alternatively, attempt to construct new variables from original descriptor contributions. A representative example will be discussed below in more detail. [Pg.282]

Principal components analysis can be best understood using a simple two-variable example. With only two variables it is possible to plot the row space without the need to reduce the number of variables. Although this does not fully present the utility of PCA, it is a good demonstration of how it functions. A two-dimensional plot of the row space of an example data set is shown in Figure 4.23. The data matrix consists of two columns, representing the two measurements, and 40 rows, representing the samples. Each row of the matrix is represented as a point (O) on the graph. [Pg.46]
