
Principal component analysis linear dimensionality reduction

Principal component analysis (PCA) can be considered the mother of all methods in multivariate data analysis. The aim of PCA is dimension reduction, and it is the most frequently applied method for computing linear latent variables (components). PCA can be seen as a method to compute a new coordinate system formed by the latent variables, which is orthogonal and in which only the most informative dimensions are used. Latent variables from PCA optimally represent the distances between the objects in the high-dimensional variable space (recall that the distance between objects is considered an inverse measure of their similarity). PCA considers all variables and accommodates the total data structure; it is a method for exploratory data analysis (unsupervised learning) and can be applied to practically any X-matrix. No y-data (properties) are considered and therefore none are necessary. [Pg.73]
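To make this view of PCA as an orthogonal change of coordinates concrete, the following is a minimal sketch (not taken from the cited source) that computes scores and loadings of a mean-centered data matrix via singular value decomposition; the function name, toy data, and parameter choices are illustrative assumptions.

```python
import numpy as np

def pca(X, n_components=2):
    """Minimal PCA sketch: project mean-centered data onto the leading
    right singular vectors, which form the orthogonal latent coordinate system."""
    Xc = X - X.mean(axis=0)               # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T        # orthonormal basis of the latent space
    scores = Xc @ loadings                # coordinates of the objects in that basis
    explained = s[:n_components]**2 / np.sum(s**2)
    return scores, loadings, explained

# toy data: 10 objects described by 4 correlated variables
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2)) @ rng.normal(size=(2, 4))
scores, loadings, explained = pca(X, n_components=2)
print(explained)  # fraction of total variance captured by each component
```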

How is dimension reduction of chemical spaces achieved? There are a number of different concepts and mathematical procedures to reduce the dimensionality of descriptor spaces with respect to a molecular dataset under investigation. These techniques include, for example, linear mapping, multidimensional scaling, factor analysis, and principal component analysis (PCA), as reviewed in ref. 8. Essentially, these techniques either try to identify those descriptors among the initially chosen ones that are most important for capturing the chemical information encoded in a molecular dataset or, alternatively, attempt to construct new variables from original descriptor contributions. A representative example will be discussed below in more detail. [Pg.282]

Since SOMs are capable of projecting compound distributions in high-dimensional descriptor spaces onto two-dimensional arrays of nodes, this methodology is also useful as a dimension reduction technique, similar to others discussed above. SOM projections and the relationships they establish are usually nonlinear, in contrast to, for example, principal component analysis (which, as discussed, generates a smaller number of new composite descriptors as linear combinations of the original ones). [Pg.26]
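For concreteness, here is a minimal self-organizing map sketch in plain NumPy illustrating the nonlinear projection onto a two-dimensional grid of nodes described above; the grid size, learning-rate schedule, and neighbourhood width are illustrative choices, not values from the cited work.

```python
import numpy as np

def train_som(X, grid=(6, 6), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal SOM sketch: a grid of codebook vectors is pulled toward the
    data, with neighbours of the winning node moving together, yielding a
    nonlinear 2-D projection of the input space."""
    rng = np.random.default_rng(seed)
    h, w = grid
    codebook = rng.normal(size=(h * w, X.shape[1]))
    # (row, col) position of every node on the 2-D grid
    pos = np.array([(i, j) for i in range(h) for j in range(w)], dtype=float)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(np.sum((codebook - x) ** 2, axis=1))  # best-matching unit
        frac = t / n_iter
        lr = lr0 * (1 - frac)                    # decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.5        # shrinking neighbourhood
        d2 = np.sum((pos - pos[bmu]) ** 2, axis=1)
        theta = np.exp(-d2 / (2 * sigma ** 2))   # neighbourhood function
        codebook += lr * theta[:, None] * (x - codebook)
    return codebook, pos

def project(X, codebook, pos):
    """Map each compound to the grid coordinates of its winning node."""
    bmus = np.argmin(((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1), axis=1)
    return pos[bmus]

# usage: map 10-D toy descriptors onto a 6x6 grid
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
codebook, pos = train_som(X)
coords = project(X, codebook, pos)   # 2-D grid coordinates per compound
```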

Principal component analysis is a popular statistical method that tries to explain the covariance structure of data by means of a small number of components. These components are linear combinations of the original variables, and often allow for an interpretation and a better understanding of the different sources of variation. Because PCA is concerned with data reduction, it is widely used for the analysis of high-dimensional data, which are frequently encountered in chemometrics. PCA is then often the first step of the data analysis, followed by classification, cluster analysis, or other multivariate techniques [44]. It is thus important to find those principal components that contain most of the information. [Pg.185]
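A short sketch of this common workflow, PCA as a first data-reduction step followed by cluster analysis, assuming scikit-learn is available; the synthetic dataset and all parameter choices below are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# synthetic high-dimensional data with two hidden groups in a 5-D latent space
rng = np.random.default_rng(1)
latent = np.vstack([rng.normal(0.0, 1.0, (50, 5)),
                    rng.normal(3.0, 1.0, (50, 5))])
X = latent @ rng.normal(size=(5, 40))           # 100 objects x 40 correlated variables

Xs = StandardScaler().fit_transform(X)          # autoscale the variables
scores = PCA(n_components=3).fit_transform(Xs)  # reduce to 3 informative components
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
print(labels[:50].mean(), labels[50:].mean())   # the two halves land in different clusters
```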

The variables (wavelengths) associated with the IR emission spectra were highly correlated. Principal components analysis (PCA) and linear and nonlinear PLS showed that at least 86% of the total variance could be explained by the two primary latent dimensions. The forward and reverse modelling results showed that dimensional reduction with a linear model (PLS) produced better models than a nonlinear model (a multilayer perceptron neural network trained with the back-propagation algorithm) applied without dimensional reduction. [Pg.450]
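The cited study's data and models are not reproduced here, but a comparison of the same flavour can be sketched with scikit-learn: a linear PLS model with two latent dimensions against a multilayer perceptron fitted on the raw correlated variables. The synthetic "spectra" and all settings below are illustrative assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 2))                 # two underlying dimensions
X = latent @ rng.normal(size=(2, 100))             # 100 highly correlated "wavelengths"
X += 0.05 * rng.normal(size=X.shape)               # measurement noise
y = latent @ np.array([1.5, -2.0])                 # property driven by the latent dims

pls = PLSRegression(n_components=2)                # linear model with built-in reduction
mlp = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)

# on this toy problem the linear, reduced model usually scores higher
print("PLS R2:", cross_val_score(pls, X, y, cv=5).mean())
print("MLP R2:", cross_val_score(mlp, X, y, cv=5).mean())
```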

The objective of a principal component analysis (PCA) is to transform a number of correlated variables into a smaller set of new, uncorrelated variables (factors or latent variables). The first few factors should then explain most of the relevant variation in the data set. To allow this reduction in dimensionality, the variables must be at least partially correlated. The new variables can then be generated through a linear combination of the original variables, i.e. the original matrix X is the product of a score matrix P and the transpose of the loading matrix A: X = PA^T. [Pg.704]
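A numerical check of this decomposition, using the same symbols as above (scores P, loadings A): a minimal sketch in which the singular value decomposition supplies the factorization and the reconstruction X = PA^T is verified on centered toy data of known rank.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 8))   # rank-3 data, 8 variables
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
P = U[:, :k] * s[:k]        # score matrix P   (n objects  x k factors)
A = Vt[:k].T                # loading matrix A (m variables x k factors)

print(np.allclose(Xc, P @ A.T))   # True: X = PA^T holds (up to rounding) at full rank k
```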

Factor analysis, a family of data reduction techniques, is often performed to reduce the amount of data. Customarily, principal components analysis, principal factor analysis, or common factor analysis is performed on the data to extract factors or scores that best represent either the variation or the similarity between the data populations of the variables. For example, principal components analysis reassembles the data as linear combinations of the original variables so that the largest variance in the data corresponds to the first principal component. Each subsequent principal component is orthogonal to the previous components and represents the largest remaining variance in the data. The maximum number of principal components allowed is equal to the number of variables measured; it maintains the data structure but does not reduce the dimensionality of the data. Typically, the smallest set of principal components necessary to represent some large percentage of the total variance in the data is used for further analyses. A number of tests have been developed to determine the number of principal components to retain [49,50]. [Pg.228]
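One simple and widely used rule, retaining the smallest set of components that explains a chosen percentage of the total variance, can be sketched as follows; the 90% target and the toy data are illustrative assumptions, not a recommendation from refs. [49,50].

```python
import numpy as np

def n_components_for(X, target=0.90):
    """Smallest number of principal components whose cumulative
    explained variance reaches the target fraction."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = s**2 / np.sum(s**2)
    return int(np.searchsorted(np.cumsum(ratio), target) + 1)

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 4)) @ rng.normal(size=(4, 30)) + 0.1 * rng.normal(size=(60, 30))
print(n_components_for(X, target=0.90))   # small, since the signal has rank 4
```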

This discussion will focus on two main techniques to perform the reduction: (1) principal component analysis and (2) factor analysis. Both of these techniques attempt to find an appropriate low-dimensional representation of the covariance matrix. Other approaches, such as multidimensional scaling, non-linear mapping, and Kohonen networks, are reviewed briefly in this section and discussed in greater detail in Section 5. [Pg.748]

Linear approaches to spectral dimensionality reduction make the assumption that the data lie on or near a low-dimensional subspace. In such cases, linear spectral dimensionality reduction methods seek to learn the basis vectors of this low-dimensional subspace so that the input data can be projected onto the linear subspace. The two main methods for linear spectral dimensionality reduction, Principal Components Analysis and Multidimensional Scaling, are both described in this section. Although more powerful nonlinear approaches have been presented in recent years, these linear techniques are still widely used and are worthy of attention since they provide the basis for some of the subsequent nonlinear spectral dimensionality reduction algorithms. [Pg.9]
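As a concrete instance of the second method, here is a minimal sketch of classical (metric) multidimensional scaling: the squared distance matrix is double-centered and the leading eigenvectors give the low-dimensional coordinates. The sanity check exploits the fact that genuinely planar data are recovered exactly (up to rotation and reflection).

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Classical MDS sketch: recover low-dimensional coordinates whose
    Euclidean distances reproduce the given distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered inner products
    w, V = np.linalg.eigh(B)                  # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:n_components]  # take the largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# sanity check: distances between the embedded points match the input
rng = np.random.default_rng(5)
X = rng.normal(size=(15, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y = classical_mds(D, 2)
D2 = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(np.allclose(D, D2))   # True: planar data are recovered exactly
```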

An n × m matrix can be considered as n points in the m-dimensional space (or m points in the n-dimensional space). The points can be projected into a lower-dimensional subspace (of dimension smaller than n or m, whichever is smaller) using suitable techniques such as PCA; therefore, PCA is often called a projection method. By projecting the points, dimension reduction of the data can be achieved. The principal components are often called underlying components; their values are the scores. The principal components are, in fact, linear combinations of the original variables. PCA is an unsupervised method of pattern recognition in the sense that no grouping of the data has to be known before the analysis. Still, the data structure can be revealed easily and class membership is easy to assign. [Pg.148]
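A brief sketch of this unsupervised use of PCA, assuming scikit-learn and its bundled wine dataset: no class labels enter the projection, yet the scores of the first two components already separate the three cultivars.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

data = load_wine()
Xs = StandardScaler().fit_transform(data.data)     # 178 objects x 13 variables
scores = PCA(n_components=2).fit_transform(Xs)     # project onto 2 latent dimensions

# no class labels were used above, yet the score space separates the cultivars
for c in np.unique(data.target):
    m = scores[data.target == c].mean(axis=0)
    print(f"class {c}: mean score = ({m[0]:+.2f}, {m[1]:+.2f})")
```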

