Principal component analysis structure

To gain insight into chemometric methods such as correlation analysis, Multiple Linear Regression Analysis, Principal Component Analysis, Principal Component Regression, and Partial Least Squares regression/Projection to Latent Structures... [Pg.439]

Spectral features and their corresponding molecular descriptors are then applied to mathematical techniques of multivariate data analysis, such as principal component analysis (PCA) for exploratory data analysis or multivariate classification for the development of spectral classifiers [84-87]. Principal component analysis results in a scatter plot that exhibits spectra-structure relationships by clustering similarities in spectral and/or structural features [88, 89]. [Pg.534]

Schultz TW, Moulton MR. Structure-toxicity relationships of selected naphthalene derivatives. 2. Principal components analysis. Bull Environ Contam Toxicol 1985; 34: 1-9. [Pg.491]

The concept of property space, which was coined to quantitatively describe phenomena in the social sciences [11, 12], has found many applications in computational chemistry to characterize chemical space, i.e. the range in structure and properties covered by a large collection of different compounds [13]. The usual method to approach a quantitative description of chemical space is first to calculate a number of molecular descriptors for each compound and then to use multivariate analyses such as principal component analysis (PCA) to build a multidimensional hyperspace where each compound is characterized by a single set of coordinates. [Pg.10]

Clustering or cluster analysis is used to classify objects, characterized by the values of a set of variables, into groups. It is therefore an alternative to principal component analysis for describing the structure of a data table. Let us consider an example. [Pg.57]

A first introduction to principal components analysis (PCA) has been given in Chapter 17. Here, we present the method from a more general point of view, which encompasses several variants of PCA. Basically, all these variants have in common that they produce linear combinations of the original columns in a measurement table. These linear combinations represent a kind of abstract measurements or factors that are better descriptors for structure or pattern in the data than the original measurements [1]. The former are also referred to as latent variables [2], while the latter are called manifest variables. Often one finds that a few of these abstract measurements account for a large proportion of the variation in the data. In that case one can study structure and pattern in a reduced space which is possibly two- or three-dimensional. [Pg.88]
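The idea that a few linear combinations can carry most of the variation can be illustrated with a minimal NumPy sketch. The measurement table below is synthetic (an assumption for illustration, not data from the source), and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical measurement table: 100 rows (objects), 6 columns (measurements),
# generated from 2 hidden factors plus a little noise.
factors = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 6))
table = factors @ mixing + 0.05 * rng.normal(size=(100, 6))

# Column-centre the table, then take its singular value decomposition.
centered = table - table.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Each row of Vt holds the weights of one linear combination of the columns.
# Squared singular values are proportional to the variance each one carries.
explained = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained)

# Coordinates of the objects in the reduced two-dimensional space.
scores_2d = centered @ Vt[:2].T

print(np.round(cumulative, 3))
```

Because the synthetic table was built from two factors, the first two entries of `cumulative` should already be close to 1, which is the situation in which a two-dimensional plot of `scores_2d` is a reasonable summary.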

The goal of factor analysis (FA) and its essential variant, principal component analysis (PCA), is to describe the structure of a data set by means of new uncorrelated variables, so-called common factors or principal components. These factors frequently characterize underlying real effects which can be interpreted in a meaningful way. [Pg.264]
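That the new variables are uncorrelated can be checked numerically. The following is a small sketch on synthetic, strongly correlated data (an assumption for illustration); it covers the PCA case only, not factor analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 objects, 3 strongly correlated variables.
base = rng.normal(size=(200, 1))
data = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(3)])

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh returns eigenvalues in ascending order; reorder to put the largest first.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Principal component scores and their covariance matrix.
scores = centered @ eigvecs
score_cov = np.cov(scores, rowvar=False)

# Off-diagonal entries vanish up to rounding: the new variables are uncorrelated.
off_diagonal = score_cov - np.diag(np.diag(score_cov))
print(np.max(np.abs(off_diagonal)))
```

The diagonal of `score_cov` equals the eigenvalues, so each principal component's variance can be read off directly.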

The interpretation of a multivariate image is sometimes problematic because the cause for pictorial structures may be complex and cannot be interpreted on the basis of images of single species even if they are processed by filtering etc. In such cases, principal component analysis (PCA) may advantageously be applied. The principle of the PCA is like that of factor analysis which has been mathematically described in Sect. 8.3.4. It is represented schematically in Fig. 8.33. [Pg.281]

Because protein ROA spectra contain bands characteristic of loops and turns in addition to bands characteristic of secondary structure, they should provide information on the overall three-dimensional solution structure. We are developing a pattern recognition program, based on principal component analysis (PCA), to identify protein folds from ROA spectral band patterns (Blanch et al., 2002b). The method is similar to one developed for the determination of the structure of proteins from VCD (Pancoska et al., 1991) and UVCD (Venyaminov and Yang, 1996) spectra, but is expected to provide enhanced discrimination between different structural types since protein ROA spectra contain many more structure-sensitive bands than do either VCD or UVCD. From the ROA spectral data, the PCA program calculates a set of subspectra that serve as basis functions, the algebraic combination of which with appropriate expansion coefficients can be used to reconstruct any member of the... [Pg.107]
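The notion of subspectra acting as basis functions can be sketched as follows. This is not the authors' program: it uses an uncentred singular value decomposition (a close relative of PCA) on synthetic Gaussian band patterns, and every name and number in it is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical reference set: 20 spectra sampled at 300 points, each a mixture
# of 3 underlying band patterns (synthetic, not ROA data).
axis = np.linspace(0.0, 1.0, 300)
patterns = np.stack([np.exp(-((axis - c) / 0.05) ** 2) for c in (0.25, 0.5, 0.75)])
weights = rng.normal(size=(20, 3))
spectra = weights @ patterns

# Rows of Vt are orthonormal subspectra (basis functions).
U, s, Vt = np.linalg.svd(spectra, full_matrices=False)
n_basis = 3
subspectra = Vt[:n_basis]

# Expansion coefficients of every spectrum on the retained subspectra.
coefficients = spectra @ subspectra.T

# Any member of the set is rebuilt as a weighted sum of the subspectra.
reconstructed = coefficients @ subspectra
max_error = np.max(np.abs(reconstructed - spectra))
print(max_error)
```

Here three subspectra suffice because the synthetic set has exactly three underlying patterns; with real spectra the number of basis functions to retain would have to be chosen from the data.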

A simple protocol was used to build the compounds: compounds were modeled with the corresponding net charges, after which 2D-3D structure conversion was carried out using the program Concord [21]. The 3D dataset obtained was submitted to the VolSurf program, and principal component analysis (PCA) was applied for chemometric interpretation. No metabolic stability information was applied to the model. [Pg.417]

In the previous examples and figures we indicated that functions for two independent variables can be selected. When three (or more) independent variables occur, advanced analysis tools, such as experimental design (see Section 2.4) or principal component analysis (Jackson, 1991), are required to determine the structure of the model. [Pg.55]

Principal components analysis (PCA) and projection to latent structures (PLS) were suggested to absorb information from continued-process data (Kresta et al., 1991; MacGregor and Kourti, 1995; Kourti and MacGregor, 1994). The key point of these approaches is to utilize PCA or PLS to compress the data and extract the information by projecting them into a low-dimension subspace that summarizes all the important information. Then, further monitoring work can be conducted in the reduced subspace. Two comprehensive reviews of these methods have been published by Kourti and MacGregor (1995) and Martin et al. (1996). [Pg.238]
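Projection into a low-dimensional subspace, followed by a check on what the subspace fails to capture, can be sketched as below. The squared residual used here is one common monitoring quantity and is an assumption of this sketch, not something the excerpt specifies; the process data are synthetic and the function `project` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical normal-operation data: 500 samples of 8 process variables
# driven by 2 latent sources plus small noise.
mixing = rng.normal(size=(2, 8))
normal_data = rng.normal(size=(500, 2)) @ mixing + 0.05 * rng.normal(size=(500, 8))

# PCA model of normal operation: mean vector and 2 loading vectors.
mean = normal_data.mean(axis=0)
_, _, Vt = np.linalg.svd(normal_data - mean, full_matrices=False)
loadings = Vt[:2].T  # 8 x 2 basis of the low-dimensional subspace


def project(sample):
    """Return (scores, squared residual) of one sample against the PCA model."""
    centered = sample - mean
    scores = centered @ loadings
    residual = centered - scores @ loadings.T
    return scores, float(residual @ residual)


# A sample consistent with the model, and the same sample with a disturbance
# along a direction orthogonal to the two latent sources.
good_sample = rng.normal(size=2) @ mixing + 0.05 * rng.normal(size=8)
null_direction = np.linalg.svd(mixing)[2][-1]
faulty_sample = good_sample + 2.0 * null_direction

_, spe_good = project(good_sample)
_, spe_faulty = project(faulty_sample)
print(spe_good, spe_faulty)
```

The disturbed sample leaves a much larger residual than the ordinary one, which is the kind of signal a monitoring scheme in the reduced subspace could act on. A threshold for that residual would have to be set from normal-operation data.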

The constrained least-square method is developed in Section 5.3 and a numerical example treated in detail. Efficient specific algorithms taking errors into account have been developed by Provost and Allegre (1979). Literature abounds in alternative methods. Wright and Doherty (1970) use linear programming methods that are fast and offer an easy implementation of linear constraints, but the structure of the data is not easily perceived and error assessment is inefficiently handled. Principal component analysis (Section 4.4) is more efficient when the end-members are unknown. [Pg.9]

Principal component analysis (PCA) is aimed at explaining the covariance structure of multivariate data through a reduction of the whole data set to a smaller number of independent variables. We assume that an m-point sample is represented by the n × m matrix X which collects i = 1, ..., m observations (measurements) x_i of a column-vector x with j = 1, ..., n elements (e.g., the measurements of n = 10 oxide weight percents in m = 50 rocks). Let x̄ be the mean vector and S_x the n × n covariance matrix of this sample... [Pg.237]
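With the layout described above (variables in rows, observations in columns), the mean vector and covariance matrix can be computed as in this sketch. The data are random placeholders, and the 1/(m - 1) normalization is an assumption, since the excerpt breaks off before defining S_x.

```python
import numpy as np

rng = np.random.default_rng(4)

n, m = 10, 50  # n variables (e.g. oxides), m observations (e.g. rocks)
X = rng.normal(size=(n, m))  # hypothetical data; column i is observation x_i

x_bar = X.mean(axis=1, keepdims=True)      # n x 1 mean vector
deviations = X - x_bar
S_x = deviations @ deviations.T / (m - 1)  # n x n sample covariance (assumed 1/(m-1))

# Eigenvectors of S_x give the principal directions, eigenvalues their variances.
# eigh returns ascending order, so reverse to put the largest first.
eigenvalues, eigenvectors = np.linalg.eigh(S_x)
eigenvalues, eigenvectors = eigenvalues[::-1], eigenvectors[:, ::-1]

print(S_x.shape, np.round(eigenvalues[:3], 3))
```

The eigenvalues sum to the trace of `S_x`, i.e. to the total variance of the n variables.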

Figure 2.11 Plot of compounds developed for different target classes based on a principal components analysis (PCA) of 2D structure-based property fingerprints. Compounds are coded according to their target class (triangle, PDE; square, 5HT receptor; diamond, statin; circle, F-quinoline antibiotics) and clinical status at the time (gray, ok; yellow, clearance issue; red,...

In a rare example which demonstrates the possibilities of the approach, Bürgi and Dubler-Steudler (1988a) have recently combined structure and reactivity data in a detailed study of the ring-inversion reaction of a homogeneous set of organometallic compounds. The reaction is the automerization of zircocene and hafnocene complexes [73; M = Zr or Hf, X = C or O], known from temperature-dependent NMR measurements to undergo the equilibration [73] ⇌ [73′]. Principal-component analysis of... [Pg.135]

The reason for the correlation between the localization and the amino acid composition was sought by Andrade et al. (1998). They examined the amino acid composition of proteins with known localization and three-dimensional structure in three ways: total composition, surface composition, and interior composition. The principal component analysis showed the best correlation between the surface composition and the localization. Therefore, they concluded that the correlation is the result of evolutionary adaptation of proteins to the surrounding environment. [Pg.329]

Principal component analysis (PCA) can be considered as the mother of all methods in multivariate data analysis. The aim of PCA is dimension reduction, and PCA is the most frequently applied method for computing linear latent variables (components). PCA can be seen as a method to compute a new coordinate system formed by the latent variables, which is orthogonal, and where only the most informative dimensions are used. Latent variables from PCA optimally represent the distances between the objects in the high-dimensional variable space (remember, the distance of objects is considered as an inverse similarity of the objects). PCA considers all variables and accommodates the total data structure; it is a method for exploratory data analysis (unsupervised learning) and can be applied to practically any X-matrix; no y-data (properties) are considered and therefore none are necessary. [Pg.73]
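The statements about an orthogonal coordinate system and the representation of distances can be checked with a short sketch. The X-matrix is synthetic and the helper `pairwise_distances` is hypothetical; both are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical X-matrix: 30 objects, 5 variables with unequal spread.
X = rng.normal(size=(30, 5)) * np.array([5.0, 3.0, 1.0, 0.3, 0.1])
centered = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(centered, full_matrices=False)
loadings = Vt.T  # columns are orthonormal axes of the new coordinate system


def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


d_original = pairwise_distances(centered)
d_all = pairwise_distances(centered @ loadings)         # all 5 components
d_two = pairwise_distances(centered @ loadings[:, :2])  # first 2 components only

# Relative error of the distances when only 2 components are kept.
relative_error = np.linalg.norm(d_original - d_two) / np.linalg.norm(d_original)
print(relative_error)
```

Keeping all components is a pure rotation, so `d_all` matches `d_original`. Keeping only two components can only shorten distances, and in this synthetic case the loss is small because the discarded directions carry little variance.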

Principal Component Analysis (PCA) is the most popular technique of multivariate analysis used in environmental chemistry and toxicology [313-316]. Both PCA and factor analysis (FA) aim to reduce the dimensionality of a set of data, but the approaches to do so are different for the two techniques. Each provides a different insight into the data structure, with PCA concentrating on explaining the diagonal elements of the covariance matrix, while FA concentrates on the off-diagonal elements [313, 316-319]. Theoretically, PCA corresponds to a mathematical decomposition of the descriptor matrix, X, into means (x_k), scores (t_ia), loadings (p_ak), and residuals (e_ik), which can be expressed as... [Pg.268]
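The four parts named above (means, scores, loadings, residuals) can be computed explicitly. The sketch below follows the commonly used form of this decomposition and is not a reconstruction of the expression elided in the excerpt; the descriptor matrix is synthetic and the number of components is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical descriptor matrix: 40 compounds (rows i), 6 descriptors (columns k).
X = rng.normal(size=(40, 6)) + np.arange(6)

n_components = 2  # number of retained components (index a), chosen arbitrarily

means = X.mean(axis=0)  # one mean per descriptor k
centered = X - means
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

scores = U[:, :n_components] * s[:n_components]  # one row per compound i
loadings = Vt[:n_components]                     # one row per component a
residuals = centered - scores @ loadings         # what the model leaves out

# Means + scores x loadings + residuals give back the original matrix.
rebuilt = means + scores @ loadings + residuals
print(np.allclose(rebuilt, X))
```

The residuals are orthogonal to the retained loadings, so adding further components can only shrink them.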

Big Chemical Encyclopedia
