Principal components coefficient

Table I. Principal Component Coefficients for the Example Data Set ...

Table III. Principal Component Coefficients Chinautla-Sacojito Data with Outliers Removed ...

To further analyze the relationships within descriptor space, we performed a principal component analysis of the whole data matrix. The descriptors were normalized before the analysis to have a mean of 0 and a standard deviation of 1. The first two principal components explain 78% of the variance within the data. The resultant loadings, which characterize the contributions of the original descriptors to these principal components, are shown in Fig. 5.8. On the plot we can see that PSA, Hhed and Uhba are indeed closely grouped together. The calculated octanol-water partition coefficient, CLOGP, is located in the opposite corner of the property space. This analysis also demonstrates that CLOGP and PSA are the two parameters with... [Pg.122]
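The workflow described above (autoscaling, PCA, explained variance, loadings) can be sketched as follows. This is a minimal illustration on synthetic data, not the descriptor set of the excerpt; the matrix `X`, the `mixing` weights and the noise level are all invented for the example.

```python
import numpy as np

# Hypothetical descriptor matrix: 50 compounds x 4 descriptors (synthetic data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = np.array([[1.0, 0.9, 0.8, -0.7],
                   [0.1, -0.2, 0.3, 0.9]])
X = latent @ mixing + 0.1 * rng.normal(size=(50, 4))

# Autoscale: mean 0 and standard deviation 1 for every descriptor.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA through the eigendecomposition of the correlation matrix.
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]         # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()       # fraction of variance per component
loadings = eigvecs[:, :2]                 # coefficients of the first two PCs

print("variance explained by PC1 + PC2:", explained[:2].sum())
print("loadings (rows = descriptors, columns = PCs):")
print(loadings)
```

Plotting the two columns of `loadings` against each other gives a loading plot of the kind the excerpt refers to: descriptors with similar coefficients appear close together.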

Each column of S represents a row-principal component of X and can be interpreted as a linear combination of the columns of X using the elements of V as weighting coefficients ... [Pg.96]
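As a hedged illustration of this statement, the sketch below computes scores from a singular value decomposition. The names `S` and `V` follow the excerpt; the column-centering of `X` and the scaling of the scores (singular values absorbed into `S`) are assumptions of this example, and the source may use a different scaling convention.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
X = X - X.mean(axis=0)             # column-centering (assumed for this sketch)

# X = U @ diag(sing) @ V.T
U, sing, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# Each column of S is a linear combination of the columns of X,
# with the corresponding column of V supplying the weights.
S = X @ V

# The same scores written as U scaled by the singular values.
S_alt = U * sing

# First score column spelled out as an explicit weighted sum.
first_component = sum(X[:, j] * V[j, 0] for j in range(X.shape[1]))
```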

The combination of PCA and LDA is often applied, in particular for ill-posed data (data where the number of variables exceeds the number of objects), e.g. Ref. [46]. One first extracts a certain number of principal components, deleting the higher-order ones and thereby reducing the noise to some degree, and then carries out the LDA. One should, however, be careful not to eliminate too many PCs, since information important for the discrimination might be lost in this way. A method in which both are merged in one step and which sometimes yields better results than the two-step procedure is reflected discriminant analysis. The Fourier transform is also sometimes used [14], and this is also the case for the wavelet transform (see Chapter 40) [13,16]. In that case, the information is included in the first few Fourier coefficients or in a restricted number of wavelet coefficients. [Pg.236]
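The two-step procedure (PCA first, then LDA on the retained scores) can be sketched as below. This is an illustrative two-class example on synthetic data with more variables than objects; the class shift, the number of retained components `n_pcs` and the use of Fisher's two-class discriminant are choices made for the sketch, not details taken from Ref. [46].

```python
import numpy as np

rng = np.random.default_rng(2)
n_per_class, n_vars, n_pcs = 10, 50, 3     # 20 objects, 50 variables

# Synthetic two-class data: the classes differ in the first five variables.
shift = np.zeros(n_vars)
shift[:5] = 3.0
X = np.vstack([rng.normal(size=(n_per_class, n_vars)),
               rng.normal(size=(n_per_class, n_vars)) + shift])
y = np.repeat([0, 1], n_per_class)

# Step 1: PCA on the centered data, keeping only the first n_pcs components.
Xc = X - X.mean(axis=0)
U, sing, Vt = np.linalg.svd(Xc, full_matrices=False)
T = Xc @ Vt[:n_pcs].T                      # scores, shape (20, n_pcs)

# Step 2: Fisher's linear discriminant on the scores.
m0 = T[y == 0].mean(axis=0)
m1 = T[y == 1].mean(axis=0)
Sw = (np.cov(T[y == 0], rowvar=False) + np.cov(T[y == 1], rowvar=False)) \
     * (n_per_class - 1)                   # pooled within-class scatter
w = np.linalg.solve(Sw, m1 - m0)           # discriminant direction
threshold = w @ (m0 + m1) / 2.0

predicted = (T @ w > threshold).astype(int)
accuracy = (predicted == y).mean()         # training-set accuracy only
print("training accuracy:", accuracy)
```

Because the scatter matrix is built from only `n_pcs` score columns, it is invertible even though the original 50-variable covariance matrix is not. The accuracy reported here is on the training objects and would need cross-validation before being trusted.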

Note that the lipophilicity parameter log P is defined as a decimal logarithm. The parabolic equation is only non-linear in the variable log P, but is linear in the coefficients. Hence, it can be solved by multiple linear regression (see Section 10.8). The bilinear equation, however, is non-linear in both the variable P and the coefficients, and can only be solved by means of non-linear regression techniques (see Chapter 11). It is approximately linear with a positive slope (b1) for small values of log P, while it is also approximately linear with a negative slope (b1 + b2) for large values of log P. The term bilinear is used in this context to indicate that the QSAR model can be resolved into two linear relations for small and for large values of P, respectively. This definition differs from the one which has been introduced in the context of principal components analysis in Chapter 17. [Pg.390]
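Because the parabolic model is linear in its coefficients, an ordinary least-squares fit suffices. The sketch below assumes the model has the form activity = c0 + c1 log P + c2 (log P)^2; the coefficient labels `c0`, `c1`, `c2`, the "true" values and the data are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data generated from an assumed parabolic relationship.
log_p = np.linspace(-1.0, 5.0, 13)
true_c = np.array([1.0, 1.2, -0.25])       # c0, c1, c2 (invented values)
activity = (true_c[0] + true_c[1] * log_p + true_c[2] * log_p**2
            + 0.05 * rng.normal(size=log_p.size))

# Design matrix: the model is non-linear in log P but linear in c0, c1, c2.
design = np.column_stack([np.ones_like(log_p), log_p, log_p**2])
c_hat, *_ = np.linalg.lstsq(design, activity, rcond=None)

# Position of the maximum of the fitted parabola (requires c2 < 0).
log_p_opt = -c_hat[1] / (2.0 * c_hat[2])
print("fitted coefficients:", c_hat)
print("log P at maximum activity:", log_p_opt)
```

A bilinear model could not be fitted this way, since its coefficients enter non-linearly; it would need an iterative non-linear least-squares routine instead.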

A difficulty with Hansch analysis is to decide which parameters and functions of parameters to include in the regression equation. This problem of selection of predictor variables has been discussed in Section 10.3.3. Another problem is due to the high correlations between groups of physicochemical parameters. This is the multicollinearity problem which leads to large variances in the coefficients of the regression equations and, hence, to unreliable predictions (see Section 10.5). It can be remedied by means of multivariate techniques such as principal components regression and partial least squares regression, applications of which are discussed below. [Pg.393]
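The effect of multicollinearity, and the way principal components regression sidesteps it, can be illustrated with a small synthetic example. Everything below is an assumption made for the sketch: three nearly identical predictors, one independent predictor, and a choice of two retained components.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
base = rng.normal(size=n)

# Three almost perfectly correlated predictors plus one independent predictor.
X = np.column_stack([base + 0.01 * rng.normal(size=n),
                     base + 0.01 * rng.normal(size=n),
                     base + 0.01 * rng.normal(size=n),
                     rng.normal(size=n)])
y = 2.0 * base + X[:, 3] + 0.1 * rng.normal(size=n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Ordinary least squares: coefficients of the collinear predictors are unstable.
b_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

# Principal components regression with k components.
k = 2
U, sing, Vt = np.linalg.svd(Xc, full_matrices=False)
Vk = Vt[:k].T                               # loadings of the retained PCs
scores = Xc @ Vk                            # orthogonal score columns
gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)
b_pcr = Vk @ gamma                          # back-transform to original variables

print("OLS coefficients:", b_ols)
print("PCR coefficients:", b_pcr)
```

In this construction the PCR coefficients of the three collinear predictors come out nearly equal, while the OLS coefficients of the same predictors typically scatter widely even though both models fit the data about equally well.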

Figure 8.16 shows the principal components of a coefficient of friction tester. Polymer samples in the form of thick sheets or molded plaques are attached to the base and a sled with standard dimensions, weight, and surface properties is drawn over the surface. The load beam measures the force required to initiate movement and sustain motion at a given rate of crosshead travel. Thin films can be taped to the sled and drawn across a contact surface that has known properties. [Pg.174]

Because protein ROA spectra contain bands characteristic of loops and turns in addition to bands characteristic of secondary structure, they should provide information on the overall three-dimensional solution structure. We are developing a pattern recognition program, based on principal component analysis (PCA), to identify protein folds from ROA spectral band patterns (Blanch et al., 2002b). The method is similar to one developed for the determination of the structure of proteins from VCD (Pancoska et al., 1991) and UVCD (Venyaminov and Yang, 1996) spectra, but is expected to provide enhanced discrimination between different structural types since protein ROA spectra contain many more structure-sensitive bands than do either VCD or UVCD. From the ROA spectral data, the PCA program calculates a set of subspectra that serve as basis functions, the algebraic combination of which with appropriate expansion coefficients can be used to reconstruct any member of the... [Pg.107]
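The idea of subspectra acting as basis functions can be sketched with simulated spectra. This is not the authors' program: the Gaussian band shapes, the number of spectra and the choice of two retained subspectra are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
axis = np.linspace(0.0, 1.0, 200)           # arbitrary spectral axis

def band(center, width):
    """Gaussian band shape on the shared spectral axis."""
    return np.exp(-0.5 * ((axis - center) / width) ** 2)

# Twelve simulated spectra, each a mixture of two invented band shapes.
pure = np.vstack([band(0.3, 0.05), band(0.6, 0.08)])
weights = rng.uniform(0.0, 1.0, size=(12, 2))
spectra = weights @ pure + 0.001 * rng.normal(size=(12, 200))

# PCA of the mean-centered spectra via the singular value decomposition.
mean_spectrum = spectra.mean(axis=0)
U, sing, Vt = np.linalg.svd(spectra - mean_spectrum, full_matrices=False)

k = 2
subspectra = Vt[:k]                                         # basis functions
coefficients = (spectra - mean_spectrum) @ subspectra.T     # expansion coefficients

# Rebuild every spectrum from the basis functions and its coefficients.
reconstructed = mean_spectrum + coefficients @ subspectra
max_error = np.abs(reconstructed - spectra).max()
print("largest reconstruction error:", max_error)
```

Each spectrum is then summarized by `k` expansion coefficients, which is the kind of compact description a pattern recognition step can work with.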

Our band shape methods have made use of the principal component method of factor analysis (Pancoska et al., 1979; Malinowski, 1991) to characterize the protein spectra in terms of a relatively small number of coefficients (loadings) (Pancoska et al., 1994, 1995; Baumruk et al., 1996). This approach is similar, in its initial stages, to various methods (Selcon, Variselect, etc.) that have been used for determining protein secondary structure from ECD data (Hennessey and Johnson, 1981; Provencher and Glockner, 1981; Johnson, 1988; Pancoska and Keiderling, 1991; Sreerama and Woody, 1993, 1994; Venyaminov and Yang, 1996). At this point, one can say these traditional quantitative methods have had little impact upon structural studies of denatured proteins. [Pg.167]

The ijth term of this matrix represents the correlation coefficient (loading) between the ith variable and the jth principal component. [Pg.241]
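For a PCA carried out on the correlation matrix of autoscaled data, such a matrix can be obtained by scaling each eigenvector by the square root of its eigenvalue. The sketch below checks this numerically on synthetic data; the assumption that the excerpt refers to a correlation-matrix PCA is ours, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 3)) @ np.array([[1.0, 0.5, 0.2],
                                          [0.0, 1.0, 0.4],
                                          [0.0, 0.0, 1.0]])

# Autoscale, then diagonalize the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Z @ eigvecs

# Element (i, j): correlation between variable i and principal component j.
corr_loadings = eigvecs * np.sqrt(eigvals)

# Direct check: compute each correlation coefficient explicitly.
direct = np.array([[np.corrcoef(Z[:, i], scores[:, j])[0, 1]
                    for j in range(3)] for i in range(3)])
```

The squared entries in each row of `corr_loadings` sum to 1, since all components together account for the full variance of each autoscaled variable.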

In addition to the linear sensitivity coefficients described above, various other types of sensitivity coefficients have been studied to probe underlying relationships between input and output parameters of chemical kinetic models. These include higher-order coefficients, Green's function coefficients, derived coefficients, feature coefficients, and principal components. Their descriptions and applications can be found in the literature [22, 23, 27, 28],... [Pg.65]

The difference between interval and ratio scales can be important for including or not including an intercept term in mathematical models, for the correct calculation of the correlation coefficient, for deciding to mean center or not in principal component analysis, and for a host of other decisions in data treatment and modeling. [Pg.19]

Many different types of similarity measure have been discussed in the literature, but they generally involve three principal components: the representation that is used to characterize the molecules that are being compared; the weighting scheme that is used to assign differing degrees of importance to the various components of these representations; and the similarity coefficient that is used to... [Pg.52]

Differences between PLS and PCR. Principal component regression and partial least squares use different approaches for choosing the linear combinations of variables for the columns of U. Specifically, PCR only uses the R matrix to determine the linear combinations of variables. The concentrations are used when the regression coefficients are estimated (see Equation 5.32), but not to estimate U. A potential disadvantage with this approach is that variation in R that is not correlated with the concentrations of interest is used to construct U. Sometimes the variance that is related to the concentrations is a very... [Pg.146]

Here E is the solute excess molar refractivity; S is the solute dipolarity/polarizability; A and B are the overall or summation hydrogen-bond acidity and basicity, respectively; and V is the McGowan characteristic volume. Lower-case letters stand for the respective coefficients, which are characteristic of the solvent; c is the constant. With the help of statistical methods such as principal component analysis and nonlinear mapping, the authors determined the mathematical distance (i.e., measure of dissimilarity) from an IL to seven conventional solvents immiscible with water. It appears that the conventional solvent closest to the IL is 1-octanol. Even closer to the IL is an aqueous biphasic system based on PEG-200 and ammonium sulfate (and closer still are ethylene glycol and trifluoroethanol, as calculated for hypothetical water-solvent systems involving these solvents). [Pg.251]
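The descriptors and coefficients defined at the start of this excerpt match the usual form of the Abraham solvation parameter model. The equation itself is not reproduced in the excerpt, so the following is a sketch under that assumption; in particular, the dependent variable, written here as log SP (a solute property such as a partition coefficient), is an assumed notation.

```latex
\log SP = c + eE + sS + aA + bB + vV
```

Under this form, each solvent system is characterized by the coefficient set (c, e, s, a, b, v), which is presumably the space in which the distances between the IL and the conventional solvents were computed.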

Since the principal components k cannot depend on the particular coordinate axes, the coefficients must be independent of the particular coordinate-system representation of T. That is to say, the quantities I, II, and III must be invariant to the particular coordinate-system representation. In terms of the principal components,... [Pg.760]
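The invariance claimed here can be checked numerically. The sketch below assumes T is a symmetric second-order tensor in three dimensions and uses one common definition of the three invariants (trace, sum of principal 2x2 minors, determinant); the sign convention for the second invariant differs between texts, so the source may define it with the opposite sign.

```python
import numpy as np

def invariants(T):
    """Return the three invariants of a 3x3 tensor (one common convention)."""
    first = np.trace(T)
    second = 0.5 * (np.trace(T) ** 2 - np.trace(T @ T))
    third = np.linalg.det(T)
    return np.array([first, second, third])

rng = np.random.default_rng(7)

# A random symmetric tensor.
A = rng.normal(size=(3, 3))
T = 0.5 * (A + A.T)

# The same tensor expressed in a different orthonormal coordinate system.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # Q is orthogonal
T_rot = Q @ T @ Q.T

# The invariants written in terms of the principal components (eigenvalues).
lam = np.linalg.eigvalsh(T)
from_principal = np.array([lam.sum(),
                           lam[0] * lam[1] + lam[1] * lam[2] + lam[2] * lam[0],
                           lam.prod()])

print("original axes:      ", invariants(T))
print("rotated axes:       ", invariants(T_rot))
print("from principal axes:", from_principal)
```

All three printed rows agree to rounding error, which is the numerical counterpart of the statement that I, II, and III do not depend on the coordinate system.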

Big Chemical Encyclopedia
