Non-linear PCA

Non-linear PCA can be obtained in many different ways. Some methods make use of higher order terms of the data (e.g. squares, cross-products), non-linear transformations (e.g. logarithms), metrics that differ from the usual Euclidean one (e.g. city-block distance) or specialized applications of neural networks [50]. The objective of these methods is to increase the amount of variance in the data that is explained by the first two or three components of the analysis. We only provide a brief outline of the various approaches, with the exception of neural networks for which the reader is referred to Chapter 44. [Pg.149]

One approach is to extend the columns of a measurement table by means of their powers and cross-products. An example of such non-linear PCA is discussed in Section 37.2.1 in an application of QSAR, where biological activity was known to be related to the hydrophobic constant by means of a quadratic function. In this case it made sense to add the square of a particular column to the original measurement table. This procedure, however, tends to increase the redundancy in the data. [Pg.149]
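
The extension of a table by powers and cross-products can be sketched as follows. This is only an illustration on simulated data: the descriptor names (`log_p`, `sigma`), the table size and the helper `pca_explained_variance` are assumptions, not taken from the QSAR application cited above.

```python
import numpy as np

def pca_explained_variance(X):
    """Fraction of total variance carried by each PC of the column-centered table."""
    Xc = X - X.mean(axis=0)
    sv = np.linalg.svd(Xc, compute_uv=False)
    return sv**2 / np.sum(sv**2)

rng = np.random.default_rng(0)
# Hypothetical table: 30 compounds described by 2 columns (illustrative names)
log_p = rng.uniform(-1.0, 3.0, size=30)
sigma = rng.normal(size=30)
X = np.column_stack([log_p, sigma])

# Extend the table with the square of the first column and one cross-product
X_ext = np.column_stack([X, X[:, 0] ** 2, X[:, 0] * X[:, 1]])

print(pca_explained_variance(X))
print(pca_explained_variance(X_ext))

# The new columns are functions of the old ones, hence the added redundancy:
# the squared column is strongly correlated with the column it was derived from.
corr = np.corrcoef(X_ext, rowvar=False)
print(corr.round(2))
```

The correlation matrix makes the redundancy mentioned in the text visible, since the added columns carry little information that is independent of the original ones.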

The logarithmic transformation prior to column- or double-centered PCA (Section 31.3) can be considered as a special case of non-linear PCA. The procedure tends to make the row- and column-variances more homogeneous, and allows us to interpret the resulting biplots in terms of log ratios. [Pg.150]
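
A minimal sketch of log double-centered PCA is given below, assuming a strictly positive table and an SVD-based decomposition. The function name and the simulated table are hypothetical.

```python
import numpy as np

def log_double_centered_pca(X, n_components=2):
    """PCA of a strictly positive table after log transform and double-centering."""
    Y = np.log(X)
    # Double-centering: remove row means and column means, add back the grand mean
    Y = Y - Y.mean(axis=1, keepdims=True) - Y.mean(axis=0, keepdims=True) + Y.mean()
    U, sv, Vt = np.linalg.svd(Y, full_matrices=False)
    scores = U[:, :n_components] * sv[:n_components]
    loadings = Vt[:n_components].T
    return Y, scores, loadings

rng = np.random.default_rng(3)
X = rng.lognormal(mean=1.0, sigma=0.8, size=(6, 4))   # simulated positive data
Y, scores, loadings = log_double_centered_pca(X)

# The difference between two columns of Y is a centered log ratio of the raw columns
log_ratio = np.log(X[:, 0] / X[:, 1])
print(np.allclose(Y[:, 0] - Y[:, 1], log_ratio - log_ratio.mean()))
```

The last line checks the log-ratio interpretation: contrasts between two columns of the transformed table equal the log ratios of the original columns, apart from a constant.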

The theory of the non-linear PCA biplot has been developed by Gower [49] and can be described as follows. We first assume that a column-centered measurement table X is decomposed by means of classical (or linear) PCA into a matrix of factor scores S and a matrix of factor loadings L ... [Pg.150]
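
The classical decomposition that serves as the starting point can be sketched with a singular value decomposition. The scaling used here, with all singular values assigned to the scores, is an assumption made for illustration; the source may use a different scaling of S and L.

```python
import numpy as np

rng = np.random.default_rng(1)
X_raw = rng.normal(size=(8, 3))          # simulated measurement table
X = X_raw - X_raw.mean(axis=0)           # column-centering

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
S = U * sv        # factor scores (one row per row-item); scaling is an assumption
L = Vt.T          # factor loadings (one row per column-item), orthonormal columns

X_rebuilt = S @ L.T
print(np.allclose(X, X_rebuilt))
```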

Fig. 31.17. (a) In a classical PCA biplot, data values x_ij can be estimated by means of perpendicular projection of the ith row-point upon a unipolar axis which represents the jth column-item of the data table X. In this case the axis is a straight line through the origin (represented by a small cross). (b) In a non-linear PCA biplot, the jth column-item traces out a curvilinear trajectory. The data value is now estimated by defining the shortest distance between the ith row-point and the jth trajectory. [Pg.151]
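
Case (a) of the figure can be illustrated numerically. The sketch below covers only the classical (linear) biplot and assumes a two-component biplot with singular values assigned to the scores; the indices `i` and `j` and the simulated table are arbitrary. The non-linear case (b) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))
X = X - X.mean(axis=0)                      # column-centered table

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
S2 = U[:, :2] * sv[:2]                      # row-points in the plane of the biplot
L2 = Vt[:2].T                               # column-points defining the axes

i, j = 3, 1                                 # arbitrary row-item and column-item
axis_length = np.linalg.norm(L2[j])
axis_direction = L2[j] / axis_length        # unit vector along the jth axis

# Signed length of the perpendicular projection of row-point i on axis j
projection = S2[i] @ axis_direction
estimate = projection * axis_length         # read-off value on the calibrated axis

X_rank2 = S2 @ L2.T                         # two-component reconstruction of X
print(estimate, X_rank2[i, j])
```

The projected value coincides with the element of the two-component reconstruction, which is why the read-off is an estimate rather than the exact data value.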

Non-linear PCA algorithms have also been developed to provide a representation along principal curves rather than principal directions... [Pg.156]

Scholz, M., Kaplan, F., Guy, C.L., Kopka, J., Selbig, J. (2005) Non-linear PCA: a missing data approach. Bioinformatics, 21, 3887-3895. [Pg.557]

To apply RBL or GRAFA successfully, some attention has to be paid to the quality of the data. As with any other multivariate technique, the results obtained by RBL and GRAFA are affected by non-linearity of the data and heteroscedasticity of the noise. Both phenomena make the rank of the data matrix higher than the number of species present in the sample. This has been demonstrated by the PCA results obtained for an anthracene standard solution eluted and detected by three different brands of diode array detectors [37]. In all three cases significant second eigenvalues were obtained and structure was seen in the second principal component. [Pg.301]
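
The effect of non-linearity on the rank can be reproduced with a small noise-free simulation. Everything in the sketch is hypothetical: the Gaussian spectrum and elution profile, and the quadratic term standing in for a non-linear detector response. It is not a model of the detectors in the cited study.

```python
import numpy as np

wavelengths = np.linspace(0.0, 1.0, 50)
times = np.linspace(0.0, 1.0, 40)
spectrum = np.exp(-((wavelengths - 0.5) / 0.15) ** 2)   # hypothetical single species
elution = np.exp(-((times - 0.5) / 0.10) ** 2)          # hypothetical elution profile

# Ideal bilinear data for one species: rank 1
ideal = np.outer(elution, spectrum)

# Assumed detector non-linearity: the response bends at high signal
nonlinear = ideal - 0.2 * ideal ** 2

def relative_singular_values(M, n=3):
    """First n singular values, scaled by the largest one."""
    sv = np.linalg.svd(M, compute_uv=False)
    return sv[:n] / sv[0]

rel_ideal = relative_singular_values(ideal)
rel_nonlinear = relative_singular_values(nonlinear)
print(rel_ideal)
print(rel_nonlinear)
```

Although only one species is present, the non-linear data show a second singular value well above numerical zero, which mirrors the significant second eigenvalues described in the text.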

Now, what is interesting about this situation is that ordinary regression theory and the theory of PCA and PLS specify that the model generated must be linear in the coefficients. Nothing is specified about the nature of the data (except that it be noise-free, as our simulated data is); the data may be non-linear to any degree. Ordinarily this is not a problem, because any data transform may be used to linearize the data, if that is desirable. [Pg.132]
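
A short sketch may help to show what "linear in the coefficients" means in practice. The quadratic relationship and its coefficient values are invented for illustration; the point is only that a curved, noise-free relationship is still fitted by ordinary least squares once the transformed variable is supplied as a column.

```python
import numpy as np

x = np.linspace(0.0, 4.0, 25)
y = 1.0 + 0.5 * x - 0.3 * x ** 2      # noise-free data, non-linear in x

# The model y = b0 + b1*x + b2*x^2 is non-linear in x but linear in b0, b1, b2
A = np.column_stack([np.ones_like(x), x, x ** 2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)
```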

Figure 3 shows the PCA score plot of the same data of figure 2 after the application of equation 4. The application of linear normalization to an array of linear sensors should produce, on the PCA score plot, one point for each compound, independent of its concentration, and achieve the highest possible recognition. Deviations from ideal behaviour, as shown in figure 3, are due to the presence of measurement errors, and to the non-linear relationship between sensor response and concentration. [Pg.152]
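
The argument can be checked on simulated sensor responses. Equation 4 is not reproduced in this excerpt, so the sketch assumes that linear normalization divides each sensor response by the summed response of the array; the sensitivities, concentrations and power-law exponents are likewise hypothetical.

```python
import numpy as np

def linear_normalization(responses):
    """Divide each sensor response by the summed response of the whole array.
    This form is an assumption; the source's equation 4 is not shown."""
    return responses / responses.sum(axis=1, keepdims=True)

concentrations = np.array([0.5, 1.0, 2.0, 5.0])
sensitivity = np.array([1.0, 0.4, 0.1, 0.7])   # 4 sensors, one compound (invented)

# Ideal linear sensors: response = sensitivity * concentration
linear_responses = np.outer(concentrations, sensitivity)
linear_pattern = linear_normalization(linear_responses)

# Hypothetical non-linear sensors: each follows its own power law in concentration
exponents = np.array([1.0, 0.8, 0.6, 0.9])
nonlinear_responses = sensitivity * concentrations[:, None] ** exponents
nonlinear_pattern = linear_normalization(nonlinear_responses)

# Spread of the normalized pattern over concentrations, per sensor
print(np.ptp(linear_pattern, axis=0))
print(np.ptp(nonlinear_pattern, axis=0))
```

For the linear sensors every concentration gives the same normalized pattern, so the compound would collapse to a single point in the score plot. For the non-linear sensors the pattern drifts with concentration.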

A final consideration about PCA is concerned with its use as a preprocessor of non-linear methods such as neural networks [22]. The assumption of a normal distribution of the data requires all following analysis steps to adhere to this hypothesis. If positive results are sometimes achieved, they have to be considered as serendipitous events. [Pg.157]

There are several methods that can be used to select well-distributed calibration samples from a set of such happenstance data. One simple method, called leverage-based selection, is to run a PCA analysis on the calibration data, and select a subset of calibration samples that have extreme values of the leverage for each of the significant PCs in the model. The selected samples will be those that have extreme responses in their analytical profiles. In order to cover the sample states better, it would also be wise to add samples that have low leverage values for each of the PCs, so that the center samples with more normal analytical responses are well represented as well. Otherwise, it would be very difficult for the predictive model to characterize any non-linear response effects in the analytical data. In PAC, where spectroscopy and chromatography methods are common, it is better to assume that non-linear effects in the analytical responses could be present than to assume that they are not. [Pg.313]
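
Leverage-based selection might be sketched as follows. The per-PC leverage definition used here (squared score divided by the sum of squared scores on that PC), the function names and the numbers of samples picked are assumptions for illustration only.

```python
import numpy as np

def leverage_per_pc(X, n_pcs):
    """Leverage of each sample on each of the first n_pcs principal components."""
    Xc = X - X.mean(axis=0)
    U, sv, Vt = np.linalg.svd(Xc, full_matrices=False)
    # With scores T = U * sv, t_ik^2 / sum_i(t_ik^2) reduces to U_ik^2
    return U[:, :n_pcs] ** 2

def select_calibration_samples(X, n_pcs=2, n_extreme=3, n_center=2):
    """Pick samples with extreme leverage plus a few low-leverage (center) samples."""
    H = leverage_per_pc(X, n_pcs)
    chosen = set()
    for k in range(n_pcs):
        order = np.argsort(H[:, k])                 # ascending leverage on PC k
        chosen.update(order[-n_extreme:].tolist())  # extreme responses
        chosen.update(order[:n_center].tolist())    # center samples
    return sorted(chosen)

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 6))        # simulated happenstance data
selected = select_calibration_samples(X)
print(selected)
```

Including the low-leverage samples follows the recommendation above: without them the calibration subset would consist of extremes only, which makes curvature in the response hard to characterize.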

A more complex method is described by WOLD [1978], who used cross-validation to estimate the number of factors in FA and PCA. WOLD applied the NIPALS (non-linear iterative partial least squares) algorithm and also mentioned its usefulness in cases of incomplete data. [Pg.173]

Principal component analysis (PCA); cluster analysis; non-linear mapping; Kohonen mapping... [Pg.351]

Thus PCA may be used to filter out the most relevant information in a data set. Applications of variations of PCA, such as correspondence factorial analysis and non-linear mapping may have small advantages with particular data sets, but require expert support. [Pg.364]

Like PCA, non-linear mapping (NLM), or multidimensional scaling, is a method for visualizing relationships between objects, which in the medicinal chemistry context often are compounds, but could equally be a number of measured activities. It is an iterative minimization procedure which attempts to preserve interpoint distances in multidimensional space in a 2D or 3D representation. Unlike PCA, however, the axes are not orthogonal and are not clearly interpretable with respect to the original variables. Non-linear mapping has been used to cluster aromatic and aliphatic substituents. ... [Pg.365]
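
The iterative minimization can be sketched with a Sammon-type stress and plain gradient descent. The choice of stress function, the step-size rule, the random starting configuration and all names are assumptions; published NLM implementations may differ in each of these respects.

```python
import numpy as np

def pairwise_distances(Y):
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sammon_stress(D_high, Y):
    """Sammon-type stress: squared distance errors weighted by 1/original distance."""
    iu = np.triu_indices_from(D_high, k=1)
    d_high, d_low = D_high[iu], pairwise_distances(Y)[iu]
    return np.sum((d_high - d_low) ** 2 / d_high) / np.sum(d_high)

def nonlinear_map(X, n_dims=2, n_iter=300, step=0.5, seed=0):
    """Gradient descent on the stress; a step is kept only if the stress decreases."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    D_high = pairwise_distances(X)
    c = D_high[np.triu_indices(n, k=1)].sum()
    eye = np.eye(n)                          # avoids division by zero on the diagonal
    Y = rng.normal(size=(n, n_dims))         # random starting configuration
    history = [sammon_stress(D_high, Y)]
    for _ in range(n_iter):
        D_low = pairwise_distances(Y)
        W = (D_high - D_low) / ((D_high + eye) * (D_low + eye))
        diff = Y[:, None, :] - Y[None, :, :]
        grad = -2.0 / c * (W[:, :, None] * diff).sum(axis=1)
        trial = Y - step * grad
        trial_stress = sammon_stress(D_high, trial)
        if trial_stress < history[-1]:
            Y = trial
            history.append(trial_stress)
            step *= 1.1                      # cautiously enlarge the step
        else:
            step *= 0.5                      # reject the move and shrink the step
    return Y, history

rng = np.random.default_rng(7)
X = rng.normal(size=(15, 5))                 # 15 simulated objects, 5 variables
Y, history = nonlinear_map(X)
print(history[0], history[-1])
```

The coordinates in `Y` have no meaning in terms of the original variables; only the interpoint distances are intended to be interpreted, in line with the remark above on the axes.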

Mediator Release Inhibitors (MRI) - A comprehensive review of this class of compounds has appeared recently. Non-linear regression analysis of a series of 51 drugs active in the rat PCA assay revealed that both the conformation of a drug and its capacity to act as an electron acceptor in charge-transfer interactions were critical for high activity. Studies continue to elucidate the mode of action of the prototype MRI, disodium cromoglycate (DSCG) and a comprehensive review of the... [Pg.97]

Multivariate display methods are very useful techniques for the inspection of high-dimensional data sets. They allow us to examine the relationships between points (compounds, samples, etc.) in both training and test sets, and between descriptor variables. Linear and non-linear methods are available, both with advantages and disadvantages, which have proved useful in numerous chemical applications. The linear approach (PCA) forms the basis of a variety of multivariate techniques as described later in this book. Finally, it is not possible to say in advance which, if any, is the best approach to use. [Pg.88]

The decomposition in eqn (4.30) is general for PCR, PLS and other regression methods. These methods differ in the criterion (and the algorithm) used for calculating P and, hence, they characterise the calibrators by different scores T. In PCR, T and P are found from the PCA of the data matrix R. Both the non-linear iterative partial least-squares (NIPALS) algorithm and the singular-value decomposition (SVD) (much used, see Appendix) of R can be used to obtain the T and P used in PCA/PCR. In PLS, other algorithms are used to obtain T and P (see Chapter 5). [Pg.289]
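
That NIPALS and SVD lead to the same T and P for PCA can be verified on simulated data. The sketch below assumes a column-centered R, a simple convergence test on the score vector and a starting vector taken from the column of largest variance; these details, like the function name, are illustrative and do not reproduce eqn (4.30).

```python
import numpy as np

def nipals_pca(R, n_components, tol=1e-12, max_iter=1000):
    """Extract principal components one at a time by NIPALS, with deflation."""
    E = R - R.mean(axis=0)                       # column-centering (assumed)
    T = np.zeros((R.shape[0], n_components))
    P = np.zeros((R.shape[1], n_components))
    for a in range(n_components):
        t = E[:, np.argmax(E.var(axis=0))].copy()   # starting score vector
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)                # regress columns of E on t
            p /= np.linalg.norm(p)               # normalize the loading vector
            t_new = E @ p                        # update the scores
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a], P[:, a] = t, p
        E = E - np.outer(t, p)                   # deflate before the next component
    return T, P

rng = np.random.default_rng(5)
R = rng.normal(size=(30, 4)) * np.array([5.0, 3.0, 1.5, 0.5])   # simulated data
T, P = nipals_pca(R, n_components=2)

# Reference solution from the SVD of the centered matrix
Rc = R - R.mean(axis=0)
U, sv, Vt = np.linalg.svd(Rc, full_matrices=False)
T_svd, P_svd = U[:, :2] * sv[:2], Vt[:2].T

# Components agree up to the arbitrary sign of each column
print(np.allclose(np.abs(T), np.abs(T_svd), atol=1e-6))
```

NIPALS computes only as many components as requested, whereas the SVD returns all of them at once; which is preferable depends on the size of R and on the number of components needed.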

Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...