Biplot

Historically, a distinction has been made between PCA of column-variables and that of row-variables. These are referred to as R-mode and Q-mode PCA, respectively. The modern approach is to consider both analyses as dual and to unify the two views (of rows and columns) into a single display, called a biplot, which will be discussed in greater detail later on. [Pg.88]

Fig. 31.2. Geometrical example of the duality of data space and the concept of a common factor space. (a) Representation of n rows (circles) of a data table X in a space S^p spanned by p columns. The pattern P^n is shown in the form of an equiprobability ellipse. The latent vectors V define the orientations of the principal axes of inertia of the row-pattern. (b) Representation of p columns (squares) of a data table X in a space S^n spanned by n rows. The pattern P^p is shown in the form of an equiprobability ellipse. The latent vectors U define the orientations of the principal axes of inertia of the column-pattern. (c) Result of rotation of the original column-space S^p toward the factor-space S^r spanned by r latent vectors. The original data table X is transformed into the score matrix S and the geometric representation is called a score plot. (d) Result of rotation of the original row-space S^n toward the factor-space S^r spanned by r latent vectors. The original data table X is transformed into the loading table L and the geometric representation is referred to as a loading plot. (e) Superposition of the score and loading plot into a biplot.

Since U and V express one and the same set of latent vectors, one can superimpose the score plot and the loading plot into a single display as shown in Fig. 31.2e. Such a display is called a biplot (Section 17.4), as it represents two entities (rows and columns of X) in a single plot [10]. The biplot plays an important role in the graphic display of the results of PCA. A fundamental property of PCA is that it obviates the need for two dual data spaces and instead produces a single space of latent variables. [Pg.108]

The interpretation of biplots is made easier by constructing axes in them. These axes are used in the same way as in a bivariate Cartesian diagram. Perpendicular projection of the points in the diagram upon a coordinate axis allows us to determine (or reconstruct) the values in the table. [Pg.112]

We consider the special biplot in which both rows and columns are represented in a single display of latent variables subjected to the constraint that α + β equals 1. As we have seen above, this constraint allows us to reconstruct the original data X from which the latent variables U, V and the latent values Λ have been computed (eq. (31.22)). [Pg.112]
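The role of the constraint can be illustrated with a short numerical sketch. It assumes, as is usual for this kind of factorization, that the scores are S = U Λ^α and the loadings L = V Λ^β, so that S L^T returns X exactly when α + β = 1. The function name `biplot_coordinates` and the random test matrix are illustrative assumptions, not taken from the source.

```python
import numpy as np

def biplot_coordinates(X, alpha=0.5, beta=0.5):
    """Return row scores S = U * sv**alpha and column loadings L = V * sv**beta.

    Hypothetical helper: U, sv, V come from the singular value decomposition of X.
    """
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    S = U * sv**alpha        # each latent column of U scaled by sv**alpha
    L = Vt.T * sv**beta      # each latent column of V scaled by sv**beta
    return S, L

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # illustrative data, not from the text

# alpha + beta = 1: the product of scores and loadings reconstructs X.
S, L = biplot_coordinates(X, alpha=0.5, beta=0.5)
reconstruction_ok = np.allclose(S @ L.T, X)

# alpha + beta = 2: the product is U diag(sv**2) V^T, which is not X.
S2, L2 = biplot_coordinates(X, alpha=1.0, beta=1.0)
reconstruction_fails = not np.allclose(S2 @ L2.T, X)
```

Other choices with α + β = 1 (for example α = 1, β = 0) reconstruct X equally well; they only shift the scaling between the row points and the column points.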

Figure 31.4 shows the biplot of the trace elements and wind directions for the case when α = β = 0.5. Since here α + β equals 1, we can reconstruct the values in the columns of the data table X by means of perpendicular projections upon unipolar axes. In Fig. 31.4a we have drawn a unipolar axis through Cl. Perpendicular projection of the four wind directions upon this axis reconstructs the order of the concentrations of Cl at the four wind directions as listed in Table 31.1. We have thus established a way that leads back from the graphic display to the tabulated data. This interpretation of the biplot emphasizes the one-to-one relationship between the data and the plot. Such a relationship is also inherent in the ordinary bivariate (or Cartesian) diagram. [Pg.113]

Fig. 31.4. (a) Biplot in which the concentrations of an atmospheric trace element (Cl) are reconstructed by perpendicular projection upon a unipolar axis, (b) Biplot in which the differences (contrasts) between two atmospheric trace elements (Cl, Si) are reproduced by perpendicular projection upon a bipolar axis. [Pg.114]
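The projection rule for unipolar and bipolar axes can be checked numerically. The sketch below assumes α = β = 0.5 and keeps all latent variables, so the reconstruction is exact; in a two-dimensional plot only the first two columns of S and L are available and the values are reproduced approximately. The helper `read_off` and the random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))              # illustrative data, not Table 31.1

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
S = U * np.sqrt(sv)                      # row points (alpha = 0.5)
L = Vt.T * np.sqrt(sv)                   # column points (beta = 0.5)

def read_off(points, axis_vector):
    """Project points perpendicularly onto the direction axis_vector and
    rescale the signed projection by the length of axis_vector."""
    length = np.linalg.norm(axis_vector)
    unit = axis_vector / length
    return (points @ unit) * length

j, k = 0, 1
# Unipolar axis: from the origin through column point j.
unipolar = read_off(S, L[j])             # equals column j of X
# Bipolar axis: direction from column point k to column point j.
bipolar = read_off(S, L[j] - L[k])       # equals column j minus column k
```

The rescaling by the axis length corresponds to calibrating the axis; without it the projections still reproduce the order of the values, which is what is read from Fig. 31.4a.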

Biplots constructed from this table are shown in Figs. 31.5 to 31.11. The horizontal and vertical axes of these biplots represent scores and loadings of the... [Pg.117]

Fig. 31.5. Biplot of 23 substituted chalcones (circles) and 8 chromatographic methods (squares) as described by their retention times in Table 31.2, after no transformation of the data. Areas of circles and squares are related to the mean retention times of the corresponding compounds and methods, as they appear in the margins of the table.

Fig. 31.6. Biplot of chromatographic retention times in Table 31.2, after column-centering of the data. Two unipolar axes and one bipolar axis have been drawn through the representations of the methods DMSO and methylenedichloride (CH2Cl2). The projections of three selected compounds are indicated by dashed lines. The values read off from the unipolar axes reproduce the retention times in the corresponding columns. The values on the bipolar axis reproduce the differences between retention times.

There are two outstanding poles on this biplot. DMSO and methylenedichloride are at a large distance from the origin and from one another. These poles are the most likely candidates for the construction of unipolar axes. As has been explained in the previous section, perpendicular projections of points (representing compounds) upon a unipolar axis (representing a method) lead to a reproduction of the data in Table 31.3. In this case we have to substitute the untransformed value in eq. (31.35) by z_ij of eq. (31.42) ... [Pg.121]

In the corresponding column-standardized biplot of Fig. 31.7 we find all representations of the eight chromatographic methods more or less at the same distance from the origin of space. The circle is distorted because of the large difference between the contributions of the first and second latent variables (95 and 4%) and the choice of α = β = 0.5 which has been made at the outset. The combined effect is an apparent dilation of the vertical axis. [Pg.123]

The distances between compounds in Fig. 31.7 are not notably affected by the transformation in comparison with the previous Fig. 31.6. This biplot makes it easier to perceive the correlations between measurements. Three clusters are now evident, namely (1) DMSO and DMF, (2) ethanol and propanol, (3) octanol, dioxane, THF and methylenedichloride. The line segments drawn from the origin have been added to emphasize these groupings. Unipolar axes could have been defined here in the same way as in Fig. 31.6. Bipolar axes on the column-standardized biplot, however, cannot be interpreted directly in terms of the original data in X. [Pg.123]

Fig. 31.8. Biplot of chromatographic retention times in Table 31.2, after log column-centering of the data. The values on the bipolar axis reproduce the (log) ratios between retention times in the two corresponding columns.

The biplot of Fig. 31.9 shows that both the centroids of the compounds and of the methods coincide with the origin (the small cross in the middle of the plot). The first two latent variables account for 83 and 14% of the inertia, respectively. Three percent of the inertia is carried by higher order latent variables. In this biplot only the bipolar axes can be interpreted directly in terms of the original data in X. Three prominent poles appear on this biplot: DMSO, methylenedichloride and ethanol. They are called poles because they are at a large distance from the origin and from one another. They are also representative of the three clusters that have already been identified on the column-standardized biplot in Fig. 31.7. [Pg.126]

The first bipolar axis (DMSO/ethanol) accounts for the contrast between compounds with NO2 substitutions and those without. Compounds with an NO2 substituent systematically obtain higher scores on this bipolar axis than others. The second bipolar axis (methylenedichloride/ethanol) seems to produce an order of the substituents according to their electronic properties. To emphasize this point we have reproduced the log double-centered biplot again in Fig. 31.10. The dashed line near the middle separates the class of NO2 substituted chalcones from the other compounds. Further, we have joined substituents by line segments according to the sequence CF3, F, H, methyl, ethyl, i-propyl, t-butyl, methoxy, phenyl and dimethylamine. The electronic properties of these substituents vary progressively from electron acceptors to electron donors [11] in accordance with their scores on the second bipolar axis. [Pg.127]

The size component which may be strongly present (as in this chromatographic application) is eliminated by the operation of double-centering. Hence, double-centered latent variables only express contrasts. In column-centered biplots one may find that one latent variable expresses mainly size and the others mainly contrasts. In general, none of the latter is a pure component of size or of contrasts. If we want to see size and some contrasts represented in a biplot, column-centering... [Pg.127]

Fig. 31.10. Same biplot of chromatographic retention times as in Fig. 31.9. The line segments connect compounds that share a common substituent. The horizontal contrast reflects the presence or absence of a NO2 substituent. The vertical contrast expresses the electronegativity of the substituents.

One can also state that the log double-centered biplot shows interactions between the rows and columns of the table. In the context of analysis of variance (ANOVA), interaction is the variance that remains in the data after removal of the main effects produced by the rows and columns of the table [12]. This is precisely the effect of double-centering (eq. (31.49)). [Pg.129]
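The removal of main effects can be made concrete with a small sketch. It assumes the usual form of double-centering, in which row means and column means are subtracted and the global mean is added back; eq. (31.49) is not reproduced in this excerpt, so this form is an assumption. The function name `double_center` and the random positive data are illustrative.

```python
import numpy as np

def double_center(Y):
    """Subtract row and column means and add back the global mean
    (assumed form of double-centering)."""
    row_means = Y.mean(axis=1, keepdims=True)
    col_means = Y.mean(axis=0, keepdims=True)
    return Y - row_means - col_means + Y.mean()

rng = np.random.default_rng(2)
X = rng.uniform(1.0, 10.0, size=(6, 4))   # positive illustrative data
Z = double_center(np.log(X))              # log double-centered table

# A table made only of a constant plus row and column main effects
# has no interaction, so double-centering leaves nothing.
row_effect = rng.normal(size=(6, 1))
col_effect = rng.normal(size=(1, 4))
Z_additive = double_center(3.0 + row_effect + col_effect)

# Multiplying rows and columns by positive "size" factors only adds
# main effects on the log scale, so Z is unchanged.
X_scaled = X * rng.uniform(0.5, 2.0, size=(6, 1)) * rng.uniform(0.5, 2.0, size=(1, 4))
Z_scaled = double_center(np.log(X_scaled))
```

Here `Z_additive` is zero up to rounding and `Z_scaled` equals `Z`, which illustrates why a log double-centered table carries only contrasts and no size component.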

Sometimes it is claimed that the double-centered biplot of latent variables 1 and 2 is identical to the column-centered biplot of latent variables 2 and 3. This is only the case when the first latent variable coincides with the main diagonal of the data space (i.e. the line that makes equal angles with all coordinate axes). In the present application of chromatographic data this is certainly not the case and the results are different. Note that projection of the compounds upon the main diagonal produces the size component. [Pg.129]

The analysis of Table 31.2 by CFA is shown in Fig. 31.11. As can be seen, the result is very similar to that obtained by log double-centering in Figs. 31.9 and 31.10. The first latent variable expresses a contrast between NO2 substituted chalcones and the others. The second latent variable seems to be related to the electronic properties of the substituents. The combined contribution of the two latent variables to the total inertia is 96%. The double-closed biplot of Fig. 31.11 does not allow a direct interpretation of unipolar and bipolar axes in terms of the original data X. The other rules of interpretation are similar to those of the log double-centered biplot in the previous subsection. Compounds and methods that seem to have moved away from the center in the same direction possess a positive interaction (attraction). Those that moved in opposite directions show a negative interaction (repulsion). [Pg.132]

The NIPALS algorithm is easy to program, particularly with a matrix-oriented computer notation, and is highly efficient when only a few latent vectors are required, such as for the construction of a two-dimensional biplot. It is also suitable for implementation in personal or portable computers with limited hardware resources. [Pg.136]
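As an illustration of how compact such a program can be, the following is a minimal sketch of one common formulation of NIPALS: alternating regressions for one latent vector at a time, followed by deflation. It is not necessarily the exact variant described in the source; the function name `nipals`, the starting vector, and the stopping rule are assumptions.

```python
import numpy as np

def nipals(X, n_components=2, tol=1e-10, max_iter=500):
    """Extract the first few latent vectors one at a time (sketch).

    Returns scores (n x n_components) and unit-length loadings
    (p x n_components).
    """
    X = np.array(X, dtype=float)            # working copy, deflated below
    n, p = X.shape
    scores = np.zeros((n, n_components))
    loadings = np.zeros((p, n_components))
    for a in range(n_components):
        # Start from the column with the largest sum of squares.
        t = X[:, np.argmax((X**2).sum(axis=0))].copy()
        for _ in range(max_iter):
            v = X.T @ t                     # regress columns of X on t
            v /= np.linalg.norm(v)          # normalize the loading vector
            t_new = X @ v                   # regress rows of X on v
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        scores[:, a] = t
        loadings[:, a] = v
        X -= np.outer(t, v)                 # deflation: remove this component
    return scores, loadings

rng = np.random.default_rng(3)
data = rng.normal(size=(8, 5))              # illustrative data
Xc = data - data.mean(axis=0)               # column-centered
scores, loadings = nipals(Xc, n_components=2)
sv = np.linalg.svd(Xc, compute_uv=False)    # reference singular values
```

In this formulation the scores carry the full singular values (α = 1, β = 0). Because only matrix-vector products are needed per iteration, the cost stays low when two components suffice.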

The logarithmic transformation prior to column- or double-centered PCA (Section 31.3) can be considered as a special case of non-linear PCA. The procedure tends to make the row- and column-variances more homogeneous, and allows us to interpret the resulting biplots in terms of log ratios. [Pg.150]

The theory of the non-linear PCA biplot has been developed by Gower [49] and can be described as follows. We first assume that a column-centered measurement table X is decomposed by means of classical (or linear) PCA into a matrix of factor scores S and a matrix of factor loadings L ... [Pg.150]

Fig. 31.17. (a) In a classical PCA biplot, data values x_ij can be estimated by means of perpendicular projection of the ith row-point upon a unipolar axis which represents the jth column-item of the data table X. In this case the axis is a straight line through the origin (represented by a small cross). (b) In a non-linear PCA biplot, the jth column-item traces out a curvilinear trajectory. The data value is now estimated by defining the shortest distance between the ith row-point and the jth trajectory. [Pg.151]

The same idea can be developed in the case of a non-Euclidean metric such as the city-block metric or L1-norm (Section 31.6.1). Here we find that the trajectories traced out by the variable coefficient k_j are curvilinear, rather than linear. Markers between equidistant values on the original scales of the columns of X are usually not equidistant on the corresponding curvilinear trajectories of the non-linear biplot (Fig. 31.17b). Although the curvilinear trajectories intersect at the origin of space, the latter does not necessarily coincide with the centroid of the row-points of X. We briefly describe here the basic steps of the algorithm and we refer to the original work of Gower [53,54] for a formal proof. [Pg.152]

The r coordinates of the variable point which traces out the trajectory of the jth column-item in the r-dimensional biplot are compiled in the r-vector s_j. The elements of the latter can be estimated by means of linear regression of the n×r factor scores S upon an n-vector which is defined as ... [Pg.152]

Note that the two mean terms represent the row means and the global mean of the squared distance matrix D, respectively. The latter need only be computed once. The n distances d(k_j) of the variable point to the n row-points, however, must be re-evaluated for every column-item of X and for every marker that is to appear in the corresponding trajectory in the biplot. Usually, the range of k_j is limited to the interval between the minimum and maximum values in the jth column of X, or somewhat beyond (say 10% of the range on either side). [Pg.153]

K. R. Gabriel, The biplot graphic display of matrices with applications to principal components analysis. Biometrika, 58 (1971) 453-467. [Pg.158]

Correspondence factor analysis can be described in three steps. First, one applies a transformation to the data which involves one of the three types of closure that have been described in the previous section. This step also defines two vectors of weight coefficients, one for each of the two dual spaces. The second step comprises a generalization of the usual singular value decomposition (SVD) or eigenvalue decomposition (EVD) to the case of weighted metrics. In the third and last step, one constructs a biplot for the geometrical representation of the rows and columns in a low-dimensional space of latent vectors. [Pg.183]
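The three steps can be sketched in a few lines. The sketch below follows the standard correspondence analysis formulation, which may differ in notation and scaling from the source: weights are the row and column totals of the table divided by the grand total, the generalized SVD is obtained from an ordinary SVD of the weighted deviations, and the biplot coordinates use α = β = 0.5. The function name, the small contingency table, and the symmetric scaling are assumptions for illustration.

```python
import numpy as np

def correspondence_analysis(N):
    """Sketch of CFA on a contingency table N with positive margins."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                         # relative frequencies
    w_row = P.sum(axis=1)                   # row weights
    w_col = P.sum(axis=0)                   # column weights
    E = np.outer(w_row, w_col)              # expected values under independence

    # Step 1: deviations from expected values, weighted so that an
    # ordinary SVD corresponds to the generalized (weighted-metric) SVD.
    R = (P - E) / np.sqrt(E)

    # Step 2: usual singular vectors U, V and singular values.
    U, sv, Vt = np.linalg.svd(R, full_matrices=False)
    A = U / np.sqrt(w_row)[:, None]         # generalized singular vectors (rows)
    B = Vt.T / np.sqrt(w_col)[:, None]      # generalized singular vectors (columns)

    # Step 3: biplot coordinates with alpha = beta = 0.5 (assumed).
    row_coords = A * np.sqrt(sv)
    col_coords = B * np.sqrt(sv)
    return row_coords, col_coords, sv

# Hypothetical 4 x 3 contingency table.
N = np.array([[20.0, 5.0, 3.0],
              [6.0, 18.0, 4.0],
              [2.0, 7.0, 15.0],
              [10.0, 9.0, 8.0]])
row_coords, col_coords, sv = correspondence_analysis(N)

P = N / N.sum()
E = np.outer(P.sum(axis=1), P.sum(axis=0))
Z = (P - E) / E                             # observed / expected - 1
total_inertia = (sv**2).sum()
```

Under these assumptions the product `row_coords @ col_coords.T` reproduces `Z`, the relative deviations from the expected values, and `total_inertia` equals the chi-square statistic of the table divided by its grand total.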

In CFA we can derive biplots for each of the three types of transformed contingency tables which we have discussed in Section 32.3 (i.e., by means of row-, column- and double-closure). These three transformations produce, respectively, the deviations (from expected values) of the row-closed profiles F, of the column-closed profiles G and of the double-closed data Z. It should be recalled that each of these transformations is associated with a different metric as defined by W_n and W_p. Because of this, the generalized singular vectors A and B will also differ. The usual latent vectors U, V and the matrix of singular values Λ, however, are identical in all three cases, as will be shown below. Note that the usual singular vectors U and V are extracted from the matrix ... [Pg.187]

In what follows we restrict the discussion of biplots to the case of double-closed data Z as defined by the elements ... [Pg.187]
