Column-centering

Column-centering is a customary form of preprocessing in principal components analysis (Section 17.6.1). It involves subtracting the corresponding column-mean from each element of the table X (i.e. y_ij = x_ij - m_j, where m_j is the mean of column j). [Pg.119]
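As a minimal sketch, column-centering subtracts the vector of column-means from every row. The sketch assumes NumPy, with rows as objects and columns as variables. The helper name `column_center` is hypothetical and the small table is made up for illustration.

```python
import numpy as np

def column_center(X):
    """Subtract the column-mean m_j from every element of column j."""
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0)   # vector of column-means, one per column
    return X - m         # broadcasting subtracts m_j from each element of column j

# Small made-up table: 3 rows (objects) by 2 columns (variables).
X = np.array([[1.0, 10.0],
              [3.0, 20.0],
              [5.0, 60.0]])
Y = column_center(X)     # column-means of Y are zero (up to rounding)
```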

Atmospheric data from Table 31.1, after column-centering. [Pg.120]

After this transformation the column-means m_p are zero, as shown in Table 31.4. [Pg.120]

In this case the column-norms d_p are called column-standard deviations. The squares of these numbers are the column-variances, whose sum represents the global variance in the data. Note that the column-variances are heterogeneous, that is, very different from each other. [Pg.120]
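The sketch below computes the column-standard deviations and column-variances and sums the variances into a global variance. It assumes a divisor of n (the number of rows); with n - 1 the values change slightly but their relative sizes do not. The helper name is hypothetical and the table is made up. Even this tiny example shows strongly heterogeneous column-variances.

```python
import numpy as np

def column_std_and_var(X):
    """Column-standard deviations d and column-variances d**2 (divisor n assumed)."""
    X = np.asarray(X, dtype=float)
    Y = X - X.mean(axis=0)                 # column-centering
    d = np.sqrt((Y ** 2).mean(axis=0))     # root-mean-square of each centered column
    return d, d ** 2

X = np.array([[1.0, 10.0],
              [3.0, 20.0],
              [5.0, 60.0]])
d, v = column_std_and_var(X)
global_variance = v.sum()                  # sum of the column-variances
```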

There are two outstanding poles on this biplot: DMSO and methylenedichloride lie at a large distance from the origin and from one another. These poles are the most likely candidates for the construction of unipolar axes. As explained in the previous section, perpendicular projections of points (representing compounds) upon a unipolar axis (representing a method) lead to a reproduction of the data in Table 31.3. In this case we have to substitute the untransformed value in eq. (31.35) by z_ij of eq. (31.42). [Pg.121]

Fig. 29.8. (a) Pattern of points in column-space S^p (left panel) and in row-space S^n (right panel) before column-centering. (b) After column-centering, the pattern in S^p is translated such that the centroid coincides with the origin of space. Distances between points in S^p are conserved while those in S^n are not. (c) After column-standardization, distances between points in both S^p and S^n are changed. Points in S^n are located on a (hyper)sphere centered around the origin of space. [Pg.44]

A special form of cross-product matrix is the variance-covariance matrix (or covariance matrix for short) C_p, which is based on the column-centered matrix Y_p derived from an original matrix X ... [Pg.49]
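The covariance matrix can be built directly from the column-centered matrix, as sketched below. The sketch assumes C_p = Y_p^T Y_p / n, with divisor n; some texts use n - 1. The helper name `covariance_matrix` is hypothetical.

```python
import numpy as np

def covariance_matrix(X):
    """Variance-covariance matrix C_p from the column-centered matrix Y_p.

    Assumes C_p = Y_p^T Y_p / n (divisor n, not n - 1).
    """
    X = np.asarray(X, dtype=float)
    Yp = X - X.mean(axis=0)          # column-centered matrix Y_p
    return Yp.T @ Yp / X.shape[0]    # p x p matrix of (co)variances
```

The diagonal of C_p holds the column-variances. The off-diagonal elements are the covariances between pairs of columns.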

In the following section on preprocessing of the data we will show that column-centering of X leads to an interpretation of the sums of squares and cross-products in terms of the variances and covariances of the columns of X. Furthermore, cos θ_jj' (the cosine of the angle between columns j and j') then becomes the coefficient of correlation between these columns. [Pg.112]
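The claim that the cosine between column-centered columns equals their correlation coefficient can be checked numerically. In this sketch the helper name `column_cosine` is hypothetical, and the result is compared against NumPy's `corrcoef`.

```python
import numpy as np

def column_cosine(X, j, k):
    """Cosine of the angle between columns j and k after column-centering."""
    X = np.asarray(X, dtype=float)
    Y = X - X.mean(axis=0)            # column-centering
    a, b = Y[:, j], Y[:, k]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```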

The vector of column-means m_p defines the coordinates of the centroid (or center of mass) of the row-pattern P^n that represents the rows in column-space S^p. Similarly, the vector of row-means m_n defines the coordinates of the center of mass of the column-pattern that represents the columns in row-space S^n. If the column-means are zero, then the centroid coincides with the origin of S^p and the data are said to be column-centered. If both row- and column-means are zero, then the centroids coincide with the origins of both S^n and S^p. In this case, the data are double-centered (i.e. centered with respect to both rows and columns). In this chapter we assume that all points possess unit mass (or weight), although the definitions can be extended to variable masses, as explained in Chapter 32. [Pg.116]
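Double-centering can be sketched as subtracting both the row-means and the column-means and then adding back the global mean. This makes both centroids coincide with the origin. The sketch assumes unit masses, as in the text; the helper name `double_center` is hypothetical.

```python
import numpy as np

def double_center(X):
    """Subtract row-means and column-means, then add back the global mean (unit masses)."""
    X = np.asarray(X, dtype=float)
    row_means = X.mean(axis=1, keepdims=True)  # centroid of the column-pattern in row-space
    col_means = X.mean(axis=0, keepdims=True)  # centroid of the row-pattern in column-space
    return X - row_means - col_means + X.mean()
```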

Fig. 31.6. Biplot of chromatographic retention times in Table 31.2, after column-centering of the data. Two unipolar axes and one bipolar axis have been drawn through the representations of the methods DMSO and methylenedichloride (CH2Cl2). The projections of three selected compounds are indicated by dashed lines. The values read off from the unipolar axes reproduce the retention times in the corresponding columns. The values on the bipolar axis reproduce the differences between retention times.

Column-standardization is the most widely used transformation. It is performed by division of each element of a column-centered table by its corresponding column-standard deviation (i.e. the square root of the column-variance) ... [Pg.122]
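Column-standardization divides each column-centered column by its column-standard deviation. The sketch assumes a divisor of n, and the helper name is hypothetical. Afterwards every column has the same norm, sqrt(n), which is why the column points in row-space lie on a (hyper)sphere around the origin. A column with zero variance would cause a division by zero and must be removed or handled beforehand.

```python
import numpy as np

def column_standardize(X):
    """Column-center, then divide each column by its standard deviation (divisor n)."""
    X = np.asarray(X, dtype=float)
    Y = X - X.mean(axis=0)                 # column-centering
    d = np.sqrt((Y ** 2).mean(axis=0))     # column-standard deviations
    return Y / d
```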

The transformation by log column-centering consists of taking logarithms followed by column-centering. The choice of the base of the logarithms has no effect on the interpretation of the result, but decimal logs will be used throughout. [Pg.123]

In this case the original data in X are required to be strictly positive. The effect of the transformation is shown in Table 31.6. Column-means are zero, while column-standard deviations tend to be more homogeneous than in the case of simple column-centering (Table 31.4), as can be seen by inspecting the corresponding values for Na and Cl. [Pg.124]
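Log column-centering can be sketched as taking logarithms and then column-centering, with a guard for the positivity requirement. The helper name and the `base` parameter are hypothetical. Changing the base only multiplies every transformed value by the same constant, which is why the choice of base does not affect the interpretation.

```python
import numpy as np

def log_column_center(X, base=10.0):
    """Take logarithms (default: decimal), then column-center. Requires X > 0."""
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("log column-centering requires strictly positive data")
    L = np.log(X) / np.log(base)     # logarithm in the chosen base
    return L - L.mean(axis=0)        # column-centering of the logs
```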

With log column-centering we obtain unipolar axes by substituting eq. (31.46) in eq. (31.35) ... [Pg.124]

Fig. 31.8. Biplot of chromatographic retention times in Table 31.2, after log column-centering of the data. The values on the bipolar axis reproduce the (log) ratios between retention times in the two corresponding columns.

A bipolar axis through columns j and j' can be interpreted in the same way as in the log column-centered case (eq. (31.48)), since the row-mean m_i and the global mean m cancel out in the difference between the two columns. The first (close to horizontal) axis between DMSO and ethanol represents the (log) ratios of the corresponding retention times. They can be read off by vertical projection of the compounds on this scale. Note that the scale is divided logarithmically. In the same way, one can read off the (log) ratios of methylenedichloride and ethanol from the second (close to vertical) axis in Fig. 31.9. Graphical estimation of these contrasts for the dimethylamine-NO2 substituted chalcone gives 9.5 on the DMSO/ethanol axis and 6.2 on the methylenedichloride/ethanol axis of Fig. 31.9. The exact ratios from Table 31.2 are 10.00 and 6.14, respectively. [Pg.127]
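The cancellation argument can be checked numerically. The sketch below uses hypothetical helper names, decimal logs and strictly positive made-up data. It compares the difference between two columns (a bipolar-axis reading) after log column-centering and after log double-centering. In both cases the difference equals log(x_ij / x_ij') shifted by the same constant, the difference of the two column-means of the logs.

```python
import numpy as np

def log_column_centered(X):
    L = np.log10(np.asarray(X, dtype=float))
    return L - L.mean(axis=0)

def log_double_centered(X):
    L = np.log10(np.asarray(X, dtype=float))
    return L - L.mean(axis=1, keepdims=True) - L.mean(axis=0) + L.mean()

def contrast(Z, j, k):
    """Difference between columns j and k for every row (a bipolar-axis reading)."""
    return Z[:, j] - Z[:, k]
```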

The size component, which may be strongly present (as in this chromatographic application), is eliminated by double-centering. Hence, double-centered latent variables express only contrasts. In column-centered biplots one may find that one latent variable expresses mainly size and the others mainly contrasts. In general, none of these is a pure component of size or of contrasts. If we want to see size and some contrasts represented in a biplot, column-centering... [Pg.127]
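The elimination of size by double-centering is easy to see in the log domain. There a per-row size factor becomes an additive row constant, which the row-centering step absorbs. In this sketch the helper name and the random "size" factors are hypothetical. Multiplying every row by an arbitrary positive factor leaves the log double-centered table unchanged.

```python
import numpy as np

def log_double_center(X):
    L = np.log10(np.asarray(X, dtype=float))
    return L - L.mean(axis=1, keepdims=True) - L.mean(axis=0) + L.mean()

rng = np.random.default_rng(1)
X = rng.uniform(1.0, 10.0, size=(5, 3))
size = rng.uniform(0.5, 20.0, size=(5, 1))   # hypothetical per-row size factors
X_scaled = X * size                           # same contrasts, different sizes
```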

Sometimes it is claimed that the double-centered biplot of latent variables 1 and 2 is identical to the column-centered biplot of latent variables 2 and 3. This is only the case when the first latent variable coincides with the main diagonal of the data space (i.e. the line that makes equal angles with all coordinate axes). In the present application of chromatographic data this is certainly not the case and the results are different. Note that projection of the compounds upon the main diagonal produces the size component. [Pg.129]
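The size component can be sketched as the projection of each row onto the unit vector along the main diagonal, (1, ..., 1)/sqrt(p). That projection is simply the row sum divided by sqrt(p). The helper name is hypothetical.

```python
import numpy as np

def size_component(X):
    """Project each row onto the main diagonal (unit vector with equal components)."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    u = np.ones(p) / np.sqrt(p)   # makes equal angles with all coordinate axes
    return X @ u
```

For double-centered data every row sums to zero, so this projection vanishes. This is another way to see that double-centering removes size.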

The power algorithm [21] is the simplest iterative method for calculating the latent vectors and latent values of a square symmetric matrix. In contrast to NIPALS, which produces an orthogonal decomposition of a rectangular data table X, the power algorithm decomposes a square symmetric matrix of cross-products, such as X^T X, which we denote by C_p. Note that C_p is called the column-variance-covariance matrix when the data in X are column-centered. [Pg.138]
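A generic power iteration with Hotelling-style deflation is one common way to realize the power algorithm. It is not necessarily the exact variant of reference [21]. The sketch assumes C is symmetric positive semidefinite (as X^T X is), so the dominant latent value is positive and the iterates settle on a fixed sign. The function name and the `tol`, `max_iter` and `seed` parameters are illustrative.

```python
import numpy as np

def power_algorithm(C, n_components=1, tol=1e-10, max_iter=10_000, seed=0):
    """Leading latent values/vectors of a symmetric positive semidefinite matrix C.

    Power iteration for each component, followed by deflation
    C <- C - lambda * v v^T to expose the next component.
    """
    C = np.array(C, dtype=float)          # private copy; deflation modifies it
    rng = np.random.default_rng(seed)
    values, vectors = [], []
    for _ in range(n_components):
        v = rng.normal(size=C.shape[0])
        v /= np.linalg.norm(v)            # random unit starting vector
        for _ in range(max_iter):
            w = C @ v
            w_norm = np.linalg.norm(w)
            if w_norm == 0.0:             # nothing left after deflation
                break
            w /= w_norm
            converged = np.linalg.norm(w - v) < tol
            v = w
            if converged:
                break
        lam = v @ C @ v                   # Rayleigh quotient = latent value
        values.append(lam)
        vectors.append(v)
        C -= lam * np.outer(v, v)         # deflation
    return np.array(values), np.column_stack(vectors)
```

Convergence slows when the two largest latent values are close, because each iteration reduces the error by roughly their ratio.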

It can be regarded as a special case of the squared weighted Euclidean distance (Section 30.2.2.1). A property of (weighted) Euclidean distance functions is that the distances D between row-items are invariant under column-centering of the table X ... [Pg.146]

Geometrically, column-centering of X is equivalent to a translation of the origin of column-space toward the centroid of the points which represent the rows of the data table X. Hence, the operation of column-centering leaves distances between the row-points unchanged. [Pg.147]
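The invariance of row distances under column-centering can be checked directly. Subtracting the same column-mean from two rows leaves their difference unchanged. In this sketch the helper name and the column weights `w` are hypothetical.

```python
import numpy as np

def weighted_sq_distances(X, w):
    """Squared weighted Euclidean distances between all pairs of rows."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]     # pairwise row differences, shape (n, n, p)
    return (w * diff ** 2).sum(axis=2)       # weight each column, sum over columns
```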

The theory of the non-linear PCA biplot has been developed by Gower [49] and can be described as follows. We first assume that a column-centered measurement table X is decomposed by means of classical (or linear) PCA into a matrix of factor scores S and a matrix of factor loadings L ... [Pg.150]
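A linear PCA of a column-centered table can be sketched with the singular value decomposition. Here the scores S are the left singular vectors scaled by the singular values and the loadings L are the orthonormal right singular vectors, so X = S L^T. This particular scaling is an assumption; biplot conventions differ in how the singular values are split between S and L. The helper name is hypothetical.

```python
import numpy as np

def pca_scores_loadings(X):
    """Decompose a column-centered X into factor scores S and loadings L, X = S L^T."""
    X = np.asarray(X, dtype=float)
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    S = U * sv        # scores: each left singular vector scaled by its singular value
    L = Vt.T          # loadings: orthonormal right singular vectors
    return S, L
```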

Big Chemical Encyclopedia
