Latent vectors

The relationships between scores, loadings and latent vectors can be written in a compact way by means of the so-called transition formulae. ... [Pg.100]

Summarizing, we find that, depending on the choice of α and β, we are able to reconstruct different features of the data in factor-space by means of the latent vectors. On the one hand, if α = 1 then we can reproduce the cross-products C between the rows of the table. On the other hand, if β equals 1 then we are able to reproduce the cross-products between columns of the table. Clearly, we can have both α = 1 and β = 1 and reproduce cross-products between rows as well as between columns. In the following section we will explain that cross-products can be related to distances between the geometrical representations of the corresponding rows or columns. [Pg.102]
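
As an illustration of this property, the following numpy sketch decomposes a small, invented data table and checks that the scores reproduce the row cross-products when α = 1 and that the loadings reproduce the column cross-products when β = 1. The matrix X and all variable names are hypothetical and serve only to illustrate the α/β convention.

```python
import numpy as np

# Hypothetical 4x3 data table (rows = objects, columns = variables).
X = np.array([[2.0, 1.0, 0.5],
              [1.5, 3.0, 1.0],
              [0.5, 2.0, 2.5],
              [3.0, 0.5, 1.5]])

# Singular value decomposition X = U * diag(lam) * V^T  (cf. eq. (31.1)).
U, lam, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

alpha, beta = 1.0, 1.0                  # exponents for scores and loadings
S = U @ np.diag(lam ** alpha)           # scores   S = U * Lambda^alpha
L = V @ np.diag(lam ** beta)            # loadings L = V * Lambda^beta

# alpha = 1: scores reproduce the cross-products between the rows of X.
print(np.allclose(S @ S.T, X @ X.T))    # True
# beta = 1: loadings reproduce the cross-products between the columns of X.
print(np.allclose(L @ L.T, X.T @ X))    # True
```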

An important aspect of latent vector analysis is the number of latent vectors that are retained. So far, we have assumed that all latent vectors are involved in the reconstruction of the data table (eq. (31.1)) and the matrices of cross-products (eq. (31.3)). In practical situations, however, we only retain the most significant latent vectors, i.e. those that contribute a significant part to the global sum of squares c (eq. (31.8)). [Pg.102]

If we only include the first r latent variables we have to redefine our relationships between data, latent vectors and latent values ... [Pg.102]

A measure for the goodness of the reconstruction is provided by the relative contribution γ of the retained latent vectors to the global sum of squares c (eq. (31.8)) ... [Pg.103]

In our example, we find that γ = 0.895 for r = 1. We will discuss various methods which can guide the choice of the number of relevant latent vectors r in Section 31.5. [Pg.103]
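
A minimal sketch of the truncated reconstruction and of the relative contribution γ, again on invented data; it assumes that γ is the ratio of the retained squared singular values to the global sum of squares c.

```python
import numpy as np

# Hypothetical data table (invented values).
X = np.array([[2.0, 1.0, 0.5],
              [1.5, 3.0, 1.0],
              [0.5, 2.0, 2.5],
              [3.0, 0.5, 1.5]])

U, lam, Vt = np.linalg.svd(X, full_matrices=False)

r = 1                                         # number of retained latent vectors
X_r = U[:, :r] @ np.diag(lam[:r]) @ Vt[:r]    # rank-r reconstruction of X

c = np.sum(lam ** 2)                          # global sum of squares, equals ||X||_F^2
gamma = np.sum(lam[:r] ** 2) / c              # relative contribution of the first r latent vectors
print(round(gamma, 3))
```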

In the previous section we have developed principal components analysis (PCA) from the fundamental theorem of singular value decomposition (SVD). In particular we have shown by means of eq. (31.1) how an n×p rectangular data matrix X can be decomposed into an n×r orthonormal matrix of row-latent vectors U, a p×r orthonormal matrix of column-latent vectors V and an r×r diagonal matrix of latent values Λ. Now we focus on the geometrical interpretation of this algebraic decomposition. [Pg.104]
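
The decomposition can be verified numerically; the sketch below applies numpy's SVD to a synthetic n×p matrix and checks the dimensions and orthonormality claimed above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))               # synthetic n x p data matrix (n = 5, p = 3)

U, lam, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T
Lam = np.diag(lam)
r = lam.size

print(U.shape, V.shape, Lam.shape)        # (5, 3) (3, 3) (3, 3): n x r, p x r, r x r
print(np.allclose(U.T @ U, np.eye(r)))    # True: row-latent vectors are orthonormal
print(np.allclose(V.T @ V, np.eye(r)))    # True: column-latent vectors are orthonormal
print(np.allclose(U @ Lam @ V.T, X))      # True: the product restores X (eq. (31.1))
```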

In Fig. 31.2a we have represented the ith row x_i of the data table X as a point of the row-pattern P^n in column-space S^p. The additional axes v1 and v2 correspond with the columns of V, which are the column-latent vectors of X. They define the orientation of the latent vectors in column-space S^p. In the case of a symmetrical pattern such as in Fig. 31.2, one can interpret the latent vectors as the axes of symmetry or principal axes of the elliptic equiprobability envelopes. In the special case of multinormally distributed data, v1 and v2 appear as the major and minor... [Pg.104]

Fig. 31.2. Geometrical example of the duality of data space and the concept of a common factor space. (a) Representation of n rows (circles) of a data table X in a space S^p spanned by p columns. The pattern P^n is shown in the form of an equiprobability ellipse. The latent vectors V define the orientations of the principal axes of inertia of the row-pattern. (b) Representation of p columns (squares) of a data table X in a space S^n spanned by n rows. The pattern P^p is shown in the form of an equiprobability ellipse. The latent vectors U define the orientations of the principal axes of inertia of the column-pattern. (c) Result of rotation of the original column-space S^p toward the factor-space S^r spanned by r latent vectors. The original data table X is transformed into the score matrix S and the geometric representation is called a score plot. (d) Result of rotation of the original row-space S^n toward the factor-space S^r spanned by r latent vectors. The original data table X is transformed into the loading table L and the geometric representation is referred to as a loading plot. (e) Superposition of the score and loading plot into a biplot.
Since this latent vector is defined as the vector for which the sum of squares of the projections is maximum (eq. (31.5)), we can interpret v1 as an axis of maximal inertia ... [Pg.106]

We now consider a subspace of S^p which is orthogonal to v1 and we repeat the argument. This leads to v2, and in the multidimensional case to all r columns in V. By the geometrical construction, all r latent vectors are mutually orthogonal, and r is equal to the number of dimensions of the pattern of points represented by X. This number r is the rank of X and cannot exceed the number of columns p in X; in our case, it is smaller than the number of rows n in X (because we assume that n is larger than p). [Pg.106]

One of the earliest interpretations of latent vectors is that of lines of closest fit [9]. Indeed, if the inertia along v1 is maximal, then the inertia along the directions perpendicular to v1 must be minimal. This is similar to the criterion of orthogonal least squares regression, which minimizes the sum of squared deviations perpendicular to the regression line (Section 8.2.11). In ordinary least squares regression one minimizes the sum of squared deviations from the regression line in the direction of the dependent measurement, which assumes that the independent measurement is without error. Similarly, the plane formed by v1 and v2 is a plane of closest fit, in the sense that the sum of squared deviations perpendicular to the plane is minimal. Since latent vectors v contribute... [Pg.106]
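
The "line of closest fit" interpretation can be checked numerically. The sketch below, on invented two-dimensional data, compares the sum of squared perpendicular deviations from the first latent vector with that from the ordinary least-squares line; the helper function perp_ss is hypothetical.

```python
import numpy as np

# Invented, column-centered two-variable data.
X = np.array([[ 1.0,  0.8],
              [ 2.0,  1.9],
              [-1.0, -1.2],
              [-2.0, -1.5]])
X = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]                                     # first column-latent vector

def perp_ss(direction):
    """Sum of squared deviations perpendicular to a line through the centroid."""
    d = direction / np.linalg.norm(direction)
    residual = X - np.outer(X @ d, d)          # remove the projections onto d
    return np.sum(residual ** 2)

slope = np.polyfit(X[:, 0], X[:, 1], 1)[0]     # ordinary least-squares slope
ols_direction = np.array([1.0, slope])

# v1 minimizes the perpendicular deviations, so it cannot do worse than OLS.
print(perp_ss(v1) <= perp_ss(ols_direction))   # True
```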

Note that the inertia λ loaded by u1 in S^n is the same as that loaded by v1 in S^p. For this reason, we must consider u1 and v1 as two different expressions of one and the same latent vector. The former is developed in S^n while the latter is constructed in S^p. ... [Pg.107]

Once we have obtained the projections S and L of X upon the latent vectors V and U, we can do away with the original data spaces S^p and S^n. Since V and U are orthonormal vectors that span the space of latent vectors, each row i and each column j of X is now represented as a point in the factor-space S^r, as shown in Figs. 31.2c and d. The... [Pg.108]

Since U and V express one and the same set of latent vectors, one can superimpose the score plot and the loading plot into a single display as shown in Fig. 31.2e. Such a display is called a biplot (Section 17.4), as it represents two entities (rows and columns of X) in a single plot [10]. The biplot plays an important role in the graphic display of the results of PCA. A fundamental property of PCA is that it obviates the need for two dual data spaces and instead produces a single space of latent variables. [Pg.108]
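
A minimal matplotlib sketch of such a biplot, with invented data and the symmetric choice α = β = 0.5; it does not reproduce the book's figures, it only shows how scores and loadings can be superimposed in the plane of the first two latent vectors.

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented, column-centered data table.
X = np.array([[ 1.0, -0.5,  0.3],
              [ 0.2,  1.1, -0.8],
              [-1.4,  0.4,  0.9],
              [ 0.2, -1.0, -0.4]])

U, lam, Vt = np.linalg.svd(X, full_matrices=False)

alpha = beta = 0.5                          # one common symmetric choice of exponents
S = U @ np.diag(lam ** alpha)               # scores:   coordinates of the rows
L = Vt.T @ np.diag(lam ** beta)             # loadings: coordinates of the columns

fig, ax = plt.subplots()
ax.scatter(S[:, 0], S[:, 1], marker='o', label='rows (scores)')
ax.scatter(L[:, 0], L[:, 1], marker='s', label='columns (loadings)')
ax.axhline(0, linewidth=0.5)
ax.axvline(0, linewidth=0.5)
ax.set_xlabel('first latent vector')
ax.set_ylabel('second latent vector')
ax.legend()
plt.show()
```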

Preprocessing is the operation which precedes the extraction of latent vectors from the data. It is an operation which is carried out on all the elements of an original data table X and which produces a transformed data table Z. We will discuss six common methods of preprocessing, including the trivial case in which the original data are left unchanged. The effects of each of these six types of preprocessing will be illustrated numerically by means of the small 4×3 data table from the study of trace elements in atmospheric samples which has been used in previous sections (Table 31.1). The various effects of the transformations can be observed from the two summary statistics (mean and norm). These statistics include the vector of column-means m and the vector of column-norms of the transformed data table Z ... [Pg.115]
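
The sketch below illustrates a few of these preprocessing methods (no transformation, column-centering, standardization and a log transformation) on an invented 4×3 table, reporting the column-means and column-norms after each step; the numbers are not those of Table 31.1.

```python
import numpy as np

# Invented 4x3 table standing in for the trace-element data of Table 31.1.
X = np.array([[10.0, 2.0, 0.5],
              [12.0, 1.5, 0.8],
              [ 8.0, 3.0, 0.4],
              [11.0, 2.5, 0.7]])

def summary(Z, label):
    """Print the two summary statistics used in the text: column-means and column-norms."""
    print(label, 'means:', Z.mean(axis=0).round(3),
          'norms:', np.linalg.norm(Z, axis=0).round(3))

summary(X, 'no transformation')              # trivial case: original data left unchanged
Zc = X - X.mean(axis=0)                      # column-centering: column-means become zero
summary(Zc, 'column-centering ')
Zs = Zc / X.std(axis=0)                      # standardization (autoscaling)
summary(Zs, 'standardization  ')
Zl = np.log10(X)                             # logarithmic transformation
summary(Zl, 'log transform    ')
```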

A large variety of algorithms is available for the extraction of latent vectors from rectangular data tables and from square symmetric matrices. We discuss only a few of these very briefly. [Pg.134]

The NIPALS algorithm is easy to program, particularly with a matrix-oriented computer notation, and is highly efficient when only a few latent vectors are required, such as for the construction of a two-dimensional biplot. It is also suitable for implementation in personal or portable computers with limited hardware resources. [Pg.136]
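
A compact sketch of a NIPALS-type iteration for extracting a few latent vectors; it follows the usual score/loading alternation with deflation, but details such as the starting vector and the convergence test are choices made here, and the data are synthetic.

```python
import numpy as np

def nipals(X, n_components, tol=1e-10, max_iter=500):
    """Extract the first few latent vectors of X by NIPALS-type iteration.

    Returns scores T (n x r) and normalized loadings P (p x r); X is assumed
    to have been preprocessed already (e.g. column-centered)."""
    X = X.astype(float).copy()
    n, p = X.shape
    T = np.zeros((n, n_components))
    P = np.zeros((p, n_components))
    for k in range(n_components):
        t = X[:, np.argmax(np.sum(X ** 2, axis=0))]    # start from the largest column
        for _ in range(max_iter):
            p_vec = X.T @ t / (t @ t)                  # regress columns on the score
            p_vec /= np.linalg.norm(p_vec)             # normalize the loading
            t_new = X @ p_vec                          # regress rows on the loading
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        T[:, k], P[:, k] = t, p_vec
        X -= np.outer(t, p_vec)                        # deflate before the next component
    return T, P

# The first loading should match the first column-latent vector up to its sign.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 4))
X -= X.mean(axis=0)
T, P = nipals(X, n_components=2)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(np.abs(P[:, 0]), np.abs(Vt[0]), atol=1e-6))   # True
```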

The power algorithm [21] is the simplest iterative method for the calculation of latent vectors and latent values from a square symmetric matrix. In contrast to NIPALS, which produces an orthogonal decomposition of a rectangular data table X, the power algorithm decomposes a square symmetric matrix of cross-products of X, which we denote by Cp. Note that Cp is called the column-variance-covariance matrix when the data in X are column-centered. [Pg.138]
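
A sketch of the power algorithm applied to a column cross-product matrix, using repeated multiplication and deflation; the starting vector and tolerances are arbitrary choices here, and the data are synthetic.

```python
import numpy as np

def power_algorithm(C, n_components, tol=1e-12, max_iter=1000):
    """Latent vectors and latent values of a square symmetric matrix C
    by power iteration, with deflation between components."""
    C = C.astype(float).copy()
    p = C.shape[0]
    vectors = np.zeros((p, n_components))
    values = np.zeros(n_components)
    for k in range(n_components):
        v = np.ones(p) / np.sqrt(p)              # arbitrary normalized starting vector
        for _ in range(max_iter):
            w = C @ v                            # repeated multiplication by C
            w /= np.linalg.norm(w)
            if np.linalg.norm(w - v) < tol:
                v = w
                break
            v = w
        lam = v @ C @ v                          # latent value (Rayleigh quotient)
        vectors[:, k], values[k] = v, lam
        C -= lam * np.outer(v, v)                # deflation: remove the extracted component
    return values, vectors

# Column cross-product matrix Cp of a column-centered, synthetic data table.
rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))
X -= X.mean(axis=0)
Cp = X.T @ X
values, vectors = power_algorithm(Cp, n_components=2)
print(np.allclose(values, np.linalg.eigvalsh(Cp)[::-1][:2]))   # True: the two largest latent values
```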

Correspondence factor analysis can be described in three steps. First, one applies a transformation to the data which involves one of the three types of closure that have been described in the previous section. This step also defines two vectors of weight coefficients, one for each of the two dual spaces. The second step comprises a generalization of the usual singular value decomposition (SVD) or eigenvalue decomposition (EVD) to the case of weighted metrics. In the third and last step, one constructs a biplot for the geometrical representation of the rows and columns in a low-dimensional space of latent vectors. [Pg.183]
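
These three steps can be sketched for the double-closure case roughly as follows; the contingency table is invented, and the formulation (scaling by square roots of the row and column weights so that an ordinary SVD can be used) is one common way of implementing the weighted metric, which may differ in normalization details from the equations in the text.

```python
import numpy as np

# Invented contingency table (rows = categories of one measurement, columns = the other).
N = np.array([[12.0,  5.0,  3.0],
              [10.0,  8.0,  4.0],
              [ 6.0, 11.0,  7.0],
              [ 4.0, 13.0, 10.0]])

# Step 1: closure. Double-closure leads to the deviations of the relative
# frequencies from the values expected under row/column independence, and
# defines the row and column weights.
P = N / N.sum()
w_row = P.sum(axis=1)                          # row weights
w_col = P.sum(axis=0)                          # column weights
Z = P - np.outer(w_row, w_col)                 # double-closed deviations

# Step 2: SVD generalized to the weighted metrics, implemented here as an
# ordinary SVD of the rescaled matrix.
Zw = np.diag(1 / np.sqrt(w_row)) @ Z @ np.diag(1 / np.sqrt(w_col))
U, lam, Vt = np.linalg.svd(Zw, full_matrices=False)

# Step 3: row and column coordinates for a biplot in the space of the first
# two latent vectors (alpha = beta = 1 convention).
S = np.diag(1 / np.sqrt(w_row)) @ U @ np.diag(lam)
L = np.diag(1 / np.sqrt(w_col)) @ Vt.T @ np.diag(lam)
print(lam.round(3))                            # singular values; the last one is (near) zero
```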

The results of applying these operations to the double-closed data in Table 32.6 are shown in Table 32.7. The analysis yielded two latent vectors with associated singular values of 0.567 and 0.433. [Pg.183]

One can define transition formulae for the two sets of generalized latent vectors in A and B (see also Section 31.1.6) ... [Pg.185]

These transition formulae express one set of generalized latent vectors (A or B) in terms of the other set (B or A). They follow readily from the definition of the generalized SVD problem which has been stated above. [Pg.185]

In CFA we can derive biplots for each of the three types of transformed contingency tables which we have discussed in Section 32.3 (i.e., by means of row-, column- and double-closure). These three transformations produce, respectively, the deviations (from expected values) of the row-closed profiles F, of the column-closed profiles G and of the double-closed data Z. Recall that each of these transformations is associated with a different metric, as defined by the corresponding weight matrices W. Because of this, the generalized singular vectors A and B will also be different. The usual latent vectors U, V and the matrix of singular values Λ, however, are identical in all three cases, as will be shown below. Note that the usual singular vectors U and V are extracted from the matrix. ... [Pg.187]

From the latent vectors and singular values one can compute the n×r generalized score matrix S and the p×r generalized loading matrix L. These matrices contain the coordinates of the rows and columns in the space spanned by the latent vectors ... [Pg.188]

The reconstruction of the transformed contingency table Z in a reduced space of latent vectors follows from ... [Pg.192]

CFA can also be defined as an expansion of a contingency table X using the generalized latent vectors in A and B and the singular values in Λ ... [Pg.192]

The three latent vectors account for respectively 86, 13 and 1% of the interaction. The next two columns in Tables 32.11 and 32.12 show the distances δ of the rows and of the columns from the origin of space and their contributions γ to the interaction ... [Pg.196]

The final column in Tables 32.11 and 32.12 lists the precisions π with which the rows and columns are represented in the plane spanned by the first two latent vectors ... [Pg.196]
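
A simplified sketch of how such distances, contributions and precisions can be computed from row coordinates; the coordinates and weights are invented, and the exact weighting used in Tables 32.11 and 32.12 is assumed rather than taken from the text.

```python
import numpy as np

# Invented row coordinates in the full space of latent vectors (n x r)
# and invented row weights.
S = np.array([[ 0.42, -0.10,  0.02],
              [ 0.15,  0.20, -0.03],
              [-0.25,  0.12,  0.01],
              [-0.35, -0.22, -0.01]])
w = np.array([0.30, 0.25, 0.25, 0.20])

delta = np.linalg.norm(S, axis=1)                        # distance of each row from the origin
contribution = w * delta ** 2 / np.sum(w * delta ** 2)   # share of each row in the interaction
precision = np.sum(S[:, :2] ** 2, axis=1) / delta ** 2   # fraction of the squared distance
                                                         # reproduced by the first two latent vectors
print(delta.round(3), contribution.round(3), precision.round(3))
```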

Figure 32.8 shows the biplot constructed from the first two columns of the scores matrix S and from the loadings matrix L (Table 32.11). This biplot corresponds with the exponents α = 1 and β = 1 in the definition of scores and loadings (eq. (39.41)). It is meant to reconstruct distances between rows and between columns. The rows and columns are represented by circles and squares respectively. Circles are connected in the order of the consecutive time intervals. The horizontal and vertical axes of this biplot are in the direction of the first and second latent vectors which account respectively for 86 and 13% of the interaction between rows and columns. Only 1% of the interaction is in the direction perpendicular to the plane of the plot. The origin of the frame of coordinates is indicated... [Pg.197]

In the case when one of the two measurements of the contingency table is divided into ordered categories, one can construct a so-called thermometer plot. On this plot we represent the ordered measurement along the horizontal axis and the scores of the dominant latent vectors along the vertical axis. The solid line in Fig. 32.9 displays the prominent features of the first latent vector which, in the context of our illustration, is called the women/men factor. It clearly indicates a sustained progress of the share of women doctorates from 1966 onwards. The dashed line corresponds with the second latent vector, which can be labelled as the chemistry/other-fields factor. This line shows initially a decline of the share of chemistry and a slow but steady recovery from 1973 onwards. The successive decline and rise are responsible for the horseshoe-like appearance of the pattern of points representing... [Pg.198]

Both types of symmetric displays exhibited in Figs. 32.9 and 32.10 have their merits. They are called symmetric because they produce equal variances in the scores and in the loadings. In the case when α = β = 1, the variances along the horizontal and vertical axes are equal to the eigenvalues λ² associated with the dominant latent vectors. In the other case, when α = β = 0.5, the variances are found to be equal to the singular values λ. [Pg.200]
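
This difference in scale can be verified with a short check on synthetic, column-centered data, taking the sums of squares of the score columns as the (unnormalized) variances.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 4))
X -= X.mean(axis=0)                                       # column-centered data

U, lam, Vt = np.linalg.svd(X, full_matrices=False)

S1 = U @ np.diag(lam ** 1.0)                              # alpha = beta = 1
S5 = U @ np.diag(lam ** 0.5)                              # alpha = beta = 0.5

print(np.allclose(np.sum(S1 ** 2, axis=0), lam ** 2))     # True: eigenvalues (squared singular values)
print(np.allclose(np.sum(S5 ** 2, axis=0), lam))          # True: singular values
```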

