Line of closest fit

In the previous section we developed principal components analysis (PCA) from the fundamental theorem of singular value decomposition (SVD). In particular, we showed by means of eq. (31.1) how an n×p rectangular data matrix X can be decomposed into an n×r orthonormal matrix of row-latent vectors U, a p×r orthonormal matrix of column-latent vectors V and an r×r diagonal matrix of latent values Λ. We now focus on the geometrical interpretation of this algebraic decomposition. [Pg.104]
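As a minimal sketch of the decomposition described above, the following NumPy snippet factors a small synthetic table and rebuilds it from its three factors. It assumes that eq. (31.1) has the usual SVD form (X equal to U times the diagonal matrix of latent values times the transpose of V); the table size, the random data and the variable names (`latent`, `Lambda`, `X_rebuilt`) are illustrative assumptions, not taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3                          # hypothetical table size, with n > p
X = rng.normal(size=(n, p))          # synthetic data, for illustration only

# Thin SVD: U is n x p, latent holds the latent (singular) values, Vt is p x p
U, latent, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# Keep only the r components with non-negligible latent values
r = int(np.sum(latent > 1e-12))
U, V, latent = U[:, :r], V[:, :r], latent[:r]
Lambda = np.diag(latent)             # r x r diagonal matrix of latent values

# Rebuild the table from its three factors
X_rebuilt = U @ Lambda @ V.T
```

The columns of `U` and `V` are orthonormal, so `U.T @ U` and `V.T @ V` are both the r×r identity matrix, and `X_rebuilt` matches `X` up to rounding error.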

In Chapter 29 we introduced the concept of the two dual data spaces. Each of the n rows of the data table X can be represented as a point in the p-dimensional column-space S^p. In Fig. 31.2a we have represented the n rows of X by means of the row-pattern F. The curved contour represents an equiprobability envelope, e.g. a curve that encloses 99% of the points. In the case of multinormally distributed data this envelope takes the form of an ellipsoid. For convenience we have represented only two of the p dimensions of S^p, which is in reality a multidimensional space rather than a two-dimensional one. One must also imagine the equiprobability envelope as an ellipsoidal (hyper)surface rather than the elliptical curve in the figure. The assumption that the data are distributed in a multinormal way is seldom fulfilled in practice, and the patterns of points often possess a more complex structure than is shown in our illustrations. In Fig. 31.2a the centroid or center of mass of the pattern of points appears at the origin of the space, but in the general case this need not be so. [Pg.104]

Similarly, Fig. 31.2b shows the column-pattern F of the p columns of the data table X by means of an elliptical envelope in the dual n-dimensional row-space S^n. The ellipses should be interpreted as (hyper)ellipsoidal equiprobability envelopes of multinormal data. In practice the data are rarely multinormal and the centroid (or center of mass) of the pattern does not generally appear at the origin of the space. An essential feature is that the equiprobability envelopes are similarly shaped in Figs. 31.2a and b. The reason for this will become apparent below. Note that in the previous section we assumed by convention that n exceeds p, but this is not reflected in Figs. 31.2a and b. [Pg.104]

In Fig. 31.2a we have represented the ith row x_i of the data table X as a point of the row-pattern F in column-space S^p. The additional axes v1 and v2 correspond with the columns of V, which are the column-latent vectors of X. They define the orientation of the latent vectors in column-space S^p. In the case of a symmetrical pattern such as in Fig. 31.2, one can interpret the latent vectors as the axes of symmetry or principal axes of the elliptic equiprobability envelopes. In the special case of multinormally distributed data, v1 and v2 appear as the major and minor [Pg.104]

Since this latent vector is defined as the vector for which the sum of squares of the projections is maximum (eq. (31.5)), we can interpret v1 as an axis of maximal inertia [Pg.106]

One of the earliest interpretations of latent vectors is that of lines of closest fit [9]. Indeed, if the inertia along v1 is maximal, then the inertia along all directions perpendicular to v1 must be minimal. This is similar to the criterion of orthogonal least squares regression, which minimizes the sum of squared deviations perpendicular to the regression line (Section 8.2.11). Ordinary least squares regression, by contrast, minimizes the sum of squared deviations from the regression line in the direction of the dependent measurement, which assumes that the independent measurement is without error. Similarly, the plane formed by v1 and v2 is a plane of closest fit, in the sense that the sum of squared deviations perpendicular to the plane is minimal. Since the latent vectors contribute... [Pg.106]
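The contrast between the two regression criteria can be sketched numerically. The snippet below is an illustration under assumed conditions (synthetic two-dimensional data, column-centered before the decomposition); the helper names `perp_ss` and `vert_ss` are hypothetical. It compares the slope of the first latent vector with the ordinary least squares slope and evaluates both criteria for each line.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)                        # synthetic "independent" measurement
y = 0.5 * x + rng.normal(scale=0.5, size=200)   # synthetic "dependent" measurement
Xc = np.column_stack([x, y])
Xc = Xc - Xc.mean(axis=0)                       # center so both lines pass through the origin

# First latent vector of the centered data: the line of closest fit
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
v1 = Vt[0]
slope_orthogonal = v1[1] / v1[0]

# Ordinary least squares slope of y on x
slope_ols = np.sum(Xc[:, 0] * Xc[:, 1]) / np.sum(Xc[:, 0] ** 2)

def perp_ss(slope):
    """Sum of squared deviations perpendicular to the line through the origin."""
    d = np.array([1.0, slope]) / np.hypot(1.0, slope)   # unit vector along the line
    return np.sum(Xc ** 2) - np.sum((Xc @ d) ** 2)

def vert_ss(slope):
    """Sum of squared deviations in the direction of the dependent measurement."""
    return np.sum((Xc[:, 1] - slope * Xc[:, 0]) ** 2)

print("orthogonal slope:", slope_orthogonal, "OLS slope:", slope_ols)
print("perpendicular SS:", perp_ss(slope_orthogonal), perp_ss(slope_ols))
print("vertical SS:     ", vert_ss(slope_orthogonal), vert_ss(slope_ols))
```

Each line wins on its own criterion: the latent-vector line has the smaller perpendicular sum of squares, while the ordinary least squares line has the smaller vertical sum of squares.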

This method is applicable when data are to be inspected and characterized. PCA is easily understood by graphical illustrations, for example, by a two-dimensional co-ordinate system with a number of points in it (Figure 6.25). The first principal component (PC) is the line with the closest fit to these points [12]. Unless the point swarm has, for example, the shape of a circle, the position of the first PC is unambiguous. Because the first PC is the line of closest fit, it is also the line that explains most of the variation (maximum variance) in the data [13]. Therefore it is called the principal component. [Pg.324]

This reflects exactly the approach of Pearson in defining a line of closest fit. The vector p1 gives a direction in the J-dimensional space (defining a line) and t1 represents the scores (orthogonal projections) on that line. The outer product t1 p1' is a rank-one matrix and is the best rank-one approximation of X in a least squares sense. This approach can be generalized to more than one component. The problem then becomes one of finding the subspace of closest fit. [Pg.39]
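The rank-one approximation can be illustrated with a short NumPy sketch. The matrix below is synthetic, and the names `p1`, `t1` and `X1` are chosen here only to mirror the first loading vector, the first score vector and their outer product; obtaining `p1` from an SVD is an assumption about how it is computed, not something the passage prescribes.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))        # synthetic data matrix, for illustration only

U, s, Vt = np.linalg.svd(X, full_matrices=False)
p1 = Vt[0]                          # unit-length direction defining the line
t1 = X @ p1                         # scores: orthogonal projections on that line
X1 = np.outer(t1, p1)               # rank-one matrix built from scores and direction

residual_ss = np.sum((X - X1) ** 2) # least squares misfit of the approximation

# Compare with the rank-one fit along an arbitrary other unit direction
q = rng.normal(size=4)
q /= np.linalg.norm(q)
residual_ss_other = np.sum((X - np.outer(X @ q, q)) ** 2)
print(residual_ss, residual_ss_other)
```

The residual sum of squares of `X1` equals the sum of the squared remaining singular values, `np.sum(s[1:] ** 2)`, and no other direction `q` gives a smaller residual.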

Fig. 3.3. Time course of radiocarbon accumulation in basal agar receivers of 8-mm sections of tobacco stem tissues, apically supplied with agar donor blocks containing [14C]IAA (3 µM). The linear regression equations and the lines of closest fit were estimated by the least-squares method from the data beyond 1.5 h for tissue types 1, 2 and 3, and 2.5 h for tissue types 4 and 5 (the last time value was ignored for the inner tissues since it showed a decline in the export rate after a transport period of about 3.5 h in other experiments). Note that small amounts of radioactivity were found in the receivers from all tissue types before the intersections of the straight lines. (Data from Sheldrake 1973a)...

Pearson, K. 1901. On lines and planes of closest fit to systems of points in space. Philosophical Magazine Journal of Science 2(6) 559-572. [Pg.122]

Pearson, K. 1901. On lines and planes of closest fit to systems of points in space. Philosophical Magazine 2, 559-572. [Pg.89]

Fig. 12. Volume change kinetics of PVME gel (Trial 4) between states above and below the volume transition of 37 °C. Non-Fickian behaviour is observed for both swelling and shrinking. For comparison, lines are calculated which provide the closest fit of Fickian theory to the data. (O) Swelling (from 50 to 24 °C), D = 4.0 × 10^-7 cm^2/s; shrinking (from 23 to 50 °C), D = 1 × 10^-5 cm^2/s. Reprinted from Polymer (1991) 33 990 by permission of the publishers, Butterworth Heinemann [46]...

The term closest fit is used to denote a least-squares fit. In PCA, the sum of the squared perpendicular distances from the points to the fitted line (Figure 6.25) is minimized. [Pg.324]
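This least-squares criterion can be checked by brute force. The sketch below assumes a synthetic, mean-centered two-dimensional point swarm (its covariance values are arbitrary) and uses a hypothetical helper, `perpendicular_ss`. It scans candidate line directions through the centroid and compares the best one with the direction of the first principal component.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic elongated point swarm; the covariance values are arbitrary
pts = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 1.0]], size=300)
pts = pts - pts.mean(axis=0)         # the fitted line passes through the centroid

def perpendicular_ss(theta):
    """Sum of squared perpendicular distances to the line at angle theta."""
    normal = np.array([-np.sin(theta), np.cos(theta)])  # unit normal of the line
    return np.sum((pts @ normal) ** 2)

# Scan directions over half a turn in steps of 0.1 degree
angles = np.linspace(0.0, np.pi, 1801)
best_angle = angles[np.argmin([perpendicular_ss(a) for a in angles])]

# Direction of the first principal component, folded into [0, pi)
_, _, Vt = np.linalg.svd(pts, full_matrices=False)
pc_angle = np.arctan2(Vt[0, 1], Vt[0, 0]) % np.pi

print(np.degrees(best_angle), np.degrees(pc_angle))
```

Up to the resolution of the angular grid, the scanned minimum coincides with the direction of the first principal component.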

Figure 10. (A) Comparison between the experimental isotherm (o) and the theoretical curve calculated from the BET equation with k = 1, using the values of the parameters collected in the first row of Table 1, corresponding to the homogeneous surface model. (B) Comparison between the experimental isotherm (o) and the theoretical curve, assuming k as an additional best-fit parameter for linear regression. The corresponding values of the parameters are collected in Table 1 in the second row for the homogeneous surface model. The dotted lines are the isotherms of adsorption in the first layer, closest to the surface.

The problem of numerical interpolation is considered first. Suppose that an estimate of the fraction unreacted is required at 52 s. Since data are available at 50 s and 60 s, the simplest way to estimate c/c0 is to draw a straight line between the two adjacent points and read off the value at 52 s. However, one sees clearly from the graph that the data do not fall on a straight line, so an improved estimate is obtained by using the three closest data points and fitting them to a quadratic equation. In order to keep the following discussion general, time is referred to as x and c/c0 as y. Thus the equation to be fit to the three points is... [Pg.610]
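The two interpolation strategies can be sketched as follows. The data values are hypothetical placeholders rather than the data of the original example, and the use of `np.interp` and `np.polyfit` is one possible implementation choice, not necessarily the procedure the source goes on to describe.

```python
import numpy as np

# Hypothetical measurements: x is time in seconds, y is the fraction unreacted
x = np.array([30.0, 40.0, 50.0, 60.0, 70.0])
y = np.array([0.60, 0.50, 0.42, 0.36, 0.31])
x_new = 52.0

# Straight line between the two adjacent points (50 s and 60 s)
y_linear = np.interp(x_new, x, y)

# Quadratic through the three data points closest to x_new
nearest = np.sort(np.argsort(np.abs(x - x_new))[:3])
coeffs = np.polyfit(x[nearest], y[nearest], deg=2)   # highest power first
y_quadratic = np.polyval(coeffs, x_new)

print("linear estimate:   ", y_linear)
print("quadratic estimate:", y_quadratic)
```

With three points and a second-degree polynomial the fit is exact, so the quadratic passes through all three selected points and the two estimates differ only through the curvature of the data.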

To determine the Eg, the j-V data (considering only the photocurrent, j_ph) are plotted as photocurrent squared versus potential (j_ph^2 vs. V) [26]. The linear portion closest to the onset of photocurrent is fitted with a straight line. The x-intercept... [Pg.82]
