Canonical correlation analysis

Canonical Correlation Analysis (CCA) is perhaps the oldest truly multivariate method for studying the relation between two measurement tables X and Y [5]. It generalizes the concept of squared multiple correlation or coefficient of determination, $R^2$. In Chapter 10 on multiple linear regression we found that $R^2$ is a measure for the linear association between a univariate y and a multivariate X. This $R^2$ tells how much of the variance of y is explained by X: $R^2 = \hat{y}^T\hat{y} / y^T y = \|\hat{y}\|^2 / \|y\|^2$. Now, we extend this notion to a set of response variables collected in the multivariate data set Y. [Pg.317]
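As an illustration of the coefficient of determination described above, the following is a minimal NumPy sketch, not taken from the source. It assumes that y and the columns of X are mean-centered before the fit, so that the ratio of squared norms equals the explained fraction of variance. The function name `r_squared` and the simulated data are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """Squared multiple correlation of y regressed on the columns of X.

    X and y are mean-centered first (an assumption of this sketch), so that
    R^2 = ||y_hat||^2 / ||y||^2.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Least-squares regression coefficients of the centered y on the centered X
    b, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    y_hat = Xc @ b
    return float(y_hat @ y_hat) / float(yc @ yc)

# Hypothetical data: y depends linearly on two of three predictors, plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + 0.1 * rng.normal(size=50)
r2 = r_squared(X, y)
print(f"R^2 = {r2:.3f}")
```

A value close to 1 means that nearly all of the variance of y is reproduced by a linear combination of the X-variables.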

For example, let us take a look at the data of Table 35.5a. This table shows two very simple data sets, X and Y, each containing only two variables. Is there a relationship between the two data sets? Looking at the matrix of correlation coefficients (Table 35.5b) we find that the so-called intra-set (or within-set) correlations are strong [Pg.318]

The squared inter-set correlation coefficients vary from 0.21 to 0.38. Thus, only some 20% to 40% of the variance of the individual variables can be explained by one of the variables from the other data set. At first glance these low inter-set correlations do not indicate a strong relation between the two data tables. In [Pg.318]

the notation $R^2(y_1 \mid x_1, x_2)$ stands for the squared multiple correlation coefficient (or coefficient of determination) of the multiple regression of $y_1$ on $x_1$ and $x_2$. The improvement is quite modest, suggesting once more that there is only a weak (linear) relation between the two sets of data. [Pg.319]

We can go one step further, however. Each of the above multiple regression relations is between a single variable (response) of one data set and a linear combination of the variables (predictors) from the other set. Instead, one may consider the multiple-multiple correlation, i.e. the correlation of a linear combination from one set with a linear combination of the other set. Such linear combinations of the original variables are variously called factors, components, latent variables, canonical variables or canonical variates (also see Chapters 9, 17, 29, and 31). [Pg.319]

An important aspect of all methods to be discussed concerns the choice of the model complexity, i.e., choosing the right number of factors. This is especially relevant if the relations are developed for predictive purposes. Building validated predictive models for quantitative relations based on multiple predictors is known as multivariate calibration. The latter subject is of such importance in chemometrics that it will be treated separately in the next chapter (Chapter 36). The techniques considered in this chapter comprise Procrustes analysis (Section 35.2), canonical correlation analysis (Section 35.3), multivariate linear regression... [Pg.309]

This is already a considerable improvement. The natural question then is: which linear combination of Y-variables yields the highest $R^2$ when regressed on the X-variables in a multiple regression? Canonical correlation analysis answers this question. [Pg.319]

Computationally, canonical correlation analysis can be implemented using the following steps, where it is assumed that the data X and Y are mean-centered. [Pg.320]

It should be appreciated that canonical correlation analysis, as the name implies, is about correlation, not about variance. The first step in the algorithm is to move from the original data matrices X and Y to their singular vectors, Ux and Uy, respectively. The singular values, or the variances of the PCs of X and Y, play no role. [Pg.321]
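The excerpts above do not list the individual steps of the algorithm, so the following is only a minimal sketch of one standard SVD-based formulation and not necessarily the procedure given in the source. It assumes that, after mean-centering, X and Y are replaced by their left singular vectors Ux and Uy, and that the canonical correlations are then obtained as the singular values of Ux' Uy. The function name `cca_svd`, the tolerance `tol`, and the simulated data are hypothetical.

```python
import numpy as np

def cca_svd(X, Y, tol=1e-10):
    """Canonical correlations and canonical variates via singular vectors.

    Sketch under stated assumptions; only Ux and Uy enter the calculation,
    the singular values of X and Y are used solely to drop null directions.
    """
    Xc = X - X.mean(axis=0)          # mean-center both data tables
    Yc = Y - Y.mean(axis=0)
    Ux, sx, _ = np.linalg.svd(Xc, full_matrices=False)
    Uy, sy, _ = np.linalg.svd(Yc, full_matrices=False)
    # Discard directions with negligible singular values (rank deficiency)
    Ux = Ux[:, sx > tol * sx[0]]
    Uy = Uy[:, sy > tol * sy[0]]
    # Singular values of Ux' Uy are the canonical correlations
    A, rho, Bt = np.linalg.svd(Ux.T @ Uy, full_matrices=False)
    T = Ux @ A                       # canonical variates for X
    U = Uy @ Bt.T                    # canonical variates for Y
    return rho, T, U

# Hypothetical data: two X-variables and two Y-variables that are related
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
Y = X @ np.array([[1.0, 0.5], [0.2, -1.0]]) + rng.normal(size=(100, 2))

rho, T, U = cca_svd(X, Y)
print("canonical correlations:", rho)

# Rescaling the columns of X changes its singular values but not Ux,
# so the canonical correlations stay the same.
rho_scaled, _, _ = cca_svd(X * np.array([100.0, 0.01]), Y)
print("after rescaling X:     ", rho_scaled)
```

The rescaling check at the end illustrates the point made in the text: the variances of the original variables play no role, only the subspaces spanned by X and Y matter.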

C.J.F. ter Braak, Interpreting canonical correlation analysis through biplots of structure correlations and weights. Psychometrika, 55 (1990) 519-531. [Pg.346]

Multivariate chemometric techniques have subsequently broadened the arsenal of tools that can be applied in QSAR. These include, among others, multivariate ANOVA [9], Simplex optimization (Section 26.2.2), cluster analysis (Chapter 30) and various factor analytic methods such as principal components analysis (Chapter 31), discriminant analysis (Section 33.2.2) and canonical correlation analysis (Section 35.3). An advantage of multivariate methods is that they can be applied in... [Pg.384]

While principal components models are used mostly in an unsupervised or exploratory mode, models based on canonical variates are often applied in a supervisory way for the prediction of biological activities from chemical, physicochemical or other biological parameters. In this section we discuss briefly the methods of linear discriminant analysis (LDA) and canonical correlation analysis (CCA). Although there has been an early awareness of these methods in QSAR [7,50], they have not been widely accepted. More recently they have been superseded by the successful introduction of partial least squares analysis (PLS) in QSAR. Nevertheless, the early pattern recognition techniques have prepared the minds for the introduction of modern chemometric approaches. [Pg.408]

Another way for BOD estimation is the use of sensor arrays [37]. An electronic nose incorporating a non-specific sensor array of 12 conducting polymers was evaluated for its ability to monitor wastewater samples. A statistical approach (canonical correlation analysis) showed a linear relationship between the sensor responses and BOD over 5 months for some subsets of samples, leading to the prediction of BOD values from electronic nose analysis using neural network analysis. [Pg.260]

T. Cserhati, A. Kosa and S. Balogh, Comparison of partial least-square method and canonical correlation analysis in a quantitative structure-retention relationship study. J. Biochem. Biophys. Meth., 36 (1998) 131-141. [Pg.565]

If more than one y-variable has to be modeled, a separate model can be developed for each y-variable or methods can be applied that work with an X- and a Y-matrix, such as PLS2 (Section 4.7.1), or canonical correlation analysis (CCA) (Section 4.8.1). [Pg.119]

FIGURE 4.27 Canonical correlation analysis (CCA): x-scores are uncorrelated; y-scores are uncorrelated; pairs of x- and y-scores (for instance t1 and u1) have maximum correlation; loading vectors are in general not orthogonal. Score plots are connected projections of x- and y-space. [Pg.178]

Canonical correlation analysis was used to relate small subsets of physicochemical parameters to the MDS space. Small subsets were necessary because in canonical correlation analysis, the number of stimuli should be greater than the number of dimensions and physicochemical parameters combined. The analysis revealed that a linear combination of two ADAPT parameters in Table 3 (number of oxygen atoms and chemical environment of substructure (7)) in addition to a concentration variable accounted for 63% of the arrangement of the pyrazine odor space. [Pg.47]

We also reviewed the method for estimating paleo-moist-enthalpy. To estimate paleoenthalpy from plant fossils, Forest et al. (1999) quantified a relationship between leaf physiognomy and enthalpy from present-day plants and their local climate. Using Canonical Correlation Analysis, mean annual moist enthalpy can be estimated with an uncertainty of 5.5 kJ/kg. The contribution to the uncertainty in altitude is 560 m and is comparable to using temperature alone. Other statistical techniques that improve the ability to estimate enthalpy could replace the current method. [Pg.191]

Factorial methods: factor analysis (FA), principal components analysis (PCA), partial least squares modeling (PLS), canonical correlation analysis. Finding factors (causal complexes)... [Pg.7]

Canonical correlation analysis (CCA) is a method for searching for interactions between two data sets, the matrices X and Y. These data sets may have different numbers of features but the same number of objects. Using canonical analysis one creates a set of canonical variables f for the data set X and a set of canonical variables g for data set Y, similar to the factors in factor analysis. The canonical variables f and g should have the following properties ... [Pg.179]

To conclude the section on PLS modeling, it should be pointed out that results from the application of canonical correlation analysis (see also Section 5.5) for the quantitative description of interactions between river water and sediments are comparable to those from PLS analyses [GEISS, 1990]. [Pg.315]

Pyo, S., Mihalik, B.J. and Uysal, M. (1989) Attraction attributes and motivations: A canonical correlation analysis. Annals of Tourism Research 16, 277-282. [Pg.226]
