Correlation, multivariate methods

Canonical Correlation Analysis (CCA) is perhaps the oldest truly multivariate method for studying the relation between two measurement tables X and Y [5]. It generalizes the concept of squared multiple correlation or coefficient of determination, $R^2$. In Chapter 10 on multiple linear regression we found that $R^2$ is a measure for the linear association between a univariate y and a multivariate X. This $R^2$ tells how much of the variance of y is explained by X: $R^2 = \hat{y}^T\hat{y} / y^Ty = \|\hat{y}\|^2 / \|y\|^2$. Now, we extend this notion to a set of response variables collected in the multivariate data set Y. [Pg.317]
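
As a rough illustration of the passage above, the following sketch computes the coefficient of determination as the ratio of the squared norm of the fitted values to the squared norm of the centred response, and then obtains canonical correlations for two tables. The simulated data, the helper name `canonical_correlations`, and the QR/SVD route are assumptions made for this example; they are not taken from the cited text, and the helper assumes both centred tables have full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, illustrative data only
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=n)

# Column-centre X and y so that squared norms are proportional to variances.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Least-squares fit: y_hat is the projection of yc onto the column space of Xc.
b, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
y_hat = Xc @ b
r2 = (y_hat @ y_hat) / (yc @ yc)        # R^2 = ||y_hat||^2 / ||y||^2


def canonical_correlations(X, Y):
    """Canonical correlations between two tables (hypothetical helper).

    Uses orthonormal bases of the centred column spaces; the singular
    values of Qx' Qy are the canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)


# With a single response column, the squared canonical correlation is R^2.
rho = canonical_correlations(X, y[:, None])
print(r2, rho[0] ** 2)
```

With a multi-column Y the same helper returns one canonical correlation per pair of canonical variates, which is the sense in which CCA generalizes the single-response case.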

Multivariate chemometric techniques have subsequently broadened the arsenal of tools that can be applied in QSAR. These include, among others, multivariate ANOVA [9], Simplex optimization (Section 26.2.2), cluster analysis (Chapter 30) and various factor analytic methods such as principal components analysis (Chapter 31), discriminant analysis (Section 33.2.2) and canonical correlation analysis (Section 35.3). An advantage of multivariate methods is that they can be applied in... [Pg.384]

Sets of spectroscopic data (IR, MS, NMR, UV-Vis) or other data are often subjected to one of the multivariate methods discussed in this book. One of the issues in this type of calculation is reducing the number of variables by selecting a subset of variables to be included in the data analysis. The opinion is gaining support that selecting variables prior to the data analysis improves the results. For instance, variables that are weakly correlated or uncorrelated with the property to be modeled are disregarded. Another approach is to compress all variables into a few features, e.g. by a principal components analysis (see Section 31.1). This is called... [Pg.550]
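
The two reduction strategies mentioned in this passage can be sketched as follows. This is a minimal example on simulated data; the correlation threshold of 0.5 and the choice of five components are arbitrary assumptions for illustration and do not come from the cited text.

```python
import numpy as np

rng = np.random.default_rng(1)          # fixed seed, illustrative data only
n, p = 40, 200
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(scale=0.2, size=n)   # only variable 0 is informative

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# (a) Selection: keep variables whose correlation with y is large in magnitude.
r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
threshold = 0.5                          # hypothetical cut-off
keep = np.flatnonzero(np.abs(r) >= threshold)
X_selected = X[:, keep]

# (b) Compression: replace all variables by a few principal component scores.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 5                                    # hypothetical number of components
scores = U[:, :k] * s[:k]                # n x k score matrix

print(X_selected.shape, scores.shape)
```

Selection keeps a subset of the original variables, so the result stays directly interpretable, whereas compression builds new features as linear combinations of all variables.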

Standardizing the spectral response is mathematically more complex than standardizing the calibration models but provides better results as it allows slight spectral differences - the most common between very similar instruments - to be corrected via simple calculations. More marked differences can be accommodated with more complex and specific algorithms. This approach compares spectra recorded on different instruments, which are used to derive a mathematical equation, allowing their spectral response to be mutually correlated. The equation is then used to correct the new spectra recorded on the slave, which are thus made more similar to those obtained with the master. The simplest methods used in this context are of the univariate type, which correlate each wavelength in two spectra in a direct, simple manner. These methods, however, are only effective with very simple spectral differences. On the other hand, multivariate methods allow the construction of matrices correlating bodies of spectra recorded on different instruments for the above-described purpose. The most frequent choice in this context is piecewise direct standardization... [Pg.477]

Multivariate methods, on the other hand, resolve the major sources by analyzing the entire ambient data matrix. Factor analysis, for example, examines elemental and sample correlations in the ambient data matrix. This analysis yields the minimum number of factors required to reproduce the ambient data matrix, their relative chemical composition and their contribution to the mass variability. A major limitation in common and principal component factor analysis is the abstract nature of the factors and the difficulty these methods have in relating these factors to real-world sources. Hopke et al. (13,14) have improved the method's ability to associate these abstract factors with controllable sources by combining source data from the F matrix with Malinowski's target transformation factor analysis program (15). Hopke et al. (13,14) as well as Kleinman et al. (10) have used the results of factor analysis along with multiple regression to quantify the source contributions. Their approach is similar to the chemical mass balance approach except that they use a least squares fit of the total mass on different filters instead of a least squares fit of the chemicals on an individual filter. [Pg.79]

How can multivariate methods be used to avoid the problems associated with the OVAT approach? In general, multivariate methods use the information contained in the relation between the variables (correlations or covariances) and therefore data like those in Figure 6.3 present no problem. The risk of type I errors is kept under control in multivariate analysis by considering all variables simultaneously. To consider all variables simultaneously involves a... [Pg.298]

Preliminary data analysis of the spectral datasets consisted of functional group mapping and/or hierarchical cluster analysis (HCA). This latter method, which is well described in the literature [4,9], is an unsupervised approach that does not require any reference datasets. Like most of the multivariate methods, HCA is based on the correlation matrix Cut for all spectra in the dataset. This matrix, defined by Equation (9.1),... [Pg.193]

The prediction of Y-data of unknown samples is based on a regression method where the X-data are correlated to the Y-data. The multivariate methods, usually used for such a calibration, are principal component regression (PCR) and partial least squares regression (PLS). Both methods are based on the assumption of linearity and can deal with co-linear data. The problem of co-linearity is solved in the same way as the formation of a PCA plot. The X-variables are added together into latent variables, score vectors. These vectors are independent since they are orthogonal to each other and they can therefore be used to create a calibration model. [Pg.7]
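
A minimal sketch of principal component regression along the lines described above is given below. The function names `pcr_fit` and `pcr_predict`, the simulated collinear data, and the choice of two components are assumptions for illustration only. PLS is not shown.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Fit a PCR model with k components (hypothetical helper)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                      # loadings, p x k
    T = Xc @ P                        # score vectors, n x k, mutually orthogonal
    # Because the scores are orthogonal, y can be regressed on each one separately.
    q = (T.T @ yc) / (s[:k] ** 2)
    b = P @ q                         # coefficients for the original X-variables
    return x_mean, y_mean, b


def pcr_predict(model, X_new):
    """Predict y for new samples from a fitted PCR model."""
    x_mean, y_mean, b = model
    return (X_new - x_mean) @ b + y_mean


# Simulated collinear data: 10 X-variables driven by 2 latent factors.
rng = np.random.default_rng(2)
n = 30
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(n, 10))
y = latent @ np.array([1.0, -1.0]) + 0.01 * rng.normal(size=n)

model = pcr_fit(X, y, k=2)
rmse = np.sqrt(np.mean((pcr_predict(model, X) - y) ** 2))
print(rmse)
```

Ordinary least squares on the ten raw X-variables would be poorly conditioned here, since they are nearly linear combinations of two factors. Regressing on the two orthogonal score vectors avoids that problem.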

Results for elements in aerosol samples obtained by multielement techniques form data sets from which information about the sources of the components can be extracted (Gordon 1980). Methods which make use of data obtained at receptor points are called receptor models. The most important receptor models are chemical mass balances (CMB), enrichment factors, time series correlation, multivariate models and spatial models (Cooper and Watson 1980; Gordon 1988). Dispersion modeling has also been used to explain the... [Pg.40]

In this section we shall consider the rather general case where, for a series of chemical compounds, measurements are made in a number of parallel biological tests and where a set of descriptor variables is believed to be related to the biological potencies observed. In order to understand the data in their entirety and to deal adequately with the mathematical properties of such data, methods of multivariate statistics are required. A variety of such methods is available, for example multivariate regression, canonical correlation, principal component analysis, principal component regression, partial least squares analysis, and factor analysis, which have all been applied to biological or chemical problems (for reviews, see [1-11]). Which method to choose depends on the ultimate objective of an analysis and the properties of the data. We have found principal component and factor analysis particularly useful. For this reason, and also since many multivariate methods make use of components or factors, we will start with these methods in some detail, while the discussion of other approaches will be less extensive. [Pg.44]

Fig. 3. Comparison of some multivariate methods used in QSAR relating a matrix of biological variables, Y, to a matrix of descriptor variables, X: PCRA (principal component regression analysis), PLS (partial least-squares method), PCA (principal component analysis according to the Weiner/Malinowski approach), MRA (multivariate regression analysis) and CCA (canonical correlation analysis).

In this small-sample set, the t-test does as well as the best multivariate methods. This shows that modeling the correlation structure is not necessarily an advantage if the number of samples is low, or, alternatively, that the true correlation structure has not been captured well enough from the few samples that are available to allow meaningful inference. A definite advantage of the t-test is that it has no tunable parameters and can be applied without further optimization. It should be noted that we do not need to apply multiple-testing corrections in this context since we only use the order of the absolute size of the t-statistics to construct the ROC curves, and not a specific cut-off level α. In other applications, however, this aspect should be taken into account. [Pg.152]
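
The ranking step described in this passage can be sketched as below: a t-statistic is computed for every variable and only the ordering of the absolute values is used, so no cut-off level is needed. The Welch (unequal-variance) form, the helper name `t_statistics`, and the simulated two-group data are assumptions for illustration. The cited text does not say which variant was used.

```python
import numpy as np


def t_statistics(A, B):
    """Welch t-statistic per variable for two groups (rows are samples)."""
    mean_a, mean_b = A.mean(axis=0), B.mean(axis=0)
    var_a, var_b = A.var(axis=0, ddof=1), B.var(axis=0, ddof=1)
    return (mean_a - mean_b) / np.sqrt(var_a / len(A) + var_b / len(B))


rng = np.random.default_rng(4)           # fixed seed, illustrative data only
A = rng.normal(size=(10, 100))           # group 1: 10 samples, 100 variables
B = rng.normal(size=(10, 100))           # group 2
B[:, :5] += 3.0                          # hypothetical: first 5 variables differ

t = t_statistics(A, B)
ranking = np.argsort(-np.abs(t))         # variables ordered by decreasing |t|
print(ranking[:10])
```

An ROC curve can then be traced by walking down `ranking` and counting, at each position, how many of the variables known to differ have been recovered so far.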

When the laboratory value is plotted against the NIR predicted value for the calibration sample set, it may well be noted that some points lie well away from the computed regression line. This will, of course, reduce the correlation between laboratory and NIR data and increase the SEC or SEP. These samples may be outliers. The statistic $h_i$ describes the leverage or effect of an individual sample upon a regression. If a particular value of $h_i$ is exceeded, this may be used to identify an outlier sample. Evaluation criteria for selecting outliers, however, are somewhat subjective, so there is a requirement for expertise in multivariate methods to make outlier selection effective. [Pg.2249]
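
As an illustration of the leverage statistic, the sketch below computes the leverage of each sample as the corresponding diagonal element of the hat matrix for a regression with an intercept. The helper name `leverages`, the simulated data, and the 3p/n cut-off are assumptions for this example. The cited text does not give a threshold and, as it notes, the choice is somewhat subjective.

```python
import numpy as np


def leverages(X):
    """Diagonal of the hat matrix for a regression on X with an intercept."""
    n = X.shape[0]
    X1 = np.column_stack([np.ones(n), X])
    Q, _ = np.linalg.qr(X1)              # orthonormal basis of the column space
    return np.sum(Q ** 2, axis=1)        # h_i = squared norm of row i of Q


rng = np.random.default_rng(3)           # fixed seed, illustrative data only
X = rng.normal(size=(25, 2))             # e.g. two regressors or score vectors
X[0] = [8.0, -8.0]                       # one sample placed far from the rest

h = leverages(X)
p = X.shape[1] + 1                       # number of fitted parameters
cutoff = 3 * p / len(h)                  # hypothetical rule-of-thumb threshold
flagged = np.flatnonzero(h > cutoff)
print(flagged, h[flagged])
```

The leverages always sum to the number of fitted parameters, so the average leverage is p/n. Samples far above that average dominate the fit and are candidates for closer inspection rather than automatic removal.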

When a core sample is taken, the deeper the layer in the core, the older it is. In many lakes, the mineral content of a core is derived from diatoms (algae) that lived and died in the lake and fell to the bottom to be covered. Any change in pH would have an impact on the biota of the lake and the material that fell to the bottom. Thus, pH conditions are reflected in the chemical composition of the core layers. Near-IR, which is effective for pattern recognition, is well suited for the task. By taking bottom and core samples of lakes with known pH and linking pH to the NIR spectrum via multivariate methods, researchers have reported some success in correlating pH to NIR spectra. As with much of forensic science, the goal was to recreate the past; the difference is the time-frame. [Pg.169]

Anderson et al. used LIBS spectra and three multivariate methods to perform quantitative chemical analysis of rocks. The methods used were PLS, multilayer perceptron artificial neural networks (MLP ANNs) and cascade correlation (CC) ANNs. Precision and accuracy were influenced by the ratio of laser beam diameter (490 µm) to grain size, with analyses of coarse-grained rocks often yielding lower accuracy and precision than analyses of fine-grained rocks and powders. [Pg.354]

Anderson et al. applied three multivariate methods [PLS, multilayer perceptron (MLP-ANNs) and cascade correlation (CC-ANNs)] to LIBS data in order to perform quantitative chemical analyses of rocks. [Pg.405]

In this review, aspects of IR spectroscopy for medical diagnostics have been presented. There are major differences between this new field and classical spectroscopy of biomolecules due to the size and complexity of the systems, but it is important to point out that this work is based on, and relies on, decades of research in biospectroscopy. Yet there are two major differences between classical biospectroscopy and the medical applications. First, there is a heavy reliance on mathematical methods for data analysis, since multivariate methods of analysis are highly suitable for extracting small, correlated spectral differences that are often smaller than, and buried in, uncorrelated spectral variations. Second, and even more important, is the fact that for medical applications, conclusions can never be based on single measurements, since the variability of the sample is enormous, both on a patient-to-patient and on a cell-to-cell level. Thus, the authors hope that this research not only advances the field of medical diagnostics by spectral methods but also helps to usher in new ways to look at and process spectral data. [Pg.219]

Big Chemical Encyclopedia
