
Multivariate data latent variables

The eigenvectors extracted from the cross-product matrices or the singular vectors derived from the data matrix play an important role in multivariate data analysis. They account for a maximum of the variance in the data and they can be likened to the principal axes (of inertia) through the patterns of points that represent the rows and columns of the data matrix [10]. These have been called latent variables [9], i.e. variables that are hidden in the data and whose linear combinations account for the manifest variables that have been observed in order to construct the data matrix. The meaning of latent variables is explained in detail in Chapters 31 and 32 on the analysis of measurement tables and contingency tables. [Pg.50]
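As a minimal numerical sketch of this equivalence (NumPy, with a hypothetical 20 x 5 data matrix), the eigenvectors of the cross-product matrix and the right singular vectors of the data matrix itself yield the same latent directions:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))   # hypothetical data matrix: 20 objects x 5 variables
    X -= X.mean(axis=0)            # column-centering

    # Eigenvectors of the cross-product matrix X'X ...
    evals, evecs = np.linalg.eigh(X.T @ X)

    # ... match the right singular vectors of X (up to sign and ordering)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    print(np.isclose(evals[-1], s[0] ** 2))                   # largest eigenvalue = squared largest singular value
    print(np.allclose(np.abs(evecs[:, -1]), np.abs(Vt[0])))   # same principal axis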

On the other hand, when latent variables are used instead of the original variables in inverse calibration, powerful methods of multivariate calibration arise which are frequently used in multispecies analysis and in single-species analysis within multispecies systems. These so-called soft modeling methods are based, like the P-matrix, on the inverse calibration model, by which the analytical values are regressed on the spectral data ... [Pg.186]
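A hedged numerical sketch of this inverse calibration step (ordinary least squares standing in for the full soft-modeling methods; all data and names are hypothetical):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 8))                  # hypothetical spectra: 30 samples x 8 wavelengths
    b_true = rng.normal(size=8)
    y = X @ b_true + 0.01 * rng.normal(size=30)   # hypothetical analytical values (e.g. concentrations)

    # Inverse calibration: regress the analytical values on the spectral data
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_pred = X @ b                                # predicted analytical values
    print(np.allclose(y, y_pred, atol=0.1))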

Partial least squares (PLS) projections to latent structures [40] is a multivariate data analysis tool that has gained much attention during the past decade, especially after the introduction of the 3D-QSAR method CoMFA [41]. PLS is a projection technique that uses latent variables (linear combinations of the original variables) to construct multidimensional projections while focusing on explaining as much as possible of the information in the dependent variable (in this case intestinal absorption), rather than the variation among the descriptors used to describe the compounds under investigation (the independent variables). PLS differs from MLR in a number of ways (apart from point 1 in Section 16.5.1) ... [Pg.399]

An essential concept in multivariate data analysis is the mathematical combination of several variables into a new variable that has a certain desired property (Figure 2.14). In chemometrics such a new variable is often called a latent variable; other names are component or factor. A latent variable can be defined as a formal combination (a mathematical function or a more general algorithm) of the variables; a latent variable summarizes the variables in an appropriate way to obtain a certain property. The value of a latent variable is called a score. Most often, linear latent variables are used, given by ... [Pg.64]
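A standard form of such a linear latent variable (a sketch of the usual definition, with $b_j$ the coefficients, or loadings, of the original variables $x_j$) is

    u = b_1 x_1 + b_2 x_2 + \dots + b_m x_m = \mathbf{b}^{\mathsf{T}} \mathbf{x}

where the value $u$ obtained for a particular object is its score.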

Exploratory data analysis aims to learn about the data distribution (clusters, groups of similar objects). In multivariate data analysis, an X-matrix (objects/samples characterized by a set of variables/measurements) is considered. The most widely used method for this purpose is PCA, which uses latent variables with maximum variance of the scores (Chapter 3). Another approach is cluster analysis (Chapter 6). [Pg.71]

Principal component analysis (PCA) can be considered the mother of all methods in multivariate data analysis. The aim of PCA is dimension reduction, and PCA is the most frequently applied method for computing linear latent variables (components). PCA can be seen as a method to compute a new coordinate system formed by the latent variables, which is orthogonal, and where only the most informative dimensions are used. Latent variables from PCA optimally represent the distances between the objects in the high-dimensional variable space (remember: the distance between objects is considered an inverse similarity of the objects). PCA considers all variables and accommodates the total data structure; it is a method for exploratory data analysis (unsupervised learning) and can be applied to practically any X-matrix; no y-data (properties) are considered and therefore none are necessary. [Pg.73]
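A minimal PCA sketch (scikit-learn shown purely for illustration; the data are hypothetical):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(2)
    X = rng.normal(size=(50, 10))          # hypothetical X-matrix: 50 objects x 10 variables

    pca = PCA(n_components=2)              # keep only the most informative dimensions
    scores = pca.fit_transform(X)          # orthogonal latent variables (scores), no y-data needed

    print(pca.explained_variance_ratio_)   # variance captured by each component, in decreasing order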

In the case of two groups, the Fisher method transforms the multivariate data to a univariate discriminant variable such that the transformed groups are separated as much as possible. For this transformation, a linear combination of the original x-variables is used: in other words, a latent variable. [Pg.215]
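In the standard Fisher formulation, with group mean vectors $\mathbf{m}_1$ and $\mathbf{m}_2$ and pooled within-group covariance matrix $\mathbf{S}_W$, the weights of this latent variable are

    \mathbf{w} \propto \mathbf{S}_W^{-1} (\mathbf{m}_1 - \mathbf{m}_2), \qquad u = \mathbf{w}^{\mathsf{T}} \mathbf{x}

Objects are then assigned by comparing their scores $u$ with a cut-off between the projected group means.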

Unlike other classification methods, the PLS-DA method explicitly determines relevant multivariate directions in the data (the PLS latent variables) that optimize the separation of the known classes. In addition, unlike KNN, the classification rule for PLS-DA is based on a statistical analysis of the prediction values, which allows one to apply prior knowledge regarding the expected analytical response distributions of the different classes. Furthermore, PLS-DA can handle cases where an unknown sample belongs to more than one class, or to no class at all. [Pg.395]

PLS falls into the category of multivariate data analysis whereby the X-matrix, containing the independent variables, is related to the Y-matrix, containing the dependent variables, through a process in which the variance in the Y-matrix influences the calculation of the components (latent variables) of the X-block and vice versa. It is important that the number of latent variables is correct so that overfitting of the model is avoided; this can be achieved by cross-validation. The relevance of each variable in the PLS method is judged by its modelling power, which indicates how much the variable participates in the model. A value close to zero indicates an irrelevant variable, which may be deleted. [Pg.103]
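A hedged sketch of selecting the number of latent variables by cross-validation (scikit-learn for illustration; data and names are hypothetical):

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    X = rng.normal(size=(40, 15))                                    # hypothetical X-matrix
    y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=40)

    # Score each candidate number of latent variables by cross-validation
    for n_lv in range(1, 6):
        pls = PLSRegression(n_components=n_lv)
        r2 = cross_val_score(pls, X, y, cv=5).mean()
        print(n_lv, round(r2, 3))        # pick n_lv where r2 stops improving (avoids overfitting)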

As already mentioned, any multivariate analysis should include some validation, that is, formal testing, to extrapolate the model to new but similar data. This requires two separate steps in the computation of each model component: calibration, which consists of finding the new components, and validation, which checks how well the computed components describe the new data. Each of these two steps needs its own set of samples: calibration samples (training samples) and validation samples (test samples). Computation of spectroscopic-data PCs is based solely on optical data. There is no explicit or formal relationship between PCs and the composition of the samples in the sets from which the spectra were measured. In addition, PCs are considered superior to the original spectral data produced directly by the NIR instrument. Since the first few PCs are stripped of noise, they represent the real variation of the spectra, presumably caused by physical or chemical phenomena. For these reasons PCs are considered latent variables, as opposed to the direct variables actually measured. [Pg.396]

The term factor is a catch-all for the concept of an identifiable property of a system whose quantity value might have some effect on the response. Factor tends to be used synonymously with the terms variable and parameter, although each of these terms has a special meaning in some branches of science. In factor analysis, a multivariate method that decomposes a data matrix to identify independent variables that can reconstitute the observed data, the term latent variable or latent factor is used to identify factors of the model that are composites of input variables. A latent factor may not exist outside the mathematical model, and it might not therefore influence... [Pg.69]

At this point it should be remarked that multivariate regression with latent variables is a useful tool for describing the relationship between complex processes and/or features in the environment. A specific example is the prediction of the relationship between the hydrocarbon profile in samples of airborne particulate matter and other variables, e.g. extractable organic material, carbon preference index of the n-alkane homologous series, and particularly mutagenicity. The predictive power was between 68% and 81% [ARMANINO et al., 1993]. VONG [1993] describes a similar example in which the method of PLS regression was used to compare rainwater data with different emission source profiles. [Pg.263]

The PLS multivariate data analysis of the training set was carried out on the descriptor matrix to correlate the complete set of variables with the activity data. From a total of 710 variables, 559 active variables remained after filtering out descriptors with no variability using the ALMOND program. The PLS analysis resulted in four latent variables (LVs) with r² = 0.76. Cross-validation of the model using the leave-one-out (LOO) method yielded a value of 0.72. As shown in Table 9.2, the GRIND descriptors 11-36, 44-49, 12-28, 13-42, 14-46, 24-46 and 34-45 showed high coefficients and thus correlated with the inhibition activity. [Pg.205]

PLS is a method by which blocks of multivariate data sets (tables) can be quantitatively related to each other. PLS is an acronym for Partial Least Squares correlation in latent variables, or Projections to Latent Structures. The PLS method is described in detail in Chapter 17. [Pg.334]

Theory. PCA is a frequently used variable reduction technique, which can be used to visualize the objects of a multivariate data set in a lower-dimensional space. This technique calculates new latent variables, called principal components (PCs), which are linear combinations of the original manifest variables ... [Pg.294]

The prediction of Y-data for unknown samples is based on a regression method in which the X-data are correlated with the Y-data. The multivariate methods usually used for such a calibration are principal component regression (PCR) and partial least squares regression (PLS). Both methods are based on the assumption of linearity and can deal with collinear data. The problem of collinearity is solved in the same way as in the formation of a PCA plot: the X-variables are combined into latent variables, the score vectors. These vectors are independent, since they are orthogonal to each other, and they can therefore be used to create a calibration model. [Pg.7]
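The orthogonality of these score vectors can be checked directly (a sketch with hypothetical data; PCA scores are shown, and PLS scores behave analogously):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(5)
    X = rng.normal(size=(25, 12))              # hypothetical X-data
    T = PCA(n_components=4).fit_transform(X)   # score vectors (latent variables)

    # Off-diagonal elements of T'T vanish: the score vectors are orthogonal,
    # so they can safely enter a calibration model together
    G = T.T @ T
    print(np.allclose(G - np.diag(np.diag(G)), 0.0))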

Like many other statistical methods for the evaluation of biomonitoring data, the above-depicted example of a trend analysis considers only a single variable. Although multivariate procedures consider several measured variables at the same time, their results are often of only limited meaningfulness. Cluster analyses can reveal structures in a given data set; principal component analyses concentrate the information content of many variables into a set of a few latent variables, which are difficult to interpret correctly. [Pg.289]

In the first case, the structural description of over 100 thienyl- and furyl-benzimidazoles and benzoxazoles was multivariately characterized to identify three latent variables. A set of 16 informative molecules was then derived by applying a central composite design criterion in these latent variables to all the available structures. The data were analyzed by a linear PLS model, which permitted the optimization of three structural features out of four. The fourth one, the substituent linked to the homocyclic ring of the bicyclic system, was finally optimized by the CARSO procedure in terms of the substituents' PPs (principal properties), predicting two new compounds as possible optimal structures. Indeed, later analysis confirmed the accuracy of these predictions. [Pg.32]

For most applications in the "omics" fields, even the simplest multivariate techniques such as Linear Discriminant Analysis (LDA) cannot be applied directly. From Equation 2 it is clear that an inverse of the covariance matrix Σ needs to be calculated, which is impossible in cases where the number of variables exceeds the number of samples. In practice, the number of samples is nowhere near the number of variables. For QDA the situation is even worse: to allow a stable matrix inversion, every single class should have at least as many samples as variables (and preferably quite a few more). A common approach is to compress the information in the data into a low number of latent variables (LVs), either using PCA (leading ... [Pg.143]
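A sketch of this compression approach (PCA followed by LDA; scikit-learn for illustration, with hypothetical sample and variable counts):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(4)
    # Hypothetical "omics"-like case: far more variables (200) than samples (30)
    X = rng.normal(size=(30, 200))
    y = np.repeat([0, 1], 15)

    # Direct LDA would need the inverse of a 200 x 200 covariance matrix
    # estimated from only 30 samples; compressing to a few latent variables
    # first makes the problem well-posed.
    clf = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())
    clf.fit(X, y)
    print(clf.score(X, y))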

PCR is a two-step multivariate calibration method involving compression of the data (x-) matrix into latent variables by principal component analysis (PCA), followed by MLR. PCA (also known as Karhunen-Loeve expansion or eigenanalysis) mathematically transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called eigenvectors (or PCs). Essentially, PCA is the breakdown of the original data matrix (X) into the product of a scores matrix (T) and a loadings matrix (L). The loadings matrix describes the directions of the PCs. These relationships can be represented by the equation ... [Pg.593]
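The omitted equation is presumably the familiar decomposition

    X = T L + E

with E the residual matrix (the transpose convention for L varies between texts). A minimal PCR sketch along these lines (scikit-learn for illustration; data and names are hypothetical):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(6)
    X = rng.normal(size=(35, 20))                       # hypothetical x-matrix
    y = X[:, 0] - X[:, 1] + 0.05 * rng.normal(size=35)  # hypothetical property

    # Step 1: compress X into a few latent variables (PCA scores)
    # Step 2: MLR of y on those uncorrelated scores
    pcr = make_pipeline(PCA(n_components=3), LinearRegression())
    pcr.fit(X, y)
    print(round(pcr.score(X, y), 3))                    # R^2 of the two-step PCR model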

