
Latent variable regression analysis

To conclude the list of approaches for variable selection, we briefly mention the idea of using the output from an all-variable selection run as input into a latent variable regression analysis. This technique is computationally expensive, as it requires finding not only the subsets of variables but also the number of latent variables needed to optimize some criterion function. [Pg.327]

Partial Least Squares Regression (PLS) is a multivariate calibration technique based on the principles of latent variable regression. Having originated in a slightly different form in the field of econometrics, PLS has since entered the spectroscopic scene.46,47,48 It is mostly employed for the quantitative analysis of mixtures with overlapping bands (e.g. a mixture of glucose, fructose and sucrose).49,50 [Pg.405]
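
As a hedged illustration of this kind of mixture calibration, the sketch below simulates three heavily overlapping Gaussian bands as stand-ins for the sugar spectra (all band positions, widths and concentration ranges are invented for the example) and fits a PLS model with scikit-learn:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
wavelengths = np.linspace(1000, 2500, 200)   # hypothetical NIR axis, nm

def band(center, width):
    # Gaussian pure-component "spectrum" (invented for illustration)
    return np.exp(-((wavelengths - center) ** 2) / (2 * width ** 2))

# Three strongly overlapping pure-component bands
S = np.vstack([band(1600, 150), band(1700, 150), band(1800, 150)])

C = rng.uniform(0, 1, size=(40, 3))                  # random concentrations
X = C @ S + rng.normal(0, 0.01, size=(40, 200))      # mixture spectra + noise

pls = PLSRegression(n_components=3).fit(X, C)
print("calibration R^2:", pls.score(X, C))           # close to 1 despite overlap
```

Despite the complete band overlap, three latent variables recover the three concentration profiles — exactly the situation univariate calibration cannot handle.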

Another problem is to determine the optimal number of descriptors for the objects (patterns), such as for the structure of a molecule. A widespread rule of thumb is to keep the number of descriptors below 20 % of the number of objects in the dataset. However, this holds only for ordinary multilinear regression analysis. More advanced methods, such as Projection to Latent Structures (or Partial Least Squares, PLS), use so-called latent variables to achieve both modeling and prediction. [Pg.205]

Other chemometrics methods to improve calibration have been advanced. The method of partial least squares has been useful in multicomponent calibration (48—51). In this approach the concentrations are related to latent variables in the block of observed instrument responses. Thus PLS regression can solve the collinearity problem and provide all of the advantages discussed earlier. Principal components analysis coupled with multiple regression, often called Principal Component Regression (PCR), is another calibration approach that has been compared and contrasted with PLS (52—54). Calibration problems can also be approached using the Kalman filter as discussed (43). [Pg.429]
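
For comparison, a minimal PCR sketch under the same logic (simulated collinear data; the factor structure and dimensions are invented): the PCA scores are orthogonal, so the subsequent ordinary regression no longer suffers from collinearity.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
T = rng.normal(size=(30, 5))                      # 5 underlying factors
X = T @ rng.normal(size=(5, 100))                 # 100 highly collinear channels
X += rng.normal(0, 0.01, size=X.shape)
y = T @ rng.normal(size=5) + rng.normal(0, 0.1, 30)

# PCR: compress X to a few orthogonal scores, then regress y on them
pcr = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X, y)
print("PCR R^2:", pcr.score(X, y))
```

The practical difference from PLS is that the PCA step here ignores y when choosing the scores, whereas PLS weights its latent variables toward directions that also predict y.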

Partial least squares regression (PLS). Partial least squares regression applies to the simultaneous analysis of two sets of variables on the same objects. It allows for the modeling of inter- and intra-block relationships from an X-block and Y-block of variables in terms of a lower-dimensional table of latent variables [4]. The main purpose of regression is to build a predictive model enabling the prediction of wanted characteristics (y) from measured spectra (X). In matrix notation we have the linear model with regression coefficients b ... [Pg.544]
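
The expression truncated above is presumably the usual inverse calibration model; reconstructed from the snippet's own notation it would read

$$\mathbf{y} = \mathbf{X}\,\mathbf{b} + \mathbf{e},$$

where y collects the wanted characteristics, X the measured spectra, b the regression coefficients and e the residuals.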

On the other hand, when latent variables are used instead of the original variables in inverse calibration, powerful methods of multivariate calibration arise that are frequently used in multispecies analysis and in single-species analysis within multispecies systems. These so-called soft modeling methods are based, like the P-matrix, on the inverse calibration model, by which the analytical values are regressed on the spectral data ... [Pg.186]

Overfitting. Overfitting occurs when the model used to describe a data set is overly complex. An example of this in regression analysis is the use of a second-order polynomial to describe the relationship between two variables when the true relationship is a straight line. In chemometrics, the most common example of overfitting is the use of too many latent variables in a ... [Pg.8]
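
A small numeric sketch of the polynomial case just described (all data simulated): the training residual always shrinks as terms are added, which is precisely why it cannot be used on its own to justify the extra flexibility.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 8)
y = 2 * x + rng.normal(0, 0.3, 8)     # the true relationship is a straight line

for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    # Higher degrees chase the noise; only held-out (e.g. cross-validated)
    # error would reveal that the extra terms do not generalize.
    print(f"degree {degree}: residual sum of squares = {np.sum(resid**2):.3f}")
```

The same pattern appears with latent variables: each additional component fits the calibration set a little better while, past some point, predictions for new samples get worse.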

The method which satisfies these conditions is partial least squares (PLS) regression analysis, a relatively recent statistical technique (18, 19). The basis of the PLS method is that, given k objects characterised by i descriptor variables, which form the X-matrix, and j response variables, which form the Y-matrix, it is possible to relate the two blocks (or data matrices) by means of the respective latent variables u and t in such a way that the two data sets are linearly dependent ... [Pg.103]
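
The relation truncated above is presumably the standard bilinear PLS decomposition with its "inner relation" between the block scores:

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathrm{T}} + \mathbf{E},\qquad \mathbf{Y} = \mathbf{U}\mathbf{Q}^{\mathrm{T}} + \mathbf{F},\qquad \mathbf{u}_a \approx b_a\,\mathbf{t}_a,$$

where T and U collect the X- and Y-block scores t and u, P and Q are the corresponding loadings, E and F are residual matrices, and b_a is the inner regression coefficient linking the a-th pair of latent variables.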

There are several distinctions of the PLS-DA method versus other classification methods. First of all, the classification space is unique: it is not based on the X-variables or on PCs obtained from PCA, but rather on the latent variables obtained from PLS or PLS-2 regression. Because these compressed variables are determined using the known class membership information in the calibration data, they should be more relevant for separating the samples by their classes than the PCs obtained from PCA. Secondly, the classification rule is based on results obtained from quantitative PLS prediction: when this method is applied to an unknown sample, one obtains a predicted number for each of the Y-variables. Statistical tests, such as the t-test discussed earlier (Section 8.2.2), can then be used to determine whether these predicted numbers are sufficiently close to 1 or 0. Another advantage of the PLS-DA method is that it can, in principle, handle cases where an unknown sample belongs to more than one class, or to no class at all. [Pg.293]
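
A hedged PLS-DA sketch (simulated two-class data; class membership is dummy-coded as 0/1 columns of Y, and a simple "closest to 1" reading stands in for the statistical tests mentioned above):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
# Two simulated classes, separated by a mean shift across the variables
X = np.vstack([rng.normal(0, 1, (20, 50)),
               rng.normal(1, 1, (20, 50))])
Y = np.zeros((40, 2))
Y[:20, 0] = 1        # column 0: membership in class A
Y[20:, 1] = 1        # column 1: membership in class B

plsda = PLSRegression(n_components=2).fit(X, Y)
y_hat = plsda.predict(X[[0, 25]])                # predicted membership numbers
print(np.round(y_hat, 2))
print("assigned class:", y_hat.argmax(axis=1))   # 0 = class A, 1 = class B
```

Because the prediction is quantitative, a sample can also come out near 1 in both columns (both classes) or near 0 in both (no class), which is the flexibility noted at the end of the paragraph.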

Correlation and regression analysis (with direct variables; with latent variables): quantitative description of the relationships between variables. [Pg.7]

It is a common misunderstanding that a statistical analysis is only possible when the number of experiments exceeds the number of variables. This is true of multiple regression, but it is not true of PLS. Because PLS is based on projections, it can handle any number of variables, provided that the number of underlying latent variables (cf. principal components) is less than the number of objects. [Pg.52]
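
A quick demonstration of this point (simulated data with invented dimensions): 200 variables but only 15 objects would leave ordinary least squares underdetermined, yet PLS fits without difficulty because only two latent variables carry the signal.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(4)
n_objects, n_vars = 15, 200                  # far more variables than objects
T = rng.normal(size=(n_objects, 2))          # 2 underlying latent variables
X = T @ rng.normal(size=(2, n_vars)) + rng.normal(0, 0.05, (n_objects, n_vars))
y = T[:, 0] - 0.5 * T[:, 1] + rng.normal(0, 0.05, n_objects)

pls = PLSRegression(n_components=2).fit(X, y)
print("R^2:", pls.score(X, y))               # the projection copes with p >> n
```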

The molecular descriptors for a CoMFA analysis number in the hundreds or thousands, even for datasets of twenty or so compounds. A multiple regression equation cannot be fitted for such a dataset. In such cases, Partial Least Squares (PLS) is the appropriate method. PLS unravels the relationship between log (1/C) and molecular properties by extracting from the data matrix linear combinations (latent variables) of molecular properties that best explain log (1/C). Because the individual properties are correlated (for example, steric properties at adjacent lattice points), more than one contributes to each latent variable. The first latent variable extracted explains most of the variance in log (1/C), the second the next greatest degree of variance, etc. At each step r² and s are calculated to help one decide when enough variables have been extracted; the maximum number of extracted variables is reached when extracting another does not decrease s substantially. Cross-validation, discussed in Section 3.5.3, is commonly used to decide how many latent variables are significant. For example, Table 3.5 summarizes the CoMFA PLS analysis ... [Pg.80]
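
A minimal cross-validation sketch for choosing the number of latent variables (simulated data; in a real CoMFA run the columns of X would be field values at lattice points):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(5)
T = rng.normal(size=(25, 3))                 # 3 true latent variables
X = T @ rng.normal(size=(3, 300)) + rng.normal(0, 0.05, (25, 300))
y = T @ np.array([1.0, -0.5, 0.3]) + rng.normal(0, 0.1, 25)

for a in range(1, 7):
    scores = cross_val_score(PLSRegression(n_components=a), X, y,
                             cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
    print(f"{a} latent variables: leave-one-out MSE = {-scores.mean():.4f}")
```

The cross-validated error should drop sharply up to three components and then level off or rise, signalling that further latent variables only model noise.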

PCA is not only used as a method on its own but also as part of other mathematical techniques, such as SIMCA classification (see the section on parametric classification methods), principal component regression analysis (PCRA) and partial least-squares modelling with latent variables (PLS). Instead of the original descriptor variables (x-variables), PCs extracted from the matrix of x-variables (descriptor matrix X) are used in PCRA and PLS as independent variables in a regression model. These PCs are called latent variables in this context. [Pg.61]

How are PLS models used? One obvious way is simply to make predictions for test set samples by calculating their latent variables from eqn (7.10) and applying the appropriate regression coefficients (eqn 7.9). The latent variables may also be used like PCs for data display (see Chapter 4) through the construction of scores plots for samples and loadings plots for variables. A PLS analysis of halogenated ether anaesthetics allowed the production of the scores plot shown in Fig. 7.6, in which anaesthetics with similar side-effects are grouped together (Hellberg et al. ... [Pg.155]
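
A minimal sketch of such a display (simulated data; the anaesthetics dataset itself is not reproduced here):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 40))
y = X[:, 0] + rng.normal(0, 0.2, 30)

pls = PLSRegression(n_components=2).fit(X, y)
T = pls.transform(X)                 # latent-variable (x-block) scores

plt.scatter(T[:, 0], T[:, 1])
plt.xlabel("t1 (first latent variable)")
plt.ylabel("t2 (second latent variable)")
plt.title("PLS scores plot: neighbouring points are similar samples")
plt.show()
```

A loadings plot is built the same way from the x-loadings (pls.x_loadings_), with one point per variable instead of one per sample.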

The goal of this study is to test hypotheses about the relationships between multiple independent variables and one dependent variable. As most of my latent constructs are measured on interval scales and I expect linear relationships between the variables, multiple linear regression analysis with ordinary least squares estimation was used (Cohen 2003; Tabachnick and Fidell 1989). The study had two thematically separate parts: the first part focuses on the antecedents of lead userness of employees in firms (n=83, hypotheses 1–3, dependent variable: lead userness); in the second part, hypotheses about lead userness of employees and behavioral outcomes are tested (n=149, hypotheses 4–8, dependent variables: innovative work behavior, internal boundary spanning behavior, external boundary spanning behavior, organizational ... [Pg.136]

This book contains several different NIR applications in food analysis, and many of them use multivariate data handling. Our aim in this chapter is to discuss the aspects of latent variable decomposition in principal component analysis and partial least squares regression and to illustrate their use by an application in the NIR region. [Pg.146]


See other pages where Latent variable regression analysis is mentioned: [Pg.314], [Pg.426], [Pg.133], [Pg.345], [Pg.438], [Pg.11], [Pg.198], [Pg.162], [Pg.203], [Pg.211], [Pg.217], [Pg.12], [Pg.111], [Pg.38], [Pg.332], [Pg.352], [Pg.83], [Pg.273], [Pg.60], [Pg.61], [Pg.30], [Pg.557], [Pg.113], [Pg.156], [Pg.92], [Pg.140], [Pg.372], [Pg.292], [Pg.535]
See also in source #XX: [Pg.314, Pg.316]




