
Latent variables calibration

On the other hand, when latent variables are used in inverse calibration instead of the original variables, powerful methods of multivariate calibration arise which are frequently used in multispecies analysis and in single-species analysis within multispecies systems. These so-called soft modeling methods are based, like the P-matrix, on the inverse calibration model, by which the analytical values are regressed on the spectral data ... [Pg.186]
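As a concrete illustration of inverse calibration, the sketch below regresses analytical values directly on spectral data and applies the fitted regression vector to an unknown sample. The synthetic data, the use of NumPy, and all variable names are illustrative assumptions, not taken from the excerpt.

```python
# Minimal sketch of inverse calibration (P-matrix style): the analytical
# values y are regressed directly on the spectral data X (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                  # 30 calibration spectra, 5 wavelengths
b_true = np.array([0.5, -1.0, 2.0, 0.0, 0.3])
y = X @ b_true + 0.05 * rng.normal(size=30)   # analytical values (e.g. concentrations)

# Least-squares estimate of the regression vector: b = (X'X)^-1 X'y
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

x_new = rng.normal(size=5)                    # spectrum of an unknown sample
y_pred = x_new @ b_hat                        # predicted analytical value
```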

In multivariate calibration, the latent variable has the maximum correlation coefficient or covariance with a y-property, and can therefore be used to predict this property. [Pg.65]

Multivariate calibration aims to develop mathematical models (latent variables) for an optimal prediction of a property y from the variables x1, ..., xm. The most used method in chemometrics is partial least squares regression, PLS (Section 4.7). An important application is, for instance, the development of quantitative structure-property/activity relationships (QSPR/QSAR). [Pg.71]
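A minimal sketch of such a calibration, assuming scikit-learn (the excerpt names no software package) and synthetic data:

```python
# Minimal PLS calibration sketch: predict a property y from variables x1..xm.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))                           # 40 samples, variables x1..x10
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=40)    # property to be predicted

pls = PLSRegression(n_components=3)                     # model with 3 latent variables
pls.fit(X, y)
y_hat = pls.predict(X).ravel()                          # predicted property values
```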

FIGURE 4.25 PLS2 works with an X- and a Y-matrix; in this scheme both have three dimensions. t and u are linear latent variables with maximum covariance of the scores (inner relation); the corresponding loading vectors are p and q. The second pair of x- and y-components is not shown. A PLS2 calibration model allows a joint prediction of all y-variables from the x-variables via the x- and y-scores. [Pg.167]

Figure 12.26 Plot of the calibration error (RMSEE) and the validation error (RMSEP) as a function of the number of latent variables, for the case where 63 of the styrene-butadiene copolymer samples were selected for calibration and the remaining seven samples were used for validation.
M. Sjöström, S. Wold, W. Lindberg, J.A. Persson and H. Martens, A multivariate calibration problem in analytical chemistry solved by partial least squares models in latent variables. Anal. Chim. Acta, 150, 61-70 (1983). [Pg.434]

The optimal complexity of the PLS model, that is, the most appropriate number of latent variables, is determined by evaluating, with a proper validation strategy (see Section VI.F), the prediction error corresponding to models with increasing complexity. The parameter considered is usually the standard deviation of the error of calibration (SDEC), if computed with the objects used for building the model, or the standard deviation of the error of prediction (SDEP), if computed with objects not used for building the model (see Section VI.F). [Pg.95]
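The SDEC/SDEP comparison over models of increasing complexity might look as follows; scikit-learn, the synthetic data, and the simple train/test split (standing in for the validation strategy of Section VI.F) are all assumptions for illustration.

```python
# SDEC (error on objects used for building the model) and SDEP (error on
# objects not used for building the model) versus PLS model complexity.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(70, 20))
y = X[:, :4] @ np.array([1.0, -0.5, 0.8, 0.3]) + 0.2 * rng.normal(size=70)
X_cal, X_val, y_cal, y_val = train_test_split(X, y, test_size=10, random_state=0)

for a in range(1, 11):                        # models of increasing complexity
    pls = PLSRegression(n_components=a).fit(X_cal, y_cal)
    sdec = np.sqrt(np.mean((pls.predict(X_cal).ravel() - y_cal) ** 2))
    sdep = np.sqrt(np.mean((pls.predict(X_val).ravel() - y_val) ** 2))
    print(a, round(sdec, 3), round(sdep, 3))  # SDEP typically bottoms out at the optimum
```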

FIGURE 2.18 Typical profile of calibration and prediction errors as a function of the PLS model complexity (number of latent variables). The examination of such a plot may be helpful in selecting the optimal model complexity. [Pg.96]

PLS is best described in matrix notation, where the matrix X represents the calibration matrix (the training set, here physicochemical parameters) and Y represents the test matrix (the validation set, here the coordinates of the odor stimulus space). If there are n stimuli, p physicochemical parameters, and m dimensions of the stimulus space, the equations in Figure 6a apply. The C matrix is an m x p coefficient matrix to be determined, and the residuals not explained by the model are contained in E. The X matrix is decomposed as shown in Figure 6b into two small matrices, an n x a matrix T and an a x p matrix B, where a << n and a << p. F is the error matrix. The computation of T is such that it both models X and correlates with Y; it is accomplished with a weight matrix W and a set of latent variables U for Y with a corresponding loading matrix B. ... [Pg.47]
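The relation between the x- and y-latent variables can be made concrete with a NIPALS-style extraction of a single PLS2 component. This is a generic textbook formulation, not necessarily the exact algorithm of the cited work, and the lower-case names (w, t, q, u, p) follow common chemometric convention rather than the excerpt's notation.

```python
# NIPALS-style extraction of one PLS2 component (generic sketch, synthetic data).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 6))             # n stimuli x p physicochemical parameters
Y = rng.normal(size=(20, 3))             # n stimuli x m stimulus-space dimensions

u = Y[:, [0]]                            # start: one y-column as initial y-scores
for _ in range(100):                     # simplified fixed-count convergence loop
    w = X.T @ u / (u.T @ u)              # x-weights
    w /= np.linalg.norm(w)
    t = X @ w                            # x-scores: model X and correlate with Y
    q = Y.T @ t / (t.T @ t)              # y-loadings
    q /= np.linalg.norm(q)
    u = Y @ q                            # y-scores

p = X.T @ t / (t.T @ t)                  # x-loadings
b_inner = (t.T @ u).item() / (t.T @ t).item()   # inner relation: u ~ b_inner * t
```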

As already mentioned, any multivariate analysis should include some validation, that is, formal testing of how well the model extrapolates to new but similar data. This requires two separate steps in the computation of each model component: calibration, which consists of finding the new components, and validation, which checks how well the computed components describe the new data. Each of these two steps needs its own set of samples: calibration samples (or training samples) and validation samples (or test samples). Computation of spectroscopic-data PCs is based solely on optical data. There is no explicit or formal relationship between PCs and the composition of the samples in the sets from which the spectra were measured. In addition, PCs are considered superior to the original spectral data produced directly by the NIR instrument: since the first few PCs are stripped of noise, they represent the real variation of the spectra, presumably caused by physical or chemical phenomena. For these reasons PCs are considered latent variables, as opposed to the direct variables actually measured. [Pg.396]

A convenient analogy for understanding latent variables is reconstructing the spectrum of a mixture from the spectra of the pure chemicals contained in the mixture. The spectra of these pure chemicals would be the latent variables of the measured spectrum because they are not directly accessible in the spectrum of the mixture. However, PCs are not necessarily the spectra of the pure chemicals in the mixtures representing the samples. PCs represent whatever independent phenomena affect the spectra of the samples composing the calibration set. If one sample constituent varies entirely independently of everything else, and this constituent has a spectrum of its own, then one of the PCs will indeed represent the spectrum of that constituent. It is most unusual for any one constituent to vary in a manner that is exactly independent of any other. There is inevitably some correlation between the various constituents in a set of specimens, and any PC will represent the sum of the effects of these correlated constituents. Even if full independence is accomplished, there is dependence in that the sum of all constituents must equal 100%. Consequently, the PC representing that source of independent variability will look like the difference between the constituent of interest and all the other constituents in the samples. The spectrum of the constituent considered could be extracted mathematically, but the PCs will not look exactly like the spectrum of the pure constituent. [Pg.396]
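A small simulation makes this point visible: with two pure-component spectra under the 100% closure constraint, the first PC of the mean-centred mixture spectra resembles the difference of the pure spectra, not either pure spectrum alone. The Gaussian band shapes and all parameters below are invented for illustration.

```python
# PCs of mixture spectra are combinations of the latent pure-component spectra.
import numpy as np

wavelengths = np.linspace(0, 1, 100)
pure_a = np.exp(-((wavelengths - 0.3) ** 2) / 0.005)   # pure spectrum A
pure_b = np.exp(-((wavelengths - 0.7) ** 2) / 0.005)   # pure spectrum B

rng = np.random.default_rng(4)
c = rng.uniform(0, 1, size=(30, 1))
C = np.hstack([c, 1 - c])                  # closure: concentrations sum to 100%
spectra = C @ np.vstack([pure_a, pure_b]) + 0.01 * rng.normal(size=(30, 100))

Xc = spectra - spectra.mean(axis=0)        # mean-centre before PCA
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                                # resembles (pure_a - pure_b), i.e. the
                                           # difference of the constituents, not
                                           # either pure spectrum by itself
```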

T (I x A) is related to the calibration samples and is termed the matrix of scores, and E (I x J) is the part of the data that is not modelled by the A factors. As mentioned above, A is the number of latent variables that are retained in the model. The loadings in P define a new coordinate system (a rotation of the original axes of the measured variables) and the scores T are the coordinates of the samples in this new coordinate system (see Figure 3.4 for a brief description). [Pg.175]

RMSEP = [Σ(ŷi − yi,reference)² / N]^(1/2) (N = number of validation samples), with concentration units. The behaviour of RMSEP is depicted in Figure 4.14: it is high whenever too low or too high a number of latent variables is included in the model, and it decreases more or less sharply in the vicinity of the optimum number of factors. Nevertheless, you should not use the validation set to select the model because then you would need another true validation set. As was explained in previous sections, if you have many samples (which is seldom the case) you can develop three sample sets: one for calibration, one for fine-tuning the model and, finally, another one for a true validation. [Pg.222]
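Computed directly, with hypothetical reference and predicted values:

```python
# RMSEP from hypothetical validation-set values (NumPy sketch).
import numpy as np

y_ref = np.array([1.20, 0.80, 1.50, 1.10])      # reference concentrations
y_hat = np.array([1.15, 0.85, 1.42, 1.20])      # PLS predictions
rmsep = np.sqrt(np.mean((y_hat - y_ref) ** 2))  # same units as the concentrations
```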

It is worth comparing briefly the PLS (Chapter 4) and ANN models. The ANN finally selected uses four neurons in the hidden layer, which is exactly the same number of latent variables as selected for PLS, a situation reported fairly frequently when PLS and ANN models perform similarly. The RMSEC and RMSEP were slightly higher for PLS (1.4 and 1.5 μg ml-1, respectively), and they were outperformed by the ANN (0.7 and 0.5 μg ml-1, respectively). The better predictive capability of the neural network might be attributed to the presence of some sort of spectral nonlinearities in the calibration set and/or some spectral behaviour not easy to account for by the linear PLS models. [Pg.269]
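A sketch of such a comparison, with scikit-learn's MLPRegressor standing in for the ANN of the excerpt (an assumption; the original network and data are not available here). Four hidden neurons mirror the four latent variables, and a mild nonlinearity is built into the synthetic response:

```python
# Compare a linear PLS model with a small neural network on the same data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)
X = rng.normal(size=(80, 15))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.normal(size=80)  # nonlinear response

pls = PLSRegression(n_components=4).fit(X, y)
ann = MLPRegressor(hidden_layer_sizes=(4,), max_iter=5000, random_state=0).fit(X, y)

rmse_pls = np.sqrt(np.mean((pls.predict(X).ravel() - y) ** 2))
rmse_ann = np.sqrt(np.mean((ann.predict(X) - y) ** 2))   # ANN can capture the nonlinearity
```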

There are some distinct advantages of the PLS regression method over the PCR method. Because Y-data are used in the data compression step, it is often possible to build PLS models that are simpler (i.e. require fewer compressed variables), yet just as effective as more complex PCR models built from the same calibration data. In the process analytical world, simpler models are more stable over time and easier to maintain. There is also a small advantage of PLS for qualitative interpretative purposes. Even though the latent variables in PLS are still abstract, and rarely express pure chemical or physical phenomena, they are at least more relevant to the problem than the PCs obtained from PCR. [Pg.263]
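The complexity argument can be checked on synthetic data: for the same number of compressed variables, PLS (which uses the Y-data in the compression step) usually fits better than PCR. scikit-learn and all data below are assumptions for illustration.

```python
# PCR versus PLS at equal model complexity on the same calibration data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(60, 30))
y = X[:, 5] - 0.5 * X[:, 12] + 0.1 * rng.normal(size=60)

for a in (1, 2, 3, 4):
    pcr = make_pipeline(PCA(n_components=a), LinearRegression()).fit(X, y)
    pls = PLSRegression(n_components=a).fit(X, y)
    rmse_pcr = np.sqrt(np.mean((pcr.predict(X) - y) ** 2))
    rmse_pls = np.sqrt(np.mean((pls.predict(X).ravel() - y) ** 2))
    print(a, round(rmse_pcr, 3), round(rmse_pls, 3))  # PLS reaches low error sooner
```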

In order to assess the optimal complexity of a model, the RMSEP statistics for a series of models with different complexity can be compared. In the case of PLS models, it is most common to plot the RMSEP as a function of the number of latent variables in the PLS model. In the styrene-butadiene copolymer example, an external validation set of 7 samples was extracted from the data set, and the remaining 63 samples were used to build a series of PLS models for cis-butadiene with 1 to 10 latent variables. These models were then used to predict the cis-butadiene content of the seven samples in the external validation set. Figure 8.19 shows both the calibration fit error (RMSEE) and the validation prediction error (RMSEP) as a function of the number of... [Pg.269]

There are several distinctions of the PLS-DA method versus other classification methods. First of all, the classification space is unique: it is not based on the X-variables or on PCs obtained from PCA, but rather on the latent variables obtained from PLS or PLS2 regression. Because these compressed variables are determined using the known class membership information in the calibration data, they should be more relevant for separating the samples by their classes than the PCs obtained from PCA. Secondly, the classification rule is based on results obtained from quantitative PLS prediction: when this method is applied to an unknown sample, one obtains a predicted number for each of the Y-variables. Statistical tests, such as the t-test discussed earlier (Section 8.2.2), can then be used to determine whether these predicted numbers are sufficiently close to 1 or 0. Another advantage of the PLS-DA method is that it can, in principle, handle cases where an unknown sample belongs to more than one class, or to no class at all. [Pg.293]
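A minimal PLS-DA sketch along these lines, assuming scikit-learn and using a fixed 0.5 threshold in place of the statistical test discussed in the text:

```python
# PLS-DA sketch: class membership coded as dummy y-variables, PLS2 model fitted,
# predictions close to 1 indicate class membership.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(6)
X_a = rng.normal(loc=0.0, size=(20, 8))        # class A calibration samples
X_b = rng.normal(loc=1.5, size=(20, 8))        # class B calibration samples
X = np.vstack([X_a, X_b])
Y = np.repeat(np.eye(2), 20, axis=0)           # dummy matrix: [1,0] = A, [0,1] = B

plsda = PLSRegression(n_components=2).fit(X, Y)
y_new = plsda.predict(rng.normal(loc=1.5, size=(1, 8)))
membership = y_new > 0.5                       # may flag one class, several, or none
```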

Sjöström, M., Wold, S., Lindberg, W., Persson, J.A. and Martens, H., A multivariate calibration problem in analytical chemistry solved by partial least squares models in latent variables. Anal. Chim. Acta, 150, 61-70 (1983). [Pg.325]

Partial Least Squares Regression (PLS) is a multivariate calibration technique based on the principles of latent variable regression. Having originated in a slightly different form in the field of econometrics, PLS has entered the spectroscopic scene.46,47,48 It is mostly employed for quantitative analysis of mixtures with overlapping bands (e.g. mixtures of glucose, fructose and sucrose).49,50

PLS has been used mainly for calibration purposes in analytical chemistry, where the determination of unknown concentrations is the most important demand. In spectroscopic research, there is also the interpretation of diagnostic plots, such as score plots and loading plots, in terms of reaction mechanisms and spectroscopic background knowledge. The interpretation of rank as the complexity of a mechanism is also a valuable tool. A nice property of latent variable methods is that they do not demand advance knowledge of the system studied, but that the measurements...

Theory. When PLSDA (8) is used to allocate new samples to different classes, a classical PLS model is first built for a calibration set of samples. In classical PLS, the number of explanatory variables is first reduced by creating new latent variables (factors) that maximize the covariance between the explanatory and response variables. The obtained factors are then used to build a linear regression model. [Pg.311]

The prediction of Y-data of unknown samples is based on a regression method in which the X-data are correlated to the Y-data. The multivariate methods usually used for such a calibration are principal component regression (PCR) and partial least squares regression (PLS). Both methods are based on the assumption of linearity and can deal with co-linear data. The problem of co-linearity is solved in the same way as in the formation of a PCA plot: the X-variables are combined into latent variables, the score vectors. These vectors are independent, since they are orthogonal to each other, and they can therefore be used to create a calibration model. [Pg.7]
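A NumPy-only sketch of this idea for PCR: co-linear x-variables are compressed into orthogonal score vectors, which then serve as regressors. The synthetic data and the choice of three components are assumptions for illustration.

```python
# PCR sketch: orthogonal score vectors solve the co-linearity problem.
import numpy as np

rng = np.random.default_rng(8)
base = rng.normal(size=(50, 3))
X = base @ rng.normal(size=(3, 10))           # 10 strongly co-linear x-variables
y = base @ np.array([1.0, 0.5, -0.8])

Xc = X - X.mean(axis=0)                       # mean-centre, as for a PCA plot
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = Xc @ Vt[:3].T                             # score vectors: mutually orthogonal
b, *_ = np.linalg.lstsq(T, y - y.mean(), rcond=None)  # calibration model on scores
```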

In the arsenal of calibration methods there are methods better suited to modelling any number of correlated variables. The most popular among them are Principal Component Regression (PCR) and Partial Least Squares (PLS) [3]. Their models are based on a few orthogonal latent variables, each of them being a linear combination of all original variables. As all the information contained in the spectra can be used for the modelling, these methods are often called full-spectrum methods. ... [Pg.323]

Table 3. Results of PLS models for fresh Duke berry samples (r = coefficient of correlation; RMSEC = root mean square of the standard error in calibration; RMSECV = root mean square of the standard error in cross-validation; LV = latent variables). All data were preprocessed by second derivative of reduced and smoothed data.
