Big Chemical Encyclopedia

PLS algorithm

The field points must then be fitted to predict the activity. There are generally far more field points than known compound activities to be fitted. The least-squares algorithms used in QSAR studies do not function for such an underdetermined system. A partial least squares (PLS) algorithm is used for this type of fitting. This method starts with matrices of field data and activity data. These matrices are then used to derive two new matrices containing a description of the system and the residual noise in the data. Earlier studies used a similar technique, called principal component analysis (PCA). PLS is generally considered to be superior. [Pg.248]

PLS has been introduced in the chemometrics literature as an algorithm with the claim that it simultaneously finds important and related components of X and of Y. Hence the alternative explanation of the acronym PLS: Projection to Latent Structures. The PLS factors can loosely be seen as modified principal components. The deviation from the PCA factors is needed to improve the correlation, at the cost of some decrease in the variance of the factors. The PLS algorithm effectively mixes two PCA computations, one for X and one for Y, using the NIPALS algorithm. It is assumed that X and Y have been column-centred as usual. The basic NIPALS algorithm can best be demonstrated as an easy way to calculate the singular vectors of a matrix, viz. via the simple iterative sequence (see Section 31.4.1) ... [Pg.332]
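The NIPALS iteration referred to above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the book's own code: it alternates the two regression steps until the score vector stops changing, and the result matches the first singular-vector pair from an SVD up to sign.

```python
import numpy as np

def nipals_first_singular_vectors(X, n_iter=500, tol=1e-12):
    """Iteratively estimate the first left/right singular vectors of X
    via the basic NIPALS power iteration described in the text."""
    # start from the column of X with the largest variance
    t = X[:, np.argmax(X.var(axis=0))].copy()
    for _ in range(n_iter):
        w = X.T @ t
        w /= np.linalg.norm(w)       # right singular vector estimate
        t_new = X @ w                # left singular vector (unnormalised)
        if np.linalg.norm(t_new - t) < tol:
            t = t_new
            break
        t = t_new
    return t / np.linalg.norm(t), w

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
X -= X.mean(axis=0)                  # column-centre, as the text assumes
u, v = nipals_first_singular_vectors(X)

# compare with a full SVD: the vectors agree up to sign
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(np.isclose(abs(u @ U[:, 0]), 1.0, atol=1e-6))  # True
```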

The PLS algorithm is relatively fast because it involves only simple matrix multiplications; eigenvalue/eigenvector analysis and matrix inversions are not needed. The determination of how many factors to take is a major decision. Just as for the other methods, the right number of components can be determined by assessing the predictive ability of models of increasing dimensionality. This is more fully discussed in Section 36.5 on validation. [Pg.335]
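The component-selection strategy described here — assess predictive ability at increasing dimensionality — can be sketched with a leave-one-out PRESS statistic. The minimal PLS1 routine and the synthetic data below are illustrative assumptions, not taken from the text.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Minimal NIPALS-style PLS1; returns a regression vector for centred data."""
    Xr, yr = X.copy(), y.copy()
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        q = (yr @ t) / tt
        Xr -= np.outer(t, p)         # deflate X
        yr -= q * t                  # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.solve(P.T @ W, Q)

def loo_press(X, y, n_comp):
    """Leave-one-out PRESS: predictive ability at a given dimensionality."""
    err = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        Xc, yc = X[keep], y[keep]
        mx, my = Xc.mean(axis=0), yc.mean()
        b = pls1_fit(Xc - mx, yc - my, n_comp)
        err += ((X[i] - mx) @ b + my - y[i]) ** 2
    return err

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 6))
X[:, 0] *= 3.0                       # unequal variances make one factor insufficient
y = X[:, 0] - 2.0 * X[:, 1] + 0.05 * rng.standard_normal(30)

press = [loo_press(X, y, a) for a in (1, 2, 3)]
print(press[1] < press[0])           # the second component clearly improves prediction
```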

S. de Jong, Comparison of PLS algorithms, Chemom. Intell. Lab. Syst. (1998)... [Pg.347]

In principle, in the absence of noise, the PLS factor should completely reject the nonlinear data by rotating the first factor into orthogonality with the dimensions of the x-data space which are spanned by the nonlinearity. The PLS algorithm is supposed to find the (first) factor which maximizes the linear relationship between the x-block scores and the y-block scores. So clearly, in the absence of noise, a good implementation of PLS should completely reject all of the nonlinearity and return a factor which is exactly linearly related to the y-block variance. (Richard Kramer)... [Pg.153]

Some of this variance was indeed rejected by the PLS algorithm, but the amount, compared to the Principal Component algorithm, was minuscule rather than providing a nearly exact fit. [Pg.165]

Mean-centring and normalisation are optional. The PCR (and PLS) algorithms are essentially independent of the nature of the pre-treatment of the data; only the centring has to be reversed in the prediction step. In the programs we... [Pg.297]

Partial Least Squares is the chemometrics method par excellence. There is a tremendous number of published applications and also a large number of minor improvements to the original PLS algorithm. [Pg.306]

In the standard PLS algorithm w_i is normalised to unit length and subsequently t_i is calculated as ... [Pg.308]

A complicating aspect of most PLS algorithms is the stepwise calculation of the components. After a component is computed, the residual matrices for X (and, where applicable, Y) are determined. The next PLS component is calculated from the residual matrices, and therefore its parameters (scores, loadings, weights) do not relate to X but to the residual matrices. However, equations exist that relate the PLS x-loadings and PLS x-scores to the original x-data, and that also provide... [Pg.166]
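The relation mentioned at the end of this passage can be demonstrated numerically: although the stepwise NIPALS scores are computed from residual matrices, they can be recovered directly from the original X as T = XW(P'W)^-1. A sketch under these assumptions (NumPy, synthetic centred data — all values illustrative):

```python
import numpy as np

def pls_scores_stepwise(X, y, n_comp):
    """Compute PLS x-scores by stepwise deflation (NIPALS-style)."""
    Xr, yr = X.copy(), y.copy()
    T, W, P = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        p = Xr.T @ t / (t @ t)
        Xr -= np.outer(t, p)               # next component works on residuals
        yr -= (yr @ t) / (t @ t) * t
        T.append(t); W.append(w); P.append(p)
    return np.array(T).T, np.array(W).T, np.array(P).T

rng = np.random.default_rng(3)
X = rng.standard_normal((20, 5)); X -= X.mean(axis=0)
y = rng.standard_normal(20); y -= y.mean()

T, W, P = pls_scores_stepwise(X, y, 3)

# the same scores expressed directly in terms of the ORIGINAL X:
# T = X W (P'W)^-1, one of the "equations that relate the PLS x-scores
# to the original x-data" mentioned in the text
R = W @ np.linalg.inv(P.T @ W)
print(np.allclose(T, X @ R))               # True
```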

Traditional macroscale NIR spectroscopy requires a calibration set, made of the same chemical components as the target sample, but with varying concentrations that are chosen to span the range of concentrations possible in the sample. A concentration matrix is made from the known concentrations of each component. The PLS algorithm is used to create a model that best describes the mathematical relationship between the reference sample data and the concentration matrix. The model is applied to the unknown data from the target sample to estimate the concentration of sample components. This is called 'concentration-mode' PLS. [Pg.268]
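A minimal concentration-mode sketch, assuming a NIPALS-style PLS1 and synthetic mixture "spectra" (all names, sizes, and data here are illustrative, not from the text): build a calibration set from known concentrations, fit a model for one component, and predict an unknown sample.

```python
import numpy as np

def pls1_regression_vector(X, y, n_comp):
    """Minimal NIPALS-style PLS1; returns a regression vector for centred data."""
    Xr, yr = X.copy(), y.copy()
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        q = (yr @ t) / tt
        Xr -= np.outer(t, p)
        yr -= q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.solve(P.T @ W, Q)

# --- hypothetical two-component calibration set ---
rng = np.random.default_rng(4)
S = np.abs(rng.standard_normal((2, 50)))          # pure "spectra", 50 channels
C = rng.uniform(0.1, 1.0, size=(12, 2))           # known calibration concentrations
X = C @ S + 0.001 * rng.standard_normal((12, 50)) # mixture spectra

# model for component 0: centre, fit, then reverse the centring when predicting
mx, my = X.mean(axis=0), C[:, 0].mean()
b = pls1_regression_vector(X - mx, C[:, 0] - my, n_comp=2)

x_unknown = np.array([0.3, 0.7]) @ S              # unknown sample, true c0 = 0.3
c_pred = (x_unknown - mx) @ b + my
print(round(c_pred, 3))                           # should be close to 0.300
```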

These same analysis techniques can be applied to chemical imaging data. Additionally, because of the huge number of spectra contained within a chemical imaging data set, and the power of statistical sampling, the PLS algorithm can also be applied in what is called classification mode, as described in Section 8.4.5. When the model is applied to data from the sample, each spectrum is scored according to its membership of a particular class (i.e. its degree of purity relative to a chemical component). Higher scores indicate more similarity to the pure component spectra. While these scores are not indicative of the absolute concentration of a chemical component, the relative abundance between the components is maintained and can be calculated. If all sample components are accounted for, the scores for each component can be normalized to unity, and a statistical assessment of the relative abundance of the components can be made. [Pg.268]
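The closing normalisation step can be illustrated directly: if scores for every component are available, dividing each spectrum's scores by their sum yields relative abundances that sum to unity (the score values below are made up):

```python
import numpy as np

# hypothetical classification-mode scores: 4 pixels x 3 chemical components
scores = np.array([[2.0, 1.0, 1.0],
                   [0.5, 3.0, 0.5],
                   [1.0, 1.0, 2.0],
                   [4.0, 0.5, 0.5]])

# with all components accounted for, normalise each pixel's scores to unity
# so they can be read as relative abundances
abund = scores / scores.sum(axis=1, keepdims=True)

print(abund.sum(axis=1))   # each row sums to 1
```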

There are several different PLS algorithms, the most common of which are the NIPALS algorithm [1,64], the SIMPLS algorithm [65], and the Bidiagonalization algorithm [66]. These algorithms are somewhat more... [Pg.385]

The variable selection methods discussed above certainly do not cover all selection methods that have been proposed, and there are several other methods that could be quite effective for PAT applications. These include a modified version of a PLS algorithm that includes interactive variable selection [102], and a combination of GA selection with wavelet transform data compression [25]. [Pg.424]

Figure 1 summarises the two-block PLS algorithm using the equation numbers. [Pg.273]

Although the PCR and PLS algorithms are different, we do not feel there is overwhelming evidence to suggest that one method is superior to the other. Therefore, we do not have a strong recommendation when choosing between PCR and PLS. In practice, our tendency is to use PLS because of our experiences with it and the software tools we employ. Therefore, the examples discussed below only present the PLS results. [See Haaland and Thomas (1990) for additional readings.]... [Pg.325]

The potential of the PLS algorithm is well demonstrated by the spectrofluorimetric analysis of mixtures of humic acid and ligninsulfonate investigated by Lindberg et al. The problems associated with this analysis are the strong similarities... [Pg.37]

Once the basis of the PLS algorithm has been presented, it is easier to understand the advantages of this multivariate regression technique over simpler ones such as MLR and PCR. Some of these advantages are already obtained with PCR. They can be summarised as follows ... [Pg.190]

These same analysis techniques can be applied to chemical imaging data. Additionally, because of the huge number of spectra contained within a chemical imaging data set, and the power of statistical sampling, the PLS algorithm can also be applied in what is called classification mode. In this case, the reference library used to establish the PLS model is... [Pg.211]

In order to handle multiple Y-variables, an extension of the PLS regression method discussed earlier, called PLS-2, must be used [1]. The algorithm for the PLS-2 method is quite similar to the PLS algorithms discussed earlier. Just like the PLS method, this method determines each compressed variable (latent variable) based on the maximum variance explained in both X and Y. The only difference is that Y is now a matrix that contains several Y-variables. For PLS-2, the second equation in the PLS model (Equation 8.36) can be replaced with the following ... [Pg.292]
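A PLS-2 sketch, assuming the standard NIPALS formulation with an inner loop over the y-score u (the data and dimensions are illustrative, not from the text): the structural change from PLS1 is that Y now carries a loading vector q per component, and the u-scores are iterated to convergence before each deflation.

```python
import numpy as np

def pls2_nipals(X, Y, n_comp, n_iter=500, tol=1e-10):
    """Minimal NIPALS PLS-2 for a multi-column Y (centred data)."""
    Xr, Yr = X.copy(), Y.copy()
    T, P, Q = [], [], []
    for _ in range(n_comp):
        # inner loop: alternate between the X- and Y-blocks until u settles
        u = Yr[:, np.argmax(Yr.var(axis=0))].copy()
        for _ in range(n_iter):
            w = Xr.T @ u
            w /= np.linalg.norm(w)
            t = Xr @ w
            q = Yr.T @ t / (t @ t)
            u_new = Yr @ q / (q @ q)
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        p = Xr.T @ t / (t @ t)
        Xr -= np.outer(t, p)           # deflate both blocks
        Yr -= np.outer(t, q)
        T.append(t); P.append(p); Q.append(q)
    return np.array(T).T, np.array(P).T, np.array(Q).T

rng = np.random.default_rng(5)
X = rng.standard_normal((25, 6)); X -= X.mean(axis=0)
B = rng.standard_normal((6, 3))
Y = X @ B + 0.01 * rng.standard_normal((25, 3)); Y -= Y.mean(axis=0)

T, P, Q = pls2_nipals(X, Y, n_comp=6)
Y_hat = T @ Q.T                        # Y reconstructed from the scores
print(np.allclose(Y, Y_hat, atol=0.1)) # only the out-of-space noise remains
```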

Two non-parametric methods for hypothesis testing with PCA and PLS are cross-validation and the jackknife estimate of variance. Both methods are described in some detail in the sections describing the PCA and PLS algorithms. Cross-validation is used to assess the predictive properties of a PCA or a PLS model. The distribution function of the cross-validation test statistic cvd-sd under the null hypothesis is not well known. However, for PLS, the distribution of cvd-sd has been determined empirically by computer simulation [24] for some particular types of experimental designs. In particular, the discriminant-analysis (or ANOVA-like) PLS analysis has been investigated in some detail, as has the situation where Y is one-dimensional. This simulation study is referred to for detailed information. However, some tables of the critical values of cvd-sd at the 5 % level are given in Appendix C. [Pg.312]

