Big Chemical Encyclopedia


Predictive least squares

The least squares regression approach will be illustrated using these data as an example for the model Y = bX + a. The calculations are shown in Table 16. The predicted least squares parameters are calculated by the following steps. [Pg.49]
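The fit described above can be sketched in a few lines. This is a minimal illustration of the closed-form least-squares formulas for Y = bX + a; the data values are invented, not those of Table 16.

```python
# Minimal sketch: ordinary least-squares fit of the line Y = bX + a
# using the closed-form formulas (illustrative data, not Table 16).
def fit_line(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # slope: b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar  # intercept passes through the mean point
    return b, a

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]   # exactly y = 2x + 1
b, a = fit_line(x, y)
print(b, a)  # 2.0 1.0
```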

When the value of a is 2 in the expression for GIC, it takes the form of the Akaike information criterion (AIC). The optimum model order p is one that minimizes the generalized or Akaike information criterion. More recent methods to estimate model order for signal sequences with a finite number of sample points include predictive least squares (PLS) and finite sample information criteria (FSIC). ... [Pg.447]
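The order-selection idea can be illustrated with a simple analog: choosing a polynomial degree by minimizing AIC computed as n·ln(RSS/n) + 2k, where k is the number of fitted parameters. The data and degrees below are invented for illustration; this is not the PLS/FSIC procedure of the cited work.

```python
import math

# Sketch of AIC-based model-order selection (illustrative analog:
# polynomial degree rather than signal model order; invented data).
def polyfit_ls(x, y, degree):
    """Least-squares polynomial fit via the normal equations."""
    m = degree + 1
    # normal equations A c = v with A[j][k] = sum x^(j+k), v[j] = sum y x^j
    A = [[sum(xi ** (j + k) for xi in x) for k in range(m)] for j in range(m)]
    v = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    c = [0.0] * m
    for j in reversed(range(m)):
        c[j] = (v[j] - sum(A[j][k] * c[k] for k in range(j + 1, m))) / A[j][j]
    return c

def aic(x, y, degree):
    """AIC for a least-squares fit: n*ln(RSS/n) + 2k, k = degree + 1."""
    c = polyfit_ls(x, y, degree)
    rss = sum((yi - sum(cj * xi ** j for j, cj in enumerate(c))) ** 2
              for xi, yi in zip(x, y))
    n = len(x)
    return n * math.log(rss / n) + 2 * (degree + 1)

# quadratic trend plus a small alternating perturbation
xs = [float(i) for i in range(10)]
ys = [xi ** 2 + 0.1 * (-1) ** i for i, xi in enumerate(xs)]
print(aic(xs, ys, 1), aic(xs, ys, 2))  # the quadratic scores much lower
```

The degree minimizing AIC balances goodness of fit (the RSS term) against model complexity (the 2k penalty), which is exactly the trade-off the model-order criteria above formalize.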

Another problem is determining the optimal number of descriptors for the objects (patterns), such as for the structure of a molecule. A common rule of thumb is to keep the number of descriptors below 20% of the number of objects in the dataset. However, this applies only to ordinary multiple linear regression analysis. More advanced methods, such as Projection to Latent Structures (or Partial Least Squares, PLS), use so-called latent variables to achieve both modeling and prediction. [Pg.205]

Partial Least Squares Regression, also called Projection to Latent Structures, can be applied to establish a predictive model, even if the features are highly correlated. [Pg.449]

The field points must then be fitted to predict the activity. There are generally far more field points than known compound activities to be fitted. The least-squares algorithms used in QSAR studies do not function for such an underdetermined system. A partial least squares (PLS) algorithm is used for this type of fitting. This method starts with matrices of field data and activity data. These matrices are then used to derive two new matrices containing a description of the system and the residual noise in the data. Earlier studies used a similar technique, called principal component analysis (PCA). PLS is generally considered to be superior. [Pg.248]
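The underdetermined case (more descriptors than samples) can be sketched with a single-component PLS fit, NIPALS-style, in plain Python. The field-point values below are invented; a real QSAR application would extract several components and cross-validate.

```python
# Sketch: one-component PLS regression for an underdetermined problem
# (6 descriptors, 4 samples). All values are invented for illustration.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pls1_one_component(X, y):
    """Fit one PLS component on X (n x p) and y (n), NIPALS-style."""
    n, p = len(X), len(X[0])
    # mean-center columns of X and y
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]
    yc = [yi - ym for yi in y]
    # weight vector w is proportional to X'y: direction of max covariance with y
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = dot(w, w) ** 0.5
    w = [wj / norm for wj in w]
    # scores t = Xc w, then regress y on t
    t = [dot(row, w) for row in Xc]
    q = dot(yc, t) / dot(t, t)
    # predictor for a new sample x: y = ym + q * ((x - xm) . w)
    return lambda x: ym + q * dot([xj - mj for xj, mj in zip(x, xm)], w)

# rank-1 X by construction, with y lying along the latent direction
t0 = [3.0, 1.0, -1.0, -3.0]
p_dir = [1.0, 0.5, 2.0, 0.0, -1.0, 0.3]
X = [[ti * pj for pj in p_dir] for ti in t0]
y = [2.0 * ti for ti in t0]
predict = pls1_one_component(X, y)
print([round(predict(row), 6) for row in X])  # recovers y exactly here
```

Because the latent factors are extracted one at a time along directions of maximal covariance with the activity, the method remains stable even when the descriptor matrix has far fewer rows than columns, which is exactly where ordinary least squares breaks down.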

The response produced by Eq. (8-26), c(t), can be found by inverting the transfer function, and it is also shown in Fig. 8-21 for a set of model parameters K, τ, and θ fitted to the data. These parameters are calculated using optimization to minimize the squared difference between the model predictions and the data, i.e., a least-squares approach. Let each measured data point be represented by cj (measured response) and tj (time of measured response), j = 1 to n. Then the least-squares problem can be formulated as ... [Pg.724]
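The fitting step can be sketched as follows, assuming a first-order-plus-dead-time response c(t) = K(1 − exp(−(t − θ)/τ)) for t ≥ θ. A real application would use a numerical optimizer rather than the coarse grid below; the data are synthetic.

```python
import math

# Sketch: least-squares fit of K, tau, theta for a first-order-plus-dead-time
# step response by minimizing the sum of squared errors over a coarse grid.
# (Synthetic data; a real fit would use a numerical optimizer.)
def fopdt(t, K, tau, theta):
    return 0.0 if t < theta else K * (1.0 - math.exp(-(t - theta) / tau))

def sse(params, data):
    K, tau, theta = params
    return sum((cj - fopdt(tj, K, tau, theta)) ** 2 for tj, cj in data)

# "measured" response generated from K=2, tau=3, theta=1 (noise-free)
data = [(t, fopdt(t, 2.0, 3.0, 1.0)) for t in [0, 1, 2, 4, 6, 8, 12]]

grid = [(K, tau, theta)
        for K in (1.5, 2.0, 2.5)
        for tau in (2.0, 3.0, 4.0)
        for theta in (0.5, 1.0, 1.5)]
best = min(grid, key=lambda p: sse(p, data))
print(best)  # (2.0, 3.0, 1.0)
```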

For many applications, quantitative band shape analysis is difficult to apply. Bands may be numerous or may overlap, the optical transmission properties of the film or host matrix may distort features, and features may be indistinct. If one can prepare samples of known properties and collect the FTIR spectra, then it is possible to produce a calibration matrix that can be used to assist in predicting these properties in unknown samples. Statistical chemometric techniques, such as PLS (partial least squares) and PCR (principal components regression), may be applied to this matrix. Chemometric methods permit much larger segments of the spectra to be used in developing an analysis model than is usually the case for simple band-shape analyses. [Pg.422]

We now use CLS to generate calibrations from our two training sets, A1 and A2. For each training set, we will get matrices, K1 and K2, respectively, containing the best least-squares estimates for the spectra of pure components 1-3, and calibration matrices, K1cal and K2cal, each containing 3 rows of calibration coefficients, one row for each of the 3 components we will predict. First, we will compare the estimated pure component spectra to the actual spectra we started with. Next, we will see how well each calibration matrix is able to predict the concentrations of the samples that were used to generate that calibration. Finally, we will see how well each calibration is able to predict the... [Pg.54]

Fig. 14a-c. Anisotropy of n.m.r. second moment in P4GT (specimen B) and the best least-squares fit of the anisotropy predicted from (a) Hall-Pass model 6, 18% crystallinity; (b) Hall-Pass model 7, 40% crystallinity; (c) Yokouchi model, 37% crystallinity. Reproduced from Polymer by permission of the publishers Butterworth Co (Publishers) Ltd. [Pg.112]

A Brief Review of the QSAR Technique. Most of the 2D QSAR methods employ graph-theoretic indices to characterize molecular structures, which have been extensively studied by Randic, Kier, and Hall [see 23]. Although these structural indices represent different aspects of the molecular structures, their physicochemical meaning is unclear. The successful applications of these topological indices combined with MLR analysis have been summarized recently. Similarly, the ADAPT system employs topological indices as well as other structural parameters (e.g., steric and quantum mechanical parameters) coupled with the MLR method for QSAR analysis [24]. It has been extensively applied to QSAR/QSPR studies in analytical chemistry, toxicity analysis, and other biological activity prediction. On the other hand, parameters derived from various experiments through chemometric methods have also been used in the study of peptide QSAR, where partial least squares (PLS) analysis has been employed [25]. [Pg.312]

Because of peak overlap in the first- and second-derivative spectra, conventional spectrophotometry cannot be applied satisfactorily for quantitative analysis, and the interpretation cannot be resolved by the zero-crossing technique. A chemometric approach improves precision and predictability; e.g., by applying classical least squares (CLS), principal component regression (PCR), partial least squares (PLS), and iterative target transformation factor analysis (ITTFA), appropriate interpretations were found from the direct and first- and second-derivative absorption spectra. When five colorant combinations of sixteen mixtures of colorants from commercial food products were evaluated, the results were compared by the application of different chemometric approaches. The ITTFA analysis offered better precision than CLS, PCR, and PLS, and calibrations based on first-derivative data provided some advantages for all four methods. ... [Pg.541]

Partial least squares regression (PLS). Partial least squares regression applies to the simultaneous analysis of two sets of variables on the same objects. It allows for the modeling of inter- and intra-block relationships from an X-block and Y-block of variables in terms of a lower-dimensional table of latent variables [4]. The main purpose of regression is to build a predictive model enabling the prediction of wanted characteristics (y) from measured spectra (X). In matrix notation we have the linear model with regression coefficients b ... [Pg.544]

Mathematical Models. As noted previously, a mathematical model must be fitted to the predicted results shown in each factorial table generated by each scientist. Ideally, each scientist selects and fits an appropriate model based upon theoretical constraints and physical principles. In some cases, however, appropriate models are unknown to the scientists. This is likely to occur for experiments involving multifactor, multidisciplinary systems. When this occurs, various standard models have been used to describe the predicted results shown in the factorial tables. For example, for effects associated with lognormal distributions a multiplicative model has been found useful. As a default model, the team statistician can fit a polynomial model using standard least-squares techniques. Although of limited use for interpolation or extrapolation, a polynomial model can serve to identify certain problems involving the relationships among the factors as implied by the values shown in the factorial tables. [Pg.76]

Norinder, U., Osterberg, T. Theoretical calculation and prediction of drug transport processes using simple parameters and partial least squares projections to latent structures (PLS) statistics. The use of electrotopological state indices. J. Pharm. Sci. 2001, 90, 1075-1085. [Pg.107]

The least-squares criterion states that the norm of the error between observed and predicted (dependent) measurements, ||y - ŷ||, must be minimal. Note that the latter condition involves the minimization of a sum of squares, from which the unknown elements of the vector b can be determined, as is explained in Chapter 10. [Pg.53]

P.J. Lewi, B. Vekemans and L.M. Gypen, Partial least squares (PLS) for the prediction of real-life performance from laboratory results. In Scientific Computing and Automation (Europe) 1990. E.J. Karjalainen (Ed.). Elsevier, Amsterdam, 1990, pp. 199-210. [Pg.159]

Some of the results are collected in Table 35.7. Table 35.7a shows that some sensory attributes can be fitted rather well by the RRR model, especially yellow and green (R² ≈ 0.75), whereas for instance brown and syrup do much worse (R² ≈ 0.40). These fits are based on the first two PCs of the least-squares fit Ŷ. The PCA on the OLS predictions showed the 2-dimensional approximation to be very good, accounting for 99.2% of the total variation of Ŷ. The table shows the PC weights of the (fitted) sensory variables. Particularly the attribute brown, and to a lesser extent syrup, stands out as being different and as the main contributor to the second dimension. [Pg.327]

The purpose of Partial Least Squares (PLS) regression is to find a small number A of relevant factors that (i) are predictive for Y and (ii) utilize X efficiently. The method effectively achieves a canonical decomposition of X into a set of orthogonal factors which are used for fitting Y. In this respect PLS is comparable with CCA, RRR and PCR, the difference being that the factors are chosen according to yet another criterion. [Pg.331]

Principal covariates regression (PCovR) is a technique that recently has been put forward as a more flexible alternative to PLS regression [17]. Like CCA, RRR, PCR and PLS it extracts factors t from X that are used to estimate Y. These factors are chosen by a weighted least-squares criterion, viz. to fit both Y and X. By requiring the factors to be predictive not only for Y but also to represent X adequately, one introduces a preference towards the directions of the stable principal components of X. [Pg.342]

M. Stone and R.J. Brooks, Continuum regression: cross-validated sequentially constructed prediction embracing ordinary least squares, partial least squares, and principal component regression. J. Roy. Stat. Soc. B52 (1990) 237-269. [Pg.347]

However, our preoccupation is with the opposite application: given a newly measured spectrum y, what is the most likely mixture composition, and how precise is the estimate? Thus, eq. (36.2) is necessary for a proper estimation of the parameters B, but we have to invert the relation y = f(x) = xB into, say, x = g(y) for the purpose of making future predictions about x (concentration) given y (spectrum). We will treat this case of controlled calibration using classical least squares (CLS) estimation in Section 36.2.1. [Pg.352]
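The CLS estimation step can be sketched as follows: given the pure-component spectra (the columns of a matrix S), the concentrations c in a measured mixture spectrum y are recovered by solving the normal equations c = (SᵀS)⁻¹Sᵀy. All spectral values below are invented for illustration.

```python
# Sketch of the CLS step: estimate mixture concentrations c from a measured
# spectrum y, given pure-component spectra S, via c = (S'S)^-1 S'y.
# (Invented spectra; two components at five wavelengths, rows = wavelengths.)
S = [[1.0, 0.2],
     [0.8, 0.5],
     [0.3, 1.0],
     [0.1, 0.9],
     [0.0, 0.4]]

def cls_concentrations(S, y):
    # form the 2x2 matrix S'S and the vector S'y
    a = sum(r[0] * r[0] for r in S)
    b = sum(r[0] * r[1] for r in S)
    d = sum(r[1] * r[1] for r in S)
    u = sum(r[0] * yi for r, yi in zip(S, y))
    v = sum(r[1] * yi for r, yi in zip(S, y))
    # invert the 2x2 normal equations explicitly
    det = a * d - b * b
    return ((d * u - b * v) / det, (a * v - b * u) / det)

# a noise-free mixture: 0.7 of component 1 plus 0.3 of component 2
y = [0.7 * r[0] + 0.3 * r[1] for r in S]
print(cls_concentrations(S, y))  # recovers (0.7, 0.3)
```

With noisy spectra the same formula gives the least-squares concentration estimate rather than an exact recovery, which is the controlled-calibration setting discussed in Section 36.2.1.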

K. Faber and B.R. Kowalski, Propagation of measurement errors for the validation of predictions obtained by principal component regression and partial least squares. J. Chemom., 11 (1997) 181-238. [Pg.381]

Partial Least Squares (PLS) regression (Section 35.7) is one of the more recent advances in QSAR which has led to the now widely accepted method of Comparative Molecular Field Analysis (CoMFA). This method makes use of local physicochemical properties such as charge, potential and steric fields that can be determined on a three-dimensional grid that is laid over the chemical structures. The determination of steric conformation, by means of X-ray crystallography or NMR spectroscopy, and the quantum mechanical calculation of charge and potential fields are now performed routinely on medium-sized molecules [10]. Modern optimization and prediction techniques such as neural networks (Chapter 44) also have found their way into QSAR. [Pg.385]

A difficulty with Hansch analysis is to decide which parameters and functions of parameters to include in the regression equation. This problem of selection of predictor variables has been discussed in Section 10.3.3. Another problem is due to the high correlations between groups of physicochemical parameters. This is the multicollinearity problem which leads to large variances in the coefficients of the regression equations and, hence, to unreliable predictions (see Section 10.5). It can be remedied by means of multivariate techniques such as principal components regression and partial least squares regression, applications of which are discussed below. [Pg.393]

While principal components models are used mostly in an unsupervised or exploratory mode, models based on canonical variates are often applied in a supervised way for the prediction of biological activities from chemical, physicochemical or other biological parameters. In this section we discuss briefly the methods of linear discriminant analysis (LDA) and canonical correlation analysis (CCA). Although there has been an early awareness of these methods in QSAR [7,50], they have not been widely accepted. More recently they have been superseded by the successful introduction of partial least squares analysis (PLS) in QSAR. Nevertheless, the early pattern recognition techniques have prepared the ground for the introduction of modern chemometric approaches. [Pg.408]

A drawback of the method is that highly correlating canonical variables may contribute little to the variance in the data. A similar remark has been made with respect to linear discriminant analysis. Furthermore, CCA does not possess a direction of prediction, as it is symmetrical with respect to X and Y. For these reasons it has now been replaced by two-block or multi-block partial least squares analysis (PLS), which bears some similarity to CCA without having its shortcomings. [Pg.409]








