Multiple linear regression

For multiple-descriptor data sets, one could use the methods in Section 17.3 to derive a correlation between y and x1, between y and x2, between y and x3, etc., to find the x_k data set giving the best correlation with y. It is very likely, however, that one of the other x variables can describe part of the y variation which is not described by x_k, and that a third x variable can describe some of the remaining variation, etc. Since the x vectors may be internally correlated, however, the second most important x vector found in a one-to-one correlation is not necessarily the most important once the x_k vector has been included. [Pg.555]

In order to use all the information in the x variables, a Multiple Linear Regression (MLR) of the type indicated in eq. (17.13) can be attempted. [Pg.555]

Note that each data set (y and x_i) is a vector containing N data points, and the constant corresponding to b in eq. (17.5) is eliminated if the data are centred to have a mean value of zero. Since the expansion coefficients are multiplied directly onto the x variables, MLR is independent of a possible scaling of the x data (a scaling just affects the magnitude of the a coefficients but does not change the correlation). [Pg.555]

The number of fitting parameters is M, and N must therefore be larger than or equal to M; in practice one should not attempt fitting unless N ≥ 5M, as overfitting is otherwise a strong possibility. The fitting coefficients contained in the a vector can be obtained from the generalized inverse (Section 16.2) of the X matrix. [Pg.555]

This procedure works fine as long as there are relatively few x variables that are not internally correlated. In reality, however, it is very likely that some of the x vectors describe almost the same variation, and in such cases there is a large risk of overfitting the data. This can also be seen from the solution vector in eq. (17.14): the (X^T X) matrix has dimension M x M and will be poorly conditioned (Section 16.2) if the x vectors are (almost) linearly dependent. Note that the presence of (experimental) noise in the x data can often mask the linear dependence, and MLR methods are therefore sensitive to noisy x data. [Pg.555]
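This fit and the conditioning problem can be illustrated with a short NumPy sketch (not from the source; the toy data and variable names are assumptions): the coefficients are obtained from the generalized (Moore-Penrose) inverse of the centred X matrix, and the condition number of X^T X flags near linear dependence.

```python
import numpy as np

# Toy data: N = 12 samples, M = 3 descriptors; the third descriptor is an
# almost exact copy of the first, which makes X'X nearly singular.
rng = np.random.default_rng(0)
x1 = rng.normal(size=12)
x2 = rng.normal(size=12)
x3 = x1 + 1e-6 * rng.normal(size=12)          # (almost) linearly dependent
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=12)

# Centre y and X so that no intercept term is needed.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# MLR coefficients from the generalized (pseudo-)inverse of X.
a = np.linalg.pinv(Xc) @ yc

# A large condition number of X'X signals (near) linear dependence
# and therefore unstable regression coefficients.
cond = np.linalg.cond(Xc.T @ Xc)
print("coefficients:", a)
print("condition number of X'X: %.3g" % cond)
```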

Robust Regression. Robust regression is based on an iterative ... Robustness means ... [Pg.231]

Computation is started by conventional regression analysis. Subsequently, the residuals are determined and the weights are calculated for each observation. Regression is repeated until the parameters change only by a predefined small amount. The smaller the value for k, the more the residuals are weighted down. [Pg.231]
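The excerpt does not give the weighting scheme; as a minimal sketch, an iteratively reweighted least-squares loop with Huber-type weights and tuning constant k (all names, defaults, and the choice of weight function here are assumptions) might look as follows:

```python
import numpy as np

def robust_mlr(X, y, k=1.5, tol=1e-6, max_iter=50):
    """Iteratively reweighted least squares with Huber-type weights.

    Smaller k down-weights large residuals more strongly."""
    w = np.ones(len(y))                                 # start from ordinary least squares
    b_old = np.zeros(X.shape[1])
    for _ in range(max_iter):
        W = np.diag(w)
        b = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # weighted normal equations
        r = y - X @ b
        s = np.median(np.abs(r)) / 0.6745 + 1e-12       # robust scale estimate
        u = np.abs(r) / (k * s)
        w = np.where(u <= 1.0, 1.0, 1.0 / u)            # Huber weights
        if np.max(np.abs(b - b_old)) < tol:             # stop when parameters settle
            break
        b_old = b
    return b
```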

In the case of multivariate modeling, several independent as well as several dependent variables may be involved. Out of the many regression methods, we will learn about the conventional method of ordinary least squares (OLS) as well as methods that are based on biased parameter estimation and that simultaneously reduce the dimensionality of the regression problem, that is, principal component regression (PCR) and the partial least squares (PLS) method. [Pg.231]

As an example of the application of these methods, spectrometric multicomponent analysis will be considered, leading to an introduction to regression diagnostics in multiple linear regression. [Pg.231]

The general least squares problem that relates a matrix of dependent variables Y to a matrix of independent variables X can be stated as follows: [Pg.231]
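The equation itself is not reproduced in the excerpt. For reference, a standard statement of the problem (a reconstruction, not necessarily the book's notation) is

$$
\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E},\qquad
\hat{\mathbf{B}} = \arg\min_{\mathbf{B}} \lVert \mathbf{Y} - \mathbf{X}\mathbf{B} \rVert_F^{2}
               = (\mathbf{X}^{\mathsf T}\mathbf{X})^{-1}\mathbf{X}^{\mathsf T}\mathbf{Y},
$$

where E is the matrix of residuals and the closed-form solution exists only when X^T X is invertible.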

Multiple linear regression is an extension of simple linear regression by the inclusion of extra independent variables [Pg.121]

What about an assessment of the significance of the fit of a multiple regression equation (or simple regression) to a set of data? A guide to the overall significance of a regression model can be obtained by calculation of a quantity called the F statistic. This is simply the ratio of the explained mean square (MSB) to the residual mean square (MSR). [Pg.122]

An F statistic is used by looking up a standard value for F from a table of F statistics and comparing the calculated value with the tabulated value. If the calculated value is greater than the tabulated value, the equation is significant at that particular confidence level. F tables normally have [Pg.122]
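As a hedged illustration of this test (the degrees of freedom and the 95% level are assumptions, not taken from the excerpt), the F statistic and the corresponding tabulated critical value can be computed in Python as follows:

```python
import numpy as np
from scipy import stats

def f_test(y, y_hat, n_params):
    """Overall F test for a fitted regression model.

    y        observed responses
    y_hat    fitted responses from the regression model
    n_params number of fitted coefficients including the intercept
    """
    n = len(y)
    p = n_params - 1                              # number of predictors
    ss_explained = np.sum((y_hat - y.mean()) ** 2)
    ss_residual = np.sum((y - y_hat) ** 2)
    ms_explained = ss_explained / p               # explained mean square
    ms_residual = ss_residual / (n - n_params)    # residual mean square
    f_calc = ms_explained / ms_residual
    f_crit = stats.f.ppf(0.95, p, n - n_params)   # tabulated value, 95% level
    return f_calc, f_crit, f_calc > f_crit
```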

The squared multiple correlation coefficient gives a measure of how well a regression model fits the data, and the F statistic gives a measure of the overall significance of the fit. What about the significance of individual terms? This can be assessed by calculation of the standard error of the regression coefficients, a measure of how much of the dependent variable [Pg.123]

Another useful statistic that can be calculated to characterize the fit of a regression model to a set of data is the standard error of prediction. This gives a measure of how well one might expect to be able to make [Pg.124]
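A minimal sketch of how these diagnostics, the standard errors of the coefficients and the standard error of prediction, are commonly computed (textbook formulas, not the book's own code; variable names are assumptions):

```python
import numpy as np

def regression_diagnostics(X, y, b):
    """Standard errors of MLR coefficients and of prediction.

    X is the (n x p) design matrix (including a column of ones for the
    intercept), y the observed responses and b the fitted coefficients."""
    n, p = X.shape
    residuals = y - X @ b
    s2 = residuals @ residuals / (n - p)                      # residual variance
    se_coeff = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))  # SE of each coefficient
    t_values = b / se_coeff                                   # large |t| -> significant term
    se_prediction = np.sqrt(s2)                               # standard error of prediction
    return se_coeff, t_values, se_prediction
```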

In MLR, the model is a straightforward linear combination of descriptors or functions of descriptors. Because MLR is based on a linear equation, it is called a linear model (as opposed to the non-linear models to be discussed later in this [Pg.366]

This inherent simplicity is valuable in two ways. First, the number of parameters to be fit is minimal, avoiding the common problem of trying to fit too many variables with too little data. This problem is often referred to as the Curse of Dimensionality and results in a degradation of a model's generalizability. Second, MLR models are easy to interpret. Each descriptor used is accompanied by a coefficient and a sign, and this information provides the relative weight and direction of each descriptor's contribution to the property of interest. [Pg.367]

MLR is most suited to the modeling of simple physical properties for which the contributions of the descriptors are more likely to be independent than cooperative. It is also the preferred technique when the amount of available data is extremely limited. MLR is less suited for modeling complex physical or biological processes, as these tend to be non-linear in nature. [Pg.367]

An extension of linear regression, multiple linear regression (MLR) involves the use of more than one independent variable. Such a technique can be very effective if it is suspected that the information contained in a single independent variable (x) is insufficient to explain the variation in the dependent variable (y). In PAT, such a situation often occurs because of the inability to find a single analyzer response variable that is affected solely by the property of interest, without interference from other properties or effects. In such cases, it is necessary to use more than one response variable from the analyzer to build an effective calibration model, so that the effects of such interferences can be compensated for. [Pg.361]

The multiple linear regression model is simply an extension of the linear regression model (Equation 12.7), and is given below  [Pg.361]

The difference here is that X is a matrix that contains responses from M (> 1) different x variables, and b contains M regression coefficients, one for each of the x variables. As for linear regression, the coefficients for MLR (b) are determined using the least-squares method: [Pg.361]
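The least-squares expression referred to here is not reproduced in the excerpt; its usual form (a reconstruction, not necessarily in the book's notation) is

$$
\hat{\mathbf{b}} = (\mathbf{X}^{\mathsf T}\mathbf{X})^{-1}\mathbf{X}^{\mathsf T}\mathbf{y}.
$$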

At this point, it is important to note two limitations of the MLR method  [Pg.361]

In real applications, where there is noise in the data, it is rare to have two x variables exactly correlated to one another. However, a high degree of correlation between any two x variables leads to an unstable matrix inversion, which results in a large amount of noise being introduced to the regression coefficients. Therefore, one must be very wary of intercorrelation between x variables when using the MLR method. [Pg.362]

Note that this formula involves the inversion of X^T X, the covariance matrix of X. If this inverse does not exist, then A cannot be calculated by means of MLR. Singularity of X^T X corresponds to linear dependence of a subset of the X variables. So, if X contains such relationships, MLR cannot be applied. In practice, it is rare to have exact linear dependence, since measurement errors will tend to preclude this. However, near linear dependence will tend to make the inverse numerically unstable and subject to large errors, so much the same effect occurs. Another formulation of the [Pg.340]

MLR equation can be obtained by expressing it in terms of the matrix of eigenvectors, P, and the diagonal matrix of eigenvalues, diag(λ_i), as follows: [Pg.341]

If X contains (almost) collinear variables, then λ_i will be (close to) zero for some i, giving an inverted eigenvalue matrix containing large values which are susceptible to small alterations in the calibration data. In our application, collinearity of the X variables is to be expected, since harmonic responses at similar frequencies and amplitudes are likely to be similar. [Pg.341]
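A small numerical sketch of this eigenvalue formulation (illustrative only; the function name and data handling are assumptions, not the book's code):

```python
import numpy as np

def mlr_via_eigendecomposition(X, y):
    """Express the MLR solution through the eigendecomposition of X'X.

    b = P diag(1/lambda_i) P' X' y; near-zero eigenvalues make 1/lambda_i
    explode, which is the numerical signature of collinear X variables."""
    lam, P = np.linalg.eigh(X.T @ X)            # eigenvalues and eigenvectors of X'X
    print("eigenvalues of X'X:", lam)
    inv_lam = np.diag(1.0 / lam)                # blows up when lambda_i is near zero
    b = P @ inv_lam @ P.T @ X.T @ y
    return b
```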

Forward selection. Here the model is built up by adding variables in X, one at a time, to the model. The variables are added by choosing those which improve the model most at each step according to some statistical test. Variables which are collinear with a variable already added will contribute little to the quality of a subsequent model. Likewise irrelevant variables contribute little. Such variables will not, therefore, be added. Variables are added until a stopping criterion is reached, for example when the improvement afforded by addition of the next variable drops below a threshold value. [Pg.341]

Backward elimination. Here, all the variables are used to form an initial model. Variables are then selected for deletion, once again by using a statistical test and threshold to determine when to stop. In the case of BE, exactly collinear variables must be eliminated before the initial model is formed. [Pg.341]
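One plausible implementation of such a forward-selection loop is sketched below (the adjusted-R² criterion and the stopping threshold are assumptions; the excerpt does not specify which statistical test is used):

```python
import numpy as np

def forward_selection(X, y, threshold=0.01):
    """Greedy forward selection of x variables for MLR.

    At each step, add the variable that most improves adjusted R^2;
    stop when the best improvement drops below `threshold`."""
    n, m = X.shape
    selected, best_score = [], -np.inf
    while len(selected) < m:
        scores = {}
        for j in set(range(m)) - set(selected):
            cols = selected + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])
            b, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ b
            r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
            p = len(cols)
            scores[j] = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)   # adjusted R^2
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_score < threshold:                # stopping criterion
            break
        best_score = scores[j_best]
        selected.append(j_best)
    return selected
```

Backward elimination would run the same loop in reverse, starting from the full model and dropping the least useful variable at each step.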

The analysis presented for fitting a straight line to experimental data is easily extended to a curve or to a system with more than one independent variable on the basis of the principle of least squares. [Pg.605]

Consider first a system where the observed quantity y depends on two independent variables, x1 and x2. The resulting relationship describes the equation of a plane in three dimensions and may be written as [Pg.605]

Three parameters, namely, a, b, and c are required to specify the relationship. The equation describing the principle of least squares is [Pg.605]

The sum A is now minimized with respect to each of the adjustable parameters. This leads to the following three equations  [Pg.605]

After simplification, and using the notation introduced earlier, the resulting normal equations are [Pg.605]
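The equations themselves are not reproduced in the excerpt. For a plane of the form y = a + b*x1 + c*x2, the least-squares sum and the resulting normal equations (a standard reconstruction consistent with the text, not the book's own typography) are

$$
A=\sum_{i=1}^{N}\bigl(y_i-a-bx_{1i}-cx_{2i}\bigr)^{2},
$$

and setting the derivatives of A with respect to a, b, and c to zero gives

$$
\begin{aligned}
\sum y_i &= Na + b\sum x_{1i} + c\sum x_{2i},\\
\sum x_{1i}y_i &= a\sum x_{1i} + b\sum x_{1i}^{2} + c\sum x_{1i}x_{2i},\\
\sum x_{2i}y_i &= a\sum x_{2i} + b\sum x_{1i}x_{2i} + c\sum x_{2i}^{2}.
\end{aligned}
$$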


To gain insight into chemometric methods such as correlation analysis, Multiple Linear Regression Analysis, Principal Component Analysis, Principal Component Regression, and Partial Least Squares regression/Projection to Latent Structures... [Pg.439]

Kohonen network Conceptual clustering Principal Component Analysis (PCA) Decision trees Partial Least Squares (PLS) Multiple Linear Regression (MLR) Counter-propagation networks Back-propagation networks Genetic algorithms (GA)... [Pg.442]

While simple linear regression uses only one independent variable for modeling, multiple linear regression uses more variables. [Pg.446]

Multiple linear regression (MLR) models a linear relationship between a dependent variable and one or more independent variables. [Pg.481]

Besides these LFER-based models, approaches have been developed using whole-molecule descriptors and learning algorithms other than multiple linear regression (see Section 10.1.2). [Pg.494]

Step 5: Building a Multiple Linear Regression Analysis (MLRA) Model [Pg.500]

Multiple linear regression analysis is a widely used method, in this case assuming that a linear relationship exists between solubility and the 18 input variables. The multilinear regression analysis was performed by the SPSS program [30]. The training set was used to build a model, and the test set was used for the prediction of solubility. The MLRA model provided, for the training set, a correlation coefficient r = 0.92 and a standard deviation of s = 0.78, and for the test set, r = 0.94 and s = 0.68. [Pg.500]

Alternatives to Multiple Linear Regression Discriminant Analysis, Neural Networks and Classification Methods... [Pg.718]

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of the distribution are assumed in the underlying statistical method. A non-parametric technique does not rely upon the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model; an unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]

Using a multiple linear regression computer program, a set of substituent parameters was calculated for a number of the most commonly occurring groups. The calculated substituent effects allow a prediction of the chemical shifts of the exterior and central carbon atoms of the allene with standard deviations of 1.5 and 2.3 ppm, respectively. Although most compounds were measured as neat liquids, for a number of compounds duplicate measurements were obtained in various solvents. [Pg.253]

Most of the 2D QSAR methods are based on graph theoretic indices, which have been extensively studied by Randic [29] and Kier and Hall [30,31]. Although these structural indices represent different aspects of molecular structures, their physicochemical meaning is unclear. Successful applications of these topological indices combined with multiple linear regression (MLR) analysis are summarized in Ref. 31. On the other hand, parameters derived from various experiments through chemometric methods have also been used in the study of peptide QSAR, where partial least squares (PLS) [32] analysis has been employed [33]. [Pg.359]

It may be necessary and possible to achieve a good Brønsted relationship by adding another term to the equation, as Toney and Kirsch did in correlating the effects of various amines on the catalytic activity of a mutant enzyme. A simple Brønsted plot failed, but a multiple linear regression on the variables pKa and molecular volume (of the amines) was successful. [Pg.349]

Numerous authors have devised multiple linear regression approaches to the correlation of solvent effects, the intent being to widen the applicability of the correlation and to develop insight into the molecular factors controlling the correlated process. For example, Zilian treated polarity as a combination of effects measured by molar refraction, AN, and DN. Koppel and Palm write... [Pg.443]

In the above paragraphs we saw that multiple linear regression analysis on equations of the form... [Pg.444]

We now consider a type of analysis in which the data (which may consist of solvent properties or of solvent effects on rates, equilibria, and spectra) again are expressed as a linear combination of products as in Eq. (8-81), but now the statistical treatment yields estimates of both a_i and x_i. This method is called principal component analysis or factor analysis. A key difference between multiple linear regression analysis and principal component analysis (in the chemical setting) is that regression analysis adopts chemical models a priori, whereas in factor analysis the chemical significance of the factors emerges (if desired) as a result of the analysis. We will not explore the statistical procedure, but will cite some results. We have already encountered examples in Section 8.2 on the classification of solvents and in the present section in the form of the Swain et al. treatment leading to Eq. (8-74). [Pg.445]

In contrast to points (1)-(3) of the discussion, the effect of ion association on the conductivity of concentrated solutions is proven only with difficulty. Previously published reviews refer mainly to the permittivity of the solvent or quote some theoretical expressions for association constants which only take permittivity and distance parameters into account. Ue and Mori [212] in a recent publication tried a multiple linear regression based on Eq. (62) [Pg.488]

We will explore the two major families of chemometric quantitative calibration techniques that are most commonly employed the Multiple Linear Regression (MLR) techniques, and the Factor-Based Techniques. Within each family, we will review the various methods commonly employed, learn how to develop and test calibrations, and how to use the calibrations to estimate, or predict, the properties of unknown samples. We will consider the advantages and limitations of each method as well as some of the tricks and pitfalls associated with their use. While our emphasis will be on quantitative analysis, we will also touch on how these techniques are used for qualitative analysis, classification, and discriminative analysis. [Pg.2]

Classical least-squares (CLS), sometimes known as K-matrix calibration, is so called because, originally, it involved the application of multiple linear regression (MLR) to the classical expression of the Beer-Lambert Law of spectroscopy ... [Pg.51]
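The expression is not reproduced in the excerpt; the usual K-matrix statement of the Beer-Lambert law and its least-squares estimate (a standard reconstruction, with notation assumed rather than taken from the source) is

$$
\mathbf{A} = \mathbf{C}\,\mathbf{K} + \mathbf{E}_A,\qquad
\hat{\mathbf{K}} = (\mathbf{C}^{\mathsf T}\mathbf{C})^{-1}\mathbf{C}^{\mathsf T}\mathbf{A},
$$

where A holds the measured absorbance spectra (samples in rows), C the known analyte concentrations, and K the matrix of pure-component absorptivity spectra.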

Multiple Linear Regression (MLR), Classical Least-Squares (CLS, K-matrix), Inverse Least-Squares (ILS, P-matrix)... [Pg.191]

Gonzalez, A. G., Two-Level Factorial Experimental Designs Based on Multiple Linear Regression Models: A Tutorial Digest Illustrated by Case Studies, Analytica Chimica Acta 360, 1998, 227-241. [Pg.412]

Experimental polymer rheology data obtained in a capillary rheometer at different temperatures is used to determine the unknown coefficients in Equations 11-12. Multiple linear regression is used for parameter estimation. The values of these coefficients for three different polymers are shown in Table I. The polymer rheology is shown in Figures 2-4. [Pg.137]

Sathe PM, Venitz J. Comparison of neural networks and multiple linear regression as dissolution predictors. Drug Dev Ind Pharm 2003;29:349-55. [Pg.699]

