Multiple linear regression procedures

We now consider a type of analysis in which the data (which may consist of solvent properties or of solvent effects on rates, equilibria, and spectra) again are expressed as a linear combination of products as in Eq. (8-81), but now the statistical treatment yields estimates of both a_i and x_i. This method is called principal component analysis or factor analysis. A key difference between multiple linear regression analysis and principal component analysis (in the chemical setting) is that regression analysis adopts chemical models a priori, whereas in factor analysis the chemical significance of the factors emerges (if desired) as a result of the analysis. We will not explore the statistical procedure, but will cite some results. We have already encountered examples in Section 8.2 on the classification of solvents and in the present section in the form of the Swain et al. treatment leading to Eq. (8-74). [Pg.445]

In a general way, we can state that the projection of a pattern of points on an axis produces a point which is imaged in the dual space. The matrix-to-vector product can thus be seen as a device for passing from one space to another. This property of swapping between spaces provides a geometrical interpretation of many procedures in data analysis such as multiple linear regression and principal components analysis, among many others [12] (see Chapters 10 and 17). [Pg.53]
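The passage between spaces described above can be illustrated with a small numerical sketch. The data matrix `X`, the axis `v`, and the weight vector `u` are made-up values chosen only for illustration; nothing here is taken from the cited text.

```python
import numpy as np

# Three points (rows) in a 2-D variable space; hypothetical numbers.
X = np.array([[1.0, 2.0],
              [3.0, 0.0],
              [0.0, 4.0]])

# A unit-length axis in the 2-D space.
v = np.array([3.0, 4.0]) / 5.0

# Projecting the three points on the axis gives three coordinates,
# which together form a single point in the 3-D dual space.
scores = X @ v

# The transposed product goes the other way: a vector of weights on
# the three objects is mapped to one point in the 2-D variable space.
u = np.array([1.0, 0.0, 1.0])
image = X.T @ u
```

The same matrix thus acts as the bridge in both directions, which is the geometrical reading of the matrix-to-vector product given in the excerpt.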

Vukjovic et al.199 recently proposed a simple, fast, sensitive, and low-cost procedure based on solid-phase spectrophotometry (SPS) and multicomponent analysis by multiple linear regression (MA) to determine traces of heavy metals in pharmaceuticals. Other spectroscopic techniques employed for high-throughput pharmaceutical analysis include laser-induced breakdown spectroscopy (LIBS),200,201 fluorescence spectroscopy,202-204 diffusive reflectance spectroscopy,205 laser-based nephelometry,206 automated polarized light microscopy,207 and laser diffraction and image analysis.208... [Pg.269]

Although the term theoretical techniques in relation to electronic effects may commonly be taken to refer to quantum-mechanical methods, it is appropriate also to mention the application of chemometric procedures to the analysis of large data matrices. This is in a way complementary to analysis through substituent constants based on taking certain systems as standards and applying simple or multiple linear regression. Chemometrics involves the analysis of suitable data matrices through elaborate statistical procedures,... [Pg.506]

Simple and valence indices up to sixth order were computed for all the PAHs used in the present study database. The program MOLCONN2 [133, 152, 154, 156] performed these calculations using the chemical structural formula as input. SAS [425] was used on a mainframe computer to perform statistical analyses. First, indices were selected which explained the greatest amount of variance in the data (i.e., R2 procedure). These indices were then used in a multiple linear regression analysis (REG procedure). [Pg.289]

Techniques for parameter estimation vary considerably. If consistent values for model parameters cannot be obtained, the investigators may decide that the model is itself unreliable and should be changed. Thus, model choice and parameter estimation are interactive. A number of workers have discussed generalised procedures [16-18]. Yeh [19] developed numerical algorithms and showed that multiple linear regression could be used successfully if the reaction scheme consisted of steps such as those shown in eqn. (41) or... [Pg.125]

Empirical multiple linear regression models were developed to describe the foam capacity and stability data of Figures 2 and 4 as a function of pH and suspension concentration (Tables III and IV). These statistical analyses and foaming procedures were modeled after data published earlier (23, 24, 29, 30, 31). The multiple R² values of 0.9601 and 0.9563 for foam capacity and stability, respectively, were very high, indicating that approximately 96% of the variability contributing to both of these functional properties of foam was accounted for by the seven variables used in the equation. [Pg.158]
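As a rough sketch of how such an empirical model and its R² can be obtained, the following fits a response to pH, concentration, and their product by ordinary least squares. The data are synthetic, and the choice of three terms, the function name `fit_mlr`, and all coefficients are assumptions made for illustration; they are not the seven variables or the results of the study quoted above.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)          # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total variation
    return coef, 1.0 - ss_res / ss_tot

# Synthetic (hypothetical) data: response depends on pH and concentration.
rng = np.random.default_rng(0)
ph = rng.uniform(2, 10, 40)
conc = rng.uniform(1, 10, 40)
X = np.column_stack([ph, conc, ph * conc])
y = 5.0 + 2.0 * ph - 1.5 * conc + 0.3 * ph * conc + rng.normal(0, 0.5, 40)

coef, r2 = fit_mlr(X, y)
print(f"R^2 = {r2:.4f}")
```

An R² near 0.96 would be read, as in the excerpt, as roughly 96% of the variability in the response being accounted for by the fitted terms.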

Li et al. discuss the use of on-line Raman spectroscopy to dynamically model the synthesis of aspirin, one of the most documented and well-understood reactions in organic chemistry. That makes it an excellent choice for building confidence in the sampling interface, Raman instrumentation, and analysis procedures. The researchers used wavelets during analysis to remove fluorescent backgrounds in the spectra and modeled the concentrations with multiple linear regression.53... [Pg.154]

Following the same procedure as for multiple linear regression, the values of the... [Pg.142]

Chemometrics is the discipline concerned with the application of statistical and mathematical methods to chemical data [2.18]. Multiple linear regression, partial least squares regression and principal component analysis are methods that can be used to design or select optimal measurement procedures and experiments, or to extract the maximum relevant chemical information from chemical data. Common areas addressed by chemometrics include multivariate calibration, visualisation of data and pattern recognition. Biometrics is concerned with the application of statistical and mathematical methods to biological or biochemical data. [Pg.31]

As a first attempt, a linear model was assumed and a multiple linear regression (MLR) was investigated. The explanatory input variables were identified by a stepwise procedure: variables were successively included in the model provided that the coefficient of determination could be significantly improved. The best linear regression was then derived based on the maximum r criterion. [Pg.267]
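A forward stepwise procedure of this kind might be sketched as follows. The excerpt does not say how "significantly improved" was tested, so this sketch substitutes a fixed minimum gain in R² (`min_gain`, a hypothetical parameter) for a formal significance test; the data are synthetic.

```python
import numpy as np

def r_squared(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given columns."""
    A = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def forward_select(X, y, min_gain=0.01):
    """Add one variable at a time while R^2 rises by at least min_gain.

    min_gain is an assumed stand-in for a significance test.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # Score every candidate when added to the current model.
        scores = {j: r_squared([X[:, k] for k in selected + [j]], y)
                  for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            break                                  # no worthwhile improvement
        selected.append(j_best)
        remaining.remove(j_best)
        best_r2 = scores[j_best]
    return selected, best_r2

# Synthetic data: only columns 0 and 3 actually drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=60)

selected, best_r2 = forward_select(X, y)
print(selected, round(best_r2, 4))
```

A fixed R² threshold is the simplest stopping rule; a partial F-test or cross-validated error would be closer to what "significantly" usually implies.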

These special cases of multiple linear regression analysis have been developed for the determination of the impact of individual molecular substructures (independent variables) on one dependent variable. The two techniques are similar, yet the Free-Wilson method takes the retention of the unsubstituted analyte as the base, while Fujita-Ban analysis uses the less substituted molecule as the reference. These procedures have not been frequently employed in chromatography; only their application in QSRR studies in RP TLC and HPLC has been reported. [Pg.353]

These parameters can be estimated by multiple linear regression. This method is described below. By this procedure, the polynomial model is fitted to known experimental results so that the deviations between the observed responses and the corresponding responses calculated from the model are as small as possible. How these calculations are done and how the experiments should be laid out to obtain good estimates of the model parameters is treated in detail in the chapters that follow. [Pg.35]
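The least-squares criterion mentioned above can be written compactly. The notation is an assumption introduced here for illustration: y is the vector of observed responses, X the model matrix whose columns correspond to the terms of the polynomial model, and b the vector of estimated parameters.

```latex
% Model: observed responses = model matrix times parameters + residuals
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}

% Least-squares criterion: minimise the sum of squared deviations
S(\boldsymbol{\beta})
  = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\mathsf{T}}
    (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})

% Setting the gradient of S to zero gives the normal equations
\mathbf{X}^{\mathsf{T}}\mathbf{X}\,\mathbf{b} = \mathbf{X}^{\mathsf{T}}\mathbf{y}

% and, when X^T X is invertible, the estimate
\mathbf{b} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y}
```

The requirement that X^T X be invertible (and well conditioned) is one reason the layout of the experiments matters for the quality of the parameter estimates.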

The major conceptual limitation of regression techniques is that one can only ascertain relationships but can never be sure about the underlying causal mechanism. As mentioned already, multiple linear regression analysis assumes that the relationship between variables is linear. In practice, this assumption is rarely met. Fortunately, multiple regression procedures are not greatly affected by minor deviations from... [Pg.85]

For the estimation of component concentrations, a second step is required, based on a multiple linear regression (MLR, see Section 3.1.3) between the absorbance values and the PCA scores. This can be carried out automatically after the PCA step, with the principal component regression (PCR) procedure (including PCA). This methodology was first applied to analytical chemical problems by Lawton and Sylvestre [25], and has more recently been used in different models by other researchers [26-28]. Finally, the PCA procedure can also be coupled with cluster analysis (CA), as described in a very recent study on the characterisation of industrial wastewater samples [29]... [Pg.42]
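The two-step idea behind PCR (a PCA of the spectra followed by an MLR on the scores) can be sketched as below. This is a minimal illustration under stated assumptions: the spectra are synthetic Gaussian bands, the regression is of one component's concentration on the scores, and the names `pcr_fit` and `pcr_predict` are hypothetical helpers, not part of any cited procedure.

```python
import numpy as np

def pcr_fit(A, c, n_components):
    """PCA of the absorbance matrix A (via SVD), then OLS of c on the scores."""
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    a_mean, c_mean = A.mean(axis=0), c.mean()
    _, _, Vt = np.linalg.svd(A - a_mean, full_matrices=False)
    loadings = Vt[:n_components].T              # (n_wavelengths, k)
    scores = (A - a_mean) @ loadings            # (n_samples, k)
    b, *_ = np.linalg.lstsq(scores, c - c_mean, rcond=None)
    return a_mean, c_mean, loadings, b

def pcr_predict(model, A_new):
    """Project new spectra on the stored loadings and apply the regression."""
    a_mean, c_mean, loadings, b = model
    return c_mean + (np.asarray(A_new, dtype=float) - a_mean) @ loadings @ b

# Synthetic two-component mixtures (hypothetical pure spectra).
rng = np.random.default_rng(3)
wavelengths = np.arange(50)
pure = np.vstack([np.exp(-((wavelengths - 15) / 5.0) ** 2),
                  np.exp(-((wavelengths - 30) / 6.0) ** 2)])
conc = rng.uniform(0.0, 1.0, size=(30, 2))
absorbance = conc @ pure + rng.normal(scale=0.001, size=(30, 50))

model = pcr_fit(absorbance, conc[:, 0], n_components=2)
predicted = pcr_predict(model, absorbance)
```

Because the scores are mutually orthogonal, the regression step avoids the collinearity that a direct MLR on 50 correlated absorbance values would face.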

Model Selection and Sequential Variable Selection Procedures In Multiple Linear Regression... [Pg.64]

Stored in a table where columns are descriptors, and rows are compounds (or conformers), QSAR data sets contain separate columns for the measured target property (Y), attributed to the training set, as well as computed descriptors for (external) reference compounds on which the QSAR model is tested (the test set). Statistical procedures, e.g., multiple linear regression (MLR), projection to latent structures (PLS), or neural networks (NN) [38], are then used to establish a mathematical soft model relating the observed measurement(s) in the Y column(s) with some combination of the properties represented in the subsequent columns. PLS, NN, and AI (artificial intelligence) techniques have been explored by Green and Marshall in the context of 3D-QSAR models [39], and were shown to extract similar information. Collinearity between various descriptors (cross-correlation), a problem that may lead to spurious (chance) correlations when using MLR techniques, is usually dealt with in PLS [40]... [Pg.573]

Spectroscopic methods are increasingly employed for quantitative applications in many different fields, including chemistry [1]. The dimensionality of spectral data sets is basically limited by the number of objects studied, whereas the number of variables can easily reach a few thousand. High-dimensional spectral data are highly correlated and usually somewhat noisy, so conventional multiple linear regression (MLR) cannot be applied to this type of data directly; feature selection or reduction procedures are needed [2]... [Pg.323]

However, it is not sufficient to identify a set of non-correlated, well-distributed parameters and then simply press the button and derive a multiple linear regression equation. To derive a QSAR equation properly requires a lot of care. Some of the descriptors may have little or no relevance to the property being modelled. Moreover, one generally wants to achieve a balance between an equation that captures the essence of the problem and yet is predictive. Fortunately, there are a number of procedures that can help with some of these problems. [Pg.700]

Chapter 6 discusses common problems encountered in multiple linear regression and the ways to deal with them. One problem is multiple collinearity, in which some of the x_i variables are correlated with other x_i variables and the regression equation becomes unstable in applied work. A number of procedures are explained to deal with such problems and a biasing method called ridge regression is also discussed. [Pg.511]
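The instability caused by collinearity, and the stabilising effect of ridge regression, can be seen in a small sketch. The data are synthetic, the penalty `lam=1.0` is an arbitrary illustrative choice, and `ridge_fit` is a hypothetical helper that solves the penalised normal equations on centered data; it is not taken from the chapter cited above.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression on centered data: solve (X'X + lam*I) b = X'y.

    lam = 0 reduces to ordinary least squares (if X'X is invertible).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    slopes = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ slopes        # intercept is not penalised
    return intercept, slopes

# Two nearly identical predictors (strong collinearity); synthetic data.
rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)       # almost a copy of x1
X = np.column_stack([x1, x2])
y = 1.0 + x1 + x2 + rng.normal(scale=0.1, size=50)

_, b_ols = ridge_fit(X, y, lam=0.0)
_, b_ridge = ridge_fit(X, y, lam=1.0)
print("OLS slopes:  ", b_ols)
print("ridge slopes:", b_ridge)
```

The price of the stabilisation is a deliberate bias: the ridge slopes are shrunk toward zero, which is why the excerpt calls it a biasing method.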

An alternative route to the total solvatochromic equation is by the method of multiple linear regression analysis (multiple parameter least squares correlation), which has become quite convenient with the recent availability of inexpensive programmable computers. In this one-step procedure, correlation of ν(9)max results with solvent π* and ... values leads directly to the equation. [Pg.548]

Big Chemical Encyclopedia