Statistical models linear regression

From a data analytical point of view, data can be categorised according to structure, as exemplified in Table 1. Depending on the kind of data acquired, appropriate data analytical tools must be selected. In the simplest case, only one variable/number is acquired for each sample, in which case the data are commonly referred to as zeroth-order data. If several variables are collected for each sample, this is referred to as first-order data. A typical example could be a 1D spectrum acquired for each sample. Several 1D spectra from different samples may be organised in a two-way table or a matrix. For such a matrix of data, multivariate data analysis is commonly employed. It is clearly not possible to analyse zeroth-order data by multivariate techniques, and one is restricted to traditional statistics and linear regression models. When first- or second-order data are available, multivariate data analysis may be used and several advantages may be exploited,... [Pg.210]
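To make the distinction concrete, the short sketch below (hypothetical NumPy arrays, not data from the cited source) shows zeroth-order data as a single value per sample and first-order data as one 1D spectrum per sample, stacked into the samples-by-variables matrix on which multivariate methods operate.

```python
import numpy as np

# Zeroth-order data: one number per sample (hypothetical readings).
zeroth_order = np.array([0.12, 0.34, 0.29, 0.41])          # shape (4,)

# First-order data: a full 1D spectrum per sample, organised as a
# two-way table (samples x wavelengths) suitable for multivariate analysis.
rng = np.random.default_rng(0)
spectra = rng.random((4, 100))                              # shape (4, 100)

print(zeroth_order.shape)   # (4,)     -> classical statistics / univariate regression
print(spectra.shape)        # (4, 100) -> multivariate tools (PCA, PLS, ...)
```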

Predictive Modeling is another Data Mining task that is addressed by Statistical methods. The most common type of predictive model used in Statistics is linear regression, where we describe one variable as a linear combination of other known variables. A number of other tasks that involve analysis of several variables for various purposes are categorized by statisticians under the umbrella term multivariate analysis. [Pg.85]
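As a minimal, hedged illustration of this idea (the variable names and data are invented, not taken from the source), the sketch below fits one variable as a linear combination of two known variables with ordinary least squares.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y is (approximately) a linear combination of two known variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))                              # 50 samples, 2 known variables
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)                      # estimated linear combination
```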

For the calculation of E, was compared with the highest observed of the six possible two-descriptor models that could be formed from the three selected inputs (corresponding to the unshuffled pair in Table X). Statistics for linear regression and additional measures of the predictive accuracy are available in Tables X and XI. [Pg.24]

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of the distribution are assumed in the underlying statistical method. A non-parametric technique does not rely upon the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model; an unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]
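The contrast can be sketched in code: a supervised method (here MLR) needs the dependent variable to derive the model, while an unsupervised method (here PCA) uses only the descriptors. The example below is illustrative only; the data are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))                                   # descriptor matrix
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.2, size=30)

# Supervised: information about the dependent variable y is used.
mlr = LinearRegression().fit(X, y)

# Unsupervised: only X is used; y plays no role in finding the components.
pca = PCA(n_components=2).fit(X)

print(mlr.coef_)
print(pca.explained_variance_ratio_)
```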

We now consider a type of analysis in which the data (which may consist of solvent properties or of solvent effects on rates, equilibria, and spectra) again are expressed as a linear combination of products as in Eq. (8-81), but now the statistical treatment yields estimates of both the a_i and the x_i. This method is called principal component analysis or factor analysis. A key difference between multiple linear regression analysis and principal component analysis (in the chemical setting) is that regression analysis adopts chemical models a priori, whereas in factor analysis the chemical significance of the factors emerges (if desired) as a result of the analysis. We will not explore the statistical procedure, but will cite some results. We have already encountered examples in Section 8.2 on the classification of solvents and in the present section in the form of the Swain et al. treatment leading to Eq. (8-74). [Pg.445]
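A schematic way to see the point (a sketch with an invented data matrix, not the solvent data of the source): in principal component analysis the data matrix is reproduced as a sum of score-times-loading products, and any chemical meaning of those factors is assigned only after the decomposition.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical matrix of solvent effects (solvents x measured properties).
rng = np.random.default_rng(3)
data = rng.normal(size=(20, 6))

# No chemical model is assumed a priori; the factors emerge from the data.
pca = PCA(n_components=2).fit(data)
scores = pca.transform(data)          # coordinates of each solvent on the factors
loadings = pca.components_            # contribution of each property to each factor

# The data are approximated by the sum of score x loading products (plus the mean).
approx = scores @ loadings + pca.mean_
print(np.linalg.norm(data - approx))  # residual of the low-rank reconstruction
```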

Statistical testing of model adequacy and significance of parameter estimates is a very important part of kinetic modelling. Only those models with a positive evaluation in statistical analysis should be applied in reactor scale-up. The statistical analysis presented below is restricted to linear regression and a normal or Gaussian distribution of experimental errors. If the experimental error has zero mean, constant variance and is independently distributed, its variance can be evaluated by dividing SSres by the number of degrees of freedom, i.e. [Pg.545]
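A minimal sketch of that variance estimate under the stated assumptions (zero-mean, constant-variance, independent errors), using invented data: the residual sum of squares SSres is divided by the degrees of freedom n − p, where n is the number of observations and p the number of estimated parameters.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # design matrix
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(scale=0.15, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
ss_res = residuals @ residuals
s2 = ss_res / (n - p)              # estimate of the error variance
print(s2)
```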

Once we have estimated the unknown parameter values in a linear regression model and the underlying assumptions appear to be reasonable, we can proceed to make statistical inferences about the parameter estimates and the response variables. [Pg.32]
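A hedged sketch of such inferences (not the book's exact formulas): assuming normally distributed errors, the covariance of the estimates is s²(XᵀX)⁻¹, from which t-based confidence intervals for each parameter follow.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(scale=0.15, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
s2 = np.sum((y - X @ beta_hat) ** 2) / (n - p)
cov_beta = s2 * np.linalg.inv(X.T @ X)            # covariance of the estimates
se = np.sqrt(np.diag(cov_beta))                   # standard errors
t_crit = stats.t.ppf(0.975, df=n - p)

for b, e in zip(beta_hat, se):
    print(f"{b:+.3f}  95% CI: [{b - t_crit * e:+.3f}, {b + t_crit * e:+.3f}]")
```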

Multiple linear regression (MLR) is a classic mathematical multivariate regression analysis technique [39] that has been applied to quantitative structure-property relationship (QSPR) modeling. However, when using MLR there are some statistical issues of which the researcher must be aware ... [Pg.398]

Using a 70-drug set, Ghafourian et al. conducted a statistical analysis of computed chemical descriptors to predict human VD [36]. The descriptors found to provide a predictive model included logP, logDi, and Pmm (a measure of dipole moment), in a linear regression ... [Pg.483]

Using multivariable linear regression, a set of equations can be derived from the parameterized data. Statistical analysis yields the "best" equations to fit the empirical data. This mathematical model forms a basis to correlate the biological activity to the chemical structures. [Pg.152]
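A bare-bones sketch of the underlying calculation (synthetic descriptors and activities, not the parameterized data of the source): the least-squares coefficients follow from the normal equations, and R² summarizes how well the resulting equation fits the empirical data.

```python
import numpy as np

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(40), rng.normal(size=(40, 3))])       # intercept + 3 descriptors
activity = X @ np.array([0.5, 1.2, -0.7, 0.3]) + rng.normal(scale=0.1, size=40)

# Normal equations: b = (X'X)^-1 X'y, solved without forming the inverse explicitly.
b = np.linalg.solve(X.T @ X, X.T @ activity)
residuals = activity - X @ b
r2 = 1.0 - residuals @ residuals / np.sum((activity - activity.mean()) ** 2)
print(b, round(r2, 4))
```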

Neter, J., Wasserman, W., and Kutner, M.H. (1990), Applied Linear Statistical Models: Regression, Analysis of Variance, and Experimental Designs, 3rd ed., Irwin, Homewood, IL. [Pg.425]

A more common use of informatics for data analysis is the development of (quantitative) structure-property relationships (QSPR) for the prediction of materials properties and thus, ultimately, the design of polymers. Quantitative structure-property relationships are multivariate statistical correlations between a property of a polymer and a number of variables, which are either physical properties themselves or descriptors that encode information about the polymer in a more abstract way. The simplest QSPR models are usually linear regression-type models, but complex neural networks and numerous other machine-learning techniques have also been used. [Pg.133]

To avoid over-fitting, a commonly used approach is to select a subset of descriptors to build models. GAs are widely used to select descriptors prior to using other statistical tools, such as MLR, to build models. Certainly, principal component analysis and PLS fitting are also widely used in reducing the dimensions of descriptors. Traditionally, stepwise linear regression is used to select certain descriptors to enter the regression equations. [Pg.120]
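As one concrete (and simplified) example of descriptor selection, the sketch below runs a forward stepwise search, adding at each step the descriptor that most improves cross-validated R²; the data and stopping rule are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 10))                       # 10 candidate descriptors
y = 1.5 * X[:, 2] - 0.8 * X[:, 7] + rng.normal(scale=0.2, size=60)

selected, remaining = [], list(range(X.shape[1]))
best_score = -np.inf
while remaining:
    # Score each candidate added to the current subset (5-fold CV, default R^2).
    scores = {j: cross_val_score(LinearRegression(),
                                 X[:, selected + [j]], y, cv=5).mean()
              for j in remaining}
    j_best = max(scores, key=scores.get)
    if scores[j_best] <= best_score:
        break                                       # stop when no further improvement
    best_score = scores[j_best]
    selected.append(j_best)
    remaining.remove(j_best)

print(selected, round(best_score, 3))
```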

In Equation 5.28, s is a function of the concentration residuals observed during calibration, r is the measurement vector for the prediction sample, and R contains the calibration measurements for the variables used in the model. Because the assumptions of linear regression are often not rigorously obeyed, the statistical prediction error should be used empirically rather than absolutely. It is useful for validating the prediction samples by comparing the values for... [Pg.135]
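Equation 5.28 itself is not reproduced in this excerpt, so the sketch below only illustrates the general structure the text describes, under assumed notation: a calibration residual standard deviation s scaled by a leverage-like term built from the prediction measurement vector r and the calibration matrix R.

```python
import numpy as np

rng = np.random.default_rng(8)
R = rng.normal(size=(30, 4))                 # calibration measurements (samples x variables)
c = R @ np.array([0.2, 1.0, -0.4, 0.6]) + rng.normal(scale=0.05, size=30)

# s: residual standard deviation from the calibration regression.
b, *_ = np.linalg.lstsq(R, c, rcond=None)
s = np.sqrt(np.sum((c - R @ b) ** 2) / (R.shape[0] - R.shape[1]))

# Assumed form of the scaling for one prediction sample (to be used empirically).
r = rng.normal(size=4)                       # measurement vector of a new sample
leverage = r @ np.linalg.inv(R.T @ R) @ r
s_pred = s * np.sqrt(1.0 + leverage)
print(s_pred)
```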

Designed Experiments Produce More Precise Models. In the context of linear regression, this is demonstrated by examining the statistical uncertainties of the regression coefficients. Equation 2.1 is the regression model, where the response for the ith sample (r_i) of an instrument is shown as a linear function of the sample concentration (c_i) with measurement error... [Pg.192]
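A small sketch of the argument (with an assumed error variance and invented concentration sets): the coefficient uncertainties scale with (XᵀX)⁻¹, so calibration concentrations spread by design over the working range give visibly smaller variances than a haphazard, clustered set.

```python
import numpy as np

sigma2 = 0.01                                          # assumed measurement error variance

def coef_variance(conc):
    # Variance-covariance of intercept and slope: sigma^2 * (X'X)^-1.
    X = np.column_stack([np.ones_like(conc), conc])
    return sigma2 * np.linalg.inv(X.T @ X)

designed  = np.linspace(0.0, 1.0, 10)                  # spread over the range by design
haphazard = np.array([0.45, 0.48, 0.50, 0.50, 0.51,
                      0.52, 0.53, 0.55, 0.55, 0.56])   # clustered, undesigned set

print(np.diag(coef_variance(designed)))    # smaller intercept/slope variances
print(np.diag(coef_variance(haphazard)))   # larger variances for the clustered set
```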

Statistical Prediction Errors (Model and Sample Diagnostic) Uncertainties in the concentrations can be estimated because the predicted concentrations are regression coefficients from a linear regression (see Equations 5.7-5.10). These are referred to as statistical prediction errors to distinguish them from simple concentration residuals (c − ĉ). The statistical prediction errors are calculated for one prediction sample as... [Pg.281]

Empirical multiple linear regression models were developed to describe the foam capacity and stability data of Figures 2 and 4 as a function of pH and suspension concentration (Tables III and IV). These statistical analyses and foaming procedures were modeled after data published earlier (23, 24, 29, 30, 31). The multiple R² values of 0.9601 and 0.9563 for foam capacity and stability, respectively, were very high, indicating that approximately 96% of the variability contributing to both of these functional properties of foam was accounted for by the seven variables used in the equation. [Pg.158]


