Statistical analysis linear regression

The kinds of calculations described above are done for all the molecules under investigation, and all the data (combinations of 3-point pharmacophores) are then stored in an X-matrix of descriptors suitable for statistical analysis. In theory, any statistical analysis or regression tool could be applied; in this study, however, we decided to focus on the linear regression model using principal component analysis (PCA) and partial least squares (PLS) (Fig. 4.9). PCA and PLS work very well in cases where the X-variables are numerous, noisy and strongly collinear (Fig. 4.9). [Pg.98]
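To illustrate why PCA copes well with collinear X-variables, the following is a minimal sketch that extracts the first principal component of a small descriptor matrix by power iteration on the covariance matrix. Power iteration is used here only because it fits in a few lines of standard-library Python; it is not the software used in the study, and the function name and the data are hypothetical. PLS is not shown.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return (loading vector, explained variance fraction) of the first PC.

    rows: list of samples, each a list of descriptor values (hypothetical X-matrix).
    """
    n, p = len(rows), len(rows[0])
    # Centre each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (divisor n - 1).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: repeatedly multiply by cov and renormalise.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data have no variance along the start vector")
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue (variance along the component).
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    return v, eigenvalue / total_variance

# Hypothetical data: the second descriptor is exactly twice the first.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
loadings, explained = first_principal_component(X)
```

With two perfectly collinear descriptors, a single component carries all of the variance, so the two columns can be replaced by one score without loss.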

From a data analytical point of view, data can be categorised according to structure, as exemplified in Table 1. Depending on the kind of data acquired, appropriate data analytical tools must be selected. In the simplest case, only one variable/number is acquired for each sample in which case the data are commonly referred to as zeroth-order data. If several variables are collected for each sample, this is referred to as first-order data. A typical example could be a 1D spectrum acquired for each sample. Several 1D spectra from different samples may be organised in a two-way table or a matrix. For such a matrix of data, multivariate data analysis is commonly employed. It is clearly not possible to analyse zeroth-order data by multivariate techniques and one is restricted to traditional statistics and linear regression models. When first- or second-order data are available, multivariate data analysis may be used and several advantages may be exploited,... [Pg.210]

Fig. 4. Scatter plots of total polychlorinated biphenyl (ΣPCB) versus (a) coprostanol and (b) ΣLAB concentrations in surficial sediments from study site. Lines and statistics for linear regression analysis are shown (modified from Eganhouse and Sherblom, 2001).

Predictive Modeling is another Data Mining task that is addressed by Statistical methods. The most common type of predictive model used in Statistics is linear regression, where we describe one variable as a linear combination of other known variables. A number of other tasks that involve analysis of several variables for various purposes are categorized by statisticians under the umbrella term multivariate analysis. [Pg.85]
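As a sketch of what "a linear combination of other known variables" means in practice, the following fits the coefficients of a multiple linear regression by solving the normal equations with plain Gaussian elimination. The function names and the data are hypothetical, and a production analysis would normally rely on a numerical library rather than this hand-written solver.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def fit_multiple_regression(rows, y):
    """Return [intercept, b1, b2, ...] minimising the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # prepend a column of ones
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 with no noise.
predictors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
response = [1.0, 3.0, 4.0, 6.0, 8.0]
coefficients = fit_multiple_regression(predictors, response)
```

Because the example data contain no noise, the fitted coefficients recover the intercept and the two weights used to generate the response.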

It must be noted that if the experiments are carefully planned, relatively simple data analysis techniques (variance analysis, linear regression, etc.) can be used. Conversely, if the experiments are not carefully planned, it is often necessary to use much more complex mathematical and statistical tools (factorial analyses, classifications, etc.) without even being sure of the results. [Pg.468]

Most researchers who have worked with discrete event simulation are familiar with classical statistical analysis. By classical, we mean those tests that deal with assessing differences in means or that perform correlation analysis. Included in these tests are statistical procedures such as t-tests (paired and unpaired), analysis of variance (univariate and multivariate), factor analysis, linear regression (in its various forms: ordinary least squares, LOGIT, PROBIT, and robust regression) and non-parametric tests. [Pg.114]

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of the distribution are assumed in the underlying statistical method. A non-parametric technique does not rely upon the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model. An unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]

We now consider a type of analysis in which the data (which may consist of solvent properties or of solvent effects on rates, equilibria, and spectra) again are expressed as a linear combination of products as in Eq. (8-81), but now the statistical treatment yields estimates of both a_i and x_i. This method is called principal component analysis or factor analysis. A key difference between multiple linear regression analysis and principal component analysis (in the chemical setting) is that regression analysis adopts chemical models a priori, whereas in factor analysis the chemical significance of the factors emerges (if desired) as a result of the analysis. We will not explore the statistical procedure, but will cite some results. We have already encountered examples in Section 8.2 on the classification of solvents and in the present section in the form of the Swain et al. treatment leading to Eq. (8-74). [Pg.445]

Statistical testing of model adequacy and significance of parameter estimates is a very important part of kinetic modelling. Only those models with a positive evaluation in statistical analysis should be applied in reactor scale-up. The statistical analysis presented below is restricted to linear regression and normal or Gaussian distribution of experimental errors. If the experimental error has a zero mean, constant variance and is independently distributed, its variance can be evaluated by dividing SSres by the number of degrees of freedom, i.e. [Pg.545]
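The excerpt breaks off before giving the formula, so the following is only a sketch of the estimate it describes: the residual sum of squares divided by the degrees of freedom. It assumes the degrees of freedom equal the number of observations minus the number of fitted parameters, which is the usual convention for linear regression. The function name and the data are hypothetical.

```python
def residual_variance(observed, predicted, n_params):
    """Estimate the error variance as SSres / (n - n_params).

    Assumption: degrees of freedom = number of observations minus the number
    of parameters estimated by the model.
    """
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    dof = n - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return ss_res / dof

# Hypothetical observations and predictions from a two-parameter model.
variance = residual_variance([1.0, 2.1, 2.9, 4.2], [1.0, 2.0, 3.0, 4.0], n_params=2)
```

The square root of this value is the standard error of the residuals, which is the quantity typically used when judging parameter significance.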

A central concept of statistical analysis is variance [105], which is simply the average squared deviation from the mean, or the square of the standard deviation. Since the analyst can only take a limited number n of samples, the variance is estimated as the sum of squared deviations from the mean, divided by n - 1. Analysis of variance asks the question whether groups of samples are drawn from the same overall population or from different populations [105]. The simplest example of analysis of variance is the F-test (and the closely related t-test) in which one takes the ratio of two variances and compares the result with tabular values to decide whether it is probable that the two samples came from the same population. Linear regression is also a form of analysis of variance, since one is asking the question whether the variance around the mean is equivalent to the variance around the least squares fit. [Pg.34]
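The two quantities in this passage, the sample variance with divisor n - 1 and the ratio of two variances, can be sketched as follows. The function names and the data are hypothetical, and the resulting ratio still has to be compared with a tabulated F value for the stated degrees of freedom, which this sketch does not do.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def variance_ratio(sample_a, sample_b):
    """Return (F, numerator dof, denominator dof), larger variance on top."""
    var_a = sample_variance(sample_a)
    var_b = sample_variance(sample_b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("both samples must have non-zero variance")
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical samples.
f_value, dof_num, dof_den = variance_ratio([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0])
```

Placing the larger variance in the numerator keeps the ratio at or above one, which matches how one-sided F tables are usually laid out.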

Data were subjected to analysis of variance and regression analysis using the general linear model procedure of the Statistical Analysis System (40). Means were compared using Waller-Duncan procedure with a K ratio of 100. Polynomial equations were best fitted to the data based on significance level of the terms of the equations and values. [Pg.247]

They include simple statistics (e.g., sums, means, standard deviations, coefficient of variation), error analysis terms (e.g., average error, relative error, standard error of estimate), linear regression analysis, and correlation coefficients. [Pg.169]

Multiple linear regression (MLR) is a classic mathematical multivariate regression analysis technique [39] that has been applied to quantitative structure-property relationship (QSPR) modeling. However, when using MLR there are some aspects, with respect to statistical issues, that the researcher must be aware of ... [Pg.398]

Because of the large difference in the behavior of the thin plywood and the gypsum board, the type of interior finish was the dominant factor in the statistical analysis of the total heat release data (Table III). Linear regression of the data sets for 5, 10, and 15 min resulted in squares of the correlation coefficients R² = 0.88 to 0.91 with the type of interior finish as the sole variable. For the plywood, the average total heat release was 172, 292, and 425 MJ at 5, 10, and 15 min, respectively. For the gypsum board, the average total heat release was 25, 27, and 29 MJ at 5, 10, and 15 min, respectively. [Pg.425]

Using a 70-drug set, Ghafourian et al. conducted a statistical analysis of computed chemical descriptors to predict human VD [36]. The descriptors found to provide a predictive model included logP, logDi, and Pmm (a measure of dipole moment), in a linear regression ... [Pg.483]

Without having conducted a full elasticity analysis across the entire portfolio, the analysis helps to confirm market perceptions, such as that elasticity is higher in one market than in another, or that products perceived to have different elasticities actually differ. The statistical quality of the linear regression analysis in selected months is considered good in terms of the number of customers involved and the R-squared value, which supports the applicability of the approach. [Pg.223]

The training of most pathologists in statistics remains limited to a single introductory course which concentrates on some theoretical basics. As a result, the armamentarium of statistical techniques of most toxicologists is limited and the tools that are usually present (t-tests, chi-square, analysis of variance, and linear regression) are neither fully developed nor well understood. It is hoped that this chapter will help change this situation. [Pg.863]

Note also that we can use the correlation test statistic (described in the correlation coefficient section) to determine if the regression is significant (and, therefore, valid at a defined level of certainty). A more specific test for significance would be the linear regression analysis of variance (Pollard, 1977). To do so, we start by developing the appropriate ANOVA table. [Pg.932]
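A regression ANOVA table of the kind mentioned here can be sketched for a single-predictor fit as below. The layout follows the standard partition of the total sum of squares into regression and residual parts; it is not taken from Pollard (1977), and the function name, dictionary keys and data are hypothetical. The F value must still be compared with a tabulated critical value.

```python
def regression_anova(x, y):
    """Return the ANOVA quantities for the least-squares line through (x, y)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Partition the total variation around the mean of y.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_regression = ss_total - ss_residual
    df_regression, df_residual = 1, n - 2
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    if ms_residual == 0.0:
        raise ValueError("perfect fit: the F ratio is undefined")
    return {
        "ss_regression": ss_regression, "df_regression": df_regression,
        "ss_residual": ss_residual, "df_residual": df_residual,
        "ss_total": ss_total, "F": ms_regression / ms_residual,
    }

# Hypothetical data.
table = regression_anova([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
```

The regression is judged significant at the chosen level when F exceeds the tabulated value for 1 and n - 2 degrees of freedom.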

Linear regression analysis is a statistical technique to determine the equation for the straight line that best describes the relationship between two variables. [Pg.33]
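As a minimal sketch of this definition, the slope and intercept of the best-fitting straight line can be computed directly from the closed-form least-squares expressions. The function name and the data are hypothetical.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0.0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Here "best" means the line that minimises the sum of squared vertical distances between the observed y values and the line.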

Although the term theoretical techniques in relation to electronic effects may commonly be taken to refer to quantum-mechanical methods, it is appropriate also to mention the application of chemometric procedures to the analysis of large data matrices. This is in a way complementary to analysis through substituent constants based on taking certain systems as standards and applying simple or multiple linear regression. Chemometrics involves the analysis of suitable data matrices through elaborate statistical procedures,... [Pg.506]

Big Chemical Encyclopedia
