Multilinear regression models

A significant improvement in the correlations was obtained by combining size/shape descriptors with MO indexes in multilinear regression models ... [Pg.169]

Another problem is to determine the optimal number of descriptors for the objects (patterns), such as for the structure of the molecule. A widespread observation is that one has to keep the number of descriptors as low as 20% of the number of objects in the dataset. However, this is correct only in the case of ordinary Multilinear Regression Analysis. Some more advanced methods, such as Projection of Latent Structures (or Partial Least Squares, PLS), use so-called latent variables to achieve both modeling and predictions. [Pg.205]
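The 20% rule of thumb quoted above can be turned into a quick check. The following is a minimal sketch in Python; the function name `max_descriptors` and its `ratio` parameter are illustrative assumptions, not part of the quoted source.

```python
import math


def max_descriptors(n_objects: int, ratio: float = 0.2) -> int:
    """Largest descriptor count allowed by the ratio rule of thumb.

    The default ratio of 0.2 mirrors the 20% heuristic in the text; it is
    a guideline for ordinary multilinear regression, not a hard limit.
    """
    if n_objects < 0:
        raise ValueError("n_objects must be non-negative")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must lie in (0, 1]")
    return math.floor(n_objects * ratio)


# A data set of 100 objects would allow at most 20 descriptors.
print(max_descriptors(100))
```

Under this heuristic, a 26-sample data set with three independent variables, as in one of the excerpts on this page, stays within the limit of five descriptors.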

Furthermore, QSPR models for the prediction of free-energy based properties that are based on multilinear regression analysis are often referred to as LFER models, especially in the wide field of quantitative structure-activity relationships (QSAR). [Pg.489]

Multiple linear regression analysis is a widely used method, in this case assuming that a linear relationship exists between solubility and the 18 input variables. The multilinear regression analysis was performed by the SPSS program [30]. The training set was used to build a model, and the test set was used for the prediction of solubility. The MLRA model provided, for the training set, a correlation coefficient r = 0.92 and a standard deviation s = 0.78, and for the test set, r = 0.94 and s = 0.68. [Pg.500]
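The train/test workflow described above can be sketched in plain Python. This is an illustration under stated assumptions, not the SPSS procedure used in the source: the two-descriptor data are made up, the fit uses the normal equations, and the way `s` is computed (residual sum of squares divided by n minus the number of fitted parameters for the training set, and by n for the test set) is one common convention that the excerpt does not specify.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(X, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def residual_std(y, y_hat, n_params=0):
    """Square root of the residual sum of squares over (n - n_params)."""
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return math.sqrt(rss / (len(y) - n_params))


# Made-up data: roughly y = 1 + 2*x1 - 0.5*x2 with small perturbations.
train_X = [(0, 1), (1, 0), (2, 3), (3, 1), (4, 4), (5, 2)]
train_y = [0.6, 2.9, 3.55, 6.45, 7.1, 9.9]
test_X = [(1, 2), (3, 3), (6, 1)]
test_y = [2.1, 5.4, 12.6]

coef = fit_mlr(train_X, train_y)          # model built on the training set only
train_pred = predict(coef, train_X)
test_pred = predict(coef, test_X)         # test set used for prediction only

r_train = pearson_r(train_y, train_pred)
s_train = residual_std(train_y, train_pred, n_params=len(coef))
r_test = pearson_r(test_y, test_pred)
s_test = residual_std(test_y, test_pred)

print("coefficients:", coef)
print("training: r =", r_train, "s =", s_train)
print("test:     r =", r_test, "s =", s_test)
```

Keeping the test set out of the fit is what makes the test-set r and s an estimate of predictive rather than descriptive quality.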

The models are applicable to large data sets with a rapid calculation speed, and a wide range of compounds can be processed. Neural networks provided better models than multilinear regression analysis. [Pg.504]

More than just a few parameters have to be considered when modelling chemical reactivity in a broader perspective than for the well-defined but restricted reaction sets of the preceding section. Here, however, not enough statistically well-balanced, quantitative, experimental data are available to allow multilinear regression analysis (MLRA). An additional complicating factor derives from comparison of various reactions, where data of quite different types are encountered. For example, how can product distributions for electrophilic aromatic substitutions be compared with acidity constants of aliphatic carboxylic acids? And on the side of the parameters, how can the influence on chemical reactivity of both bond dissociation energies and bond polarities be simultaneously handled when only limited data are available? ... [Pg.60]

The surface tensions themselves in the GB/SA and MST-ST models were developed by taking collections of experimental data for the free energy of solvation in a specific solvent, removing the electrostatic component as calculated by the GB or MST model, and fitting the surface tensions to best reproduce the residual free energy given the known SASA of the solute atoms. Such a multilinear regression procedure requires a reasonably sized collection of data to be statistically robust, and limitations in data have thus restricted these models to water, carbon tetrachloride, chloroform, and octanol as solvents. [Pg.409]
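The fitting step described above can be written as a least-squares problem. The notation below is an assumption introduced for illustration (it is not taken from the source): i indexes solutes, k indexes atom types, A_{ik} is the solvent-accessible surface area of atom type k in solute i, and \sigma_k is the corresponding surface tension.

$$
\Delta G_{\mathrm{res},i} = \Delta G_{\mathrm{exp},i} - \Delta G_{\mathrm{elec},i},
\qquad
\min_{\sigma}\; \sum_i \Bigl( \Delta G_{\mathrm{res},i} - \sum_k \sigma_k A_{ik} \Bigr)^2
$$

Because the residual is linear in the \sigma_k, this is an ordinary multilinear regression with the surface areas as independent variables, which is why the number of available solvation free energies per solvent limits how many surface tensions can be fitted reliably.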

Algebraic expressions for terms M and C were derived using Dewar's PMO method (for C in a version similar to the ω-technique [57] in order to calculate carbocation stabilization energies). The size factor S is simply a cubic function of the number of carbon atoms [97]. The three independent variables of the model were assumed to be linearly related to the experimental Iball indices (vide supra). By multilinear regression analysis (sample size = 26) an equation was derived for calculating Iball indices from the three theoretical parameters. The correlation coefficient for the linear relation between calculated and experimental Iball indices is r = 0.961. [Pg.120]

Multilinear regression can be used where the investigated endpoint is correlated to a linear combination of independent variables (the descriptors). This technique assumes linearity over the whole data set with respect to the descriptors. In addition, normality of the data must be fulfilled, and the descriptors cannot be intercorrelated. Multilinear regression is widely used in (Q)SAR modeling and has the advantage that all numerical information is retained and the predicted endpoint may be better estimated. However, the model may eventually overfit the data, after which the addition of further descriptors causes a decrease in accuracy of the model; however, this will typically be disclosed in the calibration step of development. [Pg.82]
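The requirement that descriptors not be intercorrelated can be screened before fitting. The sketch below computes pairwise Pearson correlations and flags strongly correlated pairs; the descriptor names, the values, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def correlated_pairs(descriptors, threshold=0.9):
    """Return (name_a, name_b, r) for descriptor pairs with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(descriptors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical descriptor table: one list of values per descriptor.
descriptors = {
    "mol_weight": [100.0, 150.0, 200.0, 250.0, 300.0],
    "volume": [90.0, 140.0, 185.0, 240.0, 280.0],  # nearly proportional to mol_weight
    "dipole": [1.2, 0.3, 2.5, 0.9, 1.7],
}

flagged = correlated_pairs(descriptors)
for name_a, name_b, r in flagged:
    print(name_a, name_b, round(r, 3))
```

A flagged pair suggests dropping one of the two descriptors or switching to a latent-variable method; pairwise screening does not detect correlations that involve three or more descriptors at once.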

Empirical mathematical models currently encountered are multilinear regression equations of the type... [Pg.262]

Table 14 shows results obtained for every formula development according to MODDE 4.0 software. The collected experimental data were fitted by a multilinear regression (MLR) model with which several responses can be dealt with simultaneously to provide an overview of how all the factors affect all the responses. The R2 and Q2 values of the model were over 0.99 and 0.93 for tm% and 0.98 and 0.89 for /30%, respectively, implying that the data fitted well with the model. Here, R2 is the fraction of the variation of the response that can be modeled and Q2 is the fraction of the variation of the response that can be predicted by the model. The relationship between a response y and the variables x_i, x_j, ... can be described by the polynomial ... [Pg.1009]
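The verbal definitions of R2 and Q2 above correspond to the following commonly used expressions. The symbols are assumptions introduced here: y_i is an observed response, \hat{y}_i the fitted value, \bar{y} the mean response, and \hat{y}_{(i)} the value predicted for observation i by a model fitted without that observation (cross-validation).

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
Q^2 = 1 - \frac{\mathrm{PRESS}}{\sum_i (y_i - \bar{y})^2},
\qquad
\mathrm{PRESS} = \sum_i \bigl(y_i - \hat{y}_{(i)}\bigr)^2
$$

Since each cross-validated prediction is made without the observation it predicts, Q^2 is normally smaller than R^2, as in the values quoted above; a large gap between the two is a common sign of overfitting.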

The effects (coefficients) in the model are estimated, usually by multilinear regression. The values obtained, b_i, are estimates because of the random experimental error (represented by the error term in the equation). The next step is to decide which of the 15 effects calculated are active or important. [Pg.2456]

It is clear that for an unsymmetrical data matrix that contains more variables (the field descriptors at each point of the grid for each probe used for calculation) than observables (the biological activity values), classical correlation analysis such as multilinear regression analysis would fail. All 3D QSAR methods benefit from the development of PLS analysis, a statistical technique that aims to find the multidimensional direction in the X space that explains the maximum multidimensional variance direction in the Y space. PLS is related to principal component analysis (PCA). However, instead of finding the hyperplanes of maximum variance, it finds a linear model describing some predicted variables in terms of other observable variables and therefore can be used directly for prediction. Complexity reduction and data... [Pg.592]
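Why ordinary multilinear regression fails when there are more variables than observations can be seen in a tiny made-up example: with two observations and three descriptors, different coefficient vectors reproduce the data exactly, so the least-squares problem has no unique solution. The matrix and coefficients below are invented for illustration.

```python
def predict(X, b):
    """Linear model without intercept: one dot product per observation."""
    return [sum(x * c for x, c in zip(row, b)) for row in X]


# Two observations (rows), three descriptors (columns).
X = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
y = [4.0, 10.0]

# Two different coefficient vectors; they differ by (1, -2, 1),
# which X maps to zero for both observations.
b_first = [1.0, 0.0, 1.0]
b_second = [2.0, -2.0, 2.0]

print(predict(X, b_first))   # reproduces y exactly
print(predict(X, b_second))  # also reproduces y exactly
```

Both vectors fit the observations perfectly yet would give different predictions for a new compound, which is the practical reason for projecting onto a small number of latent variables as PLS does.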

Kahn, I., Sild, S. and Maran, U. (2007) Modeling the toxicity of chemicals to Tetrahymena pyriformis using heuristic multilinear regression and heuristic back-propagation neural networks. [Pg.1082]

Depending on what kind of information is needed, different possibilities exist for using a multilinear partial least squares regression model on new data. If only the prediction of y is wanted, it is possible to obtain a set of regression coefficients that relates X directly to y. This has been shown in detail by Smilde and de Jong for the multilinear PLS1 model [de Jong 1998, Smilde 1997] as follows. The first score vector t1 is... [Pg.127]

Next to ADME phenomena, recent data mining studies also focused on the development or improvement of models predicting physicochemical properties relevant to the field of ADME. Examples are Henry's law constant [92], polar surface area [93], and log P [94]. These models try to overcome limitations of already existing models, see for example SlogP [94] vs. ClogP [95], or aqueous solubility [96]. The latter study used more than 2000 compounds selected from the AQUASOL [97] and PHYSOPROP [98] databases. Comparison with a multilinear regression showed clear preference for the neural network. [Pg.691]

The complete quadratic model determined by multilinear regression for the mean particle size is ... [Pg.240]

In determining a mathematical model, whether by linear combinations or by multilinear regression, we have assumed the standard deviation of random experimental error to be (approximately) constant (homoscedastic) over the experimental region. Mathematical models were fitted to the data and their statistical significance or that of their coefficients was calculated on the basis of this constant experimental variance. Now the standard deviation is often approximately constant. All experiments may then be assumed equally reliable and so their usefulness depends solely on their positions within the domain. [Pg.312]

Another graphical method is to calculate the difference between the results at each speed, Δy, for each experiment of the inner array. Δy is treated as a new response, and analysed by multilinear regression, according to the second-order model ... [Pg.329]

The experimental results are also listed in table 9.14. The data were analysed according to the two models, equations 9.9 and 9.10. Since some measurements of solubility were in duplicate, we can estimate the reproducibility of the experimental technique. It is possible to estimate the model with the data for the duplicated points 1-6, and then validate it with the test points 7-12. Instead we estimate it by multilinear regression over all the data and then test by analysis of variance. [Pg.412]

Big Chemical Encyclopedia
