Big Chemical Encyclopedia


Multiple linear regression model prediction

Two models of practical interest using quantum chemical parameters were developed by Clark et al. [26, 27]. Both studies were based on 1085 molecules and 36 descriptors calculated with the AM1 method following structure optimization and electron density calculation. An initial set of descriptors was selected with a multiple linear regression model and further optimized by trial-and-error variation. The second study reported a standard error of 0.56 for 1085 compounds; it also estimated the reliability of the neural network prediction by analyzing the standard deviation of the error over an ensemble of 11 networks trained on different randomly selected subsets of the initial training set [27]. [Pg.385]

Figures 11 and 12 illustrate the performance of the pR2 compared with several currently popular criteria on a specific data set resulting from one of the drug-hunting projects at Eli Lilly. This data set has IC50 values for 1289 molecules. There were 2317 descriptors (or covariates), and a multiple linear regression model was used with forward variable selection: the linear model was trained on half the data (selected at random) and evaluated on the other (hold-out) half. The root mean squared error of prediction (RMSE) for the hold-out test set is minimized when the model has 21 parameters.
Figure 11 shows the model size chosen by several criteria applied to the training set in a forward selection: for example, the pR2 chose 22 descriptors, the Bayesian Information Criterion (BIC) chose 49, leave-one-out cross-validation chose 308, the adjusted R2 chose 435, and the Akaike Information Criterion chose 512 descriptors. Although the pR2 criterion selected considerably fewer descriptors than the other methods, it had the best prediction performance. Also, only the pR2 and BIC had better prediction on the test data set than the null model.
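The forward-selection procedure the criteria above are applied to can be sketched in a few lines. This is a generic illustration with invented toy data and function names, not the Lilly study itself; a greedy loop adds one descriptor at a time and stops when the criterion (here BIC) no longer improves:

```python
import numpy as np

def bic(rss, n, k):
    """Bayesian Information Criterion for a Gaussian linear model:
    n observations, k fitted coefficients, residual sum of squares rss."""
    return n * np.log(rss / n) + k * np.log(n)

def forward_select(X, y):
    """Greedy forward selection: at each step add the descriptor that
    most reduces BIC; stop when no remaining descriptor helps."""
    n, m = X.shape
    chosen, best = [], np.inf
    while True:
        step_j, step_bic = None, best
        for j in (j for j in range(m) if j not in chosen):
            A = np.column_stack([np.ones(n), X[:, chosen + [j]]])
            coef = np.linalg.lstsq(A, y, rcond=None)[0]
            rss = float(np.sum((y - A @ coef) ** 2))
            b = bic(rss, n, A.shape[1])
            if b < step_bic:
                step_j, step_bic = j, b
        if step_j is None:
            return chosen
        chosen.append(step_j)
        best = step_bic

# toy data: only descriptors 0 and 3 carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=200)
selected = forward_select(X, y)
print(selected)  # the two informative descriptors are picked first
```

Swapping `bic` for the AIC penalty (2k instead of k·log n) makes the stopping rule far more permissive, which is why AIC selected the largest models above.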
The multiple linear regression models are validated using standard statistical techniques. These techniques include inspection of residual plots, the standard deviation, and the multiple correlation coefficient. Both regression and computational neural network models are validated using external prediction. The prediction set is not used for descriptor selection, descriptor reduction, or model development, and it therefore represents a true unknown data set. In order to ascertain the predictive power of a model, the rms error is computed for the prediction set. [Pg.113]
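The external-prediction check described above can be sketched as follows (the toy data and the `rms_error` helper are illustrative assumptions; the key point is that the coefficients are fitted on the training half only and scored on the held-out half):

```python
import numpy as np

def rms_error(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# toy data: fit on a training set, score only on the held-out prediction set
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.2 * rng.normal(size=120)
train, test = slice(0, 80), slice(80, None)

A_train = np.column_stack([np.ones(80), X[train]])
coef = np.linalg.lstsq(A_train, y[train], rcond=None)[0]

A_test = np.column_stack([np.ones(40), X[test]])
rmse = rms_error(y[test], A_test @ coef)
print(rmse)  # close to the simulated noise level of 0.2
```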

Hence, R2 in the multiple linear regression model that predicts y from two independent predictor variables, x1 and x2, explains 75.3% (or, when adjusted, 73.9%) of the variability in the model. The other 1 - 0.753 = 0.247 is unexplained error. In addition, note that a fit of R2 = 50% would imply that the prediction of y based on x1 and x2 is no better than the mean, ȳ. [Pg.206]
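The arithmetic behind these figures can be checked directly. Note the sample size is not given in the source; n = 38 is an assumption used here because it reproduces the quoted adjusted value:

```python
import numpy as np

def r_squared(y, y_hat):
    """Fraction of the variance in y explained by the fit."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for the number of predictors p, given n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# R^2 = 0.753 with p = 2 predictors; n = 38 is an assumed sample size
# chosen to reproduce the adjusted value quoted in the text
print(round(adjusted_r_squared(0.753, 38, 2), 3))  # → 0.739
```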

Comparison of Immunoassay with GC/MS Analysis. The relation between atrazine concentration determined by GC/MS analysis and triazine concentration determined from immunoassay analysis of 127 samples is shown in figure 3. Samples with immunoassay results larger than 5 µg/L are not plotted. Although the two determinations are highly correlated (rank correlation coefficient 0.90, p < 0.0001), the relation is not linear over the 0.2 µg/L to 5 µg/L range of the immunoassay results. Linear and multiple-linear regression models were fitted to the data to enable prediction of atrazine concentrations from the immunoassay data. [Pg.95]

Zhu et al. [37] have compared the prediction of cotton yarn irregularity by a neural network model and a multiple-linear regression model. It was observed that the neural network model gave better prediction results than the regression model. [Pg.130]

Lesch SM, Strauss DJ, Rhoades JD (1995) Spatial prediction of soil salinity using electromagnetic induction techniques 2. An efficient spatial sampling algorithm suitable for multiple linear regression model identification and estimation. Water Resour Res 31:387-398 [Pg.57]

Multiple linear regression analysis is a widely used method, in this case assuming that a linear relationship exists between solubility and the 18 input variables. The multilinear regression analysis was performed with the SPSS program [30]. The training set was used to build a model, and the test set was used for the prediction of solubility. The MLRA model provided, for the training set, a correlation coefficient r = 0.92 and a standard deviation of s = 0.78, and for the test set, r = 0.94 and s = 0.68. [Pg.500]

FIGURE 4.24 PLS as a multiple linear regression method for prediction of a property y from variables x1, ..., xm, applying regression coefficients b1, ..., bm (mean-centered data). From a calibration set, the PLS model is created and applied to the calibration data and to test data. [Pg.165]
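A minimal single-response PLS (PLS1, via the NIPALS algorithm) can be sketched in a few lines; this is a generic textbook sketch on invented toy data, not the calibration of Figure 4.24:

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Minimal PLS1 (NIPALS). Returns coefficients b and the means,
    so that y_hat = (X - x_mean) @ b + y_mean (mean-centered data)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)       # weight vector
        t = Xc @ w                      # score vector
        tt = float(t @ t)
        p = Xc.T @ t / tt               # x-loading
        qk = float(yc @ t) / tt         # y-loading
        Xc = Xc - np.outer(t, p)        # deflate X
        yc = yc - qk * t                # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P = np.column_stack(W), np.column_stack(P)
    b = W @ np.linalg.solve(P.T @ W, np.array(q))
    return b, x_mean, y_mean

def pls1_predict(X, b, x_mean, y_mean):
    return (np.asarray(X, float) - x_mean) @ b + y_mean

# toy calibration set; with all 3 components PLS1 reproduces the OLS fit
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.5, -1.0, 0.5]) + 2.0
b, xm, ym = pls1_fit(X, y, 3)
print(np.allclose(pls1_predict(X, b, xm, ym), y))  # True
```

In practice the number of components is chosen by cross-validation on the calibration set, exactly as the figure caption describes.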

Neural networks are a relatively new tool in data modelling in the field of pharmacokinetics [54—56]. Using this approach, non-linear relationships to predicted properties are better taken into account than by multiple linear regression [45]. Human hepatic drug clearance was best predicted from human hepatocyte data, followed by rat hepatocyte data, while in the studied data set animal in vivo data did not significantly contribute to the predictions [56]. [Pg.138]

The multiple linear regression (MLR) method was historically the first and, until now, the most popular method used for building QSPR models. In MLR, a property is represented as a weighted linear combination of descriptor values, Y = XA, where Y is a column vector of the property to be predicted, X is a matrix of descriptor values, and A is a column vector of adjustable coefficients calculated as A = (X^T X)^-1 X^T Y. The latter equation can be applied only if the matrix X^T X can be inverted, which requires linear independence of the descriptors (the "multicollinearity" problem). If this is not the case, special techniques, e.g., singular value decomposition (SVD) [26], should be applied. [Pg.325]
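A sketch of this in numpy, with an SVD-based least-squares fallback for the collinear case (the toy data and the `mlr_fit` name are invented for illustration):

```python
import numpy as np

def mlr_fit(X, Y):
    """MLR coefficients via the normal equations, A = (X^T X)^-1 X^T Y.
    When X^T X is rank-deficient (collinear descriptors), fall back to
    numpy's SVD-based least-squares solver instead of inverting it."""
    XtX = X.T @ X
    if np.linalg.matrix_rank(XtX) == XtX.shape[0]:
        return np.linalg.solve(XtX, X.T @ Y)
    return np.linalg.lstsq(X, Y, rcond=None)[0]

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
A_true = np.array([1.0, -0.5, 2.0, 0.0])
Y = X @ A_true
print(np.allclose(mlr_fit(X, Y), A_true))  # True

# duplicate a column so X^T X is singular; the SVD route still fits Y
X_col = np.column_stack([X, X[:, 0]])
print(np.allclose(X_col @ mlr_fit(X_col, Y), Y))  # True
```

In the rank-deficient case the SVD route returns the minimum-norm coefficient vector, which reproduces the fitted values even though the individual coefficients are no longer uniquely determined.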

For evaluation of the PLS model and for comparison with multiple linear regression the independent parameters were varied in the calibration range and predictions were made. Tab. 8-15 illustrates the comparison of the predicted and the measured values. [Pg.310]



© 2024 chempedia.info