
Multiple regression, forward selection

How do we set about variable selection? One obvious approach is to examine the pair-wise correlations between the response and the physicochemical descriptors. One form of model building, forward-stepping multiple regression, begins by choosing the descriptor that has the highest correlation with the response variable. If the response is a categorical variable such as toxic/non-toxic,... [Pg.167]
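To make that first step concrete, here is a minimal sketch in Python, assuming the descriptors are held in a NumPy array X (one column per descriptor) and the response in a vector y; the function name and variables are illustrative, not from the source.

```python
import numpy as np

def first_forward_step(X: np.ndarray, y: np.ndarray) -> int:
    """Return the index of the descriptor with the highest absolute
    Pearson correlation with the response y (the forward-selection seed)."""
    corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    # A constant descriptor column yields NaN correlation; nanargmax skips those.
    return int(np.nanargmax(np.abs(corrs)))
```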

Figures 11 and 12 illustrate the performance of the pR² compared with several of the currently popular criteria on a specific data set resulting from one of the drug-hunting projects at Eli Lilly. This data set has IC50 values for 1289 molecules. There were 2317 descriptors (or covariates), and a multiple linear regression model was used with forward variable selection; the linear model was trained on half the data (selected at random) and evaluated on the other (hold-out) half. The root mean squared error of prediction (RMSE) for the hold-out test set is minimized when the model has 21 parameters. Figure 11 shows the model size chosen by several criteria applied to the training set in a forward selection: for example, the pR² chose 22 descriptors, the Bayesian Information Criterion (BIC) chose 49, leave-one-out cross-validation chose 308, the adjusted R² chose 435, and the Akaike Information Criterion (AIC) chose 512 descriptors. Although the pR² criterion selected considerably fewer descriptors than the other methods, it had the best prediction performance. Also, only the pR² and BIC had better prediction on the test data set than the null model.
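For illustration, the sketch below implements forward selection driven by one of the criteria named above, the BIC; the pR² itself is not implemented here, and all names (rss, bic, forward_select_bic) are illustrative rather than the source's code. It assumes NumPy and an ordinary least-squares fit at each step.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def bic(rss_val, n, k):
    """Gaussian BIC, up to an additive constant, for k fitted parameters."""
    return n * np.log(rss_val / n) + k * np.log(n)

def forward_select_bic(X_train, y_train, max_terms=50):
    """Greedy forward selection on the training half: at each step add the
    descriptor that most reduces the BIC; stop when nothing improves it."""
    n, p = X_train.shape
    chosen, design = [], np.ones((n, 1))   # start from an intercept-only model
    best = bic(rss(design, y_train), n, 1)
    while len(chosen) < max_terms:
        candidates = [j for j in range(p) if j not in chosen]
        if not candidates:
            break
        scores = [bic(rss(np.hstack([design, X_train[:, [j]]]), y_train),
                      n, design.shape[1] + 1) for j in candidates]
        if min(scores) >= best:
            break                          # no candidate improves the criterion
        best = min(scores)
        j_best = candidates[int(np.argmin(scores))]
        chosen.append(j_best)
        design = np.hstack([design, X_train[:, [j_best]]])
    return chosen
```

Swapping the k·log(n) penalty in bic() for 2k gives the AIC; its much lighter penalty for added parameters is consistent with the far larger model (512 descriptors) that the AIC selected above.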
Stepwise multiple linear regression (SMLR). This is a modified form of forward selection. The model starts out with only one variable, and more variables are subsequently added, but at each stage a backward-elimination (BE) style test is also applied: if a variable that was added earlier becomes less important as a result of subsequent additions, SMLR allows its removal from the model. [Pg.341]
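Here is a hedged sketch of that removal test, reusing the rss() and bic() helpers from the previous example; again, the names are illustrative and not the source's method.

```python
def stepwise_removal_check(X, y, chosen):
    """After a forward addition, drop any already-selected variable whose
    removal lowers the BIC; repeat until the model is stable.
    Reuses rss() and bic() from the previous sketch."""
    n = X.shape[0]
    changed = True
    while changed and chosen:
        changed = False
        design = np.hstack([np.ones((n, 1)), X[:, chosen]])
        current = bic(rss(design, y), n, design.shape[1])
        for j in list(chosen):
            kept = [c for c in chosen if c != j]
            reduced = np.hstack([np.ones((n, 1)), X[:, kept]])
            if bic(rss(reduced, y), n, reduced.shape[1]) < current:
                chosen.remove(j)   # j became redundant after later additions
                changed = True
                break
    return chosen
```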

Backward elimination is a variable-selection algorithm for multiple linear regression: it starts with all variables in the model and eliminates all nonsignificant variables; see also forward selection. [Pg.164]
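A minimal sketch of backward elimination, assuming statsmodels is available and using t-test p-values as the significance measure; the threshold alpha = 0.05 and all names are illustrative.

```python
import numpy as np
import statsmodels.api as sm

def backward_eliminate(X: np.ndarray, y: np.ndarray, alpha: float = 0.05):
    """Backward elimination: start with all descriptors, repeatedly refit
    and drop the least significant one until every remaining term's
    t-test p-value is at or below alpha."""
    cols = list(range(X.shape[1]))
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = np.asarray(fit.pvalues)[1:]   # skip the intercept term
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break                             # all remaining terms significant
        del cols[worst]                       # remove the least significant
    return cols
```

Note that fitting the initial full model requires more observations than variables, so backward elimination is not directly applicable to a data set like the 2317-descriptor, 1289-molecule example above without prior dimension reduction.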

