
Forward selection

Forward selection operates using only the F-to-Enter value, bringing into the equation only those x variables whose partial F values (Fc) exceed the F-to-Enter value. It begins with b0 in the model, then sequentially adds variables. In the example, we use F-to-Enter = 4.0 and set F-to-Remove = 0; that is, we bring an x variable into the model only if it contributes at least 4.0, using the F table. Table 10.4 presents the forward selection data. [Pg.416]
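
A minimal sketch of this loop (the helper names and NumPy implementation are illustrative, not from the source), assuming a design matrix X with n rows and k candidate variables and a response y. The partial F statistic Fc compares the residual sum of squares before and after adding the best candidate:

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on the columns in
    `cols`, plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def forward_select(X, y, f_to_enter=4.0):
    """Start from the intercept-only model (b0) and add one variable per
    pass; stop when the best candidate's partial F falls below F-to-Enter
    (4.0 in the example).  F-to-Remove = 0, so nothing is ever dropped."""
    n, k = X.shape
    selected, rss_cur = [], rss(X, y, [])
    while len(selected) < k:
        # candidate that most reduces the RSS
        j = min((j for j in range(k) if j not in selected),
                key=lambda j: rss(X, y, selected + [j]))
        rss_new = rss(X, y, selected + [j])
        # partial F for the added variable (1 numerator df);
        # assumes n comfortably exceeds the model size
        df = n - (len(selected) + 1) - 1
        f_c = (rss_cur - rss_new) / (rss_new / df)
        if f_c < f_to_enter:
            break
        selected.append(j)
        rss_cur = rss_new
    return selected
```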

Note that the results are exactly the same as those from the stepwise regression (Table 10.3). These values are again reflected in Table 10.5, the Original Data and Predicted and Error Values, Reduced Model table for Example 10.1. [Pg.416-417]


Whitley DC, Ford MG, Livingstone DJ. Unsupervised forward selection: a method for eliminating redundant variables. J Chem Inf Comput Sci 2000;40:1160-8. [Pg.489]

Zhang, Z.; Wang, D.; Harrington, P. de B.; Voorhees, K. J.; Rees, J. Forward selection radial basis function networks applied to bacterial classification based on MALDI-TOF-MS. Talanta 2004, 63, 527-532. [Pg.159]

Figure 6. Canonical correspondence analysis for surface sediments of 41 lakes in British Columbia, Canada, that encompass a broad range of trophic states. Circles represent lakes and triangles represent the 25 most abundant diatom taxa. Arrows indicate environmental variables that correlate most strongly with the distribution of diatom taxa and lake-water chemistry, as detected by forward selection. Maximum depth (Zmax) and total phosphorus (TP) were transformed by using the ln(x + 1) function. This analysis is discussed in detail in reference 46.
In forward selection, the first variable (wavelength) selected is the variable x_j that minimizes the residual sum of squares, RSS, according to... [Pg.136]
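
The excerpt breaks off before the formula; a standard statement of the criterion (a reconstruction, not the source's own equation) is

$$\mathrm{RSS}_j = \sum_{i=1}^{n}\bigl(y_i - \hat b_0 - \hat b_j x_{ij}\bigr)^2, \qquad j^{*} = \arg\min_{j}\ \mathrm{RSS}_j,$$

i.e., the first wavelength chosen is the one whose univariate fit leaves the smallest residual sum of squares (equivalently, the one most correlated with y).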

There are two important problems with the simple forward-selection procedure described above. [Pg.136]

There is no guarantee that forward selection will find the best-fitting subsets of any size except for m = 1 and the full variable set. [Pg.136]

In some cases, the first variable deleted in backward elimination is the first one inserted in forward selection. [Pg.137]

Both forward selection and backward elimination can fare arbitrarily poorly in finding the best-fitting subsets. [Pg.137]
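
A small synthetic illustration of the first point above (constructed here, not taken from the source): x3 is a noisy proxy for x1 + x2, so it wins the first forward step on its own, yet the best-fitting pair is {x1, x2}, which forward selection can no longer reach.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n = 200
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
x3 = x1 + x2 + 0.5 * rng.standard_normal(n)   # noisy proxy for x1 + x2
X = np.column_stack([x1, x2, x3])
y = x1 + x2                                    # the true signal

def rss(cols):
    A = np.column_stack([np.ones(n), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

# Forward selection grabs x3 first, because alone it tracks y best ...
first = min(range(3), key=lambda j: rss([j]))
second = min((j for j in range(3) if j != first),
             key=lambda j: rss([first, j]))
print("forward 2-subset:", sorted([first, second]), "RSS:", rss([first, second]))

# ... but the best-fitting pair is {x1, x2}, which fits y exactly.
best = min(combinations(range(3), 2), key=rss)
print("best 2-subset:   ", sorted(best), "RSS:", rss(best))
```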

The sequential-replacement algorithm can be obtained by taking the forward-selection algorithm and applying a replacement procedure after each new variable is added. [Pg.138]
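
A sketch of that replacement pass (names assumed, not the source's): after each forward addition, every included variable is tentatively swapped with every excluded one, and any swap that lowers the RSS is kept.

```python
import numpy as np

def _rss(X, y, cols):
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def replacement_sweep(selected, X, y):
    """Try swapping each chosen variable for each excluded one; keep any
    swap that lowers the RSS, and repeat until no swap helps."""
    k = X.shape[1]
    improved = True
    while improved:
        improved = False
        for i in range(len(selected)):
            for j in range(k):
                if j in selected:
                    continue
                trial = selected[:i] + [j] + selected[i + 1:]
                if _rss(X, y, trial) < _rss(X, y, selected):
                    selected, improved = trial, True
    return selected
```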

Figure 11. Performance of competing criteria: the number of descriptors in the model, for various criteria, versus the root mean squared prediction error (RMSEP) in forward selection. (Reproduced with permission from the author.)
Figures 11 and 12 illustrate the performance of the pR2 compared with several of the currently popular criteria on a specific data set resulting from one of the drug hunting projects at Eli Lilly. This data set has IC50 values for 1289 molecules. There were 2317 descriptors (or covariates), and a multiple linear regression model was used with forward variable selection; the linear model was trained on half the data (selected at random) and evaluated on the other (hold-out) half. The root mean squared error of prediction (RMSE) for the test hold-out set is minimized when the model has 21 parameters. Figure 11 shows the model size chosen by several criteria applied to the training set in a forward selection: for example, the pR2 chose 22 descriptors, the Bayesian Information Criterion chose 49, Leave-One-Out cross-validation chose 308, the adjusted R2 chose 435, and the Akaike Information Criterion chose 512 descriptors in the model. Although the pR2 criterion selected considerably fewer descriptors than the other methods, it had the best prediction performance. Also, only pR2 and BIC had better prediction on the test data set than the null model.
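
The pR2 itself is specific to the cited work and is not reproduced here; the sketch below only shows, under common textbook forms, how AIC, BIC, and the adjusted R2 would each pick a model size from the RSS sequence produced by a forward-selection path (rss_path[p] = training RSS with p descriptors plus an intercept).

```python
import numpy as np

def pick_model_size(rss_path, n):
    """Score each model size on the forward path with AIC, BIC, and the
    adjusted R^2 (Gaussian-likelihood forms), and return each criterion's
    chosen number of descriptors."""
    rss_path = np.asarray(rss_path, dtype=float)
    p = np.arange(len(rss_path))                # descriptors at each step
    aic = n * np.log(rss_path / n) + 2 * (p + 1)
    bic = n * np.log(rss_path / n) + np.log(n) * (p + 1)
    tss = rss_path[0]                           # intercept-only RSS
    adj_r2 = 1 - (rss_path / (n - p - 1)) / (tss / (n - 1))
    return {"AIC": int(np.argmin(aic)),
            "BIC": int(np.argmin(bic)),
            "adjusted R2": int(np.argmax(adj_r2))}
```
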
Lin (1993) suggested using stepwise variable selection and Wu (1993) suggested forward selection or all (estimable) subsets selection. Lin (1993) gave an illustrative analysis by stepwise selection of the data in Table 6. He found that this identified factors 15, 12, 19, 4, and 10 as the active factors, when their main effects are entered into the model in this order. Wang (1995) analyzed the other half of the Williams experiment and identified only one of the five factors that Lin had identified as being nonnegligible, namely, factor 4. [Pg.181]

Abraham et al. (1999) studied forward selection and all subsets selection in detail. They showed, by simulating data from several different experiments, that the factors identified as active could change completely if a different fraction was used and that neither of these methods could reliably find three factors which have large effects. However, they concluded that all subsets selection is better than forward selection. Kelly and Voelkel (2000) showed more generally that the probabilities of type-II errors from stepwise regression are high. [Pg.181]

Alternatively, a form of forward selection could be used to try to eliminate effects that look large only because they are highly correlated with a dominant... [Pg.185]

Leave-one-out error rates for sequential forward selection (276 models) [Pg.264]

The initial selection of variables can be further reduced automatically using a selection algorithm (often backward elimination or forward selection). Such an automated procedure sounds as though it should produce the optimal choice of predictive variables, but in practice it is often necessary to use clinical knowledge to override the statistical process, either to ensure inclusion of a variable that is known from previous studies to be highly predictive or to eliminate variables that might lead to overfitting (i.e. overestimation of the predictive value of the model by inclusion of variables that appear to be predictive in the derivation cohort, probably by chance, but are unlikely to be predictive in other cohorts). [Pg.187]

Forward selection (FS-SWR) is a technique that starts with no variables in the model and adds one variable at a time until either all variables have been entered or a stopping criterion is satisfied. [Pg.468]

The most popular stepwise technique combines the two previous approaches (FS and BE) and is called Elimination-Selection (ES-SWR) [Efroymson, 1960]. It is basically forward selection, but at each step (when the number of model variables is greater than two) the possibility of deleting a variable, as in the BE approach, is considered. [Pg.468]
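
A sketch of the ES-SWR idea (illustrative names and thresholds; the partial-F bookkeeping follows the usual textbook form rather than any one source): a forward step admits the best candidate if its partial F meets F-to-Enter, and a backward check then drops any included variable whose partial F has fallen below F-to-Remove.

```python
import numpy as np

def _rss(X, y, cols):
    """RSS of an OLS fit of y on the columns in `cols`, plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def _partial_f(rss_small, rss_big, n, p_big):
    """Partial F for one variable; p_big = descriptors in the larger model."""
    return (rss_small - rss_big) / (rss_big / (n - p_big - 1))

def stepwise_es_swr(X, y, f_enter=4.0, f_remove=3.9):
    # f_remove must not exceed f_enter, or the loop can cycle.
    n, k = X.shape
    sel = []
    while True:
        cand = [j for j in range(k) if j not in sel]
        if not cand:
            break
        # Forward step: the candidate that most reduces the RSS.
        j_best = min(cand, key=lambda j: _rss(X, y, sel + [j]))
        f_in = _partial_f(_rss(X, y, sel), _rss(X, y, sel + [j_best]),
                          n, len(sel) + 1)
        if f_in < f_enter:
            break
        sel.append(j_best)
        # Backward check, as in the BE approach, once the model
        # holds more than two variables.
        while len(sel) > 2:
            f_out = {v: _partial_f(_rss(X, y, [u for u in sel if u != v]),
                                   _rss(X, y, sel), n, len(sel))
                     for v in sel}
            weakest = min(f_out, key=f_out.get)
            if f_out[weakest] >= f_remove:
                break
            sel.remove(weakest)
    return sel
```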

Figure 3. The effect of uncertainty in combinations of indicator values with sequential forward selection.
The forward selection technique starts with an empty equation, possibly containing a constant term only, with no independent variables. As the procedure progresses, variates are added to the test equation one at a time. The first variable included is that which has the highest correlation with the dependent variable y. The second variable added to the equation is the one with the highest... [Pg.182]
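
The excerpt is cut off mid-sentence; on the standard account the second variable entered is the one with the highest partial correlation with y given the first. A minimal sketch of those first two picks (illustrative names, assuming numeric X and y; not taken from the source):

```python
import numpy as np

def _residualize(v, Z):
    """Residuals of v after OLS on Z (with intercept)."""
    A = np.column_stack([np.ones(len(v)), Z])
    beta, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ beta

def first_two_entries(X, y):
    # First entry: the variable most correlated with y.
    corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    first = int(np.argmax(corr))
    # Second entry: highest partial correlation with y given the first,
    # i.e. correlate the parts of y and x_j not explained by the first.
    ry = _residualize(y, X[:, [first]])
    pcorr = [abs(np.corrcoef(_residualize(X[:, j], X[:, [first]]), ry)[0, 1])
             if j != first else -1.0
             for j in range(X.shape[1])]
    return first, int(np.argmax(pcorr))
```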

Finally, stepwise regression, a modified version of the forward selection technique, is often available with commercial programs. As with forward... [Pg.186]

