Regression stepwise

Stepwise regression, proposed by Efroymson, is a combination of forward inclusion and backward elimination. After each variable is added (other than the first two), a test is performed to see whether any of the variables entered at an earlier step can be deleted. The procedure applies both Eqs. [32] and [33] in a sequential manner. The stepping stops when no more variables satisfy either the criterion for removal or the criterion for inclusion. To prevent the procedure from cycling unnecessarily, the critical values of F-to-enter and F-to-remove should be such that F-remove < F-enter. [Pg.324]
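The add-then-check loop described above can be sketched in a few lines. This is a minimal illustration, not Efroymson's original algorithm: it assumes ordinary least squares with an intercept, a partial F statistic with one numerator degree of freedom as the entry and removal criterion, and illustrative thresholds (`f_enter=4.0`, `f_remove=3.9`). The helper names `fit_rss`, `partial_f` and `stepwise`, and the synthetic data, are hypothetical.

```python
import random

def fit_rss(columns, y):
    """Residual sum of squares for y regressed on an intercept plus `columns`."""
    n = len(y)
    design = [[1.0] * n] + [list(c) for c in columns]
    m = len(design)
    # Augmented normal equations [X'X | X'y]
    aug = [[sum(u * v for u, v in zip(design[i], design[j])) for j in range(m)]
           + [sum(u * v for u, v in zip(design[i], y))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting (assumes no exact collinearity)
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][m] / aug[i][i] for i in range(m)]
    fitted = [sum(beta[i] * design[i][t] for i in range(m)) for t in range(n)]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))

def partial_f(rss_small, rss_big, df_resid):
    """Partial F for the single variable that separates two nested models."""
    return (rss_small - rss_big) / (rss_big / df_resid)

def stepwise(X, y, f_enter=4.0, f_remove=3.9, max_steps=100):
    """X is a list of candidate columns; returns the sorted selected indices."""
    n, selected = len(y), []
    cols = lambda idx: [X[j] for j in idx]
    for _ in range(max_steps):  # guard against cycling
        changed = False
        # Forward step: add the candidate with the largest partial F, if above f_enter
        candidates = [j for j in range(len(X)) if j not in selected]
        df = n - len(selected) - 2  # residual df once one more variable is in
        if candidates and df > 0:
            rss_now = fit_rss(cols(selected), y)
            scores = {j: partial_f(rss_now, fit_rss(cols(selected + [j]), y), df)
                      for j in candidates}
            best = max(scores, key=scores.get)
            if scores[best] > f_enter:
                selected.append(best)
                changed = True
        # Backward step: skipped while only the first two variables are in
        if len(selected) > 2:
            rss_full = fit_rss(cols(selected), y)
            df = n - len(selected) - 1
            scores = {j: partial_f(fit_rss(cols([i for i in selected if i != j]), y),
                                   rss_full, df) for j in selected}
            worst = min(scores, key=scores.get)
            if scores[worst] < f_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return sorted(selected)

# Synthetic data (an assumption for illustration): y depends on columns 0 and 2 only
random.seed(0)
n_obs = 40
X = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(5)]
y = [3 * X[0][t] - 2 * X[2][t] + random.gauss(0, 0.1) for t in range(n_obs)]
selected_vars = stepwise(X, y)
print(selected_vars)
```

Keeping `f_remove` below `f_enter` means a variable that has just entered cannot be removed in the same pass, which is the anti-cycling condition stated in the text; `max_steps` is only an extra safeguard.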

All three of these variable selection methods are prone to entrapment in local minima, i.e., they find a combination of variables that cannot be improved on in the next step (removal or addition of one variable) for the criterion function. This can be avoided by performing either a Tabu search (TS) or the more computationally expensive all subsets regression. We discuss the second of these two methods in the next section and refer readers to the papers by Glover for details of the TS method. [Pg.324]

n observations are generated randomly from a normal distribution with zero mean and unit variance for k + 1 variables (one y variable and k descriptors). [Pg.325]

p of the k descriptors are selected with one of these stepping methods (or with an all subset approach), and the value of, say, R² is recorded. [Pg.325]

Those R² values are then ordered from smallest to largest. [Pg.325]
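The three steps above amount to a small Monte Carlo experiment on pure-noise data. The sketch below is one possible reading of it, under stated assumptions: the stepping method is taken to be plain forward selection (each step adds the descriptor that lowers the residual sum of squares most), the recorded statistic is R², and the values of n, k, p and the number of repetitions are arbitrary illustrative choices. The function names are hypothetical.

```python
import random

def fit_rss(columns, y):
    """Residual sum of squares for y regressed on an intercept plus `columns`."""
    n = len(y)
    design = [[1.0] * n] + [list(c) for c in columns]
    m = len(design)
    # Augmented normal equations [X'X | X'y]
    aug = [[sum(u * v for u, v in zip(design[i], design[j])) for j in range(m)]
           + [sum(u * v for u, v in zip(design[i], y))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting (assumes no exact collinearity)
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][m] / aug[i][i] for i in range(m)]
    fitted = [sum(beta[i] * design[i][t] for i in range(m)) for t in range(n)]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))

def forward_select_r2(X, y, p):
    """Greedily pick p columns of X (forward selection) and return the final R^2."""
    selected = []
    for _ in range(p):
        rest = [j for j in range(len(X)) if j not in selected]
        best = min(rest, key=lambda j: fit_rss([X[i] for i in selected + [j]], y))
        selected.append(best)
    mean_y = sum(y) / len(y)
    tss = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - fit_rss([X[i] for i in selected], y) / tss

def simulate_chance_r2(n, k, p, reps, seed=0):
    """Repeat: draw N(0, 1) noise for y and k descriptors, select p, record R^2."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]
        X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
        values.append(forward_select_r2(X, y, p))
    return sorted(values)  # ordered smallest to largest

r2_sorted = simulate_chance_r2(n=20, k=10, p=2, reps=200)
print(r2_sorted[0], r2_sorted[-1])
```

Because y is unrelated to every descriptor by construction, the sorted list describes how large R² can become through selection alone; an upper percentile of it could then serve as a reference value, although that use is an assumption here rather than something the excerpt states.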

The first selection procedure we discuss is stepwise regression. We have done this earlier, but not with a software package. Instead, we did a number of partial regression contrasts. Briefly, the F-to-Enter value is set, which can be interpreted as the minimum F_T value for an x_i variable to be accepted into the final equation. That is, each x_i variable must contribute at least that level to be admitted into the equation. Variables are usually entered one at a time, with n − k − 1 df. This would provide an F_T at α = 0.05 of F_T(0.05; 1, 11) = 4.84. The F-to-Enter (sometimes referred to as F_in) is arbitrary. For more than one x_i variable, the test is the partial F test, exactly as we have done earlier. We already know that only x2 would enter this model, because SS_R sequential for x2 = 28.580 (Section C, Table 10.2). [Pg.414]
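The entry rule described above can be written compactly. The notation below is an assumption for illustration, not the source's own equation: SS_R(x_j | ...) is the extra regression sum of squares gained by adding x_j to the variables already in the model, MS_E is the mean square error of the model that includes x_j, and k is the number of x variables in that model.

```latex
F_c = \frac{SS_R(x_j \mid x_1, \dots, x_{j-1})}{MS_E(x_1, \dots, x_j)},
\qquad
\text{enter } x_j \text{ if } F_c > F_T(\alpha;\, 1,\, n - k - 1)
```

With α = 0.05 and 11 residual degrees of freedom, the right-hand side is the tabled value 4.84 quoted in the text.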

Neither x1 nor the other remaining variable would enter the model, because their F values are less than F_T = F(0.05; 1, 11) = 4.84, which is the cut-off value. [Pg.415]

The F-to-Remove command sets an F_T value such that, if a variable's F value is less than the F-to-Remove value, the variable is dropped from the model. The defaults for F-to-Enter and F-to-Remove are F = 4.0 in MiniTab, but can be easily changed. F-to-Remove, also known as F_out, is a value less than or equal to F-to-Enter; that is, F-to-Enter ≥ F-to-Remove. [Pg.415]

Stepwise regression is a very popular regression procedure, because it evaluates both variables going into and variables removed from the regression model. The stepwise regression in Table 10.3, a standard MiniTab output, has both F_in and F_out set at 4.00. Note that only x2 and b0 (the constant) remain in the model after the stepwise procedure. [Pg.415]

Regression Model, Single Independent Variable, Example 10.1 [Pg.416]

A negative correlation was found between PbB and systolic pressure in Belgian men in the Cadmibel study (a cross-sectional population study of the health effects of environmental exposure to cadmium) (Staessen et al. 1991). In this study, blood pressure and urinary cation (positive ions found in the urine, such as sodium, potassium, and calcium) concentration data were obtained from 963 men and 1,019 women; multiple stepwise regression analyses were conducted, adjusting for age, body mass index, pulse... [Pg.55]

Schroeder et al. (1985) and Schroeder and Hawk (1987) evaluated 104 black children of lower socioeconomic status at ages 10 months to 6.5 years, using the Bayley Mental Development Index (MDI) or Stanford-Binet IQ Scale. Hierarchical backward stepwise regression analyses indicated that PbB levels (range 6-59 µg/dL) were a significant source of the variance in IQ and MDI scores after controlling for socioeconomic status and other factors. Fifty of the children were examined again 5 years later, at which time PbB levels were 30 µg/dL. The 5-year follow-up IQ scores were inversely correlated with... [Pg.98]

The literature of the past three decades has witnessed a tremendous explosion in the use of computed descriptors in QSAR. It is noteworthy, however, that this has exacerbated another problem: rank deficiency. This occurs when the number of independent variables is larger than the number of observations. Stepwise regression and other similar approaches, which are popularly used when there is a rank deficiency, often result in overly optimistic and statistically incorrect predictive models. Such models would fail in predicting the properties of future, untested cases similar to those used to develop the model. It is essential that subset selection, if performed, be done within the model validation step rather than outside it, thus providing an honest measure of the predictive ability of the model, i.e., the true q2 [39,40,68,69]. Unfortunately, many published QSAR studies involve subset selection followed by model validation, thus yielding a naive q2, which inflates the predictive ability of the model. The following steps outline the proper sequence of events for descriptor thinning and LOO cross-validation, e.g.,... [Pg.492]

In order to show the inflation of q2, which results from the use of improper statistical methods, we have performed comparative studies involving stepwise regression and RR [68,70]. In these studies, comparative models were developed for the prediction of rat fat:air and human blood:air partitioning of chemicals. For the former, proper statistical methods yielded a model with a q2 value of 0.841, while the stepwise approach was associated with an inflated q2 of 0.934. Likewise, the rat fat:air model derived using proper methods had a q2 value of 0.854, while the stepwise approach yielded a model with an inflated q2 of 0.955. [Pg.492]
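The difference between a naive and an honest q2 can be illustrated with a deliberately simplified sketch. It is not the authors' procedure: instead of stepwise regression or RR, it assumes the crudest possible subset selection (keep the single descriptor with the highest absolute correlation with y) followed by a one-variable linear fit, and it uses pure-noise data with more descriptors than observations. All function names and data settings are hypothetical.

```python
import random

def mean(v):
    return sum(v) / len(v)

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

def pick_descriptor(X, y, rows):
    """Index of the descriptor with the highest |correlation| with y on `rows`."""
    ys = [y[t] for t in rows]
    return max(range(len(X)),
               key=lambda j: abs(pearson([X[j][t] for t in rows], ys)))

def predict_left_out(x, y, rows, i):
    """Fit y = a + b*x on `rows` only, then predict observation i."""
    xs, ys = [x[t] for t in rows], [y[t] for t in rows]
    mx, my = mean(xs), mean(ys)
    slope = (sum((a - mx) * (c - my) for a, c in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return my + slope * (x[i] - mx)

def loo_q2(X, y, select_inside):
    """Leave-one-out q2 = 1 - PRESS / total sum of squares."""
    n = len(y)
    fixed = pick_descriptor(X, y, list(range(n)))  # selection on ALL observations
    press = 0.0
    for i in range(n):
        rows = [t for t in range(n) if t != i]
        # Honest: redo the selection without observation i; naive: reuse `fixed`
        j = pick_descriptor(X, y, rows) if select_inside else fixed
        press += (y[i] - predict_left_out(X[j], y, rows, i)) ** 2
    my = mean(y)
    return 1.0 - press / sum((v - my) ** 2 for v in y)

# Pure-noise, rank-deficient data: 60 descriptors, 15 observations (illustrative)
rng = random.Random(1)
n_obs, n_desc = 15, 60
y = [rng.gauss(0, 1) for _ in range(n_obs)]
X = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_desc)]
q2_naive = loo_q2(X, y, select_inside=False)
q2_honest = loo_q2(X, y, select_inside=True)
print(q2_naive, q2_honest)
```

Since y is unrelated to every descriptor here, any apparent predictive ability is an artifact of selection. The naive variant lets the left-out observation influence which descriptor is chosen, so it would be expected to report the more flattering value in most runs.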

Several variations of these concepts (e.g., stepwise regression) have also been proposed (D4). [Pg.114]

The following three performance measures are commonly used for variable selection by stepwise regression or by best-subset regression. An example in Section 4.5.8 describes use and comparison of these measures. [Pg.129]

An often-used version of stepwise variable selection (stepwise regression) works as follows: Select the variable with the highest absolute correlation coefficient with the y-variable; the number of selected variables is m0 = 1. Add each of the remaining x-variables separately to the selected variable; the number of variables in each subset is m1 = 2. Calculate F as given in Equation 4.44,... [Pg.154]

FIGURE 4.41 Stepwise regression for the PAC data set. The BIC measure is reduced within each step of the procedure, resulting in models with a certain number of variables (left). The evaluation of the final model is based on PLS where the number of PLS components is determined by repeated double CV (right). [Pg.197]

FIGURE 4.42 Evaluation of the final model from stepwise regression. A comparison of measured and predicted y-values (left) using repeated double CV with PLS models for prediction, and resulting SEP values (right) from repeated CV using linear models directly with the 33 selected variables from stepwise regression. [Pg.198]

Data of corrosion rate of carbon steel, copper, zinc and aluminum together with different TOW and contaminants were statistically processed (stepwise regression) and the following results were obtained ... [Pg.73]

Computer packages such as SAS can fit these models, provide estimates of the values of the b coefficients together with standard errors, and give p-values associated with the hypothesis tests of interest. These hypotheses will be exactly as H01, H02 and H03 in Section 6.3. Methods of stepwise regression are also available for the identification of a subset of the baseline variables/factors that are predictive of outcome. [Pg.97]

R.I. Jennrich and P.F. Sampson, Application of stepwise regression to nonlinear estimation. Technometrics, 10 (1968) 63-72. [Pg.218]

Trivedi et al. utilized Sorby's experimental data for water-ethanol-propylene glycol, fitted them to a complete second-order polynomial model, and performed a stepwise regression to arrive at the following equation, where x and y represent fractions of ethanol and propylene glycol, respectively ... [Pg.170]

[Billings and Voon, 1986] Billings, S. A. and Voon, W. S. F. (1986). A prediction-error and stepwise-regression estimation algorithm for non-linear systems. Int. J. Control, 44(3):803-822. [Pg.252]

Big Chemical Encyclopedia
