
Bootstrap residuals

The most common approach to constructing bootstrap pseudosamples is to bootstrap the pairs; that is, one randomly selects rows from the original data set, on a line-by-line or subject-by-subject basis, with replacement, and inserts them into the pseudosample. Bootstrapping residuals is another approach that has particular application to regression analyses. In a typical bootstrap data set, data are chosen of the form... [Pg.407]

Bootstrapping the residuals assumes that the residuals are not a function of the dependent variables and that the form of the error model is known. This is a strong assumption that is seldom met in regression analyses, and in pharmacometrics in particular. Bootstrapping pairs is therefore less sensitive to assumptions than bootstrapping residuals. [Pg.407]
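As a minimal sketch of bootstrapping the pairs, assuming the data are held in NumPy arrays x and y (the function name and generator form are illustrative, not from the source):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def bootstrap_pairs(x, y, n_boot=1000):
    """Draw pseudosamples by resampling whole rows (x_i, y_i) with replacement."""
    n = len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # subject/line indices, drawn with replacement
        yield x[idx], y[idx]              # one pseudosample; refit the model to each
```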

Bootstrapping residuals is not without its own problems, particularly the assumptions that an appropriate model has been chosen and that the residuals ê converge to the true errors ε, at least asymptotically (in fact, OLS residuals typically underestimate ε). For this latter reason, residuals corrected for leverage, e_i/√(1 − h_ii), where h_ii is the ith diagonal element of the hat matrix, are sometimes used... [Pg.361]

In this manner the modified bootstrapped residuals will maintain the variance model of the original data. It should be noted that bootstrapping nonlinear models is done in the same manner as bootstrapping linear models. [Pg.361]
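A sketch of the leverage correction for OLS residuals (the e_i/√(1 − h_ii) form above); the function name and the use of a full design matrix X are assumptions for illustration:

```python
import numpy as np

def modified_residuals(X, y):
    """OLS residuals 'fattened' for leverage: e_i / sqrt(1 - h_ii)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    # h_ii is the i-th diagonal element of the hat matrix X (X'X)^{-1} X'
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    return e / np.sqrt(1.0 - h)
```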

The basis of all performance criteria is the prediction errors (residuals), yᵢ − ŷᵢ, obtained from an independent test set, by CV or bootstrap, or sometimes by less reliable methods. It is crucial to document from which data set and by which strategy the prediction errors have been obtained; furthermore, a large number of prediction errors is desirable. Various measures can be derived from the residuals to characterize the prediction performance of a single model or a model type. If enough values are available, visualization of the error distribution gives a comprehensive picture. In many cases, the distribution is similar to a normal distribution and has a mean of approximately zero. Such a distribution can be well described by a single parameter that measures the spread. Other error distributions, for instance bimodal or skewed distributions, may occur and can be characterized by, for instance, a tolerance interval. [Pg.126]
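A sketch of spread measures computed from a vector of prediction errors e = y − ŷ collected by CV or bootstrap; the names bias, SEP, and RMSEP follow common chemometric usage, and the percentile-based tolerance interval is one simple choice, not the source's prescription:

```python
import numpy as np

def summarize_errors(e):
    """Characterize the distribution of prediction errors; the mean should be near zero."""
    return {
        "bias": np.mean(e),                      # mean error
        "SEP": np.std(e, ddof=1),                # standard error of prediction (spread)
        "RMSEP": np.sqrt(np.mean(e ** 2)),       # root mean squared error of prediction
        "tolerance_95": np.percentile(e, [2.5, 97.5]),  # useful for bimodal/skewed errors
    }
```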

Comparison of the success of different classification methods requires a realistic estimation of performance measures for classification, such as misclassification rates (% wrong) or predictive abilities (% correct) for new cases (Section 5.7), together with an estimation of the spread of these measures. Because the number of objects with known class memberships is usually small, appropriate resampling techniques like repeated double CV or bootstrap (Section 4.2) have to be applied. A difficulty is that performance measures from regression (based on residuals), rather than misclassification rates, are often used in the development of classifiers. [Pg.261]
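A sketch of estimating a misclassification rate, with its spread, by repeated CV; scikit-learn and the LDA classifier are assumed purely for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def misclassification_rate(X, y, n_splits=5, n_repeats=20):
    """Repeated CV yields many accuracy values -> a rate plus a spread estimate."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=0)
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=cv, scoring="accuracy")
    return 1.0 - acc.mean(), acc.std(ddof=1)   # % wrong, and its variability
```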

Bias corrections are sometimes applied to MLEs (which often have some bias) or to other estimates (as explained in the following section, [mean] bias occurs when the mean of the sampling distribution does not equal the parameter to be estimated). A simple bootstrap approach can be used to correct the bias of any estimate (Efron and Tibshirani 1993). A particularly important situation where it is not conventional to use the true MLE is in estimating the variance of a normal distribution. The conventional formula for the sample variance, s² = SSR/(n − 1), where SSR denotes the sum of squared residuals (observed values minus the mean value), is an unbiased estimator of the variance, whether or not the data are from a normal distribution... [Pg.35]
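A sketch of the simple bootstrap bias correction: estimate the bias as the mean of the bootstrap replicates minus the original estimate, then subtract it. The variance MLE, SSR/n, is a natural test case, since s² = SSR/(n − 1) is the unbiased version:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def bias_corrected(estimator, x, n_boot=2000):
    """Bootstrap bias correction (Efron and Tibshirani 1993):
    corrected = theta_hat - (mean of bootstrap estimates - theta_hat)."""
    theta_hat = estimator(x)
    boot = np.array([estimator(rng.choice(x, size=len(x), replace=True))
                     for _ in range(n_boot)])
    return 2.0 * theta_hat - boot.mean()

# np.var defaults to ddof=0, i.e., the biased MLE SSR/n:
# bias_corrected(np.var, data) pushes the estimate back toward SSR/(n - 1)
```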

An approach that is sometimes helpful, particularly for recent pesticide risk assessments, is to use the parameter values that give the best fit (in the LS sense) when comparing the fitted cdf to the cdf of the empirical distribution. In some cases, such as when fitting a log-normal distribution, formulae from linear regression can be used after transformations are applied to linearize the cdf. In other cases, the residual SS is minimized using numerical optimization, i.e., nonlinear regression. This approach seems reasonable for point estimation. However, the statistical assumptions that would often be invoked to justify LS regression will not be met in this application, so the use of any regression results beyond the point estimates is questionable. If standard errors or confidence intervals for the estimates are needed, bootstrap procedures are recommended. [Pg.43]
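A sketch of the nonlinear-regression route for a log-normal fit: minimize the residual SS between the fitted cdf and the empirical cdf. SciPy's curve_fit and the plotting-position formula are illustrative choices; per the caveat above, uncertainty should come from refitting bootstrap resamples, not from the regression covariance:

```python
import numpy as np
from scipy import optimize, stats

def fit_lognormal_cdf(x):
    """LS fit of a log-normal cdf to the empirical cdf (point estimates only)."""
    xs = np.sort(np.asarray(x, dtype=float))
    p = (np.arange(1, len(xs) + 1) - 0.5) / len(xs)        # empirical cdf (plotting positions)
    cdf = lambda z, mu, sigma: stats.norm.cdf((np.log(z) - mu) / sigma)
    (mu, sigma), _ = optimize.curve_fit(
        cdf, xs, p, p0=[np.log(xs).mean(), np.log(xs).std()])
    return mu, sigma  # for CIs, refit on bootstrap resamples of x
```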

As a last comment, caution should be exercised when fitting small sets of data to both structural and residual variance models. It is commonplace in the literature to fit individual data and then apply a residual variance model. Residual variance models based on small samples are not very robust, which can easily be seen if the data are jackknifed or bootstrapped. One way to overcome this is to assume a common residual variance model for all observations, instead of a residual variance model for each subject. This assumption is not such a leap of faith. For GLS, first fit each subject and then pool the residuals; use the pooled residuals to estimate the residual variance model parameters, and iterate in this manner until convergence. For ELS, things are a bit trickier but still doable. [Pg.135]
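A sketch of the GLS iteration with pooled residuals, assuming a power-of-the-mean variance model Var(e) ∝ f(x, θ)^(2ζ); the structural model f, the variance model, and all names are assumptions for illustration, not the source's implementation:

```python
import numpy as np
from scipy import optimize

def gls_common_variance(subjects, f, theta0, n_iter=10):
    """Iterate: fit each subject by weighted LS, pool all residuals,
    re-estimate the common variance-model exponent zeta, repeat."""
    thetas = {sid: np.asarray(theta0, float) for sid in subjects}
    zeta = 0.0                                   # start homoscedastic
    for _ in range(n_iter):
        preds, resids = [], []
        for sid, (x, y) in subjects.items():
            w = np.maximum(f(x, thetas[sid]), 1e-12) ** zeta
            obj = lambda th, x=x, y=y, w=w: np.sum(((y - f(x, th)) / w) ** 2)
            thetas[sid] = optimize.minimize(obj, thetas[sid]).x
            preds.append(f(x, thetas[sid]))
            resids.append(y - f(x, thetas[sid]))
        yhat, e = np.concatenate(preds), np.concatenate(resids)
        # one zeta for everyone from the pooled residuals: log|e| ~ zeta * log(yhat)
        # (assumes positive predictions, e.g., concentrations)
        zeta = np.polyfit(np.log(yhat), np.log(np.abs(e) + 1e-12), 1)[0]
    return thetas, zeta
```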

If, however, the predictor variables in an experiment are fixed, such as in a designed experiment where the predictor variables are under the experimenter's control, then bootstrapping of the residuals is done to preserve the fixed nature of the predictor variables. The method would then be as follows. [Pg.361]

Resample the modified residuals (denoted e*) and, holding the predictor variables fixed, generate the bootstrap dependent variable as Y* = f(x, θ̂) + e*. Sometimes, instead of resampling the modified residuals, e* is resampled from the mean-centered residuals e₁ − ē, e₂ − ē, ..., eₙ − ē. In the linear model the mean residual is zero, so this is equivalent to bootstrapping the modified residuals; but the mean residual is not necessarily zero in the nonlinear case, so correcting for the mean is necessary. [Pg.361]

In both cases, the correction factor (the denominator) fattens the residuals (Stine, 1990). Such modifications are necessary in developing bootstrapped prediction intervals (Stine, 1985). [Pg.361]
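A sketch of the fixed-predictor residual bootstrap described above, with mean-centering of the modified residuals e* (function and argument names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def bootstrap_residuals(x, y, f, theta_hat, e_mod, n_boot=1000):
    """Fixed-X bootstrap: Y* = f(x, theta_hat) + e*, with e* drawn from the
    mean-centered modified residuals (centering matters for nonlinear models,
    where the mean residual need not be zero)."""
    e_centered = e_mod - e_mod.mean()
    yhat = f(x, theta_hat)
    for _ in range(n_boot):
        e_star = rng.choice(e_centered, size=len(y), replace=True)
        yield x, yhat + e_star     # refit the model to each (x, Y*) pseudosample
```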

When the variance model is heteroscedastic, the algorithm for bootstrapping the residuals will not be valid because the bootstrapped data set might not have the same variance model as the original data; in fact, more than likely, bootstrapping heteroscedastic residuals will lead to a homoscedastic model. Heteroscedasticity is not a problem in the random case (bootstrapping pairs), because it is preserved after resampling. In the heteroscedastic case, the modified residuals need to be corrected for their variance, so that Eq. (A.102) becomes... [Pg.361]
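A sketch of that variance correction: standardize the residuals by the fitted standard-deviation function before resampling, then rescale, so each pseudosample inherits the original variance model (the variance function g and all names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=4)

def hetero_residual_bootstrap(x, y, f, g, theta_hat, n_boot=1000):
    """Resample variance-standardized residuals u_i = e_i / g(x_i),
    then generate Y* = f(x, theta_hat) + g(x) * u*."""
    yhat = f(x, theta_hat)
    sd = g(x)                       # estimated residual std-dev at each design point
    u = (y - yhat) / sd             # standardized (ideally also leverage-corrected) residuals
    u = u - u.mean()
    for _ in range(n_boot):
        u_star = rng.choice(u, size=len(y), replace=True)
        yield yhat + sd * u_star    # rescaling restores the heteroscedasticity
```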

