Big Chemical Encyclopedia


Statistics bootstrap

Twenty-two clusters could be unambiguously detected from the present analysis of 30 amino acid positions (Fig. 5). These clusters were defined so as to encompass the maximum number of related entries within a branch characterized by the highest possible statistical bootstrap value. Thirty-four of the 372 entries could not be assigned to any of the existing 22 clusters and are considered singletons. The tree presented here... [Pg.115]

The statistical difference in performance of the models was estimated with a bootstrap test using 10,000 replicates (see details in Ref. [91]). A significance level of p < 0.05 was used. For each dataset, all methods were classified into four categories. [Pg.395]
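
The details of the cited test are in Ref. [91]; as an illustration only, a paired bootstrap comparison of two models' per-sample errors can be sketched as follows (the function name and inputs are hypothetical, not from the reference):

```python
import numpy as np

def bootstrap_diff_test(err_a, err_b, n_boot=10_000, seed=0):
    """Paired bootstrap test for a difference in mean error between two
    models evaluated on the same samples (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    d = np.asarray(err_a, float) - np.asarray(err_b, float)
    n = d.size
    # Resample the paired differences with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = d[idx].mean(axis=1)
    # Two-sided p-value: fraction of resampled mean differences that
    # fall on the other side of zero.
    p = 2.0 * min((boot_means <= 0).mean(), (boot_means >= 0).mean())
    return d.mean(), min(p, 1.0)
```

With identical error vectors the test returns p = 1; when one model is uniformly worse, the p-value drops below any conventional significance level.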

Porter PS, Rao ST, Ku J-Y, Poirot RL, Dakins M (1997) Small sample properties of nonparametric bootstrap t confidence intervals. J Air Waste Manage Assoc 47:1197-1203. Powell R, Hergt J, Woodhead J (2002) Improving isochron calculations with robust statistics and the bootstrap. Chem Geol 185:191-204... [Pg.652]

For circumstances where wide variability is observed, or a statistical evaluation of the f2 metric is desired, a bootstrap approach can be used to calculate a confidence interval (8). [Pg.91]
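
A minimal sketch of this idea, assuming dissolution data arranged as unit × time-point matrices for the reference and test products (function names are illustrative, not from Ref. (8)):

```python
import numpy as np

def f2(ref_mean, test_mean):
    """Similarity factor f2 between two mean dissolution profiles."""
    msd = np.mean((np.asarray(ref_mean, float)
                   - np.asarray(test_mean, float)) ** 2)
    return 50.0 * np.log10(100.0 / np.sqrt(1.0 + msd))

def f2_bootstrap_ci(ref, test, n_boot=2000, alpha=0.10, seed=0):
    """Percentile bootstrap CI for f2: resample individual units (rows)
    of each product, recompute f2 on the resampled mean profiles."""
    rng = np.random.default_rng(seed)
    ref, test = np.asarray(ref, float), np.asarray(test, float)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        r = ref[rng.integers(0, ref.shape[0], ref.shape[0])]
        t = test[rng.integers(0, test.shape[0], test.shape[0])]
        stats[b] = f2(r.mean(axis=0), t.mean(axis=0))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return f2(ref.mean(axis=0), test.mean(axis=0)), (lo, hi)
```

For identical profiles the mean squared difference is zero and f2 reaches its maximum of 100; a constant 10% offset at every time point gives f2 just below 50, the conventional similarity cutoff.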

Cross validation and bootstrap techniques can be applied for a statistically based estimation of the optimum number of PCA components. The idea is to randomly split the data into training and test data. PCA is then applied to the training data and the observations from the test data are reconstructed using 1 to m PCs. The prediction error to the real test data can be computed. Repeating this procedure many times indicates the distribution of the prediction errors when using 1 to m components, which then allows one to decide on the optimal number of components. For more details see Section 3.7.1. [Pg.78]
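
The repeated-split procedure described above can be sketched as follows (a simplified illustration, not the exact algorithm of Section 3.7.1; the function name is made up):

```python
import numpy as np

def pca_prediction_error(X, max_pc, n_rep=50, test_frac=0.3, seed=0):
    """Repeated random train/test splits: fit PCA on the training part,
    reconstruct the test part with 1..max_pc components, and return the
    mean squared reconstruction error for each component count."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    n = X.shape[0]
    n_test = max(1, int(test_frac * n))
    errs = np.zeros(max_pc)
    for _ in range(n_rep):
        perm = rng.permutation(n)
        test, train = X[perm[:n_test]], X[perm[n_test:]]
        mu = train.mean(axis=0)
        # Loadings from an SVD of the centred training data.
        _, _, Vt = np.linalg.svd(train - mu, full_matrices=False)
        for k in range(1, max_pc + 1):
            V = Vt[:k].T                      # p x k loading matrix
            recon = mu + (test - mu) @ V @ V.T
            errs[k - 1] += np.mean((test - recon) ** 2)
    return errs / n_rep
```

On data with an effective rank of two plus small noise, the error curve drops sharply up to two components and then flattens at the noise level, which is the signature one looks for when choosing the component count.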

An approach that is sometimes helpful, particularly for recent pesticide risk assessments, is to use the parameter values that result in best fit (in the sense of LS), comparing the fitted cdf to the cdf of the empirical distribution. In some cases, such as when fitting a log-normal distribution, formulae from linear regression can be used after transformations are applied to linearize the cdf. In other cases, the residual SS is minimized using numerical optimization, i.e., one uses nonlinear regression. This approach seems reasonable for point estimation. However, the statistical assumptions that would often be invoked to justify LS regression will not be met in this application. Therefore the use of any additional regression results (beyond the point estimates) is questionable. If there is a need to provide standard errors or confidence intervals for the estimates, bootstrap procedures are recommended. [Pg.43]
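
As a concrete sketch of the log-normal case described above: the cdf is linearized by regressing the probit of the plotting positions on log(x), and bootstrap resampling then supplies the standard errors that ordinary regression theory cannot justify here (function names and plotting-position convention are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def fit_lognormal_ls(x):
    """LS fit of a log-normal via the linearized cdf:
    probit(F_emp(x)) = (ln x - mu) / sigma, a straight line in ln x."""
    x = np.sort(np.asarray(x, float))
    n = x.size
    p = (np.arange(1, n + 1) - 0.5) / n      # plotting positions
    slope, intercept = np.polyfit(np.log(x), norm.ppf(p), 1)
    sigma = 1.0 / slope
    mu = -intercept * sigma
    return mu, sigma

def bootstrap_se(x, n_boot=1000, seed=0):
    """Bootstrap standard errors for the LS point estimates, as
    recommended when the regression assumptions are not met."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    ests = np.array([fit_lognormal_ls(rng.choice(x, x.size))
                     for _ in range(n_boot)])
    return ests.std(axis=0, ddof=1)          # se(mu), se(sigma)
```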

Based on the discussion of criteria for parameter estimation, it is not necessarily important to use estimators that are unbiased in the statistical sense. The emphasis should be on the overall performance of the estimator, considering precision as well as accuracy. If bias is known to be large for practical purposes, bias correction may improve performance (bootstrap bias correction is easy). However, in practice, precision may be a greater concern than bias, particularly with few data, and bias correction may result in lower precision. [Pg.43]
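
The "easy" bootstrap bias correction mentioned above amounts to estimating the bias as the mean of the bootstrap replicates minus the plug-in estimate, then subtracting it; a minimal sketch (the helper name is made up):

```python
import numpy as np

def bootstrap_bias_correct(x, statistic, n_boot=2000, seed=0):
    """Bootstrap bias correction: corrected = theta - bias_hat, where
    bias_hat = mean of bootstrap replicates - plug-in estimate. As the
    text cautions, the corrected estimate can be less precise."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    theta = statistic(x)
    boot = np.array([statistic(rng.choice(x, x.size))
                     for _ in range(n_boot)])
    bias = boot.mean() - theta
    return theta - bias, bias
```

A classic check: the plug-in variance (dividing by n) is biased low; the bootstrap-corrected value moves close to the usual unbiased (n - 1) estimate.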

A nonparametric approach can involve the use of synoptic data sets. In a synoptic data set, each unit is represented by a vector of measurements instead of a single measurement. For example, synoptic data useful for pesticide fate assessment could take the form of multiple physical-chemical measurements recorded for each of a sample of water bodies. The multivariate empirical distribution assigns equal probability (1/n) to each of n measurement vectors. Bootstrap evaluation of statistical error can involve sampling sets of n measurement vectors (with replacement). Dependencies are accounted for in such an approach because the variable combinations allowed are precisely those observed in the data, and correlations (or other dependency measures) are fixed equal to sample values. [Pg.46]
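
Resampling whole measurement vectors, as described above, is a one-liner once the data are arranged with one row per unit (the function name is illustrative):

```python
import numpy as np

def synoptic_bootstrap(data, statistic, n_boot=1000, seed=0):
    """Bootstrap for a synoptic data set: each unit is a row vector of
    measurements, and whole rows are resampled with replacement so the
    dependencies between variables are preserved exactly as observed."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, float)
    n = data.shape[0]
    return np.array([statistic(data[rng.integers(0, n, n)])
                     for _ in range(n_boot)])
```

Because rows are resampled intact, a deterministic relationship between two columns survives every replicate: if one variable is an exact linear function of another, every bootstrap correlation equals 1.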

Efron B, Tibshirani RJ. 1993. An introduction to the bootstrap. Monographs on Statistics and Applied Probability, 57. New York: Chapman and Hall. [Pg.51]

Although this approach is still used, it is undesirable for statistical reasons: error calculations underestimate the true uncertainty associated with the equations (17, 21). A better approach is to use the equations developed for one set of lakes to infer chemistry values from counts of taxa from a second set of lakes (i.e., cross-validation). The extra time and effort required to develop the additional data for the test set is a major limitation to this approach. Computer-intensive techniques, such as jackknifing or bootstrapping, can produce error estimates from the original training set (53), without having to collect data for additional lakes. [Pg.30]
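
The jackknife variant of this idea can be sketched for a simple linear calibration: each sample is held out in turn, the line is refitted on the rest, and the held-out value is predicted (a minimal illustration, not the procedure of Ref. (53); the function name is made up):

```python
import numpy as np

def jackknife_rmsep(x, y):
    """Leave-one-out (jackknife-style) root mean squared prediction
    error for a simple linear calibration y ~ a*x + b."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    resid = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        resid[i] = y[i] - (slope * x[i] + intercept)
    return np.sqrt(np.mean(resid ** 2))
```

Unlike the apparent error of the training fit, this estimate reflects how the calibration performs on samples it has not seen.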

Bustamante, P., D. V. Hinkley, A. Martin, and S. Shi. 1991. Statistical analysis of the extended Hansen method using the bootstrap technique. J. Pharm. Sci. 80:971-977. [Pg.57]

There are often data sets used to estimate distributions of model inputs for which a portion of the data are missing because attempted measurements fell below the detection limit of the instrument. Such data sets are said to be censored. Commonly used methods for dealing with them, such as replacing non-detected values with one half of the detection limit, are statistically biased: they distort estimates of the mean and provide no insight into the population distribution from which the measured data are a sample. Statistical methods can be used to make inferences regarding both the observed and unobserved (censored) portions of an empirical data set. For example, maximum likelihood estimation can be used to fit parametric distributions to censored data sets, including the portion of the distribution that is below one or more detection limits. Asymptotically unbiased estimates of statistics, such as the mean, can then be obtained from the fitted distribution. Bootstrap simulation can be used to estimate uncertainty in the statistics of the fitted distribution (e.g. Zhao & Frey, 2004). Imputation methods, such as... [Pg.50]
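
A minimal sketch of the maximum likelihood step, assuming a normal distribution and a single detection limit (the function name is hypothetical; other distributions follow the same pattern, with each censored point contributing the cdf at the detection limit and each detect contributing the density):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def censored_normal_mle(detects, n_censored, dl):
    """ML fit of a normal distribution when n_censored observations
    fell below a single detection limit dl. Detected values contribute
    logpdf terms; censored values contribute log F(dl) terms."""
    detects = np.asarray(detects, float)

    def nll(theta):
        mu, log_sd = theta                   # log_sd keeps sd positive
        sd = np.exp(log_sd)
        return -(norm.logpdf(detects, mu, sd).sum()
                 + n_censored * norm.logcdf(dl, mu, sd))

    x0 = np.array([detects.mean(), np.log(detects.std() + 1e-6)])
    res = minimize(nll, x0, method="Nelder-Mead")
    mu, log_sd = res.x
    return mu, np.exp(log_sd)
```

Note how the fit recovers the full-population mean even though only the values above the limit were observed; repeating the fit on bootstrap resamples of the data would then quantify the uncertainty in the fitted parameters, as the text describes.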

Note that when more than 85% of the drug is dissolved from both products within 15 minutes, dissolution profiles may be accepted as similar without further mathematical evaluation. For the sake of completeness, one should add that some concerns have been raised regarding the assessment of similarity using the direct comparison of the f1 and f2 point estimates with the similarity limits [140-142]. Attempts have been made to place the use of the similarity factor f2 as a criterion for assessment of similarity between dissolution profiles in a statistical context using a bootstrap method [141], since its sampling distribution is unknown. [Pg.112]

DM can be applied to "small" structures (< 1000 atoms in the asymmetric unit). Since a crystal with, say, 10 C atoms requires finding only 30 variables (x, y, and z for each atom), but typically several thousand intensity data can be collected, this is, statistically, a vastly overdetermined problem. There are relationships between the contributions to the scattering intensities of two diffraction peaks (with different Miller indices h, k, l and h', k', l') due to the same atom at (xm, ym, zm). DM solves the phase problem by a bootstrap algorithm, which guesses the phases of a few reflections and uses statistical tools to find all other phases and, thus, all atom positions xm, ym, zm. How to start ... [Pg.750]

Disadvantages are that these response surface models are not available in standard software packages. Like all nonlinear statistical methods, the methodology is still subject to research, which has two important consequences. First, the correlation structure of the parameters in these nonlinear models is usually not addressed. Second, the assessment of the test statistic is based on approximate statistical procedures. The statistical analyses can probably be improved through bootstrap analysis or permutation tests. [Pg.140]

Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82, 171-200. [Pg.562]

Uncertainties inherent to the risk assessment process can be quantitatively described using, for example, statistical distributions, fuzzy numbers, or intervals. Corresponding methods are available for propagating these kinds of uncertainties through the process of risk estimation, including Monte Carlo simulation, fuzzy arithmetic, and interval analysis. Computationally intensive methods (e.g., the bootstrap) that work directly from the data to characterize and propagate uncertainties can also be applied in ERA. Implementation of these methods for incorporating uncertainty can lead to risk estimates that are consistent with a probabilistic definition of risk. [Pg.2310]

