Distributions, selection empirical distribution

Having selected an appropriate data set, we must select a type of distribution and fit the distribution to the data, or else use an empirical or other nonparametric distribution. There appears to be some mechanistic basis for the log-normal distribution for environmental concentrations (Ott 1990, 1995). However, in a given situation there may not be very strong theoretical support for a specific type of distribution, log-normal or otherwise. Alternative distributions may need to be considered based on the quality of fit of the distribution to the data. Therefore, it is desirable to have quantitative indices that can be used to compare or rank distributions based on agreement with data. The fit of the log-normal distribution (or whatever distributions we may choose) should be evaluated in particular situations, using graphical as well as statistical procedures. [Pg.31]
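As a concrete illustration of this kind of evaluation, the sketch below (not taken from the cited sources; the concentration data are simulated and purely hypothetical) fits a log-normal distribution to a sample and checks the fit both graphically, with a probability plot, and statistically, with a Kolmogorov-Smirnov test.

```python
# Minimal sketch: fit a log-normal to hypothetical concentration data and check the fit.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
conc = rng.lognormal(mean=0.5, sigma=0.8, size=60)   # hypothetical concentrations

# A log-normal fit is a normal fit to the log-transformed data.
log_conc = np.log(conc)
mu, sigma = log_conc.mean(), log_conc.std(ddof=1)

# Statistical check: KS test of the log-transformed data against the fitted normal.
# (The p-value is only approximate when the parameters are estimated from the same data.)
ks_stat, p_value = stats.kstest(log_conc, "norm", args=(mu, sigma))
print(f"fitted mu = {mu:.3f}, sigma = {sigma:.3f}, KS p-value = {p_value:.3f}")

# Graphical check: normal probability plot of the log-transformed data.
stats.probplot(log_conc, dist="norm", plot=plt)
plt.title("Log-normal fit: probability plot of log(concentration)")
plt.show()
```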

When enough data are available, the need to assume a specific parametric distribution can be avoided by using the empirical distribution. The empirical distribution based on n observations is the distribution that assigns equal probability (1/n) to each observed value. A particular focus of a workshop on distribution selection (USEPA 1998) was considerations for choosing between the use of parametric distribution functions ... and empirical distribution functions. That report of the workshop emphasizes case-specific criteria. [Pg.41]
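A minimal sketch of this idea, using hypothetical observations: each observed value receives probability 1/n, the empirical cumulative distribution function steps up by 1/n at each observation, and sampling from the empirical distribution amounts to resampling the data with replacement.

```python
# Minimal sketch of the empirical distribution: probability 1/n on each observed value.
import numpy as np

def empirical_cdf(data):
    """Return sorted values and the empirical CDF evaluated at those values."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    F = np.arange(1, n + 1) / n        # each observation carries probability 1/n
    return x, F

obs = [2.1, 3.4, 2.8, 5.0, 4.2]        # hypothetical observations
x, F = empirical_cdf(obs)
for xi, Fi in zip(x, F):
    print(f"x = {xi:4.1f}   F_n(x) = {Fi:.2f}")

# Sampling from the empirical distribution is sampling with replacement from the data:
rng = np.random.default_rng(0)
resample = rng.choice(obs, size=len(obs), replace=True)
print("one resample:", resample)
```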

If a parametric distribution (e.g. normal, lognormal, loglogistic) is fit to empirical data, then additional uncertainty can be introduced in the parameters of the fitted distribution. If the selected parametric distribution model is an appropriate representation of the data, then the uncertainty in the parameters of the fitted distribution will be based mainly, if not solely, on random sampling error associated primarily with the sample size and variance of the empirical data. Each parameter of the fitted distribution will have its own sampling distribution. Furthermore, any other statistical parameter of the fitted distribution, such as a particular percentile, will also have a sampling distribution. However, if the selected model is an inappropriate choice for representing the data set, then substantial biases in estimates of some statistics of the distribution, such as upper percentiles, must be considered. [Pg.28]
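A common way to approximate these sampling distributions is the bootstrap. The sketch below (hypothetical data; the 95th percentile and the log-normal model are illustrative choices, not prescriptions from the cited source) resamples the data, refits the distribution each time, and so builds an approximate sampling distribution for an upper percentile.

```python
# Minimal sketch: bootstrap the sampling distribution of a fitted log-normal's 95th percentile.
import numpy as np

rng = np.random.default_rng(2)
data = rng.lognormal(mean=1.0, sigma=0.5, size=40)   # hypothetical sample

def fit_lognormal(x):
    logs = np.log(x)
    return logs.mean(), logs.std(ddof=1)             # (mu, sigma) on the log scale

mu_hat, sig_hat = fit_lognormal(data)

B = 2000
p95 = np.empty(B)
for b in range(B):
    boot = rng.choice(data, size=data.size, replace=True)
    mu_b, sig_b = fit_lognormal(boot)
    # 95th percentile of the fitted log-normal in each bootstrap replicate
    p95[b] = np.exp(mu_b + 1.645 * sig_b)

print(f"point estimate of the 95th percentile: {np.exp(mu_hat + 1.645 * sig_hat):.2f}")
print(f"bootstrap 90% interval for that percentile: {np.percentile(p95, [5, 95])}")
```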

Since a random variable such as age can take on a number of values for a group of study participants, it is of interest to know something about the relative frequency of each value. The relative frequency is the count of the number of observations with a specific value (for example, the number of 30-year-old participants) divided by the total number in the sample. An informative first step in a statistical analysis is to examine characteristics of the relative frequency of values of the random variable of interest, which can also be called the empirical distribution of the random variable. This knowledge is an essential part of selecting the most appropriate statistical analysis. Statistical software packages offer a number of methods to describe the relative frequency of values, including tabular frequency displays, dot plots, relative frequency histograms, and stem-and-leaf plots. [Pg.49]
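A minimal sketch of such a first step, with hypothetical ages: a relative-frequency table built from raw counts divided by the sample size, plus a binned relative-frequency summary of the same data.

```python
# Minimal sketch: relative-frequency table and binned relative frequencies for a variable.
import numpy as np
from collections import Counter

ages = [30, 34, 30, 41, 29, 34, 30, 52, 41, 34]      # hypothetical study participants
n = len(ages)

counts = Counter(ages)
for value in sorted(counts):
    rel_freq = counts[value] / n                      # count divided by total sample size
    print(f"age {value}: count {counts[value]}, relative frequency {rel_freq:.2f}")

# Binned view of the same empirical distribution (a relative-frequency histogram):
hist, edges = np.histogram(ages, bins=5)
print("bin edges:", edges)
print("relative frequencies per bin:", hist / n)
```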

The smoothed bootstrap has been proposed to deal with the discreteness of the empirical distribution function (F) when sample sizes are small (n < 15). In this approach the empirical distribution function is first smoothed, and bootstrap samples are then drawn from the smoothed empirical distribution function, for example from a kernel density estimate. However, proper selection of the smoothing parameter (h) is important so that oversmoothing or undersmoothing does not occur. It is difficult to know the most appropriate value for h, and once a value for h is assigned it influences the variability and thus makes characterizing the variability terms of the model impossible. There are few studies in which the smoothed bootstrap has been applied (21,27,28). In one such study the improvement in the correlation coefficient compared with the standard nonparametric bootstrap was modest (21). Therefore, the value and behavior of the smoothed bootstrap are not clear. [Pg.407]
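The sketch below shows the basic mechanics of a Gaussian-kernel smoothed bootstrap on a small hypothetical sample: resample with replacement, then add kernel noise with bandwidth h. The bandwidth rule used here (Silverman's rule of thumb) is an illustrative convention, not the choice made in the studies cited above.

```python
# Minimal sketch of a Gaussian-kernel smoothed bootstrap on a small hypothetical sample.
import numpy as np

rng = np.random.default_rng(3)
data = np.array([1.2, 0.9, 1.5, 2.1, 1.1, 1.8, 0.7, 1.4])   # small hypothetical sample
n = data.size

# Bandwidth h from Silverman's rule of thumb (an illustrative assumption, not prescribed above).
iqr = np.percentile(data, 75) - np.percentile(data, 25)
h = 0.9 * min(data.std(ddof=1), iqr / 1.34) * n ** (-0.2)

def smoothed_bootstrap_sample(x, h, rng):
    """One smoothed bootstrap sample: resample with replacement, then add N(0, h^2) noise."""
    resample = rng.choice(x, size=x.size, replace=True)
    return resample + rng.normal(0.0, h, size=x.size)

boot_means = np.array([smoothed_bootstrap_sample(data, h, rng).mean() for _ in range(2000)])
print(f"h = {h:.3f}, smoothed-bootstrap SE of the mean = {boot_means.std(ddof=1):.3f}")
```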

Based on model population analysis, here we propose to perform model comparison by deriving an empirical distribution of the difference in RMSEP or RMSECV between two models (variable sets), followed by testing the null hypothesis that this difference is zero. Without loss of generality, we describe the proposed method using the distribution of the difference in RMSEP as an example. We assume that the data X consist of m samples in rows and p variables in columns and that the target value Y is an m-dimensional column vector. Two variable sets, say V1 and V2, selected from the p variables, can then be compared using the MPA-based method described below. [Pg.9]
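The sketch below is an illustrative reading of this idea rather than the authors' exact procedure: repeated random splits of hypothetical data yield an empirical distribution of the RMSEP difference between two variable sets, which can then be examined against the null value of zero. The linear regression model and the split scheme are assumptions made only for the example.

```python
# Minimal sketch: empirical distribution of RMSEP(V1) - RMSEP(V2) over repeated random splits.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
m, p = 80, 10
X = rng.normal(size=(m, p))                      # hypothetical data matrix (m samples, p variables)
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=m)

V1 = [0, 1, 2]                                   # two hypothetical variable sets
V2 = [0, 3, 4]

def rmsep(cols, X_tr, X_te, y_tr, y_te):
    """Root mean squared error of prediction for a model built on the given columns."""
    model = LinearRegression().fit(X_tr[:, cols], y_tr)
    resid = y_te - model.predict(X_te[:, cols])
    return np.sqrt(np.mean(resid ** 2))

diffs = []
for _ in range(500):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3)
    diffs.append(rmsep(V1, X_tr, X_te, y_tr, y_te) - rmsep(V2, X_tr, X_te, y_tr, y_te))

diffs = np.array(diffs)
print(f"mean RMSEP difference: {diffs.mean():.3f}")
print(f"fraction of splits where V1 beats V2: {(diffs < 0).mean():.2f}")
```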

It is clear that the prior distribution affects the model class selection results. Therefore, the choice of prior distribution is important for model class selection because it offers a reference for comparison in quantifying the information gained from the data. The prior distribution expresses how much previous experience or information a user has about a model class. A more informative prior distribution is used if the user has more experience with the model class, and the evidence of such a model class is increased by the lifting of the prior PDF. However, inappropriate prior information on the parameters will be penalized by a small value of the inner product of the prior distribution and the likelihood function. In general, it is more difficult to specify the prior distribution for empirical models, since the physical meaning of the parameters is not as obvious as for physical models. More investigation is needed to explore further in this direction. [Pg.251]
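For a one-parameter model this inner product can be written out explicitly: the evidence is the integral of likelihood times prior over the parameter. The sketch below (hypothetical data, hypothetical priors, one dimension only) shows how a badly placed or overly diffuse prior lowers the evidence relative to a well-placed informative prior.

```python
# Minimal one-parameter sketch: evidence as the integral of likelihood x prior over the parameter.
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

rng = np.random.default_rng(5)
data = rng.normal(loc=2.0, scale=1.0, size=20)      # hypothetical measurements

theta = np.linspace(-10, 10, 4001)                  # grid over the single model parameter
log_like = np.array([stats.norm.logpdf(data, loc=t, scale=1.0).sum() for t in theta])
likelihood = np.exp(log_like - log_like.max())      # rescaled for numerical stability

def evidence(prior_mean, prior_sd):
    # Relative evidence: values are comparable across priors because the same
    # likelihood rescaling is applied to all of them.
    prior = stats.norm.pdf(theta, loc=prior_mean, scale=prior_sd)
    return trapezoid(likelihood * prior, theta)

print("informative, well-placed prior :", evidence(2.0, 1.0))
print("informative, badly placed prior:", evidence(-5.0, 1.0))
print("diffuse prior                  :", evidence(0.0, 10.0))
```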

The selection of suitable distribution models - e.g. the Hjorth, SB-Johnson, or Weibull distribution - is based on empirical knowledge from reliability analysis within the automotive industry (Bracke, S. 2008; Bracke, S. & Haller, S. 2009a). [Pg.802]
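As an illustration of fitting one such candidate model, the sketch below fits a two-parameter Weibull distribution to hypothetical mileage-to-failure data by maximum likelihood; the data, units, and the B10 summary are assumptions made only for the example.

```python
# Minimal sketch: maximum-likelihood fit of a 2-parameter Weibull to hypothetical failure data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
failures_km = rng.weibull(a=1.8, size=50) * 120_000    # hypothetical mileage to failure

# floc=0 fixes the location parameter, giving the usual 2-parameter Weibull.
shape, loc, scale = stats.weibull_min.fit(failures_km, floc=0)
print(f"shape b = {shape:.2f}, characteristic life T = {scale:,.0f} km")

# B10 life: the mileage by which 10% of the population is expected to have failed.
b10 = stats.weibull_min.ppf(0.10, shape, loc=0, scale=scale)
print(f"B10 life = {b10:,.0f} km")
```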

Because corrosion phenomena are complex, deterministic models evolve continually as restrictive hypotheses are eased when additional empirical knowledge is acquired. In essence, it is the scientific method that nudges a model toward reality. Hybrid deterministic models have been developed in fracture and fatigue, where a particular property or parameter is considered to be statistically distributed. This statistical distribution is carefully chosen and applied to selected parameters in the deterministic model (a true deterministic model retains its probabilistic aspect only as a placeholder until the statistical scatter can be replaced with true mechanistic understanding). [Pg.90]

The characteristics of other distributions that have been applied to aerosol particle size, such as the Rosin-Rammler, Nukiyama-Tanasawa, power law, exponential, and Khrgian-Mazin distributions, are given in the appendix to this chapter. These distributions apply to special situations and find limited application in aerosol science. They (and the lognormal distribution) have been selected empirically to fit the wide range and skewed shape of most aerosol size distributions. [Pg.47]

Correlations of nucleation rates with crystallizer variables have been developed for a variety of systems. Although the correlations are empirical, a mechanistic hypothesis regarding nucleation can be helpful in selecting operating variables for inclusion in the model. Two examples are (1) the effect of slurry circulation rate on nucleation, which has been used to develop a correlation for nucleation rate based on the tip speed of the impeller (16), and (2) the scaleup of nucleation kinetics for sodium chloride crystallization, which provided an analysis of the role of mixing and mixer characteristics in contact nucleation (17). Published kinetic correlations have been reviewed through about 1979 (18). In a later section on population balances, simple power-law expressions are used to correlate nucleation rate data and describe the effect of nucleation on crystal size distribution. [Pg.343]
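The sketch below illustrates the form of such a simple power-law correlation, B = k G^i M_T^j, and how its constants can be estimated by linear regression on logarithms. The variables, units, and data are hypothetical and are not taken from the cited correlations.

```python
# Minimal sketch: fit a power-law nucleation correlation B = k * G**i * MT**j on log scale.
import numpy as np

rng = np.random.default_rng(7)
G = rng.uniform(1e-8, 1e-7, size=30)          # hypothetical growth rates, m/s
MT = rng.uniform(50, 300, size=30)            # hypothetical slurry densities, kg/m^3
B_obs = 1e12 * G**1.5 * MT**1.0 * rng.lognormal(sigma=0.1, size=30)  # "measured" rates

# log B = log k + i*log G + j*log MT  ->  ordinary least squares on the logarithms
A = np.column_stack([np.ones(30), np.log(G), np.log(MT)])
coef, *_ = np.linalg.lstsq(A, np.log(B_obs), rcond=None)
log_k, i_exp, j_exp = coef
print(f"k = {np.exp(log_k):.3e}, i = {i_exp:.2f}, j = {j_exp:.2f}")
```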

The solvent triangle classification method of Snyder is the most common approach to solvent characterization used by chromatographers (510,517). The solvent polarity index, P, and solvent selectivity factors, xi, which characterize the relative importance of orientation and proton donor/acceptor interactions to the total polarity, were based on Rohrschneider's compilation of experimental gas-liquid distribution constants for a number of test solutes in 75 common, volatile solvents. Snyder chose the solutes nitromethane, ethanol, and dioxane as probes for a solvent's orientation, proton acceptor, and proton donor capacity, respectively. The influences of solute molecular size, solute/solvent dispersion interactions, and solute/solvent induction interactions arising from solvent polarizability were subtracted from the experimental distribution constants by first multiplying the experimental distribution constant by the solvent molar volume and then referencing this quantity to the value calculated for a hypothetical n-alkane with a molar volume identical to that of the test solute. Each value was then corrected empirically to give a value of zero for the polar distribution constant of the test solutes for saturated hydrocarbon solvents. These residual values were supposed to arise from inductive and... [Pg.749]

Media were devised that supported consistent growth of the rhizobia, and mannitol often was favored as the energy source in the media. Yeast extract commonly was added to supply micronutrients. Selection of effective strains of the rhizobia was an empirical process based on greenhouse and field testing. Usually, several effective strains were grown on liquid media and then mixed on a solid support for distribution. [Pg.104]

Extensive literature has developed related to the preferential interaction of different solvents with proteins or peptides in bulk solution [56-58]. Similar concepts can be incorporated into descriptions of the RPC behavior of peptides and employed as part of the selection criteria for optimizing the separation of a particular peptide mixture. As noted previously, the dependency of the equilibrium association constant, Kassoc,i, of a peptide on the concentration of the solvent required for desorption in RPC can be empirically described [44] in terms of nonmechanistic, stoichiometric solvent displacement or preferential hydration models, whereby the mass distribution of a peptide P, with n nonpolar ligands, each of which is solvated with solvent molecules Da, is given by the following ... [Pg.562]

Traditionally toxicologists have used at least one rodent and nonrodent species for multidose toxicity studies. The use of two species is important for assessing potential variability of metabolism, for products with extensive distribution, and in cases where a relevant species has not been defined. The rat and dog are selected in most cases, usually on an empirical basis [2] without an open-minded consideration of whether alternate species might be better in terms of biochemistry and metabolism [1],... [Pg.54]

