Fitting Distributions to Data

In the problem of selecting a distribution for a 1D model of variation, there are two kinds of variables: (1) the data, which we know, and (2) the distribution parameters, which will be assigned values based on the data. Here we will often follow statistical terminology by using the term estimation (of parameters) instead of fitting. In statistical terminology, the values assigned to distribution parameters are termed estimates; the expressions used to compute estimates are estimators. [Pg.34]

1 Setting Parameters Equal to Statistics, Method of Moments (MOM) [Pg.34]

The most familiar estimation procedure is to assume that the population mean and variance are equal to the sample mean and variance. More generally, the method of moments (MOM) approach is to equate sample moments (mean, variance, skewness, and kurtosis) to the corresponding population moments. Software such as Crystal Ball (Oracle Corporation, Redwood Shores, CA) uses MOM to fit the gamma and beta distributions (see also Johnson et al. 1994). Use of higher moments is exemplified by fitting of the ... [Pg.34]
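As an illustration of the MOM idea, the sketch below fits a gamma distribution by equating the sample mean and variance to the population mean (shape × scale) and variance (shape × scale²). The function name `gamma_mom` and the sample values are hypothetical, and this is not claimed to be how any particular software package implements the method.

```python
import statistics

def gamma_mom(data):
    """Method-of-moments estimates (shape, scale) for a gamma distribution.

    Uses mean = shape * scale and variance = shape * scale**2.
    """
    m = statistics.fmean(data)
    v = statistics.variance(data)  # sample variance (n - 1 denominator)
    if m <= 0 or v <= 0:
        raise ValueError("gamma MOM needs a positive mean and variance")
    shape = m * m / v
    scale = v / m
    return shape, scale

sample = [2.0, 4.0, 4.0, 6.0]  # made-up data, for illustration only
shape, scale = gamma_mom(sample)
print(f"shape = {shape:.3f}, scale = {scale:.3f}")
```

By construction the fitted distribution reproduces the sample mean and variance exactly, which says nothing yet about how well it matches the rest of the data.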

The general strategy of equating parameters to statistics is of course not restricted to moments. Reliance on sample percentiles (e.g., sample median) can lead to estimators that are not excessively sensitive to outliers. In general, to fit a distribution with k parameters, k parameters must be equated to distinct sample statistics. [Pg.35]
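A minimal sketch of percentile matching, assuming a normal model with two parameters: the mean is equated to the sample median and the standard deviation is derived from the interquartile range. The helper `normal_from_quartiles` and the readings are hypothetical. The quartiles come from `statistics.quantiles` with its default interpolation, which is only one of several conventions.

```python
import statistics
from statistics import NormalDist

def normal_from_quartiles(data):
    """Estimate (mu, sigma) of a normal model from the median and quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    mu = statistics.median(data)
    # For a normal distribution, Q3 - Q1 = 2 * z_0.75 * sigma.
    z75 = NormalDist().inv_cdf(0.75)
    sigma = (q3 - q1) / (2.0 * z75)
    return mu, sigma

readings = [9.0, 10.0, 11.0, 10.5, 9.5, 10.2, 9.8, 50.0]  # made up; 50.0 is an outlier
mu, sigma = normal_from_quartiles(readings)
moment_sigma = statistics.stdev(readings)  # moment-based estimate, for comparison
print(f"percentile-based: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"moment-based sigma = {moment_sigma:.2f}")
```

The single outlier inflates the moment-based standard deviation strongly, while the quartile-based estimate barely moves.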

Fitting distributions to data using linear regression... [Pg.360]

When the underlying distribution is not known, tools such as histograms, probability curves, piecewise polynomial approximations, and general techniques are available to fit distributions to data. It may be necessary to assume an appropriate distribution in order to obtain the relevant parameters. Any assumptions made should be supported by manufacturer's data or data from the literature on similar items working in similar environments. Experience indicates that some probability distributions are more appropriate in certain situations than others. What follows is a brief overview on their applications in different environments. A more rigorous discussion of the statistics involved is provided in the CPQRA Guidelines. ... [Pg.230]

Leptokurtic distributions are more outlier-prone. When fitting distributions to data, it may sometimes be difficult to decide whether one should assume a leptokurtic distribution (say, a Student t distribution with relatively few degrees of freedom) or assume the presence of a few outliers. [Pg.33]

Having selected an appropriate data set, we must select a type of distribution and fit the distribution to the data, or else use an empirical or other nonparametric distribution. There appears to be some mechanistic basis for the log-normal distribution for environmental concentrations (Ott 1990, 1995). However, in a given situation there may not be very strong theoretical support for a specific type of distribution, log-normal or otherwise. Alternative distributions may need to be considered based on the quality of fit of the distribution to data. Therefore, it is desirable to have quantitative indices that can be used to compare or rank distributions based on agreement with data. The fit of the log-normal distribution (or whatever distributions we may choose) should be evaluated in particular situations, using graphical as well as statistical procedures. [Pg.31]
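One possible quantitative index for ranking candidate distributions is the largest gap between the empirical cumulative distribution and each fitted one (the Kolmogorov-Smirnov distance). The sketch below uses it to rank a normal against a log-normal fit. The choice of index, the function names, and the data are assumptions made for illustration. Because the parameters are estimated from the same data, the distance is used here only for ranking, not for a formal significance test.

```python
import math
from statistics import NormalDist

def ks_distance(data, cdf):
    """Largest gap between the empirical CDF of data and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the fitted CDF with the empirical step just above and below x.
        gap = max(gap, i / n - f, f - (i - 1) / n)
    return gap

def rank_fits(data):
    """Return (name, distance) pairs, best-agreeing distribution first."""
    normal = NormalDist.from_samples(data)
    log_normal = NormalDist.from_samples([math.log(x) for x in data])
    scores = {
        "normal": ks_distance(data, normal.cdf),
        "log-normal": ks_distance(data, lambda x: log_normal.cdf(math.log(x))),
    }
    return sorted(scores.items(), key=lambda item: item[1])

# Made-up, right-skewed "concentrations" (all values must be positive).
concentrations = [math.exp(v) for v in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)]
ranking = rank_fits(concentrations)
for name, distance in ranking:
    print(f"{name}: {distance:.3f}")
```

A graphical check, such as a probability plot, would normally accompany an index of this kind.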

Disadvantages arise mainly from the complexity of the statistical algorithms and the fact that fitting models to data is time consuming. The first-order (FO) method used in NONMEM also results in biased estimates of parameters, especially when the distribution of interindividual variability is specified incorrectly. The first-order conditional estimation (FOCE) procedure is more accurate but is even more time consuming. The objective function and adequacy of the model are based in part on the residuals, which for NONMEM are determined based on the predicted concentrations for the mean pharmacokinetic parameters rather than on the predicted concentrations for each individual. Therefore, the residuals are confounded by intraindividual, interindividual, and linearization errors. [Pg.134]

It would seem better to transform chemisorption isotherms into corresponding site energy distributions in the manner reviewed in Section XVII-14 than to make choices of analytical convenience regarding the f(Q) function. The second procedure tends to give equations whose fit to data is empirical and deductions from which can be spurious. [Pg.700]

Many distributions occurring in business situations are not symmetric but skewed, and the normal distribution curve is not a good fit. However, when data are based on estimates of future trends, the accuracy of the normal approximation is usually acceptable. This is particularly the case as the number of component variables x1, x2, etc., in Eq. (9-74) increases. Although distributions of the individual variables (x1, x2, etc.) may be skewed, the distribution of the property or variable c tends to approach the normal distribution. [Pg.822]
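The tendency toward normality can be checked with a small simulation. Eq. (9-74) is not reproduced here, so the sketch assumes the simplest case: the combined variable is a plain sum of independent, identically distributed, strongly skewed (exponential) components. The function names, the seed, and the trial count are arbitrary choices.

```python
import random
import statistics

def skewness(values):
    """Population skewness: mean of the cubed standardized deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def skew_of_sum(n_components, n_trials=20000, seed=0):
    """Skewness of a sum of n_components independent exponential variables."""
    rng = random.Random(seed)
    totals = [
        sum(rng.expovariate(1.0) for _ in range(n_components))
        for _ in range(n_trials)
    ]
    return skewness(totals)

for n in (1, 5, 30):
    print(n, round(skew_of_sum(n), 2))
```

For this particular case the theoretical skewness of the sum is 2/sqrt(n), so it shrinks toward the zero skewness of a normal distribution as components are added.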

The estimation of the mean and standard deviation using the moment equations as described in Appendix I gives little indication of the degree of fit of the distribution to the set of experimental data. We will next develop the concepts from which any continuous distribution can be modelled to a set of data. This ultimately provides the most suitable way of determining the distributional parameters. [Pg.140]

The practical utilization of linear rectification is demonstrated later through a worked example. Fitting statistical distributions to sample data using the linear rectification method can be found in Ayyub and McCuen (1997), Edwards and McKee (1991), Kottegoda and Rosso (1997), Leitch (1995), Lewis (1996), Metcalfe (1997), Mischke (1992), Rao (1992), and Shigley and Mischke (1989). [Pg.143]
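A minimal sketch of linear rectification for a normal distribution: the sorted data are plotted against standard normal quantiles of their plotting positions, and a least-squares line gives the mean (intercept) and standard deviation (slope). The plotting position (i - 0.5)/n is an assumption; the cited references may use a different rank estimator. The function name and the measurements are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def fit_normal_by_rectification(data):
    """Return (mu, sigma, r) from a least-squares line on a normal probability plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # Standard normal quantile for each ranked point (assumed plotting position).
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    z_bar, x_bar = fmean(z), fmean(xs)
    szz = sum((zi - z_bar) ** 2 for zi in z)
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, xs))
    sigma = szx / szz              # slope of x against z
    mu = x_bar - sigma * z_bar     # intercept at z = 0
    r = szx / math.sqrt(szz * sxx) # correlation coefficient of the straight line
    return mu, sigma, r

measurements = [9.8, 10.4, 10.1, 9.5, 10.9, 10.0, 9.7, 10.3]  # made-up data
mu, sigma, r = fit_normal_by_rectification(measurements)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}, r = {r:.4f}")
```

Unlike the moment equations alone, the correlation coefficient of the rectified line gives a rough indication of how well the chosen distribution follows the data.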

Example - fitting a Normal distribution to a set of existing data... [Pg.145]

In this case σy is 419 m. The peak concentration can be found from the measurements, or from the Gaussian distribution fitted to the data and the peak concentration obtained from the fitted distribution. Provided that the emission rate Q, the height of release H, and the mean wind speed u are known, the standard deviation of the vertical distribution of the pollutant can be approximated from either the peak concentration (actual or fitted) or the cross wind integrated (CWI) concentration from one of the following equations ... [Pg.314]

... minimal cut failure data. Mathematical combination of uncertainties; output includes two moments of minimal cutsets and the top event. Johnson, empirical ... multiple system ... with multiple data input descriptions; can fit Johnson-type distribution to the top event ... [Pg.132]

For any distribution, the cumulative hazard function and the cumulative distribution function are connected by a simple relationship. The probability scale for the cumulative distribution function appears on the horizontal axis at the top of hazard paper and is determined from that relationship. Thus, the line fitted to data on hazard paper... [Pg.1050]
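The relationship in question is F(t) = 1 - exp(-H(t)), where H is the cumulative hazard and F the cumulative distribution function. The sketch below applies it to hazard-plot points built from reverse ranks. The helper names and the observations are hypothetical, and the reverse-rank construction is one common way of building such points, not necessarily the one used in the source.

```python
import math

def cdf_from_cum_hazard(h):
    """Cumulative probability F corresponding to cumulative hazard H."""
    return 1.0 - math.exp(-h)

def cum_hazard_from_cdf(p):
    """Cumulative hazard H corresponding to cumulative probability F (F < 1)."""
    return -math.log(1.0 - p)

def hazard_plot_points(observations):
    """Build (time, cumulative hazard, probability) points for failures.

    observations: (time, failed) pairs; failed=False marks a censored unit.
    """
    ordered = sorted(observations)
    n = len(ordered)
    points, cum = [], 0.0
    for rank, (t, failed) in enumerate(ordered):
        reverse_rank = n - rank  # units still at risk at this time
        if failed:
            cum += 1.0 / reverse_rank
            points.append((t, cum, cdf_from_cum_hazard(cum)))
    return points

# Made-up observations: two failures and one censored unit.
points = hazard_plot_points([(10.0, True), (20.0, False), (30.0, True)])
for t, h, p in points:
    print(f"t = {t}: H = {h:.3f}, F = {p:.3f}")
```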

It is important to note that the fitting according to eq. (1) requires zero-intercept behavior, i.e., F = 0.00 for H (for which σI = σR = 0.00). While we recognize that the data for the unsubstituted (H) member of a set may be as subject to experimental error as any other member, such error is generally relatively small for a set of reliable data. Any constant error from this source will be distributed among all of the substituents in such a manner as to achieve best fit. Any loss in precision of fitting of the set which may result by such a procedure we believe is a small price to pay compared to the violence done by introduction in eq. (1) of a completely variable constant parameter. The latter procedure has been utilized by other authors both in treatments by the simple Hammett equation and by the dual substituent parameter equation. [Pg.512]

This test is used to judge both the similarity of two distributions and the fit of a model to data. [Pg.76]

Figure 9. Data reduction and data analysis in EXAFS spectroscopy. (A) EXAFS spectrum χ(k) versus k after background removal. (B) The solid curve is the weighted EXAFS spectrum k³χ(k) versus k (after multiplying χ(k) by k³). The dashed curve represents an attempt to fit the data with a two-distance model by the curve-fitting (CF) technique. (C) Fourier transformation (FT) of the weighted EXAFS spectrum in momentum (k) space into the radial distribution function ρ3(r) versus r in distance space. The dashed curve is the window function used to filter the major peak in Fourier filtering (FF). (D) Fourier-filtered EXAFS spectrum k³χ(k) versus k (solid curve) of the major peak in (C) after back-transforming into k space. The dashed curve attempts to fit the filtered data with a single-distance model. (From Ref. 25, with permission.)...

The model suffers from the fact that it allows only integer values of n and that it may not be possible to obtain a match of the residence time distribution function at both high and low values of F(t) with the same value of n. Buffham and Gibilaro (16) have generalized the model to include noninteger values of n. The technique outlined by these individuals is particularly useful in obtaining better fits of the data for cases where n is less than 5. [Pg.407]

For a layer-stack material like polyethylene or other semicrystalline polymers the IDF presents clear hints on the shape of the layer thickness distributions, the range of order, and the complexity of the stacking topology. Based on these findings inappropriate models for the arrangement of the layers can be excluded. Finally the remaining suitable models can be formulated and tested by trying to fit the experimental data. [Pg.165]

Figure 3.1 shows the appearance of dihydromethysticin in the acceptor well as a function of time [15]. The solid curve is a least-squares fit of the data points to Eq. (1), with the parameters Pe = 32 × 10⁻⁶ cm s⁻¹, R = 0.42, and t s = 35 min. The membrane retention, R, is often stated as a mole percentage (%R) of the sample (rather than a fraction). Its value can at times be very high: up to 90% for chlorpromazine and 70% for phenazopyridine, when 2% wt/vol DOPC in dodecane is used. Figure 3.2 shows a plot of log %R versus log Ka(7.4), the octanol/water apparent partition coefficient. It appears that retention is due to the lipophilicity of molecules; this may be a good predictor of the pharmacokinetic volume of distribution or of protein binding. [Pg.50]

Figure 28a shows the result of SAXS on sample Br1000. We used Guinier's formula (see eq. 6) for the small angle scattering intensity, I(k), from randomly located voids with radius of gyration, Rg. Although Guinier's equation assumes a random distribution of pores with a homogeneous pore size, it fits our experimental data well. The slope of the solid line in Fig. 28b gives Rg = 5.5 Å and this value has been used for the calculated curve in Fig. 28a. This suggests a relatively narrow pore-size distribution with an equivalent spherical pore diameter of about 14 Å. Similar results were found for the other heated resin samples, except that the mean pore diameter changed from about 12 Å for samples made at 700°C to about 15 Å for samples made at 1100°C.
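The step from the fitted slope to a pore diameter can be sketched as follows. Eq. 6 is not reproduced here, so the code assumes the standard Guinier form I(k) = I0 exp(-k² Rg² / 3), for which a plot of ln I against k² has slope -Rg²/3, together with the relation Rg² = (3/5) R² for a homogeneous sphere of radius R. The function names are hypothetical, and the intensities are synthetic values generated from Rg = 5.5 Å, not measured data.

```python
import math

def guinier_radius(k, intensity):
    """Radius of gyration from the least-squares slope of ln(I) against k**2."""
    x = [ki ** 2 for ki in k]
    y = [math.log(i) for i in intensity]
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("Guinier fit needs a negative slope")
    return math.sqrt(-3.0 * slope)

def equivalent_sphere_diameter(rg):
    """Diameter of the homogeneous sphere with radius of gyration rg."""
    return 2.0 * math.sqrt(5.0 / 3.0) * rg

# Synthetic intensities from the assumed Guinier form with Rg = 5.5 (angstroms).
k_values = [0.02, 0.04, 0.06, 0.08, 0.10]  # inverse angstroms
intensities = [math.exp(-(k ** 2) * 5.5 ** 2 / 3.0) for k in k_values]

rg = guinier_radius(k_values, intensities)
diameter = equivalent_sphere_diameter(rg)
print(f"Rg = {rg:.2f} A, equivalent sphere diameter = {diameter:.1f} A")
```

Under these assumptions a radius of gyration of 5.5 Å corresponds to a sphere diameter of roughly 14 Å, in line with the value quoted in the text.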

Big Chemical Encyclopedia
