Estimation of Mean and Variance

The mean μ and the variance of a random variable are constants characterizing the random variable's average value and dispersion, respectively, about its mean. The mean and variance can also be derived from the probability distribution function (pdf)—to be discussed shortly—of the random variable. If the pdf is unknown, however, the mean and the variance can be estimated on the basis of a random sample of some, but not all, observations on the random variable. [Pg.354]

Let X1, X2, ..., Xn denote a random sample of n observations on X. Then the sample mean X̄ is defined by X̄ = (X1 + X2 + ... + Xn)/n. [Pg.354]

In the case of a random sample of observations on a continuous random variable assumed to have a so-called normal pdf, the graph of which is a bell-shaped curve, the following statements give a more precise interpretation of the sample standard deviation s as a measure of spread or dispersion. [Pg.355]

The source of these percentages is the normal probability distribution, which is discussed in more detail later in this chapter. [Pg.355]

Chebyshev's theorem provides an interpretation of the sample standard deviation (the positive square root of the sample variance) as a measure of the spread (dispersion) of sample observations about their mean. Chebyshev's theorem states that, for k > 1, at least (1 - 1/k²) of the sample observations lie in the interval (X̄ - ks, X̄ + ks). For k = 2, e.g., this means that at least 75% of the sample observations lie in the interval (X̄ - 2s, X̄ + 2s). The smaller the value of s, the greater the concentration of observations in the vicinity of X̄. [Pg.355]
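A quick numerical illustration of the theorem may help. The sketch below (not from the original text) draws an arbitrary, deliberately non-normal sample and checks that the observed fraction of points within k sample standard deviations of the sample mean meets the Chebyshev bound; the seed and distribution are purely hypothetical choices.

    import numpy as np

    # Illustrative check of Chebyshev's theorem on a deliberately non-normal,
    # purely hypothetical sample.
    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=200)

    x_bar = x.mean()
    s = x.std(ddof=1)          # sample standard deviation

    for k in (2, 3):
        frac = np.mean(np.abs(x - x_bar) < k * s)   # fraction within k*s of the mean
        bound = 1 - 1 / k**2                        # Chebyshev lower bound
        print(f"k={k}: observed {frac:.1%}, Chebyshev guarantees at least {bound:.0%}")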

... the expected value of X̄ is μ and that the expected value of S² is σ². Because of this, X̄ and S² are called unbiased estimators of μ and σ², respectively. [Pg.562]

Chebyshev's theorem provides an interpretation of the sample standard deviation, the positive square root of the sample variance, as a measure of the spread (dispersion) of sample observations about their mean. Chebyshev's theorem states that at least (1 - 1/k²), k > 1, of the sample observations lie in the interval (X̄ - ks, X̄ + ks). [Pg.563]


FIGURE 5 Relationship between sampling rate and effective resolution of process capability assessment. The curve is based on the width of the confidence intervals for estimation of mean and variance. The relationship shown does not consider the effect of reference measurement precision, which would further reduce the ability to discern changes in process capability. [Pg.323]

To find the median, the data provided must first be arranged in order of magnitude, such as... [Pg.357]

Polymerizations. Polymerizations were performed in solution with a 0.5-L continuous stirred tank reactor; this apparatus provided polymers of constant composition. After steady-state operation was obtained (approximately three residence times, see Figure 1), 10-mL samples were periodically taken from the effluent, added to 200 µL of a hydroquinone solution, and stored at 10 °C. These samples were subsequently analyzed by HPLC to estimate the mean and variance of the residual monomer concentration and copolymer composition. The polymerization temperatures were 45 and 60 °C for the dimethylamines and 50 °C for DADMAC. The initial monomer concentration was 0.5 mol L⁻¹ and the monomer feed ratio was varied between 0.3 and 0.7. Azocyanovaleric acid (ACV, Wako Chemical Co.) and potassium persulfate (KPS, BDH Chemicals) were used to initiate the reaction. The solution was agitated at 300 ± 1 rpm for the duration of the polymerization. [Pg.177]

Table 2 shows tolerance limits for the selected point in the Pareto front, obtained from these assumptions on the input vector distribution, in addition to further tolerance intervals obtained under the assumption of normal independent distributions on the initial parameters, where every initial parameter follows a normal distribution with its point estimate as mean and variance equal to its corresponding element in the principal diagonal of the covariance matrix in Eq. (17). [Pg.483]

Using this Equation 3.308 and the values of mean and variance calculated from the E-curve data, the value of the model parameter N is estimated for the given reaction vessel. It may be seen from Equation 3.308 that the variance is minimum (σ² = 0) for an ideal PFR (N = ∞) and is maximum (σ² = 1) for an ideal CSTR (N = 1). Although N is defined as a whole number, that is, an integer, it can also take a fractional value. [Pg.216]
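Equation 3.308 is not reproduced in this excerpt; the sketch below assumes it is the usual tanks-in-series relation N = t̄²/σ², with the mean residence time and variance obtained from the E-curve by trapezoidal integration. The tabulated t and E(t) values are invented purely for illustration.

    import numpy as np

    # Hypothetical tracer-response (E-curve) data: time in minutes, E(t) in 1/min.
    t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0])
    E = np.array([0.0, 0.02, 0.09, 0.16, 0.18, 0.16, 0.12, 0.05, 0.01])

    area   = np.trapz(E, t)                            # ~1 for a normalized E-curve
    t_bar  = np.trapz(t * E, t) / area                 # mean residence time
    sigma2 = np.trapz((t - t_bar) ** 2 * E, t) / area  # variance of the RTD

    # Assumed tanks-in-series relation: dimensionless variance = 1/N.
    N = t_bar**2 / sigma2                              # may come out fractional
    print(f"t_bar = {t_bar:.2f} min, sigma2 = {sigma2:.2f} min^2, N = {N:.2f}")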

There are many different ways to treat uncertainty mathematically, but the most common approach used is probability analysis. It consists of assuming that each uncertain parameter is treated as a random variable characterised by a standard probability distribution. This means that structural problems must be solved by knowing the multi-dimensional joint probability density function of all involved parameters. Nevertheless, this approach may present serious analytical and numerical difficulties. It must also be noted that it has some conceptual limitations: the complete stochastic characterization of the uncertainty parameters presents a fundamental limitation related to the difficulty or impossibility of a complete statistical analysis. The approach cannot be considered economical or practical in many real situations, characterized by the absence of sufficient statistical data. In such cases, a commonly used simplification is to assume that all variables have independent normal or lognormal probability distributions, as an application of the central limit theorem, which anyway does not overcome the previous problem. On the other hand, the approach is quite usual in real situations where it is only possible to estimate the mean and variance of each uncertainty parameter, it not being possible to have more information about their real probability distribution. The case is treated by assuming that all uncertainty parameters, collected in the vector d, are characterised by a nominal mean value μ_dj and a correlation ... In this specific... [Pg.535]

Statistical data analysis of operation time till failure shows that the operation time till failure T, as a random variable, follows a Weibull distribution (according to the performed goodness-of-fit tests). The parameters k and β are assumed to be independent random variables with prior probability density functions: p1(x), a gamma pdf with mean value equal to the prior (DPSIA) estimate of k and variance equal to 10% of the estimate value; p2(y), an inverse gamma pdf (as conjugate prior (Bernardo et al., 2003; Berthold et al., 2003)) with mean value equal to the prior (DPSIA) estimate of β and variance equal to 10% of the estimate value. Failure data are tj, j = 1, 2, ..., 28. Thus, the likelihood function is... [Pg.421]

It is possible to compare the means of two relatively small sets of observations when the variances within the sets can be regarded as the same, as indicated by the F test. One can consider the distribution involving estimates of the true variance. With s1² determined from a group of N1 observations and s2² from a second group of N2 observations, the distribution of the ratio of the sample variances is given by the F statistic... [Pg.204]
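A minimal sketch of the variance-ratio calculation, using two invented groups of observations and SciPy's F distribution; the two-sided p-value convention shown is one common choice, not necessarily the one used in the source.

    import numpy as np
    from scipy import stats

    # Hypothetical measurement groups (illustrative data only).
    g1 = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.3])
    g2 = np.array([10.0, 10.4, 9.7, 10.2, 10.6, 9.8, 10.1])

    s1_sq = g1.var(ddof=1)                 # sample variance of group 1
    s2_sq = g2.var(ddof=1)                 # sample variance of group 2
    F = s1_sq / s2_sq                      # ratio of sample variances
    df1, df2 = len(g1) - 1, len(g2) - 1

    # Two-sided p-value from the F distribution (one common convention).
    p = 2 * min(stats.f.cdf(F, df1, df2), stats.f.sf(F, df1, df2))
    print(f"F = {F:.2f} (df = {df1}, {df2}), p = {p:.3f}")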

The statistical measures can be calculated using most scientific calculators, but confusion can arise if the calculator offers the choice between dividing the sum of squares by N or by N - 1. If the object is simply to calculate the variance of a set of data, divide by N. If, on the other hand, a sample set of data is being used to estimate the properties of a supposed population, division of the sum of squares by N - 1 gives a better estimate of the population variance. The reason is that the sample mean is unlikely to coincide exactly with the (unknown) true population mean, and so the sum of squares about the sample mean will be less than the true sum of squares about the population mean. This is compensated for by using the divisor N - 1. Obviously, this becomes important with smaller samples. [Pg.278]
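The same distinction appears in NumPy as the ddof argument; a short sketch with an arbitrary five-point sample:

    import numpy as np

    x = np.array([4.1, 3.9, 4.3, 4.0, 4.2])   # hypothetical sample

    var_n   = x.var(ddof=0)   # divide by N: describes the data themselves
    var_nm1 = x.var(ddof=1)   # divide by N - 1: unbiased estimate of the population variance
    print(var_n, var_nm1)     # the N - 1 version is slightly larger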

A brief digression. In the language of statistics, the results for each of the stepped distributions in Figure 10-1 constitute a sample of the population that is distributed according to the continuous curve for the universe. A sample thus contains a limited number of x's taken from the universe that contains all possible x's. All simple frequency distributions are characterized by a mean and a variance. (The square root of the variance is the standard deviation.) For the population, the mean is μ and the variance is σ². For any sample, the mean is x̄ and the (estimate of) variance is s². Now, x̄ and s² for any sample can never be as reliable as μ and σ² because no sample can contain the entire population; x̄ and s² are therefore only the experimental estimates of μ and σ². In all that follows, we shall be concerned only with these estimates; for simplicity's sake, we shall call s² the variance. We have already met s—for example, at the foot of Table 7-4. [Pg.268]

The mean and variance of the difference between the collagen value given in the table and the calculated collagen value is determined for all 14 diets for each trial combination of dP, dN and ω, and the best values for dP, dN and ω are chosen to minimize both the mean and the variance. These values turn out to be dP = +5, dN = +2 and ω = -0.75. Figure A11.1 shows a plot of the difference between the estimated and calculated collagen values for each diet for this particular DIFF, and it can be seen that, except for one point, the others are correctly estimated to within 1 or 1.5‰. [Pg.238]

It would be of obvious interest to have a theoretically underpinned function that describes the observed frequency distribution shown in Fig. 1.9. A number of such distributions (symmetrical or skewed) are described in the statistical literature in full mathematical detail; apart from the normal and the t-distributions, none is used in analytical chemistry except under very special circumstances, e.g. the Poisson and the binomial distributions. Instrumental methods of analysis that have Poisson-distributed noise are optical and mass spectroscopy, for instance. For an introduction to parameter estimation under conditions of linked mean and variance, see Ref. 41. [Pg.29]

The quantities AUMC and AUSC can be regarded as the first and second statistical moments of the plasma concentration curve. These two moments have an equivalent in descriptive statistics, where they define the mean and variance, respectively, in the case of a stochastic distribution of frequencies (Section 3.2). From the above considerations it appears that the statistical moment method strongly depends on numerical integration of the plasma concentration curve Cp(t) and its product with t and (t - MRT)². Multiplication by t and (t - MRT)² tends to amplify the errors in the plasma concentration Cp(t) at larger values of t. As a consequence, the estimation of the statistical moments critically depends on the precision of the measurement process that is used in the determination of the plasma concentration values. This contrasts with compartmental analysis, where the parameters of the model are estimated by means of least squares regression. [Pg.498]
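A rough sketch of these moment calculations, using the trapezoidal rule on hypothetical concentration-time data; a real pharmacokinetic analysis would also extrapolate the tail beyond the last sampling time, which is omitted here.

    import numpy as np

    # Hypothetical plasma concentration data: time in hours, Cp in mg/L.
    t  = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 12.0])
    Cp = np.array([0.0, 8.2, 10.1, 8.5, 5.0, 2.9, 1.7, 0.6])

    auc  = np.trapz(Cp, t)                          # zeroth moment (AUC)
    aumc = np.trapz(t * Cp, t)                      # first moment (AUMC)
    mrt  = aumc / auc                               # mean residence time
    vrt  = np.trapz((t - mrt) ** 2 * Cp, t) / auc   # second central moment (from AUSC)
    print(f"AUC = {auc:.1f}, MRT = {mrt:.2f} h, VRT = {vrt:.2f} h^2")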

This choice of Qi yields maximum likelihood estimates of the parameters if the error terms in each response variable and for each experiment (e_ij, i = 1,...,N; j = 1,...,m) are all identically and independently distributed (i.i.d.) normally with zero mean and variance σ². Namely, E(e_i) = 0 and COV(e_i) = σ²I, where I is the m×m identity matrix. [Pg.26]

One must note that probability alone can only detect alikeness in special cases; thus cause-effect cannot be directly determined - only estimated. If linear regression is to be used for comparison of X and Y, one must assess whether the five assumptions for use of regression apply. As a refresher, recall that the assumptions required for the application of linear regression for comparisons of X and Y include the following: (1) the errors (variations) are independent of the magnitudes of X or Y; (2) the error distributions for both X and Y are known to be normally distributed (Gaussian); (3) the mean and variance of Y depend solely upon the absolute value of X; (4) the mean of each Y distribution is a straight-line function of X; and (5) the variance of X is zero, while the variance of Y is exactly the same for all values of X. [Pg.380]

Let us consider the system of g overmeasured (redundant) variables in m balance equations. Assuming that all of the errors are normally distributed with zero mean and variance Σ, it has been shown that the least squares estimate of the measurement errors is given by the solution of the following problem ... [Pg.133]

Let x1, x2, ..., xn be a random sample of N observations from an unknown distribution with mean μ and variance σ². It can be demonstrated that the sample variance V, given by equation A.8, is an unbiased estimator of the population variance σ². [Pg.279]

The estimation of means, variances, and covariances of random variables from the sample data is called point estimation, because one value for each parameter is obtained. By contrast, interval estimation establishes confidence intervals from sampling. [Pg.280]

Now consider a small sample (n = 9, say) drawn from an infinite population. The responses in this sample can be used to calculate the sample mean, ȳ, and the sample variance, s². It is highly improbable that the sample mean will equal exactly the population mean (ȳ = μ), or that the sample variance will equal exactly the population variance (s² = σ²). It is true that the sample mean will be approximately equal to the population mean, and that the sample variance will be approximately equal to the population variance. It is also true (as would be expected) that as the number of responses in the sample increases, the closer the sample mean approximates the population mean, and the closer the sample variance approximates the population variance. The sample mean, ȳ, is said to be an estimate of the population mean, μ, and the sample variance, s², is said to be an estimate of the population variance,... [Pg.52]
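A small numerical illustration of this convergence, assuming a normal population with arbitrarily chosen μ = 10 and σ = 2:

    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma = 10.0, 2.0          # assumed (normally unknown) population parameters

    for n in (9, 100, 10000):
        y = rng.normal(mu, sigma, size=n)
        # Sample mean and variance drift toward mu = 10 and sigma**2 = 4 as n grows.
        print(n, round(y.mean(), 3), round(y.var(ddof=1), 3))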

A function applied in statistics to predict the relative distribution of the frequency of occurrence of a continuous random variable (i.e., a quantity that may have a range of values which cannot be individually predicted with certainty but can be described probabilistically), from which the mean and variance can be estimated. [Pg.572]

One uses ANOVA when comparing differences between three or more means. For two samples, the one-way ANOVA is the equivalent of the two-sample (unpaired) t test. The basic assumptions are: (a) within each sample, the values are independent and identically normally distributed (i.e., they have the same mean and variance); (b) samples are independent of each other; (c) the different samples are all assumed to come from populations having the same variance, thereby allowing for a pooled estimate of the variance; and (d) for a multiple comparisons test of the sample means to be meaningful, the populations are viewed as fixed, meaning that the populations in the experiment include all those of interest. [Pg.652]
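A minimal SciPy sketch with three invented groups; the second part simply confirms the stated two-sample equivalence, where the one-way ANOVA F statistic equals the square of the unpaired t statistic.

    import numpy as np
    from scipy import stats

    # Hypothetical measurements from three independent samples.
    a = [5.1, 5.3, 4.9, 5.2]
    b = [5.6, 5.8, 5.5, 5.7]
    c = [5.0, 5.2, 5.1, 4.8]

    F, p = stats.f_oneway(a, b, c)        # one-way ANOVA across the three groups
    print(f"F = {F:.2f}, p = {p:.4f}")

    # With only two groups, one-way ANOVA reproduces the unpaired t test: F = t**2.
    t_stat, p_t = stats.ttest_ind(a, b)
    F2, p_F2 = stats.f_oneway(a, b)
    print(np.isclose(t_stat**2, F2), np.isclose(p_t, p_F2))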

The most familiar estimation procedure is to assume that the population mean and variance are equal to the sample mean and variance. More generally, the method of moments (MOM) approach is to equate sample moments (mean, variance, skewness, and kurtosis) to the corresponding population moments. Software such as Crystal Ball (Oracle Corporation, Redwood Shores, CA) uses MOM to fit the gamma and beta distributions (see also Johnson et al. 1994). Use of higher moments is exemplified by fitting of the... [Pg.34]
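For the gamma distribution with shape k and scale θ, the mean is kθ and the variance is kθ², so matching the first two sample moments gives the estimates θ = s²/x̄ and k = x̄²/s². A sketch with a simulated sample (the true parameter values are arbitrary assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.gamma(shape=2.5, scale=1.2, size=500)   # hypothetical sample

    m = x.mean()
    v = x.var(ddof=1)
    theta_hat = v / m        # method-of-moments scale estimate
    k_hat = m**2 / v         # method-of-moments shape estimate
    print(f"k_hat = {k_hat:.2f}, theta_hat = {theta_hat:.2f}")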

Figure 21.1 Cumulative frequency plot illustrating the 25th and 75th percentiles and the interquartile range. ... estimate of the population variance. Population means and variances are by convention denoted by the Greek letters μ and σ², respectively, while the corresponding sample parameters are denoted by x̄ and s².
When we are dealing with samples rather than populations, we cannot use the standard normal deviate, Z, to make predictions, since this requires knowledge of the population mean and variance or standard deviation. In general, we do not know the value of these parameters. However, provided the sample is a random one, its mean x̄ is a reliable estimate of the population mean μ, and we can use the central limit theorem to provide an estimate of σ. This estimate, known as the standard error of the mean, is given by ...
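The estimate referred to is s/√n; a short sketch with hypothetical data:

    import numpy as np

    x = np.array([2.3, 2.5, 2.1, 2.4, 2.6, 2.2])    # hypothetical sample
    sem = x.std(ddof=1) / np.sqrt(len(x))            # standard error of the mean
    print(sem)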

