Big Chemical Encyclopedia


The Normal Distribution

Consider the situation in which a chemist randomly samples a bin of pharmaceutical granules by taking n aliquots of equal, convenient size. Chemical analysis is then performed on each aliquot to determine the concentration (percent by weight) of pseudoephedrine hydrochloride. In this example, the measurement of concentration is referred to as a continuous random variable, as opposed to a discrete random variable. Discrete random variables include counted or enumerated items like the roll of a pair of dice. In chemistry we are interested primarily in the measurement of continuous properties and limit our discussion to continuous random variables. [Pg.43]

A probability distribution function for a continuous random variable, denoted by f(x), describes how the frequency of repeated measurements is distributed over the range of observed values for the measurement. When considering the probability distribution of a continuous random variable, we can imagine that a set of such measurements will lie within a specific interval. The area under the curve of a graph of a probability distribution for a selected interval gives the probability that a measurement will take on a value in that interval. [Pg.43]
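As a minimal numerical sketch of this idea (the function names and interval are illustrative, not from the text), the probability that a measurement falls in an interval is the area under f(x) over that interval:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Density f(x) of the normal distribution N(mu, sigma^2)
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def interval_probability(a, b, mu=0.0, sigma=1.0, steps=10_000):
    # Area under f(x) between a and b, by the trapezoidal rule;
    # this area is the probability that a measurement lies in [a, b]
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    total += sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    return total * h

# Probability of falling within one standard deviation of the mean:
p = interval_probability(-1.0, 1.0)
```

The result, about 0.6827, is the familiar "68% within one sigma" figure discussed later in this article.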

FIGURE 3.1 Distribution curves (a) normal and (b) standard normal. [Pg.44]

The highest point in the curve is represented by the mean because the measurements tend to cluster around some central or average value. Small deviations from the mean are more likely than large deviations, thus the curve is highest at the mean, and the tails of the curve asymptotically approach zero as the axes extend to infinity in both directions. The shape of the curve is symmetrical because negative deviations from the mean value are just as likely as positive deviations. [Pg.44]

In this example, the normal distribution for pseudoephedrine hydrochloride can be described as x = N(μ, σ²), where σ² is termed the variance. When sampling an infinite population, as is the case in this example, it is impossible to determine the true population mean, μ, and standard deviation, σ. A reasonable, more feasible [Pg.44]

To obtain the cumulative undersize function, we must integrate this  [Pg.42]

For this distribution, the mean particle size (x̄), the median, and the mode are all equal, and the spread is σ. In this purely mathematical distribution, x can take on negative values, which is not physically meaningful. [Pg.42]

The normal distribution is expected when completely random, continuous data is collected from a population. The normal distribution is commonly referred to as the bell-shaped curve because of its shape, which resembles the outline of a bell. It is important for safety managers to become familiar with the normal distribution because the data they collect will most often be compared with this distribution. This comparison then may lead to conclusions about the population. [Pg.28]

The normal distribution has three main properties (Kuzma 1992, 81). First, it has the appearance of a bell-shaped curve that extends infinitely in both directions. Second, it is symmetrical about the mean. Finally, the number of cases expected to be found between points follows a few specific percentages. For example, the entire curve represents 100% of the population. One would expect to find approximately 34% of the subjects in a population to obtain a score between the mean and one standard deviation. One would expect to find approximately 47% of the subjects in a population to [Pg.28]

In order to distinguish between quantities measured in the sample and corresponding quantities in the population we use different symbols  [Pg.29]

The mean in the sample is denoted x̄. The mean in the population is denoted μ. [Pg.29]

Remember, x̄ and s are quantities that we calculate from our data, while μ and σ are theoretical quantities (parameters) that are unknown to us but nonetheless exist in the context of the broader population from which the sample (and therefore the data) is taken. If we had access to every single subject in the population then, yes, we could compute μ, but this is never going to be the case. We can also think of μ and σ as the true mean and true standard deviation, respectively, in the complete population. [Pg.29]
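The distinction can be sketched in a few lines of code. Here the "true" μ and σ are chosen artificially so we can watch x̄ and s estimate them; in real work only the sample would be available:

```python
import math
import random

random.seed(42)

# Hypothetical "true" population parameters (unknowable in practice)
MU, SIGMA = 100.0, 15.0

# The sample we *can* observe:
sample = [random.gauss(MU, SIGMA) for _ in range(5000)]

# Sample statistics: x_bar estimates mu, s estimates sigma
x_bar = sum(sample) / len(sample)
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (len(sample) - 1))
```

With 5000 observations, x̄ and s land close to 100 and 15, but they remain estimates, not the parameters themselves.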

The calculation of mean and standard deviation only really makes sense when we are dealing with continuous, score or count data. These quantities have little relevance when we are looking at binary or ordinal data. In these situations we would tend to use proportions in the various categories as our summary statistics and population parameters of interest. [Pg.29]

De Moivre was the first to show (in 1733) that the following equation closely represents experimental bell-shaped normal curves  [Pg.378]

The mathematics associated with the normal distribution was independently developed by Gauss and Laplace at about the same time. [Pg.378] The normal distribution is sometimes called the Gaussian distribution and sometimes the Law of Errors. [Pg.379]

The derivations of Eqs. (14.5) and (14.6) and Table 14.1 are beyond the scope of this presentation, but may be found in many texts concerned with statistics. The results presented here merely illustrate the nature of the topic and its application. [Pg.380]

Example Normal Distribution. When a metal rod was measured ten times, the following values of length (x) and deviation (d) were obtained  [Pg.380]

Imagine now that we knew of a model that adequately represented the distribution of the masses of all the beans in the package. Then we would not need to weigh each bean to make inferences about the population. We could base our conclusions entirely on that model, without having to do any additional experimental work. [Pg.23]

This concept — using a model to represent a certain population — is the central theme of this book. It will be present, implicitly or explicitly, in all of the statistical techniques we shall discuss. Even if in some cases we do not formally state our model, you will recognize it from context. Of course our inferences about the population will only be correct insofar as the chosen model is valid. For any situation, however, we will always follow the same procedure  [Pg.23]

One of the most important statistical models — arguably the most important — is the normal (or Gaussian) distribution that the famous mathematician Karl F. Gauss proposed at the beginning of the 19th century, to calculate the probabilities of occurrence of measurement [Pg.23]

Many of the results we present later are rigorously valid only for data following a normal distribution. In practice this is not so severe a restriction, because almost all the tests we will study remain efficient in the presence of moderate departures from normality, and because we can use adequate experimental planning to reduce the effects of possible non-normalities. [Pg.24]

One particular version of a density curve is called the normal distribution. Height and many physiological variables conform closely (not perfectly) to this distribution. Since the word normal is used in everyday language, and since its meaning in Statistics is different and very important, the word Normal is written in this book with an upper case N when it is used in its statistical sense. [Pg.93]

This final point is expanded upon in Section 6.6.1. [Pg.94]

The area under the Normal curve is of considerable interest in Statistics. That is, it is of considerable interest to define and quantify the area bounded by the Normal curve at the top and the x-axis at the bottom. This area will be defined as 1.0, or as 100%. Given this interest, the final point in Section 6.6 raised an issue that appears problematic. That is, it appears that, if the two lower slopes of the Normal curve never quite reach the x-axis, the area under the curve is never actually fully defined and can therefore never be calculated precisely. Fortunately, this apparent paradox can be solved mathematically. In the Preface of this book I noted that, in several cases, I had resisted the temptation to provide an explanation of subtle points. This case, I believe, is a worthwhile exception. An understanding of the qualities of the Normal distribution and the Normal curve is extremely helpful in setting the scene for topics covered in Chapters 7 and 8, namely statistical significance and clinical significance. [Pg.94]

The solution is related to the observation that the sum of an infinite series can converge to a finite solution. An example that effectively demonstrates the solution here is the geometric series 1/2 + 1/4 + 1/8 + … ad infinitum. That is, the series starts with 1/2, and every subsequent term is one half of the previous term. Given this, the terms of the series never vanish to zero. However, the sum of them is precisely 1. The proof of this is as follows, where the series is represented as S  [Pg.95]

Both sides of this equation are then multiplied by the same value, namely 2 (multiplying both sides of an equation by a constant means that the sides are still of equal value)  [Pg.95]
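The proof, truncated here, presumably continues along these lines: doubling the series reproduces it shifted by one term, so it can be compared with itself:

```latex
S = \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \cdots
2S = 1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots = 1 + S
2S - S = 1 \quad\Longrightarrow\quad S = 1
```

Subtracting S from both sides leaves exactly 1, even though no finite partial sum ever reaches it — the same sense in which the area under the Normal curve is exactly 1 despite the tails never touching the axis.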

Many probability distributions have been developed to describe empirical engineering data. The description of all or even the most commonly used distributions is beyond the scope of this book. We limit our description here to the normal probability distribution. This theoretical curve has widespread use in many forms of engineering research and practice. [Pg.196]

The normal distribution for a random variable, x, is defined by the following equation  [Pg.196]
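The equation itself did not survive in this excerpt. Consistent with the discussion of μ and σ that follows, the standard form of the normal density is presumably:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,
       \exp\!\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right],
       \qquad -\infty < x < \infty
```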

Notice that Greek letters are used here for the mean and standard deviation to distinguish these values from sample parameters. [Pg.197]

The total area under the normal curve is equal to 1. If the mean and variance of a normally distributed random variable are known or can be assumed, integration can be used to calculate probabilities such as [Pg.197]

This transformation permits the use of one table for all normal distributions. An abbreviated version of the table is shown as Table 7.14. [Pg.198]
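A sketch of how the one-table idea works in code (the function names and the example values are mine): the z transformation reduces any normal variable to the standard normal, whose cumulative probabilities the table — or here, the error function — supplies.

```python
import math

def z_score(x, mu, sigma):
    # Deviation from the population mean in units of standard deviation
    return (x - mu) / sigma

def phi(z):
    # Cumulative standard normal probability P(Z <= z), replacing the table lookup
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# For measurements distributed as N(mu = 50, sigma = 4), P(X <= 58):
p = phi(z_score(58.0, 50.0, 4.0))  # z = 2
```

The same `phi` serves every normal distribution once its x has been standardized, which is precisely why a single table suffices.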

It would be of obvious interest to have a theoretically underpinned function that describes the observed frequency distribution shown in Fig. 1.9. A number of such distributions (symmetrical or skewed) are described in the statistical literature in full mathematical detail; apart from the normal and the t-distributions, none is used in analytical chemistry except under very special circumstances, e.g. the Poisson and the binomial distributions. Instrumental methods of analysis that have Poisson-distributed noise include optical and mass spectroscopy, for instance. For an introduction to parameter estimation under conditions of linked mean and variance, see Ref. 41. [Pg.29]

For a long time it was widely believed that experimental measurements accurately conformed to the normal distribution. On the whole this is a pretty fair approximation, perhaps arrived at by uncritical extrapolation from a few well-documented cases. It is known that real distributions are wider than the normal one; t-distributions for 4 to 9 degrees of freedom (see Section 1.2.2) are said to closely fit actual data [20]. [Pg.29]

Does this mean that one should abandon the normal distribution? As will be shown in Sections 1.8.1 through 1.8.3 the practicing analyst rarely gets [Pg.29]

For general use, the normal distribution has a number of distinct advantages over other distributions. Some of the more important advantages are as follows  [Pg.30]

The normal or Gaussian distribution, a bell-shaped frequency profile, is defined by the function [Pg.31]


One option is to first generate two random numbers ξ1 and ξ2 between 0 and 1. The corresponding two numbers from the normal distribution are then calculated using...

These two methods generate random numbers in the normal distribution with zero mean and unit variance. A number (x) generated from this distribution can be related to its counterpart (x′) from another Gaussian distribution with mean ⟨x⟩ and variance σ² using... [Pg.381]
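A minimal sketch of the procedure described above, assuming the standard Box-Muller transform (the helper names are mine): two uniform deviates yield a standard normal deviate, which is then shifted and scaled to the desired mean and variance.

```python
import math
import random

def box_muller():
    # Two uniform random numbers on (0, 1) -> one N(0, 1) deviate
    u1, u2 = random.random(), random.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

def rescale(x, mean, sigma):
    # Relate a N(0, 1) deviate to a Gaussian with the given mean and variance sigma^2
    return mean + sigma * x

random.seed(0)
draws = [rescale(box_muller(), 10.0, 2.0) for _ in range(20_000)]
m = sum(draws) / len(draws)  # sample mean, close to 10
```

Averaging many draws recovers the target mean, confirming the rescaling step.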

Table 2.26a Ordinates (Y) of the Normal Distribution Curve at Values of z 2.121
Table 2.26b Areas Under the Normal Distribution Curve from 0 to z 2.122
The normal distribution of measurements (or the normal law of error) is the fundamental starting point for analysis of data. When a large number of measurements are made, the individual measurements are not all identical and equal to the accepted value μ, which is the mean of an infinite population or universe of data, but are scattered about μ, owing to random error. If the magnitude of any single measurement is the abscissa and the relative frequencies (i.e., the probability) of occurrence of different-sized measurements are the ordinate, the smooth curve drawn through the points (Fig. 2.10) is the normal or Gaussian distribution curve (also the error curve or probability curve). The term error curve arises when one considers the distribution of errors (x - μ) about the true value. [Pg.193]

The standardized variable (the z statistic) requires only the probability level to be specified. It measures the deviation from the population mean in units of standard deviation. Y is 0.399 for the most probable value, μ. In the absence of any other information, the normal distribution is assumed to apply whenever repetitive measurements are made on a sample, or a similar measurement is made on different samples. [Pg.194]

Table 2.26a lists the height of an ordinate (Y) at a distance z from the mean, and Table 2.26b the area under the normal curve at a distance z from the mean, expressed as fractions of the total area, 1.000. Returning to Fig. 2.10, we note that 68.27% of the area of the normal distribution curve lies within 1 standard deviation of the center or mean value. Therefore, 31.73% lies outside those limits, 15.87% on each side. Ninety-five percent (actually 95.45%) of the area lies within 2 standard deviations, and 99.73% lies within 3 standard deviations of the mean. Often the last two areas are stated slightly differently, viz. 95% of the area lies within 1.96σ (approximately 2σ) and 99% lies within approximately 2.58σ. The mean falls at exactly the 50% point for symmetric normal distributions. [Pg.194]
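The quoted areas can be checked directly from the error function rather than the table (a small sketch; the function name is mine):

```python
import math

def area_within(k):
    # Fraction of the normal curve lying within k standard deviations of the mean
    return math.erf(k / math.sqrt(2.0))

one = area_within(1)    # ~0.6827
two = area_within(2)    # ~0.9545
three = area_within(3)  # ~0.9973
```

The three values reproduce the 68.27%, 95.45%, and 99.73% figures cited above.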

To predict the properties of a population on the basis of a sample, it is necessary to know something about the population's expected distribution around its central value. The distribution of a population can be represented by plotting the frequency of occurrence of individual values as a function of the values themselves. Such plots are called probability distributions. Unfortunately, we are rarely able to calculate the exact probability distribution for a chemical system. In fact, the probability distribution can take any shape, depending on the nature of the chemical system being investigated. Fortunately many chemical systems display one of several common probability distributions. Two of these distributions, the binomial distribution and the normal distribution, are discussed next. [Pg.71]

Significance test in which the null hypothesis is rejected for values at either end of the normal distribution. [Pg.84]

Relationship between confidence intervals and results of a significance test, (a) The shaded area under the normal distribution curves shows the apparent confidence intervals for the sample based on texp. The solid bars in (b) and (c) show the actual confidence intervals that can be explained by indeterminate error using the critical value of t(α,ν). In part (b) the null hypothesis is rejected and the alternative hypothesis is accepted. In part (c) the null hypothesis is retained. [Pg.85]

Interpreting Control Charts The purpose of a control chart is to determine if a system is in statistical control. This determination is made by examining the location of individual points in relation to the warning limits and the control limits, and the distribution of the points around the central line. If we assume that the data are normally distributed, then the probability of finding a point at any distance from the mean value can be determined from the normal distribution curve. The upper and lower control limits for a property control chart, for example, are set to ±3S, which, if S is a good approximation for σ, includes 99.74% of the data. The probability that a point will fall outside the UCL or LCL, therefore, is only 0.26%. The... [Pg.718]

For example, the proportion of the area under a normal distribution curve that lies to the right of a deviation of 0.04 is 0.4840, or 48.40%. The area to the left of the deviation is given as 1 - P. Thus, 51.60% of the area under the normal distribution curve lies to the left of a deviation of 0.04. When the deviation is negative, the values in the table give the proportion of the area under the normal distribution curve that lies to the left of z; therefore, 48.40% of the area lies to the left, and 51.60% of the area lies to the right of a deviation of -0.04. [Pg.726]

Furthermore, when both np and nq are greater than 5, the binomial distribution is closely approximated by the normal distribution, and the probability tables in Appendix lA can be used to determine the location of the solute and its recovery. [Pg.759]
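The quality of that approximation is easy to check numerically (a sketch with illustrative parameters; the function names are mine):

```python
import math

def binom_pmf(n, p, k):
    # Exact binomial probability P(X = k)
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def normal_approx(n, p, k):
    # Normal density with mean np and variance npq, evaluated at k
    mu, var = n * p, n * p * (1.0 - p)
    return math.exp(-((k - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# n = 40, p = 0.3: np = 12 and nq = 28 both exceed 5, so the rule of thumb applies
exact = binom_pmf(40, 0.3, 12)
approx = normal_approx(40, 0.3, 12)
```

At the distribution's peak the two values agree to better than one part in a hundred, which is why the normal tables can stand in for binomial calculations in this regime.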

Ohm's law the statement that the current moving through a circuit is proportional to the applied potential and inversely proportional to the circuit's resistance (E = iR). (p. 463) on-column injection the direct injection of thermally unstable samples onto a capillary column, (p. 568) one-tailed significance test significance test in which the null hypothesis is rejected for values at only one end of the normal distribution, (p. 84)...

The proof that these expressions are equivalent to Eq. (1.35) under suitable conditions is found in statistics textbooks. We shall have occasion to use the Poisson approximation to the binomial in discussing crystallization of polymers in Chap. 4, and the distribution of molecular weights of certain polymers in Chap. 6. The normal distribution is the familiar bell-shaped distribution that is known in academic circles as "the curve." We shall use it in discussing diffusion in Chap. 9. [Pg.48]

We can imagine measuring experimental curves equivalent to those in Fig. 9.11 by, say, scanning the length of the diffusion apparatus by some optical method for analysis after a known diffusion time. Such results are then interpreted by rewriting Eq. (9.85) in the form of the normal distribution function, P(z) dz. This is accomplished by defining a parameter z such that... [Pg.631]

This shows that Schlieren optics provide a means for directly monitoring concentration gradients. The value of the diffusion coefficient which is consistent with the variation of dn/dx with x and t can be determined from the normal distribution function. Methods that avoid the difficulty associated with locating the inflection point have been developed, and it can be shown that the area under a Schlieren peak divided by its maximum height equals (4πDt)^1/2. Since there are no unknown proportionality factors in this expression, D can be determined from Schlieren spectra measured at known times. [Pg.634]
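Assuming the relation area/height = (4πDt)^1/2 (the exponent is my reading of the garbled original), D follows by squaring and rearranging. A round-trip sketch with hypothetical numbers:

```python
import math

def diffusion_coefficient(area, height, t):
    # If area / max height = (4*pi*D*t)**0.5, then D = (area/height)**2 / (4*pi*t)
    return (area / height) ** 2 / (4.0 * math.pi * t)

# Round-trip check with an assumed diffusion coefficient and time:
D_true = 5.0e-6                                # cm^2/s (illustrative)
t = 3600.0                                     # s
ratio = math.sqrt(4.0 * math.pi * D_true * t)  # simulated area / max height
D_est = diffusion_coefficient(ratio, 1.0, t)   # recovers D_true
```

Because no proportionality constants enter, a single measured peak at a known time fixes D.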

To determine R(t) for the normal distribution, a standard normal variate must be calculated by the following formula ... [Pg.9]

Example 3 illustrated the use of the normal distribution as a model for time-to-failure. The normal distribution has an increasing hazard function, which means that the product is experiencing wearout. In applying the normal to a specific situation, the fact must be considered that this model allows values of the random variable that are less than zero, whereas obviously a life less than zero is not possible. This problem does not arise from a practical standpoint as long as μ/σ > 4.0. [Pg.10]
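A minimal sketch of the reliability calculation described above (parameter values are hypothetical): the standard normal variate z converts the survival question into a standard normal tail probability.

```python
import math

def reliability_normal(t, mu, sigma):
    # R(t) = P(T > t) for a normal time-to-failure model,
    # via the standard normal variate z = (t - mu) / sigma
    z = (t - mu) / sigma
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

# Hypothetical wearout model: mean life 1000 h, sigma 50 h.
# Here mu/sigma = 20 > 4, so the mass below zero life is negligible.
r = reliability_normal(900.0, 1000.0, 50.0)  # z = -2
```

At 900 h the unit is two standard deviations below the mean life, so roughly 98% of units are expected to survive.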

A convenient approximate limit based on the normal distribution given by... [Pg.14]

A remarkable property of the normal distribution is that, almost regardless of the distribution of x, sample averages x̄ will approach the Gaussian distribution as n gets large. Even for relatively small values of n, of about 10, the approximation in most cases is quite close. For example, sample averages of size 10 from the uniform distribution will have essentially a Gaussian distribution. [Pg.488]
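The uniform-distribution example can be demonstrated directly (a sketch; sample sizes are the text's n = 10, the replicate count is mine):

```python
import random
import statistics

random.seed(1)

# 20,000 averages, each of n = 10 uniform(0, 1) draws
averages = [statistics.fmean(random.random() for _ in range(10))
            for _ in range(20_000)]

m = statistics.fmean(averages)   # approaches 0.5, the uniform mean
s = statistics.stdev(averages)   # approaches sqrt(1/12)/sqrt(10) ~ 0.0913
```

A histogram of `averages` would show the bell shape; the mean and spread already match the values the central limit theorem predicts.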

Also, in many applications involving count data, the normal distribution can be used as a close approximation. In particular, the approximation is quite close for the binomial distribution within certain guidelines. [Pg.488]

TABLE 3-4 Ordinates and Areas between Abscissa Values -z and +z of the Normal Distribution Curve... [Pg.491]

Many distributions occurring in business situations are not symmetric but skewed, and the normal distribution curve is not a good fit. However, when data are based on estimates of future trends, the accuracy of the normal approximation is usually acceptable. This is particularly the case as the number of component variables x1, x2, etc., in Eq. (9-74) increases. Although distributions of the individual variables (x1, x2, etc.) may be skewed, the distribution of the property or variable c tends to approach the normal distribution. [Pg.822]

As λ increases, the Poisson distribution approaches the normal distribution, with the relationship [Pg.823]
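This limiting behavior can be checked numerically: for large λ the Poisson mass function is close to a normal density with mean λ and variance λ (a sketch; the λ value is illustrative):

```python
import math

def poisson_pmf(lam, k):
    # Exact Poisson probability P(X = k)
    return math.exp(-lam) * lam ** k / math.factorial(k)

def normal_density(x, mu, var):
    # Normal density with the matching mean and variance
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# For lambda = 100, compare the two at the center of the distribution:
lam = 100
exact = poisson_pmf(lam, 100)
approx = normal_density(100.0, float(lam), float(lam))
```

At λ = 100 the two values agree to about three decimal places, and the agreement improves as λ grows.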

The probability-density function for the normal distribution curve calculated from Eq. (9-95) by using the values of a, b, and c obtained in Example 10 is also compared with precise values in Table 9-10. In such symmetrical cases the best fit is to be expected when the median or 50 percentile XM is used in conjunction with the lower quartile or 25 percentile XL or with the upper quartile or 75 percentile XU. These statistics are frequently quoted, and determination of values of a, b, and c by using XM with XL and with XU is an indication of the symmetry of the curve. When the agreement is reasonable, the mean values so determined should be used to calculate the corresponding value of a. [Pg.825]

In practice, we can compute K as follows [19,23]. We start with a set of trajectories at the transition state q = q. The momenta have initial conditions distributed according to the normalized distribution functions... [Pg.205]

Step 1. From a histogram of the data, partition the data into N components, each roughly corresponding to a mode of the data distribution. This defines the Cj. Set the parameters for prior distributions on the θ parameters that are conjugate to the likelihoods. For the normal distribution the priors are defined in Eq. (15), so the full prior for the n components is [Pg.328]

Because each side chain can be identifiably assigned to a particular component, the mixture coefficients and the normal distribution parameters can be determined separately. [Pg.340]

Product tolerance can significantly influence product variability. Unfortunately, we have difficulty in finding the exact relationship between them. An approximate relationship can be found from the process capability index, a quality metric interrelated to manufacturing cost and tolerance (Lin et al., 1997). The random manner by which the inherent inaccuracies within a manufacturing process are generated produces a pattern of variation for the dimension which resembles the Normal distribution (Chase and Parkinson, 1991; Mansoor, 1963), and therefore process capability indices, which are based on the Normal distribution, are suitable for use. See Appendix IV for a detailed discussion of process capability indices. [Pg.41]
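As a small sketch of the simplest such index (the Cp formula is standard; the dimension and sigma below are hypothetical), the tolerance band is compared with the natural 6σ spread of the Normally distributed process:

```python
def process_capability(usl, lsl, sigma):
    # Cp: ratio of the tolerance band (USL - LSL) to the natural
    # 6-sigma spread of a Normally distributed process dimension
    return (usl - lsl) / (6.0 * sigma)

# Hypothetical dimension 10.0 +/- 0.3 mm with process sigma = 0.08 mm:
cp = process_capability(10.3, 9.7, 0.08)  # 0.6 / 0.48 = 1.25
```

A Cp above 1 indicates the tolerance band is wider than the process spread; values around 1.33 or higher are commonly taken as capable.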

