Distributions for Count Data

In this section we will look at two common distributions for count data. The observations are numbers of occurrences, so the only possible values are non-negative integers. [Pg.62]

The binomial distribution arises when we count the number of successes that occur in n independent trials, where each trial results in either a success or a failure and the probability of success remains constant over all n trials. Let π be the probability of success on an individual trial, and Y be the total number of successes observed over the n trials. We know n, so it is a known constant, not a parameter. Then Y has the binomial(n, π) distribution with probability function given by

f(y | π) = (n choose y) π^y (1 − π)^(n−y),  y = 0, 1, …, n. [Pg.63]

The total number of successes is Y = W_1 + W_2 + … + W_n, where the W_i are independent Bernoulli trials. Thus the mean and variance of Y are given by

E(Y) = nπ  and  Var(Y) = nπ(1 − π). [Pg.63]
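As a quick numerical check, these moment formulas can be verified by summing over the probability function. A minimal Python sketch; the values n = 10 and π = 0.3 are illustrative, not from the text:

```python
from math import comb

def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

n, p = 10, 0.3                      # illustrative values, not from the text
support = range(n + 1)
pmf = [binomial_pmf(y, n, p) for y in support]

# moments computed directly from the probability function
mean = sum(y * f for y, f in zip(support, pmf))
var = sum((y - mean) ** 2 * f for y, f in zip(support, pmf))

print(mean)  # close to n*p = 3.0
print(var)   # close to n*p*(1-p) = 2.1
```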

The conjugate prior for the binomial(n, π) distribution is the beta(a, b) distribution. The conjugate prior will have the form

g(π) ∝ π^(a−1) (1 − π)^(b−1). [Pg.64]
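The conjugate update itself is just parameter addition: a beta(a, b) prior combined with y successes in n trials gives a beta(a + y, b + n − y) posterior. A minimal sketch with invented numbers:

```python
def beta_binomial_update(a, b, y, n):
    """beta(a, b) prior + y successes in n trials -> beta(a + y, b + n - y) posterior."""
    return a + y, b + n - y

# uniform beta(1, 1) prior and 7 successes in 10 trials (illustrative numbers)
a_post, b_post = beta_binomial_update(1, 1, 7, 10)
post_mean = a_post / (a_post + b_post)   # posterior mean a / (a + b)
print(a_post, b_post)   # 8 4
print(post_mean)        # 8/12, about 0.667
```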

Observations come from a Bernoulli process when they are from a sequence of independent Bernoulli trials. For Bernoulli trials, each trial has two possible outcomes, which we label success and failure. The probability of success, π, remains constant over all the trials. The binomial(n, π) distribution arises when Y is the number of successes in a sequence of n Bernoulli trials with success probability π. [Pg.65]


Many commonly used distributions are members of the one-dimensional exponential family. These include the binomial(n, π) and Poisson(μ) distributions for count data, the geometric(π) and negative binomial(r, π) distributions for waiting times in a Bernoulli process, the exponential(λ) and gamma(n, λ) distributions for waiting times in a Poisson process, and the normal(μ, σ²) distribution where the variance σ² is known. [Pg.89]

The sampling distribution of count data can be characterized through probability distributions. In many cases, count data are appropriately interpreted through their corresponding distributions. However, in other situations analysis is greatly facilitated through distributions which have been developed for measurement data. Examples of each will be illustrated in the following subsections. [Pg.489]

Also, in many applications involving count data, the normal distribution can be used as a close approximation. In particular, the approximation is quite close for the binomial distribution when both nπ and n(1 − π) are reasonably large. [Pg.488]
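The closeness of the normal approximation to the binomial can be checked numerically with a continuity-corrected normal CDF. The parameter values below are illustrative:

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, p):
    """Exact P(Y <= k) for Y ~ binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

n, p = 100, 0.4                       # both n*p and n*(1-p) comfortably large
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = binom_cdf(45, n, p)
approx = normal_cdf(45.5, mu, sigma)  # 0.5 added as a continuity correction
print(exact, approx)                  # the two values agree to about two decimals
```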

The chi-square distribution can also be applied to applications of an entirely different nature. These include the applications discussed under the Goodness-of-Fit Test and the Two-Way Test for Independence of Count Data. In these applications the mathematical formulation and context are entirely different, but they result in the same table of values. [Pg.493]
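The two-way test statistic compares observed counts to expected counts computed from the row and column totals; a minimal sketch with a hypothetical 2×2 table:

```python
def chi_square_statistic(table):
    """Chi-square statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / grand              # expected count under independence
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# hypothetical 2x2 table of counts (rows: groups, columns: outcomes)
table = [[30, 10],
         [20, 40]]
stat = chi_square_statistic(table)
print(stat)  # compare to the tabulated 5% critical value 3.841 for df = 1
```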

Most often the hypothesis H concerns the value of a continuous parameter, denoted θ. The data D are also usually observed values of some physical quantity (temperature, mass, dihedral angle, etc.), denoted y, usually a vector. y may be a continuous variable, but quite often it is a discrete integer variable representing the counts of some event occurring, such as the number of heads in a sequence of coin flips. The expression for the posterior distribution for the parameter θ given the data y is then

p(θ | y) = p(y | θ) p(θ) / ∫ p(y | θ) p(θ) dθ. [Pg.316]

Because we are dealing with count data and proportions for the values q_i, the appropriate conjugate prior distribution for the q_i is the Dirichlet distribution,

g(q_1, …, q_k) ∝ ∏_i q_i^(a_i − 1). [Pg.328]
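The Dirichlet update mirrors the beta-binomial case: observed category counts are added to the prior parameters. A sketch with invented counts:

```python
def dirichlet_update(alpha, counts):
    """Dirichlet(alpha) prior + category counts -> Dirichlet(alpha + counts) posterior."""
    return [a + y for a, y in zip(alpha, counts)]

alpha = [1, 1, 1]        # uniform Dirichlet prior over three categories
counts = [12, 5, 3]      # hypothetical observed counts
post = dirichlet_update(alpha, counts)
total = sum(post)
post_mean = [a / total for a in post]   # posterior mean of each proportion
print(post)        # [13, 6, 4]
print(post_mean)   # posterior means, summing to 1
```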

There is some confusion in using Bayes' rule on what are sometimes called explanatory variables. As an example, we can try to use Bayesian statistics to derive the probabilities of each secondary structure type for each amino acid type, that is p(ξ | r), where ξ is α, β, or γ (for coil) secondary structure and r is one of the 20 amino acids. It is tempting to write p(ξ | r) = p(r | ξ) p(ξ) / p(r) using Bayes' rule. This expression is, of course, correct and can be used on PDB data to relate these probabilities. But this is not Bayesian statistics, which relates parameters that represent underlying properties to (limited) data that are manifestations of those parameters in some way. In this case, the parameters we are after are θ_ξ(r) = p(ξ | r). The data from the PDB are in the form of counts y_ξ(r), the number of amino acids of type r in the PDB that have secondary structure ξ. There are 60 such numbers (20 amino acid types × 3 secondary structure types). We then have for each amino acid type a Bayesian expression for the posterior distribution for the values of θ_ξ(r): [Pg.329]

Thompson and Goldstein [89] improve on the calculations of Stolorz et al. by including the secondary structure of the entire window rather than just a central position, and then summing over all secondary structure segment types with a particular secondary structure at the central position to achieve a prediction for this position. They also use information from multiple sequence alignments of proteins to improve secondary structure prediction. They use Bayes' rule to formulate expressions for the probability of secondary structures, given a multiple alignment. Their work describes what is essentially a sophisticated prior distribution for θ_ξ(X), where X is a matrix of residue counts in a multiple alignment in a window about a central position. The PDB data are used to form this prior, which is used as the predictive distribution. No posterior is calculated with posterior = prior × likelihood. [Pg.339]

The inherent limitations of attribute data prevent their use for preliminary statistical studies, since specification values are not measured. Attribute data have only two values (conforming/nonconforming, pass/fail, go/no-go, present/absent), but they can be counted, analyzed, and the results plotted to show variation. Measurement can be based on the fraction defective, such as parts per million (PPM). While variables data follow a distribution curve, attribute data vary in steps, since you can't count a fraction. There will either be zero errors or a finite number of errors. [Pg.368]

The noise-free Stern-Volmer lifetime plots are clearly curved, which indicates a failure of the two discrete-site model. However, this is a difficult nonlinear least-squares fitting problem, and the unquenched apparent lifetimes are within a factor of two of each other. Thus, for real data, it is much more difficult to pick up on the nonlinearities and exclude a discrete two-site model. For distributions with smaller R's, of course, fitting becomes too difficult for reliable model testing, at least at 10^4 counts in the peak channel. [Pg.98]

While radioactive decay is itself a random process, the Gaussian distribution function fails to account for the probability relationships describing rates of radioactive decay. Instead, appropriate statistical analysis of scintillation counting data relies on the use of the Poisson probability distribution function: [Pg.172]
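One practical consequence of the Poisson model for counting is that a count N has standard deviation √N, so the relative counting error falls as 1/√N. A short illustration:

```python
from math import sqrt

def relative_counting_error(n_counts):
    """Relative standard deviation of a Poisson count: sqrt(N)/N = 1/sqrt(N)."""
    return 1.0 / sqrt(n_counts)

# collecting 100x more counts improves relative precision 10-fold
for n in (100, 10_000, 1_000_000):
    print(n, relative_counting_error(n))   # 10%, 1%, 0.1%
```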

The distribution of the number of new lesions (count data) is clearly not normal within each of the treatment groups. There is a peak at zero in each group, with fewer and fewer patients as the number of lesions increases. A log transformation would not work here because of the presence of zero values for the endpoint. The authors used the Mann-Whitney U-test to compare each of the natalizumab dose groups with placebo, obtaining p < 0.001 in each case. Each dose level is significantly better than placebo in reducing the number of new enhancing lesions. [Pg.168]
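The U statistic underlying the test can be computed by direct pair counting; a minimal sketch with invented lesion counts (not the trial's data):

```python
def mann_whitney_u(x, y):
    """U statistic: number of (xi, yj) pairs with xi < yj; ties count one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# hypothetical new-lesion counts (invented illustration, not the trial's data)
treated = [0, 0, 0, 1, 2]
placebo = [1, 3, 4, 4, 7]
u = mann_whitney_u(treated, placebo)
print(u)  # out of len(treated) * len(placebo) = 25 comparable pairs
```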

Chi-square distribution: the distribution of the sum of the squares of n independent normal variates in standard form. It is used for testing the deviation of observed from expected frequencies in counted data. [Pg.49]

The distribution coefficients calculated from the alpha counting data are given in Table VIII. The gamma counting of the solutions is still in progress. The symbols are the same as used previously. For K (alpha), the counting precision is ±10 percent for the U, Am and Cm samples and ±20-30 percent for the Np and Pu samples. For K (filter), the counting precision is ±10 percent for U, ±10-20 percent for Np, Am and Cm, and ±30-40 percent for the Pu samples. [Pg.239]

The Poisson distribution can also be approximated by a normal distribution for a large number of counts. Therefore, nonlinear LS regression analysis is an efficient estimation procedure for such data even though the counts are Poisson distributed. [Pg.36]
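The closeness of this approximation can be seen by comparing the Poisson probability function to the matching normal density (mean and variance both λ); λ = 100 below is illustrative:

```python
from math import exp, factorial, pi, sqrt

def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

def normal_pdf(x, mu, sigma):
    return exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / (sigma * sqrt(2.0 * pi))

lam = 100  # a "large" mean count; the approximation improves as lam grows
# near the mean, the Poisson pmf closely tracks the normal(lam, lam) density
for k in (90, 100, 110):
    print(k, poisson_pmf(k, lam), normal_pdf(k, lam, sqrt(lam)))
```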

The square-root transformation X′ = √X is applied when the test values and variances are proportional, as in the Poisson distribution. If the data come from counting and the number of units is below 10, the transformations X′ = √(X + 1) and X′ = √X + √(X + 1) are used. If the test averages and their standard deviations are approximately proportional, we use the logarithmic transformation X′ = log X. If there are data with low values, or a zero value, we use X′ = log(X + 1). When the squares of the arithmetic averages and the standard deviations are proportional, we use the reciprocal transformation X′ = 1/X, or X′ = 1/(X + 1) if the values are small or equal to zero. The transformation arcsin √X is used when values are given as proportions and the distribution is binomial. If the test value of the experiment is zero, the value 1/(4n) is taken in its place; when it is 1, the value 1 − 1/(4n) is taken, where n is the number of values. Transforming values where the proportion varies between 0.30 and 0.70 is practically pointless. This transformation is done by means of special tables suited for the purpose. [Pg.114]
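These transformations are one-liners in code; a sketch collecting the common forms (the example data are invented):

```python
from math import asin, log10, sqrt

def sqrt_tf(x):        return sqrt(x)                 # variance proportional to mean
def sqrt_tf_small(x):  return sqrt(x + 1)             # small counts, zeros present
def sqrt_tf_paired(x): return sqrt(x) + sqrt(x + 1)   # alternative for small counts
def log_tf_zero(x):    return log10(x + 1)            # log transform tolerant of zeros
def recip_tf_zero(x):  return 1.0 / (x + 1)           # reciprocal, tolerant of zeros
def arcsin_tf(p):      return asin(sqrt(p))           # for binomial proportions

data = [0, 1, 4, 9, 16]  # hypothetical counts including a zero
print([sqrt_tf_small(x) for x in data])  # zero maps cleanly to 1.0
print(arcsin_tf(0.25))                   # asin(0.5), about 0.524 radians
```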

Even though the lifetime distributions appear to be quite different, the recreated data are almost identical except for a small deviation near 200 ns and less obvious ones at shorter times. If the relative differences are plotted, systematic differences beyond the statistical noise are noticeable up to 200 ns, particularly when several channels are binned together. Given sufficient statistics, in principle, one can tell the difference between a bimodal and a monomodal distribution. The simulated spectra shown are based on 10^8 counts, five to ten times the amount collected for the data discussed here. [Pg.200]

An example of data that we would expect to find distributed in a Poisson fashion is the number of radioactive disintegrations per unit time. One of the main uses for the Poisson distribution is to quantify errors in count data such as the number of minor accidents in the chemical laboratory over the course of an academic year. To decide whether data are Poisson distributed ... [Pg.273]

The particle-size distribution function (PSDF) is expressed as the number of particles per milliliter of solution per class size (particles mL^-1 μm^-1); a representative PSDF for the storms (as computed for storm 1) is shown in Figure 10. A cubic regression, determined to be the best fit for the data, was used to compute the PSDF at 4 and 10 μm (N4 and N10) and d_max, defined as the particle diameter for which only 10 particles were counted. Changes in these three parameters in response to the storms are shown in Figure 11. [Pg.35]

