
Statistics parametric distributions

Statistical tests will have relatively low power. In particular, there will be low power for testing the fit of a parametric distribution. [Pg.46]

The corrected p-values at any given locus can then be obtained using an adjusted distribution that accounts for any inflation observed. The structured association approach differs from the genomic control approach in that it estimates the population structure, whereas genomic control assumes a particular parametric distribution of the value of the test statistic (70). Compared with structured association, the genomic control approach is computationally simple and can be applied to both scanning and validation stages. [Pg.365]

If a parametric distribution (e.g. normal, lognormal, loglogistic) is fit to empirical data, then additional uncertainty can be introduced in the parameters of the fitted distribution. If the selected parametric distribution model is an appropriate representation of the data, then the uncertainty in the parameters of the fitted distribution will be based mainly, if not solely, on random sampling error associated primarily with the sample size and variance of the empirical data. Each parameter of the fitted distribution will have its own sampling distribution. Furthermore, any other statistical parameter of the fitted distribution, such as a particular percentile, will also have a sampling distribution. However, if the selected model is an inappropriate choice for representing the data set, then substantial biases in estimates of some statistics of the distribution, such as upper percentiles, must be considered. [Pg.28]
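As a minimal illustration of this point (not from the cited source), the following Python sketch fits a lognormal distribution to a small hypothetical sample and uses bootstrap resampling to approximate the sampling distribution of a fitted upper percentile; the data, sample size and choice of a lognormal model are all assumptions.

```python
# Sketch: bootstrap the sampling distribution of a fitted lognormal's
# 95th percentile (hypothetical data; not from the cited source).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.lognormal(mean=1.0, sigma=0.5, size=40)   # small empirical sample

n_boot = 2000
p95_estimates = []
for _ in range(n_boot):
    resample = rng.choice(data, size=data.size, replace=True)
    # Fit a lognormal to the resample (loc fixed at 0 for a two-parameter fit)
    shape, loc, scale = stats.lognorm.fit(resample, floc=0)
    p95_estimates.append(stats.lognorm.ppf(0.95, shape, loc=loc, scale=scale))

p95_estimates = np.array(p95_estimates)
print("95th percentile estimate:", np.median(p95_estimates))
print("90% confidence interval:", np.percentile(p95_estimates, [5, 95]))
```

The spread of the bootstrap estimates reflects the random sampling error discussed above; a larger sample or smaller variance would narrow it.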

There are often data sets used to estimate distributions of model inputs for which a portion of data are missing because attempts at measurement were below the detection limit of the measurement instrument. These data sets are said to be censored. Commonly used methods for dealing with such data sets are statistically biased. An example is replacing non-detected values with one half of the detection limit. Such methods cause biased estimates of the mean and do not provide insight regarding the population distribution from which the measured data are a sample. Statistical methods can be used to make inferences regarding both the observed and unobserved (censored) portions of an empirical data set. For example, maximum likelihood estimation can be used to fit parametric distributions to censored data sets, including the portion of the distribution that is below one or more detection limits. Asymptotically unbiased estimates of statistics, such as the mean, can then be obtained from the fitted distribution. Bootstrap simulation can be used to estimate uncertainty in the statistics of the fitted distribution (e.g. Zhao & Frey, 2004). Imputation methods, such as... [Pg.50]
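A sketch of the maximum likelihood idea for censored data, under assumed values (a lognormal model, a single detection limit, simulated observations): non-detects contribute the cumulative probability below the detection limit to the likelihood, while detected values contribute the density.

```python
# Sketch: maximum likelihood fit of a lognormal to a left-censored sample.
# Non-detects contribute the CDF at the detection limit; detects contribute
# the density. Data and detection limit are hypothetical.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
true = rng.lognormal(mean=0.0, sigma=1.0, size=60)
dl = 0.5                                   # detection limit
detected = true[true >= dl]
n_censored = int((true < dl).sum())        # only the count of non-detects is known

def neg_log_lik(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    ll_detects = stats.lognorm.logpdf(detected, s=sigma, scale=np.exp(mu)).sum()
    ll_censored = n_censored * stats.lognorm.logcdf(dl, s=sigma, scale=np.exp(mu))
    return -(ll_detects + ll_censored)

res = optimize.minimize(neg_log_lik, x0=[0.0, 1.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x
# Estimate of the mean taken from the fitted distribution
print("fitted mean:", np.exp(mu_hat + sigma_hat**2 / 2))
```

Bootstrap resampling of the censored data set, refitting each replicate in the same way, would then give the uncertainty in such statistics.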

Discuss the methods and report the goodness-of-fit statistics for any parametric distributions for input variables that were fitted quantitatively to measured data. [Pg.148]
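A minimal sketch of the kind of goodness-of-fit reporting meant here, with hypothetical data and an assumed lognormal model; note that the Kolmogorov-Smirnov p-value is optimistic when the parameters are estimated from the same data.

```python
# Sketch: fit a candidate parametric distribution and report goodness-of-fit
# statistics (Kolmogorov-Smirnov and Anderson-Darling); data are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.lognormal(mean=1.0, sigma=0.4, size=50)

shape, loc, scale = stats.lognorm.fit(x, floc=0)
ks = stats.kstest(x, "lognorm", args=(shape, loc, scale))
# Caution: this p-value is biased upward because the parameters were fitted to x
print("KS statistic = %.3f, p = %.3f" % (ks.statistic, ks.pvalue))

# Anderson-Darling via the normality test applied to log-transformed data
ad = stats.anderson(np.log(x), dist="norm")
print("AD statistic = %.3f, critical values:" % ad.statistic, ad.critical_values)
```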

Finally, mention should be made of the development of information theoretic methods for characterizing product energy distributions. Surprisal analysis [183] may offer a means of compacting and parametrizing distributions for a wide range of reactions, by comparing with the statistically expected distribution in each case. [Pg.307]

From a statistical point of view, the low volume of test data and the heterogeneity of the distributions considerably limit the validity of the test results. For example, parametric distribution models cannot be verified, and therefore parametric significance tests are not applicable. [Pg.1849]

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of the distribution are assumed in the underlying statistical method. A non-parametric technique does not rely upon the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model; an unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]
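The distinction can be made concrete with a short, hypothetical example: linear discriminant analysis (parametric and supervised, with a Gaussian class assumption) uses the class labels, whereas principal components analysis (unsupervised) ignores them.

```python
# Sketch contrasting a parametric supervised method (LDA) with an
# unsupervised one (PCA) on a toy two-class data set (hypothetical data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(1.5, 1, (30, 4))])
y = np.array([0] * 30 + [1] * 30)          # class labels (the dependent variable)

lda = LinearDiscriminantAnalysis().fit(X, y)   # supervised: uses y, assumes Gaussian classes
pca = PCA(n_components=2).fit(X)               # unsupervised: ignores y entirely

print("LDA training accuracy:", lda.score(X, y))
print("PCA explained variance ratio:", pca.explained_variance_ratio_)
```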

Log normal distribution, the distribution of a sample that is normal only when plotted on a logarithmic scale. The most prevalent cases in pharmacology refer to drug potencies (agonist and/or antagonist) that are estimated from semilogarithmic dose-response curves. All parametric statistical tests on these must be performed on their logarithmic counterparts, specifically their expression as a value on the p scale (-log values); see Chapter 1.11.2. [Pg.280]
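A small hypothetical example of working on the p scale: the potencies are converted to -log values before a parametric test is applied.

```python
# Sketch: parametric comparison of potencies performed on the log (p) scale
# rather than on the raw EC50 values; the EC50 values below are hypothetical.
import numpy as np
from scipy import stats

ec50_drug_a = np.array([3.2e-8, 1.1e-7, 5.4e-8, 8.9e-8, 4.1e-8])   # molar
ec50_drug_b = np.array([2.5e-7, 4.8e-7, 1.9e-7, 3.3e-7, 6.1e-7])

pec50_a = -np.log10(ec50_drug_a)    # values on the p scale
pec50_b = -np.log10(ec50_drug_b)

t, p = stats.ttest_ind(pec50_a, pec50_b)
print("t = %.2f, p = %.4f" % (t, p))
```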

We also make a distinction between parametric and non-parametric techniques. In the parametric techniques such as linear discriminant analysis, UNEQ and SIMCA, statistical parameters of the distribution of the objects are used in the derivation of the decision function (almost always a multivariate normal distribution... [Pg.212]

For continuous variables you may be required to provide inferential statistics along with the descriptive statistics that you generate from PROC UNIVARIATE. The inferential statistics discussed here are all focused on two-sided tests of mean values and whether they differ significantly in either direction from a specified value or another population mean. Many of these tests of the mean are parametric tests that assume the variable being tested is normally distributed. Because this is often not the case with clinical trial data, we discuss substitute nonparametric tests of the population means as well. Here are some common continuous variable inferential tests and how to get the inferential statistics you need out of SAS. [Pg.255]
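Outside SAS, the corresponding two-sided one-sample tests can be sketched as follows, with hypothetical change-from-baseline data: the parametric t-test against a specified value and its nonparametric Wilcoxon signed-rank counterpart.

```python
# Sketch (not SAS): two-sided one-sample tests of a mean against a specified
# value, parametric (t-test) and nonparametric (Wilcoxon signed rank).
# The change-from-baseline values are hypothetical.
import numpy as np
from scipy import stats

change = np.array([1.2, -0.4, 0.8, 2.1, 0.3, -0.9, 1.5, 0.6, 0.1, 1.8])
mu0 = 0.0                                   # hypothesized population mean

t_stat, t_p = stats.ttest_1samp(change, popmean=mu0)
w_stat, w_p = stats.wilcoxon(change - mu0)  # nonparametric counterpart

print("t-test:   t = %.2f, p = %.3f" % (t_stat, t_p))
print("Wilcoxon: W = %.1f, p = %.3f" % (w_stat, w_p))
```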

The analysis of rank data, what is generally called nonparametric statistical analysis, is an exact parallel of the more traditional (and familiar) parametric methods. There are methods for the single comparison case (just as Student's t-test is used) and for the multiple comparison case (just as analysis of variance is used), with appropriate post hoc tests for exact identification of the significance within a set of groups. Four tests are presented for evaluating statistical significance in rank data: the Wilcoxon Rank Sum Test, distribution-free multiple comparisons, the Mann-Whitney U Test, and the Kruskal-Wallis nonparametric analysis of variance. For each of these tests, tables of distribution values for the evaluation of results can be found in any of a number of reference volumes (Gad, 1998). [Pg.910]
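A brief sketch, with hypothetical data, of two of the rank-based tests named above as implemented in SciPy (the Mann-Whitney U test for two groups and the Kruskal-Wallis test for several groups); here the library computes the significance rather than it being read from reference tables.

```python
# Sketch: rank-based tests on hypothetical data from three treatment groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group_a = rng.normal(10, 2, 12)
group_b = rng.normal(12, 2, 12)
group_c = rng.normal(15, 2, 12)

# Two-group comparison (Mann-Whitney U / Wilcoxon rank sum)
u, p_u = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print("Mann-Whitney U = %.1f, p = %.4f" % (u, p_u))

# Multi-group comparison (Kruskal-Wallis nonparametric analysis of variance)
h, p_h = stats.kruskal(group_a, group_b, group_c)
print("Kruskal-Wallis H = %.2f, p = %.4f" % (h, p_h))
```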

There is often a particular concern for the effects of outliers or heavy-tailed distributions when using standard statistical techniques. To address this type of situation, a parametric approach would be to use ML estimation assuming a heavy-tailed distribution (perhaps a Student t distribution with few degrees of freedom). However, simple ad hoc methods such as trimmed means may also be useful. There is a large statistical literature on robust and outlier-resistant methods (e.g., Hoaglin et al., 1983; Barnett and Lewis, 1994). [Pg.39]
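Both approaches can be sketched with hypothetical heavy-tailed data: a trimmed mean as the simple ad hoc method, and an ML fit of a Student t distribution as the parametric alternative.

```python
# Sketch: a trimmed mean and an ML fit of a Student t distribution applied to
# hypothetical heavy-tailed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = stats.t.rvs(df=3, loc=10, scale=2, size=200, random_state=rng)  # heavy tails

print("ordinary mean:    %.2f" % x.mean())
print("20%% trimmed mean: %.2f" % stats.trim_mean(x, proportiontocut=0.2))

# Parametric alternative: fit a t distribution (df, location, scale) by ML
df_hat, loc_hat, scale_hat = stats.t.fit(x)
print("fitted t: df = %.1f, location = %.2f, scale = %.2f" % (df_hat, loc_hat, scale_hat))
```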

A basic assumption underlying t-tests and ANOVA (which are parametric tests) is that cost data are normally distributed. Given that the distribution of these data often violates this assumption, a number of analysts have begun using nonparametric tests, such as the Wilcoxon rank-sum test (a test of median costs) and the Kolmogorov-Smirnov test (a test for differences in cost distributions), which make no assumptions about the underlying distribution of costs. The principal problem with these nonparametric approaches is that statistical conclusions about the mean need not translate into statistical conclusions about the median (e.g., the means could differ yet the medians could be identical), nor do conclusions about the median necessarily translate into conclusions about the mean. Similar difficulties arise when - to avoid the problems of nonnormal distribution - one analyzes cost data that have been transformed to be more normal in their distribution (e.g., the log transformation or the square root of costs). The sample mean remains the estimator of choice for the analysis of cost data in economic evaluation. If one is concerned about nonnormal distribution, one should use statistical procedures that do not depend on the assumption of normal distribution of costs (e.g., nonparametric tests of means). [Pg.49]
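One way to test means without assuming normality, sketched here with hypothetical skewed cost data, is a percentile bootstrap for the difference in mean costs; this illustrates the general idea of a nonparametric test of means rather than a method prescribed by the cited text.

```python
# Sketch: a bootstrap interval for the difference in mean costs, which keeps
# the inference on the mean rather than the median (hypothetical skewed data).
import numpy as np

rng = np.random.default_rng(6)
costs_a = rng.lognormal(mean=8.0, sigma=1.0, size=80)   # treatment arm
costs_b = rng.lognormal(mean=8.3, sigma=1.0, size=80)   # control arm

observed_diff = costs_a.mean() - costs_b.mean()

# Percentile bootstrap confidence interval for the difference in means
n_boot = 5000
diffs = np.empty(n_boot)
for i in range(n_boot):
    ra = rng.choice(costs_a, size=costs_a.size, replace=True)
    rb = rng.choice(costs_b, size=costs_b.size, replace=True)
    diffs[i] = ra.mean() - rb.mean()

print("observed difference in means:", round(observed_diff, 1))
print("95% bootstrap CI:", np.percentile(diffs, [2.5, 97.5]).round(1))
```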

The statistical methods discussed up to now have required certain assumptions about the populations from which the samples were obtained. Among these was that the population could be approximated by a normal distribution and that, when dealing with several populations, these have the same variance. There are many situations where these assumptions cannot be met, and methods have been developed that are not concerned with specific population parameters or the distribution of the population. These are referred to as non-parametric or distribution-free methods. They are the appropriate methods for ordinal data and for interval data where the requirements of normality cannot be assumed. A disadvantage of these methods is that they are less efficient than parametric methods. By less efficient is meant... [Pg.305]

Non-parametric methods: statistical tests which make no assumptions about the distributions from which the data are obtained. These can be used to show differences, relationships, or associations even when the characteristic observed cannot be measured numerically. [Pg.51]

Current methods for supervised pattern recognition are numerous. Typical linear methods are linear discriminant analysis (LDA) based on distance calculation, soft independent modeling of class analogy (SIMCA), which emphasizes similarities within a class, and PLS discriminant analysis (PLS-DA), which performs regression between spectra and class memberships. More advanced methods are based on nonlinear techniques, such as neural networks. Parametric versus nonparametric computations is a further distinction. In parametric techniques such as LDA, statistical parameters of normal sample distribution are used in the decision rules. Such restrictions do not influence nonparametric methods such as SIMCA, which perform more efficiently on NIR data collections. [Pg.398]

There are two main families of statistical tests: parametric tests, which are based on the hypothesis that data are distributed according to a normal curve (on which the values in Student's table are based), and non-parametric tests, for more liberally distributed data (robust statistics). In analytical chemistry, large sets of data are often not available. Therefore, statistical tests must be applied with judgement and must not be abused. In chemistry, acceptable margins of precision are 10, 5 or 1%. Greater values than these can only be endorsed depending on the problem concerned. [Pg.391]

The most commonly employed univariate statistical methods are analysis of variance (ANOVA) and Student's t-test [8]. These methods are parametric, that is, they require that the populations studied be approximately normally distributed. Some non-parametric methods are also popular, for example Kruskal-Wallis ANOVA and the Mann-Whitney U-test [9]. A key feature of univariate statistical methods is that data are analysed one variable at a time (OVAT). This means that any information contained in the relation between the variables is not included in the OVAT analysis. Univariate methods are the most commonly used methods, irrespective of the nature of the data. Thus, in a recent issue of the European Journal of Pharmacology (Vol. 137), 20 out of 23 research reports used multivariate measurement. However, all of them were analysed by univariate methods. [Pg.295]

Two non-parametric methods for hypothesis testing with PCA and PLS are cross-validation and the jackknife estimate of variance. Both methods are described in some detail in the sections describing the PCA and PLS algorithms. Cross-validation is used to assess the predictive property of a PCA or a PLS model. The distribution function of the cross-validation test-statistic cvd-sd under the null-hypothesis is not well known. However, for PLS, the distribution of cvd-sd has been empirically determined by computer simulation technique [24] for some particular types of experimental designs. In particular, the discriminant analysis (or ANOVA-like) PLS analysis has been investigated in some detail as well as the situation with Y one-dimensional. This simulation study is referred to for detailed information. However, some tables of the critical values of cvd-sd at the 5 % level are given in Appendix C. [Pg.312]
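As a generic illustration only (it does not reproduce the cvd-sd statistic of the cited work), the sketch below cross-validates a PLS regression on hypothetical data to assess its predictive property as a function of the number of components.

```python
# Sketch: cross-validation of a PLS model to choose the number of components,
# in the spirit of the predictive assessment described above (hypothetical data;
# generic cross-validation, not the cvd-sd statistic of the cited text).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(40, 10))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=40)

for n_comp in range(1, 6):
    pls = PLSRegression(n_components=n_comp)
    q2 = cross_val_score(pls, X, y, cv=5, scoring="r2").mean()
    print("components = %d, cross-validated R^2 = %.3f" % (n_comp, q2))
```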

Quantitative studies by means of parametric statistical methods are, however, often very unreliable because of high environment-related variations, very often amounting to several orders of magnitude [FORSTNER and WITTMANN, 1983; EINAX, 1990]. In other words, environmental data sets often contain values which are extremely high or low, i.e. they are outliers in the statistical sense. Also, because environmental data are often not normally distributed, the application of parametric statistical methods results in distorted reflections of reality. [Pg.341]

