Univariate statistics

Box plots, also known as box and whisker plots, are commonly used to display univariate statistics for a given variable across another variable. The statistics typically displayed in a box plot are the minimum, first quartile, median, third quartile, and maximum values. Mean values are often included in box plots as well. The following is a sample box plot of a clinical response measure showing how three different drug therapies compare to one another. [Pg.203]
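
The statistics listed above can be computed per group before any plotting is done. The following is a minimal sketch using only the Python standard library; the function name `box_plot_summary`, the drug labels, and the response values are invented for illustration and do not come from the cited text. Quartile conventions differ between software packages, so the "inclusive" method used here is one reasonable choice, not the only one.

```python
import statistics


def box_plot_summary(values):
    """Return the statistics a box plot typically displays, plus the mean."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3).
    # method="inclusive" interpolates between the observed minimum and maximum.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
    }


# Hypothetical clinical response scores for three therapies (illustrative only).
responses = {
    "drug_a": [1, 2, 3, 4, 5],
    "drug_b": [2, 4, 6, 8, 10],
    "drug_c": [3, 3, 4, 5, 5],
}

# One summary per therapy: the univariate statistics "across another variable".
summaries = {drug: box_plot_summary(vals) for drug, vals in responses.items()}
```

Each entry of `summaries` holds the values that one box in the plot would display.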

The overall objective of the system is to map from three types of numeric input process data into, generally, one to three root causes out of the possible 300. The data available include numeric information from sensors, product-specific numeric information such as molecular weight and area under peak from gel permeation chromatography (GPC) analysis of the product, and additional information from the GPC in the form of variances in expected shapes of traces. The plant also uses univariate statistical methods for data analysis of numeric product information. [Pg.91]

In this chapter, we provide a general overview of the field of chemometrics. Some historical remarks and relevant literature to this subject make the strong connection to statistics visible. First practical examples (Section 1.5) show typical problems related to chemometrics, and the methods applied will be discussed in detail in subsequent chapters. Basic information on univariate statistics (Section 1.6) might be helpful to understand the concept of randomness that is fundamental in statistics. This section is also useful for making first steps in R. [Pg.17]

Some tests that are widely used in univariate statistics are listed here together with hints for their use within R and the necessary requirements, but without any mathematical treatment. In multivariate statistics these tests are rarely applied to single variables but often to latent variables; for instance, a discriminant variable can be defined via a two-sample t-test. [Pg.37]

Principal Component Analysis (PCA) is performed on a human monitoring data base to assess its ability to identify relationships between variables and to assess the overall quality of the data. The analysis uncovers two unusual events that led to further investigation of the data. First, unusually high levels of chlordane-related compounds were observed at one specific collection site. Second, a programming error was uncovered. Both events had gone unnoticed after conventional univariate statistical techniques were applied. These results illustrate the usefulness of PCA in reducing multidimensional data bases so that the data can be inspected visually in a two-dimensional plot. [Pg.83]

Data have been collected since 1970 on the prevalence and levels of various chemicals in human adipose (fat) tissue. These data are stored on a mainframe computer and have undergone routine quality assurance/quality control checks using univariate statistical methods. Upon completion of the development of a new analysis file, multivariate statistical techniques are applied to the data. The purpose of this analysis is to determine the utility of pattern recognition techniques in assessing the quality of the data and in assisting their interpretation. [Pg.83]

The initial multivariate analysis consisted of a principal component analysis on the raw data to determine whether any obvious relationships had been overlooked by univariate statistical analysis. The data base was reviewed and records containing missing data elements were deleted. The data were run through the Statistical Analysis System (SAS) procedure PRINCOMP and the results were evaluated. [Pg.85]

All residue levels greater than 1.0 were coded in the analysis file with an extra 0 between the decimal point and the first unit's place. For example, 2.46 was recorded as 20.46. The limited number of such levels did not significantly affect previously computed univariate statistics, and these artificial outliers remained undetected. Figure 5 presents a plot of the PRINCOMP output after the analysis file was corrected. This plot shows a more uniform distribution of data points for specimens collected in each of the three years. [Pg.90]

Random Functions and Regionalized Variables. In univariate statistics, an observation y. is defined as a realization of a random... [Pg.204]

The description of large data tables by the usual univariate statistics (mean, standard deviation, range, ...) and by histograms is still used in recent literature. Comparison between categories is made by the use of category means and ranges. Sometimes, the correlation coefficients are considered. The discussion of the extracted information can be wide-ranging and difficult to understand immediately. [Pg.98]

The most commonly employed univariate statistical methods are analysis of variance (ANOVA) and Student's t-test [8]. These methods are parametric, that is, they require that the populations studied be approximately normally distributed. Some non-parametric methods are also popular, for example Kruskal-Wallis ANOVA and the Mann-Whitney U-test [9]. A key feature of univariate statistical methods is that data are analysed one variable at a time (OVAT). This means that any information contained in the relation between the variables is not included in the OVAT analysis. Univariate methods are the most commonly used methods, irrespective of the nature of the data. Thus, in a recent issue of the European Journal of Pharmacology (Vol. 137), 20 out of 23 research reports used multivariate measurement. However, all of them were analysed by univariate methods. [Pg.295]
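
To make the one-variable-at-a-time idea concrete, the sketch below computes the pooled-variance two-sample t statistic (the classical Student form) separately for each measured variable. It is a minimal illustration under stated assumptions: the function name `pooled_t_statistic`, the variable names `x1` and `x2`, and all data values are hypothetical, and only the statistic and its degrees of freedom are computed, not a p-value.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Student's two-sample t statistic, assuming equal population variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    degrees_of_freedom = n_a + n_b - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / degrees_of_freedom
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, degrees_of_freedom


# Hypothetical measurements of two variables in two groups (illustrative only).
group_a = {"x1": [1, 2, 3, 4, 5], "x2": [2, 2, 3, 4, 4]}
group_b = {"x1": [3, 4, 5, 6, 7], "x2": [2, 2, 3, 4, 4]}

# OVAT: each variable is tested on its own; the relation between x1 and x2
# never enters the calculation.
ovat_results = {
    name: pooled_t_statistic(group_a[name], group_b[name]) for name in group_a
}
```

Because every variable is handled in isolation, nothing in `ovat_results` reflects how `x1` and `x2` vary together, which is the information loss the passage describes.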

In other words, the application of univariate statistical methods to multivariate data often results in a considerable loss of information and, hence, a loss of power. This is because the assumptions on which the univariate analysis relies are seldom fulfilled (for example, independence between variables). [Pg.298]

The state of pollution of the soil can be more objectively described by means of geostatistical methods. The computation of semivariograms and the use of kriging uncover spatial structures which are not discernible by means of simple univariate statistical tests. [Pg.355]

As concerns the former, statistical tests on the measured data are usually adopted to detect any abnormal behavior. In other words, an industrial process is considered as a stochastic system and the measured data are considered as different realizations of the stochastic process. The distribution of the observations in normal operating conditions differs from that observed for the faulty process. Early statistical approaches are based on univariate statistical techniques, i.e., the distribution of a single monitored variable is taken into account. For instance, if the monitored variable follows a normal distribution, the parameters of interest are the mean and standard deviation that, in faulty conditions, may deviate from their nominal values. Therefore, fault diagnosis can be reformulated as the problem of detecting changes in the parameters of a stochastic variable [3, 30],... [Pg.123]
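
One simple way to act on this idea is a Shewhart-style limit check: estimate the mean and standard deviation from data recorded under normal operation, then flag new observations that fall outside a band of k standard deviations. The excerpt does not name this rule, so it is offered here only as an assumed illustration; the function names, the choice k = 3, and the data are all hypothetical.

```python
import statistics


def control_limits(reference, k=3.0):
    """Lower and upper limits from normal-operation data (mean +/- k * stdev)."""
    center = statistics.mean(reference)
    spread = statistics.stdev(reference)
    return center - k * spread, center + k * spread


def flag_out_of_control(observations, limits):
    """Indices of observations that fall outside the given limits."""
    lower, upper = limits
    return [i for i, x in enumerate(observations) if x < lower or x > upper]


# Hypothetical readings of one monitored variable under normal operation.
reference = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
limits = control_limits(reference)

# Hypothetical new readings; the third one is far from the nominal mean.
new_observations = [10.1, 9.9, 11.5, 10.0]
flagged = flag_out_of_control(new_observations, limits)
```

A check of this kind looks at one variable only, so it cannot see a fault that shows up solely as a changed relationship between variables.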

In addition to univariate statistical analysis, the data were also examined by means of multivariate statistical techniques. In particular, R-mode factor analysis was used, which is a very effective tool to interpret anomalies and to help identify their sources. Factor analysis allows grouping of anomalies by compatible geochemical associations from a geologic-mineralogical point of view, the presence of mineralizing processes, or processes connected to the surface environment. Based on this analysis, six meaningful chemical associations were identified (Fig. 15.8). [Pg.365]

Exploratory data analysis (EDA). This analysis, also called "pretreatment of data", is essential to avoid wrong or obvious conclusions. The EDA objective is to obtain the maximum useful information from each piece of chemico-physical data, because the perception and experience of a researcher may not be sufficient to single out all the significant information. This step comprises descriptive univariate statistical algorithms (e.g. mean, normality assumption, skewness, kurtosis, variance, coefficient of variation), detection of outliers, cleansing of the data matrix, measures of the analytical method quality (e.g. precision, sensibility, robustness, uncertainty, traceability) (Eurachem, 1998) and the use of basic algorithms such as box-and-whisker, stem-and-leaf, etc. [Pg.157]
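
Several of the descriptive quantities named above can be computed in a few lines. The sketch below is an assumption-laden illustration, not the procedure of the cited text: the function name `describe` and the data are hypothetical, skewness and kurtosis use the simple moment-based (population) definitions, and the coefficient of variation assumes a nonzero mean.

```python
import math
import statistics


def describe(values):
    """Basic descriptive univariate statistics for one variable."""
    n = len(values)
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # sample variance, divisor n - 1
    # Central moments with divisor n, used for moment-based shape measures.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "variance": variance,
        # Coefficient of variation: relative spread; undefined if mean is zero.
        "cv": math.sqrt(variance) / mean,
        # Zero for a symmetric sample.
        "skewness": m3 / m2 ** 1.5,
        # Zero for a normal distribution (kurtosis minus 3).
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical measurements (illustrative only).
summary = describe([1, 2, 3, 4, 5])
```

Other packages apply small-sample corrections to skewness and kurtosis, so their values can differ slightly from these.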

Py-MS data analysis with univariate statistical techniques. [Pg.163]

Before discussing some applications, a few basic aspects on univariate statistics will be presented. A large amount of information exists regarding this field, and more details can be found in the original literature (e.g. [70,71]). Also a variety of computer packages performing statistical data analysis is available (e.g. [71a]). [Pg.164]

Several applications of univariate statistical analysis for data evaluation in Py-MS are known [73]. One such application is the evaluation of the reproducibility of replicate analyses for the peak intensity at a given m/z value. If a series of measurements is made on identical specimens, this will provide a sample x_1, x_2, ..., x_n. This sample will allow the calculation of parameters such as the mean m and the standard deviation s. By comparing the value s for different m/z values it is possible to select those m/z that are more reproducible (smaller s). [Pg.167]
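
The selection step described above can be sketched as follows: compute the mean m and standard deviation s of the replicate intensities at each m/z, then order the m/z values by s. The function name `rank_by_reproducibility`, the m/z values, and the intensities are hypothetical and chosen only to illustrate the calculation.

```python
import statistics


def rank_by_reproducibility(replicates):
    """Sort m/z values by the standard deviation of their replicate intensities.

    replicates maps each m/z value to a list of peak intensities measured
    on identical specimens. Returns (m/z, (m, s)) pairs, smallest s first.
    """
    stats = {
        mz: (statistics.mean(intensities), statistics.stdev(intensities))
        for mz, intensities in replicates.items()
    }
    return sorted(stats.items(), key=lambda item: item[1][1])


# Hypothetical replicate peak intensities at three m/z values.
replicates = {
    44: [100.0, 102.0, 98.0, 100.0],
    58: [50.0, 60.0, 40.0, 50.0],
    91: [75.0, 76.0, 74.0, 75.0],
}

ranking = rank_by_reproducibility(replicates)
most_reproducible_mz = ranking[0][0]
```

The comparison here uses s directly, as in the text; when peaks differ greatly in mean intensity, a relative measure such as s divided by m may be a useful alternative.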

The univariate statistical theory is used, for example, for rejecting one extreme value in a set of scattered results in a given sample. For this purpose, the extreme value x_e is temporarily eliminated from the sample. Then, from the reduced sample x_1, x_2, ..., x_n without x_e, the values m and s are calculated, together with the value ... [Pg.167]

In univariate statistics a key question discussed previously was to evaluate how close the values of μ and m are for a certain population and an experimental m. The answer to this question is used as a model for significance tests. One main tool used to evaluate statistical data is the distribution function, which describes the distribution of measurements about their mean. In other words, the distribution function gives the... [Pg.170]
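
One standard way to quantify how far an experimental mean m lies from a hypothesized population mean is the one-sample t statistic, which scales the difference by the standard error of the mean. The excerpt does not say which test it has in mind, so this is an assumed example; the function name `one_sample_t`, the data, and the hypothesized mean are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, population_mean):
    """t statistic comparing the sample mean m with a hypothesized population mean."""
    n = len(sample)
    m = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    standard_error = s / math.sqrt(n)
    return (m - population_mean) / standard_error


# Hypothetical measurements and hypothesized population mean (illustrative only).
t_value = one_sample_t([2, 4, 6, 8], population_mean=3)
```

The statistic is then referred to a t distribution with n - 1 degrees of freedom to judge significance; that lookup is omitted here because the standard library provides no t distribution.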

These are molecular descriptors calculated as univariate statistical indices on the scores of each individual principal component t_m (m = 1, 2, 3). [Pg.494]

Univariate statistical techniques are not appropriate for multivariate structures. Repeated ANOVAs are not warranted and can even be misleading. [Pg.67]

One of the difficulties of ecosystem-level analysis has been our inability to accurately present the dynamics of these multidimensional relationships. Conventional univariate statistics are still prevalent, although the shortcomings of these methods are well known. Several researchers have proposed different methods of visualizing ecosystems and the risks associated with xenobiotic inputs. [Pg.376]

In much the same way as the more common univariate statistics assume a normal distribution of the variable under study, so the most widely used multivariate models are based on the assumption of a multivariate normal distribution for each population sampled. The multivariate normal distribution is a generalization of its univariate counterpart and its equation in matrix notation is... [Pg.21]

The book follows a rational presentation structure, starting with the fundamentals of univariate statistical techniques and a discussion on the implementation issues in Chapter 2. After stating the limitations of univariate techniques, Chapter 3 focuses on a number of multivariate statistical techniques that permit the evaluation of process performance and provide diagnostic insight. To exploit the information content of process measurements even further, Chapter 4 introduces several modeling strategies that are based on the utilization of input-output process data. Chapter 5 provides statistical process monitoring techniques for continuous processes and three case studies that demonstrate the techniques. [Pg.4]
