Descriptive Statistics Summarizing Data

Descriptive Statistics involves the presentation of summary statistics, which are concise yet meaningful summaries of large amounts of data. One category of descriptive statistics is the measurement of central tendency. [Pg.86]

One of the most commonly used measures of central tendency is the mean, more correctly (but rarely) called the arithmetic mean, a term that unambiguously distinguishes it from the geometric mean. While very informative in some circumstances, the geometric mean is less commonly used, and, in the absence of the prefix arithmetic or geometric, the default interpretation of the term mean is the arithmetic mean. This is the convention followed in the rest of this book. The mean of a set of data points is therefore defined as their sum divided by the total number of data points. [Pg.86]
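The distinction between the two means can be sketched in a few lines of Python. This is only an illustration: the function names and the two data values are assumptions chosen for the example, not taken from the source text.

```python
import math

def arithmetic_mean(values):
    """Sum of the data points divided by the number of data points."""
    return sum(values) / len(values)

def geometric_mean(values):
    """n-th root of the product of the values, computed through logarithms.

    Only defined here for strictly positive values.
    """
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical data, chosen so the two means differ visibly.
data = [2.0, 8.0]
am = arithmetic_mean(data)  # (2 + 8) / 2
gm = geometric_mean(data)   # square root of 2 * 8, up to rounding

print(am, gm)
```

For positive data that are not all equal, the geometric mean is smaller than the arithmetic mean, which is one reason the unqualified term "mean" needs a default interpretation.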

Measures of central tendency provide an indication of the location of the data. For data measured on a scale of 1-100, a mean of 89 would suggest that the data are, in general, located closer to the top end of the scale than to the bottom end. [Pg.86]

Another common category of descriptive statistics is the measure of dispersion of a set of data about a central value. The range is the arithmetic difference between the greatest (maximum) and the least (minimum) value in a data set. While this characteristic is easily calculated and is useful in initial inspections of data sets, [Pg.86]

Descriptive statistics are used to summarize the general nature of a data set. As such, the parameters describing any single group of data have two components. One of these describes the location of the data, while the other gives a measure of the dispersion of the data in and about this location. Often overlooked is the fact that the choice of which parameters are used to give these pieces of information implies a particular type of distribution for the data. [Pg.871]

PK data. The PK parameters of ABC4321 in plasma were determined by individual PK analyses. The individual and mean concentrations of ABC4321 in plasma were tabulated and plotted. PK variables were listed and summarized by treatment with descriptive statistics. An analysis of variance (ANOVA) including sequence, subject nested within sequence, period, and treatment effects was performed on the ln-transformed parameters (except tmax). The mean square error was used to construct the 90% confidence interval for treatment ratios. The point estimates were calculated as a ratio of the antilog of the least-squares means. Pairwise comparisons to treatment A were made. Whole blood concentrations of XYZ1234 were not used to perform PK analyses. [Pg.712]

Descriptive statistics. A series of physical measurements can be described numerically. If, for example, we have recorded the concentration of 1000 different samples in a research problem, it is not possible to provide the user with a table giving all 1000 results. In this case, it is normal to summarize the main trends. This can be done not only graphically, but also by considering overall parameters such as the mean, standard deviation, skewness, etc. Specific values can be used to give an overall picture of a set of data. [Pg.323]
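As a rough sketch of how 1000 results might be condensed into a few overall parameters, the Python below simulates a set of concentrations and reports the mean, standard deviation, and skewness. The simulated data, the seed, and the `summarize` helper are assumptions made for illustration. The skewness shown is the simple moment coefficient (third central moment divided by the second central moment raised to the power 3/2), which is one of several conventions in use.

```python
import random
import statistics

def summarize(values):
    """Return the count, mean, sample standard deviation, and moment skewness.

    Assumes at least two values that are not all identical.
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, divisor n - 1
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return {"n": n, "mean": mean, "sd": sd, "skewness": m3 / m2 ** 1.5}

# Hypothetical data: 1000 simulated concentrations (arbitrary units).
random.seed(0)
concentrations = [random.gauss(50.0, 5.0) for _ in range(1000)]

summary = summarize(concentrations)
for name, value in summary.items():
    print(name, value)
```

Four numbers replace a table of 1000 entries, at the cost of hiding any structure that these particular parameters do not capture.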

Descriptive statistics: Used to summarize information and to compare numbers in different sets of data. The mean, median, mode, range, variance, and standard deviation are descriptive statistics. [Pg.266]
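The statistics listed above are all available from Python's standard `statistics` module, apart from the range, which is a one-line subtraction. The sketch below compares two data sets; both sets and the `describe` helper are hypothetical, chosen so that the means agree while the dispersion differs.

```python
import statistics

def describe(values):
    """Collect the common descriptive statistics for one data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance, divisor n - 1
        "stdev": statistics.stdev(values),        # square root of the sample variance
    }

# Hypothetical data sets with the same sum (43) and size (7).
set_a = [4, 8, 6, 5, 3, 8, 9]
set_b = [6, 6, 6, 6, 7, 6, 6]

stats_a = describe(set_a)
stats_b = describe(set_b)
for key in stats_a:
    print(key, stats_a[key], stats_b[key])
```

The two sets share a mean, so only the measures of dispersion (range, variance, standard deviation) reveal how differently the values are spread.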

Table 2.3 summarizes the results for calculating the descriptive statistics for the spectrophotometric data given in Table 2.1. ... [Pg.24]

Statistics is a collection of methods of enquiry used to gather, process, or interpret quantitative data. The two main functions of Statistics are to describe and summarize data and to make inferences about a larger population of which the data are representative. These two areas are referred to as Descriptive and Inferential Statistics, respectively; both areas have an important part to play in Data Mining. Descriptive Statistics provides a toolkit of methods for data summarization while Inferential Statistics is more concerned with data analysis. [Pg.84]

Many of the significance tests and other procedures summarized in this article (and many others) are very readily performed with the aid of Microsoft Excel, Minitab, and other widely available programs. Such software also gives instant access to the most important descriptive statistics (mean, median, standard deviation, s.e.m., confidence limits, etc.). In practice, the major problems are therefore (1) accurate entry of the experimental data into the program (this problem may not arise if an analytical instrument is directly interfaced to a PC) and (2) choice of the appropriate test once data entry has been successfully completed. Guidance on the latter... [Pg.568]

All those visualization tools which allow the exploration of uni- and oligo-variate data can be considered as instruments of descriptive statistics. Descriptive statistics is usually defined as a way to summarize or extract information from one or a few variables. Compared to inferential statistics, whose aim is to assess the validity of a hypothesis made on measured data, descriptive statistics is merely explorative. In particular, some salient facts can be extracted about a variable ... [Pg.73]

In this section, we characterize size distributions and their properties by using examples based on a specific set of particle size data. The result of a careful size analysis might be a list of 1000 particle sizes. In some situations, keeping the data in this form may be desirable (for example, if the list is stored in a computer). In most situations, however, we would like to have a picture of how the particles are distributed among the various sizes and to be able to calculate several different kinds of statistics that describe the properties of the aerosol. For that purpose, a list of 1000 numbers is an awkward format, so it is necessary to resort to descriptive statistics to summarize the information. [Pg.32]

Quantile probability plots (QQ-plots) are useful data structure analysis tools originally proposed by Wilk and Gnanadesikan (1968). By means of probability plots they provide a clear summarization and palatable description of data. A variety of application instances have been shown by Gnanadesikan (1977). Durovic and Kovacevic (1995) have successfully implemented QQ-plots, combining them with some ideas from robust statistics (e.g., Huber, 1981) to make a robust Kalman filter. [Pg.229]
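A minimal sketch of the coordinates behind a normal QQ-plot is given below. It pairs each sorted observation with the standard normal quantile at the plotting position (i - 0.5)/n. That plotting position, the `qq_points` helper, and the sample values are assumptions for illustration; the cited authors' exact procedures are not given in this excerpt.

```python
from statistics import NormalDist

def qq_points(sample):
    """Pair theoretical standard-normal quantiles with the sorted sample.

    Uses the plotting position (i - 0.5) / n for the i-th smallest value,
    which is one common convention among several.
    """
    ordered = sorted(sample)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical measurements.
sample = [9.8, 10.4, 10.1, 9.5, 10.0, 10.9, 9.9]
points = qq_points(sample)
for theoretical_q, observed in points:
    print(round(theoretical_q, 3), observed)
```

If the points fall close to a straight line when plotted, the sample is at least roughly consistent with a normal distribution; marked curvature or isolated points far from the line suggest skewness, heavy tails, or outliers.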

The use of the mean with either the SD or SEM implies, however, that we have reason to believe that the sample of data being summarized are from a population that is at least approximately normally distributed. If this is not the case, then we should rather use a set of statistical descriptions which do not require a normal distribution. These are the median, for location, and the semiquartile distance, for a measure of dispersion. These somewhat less familiar parameters are characterized as follows. [Pg.871]
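The contrast between the two pairs of descriptors can be illustrated with a short Python sketch. The data set, containing one extreme value, is hypothetical. The semiquartile distance is taken here as half the difference between the third and first quartiles, and the quartiles are computed with the "inclusive" method of `statistics.quantiles`; other quartile conventions would give slightly different numbers.

```python
import statistics

def median_and_semiquartile(values):
    """Return the median and half the interquartile distance.

    Quartiles use the 'inclusive' interpolation method; this is one
    convention among several.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, (q3 - q1) / 2

# Hypothetical data with a single extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]

median, semiquartile = median_and_semiquartile(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)

print("median:", median, "semiquartile distance:", semiquartile)
print("mean:", mean, "SD:", sd)
```

The single extreme value pulls the mean and SD far from the bulk of the data, while the median and semiquartile distance barely register it, which is why they are preferred when normality cannot be assumed.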

Chapter 3 starts with the first and probably most important multivariate statistical method, principal component analysis (PCA). PCA is mainly used for mapping or summarizing the data information. Many ideas presented in this chapter, like the selection of the number of principal components (PCs), or the robustification of PCA, apply in a similar way to other methods. Section 3.8 discusses briefly related methods for summarizing and mapping multivariate data. The interested reader may consult extended literature for a more detailed description of these methods. [Pg.18]

Nonsequential double and multiple ionization are to a large part classical phenomena. Indeed, the S-matrix approach suggests a pertinent classical limit. We have summarized evidence that the latter reproduces the fully quantum-mechanical results very well in parameter regions where this can be expected. Finally, we have extended such classical avenues to a statistical description of nonsequential triple and quadruple ionization. For neon, such a classical statistical model yields a fair description of the available data. While a more microscopic description of these extremely involved phenomena lies in the future, we believe that the simple models summarized in this paper will remain valuable as benchmark results. [Pg.90]

Sections 8.9 to 8.11 have given a brief description of methods for making a regression model for multivariate calibration. To summarize, MLR would rarely be used because it cannot be carried out when the number of predictor variables is greater than the number of specimens. Rather than select a few of the predictor variables, it is better to reduce their number to just a few by using PCR or PLS. These methods give satisfactory results when there is correlation between the predictor variables. The preferred method in a given situation will depend on the precise nature of the data; an analysis can be carried out by each method and the results evaluated in order to find which method performs better. For example, for the data in Table 8.4 PCR performed better than PLS as measured by the PRESS statistic. [Pg.236]
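The idea of comparing models by PRESS (the sum of squared leave-one-out prediction errors) can be sketched as follows. The sketch does not implement PCR or PLS: two much simpler stand-in models (a mean-only model and a straight-line fit) are used purely to show how PRESS is computed and compared. The data and all function names are hypothetical and unrelated to Table 8.4.

```python
def fit_mean(xs, ys):
    """Stand-in model 1: always predict the mean of the training responses."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Stand-in model 2: ordinary least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def press(fit, xs, ys):
    """Leave each specimen out in turn, refit, and sum the squared prediction errors."""
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fit(train_x, train_y)
        total += (ys[i] - predict(xs[i])) ** 2
    return total

# Hypothetical calibration data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

press_mean = press(fit_mean, xs, ys)
press_line = press(fit_line, xs, ys)
print("PRESS, mean-only model:", press_mean)
print("PRESS, straight line:  ", press_line)
```

The model with the smaller PRESS predicts the left-out specimens better; the same comparison would apply to PCR and PLS models fitted with a suitable chemometrics library.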

The findings are completed with the second study, which focuses on statistics and real accident data from the viewpoint of an insurance company. A major database is provided by the Federation of Accident Insurance Institutions in Finland (FAII). The database provides numerical background information from real accidents, supplemented with accident descriptions. This database includes all of the accidents in Finland which have been reported to the Finnish accident insurance institutions. Access to the database is primarily granted to insurance institutions only. Statistics Finland hosts the open database of general statistics describing, e.g., key numbers in Finnish industry. In this article, the referred accident cases are summarized from (FAII 2014). [Pg.28]

National accident statistics and verbal accident descriptions were retrieved from the Finnish Federation of Accident Insurance Institutions' (FAII) database. The FAII database allows restricted access to researchers. An open coding approach (Strauss and Corbin 1998) was used to summarize and synthesize the accident statistics and descriptions data. The analysis covered all accidents that occurred to road transportation sector employees somewhere other than in truck cabs, and which were reported to insurance companies in Finland in 2006. [Pg.101]

In previous studies (Kato et al., 2009a, 2011b, 2011c), the stoichiometric compositions in (U,Pu)O2 have been determined based on defect chemistry. The relationship between oxygen partial pressure and deviation x from stoichiometric composition has been analyzed in non-stoichiometric oxides. Kosuge (1993) used statistical thermodynamics considerations for description of non-stoichiometric compounds, and Karen (2006) reported a point-defect scheme for them. Recently their methods have been applied to non-stoichiometric (U,Pu)O2, and experimental data, accurately measured in the near-stoichiometric region, were analyzed as a function of temperature. In this report the measurement data and the measurement technique were reviewed and analysis results based on defect chemistry were summarized. [Pg.204]

The initial numerical data fitting for experimental data obtained with Smopex and Amberlyst catalysts revealed that a good description can be achieved by setting a=1 and estimating only four parameters (three rate constants feoi and P), thus improving the estimation statistics. The parameter errors were typically less than 8% and the degree of explanation was 99.8% and 99.5% for Smopex and Amberlyst, respectively. The parameter values and the estimation statistics are summarized in Table 11.6. [Pg.715]

Big Chemical Encyclopedia
