Data analyses descriptive statistics

Statistics is a collection of methods of enquiry used to gather, process, or interpret quantitative data. The two main functions of Statistics are to describe and summarize data and to make inferences about a larger population of which the data are representative. These two areas are referred to as Descriptive and Inferential Statistics, respectively; both have an important part to play in Data Mining. Descriptive Statistics provides a toolkit of methods for data summarization, while Inferential Statistics is more concerned with data analysis. [Pg.84]

The bottleneck in utilizing Raman shifted rapidly from data acquisition to data interpretation. Visual differentiation works well when polymorph spectra are dramatically different or when reference samples are available for comparison, but is poorly suited for automation, for spectrally similar polymorphs, or when the form was previously unknown [231]. Spectral match techniques, such as are used in spectral libraries, help with automation, but can have trouble when the reference library is too small. Easily automated clustering techniques, such as hierarchical cluster analysis (HCA) or PCA, group similar spectra and provide information on the degree of similarity within each group [223,230]. The techniques operate best on large data sets. As an alternative, researchers at Pfizer tested several different analysis of variance (ANOVA) techniques, along with descriptive statistics, to identify different polymorphs from measurements of Raman... [Pg.225]

The number of subjects per cohort needed for the initial study depends on several factors. If a well established pharmacodynamic measurement is to be used as an endpoint, it should be possible to calculate the number required to demonstrate significant differences from placebo by means of a power calculation based on variances in a previous study using this technique. However, analysis of the study is often limited to descriptive statistics such as mean and standard deviation, or even just recording the number of reports of a particular symptom, so that a formal power calculation is often inappropriate. There must be a balance between the minimum number on which it is reasonable to base decisions about dose escalation and the number of individuals it is reasonable to expose to a NME for the first time. To take the extremes, it is unwise to make decisions about tolerability and pharmacokinetics based on data from one or two subjects, although there are advocates of such a minimalist approach. Conversely, it is not justifiable to administer a single dose level to, say, 50 subjects at this early stage of ED. There is no simple answer to this, but in general the number lies between 6 and 20 subjects. [Pg.168]
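The power calculation mentioned above can be sketched as follows. This is a minimal illustration, assuming a two-sided comparison of two group means under the normal approximation; the function name and the values of `sigma` (standard deviation from a previous study) and `delta` (smallest difference from placebo worth detecting) are hypothetical and do not come from the source.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison of means.

    Normal-approximation formula: n = 2 * ((z_(1-alpha/2) + z_power) * sigma / delta)^2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return ceil(n)  # round up to a whole subject


# Hypothetical inputs: SD of 10 units, difference of 10 units to detect.
n_small_effect = subjects_per_group(sigma=10.0, delta=10.0)
n_large_effect = subjects_per_group(sigma=10.0, delta=20.0)
print(n_small_effect, n_large_effect)
```

A larger detectable difference or a smaller variance reduces the required number; an exact calculation based on the t distribution would give slightly larger values for small groups.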

Classification methods, which divide data into groups, are broadly of two types: supervised and unsupervised. The primary difference is that, for supervised methods, prior information about the classes into which the data fall is known and representative samples from these classes are available. The supervised and unsupervised approaches loosely lend themselves to problems that have prior hypotheses and to those in which discovery of the classes of data may be needed, respectively. The division is purely for organizational purposes; in many applications, a combination of both methods can be very powerful. In general, biomedical data analysis will require multiple spectral features and will have stochastic variations. Hence, the field of statistical pattern recognition [88] is of primary importance and we use the term recognition with our learning and classification method descriptions below. [Pg.191]

This chapter constitutes an attempt to demonstrate the utility of multivariate statistics in several stages of the scientific process. As a provocation, it is suggested that the multivariate approach (in experimental design, in data description and in data analysis) will always be more informative and make generalizations more valid than the univariate approach. Finally, the multivariate strategy can be really enjoyable, not the least for its capacity to reveal hidden treasures in data that in a univariate analysis look like a set of random numbers. [Pg.323]

The use of statistical tests to analyze and quantify the significance of sample data is widespread in the study of biological systems where precise physical models are not readily available. Statistical tests are used in conjunction with measured data as an aid to understanding the significance of a result. Their aid in data analysis fills a need to answer the question of whether or not the inferences drawn from the data set are probable and statistically relevant. The statistical tests go further than a mere qualitative description of relevance. They are designed to provide a quantitative number for the probability that the stated hypothesis about the data is either true or false. In addition, they allow for the assessment of whether there are enough data to make a reasonable assumption about the system. [Pg.151]

Exploratory data analysis (EDA). This analysis, also called "pretreatment of data", is essential to avoid wrong or obvious conclusions. The EDA objective is to obtain the maximum useful information from each piece of chemico-physical data because the perception and experience of a researcher cannot be sufficient to single out all the significant information. This step comprises descriptive univariate statistical algorithms (e.g. mean, normality assumption, skewness, kurtosis, variance, coefficient of variation), detection of outliers, cleansing of the data matrix, measures of the analytical method quality (e.g. precision, sensitivity, robustness, uncertainty, traceability) (Eurachem, 1998) and the use of basic algorithms such as box-and-whisker, stem-and-leaf, etc. [Pg.157]
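Several of the univariate measures listed above can be computed with a few lines of code. The sketch below is only an illustration: the `describe` helper, the moment-based definitions of skewness and excess kurtosis, the 1.5 × IQR outlier rule, and the data are all assumptions chosen for the example, not the procedure of the cited text.

```python
from statistics import mean, quantiles, stdev, variance


def describe(values):
    """Univariate descriptive statistics plus a simple IQR-based outlier flag."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)

    # Central moments with an n denominator, used for the shape measures.
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n

    # Quartiles and the conventional 1.5 * IQR fences for flagging outliers.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "n": n,
        "mean": m,
        "variance": variance(values),
        "cv_percent": 100 * sd / m,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
        "outliers": [x for x in values if x < low or x > high],
    }


# Hypothetical replicate measurements; the last value is deliberately suspicious.
measurements = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 14.5]
summary = describe(measurements)
print(summary)
```

A flagged value is only a candidate outlier; whether it is removed during data cleansing remains an analytical judgement.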

To use the Data Analysis tool, enter the data as above and then proceed through the menus Tools, then Data Analysis, then select Descriptive Statistics. In the box labelled "Input Range", enter A2:B11 and tick the box for "Summary statistics". The mean, median and standard deviation will be shown for both data sets, but you will probably need to widen the columns to make the output clear. [Pg.24]

However, computers are quicker (and more reliable). Excel does not offer the SEM as a standard worksheet function, but it is included in the output from the Data Analysis tool (see Chapter 2). Both Minitab and SPSS include it in their Descriptive Statistics routines. [Pg.46]
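For readers without these packages, the standard error of the mean (SEM) is simply the sample standard deviation divided by the square root of the number of observations. The following sketch computes it alongside the mean, median and standard deviation; the `summary_statistics` helper and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev


def summary_statistics(values):
    """Mean, median, sample SD and standard error of the mean (SEM = SD / sqrt(n))."""
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": sd,
        "sem": sd / sqrt(len(values)),
    }


# Hypothetical data set of eight observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]
stats = summary_statistics(data)
print(stats)
```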

Plasma EE: Descriptive statistics and comparison of plasma EE concentrations in cycles 1 and 2. Analysis of variance on log-transformed data, 90% confidence intervals for AUC ratio of EE + Drug XYZ and EE alone (AUC_EE+Drug XYZ / AUC_EE)... [Pg.678]

PK data: The PK parameters of ABC4321 in plasma were determined by individual PK analyses. The individual and mean concentrations of ABC4321 in plasma were tabulated and plotted. PK variables were listed and summarized by treatment with descriptive statistics. An analysis of variance (ANOVA) including sequence, subject nested within sequence, period, and treatment effects, was performed on the ln-transformed parameters (except tmax). The mean square error was used to construct the 90% confidence interval for treatment ratios. The point estimates were calculated as a ratio of the antilog of the least square means. Pairwise comparisons to treatment A were made. Whole blood concentrations of XYZ1234 were not used to perform PK analyses. [Pg.712]
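The back-transformation described above can be written compactly. The notation here is an assumption introduced for illustration: \bar{y}_T and \bar{y}_A are the least square means of the ln-transformed parameter for a test treatment T and for treatment A, SE is the standard error of their difference (derived from the ANOVA mean square error in a way that depends on the study design), and \nu is the residual degrees of freedom.

```latex
\hat{\theta}_{T/A} = \frac{\exp(\bar{y}_T)}{\exp(\bar{y}_A)} = \exp\left(\bar{y}_T - \bar{y}_A\right),
\qquad
\text{90\% CI} = \exp\left( (\bar{y}_T - \bar{y}_A) \pm t_{0.95,\,\nu}\,\mathrm{SE} \right)
```

Because the interval is constructed on the logarithmic scale and then exponentiated, it is asymmetric around the point estimate on the original scale.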

When compounds are selected according to SMD, this necessitates the adequate description of their structures by means of quantitative variables, "structure descriptors". This description can then be used after the compound selection, synthesis, and biological testing to formulate quantitative models between structural variation and activity variation, so-called Quantitative Structure Activity Relationships (QSARs). For extensive reviews, see references 3 and 4. With multiple structure descriptors and multiple biological activity variables (responses), these models are necessarily multivariate (M-QSAR) in their nature, making the Partial Least Squares Projections to Latent Structures (PLS) approach suitable for the data analysis. PLS is a statistical method that relates a multivariate descriptor data set (X) to a multivariate response data set (Y). PLS is well described elsewhere and will not be described any further here [42, 43]. [Pg.214]

Options should include descriptive statistics and exploratory data analysis techniques. [Pg.315]

The critical analytical data should be tabulated and analyzed in terms of descriptive statistics (mean, coefficient of variation, extrema), control charts, and trend analysis [17]. If the data of several years are included, yearly means may be... [Pg.398]

Each measure of an analysed variable, or variate, may be considered independent. By summing elements of each column vector the mean and standard deviation for each variate can be calculated (Table 7). Although these operations reduce the size of the data set to a smaller set of descriptive statistics, much relevant information can be lost. When performing any multivariate data analysis it is important that the variates are not considered in isolation but are combined to provide as complete a description of the total system as possible. Interaction between variables can be as important as the individual mean values and the distributions of the individual variates. Variables which exhibit no interaction are said to be statistically independent, as a change in the value in one variable cannot be predicted by a change in another measured variable. In many cases in analytical science the variates are not statistically independent, and some measure of their interaction is required in order to interpret the data and characterize the samples. The degree or extent of this interaction between variables can be estimated by calculating their covariances, the subject of the next section. [Pg.16]
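The contrast between per-variate summaries and a measure of interaction can be illustrated with a short sketch. The data matrix, variable names and helper functions below are hypothetical; the covariance uses the usual sample (n - 1) denominator.

```python
from statistics import mean, stdev


def covariance(x, y):
    """Sample covariance of two equally long column vectors (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def covariance_matrix(columns):
    """Covariance of every pair of variates; the diagonal holds the variances."""
    return [[covariance(a, b) for b in columns] for a in columns]


# Hypothetical data: three variates (columns) measured on five samples.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]

# Column-wise descriptive statistics treat each variate in isolation...
for col in columns:
    print(mean(col), stdev(col))

# ...whereas the covariance matrix captures how the variates change together.
cov = covariance_matrix(columns)
print(cov)
```

In this made-up example the first two variates rise together (positive covariance) while the third tends to fall as the first rises (negative covariance), information that the means and standard deviations alone do not show.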

Descriptive statistics were used to profile and characterize the articles within each data field abstracted by the reviewers, including the type of clinical service performed, the site of the study or evaluation, and the type of analysis performed. [Pg.302]

Spreadsheet Summary: In Chapter 2 of Applications of Microsoft Excel in Analytical Chemistry, we introduce the use of Excel's Analysis ToolPak to compute the mean, standard deviation, and other quantities. In addition, the Descriptive Statistics package finds the standard error of the mean, the median, the range, the maximum and minimum values, and parameters that reflect the symmetry of the data set. [Pg.123]

For tmax, descriptive statistics should be given. If tmax is to be subjected to a statistical analysis, this should be based on non-parametric methods and should be applied to untransformed data. A sufficient number of samples around predicted maximal concentrations should have been taken to improve the accuracy of the tmax estimate. For parameters describing the elimination phase (t1/2), only descriptive statistics should be given. [Pg.370]

The heights of the bars or columns usually represent the mean values for the various groups, and the T-shaped extension denotes the standard deviation (SD), or more commonly, the standard error of the mean (discussed in more detail in Section 7.3.2.3). Especially if the standard error of the mean is presented, this type of graph tells us very little about the data - the only descriptive statistic is the mean. In contrast, consider the box and whisker plot (Figure 7.2), which was first presented in Tukey's book Exploratory Data Analysis. The ends of the whiskers are the maximum and minimum values. The horizontal line within the central box is the median, the value above and below which 50% of the individual values lie. The upper limit of the box is the upper or third quartile, the value above which 25% and below which 75% of the individual values lie. Finally, the lower limit of the box is the lower or first quartile, the value above which 75% and below which 25% of individual values lie. For descriptive purposes this graphical presentation is very informative in giving information about the distribution of the data. [Pg.365]
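The five values that define such a box and whisker plot can be obtained directly. This is a minimal sketch with hypothetical data and a hypothetical helper name; note that several quartile conventions exist, and the "inclusive" method of Python's `statistics.quantiles` used here may differ slightly from the values a given plotting package reports.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile and maximum of a data set."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": med, "q3": q3, "max": data[-1]}


# Hypothetical individual values for one group.
group = [7, 1, 9, 3, 5, 2, 8, 4, 6]
box = five_number_summary(group)
print(box)
```

The whisker ends correspond to `min` and `max`, the box limits to `q1` and `q3`, and the line inside the box to `median`.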

The use of RDF descriptors suggests a way to calculate a single value that describes the diversity of a data set by means of descriptive statistics. A series of statistical algorithms is available for evaluating the similarity of larger data sets. By statistical analysis, the diversity — or similarity — of two data sets can be characterized. Two methods are straightforward. [Pg.194]

Big Chemical Encyclopedia
