Univariate data outliers

This paper presents a method for deciding how to handle seemingly inconsistent data (outliers). The recommended univariate and multivariate methods are firmly grounded in statistics and in the author's experience in using them. [Pg.37]

The box plot has proved to be a popular graphical method for displaying and summarizing univariate data, to compare parallel batches of data, and to supplement more complex displays with univariate information. Its appeal is due to the simplicity of the graphical construction (based on quartiles) and the many features that it displays (location, spread, skewness, and potential outliers). Box plots are useful for summarizing distributions of treatment outcomes. A good example would be the comparison of the distribution of response to treatment at different dose levels or exposure (as measured by area under the plasma concentration-time curve) as in Figure 37.3. [Pg.931]
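As a rough illustration of the quartile-based construction described above, the sketch below computes the numbers behind a box plot for one batch of data. The 1.5 x IQR fence rule, the function name `box_plot_summary`, and the sample values are assumptions made for this example; the excerpt does not state which fence rule or quartile convention it uses.

```python
import statistics

def box_plot_summary(values, k=1.5):
    """Return quartiles, fences, and flagged points for one batch of data.

    k=1.5 is the common Tukey convention (an assumption, not from the source).
    """
    # "inclusive" treats the data as a complete population when interpolating.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    flagged = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "potential_outliers": flagged,
    }

# Hypothetical batch: eight similar responses and one suspicious value.
batch = [2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 9.6]
summary = box_plot_summary(batch)
print(summary)
```

Points outside the fences are only candidates for inspection; whether to exclude them is a separate decision.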

Many tests exist for detecting outliers in univariate data, but most are designed to check for the presence of a single rogue value. Univariate tests for outliers are not designed for multivariate outliers. Consider Figure 1.6: the majority of the data lie in the highlighted pattern space, with the exception of the two points denoted A and B. Neither of these points may be considered a univariate outlier in terms of variable x1 or x2, but both are well away from the main cluster of data. It is the combination of the two variables that identifies the presence of these outliers. Outlier detection and treatment are of major concern to analysts, particularly with multivariate data, where the presence of outliers may not be immediately obvious from visual inspection of tabulated data. [Pg.15]
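The situation described for points A and B can be sketched numerically. The example below uses the squared Mahalanobis distance, which is one common way to combine two correlated variables. The excerpt does not name a method, so this choice, the function names, and the data are all assumptions for illustration (the data are invented and are not those of Figure 1.6).

```python
import math

def z_scores(xs):
    """Per-variable z-scores using the sample standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return [(x - m) / sd for x in xs]

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(covariance) * [dx dy]^T, written out for 2x2
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

# Hypothetical data: nine points near the line x2 = x1, plus A and B off it.
cluster = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1),
           (6, 5.8), (7, 7.1), (8, 8.0), (9, 9.1)]
point_a = (2.5, 7.5)
point_b = (7.5, 2.5)
points = cluster + [point_a, point_b]

z1 = z_scores([p[0] for p in points])
z2 = z_scores([p[1] for p in points])
d2 = mahalanobis_sq_2d(points)

for label, i in (("A", len(points) - 2), ("B", len(points) - 1)):
    print(label, "z(x1) =", round(z1[i], 2), "z(x2) =", round(z2[i], 2),
          "d^2 =", round(d2[i], 2))
```

In this made-up data set, A and B have unremarkable z-scores on each variable taken alone, yet they have the two largest combined distances. Because A and B are included when the covariance is estimated, the distances are somewhat compressed; robust estimates of location and covariance are often preferred in practice.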

However, if the possible outlier (xi = 15) is omitted, the median is barely changed, to 5.00, while the mean x̄ is shifted to 4.857 (Figure 8.3). The median is typically a more robust indicator of a typical value of a set of univariate data than is the mean. For example, in comparisons of family incomes in two different countries, a small fraction of families with very high incomes, well removed from the vast majority, can lead to misleading conclusions based on the mean values. In analytical chemistry, the main value of comparisons exemplified by the fictional data in Figure 8.3 lies in their ability to highlight suspicious values x that should be examined as possible outliers whose exclusion can be justified by appropriate statistical tests (Section 8.2.7). [Pg.378]
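The robustness of the median relative to the mean is easy to check numerically. The values below are invented for this sketch and are not the data of Figure 8.3; only the suspicious value of 15 echoes the excerpt.

```python
import statistics

# Hypothetical replicate measurements; the last value is the possible outlier.
with_outlier = [4.1, 4.6, 4.8, 5.0, 5.2, 5.3, 5.5, 15.0]
without_outlier = with_outlier[:-1]

mean_with = statistics.mean(with_outlier)
mean_without = statistics.mean(without_outlier)
median_with = statistics.median(with_outlier)
median_without = statistics.median(without_outlier)

# How far each summary moves when the suspicious value is dropped.
mean_shift = abs(mean_with - mean_without)
median_shift = abs(median_with - median_without)

print("mean:  ", round(mean_with, 3), "->", round(mean_without, 3))
print("median:", round(median_with, 3), "->", round(median_without, 3))
```

With these numbers the mean moves by more than one unit while the median moves by about a tenth of a unit, which is the pattern the excerpt describes.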

All residue levels greater than 1.0 were coded in the analysis file with an extra 0 between the decimal point and the first unit's place. For example, 2.46 was recorded as 20.46. The limited number of such levels did not significantly affect previously computed univariate statistics, and these artificial outliers remained undetected. Figure 5 presents a plot of the PRINCOMP output after the analysis file was corrected. This plot shows a more uniform distribution of data points for specimens collected in each of the three years. [Pg.90]

Exploratory data analysis (EDA). This analysis, also called "pretreatment of data", is essential to avoid wrong or obvious conclusions. The objective of EDA is to obtain the maximum useful information from each piece of chemico-physical data, because the perception and experience of a researcher may not be sufficient to single out all the significant information. This step comprises descriptive univariate statistical algorithms (e.g. mean, normality assumption, skewness, kurtosis, variance, coefficient of variation), detection of outliers, cleansing of the data matrix, measures of the quality of the analytical method (e.g. precision, sensitivity, robustness, uncertainty, traceability) (Eurachem, 1998) and the use of basic algorithms such as box-and-whisker and stem-and-leaf plots. [Pg.157]
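Several of the descriptive univariate statistics listed above can be computed in a few lines. The sketch below is one possible set of definitions, chosen for this example rather than taken from the excerpt: sample variance with n - 1 in the denominator, and moment-based skewness and excess kurtosis. The function name `describe` and the input values are hypothetical, and the coefficient of variation assumes a non-zero mean.

```python
import math

def describe(values):
    """Basic descriptive statistics for one variable (illustrative definitions)."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    # Central moments with n in the denominator.
    m2 = sum(d ** 2 for d in deviations) / n
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    sample_variance = m2 * n / (n - 1)
    sample_sd = math.sqrt(sample_variance)
    return {
        "n": n,
        "mean": mean,
        "variance": sample_variance,
        "cv": sample_sd / mean,              # coefficient of variation
        "skewness": m3 / m2 ** 1.5,          # 0 for symmetric data
        "excess_kurtosis": m4 / m2 ** 2 - 3, # 0 for a normal distribution
    }

stats = describe([1, 2, 3, 4, 5])
for name, value in stats.items():
    print(name, round(value, 4))
```

Other conventions exist (for example, bias-corrected skewness and kurtosis), so results from statistical packages may differ slightly for small samples.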

Traditional univariate calibration techniques involve the use of a single instrumental measurement to determine a single analyte. In an ideal chemical measurement using high-precision instrumentation, an experimenter may obtain selective measurements linearly related to analyte concentration (Figure 1A). However, univariate techniques are very sensitive to the presence of outlier points in the data used to fit a... [Pg.589]
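The sensitivity of a univariate calibration to a single outlying point can be sketched as follows. The example assumes an ordinary least-squares straight line, which the excerpt does not specify, and uses invented concentrations and signals; `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical calibration standards: signal = 2 * concentration + 0.1.
concentration = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
clean_signal = [2.0 * c + 0.1 for c in concentration]

# Same standards, but the highest one reads 4 signal units too high.
outlier_signal = clean_signal[:-1] + [clean_signal[-1] + 4.0]

clean_slope, clean_intercept = fit_line(concentration, clean_signal)
outlier_slope, outlier_intercept = fit_line(concentration, outlier_signal)

print("clean fit:       ", round(clean_slope, 3), round(clean_intercept, 3))
print("with one outlier:", round(outlier_slope, 3), round(outlier_intercept, 3))
```

In this sketch a single bad standard at the end of the range raises the fitted slope by more than a quarter, so every concentration predicted from the line would be biased.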
