Detection of Outliers in Measurements

The detection of outliers, particularly when working with a small number of samples, is discussed in the following papers. Efstathiou, C. Stochastic Calculation of Critical Q-Test Values for the Detection of Outliers in Measurements, J. Chem. Educ. 1992, 69, 773-736. [Pg.102]

The deviation from mean (DFM) in well n is the relative deviation of the volume in one well from the mean volume over the plate. This value is obtained from precision OD or fluorescence measurements and is used for the detection of outliers and systematic errors. The DFM is defined as... [Pg.217]
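The DFM formula itself is cut off in the source. The sketch below assumes the most direct reading of the prose, DFM_n = (V_n - V_mean) / V_mean, reported in percent. The percentage scaling, the function name and the example volumes are assumptions, not the source's definition.

```python
from statistics import mean

def deviation_from_mean(volumes):
    """Relative deviation of each well's volume from the plate mean, in percent.

    Assumption: DFM_n = 100 * (V_n - V_mean) / V_mean (the source formula is elided).
    """
    v_mean = mean(volumes)
    return [100.0 * (v - v_mean) / v_mean for v in volumes]

# Hypothetical volumes (uL) derived from OD readings for four wells
dfm = deviation_from_mean([50.0, 49.0, 51.0, 50.0])
print(dfm)  # [0.0, -2.0, 2.0, 0.0]
```

Wells with an unusually large absolute DFM would then be candidates for closer inspection as outliers or as signs of a systematic dispensing error.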

The field of outlier detection and treatment is considerable, and a rigorous mathematical discussion is well beyond the scope of what is possible here. Moreover, the treatment of analytical results is usually simplified in practice, since the number of observations is often not very large. The two methods most commonly used by analysts to detect outliers in measured data are versions of the Q-test (Refs. 1-3, 6) and Chauvenet's criterion (Refs. 4-6), both of which assume that the data are sampled from a normally distributed population. [Pg.1426]
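A minimal sketch of both tests follows. The Q-test compares the gap between the suspect value and its nearest neighbour with the total range. Chauvenet's criterion rejects a point when the expected number of values at least that far from the mean, n·P(|Z| ≥ z), falls below 0.5. The 90% Q critical values are the commonly tabulated Dixon values; the choice of 90% confidence, the function names and the sample data are assumptions for illustration.

```python
import math
import statistics

# Dixon Q critical values at 90% confidence for n = 3..10 (assumed confidence level;
# 95% and 99% tables give larger values).
Q90 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507, 8: 0.468, 9: 0.437, 10: 0.412}

def dixon_q_test(data):
    """Test the most extreme value; return (suspect, Q, reject?)."""
    x = sorted(data)
    n = len(x)
    if n not in Q90:
        raise ValueError("table covers 3 <= n <= 10 only")
    spread = x[-1] - x[0]
    if spread == 0:
        return None, 0.0, False
    q_low = (x[1] - x[0]) / spread     # gap between smallest value and its neighbour
    q_high = (x[-1] - x[-2]) / spread  # gap between largest value and its neighbour
    suspect, q = (x[0], q_low) if q_low > q_high else (x[-1], q_high)
    return suspect, q, q > Q90[n]

def chauvenet(data):
    """Return the values rejected by Chauvenet's criterion (assumes normality)."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)
    if s == 0:
        return []
    rejected = []
    for v in data:
        z = abs(v - m) / s
        p_two_sided = math.erfc(z / math.sqrt(2))  # P(|Z| >= z) for a standard normal
        if n * p_two_sided < 0.5:
            rejected.append(v)
    return rejected

sample = [10.1, 10.2, 10.0, 10.1, 11.5]  # hypothetical replicate results
print(dixon_q_test(sample))
print(chauvenet(sample))
```

Here Q = 1.3/1.5 ≈ 0.87 exceeds 0.642, so the Q-test rejects 11.5, and Chauvenet's criterion rejects the same value. Both tests examine the data under the normality assumption stated above, and the Q-test examines only one suspect value at a time.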

Over the years, an abundance of outlier tests with some theoretical rationale at their roots have been proposed. Such tests have to be carefully adjusted to the problem at hand; otherwise one either fails to detect some true outliers (false negatives) or throws out up to 50% of the good measurements as well (false positives). Robust methods have been put forward to overcome this. Three tests will be described ... [Pg.58]
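The three tests are not named in this excerpt. As one widely used robust alternative (not necessarily one of those three), the sketch below flags values whose distance from the median exceeds a multiple of the scaled median absolute deviation. Because the median and MAD are barely affected by a few extreme values, a gross outlier cannot inflate the scale and mask itself. The cutoff of 3 and the sample data are assumptions.

```python
import statistics

def mad_outliers(data, cutoff=3.0):
    """Flag values whose robust z-score exceeds `cutoff`.

    Robust z = |x - median| / (1.4826 * MAD); the factor 1.4826 makes the MAD
    a consistent estimate of the standard deviation for normal data.
    """
    med = statistics.median(data)
    mad = statistics.median([abs(v - med) for v in data])
    if mad == 0:
        return []  # at least half the values equal the median; scale undefined
    scale = 1.4826 * mad
    return [v for v in data if abs(v - med) / scale > cutoff]

sample = [10.1, 10.2, 10.0, 10.1, 11.5, 10.3, 9.9]  # hypothetical data
print(mad_outliers(sample))  # [11.5]
```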

To construct the reference model, the interpretation system required routine process data collected over a period of several months. Cross-validation was applied to detect and remove outliers. Only data corresponding to normal process operations (that is, when top-grade product is made) were used in the model development. As stated earlier, the system ultimately involved two analysis approaches, both reduced-order models that capture dominant directions of variability in the data. A PLS analysis using two loadings explained about 60% of the variance in the measurements. A subsequent PCA analysis on the residuals showed that five principal components explain 90% of the residual variability. [Pg.85]
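The statement that five principal components explain 90% of the residual variability comes from the cumulative explained variance of a PCA. A minimal NumPy sketch of that count follows; the synthetic residual matrix and the 90% target are illustrative assumptions, and the code does not reproduce the authors' PLS step.

```python
import numpy as np

def components_for_variance(data, target=0.90):
    """Smallest number of PCs whose cumulative explained variance reaches `target`."""
    centered = data - data.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values, largest first
    explained = s**2 / np.sum(s**2)                # variance fraction per component
    cumulative = np.cumsum(explained)
    k = min(int(np.searchsorted(cumulative, target)) + 1, len(s))
    return k, cumulative

# Hypothetical residual matrix: 200 observations x 10 variables with three
# dominant directions of variability plus a little noise.
rng = np.random.default_rng(0)
residuals = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))
residuals += 0.05 * rng.normal(size=(200, 10))
k, cumulative = components_for_variance(residuals)
print(k, np.round(cumulative[:k], 3))
```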

Outliers demand special attention in chemometrics for several reasons. During model development, their extremeness often gives them unduly high influence in the calculation of the calibration model. Therefore, if they represent erroneous readings, they will add disproportionately more error to the calibration model. Furthermore, even if they carry genuine information, it might be determined that this specific information is irrelevant to the problem. Outliers are also very important during model deployment, because they can be informative indicators of specific failures or abnormalities in the process being sampled, or in the measurement system itself. This use of outlier detection is discussed in the Model Deployment section (12.10), later in this chapter. [Pg.413]
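One standard way to quantify the "extremeness" that gives a sample undue influence on a least-squares calibration is its leverage, the corresponding diagonal element of the hat matrix. The excerpt does not name a specific measure, so this is an illustrative choice; the calibration levels below are hypothetical.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = Xd (Xd^T Xd)^-1 Xd^T, with an intercept column added."""
    Xd = np.column_stack([np.ones(len(X)), X])
    q, _ = np.linalg.qr(Xd)  # for the reduced QR factorisation, H = Q Q^T
    return np.sum(q**2, axis=1)

x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])  # hypothetical calibration levels
h = leverages(x)
print(np.round(h, 3))
```

The leverages are 0.3, 0.264, 0.236, 0.216 and 0.984. They sum to the number of parameters (2), so the average is 0.4. The point at x = 20 has leverage 0.984, which means the fitted line is forced almost through it. An error in that single reading would therefore propagate almost fully into the model.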

The B score (Brideau et al., 2003) is a robust analog of the Z score computed after median polish; it is more resistant to outliers and also more robust to row- and column-position-related systematic errors (Table 14.1). An iterative median polish procedure, followed by a smoothing algorithm over nearby plates, is used to estimate row and column (in addition to plate) effects. These effects are subtracted from the measured value, and the result is divided by the median absolute deviation (MAD) of the corrected measures to robustly standardize for plate-to-plate variability in random noise. A similar approach uses a robust linear model to obtain robust estimates of row and column effects. After adjustment, the corrected measures are standardized by the scale estimate of the robust linear model fit to generate a Z statistic referred to as the R score (Wu, Liu, and Sui, 2008). In a related approach to detecting and eliminating systematic position-dependent errors, the distribution of Z score-normalized data for each well position over a screening run or subset is fitted to a statistical model as a function of the plate; the resulting trend is used to correct the data (Makarenkov et al., 2007). [Pg.249]
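The core of the B-score calculation on a single plate can be sketched as a two-way median polish followed by division by the MAD of the residuals. This sketch omits the smoothing across nearby plates described above. It runs a fixed number of polish iterations instead of iterating to convergence and does not apply the 1.4826 normal-consistency factor that some implementations use. The plate layout, trends and hit are hypothetical.

```python
import numpy as np

def median_polish(plate, n_iter=10):
    """Tukey two-way median polish; returns residuals after removing row and column effects."""
    resid = np.array(plate, dtype=float)
    for _ in range(n_iter):
        resid -= np.median(resid, axis=1, keepdims=True)  # sweep out row medians
        resid -= np.median(resid, axis=0, keepdims=True)  # sweep out column medians
    return resid

def b_score(plate, n_iter=10):
    """Median-polish residuals divided by their median absolute deviation (single plate)."""
    resid = median_polish(plate, n_iter)
    mad = np.median(np.abs(resid - np.median(resid)))
    if mad == 0:
        raise ValueError("MAD is zero; B score undefined")
    return resid / mad

# Hypothetical 8 x 12 plate: additive row and column trends, random noise,
# and one strong inhibitor ("hit") at row 2, column 3.
rng = np.random.default_rng(1)
plate = 100.0 + 5.0 * np.arange(8)[:, None] + 2.0 * np.arange(12)[None, :]
plate = plate + rng.normal(0.0, 1.0, size=(8, 12))
plate[2, 3] -= 50.0
b = b_score(plate)
print(np.unravel_index(np.argmax(np.abs(b)), b.shape))
```

Because the row and column medians are barely affected by the single hit, the positional trends are removed while the hit stands out with a large negative B score.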

It should be noted that the error analysis methods using measurement models are sensitive to data outliers. Occasionally, outliers can be attributed to external influences. Most often, outliers appear near the line frequency and at the beginning of an impedance measurement. Data collected within 5 Hz of the line frequency and its first harmonic (e.g., 50 and 100 Hz in Europe or 60 and 120 Hz in the United States) should be deleted. Startup transients cause some systems to exhibit a detectable artifact at the first frequency measured. This point, too, should be deleted. [Pg.422]

By applying specially designed projection indices, clusters and outliers should be easier to detect visually than with PCA. One of the most popular indices is entropy, which is a measure of the structure in the data. It can be calculated as follows ... [Pg.301]

Multivariate calibration methods offer several advantages over univariate calibration methods. Signal averaging is achieved, since more than one measurement channel is employed in the analysis. Also, concentrations of multiple species may be measured if they are present in the calibration samples. A calibration model is built by using responses from calibration standards. The analysis of unknown samples will suffer if a species is present in the sample that is not accounted for in the calibration model. This is mitigated somewhat by the ability to detect whether a sample is an outlier from the calibration set. Multivariate calibration approaches permit selective quantitation of several analytes of interest in complex combinatorial libraries using low-resolution instruments when overlapping responses from different species preclude the use of univariate analysis. Quantitative... [Pg.100]
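Classical least squares (CLS) is one simple multivariate calibration method. The excerpt does not say which method it has in mind, so this is an illustrative choice. The sketch shows both points made above. Two species are quantified at once from a single spectrum. When an unmodelled species is present, the spectral residual grows, which flags the sample as an outlier. The Gaussian bands, concentrations and function names are hypothetical.

```python
import numpy as np

def cls_calibrate(C, R):
    """Classical least squares: estimate pure-component spectra K from R ≈ C K."""
    K, *_ = np.linalg.lstsq(C, R, rcond=None)
    return K

def cls_predict(K, r):
    """Concentrations for one spectrum r, plus the norm of the unexplained residual."""
    c, *_ = np.linalg.lstsq(K.T, r, rcond=None)
    return c, float(np.linalg.norm(r - c @ K))

wl = np.linspace(0.0, 1.0, 50)

def band(center, width=0.08):
    """Hypothetical Gaussian absorption band on the grid `wl`."""
    return np.exp(-((wl - center) / width) ** 2)

K_true = np.vstack([band(0.3), band(0.6)])  # two calibrated species
C = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
K = cls_calibrate(C, C @ K_true)            # noise-free calibration standards

c_ok, res_ok = cls_predict(K, 0.4 * K_true[0] + 0.7 * K_true[1])
c_bad, res_bad = cls_predict(K, 0.4 * K_true[0] + 0.7 * K_true[1] + 0.5 * band(0.85))
print(np.round(c_ok, 3), round(res_ok, 6), round(res_bad, 3))
```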

Certain plots and graphical presentations are frequently used in multivariate analysis, and the most frequently used is perhaps the score plot. This is a two-dimensional scatter plot (or map) of scores for two specified components (PCs), in other words a two-dimensional version of Figure 9.32. The plot gives information about patterns in the samples. The score plot for PC1 and PC2 may be especially useful because these two components summarize more variation in the data than any other pair of components. One may look for groups of samples in the score plot and also detect outliers, which may be due to measurement error. In classification analysis, the score plot will also show how well the model is able to separate the groups. An example is given in Section 10.4.3. [Pg.395]
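The coordinates of a PC1 versus PC2 score plot can be obtained from a singular value decomposition of the mean-centred data. The sketch below computes them with NumPy only and leaves the actual plotting to any scatter-plot routine. The two-group data set is hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Scores T = U S of the mean-centred data on the first PCs, plus loadings P."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]
    P = Vt[:n_components].T
    return T, P

# Hypothetical data: two groups of 10 samples measured on 5 variables.
rng = np.random.default_rng(2)
group_a = rng.normal(0.0, 0.1, size=(10, 5))
group_b = rng.normal(0.0, 0.1, size=(10, 5)) + np.array([2.0, 2.0, 0.0, 0.0, 0.0])
X = np.vstack([group_a, group_b])
T, P = pca_scores(X)
# T[:, 0] and T[:, 1] are each sample's coordinates in the PC1-PC2 score plot.
print(T[:10, 0].mean(), T[10:, 0].mean())
```

The two groups fall on opposite sides of the origin along PC1, which is the kind of grouping one looks for in the plot. A single sample far from both clusters would stand out in the same way.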

By calculating the sum of the squares of the spectral residuals across all the wavelengths, an additional representative value can be generated for each spectrum. The spectral residual is effectively a measure of the amount of each spectrum left over in the secondary or noise vectors. This value is the basis of another type of discrimination method known as SIMCA (Refs. 13, 36). This is similar to performing an F test on the spectral residual to determine outliers in a training set (see Outlier Sample Detection in Chapter 4). In fact, one group combined the PCA-Mahalanobis distance method with SIMCA to provide a biparametric method of discriminant analysis (Ref. 41). In this method, both the Mahalanobis distance and the SIMCA test on the spectral residual had to pass in order for a sample to be classified as a match. [Pg.177]
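A minimal sketch of the two statistics combined in the biparametric method follows: the sum of squared spectral residuals left after projecting a spectrum onto k principal components, and the Mahalanobis distance of its PCA scores from the training-set centre. The sketch computes only the raw statistics. It does not perform the F test, and any pass/fail thresholds would be hypothetical. The number of components, the synthetic spectra and the class name are assumptions.

```python
import numpy as np

class PCAModel:
    """PCA model of a training set for spectral-residual and Mahalanobis checks."""

    def __init__(self, X, n_components):
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.P = Vt[:n_components].T                  # loadings: variables x k
        T = (X - self.mean) @ self.P                  # training-set scores
        cov = np.atleast_2d(np.cov(T, rowvar=False))
        self.cov_inv = np.linalg.inv(cov)

    def residual_ss(self, x):
        """Sum of squared spectral residuals after projection onto the k PCs."""
        xc = x - self.mean
        e = xc - (xc @ self.P) @ self.P.T
        return float(e @ e)

    def mahalanobis(self, x):
        """Mahalanobis distance of the sample's scores from the training centre."""
        t = (x - self.mean) @ self.P
        return float(np.sqrt(t @ self.cov_inv @ t))

# Hypothetical training spectra: two Gaussian bands, random concentrations
# between 0.5 and 1.5, and a little measurement noise.
wl = np.linspace(0.0, 1.0, 60)
K = np.vstack([np.exp(-((wl - c) / 0.08) ** 2) for c in (0.3, 0.6)])
rng = np.random.default_rng(3)
X = rng.uniform(0.5, 1.5, size=(30, 2)) @ K + rng.normal(0.0, 1e-4, size=(30, 60))
model = PCAModel(X, n_components=2)

typical = np.array([1.0, 1.0]) @ K
interfered = typical + 0.5 * np.exp(-((wl - 0.85) / 0.08) ** 2)  # unmodelled band
extreme = np.array([5.0, 5.0]) @ K                                # outside training range
for name, x in [("typical", typical), ("interfered", interfered), ("extreme", extreme)]:
    print(name, round(model.residual_ss(x), 5), round(model.mahalanobis(x), 2))
```

The two statistics catch different problems. The interfered spectrum has a large residual, and the extreme spectrum has a large Mahalanobis distance but almost no residual. This is why the biparametric method requires both to pass.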

The relationships among samples are revealed by their projections (scores) on the latent variables. This information is displayed in bivariate score plots, in which similar samples group together. Because the latent variable vectors w_a are orthonormal, the distance between samples represents a quantitative measure of relatedness. Standardized scores can be used. In most instances, however, scores weighted in accordance with the size (proportional to t_a) of the latent variables are preferred. These are the scores used in Equation 6.1. Score plots are very useful for visual detection of atypical samples, that is, outliers. [Pg.149]
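The claim about orthonormality can be checked numerically. When all components are retained, Euclidean distances between weighted scores equal the distances between the mean-centred samples themselves. Standardized scores rescale every component to unit variance and therefore distort those distances. The sketch takes "weighted" to mean T = U S from an SVD, which is an assumption about the source's convention (Equation 6.1 is not shown here). The data are random.

```python
import numpy as np

def pairwise_distances(A):
    """Euclidean distances between all pairs of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical data: 6 samples x 4 variables.
rng = np.random.default_rng(4)
X = rng.normal(size=(6, 4))
Xc = X - X.mean(axis=0)
U, _, Vt = np.linalg.svd(Xc, full_matrices=False)

T_weighted = Xc @ Vt.T                    # equals U * s: keeps the size of each component
T_standardized = U * np.sqrt(len(X) - 1)  # each component rescaled to unit variance

print(np.allclose(pairwise_distances(T_weighted), pairwise_distances(Xc)))      # True
print(np.allclose(pairwise_distances(T_standardized), pairwise_distances(Xc)))  # False
```

When only the first few components are kept, the weighted-score distances approximate, rather than equal, the original distances.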

Outlier Detection. To obtain a good calibration model, one has to remove outlying samples, that is, those that are extreme compared with the others in the calibration set. The outlying property may be due to interferences in the spectral data or to measurement error in the dependent variable. Interferences in the spectral data cause additional latent variables to be extracted, thus increasing the complexity and reducing the predictive ability of the calibration model. Such samples are outliers in the spectral data. On the other hand, some samples are well described by the calibration model, but their predicted values are far from the experimental values. These samples are outliers in the dependent variable. As mentioned above, score plots are very helpful in identifying outliers. [Pg.150]
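Outliers in the dependent variable can be exposed by leave-one-out prediction. Each sample is predicted by a model fitted without it, and samples whose prediction is far from the experimental value are flagged. The sketch uses ordinary least squares with an intercept as a stand-in for the calibration model. The tolerance in y units is hypothetical; in practice, it might be based on the known precision of the reference method.

```python
import numpy as np

def loo_residuals(X, y):
    """Leave-one-out residuals: y_i minus the prediction of a model fitted without sample i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xd = np.column_stack([np.ones(len(y)), X])  # add intercept column
    res = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b, *_ = np.linalg.lstsq(Xd[keep], y[keep], rcond=None)
        res[i] = y[i] - Xd[i] @ b
    return res

x = np.arange(10.0)
y = 2.0 * x + 1.0
y[5] += 5.0       # simulated error in one reference value
r = loo_residuals(x, y)
tolerance = 2.0   # hypothetical limit in y units
print(np.flatnonzero(np.abs(r) > tolerance))  # [5]
```

Leaving the sample out matters here. If it were included in the fit, the erroneous value would pull the model toward itself and shrink its own residual.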

Big Chemical Encyclopedia
