Outliers, chemometrics

Robust SIMCA-bounding influence of outliers. Chemometrics and Intelligent Laboratory Systems. Vol. 87, pp. 95-103. ISSN 0169-7439... [Pg.36]

Daszykowski M, Kaczmarek K, Stanimirova I, Vander Heyden Y, Walczak B. Robust SIMCA-bounding influence of outliers. Chemometr Intell Lab Syst 2007;87:121-9. [Pg.354]

As explained in Section 33.2.1, one can prefer to consider each class separately and to perform outlier tests to decide whether a new object belongs to a certain class or not. The earliest approaches, introduced in chemometrics, were called SIMCA (soft independent modelling of class analogy) [27] and UNEQ [28]. [Pg.228]

A. Singh, Outliers and robust procedures in some chemometric applications. Chemom. Intell. Lab. Syst., 33 (1996) 75-100. [Pg.380]

Penrose R (1955) A generalized inverse for matrices. Proc Cambridge Phil Soc 51:406
Rousseeuw PJ, Leroy AM (1987) Robust regression and outlier detection. Wiley, New York
Sachs L (1992) Angewandte Statistik. Springer, Berlin Heidelberg New York
Sharaf MA, Illman DL, Kowalski BR (1986) Chemometrics. Wiley, New York... [Pg.200]

The t value is the number of standard deviations by which the single value differs from the mean value. This t value is then compared to the critical t value obtained from a t-table, given a desired statistical confidence (i.e., 90%, 95%, or 99% confidence) and the number of degrees of freedom (typically N-1), to assess whether the value is statistically different from the other values in the series. In chemometrics, the t test can be useful for evaluating outliers in data sets. [Pg.358]
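The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the measurement values are hypothetical, the mean and standard deviation are assumed to be computed from the remaining values in the series (so the degrees of freedom are the number of those values minus one), and the critical value is simply copied from a standard two-sided t-table.

```python
import statistics

def t_value(suspect, others):
    """Number of standard deviations between `suspect` and the mean of `others`."""
    mean = statistics.mean(others)
    sd = statistics.stdev(others)  # sample standard deviation (n - 1 denominator)
    return abs(suspect - mean) / sd

# Hypothetical replicate measurements; 12.5 is the suspect value.
others = [10.1, 10.2, 9.9, 10.0]
suspect = 12.5

t = t_value(suspect, others)
degrees_of_freedom = len(others) - 1

# Two-sided critical t for 95% confidence and 3 degrees of freedom,
# taken from a standard t-table (assumption: 95% is the desired confidence).
t_critical = 3.182

is_outlier = t > t_critical
print(f"t = {t:.2f}, df = {degrees_of_freedom}, outlier: {is_outlier}")
```

A different confidence level or series length only changes the tabulated critical value; the computation of t itself stays the same.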

Outliers demand special attention in chemometrics for several different reasons. During model development, their extremeness often gives them an unduly high influence in the calculation of the calibration model. Therefore, if they represent erroneous readings, they will add disproportionately more error to the calibration model. Furthermore, even if they carry useful information, it might be determined that this specific information is irrelevant to the problem. Outliers are also very important during model deployment, because they can be informative indicators of specific failures or abnormalities in the process being sampled, or in the measurement system itself. This use of outlier detection is discussed in the Model Deployment section (12.10), later in this chapter. [Pg.413]

Once a chemometric model is built, and it is used to produce concentration or property values in real time from on-line analyzer profiles, the detection of outliers is a particularly critical task. This is the case for two reasons ... [Pg.283]

As a result, it is very important to evaluate process samples in real time for their appropriateness of use with the empirical model. Historically, this task has often been overlooked. This is very unfortunate, not only because it is relatively easy to do, but also because it can effectively prevent the misuse of quantitative results obtained from a multivariate model. I would go so far as to say that it is irresponsible to implement a chemometric model without prediction outlier detection. [Pg.283]
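One common way to screen incoming samples is to check how well a new profile is reconstructed by a principal component model of the calibration data. The sketch below is only an illustration under stated assumptions: the excerpt does not name a specific test, the profiles are synthetic, a single component is extracted by power iteration, and the limit (three times the largest calibration residual) is a hypothetical placeholder rather than a statistically derived threshold.

```python
import math

def mean_center(rows):
    """Return the column means and the mean-centered rows."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    return mean, [[v - m for v, m in zip(row, mean)] for row in rows]

def first_loading(centered, iterations=200):
    """Approximate the first PCA loading by power iteration on X^T X."""
    w = [1.0] * len(centered[0])
    for _ in range(iterations):
        scores = [sum(x * wi for x, wi in zip(row, w)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(len(w))]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w

def q_residual(sample, mean, loading):
    """Squared distance between a sample and its one-component reconstruction."""
    x = [v - m for v, m in zip(sample, mean)]
    score = sum(a * b for a, b in zip(x, loading))
    return sum((a - score * b) ** 2 for a, b in zip(x, loading))

# Synthetic calibration profiles: a scaled base shape plus small fixed noise.
base = [1.0, 2.0, 3.0, 2.0, 1.0]
scales = [0.8, 0.9, 1.0, 1.1, 1.2]
calibration = [
    [c * b + 0.01 * (((i + j) % 3) - 1) for j, b in enumerate(base)]
    for i, c in enumerate(scales)
]

mean, centered = mean_center(calibration)
loading = first_loading(centered)

# Hypothetical limit: three times the largest residual seen in calibration.
q_limit = 3.0 * max(q_residual(row, mean, loading) for row in calibration)

good_sample = [1.05 * b for b in base]   # same shape as the calibration data
bad_sample = [1.0, 2.0, 3.0, 2.0, 3.0]   # spike in the last channel

q_good = q_residual(good_sample, mean, loading)
q_bad = q_residual(bad_sample, mean, loading)
print("good sample flagged:", q_good > q_limit)
print("bad sample flagged:", q_bad > q_limit)
```

Because the check uses only the new profile and the stored model (mean and loading), it can run alongside every prediction at negligible cost.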

Another type of classification is outlier selection or contamination identification. As an example, in Fig. 4.23(b), the butter is the desired material and bacteria the contamination. An arbitrary threshold for this image would be 0.02, in which all pixels >0.02 are considered suspect, and hopefully, because this is a food product, decontamination procedures are pursued. In these two examples of classification, only arbitrary thresholds have been defined and, as such, confidence in these classifications is lacking. This confidence can be achieved through statistical methods. Although this chapter is not the appropriate place for an involved discussion of application of statistics toward data analysis, we will give one example often used in chemometric classification. [Pg.108]
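As a minimal sketch of the arbitrary-threshold step described above, the following Python snippet flags every pixel above 0.02. The small image array is hypothetical, and the snippet shows only the thresholding itself, not the statistical treatment the passage goes on to recommend.

```python
# Threshold taken from the example in the text.
THRESHOLD = 0.02

# Hypothetical 3 x 3 image of contamination scores (one value per pixel).
image = [
    [0.001, 0.004, 0.003],
    [0.002, 0.031, 0.027],
    [0.005, 0.002, 0.045],
]

# Collect the (row, column) positions of all pixels above the threshold.
suspect_pixels = [
    (r, c)
    for r, row in enumerate(image)
    for c, value in enumerate(row)
    if value > THRESHOLD
]

total_pixels = sum(len(row) for row in image)
suspect_fraction = len(suspect_pixels) / total_pixels
print(suspect_pixels, f"{suspect_fraction:.0%} of pixels suspect")
```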

The remaining chapters of the book introduce some of the advanced topics of chemometrics. The coverage is fairly comprehensive, in that these chapters cover some of the most important advanced topics. Chapter 6 presents the concept of robust multivariate methods. Robust methods are insensitive to the presence of outliers. Most of the methods described in Chapter 6 can tolerate data sets contaminated with up to 50% outliers without detrimental effects. Descriptions of algorithms and examples are provided for robust estimators of the multivariate normal distribution, robust PCA, and robust multivariate calibration, including robust PLS. As such, Chapter 6 provides an excellent follow-up to Chapters 3, 4, and 5. [Pg.4]
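The insensitivity of robust estimators can be illustrated in one dimension with the median and the median absolute deviation (MAD). This is a simplified univariate sketch, not one of the multivariate methods from the chapter: the data are hypothetical, with 40% of the values contaminated, and both the consistency factor 1.4826 and the cutoff of 3.5 are conventional choices assumed here.

```python
import statistics

# Hypothetical data: six regular values near 10 and four contaminating values.
data = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 50.0, 55.0, 60.0, 52.0]

# Classical estimates are pulled toward the contamination.
classical_center = statistics.mean(data)
classical_spread = statistics.stdev(data)

# Robust estimates stay with the main body of the data.
robust_center = statistics.median(data)
mad = statistics.median([abs(v - robust_center) for v in data])
robust_spread = 1.4826 * mad  # scales the MAD to match the SD for normal data

# Flag values whose robust z-score exceeds the assumed cutoff of 3.5.
CUTOFF = 3.5
flagged = [v for v in data if abs(v - robust_center) / robust_spread > CUTOFF]

print(f"mean = {classical_center:.2f}, median = {robust_center:.2f}")
print("flagged:", flagged)
```

With the classical mean and standard deviation, the same four values would sit within about 1.5 standard deviations of the center and would not be flagged, which is the masking effect that robust estimators are meant to avoid.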

Classical PCA is non-robust and sensitive to deviations of the error distribution from the normal assumption, the PC directions being influenced by the presence of outlier(s). In PP PCA, the PC directions are determined by the inherent structure of the main body of the data. By using a robust projection index, the influence of the outliers is substantially reduced. The distorted appearance or misrepresentation of the projected data structure in the PC subspace caused by the presence of outlier(s) can be eliminated in PP PCA. This characteristic feature of PP PCA is essential for obtaining reliable results for exploratory data analysis, calibration and resolution in analytical chemometrics where PCA is used for dimension reduction. [Pg.71]
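The difference between a classical and a robust projection index can be shown with a toy two-dimensional example. The sketch below is an assumption-laden illustration rather than the PP PCA algorithm from the source: it searches candidate directions on a one-degree grid, uses the variance as the classical index and the median absolute deviation (MAD) as a stand-in robust index, and works on hypothetical data with a single gross outlier.

```python
import math
import statistics

def project(points, angle_deg):
    """Project 2-D points onto the unit vector at the given angle."""
    a = math.radians(angle_deg)
    return [x * math.cos(a) + y * math.sin(a) for x, y in points]

def mad(values):
    """Median absolute deviation, used here as a robust projection index."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def best_direction(points, index):
    """Grid search for the angle (in whole degrees) that maximizes the index."""
    return max(range(180), key=lambda angle: index(project(points, angle)))

# Hypothetical data: the main body lies along the x-axis,
# and a single outlier sits far away along the y-axis.
points = [(i - 5.0, 0.1 * ((i % 3) - 1)) for i in range(11)]
points.append((0.0, 30.0))

classical_angle = best_direction(points, statistics.variance)
robust_angle = best_direction(points, mad)

print("direction maximizing variance:", classical_angle, "degrees")
print("direction maximizing MAD:", robust_angle, "degrees")
```

In this example the variance-based direction points at the outlier, while the MAD-based direction stays close to the axis of the main body of the data.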

Fig. 5. An example of a scores plot as one might obtain in a principal components analysis. Distinct clustering or grouping of NMR spectra is observed in this type of plot, where the discrimination results from the analyzed metric used (e.g., principal components). The distance between samples (r) within groups is used by many supervised methods to further describe and improve class or group separation. There are different chemometric techniques that can be used to identify outliers, or to provide a group assignment.

Once the data are prepared, they can be explored chemometrically with techniques such as PCA, rPCA, PP, and clustering. These enable visualization of the structure of the data set; more specifically, they detect outliers and group similar samples. For several applications, it was confirmed that this approach outperforms the visual comparison of electropherograms. Chemometric techniques can also be applied to classify samples based on their CE profile. When the classes in the data set are known a priori, supervised classification techniques such as LDA, QDA, kNN, CART, PLSDA, SIMCA, and SVM can be used. The choice of technique will often depend on the preference of the analyst and the complexity of the data. However, when nonlinear classification problems occur, a more complex technique, for instance SVM, will be outper-... [Pg.318]

The search for a linear correlation between log k and solute descriptors (Eq. 15.3) allows one to establish in a qualitative and quantitative manner which intermolecular forces govern the phenomenon under investigation. Building QSRR thus demands the use of refined chemometric tools for variable selection, criteria to detect and eliminate outliers, and, ultimately, data validation procedures. [Pg.347]

Riu J, Bro R, Jack-knife for estimation of standard errors and outlier detection in PARAFAC models, Chemometrics and Intelligent Laboratory Systems, 2002, 65, 35-49. [Pg.364]

R. Leardi, Application of a Genetic Algorithm to Feature Selection Under Full Validation Conditions and to Outlier Detection, Journal of Chemometrics, 8 (1994) 65-79. [Pg.349]

Mertens, B.; Thompson, M.; Fearn, T. (1994). Principal component outlier detection and SIMCA: a synthesis. Analyst. Vol. 119, pp. 2777-2784. ISSN 0003-2654
Miller, J.N.; Miller, J.C. (2005). Statistics and Chemometrics for Analytical Chemistry. 4th edition. Prentice-Hall, Pearson. ISBN 0131291920. Harlow, UK
Naes, T.; Isaksson, T.; Fearn, T.; Davies, T. (2004). A user-friendly guide to multivariate calibration and classification. NIR Publications, ISBN 0952866625, Chichester, UK
Pardo, M.; Sberveglieri, G. (2005). Classification of electronic nose data with support vector machines. Sensors and Actuators. Vol. 107, pp. 730-737. ISSN 0925-4005
Pretsch, E.; Wilkins, C.L. (2006). Use and abuse of Chemometrics. Trends in Analytical Chemistry. Vol. 25, p. 1045. ISSN 0165-9936... [Pg.38]

Exploratory data analysis aims to extract important information, detect outliers, and identify relationships between samples, and its use is recommended prior to the application of other chemometric techniques. Examples of the use of exploratory data analysis tools applied to separations data include principal component analysis (PCA) (de la Mata-Espinosa et al., 2011a; Ruiz-Samblas et al., 2011) and factor analysis (Stanimirova et al., 2011). [Pg.319]

J.-H. Wang, J.-H. Jiang, and R.-Q. Yu, Chemom. Intell. Lab. Syst., 34, 109 (1996). Robust Backpropagation Algorithm as a Chemometric Tool to Prevent Overfitting to Outliers. [Pg.138]

Macho, S., et al., Outlier Detection in the Ethylene Content Determination in Propylene Copolymer by Near-Infrared Spectroscopy and Multivariate Calibration. Appl. Spectrosc., 2001, 55, 1532-1536.
Furukawa, T., et al., Discrimination of Various Poly(Propylene) Copolymers and Prediction of Their Ethylene Content by Near-Infrared and Raman Spectroscopy in Combination with Chemometric... [Pg.564]

Pattern recognition aims to unravel patterns in the data. Although patterns are perceived automatically, the process is difficult to define. A pattern is a natural or chance configuration, a reliable sample of traits, tendencies, or other observable characteristics of data. In chemometrics, patterns are usually simplified to groupings (clusters) and outliers. [Pg.143]

PLSR is an extension of the multiple linear regression model. It is probably the least restrictive of the various multivariate extensions of the multiple linear regression model. This flexibility allows it to be used in situations where the use of traditional multivariate methods is severely limited, such as when there are fewer observations than predictor variables. Furthermore, PLSR can be used as an exploratory analysis tool to select suitable predictor variables and to identify outliers before classical linear regression. Especially in chemometrics, PLSR has become a standard tool for modeling linear relationships between multivariate measurements. [Pg.194]
