Outlier diagnostics

R.D. Guenard, C.M. Wehlberg, R.J. Pell and D.M. Haaland, Importance of prediction outlier diagnostics in determining a successful inter-vendor multivariate calibration model transfer, Appl. Spectrosc., 61, 747 (2007). [Pg.436]

The classical KNN approach does not have outlier detection capabilities. That is, a classification is always made, whether or not the unknown is a member of any of the classes in the training set. In Section 4.3-1, the method presented includes outlier diagnostics which are generally not present in commercial statistical software. [Pg.95]
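The behaviour described above can be illustrated with a minimal sketch. The function `knn_classify`, the toy data, and in particular the `max_dist` rejection threshold are hypothetical; the excerpt does not say how the method of Section 4.3-1 implements its outlier diagnostics, so the distance check here is only one simple possibility.

```python
import math
from collections import Counter

def knn_classify(train, labels, x, k=3, max_dist=None):
    """Majority vote among the k nearest training points.

    max_dist is a hypothetical rejection threshold (not from the source):
    if the k-th nearest neighbour is farther away than max_dist, the
    unknown is flagged by returning None instead of a class label.
    """
    dists = sorted((math.dist(p, x), lab) for p, lab in zip(train, labels))
    nearest = dists[:k]
    if max_dist is not None and nearest[-1][0] > max_dist:
        return None
    votes = Counter(lab for _, lab in nearest)
    return votes.most_common(1)[0][0]

# Two tight classes and an unknown that belongs to neither of them.
train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = ["A", "A", "A", "B", "B", "B"]
far_point = (100.0, 100.0)

plain = knn_classify(train, labels, far_point)                  # a class is always assigned
guarded = knn_classify(train, labels, far_point, max_dist=1.0)  # flagged as None
```

Without the threshold the distant unknown is still assigned to a class, which is the limitation the excerpt points out.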

FIGURE 10.6 Outlier diagnostics for five-factor PLS model for calibration of (a) moisture, (b) oil, (c) protein, and (d) starch. [Pg.220]

Outlier identification is best done with a diagnostic plot based on robust PCA (Section 3.7.3); classical PCA indicates only extreme outliers. [Pg.81]

Diagnostics can and should be done with robustly estimated PCs (Section 3.7.3). The reason is that both score and orthogonal distance are aimed at measuring outlyingness within and from the PCA space, but outliers themselves could spoil PCA if the PCs are not estimated in a robust way. [Pg.82]

For diagnostics it will be interesting to compute score distance and orthogonal distance for each object and to plot them together with critical boundaries; this will allow distinguishing regular observations from outliers. The score distance SD_i of object i is computed by... [Pg.92]
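As a rough illustration of the two distances, the sketch below uses a common definition: the score distance scales each score by its standard deviation, and the orthogonal distance is the norm of the residual after projection onto the PCA space. The excerpt's own equation is truncated, so this definition is an assumption; the function name `pca_distances` and the synthetic data are hypothetical, and classical PCA via SVD is used only for brevity, whereas the text recommends robust PCA.

```python
import numpy as np

def pca_distances(X, a):
    """Score and orthogonal distances for a PCA model with a components.

    Classical (non-robust) PCA via SVD is an assumption made for brevity.
    """
    Xc = X - X.mean(axis=0)                     # mean-centre the columns
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:a].T                                # loadings (m x a)
    T = Xc @ P                                  # scores (n x a)
    v = s[:a] ** 2 / (X.shape[0] - 1)           # variance of each score
    sd = np.sqrt((T ** 2 / v).sum(axis=1))      # distance within the PCA space
    od = np.linalg.norm(Xc - T @ P.T, axis=1)   # distance from the PCA space
    return sd, od

# Synthetic data lying close to a line, plus one object placed off that line.
rng = np.random.default_rng(0)
t = rng.normal(size=(50, 1))
X = np.hstack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(50, 3))
X[0] = [0.0, 0.0, 3.0]

sd, od = pca_distances(X, a=1)
most_orthogonal = int(od.argmax())              # index of the largest orthogonal distance
```

Plotting `sd` against `od` with critical boundaries gives the kind of diagnostic plot the excerpt describes.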

In the literature the above diagnostic measures are known under different names. Instead of the score distance from Equation 3.27, which measures the deviation of each observation within the PCA space, often the Hotelling T²-test is considered. Using this test a confidence boundary can be constructed and objects falling outside this boundary can be considered as outliers in the PCA space. It can be shown that this concept is analogous to the concept of the score distance. Moreover, the score distances are in fact Mahalanobis distances within the PCA space. This is easily... [Pg.94]
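The equivalence between score distances and Mahalanobis distances in the score space can be checked numerically. The sketch below assumes classical PCA, a common definition of the score distance (scores scaled by their standard deviations), synthetic data, and an arbitrary choice of two components; the per-object T² value is taken as the squared Mahalanobis distance, which is the usual convention but is not spelled out in the excerpt.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5)) @ rng.normal(size=(5, 5))   # correlated synthetic data
Xc = X - X.mean(axis=0)

a = 2                                                    # number of PCs (arbitrary)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = Xc @ Vt[:a].T                                        # scores, centred at zero
v = s[:a] ** 2 / (len(X) - 1)                            # score variances

# Score distance: each score divided by its standard deviation.
sd = np.sqrt((T ** 2 / v).sum(axis=1))

# Mahalanobis distance of each score vector from the origin of the score space.
S_inv = np.linalg.inv(np.cov(T, rowvar=False))
md = np.sqrt(np.einsum("ij,jk,ik->i", T, S_inv, T))

t2 = md ** 2                                             # per-object T² value
same = bool(np.allclose(sd, md))
```

Because PCA scores are uncorrelated, their covariance matrix is diagonal, and the two computations agree up to rounding.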

Note that the approximation by the chi-square distribution is only valid for multivariate normally distributed data, which is somewhat at odds with the presence of the outliers that this measure is meant to identify. We recommend using robust PCA whenever diagnostics are done, because robust methods tolerate deviations from the multivariate normal distribution. [Pg.95]
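A chi-square based boundary for the score distance might be sketched as follows. It is assumed here that the squared score distance is compared with a chi-square quantile whose degrees of freedom equal the number of PCA components; the 0.975 level, the function names, and the example distances are hypothetical. The Wilson-Hilferty approximation stands in for an exact quantile function so that only the standard library is needed.

```python
import math
from statistics import NormalDist

def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3

def score_distance_cutoff(a, p=0.975):
    """Approximate cutoff for the score distance of a model with a PCs.

    The probability level p is an assumed choice, not taken from the source.
    """
    return math.sqrt(chi2_quantile_wh(p, a))

cutoff = score_distance_cutoff(a=2)

# Hypothetical score distances of three objects; True marks a flagged object.
flags = [sd > cutoff for sd in (0.8, 1.9, 3.4)]
```

The boundary is only as trustworthy as the normality assumption behind it, which is the caveat raised in the excerpt.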

Outliers may heavily influence the result of PCA. Diagnostic plots help to find outliers (leverage points and orthogonal outliers) falling outside the hyper-ellipsoid which defines the PCA model. It is essential to use robust methods that tolerate deviations from multivariate normal distributions. [Pg.114]

The remaining diagnostic plots shown in Figure 4.17 are the QQ-plot for checking the assumption of normal distribution of the residuals (upper right), the values of the y-variable (response) versus the fitted y values (lower left), and the residuals versus the fitted y values (lower right). The symbols + for outliers were used for the same objects as in the upper left plot. Thus it can be seen that the... [Pg.148]

Predicted vs. Known Concentration Plot (Model and Sample Diagnostic) The predicted versus known concentration plot shown in Figure 5.114 displays little variability off the ideal line. No unusual patterns are observed nor are any outliers indicated. [Pg.163]

Cook's Distance Plot (Model Diagnostic) A statistic known as Cook's distance can be used to detect calibration data outliers by identifying which samples are most influential on the model. Now that the selected variables have been finalized, it is good practice to examine the calibration data for influential samples. These samples should be investigated and removed if it is determined that they have an unusual effect on the model. [Pg.313]
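For orientation, Cook's distance can be sketched for an ordinary least-squares model using the standard textbook formula. The use of plain least squares with an intercept, the function name `cooks_distance`, and the synthetic data are assumptions; the excerpt does not give the formula or the model details.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each sample of a least-squares fit with intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])   # add the intercept column
    H = X1 @ np.linalg.pinv(X1)                  # hat matrix
    h = np.diag(H)                               # leverages
    e = y - H @ y                                # residuals
    n, p = X1.shape
    s2 = e @ e / (n - p)                         # residual variance
    return e ** 2 / (p * s2) * h / (1.0 - h) ** 2

# Ten samples close to a line, plus one sample far from it in both x and y.
rng = np.random.default_rng(2)
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=10)
x = np.append(x, 20.0)
y = np.append(y, 10.0)

d = cooks_distance(x, y)
most_influential = int(d.argmax())               # index of the most influential sample
```

A sample combining high leverage with a large residual dominates the statistic, which is why it is a candidate for investigation.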

Statistical Prediction Error vs. Sample Number Plot (Sample Diagnostic) The statistical prediction errors for the validation data are shown in Figure 5-84. There are no samples which have an error that is unusual relative to the rest of the validation data. This further confirms the earlier conclusion that there are no outlier samples. The maximum of 0.029 will be used for assessing the reliability of prediction in Habit 6. [Pg.321]

Summary of Prediction Diagnostic Tools for PLS/PCR, Example 1 The predicted concentrations of component A in 17 of the 20 samples were deemed reliable by the prediction diagnostics. Therefore, these predictions are expected to be within 0.25 of the true value. Four samples were identified as unusual. The predicted concentration of component A in sample 13 was accepted despite being outside the range of the calibration because of the acceptable value. If predictions are consistently outside of the calibration range, it is prudent to consider expanding the range of the model. The predictions of the other three outlier samples were not considered to be reliable. [Pg.340]

Figure 4.24 Studentised residuals versus leverage plot to detect outlier samples in the calibration set: (a) general rules displayed on a hypothetical case and (b) diagnostics for the data from the worked example (Figure 4.9, mean centred, four factors in the PLS model).
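The two quantities on such a plot can be sketched for a simple least-squares fit. The figure itself refers to a PLS model, so the ordinary regression used here is a simplification; the function name, the synthetic data, and the cutoffs (3p/n for leverage, 2.5 for internally studentised residuals) are assumed rules of thumb, not values taken from the source.

```python
import numpy as np

def studentised_residuals_and_leverage(X, y):
    """Internally studentised residuals and leverages of a least-squares fit."""
    X1 = np.column_stack([np.ones(len(X)), X])   # add the intercept column
    H = X1 @ np.linalg.pinv(X1)                  # hat matrix
    h = np.diag(H)                               # leverages
    e = y - H @ y                                # residuals
    n, p = X1.shape
    s = np.sqrt(e @ e / (n - p))                 # residual standard deviation
    return e / (s * np.sqrt(1.0 - h)), h

x = np.arange(12, dtype=float)
x[11] = 30.0                  # far from the other x values: high leverage
y = 3.0 + 0.5 * x             # all samples on the line ...
y[5] += 4.0                   # ... except sample 5, which gets a large residual

r, h = studentised_residuals_and_leverage(x, y)
n, p = len(x), 2
high_leverage = np.where(h > 3 * p / n)[0].tolist()      # assumed cutoff
large_residual = np.where(np.abs(r) > 2.5)[0].tolist()   # assumed cutoff
```

Plotting `r` against `h` with both cutoffs drawn separates samples with a poor fit from samples with an unusual position in the predictor space.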

The use of QSARs for the final evaluation of the test data further decreases the risk for false positives and false negatives. This is because all data are analysed together, and thereby the random variation of individual test results is averaged out. Moreover, outliers and erroneous test results are identified by model diagnostics, which further stabilises the results. [Pg.214]
