Big Chemical Encyclopedia


Outlier elimination

Because outlier elimination is not to be taken lightly, the authors decided not to provide on-line outlier deletion options in the programs. Instead, the user must first decide which points they regard as outliers (for example, by using program HUBER), then start program DATA and use the (Edit) or (Delete Row) options, and finally create a modified data file with the (Save) option. This approach was chosen to reinforce GMP-like procedures and documentation. [Pg.61]

Whereas application of LMS regression (Eq. 9-5) results in considerable improvement of the rank correlation coefficients, the correlation coefficients obtained by RLS regression (with outlier elimination Eq. 9-6) are again markedly lower. [Pg.343]

Attempts to correlate analytical performance with other seemingly indicative laboratory characteristics, such as participation in proficiency testing schemes, regular use of certified RMs, number of years of experience and number of samples analysed per year were all equally unsuccessful. Therefore, in the absence of any simple and obvious means of identifying and preselecting only reliable laboratories as participants in certification studies, an investigation was undertaken of the validity of adopting the consensus mean (after outlier elimination) from an interlaboratory study as a certified value. [Pg.179]

QC validation by non-spectroscopists. However, this requires careful tailoring of models to articulate process and chemical information as well as close screening of training sets to ensure outlier elimination. This form of validation without quantitation is directly applied to on-line data [1],... [Pg.642]

Since the data sets in industrial technical records usually have a higher noise-to-signal ratio, elimination of outliers is sometimes necessary. In data processing, the definition of an outlier is a confused concept: some authors define all sample points deviating from a linear relation as outliers, which is of course not suitable for processing nonlinear data sets. A more reasonable method of outlier elimination for complicated data sets is based on the KNN method: if the class of a sample point differs from the class predicted by its nearest neighbors, it is considered an outlier. [Pg.277]
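The KNN rule just described can be sketched in a few lines of Python (a minimal illustration, not the authors' actual implementation): a sample is flagged when its own class differs from the majority class among its k nearest neighbors.

```python
import math
from collections import Counter

def knn_outliers(points, labels, k=3):
    """Flag samples whose class disagrees with the majority class
    of their k nearest neighbors (the KNN-based rule above).
    points: list of coordinate tuples; labels: parallel class list."""
    outliers = []
    for i, p in enumerate(points):
        # distances from sample i to every other sample
        dists = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points) if j != i
        )
        neighbor_classes = [lab for _, lab in dists[:k]]
        majority = Counter(neighbor_classes).most_common(1)[0][0]
        if majority != labels[i]:
            outliers.append(i)
    return outliers
```

For example, a point sitting inside cluster A but labelled B would be flagged, while consistently labelled points are kept.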

Another, more reliable method of outlier elimination is based on SVM: if a sample point is misclassified in the LOO (leave-one-out) cross-validation test with several kinds of kernel functions, it can be eliminated to improve the classification. Figure 14.2 shows an example of outlier elimination by this method. In this example, a data file on the recovery of propylene in a petrochemical factory is used for the optimization of propylene production. The classification of the two classes of samples becomes clear-cut after elimination of the sample points misclassified in the LOO cross-validation test with several kinds of kernel functions. [Pg.277]

Fig. 14.2 Result of outlier elimination by support vector classification.
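The LOO screening procedure can be sketched as follows. As a simplification, a nearest-centroid classifier stands in for the SVM kernel machines used in the original study, since only the leave-one-out logic is being illustrated here.

```python
import math

def nearest_centroid_predict(train_pts, train_labels, x):
    """Predict the class whose training centroid is closest to x
    (a simple stand-in for the SVM classifiers described above)."""
    groups = {}
    for p, lab in zip(train_pts, train_labels):
        groups.setdefault(lab, []).append(p)
    best_lab, best_d = None, float("inf")
    for lab, pts in groups.items():
        centroid = tuple(sum(v) / len(pts) for v in zip(*pts))
        d = math.dist(centroid, x)
        if d < best_d:
            best_lab, best_d = lab, d
    return best_lab

def loo_outliers(points, labels):
    """Leave-one-out screening: a sample misclassified when held
    out of the training set is flagged for elimination."""
    flagged = []
    for i in range(len(points)):
        train_p = points[:i] + points[i + 1:]
        train_l = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_p, train_l, points[i]) != labels[i]:
            flagged.append(i)
    return flagged
```

In practice one would repeat the screening with several classifiers (several kernel functions, in the SVM case) and eliminate only samples misclassified by all of them.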
Each of the three required individual values for each nuclide was corrected with the factor that resulted from the weights of labelled and unlabelled spinach powder. The arithmetic mean and standard deviation for each laboratory were calculated. The data were visually inspected for outlier elimination. In addition, outliers were identified using Mandel's within- and between-laboratory consistency tests (statistics k and h values, respectively) and the Grubbs I and II tests according to DIN ISO 5725-2. The outlier-free data sets were used to calculate repeatability and reproducibility. Individual z-scores were used as a measure of the performance of the participating laboratories. [Pg.164]
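A minimal single-outlier Grubbs test can be sketched as follows. For brevity the critical value is hardcoded from the published table for n = 10 and α = 0.05 (two-sided); a full implementation would derive it from the t-distribution for any n.

```python
import statistics

def grubbs_statistic(data):
    """Two-sided single-outlier Grubbs statistic:
    G = max |x_i - mean| / s."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return max(abs(x - m) for x in data) / s

# Critical value for n = 10, alpha = 0.05 (two-sided), taken from
# standard Grubbs tables (illustrative shortcut; normally computed
# from the t-distribution).
G_CRIT_N10 = 2.290

data = [9.9, 10.1, 10.0, 9.8, 10.2, 10.0, 9.9, 10.1, 10.0, 12.5]
G = grubbs_statistic(data)
if G > G_CRIT_N10:
    m = statistics.mean(data)
    suspect = max(data, key=lambda x: abs(x - m))
    print(f"G = {G:.2f} > {G_CRIT_N10}: eliminate {suspect}")
```

Here the value 12.5 gives G ≈ 2.82, exceeding the critical value, so it would be eliminated before calculating repeatability and reproducibility.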

Ideally, the results should be validated somehow. One of the best methods for doing this is to make predictions for compounds known to be active that were not included in the training set. It is also desirable to eliminate compounds that are statistical outliers in the training set. Unfortunately, some studies, such as drug activity prediction, may not have enough known active compounds to make this step feasible. In this case, the estimated error in prediction should be increased accordingly. [Pg.248]

Since the 1993 court decision against Barr Laboratories, the elimination of outliers has taken on a decidedly legal aspect in the U.S. (any non-U.S. company that wishes to export pharmaceuticals or precursor products to the U.S. market must also adhere to this decision concerning out-of-specification results); the relevant section states: "An alternative means to invalidate an individual OOS result... is the (outlier test). The court placed specific restrictions on the use of this test. (1) Firms cannot frequently reject results on this basis, (2) The USP standards govern its use in specific areas, (3) The test cannot be used for chemical testing results." A footnote explicitly refers only to a content uniformity test, but it appears that the rule must be interpreted similarly for all other inherently precise physicochemical methods. For a possible interpretation, see Section 4.24. [Pg.61]

Example 53: If the standard deviation before elimination of the purported outlier is not much higher than the upper confidence limit of the method, as in the case s_x = 0.358 < CL_u(0.3) = 0.57 (factor CL_u/s_x = 1.9 for n = 9; see program MSD), an outlier test should not even be considered. Both to avoid fruitless discussions and to reduce the risk of chance decisions, the hurdle should be set even higher, say at p < 0.01, so that CL_u/s_x > 2.5. [Pg.243]

Table 4.34. Content Uniformity of Dosage Form (Results After Elimination of Three Outliers Are in Italics)...
Plot analogous critical values for the mean/SD before and after elimination of points (dotted lines). Since the standard deviation decreases on elimination of suspected outliers, the dotted sensitivity curve for "after elimination" will be higher than the one for "before". Huber's k changes too, but to a lesser degree. (See Fig. 1.1.)... [Pg.373]

Outliers are, in principle, random errors. However, they have to be eliminated because their disproportionately large deviation would otherwise misrepresent the mean. [Pg.92]

The Mahalanobis Distance statistic provides a useful indication of the first type of extrapolation. For the calibration set, one sample will have a maximum Mahalanobis Distance, D_max. This is the most extreme sample in the calibration set, in that it is the farthest from the center of the space defined by the spectral variables. If the Mahalanobis Distance for an unknown sample is greater than D_max, then the estimate for the sample clearly represents an extrapolation of the model. Provided that outliers have been eliminated during the calibration, the distribution of Mahalanobis Distances should be representative of the calibration model, and D_max can be used as an indication of extrapolation. [Pg.499]
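The extrapolation check can be sketched as follows, restricted here to two variables so the covariance inverse can be written out explicitly; real spectral models apply the same test in a higher-dimensional score space, and the data below are invented for illustration.

```python
def mahalanobis2(x, mean, cov):
    """Squared Mahalanobis distance for the two-variable case,
    using the explicit 2x2 covariance inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    # dx^T * inv(cov) * dx
    return (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))

def fit_stats(samples):
    """Sample mean and covariance (n-1 divisor) of 2-D samples."""
    n = len(samples)
    mean = tuple(sum(v) / n for v in zip(*samples))
    def cov_elem(i, j):
        return sum((s[i] - mean[i]) * (s[j] - mean[j])
                   for s in samples) / (n - 1)
    cov = ((cov_elem(0, 0), cov_elem(0, 1)),
           (cov_elem(1, 0), cov_elem(1, 1)))
    return mean, cov

# Hypothetical calibration set; an unknown beyond d_max is flagged
# as an extrapolation of the model.
calibration = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.9), (1.1, 2.2), (1.0, 2.05)]
mean, cov = fit_stats(calibration)
d_max = max(mahalanobis2(s, mean, cov) for s in calibration)
unknown = (3.0, 1.0)
is_extrapolation = mahalanobis2(unknown, mean, cov) > d_max
```

The comparison against the calibration-set maximum is deliberately simple; some practitioners instead compare against a chi-squared quantile of the fitted distance distribution.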

The covariance matrix of measurement errors is a very useful statistical property. Indirect methods can deal with unsteady sampling data, but unfortunately they are very sensitive to outliers and the presence of one or two outliers can cause misleading results. This drawback can be eliminated by using robust approaches via M-estimators. The performance of the robust covariance estimator is better than that of the indirect methods when outliers are present in the data set. [Pg.214]
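An M-estimator of location along these lines can be sketched as a Huber-type iteratively reweighted average; the robust covariance estimators cited extend the same reweighting idea to matrices. The tuning constant c = 1.345 is the conventional choice for 95% efficiency under normality.

```python
import statistics

def huber_mean(data, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted
    averaging: points beyond c scaled deviations are down-weighted
    so one or two outliers cannot drag the estimate."""
    mu = statistics.median(data)
    # robust scale: MAD rescaled for consistency at the normal
    mad = statistics.median(abs(x - mu) for x in data) / 0.6745
    for _ in range(max_iter):
        weights = [min(1.0, c * mad / abs(x - mu)) if x != mu else 1.0
                   for x in data]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            break
        mu = new_mu
    return mu
```

For a sample like [10.0, 10.2, 9.9, 10.1, 10.0, 50.0] the ordinary mean is pulled to about 16.7, while the Huber estimate stays near 10, which is the behavior the excerpt attributes to the robust covariance approach when outliers are present.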

Figures 1 to 4 illustrate the results of the reconciliation for the four variables involved. As can be seen, this approach does not completely eliminate the influence of the outliers. For some of the variables, the prediction after reconciliation is actually deteriorated because of the presence of outliers in some of the other measurements. This is in agreement with the findings of Albuquerque and Biegler (1996), in the sense that the results of this approach can be very misleading if the gross error distribution is not well characterized.
As discussed before, the outliers generated by the heavy-tails of the underlying distribution have a considerable influence on the OLS problem arising in a conventional data reconciliation procedure. To solve this problem, a limiting transformation, which operates on the data set, is defined to eliminate or reduce the influence of outliers on the performance of a conventional rectification scheme. [Pg.231]
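The excerpt does not reproduce the limiting transformation itself; a common choice, shown here purely as an illustrative assumption, is to clip (winsorize) values at a robust-scale limit so that extreme observations cannot dominate a least-squares objective.

```python
import statistics

def clip_to_limits(data, c=3.0):
    """Clip values to median +/- c * MAD-based scale: a simple
    limiting transformation that bounds outlier influence.
    (Illustrative assumption; not the cited paper's transformation.)"""
    med = statistics.median(data)
    scale = statistics.median(abs(x - med) for x in data) / 0.6745
    lo, hi = med - c * scale, med + c * scale
    return [min(max(x, lo), hi) for x in data]
```

Applied before a conventional OLS reconciliation, such a transform leaves well-behaved measurements untouched and pulls heavy-tailed values back toward the bulk of the data.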

Number of laboratories retained after eliminating outliers
Number of outliers (laboratories) [Pg.97]

First calculate the residual standard deviation s(y,x) using all results, then eliminate the potential outlier and calculate s(y,x) again. [Pg.191]
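The two-step calculation can be sketched as follows, writing the residual standard deviation of a least-squares line as s(y,x) = sqrt(SS_res / (n - 2)); the data are invented for illustration.

```python
import math

def residual_sd(xs, ys):
    """Residual standard deviation s(y,x) of a least-squares line:
    sqrt(sum of squared residuals / (n - 2))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx               # slope
    a = my - b * mx             # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_res / (n - 2))

xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.1, 5.9, 8.0, 10.1, 15.0]   # last point is the suspect
s_all = residual_sd(xs, ys)              # with the potential outlier
s_reduced = residual_sd(xs[:-1], ys[:-1])  # after its elimination
```

A large drop from s_all to s_reduced supports (though does not by itself prove) the suspicion that the eliminated point was an outlier.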

