Outliers prediction

The ratio of this RESp to the RESTOt obtained from the calibration data (RESTOt> Equation 8.44), which can be called the residual ratio, can then be used to determine whether the sample is a potential outlier. Prediction samples for which this ratio is much greater than, say, three or four could be flagged as potential outliers. [Pg.284]

Of the six exposures sampled with their combined 42 soil units, only one sample appears to be significantly misplaced. It is found in Unit IV with a pattern 2a rather than a pattern 3, and is one of the two possible outliers predicted from the expected REE pattern placement in Table III. [Pg.93]

Ideally, the results should be validated somehow. One of the best methods for doing this is to make predictions for compounds known to be active that were not included in the training set. It is also desirable to eliminate compounds that are statistical outliers in the training set. Unfortunately, some studies, such as drug activity prediction, may not have enough known active compounds to make this step feasible. In this case, the estimated error in prediction should be increased accordingly. [Pg.248]

This experiment uses the change in the mass of a U.S. penny to create data sets with outliers. Students are given a sample of ten pennies, nine of which are from one population. The Q-test is used to verify that the outlier can be rejected. Glass data from each of the two populations of pennies are pooled and compared with results predicted for a normal distribution. [Pg.97]

These data show that both models identify important variables that affect 5 Obody w.ier and 8 Ophospha in mammals. Both serve to identify the dikdik as an outlier which may be explained by their sedentary daytime pattern. On the other hand, the body-size model (Bryant and Froelich 1995), which may reliably predict animal 5 0 in temperate, well-watered regions, does not predict 8 Opho,phaw in these desert-adapted species. The second model (Kohn 1996), by emphasizing animal physiology independent of body size, serves to identify species with different sensitivities to climatic parameters. This, in conjunction with considerations of behavior, indicate that certain species are probably not useful for monitoring paleotemperature because their 5 Obodyw er is not tied, in a consistent way, to The oryx, for example, can... [Pg.135]

Figures 1 to 4 illustrate the results of the reconciliation for the four variables involved. As can be seen, this approach does not completely eliminate the influence of the outliers. For some of the variables, the prediction after reconciliation is actually deteriorated because of the presence of outliers in some of the other measurements. This is in agreement with the findings of Albuquerque and Biegler (1996), in the sense that the results of this approach can be very misleading if the gross error distribution is not well characterized.

Both assumptions are mainly needed for constructing confidence intervals and tests for the regression parameters, as well as for prediction intervals for new observations in x. The assumption of normal distribution additionally helps avoid skewness and outliers, mean 0 guarantees a linear relationship. The constant variance, also called homoscedasticity, is also needed for inference (confidence intervals and tests). This assumption would be violated if the variance of y (which is equal to the residual variance a2, see below) is dependent on the value of x, a situation called heteroscedasticity, see Figure 4.8. [Pg.135]

Outliers or inhomogeneous data can affect traditional regression methods, hereby leading to models with poor prediction quality. Robust methods, like robust regression (Section 4.4) or robust PLS (Section 4.7.7), internally downweight outliers but give full weight to objects that support the (linear) model. Note that to all methods discussed in this chapter robust versions have been proposed in the literature. [Pg.203]

From the best model, that is, Eq. (42), the r2 and standard deviation between the predicted and experimental log BB for the 55 training set compounds were 0.790 and 0.35, respectively. However for the 13 test set compounds, the corresponding values were 0.419 and 0.60. Excluding two outliers, the r2 and standard deviation were improved to 0.838 and 0.30, respectively. [Pg.527]

The use of and Q prediction outlier metrics as described above is an example of a model-specific health monitor , in that the metrics refer to the specific analyzer response space that was used to develop a PLS, PCR or PCA prediction model. However, many PAT applications involve the deployment of multiple prediction models on a single analyzer. In such cases, one can also develop an analyzer-specific health monitor, where the and Q outlier metrics refer to a wider response space that covers all normal analyzer operation. This would typically be done by building a separate PCA model using a set of data that covers all normal analyzer responses. Of course, one could extend this concept further, and deploy multiple PCA health monitor models that are designed to detect different specific abnormal conditions. [Pg.431]

R.D. Guenard, C.M. Wehlberg, R.J. Pell and D.M. Haaland, Importance of prediction outlier diagnostics in determining a successful inter-vendor multivariate calibration model transfer, Appl. Spectrosc., 61, 747 (2007). [Pg.436]

Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...