To Determine Outliers

The accuracy of hydrate data has seldom been specified by experimentalists. In the following data, only a cursory effort has been made to exclude inaccurate data for simple hydrates. All three-phase data sets for simple hydrates were plotted (as the logarithm of pressure versus absolute temperature) to determine outliers. [Pg.358]

Leardi and co-workers (Ref. 9) use a genetic algorithm (GA) to determine which of a set of features best explains a set of observations and to determine outliers in the data. Their SGA scheme was shown to outperform partial least squares. The tastiest aspect of this method was its application to the correlation of chemical composition with the age of provola cheese. [Pg.58]
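
The GA idea behind feature selection can be sketched as follows. This is a generic bit-string GA (tournament selection, uniform crossover, bit-flip mutation, elitism), not Leardi's SGA. The fitness function `toy_fitness` and the "informative" feature set are hypothetical stand-ins for a real criterion such as cross-validated prediction error.

```python
import random

def ga_select_features(n_features, fitness, pop_size=20, generations=40,
                       mutation_rate=None, seed=0):
    """Evolve feature masks (1 = keep feature) that maximise fitness(mask)."""
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_features

    def tournament(pop):
        # Binary tournament: the fitter of two randomly drawn masks wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            # Uniform crossover, then independent bit-flip mutation.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Hypothetical fitness: reward "informative" features, penalise the rest.
INFORMATIVE = {1, 4, 6}

def toy_fitness(mask):
    gain = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    cost = sum(0.5 for i, bit in enumerate(mask) if bit and i not in INFORMATIVE)
    return gain - cost

best_mask, history = ga_select_features(8, toy_fitness)
print(best_mask, history[-1])
```

Elitism guarantees the best fitness never decreases from one generation to the next. In practice, the fitness call dominates the run time, so real implementations usually cache scores per mask.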

By calculating the sum of the squares of the spectral residuals across all the wavelengths, an additional representative value can be generated for each spectrum. The spectral residual is effectively a measure of the amount of each spectrum left over in the secondary or noise vectors. This value is the basis of another type of discrimination method known as SIMCA (Refs. 13, 36). This is similar to performing an F test on the spectral residual to determine outliers in a training set (see Outlier Sample Detection in Chapter 4). In fact, one group combined the PCA-Mahalanobis distance method with SIMCA to provide a biparametric method of discriminant analysis (Ref. 41). In this method, both the Mahalanobis distance and the SIMCA test on the spectral residual had to pass in order for a sample to be classified as a match. [Pg.177]
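
The two quantities combined in the biparametric method can be sketched in plain Python. This assumes the PCA has already been run on the training set, giving a mean spectrum, orthonormal loading vectors, and the variance of each score. The limits `d2_limit` and `ss_limit` are hypothetical thresholds, for example from an F test. This is an illustration of the rule as described, not the implementation of Ref. 41.

```python
def project(x, mean, loadings):
    """Centre spectrum x and compute its PCA scores on orthonormal loadings."""
    centred = [xi - mi for xi, mi in zip(x, mean)]
    scores = [sum(c * p for c, p in zip(centred, pc)) for pc in loadings]
    return centred, scores

def spectral_residual_ss(x, mean, loadings):
    """Sum of squared spectral residuals left after the retained PCs."""
    centred, scores = project(x, mean, loadings)
    rebuilt = [sum(t * pc[j] for t, pc in zip(scores, loadings))
               for j in range(len(centred))]
    return sum((c - r) ** 2 for c, r in zip(centred, rebuilt))

def mahalanobis_sq(x, mean, loadings, score_variances):
    """Squared Mahalanobis distance in PC-score space (scores are uncorrelated)."""
    _, scores = project(x, mean, loadings)
    return sum(t * t / v for t, v in zip(scores, score_variances))

def is_match(x, mean, loadings, score_variances, d2_limit, ss_limit):
    """Biparametric rule: BOTH the distance and the residual test must pass."""
    return (mahalanobis_sq(x, mean, loadings, score_variances) <= d2_limit
            and spectral_residual_ss(x, mean, loadings) <= ss_limit)
```

For a three-point "spectrum" `[1.0, 2.0, 0.5]` with two loadings along the first two axes, the residual sum of squares is 0.25 (the unexplained 0.5 squared).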

If Q is greater than the tabulated Q value for 90% probability of difference, then the value may be removed from the data set (p < 0.10). An example of how this test is used is given in Table 11.17a. In this case, the pKB value of 8.1 appears to be an outlier with respect to the other estimates. The calculated Q is compared with a table of Q values for 90% confidence (Table 11.17b) to determine the confidence with which this value can be accepted into the data set. In the case shown in Table 11.17a, Q < 0.51; therefore, there is less than a 90% probability that the value is different. If this level of probability is acceptable to the experimenter, then the value should remain in the set. [Pg.252]

If one lives by the regulations and does not cast out outliers, then a comparison between the gel and the emulsion results shows that the standard deviations are comparable (F = 1.57), but the means are significantly different. Action is now called for to determine the cause of this discrepancy: manufacturing error, incomplete extraction, or interference by excipients ... [Pg.285]
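
The two comparisons in that passage, precision by an F ratio and means by a t statistic, can be sketched with the standard library. This assumes the usual pooled-variance t test, which is appropriate once the F test shows comparable standard deviations. The critical F and t values must still come from tables; none are implied here.

```python
import math
from statistics import mean, variance

def f_ratio(a, b):
    """Larger sample variance over smaller, for comparing precision."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (assumes comparable SDs)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical assay results, not the gel/emulsion data from the text.
gel = [1.0, 2.0, 3.0]
emulsion = [2.0, 4.0, 6.0]
print(f_ratio(gel, emulsion), pooled_t(gel, emulsion))
```

Compare F with the critical value for (n1 - 1, n2 - 1) degrees of freedom. If precision is comparable, compare |t| with the critical value for n1 + n2 - 2 degrees of freedom.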

After inspecting the tabular and graphic data, the operator is allowed to remove runs that appear to be outliers. Any run can be deleted or restored in any order, and the comparative statistics are recalculated with each operation. By comparing the standard deviation before and after deleting a run, the effect of that run can be determined. The editing process can continue indefinitely until the operator is satisfied with the validity of the results. [Pg.126]
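
The before-and-after comparison of standard deviations can be sketched as a leave-one-out loop. The run values below are hypothetical. A positive effect means that deleting the run tightens the data.

```python
from statistics import stdev

def deletion_effects(runs):
    """Return the overall SD and, per run, SD(all) - SD(without that run)."""
    base = stdev(runs)
    effects = []
    for i in range(len(runs)):
        rest = runs[:i] + runs[i + 1:]
        effects.append(base - stdev(rest))
    return base, effects

runs = [10.0, 10.2, 9.9, 10.1, 12.0]  # hypothetical results
base, effects = deletion_effects(runs)
print(base, effects)
```

Only deleting the 12.0 run lowers the standard deviation here. Deleting any other run raises it, because the remaining set still contains the aberrant value.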

We make five replicate measurements using an analytical method to calculate basic statistics regarding the method. Then we want to determine whether a seemingly aberrant single result is indeed a statistical outlier. The five replicate measurements are 5.30%, 5.44%, 5.78%, 5.00%, and 5.30%. The result we are concerned with is 6.0%. Is this result an outlier? To find out, we first calculate the absolute values of the individual deviations ... [Pg.494]
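
The excerpt stops before naming its rejection criterion. The sketch below assumes a deviation rule of the "4d" type. It computes the mean and average absolute deviation d of the five accepted replicates, then rejects the suspect result if it lies more than k·d from that mean. The multiplier k is an assumption, not taken from the text.

```python
from statistics import mean

def deviation_test(values, suspect, k=4.0):
    """Average-deviation rule: reject suspect if |suspect - mean| > k * d."""
    m = mean(values)
    avg_dev = mean(abs(v - m) for v in values)
    suspect_dev = abs(suspect - m)
    return m, avg_dev, suspect_dev, suspect_dev > k * avg_dev

replicates = [5.30, 5.44, 5.78, 5.00, 5.30]
print(deviation_test(replicates, 6.0))
```

For these numbers the mean is 5.364%, d is 0.1968%, and the suspect deviation is 0.636%. With k = 4 the 6.0% result is retained. With the stricter k = 2.5 it would be rejected, so the choice of criterion matters.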

Outlier detection methods, n - statistical tests which are conducted to determine if the analysis of a spectrum using a multivariate model represents an interpolation of the model. [Pg.511]

The radial velocities have been computed with the low-resolution set-up (more spectral lines, no telluric lines), using a cross-correlation technique. When the seven outliers are excluded, the peak is centered at 83.0 ± 0.4 km s⁻¹ with a dispersion of 1.9 ± 0.2 km s⁻¹. The lithium abundance is being determined using the Li I 6707.8 Å line. We used the B−V index to determine the effective temperature ([3]), and the curve of growth from [7] to derive A(Li). [Pg.155]
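
The excerpt does not say how the seven outliers were identified. One common approach is iterative clipping about the median, with the median absolute deviation (MAD) as a robust scale. This is sketched below as an assumption, with synthetic velocities. The 1.4826 factor rescales the MAD to a Gaussian standard deviation.

```python
from statistics import mean, median, stdev

def mad_clip(values, k=3.0, max_iter=10):
    """Iteratively drop values more than k robust sigmas from the median."""
    idx = list(range(len(values)))
    for _ in range(max_iter):
        sub = [values[i] for i in idx]
        med = median(sub)
        sigma = 1.4826 * median(abs(v - med) for v in sub)
        if sigma == 0:
            break
        new_idx = [i for i in idx if abs(values[i] - med) <= k * sigma]
        if len(new_idx) == len(idx):
            break  # converged: nothing more to clip
        idx = new_idx
    kept = [values[i] for i in idx]
    rejected = [i for i in range(len(values)) if i not in idx]
    return kept, rejected, mean(kept), stdev(kept)

# Synthetic radial velocities in km/s, not the data from the text.
rv = [82.0, 83.0, 84.0, 83.0, 82.5, 83.5, 120.0, 40.0]
print(mad_clip(rv))
```

A median/MAD scale is used instead of the plain mean and standard deviation because two gross outliers can inflate the standard deviation enough to hide themselves.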

These outliers must be carefully considered to determine whether their high QTc values are due to chance or to clinically silent long QT syndrome... [Pg.74]

It may at times appear that a single measurement (an outlier) is so different from the others that the analyst wonders if there was some determinate error that was not detected. In that case, a decision must be made as to whether this measurement should be "rejected," meaning not included in the calculation of the mean. This measurement should not be immediately rejected as being "bad" because, in the absence of a full investigation to determine a cause, it may, in fact, be legitimate. If a legitimate measurement is rejected, then a bias is introduced, and the mean, while assumed to be the correct answer, actually is flawed. There must be some criterion adopted for the rejection or retention of such data. [Pg.26]

For small data sets (n < 10), which are often encountered in chemical analysis, a simple method to determine if an outlier is rejectable is the Q test. In this test, a value for Q is calculated and compared to a table of Q values that represent a certain percentage of confidence that the proposed rejection is valid. If the calculated Q value is greater than the value from the table, then the suspect value is rejected and the mean recalculated. If the Q value is less than or equal to the value from the table, then the calculated mean is reported. Q is defined as follows ... [Pg.27]
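
The definition of Q is elided in the excerpt. The sketch below assumes the standard Dixon form: Q equals the gap between the suspect value and its nearest neighbour, divided by the range of the whole set. The critical values are the commonly tabulated 90% figures. Check them against the table you actually use.

```python
# Commonly tabulated Dixon Q critical values at 90% confidence, keyed by n.
Q90 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507,
       8: 0.468, 9: 0.437, 10: 0.412}

def q_statistic(values, suspect):
    """Q = gap to nearest neighbour / range; suspect must be an extreme value."""
    s = sorted(values)
    spread = s[-1] - s[0]
    if spread == 0:
        raise ValueError("all values identical; Q is undefined")
    if suspect == s[-1]:
        gap = s[-1] - s[-2]
    elif suspect == s[0]:
        gap = s[1] - s[0]
    else:
        raise ValueError("the suspect value must be the smallest or largest")
    return gap / spread

def q_test(values, suspect, table=Q90):
    """Reject the suspect only if Q exceeds the tabulated value."""
    q = q_statistic(values, suspect)
    q_crit = table[len(values)]
    return q, q_crit, q > q_crit

data = [5.30, 5.44, 5.78, 5.00, 5.30, 6.0]
print(q_test(data, 6.0))
```

Applied to the replicate example with the 6.0% result included, Q = 0.22/1.00 = 0.22. This is below 0.560 for n = 6, so the value is retained and the mean is reported as calculated.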

GRUBBS' TEST for rejection of an observation is applied to determine whether one of the observations should be rejected as an outlier. The following equation was used for the test ... [Pg.516]
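
The equation itself is elided in the excerpt. The sketch assumes the usual single-outlier Grubbs statistic, G = max|xᵢ − mean| / s, where s is the sample standard deviation. The critical value still has to be read from a Grubbs table for the chosen n and significance level.

```python
from statistics import mean, stdev

def grubbs_statistic(values):
    """Return the most extreme value and G = |x - mean| / s for it."""
    m, s = mean(values), stdev(values)
    suspect = max(values, key=lambda v: abs(v - m))
    return suspect, abs(suspect - m) / s

data = [5.30, 5.44, 5.78, 5.00, 5.30, 6.0]
print(grubbs_statistic(data))
```

For the six replicate values above, the most extreme point is 6.0 with G ≈ 1.46. Commonly tabulated 5% critical values for n = 6 lie above 1.8, so this test would also retain the result.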

Cross-validation of the DFA model is conducted by casewise deletion, re-estimation of functions, and classification. In other words, for each observation in the data set, that observation is omitted and the discriminant functions are re-estimated using the full data set minus that observation. That observation is then classified using the re-estimated functions. The accuracy of the cross-validation can be used to evaluate the reliability of the DFA and the potential impact of group outliers. In essence, the cross-validation process is the same process used to determine the provenance of an unsourced archaeological sample, where the discriminant functions are developed independently of the sample and then used to determine its most likely source. [Pg.466]
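
The casewise-deletion loop can be sketched independently of the classifier. To keep the example short and self-contained, a nearest-centroid classifier stands in for the discriminant functions. This is a deliberate substitution, not DFA. Replace `fit_centroids`/`classify` with a real discriminant-analysis fit for actual use.

```python
import math

def fit_centroids(X, y):
    """Stand-in 'model': the mean vector of each group."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {lab: [sum(col) / len(rows) for col in zip(*rows)]
            for lab, rows in groups.items()}

def classify(x, centroids):
    """Assign x to the group with the nearest centroid."""
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))

def loo_accuracy(X, y):
    """Omit each case, refit on the rest, classify the omitted case."""
    misclassified = []
    for i in range(len(X)):
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if classify(X[i], model) != y[i]:
            misclassified.append(i)
    return 1 - len(misclassified) / len(X), misclassified

# Hypothetical two-source data; the last point is a group outlier.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [9, 9]]
y = ["A", "A", "A", "B", "B", "B", "A"]
print(loo_accuracy(X, y))
```

The point labelled A but lying among the B samples is the only case misclassified, which is how cross-validation exposes a group outlier.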

In general, the data accuracy was surprisingly good. For example, while Deaton and Frost (1946, p. 13) specified that their pure ethane contained 2.1% propane and 0.8% methane, the effects of those impurities may have counterbalanced each other; those impurities were insufficient to cause the data to fall outside the line formed by other ethane data sets. On the other hand, the simple hydrate data of Hammerschmidt (1934) for propane and isobutane all appear to be outliers on such semilogarithmic plots, because they are at temperatures much too far above the upper quadruple (Q2) point. Obvious outlying data were excluded from this work; less obvious outliers may be determined by inspection of the plots and subsequent numerical comparisons. The data, followed by the semilogarithmic plots... [Pg.358]

The ratio of this RES_p to the RES_Tot obtained from the calibration data (RES_Tot, Equation 8.44), which can be called the residual ratio, can then be used to determine whether the sample is a potential outlier. Prediction samples for which this ratio is much greater than, say, three or four could be flagged as potential outliers. [Pg.284]
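
Equation 8.44 is not reproduced in the excerpt. The sketch below therefore assumes that both RES_p and RES_Tot are root-mean-square X-residuals: RES_p over the prediction sample's residual vector, and RES_Tot over all calibration residuals. The flagging threshold of 3 follows the "three or four" suggestion in the text. The residual values are hypothetical.

```python
import math

def rms(values):
    """Root-mean-square of a sequence of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def residual_ratio(prediction_residual, calibration_residuals):
    """Assumed definition: RES_p / RES_Tot, both as RMS X-residuals."""
    flat = [e for row in calibration_residuals for e in row]
    return rms(prediction_residual) / rms(flat)

def is_potential_outlier(prediction_residual, calibration_residuals, threshold=3.0):
    """Flag a prediction sample whose residual ratio exceeds the threshold."""
    return residual_ratio(prediction_residual, calibration_residuals) > threshold

calib = [[0.1, -0.1], [0.1, 0.1]]  # hypothetical calibration residual spectra
pred = [0.5, -0.5]                 # hypothetical prediction residual spectrum
print(residual_ratio(pred, calib), is_potential_outlier(pred, calib))
```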

X-variables. This leads to the presence of model residuals (E in Equations 8.19 and 8.35). The residuals of the model can be used to indicate the nature of unmodeled information in the calibration data. For process analytical spectroscopy, plots of individual sample residuals versus wavelength ("residual spectra") can provide some insight into chemical or physical effects that are not accounted for in the model. In cases where a sample or variable outlier is suspected in the calibration data, inspection of that sample's or variable's residual can help determine whether the sample or variable should be removed from the calibration data. When a model is operating on-line, the X-residuals of prediction (see Equation 8.55) can be used to determine whether the sample being analyzed is appropriate for application to a quantitative model (see Section 8.4.3). In addition, one could also view the prediction residual vector e_p as a profile (or "residual spectrum") to gain some insight into the nature of the prediction sample's inappropriateness. [Pg.302]

Outlier detection is of concern in certain areas of science. The aim is to spot samples that do not appear to conform to the structure of the training set used to determine the calibration model. If outlying samples are treated in the normal way, inaccurate concentrations may be predicted; this is a con-... [Pg.26]

There is strong evidence for making the assumption that the increase in luminescence observed is caused by induction of a metabolite for most of the compounds tested. First, outliers in QSAR regressions can be used to determine the limits of applicability of a QSAR (Lipnick, 1991). If the biosensor response to all compounds other than naphthalene was a non-specific response with no relationship to biotransformation, then it would be expected that the value for naphthalene would be a clear outlier. Instead, the value for naphthalene is close to the predicted value, as shown in Figure 17.3. A dose-response behavior is indicative of a specific mechanism. [Pg.386]

Big Chemical Encyclopedia
