Statistical outlier

Ideally, the results should be validated somehow. One of the best methods for doing this is to make predictions for compounds known to be active that were not included in the training set. It is also desirable to eliminate compounds that are statistical outliers in the training set. Unfortunately, some studies, such as drug activity prediction, may not have enough known active compounds to make this step feasible. In this case, the estimated error in prediction should be increased accordingly. [Pg.248]

A compound can be an extrapolation from a structural point of view, such as compound 24 that has a n-butyl substituent in the 3-position while the training set (compounds 1-17) only contain methyl-, ethyl- and n-propyl-substituents (among other non-alkyl substituents). Still, compound 24 is not considered to be an outlier from a statistical point of view. At the same time, compounds 18 and 19 which also contains substituents - in this case in the 5-position -that are not part of the training set are statistical outliers as well as structural outliers. [Pg.401]

We make five replicate measurements using an analytical method to calculate basic statistics regarding the method. Then we want to determine if a seemingly aberrant single result is indeed a statistical outlier. The five replicate measurements are 5.30%, 5.44%, 5.78%, 5.00%, and 5.30%. The result we are concerned with is 6.0%. Is this result an outlier To find out we first calculate the absolute values of the individual deviations ... [Pg.494]

We must not accept outliers in a calibration data set. We also can test this with an statistical outlier test. [Pg.191]

If our suspect value lies within this interval, we cannot reject it as an statistical outlier. [Pg.192]

My levels of PBDEs, it turned out, were smack in the middle of the American average, or 36 parts per billion. This was reassuring in a way, because I wasn t one of those unfortunate statistical outliers, those women and kids who have levels nearing 1,000, for reasons no one understands. But even if I were, it might be insignificant. We simply don t know what these levels mean for ourselves and our babies. And so far, very few Americans have been tested, probably fewer than 300. As one academic told me, it makes more sense to stress yourself out over things you can control, like trans fats and whether your kids car seats are properly installed. [Pg.190]

The full model can reproduce all the systematic features of the absorption curves in Fig. 2.5 [11]. In the present form, the model does not account for the interpatient variability observed in Fig. 2.4 or for the nonsystemahc phenomena discussed in connection with Fig. 2.5. Ideally, in our mechanism-based modeling approach, any form of unsystematic variation in the experimental results should be given a separate explanation, i.e. specific studies should be undertaken to explain the observed variability in absorption rates and binding capacities. Statistical outliers should be considered as potential sources of new information. [Pg.45]

Are there data points that appear to be statistical outliers Why are they outliers Should they be considered in the development of AEGL values or discarded because of faulty experimental technique ... [Pg.161]

The repeat standard deviation describes the scattering of the measuring results under repeat conditions (same laboratory, same equipment, same staff). Whereas, the between laboratory standard deviation expresses the differences between the laboratories. The reproduce standard deviation contains the two above mentioned scatter components. It is the deviation under reproduce conditions (different laboratories, different equipment, different staff). To get a unique repeat standard deviation it must be assumed that it does not vary (significantly) with the laboratory. For this reason the standard recommends a statistical outlier test (Cochran test) for the individual standard deviations of the laboratories. Furthermore, the individual laboratory means are a subject to an outlier test (Grubbs test). [Pg.461]

If averaging is employed, OOS results must be included along with within-specification findings, unless they may be discarded by an approved statistical outlier test (either approved in the compendia or in the IND). [Pg.597]

Data evaluation was carried out by the Tool4PT Cortez MERMAYDE, 2002-2004 software. The normality of average data was checked by the Kolmogorov-Smimov test. The statistical outliers were identified by the application of the Hampel test (test of averages) (Davies, 1988 Linsinger et al., 1998) and Cochran test (test of variances) at 95% of significance level. [Pg.359]

Reference dietary indices are based on an average population performing average duties. Increased physical activity or medical needs make these people statistical outliers beyond the requirements of the "average" healthy population group. [Pg.365]

Finally, let s conclude this chapter with an extremely important discussion of statistics and the value of data replication and confirmation. We all know that all statistics have an associated probability that goes along with them. Without going into a long discussion on the subject, what this means to us is that there is always a chance that a wrong conclusion may be drawn from a given data set. There is always a chance that we may obtain some bad data or even a statistical outlier in our final response data. This is especially true for small sets of data. [Pg.235]

In 1996, we reported that there were more than 1,000,000 outliers in the PDB [20]. These outliers reflect discrepancies with conventions, statistical outliers, and probable errors. At that time, that corresponded to an average of about 1 error per amino acid. Many protein crystallographers reacted to that report by saying that all these... [Pg.398]

Eirst, the extension primers for the SNP of interest were tested on seven replicates of individual genomic DNA samples to determine if the individual is heterozygous for the SNP. Before subsequent statistical analyses, obvious technical failures or statistical outliers were eliminated. Eor the individual who is heterozygous for the SNP, the ratio of the magnitude of the peak heights from the two nucleotides at the same sequencing position was calculated. The average ratio from the seven replicates was then derived. [Pg.36]

For assays performed on the cDNA, the fractional allelic experiment for each sample was also determined through seven replicates and obvious technical failures or statistical outliers were eliminated. The allelic ratio of expression based on the magnitudes of SNP peak heights were determined. Each individual s average allelic differential expression ratio value can be normalized based on the average allelic ratio derived from the same individual s genomic template. [Pg.37]

The second step provides for the elimination of read-in outliers, i.e. outliers that are not recognized in the first step. This is accomplished by means of a statistical outlier test. [Pg.156]

Assessment and identification of statistical outliers from the data set... [Pg.1996]

Descriptive statistics—mean, median, trimmed means, standard deviation and standard error, variance, minimum, maximum, range, interquartile range, skewness, kurtosis Frequency statistics—outlier identification boxplots, stem-and-leaf plots, and histograms Frequency statistics—description percentiles, probability plots, robust estimates or M-estimators, Kolmogorov-Smirnov and Shapiro-Wilk normality tests Variance homogeneity—Levene s test for equality of variance... [Pg.61]

Prior to each day s FTIR measurements, the ICV standard should be measured and a log kept of its daily interpolated concentration. If the ICV value becomes a statistical outlier, rerun the calibration. [Pg.544]

Compound pairs detected as informative activity cliffs often illustrate key chemical features for activity. These pairs, however, may also often be detected as apparent statistical outliers in quantitative SAR analysis methods [56], since the assumption of SAR continuity is fundamental for QSAR model building and affinity prediction. [Pg.210]

The above examples illustrate the importance of examining QSAR predictions for reasonableness before using them to establish regression equations or before substituting them in established regression equations to predict toxicity. QSAR data bases should be screened to eliminate values such as water solubilities of 1,000,000 mg L"T Regression equations should be examined for outliers and the chemical or biological basis for statistical outliers considered. [Pg.266]

Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...