Big Chemical Encyclopedia


Inflation of the type I error

The problem with this so-called multiplicity or multiple testing arises when we make a claim on the basis of a positive result which has been generated simply because we have undertaken lots of comparisons. Inflation of the type I error rate in this way is of great concern to the regulatory authorities; they do not want to be registering treatments that do not work. It is necessary therefore to control this inflation. The majority of this chapter is concerned with ways in which the potential problem can be controlled, but firstly we will explore ways in which it can arise. [Pg.147]

Statistical Thinking for Non-Statisticians in Drug Regulation Richard Kay 2007 John Wiley Sons, Ltd ISBN 978-0-470-31971-0 [Pg.147]


In Chapter 10 we spoke extensively about the dangers of multiple testing and the associated inflation of the type I error. Methods were developed to control that inflation and account for multiplicity in an appropriate way. [Pg.213]

These comments are directed primarily at efficacy and do not tend to be applied to safety, unless a safety claim (e.g. drug A is less toxic than drug B) is to be made. With the routine evaluation of safety, where p-values are being used as a flag for potential concerns, we tend to be conservative and not worry about inflating the type I error. It is missing a real effect, the type II error, that concerns us more. [Pg.149]

A common concern for a group sequential trial utilizing repeated interim ANOVA analyses is an inflated chance of observing a spurious result, leading to an inflated Type I (false positive) error rate. Table 31.1 (24) illustrates the impact on the Type I error rate (α) based on the number of interim analyses, each controlled at α = 0.05. [Pg.821]
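The qualitative pattern behind figures like those in Table 31.1 can be reproduced by simulation. The sketch below is a toy one-sample z-test with known variance, repeated on the accumulating data at each interim look; the function name and stage size are illustrative, not from the source:

```python
import math
import random
from statistics import NormalDist

def overall_type1_rate(n_looks, n_per_stage=20, alpha=0.05,
                       n_sims=20000, seed=1):
    """Monte Carlo estimate of the overall Type I error rate when a
    one-sample z-test (known sigma = 1, H0: mu = 0) is repeated on the
    accumulating data at each interim look, each at nominal alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(n_looks):
            total += sum(rng.gauss(0.0, 1.0) for _ in range(n_per_stage))
            n += n_per_stage
            if abs(total / math.sqrt(n)) > z_crit:
                rejections += 1  # any single look rejecting counts
                break
    return rejections / n_sims

print(overall_type1_rate(1))  # close to the nominal 0.05
print(overall_type1_rate(5))  # roughly 0.14: nearly triple the nominal rate
```

Because successive looks share data, the looks are correlated, so the overall rate grows more slowly than the independent-tests formula 1 − (1 − α)^k would suggest, but it still climbs well past the nominal level.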

They concluded that the Type I error rates for the EBE-based methods were near the nominal level (α = 0.05) in most cases. The Type I error rates for the NL-based methods were also near the nominal level in most cases, but were smaller under sparse data conditions and with small sample sizes. The LRT consistently inflated the Type I error rate and, not surprisingly, was the most powerful of the methods examined. This latter result can be rationalized by noting that the inflated Type I error rate acts as a constant that inflates statistical power at nonzero effect sizes. They concluded that the LRT was too liberal for sparse data, while the NL-based methods were too conservative, and that the EBE-based methods were the most reliable for covariate selection. [Pg.240]

In summary, the Type I error rate from using the LRT to test for the inclusion of a covariate in a model was inflated when the data were heteroscedastic and an inappropriate estimation method was used. Type I error rates with FOCE-I were in general near nominal values under most conditions studied and suggest that in most cases FOCE-I should be the estimation method of choice. In contrast, Type I error rates with FO-approximation and FOCE were very dependent on and sensitive to many factors, including number of samples per subject, number of subjects, and how the residual error was defined. The combination of high residual variability with sparse sampling was a particularly disastrous combination using... [Pg.271]

Although the Bergman-Hynen statistic provides a clever correction to some problems with the Box-Meyer statistic, it remains problematic in the face of multiple dispersion effects (see Brenneman and Nair, 2001; McGrath and Lin, 2001). If factor j alone has a dispersion effect, the numerator and denominator of the statistic D^BH in (7) are unbiased estimators of the variances at the high and low levels of j. However, if several factors have dispersion effects, one has instead unbiased estimates of the average variances at these two levels, where the averaging includes the effects of all the other dispersion effects. This dependence of D^BH on additional dispersion effects can lead to inflated type I error probabilities and thus to spurious identification of dispersion effects. [Pg.34]

The issue of multiplicity is that when performing multiple statistical tests, the error probability associated with the inferences made is inflated. To see this, let us consider a simple situation where one is interested in performing two statistical tests on independent sets of data, each at a significance level of 0.05. Thus, the probability that each of the two tests will be declared significant erroneously (type I error) is 0.05. However, the probability that at least one of the two tests will be declared significant erroneously is 0.0975. The probability that at least one of the tests of interest will be declared significant erroneously is called the experiment-wise error rate. If we perform three 0.05 level tests, the experiment-wise error rate increases to 0.143. In practical terms, this means that if we perform multiple tests and make multiple inferences, each one at a reasonably low error... [Pg.336]
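The arithmetic in this paragraph follows from treating the tests as independent; a minimal sketch (the function name is illustrative):

```python
def experiment_wise_error(k, alpha=0.05):
    """Probability that at least one of k independent level-alpha tests
    is declared significant when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k

print(round(experiment_wise_error(2), 4))  # 0.0975, as in the text
print(round(experiment_wise_error(3), 4))  # 0.1426, i.e. ~0.143
```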

The important point to note here is that the α = 0.05 level is deemed appropriate when a single test is being conducted. Multiple comparisons, by definition, mean that more than one test is being conducted. When testing a number of pairwise comparisons - for example, after an ANOVA where the null hypothesis has been rejected - it is not acceptable to test each pairwise comparison at the α = 0.05 level because of the potential inflation of the overall type I error rate. [Pg.159]

The issue of type I error inflation caused by multiple testing appears in many guises in the realm of new drug development. This issue is of great importance to decision-makers, and we discuss this topic again later in the chapter. For now, we have not yet provided a full answer to our research question; our description of analysis of variance is incomplete without a discussion of at least one analysis method that controls the overall type I error rate when evaluating pairwise comparisons from an ANOVA. [Pg.160]
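One standard method for controlling the overall rate across pairwise comparisons (not necessarily the one this chapter develops) is the Bonferroni correction; a minimal sketch, with an illustrative function name:

```python
from itertools import combinations

def bonferroni_pairwise_level(n_groups, alpha=0.05):
    """Per-comparison significance level that keeps the overall Type I
    error rate across all pairwise comparisons at or below alpha."""
    n_pairs = len(list(combinations(range(n_groups), 2)))
    return alpha / n_pairs, n_pairs

level, n_pairs = bonferroni_pairwise_level(4)  # 4 arms -> 6 pairwise tests
print(n_pairs, round(level, 4))  # each pairwise test is run at 0.05/6
```

Bonferroni is conservative (the true family-wise rate is strictly below α), which is why sharper procedures such as Tukey's are often preferred after an ANOVA.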

A comparison of the power to determine DDI with theoretical and empirical critical values for the FO method is presented in Figure 12.6. As expected, the theoretical power to determine DDI is higher than the empirically determined power, given the inflated Type I error with the theoretical critical value. The empirical power is a more accurate representation of the true power. [Pg.320]

Due to the complex nature of population modeling, it can be difficult to foresee all potential outcomes and prespecify the corresponding strategies. If some unexpected outcomes that were not prespecified occur, it may be best to decide on a solution as close as possible to one that would have been prespecified. An alternative could be conducting several reasonable analyses and examining the robustness of the conclusions. This might be fine in some circumstances; however, the number of analyses could easily become impractical and arguably inflate the Type I error. [Pg.429]

Z-tests are valid for large sample sizes, but are typically biased upwards, resulting in inflated Z-scores and an inflated Type I error rate, because they understate the variability of the estimate of β. Note that the covariance matrix C [Eq. (6.37)] uses estimates of G and R in its calculation and that no correction in Z is made for the uncertainty in G and R. Rather than adjust for the inflation, most software packages account for the bias by comparing Z to a t-distribution, which has wider tails than a Z-distribution, and adjust the degrees of freedom accordingly. [Pg.189]
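A toy one-sample analogue of this correction can be simulated (the mixed-model Wald test itself is more involved; the constants below are the two-sided 5% critical values of N(0, 1) and of t with 4 degrees of freedom, matching n = 5):

```python
import math
import random
from statistics import mean, stdev

Z_CRIT = 1.959964  # two-sided 5% critical value of N(0, 1)
T_CRIT = 2.776445  # two-sided 5% critical value of t with 4 df (n = 5)

def rejection_rate(crit, n=5, n_sims=20000, seed=7):
    """Fraction of null datasets (mu = 0) where |xbar / (s / sqrt(n))|
    exceeds crit; with sigma estimated from the data, using the Z
    critical value rejects too often at small n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(mean(x) / (stdev(x) / math.sqrt(n))) > crit:
            hits += 1
    return hits / n_sims

print(rejection_rate(Z_CRIT))  # ~0.12: well above the nominal 0.05
print(rejection_rate(T_CRIT))  # ~0.05: the t reference restores the level
```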

There is also the issue of Type I error rate, which is the rate at which a covariate is deemed statistically important when in fact it is not. Hypothesis testing a large number of models is referred to as multiplicity and results in an inflated Type I error rate. Because a large number of models are tested at some level of significance, usually p < 0.05, about 5% of those models will be selected as being an improvement over the comparator model based on chance alone. [Pg.237]
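The 5% figure translates directly into an expected count of spurious selections; a minimal sketch (function names illustrative, and the at-least-one probability assumes independent tests, which is only an approximation for nested model searches):

```python
def expected_false_positives(n_models, alpha=0.05):
    """Expected number of null covariate models flagged as significant
    when each of n_models is tested at level alpha."""
    return n_models * alpha

def prob_at_least_one(n_models, alpha=0.05):
    """Chance that at least one null model is flagged, assuming
    independent tests."""
    return 1 - (1 - alpha) ** n_models

print(expected_false_positives(100))     # 5.0 spurious "improvements"
print(round(prob_at_least_one(100), 3))  # 0.994: a false hit is near-certain
```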

Laboratory tests may also show a high degree of correlation amongst each other. For example, aspartate aminotransferase (AST) is correlated with alanine aminotransferase (ALT) with a correlation coefficient of about 0.6, and total protein is correlated with albumin, also with a correlation coefficient of about 0.6. Caution needs to be exercised when two or more correlated laboratory values enter the covariate model simultaneously because of the possible collinearity that may occur (Bonate, 1999). As in the linear regression case, inclusion of correlated covariates may result in an unstable model leading to inflated standard errors and deflated Type I error rate. [Pg.274]
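For two correlated covariates, the standard-error inflation can be quantified with the variance inflation factor, VIF = 1/(1 − r²), a standard linear-regression result; a sketch using the r ≈ 0.6 correlation quoted above:

```python
import math

def variance_inflation_factor(r):
    """VIF for either of two covariates with pairwise correlation r:
    the factor by which the variance of its coefficient estimate grows
    relative to the uncorrelated case."""
    return 1.0 / (1.0 - r * r)

vif = variance_inflation_factor(0.6)  # e.g. AST entered alongside ALT
print(vif, math.sqrt(vif))  # 1.5625 1.25: standard errors inflated by 25%
```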

One factor that leads to low power or inflated Type II error risk is an inadequate sample size. As a convention, Type I error rates of 0.05 and implicit Type II error rates of 0.20 (power of 0.80) are adopted. However, in practice in many areas of investigation, one rarely has adequate power (i.e., power greater than or equal to 0.80) and, therefore, the Type II error rate is much higher (Cohen, 1992). [Pg.63]
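The conventional α = 0.05 / power = 0.80 pairing fixes the sample size once an effect size is chosen; a normal-approximation sketch for a two-group comparison of means (standardized effect size d in Cohen's sense; the function name is illustrative):

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample z-test on
    means with standardized effect size d (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for power = 0.80
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.5))  # 63: Cohen's "medium" effect
print(n_per_group(0.2))  # 393: a "small" effect needs far more subjects
```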

Inflation of Type I and Type II error rates, biased parameter estimates, and the degradation of the performance of confidence intervals are possible consequences of improperly addressing the issue of missingness in data analysis. Because a loss of... [Pg.259]

