Prevalence in Classified Datasets

The Cooper statistics do not take into account the prevalence of each class within the training set, and this introduces a bias into the model's ability to predict one class or the other. For example, if 75% of the training set is active, actives outnumber inactives three to one, so the null (chance) model is three times as likely to predict a compound as active as to predict it as inactive. Cohen defined the kappa index to overcome the problem of prevalence when assessing the significance of a classification. [Pg.255]
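
Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from the marginal totals: kappa = (p_o - p_e)/(1 - p_e). The sketch below is a minimal illustration, not the source's own implementation; the function name `cohen_kappa` is hypothetical, and it assumes a 2 × 2 table with a = actives predicted active, b = actives predicted not-active, c = not-actives predicted active and d = not-actives predicted not-active. It shows why prevalence matters: with 75% actives, a model that calls everything active scores 75% accuracy but a kappa of zero.

```python
def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2x2 table of predictions (hypothetical helper).

    a: actives predicted active      b: actives predicted not-active
    c: not-actives predicted active  d: not-actives predicted not-active
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty table")
    # observed agreement: fraction of all compounds classified correctly
    p_observed = (a + d) / n
    # chance agreement from the marginal totals (true class x predicted class)
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# 75 actives and 25 not-actives; a trivial model that calls everything active
print(cohen_kappa(75, 0, 25, 0))            # 0.0, despite 75% accuracy
# a model that misclassifies 5 compounds in each class
print(round(cohen_kappa(70, 5, 5, 20), 3))  # 0.733
```

Because p_e is built from the class totals, a model only earns a positive kappa for agreement beyond what the class balance alone would produce.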

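The statistics below refer to a 2 × 2 table of predictions in which a = actives predicted active, b = actives predicted not-active, c = not-actives predicted active and d = not-actives predicted not-active.

Sensitivity (true positive rate) = a/(a + b): fraction of actives correctly predicted as active [Pg.256]

Positive predictivity = a/(a + c): fraction of chemicals correctly assigned as active out of all predicted actives [Pg.256]

Negative predictivity = d/(b + d): fraction of compounds correctly assigned as not-active out of all predicted not-actives [Pg.256]

False positive (over-classification) rate = c/(c + d) = 1 - specificity: fraction of not-actives falsely predicted as active [Pg.256]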
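
The ratios above can be computed directly from the four cell counts. This is a minimal sketch under the same a–d convention (a = actives predicted active, b = actives predicted not-active, c = not-actives predicted active, d = not-actives predicted not-active); the names `cooper_statistics` and `_ratio` are hypothetical, and returning NaN for an empty row or column is an illustrative choice rather than a prescribed one.

```python
import math


def _ratio(num: int, den: int) -> float:
    # undefined ratios (empty row or column) are reported as NaN instead of raising
    return num / den if den else math.nan


def cooper_statistics(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """Cooper statistics from a 2x2 table of predictions (hypothetical helper).

    a: actives predicted active      b: actives predicted not-active
    c: not-actives predicted active  d: not-actives predicted not-active
    """
    return {
        "sensitivity": _ratio(a, a + b),
        "specificity": _ratio(d, c + d),
        "positive_predictivity": _ratio(a, a + c),
        "negative_predictivity": _ratio(d, b + d),
        "false_positive_rate": _ratio(c, c + d),  # equals 1 - specificity
    }


# 75% actives: 75 actives, 25 not-actives
high_prev = cooper_statistics(70, 5, 5, 20)
for name, value in high_prev.items():
    print(f"{name}: {value:.3f}")

# same sensitivity (14/15) and specificity (68/85), but only 15% actives
low_prev = cooper_statistics(14, 1, 17, 68)
print(f"{low_prev['positive_predictivity']:.3f}")  # 0.452
```

Sensitivity and specificity are identical in the two tables, yet positive predictivity falls from 0.933 to 0.452 when actives become rare, which is the prevalence effect the Cooper statistics on their own do not correct for.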
