Big Chemical Encyclopedia


The Significance and Validity of QSAR Regression Equations

In general, a regression equation can be accepted in QSAR studies if the correlation coefficient r is around or better than 0.9 for in vitro data and 0.8 for whole animal data (as already discussed, its value depends not only on the quality of fit but also on the overall variance of the biological data; compare eqs. 124-126, chapter 5.1). [Pg.95]
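As a rough illustration of the r criterion, the following minimal sketch computes the correlation coefficient for a one-variable equation and compares it with the quoted thresholds. The function names, the dictionary keys, and the data are assumptions made for this example only. Because the text says "around or better than", the hard cutoff used here is a simplification.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Rule-of-thumb thresholds quoted in the text (key names are hypothetical).
R_THRESHOLD = {"in_vitro": 0.9, "whole_animal": 0.8}

def meets_r_rule(x, y, data_kind):
    """True if |r| reaches the rule-of-thumb threshold for this kind of data.

    For a one-variable linear regression, the correlation coefficient of the
    equation equals |Pearson r| between descriptor and activity.
    """
    return abs(pearson_r(x, y)) >= R_THRESHOLD[data_kind]

# Invented data for illustration: a lipophilicity descriptor and log 1/C values.
log_p = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
log_inv_c = [3.1, 3.4, 4.0, 4.3, 4.9, 5.2]

print(round(pearson_r(log_p, log_inv_c), 3))
print(meets_r_rule(log_p, log_inv_c, "in_vitro"))
```

A high r alone does not settle the matter, since its value also depends on the overall variance of the biological data.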

In addition, the biological data should cover a range of at least one, better two or even more, logarithmic units, and they should be well distributed over the whole range (i.e., no clustering of activity values should occur, as discussed in chapter 2). The physicochemical parameters should also be spread over a certain range and be more or less evenly distributed. If a certain parameter has identical values for all but one or two objects, it must be considered a hidden indicator variable and should be replaced by such a term. In parabolic and especially in bilinear equations, the nonlinear parameter should cover a range of at least two logarithmic units in order to justify the presence of a nonlinear term. [Pg.96]
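Two of these data checks are easy to automate. The sketch below, with hypothetical function names and invented values, measures the span of the activity data in logarithmic units and flags a parameter whose value is identical for all but one or two objects. It does not test for clustering, which still needs a look at the distribution itself.

```python
from collections import Counter

def activity_span(log_activities):
    """Range covered by the biological data, in logarithmic units."""
    return max(log_activities) - min(log_activities)

def is_hidden_indicator(values, max_exceptions=2):
    """True if all but one or two objects share the same parameter value.

    Values are compared for exact equality, which assumes tabulated
    parameter values rather than computed floating-point results.
    """
    most_common_count = Counter(values).most_common(1)[0][1]
    exceptions = len(values) - most_common_count
    # Require a real majority so that tiny data sets are not flagged by accident.
    return 1 <= exceptions <= max_exceptions and most_common_count > max_exceptions

# Invented values for illustration only.
log_inv_c = [3.1, 3.4, 4.0, 4.3, 4.9, 5.2]
param_a = [0.0, 0.5, 0.7, 0.9, 1.1, 2.0]   # spread over a range
param_b = [0.0, 0.0, 0.0, 0.0, 0.0, 0.8]   # identical for all but one object

print(activity_span(log_inv_c) >= 1.0)
print(is_hidden_indicator(param_a))
print(is_hidden_indicator(param_b))
```

A parameter flagged in this way would be recoded as an indicator variable (0 or 1) rather than kept as a continuous term.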

Sometimes a certain parameter (e.g. a nonlinear term) is justified only by a single activity value. Because QSAR equations usually include only a small number of data points, the best way to deal with this problem is to present both regression equations: one including all variables, the other omitting this term together with the activity value that led to its consideration. [Pg.99]

Cross-validation, in which objects are eliminated and only the excluded objects are predicted from the resulting model to check its stability and validity (see chapter 5.3 for a detailed description), seems to be too crude an instrument to decide (automatically) on the validity of a QSAR regression equation. Cross-validation may be applied to relatively large data sets. But if only a few compounds are included in the QSAR equation, if a certain parameter is mainly based on a single data point, or if the compounds have been selected according to a rational design procedure, e.g. a D-optimal design (chapter 6), cross-validation may incorrectly indicate a lack of validity of the QSAR model. [Pg.99]

Outliers, i.e. data that cannot be explained by the model, constitute a serious problem in QSAR studies. Most often they are omitted from the data set without further comment, which is not good practice. A lot of information might be derived from careful inspection and consideration of the residuals of a multiple regression analysis (e.g. [574]) and of so-called outliers (e.g. [575, 576]). [Pg.99]
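One way to keep outliers visible instead of silently dropping them is to tabulate every residual and mark the conspicuous ones. The sketch below does this for a one-variable equation. The cutoff of two standard deviations is a common rule of thumb assumed for this example, not a criterion taken from the text, and the compound names and data are invented.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residual_report(x, y, names, cutoff=2.0):
    """Return (name, residual, flagged) for every object; nothing is removed.

    An object is flagged when |residual| exceeds cutoff * s, where s is the
    standard deviation of the fit with n - 2 degrees of freedom.
    """
    slope, intercept = fit_line(x, y)
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    s = math.sqrt(sum(r * r for r in residuals) / (len(y) - 2))
    return [(name, round(r, 3), abs(r) > cutoff * s)
            for name, r in zip(names, residuals)]

# Invented data: one compound deviates from an otherwise linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.0, 2.0, 3.0, 5.5, 5.0, 6.0, 7.0, 8.0]
names = ["cpd%d" % i for i in range(1, 9)]

report = residual_report(x, y, names)
for row in report:
    print(row)
```

The flagged compounds are candidates for closer inspection and for explicit discussion alongside the equation, not for automatic deletion.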



© 2024 chempedia.info