Regression proper application

Some problems related to the proper application of regression analysis and of other multivariate statistical methods in QSAR studies and concerning the validity of the obtained results have recently been reviewed [403, 408, 409] (compare chapter 4.1). [Pg.99]

The statistical analysis of data requires a proper design of experiments to prove or disprove a certain hypothesis which has been formulated in advance. From the viewpoint of a puritanical statistician most QSAR analyses are "forbidden", because they are retrospective studies and, in addition, many different hypotheses (i.e. combinations of independent variables) are tested sequentially. Indeed, many problems arise from the application of regression analysis in ill-conditioned data sets. Only in later stages of lead structure optimization are certain hypotheses, e.g. on the influence of more lipophilic, electronegative, polar, or bulky substituents in a certain position, systematically tested, now fulfilling the requirements for the proper application of statistical methods. [Pg.109]

From a theoretical point of view, the proper application of regression analysis requires the formulation of a working hypothesis, the design of experiments (i.e., compounds to be tested), the selection of a mathematical model, and the test of statistical significance of the obtained result. In QSAR studies, this is pure theory. Reality is different: QSAR studies are most often retrospective studies and in several cases many different variables are tested to find out whether some of them, alone or in combination, are able to describe the data. In principle, there are no objections against this method because QSAR equations should be used to derive new hypotheses and to design new experiments, based on these hypotheses. Then the requirements for the application of statistical methods are fulfilled. [Pg.2317]
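
The risk of testing many variables on one retrospective data set can be illustrated with a small simulation. What follows is a minimal sketch and not part of the cited text: it correlates a random "activity" with purely random descriptors and records the largest |r| found. The function names, the 15 compounds, the 30 descriptors, and the 200 repetitions are arbitrary assumptions chosen for illustration.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_chance_r(n_compounds, n_descriptors, rng):
    """Largest |r| between a random activity and purely random descriptors."""
    activity = [rng.gauss(0, 1) for _ in range(n_compounds)]
    best = 0.0
    for _ in range(n_descriptors):
        descriptor = [rng.gauss(0, 1) for _ in range(n_compounds)]
        best = max(best, abs(pearson_r(descriptor, activity)))
    return best


rng = random.Random(0)  # fixed seed so that the run is repeatable
# Average over 200 simulated data sets (all numbers here are assumptions).
one = statistics.fmean(best_chance_r(15, 1, rng) for _ in range(200))
many = statistics.fmean(best_chance_r(15, 30, rng) for _ in range(200))
print(f"mean best |r|, 1 descriptor tested:   {one:.2f}")
print(f"mean best |r|, 30 descriptors tested: {many:.2f}")
```

Every descriptor in this sketch is noise, so any correlation selected by screening is a chance correlation. The larger mean |r| for the screened set is the reason a retrospective fit is better read as a new hypothesis to be tested than as a confirmed result.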

There are two statistical assumptions made regarding the valid application of mathematical models used to describe data. The first assumption is that row and column effects are additive. The first assumption is met by the nature of the study design, since the regression is a series of X, Y pairs distributed through time. The second assumption is that residuals are independent, random variables, and that they are normally distributed about the mean. Based on the literature, the second assumption is typically ignored when researchers apply equations to describe data. Rather, the correlation coefficient (r) is typically used to determine goodness of fit. However, this approach is not valid for determining whether the function or model properly described the data. [Pg.880]
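
A short numerical example may show why r alone does not establish that a model describes the data. The sketch below is an illustration under stated assumptions, not the procedure of the cited work: it fits a straight line to data that are exactly quadratic (y = x², an invented data set) and then inspects the residuals with two simple diagnostics chosen here, the number of sign runs and the Durbin-Watson statistic (values near 2 indicate serially uncorrelated residuals).

```python
def fit_line(x, y):
    """Ordinary least-squares line; returns slope, intercept and r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / (sxx * syy) ** 0.5


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squares."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return num / sum(e ** 2 for e in residuals)


def sign_runs(residuals):
    """Number of runs of consecutive residuals with the same sign."""
    signs = [e > 0 for e in residuals]
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))


# Invented data: a curved relationship forced onto a straight-line model.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

slope, intercept, r = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
dw = durbin_watson(residuals)
runs = sign_runs(residuals)

print(f"r = {r:.3f}")                # r = 0.975
print(f"Durbin-Watson = {dw:.3f}")   # Durbin-Watson = 0.455
print(f"sign runs = {runs}")         # sign runs = 3
```

The correlation coefficient is high, yet the residuals are positive at both ends and negative in the middle. That pattern is systematic rather than random, so the straight-line model is wrong despite r = 0.975.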

A disadvantage is that multiple regression, by definition, only allows application of the CA concept; there is no possibility to compare the response with the IA concept. In addition, the researcher is limited to using 1 type of concentration-response curve for the complete data set through the choice of the link function. It may, however, be more appropriate to use different types of concentration-response curves for the different mixture components. Finally, deviations from CA can be properly tested for through the interaction parameters, but concentration-ratio- or concentration-level-dependent deviations from CA cannot be detected. [Pg.138]

Testing is necessary to determine whether the change works properly and has not compromised the system's functionality. The scope of testing should be based on the impact analysis. Where potential impact on other system functionality or other applications is identified, testing must be extended to include affected areas. This is sometimes referred to as regression testing. [Pg.83]

The application of universal calibration requires a primary column calibration with elution of narrow MWD standards. For SEC in tetrahydrofuran, polystyrene (PS) standards are generally used. Intrinsic viscosities of the standards are either known or calculated from the proper Mark-Houwink equation, so that the plot of log([η]_PS·M_PS) values vs. retention volumes V_r may be created. The universal calibration equation is obtained by polynomial regression, in the same way described for the calibration with narrow MWD standards. [Pg.1006]
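
The calibration steps described above can be sketched in a few lines. This is a minimal illustration under loudly stated assumptions, not the procedure of the cited work: the Mark-Houwink constants for PS in THF, the molar masses, and the retention volumes are placeholder values, not measurements, and the third-order polynomial and all function names are choices made here.

```python
import math


def polyfit(x, y, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... (normal equations)."""
    m = degree + 1
    A = [[sum(xi ** (j + k) for xi in x) for k in range(m)] for j in range(m)]
    b = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            f = A[row][col] / A[col][col]
            for k in range(col, m):
                A[row][k] -= f * A[col][k]
            b[row] -= f * b[col]
    c = [0.0] * m
    for row in reversed(range(m)):  # back substitution
        s = sum(A[row][k] * c[k] for k in range(row + 1, m))
        c[row] = (b[row] - s) / A[row][row]
    return c


# ASSUMED Mark-Houwink constants for PS in THF: [eta] = K * M**a (dL/g).
K_PS, A_PS = 1.14e-4, 0.716
# HYPOTHETICAL narrow standards and retention volumes (mL), not measured data.
molar_mass = [1e3, 5e3, 2e4, 1e5, 5e5, 2e6]
ret_volume = [19.0, 17.6, 16.4, 15.0, 13.6, 12.4]

# log([eta]_PS * M_PS) for each standard, from the Mark-Houwink equation.
log_eta_m = [math.log10(K_PS * m ** (1.0 + A_PS)) for m in molar_mass]

# Center V_r before fitting to keep the normal equations well conditioned.
v_mean = sum(ret_volume) / len(ret_volume)
coeffs = polyfit([v - v_mean for v in ret_volume], log_eta_m, 3)


def log_hydrodynamic_volume(v):
    """Universal calibration curve: log([eta]*M) at retention volume v."""
    return sum(c * (v - v_mean) ** k for k, c in enumerate(coeffs))


def molar_mass_at(v, k, a):
    """Molar mass of a polymer with Mark-Houwink constants k, a eluting at v."""
    return 10 ** ((log_hydrodynamic_volume(v) - math.log10(k)) / (1.0 + a))


# Round trip with the PS constants, then a HYPOTHETICAL second polymer.
print(f"PS at 15.0 mL:      M = {molar_mass_at(15.0, K_PS, A_PS):.3g}")
print(f"polymer X at 15 mL: M = {molar_mass_at(15.0, 2.0e-4, 0.65):.3g}")
```

The second call shows the purpose of the universal curve: a polymer with different Mark-Houwink constants eluting at the same retention volume is assigned a different molar mass, because the curve relates V_r to [η]·M and not to M directly.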

In PCR, a principal component analysis (PCA) is first made of the X matrix (properly transformed and scaled), giving as the result the score matrix T and the loading matrix P. Then in a second step a few of the first score vectors t_a are used as predictor variables in a multiple linear regression with Y as the response matrix. In the case that the few first components of PCA indeed contain most of the information of X related to Y, PCR indeed works as well as PLS. This is often the case in spectroscopic data, and here PCR is an often used alternative. In more complicated applications, however, such as QSAR and process modeling, the first few principal components of X rarely contain a sufficient part of the relevant information, and PLS works much better than PCR. ... [Pg.2019]
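
The two PCR steps described above can be sketched as follows. This is a minimal illustration, not the implementation of the cited work: the PCA is computed with the NIPALS iteration (a choice made here; a library SVD would normally be used), the X matrix is an invented example with two nearly collinear columns, and y is constructed as an exact linear combination of the X columns so that the behaviour is easy to check. All function names are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def autoscale(X):
    """Center each column to mean 0 and scale it to unit standard deviation."""
    n, p = len(X), len(X[0])
    means = [sum(row[c] for row in X) / n for c in range(p)]
    sds = [math.sqrt(sum((row[c] - means[c]) ** 2 for row in X) / (n - 1))
           for c in range(p)]
    return [[(row[c] - means[c]) / sds[c] for c in range(p)] for row in X]


def nipals_pca(X, n_components, tol=1e-18, max_iter=500):
    """Return score vectors and loading vectors, one pair per component."""
    E = [row[:] for row in X]  # residual matrix, deflated after each component
    p = len(E[0])
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the column of E with the largest sum of squares.
        start = max(range(p), key=lambda c: sum(row[c] ** 2 for row in E))
        t = [row[start] for row in E]
        for _ in range(max_iter):
            load = [dot([row[c] for row in E], t) for c in range(p)]
            norm = math.sqrt(dot(load, load))
            load = [v / norm for v in load]          # unit-length loading
            t_new = [dot(row, load) for row in E]    # updated scores
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * dot(t, t):
                break
        E = [[row[c] - ti * load[c] for c in range(p)] for row, ti in zip(E, t)]
        scores.append(t)
        loadings.append(load)
    return scores, loadings


def pcr_fit(X, y, n_components):
    """Fitted y from a regression on the first PCA scores of autoscaled X."""
    scores, _ = nipals_pca(autoscale(X), n_components)
    y_mean = sum(y) / len(y)
    y_c = [v - y_mean for v in y]
    # Scores are mutually orthogonal, so each coefficient is a projection.
    coefs = [dot(t, y_c) / dot(t, t) for t in scores]
    return [y_mean + sum(q * t[i] for q, t in zip(coefs, scores))
            for i in range(len(y))]


# INVENTED data: columns 1 and 2 are nearly collinear, column 3 is unrelated.
X = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 6.2, 0.9],
     [4.0, 7.9, 0.3], [5.0, 9.8, 0.8], [6.0, 12.1, 0.6]]
y = [1.0 + 0.5 * a + 0.25 * b - 2.0 * c for a, b, c in X]

errors = []
for k in (1, 2, 3):
    fitted = pcr_fit(X, y, k)
    errors.append(sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)))
    print(f"{k} component(s): residual sum of squares = {errors[-1]:.4f}")
```

With all three components the fit is exact, because the scores then span the full column space of X. With fewer components the residual depends on how much of the y-relevant variation the leading components carry, which is the limitation of PCR that the passage contrasts with PLS.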

Big Chemical Encyclopedia
