Regression with indicator variables

Examination of the log values showed that the replacement of oxygen by sulphur did not produce the expected increase in lipophilicity, and it was found that a second indicator variable, S, to show the presence or absence of sulphur, could be added to the equation to give [Pg.129]

An interesting technique which dates from the early days of modern QSAR, known as the Free-Wilson method (Free and Wilson 1964), represents an extreme case of the use of indicator variables, since regression equations are generated which contain no physicochemical parameters. This technique relies on the following assumptions. [Pg.132]

There is a constant contribution to activity from the parent structure. [Pg.132]

Substituents on the parent make a constant contribution (positive or negative) to activity and this is additive. [Pg.132]

There are no interaction effects between substituents, nor between substituents and the parent. [Pg.132]
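Taken together, the three assumptions above amount to a purely additive model. A minimal sketch in LaTeX, using symbols that are not in the excerpt and are chosen here only for illustration ($A_i$ for the activity of compound $i$, $\mu$ for the parent contribution, $a_j$ for the contribution of substituent $j$, and $X_{ij}$ for the indicator variable):

```latex
A_i = \mu + \sum_{j} a_j X_{ij},
\qquad
X_{ij} =
\begin{cases}
1 & \text{if substituent } j \text{ is present in compound } i,\\
0 & \text{otherwise.}
\end{cases}
```

The absence of any product terms such as $X_{ij} X_{ik}$ reflects the third assumption, that substituents do not interact with each other or with the parent.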

Polynomial regression with indicator variables is another recommended statistical method for analysis of fish-mercury data. This procedure, described by Tremblay et al. (1998), allows rigorous statistical comparison of mercury-to-length relations among years and is considered superior to simple linear regression and analysis of covariance for analysis of data on mercury-length relations in fish. [Pg.105]
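The general idea can be sketched in Python. This is only an illustration under stated assumptions, not the procedure of Tremblay et al.: it uses a quadratic in length, two years, made-up noise-free data, and hypothetical names (`design_matrix`, `year_is_b`, `true_beta`). The indicator D is 0 for the first year and 1 for the second, so the coefficients on D, D·L and D·L² measure how the mercury-to-length curve differs between the years.

```python
import numpy as np

def design_matrix(length, year_is_b):
    """Columns: 1, L, L^2, D, D*L, D*L^2, with D = 1 for year B and 0 for year A."""
    L = np.asarray(length, dtype=float)
    D = np.asarray(year_is_b, dtype=float)
    return np.column_stack([np.ones_like(L), L, L**2, D, D * L, D * L**2])

# Hypothetical, noise-free illustration data (not taken from the source).
# Lengths are expressed in units of 100 mm to keep the columns well scaled.
length = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 2.0, 3.0, 4.0, 5.0, 6.0])
year_is_b = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Assumed "true" coefficients used only to generate the example response.
true_beta = np.array([0.10, 0.05, 0.02, 0.20, -0.03, 0.01])

X = design_matrix(length, year_is_b)
mercury = X @ true_beta

# One least-squares fit covers both years at once.
beta, *_ = np.linalg.lstsq(X, mercury, rcond=None)

# The last three coefficients are the year-B offsets from the year-A curve.
year_difference = beta[3:]
```

With real, noisy data the year-to-year comparison would rest on a significance test of the indicator terms (for example an F-test against the model without them), which this sketch leaves out.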

Tremblay G, Legendre P, Doyon J-F, Verdon R, Schetagne R. 1998. The use of polynomial regression analysis with indicator variables for interpretation of mercury in fish data. Biogeochemistry 40:189-201. [Pg.121]

Identification of Analysis of Covariance Model. A general procedure, based on regression analysis, to identify the analysis-of-covariance model that applies to a given set of assay results to determine the shelf life is introduced here. We call this procedure the regression model with indicator variables for testing poolability of... [Pg.618]

Solution. Based on these data, we can extract the following information to build the regression model with indicator variables: I = 5 batches, J = 2 packages, r = 4, s = 1, and n = 6 sampling times for all batches (0, 3, 6, 9, 12, and 18 months). The indicator variables are shown in Table 25 and the indicator variables model for this... [Pg.624]
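A design matrix of this kind can be sketched in Python. Only the counts (5 batches, 2 packages, six sampling times) come from the excerpt. The coding itself is an assumption, since Table 25 and the model equation are not reproduced here: one fewer indicator than levels for each factor (four for batches, one for packages), with the first batch and first package as reference levels, and each indicator also multiplied by time so that slopes may differ.

```python
import itertools
import numpy as np

n_batches, n_packages = 5, 2
times = np.array([0.0, 3.0, 6.0, 9.0, 12.0, 18.0])  # months

rows = []
for batch, package, t in itertools.product(range(n_batches), range(n_packages), times):
    # Reference coding (assumed): batch 0 and package 0 get all-zero indicators.
    batch_ind = [1.0 if batch == k else 0.0 for k in range(1, n_batches)]
    package_ind = [1.0 if package == k else 0.0 for k in range(1, n_packages)]
    indicators = batch_ind + package_ind
    # Columns: intercept, time, indicators (intercept shifts),
    # time * indicators (slope shifts).
    rows.append([1.0, t] + indicators + [t * d for d in indicators])

X = np.array(rows)
n_indicators = (n_batches - 1) + (n_packages - 1)
```

Under this coding, a common slope across batches and packages corresponds to all time-by-indicator coefficients being zero, and a common intercept to all indicator coefficients being zero; testing those groups of coefficients is one way to frame the poolability question.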

Free-Wilson analysis [31,32] is much easier to apply. Biological activity values are correlated with indicator variables, which, for each position of substitution and every substituent, indicate the presence (value 1) or absence (value 0) of the corresponding substituent (Table 2). If there is more than one substituent in a certain position or if symmetrical positions (e.g., meta,meta'-disubstituted compounds) are condensed into one variable, numbers of two or higher are used instead of one. Regression analysis leads to Eq. (17) [30-32] ... [Pg.543]

Aqueous solubility is selected to demonstrate the E-state application in QSPR studies. Huuskonen et al. modeled the aqueous solubility of 734 diverse organic compounds with multiple linear regression (MLR) and artificial neural network (ANN) approaches [27]. The set of structural descriptors comprised 31 E-state atomic indices, and three indicator variables for pyridine, aliphatic hydrocarbons and aromatic hydrocarbons, respectively. The dataset of 734 chemicals was divided into a training set (n=675), a validation set (n=38) and a test set (n=21). A comparison of the MLR results (training, r =0.94, s=0.58; validation, r =0.84, s=0.67; test, r =0.80, s=0.87) and the ANN results (training, r =0.96, s=0.51; validation, r =0.85, s=0.62; test, r =0.84, s=0.75) indicates a small improvement for the neural network model with five hidden neurons. These QSPR models may be used for a fast and reliable computation of the aqueous solubility for diverse organic compounds. [Pg.93]

Table 37.3 shows the complete table of eight indicator variables for 10 triply substituted tetracyclines [31] that have been tested for bacteriostatic activity (1/Z), which is defined here as the ratio of the number of colonies grown with a substituted and with the unsubstituted tetracycline. In this application we have three substitution positions, labelled U, V and W. The number of substituents at the three sites equals 2, 3 and 3, respectively. Arbitrarily, we chose the compound with substituents H, NO2 and NO2 at the sites U, V and W as the reference compound. This leads to a reduction of the number of indicator variables from eight to five, as shown in Table 37.4. The solution of the Free-Wilson model can be obtained directly by means of multiple regression ...
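The step from eight indicator variables to five can be illustrated with a short Python sketch. Only the substituent counts per site (2, 3 and 3) are taken from the text. The labels `u1`, `v1`, and so on are hypothetical placeholders, the choice of reference substituents is arbitrary, and all 18 combinations are enumerated for simplicity rather than the 10 compounds of Table 37.3.

```python
import itertools
import numpy as np

# Hypothetical substituent labels; only the counts per site come from the text.
sites = {"U": ["u1", "u2"], "V": ["v1", "v2", "v3"], "W": ["w1", "w2", "w3"]}
# Assumed reference compound: the first substituent at each site.
reference = {"U": "u1", "V": "v1", "W": "w1"}

def indicator_row(compound, drop_reference):
    """Return the 0/1 indicator variables for one compound."""
    row = []
    for site, options in sites.items():
        for sub in options:
            if drop_reference and sub == reference[site]:
                continue  # the reference substituent gets no column
            row.append(1.0 if compound[site] == sub else 0.0)
    return row

compounds = [dict(zip(sites, combo)) for combo in itertools.product(*sites.values())]

full = np.array([indicator_row(c, drop_reference=False) for c in compounds])
reduced = np.array([indicator_row(c, drop_reference=True) for c in compounds])

# Add the intercept column, which carries the reference compound's activity.
X_full = np.column_stack([np.ones(len(compounds)), full])
X_reduced = np.column_stack([np.ones(len(compounds)), reduced])

rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)
```

In the full coding the indicators at each site sum to one, so together with an intercept the columns are linearly dependent. Dropping one reference substituent per site removes that dependence, which is why multiple regression can then be applied directly.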

At this point, a considerable amount of theory on Hansch analysis has been presented with almost no examples of practice. The next three Case Studies will hopefully solidify ideas on Hansch analysis that have already been discussed. Each Case Study introduces a different idea. The first is an example of a very simple Hansch equation with a small data set. The second demonstrates the use of squared parameters in Hansch equations. The third and final Case Study shows how indicator variables are used in QSAR studies. If you are unfamiliar with performing linear regressions, be sure to read Appendix B on performing a regression analysis with the LINEST function in almost any common spreadsheet software. A section in the appendix describes in great detail how to derive Equations 12.20 through 12.22 in the first Case Study. [Pg.307]

I = 21 y = 0.394 r = 0.9476 F = 49.8 where log P is the hydrophobicity, bondrefr is the molecular refractivity, delta is the submolecular polarity parameter, and ind is an indicator variable (0 for heterocyclics and 1 for benzene derivatives). Calculations indicated that PBD-coated alumina behaves as an RP stationary phase, the bulkiness and the polarity of the solute significantly influencing the retention. The separation efficiency of PBD-coated alumina was compared with those of other stationary phases for the analysis of Catharanthus alkaloids. It was established that the pH of the mobile phase, the concentration and type of the organic modifier, and the presence of salt simultaneously influence the retention. In this special case, the efficiency of PBD-coated alumina was inferior to that of ODS. The retention characteristics of polyethylene-coated alumina (PE-Alu) have been studied in detail using various nonionic surfactants as model compounds. It was found that PE-Alu behaves as an RP stationary phase and separates the surfactants according to the character of the hydrophobic moiety. The relationship between the physicochemical descriptors of 25 aromatic solutes and their retention on PE-coated silica (PE-Si) and PE-Alu was elucidated by stepwise regression analysis. [Pg.121]

The most important indices are listed below. The quantity df refers to the degrees of freedom of the error, i.e. to n - p, where n is the number of objects (samples), p the number of model parameters (for example, n - p - 1 for a regression model with p variables and the intercept), dfu and dfi refer to the degrees of freedom of the model and the total degrees of freedom, respectively. [Pg.368]

Statistical methods. Certainly one of the most important considerations in QSAR is the statistical analysis of the correlation of the observed biological activity with structural parameters - either the extrathermodynamic (Hansch) or the indicator variables (Free-Wilson). The coefficients of the structural parameters that establish the correlation with the biological activity can be obtained by a regression analysis. Since the models are constructed in terms of multiple additive contributions, the method of solution is also called multiple linear regression analysis. This method is based on three requirements (223): i) the independent variables (structural parameters) are fixed variates and the dependent variable (biological activity) is randomly produced, ii) the dependent variable is normally and independently distributed for any set of independent variables, and iii) the variance of the dependent variable must be the same for any set of independent variables. [Pg.71]

Let us compute the same problem using only one regression equation with an indicator variable. Let... [Pg.363]
