Regression indicator variables

Aqueous solubility is selected to demonstrate the application of E-state indices in QSPR studies. Huuskonen et al. modeled the aqueous solubility of 734 diverse organic compounds with multiple linear regression (MLR) and artificial neural network (ANN) approaches [27]. The set of structural descriptors comprised 31 E-state atomic indices and three indicator variables for pyridine, aliphatic hydrocarbons and aromatic hydrocarbons, respectively. The dataset of 734 chemicals was divided into a training set (n = 675), a validation set (n = 38) and a test set (n = 21). A comparison of the MLR results (training, r² = 0.94, s = 0.58; validation, r² = 0.84, s = 0.67; test, r² = 0.80, s = 0.87) and the ANN results (training, r² = 0.96, s = 0.51; validation, r² = 0.85, s = 0.62; test, r² = 0.84, s = 0.75) indicates a small improvement for the neural network model with five hidden neurons. These QSPR models may be used for fast and reliable computation of the aqueous solubility of diverse organic compounds. [Pg.93]
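The r² and s values quoted for the MLR model can be reproduced for any least-squares fit in which indicator variables simply appear as extra 0/1 columns next to the continuous descriptors. The following is a minimal sketch under that assumption; it uses synthetic data, not the Huuskonen dataset, and the column names in the demo ("estate", "aromatic", "aliphatic") are hypothetical labels.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares MLR with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def fit_statistics(X, y, coef):
    """Return (r^2, s) for an MLR model evaluated on the data it was fitted to.

    s is the standard error of estimate, sqrt(SSE / (n - k - 1)),
    where k is the number of descriptors (the intercept is not counted).
    """
    y_hat = coef[0] + X @ coef[1:]
    resid = y - y_hat
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    n, k = X.shape
    return 1.0 - sse / sst, float(np.sqrt(sse / (n - k - 1)))


if __name__ == "__main__":
    # Synthetic data (hypothetical): one continuous descriptor plus two
    # 0/1 indicator columns standing in for structural-class flags.
    rng = np.random.default_rng(0)
    n = 60
    estate = rng.normal(size=n)
    aromatic = (rng.random(n) < 0.3).astype(float)
    aliphatic = (1.0 - aromatic) * (rng.random(n) < 0.3)
    X = np.column_stack([estate, aromatic, aliphatic])
    y = -2.0 + 0.8 * estate - 1.1 * aromatic + 0.4 * aliphatic + rng.normal(0, 0.3, n)
    coef = fit_mlr(X, y)
    r2, s = fit_statistics(X, y, coef)
    print(coef.round(2), round(r2, 3), round(s, 3))
```

For validation and test sets, statistics are usually computed from predictions on held-out compounds; the exact definitions used in the cited study are not stated in this excerpt.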

Polynomial regression with indicator variables is another recommended statistical method for analyzing fish-mercury data. This procedure, described by Tremblay et al. (1998), allows rigorous statistical comparison of mercury-to-length relations among years and is considered superior to simple linear regression and analysis of covariance for analyzing mercury-length relations in fish. [Pg.105]
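One way to set up such a model is to regress mercury concentration on powers of fish length and to give every year after a reference year an indicator variable plus indicator-by-length products, so each year can have its own intercept and curve shape. The sketch below assumes a quadratic in length and uses the earliest year as reference; it illustrates the general idea and is not necessarily Tremblay et al.'s exact parameterization.

```python
import numpy as np


def mercury_design(length, year, degree=2):
    """Design matrix for a polynomial mercury-length model with year indicators.

    The earliest year is the reference (assumption). Each later year gets an
    indicator column D plus D * length**p for p = 1..degree, so every year
    can have its own intercept and its own curve shape.
    """
    length = np.asarray(length, dtype=float)
    year = np.asarray(year)
    powers = [length ** p for p in range(1, degree + 1)]
    columns = [np.ones_like(length)] + powers
    names = ["const"] + [f"L^{p}" for p in range(1, degree + 1)]
    for yr in sorted(set(year.tolist()))[1:]:
        d = (year == yr).astype(float)  # 1 for fish sampled in this year
        columns.append(d)
        names.append(f"D{yr}")
        for p, lp in enumerate(powers, start=1):
            columns.append(d * lp)  # year-specific change in the L^p term
            names.append(f"D{yr}*L^{p}")
    return np.column_stack(columns), names
```

After fitting with `np.linalg.lstsq`, the coefficients of the `D…` columns are differences from the reference year; if all of them are negligible, the years share a common mercury-length curve.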

Tremblay G, Legendre P, Doyon J-F, Verdon R, Schetagne R. 1998. The use of polynomial regression analysis with indicator variables for interpretation of mercury in fish data. Biogeochemistry 40:189-201. [Pg.121]

Table 37.3 shows the complete table of eight indicator variables for 10 triply substituted tetracyclines [31] that have been tested for bacteriostatic activity (1/Z), which is defined here as the ratio of the number of colonies grown with a substituted tetracycline to the number grown with the unsubstituted tetracycline. In this application there are three substitution positions, labelled U, V and W, carrying 2, 3 and 3 different substituents, respectively. We arbitrarily chose the compound with substituents H, NO2 and NO2 at sites U, V and W as the reference compound. Because the reference substituent at each site needs no indicator variable of its own, the number of indicator variables drops from eight to five, as shown in Table 37.4. The solution of the Free-Wilson model can be obtained directly by means of multiple regression ...
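The reduction from eight to five indicator variables can be sketched as follows: each site's reference substituent gets no column, the intercept then estimates the activity of the reference compound, and each coefficient is a substituent contribution relative to the reference at that site. In this sketch, H, NO2 and NH2 come from the text, while "A" to "D" are hypothetical placeholder substituents, and the activities in the test are synthetic.

```python
import numpy as np


def free_wilson_matrix(compounds, reference):
    """Free-Wilson indicator matrix with one reference substituent per site.

    compounds: list of dicts, site -> substituent.
    reference: dict, site -> substituent of the reference compound. These
    substituents get no column, which removes the linear dependence between
    the intercept and each site's full set of indicators.
    """
    columns = []
    for site, ref in reference.items():
        for sub in sorted({c[site] for c in compounds}):
            if sub != ref:
                columns.append((site, sub))
    X = np.array([[1.0 if c[site] == sub else 0.0 for site, sub in columns]
                  for c in compounds])
    return X, columns


def free_wilson_fit(compounds, reference, activity):
    """Return (mu, {(site, substituent): contribution}) by least squares."""
    X, columns = free_wilson_matrix(compounds, reference)
    A = np.column_stack([np.ones(len(compounds)), X])
    coef, *_ = np.linalg.lstsq(A, np.asarray(activity, dtype=float), rcond=None)
    return float(coef[0]), dict(zip(columns, coef[1:].tolist()))
```

With 2, 3 and 3 substituents at U, V and W, `free_wilson_matrix` returns 5 columns; together with the intercept the model has 6 parameters.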

In the above expression, the indicator variable I(X) takes the value 0 or 1, depending on the absence or presence of substituent X in a particular compound. The overall result of the regression is not significant at the 0.05 probability level. This may be due to the unfavorable ratio of the number of compounds to the number of parameters in the regression equation (10 to 6). Only the indicator variable for substituent NH2 at position W of the tetracycline molecule reaches significance (p = 0.02). This can be confirmed by looking at Table 37.4... [Pg.394]
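The effect of the 10-to-6 ratio can be made concrete with the overall F statistic, F = (R²/(p − 1)) / ((1 − R²)/(n − p)), where p counts the intercept. With n = 10 and p = 6 the degrees of freedom are (5, 4), and the tabulated critical value F0.05(5, 4) is about 6.26, so R² would have to exceed roughly 0.89 before the regression as a whole became significant. A small sketch of this arithmetic follows; it is not taken from the source.

```python
def overall_f(r2, n_compounds, n_params):
    """Overall regression F statistic from R^2.

    n_params includes the intercept: df_model = n_params - 1,
    df_resid = n_compounds - n_params.
    """
    df_model = n_params - 1
    df_resid = n_compounds - n_params
    if df_resid <= 0:
        raise ValueError("need more compounds than parameters")
    f = (r2 / df_model) / ((1.0 - r2) / df_resid)
    return f, df_model, df_resid


def r2_needed(f_crit, df_model, df_resid):
    """Smallest R^2 whose overall F reaches f_crit (inverse of overall_f)."""
    return f_crit * df_model / (f_crit * df_model + df_resid)


if __name__ == "__main__":
    # 10 compounds, 5 indicator variables + intercept; 6.26 is the tabulated F0.05(5, 4).
    print(overall_f(0.80, 10, 6), r2_needed(6.26, 5, 4))
```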

Identification of Analysis-of-Covariance Model
A general procedure, based on regression analysis, is introduced here for identifying the analysis-of-covariance model that applies to a given set of assay results used to determine shelf life. We call this procedure the regression model with indicator variables for testing poolability of... [Pg.618]
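A common way to test poolability with indicator variables is an extra-sum-of-squares F test: fit a full model in which each batch has its own intercept and slope (indicator and indicator × time columns) and a reduced model with one common line, then compare residual sums of squares. The sketch below assumes this standard nested-model test and handles batches only; it is not the full procedure of the source, which is truncated here.

```python
import numpy as np


def residual_ss(A, y):
    """Residual sum of squares and residual df of a least-squares fit of y on A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid), len(y) - int(np.linalg.matrix_rank(A))


def poolability_f(time, batch, y):
    """Extra-sum-of-squares F comparing one common line with separate batch lines.

    Full model: intercept and slope differ by batch (indicator and
    indicator * time columns, first batch as reference).
    Reduced model: one intercept and one slope for all batches.
    """
    time = np.asarray(time, dtype=float)
    batch = np.asarray(batch)
    y = np.asarray(y, dtype=float)
    reduced = np.column_stack([np.ones_like(time), time])
    extra = []
    for b in sorted(set(batch.tolist()))[1:]:
        d = (batch == b).astype(float)
        extra += [d, d * time]
    full = np.column_stack([reduced] + extra)
    sse_r, df_r = residual_ss(reduced, y)
    sse_f, df_f = residual_ss(full, y)
    f = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
    return f, df_r - df_f, df_f
```

A large F (compared with the tabulated value for the returned degrees of freedom) argues against pooling the batches into one regression line.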

Solution. Based on these data, we can extract the following information to build the regression model with indicator variables: I = 5 batches, J = 2 packages, r = 4, s = 1, and n = 6 sampling times for all batches (0, 3, 6, 9, 12 and 18 months). The indicator variables are shown in Table 25, and the indicator variables model for this... [Pg.624]
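Since Table 25 is not reproduced in this excerpt, the following sketch assumes one plausible full model: a common intercept and slope, r = I − 1 = 4 batch indicators and s = J − 1 = 1 package indicator, each also multiplied by time so that slopes may differ. It further assumes every batch-package combination was sampled at all six times, giving 5 × 2 × 6 = 60 rows. Batch 1 and package 1 serve as hypothetical reference levels.

```python
import numpy as np

SAMPLING_MONTHS = [0, 3, 6, 9, 12, 18]


def stability_design(n_batches=5, n_packages=2, months=SAMPLING_MONTHS):
    """Indicator-variable design for assay versus time with batch and package effects.

    Assumed model (hypothetical; Table 25 is not reproduced here):
      y = b0 + b1*t + sum_i a_i*B_i + sum_j g_j*P_j
            + sum_i c_i*B_i*t + sum_j d_j*P_j*t
    with r = n_batches - 1 batch indicators B_i and s = n_packages - 1
    package indicators P_j (batch 1 and package 1 are the references).
    """
    rows, keys = [], []
    for b in range(n_batches):
        for p in range(n_packages):
            for t in months:
                B = [1.0 if b == i else 0.0 for i in range(1, n_batches)]
                P = [1.0 if p == j else 0.0 for j in range(1, n_packages)]
                rows.append([1.0, float(t)] + B + P
                            + [x * t for x in B] + [x * t for x in P])
                keys.append((b + 1, p + 1, t))  # (batch, package, month)
    return np.array(rows), keys
```

The design has 2 + 4 + 1 + 4 + 1 = 12 columns; dropping groups of columns gives the reduced models whose fits are compared when testing poolability.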

At this point, a considerable amount of theory on Hansch analysis has been presented with almost no examples of practice. The next three Case Studies will hopefully solidify ideas on Hansch analysis that have already been discussed. Each Case Study introduces a different idea. The first is an example of a very simple Hansch equation with a small data set. The second demonstrates the use of squared parameters in Hansch equations. The third and final Case Study shows how indicator variables are used in QSAR studies. If you are unfamiliar with performing linear regressions, be sure to read Appendix B on performing a regression analysis with the LINEST function in almost any common spreadsheet software. A section in the appendix describes in great detail how to derive Equations 12.20 through 12.22 in the first Case Study. [Pg.307]

QSAR: quantitative structure-activity relationship. A quantitative structure-biological activity model derived by regression analysis and containing as parameters physicochemical constants, indicator variables or theoretically calculated values. [Pg.225]

n = 21, s = 0.394, r = 0.9476, F = 49.8, where log P is the hydrophobicity, bondrefr is the molecular refractivity, delta is the submolecular polarity parameter and ind is an indicator variable (0 for heterocyclics and 1 for benzene derivatives). The calculations indicated that PBD-coated alumina behaves as an RP stationary phase, with the bulkiness and polarity of the solute significantly influencing retention. The separation efficiency of PBD-coated alumina was compared with those of other stationary phases for the analysis of Catharanthus alkaloids. It was established that the pH of the mobile phase, the concentration and type of the organic modifier, and the presence of salt simultaneously influence retention. In this special case, the efficiency of PBD-coated alumina was inferior to that of ODS. The retention characteristics of polyethylene-coated alumina (PE-Alu) have been studied in detail using various nonionic surfactants as model compounds. It was found that PE-Alu behaves as an RP stationary phase and separates the surfactants according to the character of their hydrophobic moiety. The relationship between the physicochemical descriptors of 25 aromatic solutes and their retention on PE-coated silica (PE-Si) and PE-Alu was elucidated by stepwise regression analysis. [Pg.121]
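The stepwise regression mentioned above can be sketched as forward selection: at each step, add the candidate descriptor that most reduces the residual sum of squares, and stop when its partial F falls below an F-to-enter threshold. The study's actual stepwise settings are not given, so the threshold of 4.0 is an arbitrary assumption, and the test data are synthetic; the names logP, bondrefr and ind are borrowed from the descriptor list only as labels.

```python
import numpy as np


def _sse(X, y, cols):
    """Residual sum of squares of y regressed on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_selection(X, y, names, f_to_enter=4.0):
    """Greedy forward stepwise selection with a partial F-to-enter rule.

    f_to_enter = 4.0 is an assumed default, not a value from the source.
    """
    n = len(y)
    selected, remaining = [], list(range(X.shape[1]))
    current = _sse(X, y, selected)
    while remaining:
        best = min(remaining, key=lambda j: _sse(X, y, selected + [j]))
        new = _sse(X, y, selected + [best])
        df_resid = n - len(selected) - 2  # n minus intercept minus all slopes
        if df_resid <= 0:
            break
        f = np.inf if new == 0 else (current - new) / (new / df_resid)
        if f < f_to_enter:
            break
        selected.append(best)
        remaining.remove(best)
        current = new
    return [names[j] for j in selected]
```

An indicator variable such as ind enters exactly like any other column; its coefficient shifts the intercept for benzene derivatives relative to heterocyclics.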

Statistical methods. Certainly one of the most important considerations in QSAR is the statistical analysis of the correlation of the observed biological activity with structural parameters, either the extrathermodynamic (Hansch) parameters or the indicator variables (Free-Wilson). The coefficients of the structural parameters that establish the correlation with the biological activity can be obtained by regression analysis. Since the models are constructed as sums of multiple additive contributions, the method of solution is also called multiple linear regression analysis. This method rests on three requirements (223): i) the independent variables (structural parameters) are fixed variates and the dependent variable (biological activity) is a random variable; ii) the dependent variable is normally and independently distributed for any set of independent variables; and iii) the variance of the dependent variable is the same for any set of independent variables. [Pg.71]
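In standard matrix notation (not taken from the source), the model and its least-squares solution can be written as below. Here y holds the n observed activities, X is the fixed n × (k + 1) matrix of structural parameters with a leading column of ones (requirement i), and requirements ii) and iii) correspond to independent, normally distributed errors with a common variance σ². The solution exists only when XᵀX is invertible, which is why Free-Wilson analyses drop one reference substituent per substitution site.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0},\, \sigma^{2}\mathbf{I}_{n}\right),
\qquad
\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^{\mathsf{T}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y}
```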

Free-Wilson analysis [31,32] is much easier to apply. Biological activity values are correlated with indicator variables which, for each position of substitution and every substituent, indicate the presence (value 1) or absence (value 0) of the corresponding substituent (Table 2). If there is more than one substituent at a certain position, or if symmetrical positions (e.g., in meta,meta′-disubstituted compounds) are condensed into one variable, values of two or higher are used instead of one. Regression analysis leads to Eq. (17) [30-32] ... [Pg.543]
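When symmetrical positions are condensed into one variable set, each entry becomes a count rather than a 0/1 flag. A minimal sketch follows, assuming that the reference substituent (here H) gets no column; the substituent names are illustrative only.

```python
from collections import Counter


def free_wilson_counts(substituents_per_compound, columns):
    """Free-Wilson rows in which condensed (e.g. symmetric) positions are counted.

    substituents_per_compound: for each compound, the substituents found on
    the positions condensed into one variable set (e.g. meta and meta').
    columns: substituents that get a variable; the reference (e.g. H) gets none.
    """
    rows = []
    for subs in substituents_per_compound:
        counts = Counter(subs)  # missing substituents count as 0
        rows.append([counts[s] for s in columns])
    return rows


if __name__ == "__main__":
    # meta,meta'-dichloro gives Cl = 2; a single meta-Cl gives Cl = 1.
    print(free_wilson_counts([["Cl", "Cl"], ["Cl", "H"], ["CH3", "H"]], ["Cl", "CH3"]))
```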

When performing an indicator variable regression, it is often useful to compare the y-intercepts of two separate regressions. This can be done using the six-step procedure. [Pg.356]
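The six-step procedure itself is not given in this excerpt. As a stand-in, the following sketch uses a standard t test on the difference between the intercepts of two separately fitted straight lines. It assumes equal residual variances, so the two residual variances are pooled with n1 + n2 − 4 degrees of freedom.

```python
import numpy as np


def line_fit(x, y):
    """Ordinary least-squares line; returns intercept, slope, SSE, n, mean(x), Sxx."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar = x.mean()
    sxx = float(((x - xbar) ** 2).sum())
    slope = float(((x - xbar) * (y - y.mean())).sum()) / sxx
    intercept = float(y.mean() - slope * xbar)
    resid = y - (intercept + slope * x)
    return intercept, slope, float(resid @ resid), len(x), float(xbar), sxx


def intercept_difference_t(x1, y1, x2, y2):
    """t statistic for a1 - a2 from two separate regressions (pooled variance)."""
    a1, _, sse1, n1, xb1, sxx1 = line_fit(x1, y1)
    a2, _, sse2, n2, xb2, sxx2 = line_fit(x2, y2)
    df = n1 + n2 - 4  # each line uses two parameters
    s2 = (sse1 + sse2) / df  # pooled residual variance (equal-variance assumption)
    # Var(a) = s2 * (1/n + xbar^2 / Sxx) for each independent fit
    se = np.sqrt(s2 * (1 / n1 + xb1 ** 2 / sxx1 + 1 / n2 + xb2 ** 2 / sxx2))
    return (a1 - a2) / se, df
```

The returned t is compared with the tabulated t value for `df` degrees of freedom at the chosen significance level.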

Let us compute the same problem using only one regression equation with an indicator variable. Let... [Pg.363]
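A single equation of the form y = b0 + b1·x + b2·I + b3·I·x, with I = 1 for the second group and 0 for the first, reproduces both separate lines: the first group's line is b0 + b1·x and the second's is (b0 + b2) + (b1 + b3)·x, so b2 is the intercept difference and b3 the slope difference. The sketch below assumes this common parameterization; the source's own definition of the indicator is truncated.

```python
import numpy as np


def indicator_fit(x, y, group):
    """Fit y = b0 + b1*x + b2*I + b3*I*x, with I = 1 for group 1 and 0 otherwise.

    Returns [b0, b1, b2, b3]; b2 is the intercept difference and b3 the
    slope difference between group 1 and group 0.
    """
    x = np.asarray(x, dtype=float)
    I = (np.asarray(group) == 1).astype(float)
    A = np.column_stack([np.ones_like(x), x, I, I * x])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef
```

Because one residual variance is estimated from all observations, the standard error of b2 from this single fit provides a direct test of whether the intercepts differ.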

A very useful application of dummy or indicator variable regression is in modeling regression functions that are nonlinear. For example, in steam sterilization death kinetic calculations, the thermal death curve for bacterial spores often looks sigmoidal (Figure 9.9). [Pg.387]
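One way indicator variables capture such a nonlinear curve is a piecewise-linear ("broken-stick") model. Each term (x − c)·I(x > c) switches on a change of slope at a breakpoint c, so a shoulder, a steep decline and a tail can be fitted with two breakpoints. This sketch assumes the breakpoints are known in advance (for example, read from a plot such as Figure 9.9); estimating them is a separate problem.

```python
import numpy as np


def piecewise_design(x, knots):
    """Columns 1, x and (x - c) * I(x > c) for each knot c.

    The fitted line is continuous; its slope is b1 before the first knot,
    b1 + b2 after it, b1 + b2 + b3 after the second knot, and so on.
    """
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x]
    for c in knots:
        cols.append((x - c) * (x > c))  # indicator I(x > c) times (x - c)
    return np.column_stack(cols)
```

In a death-kinetics setting, x might be exposure time and y the log10 number of survivors, and the fit is an ordinary least-squares fit on this design matrix.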

Indicator variables (I = 1.0/0.0) are used to code the presence or absence of a key substructure. Regression of real numbers (pIC50 values) against a matrix of indicator variables is a valid procedure for large sets, as in the Free-Wilson method. However, many of the sets in this study are small (n = 7-10), and the statistical measures for these sets are probably only approximate. The overall consistency of the substructure dependence in both small and larger sets is considered to validate these measures in a semi-quantitative sense. [Pg.282]
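With a single 0/1 indicator, the least-squares results have a simple meaning: the intercept is the mean pIC50 of compounds without the substructure, and the indicator coefficient is the difference between the two group means. The toy sketch below shows this; it does not address how reliable the estimate is for sets of 7-10 compounds, which is the concern raised above.

```python
import numpy as np


def indicator_effect(pic50, has_substructure):
    """Regress pIC50 on one 0/1 indicator; return (intercept, indicator coefficient)."""
    y = np.asarray(pic50, dtype=float)
    I = np.asarray(has_substructure, dtype=float)
    A = np.column_stack([np.ones_like(y), I])
    (b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)
    # b0 = mean of compounds without the substructure,
    # b1 = mean(with) - mean(without)
    return float(b0), float(b1)
```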

Sigma-rho corrections were assigned equally to the substituents at positions 6 and 7. Revised STERIMOL (L, B1 and B5) parameters were calculated following established methods [27,28]. The data matrices consist of lipophilicity (π), molar refractivity (MR), STERIMOL (L, B1, B5) parameters and de novo indicator variables for substituents at positions 1, 6 and 7. The contributions of the substituents at positions 6 and 7 to the partition coefficient were summed, Σπ(6,7), as were the molar refractivity contributions, ΣMR(6,7). These two sets of summed variables parallel a similar approach used by Koga in his QSAR analysis of a set of quinoline derivatives [23]. At the same time, the limitations that apply to summed π values, particularly where there is a significant σ-ρ interaction, were kept in mind [22,29,30]. The latter is represented as API in the data tables, and its significance was evaluated in some of the regression models. To check for parabolic relationships, squared terms for the partition coefficient, molar refraction and STERIMOL parameters were evaluated. [Pg.306]
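The summed descriptors and the check for a parabolic relationship can be sketched as follows: add the position-6 and position-7 contributions, then fit activity = a·x² + b·x + c. When a < 0, the curve has a maximum at x = −b/(2a), the usual "optimum" of a parabolic Hansch equation. The helper names and the synthetic data in the test are hypothetical; the sketch omits the indicator variables and other descriptors of the full model.

```python
import numpy as np


def summed_descriptor(values_pos6, values_pos7):
    """Sum a substituent descriptor over positions 6 and 7 (e.g. a summed lipophilicity)."""
    return np.asarray(values_pos6, dtype=float) + np.asarray(values_pos7, dtype=float)


def parabolic_fit(x, activity):
    """Fit activity = a*x**2 + b*x + c; return (a, b, c, x_opt).

    x_opt = -b / (2a) is the descriptor value at the maximum, reported only
    when a < 0 (a downward-opening parabola); otherwise it is None.
    """
    x = np.asarray(x, dtype=float)
    A = np.column_stack([x ** 2, x, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, np.asarray(activity, dtype=float), rcond=None)
    x_opt = float(-b / (2 * a)) if a < 0 else None
    return float(a), float(b), float(c), x_opt
```

Whether the squared term is kept would normally be decided by its statistical significance in the regression, as the passage describes.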
