Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...

Statistical analysis multiple-descriptor models

In this chapter we presented two structural measures of molecular shape that can be used as predictor variables in MLR (multiple linear regression) analysis of structure-activity studies - cylindrical (8,G) and ovality ( in, i = 1,2,3) molecular descriptors - and two inexpensive overlapping methods useful for quick receptor mapping - MTD (minimal topological difference) and MVD (minimal volume difference). A subsequent statistical analysis of QSAR models developed with these shape molecular descriptors explained well the variance in the observed reactivity data (8 descriptor of cylindrical shape) and biological activity of retinoids (MTD) and sulfonamides (MVD). [Pg.375]

For statistical reasons, multiple regression analysis cannot be used for 3D-QSAR methods that consider many more 3D descriptors than compounds or for which the descriptors are mutually correlated. The alternative strategies described next can be used to find a quantitative model in such situations. As will be seen, cross-validation is an important technique for assessing the robustness of a proposed model. [Pg.189]
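The statistical problem and the role of cross-validation can be illustrated with a small sketch. It assumes NumPy, purely random synthetic data (15 hypothetical compounds, 40 descriptors, and an activity that is unrelated to them), and a plain least-squares fit standing in for the model; none of this comes from the excerpt. With more descriptors than compounds the fit reproduces the training data exactly, while the leave-one-out cross-validated q² shows that the model predicts nothing.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Least-squares fit with an intercept.

    When there are more descriptors than compounds, lstsq returns the
    minimum-norm solution, which interpolates the training data.
    """
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def r_squared(y, y_hat):
    """1 - (residual sum of squares) / (total sum of squares about the mean)."""
    return 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def loo_q_squared(X, y):
    """Leave-one-out cross-validated q2: each compound is predicted by a
    model fitted to all the other compounds."""
    preds = np.empty_like(y)
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        preds[i] = fit_predict(X[mask], y[mask], X[i:i + 1])[0]
    return r_squared(y, preds)

# Synthetic, assumed data: the "activity" is pure noise, unrelated to X.
rng = np.random.default_rng(0)
X = rng.normal(size=(15, 40))   # 15 compounds, 40 descriptors
y = rng.normal(size=15)

r2_fit = r_squared(y, fit_predict(X, y, X))
q2 = loo_q_squared(X, y)
print(f"fitted R2 = {r2_fit:.3f}, leave-one-out q2 = {q2:.3f}")
```

A fitted R² near 1 together with a low or negative q² is the signature of a model that has memorized the training set rather than found a relationship.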

The pool of descriptors that is calculated must be winnowed down to a manageable set before constructing a statistical or neural network model. This operation is called feature selection. The first step of feature selection is to use a battery of objective statistical methods. Descriptors that contain little information, descriptors that have little variation across the data set, or descriptors that are highly correlated with other descriptors are candidates for elimination. Multivariate correlations among descriptors can also be discovered with multiple linear regression analysis, and these relationships can be broken down by elimination of descriptors. [Pg.2325]

Selecting compounds according to SMD requires an adequate description of their structures by means of quantitative variables, "structure descriptors". This description can then be used after the compound selection, synthesis, and biological testing to formulate quantitative models between structural variation and activity variation, so-called Quantitative Structure-Activity Relationships (QSARs). For extensive reviews, see references 3 and 4. With multiple structure descriptors and multiple biological activity variables (responses), these models are necessarily multivariate (M-QSAR) in nature, which makes the Partial Least Squares Projections to Latent Structures (PLS) approach suitable for the data analysis. PLS is a statistical method that relates a multivariate descriptor data set (X) to a multivariate response data set (Y). PLS is well described elsewhere and will not be described any further here [42, 43]. [Pg.214]

The system constants in Eqs. (1.6) and (1.7) are obtained by multiple linear regression analysis for a number of solute property determinations for solutes with known descriptors. The solutes used should be sufficient in number and variety to establish the statistical and chemical validity of the model [72-74]. In particular, there should be an absence of significant cross-correlation among the descriptors, clustering of either descriptor or dependent variable values should be avoided, and an exhaustive fit should be obtained. Table 1.4 illustrates part of a typical output. The overall correlation coefficient, standard error in the estimate, Fisher F-statistic, and the standard deviation in the individual system constants are used to judge whether the results are statistically sound. An exhaustive fit is obtained when small groups of solutes selected at random can be deleted from the model with minimal change in the system constants. [Pg.18]
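The statistics listed above, and the random-deletion check for an exhaustive fit, can be sketched as follows. This assumes NumPy and uses synthetic data (30 hypothetical solutes, 3 descriptors, invented constants); it does not reproduce Eqs. (1.6) and (1.7) or Table 1.4, and the helper names are made up for illustration.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept, plus the usual fit statistics."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])            # design matrix, intercept first
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    dof = n - p - 1                                  # residual degrees of freedom
    se = np.sqrt(ss_res / dof)                       # standard error of the estimate
    f_stat = ((ss_tot - ss_res) / p) / (ss_res / dof)
    coef_sd = se * np.sqrt(np.diag(np.linalg.inv(A.T @ A)))
    return {"coef": coef, "coef_sd": coef_sd,
            "R": np.sqrt(1.0 - ss_res / ss_tot), "SE": se, "F": f_stat}

def deletion_shifts(X, y, group_size=3, n_trials=20, seed=1):
    """Delete small random groups of solutes, refit, and record the largest
    absolute change in any constant relative to the full fit."""
    rng = np.random.default_rng(seed)
    full = fit_mlr(X, y)["coef"]
    shifts = []
    for _ in range(n_trials):
        drop = rng.choice(len(y), size=group_size, replace=False)
        keep = np.setdiff1d(np.arange(len(y)), drop)
        refit = fit_mlr(X[keep], y[keep])["coef"]
        shifts.append(np.abs(refit - full).max())
    return np.array(shifts)

# Synthetic, assumed data: known constants plus a little noise.
rng = np.random.default_rng(0)
true_constants = np.array([0.5, 1.0, -2.0, 0.3])    # intercept first
X = rng.normal(size=(30, 3))
y = true_constants[0] + X @ true_constants[1:] + rng.normal(scale=0.1, size=30)

fit = fit_mlr(X, y)
shifts = deletion_shifts(X, y)
print("constants:", fit["coef"].round(3), "+/-", fit["coef_sd"].round(3))
print(f"R = {fit['R']:.4f}, SE = {fit['SE']:.3f}, F = {fit['F']:.0f}")
print(f"largest change on random deletion: {shifts.max():.3f}")
```

If the largest change on deletion is small compared with the standard deviations of the constants, no small group of solutes is driving the model.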

In this paper we focus on linear relationships between descriptors and biological properties, which are detected by statistical techniques such as multiple linear regression (MLR) or partial least squares analysis (PLS) [64]. A further development is hierarchical PLS modeling [65-67], which can be employed if the descriptors can be grouped into several subsets. After a PLS analysis for each subset, the results of these base-level PLS models are combined into a top-level PLS analysis. [Pg.67]

Once the descriptors have been selected, investigators need to select the statistical approach for developing the QSAR model. This can involve a number of techniques, such as multiple linear regression, partial least squares analysis, neural networks, and a variety of others [9]. These techniques need to be applied to both the training set (model development) and the validation set (assessment of predictability). [Pg.26]

Under circumstances in which the molecular descriptors are highly intercorrelated (e.g., molecular connectivity indices), there are statistical limitations with respect to the use of a classical multiple regression analysis. Such data sets can be satisfactorily treated by the application of principal components regression (PCR) and partial least squares (PLS) methods. Numerous environmental QSAR models use... [Pg.934]
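As a rough illustration of principal components regression on intercorrelated descriptors, the sketch below regresses the response on the leading principal-component scores and maps the result back to descriptor space. It assumes NumPy and synthetic data in which three hypothetical descriptors are noisy copies of one latent variable; the function name and all values are assumptions.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal components regression via SVD of the centered descriptors."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                  # loadings of the retained components
    T = Xc @ V                               # scores (mutually orthogonal columns)
    # Scores are orthogonal with T.T @ T = diag(s**2), so each regression
    # coefficient on the scores is a simple ratio.
    gamma = (T.T @ (y - y_mean)) / (s[:n_components] ** 2)
    beta = V @ gamma                         # coefficients in descriptor space
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Synthetic, assumed data: three descriptors that are near-copies of one
# latent variable t, and a response that depends on t.
rng = np.random.default_rng(0)
t = rng.normal(size=40)
X = np.column_stack([t + rng.normal(scale=0.05, size=40) for _ in range(3)])
y = 2.0 * t + rng.normal(scale=0.1, size=40)

print("descriptor correlation matrix:\n", np.corrcoef(X, rowvar=False).round(3))

intercept, beta = pcr_fit(X, y, n_components=1)
y_hat = intercept + X @ beta
r2 = 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print("PCR coefficients:", beta.round(3), f"R2 = {r2:.3f}")
```

With one retained component the weight is shared almost equally among the three correlated descriptors, whereas a classical regression on the same data would give individually unstable coefficients.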






© 2024 chempedia.info