Descriptor Sets

The descriptor set can then be reduced by eliminating candidates that show such bad characteristics. Optimization techniques such as genetic algorithms (see Section 9.7) are powerful means of automating this selection process. [Pg.490]
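The excerpt does not say which "bad characteristics" are meant. Near-constant values and strong pairwise correlation are common reasons for discarding descriptors, and the Python sketch below assumes those two. It is a simple greedy filter, not the genetic-algorithm selection mentioned above. The thresholds `min_variance` and `max_abs_corr` and the toy descriptor names are illustrative assumptions.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_descriptors(matrix, names, min_variance=1e-6, max_abs_corr=0.95):
    """Return the names of descriptors that survive two simple filters.

    matrix: one row per compound, one column per descriptor.
    names:  descriptor names, in column order.
    The thresholds are illustrative assumptions.
    """
    columns = [list(col) for col in zip(*matrix)]
    # Filter 1: drop (nearly) constant descriptors; they carry no information.
    candidates = [j for j, col in enumerate(columns)
                  if statistics.pvariance(col) > min_variance]
    # Filter 2: keep a descriptor only if it is not highly correlated
    # with any descriptor kept earlier (greedy, in column order).
    kept = []
    for j in candidates:
        if all(abs(pearson(columns[j], columns[k])) <= max_abs_corr for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Hypothetical toy data: 4 compounds x 4 descriptors.
data = [
    [1.0, 2.0, 5.0, 3.0],
    [2.0, 4.0, 5.0, 1.0],
    [3.0, 6.0, 5.0, 2.0],
    [4.0, 8.0, 5.0, 4.0],
]
names = ["logP", "logP_x2", "const", "nRing"]
print(filter_descriptors(data, names))  # ['logP', 'nRing']
```

Here `logP_x2` is dropped because it is perfectly correlated with `logP`, and `const` is dropped because it never varies.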

On the other hand, techniques like Principal Component Analysis (PCA) or Partial Least Squares Regression (PLS) (see Section 9.4.6) transform the descriptor set into smaller sets with higher information density. The disadvantage of such methods is that the transformed descriptors may not be directly related to single physical effects or structural features, so the derived models are less interpretable. [Pg.490]

This study was done because we wanted to see whether 3D descriptors can improve on the models obtained with 2D descriptors. Furthermore, we wanted to use the descriptor set as initially chosen, without the tedious descriptor selection reported in the Tutorial in Section 10.1.5.2. [Pg.501]

A second example of a largely fingerprint-based virtual screening (VS) exercise is that of Boecker et al., who searched for novel series of dopamine D2 and D3 blockers [65]. A set of 472 known dopamine D2 and D3 ligands was assembled from the literature as actives. The SPECS database of 230,000 compounds was chosen as the source from which to identify new compounds. Two descriptor sets, MOE2D [51] and CATS3D [77], were calculated for both the query and the database molecules. Neighbors... [Pg.96]
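A minimal similarity-search sketch in Python, assuming each molecule is represented by a real-valued descriptor vector. The continuous Tanimoto coefficient and the "maximum similarity to any active" fusion rule are assumptions chosen for illustration; the excerpt does not state which similarity measure or fusion rule Boecker et al. used, and the compound IDs and vectors are hypothetical.

```python
def tanimoto(a, b):
    """Continuous Tanimoto coefficient for real-valued vectors (1.0 = identical)."""
    ab = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    denom = aa + bb - ab
    return ab / denom if denom else 1.0  # two all-zero vectors count as identical


def rank_database(actives, database, top_n=10):
    """Rank database compounds by their maximum similarity to any known active.

    actives:  list of descriptor vectors for the query (known active) molecules.
    database: dict mapping compound ID -> descriptor vector.
    """
    scores = {cid: max(tanimoto(vec, q) for q in actives)
              for cid, vec in database.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical toy vectors, not real MOE2D or CATS3D values.
actives = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
database = {
    "A": [1.0, 0.0, 2.0],
    "B": [5.0, 5.0, 5.0],
    "C": [0.0, 2.5, 1.0],
}
print(rank_database(actives, database, top_n=2))  # ['A', 'C']
```

Compound "C" ranks second because it is close to the second active (similarity of about 0.97), even though it is dissimilar to the first.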

In addition, many different 1D, 2D and 3D descriptors can be calculated with a range of commercially available software packages, such as Sybyl, Cerius2, Tsar, Molconn-Z and Hybot. Several new descriptor sets are based on quantifying 3D molecular surface properties, and these have been explored for predicting, e.g., Caco-2 permeability and oral absorption. Note, however, that a number of these new descriptors are strongly correlated with the more traditional physicochemical properties. [Pg.5]

Cabrera et al. [50] modeled a set of 163 drugs using TOPS-MODE descriptors with a linear discriminant model to predict P-glycoprotein efflux. Model accuracy was 81% for the training set and 77.5% for a validation set of 40 molecules. A "combinatorial QSAR" approach was used by de Lima et al. [51] to test multiple model types (kNN, decision tree, binary QSAR, SVM) with multiple descriptor sets from various software packages (MolconnZ, Atom Pair, VolSurf, MOE) for predicting P-glycoprotein substrates in a dataset of 192 molecules. The best overall performance on a test set of 51 molecules was achieved by an SVM with Atom Pair (AP) or VolSurf descriptors (81% accuracy each). [Pg.459]
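Of the model types listed above, kNN is the simplest to sketch. The Python code below classifies a compound by majority vote among its k nearest training compounds (Euclidean distance in descriptor space) and reports accuracy as the fraction of correct predictions. The two-descriptor data, the choice k = 3, and the "substrate"/"non-substrate" labels are hypothetical, not taken from either study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict a class label by majority vote of the k nearest training compounds."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical two-descriptor training data with substrate / non-substrate labels.
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["non-substrate"] * 3 + ["substrate"] * 3
test_X = [[0.5, 0.5], [5.5, 5.5], [4, 4]]
test_y = ["non-substrate", "substrate", "non-substrate"]

pred = [knn_predict(train_X, train_y, x) for x in test_X]
print(pred)
print(accuracy(test_y, pred))
```

Accuracy is simply correct/total; for example, 31 correct predictions out of 40 validation molecules corresponds to 77.5%.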

For example, the descriptor set used in MOE, distributed by Chemical Computing Group, 125 University St., Suite 1600, Montreal, Quebec, Canada H3B 3x3, www.chemcomp.com. [Pg.194]

Determining the Best Model from an Extensive Descriptor Set... [Pg.157]

Principal component analysis (19,20) is used to create uncorrelated descriptors from existing descriptors, reducing the dimensionality of a descriptor set before QSAR models are built. Principal component analysis is a method capable of... [Pg.172]
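To illustrate why principal component scores are uncorrelated, here is a dependency-free Python sketch that computes principal components by power iteration with deflation on the covariance matrix. This numerical approach is a simplification chosen to keep the example self-contained; in practice one would use a library eigendecomposition such as `numpy.linalg.eigh`. The data matrix in the test is hypothetical.

```python
import math
import random


def covariance(X):
    """Column-centred data and its sample covariance matrix."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in X]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    return centred, cov


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def leading_eigenpair(M, iters=1000, seed=0):
    """Dominant eigenvalue/eigenvector of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in M]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = matvec(M, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, matvec(M, v)))
    return eigval, v


def pca(X, n_components):
    """Return (loadings, scores) for the first n_components principal components."""
    centred, M = covariance(X)
    loadings = []
    for _ in range(n_components):
        lam, v = leading_eigenpair(M)
        loadings.append(v)
        # Deflation: subtract the found component so the next one can be extracted.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(len(v))]
             for i in range(len(v))]
    scores = [[sum(r * c for r, c in zip(row, comp)) for comp in loadings]
              for row in centred]
    return loadings, scores
```

Each loading vector is an eigenvector of the covariance matrix and the loadings are mutually orthogonal, so the covariance between different score columns is zero up to rounding error.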

This part of the case study was divided into two sections: the original descriptor set (2,125,127) (Hansch descriptors) and a new set of descriptors calculated in MOE (6) (MOE descriptors). The MOE descriptors were calculated for molecules assigned Gasteiger (70) partial charges in the MMFF force field (46-52). The new descriptor set consists of three properties: the water-accessible surface area (ASA), calculated using a radius of 1.4 Å for a water molecule, and the... [Pg.193]
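The water-accessible surface area can be estimated with the Shrake-Rupley approach: inflate each atomic sphere by the probe radius (1.4 Å, as in the text), place test points on it, and count the points not buried inside any other inflated sphere. The Python sketch below assumes that algorithm and takes atomic radii as input; MOE's own ASA implementation may differ, and the coordinates, radii, and `n_points` default are hypothetical.

```python
import math


def sphere_points(n):
    """n roughly evenly spaced points on the unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3 - math.sqrt(5))
    points = []
    for k in range(n):
        y = 1 - 2 * (k + 0.5) / n          # heights evenly spaced in (-1, 1)
        r = math.sqrt(1 - y * y)           # radius of the circle at height y
        theta = k * golden_angle
        points.append((r * math.cos(theta), y, r * math.sin(theta)))
    return points


def accessible_surface_area(atoms, probe=1.4, n_points=960):
    """Shrake-Rupley estimate of the solvent-accessible surface area.

    atoms: list of (x, y, z, radius) tuples in angstroms.
    probe: probe (water) radius in angstroms.
    """
    unit = sphere_points(n_points)
    total = 0.0
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # atom radius inflated by the probe radius
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            # A point is buried if it lies inside any other atom's inflated sphere.
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms) if j != i
            )
            if not buried:
                exposed += 1
        # Exposed fraction of this atom's inflated sphere, times its area.
        total += 4 * math.pi * big_r ** 2 * exposed / n_points
    return total
```

For an isolated atom every test point is exposed, so the result is exactly 4π(r + 1.4)²; overlapping neighbours reduce it.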

Regardless of the method used to construct the model (MLR, PCA, PCR, or PLS), the QSAR models obtained with a given descriptor set were essentially the same. The discrepancies between the cross-validation values (Δq² ≈ 0.04) were attributed to the method used to create the model. Examining Tables 3 and 4 shows which physicochemical properties are important for increasing or decreasing binding. [Pg.194]
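Cross-validation values for QSAR models are commonly reported as a leave-one-out q² = 1 - PRESS/SS_total, where PRESS is the sum of squared errors when each compound is predicted by a model fitted without it. The excerpt does not state which cross-validation scheme was used, so the Python sketch below assumes leave-one-out with an MLR model solved via the normal equations; the data in the test are hypothetical.

```python
def fit_mlr(X, y):
    """Least-squares MLR coefficients [b0, b1, ..., bp] via the normal equations."""
    A = [[1.0] + [float(v) for v in row] for row in X]   # prepend intercept column
    p = len(A[0])
    # Augmented matrix [X^T X | X^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(b, row):
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], row))


def q2_loo(X, y):
    """Leave-one-out q^2 = 1 - PRESS / SS_total."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        # Refit without compound i, then predict compound i.
        b = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(b, X[i])) ** 2
    ss_total = sum((t - mean_y) ** 2 for t in y)
    return 1 - press / ss_total
```

For noise-free linear data every left-out compound is predicted exactly, so PRESS is zero and q² is 1; noisier data give lower values.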

Each of these descriptor sets is derived from, or related to, the Hansch and Leo descriptors, with the expectation that they would be widely applicable. Taken together, the VSA descriptors define, nominally, a 10 + 8 + 14 = 32 dimen-... [Pg.266]
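VSA-type descriptors sum approximate per-atom van der Waals surface area (VSA) contributions over the atoms whose value of some per-atom property falls in a given range, so each binned property contributes one descriptor per bin. A minimal Python sketch, assuming per-atom (property, VSA) pairs are already available; the bin edges and atom values are hypothetical, not the published bin boundaries.

```python
import bisect


def binned_vsa(atoms, bin_edges):
    """Sum per-atom VSA contributions into bins of a per-atom property.

    atoms:     list of (property_value, vsa) pairs, VSA in square angstroms.
    bin_edges: sorted bin boundaries; len(bin_edges) + 1 bins are returned.
    """
    bins = [0.0] * (len(bin_edges) + 1)
    for prop, vsa in atoms:
        # bisect_right finds the bin: values equal to an edge go to the upper bin.
        bins[bisect.bisect_right(bin_edges, prop)] += vsa
    return bins


# Hypothetical per-atom (property, VSA) values and bin edges.
atoms = [(-0.5, 10.0), (0.1, 5.0), (0.1, 3.0), (0.3, 7.5)]
edges = [-0.2, 0.0, 0.2]
print(binned_vsa(atoms, edges))  # [10.0, 0.0, 8.0, 7.5]
```

Nine edges give ten bins; concatenating three binned properties with 10, 8, and 14 bins yields a 32-element descriptor vector.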

In our study we compare two diversity-driven design methods (uniform cell coverage and clustering), two analysis methods motivated by similarity (cell-based analysis and cluster classification), and two descriptor sets (BCUT and constitutional). Thus, our study addresses some of the many questions arising in a sequential screen: how to choose the initial screen, how to analyze the structure-activity data, and which molecular descriptor set to use. The study is limited to one assay and thus cannot be definitive, but it at least provides preliminary insights and reveals some trends. [Pg.308]

For example, we address the following questions. Does diversity generated by one method and descriptor set correspond to diversity according to another? How do various designs compare with random selections? In structure-activity analysis, does one method outperform another in identifying active compounds? ... [Pg.308]

Using the uniform-cell-coverage (UCC) criterion in Subheading 2.3, with 4096 cells in every 1D, 2D, and 3D subspace of a descriptor set. [Pg.308]

Uniform-Cell-Coverage (UCC) Criterion Evaluated for Three Descriptor Sets and Various Designsᵃ... [Pg.309]

Design method | Descriptor set used for design | BCUT | Constitutional (6 PCs) | Constitutional (20 PCs)... [Pg.309]

The UCC and clustering methods require a descriptor set, either BCUT or constitutional descriptors. Because our implementation of UCC requires continuous descriptors, the 46 constitutional descriptors, which include discrete counts, were also reduced to either the first 6 or the first 20 principal components (PCs). Thus, the UCC algorithm was applied to the BCUT descriptors and to either 6 or 20 PCs from the constitutional descriptors. In addition to these three sets, clustering was also applied to the 46 raw constitutional descriptors. The random design requires no descriptors. [Pg.309]
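The excerpt does not name the clustering algorithm or say how a compound is chosen from each cluster. As one possible reading, the Python sketch below assumes k-means clustering on the descriptor vectors and selects the compound nearest each cluster centroid, giving a design of k compounds. By contrast, a random design needs no descriptors at all, for example `random.sample(range(n_compounds), k)`.

```python
import math
import random


def assign(X, centroids):
    """Index of the nearest centroid for every compound."""
    return [min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))
            for x in X]


def kmeans(X, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(X, k)]  # k distinct compounds as seeds
    for _ in range(iters):
        labels = assign(X, centroids)
        new = []
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            # Mean of the members; keep the old centroid if the cluster is empty.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return assign(X, centroids), centroids


def cluster_design(X, k, seed=0):
    """Select one compound per cluster: the one closest to its cluster centroid."""
    labels, centroids = kmeans(X, k, seed=seed)
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picks.append(min(members, key=lambda i: math.dist(X[i], centroids[c])))
    return sorted(picks)
```

Picking the member closest to each centroid keeps every selected compound a real library member while spreading the selection across clusters.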

Thus, there are eight design-method/descriptor-set combinations to compare (UCC with three descriptor sets, clustering with four, and random selection), as shown in the first two columns of Table 1. We use UCC to measure diversity, as it provides a comprehensive assessment of coverage in all low-dimensional subsets of variables. Recall that a small value of UCC is better. Furthermore, no matter how the design is generated, UCC can be measured according to the BCUT or constitutional descriptors (6 or 20 PCs). The results are very similar for the two replicates, hence only the first replicate is reported. [Pg.309]
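The UCC formula itself is not given in this excerpt, so the Python sketch below is a hypothetical stand-in that only mirrors its stated ingredients. Each descriptor is scaled to the candidate library's range, every 1D, 2D, and 3D subspace is split into 4096 cells (4096 = 64² = 16³, so 4096, 64, or 16 bins per axis), and the score is the squared deviation of cell counts from a perfectly uniform allocation, averaged over subspaces. As with UCC, a smaller value is better. The function names and the scoring rule are assumptions, not the authors' definition.

```python
from collections import Counter
from itertools import combinations

CELLS_PER_SUBSPACE = 4096


def bins_per_axis(dim, cells=CELLS_PER_SUBSPACE):
    """Bins per axis so a dim-dimensional grid has `cells` cells: 4096, 64, 16."""
    return round(cells ** (1 / dim))


def cell_of(value, lo, hi, nbins):
    """Bin index of value after scaling [lo, hi] onto nbins equal-width bins."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * nbins), nbins - 1)


def coverage_score(design, lower, upper, max_dim=3):
    """Hypothetical cell-coverage score (smaller = more uniform); NOT the published UCC.

    design:       list of descriptor vectors for the selected compounds.
    lower, upper: per-descriptor minimum and maximum over the candidate library.
    """
    n_desc = len(lower)
    deviations = []
    for dim in range(1, max_dim + 1):
        nb = bins_per_axis(dim)
        n_cells = nb ** dim
        expected = len(design) / n_cells      # count per cell if perfectly uniform
        for axes in combinations(range(n_desc), dim):
            counts = Counter(
                tuple(cell_of(x[a], lower[a], upper[a], nb) for a in axes)
                for x in design
            )
            empty_cells = n_cells - len(counts)
            dev = sum((c - expected) ** 2 for c in counts.values())
            deviations.append(dev + empty_cells * expected ** 2)
    return sum(deviations) / len(deviations)
```

A design whose compounds all fall into one cell scores much worse than one that places each compound in a different cell of every subspace.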

Cell Coverage in 1D, 2D, and 3D Subspaces for Various Descriptor Sets and Designsᵃ... [Pg.310]

To our surprise, the comparisons of diversity designs in Subheading 4.2 strongly suggest that a design constructed to be diverse according to one method and descriptor set is not much more diverse than a random design when assessed by a different diversity criterion. In the absence of compelling reasons for a... [Pg.312]

We performed an analysis of variance on the results in Table 1, followed by tests of specific effects. Two results were statistically significant: random selection of compounds was better than cluster or space-filling selection, and BCUT descriptors were better for analysis than either of the principal component descriptor sets. [Pg.331]
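The excerpt does not describe the ANOVA model used for Table 1. As a simplified illustration, the Python sketch below computes the one-way ANOVA F statistic for several groups of scores; a p-value would then come from the F distribution, e.g. `scipy.stats.f.sf(F, df_between, df_within)`, or the whole test from `scipy.stats.f_oneway`. The group values are hypothetical.

```python
def one_way_anova(groups):
    """One-way ANOVA: return (F, df_between, df_within) for lists of observations."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

A large F means the differences between group means are large relative to the scatter within groups.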

Big Chemical Encyclopedia
