Selection of Descriptors

GAs and other methods from evolutionary computation are applied in various fields of chemistry. Their tasks include the geometry optimization of conformations of small molecules, the elaboration of models for the prediction of properties or biological activities, the de novo design of molecules, the analysis of the interaction of proteins and their ligands, and the selection of descriptors [18]. The last application is explained briefly in Section 9.7.6. [Pg.467]

This study was done to see whether 3D descriptors can improve on the models obtained with 2D descriptors. Furthermore, we wanted to use the descriptor set as initially chosen, without the tedious selection of descriptors reported in the Tutorial in Section 10.1.5.2. [Pg.501]

The previously mentioned data set with a total of 115 compounds has already been studied by other statistical methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis, and the Partial Least Squares (PLS) method [39]. Thus, the choice and selection of descriptors has already been accomplished. [Pg.508]

For the selection of descriptors, a GA simulated the evolution of a population. Each individual of the population represents a subset of descriptors and is defined by a chromosome of binary values. The chromosome has as many genes as there are possible descriptors (92 for the aromatic group, 119 for non-rigid aliphatic,... [Pg.527]
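The binary-chromosome encoding described above can be sketched as follows. This is a minimal illustration, not the procedure used in the cited study: the function name `evolve_descriptor_subset`, the tournament selection, the one-point crossover, the elitism step, all parameter defaults, and the toy fitness function are assumptions introduced here. In a real application the fitness would come from the quality of a model trained on the selected descriptors.

```python
import random

def evolve_descriptor_subset(n_descriptors, fitness, pop_size=20, generations=30,
                             crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Evolve a 0/1 chromosome; gene i == 1 means descriptor i is selected."""
    rng = random.Random(seed)
    # One gene per candidate descriptor, initialised at random.
    population = [[rng.randint(0, 1) for _ in range(n_descriptors)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Pick two individuals at random and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best chromosome forward
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_descriptors)  # one-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                for i in range(n_descriptors):
                    if rng.random() < mutation_rate:
                        child[i] = 1 - child[i]  # flip the gene
                children.append(child)
        population = children[:pop_size]
        best = max(population, key=fitness)
    return best

# Hypothetical stand-in for model quality: reward three "relevant"
# descriptors and penalise every selected descriptor slightly.
RELEVANT = (3, 17, 42)

def toy_fitness(chromosome):
    return sum(chromosome[i] for i in RELEVANT) - 0.05 * sum(chromosome)

subset = evolve_descriptor_subset(92, toy_fitness)
selected = [i for i, gene in enumerate(subset) if gene]
```

Because the best chromosome is copied into each new generation, the best fitness never decreases from one generation to the next in this sketch.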

After selection of descriptors/NN training, the best networks were applied to the prediction of 259 chemical shifts from 31 molecules (prediction set), which were not used for training. The mean absolute error obtained for the whole prediction set was 0.25 ppm, and for 90% of the cases the mean absolute error was 0.19 ppm. Some stereochemical effects could be correctly predicted. In terms of speed, the neural network method is very fast - the whole process to predict the NMR shifts of 30 protons in a molecule with 56 atoms, starting from an MDL Molfile, took less than 2 s on a common workstation. [Pg.527]

When applied to QSAR studies, the activity of molecule u is calculated simply as the average activity of the K nearest neighbors of molecule u. An optimal K value is selected by optimizing the classification of a test set of samples or by leave-one-out cross-validation. Many variations of the kNN method have been proposed in the past, and new and fast algorithms have continued to appear in recent years. The automated variable selection kNN QSAR technique optimizes the selection of descriptors to obtain the best models [20]. [Pg.315]
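The averaging rule and the leave-one-out choice of K can be sketched as below. This is a minimal sketch under stated assumptions: Euclidean distance in descriptor space, mean absolute error as the criterion, and the function names `knn_predict`, `loo_error`, and `select_k` are all choices made here for illustration, not part of the cited technique.

```python
import math

def knn_predict(train_x, train_y, query, k):
    """Predict activity as the mean activity of the k nearest neighbors."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def loo_error(x, y, k):
    """Mean absolute leave-one-out error for a given k."""
    total = 0.0
    for i in range(len(x)):
        # Hold out molecule i and predict it from all the others.
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        total += abs(knn_predict(rest_x, rest_y, x[i], k) - y[i])
    return total / len(x)

def select_k(x, y, candidates):
    """Return the candidate k with the lowest leave-one-out error."""
    return min(candidates, key=lambda k: loo_error(x, y, k))

# Toy data: one descriptor per molecule, two well-separated activity groups.
descriptors = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
activities = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
best_k = select_k(descriptors, activities, [2, 5])
```

Variable selection would wrap this in an outer search over descriptor subsets, scoring each subset by its leave-one-out error.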

Several methods have been developed to aid in the selection of descriptors to create a QSAR model. Testing every possible combination of descriptors by brute computational force is wasteful. Methods such as Stepwise Searches (45), Simulated Annealing (94), and Genetic Algorithms... [Pg.158]

Fig. 4. Representation of simulated annealing for the selection of descriptors for a QSAR model. The potential well represents all possible models and as the value of R2 increases (low R2 values at the top and high R2 values at the bottom of the Y axis), the number of models decreases.
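A simulated annealing search over descriptor subsets can be sketched as follows. The details are assumptions made for illustration: a single-bit flip as the move, a geometric cooling schedule, the Metropolis acceptance rule, and a toy `toy_score` function standing in for the R2 of a fitted model.

```python
import math
import random

def anneal_descriptor_subset(n_descriptors, score, steps=2000,
                             t_start=1.0, t_end=0.01, seed=0):
    """Search 0/1 descriptor masks, maximising score(mask)."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_descriptors)]
    current_score = score(current)
    best, best_score = current[:], current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = current[:]
        i = rng.randrange(n_descriptors)
        candidate[i] = 1 - candidate[i]  # add or drop one descriptor
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with
        # probability exp(delta / t), which shrinks as t falls.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current[:], current_score
    return best, best_score

# Hypothetical stand-in for R2: two useful descriptors, small size penalty.
USEFUL = (1, 4)

def toy_score(mask):
    return sum(mask[i] for i in USEFUL) - 0.1 * sum(mask)

mask, mask_score = anneal_descriptor_subset(10, toy_score)
```

Early on, the high temperature lets the search leave poor regions of model space; as the temperature drops, it settles into the narrower set of high-scoring models.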

Fig. 5. Representation of a genetic algorithm for the selection of descriptors for a QSAR model. The model is commonly referred to as a gene and is encoded with different descriptors. Two Parents creating two Children is a crossover of genetic information (descriptors). The genes of an individual can mutate, introducing random changes in the model. Crossover and mutation can occur independently of each other.

Through a careful selection of descriptors and model development, the resulting (Q)SARs may lead to predictions of reasonable accuracy. (Q)SAR models generally work according to... [Pg.79]

The successful application of the topological indices (some software for their calculation is listed in Table 5.3) to different problems in QSPR analyses depends significantly on the critical selection of descriptors and proper evaluation of the quantitative models. To this end, some knowledge regarding the limitations of topological indices is beneficial. [Pg.90]

There are dimensionality issues. Later we propose Mahalanobis distance (Section 4.5) as a good metric for diversity analysis. With p descriptors in the data set, this metric effectively, if not explicitly, computes a covariance matrix with on the order of $p^2/2$ parameters. In order to obtain accurate estimates of the elements of the covariance matrix, one rule of thumb is that at least five observations per parameter should be made. This suggests that a data set with n observations can only investigate approximately $\sqrt{2n/5}$ descriptors for the Mahalanobis distance computation. Thus, some method for subset selection of descriptors is needed. [Pg.80]
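The rule of thumb can be turned into a quick bound. The sketch below assumes the covariance matrix has roughly p^2/2 free parameters, so that five observations per parameter requires n >= 5 p^2 / 2, that is, p <= sqrt(2n/5). The function name `max_descriptors` and the `obs_per_parameter` argument are introduced here for illustration only.

```python
import math

def max_descriptors(n_observations, obs_per_parameter=5):
    """Largest p with obs_per_parameter * p**2 / 2 <= n_observations."""
    # floor(sqrt(2n / k)), computed with integer arithmetic only.
    return math.isqrt(2 * n_observations // obs_per_parameter)

limit_small = max_descriptors(115)   # a data set of 115 compounds
limit_large = max_descriptors(1000)  # a data set of 1000 compounds
```

Under this reading, even a data set of 1000 compounds supports only about 20 descriptors, which is why a subset has to be chosen.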

Structure Evaluating) was first developed before the MultiCASE system. It uses the same technology but differs in some ways. The major algorithmic difference in MultiCASE is the use of hierarchy in the selection of descriptors, leading to the concept of biophores and modulators. Another important difference is that only MultiCASE allows new internal proprietary data to be used to create new databases. [Pg.811]

Molecule encoding. Several molecular attributes, such as predicted and measured properties and structural descriptors, span a high-dimensional feature space. The selection of descriptors is mainly driven by the experience of the scientists, project-specific considerations, and existing knowledge about putative structure-activity relationships. [Pg.358]

Most, if not all, QSAR methods require selection of relevant or informative descriptors before modeling is actually performed. This is necessary because the method could otherwise be more susceptible to the effects of noise. The a priori selection of descriptors, however, carries with it the additional risk of selection bias [73], when the descriptors are selected before the dataset is divided into the training and test sets (Figure 6.6A). Because of selection bias, both external validation and cross validation could significantly overstate pre-... [Pg.164]
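One common way to avoid this bias is to repeat the descriptor selection inside every cross-validation fold, using only that fold's training samples. The skeleton below is a sketch of that control flow, not a full QSAR workflow: `cross_validate`, the interleaved fold assignment, and the `select` and `fit_and_score` callbacks are hypothetical names introduced here, and the two callbacks in the usage example are dummies that only record what they were given.

```python
def cross_validate(n_samples, n_folds, select, fit_and_score):
    """Cross-validation with descriptor selection nested inside each fold."""
    scores = []
    for fold in range(n_folds):
        test_idx = [i for i in range(n_samples) if i % n_folds == fold]
        train_idx = [i for i in range(n_samples) if i % n_folds != fold]
        # Selection sees the training samples only, never the held-out ones.
        chosen = select(train_idx)
        scores.append(fit_and_score(chosen, train_idx, test_idx))
    return sum(scores) / len(scores)

# Usage with dummy callbacks that record which samples selection saw.
seen_by_selection = []

def record_select(train_idx):
    seen_by_selection.append(set(train_idx))
    return [0]  # pretend descriptor 0 was selected

def dummy_score(chosen, train_idx, test_idx):
    return 1.0

mean_score = cross_validate(10, 5, record_select, dummy_score)
```

Selecting descriptors once on the full data set and then cross-validating would let information from the held-out samples influence the choice of descriptors, which is the bias described above.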

Hodes, L. (1976) Selection of descriptors according to discrimination and redundancy. Application to chemical structure searching. J. Chem. Inf. Comput. Sci., 16, 88-93. [Pg.1067]

Okey, R.W. and Martis, M.C. (1999) Molecular level studies on the origin of toxicity: identification of key variables and selection of descriptors. Chemosphere, 38, 1419-1427. [Pg.1133]

Several other descriptors based on the topology of molecules were developed and have been successfully applied. Reviews on descriptor technology can be found in [19, 20, 22, 30]. The following discussion focuses on three recently developed descriptors used in the application study in section 3. The selection of descriptors was based on availability and speed of the software. [Pg.578]

Selection of descriptors for substructure search can be done on a statistical basis. The aim is to select a set in which the descriptors are independent of each other and roughly equifrequent in distribution. Much of the statistical analysis of fragment distributions was done by Lynch et al. in the 1970s (for a summary, see Ref. 86). The... [Pg.529]

Big Chemical Encyclopedia
