Optimization descriptor

Use cross-validation to select the optimal descriptors without compound 1. [Pg.492]

Keywords: Optimal descriptor, QSPR, SMILES, Solubility, Fullerene C60... [Pg.337]

In this chapter, the genesis of SMILES-based descriptors (as well as perspectives on the use of these characteristics for QSPR/QSAR analyses) is discussed. We concluded that the SMILES-based optimal descriptors are in fact derivatives of the graph-based optimal descriptors. The SMILES-based descriptors are calculated with a scheme similar to the well-known additive scheme (Zinkevich et al., 2004), but instead of contributions for the molecular fragments (chemical elements, different kinds of cycles, covalent bonds, etc.), contributions for the SMILES fragments (c, C, n, N, Cl, Br, =, etc.) are used. [Pg.338]
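The additive scheme described above can be sketched in a few lines of Python. This is a minimal illustration only: the weight values, the tokenizer, and the function names are assumptions made for the example and are not taken from the cited chapter. Real correlation weights would be fitted to experimental data.

```python
import re

# Hypothetical correlation weights for SMILES fragments. The numbers are
# illustrative placeholders, not fitted weights from any published model.
CORRELATION_WEIGHTS = {
    "C": 0.50, "c": 0.75, "N": -0.25, "n": 0.10,
    "O": -0.40, "Cl": 1.20, "Br": 1.50, "=": 0.30,
    "(": 0.05, ")": 0.05, "1": 0.15,
}

# Two-letter element symbols come first so "Cl" is not split into "C" + "l".
TOKEN_PATTERN = re.compile(r"Cl|Br|.")


def smiles_fragments(smiles):
    """Split a SMILES string into the fragments that receive a weight."""
    return TOKEN_PATTERN.findall(smiles)


def additive_descriptor(smiles, weights=CORRELATION_WEIGHTS):
    """Sum the weights of all fragments; unknown fragments contribute zero."""
    return sum(weights.get(tok, 0.0) for tok in smiles_fragments(smiles))


if __name__ == "__main__":
    for s in ("CC=O", "c1ccccc1Cl"):
        print(s, smiles_fragments(s), additive_descriptor(s))
```

The simplified tokenizer handles only single characters plus Cl and Br; bracket atoms and other multi-character SMILES tokens would need a fuller parser.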

Flexible optimal descriptors have been defined as specific modifications of the adjacency matrix that use nonzero diagonal elements (Randic and Basak, 1999, 2001; Randic and Pompe, 2001a, b). These nonzero matrix elements change the vertex degrees and, consequently, the values of molecular descriptors. As a rule, these modifications are intended to change topological indices. The values of the diagonal elements must minimize the standard error of estimation of the predictive model (based on the flexible descriptor) for the property/activity of interest. [Pg.339]
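A minimal Python sketch of how nonzero diagonal elements alter a topological index follows. It assumes a connectivity-type index in which each vertex degree is the row sum of the modified adjacency matrix; this choice of index and the function name are assumptions for illustration, and the diagonal values are arbitrary rather than optimized.

```python
import math


def variable_connectivity_index(edges, diagonal):
    """Connectivity-type index on an adjacency matrix with a custom diagonal.

    edges    -- list of (i, j) vertex pairs of the hydrogen-suppressed graph
    diagonal -- diagonal entry for each vertex (0.0 gives the ordinary degrees)
    """
    # Row sum of the modified matrix: diagonal entry plus one per neighbour.
    degree = [float(d) for d in diagonal]
    for i, j in edges:
        degree[i] += 1.0
        degree[j] += 1.0
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in edges)


if __name__ == "__main__":
    butane = [(0, 1), (1, 2), (2, 3)]  # path graph with four carbon atoms
    print(variable_connectivity_index(butane, [0.0] * 4))  # unmodified matrix
    print(variable_connectivity_index(butane, [1.0] * 4))  # diagonal set to 1
```

In practice the diagonal values would be varied, for example by a grid or numerical search, until the regression model built on the index reaches the smallest standard error.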

The optimal descriptor used for the QSPR modeling of the C60 solubility is expressed as ... [Pg.341]

It should be noted that the optimal descriptors examined in the present study are topological characteristics. In other words, no information other than the topology of the solvent molecules and the experimental values of the fullerene C60 solubility has been used. [Pg.346]

SMILES-based optimal descriptors can be utilized as a tool for prediction of the fullerene C60 solubility. [Pg.348]

Toropov AA, Leszczynska D, Leszczynski J (2007a) QSPR study on solubility of fullerene C60 in organic solvents using optimal descriptors calculated with SMILES. Chem. Phys. Lett. 441, 119-122. [Pg.350]

A.P. Toropova et al., QSPR modeling mineral crystal lattice energy by optimal descriptors of the graph of atomic orbitals. Chem. Phys. Lett. 428, 183-186 (2006)... [Pg.215]

A.A. Toropov et al., QSAR modeling toxicity of nanosized oxides towards E. coli bacteria using the SMILES-based optimal descriptors. Chem. Biol. Drug Des., 2009 (submitted)... [Pg.216]

Reversible decoding is of great importance: once an SRC model is established, optimal values of the response can be chosen and the values of the model molecular descriptors calculated from the estimated SRC model. Then the possible molecular structures corresponding to the optimized descriptor values can be designed (and synthesized). This last operation is a troublesome task, as the model molecular descriptors are not simple or easily interpretable. [Pg.423]

Toropov, A.A., Rasulev, B.F. and Leszczynski, J. (2007) QSAR modeling of acute toxicity for nitrobenzene derivatives towards rats: comparative analysis by MLRA and optimal descriptors. QSAR Comb. Sci., 26, 686-693. [Pg.1184]

The observed vs. calculated plots have been obtained with the given correlation vector C of the optimal descriptor of Table 7.9 (unless otherwise stated). Due to the limits of our PC, the semi-random descriptors could not be searched over the two sets of random indices plus all MC indices plus the five experimental parameters and M. They have been calculated in a more compact way, which is explained in each property's paragraph. The super-descriptors have been obtained with a full search over the set of indices of the optimal descriptors for the 12 properties (see the corresponding paragraph). [Pg.131]

The search for an MC-exp-rn description with r1-r38 and then rd1-rd38 plus Tb, e, and the set of MC-Kp(p-odd)lfs indices finds exactly the already found five-index descriptor. A search among r1-r38 and rd1-rd38 plus the five indices of the optimal descriptor again finds this same descriptor. The descriptor is now chosen among r1-r38 plus rd1-rd38 plus and s. The full search lands on the... [Pg.142]

The attempt to model this property with a four-index descriptor searched among the sets of MC indices plus the r1-r38 and then rd1-rd38 random numbers gave no new results (the previous optimal descriptor of Table 7.9 was confirmed). Nevertheless, if the search is restricted to r1-r38 plus rd1-rd38 plus yj, and it is possible to obtain the rather good semi-random descriptor (here only y is configuration-dependent) of Tables 7.13 and 7.14, but its q is not as good as the q of the optimal descriptor. [Pg.150]

After an introduction to the classical approaches for predicting properties and activities (endpoints) of typical chemical substances, the essential parts of this chapter are devoted to (i) a description of optimal descriptors, which are translators of eclectic data into endpoint predictions; (ii) a discussion of the predictive potential of models based on the optimal descriptors; and (iii) a discussion of possible ways to improve the optimal descriptors as a tool for building QFPR/QFAR, which are analogs of the traditional QSPR/QSAR for nanomaterials. [Pg.354]

The general scheme of building up optimal descriptors with the hydrogen-suppressed graph based approach can be demonstrated using the example of ethyl isopropyl sulfide with the following numbering of atoms ... [Pg.357]

An optimal descriptor is a mathematical function of the molecular structure in which specific coefficients are used instead of the rigid invariants (e.g. vertex degree, topological distances, etc.). Thus, the traditional descriptor is in fact ... [Pg.358]

It should be noted that a comparison of the hydrogen-suppressed graph based and the hydrogen-filled graph based optimal descriptors has been carried out in [18]. It has been shown that the optimal descriptor based on the hydrogen-filled graph improves the accuracy of prediction of the normal boiling points of alkyl alcohols. [Pg.359]

The graph of atomic orbitals (GAO) [16-22] can also be used as a basis for the optimal descriptors. The basic idea of representing the molecular structure by considering the configuration of chemical elements is presented in Table 12.1. [Pg.359]

The scheme can be based not only on molecular graphs, but also on the Simplified Molecular Input Line Entry System (SMILES) [24-26]. A SMILES is a string of characters. These characters reflect the molecular structure, or at least some of the attributes of the molecular structure. Based on such assumptions, one can attempt to define a descriptor that is a mathematical function of the SMILES characters. Figure 12.2 contains the scheme for building up optimal descriptors using SMILES. [Pg.359]

Thus, there are four basic representations of the molecular structure which can be used as a basis to build up the optimal descriptors (Fig. 12.3): (i) hydrogen-suppressed graph; (ii) hydrogen-filled graph; (iii) GAO; and (iv) SMILES. These representations can also be combined in a hybrid version of the optimal descriptor, where molecular features extracted from, e.g., GAO and SMILES serve as a hybrid basis for QSPR/QSAR predictions [27-32]. [Pg.360]

Fig. 12.2 The definition of optimal descriptors using representation of the molecular structure by SMILES...

Fig. 12.3 The basic versions of the representations of the molecular structure which are used to build up optimal descriptors...

The above discussion provides a summary of QSPR/QSAR approaches applied to classical chemical compounds. However, the analysis of nanomaterials, which have gigantic and complex molecular architectures, leads to the necessity of defining new approaches for predictive modelling, because the representation of their molecular structure by means of a molecular graph and/or SMILES sometimes becomes very problematic (e.g. multi-walled carbon nanotubes [34], graphene [35]). In the first approximation, the optimal descriptor for such species should be a collector of all available data that can affect the physicochemical and/or biochemical behavior of nanomaterials. This concept is displayed in Fig. 12.6. [Pg.361]

Figure 12.7 shows the evolution of concepts related to building up optimal descriptors. Interestingly, at the last stage one can see a new quality: the predicted details of molecular structure have lost the advantage of being the only source of information. The essence of the difference between traditional descriptors and descriptors calculated with quasi-SMILES is depicted in Fig. 12.8. Here eclectic features (impacts) partially replace data on the molecular structure.
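One way to picture a quasi-SMILES is as a string in which each symbol encodes an eclectic feature (an experimental condition or other impact) rather than a structural fragment. The Python sketch below is purely illustrative: the feature names, their values, and the one-character codes are hypothetical and are not taken from the source.

```python
# Hypothetical coding table: each (feature, value) pair maps to one symbol.
# Neither the features nor the symbols come from the cited chapter.
FEATURE_CODES = {
    ("core", "TiO2"): "A",
    ("core", "ZnO"): "B",
    ("dose", "low"): "1",
    ("dose", "high"): "2",
    ("exposure", "dark"): "x",
    ("exposure", "light"): "y",
}


def quasi_smiles(conditions):
    """Encode an ordered list of (feature, value) pairs as a code string."""
    return "".join(FEATURE_CODES[pair] for pair in conditions)


if __name__ == "__main__":
    sample = [("core", "ZnO"), ("dose", "high"), ("exposure", "dark")]
    print(quasi_smiles(sample))
```

Once the conditions are encoded this way, a descriptor can be computed from the code string with the same additive weighting used for ordinary SMILES characters.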

Fig. 12.5 The general scheme of the Monte Carlo optimization used as the basis of calculation of optimal descriptors. The row "Correlation weight" contains graphical images of various features (extracted from graph or SMILES) characterized by positive values of the correlation weights (indicated by white color) or by negative values of the correlation weights (indicated by black color). Blocked (rare) features have correlation weights which are fixed to be equal to zero (indicated by grey). The R(X,Y) is the correlation coefficient between descriptor and endpoint...
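The caption above can be illustrated with a small Python sketch of a Monte Carlo search for correlation weights. It is a simplified stand-in, not the procedure of the cited chapter: the acceptance rule (keep a random change only if R(X,Y) increases), the rarity threshold, the step size, and the toy data are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def pearson_r(xs, ys):
    """Correlation coefficient R(X,Y); returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def descriptor(features, weights):
    """Additive descriptor: sum of correlation weights of a molecule's features."""
    return sum(weights[f] for f in features)


def optimize_weights(molecules, endpoints, rare_threshold=2,
                     steps=2000, step_size=0.5, seed=0):
    rng = random.Random(seed)
    # Count in how many molecules each feature occurs.
    counts = Counter(f for feats in molecules for f in set(feats))
    # Rare features are blocked: their weight stays fixed at zero.
    weights = {f: 0.0 for f in counts}
    active = sorted(f for f, c in counts.items() if c >= rare_threshold)
    if not active:
        return weights, 0.0
    for f in active:
        weights[f] = rng.uniform(-1.0, 1.0)

    def score(w):
        return pearson_r([descriptor(m, w) for m in molecules], endpoints)

    best = score(weights)
    for _ in range(steps):
        f = rng.choice(active)
        old = weights[f]
        weights[f] = old + rng.uniform(-step_size, step_size)
        new = score(weights)
        if new > best:
            best = new          # keep the change
        else:
            weights[f] = old    # revert the change
    return weights, best


# Toy data invented for the example: feature lists and endpoint values.
TOY_MOLECULES = [["C", "C", "O"], ["C", "C", "C"], ["C", "O", "O"],
                 ["C", "C", "C", "C"], ["C", "N"]]
TOY_ENDPOINTS = [1.0, 3.0, -1.0, 4.0, 0.5]

if __name__ == "__main__":
    w, r = optimize_weights(TOY_MOLECULES, TOY_ENDPOINTS)
    print(w, r)
```

In the toy data, the feature "N" occurs in only one molecule, so it is treated as rare and keeps a zero weight, which corresponds to the grey (blocked) entries in the figure.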
