Examples of SE and DSE Analysis

A key question is as follows: can SE and DSE, as an information-theoretic approach to descriptor comparison and selection, be applied to accurately classify compounds or to model physicochemical properties? To answer this question, two conceptually different applications of SE and DSE analysis are discussed here and related to other studies. The first application explores systematic differences between compound sets from synthetic and natural sources. The second addresses rational descriptor selection for predicting the aqueous solubility of synthetic compounds. For these purposes, SE or DSE analysis was carried out, and in both cases the selected descriptors were used to build binary QSAR-like classification models. [Pg.280]

The descriptors with the highest ES values reflect some known differences between synthetic and natural molecules, for example, the degree of saturation or aromatic character. It is also interesting to note that the descriptor with the highest ES value, a_ICM, is itself calculated using entropic principles: it accounts for the entropy of the distribution of the elemental composition of the compound. [Pg.281]
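To make the idea behind a_ICM concrete, a minimal sketch computes the Shannon entropy of a compound's element distribution, p_i = n_i / n, from a flat molecular formula. The formula parser, the function names `element_counts` and `composition_entropy`, and the use of log base 2 are assumptions for illustration; the value is not guaranteed to match any software package's a_ICM output.

```python
import math
import re
from collections import Counter


def element_counts(formula: str) -> Counter:
    """Parse a flat molecular formula such as 'C6H6O' into element counts.

    Assumption: no parentheses, charges, or isotope labels are present.
    """
    counts = Counter()
    for symbol, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(num) if num else 1
    return counts


def composition_entropy(formula: str) -> float:
    """Shannon entropy (bits) of the element distribution p_i = n_i / n."""
    counts = element_counts(formula)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For benzene (C6H6) carbon and hydrogen are equally frequent, so the entropy is exactly 1 bit; for water (H2O) it is about 0.918 bits, and a formula containing a single element gives 0.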

Based on this SE and ES analysis, four descriptor sets were tested in binary QSAR models: (1) 7 descriptors with intermediate SE values in both databases, (2) 11 descriptors with low SE values in both databases, (3) 8 descriptors with high SE values in both databases, and (4) the 8 descriptors with the highest ES values in Table 1. [Pg.281]
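The excerpt does not describe how the binary QSAR models were built. Binary QSAR is based on Bayesian statistics; as a simplified, hypothetical stand-in (not the published binary QSAR algorithm), the sketch below trains a two-class histogram naive Bayes classifier on one descriptor subset. The class name `HistogramNaiveBayes`, the bin count, and the Laplace smoothing parameter `alpha` are assumptions.

```python
import math
from typing import List, Sequence


def bin_index(x: float, lo: float, hi: float, n_bins: int) -> int:
    """Map x to one of n_bins equal-width bins over [lo, hi], clamping out-of-range values."""
    if hi == lo:
        return 0
    i = int((x - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)


class HistogramNaiveBayes:
    """Two-class classifier that treats descriptors as independent (hypothetical stand-in)."""

    def __init__(self, n_bins: int = 10, alpha: float = 1.0):
        self.n_bins = n_bins
        self.alpha = alpha  # Laplace smoothing so empty bins do not give log(0)

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "HistogramNaiveBayes":
        n_desc = len(X[0])
        # Shared bin boundaries per descriptor, taken from all training compounds.
        self.ranges = [(min(r[j] for r in X), max(r[j] for r in X)) for j in range(n_desc)]
        self.log_prior = {}
        self.log_lik = {}
        for c in (0, 1):  # assumes both classes occur in y
            rows = [r for r, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            tables = []
            for j, (lo, hi) in enumerate(self.ranges):
                counts = [0] * self.n_bins
                for r in rows:
                    counts[bin_index(r[j], lo, hi, self.n_bins)] += 1
                total = len(rows) + self.alpha * self.n_bins
                tables.append([math.log((k + self.alpha) / total) for k in counts])
            self.log_lik[c] = tables
        return self

    def predict(self, x: Sequence[float]) -> int:
        """Return the class with the larger log posterior (up to a shared constant)."""
        scores = {}
        for c in (0, 1):
            s = self.log_prior[c]
            for j, (lo, hi) in enumerate(self.ranges):
                s += self.log_lik[c][j][bin_index(x[j], lo, hi, self.n_bins)]
            scores[c] = s
        return max(scores, key=scores.get)
```

Training one such model per descriptor set on the same labeled compounds (for example, natural = 1, synthetic = 0) and comparing accuracy on held-out compounds would mirror, in simplified form, the comparison of the four sets.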

Two conclusions can be drawn from these results. First, it is feasible to use entropy-based information theory to select fewer than 10 chemical descriptors that systematically distinguish between compounds from different sources. Second, descriptors selected to distinguish between compounds should have high information content that supports separating the compounds of the two datasets. The power of the entropic separation revealed in this analysis gave rise to the development of DSE and, ultimately, the SE-DSE metric, as described earlier. [Pg.283]
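The excerpt does not spell out the formulas. A commonly cited formulation, given here as an assumption rather than quoted from this chapter, computes SE from the relative bin frequencies of a descriptor histogram, where c_i is the number of compounds whose value falls into bin i of N bins, and defines DSE for two datasets A and B from their individual SE values and the SE of the pooled set AB, all computed on the same bin boundaries:

```latex
\mathrm{SE} = -\sum_{i=1}^{N} p_i \log_2 p_i, \qquad p_i = \frac{c_i}{\sum_{j=1}^{N} c_j}

\mathrm{DSE} = \mathrm{SE}_{AB} - \frac{\mathrm{SE}_A + \mathrm{SE}_B}{2}
```

Under this formulation, identical distributions give DSE = 0, while two equally sized sets that occupy entirely different bins give DSE = 1 bit, because pooling them adds exactly one bit of entropy. For equally sized sets, this DSE equals the Jensen-Shannon divergence of the two histograms.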

A second example applies DSE analysis to modeling a physicochemical property, namely the aqueous solubility of compounds. Aqueous solubility is a physicochemical property that can be addressed at the level of structurally derived chemical descriptors. Because the aqueous solubility of many compounds is known, an accurate and sufficiently large dataset can be assembled for building and evaluating predictive models. In addition, solubility problems remain a significant issue for lead identification and optimization in pharmaceutical research. [Pg.283]
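A minimal sketch of DSE-based descriptor ranking for a binary solubility model, under stated assumptions: compounds are split into soluble and insoluble classes at a hypothetical log S cutoff (-4.0 here, an illustrative value not taken from the study), and DSE is computed as the SE of the pooled classes minus the mean of the two class SE values, on shared equal-width bins. The function names `dse` and `rank_by_dse` are hypothetical. Descriptors whose value distributions differ most between the classes rank highest.

```python
import math
from typing import Dict, List, Sequence, Tuple


def shannon_entropy(counts: Sequence[int]) -> float:
    """SE in bits over the non-empty bins of a histogram."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def histogram(values: Sequence[float], lo: float, hi: float, n_bins: int) -> List[int]:
    """Equal-width histogram on fixed boundaries [lo, hi] shared by all sets."""
    counts = [0] * n_bins
    for v in values:
        i = 0 if hi == lo else int((v - lo) / (hi - lo) * n_bins)
        counts[min(max(i, 0), n_bins - 1)] += 1
    return counts


def dse(a: Sequence[float], b: Sequence[float], n_bins: int = 10) -> float:
    """SE(A+B) - (SE(A) + SE(B)) / 2 using common bin boundaries."""
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    se_a = shannon_entropy(histogram(a, lo, hi, n_bins))
    se_b = shannon_entropy(histogram(b, lo, hi, n_bins))
    se_ab = shannon_entropy(histogram(list(a) + list(b), lo, hi, n_bins))
    return se_ab - (se_a + se_b) / 2


def rank_by_dse(descriptors: Dict[str, Sequence[float]], log_s: Sequence[float],
                cutoff: float = -4.0, n_bins: int = 10) -> List[Tuple[str, float]]:
    """Rank descriptors by DSE between soluble (log S >= cutoff) and insoluble compounds."""
    soluble = [i for i, s in enumerate(log_s) if s >= cutoff]
    insoluble = [i for i, s in enumerate(log_s) if s < cutoff]
    scores = []
    for name, vals in descriptors.items():
        a = [vals[i] for i in soluble]
        b = [vals[i] for i in insoluble]
        scores.append((name, dse(a, b, n_bins)))
    return sorted(scores, key=lambda t: t[1], reverse=True)
```

The top-ranked descriptors would then be candidates for a binary classification model; how many to keep is a separate choice the sketch does not make.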

Below, we describe the SE formalism in detail and explain how it can be used to estimate chemical information content from histogram representations of feature value distributions. Examples from our work and studies by others will be used to illustrate key aspects of chemical information content analysis. Although we focus on the Shannon entropy concept, other measures of information content will also be discussed, albeit briefly. We will also explain why it has been useful to extend the Shannon entropy concept by introducing differential Shannon entropy (DSE) to facilitate large-scale analysis and comparison of chemical features. The DSE formalism has ultimately led to the introduction of the SE-DSE metric. [Pg.265]
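As a minimal sketch of estimating information content from a histogram, the code below bins a descriptor's values into N equal-width bins over their observed range and computes SE = -sum p_i log2 p_i. Equal-width binning, the function names, and the scaled variant (SE divided by log2 N, its maximum possible value for N bins) are assumptions for illustration.

```python
import math
from typing import List, Sequence


def histogram_counts(values: Sequence[float], n_bins: int) -> List[int]:
    """Equal-width histogram over the observed [min, max] range of the values."""
    lo, hi = min(values), max(values)
    counts = [0] * n_bins
    for v in values:
        i = 0 if hi == lo else int((v - lo) / (hi - lo) * n_bins)
        counts[min(i, n_bins - 1)] += 1  # the maximum value falls into the last bin
    return counts


def shannon_entropy(counts: Sequence[int]) -> float:
    """SE = -sum p_i log2 p_i over non-empty bins, with p_i = c_i / total."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def scaled_entropy(counts: Sequence[int]) -> float:
    """SE divided by log2(N), its maximum for N bins, giving a value in [0, 1]."""
    return shannon_entropy(counts) / math.log2(len(counts))
```

A descriptor whose values all fall into one bin carries no information (SE = 0), while values spread evenly over N bins reach the maximum of log2 N bits; four values spread over four bins, for example, give SE = 2 bits and a scaled value of 1.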
