Databases and Chemical Space

The advent of experimental techniques such as combinatorial and parallel chemical synthesis and high-throughput screening has enabled the production of massive amounts of data. The resulting compound databases have played a key role in drug design (Miller, 2002) and in other research areas such as agrochemistry and food chemistry. Current computational [Pg.37]

Karina Martinez-Mayorga and Jose L. Medina-Franco [Pg.38]

The Distributed Structure-Searchable Toxicity (DSSTox) Database Network is a public database containing more than 1000 molecules annotated with toxicity data. The database can be searched online. The chemical structures can also be downloaded from the web site for analysis (vide infra). [Pg.38]

TABLE 2.2 Examples of large molecular databases used in research [Pg.38]

DSSTox (Distributed Structure- http://www.epa.gov/ncct/dsstox/ [Pg.38]

Contents
I. Introduction 34
II. Molecular Descriptors and Physicochemical Properties 36
III. Molecular Databases and Chemical Space 37
IV. Chemoinformatics in Food Chemistry 40
V. Examples of Molecular Similarity, Pharmacophore Modeling, Molecular Docking, and QSAR in Food or Food-Related Components 43
   A. Molecular similarity 43
   B. Pharmacophore model 47
   C. QSAR and QSPR 48
   D. Molecular docking 49
VI. Concluding Remarks and Perspectives 52
Acknowledgments 53
References 53... [Pg.33]

Abstract The aim of the present chapter is to present the current research and potential applications of chemoinformatics tools in food chemistry. First, the importance and variety of molecular descriptors and physicochemical properties are delineated, and then a survey and chemical space analysis of representative databases, with emphasis on food-related ones, is presented. A brief description of methods commonly used in molecular design is then given, followed by examples in food chemistry. Such methods include similarity searching, pharmacophore modeling, quantitative... [Pg.33]

The E-state indices may define chemical spaces that are relevant in similarity/diversity search in chemical databases. This similarity search is based on atom-type E-state indices computed for the query molecule [55]. Each E-state index is converted to a z score, $z_i = (x_i - \mu_i)/\sigma_i$, where $x_i$ is the ith E-state atomic index, $\mu_i$ is its mean, and $\sigma_i$ is its standard deviation in the entire database. The similarity was computed with the Euclidean distance and with the cosine index, and the database used was the Pomona MedChem database, which contains 21000 chemicals. Tests performed for the anti-inflammatory drug prednisone and the antimalarial drug mefloquine as query molecules demonstrated that the chemical space defined by E-state indices is efficient in identifying similar compounds from drug and drug-like databases. [Pg.103]
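The z-score standardization and the two similarity measures described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the cited study: the function names, the two-descriptor toy database, and the query vector are all hypothetical, and real E-state indices would have to come from a descriptor package.

```python
import math
from statistics import mean, pstdev


def column_stats(database):
    """Mean and standard deviation of each descriptor over the whole database."""
    columns = list(zip(*database))
    mus = [mean(col) for col in columns]
    # Guard against constant columns (standard deviation of zero).
    sigmas = [pstdev(col) or 1.0 for col in columns]
    return mus, sigmas


def to_z(vector, mus, sigmas):
    """Convert each descriptor value to a z score: (x - mean) / std."""
    return [(x - m) / s for x, m, s in zip(vector, mus, sigmas)]


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_index(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_euclidean(database, query):
    """Return database row indices, most similar (smallest distance) first."""
    mus, sigmas = column_stats(database)
    z_query = to_z(query, mus, sigmas)
    scored = [
        (euclidean_distance(z_query, to_z(row, mus, sigmas)), index)
        for index, row in enumerate(database)
    ]
    return [index for _, index in sorted(scored)]


# Hypothetical toy data: four compounds, two descriptor values each.
toy_database = [[1.0, 2.0], [1.2, 2.1], [5.0, 0.5], [3.0, 1.0]]
toy_query = [1.1, 2.0]
ranking = rank_by_euclidean(toy_database, toy_query)
```

Standardizing with database-wide statistics keeps descriptors with large numeric ranges from dominating the distance. The cosine index could be used in place of the distance by sorting in descending order instead.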

The full matrix nature of the BioPrint database also enables an analysis of targets in drug chemical space. In this approach, a target is characterized by a fingerprint of the activities of a fixed set of compounds (the drug and reference compound set) against... [Pg.41]

Fig. 1. Median partitioning and compound selection. In this schematic illustration, a two-dimensional chemical space is shown as an example. The axes represent the medians of two uncorrelated (and, therefore, orthogonal) descriptors and dots represent database compounds. In A, a compound database is divided into equal subpopulations in two steps and each resulting partition is characterized by a unique binary code (shared by molecules occupying this partition). In B, diversity-based compound selection is illustrated. From the center of each partition, a compound is selected to obtain a representative subset. By contrast, C illustrates activity-based compound selection. Here, a known active molecule (gray dot) is added to the source database prior to MP and compounds that ultimately occur in the same partition as this bait molecule are selected as candidates for testing. Finally, D illustrates the effects of descriptor correlation. In this case, the two applied descriptors are significantly correlated and the dashed line represents a diagonal of correlation that affects the compound distribution. As can be seen, descriptor correlation leads to over- and underpopulated partitions.
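The partitioning in panel A and the bait-based selection in panel C can be illustrated with a small Python sketch. It assumes each compound is a list of descriptor values and that a value above the descriptor median maps to bit 1; the function names and the toy data are hypothetical and are not taken from the source.

```python
from collections import defaultdict
from statistics import median


def median_partition(descriptors):
    """Group compound indices by binary code, one bit per descriptor.

    A bit is "1" when the compound's value lies above that descriptor's
    median over the whole set, and "0" otherwise.
    """
    medians = [median(column) for column in zip(*descriptors)]
    partitions = defaultdict(list)
    for index, row in enumerate(descriptors):
        code = "".join("1" if value > m else "0" for value, m in zip(row, medians))
        partitions[code].append(index)
    return dict(partitions)


def select_by_bait(descriptors, bait):
    """Add a known active (the bait), partition, and return its partition mates."""
    combined = descriptors + [bait]
    bait_index = len(combined) - 1
    for code, members in median_partition(combined).items():
        if bait_index in members:
            return code, [i for i in members if i != bait_index]
    return None, []


# Hypothetical toy data: six compounds in a two-descriptor space.
compounds = [
    [0.10, 0.20], [0.90, 0.80], [0.20, 0.90],
    [0.80, 0.10], [0.15, 0.25], [0.85, 0.75],
]
partitions = median_partition(compounds)
bait_code, candidates = select_by_bait(compounds, [0.95, 0.90])
```

Because the medians are recomputed after the bait is added, the partition boundaries can shift slightly, which is why the bait is inserted before partitioning and not afterwards.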

Once a database of candidate molecules has been prepared, it may be desirable to select a diverse set of molecules. Diversity algorithms are designed to select sets of molecules in such a way that the chemical space from which they have been extracted is sampled democratically [29]. Molecules are represented in this space using molecular descriptors and dissimilarity between them is quantified using metrics derived from the value of the descriptors. In terms of descriptors that have been used for fragment molecules,... [Pg.45]
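As one concrete illustration of such a diversity algorithm, the sketch below uses MaxMin selection, a common heuristic that repeatedly picks the compound farthest from everything already chosen. The passage does not say which algorithm it has in mind, so MaxMin is an assumption here, as are the Euclidean metric, the function names, and the toy descriptor values.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maxmin_select(descriptors, k, seed_index=0):
    """Pick up to k compound indices by greedy MaxMin dissimilarity selection."""
    selected = [seed_index]
    # Distance from every compound to its nearest already-selected compound.
    nearest = [euclidean_distance(row, descriptors[seed_index]) for row in descriptors]
    while len(selected) < min(k, len(descriptors)):
        candidate = max(range(len(descriptors)), key=lambda i: nearest[i])
        if nearest[candidate] == 0.0:
            break  # Only duplicates of selected compounds remain.
        selected.append(candidate)
        for i, row in enumerate(descriptors):
            d = euclidean_distance(row, descriptors[candidate])
            if d < nearest[i]:
                nearest[i] = d
    return selected


# Hypothetical toy data: a tight pair near the origin plus three distant points.
points = [[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]]
subset = maxmin_select(points, k=3)
```

With this toy data the near-duplicate at index 1 is skipped in favor of distant points, which is the behavior the word "democratically" suggests: crowded regions do not get extra representatives.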

The hits identified by screening a small percentage of the database can be followed up using 2D methods and/or 3D pharmacophore-based further exploration of the database to retrieve more actives in a sequential screening fashion. The goal is to minimize the number of compounds screened and maximize the number of actives retrieved at the end of the screening rounds such that the relevant chemical space is explored with an optimal use of resources. [Pg.204]
