PubChem Compound table

The third set of files from the PubChem repository describes chemical compounds. These are distributed as SDF files, and each compound is identified by a unique compound id (cid). Several properties are also associated with each compound. The table pubchem.compound is created using the sdf2sql utility described above. The compound table can then be used to locate compounds by searching any of its columns, for example,... [Pg.58]
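The sdf2sql utility itself is not reproduced in this section. As a rough illustration of what such a loader does, here is a minimal Python sketch using the standard sqlite3 module. It assumes that each PubChem compound record carries `PUBCHEM_COMPOUND_CID` and `PUBCHEM_MOLECULAR_FORMULA` data items. The table layout (`cid`, `molfile`, `formula`) and the helper names are hypothetical, and because SQLite has no schemas, the table is named `compound` rather than `pubchem.compound`.

```python
import sqlite3

def parse_record(lines):
    """Split one SDF record into its molfile block and a dict of data items."""
    # The connection table (molfile) ends at the 'M  END' line.
    end = next(i for i, line in enumerate(lines) if line.startswith("M  END"))
    molfile = "\n".join(lines[: end + 1])
    props, tag = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">") and "<" in line:
            # Data header such as '> <PUBCHEM_COMPOUND_CID>'
            start = line.index("<") + 1
            tag = line[start:line.index(">", start)]
            props[tag] = []
        elif tag is not None and line.strip():
            props[tag].append(line)  # value line(s) for the current tag
        else:
            tag = None  # a blank line closes the data item
    return molfile, {k: "\n".join(v) for k, v in props.items()}

def parse_sdf(text):
    """Return (molfile, properties) pairs; each record is terminated by '$$$$'."""
    records, lines = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(parse_record(lines))
            lines = []
        else:
            lines.append(line)
    return records

def load_compounds(conn, records):
    """Create a hypothetical compound table and insert one row per record."""
    conn.execute("CREATE TABLE IF NOT EXISTS compound ("
                 "cid INTEGER PRIMARY KEY, molfile TEXT, formula TEXT)")
    conn.executemany(
        "INSERT INTO compound (cid, molfile, formula) VALUES (?, ?, ?)",
        [(int(p["PUBCHEM_COMPOUND_CID"]), mol, p.get("PUBCHEM_MOLECULAR_FORMULA"))
         for mol, p in records])
    conn.commit()

# A made-up single-record SDF (one oxygen atom), for illustration only.
SAMPLE = """example
  made-up record

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <PUBCHEM_COMPOUND_CID>
1001

> <PUBCHEM_MOLECULAR_FORMULA>
H2O

$$$$
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_compounds(conn, parse_sdf(SAMPLE))
    # Locate compounds by searching one of the columns.
    print(conn.execute("SELECT cid FROM compound WHERE formula = ?", ("H2O",)).fetchall())
```

Storing each data item as its own column is what makes the later searches plain SQL `WHERE` clauses; a real loader would add a column per property of interest.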

From the examples in the previous section, it is clear how the substance id relates pubchem.substance to biological assay data and how substance data can be selected using the substance id. How can the compound table be used to select compound data for substances appearing in one of the biological assay data tables? In other words, how is the... [Pg.58]

There are several approaches to creating the relation between compound and substance. One is to create an integer column, say, pubchem.substance.cid, that contains only the primary compound id from the cid associations column. This column becomes a foreign key referencing the pubchem.compound.cid column. This forms a proper relation between the tables, but neglects the secondary cid associations. If those are of no interest, this approach is an excellent choice. [Pg.59]
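A minimal sketch of this single-foreign-key approach, run in an in-memory SQLite database from Python. The table and column names (`compound`, `substance`, `assay_result`, `outcome`) and all ids and formulas are hypothetical toy data; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE compound (cid INTEGER PRIMARY KEY, formula TEXT);
CREATE TABLE substance (
    sid INTEGER PRIMARY KEY,
    cid INTEGER REFERENCES compound (cid)  -- primary compound id only
);
CREATE TABLE assay_result (sid INTEGER REFERENCES substance (sid), outcome TEXT);
""")
conn.executemany("INSERT INTO compound VALUES (?, ?)", [(10, "C6H6"), (20, "H2O")])
conn.executemany("INSERT INTO substance VALUES (?, ?)", [(1, 10), (2, 20), (3, 10)])
conn.executemany("INSERT INTO assay_result VALUES (?, ?)", [(1, "active"), (2, "inactive")])

# Compound data for every substance that appears in the assay table.
rows = conn.execute("""
SELECT a.sid, c.cid, c.formula
FROM assay_result AS a
JOIN substance AS s ON s.sid = a.sid
JOIN compound AS c ON c.cid = s.cid
ORDER BY a.sid
""").fetchall()
print(rows)  # [(1, 10, 'C6H6'), (2, 20, 'H2O')]
```

Because each substance row holds at most one cid, the join returns at most one compound per substance, which is exactly why secondary associations are lost.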

The proper way to create a relation between pubchem.substance (through its substance id) and pubchem.compound.cid is to create a new table that acts as an intermediary. This is the typical approach to handling many-to-many relationships. This table must include a column for the compound id and a column for the substance id. There can be as many rows... [Pg.59]
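The intermediary table can be sketched the same way in SQLite from Python. The name `substance_compound`, the composite primary key, and the toy ids are assumptions for illustration; the point is that one substance may now map to several compounds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (cid INTEGER PRIMARY KEY, formula TEXT);
CREATE TABLE substance (sid INTEGER PRIMARY KEY);
-- Intermediary (association) table: one row per substance-compound pair.
CREATE TABLE substance_compound (
    sid INTEGER NOT NULL REFERENCES substance (sid),
    cid INTEGER NOT NULL REFERENCES compound (cid),
    PRIMARY KEY (sid, cid)
);
CREATE TABLE assay_result (sid INTEGER REFERENCES substance (sid), outcome TEXT);
INSERT INTO compound VALUES (10, 'C6H6'), (11, 'C7H8'), (20, 'H2O');
INSERT INTO substance VALUES (1), (2);
-- Substance 1 is associated with two compounds, substance 2 with one.
INSERT INTO substance_compound VALUES (1, 10), (1, 11), (2, 20);
INSERT INTO assay_result VALUES (1, 'active');
""")

# All compounds associated with substances tested in the assay.
rows = conn.execute("""
SELECT a.sid, c.cid, c.formula
FROM assay_result AS a
JOIN substance_compound AS sc ON sc.sid = a.sid
JOIN compound AS c ON c.cid = sc.cid
ORDER BY c.cid
""").fetchall()
print(rows)  # [(1, 10, 'C6H6'), (1, 11, 'C7H8')]
```

The composite primary key prevents the same substance-compound pair from being stored twice.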

Table 2.2 also lists three publicly available databases commonly used in drug research. PubChem is accessed through the National Library of Medicine (Austin et al., 2004) and contains chemical structure information and corresponding activity across a number of biological assays. The system links the compound information with biomedical literature and it is possible to perform web-based similarity searching. PubChem also enables one to download files with structures to perform chemoinformatic analysis off-line. [Pg.39]

These are some examples of the analyses that can be performed on the three representative datasets in Table 2.2. Similar analyses using these or other molecular representations can be conducted for other databases. In fact, a comparison of the collection of drugs analyzed here with natural products and compounds obtained from PubChem has been published elsewhere (Singh et al., 2009). [Pg.43]

Other properties, for example chemical names, are not unique: a compound can have many of them. These should be stored in a separate table with one row per value. For example, the PubChem entry for the SMILES C1(C(C(C(C(C1O)O)OP(=O)(O)O)O)O)O contains 10 synonyms, as shown in Table 13.1. Each of these should be entered as a separate row in a table of names, along with a column containing the compound id. A simple table of this type would be created using the following SQL. [Pg.158]
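The book's own SQL is not reproduced in this excerpt. A minimal sketch of such a one-row-per-name table, executed in SQLite from Python, might look like this. The table name `compound_name`, the cid 101, and the two names are illustrative assumptions, not the actual synonym list from Table 13.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (cid INTEGER PRIMARY KEY, smiles TEXT);
-- One row per name; a compound may have many names.
CREATE TABLE compound_name (
    cid  INTEGER NOT NULL REFERENCES compound (cid),
    name TEXT NOT NULL,
    PRIMARY KEY (cid, name)
);
CREATE INDEX compound_name_name ON compound_name (name);
""")
conn.execute("INSERT INTO compound VALUES (?, ?)",
             (101, "C1(C(C(C(C(C1O)O)OP(=O)(O)O)O)O)O"))
conn.executemany("INSERT INTO compound_name VALUES (?, ?)",
                 [(101, "inositol phosphate"), (101, "inositol monophosphate")])

# Look up the structure by any one of its names.
row = conn.execute("""
SELECT c.cid, c.smiles
FROM compound AS c
JOIN compound_name AS n ON n.cid = c.cid
WHERE n.name = ?
""", ("inositol monophosphate",)).fetchone()
print(row)
```

The index on `name` keeps name lookups fast as the table grows; for case-insensitive matching one could declare the column with `COLLATE NOCASE`.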

Today, I have turned my habit around. When I have a set of chemical structures or data files, my first task is to organize them in a relational database. After all, the tools I now use are designed to read and write tables in a database. Rather than creating folders to keep project files, I create a schema of tables with rows holding chemical structures and data imported from the files. For example, the PubChem project provides information on millions of compounds in the form of hundreds of chemical structure files and associated experimental data files. While PubChem provides excellent Web tools to search this data, for local use I developed a schema to hold the structures and data in related tables. One possible schema for this is shown in Chapter 6 of this book. [Pg.243]

Beyond a summary description, one would like to view, analyze, and display the actual bioassay data. PubChem provides an integrated suite of tools for this purpose, each presented as an individual tab. The bioactivity summary tool gives an at-a-glance overview of the bioassays tested for a list of substances or compounds. The structure-activity analysis tool is used to subset and analyze the substances or compounds tested in a set of bioassays. The data table tool displays the actual bioassay outcomes. [Pg.232]

In the following section, we present an exploratory investigation into the use of HR MS data to determine the correct molecular formula of measured compounds. Ten organic compounds, with masses ranging from 132 to 1218 Da, are listed in Table 8.20 together with their PubChem [218] Compound Identifier (CID), molecular formula, and nominal mass. The compounds were obtained from Sigma-Aldrich (St. Louis, MO, USA). All compound solutions were prepared in MeOH/H2O (1:1). [Pg.379]
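The nominal mass of a formula is the sum of the integer masses of the most abundant isotope of each element. The following is a small Python sketch of that arithmetic; it covers only the elements in the lookup table and only simple formulas without parentheses, hydrates, or charges, and the function name is hypothetical.

```python
import re

# Integer mass of the most abundant isotope of each element (subset only).
NOMINAL = {"H": 1, "C": 12, "N": 14, "O": 16, "F": 19, "P": 31,
           "S": 32, "Cl": 35, "Br": 79, "I": 127}

def nominal_mass(formula):
    """Nominal mass of a simple formula such as 'C6H13O9P'."""
    total = 0
    # Each match is (element symbol, optional count); a missing count means 1.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += NOMINAL[element] * (int(count) if count else 1)
    return total

print(nominal_mass("C6H13O9P"))  # 6*12 + 13*1 + 9*16 + 31 = 260
```

An element missing from the table raises a `KeyError`, which is safer than silently returning a wrong mass.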

Big Chemical Encyclopedia