PubChem Substance

D Domains PubChem BioAssay PubChem Compound PubChem Substance Gene LocusLink UniGene HomoloGene... [Pg.498]

PubChem is organized as three linked databases within the NCBI's Entrez information retrieval system. These are PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem also provides a fast chemical structure similarity search tool. More information about using each component database may be found using the links above. [Pg.206]

Only recently, initiatives have been started to create freely available data sources such as PubChem,36 ChEBI,37 DrugBank,38 and HMDB,39 to mention some of them. These public databases collect publicly available information on compounds, their structures, their physical formulations (e.g., PubChem Substance), their targets, and their effects on biological processes. However, these databases are far from covering the entire spectrum of chemical information that can be linked to biology and pharmacology. [Pg.128]

The U.S. National Institutes of Health PubChem project contains information on millions of chemical compounds.1 The data are divided into three main sections. PubChem Substance contains structures supplied by depositors. PubChem Compound contains unique structures with computed properties. PubChem BioAssay contains bioactivity assay results supplied by depositors. The data in these three sections are recorded independently, yet there are chemical relationships among these sections. For example, information available as a PubChem BioAssay is associated with a particular substance for which the data were collected. A substance may be a single compound or a mixture of several compounds. [Pg.53]

Notice the use of the Join keyword and the additional table name pubchem.substance in the From clause. This is necessary because data from this table is being selected. The additional columns selected are ext_datasource_name and substance.ext_datasource_regid in the Select clause. Any columns of interest in the substance table could be selected. Note that since there is a column named ext_datasource_regid in both tables, it is necessary to specify that the column substance.ext_datasource_regid is desired. Finally, the clause On nci_h23.sid = substance.substance_id indicates that these columns are related to each other and must be used in the Join. [Pg.57]
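The join described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the pubchem schema prefix is dropped, the column nci_h23.outcome and all row values are invented for the example, and the real tables contain many more columns.

```python
import sqlite3

# In-memory database standing in for the pubchem schema (schema prefix omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE substance (
    substance_id         INTEGER PRIMARY KEY,
    ext_datasource_name  TEXT,
    ext_datasource_regid TEXT
);
CREATE TABLE nci_h23 (
    sid                  INTEGER REFERENCES substance (substance_id),
    ext_datasource_regid TEXT,   -- same column name as in substance
    outcome              TEXT    -- hypothetical assay result column
);
-- Made-up rows, for illustration only.
INSERT INTO substance VALUES (1, 'DepositorA', 'A-001'), (2, 'DepositorB', 'B-042');
INSERT INTO nci_h23 VALUES (1, 'NSC-1', 'active'), (2, 'NSC-2', 'inactive');
""")

# ext_datasource_regid exists in both tables, so it must be qualified;
# ext_datasource_name exists only in substance and needs no qualifier.
rows = conn.execute("""
SELECT nci_h23.sid, nci_h23.outcome,
       ext_datasource_name, substance.ext_datasource_regid
FROM nci_h23
JOIN substance ON nci_h23.sid = substance.substance_id
ORDER BY nci_h23.sid
""").fetchall()

for row in rows:
    print(row)
```

Leaving the table qualifier off ext_datasource_regid in this query would make SQLite report an ambiguous column name, which is the situation the paragraph above warns about.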

Figure 6.3 Entity-relationship diagram for pubchem.substance and pubchem.nci_h23 tables.

Figure 6.3 shows the relationship between the pubchem.nci_h23 and pubchem.substance tables in the form of an entity-relationship diagram (ERD). The primary key substance.substance_id and the foreign key nci_h23.sid are indicated and imply their use in an On clause when these two tables are joined. [Pg.58]

From the examples in the previous section, it is clear how the substance id relates pubchem.substance to biological assay data and how substance data can be selected using the substance id. How can the compound table be used to select compound data for substances appearing in one of the biological assay data tables? In other words, how is the... [Pg.58]

The column pubchem.substance.cid_associations is taken directly from the SDF files supplied by PubChem. It has all the necessary information, but it is not in a proper form for a relation between pubchem.substance and pubchem.compound. This is because too much information has been crammed into this column. For example, the cid_associations for substance id 22 contains the data "449653 1 449655 2 6540406 2". This means that there are three compound ids associated with this substance id. In other words, there is a many-to-many relationship between compounds and substances. While it would be possible to parse the cid_associations column whenever the compound id is needed, it is better to have a clear relationship between substance ids and compound ids, because this enforces and preserves the relational integrity (or referential integrity) between these data. It also makes selecting data from all three data sources quicker and easier. [Pg.59]
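To make the problem with the packed column concrete, here is a minimal parsing sketch. It assumes the value is a whitespace-separated sequence of alternating compound id and association-type numbers, for example "449653 1 449655 2 6540406 2"; the function name and that reading of the format are assumptions, not something the text above spells out.

```python
def parse_cid_associations(text):
    """Split a packed cid_associations string into (cid, type) pairs.

    Assumes alternating "cid type" tokens separated by whitespace.
    """
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (cid/type pairs)")
    return [(int(tokens[i]), int(tokens[i + 1])) for i in range(0, len(tokens), 2)]


# Value assumed for substance id 22.
pairs = parse_cid_associations("449653 1 449655 2 6540406 2")
cids = [cid for cid, _ in pairs]
print(pairs)
print(cids)
```

Having to run a parser like this on every query is the cost that a proper relation between substance ids and compound ids avoids.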

There are several approaches to creating the relation between compound and substance. One is to create an integer column, say, pubchem.substance.cid, that would contain only the primary compound id from the column cid_associations. This column becomes a foreign key related to the pubchem.compound.cid column. This would form a proper relation between the tables, but would neglect the secondary cid associations. If those are of no interest, this approach is an excellent choice. [Pg.59]

The proper way to create a relation between pubchem.substance.substance_id and pubchem.compound.cid is to create a new table that acts as an intermediary. This is a typical approach to handling many-to-many relationships. This table must include a column for the compound id and a column for the substance id. There can be as many rows... [Pg.59]
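An intermediary table of this kind might look like the following sqlite3 sketch. The table name substance_compound, the cid_type column, and the sample rows are hypothetical, and the schema prefix is again omitted; the point is only to show one row per substance/compound pair with foreign keys on both sides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE substance (substance_id INTEGER PRIMARY KEY);
CREATE TABLE compound  (cid INTEGER PRIMARY KEY);

-- Hypothetical intermediary table: one row per (substance, compound) association.
CREATE TABLE substance_compound (
    substance_id INTEGER NOT NULL REFERENCES substance (substance_id),
    cid          INTEGER NOT NULL REFERENCES compound (cid),
    cid_type     INTEGER,
    PRIMARY KEY (substance_id, cid)
);

INSERT INTO substance VALUES (22);
INSERT INTO compound VALUES (449653), (449655), (6540406);
INSERT INTO substance_compound VALUES (22, 449653, 1), (22, 449655, 2), (22, 6540406, 2);
""")

# All compounds associated with substance 22, reached through the intermediary table.
link_rows = conn.execute("""
SELECT compound.cid
FROM substance
JOIN substance_compound ON substance_compound.substance_id = substance.substance_id
JOIN compound ON compound.cid = substance_compound.cid
WHERE substance.substance_id = 22
ORDER BY compound.cid
""").fetchall()
print(link_rows)

# Referential integrity: an association to an unknown compound id is refused.
try:
    conn.execute("INSERT INTO substance_compound VALUES (22, 999, 2)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Because each association is its own row, the same compound can also be linked to other substances without changing either parent table.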

PubChem is organized as three distinct databases: PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem Substance contains descriptions of chemical samples, provided by data depositors, and links to information on their biological activities. The description includes PubChem Compound identifiers in cases where the chemical structures of compounds in the sample are known. Links providing information on biological activity include those to PubMed [8] citations, protein 3-D structures [9], links to contributor websites, and to biological testing results available in PubChem BioAssay. [Pg.218]

PubChem Compound contains the unique chemical structure content of PubChem Substance. Compounds may be searched by computed chemical properties and are pre-clustered by structure comparison into identity and similarity groups. Whenever possible, compounds are linked via PubChem Substance to information on their biological activities. [Pg.219]

Contributed substance descriptions that do not include a chemical structure or that fail the PubChem chemical structure standardization procedure do not enter or have links to the PubChem Compound database. Prior to analysis or any modification of chemical structure input, care is taken to preserve the original structure description. The result of the normalization methodology employed is a uniform representation of the chemical structure content contained within the PubChem Substance database. [Pg.220]

The fundamental relationships between the three PubChem databases are straightforward. PubChem Substance identifiers (SIDs) relate to PubChem Compound... [Pg.220]

Many of the PubChem tools perform such transformations of the ID space implicitly, such as assay tools that work with sets of CIDs, or Entrez searches of CID chemical property indices in PubChem Substance, like IUPAC name, that actually come from standardized compounds. It can be important to understand these implicit relationships when navigating through PubChem, especially when searching and analyzing records across multiple databases. [Pg.221]

An index is a piece of information tied to individual records and matched directly to a user's query in an Entrez search. Each index consists of text, numeric, or date values. Each Entrez database has its own set of indices. These indices are named according to the type of information they contain, for example, the indices "IUPACName" or "MolecularWeight" in PubChem Substance and Compound. Some indices may have multiple values for each record. For example, the index "Synonym" corresponds to chemical or common names of a substance, any number of which may be supplied by the depositor. [Pg.223]
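As a rough illustration of restricting a query to one index, the sketch below builds a search URL without sending it. It assumes the NCBI E-utilities esearch endpoint, the Entrez database name pcsubstance for PubChem Substance, and the usual Entrez value[IndexName] term syntax; the helper name and the search value "aspirin" are made up for the example.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, value, index):
    """Return an ESearch URL that matches `value` against a single Entrez index."""
    term = f'"{value}"[{index}]'  # e.g. "aspirin"[Synonym]
    return EUTILS_ESEARCH + "?" + urlencode({"db": db, "term": term})


# Hypothetical query: substances with "aspirin" among their depositor-supplied synonyms.
url = build_esearch_url("pcsubstance", "aspirin", "Synonym")
print(url)
```

The same helper could target a different index, such as IUPACName, by changing only the last argument, provided that index exists in the chosen database.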

The Entrez DocSum reports serve a limited quantity of data to help navigate and subset records. Detailed information is provided by PubChem summary pages. Each record in an Entrez DocSum contains a link that leads to the more detailed information on a specific record. Typically these pages are reached through Entrez, but one can also navigate to them directly. For PubChem Substance SIDs, the summary page URL is of the form ... [Pg.227]

After working with PubChem to achieve a particular subset for a query of interest, it is often important for a user to export resulting substance or compound records from PubChem for further local analysis. The structure download tool prepares PubChem Substance or PubChem Compound records as an export from Entrez in a number of formats. While all PubChem data is available on the PubChem FTP site (via the URL ftp://ftp.ncbi.nlm.nih.gov/pubchem/), being able to interact with a user-selected subset is substantially more convenient. The structure download tool may be directly accessed using the URL ... [Pg.231]

PubChem. A chemical database is a database specifically designed to store chemical information. Chemical structures are traditionally represented using lines indicating chemical bonds between atoms and drawn on paper (2D structural formulae). Various chemical databases are freely available on the Internet. Large chemical databases are expected to handle the storage and searching of information on millions of molecules. PubChem is one of the free chemical databases and is developed by the National Center for Biotechnology Information (NCBI). More than 24 million compound structures and descriptive datasets can be freely downloaded from PubChem. PubChem is a user-friendly database; we can search for compounds by name or keyword, and we can also search by chemical properties. We can download the compounds in SDF format, which is the standard format for various structure viewers. PubChem has three components, namely PubChem Compounds, PubChem Substances, and PubChem BioAssay, described below. [Pg.77]

PubChem Compounds. The PubChem Compounds Database contains validated chemical depiction information provided to describe substances in PubChem Substance. Structures stored within PubChem Compounds are pre-clustered and cross-referenced by identity and similarity groups. We can search unique chemical structures using names, synonyms, or keywords. Links to available biological property information are also provided for each compound. [Pg.77]

PubChem Substances. The PubChem Substance database contains chemical structures, synonyms, registration IDs, descriptions, related URLs, and database cross-reference links to PubMed, protein 3D structures, and biological screening results. We can search deposited chemical substance records using names, synonyms, or keywords. Links are also provided to biological property information and depositor websites. [Pg.77]

PubChem BioAssay. The PubChem BioAssay Database contains bioactivity screens of chemical substances described in PubChem Substance. It provides searchable descriptions of each BioAssay, including descriptions of the conditions and readouts. We can search bioassay records using terms from the bioassay... [Pg.77]

Essential to this pilot phase was the creation of a chemical library, the Molecular Libraries Small Molecule Repository, or MLSMR. This library, which is accessible by querying MLSMR in PubChem Substance, was subjected to MLSCN bioactivity screening between 2004 and 2008 for a total of 691 assays that were uploaded to PubChem. These included 242 primary, 402 confirmatory, and 8 summary assays, which covered 171 targets and 29 phenotypic screens. These numbers continue to grow as pilot phase projects are completed. [Pg.14]
