PubChem Compound substance

Find chemical structures of small organic molecules and information on their biological activities in the three new Entrez PubChem databases: Compound, Substance, and BioAssay. Try "PubChem Structure Search" to query the databases with a structure. More... [Pg.495]

D Domains PubChem BioAssay PubChem Compound PubChem Substance Gene LocusLink UniGene HomoloGene... [Pg.498]

PubChem is organized as three linked databases within the NCBI's Entrez information retrieval system. These are PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem also provides a fast chemical structure similarity search tool. More information about using each component database may be found using the links above. [Pg.206]

The U.S. National Institutes of Health PubChem project contains information on millions of chemical compounds.1 The data are divided into three main sections. PubChem Substance contains structures supplied by depositors. PubChem Compound contains unique structures with computed properties. PubChem BioAssay contains bioactivity assay results supplied by depositors. The data in these three sections are recorded independently, yet there are chemical relationships among these sections. For example, information available as a PubChem BioAssay is associated with a particular substance for which the data were collected. A substance may be a single compound or a mixture of several compounds. [Pg.53]

The column pubchem.substance.cid_associations is taken directly from the SDF files supplied by PubChem. It has all the necessary information, but it is not in a proper form for a relation between pubchem.substance and pubchem.compound, because too much information has been crammed into this one column. For example, the cid_associations value for substance id 22 contains the data "449653 1 449655 2 6540406 2". This means that there are three compound ids associated with this substance id. In other words, there is a many-to-many relationship between compounds and substances. While it would be possible to parse the cid_associations column whenever the compound id is needed, it is better to have a clear relationship between substance ids and compound ids, because this enforces and preserves the relational integrity (or referential integrity) between these data. It also makes selecting data from all three data sources quicker and easier. [Pg.59]
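The parsing step mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the source's method: it assumes the field is a whitespace-separated list of (compound id, association type) pairs, so that the value quoted for substance id 22 reads as three pairs. The function name `parse_cid_associations` is hypothetical.

```python
def parse_cid_associations(raw: str) -> list[tuple[int, int]]:
    """Split a cid associations string into (compound id, association type) pairs.

    Assumption: tokens alternate between a compound id and a type code.
    """
    tokens = raw.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected an even number of tokens, got {len(tokens)}")
    return [(int(tokens[i]), int(tokens[i + 1])) for i in range(0, len(tokens), 2)]


# Value for substance id 22, read as three (cid, type) pairs.
pairs = parse_cid_associations("449653 1 449655 2 6540406 2")
cids = [cid for cid, _ in pairs]
```

Having to run this kind of parsing on every query is exactly the cost that a dedicated relation between substance ids and compound ids avoids.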

There are several approaches to creating the relation between compound and substance. One is to create an integer column, say, pubchem.substance.cid, that would contain only the primary compound id from the column cid_associations. This column becomes a foreign key related to the pubchem.compound.cid column. This would form a proper relation between the tables, but would neglect the secondary cid associations. If those are of no interest, this approach is an excellent choice. [Pg.59]

The proper way to create a relation between pubchem.substance.substance_id and pubchem.compound.cid is to create a new table that acts as an intermediary. This is a typical approach to handling many-to-many relationships. This table must include a column for the compound id and a column for the substance id. There can be as many rows... [Pg.59]
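The intermediary-table approach can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions rather than the schema used in the source: the table names `substance`, `compound`, and `substance_compound` are hypothetical, each table is reduced to its key columns, and SQLite stands in for whatever database the original text uses. The sample rows reuse the substance id 22 example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE substance (substance_id INTEGER PRIMARY KEY);
CREATE TABLE compound  (cid INTEGER PRIMARY KEY);
-- Intermediary table: one row per (substance, compound) association.
CREATE TABLE substance_compound (
    substance_id INTEGER NOT NULL REFERENCES substance(substance_id),
    cid          INTEGER NOT NULL REFERENCES compound(cid),
    PRIMARY KEY (substance_id, cid)
);
""")

conn.execute("INSERT INTO substance VALUES (22)")
conn.executemany("INSERT INTO compound VALUES (?)",
                 [(449653,), (449655,), (6540406,)])
conn.executemany("INSERT INTO substance_compound VALUES (?, ?)",
                 [(22, 449653), (22, 449655), (22, 6540406)])


def cids_for_substance(substance_id: int) -> list[int]:
    """Return all compound ids linked to a substance, via the intermediary table."""
    rows = conn.execute(
        "SELECT c.cid FROM substance_compound sc "
        "JOIN compound c ON c.cid = sc.cid "
        "WHERE sc.substance_id = ? ORDER BY c.cid",
        (substance_id,),
    )
    return [row[0] for row in rows]
```

With the foreign keys declared, an attempt to associate a substance with a compound id that does not exist in `compound` is rejected by the database, which is the referential integrity the text refers to.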

Figure 6.4 Entity-relationship diagram for all three sets of PubChem data, showing primary and foreign keys relating compounds, substances, and biological assay data.

PubChem is organized as three distinct databases: PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem Substance contains descriptions of chemical samples, provided by data depositors, and links to information on their biological activities. The description includes PubChem Compound identifiers in cases where the chemical structures of compounds in the sample are known. Links providing information on biological activity include those to PubMed [8] citations, protein 3-D structures [9], links to contributor websites, and to biological testing results available in PubChem BioAssay. [Pg.218]

PubChem Compound contains the unique chemical structure content of PubChem Substance. Compounds may be searched by computed chemical properties and are pre-clustered by structure comparison into identity and similarity groups. Whenever possible, compounds are linked via PubChem Substance to information on their biological activities. [Pg.219]

Contributed substance descriptions that do not include a chemical structure or that fail the PubChem chemical structure standardization procedure do not enter or have links to the PubChem Compound database. Prior to analysis or any modification of chemical structure input, care is taken to preserve the original structure description. The result of the normalization methodology employed is a uniform representation of the chemical structure content contained within the PubChem Substance database. [Pg.220]

The fundamental relationships between the three PubChem databases are straightforward. PubChem Substance identifiers (SIDs) relate to PubChem Compound... [Pg.220]

After working with PubChem to achieve a particular subset for a query of interest, it is often important for a user to export resulting substance or compound records from PubChem for further local analysis. The structure download tool prepares PubChem Substance or PubChem Compound records as an export from Entrez in a number of formats. While all PubChem data is available on the PubChem FTP site (via the URL ftp://ftp.ncbi.nlm.nih.gov/pubchem/), being able to interact with a user-selected subset is substantially more convenient. The structure download tool may be directly accessed using the URL ... [Pg.231]

The primary eUtil tools of most interest to PubChem users are eSearch, eFetch, ePost, eLink, eHistory, and eInfo. eSearch performs an Entrez search, with the same query syntax as web-based Entrez queries (e.g., to query PubChem Compound for the chemical name "aspirin"). eFetch returns an ID list from a prior search (e.g., the list of PubChem Compound identifiers (CIDs) from the aforementioned query of "aspirin"). ePost creates a new ID list by upload of a list of identifiers (e.g., substance identifiers (SIDs)). eLink follows a given link type to create a new ID list from an existing one (e.g., to find all PubChem BioAssay identifiers (AIDs) associated with a list of SIDs). eHistory returns information on current Entrez History entries. eInfo lists available Entrez indices and links for a given database. [Pg.236]
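As a rough illustration of the eSearch and eLink examples above, the following Python sketch only builds request URLs and sends nothing over the network. Several details are assumptions not stated in the excerpt: the base URL is the current public NCBI E-utilities endpoint, and `pccompound`, `pcsubstance`, and `pcassay` are taken to be the Entrez database names for PubChem Compound, Substance, and BioAssay. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an eSearch URL that runs an Entrez query against one database."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def elink_url(dbfrom: str, db: str, ids: list[int]) -> str:
    """Build an eLink URL that maps identifiers in `dbfrom` to linked records in `db`."""
    query = urlencode({"dbfrom": dbfrom, "db": db,
                       "id": ",".join(str(i) for i in ids)})
    return f"{EUTILS_BASE}/elink.fcgi?{query}"


# Query PubChem Compound for the chemical name "aspirin".
aspirin_search = esearch_url("pccompound", "aspirin")

# Find assays associated with a list of SIDs (the SIDs here are placeholders).
sid_to_aid = elink_url("pcsubstance", "pcassay", [22, 23])
```

The resulting strings can be passed to any HTTP client; parsing the response is left out because its format depends on the return mode requested.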

PubChem: A chemical database is a database specifically designed to store chemical information. Chemical structures are traditionally represented using lines indicating chemical bonds between atoms and drawn on paper (2D structural formulae). Various chemical databases are freely available on the Internet. Large chemical databases are expected to handle the storage and searching of information on millions of molecules. PubChem is one of the free chemical databases and is developed by the National Center for Biotechnology Information (NCBI). More than 24 million compound structures and descriptive datasets can be freely downloaded from PubChem. PubChem is a user-friendly database: we can search the compounds by compound name or keyword, and we can also search by chemical properties. We can download the compounds in SDF format, which is the standard one for various structural viewers. PubChem has three components, namely PubChem Compounds, PubChem Substances, and PubChem BioAssay, described below. [Pg.77]

PubChem Compounds: The PubChem Compounds Database contains validated chemical depiction information provided to describe substances in PubChem Substance. Structures stored within PubChem Compounds are pre-clustered and cross-referenced by identity and similarity groups. We can search unique chemical structures using names, synonyms, or keywords. Links to available biological property information are also provided for each compound. [Pg.77]

Only recently, initiatives have been started to create freely available data sources such as PubChem,36 ChEBI,37 DrugBank,38 and HMDB,39 to mention some of them. These public databases collect publicly available information on compounds, their structures, their physical formulations (e.g., PubChem Substance), their targets, and their effects on biological processes. However, these databases are far from covering the entire spectrum of chemical information that can be linked to biology and pharmacology. [Pg.128]

As an essential component of NIH's Molecular Libraries Roadmap Initiative, PubChem is the largest chemical database in the public domain. As of October 2007 it contains 19,600,000 substance records for the Substance database and 10,900,000 unique compound records for the Compound database, with links to bioassay description, literature, references, and assay data for each entry. Its BioAssay Database provides searchable descriptions of nearly 600 bioassays, including descriptions of the conditions and readouts specific to a screening protocol. [Pg.297]

From the examples in the previous section, it is clear how the substance id relates pubchem.substance to biological assay data and how substance data can be selected using the substance id. How can the compound table be used to select compound data for substances appearing in one of the biological assay data tables? In other words, how is the... [Pg.58]

A critical concept for the advanced PubChem user is that of combining and transforming sets of identifiers between the three PubChem databases, based on the above identifier relationships. For instance, there is a many-to-one relationship between SIDs and the "standardized" CID, as more than one Substance depositor may have supplied the chemical structure that standardizes to a given CID. (In fact, even within a particular depositor's records, there may be redundant structures because of different sample origins, tautomeric forms, etc.) Also, the perceptive reader will notice that there is no direct relationship between BioAssay (AID) and Compound (CID) identifiers. To discover assays linked to a CID, that CID is expanded to all SIDs for which it is the standardized form; AIDs can be associated with CIDs linked to any of these SIDs. [Pg.221]
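The CID to SID to AID expansion described above can be sketched with plain Python dictionaries. All identifiers below are made up for illustration, and the two lookup tables (`sid_to_cid`, `sid_to_aids`) are hypothetical stand-ins for the real PubChem links; this is not how PubChem itself implements the transformation.

```python
# Many-to-one: several deposited substances standardize to the same compound.
sid_to_cid = {101: 1, 102: 1, 103: 2}

# Assay results are recorded against substances, not compounds.
sid_to_aids = {101: {9001}, 102: {9002, 9003}, 103: {9003}}


def sids_for_cid(cid: int) -> set[int]:
    """Expand a CID to every SID for which it is the standardized form."""
    return {sid for sid, c in sid_to_cid.items() if c == cid}


def aids_for_cid(cid: int) -> set[int]:
    """Collect the assays linked to a CID through any of its SIDs."""
    aids: set[int] = set()
    for sid in sids_for_cid(cid):
        aids |= sid_to_aids.get(sid, set())
    return aids


assays_for_compound_1 = aids_for_cid(1)
```

Because the assay link exists only at the substance level, skipping the SID expansion would miss assays run on other depositors' samples of the same structure.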

Many of the PubChem tools perform such transformations of the ID space implicitly, such as assay tools that work with sets of CIDs, or Entrez searches of CID chemical property indices in PubChem Substance, like IUPAC name, that actually come from standardized compounds. It can be important to understand these implicit relationships when navigating through PubChem, especially when searching and analyzing records across multiple databases. [Pg.221]

An index is a piece of information tied to individual records and matched directly to a user's query in an Entrez search. Each index consists of text, numeric, or date values. Each Entrez database has its own set of indices. These indices are named according to the type of information they contain, for example, the indices "IUPACName" or "MolecularWeight" in PubChem Substance and Compound. Some indices may have multiple values for each record. For example, the index "Synonym" corresponds to chemical or common names of a substance, any number of which may be supplied by the depositor. [Pg.223]

Beyond a summary description, one would like to view, analyze, and display the actual bioassay data. PubChem provides an integrated suite of tools, each presented as an individual tab, for this purpose. One would use the bioactivity summary tool to, at a glance, be able to examine an overview of the bioassays tested for a list of substances or compounds. To be able to subset and analyze substances or compounds tested in a set of bioassays, one would use the structure-activity analysis tool. To view the actual bioassay outcomes, one would use the data table tool. [Pg.232]

BioActivity summary provides a set of functions that allows one to revise the substance/compound and assay sets. For example, one may focus only on a subset of compounds that are active in one or more of the selected assays using the Compound | Select Active link, or explore additional screen sets where the given compounds were considered active using the Assay | Add Active link. PubChem provides multiple access points for this service. For tested compounds or substances found in Entrez, one can launch this service for each individual record using the direct "BioActivity Analysis" link, or, for all of the records from an Entrez search, through the launching point in the "Tool" area. [Pg.233]

PubChem [32]: PubChem is a database maintained by the National Center for Biotechnology Information (NCBI), which is part of the US National Institutes of Health (NIH). PubChem can be freely accessed through a web user interface or is downloadable by File Transfer Protocol (FTP) at http://pubchem.ncbi.nlm.nih.gov. PubChem is organized into three main parts: substances (126 million entries of compound mixtures, extracts, etc.), pure compounds (48 million unique structures), and bioassays (~740,000 records). Users can search the database by name, PubChem identifiers, or structures of molecules to retrieve small molecules, calculated physicochemical data, and experimental biological data. Structure-activity relationship tools are available for further analysis of the extracted results. [Pg.114]
