PubChem Compound

Figure 8.1 Example structure taken from the PubChem Compound database (IUPAC name 4-[(2S)-2-acetamido-2-[[(2S)-10-carbamoyl-9-(cyclohexylmethoxy)-2-bicyclo[5.4.0]undeca-7,9,11-trienyl]carbamoyl]ethyl]-2-formylbenzoic acid; PubChem CID 9959891).

3D Domains, PubChem BioAssay, PubChem Compound, PubChem Substance, Gene, LocusLink, UniGene, HomoloGene... [Pg.498]

PubChem Compound: search unique chemical structures using names. [Pg.206]

PubChem is organized as three linked databases within NCBI's Entrez information retrieval system: PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem also provides a fast chemical structure similarity search tool. More information about using each component database may be found using the links above. [Pg.206]

The U.S. National Institutes of Health PubChem project contains information on millions of chemical compounds [1]. The data are divided into three main sections. PubChem Substance contains structures supplied by depositors. PubChem Compound contains unique structures with computed properties. PubChem BioAssay contains bioactivity assay results supplied by depositors. The data in these three sections are recorded independently, yet there are chemical relationships among them. For example, information in PubChem BioAssay is associated with the particular substance for which the data were collected. A substance may be a single compound or a mixture of several compounds. [Pg.53]

The third set of files from the PubChem repository describes chemical compounds. These are distributed as SDF files, and each compound is identified by a unique compound id (CID). Multiple properties are also associated with each compound. Using the sdf2sql utility described above, the table pubchem.compound is created. The compound table can then be used to locate compounds by searching any of its columns, for example,... [Pg.58]

Select * From pubchem.compound Where iupac_name Like '%aldehyde%' And heavy_atom_count < 20 ... [Pg.58]
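The sdf2sql utility is not reproduced here. As a rough sketch of the same idea, the Python below pulls the "> <FIELD>" data items out of SDF records, loads three of them into an in-memory SQLite table, and runs an equivalent of the query above. The field names PUBCHEM_COMPOUND_CID, PUBCHEM_IUPAC_NAME and PUBCHEM_HEAVY_ATOM_COUNT follow PubChem's SDF conventions; the table layout, the SQLite stand-in, and the two toy records (hypothetical CIDs 1001 and 1002 with an empty molfile block) are assumptions for illustration only.

```python
import sqlite3


def toy_record(cid, name, heavy_atoms):
    """Build one minimal SDF record: empty molfile block plus three data items."""
    return (
        f"toy{cid}\n  sketch\n\n"
        "  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
        f"> <PUBCHEM_COMPOUND_CID>\n{cid}\n\n"
        f"> <PUBCHEM_IUPAC_NAME>\n{name}\n\n"
        f"> <PUBCHEM_HEAVY_ATOM_COUNT>\n{heavy_atoms}\n\n"
        "$$$$\n"
    )


def parse_sdf_fields(text):
    """Return one dict of data items per SDF record; the molfile block is ignored."""
    records = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue
        fields, name, values = {}, None, []
        for line in block.splitlines():
            if line.startswith(">") and "<" in line:
                # Data header such as "> <PUBCHEM_IUPAC_NAME>"
                name = line[line.index("<") + 1 : line.rindex(">")]
                values = []
            elif name is not None:
                if line.strip():
                    values.append(line)
                else:  # a blank line ends the data item
                    fields[name] = "\n".join(values)
                    name = None
        if name is not None:  # last item without a trailing blank line
            fields[name] = "\n".join(values)
        records.append(fields)
    return records


# Hypothetical toy input, not real PubChem entries
TOY_SDF = toy_record(1001, "4-methylbenzaldehyde", 9) + toy_record(1002, "ethanol", 3)
records = parse_sdf_fields(TOY_SDF)

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE compound (cid INTEGER PRIMARY KEY, iupac_name TEXT, heavy_atom_count INTEGER)"
)
con.executemany(
    "INSERT INTO compound VALUES (?, ?, ?)",
    [
        (int(r["PUBCHEM_COMPOUND_CID"]), r["PUBCHEM_IUPAC_NAME"], int(r["PUBCHEM_HEAVY_ATOM_COUNT"]))
        for r in records
    ],
)
rows = con.execute(
    "SELECT cid, iupac_name FROM compound "
    "WHERE iupac_name LIKE '%aldehyde%' AND heavy_atom_count < 20"
).fetchall()
print(rows)  # [(1001, '4-methylbenzaldehyde')]
```

A real loader would also keep the molfile block and every data item; this sketch only shows how the name/value pairs map onto table columns.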

The column pubchem.substance.cid_associations is taken directly from the SDF files supplied by PubChem. It has all the necessary information, but it is not in a proper form for a relation between pubchem.substance and pubchem.compound, because too much information has been crammed into this one column. For example, the cid_associations value for substance id 22 contains the data "449653 1 449655 2 6540406 2". This means that there are three compound ids associated with this substance id. Because a compound can in turn be associated with many substances, the relationship between compounds and substances is many-to-many. While it would be possible to parse the cid_associations column whenever the compound id is needed, it is better to have an explicit relationship between substance ids and compound ids, because it enforces and preserves the relational integrity (or referential integrity) between these data. It also makes selecting data from all three data sources quicker and easier. [Pg.59]
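Parsing that column is straightforward if, as the example value suggests, it alternates a compound id with an association-type code. The hypothetical helpers below assume that layout and assume that code 1 marks the primary association (the one the single-column approach below would keep); they are not part of sdf2sql or any PubChem tool.

```python
def parse_cid_associations(value):
    """Split a cid_associations string into (cid, association_type) pairs.

    Assumes whitespace-separated tokens alternating cid, type, cid, type, ...
    """
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError(f"odd number of tokens in {value!r}")
    return [(int(tokens[i]), int(tokens[i + 1])) for i in range(0, len(tokens), 2)]


def primary_cid(value):
    """Return the cid with association type 1 (assumed 'primary'), or None."""
    for cid, kind in parse_cid_associations(value):
        if kind == 1:
            return cid
    return None


pairs = parse_cid_associations("449653 1 449655 2 6540406 2")
print(pairs)  # [(449653, 1), (449655, 2), (6540406, 2)]
print(primary_cid("449653 1 449655 2 6540406 2"))  # 449653
```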

There are several approaches to creating the relation between compound and substance. One is to create an integer column, say, pubchem.substance.cid, that would contain only the primary compound id from the cid_associations column. This column becomes a foreign key related to the pubchem.compound.cid column. This would form a proper relation between the tables, but would neglect the secondary cid associations. If those are of no interest, this approach is an excellent choice. [Pg.59]

Another approach is to create multiple columns: one for the primary compound id and others for the secondary, tertiary, etc. compound ids. Each of these integer columns could serve as a foreign key and form a proper relation to the pubchem.compound.cid column. This approach is not recommended because the maximum number of compound ids in the cid_associations column is not known and could increase as more data are added. In addition, the type of association (primary, secondary, etc.) would have to be neglected, stored in another column, or somehow encoded in the new column names. This approach has too many drawbacks to be acceptable. [Pg.59]

The proper way to create a relation between the substance id in pubchem.substance and pubchem.compound.cid is to create a new table that acts as an intermediary. This is the standard approach to handling many-to-many relationships. This table must include a column for the compound id and a column for the substance id. There can be as many rows... [Pg.59]
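A minimal sketch of such an intermediary table follows, using SQLite from Python as a stand-in for the database in the text. The table name substance_compound, its column names, and the extra association_type column are illustrative assumptions; substance id 22 and its three compound ids come from the example above, and the compound table is reduced to its cid column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
con.executescript("""
CREATE TABLE compound  (cid INTEGER PRIMARY KEY);
CREATE TABLE substance (sid INTEGER PRIMARY KEY, cid_associations TEXT);
-- Hypothetical intermediary table: one row per (substance, compound) link
CREATE TABLE substance_compound (
    sid              INTEGER NOT NULL REFERENCES substance(sid),
    cid              INTEGER NOT NULL REFERENCES compound(cid),
    association_type INTEGER NOT NULL,
    PRIMARY KEY (sid, cid)
);
""")

# Split the packed column into (sid, cid, type) rows, assuming cid/type pairs
sid, assoc = 22, "449653 1 449655 2 6540406 2"
tokens = assoc.split()
links = [(sid, int(tokens[i]), int(tokens[i + 1])) for i in range(0, len(tokens), 2)]

con.execute("INSERT INTO substance VALUES (?, ?)", (sid, assoc))
con.executemany("INSERT INTO compound VALUES (?)", [(cid,) for _, cid, _ in links])
con.executemany("INSERT INTO substance_compound VALUES (?, ?, ?)", links)

# All compounds for substance 22, with their association type
rows = con.execute("""
    SELECT c.cid, sc.association_type
    FROM substance s
    JOIN substance_compound sc ON sc.sid = s.sid
    JOIN compound c ON c.cid = sc.cid
    WHERE s.sid = ?
    ORDER BY c.cid""", (sid,)).fetchall()
print(rows)  # [(449653, 1), (449655, 2), (6540406, 2)]

# Referential integrity: a link to a cid missing from compound is rejected
fk_enforced = False
try:
    con.execute("INSERT INTO substance_compound VALUES (22, 999, 2)")
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)  # True
```

Because each association is its own row, a substance can have any number of compound ids, and the association type is stored as data rather than encoded in column names.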

PubChem Compound. http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound (accessed April 18, 2008). [Pg.70]

PubChem is organized as three distinct databases: PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem Substance contains descriptions of chemical samples, provided by data depositors, and links to information on their biological activities. The description includes PubChem Compound identifiers in cases where the chemical structures of compounds in the sample are known. Links providing information on biological activity include those to PubMed [8] citations, protein 3-D structures [9], contributor websites, and biological testing results available in PubChem BioAssay. [Pg.218]

PubChem Compound contains the unique chemical structure content of PubChem Substance. Compounds may be searched by computed chemical properties and are pre-clustered by structure comparison into identity and similarity groups. Whenever possible, compounds are linked via PubChem Substance to information on their biological activities. [Pg.219]

Contributed substance descriptions that do not include a chemical structure or that fail the PubChem chemical structure standardization procedure do not enter or have links to the PubChem Compound database. Prior to analysis or any modification of chemical structure input, care is taken to preserve the original structure description. The result of the normalization methodology employed is a uniform representation of the chemical structure content contained within the PubChem Substance database. [Pg.220]

The fundamental relationships between the three PubChem databases are straightforward. PubChem Substance identifiers (SIDs) relate to PubChem Compound... [Pg.220]

Figure 12.3 shows the result of searching for the word "aspirin" in Entrez's PubChem Compound database. This default display of multiple records in Entrez is referred to as a document summary (DocSum) report and is common to all Entrez databases. At the top are the common Entrez controls (database selection and search input box) and tabs for other Entrez tools (e.g., Limits, History), some of which are described in more detail below. Note that the format of this page evolves over time, but the basic controls remain the same. Moving down the DocSum page, the next section contains controls to change the display type; the default is "Summary" (as shown). Each Entrez database has report styles that vary in the type and detail of information shown, but the overall format is the same: a list of records. [Pg.222]

FIGURE 12.3 Partial view of an Entrez document summary (DocSum) report page for the PubChem Compound query "aspirin". [Pg.222]

By default, when one enters a simple query in the Entrez search interface, that query is matched against all indices in that database. For example, if one searches "aspirin" in PubChem Compound, Entrez will report back any record with an index that contains "aspirin" as (any word in) a synonym, a depositor comment, etc. This is why a text search for "aspirin" also currently brings up the structure of acetaminophen: one of the names supplied by a depositor for acetaminophen is "Aspirin-Free Anacin," so an unrestricted search for "aspirin" matches this record as well. [Pg.224]

Multiple indices may be searched simultaneously using Entrez's Boolean operators. For example, the PubChem Compound query "Br[Element] AND 1[CovalentUnitCount]" will find all chemical structures that contain the element bromine and are not part of a mixture. Note that Entrez Boolean operators are capitalized (e.g., "AND," "OR," and "NOT"). [Pg.224]

By default, Entrez removes whitespace, some punctuation, and other special characters from the query string. To make sure Entrez treats the query as a single word or phrase despite special characters, simply enclose the query in quotation marks. For example, to search the PubChem Compound database using the InChI string of aspirin, one would use "InChI=1/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)/f/h11H"[InChI] as the query. [Pg.224]
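When such queries are generated by a script, the field tags, capitalized Boolean operators, and quoting rule described above can be applied mechanically. The helpers term and all_of below are hypothetical (not part of any NCBI library); they only assemble query strings and do not contact Entrez. Quoting every term that contains a non-alphanumeric character is a deliberately conservative assumption, stricter than Entrez requires.

```python
def term(text, tag=None):
    """Quote text if it has any non-alphanumeric character, then add a [Field] tag."""
    if not text.isalnum():
        text = f'"{text}"'  # keep special characters from being stripped
    return f"{text}[{tag}]" if tag else text


def all_of(*terms):
    """Combine terms with Entrez's capitalized Boolean AND."""
    return " AND ".join(terms)


q1 = all_of(term("Br", "Element"), term("1", "CovalentUnitCount"))
print(q1)  # Br[Element] AND 1[CovalentUnitCount]

aspirin_inchi = "InChI=1/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)/f/h11H"
q2 = term(aspirin_inchi, "InChI")
print(q2)  # the quoted InChI string followed by [InChI]
```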

There are some special filters that are not link-based. The query "all[Filter]" simply returns every record in a given Entrez database. A database may have other special filters defined, such as the "has pharm" filter in PubChem Compound, which indicates whether a given chemical structure has a known pharmacological action. [Pg.225]

Entrez history is used heavily by PubChem tools (which are not a part of Entrez), so the results of user searches can be used as a subset for further manipulation. For example, the chemical structure download service (described below) reads Entrez history items, so one can generate an SDF file containing just those compounds found in a PubChem Compound Entrez result set. Similarly, the BioAssay tools (also described below) make frequent use of Entrez history, so that structure queries can be used to subset assay results to a chemical structure analog series. [Pg.226]

The PubChem structure search tool enables one to query and subset PubChem Compound by a variety of chemical structure search types and optional filters. The chemical structure search service may be directly accessed using the URL ... [Pg.229]

The supported query input formats for the structure search tool are SMILES, SMARTS [17], InChI, CID (PubChem Compound identifier), molecular formula, and SDF [18]. There is also an online JavaScript-based chemical structure sketcher through which a query may be manually drawn, edited, or imported. The sketcher is compatible with modern web browsers and does not require special software to be downloaded or installed. [Pg.230]

After working with PubChem to achieve a particular subset for a query of interest, it is often important for a user to export the resulting substance or compound records from PubChem for further local analysis. The structure download tool prepares PubChem Substance or PubChem Compound records as an export from Entrez in a number of formats. While all PubChem data is available on the PubChem FTP site (via the URL ftp://ftp.ncbi.nlm.nih.gov/pubchem/), being able to interact with a user-selected subset is substantially more convenient. The structure download tool may be directly accessed using the URL ... [Pg.231]

The primary eUtil tools of most interest to PubChem users are eSearch, eFetch, ePost, eLink, eHistory, and eInfo. eSearch performs an Entrez search with the same query syntax as web-based Entrez queries (e.g., to query PubChem Compound for the chemical name "aspirin"). eFetch retrieves the records for an ID list from a prior search (e.g., the list of PubChem Compound identifiers (CIDs) from the aforementioned query of "aspirin"). ePost creates a new ID list by uploading a list of identifiers (e.g., substance identifiers (SIDs)). eLink follows a given link type to create a new ID list from an existing one (e.g., to find all PubChem BioAssay identifiers (AIDs) associated with a list of SIDs). eHistory returns information on current Entrez History entries, and eInfo lists the available Entrez indices and links for a given database. [Pg.236]
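The eUtils are plain HTTP requests, so the eSearch and eLink calls described above amount to building URLs. The sketch below only constructs request URLs and does not send them. It assumes the current E-utilities base address https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ and the Entrez database names pccompound, pcsubstance, and pcassay; the SID 12345 is a placeholder, not a reference to real data.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, use_history=False, retmax=20):
    """URL for an eSearch query; usehistory=y keeps the result set on the History server."""
    params = {"db": db, "term": term, "retmax": retmax}
    if use_history:
        params["usehistory"] = "y"
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def elink_url(dbfrom, db, ids):
    """URL for an eLink request from ids in dbfrom to linked records in db."""
    return EUTILS + "elink.fcgi?" + urlencode(
        {"dbfrom": dbfrom, "db": db, "id": ",".join(map(str, ids))}
    )


search = esearch_url("pccompound", "aspirin", use_history=True)
link = elink_url("pcsubstance", "pcassay", [12345])  # placeholder SID
print(search)
print(link)
```

Fetching the URLs (for example with urllib.request) returns XML; the WebEnv and QueryKey values in an eSearch response with usehistory=y are what later eFetch or eLink calls can reuse instead of an explicit ID list.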

PubChem. A chemical database is a database specifically designed to store chemical information. Chemical structures are traditionally represented using lines indicating chemical bonds between atoms, drawn on paper (2D structural formulae). Various chemical databases are freely available on the Internet. Large chemical databases are expected to handle the storage and searching of information on millions of molecules. PubChem is one of the free chemical databases and is developed by the National Center for Biotechnology Information (NCBI). More than 24 million compound structures and descriptive datasets can be freely downloaded from PubChem. PubChem is a user-friendly database: we can search compounds by compound name or keyword, and we can also search them by chemical properties. We can download the compounds in SDF format, which is a standard format for various structure viewers. PubChem has three components, namely PubChem Compound, PubChem Substance, and PubChem BioAssay, described below. [Pg.77]
