Structure InChI identifier

Elements can specify data or information in two ways. First, data, including numeric data, can be contained between the start and end of any particular element, as in the example above, which uses both the CAS Registry Number, a unique identifier assigned to chemical structures by CAS (see Appendix 12-3), and a unique canonical molecule identifier known as InChI (International Chemical Identifier) (for more on InChI, see Appendix 8-1). [Pg.91]
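As a minimal sketch of data held between the start and end of an element, the Python snippet below parses a small markup fragment. The element names (`compound`, `name`, `casNumber`, `inchi`) are hypothetical and chosen only for illustration; they are not taken from the example the excerpt refers to.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: each value sits between an element's start and end tags.
FRAGMENT = """
<compound>
  <name>benzene</name>
  <casNumber>71-43-2</casNumber>
  <inchi>InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H</inchi>
</compound>
"""

root = ET.fromstring(FRAGMENT)

# Map each child element's tag to the text it encloses.
record = {child.tag: child.text for child in root}

print(record["casNumber"])
print(record["inchi"])
```

Because both identifiers are plain element content, any XML parser can recover them without knowing anything about chemistry.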

Integration of select Internet resources, such as the public chemical databases mentioned above, provides a very practical approach to structure searching the Internet and internal resources (Dong et al. 2007). Chapter 8 elaborates on this concept. As summarized in Chapter 2, another facet of chemical structure mining involves finding information within full-text documents that do not traditionally contain identifiers like InChI or SMILES strings. Chapter 5 contains an in-depth discussion of these identifiers. [Pg.6]

Google searches on the InChI chemical identifiers for benzene (left) and aspirin (right) demonstrate the potential of structure searching... [Pg.7]

The most commonly used identifiers today include line notation identifiers (e.g., Simplified Molecular Input Line Entry System [SMILES] and International Chemical Identifier [InChI]), tabular identifiers (e.g., Molfile and Structure Definition [SD] file types), and portable mark-up language identifiers (e.g., Chemical Markup Language [CML] and FlexMol). Each identifier has its strengths and weaknesses, as detailed in Chapter 5. Chapters 5 and 6 provide enough information to guide researchers in choosing the most appropriate formats for their individual use. [Pg.14]

Another important feature of InChI is its layered structure. Unlike in SMILES, where all data related to one atom are stored in one place, in InChI different properties of the structure are encoded in different parts of the identifier. This organization of the data has one very important advantage: molecules with the same basic structure that differ only in some minor property, such as stereochemistry or isotopic composition, have the same InChI except for the corresponding layer. This makes it possible not only to compare two InChIs to determine whether they represent exactly the same structure, but also to use a more intelligent comparison of two InChI strings to reveal molecules with the same basic structure that differ only in some detail. It is then up to the user to decide which deviations in the InChI are significant for his or her purpose and which are not. [Pg.87]
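The layer-wise comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official InChI tool: the helper names are hypothetical, the single-letter layer prefixes follow the usual InChI convention (`c` connections, `h` hydrogens, `b`/`t`/`m`/`s` stereochemistry, `i` isotopes), and the sketch only handles simple strings in which no prefix repeats. The two alanine strings are example inputs believed to correspond to L- and D-alanine.

```python
# Prefixes of the layers treated as "minor detail" in this sketch.
STEREO_AND_ISOTOPE_PREFIXES = {"b", "t", "m", "s", "i"}

def split_layers(inchi):
    """Split an InChI string into (prefix, value) pairs, in order."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = [("version", parts[0])]
    if len(parts) > 1:
        layers.append(("formula", parts[1]))  # the formula layer has no prefix
    layers += [(p[0], p[1:]) for p in parts[2:] if p]
    return layers

def core(inchi):
    """Layers that remain after dropping stereo and isotopic layers."""
    return [(k, v) for k, v in split_layers(inchi)
            if k not in STEREO_AND_ISOTOPE_PREFIXES]

def same_basic_structure(a, b):
    """True if two InChIs agree in every layer except stereo/isotopic ones."""
    return core(a) == core(b)

def differing_layers(a, b):
    """Sorted prefixes of the layers in which two InChIs differ."""
    da, db = dict(split_layers(a)), dict(split_layers(b))
    return sorted(k for k in da.keys() | db.keys() if da.get(k) != db.get(k))

L_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
BENZENE = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"

print(L_ALA == D_ALA)                       # exact comparison
print(same_basic_structure(L_ALA, D_ALA))   # comparison ignoring minor layers
print(differing_layers(L_ALA, D_ALA))
```

Which prefixes go into `STEREO_AND_ISOTOPE_PREFIXES` is exactly the user decision the passage mentions: a search for racemates might ignore the stereo layers, while an isotope-labelling study would not ignore `i`.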

The features of InChI make it usable as a unique identifier of a molecular structure as well as a format for data storage (with limitations implied by its design, such as the absence of atom coordinates and charge localization, or delocalization of hydrogen atoms, which makes it impossible to distinguish between individual tautomeric forms without the presence of the fixed hydrogen layer). [Pg.88]

The hash origin of InChIKey also means that it is not convertible back to the original InChI or molecular structure, because for each InChIKey there is an unlimited number of possible matching input values. Although this might seem to be a drawback of the format, it is simply the price of the fixed length of the identifier. When a readable identifier with no possible collisions is needed, InChI (or canonical SMILES) should be used. [Pg.91]
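The trade-off between fixed length and reversibility can be illustrated with a toy hash-based key. The sketch below is explicitly not the InChIKey algorithm: `toy_key` is a hypothetical helper that reduces a SHA-256 digest to a fixed number of uppercase letters, purely to show why such a key cannot be converted back to its input.

```python
import hashlib

def toy_key(identifier, length=14):
    """Map any string to `length` uppercase letters (toy scheme, not InChIKey)."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    # Reduce each of the first `length` digest bytes to a letter A-Z.
    return "".join(chr(ord("A") + byte % 26) for byte in digest[:length])

short_input = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"
long_input = short_input + "/" + "x" * 1000  # arbitrary long string, not a valid InChI

# Inputs of any length map to keys of the same length.
print(len(toy_key(short_input)), len(toy_key(long_input)))

# Only finitely many keys exist, while the set of possible inputs is unbounded,
# so distinct inputs must eventually share a key and the mapping cannot be inverted.
print(26 ** 14)
```

A lookup table from key to full identifier is therefore the only way back, which is why the passage recommends InChI or canonical SMILES when a readable, collision-free identifier is required.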

Project Prospect is at the time of writing the first real application of semantic enhancement to primary research literature. By using open standards such as the InChI and the Open Biomedical Ontologies, the aim was to remove the ambiguity of searching (this remains to be well integrated with the search engines), but the information is now held in a structured form that can make this happen. In this first implementation, the information is used to add a layer of additional information (visualizations and definitions) and identify relationships between our own related HTML articles. [Pg.159]

ChemSpider was launched in 2007. It is an open-access service in which constituent databases, the largest of which is PubChem, are linked on a free-access basis, and which uses algorithms to identify and extract chemical names from documents and web pages and convert them to structures and to InChI and SMILES identifiers. Access to the core service is free, but the user may be routed to charging component databases. At launch, ChemSpider contained 21 million compounds. At the time of writing, it was too early to assess the success of the service. It was bought by the Royal Society of Chemistry in 2009. [Pg.23]

Recently, a universal string representation method was proposed and published. The International Chemical Identifier [17], or InChI, is a definition and set of methods maintained by the International Union of Pure and Applied Chemistry. It promises to provide a truly universal character string representation of molecular structure. Whether it will replace the widely used SMILES is yet to be seen. [Pg.82]

The conversion of structural information to the Identifier is based on a set of IUPAC structure conventions, and rules for normalization and canonicalization (conversion to a single, predictable sequence) of an input structure representation. The resulting InChI is simply a series of characters that serve to uniquely identify the structure from which it was derived. The InChI uses a layered format to represent all available structural information relevant to compound identity. InChI layers are listed below. Each layer in an InChI representation contains a specific type of structural information. These layers, automatically extracted from the input structure, are designed so that each successive layer adds additional detail to the Identifier. The specific layers generated depend on the level of structural detail available and whether or not allowance is made for tautomerism. Of course, any ambiguities or uncertainties in the original structure will remain in the InChI. [Pg.79]

Representation of Chemical Structures with the IUPAC International Chemical Identifier (InChI)... [Pg.80]

The supported query input formats for the structure search tool are SMILES, SMARTS [17], InChI, CID (PubChem Compound identifier), molecular formula, and SDF [18]. There is also an online JavaScript-based chemical structure sketcher through which a query may be manually drawn, edited, or imported. The sketcher is compatible with modern web browsers and does not require special software to be downloaded or installed. [Pg.230]

The IUPAC International Chemical Identifier (InChI) is a relatively recent arrival on the chemical structure representation scene, and combines some of the characteristics of connection table, line notation and registry number identifiers. A comprehensive technical description has yet to be published, though substantial details are given in the documentation which accompanies the open-source software provided by IUPAC, and a number of authors have provided good overviews. ... [Pg.171]

The IUPAC International Chemical Identifier (InChI) [18] was invented in 2001. There are programs (e.g., World Wide Molecular Matrix) that can create an InChI from the molecular structure, e.g., via a MOL file [19]. For example, for benzene, the InChI is... [Pg.405]

Although many systematic indices (e.g., Lipid MAPS, Chemical Entities of Biological Interest (ChEBI), IUPAC International Chemical Identifiers (InChI), simplified molecular-input line entry system (SMILES)) were developed to list chemical compounds, these indices (identifiers) can only be meaningful if the compound is totally identified. In practice, however, lipidomics analysis can in many cases provide only partial identification of lipid molecular structures at the current state of technology. Moreover, different lipidomics approaches provide different levels of structural identification of lipid species. Therefore, clearly expressing and reporting the level of identification of the structures of lipid species (which can be derived from MS analysis) is not only helpful for readers but also important for bioinformatics and data communication. To this end, analysis by shotgun lipidomics can be used as a typical example to explain these levels. Similar phenomena also exist in the analysis of lipid species employing LC-MS-based approaches. [Pg.135]
