Molecular structure SMILES identifier

The hash origin of the InChIKey also means that it cannot be converted back into the original InChI or molecular structure, because each InChIKey corresponds to an unlimited number of possible input values. Although this might seem to be a drawback of the format, it is simply the price of the identifier's fixed length. When a readable identifier with no possibility of collisions is needed, InChI (or canonical SMILES) should be used. [Pg.91]
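
A minimal Python sketch of why a fixed-length hash cannot be reversed. The helper `toy_fixed_length_key` is hypothetical and is not the InChIKey algorithm (the real key is derived from a SHA-256 hash of the InChI but encoded differently). It only shows that inputs of any length map onto 14 uppercase letters, so at most 26^14 distinct keys exist for infinitely many possible inputs. The `looks_like_inchikey` check covers only the 14-10-1 letter layout of a standard InChIKey, not whether the key is genuine.

```python
import hashlib
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def toy_fixed_length_key(identifier: str, length: int = 14) -> str:
    """Hash any string into a fixed number of uppercase letters.

    Illustration only: NOT the InChIKey algorithm.
    """
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    # Map each of the first `length` digest bytes onto the letters A..Z.
    return "".join(chr(ord("A") + b % 26) for b in digest[:length])


def looks_like_inchikey(key: str) -> bool:
    """Check only the letter/hyphen layout of a standard InChIKey."""
    return INCHIKEY_PATTERN.fullmatch(key) is not None


short_key = toy_fixed_length_key("InChI=1S/CH4/h1H4")
long_key = toy_fixed_length_key("x" * 500)
print(len(short_key), len(long_key))                    # 14 14
print(looks_like_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))  # True (ethanol)
```

Whatever the input length, the output space is fixed, so many inputs must share each key and the original string cannot be recovered from it.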

Recently, a universal string representation method was proposed and published. The International Chemical Identifier (InChI) [17] is a definition and set of methods maintained by the International Union of Pure and Applied Chemistry (IUPAC). It promises to provide a truly universal character-string representation of molecular structure. Whether it will replace the widely used SMILES remains to be seen. [Pg.82]

The Simplified Molecular Input Line Entry System (SMILES) strings of the structures in the data set were canonicalized, charges were standardized, additional fragments and salts were removed, and duplicate or invalid structures were identified and removed using the KNIME workflow environment [29]. Further data quality control was performed by the Eli Lilly ADME group. [Pg.109]

Although many systematic indices and identifiers (e.g., LIPID MAPS, Chemical Entities of Biological Interest (ChEBI), IUPAC International Chemical Identifiers (InChI), and the simplified molecular-input line-entry system (SMILES)) have been developed to catalog chemical compounds, these identifiers are meaningful only if the compound is fully identified. In practice, however, lipidomics analysis with current technology can often provide only partial identification of lipid molecular structures. Moreover, different lipidomics approaches provide different levels of structural identification of lipid species. Therefore, clearly expressing and reporting the level of identification achieved for each lipid species (as derived from MS analysis) is not only helpful for readers but also important for bioinformatics and data communication. To this end, analysis by shotgun lipidomics can serve as a typical example to explain these levels. Similar phenomena also exist in the analysis of lipid species by LC-MS-based approaches. [Pg.135]
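
One widely used way to make the level of identification explicit is the LIPID MAPS shorthand notation: `PC 34:1` gives only the sum composition, `PC 16:0_18:1` names the acyl chains without their sn-positions, and `PC 16:0/18:1` also fixes the sn-positions. The hypothetical helpers below classify a name by these separators and collapse any annotation to the sum-composition level. This is an illustrative sketch of that convention, not necessarily the scheme the source goes on to describe, and it ignores ether linkages and other modifications.

```python
import re

# Hypothetical helpers following the LIPID MAPS shorthand notation:
#   "PC 34:1"       sum composition (total acyl carbons:double bonds only)
#   "PC 16:0_18:1"  acyl chains known, sn-positions unknown
#   "PC 16:0/18:1"  acyl chains and sn-positions known
CHAIN = re.compile(r"(\d+):(\d+)")


def identification_level(name: str) -> str:
    """Classify an annotation by the separator used between its acyl chains."""
    _, _, chains = name.partition(" ")
    if "/" in chains:
        return "chains with sn-positions"
    if "_" in chains:
        return "chains without sn-positions"
    return "sum composition"


def to_sum_composition(name: str) -> str:
    """Collapse an annotation at any level to the sum-composition level."""
    head, _, chains = name.partition(" ")
    pairs = [(int(c), int(d)) for c, d in CHAIN.findall(chains)]
    carbons = sum(c for c, _ in pairs)
    double_bonds = sum(d for _, d in pairs)
    return f"{head} {carbons}:{double_bonds}"


print(identification_level("PC 16:0_18:1"))  # chains without sn-positions
print(to_sum_composition("PC 16:0/18:1"))     # PC 34:1
```

A more detailed annotation can always be collapsed to a less detailed one, but not the reverse, so reporting at the level the data actually support avoids overstating structural detail.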

Chemical identity may appear to be a trivial problem, but most chemicals have several names, and subtle differences between isomers (e.g., cis and trans) may be ignored. The most commonly accepted identifiers are the IUPAC name and the Chemical Abstracts Service (CAS) Registry Number. More recently, line notations have been developed so that a chemical structure can be defined by entering a string of symbols into a computer. For environmental purposes SMILES (Simplified Molecular Input Line Entry System; Anderson et al. 1987) is favored, but Wiswesser Line Notation (WLN) is also quite widely used. [Pg.3]
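
The CAS number mentioned above carries a check digit, which makes simple typing errors detectable. A minimal Python sketch of the standard check: the digits before the final one are weighted 1, 2, 3, ... from the right, and the weighted sum modulo 10 must equal the last digit. For water, 7732-18-5: 8*1 + 1*2 + 2*3 + 3*4 + 7*5 + 7*6 = 105, and 105 mod 10 = 5.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas: str) -> bool:
    """Return True if the string has CAS layout and a correct check digit."""
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(digit) for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check


print(is_valid_cas("7732-18-5"))  # True  (water)
print(is_valid_cas("7732-18-4"))  # False (wrong check digit)
```

A passing check only shows the number is well formed; it does not confirm which substance the number belongs to.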

The most commonly used identifiers today include line-notation identifiers (e.g., Simplified Molecular Input Line Entry System [SMILES] and International Chemical Identifier [InChI]), tabular identifiers (e.g., Molfile and Structure-Data [SD] file types), and portable markup-language identifiers (e.g., Chemical Markup Language [CML] and FlexMol). Each identifier has its strengths and weaknesses, as detailed in Chapter 5. Chapters 5 and 6 provide enough information to guide researchers in choosing the most appropriate formats for their individual use. [Pg.14]
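
To make the tabular formats more concrete, here is a minimal Python sketch that reads the data fields of an SD file. It assumes the common layout in which each record ends with a `$$$$` line, each data item starts with a header line such as `> <NAME>`, and the value lines that follow end at a blank line. The molblock (atom and bond table) is skipped rather than parsed, and the sample record and its field names are illustrative.

```python
import re

# Data-item header such as "> <NAME>" or ">  <MW>  (1)"; group 1 is the field name.
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")

SAMPLE_SDF = """methane
  illustrative record

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <NAME>
methane

> <MW>
16.04

$$$$
"""


def read_sd_fields(text: str):
    """Yield a dict of data fields for each record; the molblock itself is skipped."""
    for record in text.split("$$$$"):
        if not record.strip():
            continue  # trailing text after the last record separator
        fields = {}
        current = None
        for line in record.splitlines():
            header = FIELD_HEADER.match(line)
            if header:
                current = header.group(1)
                fields[current] = []
            elif current is not None:
                if line.strip():
                    fields[current].append(line)
                else:
                    current = None  # a blank line ends the data item
        yield {name: "\n".join(values) for name, values in fields.items()}


print(list(read_sd_fields(SAMPLE_SDF)))  # [{'NAME': 'methane', 'MW': '16.04'}]
```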

This creates a four-column table in the schema achemcompany. The smiles column is intended to store the SMILES representation of a chemical structure, the id column stores an integer identifier used for joining other tables, the mw column stores the molecular weight with two digits to the right of the decimal point, and the added column records when the structure was entered into the table. As defined above, any character string could be entered into the smiles column, any integer into the id column, and any valid... [Pg.22]
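
The original table definition is not reproduced here. A minimal Python sketch of an equivalent table, assuming SQLite through the standard `sqlite3` module rather than the database system used in the source. The table name `structure` and the `NUMERIC(8, 2)` size are hypothetical, and because SQLite has no `CREATE SCHEMA`, an attached in-memory database supplies the `achemcompany.` prefix. SQLite does not enforce the declared two-decimal scale, so the value is rounded before insertion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no CREATE SCHEMA; an attached database provides the "achemcompany." prefix.
conn.execute("ATTACH DATABASE ':memory:' AS achemcompany")
conn.execute(
    """
    CREATE TABLE achemcompany.structure (
        smiles TEXT,                               -- SMILES string of the structure
        id     INTEGER,                            -- key for joining other tables
        mw     NUMERIC(8, 2),                      -- molecular weight
        added  TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- when the row was entered
    )
    """
)
# Round in Python because SQLite will not enforce two decimal places itself.
conn.execute(
    "INSERT INTO achemcompany.structure (smiles, id, mw) VALUES (?, ?, ?)",
    ("CCO", 1, round(46.068, 2)),
)
row = conn.execute("SELECT smiles, id, mw, added FROM achemcompany.structure").fetchone()
print(row[:3])  # ('CCO', 1, 46.07)
```

As the text notes, nothing in this definition stops an invalid SMILES string from being stored; validation has to come from elsewhere.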

The use of the Simplified Molecular Input Line Entry System (SMILES) as a string representation of chemical structure makes possible much of what has been discussed in earlier chapters of this book. A chemical reaction could be represented as a collection of SMILES, some identified as reactants and some as products. It is possible to define a table to do this, or perhaps to use arrays of character data types, but a syntax extension of standard SMILES allows a reaction to be expressed easily. SMIRKS, an extension of SMILES and SMiles ARbitrary Target Specification (SMARTS), is used to represent chemical transformations. SMIRKS can also be used in a transformation function that combines SMILES reactants to produce SMILES products. [Pg.99]
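
Assuming the common reaction-SMILES layout `reactants>agents>products`, in which `.` separates individual molecules, a reaction can be split into its parts with plain string handling. A minimal Python sketch, illustrated with the acid-catalysed esterification of acetic acid with ethanol. It only parses the notation; applying a SMIRKS transformation to produce products requires a cheminformatics toolkit.

```python
def parse_reaction_smiles(rxn: str):
    """Split 'reactants>agents>products' into three lists of component SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("a reaction SMILES needs exactly two '>' separators")
    # Within each part, '.' separates individual molecules; an empty part gives [].
    return tuple([s for s in part.split(".") if s] for part in parts)


# Acid-catalysed esterification: acetic acid + ethanol -> ethyl acetate + water.
reactants, agents, products = parse_reaction_smiles("CC(=O)O.OCC>[H+]>CC(=O)OCC.O")
print(reactants, agents, products)
# ['CC(=O)O', 'OCC'] ['[H+]'] ['CC(=O)OCC', 'O']
```

When no agents are given, the two separators sit next to each other, as in `CCO>>CC=O`.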

The supported query input formats for the structure search tool are SMILES, SMARTS [17], InChI, CID (PubChem Compound Identifier), molecular formula, and SDF [18]. There is also an online JavaScript-based chemical structure sketcher with which a query can be drawn, edited, or imported manually. The sketcher is compatible with modern web browsers and does not require any special software to be downloaded or installed. [Pg.230]

SMILES notation is a means by which certain chemical structures can be described using a series of simple letters and numbers written in linear fashion, even for complex cyclic structures. This approach is particularly useful as input for computer models when chemical names and CASRNs are unknown. As mentioned above, SMILES is an important tool in the hazard and exposure modeling used in EPA's voluntary Sustainable Futures Program [1]. It can also be used to identify substances under REACH; examples are shown, along with molecular and structural formulas, in the Technical Guidance Document (TGD) on nomenclature. [Pg.28]
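
SMILES writes a ring linearly by breaking one bond in it and tagging both atoms at the break with the same ring-closure digit, so cyclohexane becomes `C1CCCCC1`. A minimal Python sketch of a partial check that these digits are correctly paired. It ignores digits inside square brackets (isotopes, hydrogen counts, charges), does not interpret two-digit `%nn` labels, and validates nothing else about the string.

```python
def ring_closures_balanced(smiles: str) -> bool:
    """Check that each single-digit ring-closure label outside brackets is opened and closed.

    Partial check only: '%nn' labels are not interpreted and nothing else is validated.
    """
    open_labels = set()
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif ch.isdigit() and not in_bracket:
            # First occurrence opens a ring bond; the next occurrence closes it,
            # after which the same digit may be reused for another ring.
            if ch in open_labels:
                open_labels.remove(ch)
            else:
                open_labels.add(ch)
    return not open_labels


print(ring_closures_balanced("C1CCCCC1"))        # True  (cyclohexane)
print(ring_closures_balanced("c1ccc2ccccc2c1"))  # True  (naphthalene)
print(ring_closures_balanced("C1CCCCC"))         # False (ring never closed)
```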

The UM-BBD-PPS predicts all possible metabolic pathways based on its metabolic rules [10]. The user draws a structure or enters a SMILES (Simplified Molecular Input Line Entry System) string representing the compound of interest. The PPS identifies functional groups and matches them to the appropriate rules, producing first-round metabolites. Any first-round metabolite can then be selected and used to match another set of rules. When no metabolic rules are matched, the cycle stops. This may indicate a non-metabolizable compound or, in other cases, an endpoint metabolite that is a common intermediary compound. [Pg.14]
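
The round-by-round cycle described above can be sketched as a breadth-first expansion. In this minimal Python sketch the rules are hypothetical callables that return a list of products or `None`; a toy lookup table with placeholder compounds `A` to `D` stands in for functional-group matching, and `max_rounds` is an arbitrary safety cap. Unlike the PPS, where the user selects which metabolite to expand next, the sketch expands every new metabolite automatically.

```python
from typing import Callable, List, Optional

# Hypothetical rule type: given a compound, return its products, or None if it does not match.
Rule = Callable[[str], Optional[List[str]]]


def predict_pathways(start: str, rules: List[Rule], max_rounds: int = 10) -> List[List[str]]:
    """Expand metabolites round by round until no rule produces anything new."""
    rounds: List[List[str]] = []
    seen = {start}
    frontier = [start]
    for _ in range(max_rounds):  # max_rounds is an arbitrary safety cap
        next_frontier: List[str] = []
        for compound in frontier:
            for rule in rules:
                for product in rule(compound) or []:
                    if product not in seen:
                        seen.add(product)
                        next_frontier.append(product)
        if not next_frontier:
            break  # no new metabolites: nothing matched, or only known compounds were produced
        rounds.append(next_frontier)
        frontier = next_frontier
    return rounds


# Toy lookup table with placeholder compounds, standing in for functional-group rules.
TOY_TABLE = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
toy_rules: List[Rule] = [lambda compound: TOY_TABLE.get(compound)]
print(predict_pathways("A", toy_rules))  # [['B', 'C'], ['D']]
```

Here `D` matches no rule, so expansion stops after the second round, mirroring the endpoint case described in the text.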
