Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...


SMILES Representation of Molecular Structure

SMILES (Simplified Molecular Input Line Entry System) was invented by Weininger (ref. 5) to facilitate the representation and manipulation of molecular structures using computers. It uses standard atomic symbols to represent atoms and the symbols - for a single bond, = for a double bond, and # for a triple bond. Hydrogen atoms can be represented explicitly but are almost always left implicit, following the normal conventions of valence bond theory. Single bonds need not be written explicitly. For example, propane is C-C-C or simply CCC. Methylamine is CN, and C#N is hydrogen cyanide. Propene is C=CC. For more complex, branched structures, parentheses are used. For example, CC(C)O is isopropyl alcohol, whereas CCCO is propanol. [Pg.72]
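The notation above can be illustrated with a small tokenizer. This is only a sketch for the subset described here (a few common atoms, the bond symbols -, = and #, with # denoting a triple bond, and parentheses for branches). It is not a full SMILES parser: rings, charges, aromatic atoms, and bracket atoms are deliberately unsupported, and the function names are hypothetical.

```python
import re

# Assumption: only the small SMILES subset discussed in the text is handled.
# Two-letter symbols (Cl, Br) are listed first so they match before C and B.
TOKEN_RE = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#()]")


def tokenize(smiles):
    """Split a SMILES string into atom, bond, and branch tokens."""
    tokens = TOKEN_RE.findall(smiles)
    # findall silently skips characters it cannot match, so compare lengths
    # of content to detect anything outside the supported subset.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in {smiles!r}")
    return tokens


def count_atoms(smiles):
    """Count the non-hydrogen atoms written in the string."""
    return sum(1 for token in tokenize(smiles) if token[0].isalpha())


if __name__ == "__main__":
    for s in ["CCC", "C-C-C", "C#N", "C=CC", "CC(C)O", "CCCO"]:
        print(s, tokenize(s), count_atoms(s))
```

Note that C-C-C and CCC yield the same three atoms, and that a string such as CC(C)0 (a zero in place of the letter O) is rejected rather than silently accepted.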

Notice that the same structure can be written as SMILES in several ways, even for the simplest molecules. For example, hydrogen cyanide can be written as C#N or N#C, and propene as either C=CC or CC=C. More complex structures can have three or many more SMILES that represent the same structure. If there were one standard way to write SMILES, then standard SQL text comparisons could be used to locate any particular structure. SMILES would become a uniquely spelled name for each unique structure. Canonical SMILES does just that. Using rules about which atoms come before other atoms in the spelling of each SMILES, a unique name can be provided for each molecular structure (ref. 6). [Pg.72]
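To make the idea of a single agreed spelling concrete, here is a toy normalization. It is not the canonicalization algorithm used by any real toolkit; it assumes the input is an unbranched, ring-free chain written from one end to the other, in which case the only alternative spellings are the forward and reversed token order (plus optional explicit single bonds). The function name is hypothetical.

```python
import re

# Assumption: unbranched, acyclic chains only; no parentheses, ring digits,
# charges, or bracket atoms.
TOKEN_RE = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#]")


def chain_canonical(smiles):
    """Return one fixed spelling for a simple chain SMILES (toy rule)."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"not a simple chain SMILES: {smiles!r}")
    # Explicit single bonds are optional, so drop them before comparing.
    tokens = [t for t in tokens if t != "-"]
    forward = "".join(tokens)
    # Reversing the token list keeps each bond between the same two atoms.
    backward = "".join(reversed(tokens))
    # Arbitrary but deterministic rule: keep the lexicographically smaller one.
    return min(forward, backward)


if __name__ == "__main__":
    for s in ["C#N", "N#C", "C=CC", "CC=C", "C-C-C", "OCCC"]:
        print(s, "->", chain_canonical(s))
```

Under this rule both spellings of hydrogen cyanide map to one string, as do both spellings of propene, so a plain text comparison is enough to decide whether two inputs name the same chain. Real canonical SMILES generators must also handle branches and rings, which requires ranking atoms by their graph environment rather than simple reversal.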

Once a unique, canonical SMILES is available, it can be stored in a text column, and a direct lookup for a specific structure can be done using the SQL = operator. If canonical SMILES is stored in a text column named cansmi, one can locate isopropyl alcohol using the SQL clause WHERE cansmi = 'CC(C)O'. Because text data can be indexed in SQL, this lookup is extremely fast. In addition, SQL uniqueness constraints can be used to enforce data integrity when using canonical SMILES. [Pg.72]
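The lookup and the uniqueness constraint can be sketched with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in for whatever SQL database is actually in use. The table name structure, the name column, and the helper function are hypothetical, and the stored strings are assumed to be canonical already (the exact canonical spelling depends on the program that generates it).

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")

# The UNIQUE constraint both enforces one row per structure and gives the
# database an index it can use for equality lookups on cansmi.
conn.execute(
    "CREATE TABLE structure ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " cansmi TEXT NOT NULL UNIQUE)"
)
conn.executemany(
    "INSERT INTO structure (name, cansmi) VALUES (?, ?)",
    [
        ("isopropyl alcohol", "CC(C)O"),
        ("propanol", "CCCO"),
        ("propene", "C=CC"),
    ],
)


def find_by_cansmi(connection, cansmi):
    """Exact-match lookup using the SQL = operator."""
    row = connection.execute(
        "SELECT name FROM structure WHERE cansmi = ?", (cansmi,)
    ).fetchone()
    return row[0] if row else None


print(find_by_cansmi(conn, "CC(C)O"))

# Inserting the same canonical SMILES again violates the constraint.
try:
    conn.execute(
        "INSERT INTO structure (name, cansmi) VALUES (?, ?)",
        ("2-propanol", "CC(C)O"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("duplicate rejected:", duplicate_rejected)
```

The query is passed with a bound parameter rather than by pasting the SMILES into the SQL text, which sidesteps quoting problems with characters such as parentheses and #. A non-canonical spelling such as OCCC finds nothing, which is why every stored and searched string has to go through the same canonicalization first.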


Many programs are available to parse the various molecular structure file formats. OpenBabel is an open-source program that can read many file formats and produce a SMILES representation of molecular structure. Many commercial products can do this as well. In the following examples, the OpenBabel/plpythonu implementation of molfile parsing will be used. This was introduced in Chapter 10. The code to define the necessary functions is shown in the Appendix. [Pg.125]



© 2024 chempedia.info