Big Chemical Encyclopedia

Molecular structure SMILES representation

It should be noted that QSPR/QSAR analysis of nanosubstances based on elucidating molecular structure through the molecular graph is ambiguous because of the large number of atoms in these molecular systems. Under such circumstances, the chiral vector can be used to describe the structure of carbon nanotubes (Toropov et al., 2007c). SMILES-like representations of nanomaterials can also yield reasonably good predictive models (Toropov and Leszczynski, 2006a). [Pg.338]

Chemical representation can be rule-based or descriptive. Here we will give a short description of two popular file formats for molecular structures, MOLfiles (9) and SMILES (10-13), to illustrate how molecules are represented in a computer. SMILES is a rule-based format, while MOLfile is a more descriptive one. [Pg.29]

To obtain a unique SMILES notation, computer programs such as the Toolkit include the CANGEN algorithm [1], which first performs CANonicalization, resulting in a unique enumeration of atoms, and then GENerates the unique SMILES notation for the canonical structure. In the case of pyridine, this is notation (III). Any molecular structure entered in the Toolkit is converted automatically into its unique representation. [Pg.182]

This creates a table of four columns in the schema achemcompany. The column named smiles is intended to store the SMILES representation of a chemical structure, the id column will store an integer identifier to be used for joining other tables, the column mw will store the molecular weight with a precision of 2 digits to the right of the decimal point, and the column named added will record when this structure was entered into the table. As defined above, any character string could be entered into the smiles column, any integer into the id column, and any valid... [Pg.22]
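The statement that creates this table is not reproduced in the excerpt. The following is a minimal sketch of a table with the four described columns, written in Python with the standard-library sqlite3 module as a stand-in for the book's database. The table name structure, the NUMERIC(10, 2) type, and the default on the added column are assumptions made for illustration, and SQLite does not enforce the two-decimal precision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no CREATE SCHEMA; attaching a second in-memory database under
# the name achemcompany allows schema-qualified table names.
conn.execute("ATTACH DATABASE ':memory:' AS achemcompany")
conn.execute(
    """
    CREATE TABLE achemcompany.structure (   -- table name is hypothetical
        smiles TEXT,                         -- SMILES string, not validated
        id     INTEGER,                      -- identifier used for joins
        mw     NUMERIC(10, 2),               -- molecular weight (2 decimals intended)
        added  TIMESTAMP DEFAULT CURRENT_TIMESTAMP  -- assumed default
    )
    """
)

insert = "INSERT INTO achemcompany.structure (smiles, id, mw) VALUES (?, ?, ?)"
conn.execute(insert, ("c1ccccc1O", 1, 94.11))      # phenol
conn.execute(insert, ("not a structure", 2, 0.0))  # accepted: nothing checks the string

rows = conn.execute(
    "SELECT smiles, id, mw, added FROM achemcompany.structure ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

The second insert succeeds even though the string is not valid SMILES, which illustrates the point that any character string can be entered into the smiles column as defined.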

SMILES (Simplified Molecular Input Line Entry System) was invented by Weininger [5] to facilitate the representation and manipulation of molecular structures using computers. It uses standard atomic symbols to represent atoms and the symbols - for a single bond, = for a double bond, and # for a triple bond. Hydrogen atoms can be represented explicitly but are almost always represented implicitly using normal conventions of valence bond theory. Single bonds need not be explicitly written. For example, propane is C-C-C or simply CCC. Methylamine is CN, and C#N is hydrogen cyanide. Propene is C=CC. For more complex structures with branched bonds, parentheses are used. For example, CC(C)O is isopropyl alcohol, whereas CCCO is 1-propanol. [Pg.72]
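The implicit-hydrogen convention can be made concrete with a small sketch. The Python function below is an illustration, not a full SMILES parser: it handles only the atoms C, N, and O, the bond symbols -, =, and #, and parentheses for branches, and it assumes the usual valences of 4, 3, and 2. Rings, aromatic atoms, brackets, and charges are not supported. The function name molecular_formula is hypothetical.

```python
from collections import Counter

# Assumed default valences for the small atom subset handled here.
VALENCE = {"C": 4, "N": 3, "O": 2}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def molecular_formula(smiles: str) -> str:
    """Return the formula of a simple acyclic SMILES, filling in implicit H."""
    symbols = []   # element symbol of each atom, in order of appearance
    used = []      # sum of bond orders already used by each atom
    prev = None    # index of the atom the next atom will bond to
    stack = []     # branch points saved at "("
    pending = 1    # order of the next bond; single unless written otherwise
    for ch in smiles:
        if ch in VALENCE:
            symbols.append(ch)
            used.append(0)
            idx = len(symbols) - 1
            if prev is not None:
                used[prev] += pending
                used[idx] += pending
            prev = idx
            pending = 1
        elif ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    counts = Counter(symbols)
    # Implicit hydrogens fill whatever valence the written bonds leave open.
    counts["H"] = sum(VALENCE[s] - u for s, u in zip(symbols, used))
    # Write C first, then H, then the remaining elements alphabetically.
    order = [e for e in ("C", "H") if counts[e]]
    order += sorted(e for e in counts if e not in ("C", "H") and counts[e])
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


for s in ["CCC", "C-C-C", "CN", "C#N", "C=CC", "CC(C)O", "CCCO"]:
    print(s, molecular_formula(s))
```

CC(C)O and CCCO give the same formula, C3H8O, because they are isomers; the SMILES strings differ only in where the oxygen is attached.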

Recently, a universal string representation method was proposed and published. The International Chemical Identifier [17], or InChI, is a definition and set of methods maintained by the International Union of Pure and Applied Chemistry. It promises to provide a truly universal character string representation of molecular structure. Whether it will replace the widely used SMILES is yet to be seen. [Pg.82]

This chapter focused primarily on SMILES and canonical SMILES. It is feasible and common to use SMILES as the internal representation of molecular structure. Using the SQL functions described in this chapter,... [Pg.83]

Another choice for the internal representation of molecular structure is a molfile. It would be possible to construct SQL functions like those described in this chapter that would operate on this type of data. One disadvantage of molfiles is their greater size compared with SMILES. One advantage is that it is possible to store atomic coordinates, which is not possible with SMILES. There are other molecular file formats, but these are substantially the same as a molfile, except perhaps for specific atom types that may be of use in some database applications. [Pg.84]

The external representation of molecular structure is a less rigorous definition. For example, there are many programs available that can convert to and from SMILES and molfiles. These can be used when a molfile (the external representation) needs to be imported as a SMILES (the internal representation) into the database. Similarly, a SMILES can be easily exported as a SMILES or converted to a molfile or other file format. It is useful to have these conversion functions as SQL extensions. [Pg.84]

One way to do a quick molecular formula comparison is to store the molecular formula not as a string representation, such as C60, but as a column of integers. Each row in a table of molecular structures would contain SMILES, but the table would also have additional columns containing the count of each atom type. These columns could be indexed to speed up the molecular formula comparison. The SQL used to search for structures containing phenol becomes as follows ... [Pg.92]
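The SQL from the source is elided in this excerpt and is not reconstructed here. As a rough sketch of the idea only, the Python example below uses the standard-library sqlite3 module with hypothetical column names c, n, and o for the atom counts, and treats the count comparison as a prefilter: a structure can contain phenol (C6H6O) only if it has at least six carbons and one oxygen. The table name, index name, and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One integer column per atom type, next to the SMILES itself.
conn.execute("CREATE TABLE structure (smiles TEXT, c INTEGER, n INTEGER, o INTEGER)")
# Index the count columns so the comparison avoids scanning every row's SMILES.
conn.execute("CREATE INDEX structure_counts ON structure (c, o, n)")

rows = [
    ("c1ccccc1O", 6, 0, 1),     # phenol
    ("Cc1ccccc1O", 7, 0, 1),    # o-cresol
    ("c1ccccc1", 6, 0, 0),      # benzene: no oxygen
    ("CCO", 2, 0, 1),           # ethanol: too few carbons
    ("Nc1ccc(O)cc1", 6, 1, 1),  # 4-aminophenol
]
conn.executemany("INSERT INTO structure VALUES (?, ?, ?, ?)", rows)

# Keep only rows with enough heavy atoms to contain phenol.
candidates = [
    r[0]
    for r in conn.execute("SELECT smiles FROM structure WHERE c >= 6 AND o >= 1")
]
print(candidates)
```

Passing the count filter does not prove that a structure contains phenol; it only narrows the set of rows on which a true substructure match would need to be run.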

It would be possible to create tables using columns to store the atomic symbols and bond information found in molecular structure files, reflecting the column style format of the file itself. Instead, a SMILES representation of this valence bond information is preferred. SMILES is a compact text string containing the same information as the columns of atom symbols and bonds. It can also be used directly in the search functions described in earlier chapters. It is desirable to parse the molecular properties in molecular structure files in order to store them in data columns for possible searching... [Pg.124]

There are many programs available to parse the various molecular structure file formats. OpenBabel is an open-source program that can read many file formats and produce a SMILES representation of molecular structure. There are also many commercial products that can do this. In the following examples, the OpenBabel/plpythonu implementation of molfile parsing will be used. This was introduced in Chapter 10. The code to define the necessary functions is shown in the Appendix. [Pg.125]

Thus, there are four basic representations of the molecular structure that can be used as a basis for building the optimal descriptors (Fig. 12.3): (i) hydrogen-suppressed graph, (ii) hydrogen-filled graph, (iii) graph of atomic orbitals (GAO), and (iv) SMILES. These representations can also be combined in a hybrid version of the optimal descriptor, where molecular features extracted from, e.g., GAO and SMILES serve as a hybrid basis for QSPR/QSAR predictions [27-32]. [Pg.360]

Fig. 12.2 The definition of optimal descriptors using representation of the molecular structure by SMILES...

The above discussion summarizes QSPR/QSAR approaches applied to classical chemical compounds. However, the analysis of nanomaterials, which have gigantic and complex molecular architectures, requires new approaches to predictive modelling, because representing their molecular structure by means of a molecular graph and/or SMILES sometimes becomes very problematic (e.g. multi-walled carbon nanotubes [34], graphene [35]). As a first approximation, the optimal descriptors for such species should collect all available data that can affect the physicochemical and/or biochemical behavior of nanomaterials. This concept is displayed in Fig. 12.6. [Pg.361]

The prediction of three-dimensional chemical structure from a list of atoms in a molecule and their connectivity is a good example of a chemical problem that may be solved by an expert system. We have already seen (Fig. 9.2) how the SMILES interpreter can construct a two-dimensional representation of a structure from its one-dimensional representation as a SMILES string. The CONCORD program (CONnection table to CoORDinates) takes a SMILES string and, very rapidly, produces a three-dimensional model of an input molecule. This system is a hybrid between an expert system and a molecular mechanics program, molecular mechanics being the method by which molecular structures are minimized in most molecular modelling systems. The procedure operates as follows. [Pg.203]

The simplified molecular input line entry system (SMILES) [68-71] is a compact and comfortable representation of the molecular structure from a chemical point of view. An increasing number of SMILES-based databases are gradually appearing on the internet, and thus it is interesting and important to search for suitable ways of using such a representation in QSPR-QSAR analyses. It has to be noted that the molecular graph contains details of the molecular architecture that are absent in SMILES. For instance, an extended connectivity of increasing order cannot be calculated directly from this notation. [Pg.31]
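To show what the molecular graph makes available, here is a minimal sketch of Morgan-style extended connectivity computed from an adjacency list. It assumes the common definition in which the zero-order value of a vertex is its degree and each higher order is the sum of the neighbours' values at the previous order. The function name and the example graph (the hydrogen-suppressed graph of 2-methylbutane, SMILES CC(C)CC) are chosen for illustration.

```python
def extended_connectivity(adjacency, order):
    """Return the extended connectivity of every vertex at the given order."""
    # Order 0: the vertex degree (number of neighbours).
    ec = [len(neighbours) for neighbours in adjacency]
    # Each further order sums the previous-order values of the neighbours.
    for _ in range(order):
        ec = [sum(ec[j] for j in neighbours) for neighbours in adjacency]
    return ec


# Hydrogen-suppressed graph of 2-methylbutane, atoms numbered as in CC(C)CC:
# bonds 0-1, 1-2, 1-3, 3-4.
adjacency = [[1], [0, 2, 3], [1], [1, 4], [3]]

for k in range(3):
    print(k, extended_connectivity(adjacency, k))
```

The calculation needs an explicit neighbour list for every atom, which is what the graph provides and what the SMILES string only encodes implicitly until it is parsed.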

Toropova, A.P., Toropov, A.A., Rasulev, B.F., et al. (2012). QSAR models for ACE-inhibitor activity of tri-peptides based on representation of the molecular structure by graph of atomic orbitals and SMILES. Struct Chem 23, 1873-1878. [Pg.312]

Computer-Aided Property Estimation

Computer-aided structure estimation requires the structure of the chemical compounds to be encoded in a computer-readable language. Computers most efficiently process linear strings of data, and hence linear notation systems were developed for chemical structure representation. Several such systems have been described in the literature. SMILES, the Simplified Molecular Input Line Entry System, by Weininger and collaborators [2-4], has found wide acceptance and is being used in the Toolkit. Here, only a brief summary of SMILES rules is given. A more detailed description, together with a tutorial and examples, is given in Appendix A. [Pg.5]

In order to calculate a physicochemical property, the structure of a molecule must be entered in some manner into an algorithm. Chemical structure notations for input of molecules into calculation software are described in Chapter 2, Section VII and may be considered as either being a 2D string, a 2D representation of the structure, or (very occasionally) a 3D representation of the structure. Of this variety of methods, the simplicity and elegance of the 2D linear molecular representation known as the Simplified Molecular Input Line Entry System (SMILES) stands out. Many of the packages that calculate physicochemical descriptors use the SMILES chemical notation system, or some variant of it, as the means of structure input. The use of SMILES is well described in Chapter 2, Section VII.B, and by Weininger (1988). There is also an excellent tutorial on the use of SMILES at www.daylight.com/dayhtml/smiles/smiles-intro.html. [Pg.45]

Standardized and consistent representations of stereoisomers and stereoisomeric mixtures are similarly important for the unique representations of distinct compounds. Recent file formats such as SDF v3000 and ChemAxon Extended SMILES provide clear definition and representation of complex relative and absolute stereochemical configurations. In practice these are not widely used because many commercially available files are represented by established v2000 or SMILES formats and also because HTS compounds are mostly relatively simple low molecular weight structures. [Pg.241]


© 2024 chempedia.info