Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...


SMILES atom ordering

The column structure.id is a unique integer relating the structure, sdf and property tables. The sdf.molfile column contains the molfile for each structure as defined by the vendor. The structure.name and structure.cansmiles columns contain the name and canonical SMILES parsed and computed from the molfile. The structure.coord column will contain an array of atomic coordinates. The structure.atom column will contain an array of atom numbers from the file in canonical order, corresponding to the atom order in the canonical SMILES. The OpenBabel/plpythonu extension functions molfile_mol and molfile_properties will be used to parse the vendor SDF molfiles and populate these tables. The molfile column of the sdf table is first populated from the SDF file, using the following Perl script. [Pg.126]
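The Perl script referred to above is not reproduced in this excerpt. As a rough stand-in, the loading step might be sketched in Python as follows. This is an assumption-laden illustration, not the book's method: it uses the standard-library sqlite3 module instead of the PostgreSQL setup the passage describes, the helper names split_sdf, molfile_of and load_sdf are hypothetical, and SAMPLE_SDF is a simplified, made-up two-record file. It relies only on two SDF conventions: records end with a "$$$$" line, and the molfile part of a record ends with an "M  END" line.

```python
import sqlite3

# Simplified, made-up SDF text with two records (illustration only).
SAMPLE_SDF = """methane
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0
M  END
> <NAME>
methane

$$$$
water
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0
M  END
$$$$
"""


def split_sdf(text):
    """Yield each SDF record; records are terminated by a '$$$$' line."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if record:
                yield "\n".join(record)
            record = []
        else:
            record.append(line)
    # Tolerate a final record that lacks the '$$$$' terminator.
    if any(line.strip() for line in record):
        yield "\n".join(record)


def molfile_of(record):
    """Return the molfile part of a record: everything up to 'M  END'."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("M  END"):
            return "\n".join(lines[: i + 1])
    return record  # no 'M  END' found; keep the record unchanged


def load_sdf(conn, text):
    """Insert one row per molfile into a table named sdf; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sdf (id INTEGER PRIMARY KEY, molfile TEXT)"
    )
    rows = [(molfile_of(record),) for record in split_sdf(text)]
    conn.executemany("INSERT INTO sdf (molfile) VALUES (?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
n_loaded = load_sdf(conn, SAMPLE_SDF)
```

The integer id assigned on insert plays the role of the key that relates the sdf table to the other tables; parsing names, canonical SMILES and coordinates out of each molfile would be a separate step.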

The basic idea of specifying the priority of the atoms around a stereocenter in order to obtain a stereodescriptor is also incorporated into the most widespread structure representations, the Molfile and SMILES (see Sections 2.3.3 and 2.4.6). [Pg.82]

Stereochemistry can also be expressed in the SMILES notation [113]. Depending on the clockwise or anti-clockwise ordering of the atoms, the stereocenter is specified in the SMILES code with @@ or @, respectively (Figure 2-78). The atoms around this stereocenter are then assigned by the sequence of the atom symbols following the identifier @ or @@. This means that, reading the SMILES code from the left, the three atoms behind the identifier (@ or @@) describe the stereochemistry of the stereocenter. The sequence of these three atoms is dependent only on the order of writing, and independent of the priorities of the atoms. [Pg.84]
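Because the descriptor depends only on the order of writing, rewriting the neighbours of a stereocenter in a different order may require changing the descriptor. A minimal sketch of that bookkeeping, assuming each neighbour (including an implicit hydrogen) is given a unique label: an even permutation of the neighbours keeps the descriptor, and an odd permutation swaps @ and @@. The function names permutation_parity and retag are hypothetical.

```python
def permutation_parity(old_order, new_order):
    """Return 0 if new_order is an even permutation of old_order, 1 if odd."""
    if len(set(old_order)) != len(old_order) or sorted(old_order) != sorted(new_order):
        raise ValueError("orders must contain the same unique labels")
    position = {label: i for i, label in enumerate(old_order)}
    perm = [position[label] for label in new_order]
    seen = [False] * len(perm)
    parity = 0
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Walk one cycle; a cycle of length k contributes k - 1 transpositions.
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        parity ^= (length - 1) & 1
    return parity


def retag(tag, old_order, new_order):
    """Return the descriptor ('@' or '@@') valid for the new neighbour order."""
    if tag not in ("@", "@@"):
        raise ValueError("tag must be '@' or '@@'")
    if permutation_parity(old_order, new_order) == 0:
        return tag
    return "@@" if tag == "@" else "@"


# L-alanine written as N[C@@H](C)C(=O)O: neighbours in written order.
written = ["N", "H", "CH3", "COOH"]
# Swapping the last two neighbours gives N[C@H](C(=O)O)C.
swapped_tag = retag("@@", written, ["N", "H", "COOH", "CH3"])
```

Here swapped_tag is "@", which matches the two equivalent ways of writing the same alanine enantiomer.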

The generation of the correct compound structures is a critical step in which different components such as atomic valences, correct bond orders and properly defined aromaticity have to be considered carefully. In addition, the correct stereochemistry flags need to be added for a correct treatment of stereochemistry. Most of the current pharmacophore generation packages include compound builders, but users can also import them from external sources using common file formats, for example SMILES, MOL, SD or MOL2. [Pg.22]

Similar to SMILES, InChI does not store atom coordinates. In contrast to SMILES, which by default omits hydrogen atoms that are then added implicitly to match the most common valency of an atom, InChI stores hydrogen atoms but does not store bond orders. These two techniques are just different approaches to the same problem: for a given molecular skeleton, the bond orders and number of hydrogen atoms... [Pg.86]
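The implicit-hydrogen rule on the SMILES side can be illustrated with a small sketch. It assumes the default valences usually quoted for the SMILES organic subset and covers only aliphatic atoms written without brackets; aromatic and bracketed atoms follow different rules and are not handled. The names DEFAULT_VALENCES and implicit_hydrogens are hypothetical.

```python
# Default valences for unbracketed aliphatic atoms (assumed organic-subset values).
DEFAULT_VALENCES = {
    "B": (3,),
    "C": (4,),
    "N": (3, 5),
    "O": (2,),
    "P": (3, 5),
    "S": (2, 4, 6),
    "F": (1,),
    "Cl": (1,),
    "Br": (1,),
    "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum):
    """Hydrogens needed to reach the lowest default valence >= bond_order_sum."""
    for valence in DEFAULT_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # explicit bonds already exceed every default valence


# Ethanol, CCO: explicit bond-order sums are 1 (CH3), 2 (CH2) and 1 (OH).
ethanol_h = [implicit_hydrogens(s, b) for s, b in (("C", 1), ("C", 2), ("O", 1))]
```

For ethanol this gives three, two and one hydrogens, that is CH3, CH2 and OH.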

Benzene is typically thought of as a combination of two equivalent resonance structures. These could be written as the SMILES C1=C-C=C-C=C1 and C1-C=C-C=C-C=1. In order to have just one representation for benzene and other aromatic systems, SMILES handles these aromatic systems specially, treating the atoms in an aromatic ring as a special aromatic type and the bonds as a special aromatic type. The lowercase symbol is used to denote an aromatic atom in SMILES and SMARTS. The SMILES for benzene then becomes c1ccccc1. A bond between aromatic atoms is an aromatic bond, unless otherwise spelled out. For example, biphenyl can be written as c1ccccc1-c1ccccc1. [Pg.77]

Before considering how SMIRKS can be used to carry out transformations with multiple reactants, first consider simpler unimolecular transformations. These are discussed separately because of the important use of unimolecular transformations to enforce the consistent use of SMILES throughout the database. This improves the integrity of the data in a chemical sense, rather than a relational database sense as discussed previously. The root of the issue is this: there are multiple ways to represent the same molecular structure due to the limitations of valence bond theory. In valence bond theory, upon which SMILES is based, atoms have formal charges, most often zero. The bonds between atoms are shared pairs of electrons and may consist of multiple shared pairs giving rise to double, triple, or possibly even higher-order bonds between atoms. This simple theory, while quite powerful and applicable to a majority of chemical structures, leads to certain ambiguities. [Pg.101]

Because SMIRKS is a combination of SMILES and SMARTS and because there is no canonical representation of SMARTS, there is no canonical representation of SMIRKS. SMARTS can be considered as a set of instructions on how to match substructures of SMILES. SMIRKS can similarly be considered as a set of instructions on how to identify reactive atoms and combine or alter them in order to carry out a specific transformation of a set of SMILES. [Pg.107]

Each row in the coordtest table represents a molecule. The smiles column is a string of atom symbols and bonds and the coord column is an array of atom coordinates. How is it possible to keep the ordering of atoms in the smiles string in sync with the ordering of atom coordinates in the coord array? When the coordinates are initially entered from the external source, they are likely to be in a common chemical file format. The program that converts from that file format to SMILES would have to output the atom coordinates in the same order as the atoms in the SMILES. [Pg.116]
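One way to picture this synchronisation is as a permutation applied to the coordinate list. The sketch below assumes the converter reports, for each atom in SMILES order, the 1-based number of the corresponding atom in the file; the function name reorder_coords and the example coordinates are hypothetical.

```python
def reorder_coords(file_coords, smiles_atom_numbers):
    """Return the coordinates rearranged into SMILES atom order.

    file_coords: list of (x, y, z) tuples in the order the atoms appear in the file.
    smiles_atom_numbers: for each atom in SMILES order, its 1-based file atom number.
    """
    expected = list(range(1, len(file_coords) + 1))
    if sorted(smiles_atom_numbers) != expected:
        raise ValueError("atom numbers must be a permutation of 1..N")
    return [file_coords[number - 1] for number in smiles_atom_numbers]


# Ethanol stored in the file as O, C (CH2), C (CH3); made-up coordinates.
file_coords = [(2.4, 0.0, 0.0), (1.2, 0.5, 0.0), (0.0, 0.0, 0.0)]
# The SMILES CCO visits the file atoms in the order 3, 2, 1.
smiles_coords = reorder_coords(file_coords, [3, 2, 1])
```

After the call, smiles_coords[i] belongs to the i-th atom written in the SMILES string, so the smiles and coord columns can be stored side by side.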

It would be possible to create tables using columns to store the atomic symbols and bond information found in molecular structure files, reflecting the column style format of the file itself. Instead, a SMILES representation of this valence bond information is preferred. SMILES is a compact text string containing the same information as the columns of atom symbols and bonds. It can also be used directly in the search functions described in earlier chapters. It is desirable to parse the molecular properties in molecular structure files in order to store them in data columns for possible searching... [Pg.124]

In a molecular structure file, an atom record typically contains all of the information about that atom: the atomic number or symbol, the charge, coordinates, etc. When such a file is parsed into a SMILES string and an array of coordinates, it is important to be able to associate the proper coordinate with the proper atom. The use of canonical SMILES ensures this. Because canonical SMILES defines a unique order of the atoms in a molecule, that order is used to store the coordinates. Later sections of this chapter will discuss ways in which atomic coordinates might be stored in columns of a table. [Pg.125]

Consider extending SQL with new functions. This might be considered the fundamental suggestion in this book. There are many useful functions built into SQL, but sometimes a simple extension function can allow an SQL operation to run completely on the database server without having to pass data to the client. For example, sorting selected rows by any value in a column requires only simple SQL. If the data needed to sort the rows is not part of the data being selected, consider writing a function that will provide the value to be sorted. For example, if it were necessary to sort by the number of atoms in a molecule, a natoms(smiles) function could be used in the ORDER BY clause of SQL. [Pg.138]
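A minimal sketch of such an extension function, under several assumptions: it uses Python's standard-library sqlite3 module (whose create_function registers a Python callable as an SQL function) instead of the PostgreSQL server the book targets, the table contents are made up, and natoms is approximated by counting the atom tokens written in the SMILES string, so implicit hydrogens are not counted.

```python
import re
import sqlite3

# Bracket atoms, two-letter halogens, then one-letter aliphatic and aromatic atoms.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def natoms(smiles):
    """Count the atoms written explicitly in a SMILES string."""
    return len(ATOM_TOKEN.findall(smiles))


conn = sqlite3.connect(":memory:")
conn.create_function("natoms", 1, natoms)  # make natoms() callable from SQL
conn.execute("CREATE TABLE structure (id INTEGER PRIMARY KEY, name TEXT, smiles TEXT)")
conn.executemany(
    "INSERT INTO structure (name, smiles) VALUES (?, ?)",
    [
        ("benzene", "c1ccccc1"),
        ("ethanol", "CCO"),
        ("methane", "C"),
        ("sodium chloride", "[Na+].[Cl-]"),
    ],
)

# The sort key is computed inside the database; it is not a stored column.
ordered = [
    row[0]
    for row in conn.execute("SELECT name FROM structure ORDER BY natoms(smiles), name")
]
```

The query returns the names sorted by atom count without the client ever seeing the SMILES strings, which is the point of pushing the function into the database.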

Murcko frameworks allow for an abstract molecular representation, whereby the side chains of the structure are trimmed, preserving only the rings and the linkers that connect rings. An RDKit script was used in order to generate customized Murcko frameworks. Here, atom and bond types are preserved. The resulting Murcko framework descriptor was represented by a valid SMILES string. [Pg.111]

The Smiles rearrangement of various amides has been investigated under two different conditions in order to avoid the formation of side products. The amides were converted into an equilibrium mixture of the amide and the rearranged product within a few hours in method A and a few minutes in method B. The formation of side products in either of the methods depends strongly on the nature of the substituent on the phenoxyacetamide nitrogen atom. Hence, in one case, the Smiles rearrangement of 20 with aqueous NaOH at 50 °C for 19 h afforded 21 in 18% yield, whereas the reaction with MeONa and DMF at 25 °C resulted in the formation of 21... [Pg.492]

The basis of ALADDIN is the Daylight Chemical Information Systems software, particularly GENIE, a substructure specification language. When GENIE finds a query substructure in an input SMILES structure, it can return to the user those atoms in the structure that correspond to those hit. Since in a MENTHOR database the coordinates of the atoms are stored in the order in which they occur in the SMILES for that molecule, the coordinates of the atoms of interest are thereby identified. Thus, our geometric objects are established from this set of atoms, and geometric tests are performed on them. Steric tests are performed on molecules that meet the geometric criteria. [Pg.243]

The program ONESMILE removes duplicate structures from a sorted list of SMILES. Thus, after the MODSMI transformations, sorting and ONESMILE would be used to produce a file of the unique molecules. Notice that this is possible only because all MODSMI operations produce the unique SMILES with the result that each particular molecular structure is represented by the same SMILES string regardless of the order of the atoms in the structure from which it originated. [Pg.322]
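The sort-then-deduplicate step can be sketched in a few lines. This is not the ONESMILE program itself, only an illustration of the idea, and it assumes, as the passage stresses, that every input string is already a unique (canonical) SMILES; the function name unique_sorted is hypothetical.

```python
def unique_sorted(sorted_smiles):
    """Yield each SMILES once from a list that is already sorted."""
    previous = None
    for smiles in sorted_smiles:
        # In a sorted list, duplicates are adjacent, so one comparison suffices.
        if smiles != previous:
            yield smiles
        previous = smiles


transformed = ["CCO", "c1ccccc1", "CCO", "C"]
unique = list(unique_sorted(sorted(transformed)))
```

If the inputs were not canonical, two different strings for the same molecule would both survive, which is why the uniqueness of the SMILES matters more than the deduplication code.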







© 2024 chempedia.info