Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...

SMILES representation

IUPAC-like expressions, true IUPAC nomenclature names, and InChI and SMILES representations of chemical compounds are well suited for detection by machine learning approaches. Conditional random fields (CRFs) [41] and support vector machines have been used for the detection of IUPAC expressions in scientific literature [42]. Other approaches are based on rule sets [43, 44] or on combinations of machine learning with rule-based approaches [45]. All these approaches have in common that they face one significant problem: the name-to-structure problem. [Pg.129]

This program respects chirality and cis-trans specifications when creating 3D structures from their 1D SMILES representations. [Pg.334]

This creates a table of four columns in the schema achemcompany. The column named smiles is intended to store the SMILES representation of a chemical structure, the id column will store an integer identifier to be used for joining other tables, the column mw will store the molecular weight with a precision of 2 digits to the right of the decimal point, and the column named added will record when this structure was entered into the table. As defined above, any character string could be entered into the smiles column, any integer into the id column, and any valid... [Pg.22]
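
The four-column table described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the excerpt does not show the original statement, SQLite has no schemas (so achemcompany is omitted), the table name structure is hypothetical, and SQLite does not enforce the declared two-digit precision, so the value is rounded before insertion.

```python
import sqlite3

# Hypothetical stand-in for the table described above. SQLite has no schemas,
# so the "achemcompany" schema is omitted and the table name is an assumption.
DDL = """
CREATE TABLE structure (
    smiles TEXT,                               -- SMILES string of the structure
    id     INTEGER,                            -- identifier for joining other tables
    mw     NUMERIC(10, 2),                     -- molecular weight, two decimals intended
    added  TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- when the row was entered
)
"""

def make_demo_db():
    """Create an in-memory database holding one example row (ethanol)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    # SQLite does not enforce the declared precision, so round before inserting.
    conn.execute(
        "INSERT INTO structure (smiles, id, mw) VALUES (?, ?, ?)",
        ("CCO", 1, round(46.0684, 2)),
    )
    conn.commit()
    return conn

conn = make_demo_db()
row = conn.execute("SELECT smiles, id, mw, added FROM structure").fetchone()
print(row)
```

As in the description, nothing here stops an arbitrary character string from being stored in the smiles column; validating it would need a separate check.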

It would be possible to create tables using columns to store the atomic symbols and bond information found in molecular structure files, reflecting the column style format of the file itself. Instead, a SMILES representation of this valence bond information is preferred. SMILES is a compact text string containing the same information as the columns of atom symbols and bonds. It can also be used directly in the search functions described in earlier chapters. It is desirable to parse the molecular properties in molecular structure files in order to store them in data columns for possible searching... [Pg.124]

There are many programs available to parse the various molecular structure file formats. OpenBabel is an open-source program that can read many file formats and produce a SMILES representation of molecular structure. There are many commercial products that can do this as well. In the following examples, the OpenBabel/plpythonu implementation of molfile parsing will be used. This was introduced in Chapter 10. The code to define the necessary functions is shown in the Appendix. [Pg.125]

A reaction SMILES representation contains three parts, Reactant, Agent, and Product, which are separated by the character >, representing the arrow in a reaction. [Pg.347]
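
The three-part layout can be illustrated with a short Python sketch. The function name and the example string are assumptions made for illustration. Splitting each part on "." treats every dot-separated component as a separate molecule, which is a simplification because dots also join the ions of a salt.

```python
def split_reaction_smiles(rxn):
    """Split 'reactants>agents>products' into three lists of SMILES strings."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    # Within each part, '.' separates individual components; an empty part gives [].
    reactants, agents, products = (p.split(".") if p else [] for p in parts)
    return {"reactants": reactants, "agents": agents, "products": products}

# Acid-catalysed esterification of acetic acid with ethanol (illustrative input).
example = "CC(=O)O.OCC>[H+]>CC(=O)OCC.O"
print(split_reaction_smiles(example))
```

A reaction without agents is written with two adjacent separators, for example CCO>>CC=O, and the sketch returns an empty agent list for it.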

In Sections 2.3.1-2.3.4, only the four most popular line notations, Wiswesser (WLN), ROSDAL, SMILES, and Sybyl (SLN), are discussed. Whereas WLN is now almost obsolete, SMILES is quite an important representation and is widely used (Figure 2-7). [Pg.23]

A special extension of SMILES is USMILES (sometimes described as Broad SMILES) [23-25]. This Unique SMILES of Daylight is a canonical representation of a structure. This means that the coding is independent of the internal atomic numbering and always results in the same canonical, unambiguous, and unique description of the compound, guaranteed by an algorithm (see Section 2.5.2). [Pg.27]

The basic idea of specifying the priority of the atoms around a stereocenter in order to obtain a stereodescriptor is also incorporated into the most widespread structure representations, the Molfile and SMILES (see Sections 2.3.3 and 2.4.6). [Pg.82]

Stereoisomerism at double bonds is indicated in SMILES by / and \. The characters specify the relative direction of the connected atoms at a double bond and act as a frame. The characters frame the atoms of a double bond in a parallel or an opposite direction. It is therefore only reasonable to use them on both sides (Figure 2-78). There are other valid representations of cis/trans isomers, because the characters can be written in different ways. Further details are listed in Section 2.3.3, in the Handbook, or in Ref. [22]. [Pg.84]
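
How the two bond marks frame a double bond can be sketched in Python for the simplest possible case. The function name is hypothetical, and the pattern deliberately covers only strings of the form substituent, mark, C=C, mark, substituent (such as F/C=C/F). It is not a general stereo parser.

```python
import re

# Matches only the simplest pattern: substituent, bond mark, C=C, bond mark, substituent.
_PATTERN = re.compile(r"^([A-Za-z]+)([/\\])C=C([/\\])([A-Za-z]+)$")

def simple_double_bond_geometry(smiles):
    """Return 'trans' or 'cis' for strings such as F/C=C/F, else None."""
    m = _PATTERN.match(smiles)
    if m is None:
        return None  # unmarked or more complex SMILES are out of scope here
    first_mark, second_mark = m.group(2), m.group(3)
    # Same mark on both sides: substituents on opposite sides (trans).
    # Different marks: substituents on the same side (cis).
    return "trans" if first_mark == second_mark else "cis"

for s in ["F/C=C/F", "F\\C=C\\F", "F/C=C\\F", "F\\C=C/F", "FC=CF"]:
    print(s, simple_double_bond_geometry(s))
```

The loop shows why several spellings of the same isomer are valid: the first two strings both describe the trans form and the next two both describe the cis form, while the last string carries no stereo information at all.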

A SMILES code [22], MDL Molfile [50], or JME's own compact format (one-line representation of a molecule or reaction including the 2D coordinates) of created molecules may be generated. The created SMILES is independent of the way the molecule was drawn (unique SMILES; see Section 2.3.3). Extensions to JME developed in cooperation with H. Rzepa and P. Murray-Rust also allow output of molecules in the CML format [60]. [Pg.144]

Chemical structures can be transformed into a language for computer representation via line notations such as ROSDAL, SMILES, and Sybyl. [Pg.160]

SMILES (for Simplified Molecular Input Line Entry Specification) notations, as a compact molecular representation [8]. [Pg.189]

In the following, we will discuss two-dimensional (2D)-to-3D conversion in this context. However, it should be emphasized that we do so only for the sake of brevity. In reality, none of the conversion programs utilizes information of a 2D image of a chemical structure. Only the information on the atoms of a molecule and how they are connected is used (i.e., the starting information is the constitution of the molecule). One could even refer to linear structure representations such as SMILES as one-dimensional. However, this is not true, since SMILES allows for branches and ring closure, which makes its information content essentially 2D. Thus, all structure representations which lack 3D atomic coordinates will in the following simply be referred to as 2D. [Pg.159]
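
The point that a SMILES string encodes connectivity rather than a simple linear sequence can be made concrete with a small Python sketch that turns a restricted SMILES string into an atom list and a bond list. The function names are hypothetical, and the parser is an assumption-laden toy: it handles only organic-subset atoms, the bond symbols - = # :, parenthesised branches, and single-digit ring closures.

```python
def _default_bond(a, b):
    # Two aromatic (lower-case) atoms imply an aromatic bond, written ':'.
    return ":" if a.islower() and b.islower() else "-"

def smiles_to_graph(smiles):
    """Parse a restricted SMILES string into (atoms, bonds).

    Bracket atoms, charges, stereo marks and %nn closures are not handled.
    """
    two_letter = ("Cl", "Br")
    one_letter = set("BCNOPSFIbcnops")
    atoms, bonds = [], []   # bonds are (i, j, symbol) with i < j
    prev = None             # index of the atom the next atom bonds to
    pending = None          # explicit bond symbol waiting for an atom
    stack = []              # branch points for '(' ... ')'
    open_rings = {}         # ring digit -> (atom index, bond symbol)
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if smiles[i:i + 2] in two_letter:
            symbol, i = smiles[i:i + 2], i + 2
        elif ch in one_letter:
            symbol, i = ch, i + 1
        else:
            symbol, i = None, i + 1
        if symbol is not None:
            atoms.append(symbol)
            cur = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, cur, pending or _default_bond(atoms[prev], symbol)))
            prev, pending = cur, None
        elif ch in "-=#:":
            pending = ch
        elif ch == "(":
            stack.append(prev)          # remember where the branch starts
        elif ch == ")":
            prev = stack.pop()          # return to the branch point
        elif ch.isdigit():
            if ch in open_rings:        # second occurrence closes the ring
                start, sym = open_rings.pop(ch)
                bonds.append((start, prev,
                              pending or sym or _default_bond(atoms[start], atoms[prev])))
            else:                       # first occurrence opens the ring
                open_rings[ch] = (prev, pending)
            pending = None
        else:
            raise ValueError(f"unsupported character {ch!r}")
    if open_rings or stack:
        raise ValueError("unclosed ring or branch")
    return atoms, bonds

print(smiles_to_graph("CC(C)C(=O)O"))   # branched: isobutyric acid
print(smiles_to_graph("C1CCCCC1"))      # cyclic: cyclohexane
```

In the branched example atom 1 is bonded to atoms 0, 2 and 3, and in the cyclic example the closing digit adds a bond between the first and last atoms, so neither molecule is a plain chain even though each is written on one line.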

It is to be noted that the QSPR/QSAR analysis of nanosubstances based on elucidation of molecular structure by the molecular graph is ambiguous due to the large number of atoms involved in these molecular systems. Under such circumstances the chiral vector can be used to elucidate the structure of carbon nanotubes (Toropov et al., 2007c). SMILES-like representations of nanomaterials are also able to provide reasonably good predictive models (Toropov and Leszczynski, 2006a). [Pg.338]

Table 1 shows an example of markup, generated using the OSCAR 3 system. The abstract of a polymer research paper has been parsed by OSCAR and the resulting markup for the first sentence of the abstract is shown in-line with the text (Table 1B). The first chemical entity encountered in the sentence is "oleic acid", which has been marked up as type = CM (Chemical Moiety), and a number of other annotations, such as in-line representations of chemical structure (InChI, SMILES), have been attached. [Pg.128]

Chemical representation can be rule-based or descriptive. Here we will give a short description of two popular file formats for molecular structures, MOLfiles (9) and SMILES (10-13), to illustrate how molecules are represented in a computer. SMILES is a rule-based format, while MOLfile is a more descriptive one. [Pg.29]

Computer-Aided Property Estimation. Computer-aided structure estimation requires the structure of the chemical compounds to be encoded in a computer-readable language. Computers most efficiently process linear strings of data, and hence linear notation systems were developed for chemical structure representation. Several such systems have been described in the literature. SMILES, the Simplified Molecular Input Line Entry System, by Weininger and collaborators [2-4], has found wide acceptance and is being used in the Toolkit. Here, only a brief summary of SMILES rules is given. A more detailed description, together with a tutorial and examples, is given in Appendix A. [Pg.5]

SMILES is based on the concept of hydrogen-suppressed molecular graphs (HSMG). The following example shows three representations of 1-butanol ... [Pg.179]
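
The idea of a hydrogen-suppressed graph can be illustrated with a short Python sketch that recovers the implied hydrogen counts from the heavy atoms and their bonds. The names and the valence table are assumptions made for illustration, and the input uses the SMILES string CCCCO for 1-butanol.

```python
# Default valences for a few organic-subset elements (a simplifying assumption;
# SMILES allows more than one normal valence for N, P and S).
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}

def implicit_hydrogens(atoms, bonds):
    """Return the implied H count per heavy atom of a hydrogen-suppressed graph.

    atoms: list of element symbols; bonds: list of (i, j, order) tuples.
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    # Whatever valence is not used by explicit bonds is filled with hydrogens.
    return [DEFAULT_VALENCE[a] - u for a, u in zip(atoms, used)]

# 1-Butanol written as CCCCO: five heavy atoms in a chain, all single bonds.
atoms = ["C", "C", "C", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1)]
h_counts = implicit_hydrogens(atoms, bonds)
print(h_counts)        # [3, 2, 2, 2, 1]
print(sum(h_counts))   # 10, consistent with the formula C4H10O
```

Only the five heavy atoms appear in the string, yet the ten hydrogens are fully determined by the valence rules, which is what makes the suppressed notation lossless for molecules like this one.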

To obtain a unique SMILES notation, computer programs such as the Toolkit include the CANGEN algorithm [1] which performs CANonicalization, resulting in unique enumeration of atoms, and then GENerates the unique SMILES notation for the canonical structure. In the case of pyridine, this is notation (III). Any molecular structure entered in the Toolkit is converted automatically into its unique representation. [Pg.182]
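
The canonicalization half of this idea, a numbering of atoms that does not depend on input order, can be hinted at with a simplified Python sketch. It is not the CANGEN algorithm: the refinement scheme and function name below are assumptions chosen for brevity, and the published procedure differs in detail and also includes tie-breaking and string generation, none of which is attempted here.

```python
def refine_ranks(atoms, bonds):
    """Rank atoms by iteratively refined invariants (no tie-breaking step)."""
    nbrs = [[] for _ in atoms]
    for i, j in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def to_ranks(invariants):
        # Replace each invariant by its position among the sorted distinct values.
        order = {inv: r for r, inv in enumerate(sorted(set(invariants)))}
        return [order[inv] for inv in invariants]

    # Initial invariant: element symbol and number of heavy-atom neighbours.
    ranks = to_ranks([(a, len(n)) for a, n in zip(atoms, nbrs)])
    while True:
        # Combine each atom's rank with the sorted ranks of its neighbours.
        new = to_ranks([(ranks[i], tuple(sorted(ranks[j] for j in nbrs[i])))
                        for i in range(len(atoms))])
        if len(set(new)) == len(set(ranks)):
            return ranks  # no further splitting of equivalence classes
        ranks = new

# Pyridine entered with two different atom numberings (same ring bonds).
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
atoms_a = ["N", "C", "C", "C", "C", "C"]   # nitrogen entered first
atoms_b = ["C", "C", "C", "N", "C", "C"]   # nitrogen entered fourth
ranks_a = refine_ranks(atoms_a, ring)
ranks_b = refine_ranks(atoms_b, ring)
print(sorted(zip(atoms_a, ranks_a)) == sorted(zip(atoms_b, ranks_b)))
```

Both numberings give the same set of element and rank pairs, with the symmetry-equivalent ring carbons sharing a rank. A full canonicalization would then break such ties and write the string by traversing the atoms in rank order.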



© 2024 chempedia.info