Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...


SMILES canonical

The transformation used above to enumerate tautomers would lead to identical products when applied to symmetrically substituted pyrazoles. The set of structures generated in the enumeration process is converted to a sorted list of canonical SMILES [23] from which duplicates are easily eliminated. Structures registered in alternative tautomeric forms are converted to identical lists of SMILES that can each be represented by their common first member. This effectively extends the definition of canonical SMILES to cover an ensemble of tautomeric forms and makes it possible to check for duplicate structures without having to register multiple forms [16, 26]. [Pg.281]
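
The de-duplication step described above can be sketched in a few lines of Python. This is only an illustration: the SMILES strings are treated as opaque text and are assumed to be canonical output from some toolkit, the helper name tautomer_key is hypothetical, and the tautomer enumeration itself is not shown.

```python
def tautomer_key(canonical_smiles_of_tautomers):
    """Return (sorted unique list, representative) for one tautomer ensemble.

    The input is assumed to hold canonical SMILES for every enumerated
    tautomer; it may contain duplicates and may be in any order.
    """
    unique_sorted = sorted(set(canonical_smiles_of_tautomers))
    return unique_sorted, unique_sorted[0]


# Two registrations of the same compound, each drawn as a different tautomer
# (3-methyl- and 5-methyl-1H-pyrazole). Enumeration is assumed to have
# produced the same ensemble in both cases.
from_form_a = ["Cc1cc[nH]n1", "Cc1ccn[nH]1"]
from_form_b = ["Cc1ccn[nH]1", "Cc1cc[nH]n1", "Cc1ccn[nH]1"]

list_a, key_a = tautomer_key(from_form_a)
list_b, key_b = tautomer_key(from_form_b)

# Identical lists and identical first members mean "same structure".
print(list_a == list_b, key_a == key_b)
```

Storing only the representative (the first member of the sorted list) is what lets a single text comparison stand in for a comparison of whole ensembles.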

Note that a single molecule may correspond to many different, but equivalent, SMILES strings. For example, for a given asymmetric molecule, starting from a different asymmetric atom will lead to a different, but equally valid, SMILES string. These various SMILES are called isomeric SMILES. They can be converted to a unique form called canonical SMILES (11). [Pg.31]

The hash origin of InChIKey also means that it cannot be converted back to the original InChI or molecular structure, because for each InChIKey there is an unlimited number of possible matching input values. Although this might seem to be a drawback of the format, it is simply the price of the fixed length of the identifier. When a readable identifier with no possible collisions is needed, InChI (or canonical SMILES) should be used. [Pg.91]

SMILES has the additional feature of being human readable, but this is not very important in our model case. InChI, and InChIKey by inheritance, features a much better and more robust normalization of structures; for example, two different tautomeric forms have the same InChI but different canonical SMILES. Also, the layered structure of InChI makes it possible to exclude a particular aspect of a structure, such as its stereochemistry, from the search when needed. This is not possible using InChIKey or SMILES. [Pg.97]

Canonicalize chemical structures, i.e., make all chemical structures quickly comparable for a computer. For example, canonical SMILES or InChIs can be used. [Pg.215]

Merge canonical SMILES to find out where actives overlap between two targets. Similar targets will have a high overlap of circles. [Pg.217]
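
A minimal Python sketch of this merge follows. The target names and compound lists are invented for illustration, the strings are assumed to be canonical already, and the overlap ratio (shared divided by total distinct actives) is one possible measure and is not taken from the source.

```python
# Canonical SMILES of the actives recorded for each target (made-up data).
actives = {
    "target_A": {"CC(C)O", "Oc1ccccc1", "CCO"},
    "target_B": {"Oc1ccccc1", "CCO", "CC(=O)O"},
}

# Because every structure has exactly one spelling, merging is a set operation.
shared = actives["target_A"] & actives["target_B"]
combined = actives["target_A"] | actives["target_B"]

# Assumed overlap measure: fraction of all distinct actives that are shared.
overlap = len(shared) / len(combined)

print(sorted(shared), overlap)
```

The same comparison on non-canonical strings would miss shared compounds whenever the two sources spelled a structure differently.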

Merge chemical features based on shared canonical SMILES strings. [Pg.224]

Notice that there are several ways in which SMILES could be written for the same structure, even for the simplest ones. For example, hydrogen cyanide can be written as C#N or N#C; propene is either C=CC or CC=C. More complex structures can have three or many more SMILES that represent the same structure. If there were one standard way to write SMILES, then standard SQL text comparisons could be used to locate any particular structure. SMILES would become a uniquely spelled "name" for each unique structure. Canonical SMILES does just that. Using rules about which atoms should come before other atoms in the spelling of each SMILES, a unique name for each molecular structure can be provided [6]... [Pg.72]

Once a unique, canonical SMILES is available, it can be stored in a text column and a direct lookup for a specific structure can be done using the SQL = operator. If canonical SMILES is stored in a text column named cansmi, one can locate isopropyl alcohol using the SQL clause WHERE cansmi = 'CC(C)O'. And because text data can be indexed in SQL, this lookup is extremely fast. In addition, SQL uniqueness constraints can be used to enforce data integrity when using canonical SMILES. [Pg.72]
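
The lookup and the uniqueness constraint can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the table name structure, the name column, and the choice of SQLite are illustrative and not from the source; only the cansmi column name and the isopropyl alcohol SMILES come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# UNIQUE both enforces one row per structure and creates an index on cansmi.
conn.execute("CREATE TABLE structure (name TEXT, cansmi TEXT UNIQUE)")
conn.execute("INSERT INTO structure VALUES (?, ?)", ("isopropyl alcohol", "CC(C)O"))
conn.execute("INSERT INTO structure VALUES (?, ?)", ("phenol", "Oc1ccccc1"))

# Direct lookup with the = operator on the canonical SMILES text.
row = conn.execute("SELECT name FROM structure WHERE cansmi = 'CC(C)O'").fetchone()
found_name = row[0]

# A second row with the same canonical SMILES violates the constraint.
try:
    conn.execute("INSERT INTO structure VALUES (?, ?)", ("2-propanol", "CC(C)O"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(found_name, duplicate_rejected)
```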

The cansmiles function can also be used to enforce an SQL constraint that the cansmi column must contain valid canonical SMILES. SQL constraints like this are commonly used to maintain data integrity. For example, the SQL clause check (cansmi = cansmiles(cansmi)) can be used in the initial creation of the table. One might also consider using an SQL trigger to handle an insert or update to a column that is required to contain canonical SMILES. [Pg.74]
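
The check clause above can be exercised in SQLite from Python, with one important caveat: the cansmiles function below is a toy stand-in backed by a tiny lookup table, not a real canonicalization algorithm. In practice a cheminformatics toolkit would supply it. The table name structure is also an assumption.

```python
import sqlite3

# TOY stand-in for cansmiles: maps a few known spellings of isopropyl alcohol
# to one chosen spelling. Unknown strings are returned unchanged, so this
# does NOT validate or canonicalize arbitrary SMILES.
_TOY_CANONICAL = {"OC(C)C": "CC(C)O", "C(C)(C)O": "CC(C)O", "CC(C)O": "CC(C)O"}


def cansmiles(smiles):
    return _TOY_CANONICAL.get(smiles, smiles)


conn = sqlite3.connect(":memory:")
# SQLite only allows deterministic functions inside CHECK constraints.
conn.create_function("cansmiles", 1, cansmiles, deterministic=True)

conn.execute(
    "CREATE TABLE structure ("
    " cansmi TEXT UNIQUE CHECK (cansmi = cansmiles(cansmi)))"
)

conn.execute("INSERT INTO structure VALUES ('CC(C)O')")  # already canonical

try:
    conn.execute("INSERT INTO structure VALUES ('OC(C)C')")  # same molecule, other spelling
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

With the constraint in place, a non-canonical spelling is refused outright; the caller must canonicalize the value and retry.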

Using canonical SMILES is a very powerful technique for molecular structure storage and lookup. However, it is sometimes necessary to perform... [Pg.74]

If canonical SMILES are used in a table to facilitate direct lookup of molecular structure, it is necessary that only one unique name be used for any one structure. Similarly, if one is searching for structures containing nitro groups, it is necessary that all nitro groups be represented using the same valence conventions. For these reasons, it is essential to make a decision about the use of SMILES in certain cases, such as nitro groups. Sulfur and phosphorus atoms also must be considered carefully since they are commonly found with "unusual" valence. [Pg.80]

It is possible to represent chirality in SMILES. This is essential to correctly define the appropriate enantiomer or stereoisomer. Many databases will contain isomers. It is possible to relate the various isomers of a structure by using their common canonical SMILES. This might be done by relaxing the uniqueness constraint on the cansmi column in a structure table, or by adding another table of stereoisomers that is related to the master table. Chirality may be used in SMARTS as well. [Pg.80]
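
The second option, a separate stereoisomer table related to the master table, might look like the following sketch. The table and column names other than cansmi are hypothetical, SQLite is used only for convenience, and the alanine strings are illustrative values rather than verified output of the cansmiles or isosmiles functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Master table: one row per structure, keyed by non-stereo canonical SMILES.
    CREATE TABLE structure (
        id     INTEGER PRIMARY KEY,
        cansmi TEXT UNIQUE NOT NULL
    );
    -- Each stereoisomer keeps its own isomeric SMILES and points at its parent.
    CREATE TABLE stereoisomer (
        isosmi       TEXT UNIQUE NOT NULL,
        structure_id INTEGER NOT NULL REFERENCES structure(id)
    );
    INSERT INTO structure (id, cansmi) VALUES (1, 'CC(N)C(=O)O');
    INSERT INTO stereoisomer VALUES ('C[C@H](N)C(=O)O', 1);
    INSERT INTO stereoisomer VALUES ('C[C@@H](N)C(=O)O', 1);
    """
)

# All stereoisomers are related through their common canonical SMILES.
isomers = [
    r[0]
    for r in conn.execute(
        "SELECT s.isosmi FROM stereoisomer s "
        "JOIN structure m ON m.id = s.structure_id "
        "WHERE m.cansmi = ?",
        ("CC(N)C(=O)O",),
    )
]
print(len(isomers))
```

This keeps the uniqueness constraint on cansmi intact in the master table while still giving each isomer its own row.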

The cansmiles function will not preserve any stereochemical information in the input SMILES. This is done so that the canonical SMILES for all stereoisomers is the same. It may be preferable to keep each isomer as a unique entry in a database. The isosmiles function preserves the stereochemical information while also reordering the atoms in the same way as the canonical SMILES. [Pg.80]

When searching a database, if an isomeric query is used, only structures with the identical stereochemistry will be found using either a direct lookup or the matches function. If a nonchiral query is used, the direct lookup will find matching nonchiral structures, including canonical SMILES. When a nonchiral query is used in the matches function, structures of all chirality will be found. There is no one best method for dealing with a database containing many chiral molecules. It is important to carefully consider how to design and search such a database. [Pg.81]

It is possible to specify the isotope of any atom in a SMILES string. This is generally not necessary because the most common isotope is simply assumed. But if, for example, a database contains information about 13C, this can be readily encoded into the SMILES using [13C] instead of simply C. The [13C] atom is considered different from the normal C atom in a SMILES. A direct lookup using canonical SMILES will not locate isotopes of the same structure. A substructure search using the matches function will locate isotopes. This is because the matches function uses SMARTS to specify the desired substructure. [Pg.81]

This chapter focused primarily on SMILES and canonical SMILES. It is feasible and common to use SMILES as the internal representation of molecular structure. Using the SQL functions described in this chapter,... [Pg.83]

The SQL domain allows one to define which values are to be allowed in a particular column of a table. A domain is created by stating the underlying built-in SQL data type used to store the domain data type. In addition, a check constraint function may be used to allow or forbid certain values. This can be used to great advantage for SMILES and canonical SMILES. Using a domain improves the ability of the RDBMS to maintain the integrity of the data contained in its tables. [Pg.86]

It might also be useful to define a canonical SMILES domain. This could be done as follows ... [Pg.87]

This is not recommended. Instead, a trigger is a better way to handle canonical SMILES. [Pg.87]

Using a domain ensures that only appropriate data can be inserted into a column. If an attempt is made to insert invalid data, an error is reported. The user is then responsible for correcting the value, if possible and trying the insert again. The SQL trigger mechanism automates this process. The following SQL will not only ensure that the cansmi column contains canonical SMILES, it will correct problems where possible. [Pg.87]

This canonicalize function uses NEW to refer to the row being inserted or updated. NEW.cansmi refers to the value in question. The canonical SMILES is computed and compared to NEW.cansmi. If they are not the same, the NEW.cansmi value is replaced by the canonical SMILES value and the NEW row is returned. This NEW row is used by the RDBMS in place of the original row. The create trigger command causes this operation to be put into effect in the RDBMS. [Pg.87]
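
The effect of such a trigger can be approximated in SQLite from Python. Two assumptions should be kept in mind: SQLite triggers cannot modify the NEW row before it is stored, so this sketch corrects the row immediately after the insert instead of returning a modified NEW row, and cansmiles is again a toy lookup-table stand-in rather than a real canonicalizer. The trigger and table names are illustrative.

```python
import sqlite3

# TOY stand-in for cansmiles; unknown strings pass through unchanged.
_TOY_CANONICAL = {"OC(C)C": "CC(C)O", "C(C)(C)O": "CC(C)O"}


def cansmiles(smiles):
    return _TOY_CANONICAL.get(smiles, smiles)


conn = sqlite3.connect(":memory:")
conn.create_function("cansmiles", 1, cansmiles, deterministic=True)
conn.execute("CREATE TABLE structure (name TEXT, cansmi TEXT)")

# After each insert, compare the stored value with its canonical form and
# overwrite it when the two differ.
conn.execute(
    """
    CREATE TRIGGER canonicalize_insert AFTER INSERT ON structure
    WHEN NEW.cansmi <> cansmiles(NEW.cansmi)
    BEGIN
        UPDATE structure SET cansmi = cansmiles(NEW.cansmi)
        WHERE rowid = NEW.rowid;
    END
    """
)

conn.execute("INSERT INTO structure VALUES ('isopropyl alcohol', 'OC(C)C')")
stored = conn.execute("SELECT cansmi FROM structure").fetchone()[0]
print(stored)
```

Unlike a check constraint, which rejects the row, the trigger silently repairs it, which is the automation the passage describes. An update trigger would follow the same pattern.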

Why use the domain to define a smiles data type, but use a trigger for canonical SMILES? First, SMILES is either valid or not. It is not feasible to... [Pg.87]

Simplified Molecular Input Line Entry System (SMILES) is a simple, yet complete description of molecular structure that considers the atoms and bonds in a molecule. Using unique canonical SMILES, an indexed table lookup of a structure can be done quickly. For example, the SQL to look up phenol is ... [Pg.91]

When the table contains unique canonical SMILES in an indexed column cansmi, and the cansmiles function provides the proper canonical SMILES for phenol, this lookup is extremely fast. [Pg.91]

This type could be used in a table that might also contain a canonical SMILES column, or even other variants of SMILES if desired. [Pg.116]

In a molecular structure file, an atom record typically contains all of the information about that atom: the atomic number or symbol, the charge, coordinates, etc. When such a file is parsed into a SMILES string and an array of coordinates, it is important to be able to associate the proper coordinate with the proper atom. The use of canonical SMILES ensures this. Because canonical SMILES defines a unique order of the atoms in a molecule, that order is used to store the coordinates. Later sections of this chapter will discuss ways in which atomic coordinates might be stored in columns of a table. [Pg.125]
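
The reordering of coordinates can be sketched as follows. The function name reorder_coordinates and the canonical_order mapping are hypothetical; the mapping is assumed to come from whatever toolkit generated the canonical SMILES, and the coordinates are made-up numbers.

```python
def reorder_coordinates(coords, canonical_order):
    """Return coords rearranged into canonical atom order.

    canonical_order[i] is the index, in the input file, of the atom that
    appears i-th in the canonical SMILES (assumed to be supplied by the
    canonicalization step).
    """
    if sorted(canonical_order) != list(range(len(coords))):
        raise ValueError("canonical_order must be a permutation of atom indices")
    return [coords[j] for j in canonical_order]


# Ethanol heavy atoms as listed in a hypothetical structure file: O, CH2, CH3.
file_coords = [(1.2, 0.0, 0.0), (0.0, 0.5, 0.0), (-1.3, -0.2, 0.0)]

# Canonical SMILES "CCO" lists the atoms as CH3, CH2, O.
canonical_order = [2, 1, 0]

stored_coords = reorder_coordinates(file_coords, canonical_order)
print(stored_coords)
```

Once stored this way, the i-th coordinate always belongs to the i-th atom of the canonical SMILES, regardless of the atom order in the original file.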

