Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...


Canonical SMILES lookup

Once a unique, canonical SMILES is available, it can be stored in a text column, and a direct lookup for a specific structure can be done using the SQL = operator. If canonical SMILES is stored in a text column named cansmi, one can locate isopropyl alcohol using the SQL clause WHERE cansmi = 'CC(C)O'. Because text columns can be indexed in SQL, this lookup is extremely fast. In addition, SQL uniqueness constraints can be used to enforce data integrity when using canonical SMILES. [Pg.72]
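A minimal sketch of this kind of lookup, using Python's built-in sqlite3 module. Only the column name cansmi comes from the text; the table name structure, the name column, the helper functions, and the ethanol row are assumptions made for illustration, and the SMILES strings are assumed to have been canonicalized already by whatever toolkit the database uses.

```python
import sqlite3

# Hypothetical in-memory table; only the column name "cansmi" comes from the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE structure ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " cansmi TEXT NOT NULL UNIQUE)"  # UNIQUE also makes SQLite build an index
)
# Strings assumed to be canonical SMILES already (no toolkit is called here).
rows = [("isopropyl alcohol", "CC(C)O"), ("ethanol", "CCO")]
conn.executemany("INSERT INTO structure (name, cansmi) VALUES (?, ?)", rows)


def lookup(cansmi):
    """Return the names stored under exactly this canonical SMILES string."""
    cur = conn.execute("SELECT name FROM structure WHERE cansmi = ?", (cansmi,))
    return [r[0] for r in cur.fetchall()]


def try_insert(name, cansmi):
    """Insert a row; return False if the uniqueness constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO structure (name, cansmi) VALUES (?, ?)", (name, cansmi)
        )
        return True
    except sqlite3.IntegrityError:
        return False


found = lookup("CC(C)O")
duplicate_accepted = try_insert("2-propanol", "CC(C)O")
```

Here the second insert of CC(C)O is rejected by the uniqueness constraint, which is how the constraint keeps one row per structure. Passing the SMILES as a bound parameter (the ? placeholder) avoids quoting problems with characters such as parentheses and brackets.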

Using canonical SMILES is a very powerful technique for molecular structure storage and lookup. However, it is sometimes necessary to perform... [Pg.74]

If canonical SMILES are used in a table to facilitate direct lookup of molecular structure, it is necessary that only one unique name be used for any one structure. Similarly, if one is searching for structures containing nitro groups, it is necessary that all nitro groups be represented using the same valence conventions. For these reasons, it is essential to make a decision about the use of SMILES in certain cases, such as nitro groups. Sulfur and phosphorus atoms also must be considered carefully since they are commonly found with "unusual" valences. [Pg.80]

When searching a database, if an isomeric query is used, only structures with the identical stereochemistry will be found using either a direct lookup or the matches function. If a nonchiral query is used, the direct lookup will find matching nonchiral structures, including canonical SMILES. When a nonchiral query is used in the matches function, structures of all chirality will be found. There is no one best method for dealing with a database containing many chiral molecules. It is important to carefully consider how to design and search such a database. [Pg.81]

It is possible to specify the isotope of any atom in a SMILES string. This is generally not necessary because the most common isotope is simply assumed. But if, for example, a database contains information about 13C, this can be readily encoded into the SMILES using [13C] instead of simply C. The [13C] atom is considered different from the normal C atom in a SMILES. A direct lookup using canonical SMILES will therefore not locate isotopes of the same structure. A substructure search using the matches function will locate isotopes, because the matches function uses SMARTS to specify the desired substructure. [Pg.81]
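The direct-lookup half of this behavior follows from plain string equality, which the following sketch illustrates with Python's sqlite3 module. The table name structure, the name column, and the two methanol rows are hypothetical, and the strings are assumed to be what a canonicalizer would produce; the matches function is not reproduced here because it requires a cheminformatics toolkit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; cansmi is assumed to hold already-canonical SMILES.
conn.execute("CREATE TABLE structure (name TEXT, cansmi TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO structure (name, cansmi) VALUES (?, ?)",
    [
        ("methanol", "CO"),
        # A bracket atom must state its hydrogens explicitly, hence [13CH3].
        ("methanol-13C", "[13CH3]O"),
    ],
)


def lookup(cansmi):
    """Exact-match lookup: compares the stored text, not the chemistry."""
    cur = conn.execute("SELECT name FROM structure WHERE cansmi = ?", (cansmi,))
    return [r[0] for r in cur.fetchall()]


unlabeled_hits = lookup("CO")        # finds only the unlabeled row
labeled_hits = lookup("[13CH3]O")    # finds only the 13C-labeled row
```

Because the two strings differ, each query returns only its own row; finding both would need a substructure search rather than an equality test.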

Simplified Molecular Input Line Entry System (SMILES) is a simple, yet complete description of molecular structure that considers the atoms and bonds in a molecule. Using unique canonical SMILES, an indexed table lookup of a structure can be done quickly. For example, the SQL to look up phenol is ... [Pg.91]

When the table contains unique canonical SMILES in an indexed column cansmi, and the cansmiles function provides the proper canonical SMILES for phenol, this lookup is extremely fast. [Pg.91]

The id column is defined as a primary key. This causes an index to be created, which will facilitate joining the structure table with other tables yet to be created. The smiles column is defined to be unique, which also automatically creates an index. This column will not be used as a key, but the unique index will allow fast lookups on this table if a particular structure is desired. The final definition of this schema creates an index on the cansmiles column. This will not be a unique index, but it will allow fast lookup of structures by canonical SMILES. [Pg.157]
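The schema described above might look roughly like the following sketch, written against SQLite through Python's sqlite3 module. The column names id, smiles, and cansmiles come from the text; the table name structure, the index name structure_cansmiles_idx, the column types, and the sample rows are assumptions, and the original schema may target a different database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE structure (
        id        INTEGER PRIMARY KEY,   -- key used for joins with other tables
        smiles    TEXT NOT NULL UNIQUE,  -- UNIQUE creates an index automatically
        cansmiles TEXT
    );
    -- Non-unique index: several rows may share one canonical SMILES.
    CREATE INDEX structure_cansmiles_idx ON structure (cansmiles);
    """
)

# Two different input SMILES for ethanol, assumed to canonicalize to CCO.
conn.executemany(
    "INSERT INTO structure (smiles, cansmiles) VALUES (?, ?)",
    [("OCC", "CCO"), ("C(O)C", "CCO")],
)

# Map each index name to whether it is unique. In SQLite an INTEGER PRIMARY KEY
# is an alias for the rowid, so it does not appear as a separate index here.
index_is_unique = {
    row[1]: bool(row[2]) for row in conn.execute("PRAGMA index_list('structure')")
}

# Ask the planner how it would run a lookup by canonical SMILES.
plan = " ".join(
    row[3]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM structure WHERE cansmiles = ?", ("CCO",)
    )
)
ids = [r[0] for r in conn.execute(
    "SELECT id FROM structure WHERE cansmiles = ? ORDER BY id", ("CCO",)
)]
```

The query plan text should name structure_cansmiles_idx, which indicates that the lookup goes through the index rather than scanning the table. Both sample rows are returned because the index on cansmiles is not unique.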

There is some overhead in the use of indexes, constraints, triggers, etc. as discussed here. The overhead is incurred when rows are inserted or updated in the table. However, the value of this approach is that the data in the table are well validated and can be searched more reliably and efficiently. Direct lookups of canonical or stereo SMILES are simple and quick because of the indexes on these columns. Using the fingerprint column speeds up substructure search. Tautomers can be readily selected using the column of simple graphs. [Pg.162]


See other pages where Canonical SMILES lookup is mentioned: [Pg.73]    [Pg.74]    [Pg.75]    [Pg.82]    [Pg.369]   





© 2024 chempedia.info