Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...


Canonical SMILES function

The cansmiles function can also be used to enforce an SQL constraint that the cansmi column must contain valid canonical SMILES. SQL constraints like this are commonly used to maintain data integrity. For example, the SQL clause check (cansmi = cansmiles(cansmi)) can be used in the initial creation of the table. One might also consider using an SQL trigger to handle an insert or update to a column that is required to contain canonical SMILES. [Pg.74]
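The check clause above can be tried out in a minimal sketch. It uses SQLite through Python's standard library as a stand-in for the RDBMS, and the cansmiles function here is a hypothetical toy (a small lookup table), not a real canonicalizer. The table name structure and the helper names make_db and try_insert are assumptions made for illustration.

```python
import sqlite3

# Hypothetical stand-in for a real canonicalizer: a tiny lookup table that
# maps a few SMILES spellings to one chosen "canonical" spelling.
_CANONICAL = {
    "Oc1ccccc1": "Oc1ccccc1",
    "c1ccccc1O": "Oc1ccccc1",
    "CCO": "CCO",
    "OCC": "CCO",
}


def cansmiles(smi):
    """Toy canonicalizer; raises ValueError for SMILES outside the table."""
    if smi not in _CANONICAL:
        raise ValueError("SMILES not in toy table: %r" % smi)
    return _CANONICAL[smi]


def make_db():
    conn = sqlite3.connect(":memory:")
    # Register the Python function so that SQL can call cansmiles(...).
    conn.create_function("cansmiles", 1, cansmiles, deterministic=True)
    conn.execute(
        "CREATE TABLE structure ("
        " id INTEGER PRIMARY KEY,"
        " cansmi TEXT NOT NULL CHECK (cansmi = cansmiles(cansmi)))"
    )
    return conn


def try_insert(conn, smi):
    """Return True if the row is accepted, False if it is rejected."""
    try:
        conn.execute("INSERT INTO structure (cansmi) VALUES (?)", (smi,))
        return True
    except sqlite3.Error:
        # Covers both a failed CHECK and an error raised inside cansmiles.
        return False


conn = make_db()
print(try_insert(conn, "Oc1ccccc1"))  # already in the chosen canonical form
print(try_insert(conn, "c1ccccc1O"))  # valid SMILES, but not canonical
```

In this sketch the constraint only accepts or rejects a row; it never rewrites the value. Rewriting on insert is the job of a trigger.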

The cansmiles function will not preserve any stereochemical information in the input SMILES. This is done so that the canonical SMILES for all stereoisomers is the same. It may be preferable to keep each isomer as a unique entry in a database. The isosmiles function preserves the stereochemical information while also reordering the atoms in the same way as the canonical SMILES. [Pg.80]
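The effect of discarding stereochemistry can be illustrated with a crude sketch. The function strip_stereo below is hypothetical and is not the cansmiles or isosmiles function from the text: it only removes chirality and double-bond geometry marks from a SMILES string, handles only simple carbon stereocentres, and does not reorder atoms as a real canonicalizer would.

```python
import re


def strip_stereo(smi):
    """Crude illustration: drop chirality and double-bond geometry marks."""
    # [C@H], [C@@H], [C@], [C@@] -> C (neutral carbon stereocentres only)
    smi = re.sub(r"\[C@@?H?\]", "C", smi)
    # '/' and '\' encode cis/trans geometry around double bonds
    return smi.replace("/", "").replace("\\", "")


# The two enantiomers of alanine, written as isomeric SMILES.
isomer_a = "C[C@H](N)C(=O)O"
isomer_b = "C[C@@H](N)C(=O)O"

# As isomeric strings they stay distinct; without stereo marks they coincide.
print(isomer_a == isomer_b)
print(strip_stereo(isomer_a) == strip_stereo(isomer_b))
```

Storing the isomeric string keeps each isomer as its own entry, while the stereo-free string can serve as a shared key that groups stereoisomers together.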

When searching a database, if an isomeric query is used, only structures with the identical stereochemistry will be found using either a direct lookup or the matches function. If a nonchiral query is used, the direct lookup will find matching nonchiral structures, including canonical SMILES. When a nonchiral query is used in the matches function, structures of any chirality will be found. There is no one best method for dealing with a database containing many chiral molecules. It is important to carefully consider how to design and search such a database. [Pg.81]

It is possible to specify the isotope of any atom in a SMILES string. This is generally not necessary because the most common isotope is simply assumed. But if, for example, a database contains information about 13C, this can be readily encoded into the SMILES using [13C] instead of simply C. The [13C] atom is considered different from the normal C atom in a SMILES. A direct lookup using canonical SMILES will not locate isotopes of the same structure. A substructure search using the matches function will locate isotopes. This is because the matches function uses SMARTS to specify the desired substructure. [Pg.81]

This chapter focused primarily on SMILES and canonical SMILES. It is feasible and common to use SMILES as the internal representation of molecular structure. Using the SQL functions described in this chapter,... [Pg.83]

The SQL domain allows one to define which values are to be allowed in a particular column of a table. A domain is created by stating the underlying built-in SQL data type used to store the domain data type. In addition, a check constraint function may be used to allow or forbid certain values. This can be used to great advantage for SMILES and canonical SMILES. Using a domain improves the ability of the RDBMS to maintain the integrity of the data contained in its tables. [Pg.86]

This canonicalize function uses NEW to refer to the row being inserted or updated. NEW.cansmi refers to the value in question. The canonical SMILES is computed and compared to NEW.cansmi. If they are not the same, the NEW.cansmi value is replaced by the canonical SMILES value and the NEW row is returned. This NEW row is used by the RDBMS in place of the original row. The create trigger command causes this operation to be put into effect in the RDBMS. [Pg.87]
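The trigger behavior described above can be approximated in a small sketch, again using SQLite through Python's standard library as a stand-in RDBMS and a hypothetical lookup-table cansmiles. SQLite triggers cannot assign to NEW, so this sketch rewrites the stored row in an AFTER trigger instead of returning a modified NEW row. The table name structure and the trigger names are assumptions.

```python
import sqlite3

# Hypothetical toy canonicalizer (a lookup table, not real chemistry).
_CANONICAL = {
    "Oc1ccccc1": "Oc1ccccc1",
    "c1ccccc1O": "Oc1ccccc1",
    "CCO": "CCO",
    "OCC": "CCO",
}


def cansmiles(smi):
    return _CANONICAL[smi]  # KeyError for anything outside the toy table


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.create_function("cansmiles", 1, cansmiles, deterministic=True)
    conn.executescript("""
    CREATE TABLE structure (id INTEGER PRIMARY KEY, cansmi TEXT NOT NULL);

    -- Rewrite the value only when it differs from its canonical form.
    CREATE TRIGGER canonicalize_insert AFTER INSERT ON structure
    WHEN NEW.cansmi <> cansmiles(NEW.cansmi)
    BEGIN
        UPDATE structure SET cansmi = cansmiles(NEW.cansmi) WHERE id = NEW.id;
    END;

    CREATE TRIGGER canonicalize_update AFTER UPDATE OF cansmi ON structure
    WHEN NEW.cansmi <> cansmiles(NEW.cansmi)
    BEGIN
        UPDATE structure SET cansmi = cansmiles(NEW.cansmi) WHERE id = NEW.id;
    END;
    """)
    return conn


def stored(conn, row_id):
    """Return the cansmi value currently stored for the given id."""
    return conn.execute(
        "SELECT cansmi FROM structure WHERE id = ?", (row_id,)
    ).fetchone()[0]


conn = make_db()
conn.execute("INSERT INTO structure (id, cansmi) VALUES (1, 'c1ccccc1O')")
print(stored(conn, 1))  # the canonical spelling chosen by the toy table
```

Unlike a check constraint, which rejects a noncanonical value, the trigger silently repairs it, so callers do not need to canonicalize before writing.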

When the table contains unique canonical SMILES in an indexed column cansmi, and the cansmiles function provides the proper canonical SMILES for phenol, this lookup is extremely fast. [Pg.91]
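A direct lookup of this kind might look like the sketch below. It assumes a table named structure with a name column and a uniquely indexed cansmi column, uses SQLite through Python's standard library in place of the RDBMS, and relies on a hypothetical lookup-table cansmiles in place of a real canonicalizer.

```python
import sqlite3

# Hypothetical toy canonicalizer (a lookup table, not real chemistry).
_CANONICAL = {
    "Oc1ccccc1": "Oc1ccccc1",
    "c1ccccc1O": "Oc1ccccc1",
    "CCO": "CCO",
    "OCC": "CCO",
}


def cansmiles(smi):
    return _CANONICAL[smi]  # KeyError for anything outside the toy table


conn = sqlite3.connect(":memory:")
conn.create_function("cansmiles", 1, cansmiles, deterministic=True)
# UNIQUE makes SQLite build an index on cansmi automatically.
conn.execute(
    "CREATE TABLE structure ("
    " id INTEGER PRIMARY KEY, name TEXT, cansmi TEXT UNIQUE)"
)
conn.executemany(
    "INSERT INTO structure (name, cansmi) VALUES (?, ?)",
    [("phenol", "Oc1ccccc1"), ("ethanol", "CCO")],
)


def lookup(conn, smi):
    """Canonicalize the query SMILES, then match it by plain equality."""
    row = conn.execute(
        "SELECT name FROM structure WHERE cansmi = cansmiles(?)", (smi,)
    ).fetchone()
    return row[0] if row else None


print(lookup(conn, "c1ccccc1O"))  # a noncanonical spelling of phenol
```

Because the query is canonicalized once and then compared by string equality, the database can answer it from the index without examining any molecular structure.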

The column structure.id is a unique integer relating the structure, sdf and property tables. The sdf.molfile column contains the molfile for each structure as defined by the vendor. The structure.name and structure.cansmiles columns contain the name and canonical SMILES parsed and computed from the molfile. The structure.coord column will contain an array of atomic coordinates. The structure.atom column will contain an array of atom numbers from the file in canonical order to correspond to the atom order in the canonical SMILES. The OpenBabel/plpythonu extension functions molfile_mol and molfile_properties will be used to parse the vendor SDF molfiles and populate these tables. The molfile column of the sdf table is first populated from the SDF file, using the following Perl script. [Pg.126]

The next statement defines a trigger function that will be used whenever data is inserted or updated in this table. This function performs three important functions. First, it modifies the SMILES to be inserted into the smi column so that it contains the result of the isosmiles function. The isosmiles function is similar to the cansmiles function, except that it retains any stereochemistry that might be contained within the SMILES. If two stereoisomers are entered into this table, each will have a unique isosmiles value, but the same cansmiles value. In this way, they can be kept distinct, but their identical canonical SMILES shows them to be stereoisomers. The trigger function also computes the fingerprint and inserts it into the table when the SMILES is inserted or updated. [Pg.156]

Operators such as + and ||, and functions such as sqrt, round, and upper, can be used with these data types. SQL can also search data, using comparison operators such as =, <, and the like. The goal of the SQL extensions is to enable SMILES to be handled as readily as any standard data type. This requires that SQL be extended to validate and standardize, or canonicalize, SMILES. In addition, these SQL extensions provide functions and operators to allow comparisons and searches of molecular structures stored as SMILES. [Pg.73]



© 2024 chempedia.info