Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...


Validation SMILES

A somewhat different way to define a molecule is as a simplified molecular input line entry specification (SMILES) structure. It is a way of writing a single text string that defines the atoms and connectivity. It does not define the exact bond lengths, and so forth. Valid SMILES structures for ethane include CC, C-C, and [CH3][CH3]. SMILES is used because it is a very convenient way to describe molecular structure when large databases of compounds must be maintained. There is also a very minimal version for organic molecules called SSMILES. [Pg.67]

Note that a single molecule may correspond to many different, but equivalent, SMILES strings. For example, for a given asymmetric molecule, starting from a different asymmetric atom will lead to a different, but equally valid, SMILES string. These various SMILES are called isomeric SMILES. They can be converted to a unique form called canonical SMILES (11). [Pg.31]

The presence of a branch in the structure raises the question of where to start coding. With the SMILES system it does not matter where one starts. A SMILES interpreter will produce the same structure from any valid SMILES coding for a compound. In some circumstances, such as the system's use in databases, it is necessary to have a unique SMILES string for a molecule. Using a set of rules, it is possible to uniquify a SMILES string. [Pg.41]
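The point that the starting atom does not matter can be sketched in a few lines of Python. The parser below is only an illustration under strong assumptions: it handles a small subset of SMILES (the atoms C, N and O, the bond symbols -, = and #, branches, and single-digit ring closures), and the names parse_smiles and same_molecule are invented here rather than taken from any toolkit. It compares two strings by brute-force graph matching, which is practical only for very small molecules.

```python
from itertools import permutations

BOND_ORDERS = {"-": 1, "=": 2, "#": 3}

def parse_smiles(smiles):
    """Parse a tiny SMILES subset into (atoms, bonds).

    atoms is a list of element symbols; bonds maps frozenset({i, j})
    to a bond order. Unsupported or unbalanced input raises ValueError.
    """
    atoms, bonds = [], {}
    prev = None      # atom that the next atom will be bonded to
    pending = 1      # bond order to use for the next bond
    stack = []       # branch points to return to on ")"
    rings = {}       # open ring-closure digit -> (atom index, bond order)

    def add_bond(i, j, order):
        key = frozenset((i, j))
        if i == j or key in bonds:
            raise ValueError("invalid bond in %r" % smiles)
        bonds[key] = order

    for ch in smiles:
        if ch in "CNO":
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                add_bond(prev, idx, pending)
            prev, pending = idx, 1
        elif ch in BOND_ORDERS:
            pending = BOND_ORDERS[ch]
        elif ch == "(":
            if prev is None:
                raise ValueError("branch before first atom")
            stack.append(prev)
        elif ch == ")":
            if not stack:
                raise ValueError("unmatched ')'")
            prev = stack.pop()
        elif ch.isdigit():
            if prev is None:
                raise ValueError("ring closure before first atom")
            if ch in rings:
                other, order = rings.pop(ch)
                add_bond(other, prev, max(order, pending))
            else:
                rings[ch] = (prev, pending)
            pending = 1
        else:
            raise ValueError("unsupported character %r" % ch)
    if stack or rings or not atoms:
        raise ValueError("incomplete SMILES %r" % smiles)
    return atoms, bonds

def same_molecule(a, b):
    """Return True if both strings describe the same labelled graph."""
    atoms_a, bonds_a = parse_smiles(a)
    atoms_b, bonds_b = parse_smiles(b)
    if sorted(atoms_a) != sorted(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    n = len(atoms_a)
    for perm in permutations(range(n)):
        # The atom mapping must preserve element symbols ...
        if any(atoms_a[i] != atoms_b[perm[i]] for i in range(n)):
            continue
        # ... and every bond, including its order.
        if all(bonds_b.get(frozenset(perm[i] for i in key)) == order
               for key, order in bonds_a.items()):
            return True
    return False

print(same_molecule("CC(C)O", "OC(C)C"))   # True: 2-propanol written from either end
print(same_molecule("CCCO", "CC(C)O"))     # False: 1-propanol vs 2-propanol
```

A canonicalization algorithm goes one step further than this comparison: instead of testing pairs, it applies a fixed set of rules so that every equivalent string is rewritten to one agreed form, which can then be compared or indexed as plain text.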

Convince yourself that any valid SMILES is also a valid SMARTS, but not vice versa. [Pg.75]

The standard SQL data type Text has been used to store SMILES. This is appropriate because every SMILES is a valid text string. But not every text string is a valid SMILES. Without additional information about SMILES, the RDBMS cannot enforce any rules about which text strings ought to be in a column intended to contain SMILES. [Pg.86]
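The gap between "any text" and "valid SMILES" can be illustrated with a rough syntactic filter. This is a sketch, not a real validator: the function name looks_like_smiles is hypothetical, the character set is a simplified assumption, only single-digit ring closures are handled, and no chemistry (element symbols, valence, aromaticity) is checked. Passing this filter is necessary but not sufficient; a production system would delegate the decision to a cheminformatics toolkit.

```python
import re

# Characters accepted by this simplified check (an assumption; the full
# SMILES grammar is richer and also more restrictive about ordering).
_ALLOWED = re.compile(r"^[A-Za-z0-9@+\-\[\]()=#$:/\\.*]+$")

def looks_like_smiles(text):
    """Cheap structural checks: character set, brackets, ring closures."""
    if not isinstance(text, str) or not _ALLOWED.match(text):
        return False
    depth = 0             # current nesting depth of branch parentheses
    in_bracket = False    # inside a [...] atom specification
    open_rings = set()    # ring-closure digits seen an odd number of times
    for ch in text:
        if in_bracket:
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                return False          # bracket atoms do not nest
            continue                  # digits inside [...] are not ring closures
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            return False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            open_rings ^= {ch}        # toggle: open on first use, close on second
    return depth == 0 and not in_bracket and not open_rings

for candidate in ["C1CCCCC1", "C1CCCCC", "CC(C", "[NH4+]", "not a smiles"]:
    print(candidate, looks_like_smiles(candidate))
```

The cyclohexane and ammonium strings pass; the unclosed ring, the unclosed branch, and the string containing spaces are rejected.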

This would ensure that the column smi could contain only a valid SMILES. If this is the only table in which a SMILES column is used, this approach... [Pg.86]

This function can then be used to easily estimate a logp value for any valid SMILES. For example, the following SQL computes the same result as the select statement above. [Pg.152]

Update schema.structure Set fp=fp(smiles), isosmiles=isosmiles(smiles) Where valid(smiles) EOSQL... [Pg.207]

Getting SMILES/IUPAC name from structure by MarvinView: copy a valid SMILES string and paste it into the MarvinSketch or MarvinView panel to display the structure (Fig. 1.15). [Pg.19]

Murcko frameworks allow for an abstract molecular representation, whereby the side chains of the structure are trimmed, preserving only the rings and the linkers that connect rings. An RDKit script was used in order to generate customized Murcko frameworks. Here, atom and bond types are preserved. The resulting Murcko framework descriptor was represented by a valid SMILES string. [Pg.111]

Stereoisomerism at double bonds is indicated in SMILES by / and \. The characters specify the relative direction of the connected atoms at a double bond and act as a frame. The characters frame the atoms of a double bond in a parallel or an opposite direction. It is therefore only reasonable to use them on both sides (Figure 2-78). There are other valid representations of cis/trans isomers, because the characters can be written in different ways. Further details are listed in Section 2.3.3, in the Handbook, or in Ref. [22]. [Pg.84]
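For the simplest case, one substituent written on each side of a C=C bond, the framing rule can be sketched as follows. Identical direction characters on both sides (F/C=C/F or F\C=C\F) describe the trans isomer, and opposite characters (F/C=C\F or F\C=C/F) describe the cis isomer. The function name double_bond_geometry and the restricted pattern are assumptions made for illustration; general SMILES with branches or ring closures at the double bond needs a real parser.

```python
import re

# Matches only the simple form  X/C=C/Y  (either slash direction on each side).
_SIMPLE_ALKENE = re.compile(r"^([A-Za-z]+)([/\\])C=C([/\\])([A-Za-z]+)$")

def double_bond_geometry(smiles):
    """Return 'trans', 'cis', or None if the string is not of the simple form."""
    match = _SIMPLE_ALKENE.match(smiles)
    if match is None:
        return None
    first_mark, second_mark = match.group(2), match.group(3)
    # Parallel marks frame the substituents on opposite sides of the bond.
    return "trans" if first_mark == second_mark else "cis"

print(double_bond_geometry(r"F/C=C/F"))   # trans
print(double_bond_geometry(r"F\C=C\F"))   # trans
print(double_bond_geometry(r"F/C=C\F"))   # cis
print(double_bond_geometry("FC=CF"))      # None: geometry not specified
```

The two trans strings show why several valid representations of the same isomer exist: flipping both characters at once leaves the relative direction unchanged.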

Method of Meylan and Howard. Meylan and Howard [9] expanded the bond contribution method of Hine and Mookerjee. Based on 345 compounds, they derived bond contributions for 59 different bond types. Their method has been validated with an independent set of 74 structurally diverse compounds, obtaining a correlation coefficient of 0.96. Their method also needs correction factors for several structural-substructural features. This method has been implemented in a Henry's law constant program performing AWPC (25°C) estimations from SMILES input [15]. [Pg.142]

The descriptor was a product of the correlation weights, CW(Ik), calculated by the Monte Carlo method for each kth element of a special SMILES-like notation introduced by the authors. The notation codes the following characteristics: the atom composition, the type of substance (bulk or not, ceramic or not), and the temperature of synthesis. The QSAR model constructed in this way was validated with the use of many different splits into training (n = 21) and validation (n = 8) sets. Individual sub-models are characterized by high goodness-of-fit (0.972 applicability domain of the model, it is not known if all the compounds (metal oxides, nitrides, mullite, and silicon carbide) can be truly modeled together. [Pg.211]

The same modeling scheme has been employed by Toropov et al. [72], who once again used the DCW descriptor. But, in this case, the descriptor denoted the variance in a set of 26 organic solvents coded with the SMILES notation. The model was externally validated, which confirmed its predictivity. The values of... [Pg.211]

This creates a table of four columns in the schema achemcompany. The column named smiles is intended to store the SMILES representation of a chemical structure, the id column will store an integer identifier to be used for joining other tables, the column mw will store the molecular weight with a precision of 2 digits to the right of the decimal point, and the column named added will record when this structure was entered into the table. As defined above, any character string could be entered into the smiles column, any integer into the id column, and any valid... [Pg.22]

Create Domain smiles As Text Check (valid(Value)) ... [Pg.28]
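The effect of such a domain can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: SQLite has no CREATE DOMAIN, so a column-level CHECK constraint stands in for it; the valid function registered here is a placeholder that merely rejects empty strings, whitespace, and unbalanced parentheses; and the table layout is invented for the example. In the system the excerpt describes, valid would be supplied by a chemistry extension to the database.

```python
import sqlite3

def valid(text):
    """Placeholder validity test (assumption): non-empty, no whitespace,
    balanced parentheses. Returns 1 or 0 so SQLite can use it in CHECK."""
    if not isinstance(text, str) or not text:
        return 0
    if any(ch.isspace() for ch in text):
        return 0
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return 0
    return 1 if depth == 0 else 0

conn = sqlite3.connect(":memory:")
# Functions used inside CHECK constraints should be registered as deterministic.
conn.create_function("valid", 1, valid, deterministic=True)
conn.execute(
    "CREATE TABLE structure ("
    " id INTEGER PRIMARY KEY,"
    " smiles TEXT NOT NULL CHECK (valid(smiles)))"
)

def try_insert(smiles):
    """Insert a row; return False if the CHECK constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO structure (smiles) VALUES (?)", (smiles,))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("CC(C)O"))   # accepted by the placeholder check
print(try_insert("CC(C"))     # rejected: unbalanced parenthesis
```

Because the rule lives in the schema rather than in application code, every client that writes to the column is subject to the same check.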

Operators, such as + and ||, and functions such as sqrt, round, and upper can be used with these data types. SQL has the ability to search data, using functions such as =, <, and the like. The goal of the SQL extensions is to enable SMILES to be handled as readily as any standard data type. This requires that SQL be extended to validate and standardize, or canonicalize SMARTS. In addition, these SQL extensions provide functions and operators to allow comparisons and searches of molecular structures stored as SMILES. [Pg.73]

The cansmiles function can also be used to enforce an SQL constraint that the cansmi column must contain valid canonical SMILES. SQL constraints like this are commonly used to maintain data integrity. For example, the SQL clause check (cansmi = cansmiles(cansmi)) can be used in the initial creation of the table. One might also consider using an SQL trigger to handle an insert or update to a column that is required to contain canonical SMILES. [Pg.74]

The recommendation here is to use SMILES to store the molecular structure itself. If other features of the molecule or atoms need to be stored, other data types and columns can be added to the row describing the molecule. It is the "SQL way" to not encode a lot of information into one data type. When using a molfile as the structural data type, too much data is encoded in a single data type. The individual data items must be parsed and validated. Errors creep into the data, due to missing, extra, or invalid portions of the molfile. Ways of storing atomic coordinates, atom types, and molecular properties are discussed in Chapter 11. [Pg.84]

Why use the domain to define a smiles data type, but use a trigger for canonical SMILES? First, SMILES is either valid or not. It is not feasible to... [Pg.87]

There is some overhead in the use of indexes, constraints, triggers, etc. as discussed here. The overhead is incurred when rows are inserted or updated in the table. However, the value of this approach is that the data in the table are well validated and can be searched more reliably and efficiently. Direct lookups of canonical or stereo SMILES are simple and quick because of the index on these columns. Using the fingerprint column speeds up substructure search. Tautomers can be readily selected using the column of simple graphs. [Pg.162]







© 2024 chempedia.info