SMILES column

If one views the gigantic scale on which this manufacture is carried on at the present day, by a Tennant or a Muspratt, he must smile on looking back at the almost puny manner in which they operated in former times; but, then, add to their process the continuous system (the introduction of steam into the chambers by Kestner, concentration in the platinum vessels, and Gay-Lussac's column for the recovery of the nitrous gas) and manufacturers have almost all that could be desired at present. [Pg.1022]

Fig. 2-2. Observed dependencies of the Boltzmann variable on time in experiments with horizontal water movement in unsaturated soil columns. Experiments: (a) Nielsen et al. (1962), (b) Rawlins and Gardner (1963), (c) Ferguson and Gardner (1963), and (d) Smiles et al. (1978) (see Table 2-2 for the legend).

Chapter 7 introduces ways in which RDBMS can be used to handle chemical structural information using SMILES and SMARTS representations. It shows how extensions to relational databases allow chemical structural information to be stored and searched efficiently. In this way, chemical structures themselves can be stored in data columns. Once chemical structures become proper data types, many search and computational options become available. Conversion between different chemical structure formats is also discussed, along with input and output of chemical structures. [Pg.2]

This creates a table of four columns in the schema achemcompany. The column named smiles is intended to store the SMILES representation of a chemical structure, the id column will store an integer identifier to be used for joining other tables, the column mw will store the molecular weight with a precision of 2 digits to the right of the decimal point, and the column named added will record when this structure was entered into the table. As defined above, any character string could be entered into the smiles column, any integer into the id column, and any valid... [Pg.22]
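The four-column table described above might look like the following minimal sketch, run here against SQLite from Python so that it is self-contained. The table name `structure`, the `NUMERIC(10, 2)` declaration, and the `CURRENT_TIMESTAMP` default are assumptions, not the book's statement. SQLite has no schemas, so an attached in-memory database named `achemcompany` stands in for one, and SQLite does not enforce the two-decimal precision the way a server RDBMS would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: an attached in-memory database plays the role of the schema.
conn.execute("ATTACH DATABASE ':memory:' AS achemcompany")
conn.execute("""
    CREATE TABLE achemcompany.structure (   -- table name is hypothetical
        smiles TEXT,                        -- SMILES string, unconstrained
        id     INTEGER,                     -- identifier used for joins
        mw     NUMERIC(10, 2),              -- molecular weight
        added  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

insert = "INSERT INTO achemcompany.structure (smiles, id, mw) VALUES (?, ?, ?)"
with conn:
    conn.execute(insert, ("CC(C)O", 1, 60.10))
    # Nothing stops an arbitrary string from landing in the smiles column.
    conn.execute(insert, ("not a structure", 2, 0.0))

rows = conn.execute(
    "SELECT smiles, id, mw, added FROM achemcompany.structure ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

The second insert succeeds, which is the weakness the text points out: a plain text column accepts any character string.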

Note two additional Join clauses, each with the appropriate On clause naming the columns that relate the tables being joined. The additional columns compound_id, compound_type, and openeye_can_smiles are from the compound table. No columns are actually selected from the substance_compound table. That table is simply used to effect the many-to-many relationship between the substance and compound tables. [Pg.60]

Once there is a unique, canonical SMILES available, this can be stored in a text column and a direct lookup for a specific structure can be done using the SQL = operator. If canonical SMILES is stored in a text column named cansmi, one can locate isopropyl alcohol using the SQL clause Where cansmi = 'CC(C)O'. And because text data can be indexed in SQL, this lookup is extremely fast. In addition, SQL uniqueness constraints can be used to enforce data integrity when using canonical SMILES. [Pg.72]
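A minimal sketch of this lookup follows, using SQLite from Python. The table name `structure` is hypothetical, and the stored strings are simply assumed to be canonical already; which spelling counts as canonical depends on the toolkit in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE creates an index on cansmi and rejects duplicate structures.
conn.execute(
    "CREATE TABLE structure (id INTEGER PRIMARY KEY, cansmi TEXT UNIQUE)"
)
with conn:
    conn.executemany(
        "INSERT INTO structure (id, cansmi) VALUES (?, ?)",
        [(1, "CC(C)O"), (2, "CCO"), (3, "c1ccccc1O")],
    )

# Exact-match lookup of isopropyl alcohol with the = operator.
hit = conn.execute(
    "SELECT id FROM structure WHERE cansmi = ?", ("CC(C)O",)
).fetchone()

# The query plan reports how SQLite resolves the lookup (index or scan).
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM structure WHERE cansmi = ?",
    ("CC(C)O",),
).fetchall()

# The uniqueness constraint refuses a second copy of the same structure.
try:
    with conn:
        conn.execute(
            "INSERT INTO structure (id, cansmi) VALUES (?, ?)", (4, "CC(C)O")
        )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(hit, duplicate_rejected)
```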

The cansmiles function can also be used to enforce an SQL constraint that the cansmi column must contain valid canonical SMILES. SQL constraints like this are commonly used to maintain data integrity. For example, the SQL clause check (cansmi = cansmiles(cansmi)) can be used in the initial creation of the table. One might also consider using an SQL trigger to handle an insert or update to a column that is required to contain canonical SMILES. [Pg.74]
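The check clause can be tried out with the sketch below. It is only an illustration: the `cansmiles` registered here is a toy lookup table written for this example, not the toolkit-backed function the text refers to, and the table name `structure` is hypothetical. The toy raises an error for strings it does not know, because a function returning NULL would make the comparison NULL, and SQL treats a NULL check result as passing.

```python
import sqlite3

# Hypothetical stand-in for a real cansmiles(): a lookup table covering a few
# spellings of two molecules. A real function would come from a toolkit.
_TOY_CANONICAL = {
    "CC(C)O": "CC(C)O",
    "OC(C)C": "CC(C)O",
    "CC(O)C": "CC(C)O",
    "CCO": "CCO",
    "OCC": "CCO",
}


def cansmiles(smiles):
    """Return the toy canonical form; raise for strings the toy does not know."""
    try:
        return _TOY_CANONICAL[smiles]
    except KeyError:
        raise ValueError(f"unrecognized SMILES: {smiles!r}")


conn = sqlite3.connect(":memory:")
# Allow application-defined functions inside schema objects such as CHECK.
conn.execute("PRAGMA trusted_schema = ON")
conn.create_function("cansmiles", 1, cansmiles, deterministic=True)
conn.execute("""
    CREATE TABLE structure (
        id     INTEGER PRIMARY KEY,
        cansmi TEXT NOT NULL CHECK (cansmi = cansmiles(cansmi))
    )
""")


def try_insert(value):
    """Attempt an insert and report what the constraint did."""
    try:
        with conn:
            conn.execute("INSERT INTO structure (cansmi) VALUES (?)", (value,))
        return "stored"
    except sqlite3.IntegrityError:
        return "rejected: not canonical"
    except sqlite3.OperationalError:
        return "rejected: not recognized"


results = {s: try_insert(s) for s in ("CC(C)O", "OC(C)C", "xyz")}
print(results)
```

Only the canonical spelling is stored. The valid but non-canonical spelling fails the check, and the unknown string fails inside the function itself.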

It is possible to represent chirality in SMILES. This is essential to correctly define the appropriate enantiomer or stereoisomer. Many databases will contain isomers. It is possible to relate the various isomers of a structure by using their common canonical SMILES. This might be done by relaxing the uniqueness constraint on the cansmi column in a structure table, or by adding another table of stereoisomers that is related to the master table. Chirality may be used in SMARTS as well. [Pg.80]

The recommendation here is to use SMILES to store the molecular structure itself. If other features of the molecule or atoms need to be stored, other data types and columns can be added to the row describing the molecule. It is the "SQL way" to not encode a lot of information into one data type. When using a molfile as the structural data type, too much data is encoded in a single data type. The individual data items must be parsed and validated. Errors creep into the data due to missing, extra, or invalid portions of the molfile. Ways of storing atomic coordinates, atom types, and molecular properties are discussed in Chapter 11. [Pg.84]

The standard SQL data type Text has been used to store SMILES. This is appropriate because every SMILES is a valid text string. But not every text string is a valid SMILES. Without additional information about SMILES, the RDBMS cannot enforce any rules about which text strings ought to be in a column intended to contain SMILES. [Pg.86]

The SQL domain allows one to define which values are to be allowed in a particular column of a table. A domain is created by stating the underlying built-in SQL data type used to store the domain data type. In addition, a check constraint function may be used to allow or forbid certain values. This can be used to great advantage for SMILES and canonical SMILES. Using a domain improves the ability of the RDBMS to maintain the integrity of the data contained in its tables. [Pg.86]

Using a domain like this, the smiles data type behaves much like a standard data type. When one attempts to insert an invalid number into a numeric column, an SQL error is reported and the value is not inserted. This fundamental behavior of an RDBMS is readily extended to SMILES using a domain. [Pg.86]

This would ensure that the column smi could contain only a valid SMILES. If this is the only table in which a SMILES column is used, this approach... [Pg.86]

Using a domain ensures that only appropriate data can be inserted into a column. If an attempt is made to insert invalid data, an error is reported. The user is then responsible for correcting the value, if possible, and trying the insert again. The SQL trigger mechanism automates this process. The following SQL will not only ensure that the cansmi column contains canonical SMILES, it will correct problems where possible. [Pg.87]
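The book's own trigger SQL is not reproduced in this excerpt. As a rough illustration of the idea, the sketch below uses SQLite triggers that rewrite whatever was inserted or updated into its canonical form. The `cansmiles` function is again a toy lookup table invented for this example, and the table and trigger names are hypothetical. SQLite cannot modify the incoming row in a BEFORE trigger, so the correction is applied with an UPDATE in an AFTER trigger, which differs from how a server RDBMS would typically do it.

```python
import sqlite3

# Toy canonicalizer (an assumption of this sketch, not a real toolkit call).
_TOY_CANONICAL = {
    "CC(C)O": "CC(C)O", "OC(C)C": "CC(C)O",
    "CCO": "CCO", "OCC": "CCO",
}


def cansmiles(smiles):
    """Return the toy canonical form; raise for strings the toy does not know."""
    try:
        return _TOY_CANONICAL[smiles]
    except KeyError:
        raise ValueError(f"unrecognized SMILES: {smiles!r}")


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA trusted_schema = ON")  # permit the function in triggers
conn.create_function("cansmiles", 1, cansmiles, deterministic=True)
conn.execute(
    "CREATE TABLE structure (id INTEGER PRIMARY KEY, cansmi TEXT NOT NULL)"
)
# After an insert, replace the stored value with its canonical form.
conn.execute("""
    CREATE TRIGGER structure_canon_insert AFTER INSERT ON structure
    BEGIN
        UPDATE structure SET cansmi = cansmiles(NEW.cansmi) WHERE id = NEW.id;
    END
""")
# After an update, fix the value only when it is not already canonical.
conn.execute("""
    CREATE TRIGGER structure_canon_update AFTER UPDATE OF cansmi ON structure
    WHEN NEW.cansmi <> cansmiles(NEW.cansmi)
    BEGIN
        UPDATE structure SET cansmi = cansmiles(NEW.cansmi) WHERE id = NEW.id;
    END
""")

with conn:
    conn.execute("INSERT INTO structure (id, cansmi) VALUES (1, 'OC(C)C')")
after_insert = conn.execute(
    "SELECT cansmi FROM structure WHERE id = 1").fetchone()[0]

with conn:
    conn.execute("UPDATE structure SET cansmi = 'OCC' WHERE id = 1")
after_update = conn.execute(
    "SELECT cansmi FROM structure WHERE id = 1").fetchone()[0]

# A string that cannot be corrected still produces an error.
try:
    with conn:
        conn.execute("INSERT INTO structure (id, cansmi) VALUES (2, 'xyz')")
    bad_insert_failed = False
except sqlite3.OperationalError:
    bad_insert_failed = True

row_count = conn.execute("SELECT count(*) FROM structure").fetchone()[0]
print(after_insert, after_update, bad_insert_failed, row_count)
```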

When the table contains unique canonical SMILES in an indexed column cansmi, and the cansmiles function provides the proper canonical SMILES for phenol, this lookup is extremely fast. [Pg.91]

One way to do a quick molecular formula comparison is to store the molecular formula not as a string representation, such as C60, but as a column of integers. Each row in a table of molecular structures would contain SMILES, but the table would also have additional columns containing the count of each atom type. These columns could be indexed to speed up the molecular formula comparison. The SQL used to search for structures containing phenol becomes as follows ... [Pg.92]
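The search SQL itself is elided in this excerpt, so the following is only a sketch of the atom-count idea. The column names `n_c`, `n_n`, and `n_o`, the index, and the sample rows are all assumptions. The comparison is a prescreen: any structure containing phenol must have at least six carbons and one oxygen, but a row that passes still needs a real substructure match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE structure (
        id     INTEGER PRIMARY KEY,
        smiles TEXT,
        n_c    INTEGER,   -- number of carbon atoms
        n_n    INTEGER,   -- number of nitrogen atoms
        n_o    INTEGER    -- number of oxygen atoms
    )
""")
# Indexing the count columns lets the prescreen avoid a full table scan.
conn.execute("CREATE INDEX structure_counts ON structure (n_c, n_o, n_n)")

with conn:
    conn.executemany(
        "INSERT INTO structure VALUES (?, ?, ?, ?, ?)",
        [
            (1, "c1ccccc1O", 6, 0, 1),      # phenol
            (2, "CCO", 2, 0, 1),            # ethanol
            (3, "Cc1ccc(O)cc1", 7, 0, 1),   # p-cresol
            (4, "c1ccccc1", 6, 0, 0),       # benzene
            (5, "Nc1ccc(O)cc1", 6, 1, 1),   # 4-aminophenol
        ],
    )

# Phenol is C6H6O: candidates need at least 6 carbons and 1 oxygen.
candidates = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM structure WHERE n_c >= ? AND n_o >= ? ORDER BY id",
        (6, 1),
    )
]
print(candidates)
```

Ethanol and benzene are eliminated by integer comparisons alone, before any more expensive substructure test would need to run.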

The column named smarts contains the SMiles ARbitrary Target Specification (SMARTS) pattern defining the fragment. The column named... [Pg.93]

This function could be used to add a column of tpsa to any table containing SMILES. [Pg.98]

Each row in the coordtest table represents a molecule. The smiles column is a string of atom symbols and bonds and the coord column is an array of atom coordinates. How is it possible to keep the ordering of atoms in the smiles string in sync with the ordering of atom coordinates in the coord array? When the coordinates are initially entered from the external source, they are likely to be in a common chemical file format. The program that converts from that file format to SMILES would have to output the atom coordinates in the same order as the atoms in the SMILES. [Pg.116]

This type could be used in a table that might also contain a canonical SMILES column, or even other variants of SMILES if desired. [Pg.116]

It would be possible to create tables using columns to store the atomic symbols and bond information found in molecular structure files, reflecting the column style format of the file itself. Instead, a SMILES representation of this valence bond information is preferred. SMILES is a compact text string containing the same information as the columns of atom symbols and bonds. It can also be used directly in the search functions described in earlier chapters. It is desirable to parse the molecular properties in molecular structure files in order to store them in data columns for possible searching... [Pg.124]

In a molecular structure file, an atom record typically contains all of the information about that atom: the atomic number or symbol, the charge, coordinates, etc. When such a file is parsed into a SMILES string and an array of coordinates, it is important to be able to associate the proper coordinate with the proper atom. The use of canonical SMILES ensures this. Because canonical SMILES defines a unique order of the atoms in a molecule, that order is used to store the coordinates. Later sections of this chapter will discuss ways in which atomic coordinates might be stored in columns of a table. [Pg.125]

The column structure.id is a unique integer relating the structure, sdf, and property tables. The sdf.molfile column contains the molfile for each structure as defined by the vendor. The structure.name and structure.cansmiles columns contain the name and canonical SMILES parsed and computed from the molfile. The structure.coord column will contain an array of atomic coordinates. The structure.atom column will contain an array of atom numbers from the file in canonical order to correspond to the atom order in the canonical SMILES. The OpenBabel/plpythonu extension functions molfile_mol and molfile_properties will be used to parse the vendor SDF molfiles and populate these tables. The molfile column of the sdf table is first populated from the SDF file, using the following Perl script. [Pg.126]

Consider extending SQL with new functions. This might be considered the fundamental suggestion in this book. There are many useful functions built into SQL, but sometimes a simple extension function can allow an SQL operation to run completely on the database server without having to pass data to the client. For example, to sort selected rows by any value in a column requires only simple SQL. If the data needed to sort the rows is not part of the data being selected, consider writing a function that will provide the value to be sorted. For example, if it were necessary to sort by the number of atoms in a molecule, a natoms(smiles) function could be used in the Order By clause of SQL. [Pg.138]
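As an illustration of sorting by a computed value, the sketch below registers a `natoms` function with SQLite from Python and uses it in an Order By clause. The implementation is an assumption made for this example: it counts atom tokens with a regular expression that covers bracket atoms and the common organic subset only, which is far cruder than a toolkit-backed function. The table name `structure` is hypothetical.

```python
import re
import sqlite3

# Simplified atom tokenizer: bracket atoms, two-letter halogens, then the
# single-letter organic subset (aliphatic and aromatic). Not a full parser.
_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def natoms(smiles):
    """Count atom tokens in a SMILES string (toy heavy-atom count)."""
    return len(_ATOM.findall(smiles))


conn = sqlite3.connect(":memory:")
conn.create_function("natoms", 1, natoms, deterministic=True)
conn.execute("CREATE TABLE structure (id INTEGER PRIMARY KEY, smiles TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO structure (id, smiles) VALUES (?, ?)",
        [(1, "c1ccccc1O"), (2, "CCO"), (3, "CC(C)O"), (4, "ClCCl")],
    )

# The sort key is computed inside the database; it is never selected.
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT smiles FROM structure ORDER BY natoms(smiles), id"
    )
]
print(ordered)
```

The client receives rows already sorted by atom count, without fetching the SMILES first and counting atoms itself.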

The next statement defines a trigger function that will be used whenever data is inserted or updated in this table. This function performs three important functions. First, it modifies the SMILES to be inserted into the smi column so that it contains the result of the isosmiles function. The isosmiles function is similar to the cansmiles function, except that it retains any stereochemistry that might be contained within the SMILES. If two stereoisomers are entered into this table, each will have a unique isosmiles value, but the same cansmiles value. In this way, they can be kept distinct, but their identical canonical SMILES shows them to be stereoisomers. The trigger function also computes the fingerprint and inserts it into the table when the SMILES is inserted or updated. [Pg.156]

The id column is defined as a primary key. This causes an index to be created, which will facilitate joining the structure table with other tables yet to be created. The smiles column is defined to be unique, which also automatically creates an index. This column will not be used as a key, but the unique index will allow fast lookups on this table if a particular structure is desired. The final definition of this schema creates an index on the cansmiles column. This will not be a unique index, but it will allow fast lookup of structures by canonical SMILES. [Pg.157]
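The key and index choices described here can be sketched as follows in SQLite. The column types and the index name are assumptions, and the two alanine strings are illustrative values typed in by hand, not the output of a canonicalizer. In SQLite an integer primary key is the table's own row key, so no separate index object appears for it as it would in a server RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE structure (
        id        INTEGER PRIMARY KEY,    -- join key for other tables
        smiles    TEXT NOT NULL UNIQUE,   -- unique, so indexed automatically
        cansmiles TEXT NOT NULL           -- shared by stereoisomers
    )
""")
# Non-unique index: several rows may share one canonical SMILES.
conn.execute("CREATE INDEX structure_cansmiles ON structure (cansmiles)")

with conn:
    conn.executemany(
        "INSERT INTO structure (id, smiles, cansmiles) VALUES (?, ?, ?)",
        [
            # Two enantiomers of alanine: distinct smiles, same cansmiles.
            (1, "C[C@H](N)C(=O)O", "CC(N)C(=O)O"),
            (2, "C[C@@H](N)C(=O)O", "CC(N)C(=O)O"),
            (3, "CCO", "CCO"),
        ],
    )

# Lookup by canonical SMILES returns every stereoisomer of the structure.
isomers = conn.execute(
    "SELECT id, smiles FROM structure WHERE cansmiles = ? ORDER BY id",
    ("CC(N)C(=O)O",),
).fetchall()

# The unique constraint on smiles refuses an exact duplicate.
try:
    with conn:
        conn.execute(
            "INSERT INTO structure (id, smiles, cansmiles) VALUES (?, ?, ?)",
            (4, "CCO", "CCO"),
        )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(isomers, duplicate_rejected)
```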

It might be tempting to add additional columns to the structure table to hold defined properties of each structure. Not all properties of a structure are appropriate for a table of structures. Some properties, for example, molecular weight and molecular formula, are fixed properties of a structure with a unique value. These might be added as columns to the structure table. However, they could also be kept in another table related to the structure table. Consider also how often these values will be needed or if they will be searched. It is possible to easily compute these properties when needed, using SQL functions that take a SMILES argument. [Pg.158]

Other properties are not unique, for example, chemical names. These should be stored in a separate table with one row for each value. For example, the entry in the PubChem database contains 10 synonyms for the SMILES C1(C(C(C(C(C1O)O)OP(=O)(O)O)O)O)O as shown in Table 13.1. Each of these should be entered as a separate row in a table of names along with a column containing the compound id. A simple table of this type would be created using the following SQL. [Pg.158]
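The SQL that followed is not part of this excerpt. A minimal sketch of such a names table, under assumed table and column names, is shown below in SQLite. Since Table 13.1 is not reproduced here, the example uses ethanol and three of its common names instead of the PubChem entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces these only on request
conn.execute(
    "CREATE TABLE structure (id INTEGER PRIMARY KEY, smiles TEXT UNIQUE)"
)
conn.execute("""
    CREATE TABLE names (
        compound_id INTEGER NOT NULL REFERENCES structure (id),
        name        TEXT NOT NULL
    )
""")

with conn:
    conn.execute("INSERT INTO structure (id, smiles) VALUES (1, 'CCO')")
    # One row per synonym, each carrying the compound id.
    conn.executemany(
        "INSERT INTO names (compound_id, name) VALUES (?, ?)",
        [(1, "ethanol"), (1, "ethyl alcohol"), (1, "grain alcohol")],
    )

# All names for a structure, found through the join on compound id.
synonyms = [
    row[0]
    for row in conn.execute(
        """
        SELECT names.name
        FROM structure
        JOIN names ON names.compound_id = structure.id
        WHERE structure.smiles = ?
        ORDER BY names.name
        """,
        ("CCO",),
    )
]

# A name pointing at a compound id that does not exist is refused.
try:
    with conn:
        conn.execute(
            "INSERT INTO names (compound_id, name) VALUES (?, ?)",
            (99, "orphan name"),
        )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(synonyms, orphan_rejected)
```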
