The SMILES Coding

In 1986, David Weininger created the SMILES Simplified Molecular Input Line Entry System) notation at the US Environmental Research Laboratory, USEPA, Duluth, MN, for chemical data processing. The chemical structure information is highly compressed and simplified in this notation. The flexible, easy to learn language describes chemical structures as a line notation [20, 21]. The SMILES language has found widespread distribution as a universal chemical nomenclature [Pg.26]

Compared with WLN and ROSDAL, SMILES uses only six basic rules to convert a structure into a character string (Table 2-2). [Pg.27]

Hydrogen atoms automatically saturate free valences and are omitted (simple hydrogen connection). [Pg.27]

Neighboring atoms stand next to each other. [Pg.27]

Rings are described by allocating digits to the two connecting ring atoms. [Pg.27]

Stereochemistry can also be expressed in the SMILES notation [113]. Depending on the clockwise or anti-clockwise ordering of the atoms, the stereocenter is specified in the SMILES code with or respectively Figure 2-78). The atoms around this stereocenter are then assigned by the sequence of the atom symbols following the identifier or (g). This means that, reading the SMILES code from the left, the three atoms behind the identifiers ( ) or ( )( )) describe the stereochemistry of the stereocenter. The sequence of these three atoms is dependent only on the order of writing, and independent of the priorities of the atoms. [Pg.84]

Convard, T., Dubost,). P., Le Solleu, H., Kummer, E. SmilogP a program for a fast evaluation of theoretical log P from the Smiles code of a molecule. Quant. Struct.-Act. Relat. 1994, 13, 34-37. [Pg.378]

SMILES provides an-easy-to-handle-format useful for entries in any software application. Therefore, based on the SMILES code pKb and log P values can be predicted beneath other chemical and biological properties (Table 1). [Pg.294]

Figure 2.3 gives the structure, WLN, and SMILES codes for a sample chemical. It is apparent that the WLN code is more complicated and much less obvious to a chemist than the SMILES code. The numbers refer to the atom numbers used in developing the WLN code. [Pg.40]

It is this writer s firm conviction, that the invention of the SMILES code and its convertibility with computer-readable SDF format has opened the door to great steps forward in man s ability to understand structure-activity relationships. Despite the present differences between the two major SMILES code interpreting software environments, I recommend that all students of chemistry become familiar with this code. [Pg.42]

Log KoW, MP, and Kp values were calculated with the Syracuse Research Corporation (SRC, Syracuse, NY) KOWWIN, MPBPWIN, and DERMWIN software packages (SRC, Syracuse, NY, USA), respectively, using the SMILES codes of the chemicals as the input. Values of MW and ST were calculated with the Advanced Chemistry Development (ACD) ChemSketch software. [Pg.402]

Figure 6 Generation of the SMILES code of morphine (reproduced from Figure 6 of ref. [256] with permission from Pergamon Press Ltd., Headington Hill Hall, Oxford 0X3 OBW, UK).

SmilogP A Program for a Fast Evaluation of Theoretical LogP from the Smiles Code of a Molecule. [Pg.308]

A SMILES code [22], MDL Molfile [50], or JME s own compact format (one-line representation of a molecule or reaction including the 2D coordinates) of created molecules may be generated. The created SMILES is independent of the way the molecule was drawn (unique SMILES see Section 2.3.3). Extensions to JME developed in cooperation with H. Rzepa and P. Murray-Rust also allow output of molecules in the CML format [60]. [Pg.144]

An alternative way to represent molecules is to use a linear notation. A linear notation uses alphanumeric characters to code the molecular structure. These have the advantage of being much more compact than the connection table and so can be particularly useful for transmif-ting information about large numbers of molecules. The most famous of the early line notations is the Wiswesser line notation [Wiswesser 1954] the-SMILES notation is a more recent example that is increasingly popular [Weininger 1988]. To construct the Wiswesser... [Pg.659]

Tab. 17.1. Smiles codes and names of the test set compounds used in Caco-2 external permeability prediction...

Global SMILES attributes were defined as (%,0,l,2,3)-code of six symbols si, s2, s3, s4, s5, and s6. The first symbol (si) is % . This symbol indicates this SMILES attribute is global. The s2 is descriptor of a presence of fluorine 0 means F symbol is absent in the SMILES 1 means there is one F symbol 2 means there are two F symbols finally 3 means that there are three or more F symbols in the SMILES the s3, s4, s5 are the same descriptors for Cl , Br , and O , respectively the s6 is the descriptor for the ( symbol in the SMILES. The brackets are tools to reflect the branching of molecular skeleton (Weininger, 1988, 1990 Weininger et al., 1989). Thus, ( and ) are indicators of the same phenomenon These SMILES attributes (i.e., brackets) have common correlation weight. [Pg.341]

It defines two simple methods. The molfileToSmiles() method converts a Molfile string to a Smiles string. The smilesToMolfile() method does the opposite—converting a Smiles string to a Molfile string. They both throw MolstructureConversionException to flag conversion failures. The failure can be caused by exceptions from the vendor implementation or from the CRS code. [Pg.95]

Fig. 11.17. Search for bioisosteres of propionic acid based on

The descriptor was a product of the correlation weights, CW(Ik), calculated by the Monte Carlo method for each kth element of a special SMILES-like notation introduced by the authors. The notation codes the following characteristics the atom composition, the type of substance (bulk or not, ceramic or not), and the temperature of synthesis. The QSAR model constructed in this way was validated with the use of many different splits into training (n 21) and validation (n=8) sets. Individual sub-models are characterized by high goodness-of-fit (0.972 applicability domain of the model, it is not known if all the compounds (metal oxides, nitrides, mullite, and silicon carbide) can be truly modeled together. [Pg.211]

The same modeling scheme has been employed by Toropov et al. [72], who once again used the DCW descriptor. But, in this case, the descriptor denoted the variance in a set of 26 organic solvents coded with the SMILES notation. The model was externally validated, which confirmed its predictivity. The values of... [Pg.211]

The presence of a branch in the structure raises the question of where to start coding. With the SMILES system it does not matter where one starts. A SMILES interpreter will produce the same structure from any valid SMILES coding for a compound. In some circumstances, such as the system s use in databases, it is necessary to have a unique SMILES string for a molecule. Using a set of rules it is possible to uniquify a SMILES string. [Pg.41]

For small molecules the system should allow one to construct the molecule and generate a reasonable three-dimensional conformation quickly. The best currently available approach is CONCORD (27), which rapidly (15-30s) generates a low-energy conformation for most classes of organic compounds from a simple alphanumeric SMILES code (28), a powerful, easily learned... [Pg.1]

Only a few compounds screened in early lead identification phases are synthesized in-house. More flexible and cost effective is to purchase chemicals from external suppliers. Most vendors provide lists of some ten to himdred thousand chemicals on compact discs and guarantee delivery within days to weeks. To explore this huge amount of data with the aid of computers, chemical information is transformed to computer-readable strings, e.g., smiles code, and different descriptors are determined. 1-dimensional (1-D) descriptors encode chemical composition and physicochemical properties, e.g., molecular weight, stoichiometry (C O Hj,), hydrophobicity, etc. 2-D descriptors reflect chemical topology, e.g., connectivity indices, degree of branching, number of aromatic bonds, etc. 3-D descriptors consider 3-D shape, volume or surface area. [Pg.78]

Table 7. Log P calculation of chlorpromazine (4) by the program CLOGP [253]. Input of the structure in SMILES code cl2cc(Cl)ccc2Sc3ccccc3NlCCCN(C)C unique SMILES code, generated by the program CN(C)CCCN2cIccccclSc3ccc(Cl)cc23 (reproduced from ref. [253] with permission...

Hie HOSE code represents a linear notation of a chemical structure. Other codes are ROSDAL, SYBYL, and SMILES. The latter code is currently the most popular linear notation. [Pg.280]

Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...