SMILES identifiers

ChemSpider was launched in 2007. It is an open-access service in which constituent databases, the largest of which is Web of Science, are linked on a free-access basis, and which uses algorithms to identify and extract chemical names from documents and web pages and convert them to structures and to InChI and SMILES identifiers. Access to the core service is free, but the user may be routed to component databases that charge for access. At launch, ChemSpider contained 21 million compounds. At the time of writing, it was too early to assess the success of the service. It was bought by the Royal Society of Chemistry in 2009. [Pg.23]

Stereochemistry can also be expressed in the SMILES notation [113]. Depending on the clockwise or anti-clockwise ordering of the atoms, the stereocenter is specified in the SMILES code with @@ or @, respectively (Figure 2-78). The atoms around this stereocenter are then assigned by the sequence of the atom symbols following the identifier @ or @@. This means that, reading the SMILES code from the left, the three atoms behind the identifier (@ or @@) describe the stereochemistry of the stereocenter. The sequence of these three atoms is dependent only on the order of writing, and independent of the priorities of the atoms. [Pg.84]
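Because the chirality symbol in SMILES (@ for anti-clockwise, @@ for clockwise) depends only on the order in which the neighbours are written, rewriting the same stereocenter with its neighbours in a different order may require flipping the symbol. The following is a minimal sketch of that bookkeeping, not a SMILES parser: the function names and the neighbour labels are hypothetical, and the labels are assumed to be unique. An odd permutation of the neighbours inverts the symbol; an even permutation keeps it.

```python
def permutation_is_odd(old_order, new_order):
    """Return True if new_order is an odd permutation of old_order."""
    # Map each (unique) neighbour label to its position in the original order.
    position = {atom: i for i, atom in enumerate(old_order)}
    perm = [position[atom] for atom in new_order]
    # Count inversions; an odd count means an odd permutation.
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2 == 1


def retag(tag, old_order, new_order):
    """Chirality symbol to write after listing the neighbours in new_order."""
    if tag not in ("@", "@@"):
        raise ValueError("tag must be '@' or '@@'")
    if sorted(old_order) != sorted(new_order):
        raise ValueError("both orders must list the same neighbours")
    if permutation_is_odd(old_order, new_order):
        return "@" if tag == "@@" else "@@"
    return tag


# Neighbours of the stereocenter in N[C@@H](C)C(=O)O, in writing order.
old = ["N", "H", "CH3", "COOH"]
# Writing the carboxyl group before the methyl group swaps two neighbours.
new = ["N", "H", "COOH", "CH3"]
new_tag = retag("@@", old, new)
```

Here a single swap of two neighbours is an odd permutation, so the rewritten string N[C@H](C(=O)O)C carries @ instead of @@ while describing the same configuration.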

Chemical identity may appear to present a trivial problem, but most chemicals have several names, and subtle differences between isomers (e.g., cis and trans) may be ignored. The most commonly accepted identifiers are the IUPAC name and the Chemical Abstracts Service (CAS) number. More recently, methods have been sought of expressing the structure in line notation form so that computer entry of a series of symbols can be used to define a three-dimensional structure. For environmental purposes the SMILES (Simplified Molecular Input Line Entry System, Anderson et al. 1987) is favored, but the Wiswesser Line Notation is also quite widely used. [Pg.3]

PS 9/7. Interview with Dr. Ketchum. Unable to identify cards at all - did not seem to respond to questions. Smiled vaguely when asked what time it was. Speaks out of context. Dropped cards when they were handed to him. [Pg.86]

Empath is a Cabinet database of metabolic pathways that models a metabolic pathway chart. It initially models the Boehringer Mannheim wall chart [35] but other pathway layouts are possible. It currently includes 1462 steps (metabolic reactions). The Empath database consists of more than 8000 live objects. Every object has an exact geometric location, that is, an x, y coordinate, which is optionally indicated by visible hotspots. The chart is clickable everywhere and the image recenters itself around the selected point. The current object is the one closest to the center of the image and is identified by a bull's eye. A summary of the object is given (e.g., its structure, EC number, SMILES, reaction stoichiometry, etc.). Empath provides navigational features such as zoom in/out, wider, thinner, taller and shorter. [Pg.253]

The major hexa-CDD isomers were identified as 1,2,3,6,7,8-hexa-CDD, one of the most toxic isomers (see Figure 4). In addition, 1,2,4,6,7,9- and 1,2,3,6,8,9-hexa-CDD or their Smiles-rearranged products (1,2,4,6,8,9- and 1,2,3,6,7,9-hexa-CDD, respectively) were found. These three isomers were always present in an almost constant isomeric ratio of 50:40:10. Both of the hepta-CDD isomers were present in these samples in a ratio of 15:85, with the biologically most active (17) 1,2,3,4,6,7,8-hepta-CDD as the major constituent. All hexa-CDD isomers found in these samples were dimerization products of 2,3,4,6-tetrachlorophenol, the expected precursor of PCP in the chlorination starting from phenol (26). [Pg.327]

Use the library editor to create a library file for the compounds under study (Fig. 18.2). Library files are essentially plain text files that contain a record on each line, with an entry identifier and a SMILES string for the... [Pg.349]
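A one-record-per-line library file of this kind is straightforward to read programmatically. The sketch below is an illustration only: the source does not give the exact layout, so it assumes each line holds an entry identifier followed by whitespace and then a SMILES string. The function name `parse_library` and the sample identifiers are hypothetical.

```python
def parse_library(text):
    """Parse records of the ASSUMED form '<identifier> <SMILES>', one per line."""
    records = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only: identifier, then SMILES.
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(
                f"line {line_number}: expected an identifier and a SMILES string"
            )
        identifier, smiles = parts
        records[identifier] = smiles.strip()
    return records


# Hypothetical library content: ethanol and benzene.
example = """\
cmpd_001 CCO
cmpd_002 c1ccccc1
"""
library = parse_library(example)
```

If the real file format uses a different delimiter or field order, only the `split` call and the unpacking line would need to change.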

Will you, indeed? Thevizio smiled. Heads he won, tails I lost. What more could his shriveled little heart desire? And on what basis will you identify either of them? ... [Pg.83]

Integration of select Internet resources, such as the public chemical databases mentioned above, provides a very practical approach to structure searching the Internet and internal resources (Dong et al. 2007). Chapter 8 elaborates on this concept. As summarized in Chapter 2, another facet of chemical structure mining involves finding information within full text documents that do not traditionally contain identifiers like InChI or SMILES strings. Chapter 5 contains an in-depth discussion of these identifiers. [Pg.6]

The most commonly used identifiers today include line notation identifiers (e.g., Simplified Molecular Input Line Entry System [SMILES] and International Chemical Identifier [InChI]), tabular identifiers (e.g., Molfile and Structure Definition [SD] file types), and portable mark-up language identifiers (e.g., Chemical Markup Language [CML] and FlexMol). Each identifier has its strengths and weaknesses as detailed in Chapter 5. Chapters 5 and 6 provide enough information to guide researchers in choosing the most appropriate formats for their individual use. [Pg.14]

The main intention behind the development of InChI was to create a new way of naming compounds that would enable computer programs to assign them unique identifiers, regardless of how they are drawn and without the need for a central registration point for such identifiers (as in the case of registry numbers and other similar identifiers). This intent led directly to the fact that InChI cannot be created by humans, because they would not be able to reliably reproduce the steps needed to create such a unique identifier. With this fact in mind, InChI was created to be written and read by computers only. This is in strong contrast to SMILES, which was created specifically to be written and read by humans and even in its canonical form is at least human readable. [Pg.86]

Another important feature of InChI is its layered structure. Unlike in SMILES, where all data related to one atom are stored in one place, in InChI different properties of the structure are encoded in different parts of the identifier. This organization of the data has one very important advantage: molecules with the same basic structure that differ only in some minor property, such as stereochemistry or isotopic composition, have the same InChI, with the sole exception of the corresponding layer. This makes it possible not only to compare two InChIs to find out whether they represent exactly the same structure, but also to use a more intelligent comparison of two InChI strings to reveal molecules with the same basic structure that differ only in some detail. It is then up to the user to decide which deviations in the InChI are significant for his or her purpose and which are not. [Pg.87]
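The layer-by-layer comparison described above can be sketched with plain string handling. This is a simplified illustration, not the official InChI software: it assumes each layer after the formula starts with a single-letter prefix that occurs only once, which does not hold for InChIs with fixed-hydrogen or isotopic sublayers. The function names are hypothetical, and the two example strings are intended as the standard InChIs of the two alanine enantiomers.

```python
def split_layers(inchi):
    """Split an InChI string into a dict of layers (simplified model)."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]  # the formula layer has no letter prefix
    for part in parts[2:]:
        if part:
            # Key each remaining layer by its one-letter prefix, e.g. 'c', 'h', 't'.
            layers[part[0]] = part[1:]
    return layers


def differing_layers(inchi_a, inchi_b):
    """Return the sorted names of the layers in which two InChIs differ."""
    a, b = split_layers(inchi_a), split_layers(inchi_b)
    return sorted(key for key in set(a) | set(b) if a.get(key) != b.get(key))


# Illustrative inputs: the two enantiomers of alanine.
first = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
second = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
diff = differing_layers(first, second)
```

In this example only a stereochemical layer differs, while the formula, connectivity and hydrogen layers are identical. A user who does not care about stereochemistry could treat the two strings as the same basic structure.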

The hash origin of InChIKey also means that it is not convertible back to the original InChI or molecular structure, because for each InChIKey there is an unlimited number of possible matching input values. Although this might seem to be a drawback of the format, it is simply the price of the fixed length of the identifier. When a readable identifier with no possible collisions is needed, InChI (or canonical SMILES) should be used. [Pg.91]
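The trade-off between fixed length and reversibility can be illustrated with any hash function. The sketch below is deliberately not the InChIKey algorithm: it simply truncates a SHA-256 hex digest to a hypothetical 14 characters, to show that inputs of any length map to an identifier of constant length. That is why the mapping cannot be inverted and why collisions are possible in principle.

```python
import hashlib


def fixed_length_digest(text, length=14):
    """Truncated SHA-256 hex digest. Illustration only, NOT a real InChIKey."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]


short_input = "InChI=1S/CH4/h1H4"
# Any longer string works here; it need not be a valid InChI.
long_input = "InChI=1S/" + "C" * 500

short_key = fixed_length_digest(short_input)
long_key = fixed_length_digest(long_input)
```

Both keys have the same length regardless of input size. Since there are infinitely many possible inputs but only finitely many keys of a given length, distinct inputs must eventually share a key.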

These identifiers were developed as an IUPAC project in 2000-2004. They are the most recent technology aimed at an unambiguous text-string representation of chemical structures. (Earlier technologies included Wiswesser line notation, which is not described here, and SMILES, described below.)... [Pg.165]
