Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...


Molecular structure molfile

Chemical representation can be rule-based or descriptive. Here we give a short description of two popular file formats for molecular structures, MOLfiles (9) and SMILES (10-13), to illustrate how molecules are represented in a computer. SMILES is a rule-based format, while MOLfile is a more descriptive one. [Pg.29]

To demonstrate the use of binary substructure descriptors and Tanimoto indices for cluster analysis of chemical structures, we consider the 20 standard amino acids (Figure 6.3) and characterize each molecular structure by eight binary variables describing the presence or absence of eight substructures (Figure 6.4). Note that in most practical applications (for instance, evaluation of results from searches in structure databases) more diverse molecular structures have to be handled, and usually several hundred different substructures are considered. Table 6.1 contains the binary substructure descriptors (variables), with value 0 if the substructure is absent and 1 if the substructure is present in the amino acid; these numbers form the A-matrix. The binary substructure descriptors have been calculated by the software SubMat (Scsibrany and Varmuza 2004), which requires as input the molecular structures in one file and the substructures in another file. All structures are in Molfile format (Gasteiger and Engel 2003); the output is an ASCII file with the binary descriptors. [Pg.270]
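The Tanimoto index of two binary descriptor vectors can be sketched in a few lines of Python. This is a minimal illustration, not the SubMat implementation: the function name and the three descriptor rows are hypothetical (they are not taken from Table 6.1), and returning 1.0 for two all-zero vectors is a convention assumed here.

```python
def tanimoto(a, b):
    """Tanimoto index of two equal-length binary descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    # Substructures present in both molecules
    both = sum(1 for x, y in zip(a, b) if x and y)
    # Substructures present in at least one molecule
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumed convention: two molecules with no substructure at all are identical
    return both / either if either else 1.0


# Hypothetical rows of a binary descriptor matrix (eight substructures each)
mol_a = [1, 0, 0, 1, 0, 0, 1, 0]
mol_b = [1, 0, 0, 0, 0, 0, 1, 1]
mol_c = [0, 1, 0, 0, 0, 0, 0, 0]

similarity_ab = tanimoto(mol_a, mol_b)   # 2 shared / 4 present in either
distance_ab = 1.0 - similarity_ab        # a distance usable for cluster analysis
```

Using 1 minus the Tanimoto index as the distance is one common choice for clustering; other transformations are possible.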

Fig. 2.1. Illustrative example of a MOLFile for acetaminophen (also known as paracetamol). (a) Molecular structure of acetaminophen, commonly known as Tylenol. Tylenol is a widely used medicine for reducing fever and pain. (b) MOLFile for acetaminophen.
SMILES strings are very concise and hence are suitable for storing and transporting a large number of molecular structures, while MOLfiles and their extension, SDFiles, have the option to store more complicated molecular data such as 3D molecular conformational information and biological data associated with the molecules. There are many other file formats not discussed here. Interested readers can find a list of file types at the following web site: http://www.ch.ic.ac.uk/chemime/. [Pg.32]
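To illustrate how an SDFile attaches data to a structure, the following Python sketch splits SD file text into records and collects the data items that follow each MOLfile block. It assumes the usual SD layout (a MOLfile block ending in "M  END", data headers of the form "> <NAME>", and a "$$$$" line closing each record). The function names, the sample record, and its ACTIVITY field are hypothetical, and malformed input is not handled.

```python
def parse_record(lines):
    """Return (molfile_text, data_dict) for the lines of one SD record."""
    # The MOLfile part runs through the "M  END" line
    end = next(i for i, line in enumerate(lines) if line.startswith("M  END"))
    molfile = "\n".join(lines[:end + 1])
    data, name = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            # The field name sits between angle brackets: "> <NAME>"
            name = line[line.index("<") + 1:line.rindex(">")]
            data[name] = []
        elif line.strip() and name is not None:
            data[name].append(line)
        else:
            name = None  # a blank line ends the current data item
    return molfile, {key: "\n".join(value) for key, value in data.items()}


def split_sdf(text):
    """Split SD file text into a list of (molfile_text, data_dict) pairs."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(parse_record(current))
            current = []
        else:
            current.append(line)
    return records


# Hypothetical one-atom record with a single associated data item
sample = """\
methane
  hand-written example

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ACTIVITY>
inactive

$$$$
"""

records = split_sdf(sample)
```

Each record keeps the structure and its associated data together, which is what makes the format convenient for exchanging structures along with measured or computed properties.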

However, XML is constantly penetrating farther into all fields of computer science, and it is probably only a matter of time until CML or a similar format replaces Molfile as the de facto standard format for molecular structure storage. [Pg.83]

Another choice for the internal representation of molecular structure is a molfile. It would be possible to construct SQL functions like those described in this chapter that would operate on this type of data. One disadvantage of molfiles is their greater size compared with SMILES. One advantage is that it is possible to store atomic coordinates, which is not possible with SMILES. There are other molecular file formats, but these are substantially the same as a molfile, except perhaps for specific atom types that may be of use in some database applications. [Pg.84]

The recommendation here is to use SMILES to store the molecular structure itself. If other features of the molecule or atoms need to be stored, other data types and columns can be added to the row describing the molecule. It is the "SQL way" to not encode a lot of information into one data type. When using a molfile as the structural data type, too much data is encoded in a single data type. The individual data items must be parsed and validated. Errors creep into the data due to missing, extra, or invalid portions of the molfile. Ways of storing atomic coordinates, atom types, and molecular properties are discussed in Chapter 11. [Pg.84]
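The idea of keeping the structure as SMILES and placing other features in their own columns or tables might look like the following sketch. It uses Python's built-in sqlite3 module only because it needs no installation; the table and column names are hypothetical and are not the schema used in the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per molecule: the structure is a SMILES string, and each
# molecular property gets its own typed column.
conn.execute(
    """
    CREATE TABLE molecule (
        id         INTEGER PRIMARY KEY,
        name       TEXT,
        smiles     TEXT NOT NULL,
        mol_weight REAL
    )
    """
)

# Atomic coordinates live in a separate table instead of being packed
# into the structure column.
conn.execute(
    """
    CREATE TABLE atom_coord (
        molecule_id INTEGER REFERENCES molecule(id),
        atom_index  INTEGER,
        x REAL, y REAL, z REAL
    )
    """
)

conn.execute(
    "INSERT INTO molecule (name, smiles, mol_weight) VALUES (?, ?, ?)",
    ("acetaminophen", "CC(=O)Nc1ccc(O)cc1", 151.16),
)

row = conn.execute(
    "SELECT smiles, mol_weight FROM molecule WHERE name = ?",
    ("acetaminophen",),
).fetchone()
```

Because each item has its own column, it can be constrained and validated by the database itself, without first parsing a larger text blob.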

The external representation of molecular structure is less rigorously defined. For example, there are many programs available that can convert to and from SMILES and molfiles. These can be used when a molfile (the external representation) needs to be imported as a SMILES (the internal representation) into the database. Similarly, a SMILES can be easily exported as a SMILES or converted to a molfile or other file format. It is useful to have these conversion functions as SQL extensions. [Pg.84]

The molfile or sdf file format is a very common way to store molecular structures. This can be considered as an external representation of a molecular structure data type. There are many other common file formats in use, and only the essential features common to all of them will be considered here. The essential aspects of molecular structure contained in these files are atomic number or atomic symbol, formal atomic charge, bonded atom pairs, and bond orders. These are the minimum attributes necessary to define an unambiguous valence bond molecular structure. Other atom properties, such as atom types, might also occur in these files, but these are specific to particular modeling programs and will not be discussed here. Sometimes molecular properties are also stored in these files. A way to store these properties in relational tables is discussed. [Pg.124]
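As a rough illustration of these essential attributes, the Python sketch below reads atomic symbols, formal charges, bonded atom pairs, and bond orders from a V2000 molfile using its fixed-width columns. It is a minimal sketch rather than a complete parser: the function name and the hand-written sample are hypothetical, "M  CHG" property lines are ignored, and the sample's columns are assumed to be intact.

```python
# Charge codes used in the V2000 atom block; code 4 (doublet radical)
# is treated as uncharged in this sketch.
CHARGE_CODES = {0: 0, 1: 3, 2: 2, 3: 1, 5: -1, 6: -2, 7: -3}


def parse_molfile(text):
    """Return (atoms, bonds) from V2000 molfile text.

    atoms: list of (symbol, formal_charge)
    bonds: list of (atom1, atom2, bond_order), with 1-based atom numbers
    """
    lines = text.splitlines()
    counts = lines[3]  # three header lines precede the counts line
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        symbol = line[31:34].strip()            # after the x, y, z columns
        charge = CHARGE_CODES.get(int(line[36:39]), 0)
        atoms.append((symbol, charge))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


# Hand-written example: formaldehyde, heavy atoms only, made-up coordinates
sample = """\
formaldehyde (heavy atoms only)
  hand-written example

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
M  END
"""

atoms, bonds = parse_molfile(sample)
```

Here the atoms come out as carbon and oxygen with no formal charge, and the single bond record says atoms 1 and 2 are joined by a double bond.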

There are many programs available to parse the various molecular structure file formats. OpenBabel is an open-source program that can read many file formats and produce a SMILES representation of molecular structure. There are many commercial products that can do this as well. In the following examples, the OpenBabel/plpythonu implementation of molfile parsing will be used. This was introduced in Chapter 10. The code to define the necessary functions is shown in the Appendix. [Pg.125]

PerlMol is a module add-on to the perl language that facilitates working with molecular structures using SMILES, SMARTS, and molfiles, as well as other functionality. PerlMol is available from CPAN, the Comprehensive Perl Archive Network. In order to install PerlMol, it is recommended to use the command cpan -i PerlMol as superuser in order to install the modules into the system perl library. This will install all the necessary modules for the following functions, as well as other parts of PerlMol that may be useful. [Pg.188]

The internal representation of molecules is accomplished using the technique developed by Wipke and Dyott (3), and later used by Molecular Design Limited (MDL) in several of their commercial programs. An MDL program, MACCS, is used to graphically input the molecular structure of the compound of interest, then save that structure into a file (molfile). The importation of this file provides CHESS with information such as the number and type of atoms and bonds, as well as stereochemical information. [Pg.48]

One of the few examples of commercially available retrieval systems encompassing both compounds and reactions is Molecular Design's microcomputer-based ChemBase system. It saves molfiles (molecular structure files) separately from its reaction file without any elaborate linkages between the files. This makes it rather difficult to move information between the files and requires the conversion of a molfile to move it in or out of a reaction file. Additionally, searching a compound structure for where it appears in a reaction is not a simple operation. Lastly, reaction sites are neither identified nor searchable. These limitations could be overcome if the reaction site could be associated with the molfiles or reaction files stored for searching. [Pg.371]

The most commonly used identifiers today include line notation identifiers (e.g., Simplified Molecular Input Line Entry System [SMILES] and International Chemical Identifier [InChI]), tabular identifiers (e.g., Molfile and Structure Definition [SD] file types), and portable mark-up language identifiers (e.g., Chemical Markup Language [CML] and FlexMol). Each identifier has its strengths and weaknesses, as detailed in Chapter 5. Chapters 5 and 6 provide enough information to guide researchers in choosing the most appropriate formats for their individual use. [Pg.14]

Structure registration is the process of entering structural information in a centralized repository, usually a structure database. These repositories serve as a pool for providing structure information that has been created in other departments of a company. Structure databases are set up according to the individual needs of a department or company. They consist of a common representation of a structure in a standardized file format, such as MolFile, SDF, reaction (RXN) (MDL), JCAMP (International Union of Pure and Applied Chemistry), or simplified molecular input line entry specification (SMILES). Any additional data can be stored with the structure depending on the context; typical examples are structure properties, reaction conditions, and literature references. [Pg.335]

Tel. 612-626-1888, e-mail x-mol msc.edu Molecular display for structures in various molfile formats. DEC, Silicon Graphics, and Sun workstations. [Pg.257]

JME Molecule Editor is a Java applet to draw and edit structures and reactions [21]. It also displays molecules on screen in a display panel and generates output formats like Simplified Molecular-Input Line-Entry System (SMILES) and MDL molfile. To use JME in your portlet, use the following applet code. Include the JME distribution containing JME.jar for referencing. [Pg.515]



© 2024 chempedia.info