Big Chemical Encyclopedia

Molfile and Other Common File Formats

The molfile (or SDF) file format is a very common way to store molecular structures. It can be considered an external representation of a molecular structure data type. Many other file formats are in common use, and only the essential features shared by all of them are considered here. The essential aspects of molecular structure contained in these files are the atomic number or atomic symbol, formal atomic charge, bonded atom pairs, and bond orders. These are the minimum attributes necessary to define an unambiguous valence bond molecular structure. Other atom properties, such as atom types, might also occur in these files, but these are specific to particular modeling programs and will not be discussed here. Sometimes molecular properties are also stored in these files. A way to store these properties in relational tables is discussed. [Pg.124]
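As an illustration of those minimum attributes, the following minimal sketch reads atom symbols, formal charges, bonded atom pairs, and bond orders from a molfile. It assumes the fixed-column V2000 layout, ignores `M  CHG` property lines and everything else in the file, and the function name `parse_molfile` is hypothetical, not something defined in the text.

```python
# Charge codes used in the atom block of a V2000 molfile (0 means uncharged).
# Code 4 (doublet radical) carries no charge and falls back to 0 below.
CHARGE_CODES = {0: 0, 1: 3, 2: 2, 3: 1, 5: -1, 6: -2, 7: -3}


def parse_molfile(text):
    """Return (atoms, bonds) from a V2000 molfile string.

    atoms: list of (symbol, formal_charge)
    bonds: list of (atom1, atom2, order) using 1-based atom numbers
    """
    lines = text.splitlines()
    counts = lines[3]                       # the fourth line is the counts line
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        symbol = line[31:34].strip()        # columns after the x, y, z fields
        code = int(line[36:39].strip() or 0)
        atoms.append((symbol, CHARGE_CODES.get(code, 0)))

    bonds = []
    first_bond = 4 + n_atoms
    for line in lines[first_bond:first_bond + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


# A two-atom example (methoxide, hydrogens implicit) built line by line
# so that the fixed columns stay intact.
molfile = "\n".join([
    "methoxide",
    "  sketch",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4300    0.0000    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "M  END",
])

atoms, bonds = parse_molfile(molfile)
print(atoms)   # [('C', 0), ('O', -1)]
print(bonds)   # [(1, 2, 1)]
```

The two returned lists hold exactly the four attributes listed above; coordinates and program-specific atom types are deliberately left out.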

It would be possible to create tables using columns to store the atomic symbols and bond information found in molecular structure files, reflecting the column-style format of the file itself. Instead, a SMILES representation of this valence bond information is preferred. SMILES is a compact text string containing the same information as the columns of atom symbols and bonds. It can also be used directly in the search functions described in earlier chapters. It is desirable to parse the molecular properties in molecular structure files in order to store them in data columns for possible searching. [Pg.124]
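A minimal sketch of this idea follows, using Python's built-in `sqlite3` module purely for illustration. The table name `structure`, its columns, and the helper `parse_sdf_properties` are assumptions made for the example, the SMILES string is supplied by hand rather than generated from the file, and the property values are illustrative.

```python
import sqlite3


def parse_sdf_properties(record):
    """Collect the '> <NAME>' data items of one SDF record into a dict."""
    props, name, value_lines = {}, None, []
    for line in record.splitlines():
        if line.startswith(">") and "<" in line:
            # Header line of a data item: take the text between < and >.
            name = line[line.index("<") + 1:line.rindex(">")]
            value_lines = []
        elif name is not None:
            if line.strip() == "" or line.startswith("$$$$"):
                props[name] = "\n".join(value_lines)   # blank line ends the item
                name = None
            else:
                value_lines.append(line)
    if name is not None:                               # no trailing blank line
        props[name] = "\n".join(value_lines)
    return props


# Tail of a hypothetical SDF record: the data items follow "M  END".
record = "\n".join([
    "M  END",
    "> <NAME>",
    "ethanol",
    "",
    "> <LOGP>",
    "-0.31",
    "",
    "$$$$",
])
props = parse_sdf_properties(record)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE structure ("
    " id INTEGER PRIMARY KEY,"
    " smiles TEXT NOT NULL,"    # valence bond information as one string
    " name TEXT,"
    " logp REAL)"               # parsed property stored in its own column
)
conn.execute(
    "INSERT INTO structure (smiles, name, logp) VALUES (?, ?, ?)",
    ("CCO", props["NAME"], float(props["LOGP"])),
)

# Because the property has its own column, it can be searched directly.
rows = conn.execute("SELECT smiles, name FROM structure WHERE logp < 0").fetchall()
print(rows)   # [('CCO', 'ethanol')]
```

One row per structure replaces the many atom and bond rows that a column-for-column copy of the file would need.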

Although SMILES is an entirely equivalent way of storing a connection table of atoms and bonds, it is sometimes desirable to create a traditional connection table, for example, when an external program requires it. The extension functions smiles_to_symbols and smiles_to_bonds accept a SMILES string and produce an array of either symbols or bonds. These are discussed in a later section of this chapter. Several implementations of these functions are shown in the Appendix. [Pg.125]
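To make the idea concrete, here is a rough Python analogue of two such functions, named here with underscores. It is not the implementation from the Appendix: it handles only a small subset of SMILES (uppercase organic-subset atoms, the bond symbols `-`, `=`, `#`, branches, and single-digit ring closures), and the shape of the returned arrays is an assumption.

```python
import re

# Tokens of the supported subset; two-letter symbols must be tried first.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#]|[()]|\d")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def parse_smiles(smiles):
    """Return (symbols, bonds) for a small subset of SMILES.

    bonds are (atom1, atom2, order) tuples with 1-based atom numbers.
    """
    symbols, bonds = [], []
    prev = None          # index of the atom the next atom will bond to
    order = 1            # bond order pending for the next bond
    branch_stack = []    # atoms to return to when a branch closes
    ring_open = {}       # ring-closure digit -> (atom index, bond order)
    pos = 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unsupported SMILES syntax at position {pos}")
        tok = match.group()
        pos = match.end()
        if tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        elif tok == "(":
            branch_stack.append(prev)
        elif tok == ")":
            prev = branch_stack.pop()
        elif tok.isdigit():
            if tok in ring_open:                     # second digit closes the ring
                start, open_order = ring_open.pop(tok)
                bonds.append((start + 1, prev + 1, max(order, open_order)))
            else:                                    # first digit opens it
                ring_open[tok] = (prev, order)
            order = 1
        else:                                        # an atom symbol
            symbols.append(tok)
            cur = len(symbols) - 1
            if prev is not None:
                bonds.append((prev + 1, cur + 1, order))
            prev, order = cur, 1
    return symbols, bonds


def smiles_to_symbols(smiles):
    return parse_smiles(smiles)[0]


def smiles_to_bonds(smiles):
    return parse_smiles(smiles)[1]


print(smiles_to_symbols("CC(=O)O"))   # ['C', 'C', 'O', 'O']
print(smiles_to_bonds("CC(=O)O"))     # [(1, 2, 1), (2, 3, 2), (2, 4, 1)]
print(smiles_to_bonds("C1CC1"))       # [(1, 2, 1), (2, 3, 1), (1, 3, 1)]
```

Aromatic atoms, bracket atoms, charges, and disconnected fragments are outside this subset and raise `ValueError`; a cheminformatics toolkit would be the practical choice for real data.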

It may also be desirable to store the atomic coordinates read from these files. The purpose of parsing the coordinates from the file and putting them into a separate column is to enable use of the coordinates from within the database. If the column is defined as a numeric or float column, the database will also ensure that the stored coordinates are valid numbers. If there is no need for atomic coordinates, it is not necessary to create a column for them. Later sections of this chapter will discuss ways in which these atomic coordinates might be used in SQL functions. [Pg.125]
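The sketch below shows one way this could look, again using `sqlite3` only for illustration. The table `atom_coord`, with one row per atom, is a hypothetical layout and not necessarily the one used later in the chapter. SQLite is loosely typed, so the example adds `CHECK` constraints to get the validation that a float column provides on its own in stricter database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE atom_coord (
        structure_id INTEGER NOT NULL,
        atom_index   INTEGER NOT NULL,
        x REAL NOT NULL CHECK (typeof(x) = 'real'),
        y REAL NOT NULL CHECK (typeof(y) = 'real'),
        z REAL NOT NULL CHECK (typeof(z) = 'real'),
        PRIMARY KEY (structure_id, atom_index)
    )
""")


def store_coords(conn, structure_id, coord_fields):
    """Insert one row per atom; coord_fields holds (x, y, z) text triples."""
    rows = [
        (structure_id, i, float(x), float(y), float(z))
        for i, (x, y, z) in enumerate(coord_fields, start=1)
    ]
    conn.executemany("INSERT INTO atom_coord VALUES (?, ?, ?, ?, ?)", rows)


# Coordinate fields as they might be read from a structure file.
store_coords(conn, 1, [("0.0000", "0.0000", "0.0000"),
                       ("1.4300", "0.0000", "0.0000")])

# Using the coordinates from within the database: squared distance
# between atoms 1 and 2 of structure 1.
d2 = conn.execute("""
    SELECT (a.x - b.x) * (a.x - b.x)
         + (a.y - b.y) * (a.y - b.y)
         + (a.z - b.z) * (a.z - b.z)
    FROM atom_coord a JOIN atom_coord b ON a.structure_id = b.structure_id
    WHERE a.structure_id = 1 AND a.atom_index = 1 AND b.atom_index = 2
""").fetchone()[0]

# A value that is not a number is rejected by the column definition.
try:
    conn.execute("INSERT INTO atom_coord VALUES (1, 3, 'abc', 0.0, 0.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here `d2` comes out close to 2.0449 (1.43 squared), and `rejected` is `True` because the text value fails the type check.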

In a molecular structure file, an atom record typically contains all of the information about that atom: the atomic number or symbol, the charge, the coordinates, and so on. When such a file is parsed into a SMILES string and an array of coordinates, it is important to be able to associate the proper coordinate with the proper atom. The use of canonical SMILES ensures this. Because canonical SMILES defines a unique order of the atoms in a molecule, that order is used to store the coordinates. Later sections of this chapter will discuss ways in which atomic coordinates might be stored in columns of a table. [Pg.125]
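The reordering step can be sketched as follows. The sketch assumes that whatever tool produces the canonical SMILES also reports, for each canonical position, which atom of the file it came from; the function `reorder_coords`, the permutation, and the coordinate values are all hypothetical.

```python
def reorder_coords(coords, canonical_order):
    """Return coords rearranged so that entry i belongs to canonical atom i.

    coords: (x, y, z) tuples in the atom order of the structure file
    canonical_order: canonical_order[i] is the 0-based file index of the
        atom that appears at position i of the canonical SMILES
    """
    if sorted(canonical_order) != list(range(len(coords))):
        raise ValueError("canonical_order must be a permutation of the file atom indices")
    return [coords[file_index] for file_index in canonical_order]


# Ethanol whose file lists the atoms as O, C (CH2), C (CH3), while the
# canonical SMILES is assumed to be "CCO" (CH3, CH2, O).
file_coords = [
    (2.4, 0.0, 0.0),   # O
    (1.2, 0.7, 0.0),   # C of CH2
    (0.0, 0.0, 0.0),   # C of CH3
]
canonical_order = [2, 1, 0]

stored = reorder_coords(file_coords, canonical_order)
print(stored)   # [(0.0, 0.0, 0.0), (1.2, 0.7, 0.0), (2.4, 0.0, 0.0)]
```

After reordering, coordinate i can be matched to atom i of the canonical SMILES without storing a separate atom label alongside it.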


© 2024 chempedia.info