Molecular information file

CAS = Chemical Abstracts Service
CIP = Cahn-Ingold-Prelog system of labeling stereogenic atoms and bonds
MDLI = Molecular Design Limited, Inc.
MIF = molecular information file
MOLFile = molecule file
SEMA = stereochemically extended Morgan algorithm
SMD = Standard Molecular Data
SMILES = simplified molecular input line entry specification [Pg.2727]

The exchange file format MIF (molecular information file, see below) can code stereochemistry using stereovertex lists like those discussed above. [Pg.2730]

The molecular information file (MIF) was developed by extending the earlier standard molecular data (SMD) format. It is designed to be compatible with CIF and mmCIF and, at present at least, is used primarily by the crystallographic community. [Pg.2822]

The first task of chemoinformatics is to transform chemical knowledge, such as molecular structures and chemical reactions, into computer-readable digital information. These digital representations of chemical information are the foundation for all chemoinformatic manipulations in a computer. Many file formats exist for importing molecular information into, and exporting it from, a computer, and some formats contain more information than others. Usually, the intended application dictates which format is most suitable. For example, in a quantum chemistry calculation the molecular input file usually contains atomic symbols with three-dimensional (3D) atomic coordinates as the atomic positions, whereas a molecular dynamics simulation additionally needs atom types, bond status, and other information required to define a force field. [Pg.29]
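For the quantum-chemistry case just described, where the input is little more than atomic symbols plus 3D coordinates, the plain XYZ layout is about the simplest possible file. The sketch below writes and reads it, assuming coordinates in ångströms and only the single-frame form: an atom-count line, a free-text comment line, then one `symbol x y z` line per atom. The water geometry is illustrative.

```python
from typing import List, Tuple

# One atom = (element symbol, x, y, z); coordinates assumed to be in angstroms.
Atom = Tuple[str, float, float, float]


def write_xyz(atoms: List[Atom], comment: str = "") -> str:
    """Serialize atoms into the plain single-frame XYZ layout."""
    lines = [str(len(atoms)), comment]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"


def read_xyz(text: str) -> Tuple[str, List[Atom]]:
    """Parse a single-frame XYZ string back into (comment, atoms)."""
    lines = text.splitlines()
    n_atoms = int(lines[0].strip())
    comment = lines[1] if len(lines) > 1 else ""
    atoms: List[Atom] = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return comment, atoms


water = [("O", 0.0, 0.0, 0.0), ("H", 0.9572, 0.0, 0.0), ("H", -0.2400, 0.9266, 0.0)]
text = write_xyz(water, "water, illustrative geometry")
comment, atoms = read_xyz(text)  # round-trips back to the same symbols and coordinates
```

A molecular dynamics input would need more per-atom fields (atom types, bonds), which is exactly why richer formats exist alongside XYZ.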

Another way to get a structure into the computer is to import (read) a molecule file containing the atomic co-ordinates (and perhaps other atomic and molecular information) into your program. Unfortunately, there is no single standard file format that all programs use. However, some of the commonly encountered formats include those of SYBYL MOL2 files and Protein Data Bank (PDB) files. There are also free programs available for download from the World Wide Web that can interconvert the numerous file formats still in use. [Pg.383]

It would be possible to create tables using columns to store the atomic symbols and bond information found in molecular structure files, reflecting the column-style format of the file itself. Instead, a SMILES representation of this valence bond information is preferred. SMILES is a compact text string containing the same information as the columns of atom symbols and bonds. It can also be used directly in the search functions described in earlier chapters. It is desirable to parse the molecular properties in molecular structure files in order to store them in data columns for possible searching... [Pg.124]
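A minimal sketch of that table design, using Python's built-in `sqlite3` module: one row per molecule, the connection table held in a single SMILES column, and parsed properties in ordinary columns that plain SQL can search. The table and column names are hypothetical, and the SMILES strings are typed in by hand rather than generated by a toolkit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE molecules (
           id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           smiles TEXT NOT NULL,   -- valence-bond information in one string
           mol_weight REAL,        -- properties parsed from the structure file
           formula TEXT
       )"""
)
rows = [
    ("ethanol", "CCO", 46.07, "C2H6O"),
    ("benzene", "c1ccccc1", 78.11, "C6H6"),
    ("acetic acid", "CC(=O)O", 60.05, "C2H4O2"),
]
conn.executemany(
    "INSERT INTO molecules (name, smiles, mol_weight, formula) VALUES (?, ?, ?, ?)",
    rows,
)

# Property columns are searchable with ordinary SQL ...
light = conn.execute(
    "SELECT name FROM molecules WHERE mol_weight < 70 ORDER BY mol_weight"
).fetchall()

# ... while an exact-structure lookup compares SMILES strings, which is only
# reliable if every string was written in the same canonical form.
hit = conn.execute("SELECT name FROM molecules WHERE smiles = ?", ("CCO",)).fetchone()
```

Substructure matching, by contrast, cannot be done with string equality; it needs a chemistry-aware search function installed in the database.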

In a molecular structure file, an atom record typically contains all of the information about that atom: the atomic number or symbol, the charge, the coordinates, etc. When such a file is parsed into a SMILES string and an array of coordinates, it is important to be able to associate each coordinate with the correct atom. Using canonical SMILES ensures this: because canonical SMILES defines a unique order for the atoms in a molecule, that order is used to store the coordinates. Later sections of this chapter will discuss ways in which atomic coordinates might be stored in columns of a table. [Pg.125]
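The bookkeeping can be sketched in a few lines. The sketch assumes a cheminformatics toolkit has already produced the canonical SMILES and, for each atom in file order, its position in the canonical atom order; the `canonical_rank` list below is hypothetical input, not computed here. Storing coordinates in canonical order lets any later reader that regenerates the same canonical SMILES pair atom *i* with coordinate *i*.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def to_canonical_order(coords: Sequence[Coord], canonical_rank: Sequence[int]) -> List[Coord]:
    """Reorder per-atom coordinates from file order into canonical-SMILES order.

    canonical_rank[i] is the 0-based position of file atom i in the canonical
    atom order (assumed to come from a toolkit's canonicalization step).
    """
    if sorted(canonical_rank) != list(range(len(coords))):
        raise ValueError("canonical_rank must be a permutation of atom indices")
    ordered: List[Coord] = [(0.0, 0.0, 0.0)] * len(coords)
    for file_index, rank in enumerate(canonical_rank):
        ordered[rank] = coords[file_index]
    return ordered


def to_file_order(ordered: Sequence[Coord], canonical_rank: Sequence[int]) -> List[Coord]:
    """Invert the mapping: recover file-order coordinates from canonical order."""
    return [ordered[rank] for rank in canonical_rank]


# Hypothetical ethanol file listing O, CH2, CH3; suppose the canonical SMILES
# "CCO" lists the heavy atoms as CH3, CH2, O. Coordinates are illustrative.
file_coords = [(1.2, 0.5, 0.0), (0.0, 0.0, 0.0), (-1.2, 0.8, 0.0)]
rank = [2, 1, 0]
canonical = to_canonical_order(file_coords, rank)
```

Because the mapping is a permutation, it can always be inverted, so no coordinate information is lost by storing it in canonical order.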

The previous section shows how molecular structures stored in an RDBMS can be made available to client programs that traditionally read molecular structure files. The advantage of storing molecular structures in an RDBMS is that the information can be used from within the database as well as by external clients. For example, it would be possible to search a table of molecular structures for three-dimensional overlap, much as it might be searched for a substructure match. Of course, such search functions need to be written and installed as extensions to the RDBMS, just as the matches function was for substructure searches. This section shows some possible ways this might be accomplished. [Pg.133]
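As a rough illustration of installing such a function, Python's `sqlite3` module lets the host program register a Python function (`Connection.create_function`) that SQL queries can then call. The `overlap` function below is a hypothetical and deliberately crude stand-in for a real 3D-overlap measure: it assumes both structures are already superimposed in a common frame, stores coordinates as JSON text, and scores the fraction of query atoms that have some stored atom within 1.0 Å.

```python
import json
import math
import sqlite3

CUTOFF = 1.0  # angstroms; hypothetical threshold for "overlapping" atoms


def overlap(query_json: str, stored_json: str) -> float:
    """Fraction of query atoms within CUTOFF of any stored atom (no alignment step)."""
    query = json.loads(query_json)
    stored = json.loads(stored_json)
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(math.dist(q, s) <= CUTOFF for s in stored))
    return hits / len(query)


conn = sqlite3.connect(":memory:")
conn.create_function("overlap", 2, overlap)  # now callable from SQL
conn.execute("CREATE TABLE conformers (name TEXT, coords TEXT)")
conn.executemany(
    "INSERT INTO conformers VALUES (?, ?)",
    [
        ("A", json.dumps([[0, 0, 0], [1.5, 0, 0], [3.0, 0, 0]])),  # extended along x
        ("B", json.dumps([[0, 0, 0], [0, 1.5, 0], [0, 3.0, 0]])),  # extended along y
    ],
)
query = json.dumps([[0.1, 0, 0], [1.4, 0.1, 0], [2.9, 0, 0]])
ranked = conn.execute(
    "SELECT name, overlap(?, coords) AS score FROM conformers ORDER BY score DESC",
    (query,),
).fetchall()
```

A production system would do the geometry in compiled code and index the data; the point here is only that the search logic runs inside the SQL query, as with a substructure-matching function.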

These files serve several other functions. Each reaction stored in CHESS needs to know structures of products and reactants. This is accomplished using reference numbers that point to the molecular reaction files of CHESS. Other knowledge must be placed into these reaction libraries. All of the information must be represented in such a way as to permit CHESS to reason on the stored reaction mechanisms. [Pg.48]

One of the few examples of commercially available retrieval systems encompassing both compounds and reactions is Molecular Design's microcomputer-based ChemBase system. It saves molfiles (molecular structure files) separately from its reaction file, without any elaborate linkages between the files. This makes it rather difficult to move information between the files, and a molfile must be converted to move it into or out of a reaction file. Additionally, finding the reactions in which a given compound structure appears is not a simple operation. Lastly, reaction sites are neither identified nor searchable. These limitations could be overcome if the reaction site could be associated with the molfiles or reaction files stored for searching. [Pg.371]
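The missing linkage can be pictured with a small in-memory sketch. Reactions refer to compounds by reference number (as in the CHESS scheme described above), and an inverted index built from those references answers the question the paragraph calls awkward: in which reactions does a given compound appear, and in what role? All identifiers and records here are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Reaction:
    reaction_id: str
    reactants: Tuple[int, ...]  # compound reference numbers (hypothetical)
    products: Tuple[int, ...]


def build_usage_index(reactions: Iterable[Reaction]) -> Dict[int, List[Tuple[str, str]]]:
    """Map each compound reference number to the (reaction_id, role) pairs using it."""
    index: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for rxn in reactions:
        for cid in rxn.reactants:
            index[cid].append((rxn.reaction_id, "reactant"))
        for cid in rxn.products:
            index[cid].append((rxn.reaction_id, "product"))
    return dict(index)


compounds = {101: "ethanol", 102: "acetic acid", 103: "ethyl acetate", 104: "water"}
reactions = [
    Reaction("R1", reactants=(101, 102), products=(103, 104)),  # esterification
    Reaction("R2", reactants=(103, 104), products=(101, 102)),  # hydrolysis
]
usage = build_usage_index(reactions)
```

The index must be rebuilt or updated whenever a reaction is added, which is the kind of maintained linkage the paragraph notes was missing.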

Friedman, H.L. (1951) Influence of isosteric replacements upon biological activity. National Academy of Sciences-National Research Council Publication No. 206, Washington, DC, pp. 295-395.
Thornber, C.W. (1979) Isosterism and molecular modification in drug design. Chemical Society Reviews, 8, 563-580.
Hall, S.R., Allen, F.H., and Brown, I.D. (1991) The crystallographic information file (CIF): a new standard archive file for crystallography. Acta Crystallographica A, 47, 655-685. [Pg.98]

As a foundation for comparing the SRP RNAs in three dimensions, we used the human SRP RNA model generated earlier with ERNA-3D (9). Next, we focused on the SRP RNA of M. jannaschii to take advantage of the large number of base pairs compounded into helical sections 5bcd, 5gh, 5ij, and 6bc. A simple textual input file was generated containing the M. jannaschii SRP RNA sequence, information about the paired residues, and the positions of the helical sections. The positions of the helical sections were copied from the PDB-formatted (18) molecular structure file of the human SRP RNA model (9). [Pg.410]

One way to enter the molecular structure of the new component is to add individual atoms and bonds. A simpler way, however, is to import the molecular structure from another component databank. A very useful resource is the NIST (National Institute of Standards and Technology) Chemistry WebBook. The information for this new component is shown in Figure 3.66. Click the molecular structure file link (2d Mol file) and save the file in a known directory under a chosen file name. Next, go to Molecular Structure of EMC under Properties and click Import Structure to import this file into Aspen Plus (see Fig. 3.67). Then ask Aspen Plus to calculate the bonds by clicking Calculate Bonds. The graphical structure of EMC is now imported into Aspen Plus, as shown in Figure 3.68. [Pg.88]

The crystallographic information file (CIF) and macromolecular CIF (mmCIF) formats were developed by the International Union of Crystallography, mainly to represent the data associated with X-ray crystallography studies. [Pg.2822]
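To give a feel for the CIF syntax, the sketch below reads only the simplest kind of item: a `data_` block header and single-line `_tag value` pairs, with optional single or double quotes around the value. It deliberately skips `loop_` tables and multi-line semicolon-delimited text fields, so it illustrates the layout rather than acting as a conforming CIF parser. The silicon values in the sample are illustrative.

```python
from typing import Dict, Tuple


def read_simple_cif(text: str) -> Tuple[str, Dict[str, str]]:
    """Return (block name, {tag: value}) for single-line items in one data block."""
    block = ""
    items: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("data_"):
            block = line[len("data_"):]
        elif line.startswith("_"):
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue  # value on a later line (loops, text fields): not handled here
            tag, value = parts
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]  # strip matching quotes
            items[tag] = value
    return block, items


def split_su(value: str) -> Tuple[float, str]:
    """Split a number such as '5.4307(2)' into 5.4307 and its uncertainty digits '2'."""
    if value.endswith(")") and "(" in value:
        number, su = value[:-1].split("(", 1)
        return float(number), su
    return float(value), ""


SAMPLE = """
data_si_example
# single-line items only
_chemical_name_common     'silicon'
_cell_length_a            5.4307(2)
_symmetry_space_group_name_H-M   'F d -3 m'
"""
block, items = read_simple_cif(SAMPLE)
```

The parenthesized digits are CIF's compact way of writing a standard uncertainty on the last decimal places, which is why numeric items usually need a step like `split_su` before use.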

Parallel to that, the MDL Molfile format (see the Tutorial in Section 2.4.6) developed at Molecular Design Limited (now MDL Information Systems, Inc.) became a de facto standard file format [50]. [Pg.45]
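Because the Molfile became a de facto standard, reading its connection table is a common first task. The sketch below is a minimal reader for the V2000 variant, following the fixed-column layout described by Dalby et al. (cited on this page): a three-line header, a counts line, an atom block and a bond block. It ignores the properties block, charges, stereo flags and the V3000 variant, and the water coordinates are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Atom:
    symbol: str
    x: float
    y: float
    z: float


@dataclass
class Bond:
    first: int   # 1-based atom numbers, as stored in the file
    second: int
    order: int   # 1 single, 2 double, 3 triple, 4 aromatic


def parse_molfile_v2000(text: str) -> Tuple[str, List[Atom], List[Bond]]:
    """Read the header name, atom block and bond block of a V2000 molfile."""
    lines = text.splitlines()
    name = lines[0].strip()
    counts = lines[3]  # lines 0-2 are the header block
    if "V2000" not in counts:
        raise ValueError("this sketch only handles V2000 connection tables")
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    atoms: List[Atom] = []
    for line in lines[4:4 + n_atoms]:
        # x, y, z are three 10-character fields; the symbol field starts at column 32.
        atoms.append(Atom(line[31:34].strip(), float(line[0:10]),
                          float(line[10:20]), float(line[20:30])))

    bonds: List[Bond] = []
    start = 4 + n_atoms
    for line in lines[start:start + n_bonds]:
        bonds.append(Bond(int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return name, atoms, bonds


WATER_MOLFILE = "\n".join([
    "water",
    "  illustrative example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  1  3  1  0  0  0  0",
    "M  END",
])

name, atoms, bonds = parse_molfile_v2000(WATER_MOLFILE)
```

Slicing by column rather than splitting on whitespace matters because adjacent numeric fields in this format can run together when values are wide.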

As this short example shows, PDB files use different syntax for different records, and both writing and reading such files require much effort. Another problem is the limited extensibility of this format: handling new kinds of information further complicates the file structure. The Protein Data Bank has faced the consequences: the existing legacy data comply with several different PDB formats, so they are not uniform and are more difficult to handle [145, 155, 157]. As mentioned in Section 2.9.7.1, there is a much more flexible and general way of representing molecular structure codes and associated information: the STAR file format and the file formats based on it. [Pg.120]
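The record-specific syntax is easiest to see in code. This sketch parses only `ATOM`/`HETATM` coordinate records by their fixed column positions (for example, columns 31-38, 39-46 and 47-54 hold x, y and z); every other record type, such as `HEADER`, `REMARK` or `CONECT`, would need its own slicing rules, which is the burden the paragraph describes. The example line is illustrative.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    record: str
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def parse_atom_record(line: str) -> AtomRecord:
    """Parse one ATOM/HETATM line using fixed PDB columns (0-based slices below)."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not a coordinate record: {record!r}")
    return AtomRecord(
        record=record,
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        alt_loc=line[16].strip(),        # column 17
        res_name=line[17:20].strip(),    # columns 18-20
        chain=line[21].strip(),          # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        occupancy=float(line[54:60]),    # columns 55-60
        temp_factor=float(line[60:66]),  # columns 61-66
        element=line[76:78].strip(),     # columns 77-78 (may be absent in old files)
    )


EXAMPLE_ATOM_LINE = "ATOM      1  N   MET A   1      38.198  19.582  28.265  1.00 54.74           N"
atom = parse_atom_record(EXAMPLE_ATOM_LINE)
```

Legacy files that leave occupancy or temperature-factor columns blank would make `float` fail here; a more forgiving reader would default those fields.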

The most important feature of editing software is the option to save the structure in standard file formats that contain information about the structure (e.g., Molfile, PDB file). Most of these file formats are ASCII text files (which can be viewed in simple text editors) and follow internationally standardized and normalized specifications of the molecule, such as atom and bond types or connectivities (CT) (see Section 2.4). Thus, with these files, the structure can be exchanged between different programs. Furthermore, they can serve as input files to other chemical software, e.g., to calculate 3D structures or molecular properties. [Pg.138]

In order to represent 3D molecular models, it is necessary to supply structure files with 3D information (e.g., pdb, xyz, df, mol, etc.). If structures from a structure editor are used directly, the files do not normally include 3D data. Such data can be included only via 3D structure generators, force-field calculations, etc. 3D structures can then be represented in various display modes, e.g., wire frame, balls and sticks, space-filling (see Section 2.11). Proteins are visualized by various representations of helices, β-strands, or tertiary structures. An additional feature is the ability to color the atoms according to subunits, temperature, or chain types. During all such operations the molecule can be interactively moved, rotated, or zoomed by the user. [Pg.146]

Dalby, A., Nourse, J.G., Hounshell, W.D., Gushurst, A.K.I., Grier, D.L., Leland, B.A., and Laufer, J. (1992) Description of Several Chemical Structure File Formats Used by Computer Programs Developed at Molecular Design Limited. Journal of Chemical Information and Computer Sciences, 32, 244-255. [Pg.737]

The HyperChem log file includes calculated dipole moments of molecules. To set the amount of information collected in the log file, change the value of the QuantumPrintLevel setting in the chem.ini file. Note that the sign convention used in the quantum mechanical calculation of dipoles is opposite to that used in molecular mechanics dipole calculations; this reflects the differing sign conventions of physics and chemistry. [Pg.135]
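A worked sketch of the two conventions for a set of point charges. It does not reproduce HyperChem's own calculation, and the partial charges are hypothetical; it only shows that the physics vector (μ = Σ qᵢ rᵢ, pointing from negative toward positive charge) and the chemist's arrow (pointing from positive toward negative) have the same length and opposite directions.

```python
import math
from typing import List, Sequence, Tuple

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*angstrom expressed in debye


def dipole_vector(charges: Sequence[float],
                  coords: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Physics convention: mu = sum_i q_i * r_i, pointing from - toward +.

    Assumes a neutral set of charges; otherwise the result depends on the origin.
    """
    mu = [0.0, 0.0, 0.0]
    for q, r in zip(charges, coords):
        for k in range(3):
            mu[k] += q * r[k]
    return mu


def chemistry_arrow(mu: Sequence[float]) -> List[float]:
    """Chemistry convention: arrow from + toward -, i.e. the reversed vector."""
    return [-c for c in mu]


# Hypothetical partial charges for an H-F pair on the z axis (H at origin, F at 0.917 A).
charges = [0.4, -0.4]
coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.917)]
mu = dipole_vector(charges, coords)  # points toward -z, i.e. from F toward H
magnitude_debye = math.sqrt(sum(c * c for c in mu)) * E_ANGSTROM_TO_DEBYE
```

Since only the direction flips, comparing dipoles from two programs is safe for magnitudes but needs care for vector components.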

For a quantum mechanical calculation, the single point calculation leads to a wave function for the molecular system, and considerably more information than just the energy and gradient is available. In principle, any expectation value might be computed. You can get plots of the individual orbitals, the total (or spin) electron density, and the electrostatic field around the molecule. You can see the orbital energies in the status line when you plot an orbital. Finally, the log file contains additional information, including the dipole moment of the molecule. The level of detail may be controlled by the PrintLevel entry in the chem.ini file. [Pg.301]

Registry File. This CAS file contains more than 11.8 million chemical substance records. About 8,000-14,000 records are added each week as new substances are identified by the CAS Registry System. The substance records contain CAS Registry Numbers, chemical names, structures, molecular formulas, ring data, biosequence information, and classes for polymers. All of this information may be displayed. [Pg.117]
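CAS Registry Numbers carry a built-in check digit: each digit before the last is multiplied by its position counted from the right (1, 2, 3, ...), and the sum modulo 10 must equal the final digit. A minimal validator, assuming the usual hyphenated form (two to seven digits, two digits, one check digit):

```python
import re

CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def cas_check_digit(body: str) -> int:
    """Check digit for the digits that precede it (hyphens already removed)."""
    # Rightmost digit gets weight 1, the next weight 2, and so on; sum taken mod 10.
    return sum(int(d) * w for w, d in enumerate(reversed(body), start=1)) % 10


def is_valid_cas(rn: str) -> bool:
    """True if rn is a well-formed CAS Registry Number with a correct check digit."""
    m = CAS_PATTERN.fullmatch(rn.strip())
    if not m:
        return False
    return cas_check_digit(m.group(1) + m.group(2)) == int(m.group(3))


# Water, 7732-18-5: 8*1 + 1*2 + 2*3 + 3*4 + 7*5 + 7*6 = 105, and 105 mod 10 = 5.
print(is_valid_cas("7732-18-5"), is_valid_cas("7732-18-4"))  # True False
```

The check digit catches single-digit typos and most adjacent transpositions, but it only confirms the number is well formed, not that it is actually registered.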

Big Chemical Encyclopedia