Structure data, file format

SDF (Structure-Data File) format is commonly used to store multiple structures in a single file. It allows arbitrary data to be stored together with coordinates and atom types. Small molecules stored in SDF are often flat (2D), and energy minimization is performed to obtain 3D structures with proper bond lengths between the atoms. [Pg.248]
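The multi-record layout described above can be sketched in a few lines of Python. This is a minimal sketch, not a full SD-file reader: it assumes that each record ends with a line starting with `$$$$`, that the structure part ends at `M  END`, and that data items have a `>  <NAME>` header followed by value lines and a blank line. The function names and the sample records are illustrative only.

```python
from typing import Dict, Iterator, List, Tuple


def _split_record(lines: List[str]) -> Tuple[str, Dict[str, str]]:
    """Split one SD-file record into its molblock and its data items."""
    molblock: List[str] = []
    i = 0
    while i < len(lines):
        molblock.append(lines[i])
        i += 1
        if molblock[-1].startswith("M  END"):
            break

    data: Dict[str, str] = {}
    name = None
    values: List[str] = []
    for line in lines[i:]:
        if line.startswith(">"):
            # Data header, e.g. ">  <NAME>": take the text between < and >.
            start = line.find("<")
            end = line.find(">", start)
            name = line[start + 1:end] if start != -1 and end != -1 else line
            values = []
        elif not line.strip():
            # A blank line closes the current data item.
            if name is not None:
                data[name] = "\n".join(values)
                name = None
        elif name is not None:
            values.append(line)
    if name is not None:
        data[name] = "\n".join(values)
    return "\n".join(molblock), data


def read_sdf_records(text: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (molblock, data_items) for every record in SD-file text."""
    record: List[str] = []
    for line in text.splitlines():
        if line.startswith("$$$$"):
            yield _split_record(record)
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):
        yield _split_record(record)


# Illustrative two-record file; the field names NAME and COMMENT are made up.
SAMPLE = """\
methane
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
>  <NAME>
methane

>  <COMMENT>
illustrative record

$$$$
water
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
>  <NAME>
water

$$$$
"""

for molblock, data in read_sdf_records(SAMPLE):
    print(data["NAME"], "-", len(molblock.splitlines()), "molblock lines")
```

In practice a cheminformatics toolkit would normally be used for this; the sketch only shows why several structures and their associated data fit naturally in one file.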

SDF - structure data file, one of several formats originating from MDL Information Systems software. The MDL Molfile is a subset of the SD-file. The MDL formats are all connectivity-oriented and geared towards small molecules with up to 255 heavy atoms and structure search systems. ... [Pg.1406]

A major advance introduced in the JCAMP-CS protocol was interblock linking. Although the compound data file format had been described in JCAMP-DX version 4.24, there was no mechanism provided to cross-reference between blocks. This problem was solved by the introduction of the LDR BLOCK_ID=, which could be used, for instance, to link a spectrum or peak table block to the chemical structure block. [Pg.2695]
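As a rough illustration of how block identifiers make such links resolvable, the following Python sketch indexes the blocks of a compound file by their BLOCK_ID. It assumes that labelled data records (LDRs) have the form `##LABEL=value`, that every block begins with a `##TITLE=` record, and that labels are compared after removing spaces, dashes, slashes and underscores. The sample text is illustrative and is not a complete, valid JCAMP file; how a referring block names its target is deliberately left out.

```python
import re
from typing import Dict, List

_LDR = re.compile(r"^##([^=]*)=(.*)$")


def normalize_label(label: str) -> str:
    """Upper-case a label and drop spaces, dashes, slashes and underscores."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def index_blocks(text: str) -> Dict[str, Dict[str, str]]:
    """Return the blocks of a compound file keyed by their BLOCK_ID value."""
    blocks: List[Dict[str, str]] = []
    for line in text.splitlines():
        match = _LDR.match(line)
        if not match:
            continue  # data lines and comments are ignored in this sketch
        label = normalize_label(match.group(1))
        value = match.group(2).strip()
        if label == "TITLE":
            blocks.append({})  # assumption: TITLE opens a new block
        if blocks and label != "END":
            blocks[-1][label] = value
    return {b["BLOCKID"]: b for b in blocks if "BLOCKID" in b}


# Illustrative compound file with one structure block and one spectrum block.
SAMPLE = """\
##TITLE=Compound file (illustrative)
##JCAMP-DX=4.24
##DATA TYPE=LINK
##BLOCKS=2
##TITLE=Structure block
##JCAMP-CS=3.7
##BLOCK_ID=1
##END=
##TITLE=Spectrum block
##JCAMP-DX=4.24
##BLOCK_ID=2
##END=
##END=
"""

index = index_blocks(SAMPLE)
print(sorted(index))
print(index["1"]["TITLE"])
```

Once the blocks are indexed this way, a reference from a spectrum block to a structure block reduces to a dictionary lookup.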

The different internal and external file formats make it necessary to have programs that convert one format into another. One of the first conversion programs for chemical structure information was Babel (around 1992). It supports almost 50 data formats for input and output of chemical structure information [61]. CLIFF is another file format converter; it is based on the CACTVS technology and supports nearly the same number of file formats [29]. In contrast to Babel, the program is more comprehensive: it is able to convert chemical reaction information and can calculate missing atom coordinates [29]. [Pg.46]

In 1971 the Protein Data Bank (PDB) [146] (see Section 5.8 for a complete story and description) was established at Brookhaven National Laboratories (BNL) as an archive for biological macromolecular crystal structures. This database moved in 1998 to the Research Collaboratory for Structural Bioinformatics (RCSB). A key component in the creation of such a public archive of information was the development of a method for efficient and uniform capture and curation of the data [147]. The result of the effort was the PDB file format [53], which evolved over time through several different and non-uniform versions. Nevertheless, the PDB file format has become the standard representation for exchanging macromolecular information derived from X-ray diffraction and NMR studies, primarily for proteins and nucleic acids. [Pg.112]

An alternative and much more flexible approach is represented by the STAR file format [148, 149], which can be used for building self-describing data files. Additionally, special dictionaries can be constructed, which specify more precisely the contents of the corresponding data files. The two most widely used such dictionaries (and file formats) are the CIF (Crystallographic Information File) file format [150], the International Union of Crystallography's standard for representation of small molecules, and mmCIF [151], which is intended as a replacement for the PDB format for the representation of macromolecular structures,... [Pg.112]

As this short example shows, PDB files use different syntax for different records, and both writing and reading such files require much effort. Another problem is the extensibility of this format to handle new kinds of information, which further complicates the file structure. The Protein Data Bank has been faced with the consequences: the existing legacy data comply with several different PDB formats, so they are not uniform and they are more difficult to handle [145, 155, 157]. As mentioned in Section 2.9.7.1, there is a much more flexible and general way of representing molecular structure codes and associated information: the STAR file format and the file formats based on it. [Pg.120]
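The effort involved in reading and writing PDB records comes largely from their fixed-column layout. The Python sketch below handles only the coordinate records (ATOM and HETATM) and only some of their fields, using the commonly documented column positions; other record types have their own layouts and are not covered. The class and function names are illustrative, the atom values are made up, and the handling of atom-name alignment is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


# Field widths follow the fixed columns of a coordinate record:
# record name 1-6, serial 7-11, atom name 13-16, altLoc 17, residue 18-20,
# chain 22, residue number 23-26, iCode 27, x 31-38, y 39-46, z 47-54.
_ATOM_FMT = "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}"


def format_atom_record(atom: Atom, record: str = "ATOM") -> str:
    """Write the first 54 columns of a coordinate record."""
    # Simplification: names shorter than four characters start in column 14.
    name = atom.name if len(atom.name) >= 4 else " " + atom.name
    return _ATOM_FMT.format(record, atom.serial, name, "", atom.res_name,
                            atom.chain_id, atom.res_seq, "",
                            atom.x, atom.y, atom.z)


def parse_atom_record(line: str) -> Atom:
    """Read the same fields back by slicing the fixed columns."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not a coordinate record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


# Illustrative atom; the coordinates are not taken from a real entry.
atom = Atom(1, "N", "MET", "A", 1, 27.340, 24.430, 2.614)
line = format_atom_record(atom)
print(line)
print(parse_atom_record(line) == atom)
```

A value that is shifted by a single column is read as a different field, which is why such files are tedious to edit by hand.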

The Self-defining Text Archive and Retrieval (STAR) file format addresses primarily the problem of the inflexibility of the PDB file format, its fixed sets of allowable fields, and their strong dependence on order. To overcome the problems described, both the data structure and the actual data items within a STAR file are self-defined, which means that they are preceded by corresponding names (labels) which identify and describe the data. The data may be of any type and there is no predefined order of the data. STAR files, in contrast to PDB files, are easy to read and write manually. The whole syntax of STAR files is very simple and is defined by only a few rules ... [Pg.120]
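The self-defining idea can be illustrated with a deliberately small Python sketch. It assumes only a subset of the syntax: `data_` lines that open a data block, and single-line items consisting of a data name that starts with an underscore followed by its value. Loops, multi-line text values and save frames are not handled, and the function name is hypothetical.

```python
from typing import Dict


def parse_simple_star(text: str) -> Dict[str, Dict[str, str]]:
    """Parse 'data_' blocks containing single-line '_name value' items."""
    blocks: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("data_"):
            current = blocks.setdefault(line[5:], {})
        elif line.startswith("_"):
            if current is None:
                raise ValueError("data item found outside a data block")
            parts = line.split(None, 1)
            value = parts[1].strip() if len(parts) > 1 else ""
            # Remove one pair of matching quotes around the value, if any.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            current[parts[0]] = value
    return blocks


EXAMPLE = """\
data_example
_chemical_name_common  'silicon'
_cell_length_a         5.431
"""

# The same items in a different order describe the same data.
REORDERED = """\
data_example
_cell_length_a         5.431
_chemical_name_common  'silicon'
"""

print(parse_simple_star(EXAMPLE))
print(parse_simple_star(EXAMPLE) == parse_simple_star(REORDERED))
```

Because every value is found by its name rather than by its position, reordering the items changes nothing, in contrast to the column-dependent PDB records.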

Table 7.1 shows the structure of the input data file that is prepared according to the format described in the previous section. [Pg.218]

Chem3D can read a wide variety of popular chemical structure files, including Gaussian, MacroModel, MDL, MOPAC, PDB, and SYBYL. Two-dimensional structures imported from ChemDraw or ISIS/Draw are automatically converted to three-dimensional structures. The Chem3D native file format contains both the molecular structure and results of computations. Data can be exported in a variety of chemical-structure formats and graphics files. [Pg.324]

SMILES strings are very concise and hence are suitable for storing and transporting a large number of molecular structures, while MOLfiles and their extension, SDFiles, have the option to store more complicated molecular data such as 3D molecular conformational information and biological data associated with the molecules. There are many other file formats not discussed here. Interested readers can find a list of file types at the following web site: http://www.ch.ic.ac.uk/chemime/. [Pg.32]

Select the 1D and the 2D NMR data files HUX and HHUX in the directories D:\NMRDATA\FORMAT\XWINNMR\1D\H and D:\NMRDATA\FORMAT\XWINNMR\2D\HH, respectively, and inspect their data structures using the WINDOWS file manager (WINDOWS Explorer). Check the differences between a 1D and a 2D data file. [Pg.29]

One of the most widely used chemical structure-encoding schemas in the pharmaceutical industry is the MDL Connection Table (CT) File Format. Both Molfile and SD File are based on the MDL CT File Format to represent chemical structures. A Molfile represents a single chemical structure. An SD File contains one to many records, each of which has a chemical structure and other data that are associated with the structure. The MDL Connection Table File Format also supports the RG File, which describes a single Rgroup query; the Rxnfile, which contains structural information of a single reaction; the RD File, which has one to many records, each of which has a reaction and data associated with the reaction; and lastly, the XD File, MDL's newly developed XML representation of the above. The CT File Format definition can be downloaded from the MDL website: http://www.mdl.com/downloads/public/ctfile/ctfile.jsp. [Pg.3]
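To make the notion of a connection table concrete, here is a minimal Python sketch that reads the atom and bond blocks of a single V2000-style Molfile. It assumes a three-line header, a counts line whose first two three-character fields give the numbers of atoms and bonds, atom lines with three ten-character coordinate fields followed by the element symbol, and bond lines with three three-character fields (first atom, second atom, bond type). All other fields are ignored; the function name and the sample coordinates are illustrative.

```python
from typing import List, Tuple

AtomRow = Tuple[str, float, float, float]   # symbol, x, y, z
BondRow = Tuple[int, int, int]              # first atom, second atom, type


def parse_molblock(molblock: str) -> Tuple[List[AtomRow], List[BondRow]]:
    """Read the atom and bond blocks of a V2000-style connection table."""
    lines = molblock.splitlines()
    counts = lines[3]                        # line 4 is the counts line
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    atoms: List[AtomRow] = []
    for line in lines[4:4 + n_atoms]:
        atoms.append((line[31:34].strip(),
                      float(line[0:10]),
                      float(line[10:20]),
                      float(line[20:30])))

    bonds: List[BondRow] = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


# Illustrative structure: the heavy atoms of methanol with made-up coordinates.
MOLBLOCK = """\
methanol
  sketch

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  END
"""

atoms, bonds = parse_molblock(MOLBLOCK)
print(atoms)
print(bonds)
```

The bond rows refer to atoms by their 1-based position in the atom block, which is what makes the format connectivity-oriented.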

In a compound registration system, compound data can be imported from data files such as SD File, XML File, or Molfile. Alternatively, data can be entered from the presentation layer using a structure drawing package such as ISIS/Draw or ChemDraw. These data, once imported to the system, need to be bound to the domain objects in order for the system to process them efficiently. To support a variety of data sources, a Data Binder API is needed to decouple the system from the specific format of the input data and to make it easily extensible to other input formats later on. Figure 12.17 is the class diagram of the Data Binder API. [Pg.127]
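The class diagram itself is not reproduced here, so the following Python sketch shows only the general decoupling idea and should not be read as the API in Figure 12.17. Every name in it (Compound, DataBinder, MolfileBinder, SDFileBinder, BINDERS, bind_input) is hypothetical, and the binders keep the raw structure text instead of parsing it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Compound:
    """Hypothetical domain object holding the raw structure text."""
    structure: str
    source_format: str


class DataBinder(ABC):
    """Turns input in one specific format into domain objects."""

    @abstractmethod
    def bind(self, raw: str) -> List[Compound]:
        ...


class MolfileBinder(DataBinder):
    def bind(self, raw: str) -> List[Compound]:
        # A Molfile holds exactly one structure.
        return [Compound(raw, "molfile")]


class SDFileBinder(DataBinder):
    def bind(self, raw: str) -> List[Compound]:
        # An SD File holds one to many records, each closed by "$$$$".
        compounds: List[Compound] = []
        current: List[str] = []
        for line in raw.splitlines():
            if line.startswith("$$$$"):
                compounds.append(Compound("\n".join(current), "sdfile"))
                current = []
            else:
                current.append(line)
        return compounds


# Supporting a new input format means registering one more binder here;
# the code that consumes Compound objects does not change.
BINDERS: Dict[str, DataBinder] = {
    "molfile": MolfileBinder(),
    "sdfile": SDFileBinder(),
}


def bind_input(fmt: str, raw: str) -> List[Compound]:
    """Dispatch raw input to the binder registered for its format."""
    try:
        binder = BINDERS[fmt]
    except KeyError:
        raise ValueError("no binder registered for format: " + fmt)
    return binder.bind(raw)


compounds = bind_input("sdfile", "A\nM  END\n$$$$\nB\nM  END\n$$$$\n")
print(len(compounds), compounds[0].source_format)
```

The registry keeps format knowledge in one place, so the rest of the system depends only on the domain object and on the abstract binder.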

PDB is one of the oldest protein databases, founded in 1971. It has three locations: Rutgers University in New Jersey, the San Diego Supercomputer Center (SDSC) at the University of California, and the National Institute of Standards and Technology (NIST) in Gaithersburg, Maryland. The PDB is a source for protein characterization and structure as well. The PDB archive contains macromolecular structure data on proteins, nucleic acids, protein-nucleic acid complexes, and viruses. Approximately 50-100 new structures are deposited each week, which are annotated and released according to the depositor's specifications. PDB data are freely available worldwide. PDB formats, annotates, validates, and releases dozens of complicated structure files each week; some of them take only a couple of hours, others take weeks to process. Data processing is the main task of people at the PDB, and validation is the most time-consuming part (Smith-Schmidt, 2002). [Pg.418]

Another way to get a structure into the computer is to import (read) a molecule file containing the atomic co-ordinates (and perhaps other atomic and molecular information) into your program. Unfortunately, there is no single standard file format that all programs use. However, some of the commonly encountered formats include those of SYBYL MOL2 files and Protein Data Bank (PDB) files. There are also free programs available for download from the World Wide Web that can interconvert the numerous file formats still in use. [Pg.383]

Data formats in the Brookhaven Protein Databank have become an intensively discussed topic in the last few years. The original PDB file format [17] was created in the late 1970s and maintained by the Research Collaboratory for Structural Bioinformatics [13]. In order to improve the organization of bibliographic... [Pg.132]

A second initiative is being developed by Dr. Ann Richard and coworkers at the EPA. The Distributed Structure-Searchable Toxicity (DSSTox) public database network is a flexible, community-supported, web-based approach for the collation of data. It is based on the SDF format for the representation of chemical structure. It is intended to enable decentralized, free public access to toxicity data files. This should allow users from different disciplines to be linked. Public, commercial, industry, and academic groups have also been asked to contribute to, and expand, the DSSTox public database network. Data from potentially any toxicological endpoint can be collated in the DSSTox public database network, including both human health and environmental endpoints (Richard et al., 2002; Richard and Williams, 2002). [Pg.35]
