The Database

The thermodynamic databases that accompany most modeling codes were intentionally prepared to be separate from the codes. Equilibrium constants are therefore not hard-wired into the codes, which makes it much easier for users to change the values of the equilibrium constants in the database, or to add or delete a reaction, without affecting the functionality of the codes. Many codes (e.g., PHREEQC and EQ3/6) allow users to modify the equilibrium constants in the input file as well as in the database itself (see the Appendix). [Pg.75]
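
The separation of data from code described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reaction labels, the log K values, and the function name `effective_log_k` are hypothetical placeholders, not entries or routines from any real modeling code or database.

```python
# Hypothetical illustration: the "database" is plain data, not code.
# Reaction labels and log K values are placeholders, not real thermodynamic data.
DEFAULT_DATABASE = {
    "reaction_A": -8.5,
    "reaction_B": -4.6,
}


def effective_log_k(database, input_overrides=None, removed=()):
    """Return the log K table a single run would use.

    Values in input_overrides take precedence over the database, and
    reactions listed in removed are dropped. The database itself is
    never modified.
    """
    table = dict(database)              # copy, so the shared database stays intact
    table.update(input_overrides or {})
    for name in removed:
        table.pop(name, None)
    return table


run_table = effective_log_k(
    DEFAULT_DATABASE,
    input_overrides={"reaction_A": -8.3},   # value changed in the input file
    removed=["reaction_B"],                 # reaction deleted for this run only
)
```

Because the calculation reads only the returned table, changing a constant or removing a reaction requires no change to the code that performs the calculation.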

None of the thermodynamic databases accompanying the modeling codes is a comprehensive compilation of all the aqueous and mineral species we may encounter. Nor has it been ensured that the thermodynamic data are internally consistent. Code developers and releasing agencies make no statement about the best thermodynamic properties of a solid or aqueous species, or about the internal consistency of the data; they assume this to be the responsibility of the modeler. [Pg.75]

A full release of GenBank occurs on a bimonthly schedule with incremental (and nonincremental) daily updates available by anonymous FTP. The International Nucleotide Sequence Database Collaboration also exchanges new and updated records daily. Therefore, all sequences present in GenBank are also present in DDBJ and EMBL, as described in the introduction to this chapter. The three databases rely on a common data format for information described in the feature table documentation (see below). This represents the lingua franca for nucleotide sequence database annotations. Together, the nucleotide sequence databases have developed defined submission procedures (see Chapter 4), a series of guidelines for the content and format of all records. [Pg.49]

As mentioned above, nucleotide records are often the primary source of sequence and biological information from which protein sequences in the protein databases are derived. There are three important consequences of not having the correct or proper information on the nucleotide record [Pg.49]

The GenBank flatfile (GBFF) is the elementary unit of information in the GenBank database. It is one of the most commonly used formats in the representation of [Pg.49]

Subtle differences exist in the formatting of the definition line and the use of the gene feature. EMBL uses line-type prefixes, which indicate the type of information present in each line of the record (Appendix 3.2). The feature section (see below), prefixed with FT, is identical in content to the other databases. All these formats are really reports from what is represented in a much more structured way in the underlying ASN.1 file. [Pg.50]

The GBFF can be separated into three parts: the header, which contains the information (descriptors) that apply to the whole record; the features, which are the annotations on the record; and the nucleotide sequence itself. All major nucleotide database flat files end with // on the last line of the record. [Pg.50]
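
The three-part layout and the // terminator lend themselves to a simple line-based split. The following is a minimal sketch, not a full parser. It assumes that the features section begins at a line starting with the keyword FEATURES and the sequence section at a line starting with ORIGIN; neither keyword is stated in the text above. The function names and the toy record are hypothetical.

```python
def split_gbff_records(text):
    """Split flat-file text into records, each terminated by a '//' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "//":
            if current:
                records.append(current)
            current = []
        else:
            current.append(line)
    return records


def split_record(lines):
    """Separate one record into header, features, and sequence lines.

    Assumption: the features section starts at a line beginning with
    'FEATURES' and the sequence section at a line beginning with 'ORIGIN'.
    """
    parts = {"header": [], "features": [], "sequence": []}
    section = "header"
    for line in lines:
        if line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "sequence"
        parts[section].append(line)
    return parts


# Made-up, heavily abbreviated toy record; not a real database entry.
EXAMPLE = """LOCUS       TOY0001
DEFINITION  Hypothetical example record.
FEATURES             Location/Qualifiers
     source          1..12
ORIGIN
        1 acgtacgtac gt
//
"""

records = split_gbff_records(EXAMPLE)
parts = split_record(records[0])
```

For real work an established parser is the safer choice; the sketch only makes the header/features/sequence division concrete.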

Table B1.21.1. Surface structural determination methods. The second column indicates whether a technique can be considered a diffraction method, in the sense of relying on wave interference. Also shown are statistics of surface structural determinations, extracted from the Surface Structure Database [14], up to 1997. Counted here are only detailed and complete structural determinations, in which typically the experiment is simulated computationally and atomic positions are fitted to experiment. (Some structural determinations are performed by combining two or more methods; those are counted more than once in this table, so that the columns add up to more than the actual 1113 structural determinations included in the database.)...

In terms of individual techniques, table B1.21.1 lists the breakdown totalled over time, counting from the inception of surface structural determination in the early 1970s. It is seen that LEED has contributed altogether about 67% of all structural determinations included in the database. The annual share of LEED was 100% until 1978, and has generally remained over 50% since then. In 1979 other methods started to produce structural determinations, especially PD, ion scattering (IS) and SEXAFS. XRD and then XSW started to contribute results in the period 1981-3. [Pg.1757]

The relative simplicity of the method and the penetrative nature of the x-rays yield a technique that is sensitive to elements with Z > 10 down to a few parts per million (ppm) and can be performed quantitatively from first principles. The databases for PIXE analysis programs [21, 22 and 23] are typically so well developed as to include accurate fundamental parameters, allowing the absolute precision of the technique to be around 3% for major elements and 10-20% for trace elements. A major factor in applying the PIXE technique is that the bombarding energy of the... [Pg.1841]

The quantitative imaging capability of the NMP is one of the major strengths of the technique. The advanced state of the databases available for PIXE [21, 22 and 23] also allows for the analysis of layered samples as, for example, in studying non-destructively the elemental composition of fluid inclusions in geological samples. [Pg.1844]

The term has two spellings: Chemoinformatics and Cheminformatics. Searches in the database of the Chemical Abstracts Service have shown an approximately equal number of hits for both terms, with Cheminformatics gaining ground somewhat in recent years. Here, we use the spelling "Chemoinformatics" without trying to put forward reasons for that choice. [Pg.5]

In 1971 the Protein Data Bank - PDB [146] (see Section 5.8 for a complete story and description) - was established at Brookhaven National Laboratories - BNL - as an archive for biological macromolecular crystal structures. This database moved in 1998 to the Research Collaboratory for Structural Bioinformatics - RCSB. A key component in the creation of such a public archive of information was the development of a method for efficient and uniform capture and curation of the data [147]. The result of the effort was the PDB file format [53], which evolved over time through several different and non-uniform versions. Nevertheless, the PDB file format has become the standard representation for exchanging macromolecular information derived from X-ray diffraction and NMR studies, primarily for proteins and nucleic acids. [Pg.112]

If the database is not integrated in a database system, the database is called a flat-file. As the name indicates, the data are stored in a file that can be used directly by the user. [Pg.228]

Figure 5-2. The database(s) (DB) with organized data and metadata are part of the Database System (DBS), which is managed by the Database Management System (DBMS).

In a flat-file system the database is called a file. [Pg.229]

Most database users do not know how the data are organized in a database system (DBS); they depend solely on the application programs. This is sufficient for most database searches where users can receive large amounts of results quickly and easily, e.g., on literature or other information. Nevertheless, a basic knowledge of where and how to find deeper or more detailed information is quite useful. Due to their complex nature, comprehensive searches (e.g., for processes or patents) are not recommended for beginners. However, most local (in-house), online, and CD-ROM databases provide extensive tutorials and help functions that are specific to the database, and that give a substantial introduction to database searching. [Pg.230]

More than 10 000 databases exist that provide a small or large amount of data on various topics (including chemistry). The contents in databases are supplied by approximately 3500 database developers (e.g., the Chemical Abstracts Service, MDL Information Systems, etc.). Since there is a variety of topics from economics to science, as well as a variety of structures of the database, only some of the vendors (~2000) offer one or more databases as either local or online databases (Figure 5-4) [4]. Usually, databases are provided by hosts that permit direct access to more than one database. The search occurs primarily through different individual soft-... [Pg.230]

A hierarchical system is the simplest type of database system. In this form, the various data types, also called entities (see Figure 5-3), are assigned systematically to various levels (Figure 5-5). The hierarchical system is represented as an upside-down tree with one root segment and ordered nodes. Each parent object can have one or more children (objects) but each child has only one parent. If an object should have more than one parent, this entity has to be placed a second time at another place in the database system. [Pg.232]

In order to trace (find, change, add, or delete) a segment in the database, the sequence in which the data are read is important. Thus, the sequence of the hierarchical path is parent > child > siblings. The assignment of the data entities uses pointers. In our example, the hierarchical path to K is traced in Figure 5-6. [Pg.232]
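
The parent > child > siblings reading order can be sketched as a depth-first walk over a tree in which each node keeps one pointer to its parent and an ordered list of pointers to its children. The class name, the function name, and the small tree below are hypothetical and do not reproduce the figure referred to above.

```python
class Segment:
    """One node of a hierarchical database: a single parent, ordered children."""

    def __init__(self, name):
        self.name = name
        self.parent = None      # pointer to the one and only parent
        self.children = []      # ordered pointers to the children

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child


def hierarchical_path(root, target):
    """List the segments read, in parent > child > sibling order, until target.

    Returns None if the target is not present in the tree.
    """
    visited = []

    def visit(node):
        visited.append(node.name)
        if node.name == target:
            return True
        # Descend into the first child fully before moving on to its siblings.
        return any(visit(child) for child in node.children)

    return visited if visit(root) else None


# Hypothetical tree:  A -> (B -> (D, E), C -> (F))
a = Segment("A")
b = a.add_child(Segment("B"))
c = a.add_child(Segment("C"))
b.add_child(Segment("D"))
b.add_child(Segment("E"))
c.add_child(Segment("F"))

path_to_f = hierarchical_path(a, "F")
```

In this toy tree, reaching F requires reading the whole B subtree first, which illustrates why the order in which segments are stored matters in a hierarchical system.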

The characteristic of a relational database model is the organization of data in different tables that have relationships with each other. A table is a two-dimensional construction of rows and columns. All the entries in one column have an equivalent meaning (e.g., name, molecular weight, etc.) and represent a particular attribute of the objects (records) of the table (file) (Figure 5-9). The sequence of rows and columns in the table is irrelevant. Different tables (e.g., different objects with different attributes) in the same database can be related through at least one common attribute. Thus, it is possible to relate objects within tables indirectly by using a key. The range of values of an attribute is called the domain, which is defined by constraints. Schemas define and store the metadata of the database and the tables. [Pg.235]
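
A minimal sketch of these ideas, using Python's built-in `sqlite3` module, is given here. The table names, column names, supplier names, and prices are hypothetical; only the general mechanism (two tables related through a common key attribute, with a constraint restricting an attribute's domain) follows the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database

# The schema holds the metadata: tables, attributes, and constraints.
conn.executescript("""
CREATE TABLE compound (
    compound_id INTEGER PRIMARY KEY,            -- key attribute
    name        TEXT NOT NULL,
    mol_weight  REAL CHECK (mol_weight > 0)     -- constraint defining the domain
);
CREATE TABLE supplier_price (
    compound_id INTEGER REFERENCES compound(compound_id),  -- common attribute
    supplier    TEXT NOT NULL,
    price       REAL
);
""")

conn.executemany(
    "INSERT INTO compound VALUES (?, ?, ?)",
    [(1, "water", 18.015), (2, "ethanol", 46.07)],
)
# Hypothetical suppliers and prices; row order is deliberately arbitrary.
conn.executemany(
    "INSERT INTO supplier_price VALUES (?, ?, ?)",
    [(2, "Supplier X", 12.5), (1, "Supplier Y", 3.0)],
)

# Relate the two tables through the shared key.
rows = conn.execute("""
    SELECT c.name, c.mol_weight, p.supplier
    FROM compound AS c
    JOIN supplier_price AS p ON c.compound_id = p.compound_id
    ORDER BY c.name
""").fetchall()
```

The query result does not depend on the order in which the rows were inserted, which reflects the statement that the sequence of rows in a table is irrelevant.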

Relational database models utilize memory very efficiently, avoiding repetition of data. It is possible to extract both individual data elements and combinations of them from a table. The main advantage of this structure is that it offers the possibility of changing the structure of the database (adding or deleting tables) without... [Pg.235]

Factual databases may provide the electronic version of printed catalogs on chemical compounds. The catalogs of different suppliers of chemicals serve to identify chemical compounds with their appropriate synonyms, molecular formulas, molecular weight, structure diagrams, and - of course - the price. Sometimes the data are linked to other databases that contain additional information. Structure and substructure search possibilities have now been included in most of the databases of chemical suppliers. [Pg.240]

These are databases that provide links to other databases or data sources. In this case, records describe objects that are other databases. The "Gale Directory of Databases" [14] is one of them. The connection between the databases flows through the meta-data of each database. [Pg.240]

SCISEARCH contains bibliographic citations (links) to publications in science and technology. The database represents the electronic online version of the expanded Science Citation Index (SCI) and parts from the Current Contents of the Institute for Scientific Information (ISI). More than 5900 science and technical journals are included in the database with more than 20 million records (October, 2002). Searches can be performed on the bibliographic data, along with where, and how often, an author or publication is cited. [Pg.241]

Medline covers primarily biomedical literature, containing more than 13 million citations (October, 2002) of articles from more than 4600 journals published since 1958 [18]. The database covers basic biomedical research, clinical sciences, dentistry, pharmacy, veterinary medicine, pre-clinical sciences, and life science. Medline, a subset of PubMed, is a bibliographic database produced by the US National Library of Medicine (NLM). The database is available free of charge via SciFinder Scholar or PubMed [19]. [Pg.241]

The Chemical Abstracts Service (CAS) produces a set of various databases ranging from bibliographic to chemical structure and reaction databases. All the databases originate from the printed media of Chemical Abstracts, which was first published in 1907 and is divided into different topics. Author index, general index, chemical structure index, formula index, and index guide are entries to the corresponding database (Table 5-3). [Pg.242]

Access to CAS databases is only possible on computers on which the SciFinder software has been installed. It is directly available at CAS, computational service centers, or library services with online access. The database is not free of charge; access can be obtained only via these services. After the licensed software has been installed and online access is obtained, the program can be started. [Pg.242]

Thus, if the user wants to look for literature including requested chemicals or reactions, it is possible to query the database by the first option, "Chemical Substance or Reaction". The compound can be entered as a query in three different ways: drawing the chemical structure in a molecule editor (Chemical Structure); searching by names or identification number, such as the CAS Number (Structure Identifier); and searching by molecular formula (Figure 5-12). [Pg.244]

The remaining five search topics (Research Topic, Author Name, Document Identifier, Company Name/Organization, and Browse Table of Contents) are conducted in a similar fashion, with the input being the only difference between the criteria. Thus, in "Research Topic" the entry can be any, or even several, keywords or phrases. In "Author Name", literature written by a specific author will be found, including alternative spellings. A "Document Identifier" can also be entered directly in the query. Document identifiers are CA abstract numbers, patent numbers, patent application numbers, or priority application numbers. The last two search topics (Company Name/Organization, and Browse Table of Contents) allow one to search for literature from specific companies or to view the list of journals which are available in the database. [Pg.246]

The database is produced by the German Chemical Society (GDCh) and provided by MDL Information Systems Inc. [22]. [Pg.248]

Gmelin contains over 800 different chemical and physical property fields, and a detailed index of the original literature. Broad categories of data found in the database include ... [Pg.248]

This database provides thermophysical property data (phase equilibrium data, critical data, transport properties, surface tensions, electrolyte data) for about 21 000 pure compounds and 101 000 mixtures. DETHERM, with its 4.2 million data sets, is produced by Dechema, FIZ Chemie (Berlin, Germany) and DDBST GmbH (Oldenburg, Germany). Definitions of the more than 500 properties available in the database can be found in NUMERIGUIDE (see Section 5.18). [Pg.249]

After return to the Commander window, the reaction retrieval may be executed separately: 629 Diels-Alder reactions between aliphatic dienes and cyclic dienophiles are found. This partial result can be narrowed down by restricting the reaction conditions by means of the fact editor. The search field codes for the yield and the temperature can be found to be RX.NYD and RX.T, respectively, either by browsing the database structure or by applying the Find option, as described in the first example. To ensure that the retrieved reaction conditions belong to the same experiment, both search terms must be connected by means of the PROXIMITY operator. Before the retrieval is started, the option "Refine results in... [Pg.255]

Specinfo, from Chemical Concepts, is a factual database information system for spectroscopic data with more than 660 000 digital spectra of 150 000 associated structures [24]. The database covers nuclear magnetic resonance spectra (¹H-, ¹³C-, ¹⁵N-, ¹⁷O-, ¹⁹F-, ³¹P-NMR), infrared spectra (IR), and mass spectra (MS). In addition, experimental conditions (instrument, solvent, temperature), coupling constants, relaxation time, and bibliographic data are included. The data are cross-linked to CAS Registry, Beilstein, and NUMERIGUIDE. [Pg.258]

Specinfo has an additional tool for calculating NMR spectra that is based on the data sets of the compounds contained in the database. This leads to quite reliable calculated spectral parameters for the compound classes which are registered in the database. [Pg.258]
