Flat-file databases

Databases are electronic filing cabinets that serve as a convenient and efficient means of storing vast amounts of information. An important distinction exists between primary (archival) and secondary (curated) databases. The primary databases represent experimental results with some interpretation. Their record is the sequence as it was experimentally derived. The DNA, RNA, or protein sequences are the items to be computed on and worked with as the valuable components of the primary databases. The secondary databases contain the fruits of analyses of the sequences in the primary sources, such as patterns, motifs, functional sites, and so on. Most biochemical and/or molecular biology databases in the public domain are flat-file databases. Each entry of a database is given a unique identifier (i.e., an entry name and/or accession number) so that it can be retrieved uniformly by the combination of the database name and the identifier. [Pg.48]

The kind of information managed, whether it is sales data, electronic documents, clinical trial data, or recipes for a manufacturing execution system, is fairly independent of the database type (although no one would build a flat-file database for any of these). The choice of relational vs. hierarchical vs. network is primarily dependent on business needs. [Pg.752]

Pre-1980: Flat File Storage of Chemical Structures. Computers consisted of mainframe machines (e.g., IBM 3090) and small minicomputers (Digital, Prime). Users connected through low-speed serial connections, using "dumb" terminals (no graphics capability) or monochrome vector graphics terminals such as Tektronix and Imlac. Chemical structures were mainly stored either (1) as individual structure files, indexed by name, and handled one or a few structures at a time, or (2) in a flat-file database accessed by record number (26). A typical corporate database contained up to a few tens of thousands of structures. [Pg.360]

Relational databases can be combined, giving the whole system immense flexibility. The older flat-file databases store information in files which can be searched and sorted, but cannot be linked to other databases. [Pg.315]

All flat-file databases are semi-structured, containing a list of entries, with each entry containing a list of data fields (e.g., an ID, an accession number, keywords, a sequence, etc.). Figure 1.6 shows a sample of a database entry. Each of the data fields consists of strings or tokens. The set of productions for each database must describe how to divide the database into entries, then into fields, and then into the strings or tokens within each field. It is these tokens within each field that are inserted into an index. [Pg.451]
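The three-level division described above (entries, then fields, then tokens) can be sketched as follows. This is a minimal illustration, not the SRS production syntax: the sample data, the two-letter field codes, the `//` entry terminator, and all function names are assumptions modelled loosely on EMBL-style records.

```python
import re
from collections import defaultdict

# Hypothetical two-entry flat file; field codes and the "//" terminator
# are assumptions for illustration only.
FLAT_FILE = """\
ID   ENTRY1
AC   X00001
KW   kinase; membrane
//
ID   ENTRY2
AC   X00002
KW   kinase; nuclear
//
"""

def split_entries(text):
    """Divide the file into entries at each '//' terminator line."""
    entries, current = [], []
    for line in text.splitlines():
        if line.strip() == "//":
            if current:
                entries.append(current)
            current = []
        elif line.strip():
            current.append(line)
    if current:
        entries.append(current)
    return entries

def split_fields(entry_lines):
    """Divide one entry into fields keyed by the leading field code."""
    fields = defaultdict(list)
    for line in entry_lines:
        code, _, value = line.partition(" ")
        fields[code].append(value.strip())
    return {code: " ".join(values) for code, values in fields.items()}

def tokenize(value):
    """Divide a field value into lowercase tokens."""
    return [tok.lower() for tok in re.split(r"[\s;,]+", value) if tok]

def build_index(text):
    """Map (field code, token) to the set of entry IDs that contain it."""
    index = defaultdict(set)
    for entry in split_entries(text):
        fields = split_fields(entry)
        entry_id = fields["ID"]
        for code, value in fields.items():
            for token in tokenize(value):
                index[(code, token)].add(entry_id)
    return index

index = build_index(FLAT_FILE)
print(sorted(index[("KW", "kinase")]))
```

Keying the index on the pair (field code, token) keeps a keyword search from matching the same string in an unrelated field.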

The Sequence Retrieval System (SRS [19]), initially developed at EMBL and the European Bioinformatics Institute, uses an interesting approach by combining the features of data warehouses and federated database systems. On the one hand, SRS heavily indexes locally stored genomic flat-file databases; on the other hand, it allows one to query DBMSs at different sites. An example of a federated approach is the Mouse Federated Database of the Comparative Mouse Genomics Centers Consortium (http://www.niehs.nih.gov/cmgcc/dbmouse.htm). [Pg.196]

Most biochemical and/or molecular biology databases in the public domain are flat-file databases. Each entry of a database is given a unique identifier (i.e., an entry name and/or accession number) so that it can be retrieved uniformly by the combination of the database name and the identifier. [Pg.551]

According to an elegant remark by Davies [5], "Modern scientific data handling is multitechnique, multisystem, and manufacturer-independent, with results being processed remotely from the measuring apparatus." Indeed, data exchange and storage are steps of the utmost importance in the data acquisition pathway. The simplest way to store data is to define some special format (i.e., collection of rules) of a flat file. Naturally, one cannot overestimate the importance of databases, which are the subject of Chapter 5 in this book. Below we discuss three simple, yet efficient, data formats. [Pg.209]

In a flat-file system the database is called a file. [Pg.229]

Figure 5-3. a) Main organization of a database or container; the basic units of a field are bits and bytes. b) Example of data organization in a flat file. [Pg.229]

Each object has the ability to serialize itself and also to initialize itself from a serialized representation. If the programming language has a reflective facility, you can write a single piece of code to determine the structure of the object and perform serialization and initialization. Java serialization works this way. Of course, flat files do not provide any of the multi-user, concurrency, meta-data, schema evolution, transaction, and recovery facilities that a database provides. [Pg.524]
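A reflective serializer of the kind described above can be sketched in a few lines. The source refers to Java serialization; the sketch below swaps in Python, where `vars()` plays the role of the reflective facility and JSON is an assumed on-disk representation. The `Serializable` mixin and the `Compound` class are hypothetical names for illustration.

```python
import json

class Serializable:
    """Mixin that uses reflection to serialize any simple object."""

    def serialize(self):
        # vars() exposes the instance attributes, so no per-class
        # serialization code is needed.
        return json.dumps(vars(self), sort_keys=True)

    @classmethod
    def deserialize(cls, text):
        # Create the object without calling __init__, then restore
        # its attributes from the serialized representation.
        obj = cls.__new__(cls)
        obj.__dict__.update(json.loads(text))
        return obj

class Compound(Serializable):  # hypothetical example class
    def __init__(self, compound_id, formula):
        self.compound_id = compound_id
        self.formula = formula

line = Compound(42, "C2H6O").serialize()   # one line of a flat file
restored = Compound.deserialize(line)
print(line)
```

Writing one such line per object gives a usable flat file, but nothing in this sketch coordinates concurrent writers or recovers from a partial write, which is the limitation the paragraph above points out.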

Figure 7 shows the relationship between a raw data channel and its associated metadata. If we were to choose the highlighted item, "Instrument Method", the embedded relational database would retrieve the exact version of the instrument method that was used to acquire the raw data. All this occurs in a fraction of a second. Imagine how long it would take using a conventional flat-file system (see Figure 8). [Pg.594]

The core of the EntityDictionaryDao is in the retrieve...() methods. Here we assume the entity dictionaries are stored in a relational database. They can also be accessed from other types of data sources, such as web services, XML, and flat files. The point is to transform them into something that can be accessed easily and quickly by CRS. Take a closer look at the retrievePersonnel() method. Like most other retrieve...() methods, retrievePersonnel() returns a Map. What is in the Map depends on what kind of lookups the clients want to use to access the personnel dictionary. In the context of CRS, the personnel data can be accessed in its entirety, or by the research site where the person is located, person id, person's full name, or person's username. Therefore, the Map that retrievePersonnel() returns has five Collections: an entire personnel list, a site-people map, a person id-person map, a person's full name-person map, and a username-person map. [Pg.155]
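The shape of the returned Map can be sketched as follows. The original method is Java and its code is not shown here, so this is a Python analogue under stated assumptions: the `Person` fields, the dictionary keys, and the sample rows are all hypothetical, and the rows stand in for whatever the underlying data source (relational database, XML, or flat file) returns.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:  # hypothetical record type
    person_id: int
    full_name: str
    username: str
    site: str

def retrieve_personnel(rows):
    """Build every lookup the clients need in one pass over the data."""
    people = [Person(*row) for row in rows]
    by_site = defaultdict(list)
    for person in people:
        by_site[person.site].append(person)
    return {
        "all": people,
        "by_site": dict(by_site),
        "by_id": {p.person_id: p for p in people},
        "by_full_name": {p.full_name: p for p in people},
        "by_username": {p.username: p for p in people},
    }

# Hypothetical rows as they might be read from the data source.
ROWS = [
    (1, "Ada Smith", "asmith", "Boston"),
    (2, "Raj Patel", "rpatel", "Basel"),
    (3, "Li Wei", "lwei", "Boston"),
]
personnel = retrieve_personnel(ROWS)
print(personnel["by_username"]["rpatel"].full_name)
```

Building all the lookups up front trades memory for constant-time access by any key.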

In SRS, meta definition is used to describe objects which the SRS core uses. In the case of a database, a library object must be defined. This object contains the name of the library, what sort of library it is (i.e. what group of databanks it belongs to), the name and whereabouts of the flat files containing the data. It also contains a link to a file containing a list of rules which describe the internal syntax of the databank. These syntax rules will be described below. [Pg.449]

The EMBL library object (Figure 1.8) defines the EMBL library as being part of the sequence library group, with the format being described in the EMBL FORMAT object (Figure 1.9) and the files which make up the EMBL database being all those files with the extension dat in the EMBL flat file directory. [Pg.452]

Other attempts to solve the problem of integrating Molecular Biology resources can be divided into two approaches: either using relational databases to store and retrieve data, or using database-specific programs to parse flat files. [Pg.459]

The NCBI does use a flat-file approach to parse and retrieve the data in their databases and present it on the web. While the NCBI continues to add databases, there are not as many databases available as on some SRS servers, and hence it is difficult to find relationships that may exist between the data displayed and data in other databases. Since the NCBI present their data on their web site, it is also not possible for other academic institutions or companies to bring the software in-house for integration of their proprietary data. [Pg.460]

SRS supports the data structure of individual databases in flat-file format by providing special indexes for implementing lists of subentities such as feature tables. SRS has the ability to define indexed links between databases. Once indexed, the links become bidirectional and operate in multistep fashion. They operate on sets of entries and can be weighted and combined with logical operators (AND, OR, and NOT). [Pg.395]
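The idea of indexed, bidirectional links that operate on sets of entries can be sketched with plain Python sets. This is not SRS code or SRS query syntax: the cross-reference pairs, the entry names, and the `follow` helper are hypothetical, and link weighting is left out.

```python
from collections import defaultdict

# Hypothetical cross-references: (protein entry, nucleotide entry).
XREFS = [("P1", "N1"), ("P1", "N2"), ("P2", "N2"), ("P3", "N3")]

# Indexing each link once in both directions makes it bidirectional.
forward, reverse = defaultdict(set), defaultdict(set)
for protein, nucleotide in XREFS:
    forward[protein].add(nucleotide)
    reverse[nucleotide].add(protein)

def follow(link_index, entries):
    """Map a set of entries to the set of entries they link to."""
    linked = set()
    for entry in entries:
        linked |= link_index.get(entry, set())
    return linked

nucs_a = follow(forward, {"P1"})
nucs_b = follow(forward, {"P2", "P3"})

both = nucs_a & nucs_b      # AND
either = nucs_a | nucs_b    # OR
only_a = nucs_a - nucs_b    # NOT (in a, but not in b)

# A second step back across the same link, in the opposite direction.
proteins_sharing = follow(reverse, both)
print(sorted(proteins_sharing))
```

Because every step takes a set and returns a set, steps can be chained across several databases, which is one way to read "multistep" in the passage above.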

Components can treat a data record flowing in a pipeline in the same manner regardless of its source location or format. Data from disparate sources (databases, flat files, or the Web) are handled in an identical manner, avoiding cumbersome data migration or data integration operations. [Pg.428]

Field/value-based flat files have been very commonly used in bioinformatics. Examples are the flat file libraries from GenBank, European Molecular Biology Laboratory Nucleotide Sequence Database (EMBL), DNA Data Bank of Japan, or Universal Protein Resource (UniProt). These file types are a very limited solution because they lack referencing, vocabulary control, and constraints. In addition, on the file level, there is no inherent locking mechanism that detects when a file is being used or modified. However, these file types are primarily used for reading purposes. [Pg.195]

The GBFF can be separated into three parts: the header, which contains the information (descriptors) that apply to the whole record; the features, which are the annotations on the record; and the nucleotide sequence itself. All major nucleotide database flat files end with // on the last line of the record. [Pg.50]
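The three-part layout can be illustrated with a small splitter. This is a minimal sketch, not a full GenBank parser: it relies on the FEATURES and ORIGIN keyword lines that open the feature table and the sequence block in GenBank flat files, and the record content and the function name are made up for the example.

```python
# A made-up record in GenBank flat-file layout (content is illustrative).
RECORD = """\
LOCUS       TEST0001                  12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example record.
ACCESSION   TEST0001
FEATURES             Location/Qualifiers
     source          1..12
                     /organism="synthetic construct"
ORIGIN
        1 acgtacgtac gt
//
"""

def split_genbank_record(record):
    """Split one record into header lines, feature lines, and sequence."""
    header, features, seq_lines = [], [], []
    section = header
    for line in record.splitlines():
        if line.startswith("//"):        # end-of-record terminator
            break
        if line.startswith("FEATURES"):  # feature table begins
            section = features
        elif line.startswith("ORIGIN"):  # sequence block begins
            section = seq_lines
            continue
        section.append(line)
    # Drop the position numbers and spacing, keeping only residue letters.
    sequence = "".join(ch for line in seq_lines for ch in line if ch.isalpha())
    return header, features, sequence

header, features, sequence = split_genbank_record(RECORD)
print(len(header), len(features), sequence)
```

For real data, an established parser such as the one in Biopython is a safer choice than a hand-written splitter.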

If the database is not integrated in a database system, the database is called a flat file. As the name indicates, the data are stored in a file that can be used directly by the user. [Pg.228]

Figure 9.3. MACCS (the Molecular ACCess System), an early structure indexing system. This program originally used fixed menus for searching, registration, and reporting. Later versions allowed users to customize the menus. The figure shows the result of a 3D pharmacophore search for ACE inhibitors. Out of a database of 115,000 structures, 21 fit the 2D and 3D requirements of the search query. The user could typically browse the "hits" from the search, save the list of structures to a list file, and output the structures to a structure-data file (SDFile). The MACCS database was a proprietary flat database system in which data of a given type, say, formula, was stored in a given file, indexed by the compound ID number.

Flat Database or File. Essentially a spreadsheet of data, in which a given row contains all the data about a structure. There are no hierarchical relationships in a flat database. Many older and proprietary structure databases were flat in structure. These are in contrast to relational databases that are more commonly used at present. [Pg.404]
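The "spreadsheet" picture of a flat database can be made concrete with a small CSV table in which each row holds all the data about one structure. The column names and values below are assumptions chosen for illustration; the sketch shows that such a file can be searched and sorted directly, with no relationships between tables involved.

```python
import csv
import io

# Hypothetical flat database: one row per structure, all data in that row.
FLAT_DB = """\
compound_id,name,formula,mol_weight
1,ethanol,C2H6O,46.07
2,benzene,C6H6,78.11
3,water,H2O,18.02
"""

rows = list(csv.DictReader(io.StringIO(FLAT_DB)))

# Search: a linear scan over every row.
heavy = [r["name"] for r in rows if float(r["mol_weight"]) > 40]

# Sort: reorder the rows by one column.
by_weight = [r["name"] for r in sorted(rows, key=lambda r: float(r["mol_weight"]))]

print(heavy)
print(by_weight)
```

Anything that relates a row to data held elsewhere (suppliers or assay results, for example) would have to be copied into the same row or handled by custom code, which is where relational databases take over.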
