THE NCBI DATA MODEL

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland [Pg.19]

Department of Molecular Biology and Genetics, The Johns Hopkins School of Medicine, Baltimore, Maryland [Pg.19]

Most biologists are familiar with the use of animal models to study human diseases. Although a disease that occurs in humans may not be found in exactly the same form in animals, often an animal disease shares enough attributes with a human counterpart to allow data gathered on the animal disease to be used to make inferences about the process in humans. Mathematical models describing the forces involved in musculoskeletal motions can be built by imagining that muscles are combinations of springs and hydraulic pistons and bones are lever arms, and, often times. [Pg.19]

Homo sapiens D-dopachrome tautomerase (DDT) gene, exon 1. AF012432 [Pg.22]

We have alluded to how the NCBI data model defines sequences in a way that supports a richer and more explicit description of the experimental data than can be... [Pg.23]

The NCBI data model stores most citations as a collection called a Pub-equiv, a set of equivalent citations that includes a reliable identifier (PMID or MUID) and the citation itself. The presence of the citation form allows a useful display without an extra retrieval from the database, whereas the identifier provides a reliable key for linking or indexing the same citation in the record. [Pg.27]
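The pairing of identifier and citation described above can be sketched in Python. This is a minimal illustration, not the actual ASN.1 definition: the class names `Citation` and `PubEquiv`, their fields, and the choice to prefer a PMID over a MUID are assumptions made for the example, and all values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Citation:
    """Hypothetical minimal citation; the field names are assumptions."""
    authors: Tuple[str, ...]
    title: str
    journal: str
    year: int


@dataclass(frozen=True)
class PubEquiv:
    """Equivalent forms of one citation: an identifier plus the citation itself."""
    pmid: Optional[int] = None
    muid: Optional[int] = None
    citation: Optional[Citation] = None

    def key(self):
        """Return a key for linking or indexing (assumed here to prefer PMID)."""
        if self.pmid is not None:
            return ("PMID", self.pmid)
        if self.muid is not None:
            return ("MUID", self.muid)
        return None

    def display(self) -> str:
        """Format the stored citation without any database retrieval."""
        if self.citation is None:
            return "(citation text not stored)"
        c = self.citation
        return f"{', '.join(c.authors)} ({c.year}). {c.title}. {c.journal}."


# Placeholder values, for illustration only.
pub = PubEquiv(
    pmid=12345,
    citation=Citation(("Author A", "Author B"), "Example title", "Example Journal", 2001),
)
```

Here `pub.display()` serves the display case and `pub.key()` serves the linking case, which is the division of labor the passage describes.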

A biological sequence is often most appropriately stored in the context of other, related sequences. For example, a nucleotide sequence and the sequences of the protein products it encodes naturally belong in a set. The NCBI data model provides the Bioseq-set for this purpose. [Pg.34]
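As a rough sketch of the idea, a set can be modeled as a container that holds a nucleotide sequence together with its protein products. The classes below are hypothetical stand-ins, not the real Bioseq or Bioseq-set definitions; the molecule labels, identifiers, and residues are placeholders chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bioseq:
    """Hypothetical minimal sequence record (not the real ASN.1 Bioseq)."""
    seq_id: str
    molecule: str  # "na" for nucleotide, "aa" for protein (labels assumed here)
    residues: str


@dataclass
class BioseqSet:
    """Hypothetical container that keeps related sequences together."""
    set_class: str
    members: List[Bioseq] = field(default_factory=list)

    def of_type(self, molecule: str) -> List[Bioseq]:
        """Return the members whose molecule label matches."""
        return [seq for seq in self.members if seq.molecule == molecule]


# Placeholder identifiers and residues, for illustration only.
nuc_prot = BioseqSet(
    set_class="nuc-prot",
    members=[
        Bioseq("nuc_1", "na", "ATGGCTTAA"),
        Bioseq("prot_1", "aa", "MA"),
    ],
)
```

Keeping the members in one object means that code handling the nucleotide can reach its protein products without a separate lookup.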

Although the DDBJ/EMBL/GenBank feature table allows numerous kinds of features to be included (see Chapter 3), the NCBI data model treats some features as more equal than others. Specifically, certain features directly model the central dogma of molecular biology and are most likely to be used in making connections between records and in discovering new information by computation. These features are discussed next. [Pg.36]

Descriptors were introduced in the NCBI data model to reduce redundant information in records. For example, the protein products of a nucleotide sequence should always be from the same biological source (organism, tissue) as the nucleotide itself, and the publication that describes the sequencing of the DNA in many cases also discusses the translated proteins. By placement of these items as descriptors at the Nuc-prot set level, only one copy of each item is needed to properly describe all the sequences. [Pg.40]
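One way to picture this is a lookup that walks from a sequence up through its enclosing sets and collects the descriptors found at each level. The sketch below is an assumption-laden illustration: `Node` and `effective_descriptors` are hypothetical names, descriptors are reduced to a plain dictionary, and the rule that a nearer level overrides an outer one is an assumption of this example, not something stated in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a sequence or a set that can carry descriptors."""
    name: str
    descriptors: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Node"] = None


def effective_descriptors(node: Optional[Node]) -> Dict[str, str]:
    """Merge descriptors from the node and every enclosing set."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    merged: Dict[str, str] = {}
    # Apply the outermost level first so that nearer levels override it
    # (an assumption made for this sketch).
    for level in reversed(chain):
        merged.update(level.descriptors)
    return merged


# One copy of the source and publication lives on the set;
# the descriptor values are placeholders.
nuc_prot = Node("nuc-prot set", {"source": "Drosophila melanogaster",
                                 "pub": "placeholder citation"})
nucleotide = Node("nucleotide", {"molecule": "DNA"}, parent=nuc_prot)
protein = Node("protein", {"molecule": "protein"}, parent=nuc_prot)
```

Both `effective_descriptors(nucleotide)` and `effective_descriptors(protein)` report the same source and publication, even though each is stored only once on the set.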

Figure 4.3. The NCBI Desktop displays a graphical overview of how the record is structured in memory, based on the NCBI data model (see Chapter 2). This view is most useful to a software developer or database sequence annotator. In this example, the submission contains a single Nuc-prot set, which in turn contains a nucleotide and two proteins. Each sequence has features associated with it. BioSource and publication descriptors on the Nuc-prot set apply the same organism (Drosophila melanogaster) and the same publication, respectively, to all sequences.

Although it may appear from this discussion that NCBI is the center of the sequence universe, many specialized sequence databases throughout the world serve specific groups in the scientific community. Often, these databases provide additional information such as phenotypes, experimental conditions, strain crosses, and map features. The data are of great importance to these subsets of the scientific community, inasmuch as they can influence rational experimental design, but such types of data do not always fit neatly within the confines of the NCBI data model. Development of specialized databases necessarily ensued, but they are intended to be used as an adjunct to GenBank, not in place of it. It is impossible to discuss all of these kinds of databases here, but, to emphasize the sheer number of such databases that exist, Nucleic Acids Research devotes its first issue every year to papers describing these databases (cf. Baxevanis, 2001). [Pg.178]
