Secondary databases

Databases are electronic filing cabinets that serve as a convenient and efficient means of storing vast amounts of information. An important distinction exists between primary (archival) and secondary (curated) databases. The primary databases represent experimental results with some interpretation. Their record is the sequence as it was experimentally derived. The DNA, RNA, or protein sequences are the items to be computed on and worked with as the valuable components of the primary databases. The secondary databases contain the fruits of analyses of the sequences in the primary sources, such as patterns, motifs, functional sites, and so on. Most biochemical and/or molecular biology databases in the public domain are flat-file databases. Each entry of a database is given a unique identifier (i.e., an entry name and/or accession number) so that it can be retrieved uniformly by the combination of the database name and the identifier. [Pg.48]
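The retrieval scheme described above (database name plus identifier) can be sketched as a small in-memory index. This is only an illustration under stated assumptions: the class `EntryIndex`, the helper `make_key`, the database names `primary_db` and `secondary_db`, and the accession `X00001` are all hypothetical and do not correspond to any real database or API.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]


def make_key(database: str, identifier: str) -> Key:
    """Build a lookup key from a database name and an entry identifier.

    Normalising case is an assumption made for this sketch, not a rule
    taken from any particular database.
    """
    return (database.strip().lower(), identifier.strip().upper())


class EntryIndex:
    """Hypothetical in-memory stand-in for a set of flat-file databases."""

    def __init__(self) -> None:
        self._entries: Dict[Key, str] = {}

    def add(self, database: str, identifier: str, record: str) -> None:
        key = make_key(database, identifier)
        if key in self._entries:
            # Identifiers must be unique within one database.
            raise ValueError(f"duplicate entry {key[0]}:{key[1]}")
        self._entries[key] = record

    def get(self, database: str, identifier: str) -> Optional[str]:
        """Return the record, or None if the pair is unknown."""
        return self._entries.get(make_key(database, identifier))


# Made-up entries: the same identifier may exist in two databases,
# so only the (database, identifier) pair is unambiguous.
index = EntryIndex()
index.add("primary_db", "X00001", "raw sequence record")
index.add("secondary_db", "X00001", "motif annotation record")

primary_hit = index.get("Primary_DB", "x00001")
missing_hit = index.get("primary_db", "X99999")
```

Keying on the pair rather than on the identifier alone is the point of the sketch: an identifier is only guaranteed to be unique inside its own database.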

Retrieve one primary database of protein sequence and its secondary databases and discuss their relationships. [Pg.52]

There are different classes of protein sequence databases. Primary and secondary databases are used to address different aspects of sequence analysis. Composite databases amalgamate a variety of different primary sources to make sequence searching more efficient. The primary structure (amino acid sequence) of a protein is stored in primary databases as linear alphabets that represent the constituent residues. The secondary structure of a protein, corresponding to regions of local regularity (e.g., α-helices, β-strands, and turns), which in sequence alignments are often apparent as conserved motifs, is stored in secondary databases as patterns. The tertiary structure of a protein, derived from the packing of its secondary structural elements, which may form folds and domains, is stored in structure databases as sets of atomic coordinates. Some of the most important protein sequence databases are PIR (Protein Information Resource), SWISS-PROT (at EBI and ExPASy), MIPS (Munich Information Center for Protein Sequences), JIPID (Japanese International Protein Sequence Database), and TrEMBL (at EBI). ... [Pg.213]
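To make the contrast between a sequence stored as a linear alphabet and a motif stored as a pattern concrete, the following sketch scans a sequence with a PROSITE-style pattern. The pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern. The converter handles only a simplified subset of the PROSITE syntax (single residues, x, [..], {..}, and repeat counts), the function names are hypothetical, and the test sequences are made up for illustration.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Supported subset (an assumption of this sketch): single residues,
    'x' for any residue, [..] allowed sets, {..} forbidden sets, and
    repeat counts such as x(2) or x(2,4). Anchors are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            piece = "."
        elif core.startswith("[") and core.endswith("]"):
            piece = core
        elif core.startswith("{") and core.endswith("}"):
            piece = "[^" + core[1:-1] + "]"
        elif len(core) == 1 and core.isalpha():
            piece = core
        else:
            raise ValueError(f"unsupported pattern element {element!r}")
        if repeat:
            piece += "{" + repeat + "}"
        parts.append(piece)
    return "".join(parts)


def find_motifs(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched residues) for every hit.

    A lookahead is used so that overlapping hits are all reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# Made-up sequence: the N at index 1 matches, the N at index 6 is
# followed by P and is therefore rejected by {P}.
hits = find_motifs("MNGTAANPSK", "N-{P}-[ST]-{P}")
```

A secondary database entry in this picture is the pattern plus its annotation, while the primary database supplies the sequences to be scanned.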

The application of the secondary databases to study protein structure and function will be discussed in the following chapter (Chapter 12). [Pg.216]

Although this chapter is about the GenBank nucleotide database, GenBank is just one member of a community of databases that includes three important protein databases: SWISS-PROT, the Protein Information Resource (PIR), and the Protein DataBank (PDB). PDB, the database of nucleic acid and protein structures, is described in Chapter 5. SWISS-PROT and PIR can be considered secondary databases, curated databases that add value to what is already present in the primary databases. Both SWISS-PROT and PIR take the majority of their protein sequences from nucleotide databases. A small proportion of SWISS-PROT sequence data is submitted directly or enters through a journal-scanning effort, in which the sequence is (quite literally) taken directly from the published literature. This process, for both SWISS-PROT and PIR, has been described in detail elsewhere (Bairoch and Apweiler, 2000; Barker et al., 2000)... [Pg.47]

Recently, NCBI introduced a new search service aimed at identifying conserved domains within a protein sequence. The source database for these searches is called the Conserved Domain Database or CDD. This is a secondary database, with entries... [Pg.262]

The secondary databases contain the fruits of analyses of the sequences or structures in the primary sources, such as patterns, motifs, functional sites, and so on. Many databases known as boutique (specialized) databases select, annotate, and recombine data focused on particular topics, and include links affording streamlined access to information about subjects of interest. [Pg.551]

The DNA secondary databases offer analytical results (e.g., gene motifs, splice sites, transcription regulators) derived from the primary databases of INSDC, some of which are listed in Table 15.8. [Pg.571]

TABLE 15.8 Some nucleic acid secondary databases... [Pg.573]

Nucleic acid secondary databases Computational genomic serves ... [Pg.593]

Secondary databases for protein sequences Enzyme databases... [Pg.654]

Another important aspect is how heavily the information in the database has been processed. Databases are often categorized as primary or secondary [74]. Primary databases contain only raw experimental results of a certain kind. As high-throughput technologies evolve in all fields of the life sciences, the number of primary databases increases from day to day. Secondary databases contain the analysis of raw data found in primary databases. Even if it is possible to calculate some of these parameters by analyzing the data... [Pg.165]

InterPro: A search facility that integrates the information from other secondary databases. http://www.ebi.ac.uk/interpro/... [Pg.3961]

GenBank has become a major sequence resource, containing 150 million sequence records in 2009 and giving rise to hundreds of secondary, specialized databases worldwide. More than 500 million records in thirty-plus secondary databases exist at the National Center for Biotechnology Information alone. Worldwide, the number of bioinformatics records based on GenBank is in the billions. [Pg.206]
