THE GENBANK SEQUENCE DATABASE

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland [Pg.45]

Centre for Molecular Medicine and Therapeutics, Children's and Women's Health Centre of British Columbia [Pg.45]

University of British Columbia Vancouver, British Columbia [Pg.45]

Primary protein and nucleic acid sequence databases are so pervasive in our way of thinking about molecular biology that few of us stop to wonder how these ubiquitous tools are built. Understanding how these databases are put together will allow us to move forward in our understanding of biology and to harvest fully the abstracted information present in these records. [Pg.45]

The next lines, OS (Organism Species) and OC (Organism Classification), describe the species from which the protein has been derived. The OS line gives the scientific name of the organism and, if one exists, its common English name. The OC lines give the taxonomic lineage. SWISS-PROT, like the DDBJ/EMBL/GenBank nucleotide sequence databases, uses the NCBI taxonomy, so that organism classifications are consistent across the molecular sequence databases. [Pg.37]

Various verification steps have been introduced to ensure that SPTR is comprehensive and contains all relevant data sources. The main source of new protein sequences is the translation of the coding sequences (CDS) annotated in the nucleotide sequence databases. The timely inclusion of new protein sequence entries is ensured by the weekly translation of EMBL-NEW (the updates to the EMBL nucleotide sequence database). The three collaborating nucleotide sequence databases, DDBJ, EMBL, and GenBank, exchange their data on a daily basis. Therefore any protein coding sequence submitted to DDBJ/EMBL/GenBank will appear in SPTR within 2 weeks in the worst case and in less than 1 week on average. [Pg.66]

The sequence of the gene can be used to deduce the amino acid sequence of the protein encoded by the gene. The DNA and amino acid sequences can then be used to identify similar sequences in large sequence databases such as GenBank (www.ncbi.nlm.nih.gov) or SWISS-PROT (www.expasy.org). The chemical data obtained from the mutant, combined with the sequence of the gene that is defective in the mutant, can then provide information on the function of that gene in the biosynthesis of a particular class of phenolic compounds. [Pg.67]
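Deducing the amino acid sequence from a gene amounts to reading the coding sequence codon by codon. Below is a minimal sketch, assuming the standard genetic code (NCBI translation table 1), a coding sequence that starts in frame at its first base, and no introns. Ambiguous codons are rendered as `X`. The names `CODON_TABLE` and `translate` are illustrative only.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order for
# the first, second and third positions (the order used by NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds, to_stop=True):
    """Translate an in-frame coding sequence into a one-letter protein string."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3], "X")  # X for ambiguous codons
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGGCCTGGTAA")` gives `"MAW"`: the final `TAA` is a stop codon and ends translation. Real database annotations may also specify a non-standard genetic code, which this sketch does not handle.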

As mentioned in the introduction, there is now a considerable number of molecular biology and related databases available. While some are freely available, such as the DNA sequence collection EMBL [30] and GENBANK [18], others are only freely available to the academic community, such as the protein sequence database SWISS-PROT [15], and others are only available on subscription, such as the EST databases available from the company Incyte Genomics. Academics and pharmaceutical companies also have their own proprietary data which must be integrated into a system so that relationships with publicly available data can be found. [Pg.441]

Although this chapter is about the GenBank nucleotide database, GenBank is just one member of a community of databases that includes three important protein databases: SWISS-PROT, the Protein Information Resource (PIR), and the Protein Data Bank (PDB). PDB, the database of nucleic acid and protein structures, is described in Chapter 5. SWISS-PROT and PIR can be considered secondary databases, that is, curated databases that add value to what is already present in the primary databases. Both SWISS-PROT and PIR take the majority of their protein sequences from the nucleotide databases. A small proportion of SWISS-PROT sequence data is submitted directly or enters through a journal-scanning effort, in which the sequence is (quite literally) taken directly from the published literature. This process, for both SWISS-PROT and PIR, has been described in detail elsewhere (Bairoch and Apweiler, 2000; Barker et al., 2000)... [Pg.47]

A full release of GenBank occurs every two months, with incremental (and nonincremental) daily updates available by anonymous FTP. The International Nucleotide Sequence Database Collaboration also exchanges new and updated records daily. Therefore, all sequences present in GenBank are also present in DDBJ and EMBL, as described in the introduction to this chapter. The three databases rely on a common data format, described in the feature table documentation (see below), which serves as the lingua franca for nucleotide sequence database annotations. Together, the nucleotide sequence databases have developed defined submission procedures (see Chapter 4): a series of guidelines for the content and format of all records. [Pg.49]
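One concrete part of the shared feature table format is its location syntax, which tells a program which bases a feature such as a CDS covers. The following is a minimal Python sketch that handles only a subset of that syntax: single bases, ranges (`340..565`), partial-end markers (`<`, `>`), `join()` and `complement()`. Other operators in the feature table specification, such as `order()`, between-base `^` sites and references to other entries, are deliberately not supported. The function names are hypothetical.

```python
import re


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    pieces, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current))
    return pieces


def parse_location(loc):
    """Return a list of (start, end, strand) tuples, 1-based and inclusive."""
    loc = loc.replace(" ", "")
    if loc.startswith("complement(") and loc.endswith(")"):
        inner = parse_location(loc[len("complement("):-1])
        # complement() flips the strand and reverses the order of the parts,
        # so the segments are listed in the direction the feature is read.
        return [(s, e, -strand) for s, e, strand in reversed(inner)]
    if loc.startswith("join(") and loc.endswith(")"):
        parts = []
        for piece in split_top_level(loc[len("join("):-1]):
            parts.extend(parse_location(piece))
        return parts
    # A single base ("467") or a range, optionally with partial-end markers.
    match = re.fullmatch(r"<?(\d+)(?:\.\.>?(\d+))?", loc)
    if not match:
        raise ValueError(f"unsupported location: {loc}")
    start = int(match.group(1))
    end = int(match.group(2) or start)
    return [(start, end, 1)]
```

In this sketch, `join(12..78,134..202)` becomes two plus-strand segments. `complement(join(2691..4571,4918..5163))` becomes the same two segments on the minus strand, listed from `4918..5163` first.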

Fig. 1. Frequency of chemokine cDNAs in the GenBank EST database. The complete cDNA sequence for each chemokine was compared against dbEST using the BLASTN program. Results reflect sequences deposited through February 1998 (1,442,166 entries).

GenBank (NCBI, USA), the EMBL Nucleotide Sequence Database (Europe), and DDBJ (Japan): the three main nucleotide sequence databases, which are synchronised daily... [Pg.571]

The combination of GenBank cDNA sequence comparison and data from the public SNP database reveals several SNPs in MRP3, many of which cause non-synonymous amino acid changes (Table 9.3). These polymorphisms have not been functionally characterized, so the clinical pharmacological impact of such MRP3 variants remains unknown. [Pg.196]
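Deciding whether a coding SNP is synonymous or non-synonymous comes down to translating the reference and variant codons and comparing the resulting amino acids. Below is a minimal Python sketch. It assumes the standard genetic code, a coding sequence that starts in frame at its first base, unambiguous A/C/G/T bases, and SNP positions counted from 1 within the CDS rather than the genome. The function name `classify_snp` and the toy sequence are hypothetical; this is not the MRP3 data summarised in Table 9.3.

```python
# Standard genetic code (NCBI table 1), codons in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(cds, position, alt_base):
    """Classify a single-base substitution at a 1-based CDS position.

    Returns (effect, reference_amino_acid, variant_amino_acid).
    """
    cds = cds.upper()
    index = position - 1
    offset = index % 3               # position within the codon (0, 1 or 2)
    codon_start = index - offset
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position does not fall in a complete codon")
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    # A change to or from a stop codon ("*") is also counted as non-synonymous.
    effect = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return effect, ref_aa, alt_aa
```

With the toy coding sequence `ATGGCCTGGTAA`, changing position 6 from C to T turns `GCC` into `GCT`. Both codons encode alanine, so the change is synonymous. Changing position 4 from G to A turns `GCC` into `ACC`, which replaces alanine with threonine.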

Big Chemical Encyclopedia
