Nucleic acid sequence databases

Upon its generation, sequence information is normally submitted to various databases. The major databases in which protein primary sequence data are available are listed in Table 2.4. Also included in this table are the major nucleic acid sequence databases, as amino acid sequence information can potentially be derived from these. [Pg.21]

Gouy M., Gautier C., Attimonelli M., Lanave C., di Paola G. (1985). ACNUC - a portable retrieval system for nucleic acid sequence databases: logical and physical designs and usage. Comput. Appl. Biosci. 1, 167-172. [Pg.408]

Primary protein and nucleic acid sequence databases are so pervasive to our way of thinking in molecular biology that few of us stop to wonder how these ubiquitous tools are built. Understanding how these databases are put together will allow us to move forward in our understanding of biology and in fully harvesting the abstracted information present in these records. [Pg.45]

Nucleic acid sequence databases typically contain sequence data together with information on gene structure, introns and exons (for eukaryotes), cDNA (complementary DNA), RNA, and transcriptional regulation. The primary nucleic acid sequence repositories, which together form the International Nucleotide Sequence Database Collaboration (INSDC), are ... [Pg.568]

EMBL (European Molecular Biology Laboratory) [33] is a nucleotide sequence database provided by the online host EBI. Release 73 (December, 2002) consists of over 20 million nucleotide sequences with more than 28 billion nucleotides. The information includes sequence name, species, sequence length, promoter, taxonomy, and nucleic acid sequence. [Pg.261]

The following is a list of Web sites that readers may find useful. The sites have been visited at various times by one of the authors (RKM). Most are located in the USA, but many provide extensive links to international sites and to databases (eg, for protein and nucleic acid sequences) and online journals. RKM would be grateful if readers who find other useful sites would notify him of their URLs by e-mail (rmurray6745@rogers.com) so that they may be considered for inclusion in future editions of this text. [Pg.639]

It is only natural that, to date, bioinformatics tools contribute most to the analysis of amino acid sequences. Only a small amount of current sequence data is subjected to direct experimentation. The majority of amino acid sequences currently accessible in public databases have been derived by in silico translations of nucleic acid sequence data, despite the fact that amino acid sequencing was introduced historically long before nucleic acid sequencing. It is hard to predict the future of the experimental generation of primary data. Certainly, sequencing of nucleic acids continues to become cheaper and faster, and novel techniques may further enhance the production of data. DNA chips are already used to detect differences between very similar sequences; other methods may generate DNA data even more efficiently. [Pg.495]

The database would present all known protein sequences and structures; nucleic acid sequences; exosporium structure; metric parameters such as mean density and size; spectral properties such as the fluorescence, fluorescence... [Pg.38]

Both the nucleic acid sequences and the protein sequences derived from the biological information are collected in most such databases. Large amounts of data in these databases need to be sorted, stored, retrieved, and analyzed. Subsets of data must also be selected for particular analyses. IT providers designed such a data warehouse and developed an interface that benefits researchers by making it easy to access the existing information and also to submit new entries (i.e., data mining) (Table 5.6). Middleware and structured query language (SQL) software were developed for this purpose. The former one is used... [Pg.120]
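The selection of data subsets with SQL described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table name, column names, accession numbers, and sequences are all hypothetical and are not taken from any real sequence database or from Table 5.6.

```python
import sqlite3

# Hypothetical schema: one row per sequence record (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences ("
    " accession TEXT PRIMARY KEY,"
    " molecule TEXT,"    # 'DNA' or 'protein'
    " organism TEXT,"
    " residues TEXT)"
)

# Made-up example records, for illustration only.
records = [
    ("EX0001", "DNA", "Homo sapiens", "ATGGCCATTGTAATGGGCCGC"),
    ("EX0002", "protein", "Homo sapiens", "MAIVMGR"),
    ("EX0003", "DNA", "Mus musculus", "ATGAAACGCATTAGCACC"),
]
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?, ?)", records)


def select_subset(connection, molecule, organism):
    """Return (accession, length) pairs for records of one molecule type and organism."""
    cursor = connection.execute(
        "SELECT accession, LENGTH(residues) FROM sequences"
        " WHERE molecule = ? AND organism = ?"
        " ORDER BY accession",
        (molecule, organism),  # parameters are bound, not pasted into the SQL text
    )
    return cursor.fetchall()


print(select_subset(conn, "DNA", "Homo sapiens"))
```

Binding the search values as parameters, rather than building the query by string concatenation, keeps the query safe when the values come from user input.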

BLAST: Searches for similar protein and nucleic acid sequences
Chime: Protein structures on moving 3D coordinates
Entrez (NCBI): Sequence retrieval system for cross-referencing databases
FASTA: Searches for similar protein sequences
GenBank (NCBI): Database of gene sequences
Molecules R Us: Provides coordinates for protein 3D structure and manipulation
RasMol (Ras Mac): Provides coordinates for protein 3D structure and manipulation
SRS (EMBL): Sequence retrieval system for cross-referencing databases... [Pg.220]

The advances in protein, and especially DNA, sequencing technology mean that there is now a vast amount of primary structural information relating to biological macromolecules, and it is hence essential for laboratories in the field to make use of computers to analyse data on protein and nucleic acid sequences. At present (June 1994) there are more than 80000 sequences in the OWL protein sequence database [8] and there are more than 170000 nucleic acid sequences in the EMBL (European Molecular Biology Laboratory) database [9]. [Pg.78]

Sequence similarity database searching and protein sequence analysis constitute one of the most important computational approaches to understanding protein structure and function. Although most computational methods used for nucleic acid sequence analysis are also applicable to protein sequence studies, how to capture the enriched features of amino acid alphabets (Chapter 6) poses a special challenge for protein analysis. [Pg.129]

Taylor JW: A contemporary view of the holomorph: Nucleic acid sequence and computer databases are changing fungal classification; in Reynolds DR, Taylor JW (eds): The Fungal Holomorph: Mitotic, Meiotic and Pleomorphic Speciation in Fungal Systematics. Wallingford, CAB International, 1993, pp 3-13. [Pg.283]

Protein and nucleic acid sequences are submitted electronically to the United States Patent and Trademark Office (USPTO) to avoid the introduction of errors in printed documents and to simplify the job of examining patent claims that include biosequences. Short sequence listings are printable in the USPTO's full text database, but for longer sequences the electronic sequence records are stored in the Publication Site for Issued and Published Sequences (PSIPS), located at http://seqdata.uspto.gov/. [Pg.226]

Figure 14.3. (A) Both nucleic acid and protein sequences, as linear polymers, can be represented as strings of English letters. This is, indeed, exactly how they are stored in global, centralized databases of biological data. (B) The genetic code is the system of rules that maps nucleic acid sequences into proteins. Nucleotides are read three at a time (as "codons") and converted into a single amino acid by means of tRNAs, specialized adaptor molecules.
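The codon-by-codon reading in panel (B) can be sketched in a few lines of Python. The sketch assumes the standard genetic code and a sequence that is already in the correct reading frame; the function name translate and the example sequence are illustrative only.

```python
from itertools import product

# Standard genetic code, listed with bases in T, C, A, G order at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence, stop_at_stop=True):
    """Translate a DNA or RNA string, read three letters at a time from position 0."""
    dna = sequence.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    usable_length = len(dna) - len(dna) % 3   # ignore an incomplete trailing codon
    for start in range(0, usable_length, 3):
        # Codons with ambiguous or unknown letters are reported as 'X'.
        amino_acid = CODON_TABLE.get(dna[start:start + 3], "X")
        if amino_acid == "*" and stop_at_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGA"))  # prints MAIVMGR
```

Storing both the nucleic acid and the protein as plain strings, as in panel (A), is what makes this mapping a simple dictionary lookup.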

Explicit sequences in a PDB file are provided in lines starting with the keyword SEQRES. Unlike other sequence databases, PDB records use the three-letter amino acid code, and nonstandard amino acids are found in many PDB record sequence entries with arbitrarily chosen three-letter names. Unfortunately, PDB records seem to lack sensible, consistent rules. In the past, some double-helical nucleic acid sequence entries in PDB were specified in a 3'-to-5' order in an entry above the complementary strand, given in 5'-to-3' order. Although the sequences may be obvious to a user as a representation of a double helix, the 3'-to-5' explicit sequences are nonsense to a computer. Fortunately, the NDB project has fixed many of these types of problems, but the PDB data format is still open to ambiguity disasters from the standpoint of computer readability. As an aside, the most troubling glitch is the inability to encode element type separately from the atom name. Examples of where... [Pg.89]
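As a rough illustration of reading SEQRES lines, the Python sketch below collects the three-letter residue names per chain and converts them to the one-letter code. It assumes whitespace-separated fields and a non-blank chain identifier, which does not hold for every PDB file; a fixed-column parser would be more robust. The example lines are made up, and any residue name outside the 20 standard amino acids is simply reported as X.

```python
# One-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from SEQRES lines.

    Assumes fields are separated by whitespace:
    SEQRES, serial number, chain ID, residue count, then residue names.
    """
    residues_by_chain = {}
    for line in pdb_lines:
        if not line.startswith("SEQRES"):
            continue
        fields = line.split()
        chain_id, residue_names = fields[2], fields[4:]
        residues_by_chain.setdefault(chain_id, []).extend(residue_names)
    return {
        chain_id: "".join(THREE_TO_ONE.get(name, "X") for name in names)
        for chain_id, names in residues_by_chain.items()
    }


# Made-up records; MSE (selenomethionine) stands in for a nonstandard residue.
example_lines = [
    "HEADER    HYPOTHETICAL EXAMPLE",
    "SEQRES   1 A    8  MET ALA ILE VAL MET GLY ARG MSE",
    "SEQRES   1 B    3  GLY SER HIS",
]
print(seqres_to_sequences(example_lines))
```

Mapping every unrecognized name to X is a deliberate simplification: because nonstandard residues carry arbitrarily chosen three-letter names, a complete lookup table cannot be assumed.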

Evolution of nucleic acid sequence. Phylogenetic analyses often examine whether nucleotide substitutions are synonymous (not altering the encoded amino acid) or nonsynonymous, and trace the history of gene- ... (Table 18.6: Phylogenetic databases and utilities ...) [Pg.697]
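The distinction between synonymous and nonsynonymous substitutions can be sketched as a comparison of the amino acids encoded before and after a change. The Python sketch below assumes the standard genetic code and two aligned, in-frame coding sequences of equal length; the function names and example sequences are illustrative. It is a naive codon-by-codon count, not one of the statistical estimators used in real phylogenetic packages.

```python
from itertools import product

# Standard genetic code, listed with bases in T, C, A, G order at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Label a codon change as 'identical', 'synonymous' or 'nonsynonymous'."""
    if codon_before == codon_after:
        return "identical"
    same_amino_acid = CODON_TABLE[codon_before] == CODON_TABLE[codon_after]
    return "synonymous" if same_amino_acid else "nonsynonymous"


def count_substitutions(coding_a, coding_b):
    """Count differing codons between two aligned, in-frame sequences of equal length."""
    if len(coding_a) != len(coding_b) or len(coding_a) % 3 != 0:
        raise ValueError("sequences must have equal length divisible by three")
    counts = {"synonymous": 0, "nonsynonymous": 0}
    for start in range(0, len(coding_a), 3):
        label = classify_substitution(coding_a[start:start + 3],
                                      coding_b[start:start + 3])
        if label != "identical":
            counts[label] += 1
    return counts


print(classify_substitution("GCC", "GCT"))            # both encode alanine
print(classify_substitution("GCC", "GTC"))            # alanine to valine
print(count_substitutions("ATGGCCATT", "ATGGCTGTT"))
```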

The increasing numbers of stored protein and nucleic acid sequences, and the recognition that functionally related proteins often have similar sequences, catalyzed the development of statistical techniques for sequence comparison which underlie many of the core bioinformatic methods used in proteomics today. Nucleic acid sequences are stored in three primary sequence databases - GenBank, the EMBL nucleotide sequence database, and the DNA Data Bank of Japan (DDBJ) - which exchange data every day. These databases also contain protein sequences that have been translated from DNA sequences. A dedicated protein sequence database, SWISS-PROT, was founded in 1986 and contains highly curated data concerning over 70 000 proteins. A related database, TrEMBL, contains automatic translations of the nucleotide sequences in the EMBL database and is not manually curated. [Pg.3960]
