Sequence databases Information

Secondary sequence databases (information extracted and summarized from primary databases) ... [Pg.3961]

Besides such textual databases that provide bibliographic information, sequence databases have attained an even more important role in biochemistry. Sequence databases are composed of amino acid sequences of peptides or proteins as well as nucleotide sequences of nucleic acids. The 20 amino acids are mostly represented by a three-letter code or by one letter according to the biochemical conventions; the four nucleotides are defined by a one-letter code. Thus the composition of a biochemical compound is searchable by text retrieval methods. [Pg.260]
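Because sequences are stored as letter strings, ordinary string operations are enough for a first search. The following is a minimal sketch, not any database's actual query engine. The function names and the toy sequences are hypothetical; the three-letter to one-letter mapping is the standard one for the 20 amino acids.

```python
# Standard three-letter to one-letter codes for the 20 amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_seq: str) -> str:
    """Convert a hyphen-separated three-letter sequence such as 'Met-Lys-Gly'."""
    residues = three_letter_seq.split("-")
    return "".join(THREE_TO_ONE[res.capitalize()] for res in residues)


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start of every match, including overlapping ones."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


# Hypothetical toy records standing in for database entries.
records = {"toy1": "MKGAAKGL", "toy2": "MPLAA"}
hits = {name: find_motif(seq, "KG") for name, seq in records.items()}
```

Real sequence databases use indexed and similarity-based searches rather than a linear scan; the sketch only shows why a letter-code representation makes text retrieval possible at all.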

EMBL (European Molecular Biology Laboratory) [33] is a nucleotide sequence database provided by the online host EBI. Release 73 (December, 2002) consists of over 20 million nucleotide sequences with more than 28 billion nucleotides. The information includes sequence name, species, sequence length, promoter, taxonomy, and nucleic acid sequence. [Pg.261]

The protein sequence database is also a text-numeric database with bibliographic links. It is the largest public domain protein sequence database. The current PIR-PSD release 75.04 (March, 2003) contains more than 280 000 entries of partial or complete protein sequences with information on functionalities of the protein, taxonomy (description of the biological source of the protein), sequence properties, experimental analyses, and bibliographic references. Queries can be started as a text-based search or a sequence similarity search. PIR-PSD contains annotated protein sequences with a superfamily/family classification. [Pg.261]

Sequences of the genes/cDNAs can be retrieved from databases on the Internet at various web sites. For example, GenBank (at the National Center for Biotechnology Information, NCBI) is at http://www.ncbi.nlm.nih.gov/Web/Search/index.html. The EMBL Nucleotide Sequence Database (through the European Bioinformatics Institute, EBI) can be found at http://www.ebi.ac.uk/queries/queries.html, whilst that of the DNA Data Bank of Japan is at http://www.ddbj.nig.ac.jp/. [Pg.273]

Mann, M., Hojrup, P., Roepstorff, P. (1993). Use of mass spectrometric molecular weight information to identify proteins in sequence databases. Biol. Mass Spectrom. 22, 338-345. [Pg.316]

Finally, knowledge of the peptide masses that resulted from the PIR conjugation provides information to identify the parent proteins from which they originated. Peptide mass and sequence databases now are sufficiently developed to provide rapid confirmation of protein-protein interaction partners. [Pg.1015]

Upon its generation, sequence information is normally submitted to various databases. The major databases in which protein primary sequence data are available are listed in Table 2.4. Also included in this table are the major nucleic acid sequence databases, as amino acid sequence information can potentially be derived from these. [Pg.21]

PIR (http://pir.georgetown.edu/), Protein Information Resource, located at Georgetown University Medical Center, which has provided the first international Protein Sequence Database. [Pg.342]

The protein sequence databases are the most comprehensive source of information on proteins. The goal of this chapter is to describe the different protein sequence databases available to researchers. It is necessary to distinguish between universal databases that cover proteins from all species and specialized data collections that store information about specific families or groups of proteins, or about the proteins of a specific organism. Two categories of universal protein sequence databases can be discerned: simple archives of sequence data and annotated databases in which additional information has been added to the sequence record. The next section describes the Protein Information Resource (PIR), the oldest protein sequence database; SWISS-PROT, an annotated universal sequence database; and TrEMBL, the supplement of... [Pg.31]

The Protein Information Resource (Barker et al., 1999) was established in 1984 by the National Biomedical Research Foundation (NBRF) as a successor to the original NBRF Protein Sequence Database, developed over 20 years by the late Margaret O. Dayhoff and published as the Atlas of Protein Sequence and Structure (Dayhoff et al., 1965; Dayhoff, 1979). Since 1988 the database has been maintained by PIR-International, a collaboration between the NBRF, the Munich Information Center for Protein Sequences (MIPS), and the Japan International Protein Information Database (JIPID). [Pg.32]

The DR lines link SWISS-PROT to other biomolecular databases. SWISS-PROT is currently linked to 29 different databases. The preceding example shows links to 19 different entries in 6 different databases. The cross references allow users to navigate to linked databases to retrieve part or all of the related information. The format of a DR line, except for cross references to PROSITE (Hofmann et al., 1999), Pfam (Bateman et al., 1999), and the EMBL nucleotide sequence databases (Stoesser et al., 1999), is the following ... [Pg.44]

PIR Protein sequence database of Protein Information Resource (PIR)... [Pg.45]

SP_TR_NRDB (or abbreviated SPTR) was created to overcome these limitations. SPTR provides a comprehensive, nonredundant and up-to-date protein sequence database with a high information content. The components are ... [Pg.65]

There are many specialized protein sequence databases. Some of them are quite small and contain only a handful of entries; others are wider in scope and larger in size. This section describes three examples of specialized protein sequence databases. As this category of databases is quite changeable, any list provided here would soon be outdated. However, under the URL http://www.expasy.ch/alinks.html#Proteins is a continually updated WWW document that lists information sources for molecular biologists. [Pg.68]

It can be difficult if not impossible to find the domain structure of a protein of interest from the primary literature. The sequence may contain many common domains, but these are usually not apparent from searches of the literature. Articles defining new domains may include the protein, but only in an alignment figure, which is not searchable. Perhaps, with the advent of online access to articles, the full text including figures may become searchable. Fortunately there have been several attempts to make this hidden information available in a way that can be easily searched. These resources, called domain family databases, are exemplified by Prosite, Pfam, Prints, and SMART. These databases gather information from the literature about common domains and make it searchable in a variety of ways. They usually allow a researcher to look at the precalculated domain organization of proteins in the sequence database and also provide a way to search new sequences... [Pg.143]

ExPASy (Expert Protein Analysis System, www.expasy.ch) or the National Centre for Biotechnology Information (NCBI, www.ncbi.nlm.nih.gov) websites. Both websites provide bioinformatics tools, links to sequence databases and extensive bibliographic resources. As an example of the wealth of information available on individual enzymes, at the time of writing a search based on nitrilase in the Entrez protein section of NCBI will recover more than 10000 references to nitrilase enzyme amino acid sequences. These can be rapidly screened online by organism, and the individual entries will have links to amino acid and gene sequence, relevant literature and information on protein features (such as conserved domains). [Pg.90]

In addition to sequence and structural information on the molecules themselves, these molecules have been categorized and classified according to their functions and interactions with drugs or other molecules. For example, some Web sites and databases can be used for studying such specific genetic molecules, including receptors and transporters. [Pg.6]

Selected entries from Methods in Enzymology [vol, page(s)]: Databases and Resources: Information services of European Bioinformatics Institute, 266, 3; TDB new databases for biological discovery, 266, 27; PIR-international protein sequence database, 266, 41; superfamily classification in PIR-international protein sequence database, 266, 59; gene classification artificial neural system, 266, 71; blocks database and its applications, 266, 88; indexing and using sequence databases, 266, 105; SRS information retrieval system for molecular biology data banks, 266, 114. [Pg.436]

The second approach is the tandem mass spectrometric method (Wilm et al., 1996; Link et al., 1999; Yates, 2000). This method relies on fragmentation of individual peptides in the tryptic peptide mixture to gain sequence information. Its main advantage is that sequence information derived from several peptides is much more specific for the identification of a protein than a list of peptide masses. The sequence data can be used to search not only protein sequence databases but also nucleotide databases such as expressed sequence tag (EST) databases and, more recently, even... [Pg.80]
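The "list of peptide masses" that this passage contrasts with tandem MS can be illustrated with an in-silico tryptic digest. The sketch below is an assumption-laden toy, not the method of the cited authors: it uses the common rule that trypsin cleaves after K or R unless the next residue is P, allows no missed cleavages or modifications, and uses standard monoisotopic residue masses. The function names, the tolerance, and the example sequence are hypothetical.

```python
import re

# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(protein: str) -> list[str]:
    """Cleave C-terminal to K or R, but not when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[res] for res in peptide) + WATER


def count_matches(observed: list[float], protein: str, tol: float = 0.01) -> int:
    """Count observed masses lying within tol Da of any theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


# Hypothetical candidate sequence; a real search would score every
# database entry this way and rank the candidates.
peptides = tryptic_digest("AKGRPLKM")
```

Several unrelated proteins can share peptides of nearly equal mass, which is why the passage treats fragment-derived sequence information as the more specific evidence.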

The sequence of the gene can be used to deduce the amino acid sequence of the protein encoded by the gene. The DNA and amino acid sequences can then be used to identify similar sequences in the large sequence databases such as GenBank (www.ncbi.nlm.nih.gov) or SWISS-PROT (www.expasy.org). The chemical data obtained from the mutant combined with the sequence data from the gene that is defective in the mutant can then provide information on the function of the gene in the biosynthesis of a certain class of phenolic compounds. [Pg.67]
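Deducing the amino acid sequence from a gene sequence amounts to reading codons against the genetic code. The sketch below assumes the standard genetic code, a coding strand that starts in frame at position 0, and no introns. The function name and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical example: ATG AAA GGT TAA encodes Met-Lys-Gly followed by a stop.
protein = translate("ATGAAAGGTTAA")
```

The resulting one-letter string is the form in which a deduced protein would be submitted to a similarity search against a sequence database.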

Big Chemical Encyclopedia
