Specialized Protein Sequence Databases

There are many specialized protein sequence databases. Some are quite small and contain only a handful of entries; others are wider in scope and larger in size. This section describes three examples of specialized protein sequence databases. Because this category of databases changes quickly, any list provided here would soon be outdated. However, http://www.expasy.ch/alinks.html (under Proteins) is a continually updated WWW document that lists information sources for molecular biologists. [Pg.68]

GCRDb (Kolakowski, 1994) is a database of sequences and other data relevant to the biology of G-protein coupled receptors (GCRs), a large protein family of critical components of many different signaling systems in animals. As can be seen in Fig. 9, the information available in a [Pg.68]

IUPHAR 5HT1A-5-Hydroxytryptamine receptor ABBREV 5HT1A SPECIES Homo sapiens UNIQ YYYMYY39YY [Pg.69]

REF%K Human gene for plasma membrane receptor REF%K GB X13556 [Pg.69]

REF%K G-protein coupled receptor glycoprotein membrane protein [Pg.69]

The protein sequence databases are the most comprehensive source of information on proteins. The goal of this chapter is to describe the different protein sequence databases available to researchers. It is necessary to distinguish between universal databases, which cover proteins from all species, and specialized data collections, which store information about specific families or groups of proteins or about the proteins of a specific organism. Two categories of universal protein sequence databases can be discerned: simple archives of sequence data, and annotated databases in which additional information has been added to the sequence record. The next section describes the Protein Information Resource (PIR), the oldest protein sequence database; SWISS-PROT, an annotated universal sequence database; and TrEMBL, the supplement of... [Pg.31]

Developments in mass spectrometry technology, together with the availability of extensive DNA and protein sequence databases and software tools for data mining, have made possible rapid and sensitive mass spectrometry-based procedures for protein identification. Two basic types of mass spectrometers are commonly used for this purpose: matrix-assisted laser desorption/ionization (MALDI)-time-of-flight (TOF) mass spectrometry (MS) and electrospray ionization (ESI)-MS. MALDI-TOF instruments are now quite common in biochemistry laboratories and are very simple to use, requiring no special training. ESI instruments, usually coupled to capillary/nanoLC systems, are more complex and require expert operators. We will therefore focus on the use of MALDI-... [Pg.227]

Introduction to Molecular Biology Databases. 1994-2004. R. Apweiler, R. Lopez, B. Marx, UniProt, SWISS-PROT, Switzerland. URL http://www.ebi.ac.uk/swissprot/Publications/mbdl.html. Contents include bibliographic, taxonomy, nucleotide sequence, genetic, and protein sequence databases (PIR, SWISS-PROT, and TrEMBL), and specialized protein, protein sequence, secondary protein, and structure databases. [Pg.52]

We see two major emerging frontiers for new kinds of molecular data. The first is proteomics (see Chapter 4 of volume 2) and metabolomics. With a combination of 2D gel, mass spectrometry, protein microarray, and yeast two-hybrid methods, a large amount of protein sequence, expression, and interaction data will be produced on a cell-wide level. On the one hand, bioinformatics has to address the challenge of interpreting these data. On the other hand, the protein interaction data in particular will provide an interesting basis for probing deeper into the details of regulatory networks. Such data are collected in special protein interaction databases such as DIP [9,10] and BIND [11],... [Pg.611]

The National Biomedical Research Foundation specializes in providing a database for protein primary structure. This database contains all the information from the Atlas of Protein Sequence and Structure, edited by M.O. Dayhoff. In this database, proteins are categorized according to their superfamily grouping. In addition to the primary structure information, detailed descriptions of proteins, including active site, prosthetic group, etc., are included. [Pg.35]

SwissProt is a computational biology database specializing in protein sequence analysis, maintained by the Swiss Federal Institute of Technology (ETH) in Zurich, Switzerland. As with the other general databases described above, a number of more specialized biological databases draw information from this source. [Pg.401]

Sequence similarity database searching and protein sequence analysis constitute one of the most important computational approaches to understanding protein structure and function. Although most computational methods used for nucleic acid sequence analysis are also applicable to protein sequence studies, how to capture the enriched features of amino acid alphabets (Chapter 6) poses a special challenge for protein analysis. [Pg.129]

Figure 14.3. (A) Both nucleic acid and protein sequences, as linear polymers, can be represented as strings of English letters. This is, indeed, exactly how they are stored in global, centralized databases of biological data. (B) The genetic code is the system of rules that maps nucleic acid sequences into proteins. Nucleotides are read three at a time (as codons) and converted into a single amino acid by means of tRNAs, specialized adaptor molecules.
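The mapping described in the caption can be illustrated with a short Python sketch. It is not taken from the source: the function name `translate` and the choice to stop at the first stop codon are assumptions made for illustration, and the table is the standard genetic code written in the conventional T, C, A, G codon order.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG codon order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate a DNA or RNA string, read three letters at a time.

    Hypothetical helper: translation starts at position 0, stops at the
    first stop codon, and ignores a trailing incomplete codon.
    """
    dna = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

Both the input and the output are plain strings, which mirrors the point in panel (A) that sequences are stored as strings of letters.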

Sequence databases generally specialize in one type of sequence data, i.e., DNA, RNA, or protein (Higgins and Taylor, 2000). Structure data must unambiguously define the atomic connectivities and the precise three-dimensional coordinates of all atoms within the molecule. These sequences and structures are the items to be computed on and worked with as the valuable components of the primary databases. Generally, the gateways to sequence and structure databases include ... [Pg.550]

Some of the major protein sequence and structure property databases are listed in Table 2. Although many more general or specialized property databases are available, the list given in Table 2 is a good start for exploring protein property databases. Table 3 gives a list of gene expression repositories. [Pg.391]

Besides the worldwide WPI database, Derwent provides on the ORBIT system the USPatents database, a bibliographic file of patent front page and claim information for U.S. patents since 1971. Derwent also produces a biotechnology database, GENESEQ, that indexes sequence structures of proteins or nucleic acids disclosed specifically or generically in patents. This database is searchable with special sequence software on the IntelliGenetics system, and is a new addition to STN's database catalog. [Pg.54]

Immunoglobulins and T-cell receptors (file name Immuno.dat): Most REM-TrEMBL entries are immunoglobulins and T-cell receptors. The integration of additional immunoglobulins and T-cell receptors into SWISS-PROT has been stopped, because SWISS-PROT does not want to add all known somatically recombined variations of these proteins to the database. Currently there are more than 18,000 immunoglobulins and T-cell receptors in REM-TrEMBL. SWISS-PROT plans to create a specialized database dealing with these sequences as another supplement to SWISS-PROT but will keep only a representative cross section of these proteins in SWISS-PROT. [Pg.54]

Rather than using an amino acid sequence to search SWISS-PROT, AACompIdent of the ExPASy Proteomics tools (http://www.expasy.ch/tools/) uses the amino acid composition of an unknown protein to identify known proteins of the same composition. The program requires the desired amino acid composition, the pI and molecular weight of the protein (if known), the appropriate taxonomic class, and any special keywords. The user must select one of six amino acid constellations that influence how the analysis is performed. For each sequence in the database, the algorithm computes a score based on the difference in compositions between the sequence and the query composition. The results, returned by e-mail, are organized as three ranked lists. Because the computed scores are a measure of difference, a score of zero implies that there is exact correspondence between the query compo-... [Pg.210]
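The idea of ranking database entries by a composition difference score can be sketched in Python as follows. This is an illustration only, not the AACompIdent implementation: the excerpt does not give the scoring formula, so the sum of squared differences in percent composition used here is an assumption, as are the function names `composition`, `composition_score`, and `rank_by_composition`. The pI, molecular weight, taxonomy, keyword, and constellation inputs mentioned above are not modeled.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the percent composition of a protein sequence."""
    counts = Counter(aa for aa in sequence.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def composition_score(query, candidate):
    """Assumed difference measure: sum of squared percent differences.

    A score of 0 means the two compositions are identical.
    """
    return sum((query[aa] - candidate[aa]) ** 2 for aa in AMINO_ACIDS)


def rank_by_composition(query, database):
    """Rank database entries (name -> sequence) by increasing score."""
    scored = [
        (composition_score(query, composition(seq)), name)
        for name, seq in database.items()
    ]
    return sorted(scored)
```

For example, `rank_by_composition(composition("ACDE"), {"same": "EDCA", "diff": "AAAA"})` places `"same"` first with a score of zero, because composition ignores residue order; this is also why a composition match alone does not prove sequence identity.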
