Primary Sequence Database

Set the task of discovering new, previously unknown druggable receptors, how would we go about it? In particular, how would we find a GPCR? The first step toward functional annotation of a new GPCR sequence usually involves searching a primary sequence database with pairwise similarity tools. Such searches can reveal clear similarities between the query sequence... [Pg.129]
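As a rough illustration of what a pairwise similarity tool computes, the sketch below scores the best local alignment between two sequences with the Smith-Waterman recurrence. It is a minimal example under stated assumptions, not the method of any particular search server: the function name `smith_waterman_score` and the match, mismatch, and linear gap values are hypothetical choices, whereas production tools typically use substitution matrices, affine gap penalties, and heuristics to scan a whole database quickly.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not a published matrix.
    """
    # Only the previous row of the dynamic-programming table is needed.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Toy sequences, invented for the example.
    query = "MKTLLACDEFGH"
    candidates = {"hit_like": "PPACDEFGHPP", "unrelated": "WWWWYYYY"}
    for name, seq in candidates.items():
        print(name, smith_waterman_score(query, seq))
```

Ranking database entries by such a score gives a crude hit list; whether a hit is meaningful still depends on a statistical significance estimate, which this sketch does not attempt.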

Mascot (http://www.matrixscience.com/), a search engine that uses mass spectrometry data to identify peptides and proteins from primary sequence databases (MSDB, SwissProt, and others). [Pg.343]

The Protein Data Bank (PDB) is the primary structure database serving as the international repository for the processing and distribution of 3D structures of biomacromolecules (Bernstein et al., 1977). The database is operated by the Research Collaboratory for Structural Bioinformatics (RCSB) and is accessible from the primary RCSB site at http://www.rcsb.org/pdb/ (Berman et al., 2000). Most of the structure fold/motif/domain databases (Conte et al., 2000) and analysis servers (Brenner et al., 2000; Hofmann et al., 2000; Kelley et al., 2000; Shi et al., 2001) utilize 3D-structure information from PDB and sequence information from primary sequence databases. Some of these databases/analysis servers and their URLs are listed in Table 12.3. [Pg.242]

There are numerous collection databases that have been derived from primary sequence databases such as GenBank and Swiss-Prot. These special collection databases include... [Pg.252]

All primary sequence databases provide tools for essential sequence analyses. Many servers are also available on the web to perform useful computation on DNA/RNA sequences and structures. These web servers (Table 15.9) provide an array of diverse computational genomic tools. [Pg.571]

The increasing numbers of stored protein and nucleic acid sequences, and the recognition that functionally related proteins often have similar sequences, catalyzed the development of statistical techniques for sequence comparison which underlie many of the core bioinformatic methods used in proteomics today. Nucleic acid sequences are stored in three primary sequence databases - GenBank, the EMBL nucleotide sequence database, and the DNA Data Bank of Japan (DDBJ) - which exchange data every day. These databases also contain protein sequences that have been translated from DNA sequences. A dedicated protein sequence database, SWISS-PROT, was founded in 1986 and contains highly curated data concerning over 70 000 proteins. A related database, TrEMBL, contains automatic translations of the nucleotide sequences in the EMBL database and is not manually curated. [Pg.3960]
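The idea of deriving a protein sequence by automatic translation of a nucleotide entry can be sketched in a few lines. The example below is only an illustration, assuming the standard genetic code, a single forward reading frame starting at the first base, and a hypothetical helper named `translate`; real pipelines work from annotated coding regions and handle alternative genetic codes.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate a coding sequence in frame 1 into one-letter amino acids.

    Unknown codons (for example those containing N) become 'X'.
    Trailing bases that do not form a full codon are ignored.
    """
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break  # stop codon ends the protein
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Toy coding sequence, invented for the example.
    print(translate("ATGGCCTGGTAA"))
```

A translation produced this way carries no experimental evidence of its own, which is one reason automatically translated entries are kept apart from manually curated ones.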

Primary sequence databases (raw and annotated sequence data) ... [Pg.3961]

The three primary sequence databases exchange and update data on a daily basis. Now being integrated as the World-Wide Protein DataBank, http://www.wwpdb.org/... [Pg.3961]

Primary structure: peptide and/or nucleotide sequence and the relationship between the PDB sequence and that found in the sequence database(s): SEQRES... [Pg.115]

Table 2.4 The major primary sequence (protein and nucleic acid) databases and the web addresses from which they may be accessed...

Upon its generation, sequence information is normally submitted to various databases. The major databases in which protein primary sequence data are available are listed in Table 2.4. Also included in this table are the major nucleic acid sequence databases, as amino acid sequence information can potentially be derived from these. [Pg.21]

It can be difficult, if not impossible, to find the domain structure of a protein of interest from the primary literature. The sequence may contain many common domains, but these are usually not apparent from searches of the literature. Articles defining new domains may include the protein, but only in an alignment figure, which is not searchable. Perhaps, with the advent of online access to articles, the full text including figures may become searchable. Fortunately, there have been several attempts to make this hidden information available in a way that can be easily searched. These resources, called domain family databases, are exemplified by Prosite, Pfam, Prints, and SMART. These databases gather information from the literature about common domains and make it searchable in a variety of ways. They usually allow a researcher to look at the precalculated domain organization of proteins in the sequence database and also provide a way to search new sequences... [Pg.143]

Selected entries from Methods in Enzymology [vol, page(s)]: General: Protein kinase classification, 200, 3; protein kinase catalytic domain sequence database: identification of conserved features of primary structure and classification of family members,... [Pg.579]

The advances in protein, and especially DNA, sequencing technology mean that there is now a vast amount of primary structural information relating to biological macromolecules, and it is hence essential for laboratories in the field to make use of computers to analyse data on protein and nucleic acid sequences. At present (June 1994) there are more than 80000 sequences in the OWL protein sequence database [8] and more than 170000 nucleic acid sequences in the EMBL (European Molecular Biology Laboratory) database [9]. [Pg.78]

Tremendous amounts of information from the experimental methods of X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy and from in silico modeling have been entered into databases, and this information is shared by many researchers worldwide. Once the amino acid residue sequence of a new protein of interest has been identified, it is natural to search these databases for well-characterized proteins with similar sequences, to note their function, and to consider the possibility that the function of the new protein may be similar. Unfortunately, things are seldom that simple in this field. Proteins with similar primary sequences may have quite different structures and functions, and, taking this one step further, some proteins, termed moonlighting proteins, display different functions depending on their immediate cellular surroundings (Ofran and Rost, 2005). [Pg.233]

The important DNA sequence data repositories, the primary resources known collectively as the International Nucleotide Sequence Database Collaboration, are ... [Pg.166]

EST data are held in the dbEST database, which maintains its own format and identification number system and is accessible via the NCBI Web server, http://www.ncbi.nlm.nih.gov/dbEST/. The sequence data, together with a summary of the dbEST annotation, are also distributed as a subsection of the primary DNA database. The publicly available EST analysis tools fall into three categories ... [Pg.190]

There are different classes of protein sequence databases. Primary and secondary databases are used to address different aspects of sequence analysis. Composite databases amalgamate a variety of different primary sources to make sequence searching more efficient. The primary structure (amino acid sequence) of a protein is stored in primary databases as linear alphabets that represent the constituent residues. The secondary structure of a protein, corresponding to regions of local regularity (e.g., α-helices, β-strands, and turns), which in sequence alignments are often apparent as conserved motifs, is stored in secondary databases as patterns. The tertiary structure of a protein, derived from the packing of its secondary structural elements, which may form folds and domains, is stored in structure databases as sets of atomic coordinates. Some of the most important protein sequence databases are PIR (Protein Information Resource), SWISS-PROT (at EBI and ExPASy), MIPS (Munich Information Center for Protein Sequences), JIPID (Japanese International Protein Sequence Database), and TrEMBL (at EBI). ... [Pg.213]
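To make the notion of a motif stored as a pattern more concrete, the sketch below converts a PROSITE-style pattern into a regular expression and scans a sequence with it. It is a simplified illustration: the helper names `prosite_to_regex` and `find_motif` are hypothetical, only the common syntax elements are handled (`x`, `[..]`, `{..}`, `(n)`, `(n,m)`, and the `<` / `>` terminus anchors outside brackets), and the test sequence is invented. The pattern `N-{P}-[ST]-{P}` is the widely cited N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern into Python regex syntax.

    Simplified: anchors written inside square brackets are not handled.
    """
    regex = pattern.rstrip(".")            # patterns may end with a period
    regex = regex.replace("-", "")         # '-' only separates positions
    regex = regex.replace("x", ".")        # 'x' matches any residue
    # {ABC} means "any residue except A, B, C".
    regex = regex.replace("{", "[^").replace("}", "]")
    # (n) and (n,m) are repeat counts; regex writes them with braces.
    regex = regex.replace("(", "{").replace(")", "}")
    # '<' and '>' anchor the pattern to the N- and C-terminus.
    regex = regex.replace("<", "^").replace(">", "$")
    return regex


def find_motif(sequence: str, pattern: str) -> list[int]:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets the scan report overlapping occurrences.
    compiled = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() + 1 for m in compiled.finditer(sequence)]


if __name__ == "__main__":
    seq = "MKNGSAANPTC"  # toy sequence, invented for the example
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_motif(seq, "N-{P}-[ST]-{P}"))
```

Short patterns of this kind match by chance quite often, so a pattern hit is usually treated as a hint to be checked against other evidence rather than as a functional assignment.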

Fig. 1. Comparison of the three-dimensional structures of human Interleukin-8 (green), MCP-1 (blue), and Fractalkine (EST Z44443) (red). The IL-8 structure is taken from the Protein Data Bank (PDB) entry (1IL8), and the MCP-1 structure is a model built from the NMR structure of MIP-1β (PDB entry 1HUM). The intrachain disulfide bonds are shown in yellow. The model for the chemokine domain of Fractalkine was built using the SwissModel server (16,17). As can be seen, the three structures show a large degree of conservation of the overall structure, despite a relatively low level of primary sequence identity. The additional three amino acids in Fractalkine are accommodated as a 3₁₀ helix between the two N-terminal cysteines. The steric requirements here presumably forbid a CX2C motif. The model building software can be accessed at http://www.expasy.ch/swissmod/SWISS-MODEL.html...

Since that time many more sequences have become available through the advent of recombinant DNA technology and the deduction of amino acid sequences from the base sequences of cloned DNA. At the present time, the primary structures (amino acid sequences) of 14 proteins of the transferrin family have been established. These include seven serum transferrins, from human (10, 36), pig (37), horse (38), rabbit (39), toad (Xenopus laevis) (40), sphinx moth (M. sexta) (13), and cockroach (Blaberus discoidalis) (4); chicken (34, 35) and duck (41) ovotransferrins; four lactoferrins, from human (11, 42), mouse (43), pig (44) and cattle (45, 46); and the human tumor cell melanotransferrin (47). All of these sequences are available from sequence databases such as EMBL and SWISSPROT. [Pg.393]
