Protein sequence secondary databases

Introduction to Molecular Biology Databases. 1994-2004. R. Apweiler, R. Lopez, B. Marx, UniProt, SWISS-PROT, Switzerland. URL http://www.ebi.ac.uk/swissprot/Publications/mbdl.html. Contents include bibliographic, taxonomy, nucleotide sequence, genetic, and protein sequence databases (PIR, SWISS-PROT, and TrEMBL), and specialized protein, protein sequence, secondary protein, and structure databases. [Pg.52]

Secondary databases for protein sequences: Enzyme databases... [Pg.654]

The secondary identifier is the "PROTEIN ID," which stands for the Protein Sequence Identifier. In nucleotide sequence entries, it is a string stored in a qualifier called protein_id, which is attached to every CDS (coding sequence) feature in the nucleotide database. Example ... [Pg.44]
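
A minimal sketch of collecting these identifiers from a GenBank-format flat file, assuming the usual feature-table layout (feature keys such as CDS starting in column 6, qualifiers such as /protein_id="..." starting in column 22). The function name and the accession in the example record are hypothetical; a full parser such as Biopython's SeqIO handles many more cases (multi-line qualifiers, joined locations, malformed records).

```python
import re

def cds_protein_ids(genbank_text: str) -> list[str]:
    """Return /protein_id values found on CDS features, in file order (sketch)."""
    ids = []
    in_cds = False
    for line in genbank_text.splitlines():
        if line.startswith("     ") and len(line) > 5 and line[5] != " ":
            # A feature key (CDS, gene, mRNA, ...) starts in column 6.
            in_cds = line[5:21].strip() == "CDS"
        elif line and not line.startswith(" "):
            # A new top-level section (FEATURES, ORIGIN, ...) ends any feature.
            in_cds = False
        elif in_cds:
            # Qualifier lines of the current CDS feature.
            match = re.match(r'\s+/protein_id="([^"]+)"', line)
            if match:
                ids.append(match.group(1))
    return ids

# Hypothetical toy record; "XYZ00001.1" is not a real accession.
record = "\n".join([
    "FEATURES             Location/Qualifiers",
    "     " + "gene".ljust(16) + "10..400",
    " " * 21 + '/gene="abc"',
    "     " + "CDS".ljust(16) + "10..400",
    " " * 21 + '/gene="abc"',
    " " * 21 + '/protein_id="XYZ00001.1"',
    "ORIGIN",
])
print(cds_protein_ids(record))  # ['XYZ00001.1']
```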

The final method of RNA structure prediction, empirical algorithms, is also analogous to primary-structure motif detection methods. Known RNA structural motifs are extracted from structural databases, and the primary-structure patterns underlying these motifs are identified. Novel RNA sequences are then scanned for these primary-structure motifs, much as a novel protein sequence might be scanned for conserved domains (CDs). In essence, these methods search the primary structure of sequences for conserved motifs that indicate secondary structure. One of the most flexible and powerful empirical tools is RNAMotif, which is freely available for download but does not have an associated web server (23). [Pg.527]
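
To make the idea concrete, here is a toy scanner in the same spirit: it searches an RNA sequence for one hard-coded primary-structure motif, a stem-loop (hairpin) in which two segments of stem_len bases can pair (Watson-Crick or G-U wobble) around a loop of min_loop to max_loop bases. This is an illustrative assumption, not RNAMotif's descriptor language or algorithm; the function name and default parameters are hypothetical, and real tools also allow mismatches, bulges, and user-defined motif descriptors.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def find_hairpins(seq: str, stem_len: int = 4, min_loop: int = 3, max_loop: int = 8):
    """Return (start, end, loop_length) for perfect hairpins; positions are 1-based, inclusive."""
    seq = seq.upper().replace("T", "U")
    hits = []
    for i in range(len(seq)):
        for loop in range(min_loop, max_loop + 1):
            end = i + 2 * stem_len + loop          # exclusive 0-based end
            if end > len(seq):
                break                              # longer loops will not fit either
            left = seq[i:i + stem_len]
            right = seq[i + stem_len + loop:end]
            # The 5' stem pairs with the 3' stem read backwards.
            if all((a, b) in PAIRS for a, b in zip(left, reversed(right))):
                hits.append((i + 1, end, loop))
    return hits

print(find_hairpins("GCGCGAAAGCGC"))  # [(1, 12, 4)]
```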

Databases are electronic filing cabinets that serve as a convenient and efficient means of storing vast amounts of information. An important distinction exists between primary (archival) and secondary (curated) databases. The primary databases represent experimental results with some interpretation. Their record is the sequence as it was experimentally derived. The DNA, RNA, or protein sequences themselves are the valuable components of the primary databases: they are the items that are computed on and worked with. The secondary databases contain the fruits of analyses of the sequences in the primary sources, such as patterns, motifs, functional sites, and so on. Most biochemical and/or molecular biology databases in the public domain are flat-file databases. Each entry of a database is given a unique identifier (i.e., an entry name and/or accession number) so that it can be retrieved uniformly by the combination of the database name and the identifier. [Pg.48]
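
A minimal sketch of this retrieval scheme, assuming a simplified SWISS-PROT-style flat file (two-letter line codes such as ID, AC, and SQ, sequence lines after SQ, and // ending each entry). Entries are indexed under the pair (database name, identifier), where an identifier is either the entry name or any accession number. The function, class, and the TEST_HUMAN/TEST01 record are hypothetical; real entries carry many more line types, which this sketch ignores.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    name: str = ""
    accessions: list[str] = field(default_factory=list)
    sequence: str = ""

def index_flat_file(text: str, database: str) -> dict[tuple[str, str], Entry]:
    """Index entries by (database name, identifier); identifiers are the name and every accession."""
    index: dict[tuple[str, str], Entry] = {}
    entry, in_sequence = Entry(), False
    for line in text.splitlines():
        code, data = line[:2], line[5:]
        if line.startswith("//"):                  # end of the current entry
            for identifier in [entry.name, *entry.accessions]:
                index[(database, identifier)] = entry
            entry, in_sequence = Entry(), False
        elif code == "ID":
            entry.name = data.split()[0]
        elif code == "AC":
            entry.accessions += [a.strip() for a in data.split(";") if a.strip()]
        elif code == "SQ":
            in_sequence = True                     # sequence lines follow
        elif in_sequence:
            entry.sequence += "".join(line.split())
        # Other line types (DE, OS, FT, ...) are ignored in this sketch.
    return index

# Hypothetical toy entry, not a real database record.
flat = """ID   TEST_HUMAN     Reviewed;   10 AA.
AC   TEST01; TEST02;
SQ   SEQUENCE   10 AA;
     MKTAYI AKQR
//
"""
idx = index_flat_file(flat, "swissprot")
print(idx[("swissprot", "TEST02")].sequence)  # MKTAYIAKQR
```

Keying on the (database, identifier) pair mirrors the text's point that an entry is retrieved uniformly by database name plus identifier, so the same identifier string in two databases cannot collide.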

Select one primary protein sequence database and its secondary databases, and discuss their relationships. [Pg.52]

There are different classes of protein sequence databases. Primary and secondary databases are used to address different aspects of sequence analysis. Composite databases amalgamate a variety of different primary sources to make sequence searching more efficient. The primary structure (amino acid sequence) of a protein is stored in primary databases as a linear string of letters representing the constituent residues. The secondary structure of a protein, corresponding to regions of local regularity (e.g., α-helices, β-strands, and turns), which in sequence alignments often appear as conserved motifs, is stored in secondary databases as patterns. The tertiary structure of a protein, derived from the packing of its secondary structural elements, which may form folds and domains, is stored in structure databases as sets of atomic coordinates. Some of the most important protein sequence databases are PIR (Protein Information Resource), SWISS-PROT (at EBI and ExPASy), MIPS (Munich Information Center for Protein Sequences), JIPID (Japanese International Protein Sequence Database), and TrEMBL (at EBI). ... [Pg.213]
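
Secondary databases such as PROSITE store conserved motifs as patterns. As a minimal sketch, assuming PROSITE-style pattern syntax (elements joined by -, x for any residue, [...] for allowed residues, {...} for forbidden residues, (n) or (n,m) for repeats, < and > for N- and C-terminal anchors), a pattern can be translated into a Python regular expression and scanned against a query sequence. The helper names are hypothetical, and the pattern and sequence are illustrative rather than database entries. Only this common subset of the syntax is handled, and because the scan uses re.finditer, only non-overlapping matches are reported.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a common subset of PROSITE pattern syntax into a Python regex (sketch)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""   # anchored at N-terminus
        suffix = "$" if element.endswith(">") else ""     # anchored at C-terminus
        element = element.strip("<>")
        # Split the residue specification from an optional repeat count "(n)" or "(n,m)".
        residues, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if residues == "x":
            residues = "."                                # any residue
        elif residues.startswith("{"):
            residues = "[^" + residues[1:-1] + "]"        # any residue except these
        # Single letters and [ABC] classes are already valid regex syntax.
        if low is not None:
            residues += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + residues + suffix)
    return "".join(parts)

def scan(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]

pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
print(prosite_to_regex(pattern))  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
print(scan("MKCASCGKAFSRSDELTRHIRTHTG", pattern))  # [(3, 'CASCGKAFSRSDELTRHIRTH')]
```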

R. Lüthy, A. D. McLachlan, and D. Eisenberg. Secondary structure-based profiles: use of structure-conserving scoring tables in searching protein sequence databases for structural similarities. Proteins 10, 229-239 (1991). [Pg.101]

Reliable secondary structures can enhance the prediction of higher order protein structure, and to a limited extent, secondary-structure motifs can even suggest specific fold structures. Sometimes these secondary structures provide insight into function. Definition of Secondary Structure of Proteins (DSSP), Integrated Sequence-Structure Database (ISSD), Protein Secondary Structure Database (PSSD), and CATH are covered in this section (see Table 2.2). [Pg.20]
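
DSSP assigns each residue a state code (for example H for α-helix, G for 3-10 helix, I for π-helix, E for extended strand, B for isolated β-bridge, T for turn, S for bend), and comparison or prediction work often collapses these into three states. The sketch below assumes one widely used reduction (H, G, I to helix; E, B to strand; everything else to coil); other reductions exist, and the function names are hypothetical.

```python
# One common 8-to-3 state reduction; conventions vary between studies.
TO_THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def reduce_dssp(states: str) -> str:
    """Map DSSP codes to H (helix), E (strand), or C (everything else, incl. T, S, blank)."""
    return "".join(TO_THREE_STATE.get(s, "C") for s in states)

def composition(states: str) -> dict[str, float]:
    """Fraction of residues in each reduced state."""
    reduced = reduce_dssp(states)
    if not reduced:
        return {"H": 0.0, "E": 0.0, "C": 0.0}
    return {k: reduced.count(k) / len(reduced) for k in "HEC"}

print(reduce_dssp("HHHGGGEEB TTS-I"))  # HHHHHHEEECCCCCH
```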

Because domains can be considered independent structural and functional units, each domain can be analyzed independently once it has been determined that the query protein contains more than one domain. The identification of functional domains can be performed directly by matching the entire query sequence or a portion of it to a profile from a domain database. Alternatively, the existence of functional domains can be evaluated through indirect inference. For instance, if the query protein contains a well-characterized domain that matches a database profile and the rest of the sequence is not covered by any known domain, that uncovered region (provided it has a reasonable length) can be assumed to contain an additional domain. For cases in which there are no matches to domains or protein families in databases, the existence of multiple domains in the protein of interest can still be inferred through other methods. For example, the connectors between domains tend to be disordered or flexible linkers. Accordingly, predictions of disorder or composition bias, linker predictions, or secondary-structure predictions can be used to infer the location of uncharacterized domains along the sequence. [Pg.55]
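
The indirect-inference step can be sketched as interval arithmetic: given the sequence length and the (start, end) positions of domain-database matches, report the stretches not covered by any match. The 50-residue default standing in for "a reasonable length" is a hypothetical threshold, not one stated in the text, and the function name is illustrative.

```python
def uncovered_regions(seq_len: int, hits: list[tuple[int, int]], min_len: int = 50):
    """Return 1-based inclusive (start, end) gaps of at least min_len residues
    that no domain hit covers. min_len=50 is a hypothetical cutoff."""
    gaps = []
    pos = 1                                   # first residue not yet covered
    for start, end in sorted(hits):
        if start > pos:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)               # handles overlapping hits
    if pos <= seq_len:
        gaps.append((pos, seq_len))
    return [(s, e) for s, e in gaps if e - s + 1 >= min_len]

print(uncovered_regions(300, [(10, 120)]))  # [(121, 300)]
```

Short gaps (here residues 1-9) are dropped as too small to hold a domain; the surviving region is the candidate for an additional, uncharacterized domain.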

Although this chapter is about the GenBank nucleotide database, GenBank is just one member of a community of databases that includes three important protein databases: SWISS-PROT, the Protein Information Resource (PIR), and the Protein Data Bank (PDB). PDB, the database of nucleic acid and protein structures, is described in Chapter 5. SWISS-PROT and PIR can be considered secondary databases, that is, curated databases that add value to what is already present in the primary databases. Both SWISS-PROT and PIR take the majority of their protein sequences from nucleotide databases. A small proportion of SWISS-PROT sequence data is submitted directly or enters through a journal-scanning effort, in which the sequence is (quite literally) taken directly from the published literature. This process, for both SWISS-PROT and PIR, has been described in detail elsewhere (Bairoch and Apweiler, 2000; Barker et al., 2000)... [Pg.47]

Recently, NCBI introduced a new search service aimed at identifying conserved domains within a protein sequence. The source database for these searches is called the Conserved Domain Database or CDD. This is a secondary database, with entries... [Pg.262]

Nucleic acid and protein sequence analysis, including secondary structure prediction. GenBank, EMBL, SWISS-PROT, PIR, PROSITE, NASITE, and Vector-Bank databases on CD-ROM. Macintosh. PC/GENE for sequence analysis on PCs. IntelliGenetics Suite Sequence Analysis Software and GENESEQ database with patented protein and nucleic acid sequences. Sun and VAX. [Pg.342]

Big Chemical Encyclopedia
