UniProt Reference databases

The UniProt Reference databases (UniRef) provide nonredundant data collections based on UniProtKB and UniParc, in order to obtain complete coverage of sequence space at several resolutions. The UniRef databases (sequence collections clustered by sequence identity, for faster homology searches) are created as representative protein sequence databases with high information content. [Pg.602]

Search or download the UniProt Reference Cluster databases. UniRef combines closely related sequences into a single record to speed sequence searches. [Pg.601]
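The idea of collapsing closely related sequences into one record at several resolutions can be illustrated with a toy greedy clustering. This is only a sketch under stated assumptions, not the procedure used to build UniRef: the identity measure is a naive position-by-position comparison without alignment or gaps, the function names `identity` and `greedy_cluster` are hypothetical, and the sequences are invented. The thresholds 1.0, 0.9 and 0.5 are chosen to echo the UniRef100, UniRef90 and UniRef50 resolutions.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Toy identity: matching positions divided by the longer length (no alignment, no gaps)."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longer


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Map each representative ID to the IDs of the sequences in its cluster."""
    clusters: Dict[str, List[str]] = {}
    # Visit longer sequences first so each representative is the longest member.
    for sid in sorted(seqs, key=lambda s: (-len(seqs[s]), s)):
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            # No existing representative is close enough: start a new cluster.
            clusters[sid] = [sid]
    return clusters


# Hypothetical toy sequences, not real UniProtKB entries.
toy = {
    "seqA": "MKTAYIAKQR",
    "seqB": "MKTAYIAKQR",  # identical to seqA
    "seqC": "MKTAYIAKQK",  # one substitution relative to seqA
    "seqD": "MKTAYWWWWW",  # half of the positions match seqA
    "seqE": "GGGGGGGGGG",  # unrelated
}

for threshold in (1.0, 0.9, 0.5):
    print(threshold, greedy_cluster(toy, threshold))
```

Lowering the threshold merges more sequences under one representative, which is why a search against the coarser sets touches fewer records. A production pipeline would use alignment-based identity and handle fragments, neither of which this sketch attempts.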

The Universal Protein Resource (UniProt) provides the scientific community with a centralized, authoritative resource for protein sequences and functional information with three database components. (1) The UniProt Knowledgebase (UniProtKB), produced by a combination of automation and over 25 years of human curation, is the central protein sequence database with accurate, consistent, functional annotation and extensive cross-references. (2) The UniProt Reference Clusters (UniRef) provide clustered sets of sequences from UniProtKB (including splice variants and isoforms) in order to obtain complete coverage of sequence space at several resolutions. The UniRef 100 database is particularly useful for Mass Spec identifications as it exposes known sequence variation and splice-form annotation contained in UniProtKB records. (3) The UniProt Archive (UniParc) provides a stable comprehensive sequence collection by storing the complete body of all publicly available protein sequence data. [Pg.204]

To create our terminology containing both internal terms and external terms we semiautomatically extract terms from available external resources (e.g., MeSH, EMTREE, UniProt). Then we fit the extracted terms to our data structure and preserve the reference to the source system because sometimes terms are very specific to certain databases. We refer to the terms specific to a database as local terms. These local terms are stored in a dedicated data structure, the Metastore. It must be noted that we refer to accession codes and identifiers used in databases such as UniProt, RefSeq, and GO as local terms (see Tables 31.1 and 31.2). [Pg.733]
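A store of local terms that preserves the reference to the source system might look roughly like the following. The excerpt does not describe the actual layout of the Metastore, so the class names `LocalTerm` and `Metastore`, the method names, and the internal term "insulin" are all hypothetical. P01308 is the UniProtKB accession for human insulin; the RefSeq identifier is a labeled placeholder, not a real accession.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LocalTerm:
    """An identifier that is only meaningful inside one source database."""
    source: str    # source system, e.g. "UniProt", "RefSeq", "GO"
    local_id: str  # accession code or identifier as used by that source


@dataclass
class Metastore:
    """Hypothetical store linking local terms to an internal term."""
    index: Dict[LocalTerm, str] = field(default_factory=dict)

    def add(self, source: str, local_id: str, internal_term: str) -> None:
        self.index[LocalTerm(source, local_id)] = internal_term

    def resolve(self, source: str, local_id: str) -> Optional[str]:
        # The source is part of the key, so the same string in two
        # databases never resolves to the wrong internal term.
        return self.index.get(LocalTerm(source, local_id))

    def local_terms_for(self, internal_term: str) -> List[LocalTerm]:
        hits = [t for t, term in self.index.items() if term == internal_term]
        return sorted(hits, key=lambda t: (t.source, t.local_id))


store = Metastore()
store.add("UniProt", "P01308", "insulin")
store.add("RefSeq", "EXAMPLE_0001", "insulin")  # placeholder, not a real identifier

print(store.resolve("UniProt", "P01308"))
print(store.local_terms_for("insulin"))
```

Keying on the pair (source, identifier) rather than on the identifier alone is the design choice that keeps the reference to the source system intact.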

Figure 2.4 Sample entry for human insulin as present in the Swiss-Prot database. Refer to text for further details. Reproduced from the Swiss-Prot database on the UniProt website http://www.ebi.uniprot.org/...

The UniProt KB is an automatically and manually annotated protein database drawn from translation of DDBJ/EMBL-Bank/GenBank coding sequences and directly sequenced proteins. Each sequence receives a unique, stable identifier allowing unambiguous identification of any protein across datasets. The KB also provides cross-references to external data collections such as the underlying DNA sequence entries in the DDBJ/EMBL-Bank/GenBank nucleotide sequence databases, 2D PAGE and 3D protein structure databases, various protein domain... [Pg.23]
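Because the stable identifier is what ties a protein together across datasets, it can be useful to check that a string at least has the shape of a UniProtKB accession before using it as a key. The excerpt does not give the identifier format; the pattern below follows the accession-number format described in UniProt's help documentation and should be verified against the current documentation before being relied on. The function name `is_accession` is hypothetical.

```python
import re

# Accession format as described in UniProt's help pages (an assumption here,
# since the excerpt itself does not specify it): 6 or 10 characters.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_accession(text: str) -> bool:
    """Return True if the whole string has the shape of a UniProtKB accession."""
    return ACCESSION_RE.fullmatch(text) is not None


for candidate in ("P01308", "A0A024R161", "p01308", "insulin"):
    print(candidate, is_accession(candidate))
```

A match only shows that the string is well formed; it does not show that an entry with that accession exists.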

UniProt is a central repository of protein sequence and function created by joining the information contained in Swiss-Prot, TrEMBL, and PIR. UniProt is comprised of three components, each optimized for different uses. The UniProt Knowledgebase (UniProt) is the central access point for extensive curated protein information, including function, classification, and cross-reference. The UniProt Non-redundant Reference (UniRef) databases combine closely related sequences into a single record to speed searches. The UniProt Archive (UniParc) is a comprehensive repository, reflecting the history of all protein sequences. [Pg.16]

In 2002, the UniProt consortium (http://www.uniprot.org) was formed by uniting the SWISS-PROT + TrEMBL and PIR-PSD activities to maintain a high-quality database that serves as a stable, comprehensive, fully classified and accurately annotated protein sequence knowledge base (Figure 16.2). The database offers extensive cross-references and querying interfaces fully accessible to the scientific community (Bairoch et al., 2005). The UniProt consortium produces three layers of protein sequence databases ... [Pg.601]
