PROSITE

The PROSITE database, maintained by the Swiss Institute of Bioinformatics (SIB), was the first database that tried to catalog functional motifs and domains of proteins [62]. Nowadays, PROSITE consists of two major parts storing different types of descriptors: the pattern library and the profile library [63]. [Pg.155]

The pattern entries of the PROSITE database are based on a regular expression syntax, which emphasises only the most highly conserved residues in a protein family, corresponding approximately to what is termed a conservation pattern in Sect. 5.3. In contrast to the other databases mentioned below, PROSITE patterns do not attempt to describe a complete domain or even protein, but rather try to identify the functionally most important residue combinations, which in enzymes typically correspond to the active site. As an example of the PROSITE syntax, K-x(1,2)-[DE] would mean a lysine residue, followed by one or two arbitrary residues, followed by a residue that is either aspartate or glutamate. When a sequence is compared with a library of such patterns, each pattern is found to be either present or absent; no intermediate scores are assigned. Currently, the PROSITE pattern library contains approximately 1400 entries. [Pg.155]
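The pattern syntax described above maps closely onto ordinary regular expressions. The following is a minimal sketch, not the PROSITE scanning software itself: the function names `prosite_to_regex` and `has_pattern` are hypothetical, and only a subset of the syntax is handled (single residues, `x` wildcards, `[..]` allowed sets, `{..}` excluded sets, and `(n)` or `(n,m)` repetition counts). The result is a plain present/absent answer, mirroring the absence of intermediate scores.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (subset of the syntax) to a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue specification from an optional "(n)" or "(n,m)" count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = m.groups()

        if core == "x":                              # any residue
            regex = "."
        elif re.fullmatch(r"\[[A-Z]+\]", core):      # one of the listed residues
            regex = core
        elif re.fullmatch(r"\{[A-Z]+\}", core):      # any residue except those listed
            regex = "[^" + core[1:-1] + "]"
        elif re.fullmatch(r"[A-Z]", core):           # a single fixed residue
            regex = core
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")

        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def has_pattern(sequence: str, pattern: str) -> bool:
    """Return True if the pattern occurs anywhere in the sequence, else False."""
    return re.search(prosite_to_regex(pattern), sequence) is not None


# The example from the text: lysine, one or two arbitrary residues, then D or E.
print(prosite_to_regex("K-x(1,2)-[DE]"))
print(has_pattern("MAKGD", "K-x(1,2)-[DE]"))
print(has_pattern("MAKD", "K-x(1,2)-[DE]"))
```

The sequences `MAKGD` and `MAKD` are made-up test strings; the second fails because no residue separates the lysine from the aspartate.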

Prosite (http://www.expasy.org/prosite), database of protein domains, families, and functional sites. [Pg.343]

The DR lines link SWISS-PROT to other biomolecular databases. SWISS-PROT is currently linked to 29 different databases. The preceding example shows links to 19 different entries in 6 different databases. The cross references allow users to navigate to linked databases to retrieve part or all of the related information. The format of a DR line, except for cross references to PROSITE (Hofmann et al., 1999), Pfam (Bateman et al., 1999), and the EMBL nucleotide sequence databases (Stoesser et al., 1999), is the following ... [Pg.44]

PROSITE PROSITE protein domains and families database... [Pg.45]

The specific format for cross references to the PROSITE and Pfam protein domain and family databases is ... [Pg.46]

DR PROSITE | PFAM; ACCESSION_NUMBER; ENTRY_NAME; STATUS. [Pg.46]

"ACCESSION_NUMBER" stands for the accession number of the PROSITE or Pfam pattern, profile or HMM entry; "ENTRY_NAME" is the name of the entry; and "STATUS" is one of the following ... [Pg.46]
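A cross-reference line of this kind can be split into its four fields with a few lines of code. The sketch below assumes that the fields are separated by semicolons and that the line ends with a period; the names `DomainXref` and `parse_domain_dr_line` are hypothetical, and the sample line uses placeholder values rather than a real entry or a real status code.

```python
from typing import NamedTuple


class DomainXref(NamedTuple):
    """One DR cross reference to PROSITE or Pfam (hypothetical container)."""
    database: str
    accession: str
    entry_name: str
    status: str


def parse_domain_dr_line(line: str) -> DomainXref:
    """Parse a 'DR' line, assuming ';'-separated fields and a trailing period."""
    if not line.startswith("DR"):
        raise ValueError("not a DR line")
    # Drop the line code, surrounding whitespace and the final period.
    body = line[2:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, found {len(fields)}")
    if fields[0].upper() not in {"PROSITE", "PFAM"}:
        raise ValueError(f"not a PROSITE or Pfam cross reference: {fields[0]!r}")
    return DomainXref(*fields)


# Placeholder values only; not taken from a real SWISS-PROT entry.
xref = parse_domain_dr_line("DR   PROSITE; PS00000; EXAMPLE_ENTRY; EXAMPLE_STATUS.")
print(xref.database, xref.accession, xref.entry_name, xref.status)
```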

First, the taxonomic classification of the TrEMBL entry must be within the known taxonomic range of the PROSITE pattern. For instance, a match of an a priori prokaryotic pattern against a human protein is regarded as false positive and filtered out. [Pg.59]
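The taxonomic check described above amounts to a simple filter over raw hits. The sketch below is an illustration only and not the TrEMBL pipeline: the function names, the pattern identifiers, and the representation of a taxonomic range as a set of taxon names are all assumptions, as is the choice to keep hits for patterns with no recorded range.

```python
from typing import Dict, List, Sequence, Set


def within_taxonomic_range(lineage: Sequence[str], allowed: Set[str]) -> bool:
    """True if any taxon in the entry's lineage lies in the pattern's known range."""
    return any(taxon in allowed for taxon in lineage)


def filter_hits_by_taxonomy(
    lineage: Sequence[str],
    hits: Sequence[str],
    pattern_ranges: Dict[str, Set[str]],
) -> List[str]:
    """Keep only hits whose pattern is known to occur in the entry's taxonomic group."""
    kept = []
    for pattern_id in hits:
        allowed = pattern_ranges.get(pattern_id)
        # Assumption: a pattern without a recorded range is not filtered out.
        if allowed is None or within_taxonomic_range(lineage, allowed):
            kept.append(pattern_id)
    return kept


# Hypothetical data: a human protein hit by one prokaryotic and one eukaryotic pattern.
human_lineage = ["Eukaryota", "Metazoa", "Chordata", "Mammalia"]
ranges = {
    "PATTERN_PROK": {"Bacteria", "Archaea"},
    "PATTERN_EUK": {"Eukaryota"},
}
print(filter_hits_by_taxonomy(human_lineage, ["PATTERN_PROK", "PATTERN_EUK"], ranges))
```

Here the prokaryotic pattern is dropped as a presumed false positive and only `PATTERN_EUK` is retained.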

The raw PROSITE hits and all results of the confirmation steps are stored in a hidden section of the TrEMBL entry, but only those hits that satisfy all confirmation conditions are made publicly visible in a "DR prosite" line. [Pg.59]

Approximately 35% of all TrEMBL entries can be characterized by a PROSITE signature but only approximately 30% of all TrEMBL entries are true positive matches. The characterization based only on PROSITE would lead to 10% to 20% of false-positive assignments. The confirmation steps reduce the level of characterization by nearly a third to 25%. At this stage, we achieve a level of less than 0.07% of false positive assignments. [Pg.59]

PROT entries of the relevant protein family. Other sources include manual descriptions of protein families and translations of trustworthy description libraries into SWISS-PROT wording. For example, there is a /SITE=9,heme iron description for the cytochrome b heme pattern in PROSITE. This is translated to the correct SWISS-PROT syntax ... [Pg.60]

Approximately 20% of the TrEMBL entries get additional annotation as described above. There are two main reasons for this low coverage: (1) stringent criteria have been used to avoid overprediction, and (2) rules have been created for only one fourth of all PROSITE families. [Pg.61]

It can be difficult, if not impossible, to find the domain structure of a protein of interest from the primary literature. The sequence may contain many common domains, but these are usually not apparent from searches of the literature. Articles defining new domains may include the protein, but only in an alignment figure, which is not searchable. Perhaps, with the advent of online access to articles, the full text including figures may become searchable. Fortunately, there have been several attempts to make this hidden information available in a way that can be easily searched. These resources, called domain family databases, are exemplified by Prosite, Pfam, Prints, and SMART. These databases gather information from the literature about common domains and make it searchable in a variety of ways. They usually allow a researcher to look at the precalculated domain organization of proteins in the sequence database and also provide a way to search new sequences... [Pg.143]

Prosite is perhaps the best known of the domain databases (Hofmann et al., 1999). The Prosite database is a good source of high quality annotation for protein domain families. Prosite documentation includes a section on the functional meaning of a match to the entry and a list of example members of the family. Prosite documentation also includes literature references and cross links to other databases such as the PDB collection of protein structures (Bernstein et al., 1977). For each Prosite document, there is a Prosite pattern, profile, or both to detect the domain family. The profiles are the most sensitive detection method in Prosite. The Prosite profiles provide Z-scores for matches, allowing statistical evaluation of the match to a new protein. Profiles are now available for many of the common protein domains. Prosite profiles use the generalized profile software (Bucher et al., 1996). [Pg.144]

The majority of Prosite documentation refers to motifs rather than profiles. The motifs are less sensitive than profiles and do not provide statistical scores. The motifs correspond to active sites and other important functional sites in proteins. The motifs are expressed as regular expressions that can be used to detect matching proteins in the database. An example of a motif from Prosite would be the N-glycosylation motif,... [Pg.144]

The latest release 15.0 contains 1352 patterns and profiles. Prosite contains detailed documentation for each family. [Pg.145]

The BLOCKS database contains blocks for each family (Henikoff and Henikoff, 1991; Henikoff et al., 1999). Blocks are ungapped multiple sequence alignments that are exactly equivalent to the motifs found in the PRINTS database. The families in BLOCKS are currently derived from Prosite and PRINTS families. The bulk of BLOCKS entries are constructed from Prosite, using the lists of true positive members they provide. Motifs are automatically derived from the members of the Prosite family. Note that BLOCKS does not use the Prosite patterns to construct its motifs. BLOCKS provides functionality to search motifs against motifs; this feature is not provided by other databases. [Pg.146]

When a novel homology domain has been discovered, it is possible to store the corresponding domain descriptor (profile or HMM) in a number of dedicated domain databases, which can be used to analyze newly identified sequences for their domain content [9, 10]. Several competing domain- and motif-databases exist, including PROSITE, PFAM, SMART, and Superfam, which contain descriptors for most, if not all, of the known domains involved in the ubiquitin system [11-14]. Recently, a new meta-database named INTERPRO has been established, which tries to combine the descriptors of several domain databases under a single user interface [15]. Pointers to the very useful search engines of the domain databases are provided in Table 12.1. [Pg.321]

