Domains PROSITE

Prosite is perhaps the best known of the domain databases (Hofmann et al., 1999). The Prosite database is a good source of high-quality annotation for protein domain families. Prosite documentation includes a section on the functional meaning of a match to the entry and a list of example members of the family. It also includes literature references and cross-links to other databases such as the PDB collection of protein structures (Bernstein et al., 1977). For each Prosite document, there is a Prosite pattern, profile, or both to detect the domain family. The profiles are the most sensitive detection method in Prosite. They provide Z-scores for matches, allowing statistical evaluation of the match to a new protein. Profiles are now available for many of the common protein domains. Prosite profiles use the generalized profile software (Bucher et al., 1996). [Pg.144]

Prosite (http://www.expasy.org/prosite), database of protein domains, families, and functional sites. [Pg.343]

PROSITE PROSITE protein domains and families database... [Pg.45]

The specific format for cross references to the PROSITE and Pfam protein domain and family databases is ... [Pg.46]

It can be difficult, if not impossible, to find the domain structure of a protein of interest from the primary literature. The sequence may contain many common domains, but these are usually not apparent from searches of the literature. Articles defining new domains may include the protein, but only in an alignment figure, which is not searchable. Perhaps, with the advent of online access to articles, the full text including figures may become searchable. Fortunately, there have been several attempts to make this hidden information available in a way that can be easily searched. These resources, called domain family databases, are exemplified by Prosite, Pfam, Prints, and SMART. These databases gather information from the literature about common domains and make it searchable in a variety of ways. They usually allow a researcher to look at the precalculated domain organization of proteins in the sequence database and also provide a way to search new sequences... [Pg.143]

When a novel homology domain has been discovered, it is possible to store the corresponding domain descriptor (profile or HMM) in a number of dedicated domain databases, which can be used to analyze newly identified sequences for their domain content [9, 10]. Several competing domain- and motif-databases exist, including PROSITE, PFAM, SMART, and Superfam, which contain descriptors for most, if not all, of the known domains involved in the ubiquitin system [11-14]. Recently, a new meta-database named INTERPRO has been established, which tries to combine the descriptors of several domain databases under a single user interface [15]. Pointers to the very useful search engines of the domain databases are provided in Table 12.1. [Pg.321]

Pattern and motif analysis:
Motif Scan (http://myhits.isb-sib.ch/cgi-bin/motif_scan): finding motifs in a sequence.
PROSITE (http://us.expasy.org/prosite/): protein families and domains.
Pfam (http://www.sanger.ac.uk/Software/Pfam/): protein families database of hidden Markov models (HMMs). [Pg.8]

Motif and domain databases. PROSITE (http://www.expasy.org/cgi-bin/nicesite) and PFAM (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc). [Pg.43]

PROSITE PS00059 66 80 Copper fist DNA binding domain profile... [Pg.264]

Identify the ProSite residues and protein domain families for the following proteins of given amino acid sequences ... [Pg.266]

As a full-scale family classification system, more than 1200 MOTIFIND neural networks were implemented, one for each ProSite protein group. The training set for the neural networks consisted of both positive (ProSite family members) and negative (randomly selected non-members) sequences at a ratio of 1 to 2. ProClass groups non-redundant SwissProt and PIR protein sequence entries into families as defined collectively by PIR superfamilies and ProSite patterns. By joining global and motif similarities in a single classification scheme, ProClass helps to reveal domain and family relationships, and classify multi-domained proteins. [Pg.138]
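The training-set composition described above can be sketched in a few lines of Python. This is only an illustration and not the MOTIFIND implementation: the function name `build_training_set`, its parameters, and the reading of "1 to 2" as one positive example for every two negatives are assumptions made here.

```python
import random

def build_training_set(members, non_members, negatives_per_positive=2, seed=0):
    """Return a shuffled list of (sequence, label) pairs.

    Label 1 marks a family member (positive), label 0 a randomly
    selected non-member (negative). The 1:2 positive-to-negative
    ratio is an assumed reading of the text, not a confirmed detail.
    """
    rng = random.Random(seed)
    n_negatives = negatives_per_positive * len(members)
    if n_negatives > len(non_members):
        raise ValueError("not enough non-member sequences to sample from")
    # Sample negatives without replacement from the non-member pool.
    negatives = rng.sample(list(non_members), n_negatives)
    examples = [(seq, 1) for seq in members] + [(seq, 0) for seq in negatives]
    rng.shuffle(examples)
    return examples

# Hypothetical toy sequences, for illustration only.
family = ["MKTAYIAK", "MKSAYLAK"]
background = ["GGSPLLVV", "AAPWQRTT", "CCHHDDEE", "LLIIVVMM", "TTSSNNQQ"]
training = build_training_set(family, background)
```

One such set would be built per family, so that each network sees its own family members as positives.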

Sigrist CJA, Cerutti L, de Castro E et al (2010) PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res 38 D161-D166... [Pg.33]

The dual of the question that assigns related sequences from a database to a given query sequence or family is the question of which family a query sequence belongs to, or which domains it contains. One simple yet very effective resource for this purpose is the Prosite database [107, 108], which contains amino acid patterns that are descriptive of particular domains, families, or functions. These patterns allow one to specify alternative residues in particular positions or variable-length spacers between positions. Matching a sequence against a Prosite entry... [Pg.68]

The PROSITE database, maintained by the Swiss Institute of Bioinformatics (SIB), was the first database that tried to catalog functional motifs and domains of proteins [62]. Nowadays, PROSITE consists of two major parts storing different types of descriptors: the pattern library and the profile library [63]. [Pg.155]

The pattern entries of the PROSITE database are based on a regular expression syntax, which emphasises only the most highly conserved residues in a protein family, corresponding approximately to what is termed a conservation pattern in Sect. 5.3. In contrast to the other databases mentioned below, PROSITE patterns do not attempt to describe a complete domain or even protein, but rather try to identify the functionally most important residue combinations, which in enzymes typically correspond to the active site. As an example of the PROSITE syntax, K-x(1,2)-[DE] would mean a lysine residue, followed by one or two arbitrary residues, followed by a residue that is either aspartate or glutamate. When a sequence is compared with a library of such patterns, each pattern is found to be either present or absent; no intermediate scores are assigned. Currently, the PROSITE pattern library contains approximately 1400 entries. [Pg.155]
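The pattern syntax described above maps closely onto ordinary regular expressions. The following Python sketch translates a pattern such as K-x(1,2)-[DE] into a regular expression and reports where it occurs in a sequence. It is a simplified illustration, not the PROSITE scanning software: the function names `prosite_to_regex` and `find_matches` are made up here, the toy sequence is invented, and only the common constructs are handled (single residues, `x`, `[...]` alternatives, `{...}` exclusions, repeat counts, and the `<` and `>` terminal anchors).

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # patterns may end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue part from an optional repeat such as (2) or (1,2).
        m = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":                                   # any residue
            core = "."
        elif residue.startswith("[") and residue.endswith("]"):
            core = residue                                   # one of the listed residues
        elif residue.startswith("{") and residue.endswith("}"):
            core = "[^" + residue[1:-1] + "]"                # any residue except those listed
        elif len(residue) == 1 and residue.isalpha() and residue.isupper():
            core = residue                                   # one specific residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def find_matches(pattern, sequence):
    """Return (start, end, matched_text) tuples using 1-based residue positions."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence)]

print(prosite_to_regex("K-x(1,2)-[DE]"))             # K.{1,2}[DE]
print(find_matches("K-x(1,2)-[DE]", "MAKGDLKAAE"))   # [(3, 5, 'KGD'), (7, 10, 'KAAE')]
```

An empty result list corresponds to "pattern absent" and a non-empty one to "pattern present", which mirrors the all-or-nothing behaviour of pattern matching. Note that `finditer` reports non-overlapping matches only, so overlapping occurrences would need a different search loop.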

PFAM is a database of hidden Markov models of protein families and domains, maintained at the Sanger Centre in Cambridge [65]. The concept of PFAM is comparable to that of the PROSITE profile section. Like the profiles, the HMMs in PFAM have been derived by the iterative refinement procedure mentioned in Sect. 5.2.4. Unlike the PROSITE profiles, which have all been created manually by the curators, the HMMs in PFAM are generated semi-automatically, which accounts for a slightly lower sensitivity. However, this shortcoming is more than compensated for by the easier update procedure, which allows the database to grow much faster than PROSITE and to have a shorter release cycle. Currently, PFAM holds 2727 entries. [Pg.156]

NAD+-dependent alcohol dehydrogenases (EC 1.1.1.1) are encoded in the C. elegans genome (Fig. 15.1). The list of standard PEDANT queries includes EC numbers, PROSITE patterns, Pfam domains, BLOCKS, and SCOP domains, as well as PIR keywords and PIR superfamilies. Although PEDANT does not allow users to enter their own queries, the variety of data available at this Web site makes it a convenient entry point into the field of comparative genome analysis. [Pg.361]
