Classifying Protein Domain Families

In databases built using bottom-up approaches, any computational representation believed to be common to all members of a particular domain family can be used. This representation, in conjunction with appropriate searching software, should optimally be able to distinguish all true family members from the background noise of unrelated proteins stored in sequence databases. This is a challenging problem, tackled with varying degrees of sophistication by different approaches. At the most basic level, the representation can consist of a simple pattern of amino acids common to a particular domain. Such an approach is found in [Pg.81]

These are complementary stratagems and one is not necessarily preferable to another. On the one hand, the pattern-based approach is useful only if the functional determinants for families have been experimentally derived. On the other hand, the multiple alignments approach is useful only if it is assumed that the most sequence-similar proteins possess the most similar functions. In this regard, it is emphasized that a difference of only a single residue, for example, that in an active site, between two [Pg.83]

The PROSITE database is used to assign protein sequences to domains and families on the basis of biologically significant sites, patterns, and profiles. This database is similar to the HOMologous STRucture Alignment Database (HOMSTRAD) and the Protein family (Pfam) database, both of which contain domain and family information for proteins. HOMSTRAD uses sequence and structure to group proteins into domains and families. Pfam classifies protein domains and families, based... [Pg.62]

RIZ1 was originally isolated as a molecule associating with the retinoblastoma protein, Rb (Buyse et al., 1995). RIZ1 is classified as a member of the PR-domain family (PRDM-2) (the PR-domain family is now a sub-family of the SET-domain family proteins). It exerts H3K9 methyltransferase activity (Derunes et al., 2005). [Pg.339]

The CATH protein domain database (http://www.biochem.ucl.ac.uk/bsm/cath) is a hierarchical classification of protein domain structures into evolutionary families and structural groupings depending on sequence and structure similarity (Pearl et al., 2000). The protein domains are classified according to four major levels. [Pg.240]
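
The excerpt does not name the four levels. In CATH they are Class, Architecture, Topology, and Homologous superfamily, and a node is conventionally written as four dot-separated integers such as 3.40.50.300. The following is a minimal sketch of how such a code could be split and compared; the names `CathCode`, `parse_cath_code`, and `shared_depth` are illustrative assumptions, not part of any CATH software.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """One CATH node, split into its four major levels (illustrative type)."""
    cls: int                      # C: class
    architecture: int             # A: architecture
    topology: int                 # T: topology (fold)
    homologous_superfamily: int   # H: homologous superfamily


def parse_cath_code(code: str) -> CathCode:
    """Parse a dotted code such as '3.40.50.300' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def shared_depth(a: CathCode, b: CathCode) -> int:
    """Count how many leading levels two codes have in common (0 to 4)."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth
```

Two domains whose codes agree on the first three levels but differ on the fourth would share a topology while sitting in different homologous superfamilies.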

As a full-scale family classification system, more than 1200 MOTIFIND neural networks were implemented, one for each ProSite protein group. The training set for the neural networks consisted of both positive (ProSite family members) and negative (randomly selected non-members) sequences at a ratio of 1 to 2. ProClass groups non-redundant SwissProt and PIR protein sequence entries into families as defined collectively by PIR superfamilies and ProSite patterns. By joining global and motif similarities in a single classification scheme, ProClass helps to reveal domain and family relationships, and classify multi-domained proteins. [Pg.138]
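
The 1-to-2 ratio of positive to negative training sequences can be illustrated with a short sampling routine. This is only a sketch under stated assumptions: the function name `build_training_set`, its parameters, and the use of uniform random sampling are hypothetical, and the excerpt does not describe how MOTIFIND actually selected its non-member sequences.

```python
import random


def build_training_set(members, non_members, neg_per_pos=2, seed=0):
    """Return shuffled (sequence, label) pairs with neg_per_pos negatives
    for every positive. Label 1 marks a family member, 0 a non-member."""
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    n_neg = neg_per_pos * len(members)
    if n_neg > len(non_members):
        raise ValueError("not enough non-member sequences to sample from")
    negatives = rng.sample(list(non_members), n_neg)
    examples = [(seq, 1) for seq in members] + [(seq, 0) for seq in negatives]
    rng.shuffle(examples)
    return examples
```

With one such training set per family, each network sees twice as many randomly chosen non-members as true members.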

Once proteins are divided into domains, the domains are then classified hierarchically. At the top of the classification we usually find the class of a protein domain, which is generally determined from its overall composition in secondary structure elements. Three main classes of protein domains exist: mainly α domains, mainly β domains, and mixed α-β domains (the domains in the α-β class are sometimes subdivided into domains with alternating α/β secondary structures and domains with mixed α + β secondary structures). In each class, domains are clustered into folds according to their topology. A fold is determined from the number, arrangement, and connectivity of the domain's secondary structure elements. The folds are subdivided into superfamilies. A superfamily contains protein domains with similar functions, which suggests a common ancestry, often in the absence of detectable sequence similarity. Sequence information defines families, i.e., subclasses of superfamilies that regroup domains whose sequences are similar. [Pg.40]
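
The class, fold, superfamily, family hierarchy described above maps naturally onto a nested dictionary. The sketch below assumes each domain already carries its four labels; the function name `build_hierarchy` and the placeholder labels in the usage note are hypothetical and do not come from any real database.

```python
from collections import defaultdict


def build_hierarchy(domains):
    """Group domains as tree[class][fold][superfamily][family] -> [domain ids].

    `domains` is an iterable of
    (domain_id, class, fold, superfamily, family) tuples.
    """
    tree = defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    )
    for domain_id, cls, fold, superfamily, family in domains:
        tree[cls][fold][superfamily][family].append(domain_id)
    return tree
```

For example, `build_hierarchy([("d1", "mainly alpha", "fold_1", "sf_1", "fam_1")])` files the made-up domain `d1` under its class, then its fold, superfamily, and family, so that domains sharing a family end up in the same innermost list.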

Comparison (or alignment) of amino acid sequences, also called homology search, often provides first-hand information on such conserved structural features and enables one to classify enzymes into families and predict the possible function of a new enzyme (86). A family of enzymes usually folds into similar 3-D structures, at least at the active site region. A typical example is the serine protease family whose members—trypsin, chymotrypsin, elastase, and subtilisin—commonly contain three active-site residues, Asp/His/Ser, which are known as the catalytic triad or charge relay system. Another example is the conserved features of catalytic domains of the highly diverse protein kinase family. In this kinase family, the ATP-binding (or phosphate-anchoring) sites present a consensus sequence motif of Gly-X-Gly-X-X-Gly (67,87). [Pg.27]
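
A consensus motif such as Gly-X-Gly-X-X-Gly can be searched for with an ordinary regular expression once it is written in one-letter codes. The following is a minimal sketch: the helper names `consensus_to_regex` and `find_motif` are assumptions made for illustration, X is treated as "any single residue", and the sequence in the usage note is a made-up toy string, not a real kinase.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def consensus_to_regex(consensus: str) -> str:
    """Turn 'Gly-X-Gly-X-X-Gly' into the regex 'G.G..G'."""
    parts = []
    for token in consensus.split("-"):
        if token.upper() == "X":
            parts.append(".")                 # X: any single residue
        else:
            parts.append(THREE_TO_ONE[token.capitalize()])
    return "".join(parts)


def find_motif(sequence: str, consensus: str):
    """Return (0-based start, matched segment) for every occurrence,
    including overlapping ones (hence the lookahead)."""
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]
```

Calling `find_motif("MKLGAGSFGRV", "Gly-X-Gly-X-X-Gly")` on the toy sequence returns `[(3, "GAGSFG")]`. A match of this kind only flags a candidate site; as the text notes, family assignment rests on broader sequence and structural similarity.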

Peptidases have been classified by the MEROPS system since 1993 [2], which has been available via the MEROPS database since 1996 [3]. The classification is based on sequence and structural similarities. Because peptidases are often multidomain proteins, only the domain directly involved in catalysis, and which bears the active site residues, is used in comparisons. This domain is known as the peptidase unit. Peptidases with statistically significant peptidase unit sequence similarities are included in the same family. To date 186 families of peptidase have been detected. Examples from 86 of these families are known in humans. A family is named from a letter representing the catalytic type (A for aspartic, G for glutamic, M for metallo, C for cysteine, S for serine and T for threonine) plus a number. Examples of family names are shown in Table 1. There are 53 families of metallopeptidases (24 in human), 14 of aspartic peptidases (three of which are found in human), 62 of cysteine peptidases (19 in human), 42 of serine peptidases (17 in human), four of threonine peptidases (three in human), one of glutamic peptidases and nine families for which the catalytic type is unknown (one in human). It should be noted that within a family not all of the members will be peptidases. Usually non-peptidase homologues are a minority and can be easily detected because not all of the active site residues are conserved. [Pg.877]
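
The naming rule (a catalytic-type letter plus a number) is simple enough to decode mechanically. The sketch below covers only the six letters listed in the text; the function name `parse_family_name` is an assumption, and because the excerpt does not give the letter used for families of unknown catalytic type, any other letter is rejected here.

```python
import re

# Catalytic-type letters exactly as listed in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
}


def parse_family_name(name: str):
    """Split a family name such as 'S1' into (catalytic type, family number)."""
    match = re.fullmatch(r"([A-Z])(\d+)", name.strip())
    if match is None:
        raise ValueError(f"not a letter-plus-number family name: {name!r}")
    letter, number = match.group(1), int(match.group(2))
    if letter not in CATALYTIC_TYPES:
        raise ValueError(f"catalytic type letter {letter!r} is not covered by this sketch")
    return CATALYTIC_TYPES[letter], number
```

For instance, `parse_family_name("M10")` yields `("metallo", 10)`.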

Mammalian HAT enzymes can be divided into subfamilies (Tan, 2001). However, it is currently difficult to classify a protein as a potential HAT enzyme based on its amino acid sequence, since these subfamilies display no obvious similarity in their primary sequence, nor in the size of their HAT domains or the surrounding protein modules (Kuo and Allis, 1998; Marmorstein, 2001). The only region that is partly conserved between HAT subfamilies, at the amino acid sequence and/or structural level, is a small subdomain first noticed in GCN5-related N-acetyltransferases, which encompasses the coenzyme A (CoA) binding site (Neuwald and Landsman, 1997; Martinez-Balbas et al., 1998; Yan et al., 2000; Marmorstein and Roth, 2001). Four families of mammalian HATs that have been implicated in human disease will be discussed here. [Pg.235]
