Domain families using sequence profiles

The SBASE database is a collection of annotated protein sequence segments (Murvai et al., 1999). Rather than consensus methods such as profile HMMs, SBASE uses pairwise comparison methods to detect domains. The database includes more than 130,000 annotated sequence segments that have been clustered into groups on the basis of BLAST similarities. SBASE currently contains 1038 domain families. [Pg.147]

The PROSITE database is used to assign protein sequences to domains and families on the basis of biologically significant sites, patterns, and profiles. This database is similar to the HOMologous STRucture Alignment Database (HOMSTRAD) and the Protein family (Pfam) database, both of which contain domain and family information for proteins. HOMSTRAD uses sequence and structure to group proteins into domains and families. Pfam classifies protein domains and families, based... [Pg.62]
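To make the notion of a sequence "pattern" concrete, the sketch below translates a pattern written in PROSITE-style syntax into a Python regular expression and scans a sequence with it. The function name `prosite_to_regex`, the example pattern, and the test sequence are illustrative assumptions, not taken from the excerpt; the sketch handles only the common syntax elements (`x`, `[...]`, `{...}`, repeat counts, and terminal anchors outside brackets).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Simplified sketch: anchors placed inside square brackets are not handled.
    """
    pattern = pattern.rstrip(".")  # patterns conventionally end with a period
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x -> any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        parts.append(element)
    return "".join(parts)


# Illustrative pattern with a Cys/His spacing, used here only as an example.
example_pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
regex = prosite_to_regex(example_pattern)

# Hypothetical query sequence containing one matching segment.
query = "MK" + "CAACAAALAAAAAAAAHAAAH" + "GG"
match = re.search(regex, query)
if match:
    # Report 1-based inclusive coordinates of the matched segment.
    print(match.start() + 1, match.end(), match.group())
```

A regular expression gives only a yes/no match; profiles, discussed in the other excerpts, assign graded scores instead.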

PALI (Phylogeny and Alignment of Homologous Protein Domains) Database. The PALI (v 2.6) database provides three-dimensional structure-based sequence alignments for homologous proteins of known three-dimensional structure (24-26). The protein families have been derived from the SCOP (Structural Classification of Proteins) database (27). There are 2,518 protein families, and using more than one sequence as reference, 37,986 profiles have been generated. [Pg.157]

HMMER [96] is a freely distributable collection of software for protein-sequence analysis using profile HMMs. A profile HMM [97] is a statistical model of a multiple alignment of sequences drawn from a putative protein family. It captures position-specific information about the relative degree of conservation of different columns in an alignment and the relative likelihood of particular residues occurring in specific positions. Profile HMMs can thus capture the essential features of a structural or functional domain. [Pg.33]
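The position-specific idea can be illustrated with a much simpler model than a profile HMM. The sketch below builds per-column log-odds scores from a toy alignment and scores an ungapped segment against them. It is not HMMER's algorithm or API: it omits insert and delete states and transition probabilities, and the alignment, the uniform background frequency, and the pseudocount are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy alignment: equal-length rows, '-' would mark a gap.
ALIGNMENT = [
    "ACDG",
    "ACEG",
    "ACDG",
    "GCDA",
]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log2-odds scores per alignment column."""
    background = 1.0 / len(ALPHABET)  # assumed uniform background frequency
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score(profile, segment):
    """Sum the per-position log-odds scores for an ungapped segment."""
    if len(segment) != len(profile):
        raise ValueError("segment length must equal profile length")
    return sum(col[aa] for col, aa in zip(profile, segment))


profile = build_profile(ALIGNMENT)
# A family-like segment should outscore an unrelated one.
print(score(profile, "ACDG") > score(profile, "WWWW"))
```

In this toy model the fully conserved second column gives its residue a higher score than the partly conserved first column does, which is the "relative degree of conservation" the excerpt refers to.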

Because domains can be considered independent structural and functional units, each domain can be analyzed independently once it has been determined that the query protein contains more than one domain. The identification of functional domains can be performed directly by matching the entire query sequence or a portion of it to a profile from a domain database. Alternatively, the existence of functional domains can be evaluated through indirect inference. For instance, if the query protein contains a well-characterized domain that matches a database profile and the rest of the sequence is not covered by any known domain, that uncovered region (provided it has a reasonable length) can be assumed to contain an additional domain. For cases in which there are no matches to domains or protein families in databases, the existence of multiple domains in the protein of interest can still be inferred through other methods. For example, the connectors between domains tend to be disordered or flexible linkers. Accordingly, predictions of disorder or composition bias, linker predictions, or secondary-structure predictions can be used to infer the spatial location of uncharacterized domains. [Pg.55]
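The indirect inference described above can be sketched as a small coordinate calculation: given the positions of known domain matches, report the stretches of the query that no match covers and that are long enough to plausibly hold a domain. The function name `uncovered_regions`, the 40-residue default for "reasonable length", and the example hits are assumptions for illustration only.

```python
def uncovered_regions(seq_length, domain_hits, min_length=40):
    """Return 1-based inclusive (start, end) regions not covered by any hit.

    domain_hits: iterable of 1-based inclusive (start, end) tuples.
    min_length: hypothetical threshold for a "reasonable length".
    """
    regions = []
    next_free = 1  # first position not yet covered by a processed hit
    for start, end in sorted(domain_hits):
        # Positions next_free..start-1 are uncovered (length start - next_free).
        if start - next_free >= min_length:
            regions.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    # Trailing region after the last hit.
    if seq_length - next_free + 1 >= min_length:
        regions.append((next_free, seq_length))
    return regions


# Hypothetical 400-residue query with two overlapping domain matches.
hits = [(10, 120), (100, 180)]
print(uncovered_regions(400, hits))
```

Regions returned this way are only candidates; disorder, linker, or secondary-structure predictions would still be needed to support or reject them.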

As has been described in Sect. 5.3, the conservation patterns of enzymes are often indicative of the particular family they belong to and can be used for their classification. However, the iterative searches and multiple alignment methods used for their establishment require a certain bioinformatic infrastructure as well as some experience with these issues. If the goal of the analysis is not the detection of novel enzyme families, but rather the classification of a novel sequence into one of the already existing enzyme families, there are a number of protein domain and motif databases that will be useful in this respect [60, 61]. These databases do not store the sequences themselves but rather work with descriptors of protein families and protein domains. These descriptors can consist of the profiles or hidden Markov models mentioned above, but other types are also being used. With a particular... [Pg.154]
