SMART protein database

It can be difficult, if not impossible, to find the domain structure of a protein of interest from the primary literature. The sequence may contain many common domains, but these are usually not apparent from searches of the literature. Articles defining new domains may include the protein, but only in an alignment figure, which is not searchable. Perhaps, with the advent of online access to articles, the full text including figures may become searchable. Fortunately, there have been several attempts to make this hidden information available in a way that can be easily searched. These resources, called domain family databases, are exemplified by Prosite, Pfam, Prints, and SMART. These databases gather information from the literature about common domains and make it searchable in a variety of ways. They usually allow a researcher to look at precalculated domain organizations of proteins in the sequence database and also provide a way to search new sequences... [Pg.143]

DOMO [21] contain protein families and their alignments, so that changes in the proteins can be observed. Sequences also share domains, or modules, and the ProDom [20] and SMART [2] databases contain these domains and the proteins that fit those domains. [Pg.444]

Harvester also collects protein domain analyses from the SMART protein server (Letunic et al., 2004). Information from the UNIPROT (Apweiler et al., 2004) and SOURCE (Diehn et al., 2003) databases confirms or completes the SMART protein domain information. [Pg.19]

Similar residues in the cores of protein structures, especially hydrophobic residues at the same positions, are responsible for the common folds of homologous proteins. Certain sequence profiles of conserved residue successions have been identified that give rise to a common fold of protein domains. They are organized in the SMART database (Simple Modular Architecture Research Tool), http://smart.embl-heidelberg.de. [Pg.778]

In the detection of repeats using SMART, an algorithm is used that derives similarity thresholds that depend on the number of repeats already found in a protein sequence (Andrade et al., 1999b). These thresholds are based on the assumption that suboptimal local alignment scores of a profile/HMM against a random sequence database are well described by an extreme value distribution (EVD). The result of this protocol is that acceptance thresholds for suboptimal alignments are lowered below the optimal scores of nonhomologous sequences. [Pg.211]
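The role of the EVD assumption can be illustrated with a minimal sketch. It assumes the type I (Gumbel) form of the extreme value distribution with a fitted location `mu` and scale `lam`; these parameter names and both function names are illustrative and do not come from SMART or from Andrade et al. The way the published method adjusts the threshold to the number of repeats already found is not reproduced here.

```python
import math


def gumbel_tail_probability(score, mu, lam):
    """P(S >= score) when S follows a Gumbel distribution.

    mu (location) and lam (scale) are assumed to have been fitted to
    scores obtained against a random sequence database.
    """
    # 1 - exp(-exp(-lam * (score - mu))), written with expm1 for accuracy
    return -math.expm1(-math.exp(-lam * (score - mu)))


def score_threshold(p_value, mu, lam):
    """Score at which the Gumbel tail probability equals p_value."""
    # Inverse of gumbel_tail_probability, solved for the score
    return mu - math.log(-math.log1p(-p_value)) / lam
```

Under this sketch, a more permissive tail probability translates directly into a lower score threshold, which is the sense in which an acceptance threshold can be lowered for suboptimal alignments.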

Multiple alignments of repeats are constructed in an iterative manner. The initial alignment is based on definitions from determined protein structures or else from the literature. In the initial database search step, a profile constructed from the multiple alignment is compared with a sequence database. Top-scoring sequences are considered using complementary approaches such as PSI-BLAST and FASTA to provide the two thresholds required: the minimum E value and the minimum number of repeats per protein. After one or two iterations, the final alignment and the thresholds are stored in the SMART database to allow the detection of repeats in any sequence. [Pg.212]
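How two stored thresholds might be applied to search hits can be sketched as follows. This is one possible reading, not the SMART implementation: the E-value threshold is treated as a cutoff that a hit must not exceed, and a protein is accepted only if enough hits pass it. The function name, the hit tuple layout, and the parameter names are all hypothetical.

```python
from collections import defaultdict


def proteins_with_repeats(hits, e_value_cutoff, min_repeats):
    """Group passing repeat hits by protein and keep well-supported proteins.

    hits: iterable of (protein_id, start, end, e_value) tuples (assumed layout).
    Returns {protein_id: [(start, end, e_value), ...]} sorted by start position.
    """
    by_protein = defaultdict(list)
    for protein_id, start, end, e_value in hits:
        # Keep only hits at or below the E-value cutoff
        if e_value <= e_value_cutoff:
            by_protein[protein_id].append((start, end, e_value))
    # Require a minimum number of passing repeats per protein
    return {
        protein_id: sorted(repeats)
        for protein_id, repeats in by_protein.items()
        if len(repeats) >= min_repeats
    }
```

For example, a protein with two hits under the cutoff is retained when `min_repeats` is 2, while a protein with a single strong hit is not.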

A variety of domain or motif families occur only as extensions to other domains. The Bruton's tyrosine kinase motif (BTK), for example, is found only at the C terminus of PH domains. Similarly, a C-terminal extension (the S_TK_X domain) to some subfamilies of serine/threonine kinases (S_TK) is not found in isolation. Cases where only the extension, and not the preceding domain, is found are strong evidence that the proteins are wrongly assembled from genomic sequence or else represent partial cDNA sequences (Fig. 9, see Color insert). Indeed, all five proteins annotated in SMART as containing an S_TK_X domain with no catalytic domain are noted to be fragments in their corresponding sequence database entries. [Pg.236]
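The consistency check described above can be sketched in a few lines. The domain labels (written here with underscores), the mapping, and the function name are illustrative assumptions rather than SMART identifiers or a SMART API; an architecture is assumed to be a list of domain names ordered from N terminus to C terminus.

```python
# Extension domain -> domain expected immediately before it (illustrative labels)
REQUIRED_PRECEDING = {"BTK": "PH", "S_TK_X": "S_TK"}


def orphan_extensions(architecture, required_preceding=REQUIRED_PRECEDING):
    """Return (index, domain) for extension domains lacking their partner.

    architecture: list of domain names ordered from N to C terminus.
    An extension is flagged when the domain directly before it is not the
    expected one, or when it is the first domain in the protein.
    """
    orphans = []
    for i, domain in enumerate(architecture):
        needed = required_preceding.get(domain)
        if needed is None:
            continue  # not an extension-type domain
        previous = architecture[i - 1] if i > 0 else None
        if previous != needed:
            orphans.append((i, domain))
    return orphans
```

A non-empty result would mark the entry as a candidate fragment or misassembled gene model that deserves a look at the underlying sequence record.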

A variety of databases and online tools exist to facilitate searches for protein motifs (Table 6). The most comprehensive resource for the detection of large protein motifs is the Conserved Domain Database (CDD) provided by NCBI. The CDD includes all data present in the SMART and PFAM databases, along with some manually curated entries. All protein-protein BLAST... [Pg.522]

SMART (Simple Modular Architecture Research Tool) [12-14] is a Web-based resource used for the annotation of protein domains and the analysis of domain architectures, with particular emphasis on mobile eukaryotic domains. Extensive annotation for each domain family is available, providing information relating to function, subcellular localization, phyletic distribution and tertiary structure. The January 2002 release has added more than 200 hand-curated domain models. This brings the total to over 600 domain families that are widely represented among nuclear, signalling and extracellular proteins. Annotation now includes links to the Online Mendelian Inheritance in Man (OMIM) database in cases where a human disease is associated with one or more mutations in a particular domain (http://smart.embl-heidelberg.de/help/smart_about.shtml)... [Pg.18]

Enabling the automated annotation of protein domain structure, databases like PFAM and SMART contain HMMER models and curated alignments of known domains. These models can be used to specify a putative domain structure for novel protein query sequences. [Pg.33]

In addition, several databases (with accompanying search tools) have recently been developed for detecting domains and exploring architectures of multidomain proteins: Pfam (Bateman et al., 2000), ProDom (Corpet et al., 2000), and SMART (Schultz et al., 1998, 2000). [Pg.373]

Although not comprehensive as of this writing, SMART seems to be the most advanced of these systems, combining high sensitivity of domain detection with accuracy, high speed, and extremely informative presentation of domain architectures. Rapid searches for protein domains, based on a modification of the PSI-BLAST program, are now available through the Conserved Domains Database (CDD) at NCBI (cf. Chapter 11). [Pg.373]

CDD (Conserved Domain Database): covers protein domain information from the Pfam, SMART, and COG databases. http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml... [Pg.392]

Big Chemical Encyclopedia