Protein fold, database

Although knowledge-based potentials are most popular, it is also possible to use other types of potential function. Some of these are more firmly rooted in the fundamental physics of interatomic interactions whereas others do not necessarily have any physical interpretation at all but are able to discriminate the correct fold from decoy structures. These decoy structures are generated so as to satisfy the basic principles of protein structure such as a close-packed, hydrophobic core [Park and Levitt 1996]. The fold library is also clearly important in threading. For practical purposes the library should obviously not be too large, but it should be as representative of the different protein folds as possible. To derive a fold database one would typically first use a relatively fast sequence comparison method in conjunction with cluster analysis to identify families of homologues, which are assumed to have the same fold. A sequence identity threshold of about 30% is commonly... [Pg.562]
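The clustering step described above can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length, uses single-linkage clustering, and picks the alphabetically first member of each family as its representative; the function names, the toy sequences, and these choices are illustrative assumptions, not the procedure of any particular fold library.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Only columns where both sequences carry a residue are compared.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def cluster_families(aligned, threshold=30.0):
    """Single-linkage clustering: two sequences at or above the threshold share a family."""
    parent = {name: name for name in aligned}

    def find(n):
        # Union-find lookup with path compression.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(aligned, 2):
        if percent_identity(aligned[a], aligned[b]) >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for name in aligned:
        families.setdefault(find(name), set()).add(name)
    return list(families.values())


# Toy, made-up sequences (assumption: already aligned, no gaps needed).
seqs = {
    "protA": "MKTAYIAKQR",
    "protB": "MKTAYLAKQR",
    "protC": "GSHWPLVDEN",
}
families = cluster_families(seqs, threshold=30.0)
representatives = sorted(min(f) for f in families)  # one entry per family
print(representatives)
```

Keeping one representative per family is what keeps the library small while still covering the folds present in the input set.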

Holm L and C Sander 1994. The FSSP Database of Structurally Aligned Protein Fold Families. Nucleic Acids Research 22:3600-3609. [Pg.575]

The first requirement for threading is to have a database of all the known different protein folds. Eisenberg has used his own library of about 800 folds, which represents a minimally redundant set of the more than 6000 structures deposited at the Protein Data Bank. Other groups use databases available on the World Wide Web, where the folds are hierarchically ordered according to structural and functional similarities, such as SCOP, designed by Alexey Murzin and Cyrus Chothia in Cambridge, UK. [Pg.353]

In order to make as much data as possible on the structure and its determination available in the databases, approaches for automated data harvesting are being developed. Structure classification schemes, as implemented for example in the SCOP, CATH, and FSSP databases, elucidate the relationship between protein folds and function and shed light on the evolution of protein domains. [Pg.262]

Sequence conservation is, in general, much weaker than structural conservation. There are proteins that are clearly not related in sequence but are closely related in 3D structure and fold, like haemoglobin and myoglobin, which have similar functions. In many proteins, fold elements like 4-helical bundles are repeated. Classifications of known structural folds of proteins are organized in the SCOP or CATH database; see, e.g., http://scop.mrc-lmb.cam.ac.uk/scop/. [Pg.778]

The FID library was applied to the task of predicting the protein folds encoded in complete genomes using the recently developed program IMPALA, which is a modification of PSI-BLAST that effectively reverses the search protocol (Schaffer et al., 1999). PSI-BLAST compares a PSSM to a database of sequences; by contrast, a single search by IMPALA is a comparison of a sequence to a library of PSSMs (Fig. 3B). Statistical tests with IMPALA have shown that the theory used for the evaluation of BLAST results is applicable with minimal modifications. [Pg.258]
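The reversed search direction (one query sequence against a library of PSSMs) can be sketched as follows. This is a deliberately simplified, ungapped toy and not IMPALA's actual algorithm or scoring; the PSSM representation (one dict of residue scores per column), the default mismatch score, and the library contents are all assumptions made for illustration.

```python
def best_window_score(seq, pssm, default=-1):
    """Best ungapped placement of a PSSM on a sequence, as (score, start), or None."""
    width = len(pssm)
    if len(seq) < width:
        return None
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Residues absent from a column get the default (penalty) score.
        score = sum(col.get(res, default) for col, res in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


def search_library(seq, library):
    """Compare one sequence to every PSSM in the library; best hits first."""
    hits = []
    for name, pssm in library.items():
        hit = best_window_score(seq, pssm)
        if hit is not None:
            hits.append((hit[0], name, hit[1]))
    return sorted(hits, reverse=True)


# Hypothetical three-column profiles; names and scores are invented.
library = {
    "fold_1": [{"G": 3}, {"K": 2, "R": 2}, {"S": 3, "T": 3}],
    "fold_2": [{"W": 4}, {"P": 2}, {"W": 4}],
}
for score, name, start in search_library("AAGKTLL", library):
    print(name, score, start)
```

A real implementation would use gapped alignment and the statistical evaluation mentioned in the text; the sketch only shows which object plays the role of query and which the role of database.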

Computational proteome annotation is the process of proteome database mining, which includes structure/fold prediction and functionality assignment. Methodologies of secondary structure prediction and problems of protein folding are discussed. Approaches to identify functional sites are presented. Protein structure databases are surveyed. Secondary structure predictions and pattern/fold recognition of proteins using the Internet resources are described. [Pg.233]

Sequence Comparisons. Proteins called molecular chaperones (described in Chapter 4) assist in the process of protein folding. One class of chaperone found in organisms from bacteria to mammals is heat shock protein 90 (Hsp90). All Hsp90 chaperones contain a 10 amino acid signature sequence, which allows for ready identification of these proteins in sequence databases. Two representations of this signature sequence are shown below. [Pg.38]

One advantage is that the template and test protein do not need to be of similar lengths. A very good fit could be identified for the N-terminal portion of a very long test sequence by a much shorter template. Large proteins often adopt different structural domains with identifiable folds. Likewise, a short test sequence could adopt a fold that utilizes only a small portion of the template. This rather straightforward sounding method avoids the problems associated with the identification of secondary structure elements. The assumptions are that most protein folds have already been identified and therefore the unknown structure of the test protein will most likely resemble a fold within the database. It is clear that a novel protein fold will not be identified by this method. [Pg.645]
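The point about unequal lengths can be made concrete with a toy, ungapped sketch in which the shorter of the two objects is slid along the longer one. Everything here is an assumption for illustration: the template is reduced to a string of environment labels ("B" for buried, "E" for exposed), the hydrophobic residue set is a crude choice, and the +1/-1 score stands in for a real threading potential.

```python
# Crude, assumed hydrophobic set; a real potential would be far more detailed.
HYDROPHOBIC = set("AVLIMFWC")


def residue_env_score(res, env):
    """+1 if a residue suits its environment (hydrophobic in buried, polar in exposed)."""
    buried = env == "B"
    hydrophobic = res in HYDROPHOBIC
    return 1 if buried == hydrophobic else -1


def best_local_fit(seq, template_env):
    """Slide the shorter object along the longer one.

    Returns (score, seq_start, template_start) for the best ungapped placement.
    """
    n, m = len(seq), len(template_env)
    span = min(n, m)
    best = None
    for offset in range(abs(n - m) + 1):
        # The offset applies to whichever of the two is longer.
        s0, t0 = (offset, 0) if n >= m else (0, offset)
        score = sum(
            residue_env_score(seq[s0 + i], template_env[t0 + i]) for i in range(span)
        )
        if best is None or score > best[0]:
            best = (score, s0, t0)
    return best


# Long test sequence, short template: the N-terminal portion fits best.
print(best_local_fit("LVDKILGSGSGS", "BBEEBB"))
# Short test sequence, longer template: only part of the template is used.
print(best_local_fit("DKIL", "BBEEBB"))
```

Real threading methods also allow gaps and placements that overhang either end; the sketch covers only the two containment cases named in the text.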

The ability to recognize the way in which a protein sequence is folded in three dimensions should enable us to model the interactions of specific side-chains in a manner that is simply not possible when considering proteins entirely at the sequence level. This notion has resulted in sequence threading algorithms that assess the level of compatibility of a sequence with a database of fold patterns (65, 66). The principal downside to this approach is that novel structural types cannot be predicted, because at least one example of each fold type must be present in the fold pattern database. Structural genomics may be the means whereby fold pattern databases can be populated with sufficient data to make them useful as predictive tools. [Pg.353]

The number of native contacts is an important and often used parameter in protein folding descriptions. A contact is made when the α-carbons of non-adjacent residues are within a 6 Å distance. A native contact is a contact that also occurs in a reference configuration representing the native state. This reference configuration can be taken from, for instance, the Protein Data Bank (PDB) or from simulations (e.g. the most likely structure, minimum free energy, etc.). The number of native contacts can be evaluated for arbitrary configurations and measures the similarity between the configuration of interest... [Pg.408]
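The definition above translates into a short calculation. The sketch below assumes each configuration is given as a list of α-carbon (x, y, z) coordinates in Å, and interprets "non-adjacent" as a sequence separation of at least two residues; that separation, the function names, and the toy coordinates are assumptions, and some studies use a larger minimum separation.

```python
import math


def contacts(ca_coords, cutoff=6.0, min_separation=2):
    """Index pairs (i, j), i < j, of non-adjacent residues whose alpha-carbons
    lie within `cutoff` angstroms of each other."""
    found = set()
    for i in range(len(ca_coords)):
        for j in range(i + min_separation, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                found.add((i, j))
    return found


def native_contact_fraction(config, reference, cutoff=6.0):
    """Fraction of the reference (native) contacts that are also present in `config`."""
    native = contacts(reference, cutoff)
    if not native:
        return 0.0
    return len(contacts(config, cutoff) & native) / len(native)


# Toy four-residue chains with 3.8 angstrom spacing: a compact square and a straight line.
compact = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

print(sorted(contacts(compact)))
print(native_contact_fraction(extended, compact))
```

Dividing by the number of reference contacts gives a value between 0 and 1, which makes configurations of the same protein directly comparable.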

Holm, L. and Sander, C., The FSSP database of structurally aligned protein fold families. Nucleic Acids Res., 24, 206, 1996. [Pg.142]

In addition to conventional sequence motifs (Prosite, BLOCKS, PRINTS, etc.), the compilation of structural motifs indicative of specific functions from known structures has been proposed [268]. This should improve even the results obtained with multiple (one-dimensional sequence) patterns exploited in the BLOCKS and PRINTS databases. Recently, the use of models to define approximate structural motifs (sometimes called fuzzy functional forms, FFFs [269]) has been put forward to construct a library of such motifs enhancing the range of applicability of motif searches at the price of reduced sensitivity and specificity. Such approaches are supported by the fact that, often, active sites of proteins necessary for specific functions are much more conserved than the overall protein structure (e.g. bacterial and eukaryotic serine proteases), such that an inexact model could have a partly accurately conserved part responsible for function. As the structural genomics projects produce a more and more comprehensive picture of the structure space with representatives for all major protein folds and with the improved homology search methods linking the related sequences and structures to such representatives, comprehensive libraries of highly discriminative structural motifs are envisionable. [Pg.301]
