Protein folding class prediction

One approach to protein structure prediction is to classify the folding patterns of globular proteins. It rests on the observation, from examining known tertiary structures, that the variety of protein folding patterns is significantly restricted. A new protein is therefore likely to belong to one of the previously identified folding patterns. [Pg.123]

Various schemes have been developed for classifying protein three-dimensional structures. One common scheme is based on four tertiary super classes: all-α (proteins with mainly α-helix secondary structure), all-β (mainly β-sheet secondary structure), α+β (a segment of α-helices followed by a segment of β-sheet), and α/β (alternating or mixed α-helix and β-sheet segments) (Levitt, 1976). A fifth class is often added to account for globular proteins with irregular secondary [Pg.123]
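To make the four super classes concrete, the Python sketch below assigns a class from a per-residue secondary structure string (H = helix, E = strand, C = coil). It uses helix and strand fractions, and the number of helix/strand switches along the chain to separate α+β (segregated blocks) from α/β (alternating segments). The letter codes, the MAIN_MIN and MINOR_MAX thresholds, and the switch rule are assumptions for illustration, not Levitt's published criteria.

```python
from itertools import groupby

# Hypothetical thresholds (fractions of residues); not a published rule.
MAIN_MIN = 0.15   # minimum content for a state to count as "mainly" present
MINOR_MAX = 0.10  # maximum content for a state to count as essentially absent


def regular_segments(ss: str) -> list[str]:
    """Ordered helix/strand runs of an H/E/C string, with coil runs dropped."""
    return [state for state, _ in groupby(ss) if state in "HE"]


def structural_class(ss: str) -> str:
    if not ss:
        raise ValueError("empty secondary structure string")
    helix = ss.count("H") / len(ss)
    strand = ss.count("E") / len(ss)
    if helix >= MAIN_MIN and strand < MINOR_MAX:
        return "all-alpha"
    if strand >= MAIN_MIN and helix < MINOR_MAX:
        return "all-beta"
    if helix >= MAIN_MIN and strand >= MAIN_MIN:
        segs = regular_segments(ss)
        # Count switches between helix and strand segments along the chain.
        switches = sum(a != b for a, b in zip(segs, segs[1:]))
        # One switch: a helix block and a strand block (alpha+beta);
        # more switches: alternating or mixed segments (alpha/beta).
        return "alpha+beta" if switches <= 1 else "alpha/beta"
    return "other"  # little or ambiguous regular structure (fifth class)
```

For example, `structural_class("HHHHCEEEECHHHHCEEEE")` returns `"alpha/beta"` because helix and strand segments alternate three times.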

When a similar approach was applied to discriminate members of a given folding class from members of all other classes in the more comprehensive SCOP database, it was shown that specific amino acid properties contribute differently to the discrimination of different folding classes. [Pg.124]

Knowing a protein's tertiary structural class can help improve the accuracy of secondary structure prediction (Kneller et al., 1990). Conversely, Chandonia and Karplus (1995) showed that information obtained from a secondary structure prediction algorithm can improve the accuracy of structural class prediction. The input layer of their network had 26 units: 20 for the amino acid composition of the protein, 1 for the sequence length, and 5 for secondary structure characteristics predicted by a separate secondary structure neural network. These five characteristics are the predicted percentages of helix and sheet, the percentages of strong helix and sheet predictions, and the predicted number of alternations between helix and sheet. The output layer had four units, one for each of the tertiary super classes (all-α, all-β, α/β, and other). Including these single-sequence secondary structure predictions improved class prediction for non-homologous proteins by more than 11 percentage points, from a predictive accuracy of 62.3% to 73.9%. [Pg.125]
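A minimal sketch of how the 26 input units described above could be assembled for one protein. The H/E/C state letters, the 0.8 cutoff for a "strong" prediction, the length scaling constant, and the function name `class_features` are hypothetical; Chandonia and Karplus's exact feature definitions may differ. Only the input vector is built here; the network and its four output units are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STRONG = 0.8      # hypothetical confidence cutoff for a "strong" prediction
MAX_LEN = 1000.0  # hypothetical scale that maps sequence length into [0, 1]


def class_features(seq: str, ss: str, conf: list[float]) -> list[float]:
    """Build the 26 network inputs for one protein.

    seq:  amino acid sequence (one-letter codes, upper case)
    ss:   predicted state per residue: H (helix), E (sheet) or C (other)
    conf: predictor confidence per residue, in [0, 1]
    Percentages are returned as fractions.
    """
    n = len(seq)
    if n == 0 or len(ss) != n or len(conf) != n:
        raise ValueError("seq, ss and conf must be non-empty and equal in length")
    composition = [seq.count(aa) / n for aa in AMINO_ACIDS]  # 20 units
    length = [min(n / MAX_LEN, 1.0)]                         # 1 unit
    pct_helix = ss.count("H") / n
    pct_sheet = ss.count("E") / n
    strong_helix = sum(s == "H" and c >= STRONG for s, c in zip(ss, conf)) / n
    strong_sheet = sum(s == "E" and c >= STRONG for s, c in zip(ss, conf)) / n
    regular = [s for s in ss if s in "HE"]  # drop coil, keep chain order
    alternations = sum(a != b for a, b in zip(regular, regular[1:]))
    # 5 secondary structure units
    ss_units = [pct_helix, pct_sheet, strong_helix, strong_sheet, float(alternations)]
    return composition + length + ss_units
```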

The inherent variability of the predictive success rate across protein fold classes leads to two important observations. (1) When reporting accuracies, the test set should be balanced so that it includes a representative number of proteins from each fold class. (2) Prior knowledge of the protein fold class (Chou and Zhang, 1995) can be a valuable aid to prediction; with it, different algorithms can be used in combination to predict a specific structural element of the chain. [Pg.793]
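Observation (1) can be made concrete with a stratified draw that takes the same number of test proteins from every fold class. A minimal sketch, assuming each protein already carries a fold-class label; the function name and the equal-count policy are illustrative choices, not a procedure from the source.

```python
import random
from collections import defaultdict


def balanced_test_set(proteins, per_class, seed=0):
    """Draw the same number of test proteins from every fold class.

    proteins: iterable of (protein_id, fold_class) pairs.
    Raises ValueError if a class has fewer than per_class members.
    """
    by_class = defaultdict(list)
    for pid, fold in proteins:
        by_class[fold].append(pid)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test = []
    for fold in sorted(by_class):
        members = by_class[fold]
        if len(members) < per_class:
            raise ValueError(f"class {fold!r} has only {len(members)} proteins")
        test.extend((pid, fold) for pid in rng.sample(members, per_class))
    return test
```

Raising an error for an under-represented class, rather than silently taking fewer proteins, keeps the reported accuracy from being skewed toward the well-populated classes.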

The relationship between structure and function is a true many-to-many relation. Recent studies have shown that a particular function can be mounted onto several different protein folds [85] and, conversely, that several protein fold classes can perform a wide range of functions [259]. This limits our ability to deduce function from structure. It is still possible, however, to use knowledge of the range of folds supporting a particular function, and of the range of functions implemented by particular folds, to make functional predictions from structure. [Pg.300]
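The many-to-many relation can be held in two indexes built from observed (fold, function) pairs: one from fold to functions and one from function to folds. A fold-based functional prediction then amounts to ranking the functions already seen on that fold. A minimal sketch with hypothetical fold and function labels; ranking by frequency is one simple choice, not the method of the cited studies.

```python
from collections import Counter, defaultdict


def build_indexes(annotations):
    """annotations: (fold, function) pairs observed in known structures.
    Returns fold -> Counter of functions and function -> set of folds."""
    fold_to_funcs = defaultdict(Counter)
    func_to_folds = defaultdict(set)
    for fold, func in annotations:
        fold_to_funcs[fold][func] += 1
        func_to_folds[func].add(fold)
    return fold_to_funcs, func_to_folds


def candidate_functions(fold, fold_to_funcs):
    """Functions previously seen on this fold, most frequent first."""
    return [func for func, _ in fold_to_funcs.get(fold, Counter()).most_common()]
```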

Table 10.1 summarizes neural network applications for protein structure prediction. Protein secondary structure prediction is often used as the first step toward understanding and predicting tertiary structure, because secondary structure elements constitute the building blocks of the folding units. Roughly 90% of the residues in most proteins are estimated to lie in one of three classes of secondary structure: α-helices, β-strands, or reverse turns. Closely related to secondary structure prediction is the prediction of solvent accessibility, transmembrane helices, and secondary structure content (10.2). Neural networks have also been applied to protein tertiary structure prediction, such as the prediction of backbones or side-chain packing, and to structural class prediction (10.3). [Pg.116]
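Neural network secondary structure predictors commonly present each residue to the network as a window of neighboring residues. The sketch below shows one such encoding, assuming a 13-residue window and a 21-unit one-hot block per window position (20 amino acids plus one unit for positions beyond the chain ends or non-standard residues). These choices are illustrative and are not taken from Table 10.1.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 13  # hypothetical window size; must be odd so the residue is centered


def encode_windows(seq: str, window: int = WINDOW) -> list[list[int]]:
    """One input vector per residue: a 21-unit one-hot block for each
    window position, centered on that residue."""
    half = window // 2
    width = len(AMINO_ACIDS) + 1  # last unit marks padding or unknown residue
    vectors = []
    for center in range(len(seq)):
        vec = [0] * (window * width)
        for offset in range(-half, half + 1):
            pos = center + offset
            slot = (offset + half) * width
            if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
                vec[slot + AMINO_ACIDS.index(seq[pos])] = 1
            else:
                vec[slot + width - 1] = 1  # beyond chain end or non-standard
        vectors.append(vec)
    return vectors
```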

All the methods developed so far try to extract information, directly or indirectly (Lim, 1974), from the ever-growing databases of protein structures resolved by X-ray crystallography. Unfortunately, the rate at which new structures are added to these databases is far from optimal. Chothia (1992) estimated that all proteins, once their structures are known, would fall into about one thousand folding classes, more than half of them yet to be discovered. If so, a great deal of the information in forthcoming structures is not yet available to current methods, and we must still wait for future structures to bring a consistent, realistic increase in the accuracy of secondary structure prediction methods. [Pg.783]

The first observation from this kind of analysis is that, for all types of measures used, the behavior of predictive methods varies significantly with the protein fold family. This can help identify which method performs best for predicting a given structural element in a given protein class. Conversely, it also makes it possible to diagnose critical points where the algorithms fail. [Pg.791]
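Breaking accuracy down by fold class is what exposes these differences. A minimal sketch, assuming each test protein contributes a count of correctly predicted residues and a total; pooling residues within a class, and the function names, are illustrative choices rather than the measures used in the source.

```python
from collections import defaultdict


def per_class_accuracy(records):
    """records: (fold_class, n_correct, n_total) per test protein, for example
    residues whose secondary structure state was predicted correctly.
    Returns the pooled accuracy for each fold class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for fold, n_correct, n_total in records:
        correct[fold] += n_correct
        total[fold] += n_total
    return {fold: correct[fold] / total[fold] for fold in total if total[fold] > 0}


def weakest_class(accuracy):
    """The fold class on which the method performs worst."""
    return min(accuracy, key=accuracy.get)
```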

The nnpredict algorithm uses a two-layer, feed-forward neural network to assign the predicted type for each residue (Kneller et al., 1990). To make its predictions, the server takes a FASTA-format file with the sequence in either one-letter or three-letter code, as well as the folding class of the protein (α, β, or α/β). Residues are classified... [Pg.264]

The Option line specifies the folding class of the protein: n uses no folding class for the prediction, a specifies α, b specifies β, and a/b specifies α/β. Only one sequence may be submitted per e-mail message. The results returned by the server are shown in modified form in Figure 11.4. [Pg.265]
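The sketch below shows only the data flow of a small feed-forward network that assigns one of three states to a residue, with the folding class from the Option codes (n, a, b, a/b) appended as extra input units. Reading "two-layer" as one hidden layer plus an output layer, the one-hot class encoding (all zeros for n), the output labels H, E and -, and the untrained random weights are all assumptions; this is not the actual nnpredict architecture or its trained weights.

```python
import math
import random

STATES = "HE-"  # assumed output labels: helix, strand, other

# Hypothetical one-hot encoding of the Option codes; "n" (no class) is all zeros.
FOLD_CLASS_UNITS = {"n": [0, 0, 0], "a": [1, 0, 0], "b": [0, 1, 0], "a/b": [0, 0, 1]}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class FeedForward:
    """Inputs -> one sigmoid hidden layer -> one sigmoid unit per state.
    Weights are random and untrained, so predictions are meaningless."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        # Each row holds one weight per input plus a trailing bias weight.
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)] for _ in STATES]

    @staticmethod
    def _layer(weights, inputs):
        # zip stops at len(inputs), so row[-1] is added separately as the bias.
        return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1]) for row in weights]

    def forward(self, inputs):
        if len(inputs) != len(self.w1[0]) - 1:
            raise ValueError("wrong number of inputs")
        return self._layer(self.w2, self._layer(self.w1, inputs))

    def predict(self, inputs) -> str:
        scores = self.forward(inputs)
        return STATES[scores.index(max(scores))]


def predict_residue(net, window_features, fold_class="n"):
    """Append the folding-class units to one residue's window features."""
    return net.predict(list(window_features) + FOLD_CLASS_UNITS[fold_class])
```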

Fortunately, a 3D model does not have to be absolutely perfect to be helpful in biology, as demonstrated by the applications listed above. However, the type of question that can be addressed with a particular model does depend on the model's accuracy. At the low end of the accuracy spectrum are models based on less than 25% sequence identity, which sometimes have less than 50% of their atoms within 3.5 Å of their correct positions. Such models nevertheless have the correct fold, and knowing only the fold of a protein is frequently sufficient to predict its approximate biochemical function. More specifically, only nine of the 80 fold families known in 1994 contained proteins (domains) that were not in the same functional class, although 32% of all protein structures belonged to one of the nine superfolds [229]. Models in this low range of accuracy, combined with model evaluation, can be used to confirm or reject a match between remotely related proteins [9,58]. [Pg.295]
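The accuracy measure quoted above, the fraction of atoms within 3.5 Å of their correct positions, can be computed as follows. This is a minimal sketch assuming the model and the experimental structure are already superposed and list the same atoms in the same order; the superposition step is omitted and the function name is hypothetical.

```python
import math


def fraction_within(model, reference, cutoff=3.5):
    """Fraction of atoms whose model position lies within `cutoff` angstroms
    of the reference position. Coordinates are (x, y, z) tuples in the same
    atom order, already superposed."""
    if len(model) != len(reference) or not model:
        raise ValueError("need equal, non-empty coordinate lists")
    close = sum(1 for p, q in zip(model, reference) if math.dist(p, q) <= cutoff)
    return close / len(model)
```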

For each fold, one searches for the best alignment of the target sequence that would be compatible with the fold: the core should comprise hydrophobic residues and polar residues should be on the outside, predicted helical and strand regions should be aligned to corresponding secondary structure elements in the fold, and so on. To match a sequence to a fold, Eisenberg developed a rapid method called the 3D profile method. The environment of each residue position in the known 3D structure is characterized by three properties: (1) the area of the side chain that is buried by other protein atoms, (2) the fraction of the side-chain area that is covered by polar atoms, and (3) the secondary structure, classified into three states: helix, sheet, and coil. The residue positions are rather arbitrarily divided into six classes by properties 1 and 2, which in combination with property 3 yields 18 environmental classes. This classification of environments enables a protein structure to be coded as a sequence in an 18-letter alphabet, in which each letter represents the environmental class of a residue position. [Pg.353]
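The environment coding can be sketched as a two-step mapping: properties 1 and 2 select one of six burial/polarity classes, and property 3 multiplies these by three secondary structure states to give 18 letters. The area and polarity cutoffs, the letter assignment, and the function names below are placeholders chosen for illustration, not the values used in the 3D profile method.

```python
SS_STATES = "HEC"  # helix, sheet, coil
# One symbol per environment class; the letters themselves are arbitrary.
LETTERS = "ABCDEFGHIJKLMNOPQR"


def burial_polarity_class(buried_area: float, polar_fraction: float) -> int:
    """Map properties 1 and 2 to one of six classes (0-5).
    The cutoffs are placeholders, not the published values."""
    if buried_area < 40.0:
        return 0  # exposed
    if buried_area < 100.0:
        return 1 if polar_fraction >= 0.5 else 2  # partially buried
    if polar_fraction >= 0.6:
        return 3  # buried, mostly polar surroundings
    if polar_fraction >= 0.4:
        return 4
    return 5  # buried, mostly apolar surroundings


def environment_letter(buried_area, polar_fraction, ss):
    """Combine the six burial/polarity classes with three SS states: 6 x 3 = 18."""
    cls = burial_polarity_class(buried_area, polar_fraction)
    return LETTERS[cls * len(SS_STATES) + SS_STATES.index(ss)]


def encode_structure(residues):
    """residues: (buried_area, polar_fraction, ss) per position -> 18-letter string."""
    return "".join(environment_letter(a, p, s) for a, p, s in residues)
```

A target sequence would then be scored against this string with a table giving each amino acid's preference for each of the 18 environments; that scoring and alignment step is not shown.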

Big Chemical Encyclopedia
