Databases, of amino acid sequences

The basic structural unit of these two-sheet β helix structures contains 18 amino acids, three in each β strand and six in each loop. A specific amino acid sequence pattern identifies this unit, namely a double repeat of a nine-residue consensus sequence Gly-Gly-X-Gly-X-Asp-X-U-X, where X is any amino acid and U is large, hydrophobic and frequently leucine. The first six residues form the loop and the last three form a β strand, with the side chain of U involved in the hydrophobic packing of the two β sheets. The loops are stabilized by calcium ions, which bind to the Asp residue (Figure 5.28). This sequence pattern can be used to search for possible two-sheet β structures in databases of amino acid sequences of proteins of unknown structure. [Pg.84]
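
A pattern search of this kind can be sketched with a regular expression. The following is a minimal illustration, not the method used in the source: the nine-residue consensus is written in one-letter code, the residue set standing in for "large, hydrophobic" U is an assumption, and the sequences in the toy database are invented for the example.

```python
import re

# Assumption: "large, hydrophobic" U is approximated by this residue set.
HYDROPHOBIC = "LIVFMYW"

# Gly-Gly-X-Gly-X-Asp-X-U-X in one-letter code ('.' matches any residue).
UNIT = f"GG.G.D.[{HYDROPHOBIC}]."

# Double repeat (18 residues). The lookahead reports overlapping candidates too.
DOUBLE_REPEAT = re.compile(f"(?=((?:{UNIT}){{2}}))")


def find_double_repeats(sequence):
    """Return (start_index, matched_text) for every 18-residue double repeat."""
    return [(m.start(), m.group(1)) for m in DOUBLE_REPEAT.finditer(sequence.upper())]


# Hypothetical toy database; these are not real protein sequences.
toy_database = {
    "toy_hit": "MKT" + "GGAGNDTLS" + "GGSGDDVLY" + "AAK",
    "toy_miss": "MKTAYIAKQRQISFVKSHFSRQ",
}

hits = {name: find_double_repeats(seq) for name, seq in toy_database.items()}
for name, found in hits.items():
    print(name, found)
```

A real search would run the same function over sequences read from a database file and would probably tolerate deviations from the consensus, which a strict regular expression does not.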

Modifications to the well-established technique of mass spectrometry now make it possible to determine protein masses with an accuracy of one mass unit or less in favorable cases. This ability has opened up many new horizons in the study of proteins. Most powerfully, the mass of a peptide or protein can be used as a name tag for picking out a specific molecule in a vast database of amino acid sequences. [Pg.92]
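
The "name tag" idea can be sketched as a mass lookup. This is a simplified illustration under stated assumptions: masses are monoisotopic, peptides are unmodified, the one-mass-unit tolerance mirrors the accuracy quoted above, and the database entries are invented.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide given in one-letter code."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def match_by_mass(observed_mass, database, tolerance=1.0):
    """Names of database entries whose mass lies within the tolerance."""
    return [name for name, seq in database.items()
            if abs(peptide_mass(seq) - observed_mass) <= tolerance]


# Hypothetical toy database of short peptides.
toy_peptides = {"pep_a": "GAS", "pep_b": "LLK", "pep_c": "WWF"}
print(match_by_mass(372.27, toy_peptides))
```

Leucine and isoleucine have identical masses, and lysine and glutamine differ by less than 0.04 Da, so a mass alone cannot separate such variants; in practice several peptide masses from one protein are matched together.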

It is only natural that, to date, bioinformatics tools contribute most to the analysis of amino acid sequences. Only a small amount of current sequence data is subjected to direct experimentation. The majority of amino acid sequences currently accessible in public databases have been derived by in silico translations of nucleic acid sequence data, despite the fact that amino acid sequencing was introduced historically long before nucleic acid sequencing. It is hard to predict the future of the experimental generation of primary data. Certainly, sequencing of nucleic acids continues to become cheaper and faster, and novel techniques may further enhance the production of data. DNA chips are already used to detect differences between very similar sequences; other methods may generate DNA data even more efficiently. [Pg.495]
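
An in silico translation can be sketched in a few lines. The example below assumes the standard genetic code, a single forward reading frame starting at the first base, and a coding-strand DNA input; real pipelines also handle alternative codes, all six frames and ambiguous bases.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate coding-strand DNA in frame 0, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' marks an unrecognised codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAGTAA"))
```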

Klein et al. (1986) reported the compositional or physicochemical properties (attributes) of amino acid sequences in 1603 protein sequences to establish a classification system for 26 protein functions. These results showed that three or four attributes were generally sufficient to distinguish each of the 26 functional categories from the remainder of the database. The attributes used were related to hydrophobicity, charge and its distribution (frequency or... [Pg.308]
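
Attributes of this general kind can be computed directly from a sequence. The sketch below is only an illustration: the Kyte-Doolittle hydropathy scale and the simple charge counts are stand-ins chosen for the example, not the attributes actually defined by Klein et al., and histidine is treated as uncharged.

```python
# Kyte-Doolittle hydropathy values (a stand-in scale, see the note above).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
POSITIVE = set("KR")  # assumption: histidine counted as neutral
NEGATIVE = set("DE")


def sequence_attributes(sequence):
    """Compute a few compositional attributes of a one-letter sequence."""
    n = len(sequence)
    n_pos = sum(aa in POSITIVE for aa in sequence)
    n_neg = sum(aa in NEGATIVE for aa in sequence)
    return {
        "mean_hydropathy": sum(KYTE_DOOLITTLE[aa] for aa in sequence) / n,
        "net_charge": n_pos - n_neg,
        "charged_fraction": (n_pos + n_neg) / n,
    }


attrs = sequence_attributes("KKDEAL")
print(attrs)
```

A classifier would then be trained on such attribute vectors; that step is outside the scope of this sketch.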

Other very useful databases (mainly in terms of amino acid sequences in allergens) are the ones which characterize proteins (e.g., The UniProt Knowledgebase, http://expasy.org/sprot/). This database consists of... [Pg.408]

Since that time many more sequences have become available through the advent of recombinant DNA technology and the deduction of amino acid sequences from the base sequences of cloned DNA. At the present time, the primary structures (amino acid sequences) of 14 proteins of the transferrin family have been established. These include seven serum transferrins, from human (10, 36), pig (37), horse (38), rabbit (39), toad (Xenopus laevis) (40), sphinx moth (M. sexta) (13), and cockroach (Blaberus discoidalis) (4); chicken (34, 35) and duck (41) ovotransferrins; four lactoferrins, from human (11, 42), mouse (43), pig (44) and cattle (45, 46); and the human tumor cell melanotransferrin (47). All of these sequences are available from sequence databases such as EMBL and SWISSPROT. [Pg.393]

A set of amino acid sequences, called reference sequences, is collected from the NBRF database for the proteins that have the same function. For the i-th segment of the target protein, a similarity search is done against one of the reference sequences. When the similarity score of the best local alignment is above a certain level (maxd score), the similarity score is saved. This step is repeated against each of the other reference sequences until all the reference sequences have been searched in pairs by the i-th segment. [Pg.114]
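
The segment-by-segment search can be sketched as follows. Several details are assumptions made for illustration: segments are taken as fixed-length sliding windows, the local alignment is scored Smith-Waterman style with simple match, mismatch and gap values instead of a substitution matrix, and `threshold` stands in for the cutoff level mentioned above.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    best = 0
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def scan_segments(target, references, segment_length, threshold):
    """Save scores above the threshold for each (segment index, reference) pair."""
    saved = {}
    for i in range(len(target) - segment_length + 1):
        segment = target[i:i + segment_length]
        for name, reference in references.items():
            score = local_alignment_score(segment, reference)
            if score > threshold:
                saved[(i, name)] = score
    return saved


# Hypothetical toy sequences, not taken from the NBRF database.
toy_references = {"ref1": "TTGGAGTT", "ref2": "PPPP"}
saved_scores = scan_segments("MKTGGAG", toy_references, segment_length=4, threshold=6)
print(saved_scores)
```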

A sequence tag from an unknown protein allows a number of further options for characterisation. Even a short stretch of amino acid sequence provides a powerful means of interrogating a protein database, and may provide a useful alternative to PMF in poor quality samples that may have peptides derived from more than one protein. More advanced database searching systems will find proteins with homologous sequences to that of the peptide tag. A good example of this type of database searching system is BLAST (Basic Local Alignment Search Tool) (Altschul et al., 1997). A modified form of this (MS BLAST) was used in conjunction with PMF to characterise the proteome of the yeast Pichia pastoris, whose genome has not been fully sequenced... [Pg.191]

Because of the advances in the gas-phase ionization of biomacromolecules, such as electrospray ionization (ESI) and matrix-assisted laser desorption ionization (MALDI), mass spectrometry (MS) has become a powerful tool for detection, identification, and structural analysis of proteins, peptides, and polynucleotides. The molecules ionized in a gas phase by these methods are subsequently analyzed by sector, quadrupole, ion-trap, or time-of-flight mass spectrometers. In particular, the MS systems consisting of ESI and triple-stage quadrupole (ESI/TSQ) or ion-trap (IT) mass spectrometry and MALDI time-of-flight (MALDI/TOF) mass spectrometry have been most widely applied to the field of protein chemistry for the accurate determination of molecular mass of proteins and peptides, determination of amino acid sequence, identification of proteins by peptide mass databases, and analysis of posttranslational modifications such as phosphorylation and glycosylation. In general, current techniques allow determi-... [Pg.646]

Besides such textual databases that provide bibliographic information, sequence databases have attained an even more important role in biochemistry. Sequence databases are composed of amino acid sequences of peptides or proteins as well as nucleotide sequences of nucleic acids. The 20 amino acids are mostly represented by a three-letter code or by one letter according to the biochemical conventions; the four nucleic acid bases are defined by a one-letter code. Thus the composition of a biochemical compound is searchable by text retrieval methods. [Pg.260]
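
Because sequences are stored as plain letter strings, ordinary text operations suffice for simple retrieval. The sketch below converts the three-letter code to the one-letter code and then does a substring search; the hyphen-separated input format and the toy database entries are assumptions for the example.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_sequence):
    """Convert a hyphen-separated three-letter sequence, e.g. 'Gly-Ala'."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_sequence.split("-"))


def find_motif(motif, database):
    """Map entry name to the first position of the motif, for entries containing it."""
    return {name: seq.find(motif) for name, seq in database.items() if motif in seq}


# Hypothetical toy database in one-letter code.
toy_entries = {"entry1": "MKTGGAGND", "entry2": "MSSPLLK"}
motif = to_one_letter("Gly-Gly-Ala-Gly")
print(motif, find_motif(motif, toy_entries))
```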

Figure 17.2 An example of prediction of the conformations of three CDR regions of a monoclonal antibody (top row) compared with the unrefined x-ray structure (bottom row). L1 and L2 are CDR regions of the light chain, and H1 is from the heavy chain. The amino acid sequences of the loop regions were modeled by comparison with the sequences of loop regions selected from a database of known antibody structures. The three-dimensional structures of two of the loop regions, L1 and L2, were in good agreement with the preliminary x-ray structure, whereas H1 was not. However, during later refinement of the x-ray structure, errors were found in the conformations of H1, and in the refined x-ray structure this loop was found to agree with the predicted conformations. In fact, all six loop conformations were correctly predicted in this case. (From C. Chothia et al., Science 233: 755-758, 1986.)...

Homologous proteins have similar three-dimensional structures. They contain a core region, a scaffold of secondary structure elements, where the folds of the polypeptide chains are very similar. Loop regions that connect the building blocks of the scaffolds can vary considerably both in length and in structure. From a database of known immunoglobulin structures it has, nevertheless, been possible to predict successfully the conformation of hyper-variable loop regions of antibodies of known amino acid sequence. [Pg.370]

The World Wide Web has transformed the way in which we obtain and analyze published information on proteins. What only a few years ago would take days or weeks and require the use of expensive computer workstations can now be achieved in a few minutes or hours using personal computers, both PCs and Macintosh, connected to the internet. The Web contains hundreds of sites of interest to molecular biologists, many of which are listed in Pedro's BioMolecular Research Tools (http://www.fmi.ch/biology/research_tools.html). Many sites provide free access to databases that make it very easy to obtain information on structurally related proteins, the amino acid sequences of homologous proteins, relevant literature references, medical information and metabolic pathways. This development has opened up new opportunities for even non-specialists to view and manipulate a structure of interest or to carry out amino acid sequence comparisons, and one can now rapidly obtain an overview of a particular area of molecular biology. We shall here describe some Web sites that are of interest from a structural point of view. Updated links to these sites can be found in the Introduction to Protein Structure Web site (http://www.ProteinStructure.com/). [Pg.393]

Big Chemical Encyclopedia
