Search and Sequence Alignment

TABLE 11.1. Classification of Amino Acids According to their Biochemical Properties [Pg.213]

DATABASE SEARCH AND SEQUENCE ALIGNMENT 11.2.1. Primary Database [Pg.213]

It is much easier and quicker to produce sequence information than to determine 3D structures of proteins in atomic detail. As a consequence, there is a protein sequence/structure deficit. In order to benefit from the wealth of sequence information, we must establish, maintain, and disseminate sequence databases; provide user-friendly software to access the information; and design analytical tools to visualize and interpret the structural/functional clues associated with these data. [Pg.213]

There are different classes of protein sequence databases. Primary and secondary databases are used to address different aspects of sequence analysis. Composite databases amalgamate a variety of different primary sources to make sequence searching more efficient. The primary structure (amino acid sequence) of a protein is stored in primary databases as linear alphabets that represent the constituent residues. The secondary structure of a protein, corresponding to regions of local regularity (e.g., α-helices, β-strands, and turns), which in sequence alignments are often apparent as conserved motifs, is stored in secondary databases as patterns. The tertiary structure of a protein, derived from the packing of its secondary structural elements, which may form folds and domains, is stored in structure databases as sets of atomic coordinates. Some of the most important protein sequence databases are PIR (Protein Information Resource), SWISS-PROT (at EBI and ExPASy), MIPS (Munich Information Center for Protein Sequences), JIPID (Japanese International Protein Sequence Database), and TrEMBL (at EBI). [Pg.213]
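To make the distinction between a primary entry (a linear string of residue letters) and a secondary-database pattern concrete, here is a minimal Python sketch. It assumes a simplified PROSITE-style pattern syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats); the function names and the example sequence are hypothetical, and the N-glycosylation pattern `N-{P}-[ST]-{P}` is used only as an illustration.

```python
import re

# The 20 standard amino acids in one-letter code (the "linear alphabet").
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_protein(seq: str) -> bool:
    """Return True if seq is non-empty and uses only the 20 standard letters."""
    return bool(seq) and set(seq.upper()) <= AMINO_ACIDS


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Assumption: only '-', 'x', '[..]', '{..}' and '(n)' / '(n,m)' are handled.
    """
    regex = pattern.replace("-", "")                    # separators carry no meaning
    regex = regex.replace("x", ".")                     # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} = anything but P
    regex = regex.replace("(", "{").replace(")", "}")   # (2,4) = repeat count
    return regex


def find_motif(seq: str, pattern: str) -> list:
    """Return (1-based start, matched text) for every hit, including overlaps."""
    # A lookahead with a capture group lets finditer report overlapping hits.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq.upper())]


if __name__ == "__main__":
    query = "MKNGSAANPTCNRTV"  # hypothetical sequence
    print(is_valid_protein(query))
    print(find_motif(query, "N-{P}-[ST]-{P}"))
```

A real secondary database carries far richer descriptors (profiles, fingerprints, hidden Markov models); the regular expression here only stands in for the simplest pattern type.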

The PIR Protein Sequence Database (Barker et al., 2001; Wu et al., 2002) developed at the National Biomedical Research Foundation (NBRF) has been maintained by PIR-International Protein Sequence Database (PSD), which is the largest publicly distributed and freely available protein sequence database. The consortium includes PIR at the NBRF, MIPS, and JIPID. PIR-International provides online access at http://pir.georgetown.edu to numerous sequence and auxiliary databases. These include PSD (annotated and classified protein sequences), PATCHX (sequences not yet in PSD), ARCHIVE (sequences as originally reported [Pg.213]

Proteomics is concerned with the analysis of the complete protein complements of genomes. Thus proteomics includes not only the identification and quantification of proteins, but also the determination of their localization, modifications, interactions, activities, and functions. This chapter focuses on protein sequences as the sources of biochemical information. Protein sequence databases are surveyed. Similarity search and sequence alignments using the Internet resources are described. [Pg.209]

The selection of a template typically follows BLAST-type searches and sequence alignments. The template selection is based on the similarity of sequences but neglects the possibility that templates with a similar structure may have differing protein functions. Threading provides a way to account for the possibility that functionally different proteins share similar structures. Instead of matching the target sequence to all possible sequences (with or... [Pg.73]

Figure 11.4. ExPASy Proteomic tools. The ExPASy server provides various tools for proteomic analysis, which can be accessed from ExPASy Proteomic tools. These tools (local or hyperlinked) include Protein identification and characterization, Translation from DNA sequences to protein sequences, Similarity searches, Pattern and profile searches, Post-translational modification prediction, Primary structure analysis, Secondary structure prediction, Tertiary structure inference, Transmembrane region detection, and Sequence alignment.

Paste the query sequence and click the Search button. The search result (BLAST against PDB and PSI-BLAST against SCOP), with a summary of the search and a multiple alignment, is returned. [Pg.252]

As has been described in Sect. 5.3, the conservation patterns of enzymes are often indicative of the particular family they belong to and can be used for their classification. However, the iterative searches and multiple alignment methods used for their establishment require a certain bioinformatic infrastructure as well as some experience with these issues. If the goal of the analysis is not the detection of novel enzyme families, but rather the classification of a novel sequence into one of the already existing enzyme families, there are a number of protein domain and motif databases that will be useful in this respect [60, 61]. These databases do not store the sequences themselves but rather work with descriptors of protein families and protein domains. These descriptors can consist of the Profiles or Hidden Markov Models mentioned above, but other types are also being used. With a particular... [Pg.154]

When the search is complete, a results page will appear. If no matches are found, this will be indicated on the page. An example is shown in Figure 4.12. If matches are found, a new page will display a graphic summary, sequence descriptions, and sequence alignments. Figure 4.13 shows an example. [Pg.115]

GD Schuler. Sequence alignment and database searching. Methods Biochem Anal 39:145-171, 1998. [Pg.302]

For each fold one searches for the best alignment of the target sequence that would be compatible with the fold: the core should comprise hydrophobic residues and polar residues should be on the outside, predicted helical and strand regions should be aligned to corresponding secondary structure elements in the fold, and so on. In order to match a sequence alignment to a fold, Eisenberg developed a rapid method called the 3D profile method. The environment of each residue position in the known 3D structure is characterized on the basis of three properties: (1) the area of the side chain that is buried by other protein atoms, (2) the fraction of side chain area that is covered by polar atoms, and (3) the secondary structure, which is classified in three states: helix, sheet, and coil. The residue positions are rather arbitrarily divided into six classes by properties 1 and 2, which in combination with property 3 yields 18 environmental classes. This classification of environments enables a protein structure to be coded by a sequence in an 18-letter alphabet, in which each letter represents the environmental class of a residue position. [Pg.353]
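The arithmetic of the environment classification (six burial/polarity classes times three secondary structure states gives 18 classes) can be sketched in Python as follows. The class labels, the numeric cut-offs, and the letter assignment are assumptions introduced for illustration only; the passage above does not give the actual thresholds, so they should not be read as the published values.

```python
from itertools import product
from string import ascii_uppercase

# Six classes from properties 1 and 2 (labels are illustrative).
BURIAL_POLARITY_CLASSES = ("E", "P1", "P2", "B1", "B2", "B3")
# Three secondary structure states from property 3: helix, sheet, coil.
SECONDARY_STATES = ("H", "S", "C")

# 6 x 3 = 18 environment classes, each mapped to one letter of an
# 18-letter alphabet (the letter assignment is arbitrary here).
ENVIRONMENT_CLASSES = [b + s for b, s in product(BURIAL_POLARITY_CLASSES, SECONDARY_STATES)]
CLASS_TO_LETTER = dict(zip(ENVIRONMENT_CLASSES, ascii_uppercase))

# Hypothetical cut-offs, NOT taken from the text.
EXPOSED_MAX_AREA = 40.0    # buried area (square angstroms) below this: exposed
PARTIAL_MAX_AREA = 114.0   # buried area up to this: partially buried


def burial_polarity_class(buried_area: float, polar_fraction: float) -> str:
    """Assign one of six classes from buried area and polar-covered fraction."""
    if buried_area < EXPOSED_MAX_AREA:
        return "E"
    if buried_area <= PARTIAL_MAX_AREA:
        return "P1" if polar_fraction < 0.67 else "P2"
    if polar_fraction < 0.45:
        return "B1"
    return "B2" if polar_fraction < 0.58 else "B3"


def environment_class(buried_area: float, polar_fraction: float, state: str) -> str:
    """Combine the burial/polarity class with the secondary structure state."""
    if state not in SECONDARY_STATES:
        raise ValueError("state must be one of H, S, C")
    return burial_polarity_class(buried_area, polar_fraction) + state


def encode_structure(residues) -> str:
    """Code a structure as a string over the 18-letter environment alphabet.

    residues: iterable of (buried_area, polar_fraction, state) tuples.
    """
    return "".join(CLASS_TO_LETTER[environment_class(*r)] for r in residues)


if __name__ == "__main__":
    print(len(ENVIRONMENT_CLASSES))
    print(encode_structure([(10.0, 0.9, "H"), (150.0, 0.3, "S")]))
```

Once a known fold is written as such an environment string, a target sequence can be scored against it position by position, which is what makes the comparison fast.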

The aim of the first dimension, breadth, is to reveal sequence-function relationships by comparing protein sequences by sequence similarity. Simple bioinformatic algorithms can be used to compare a pair of related proteins or for sequence similarity searches, e.g., BLAST (Basic Local Alignment Search Tool). Improved algorithms allow multiple alignments of a larger number of proteins and extraction of consensus sequence patterns and sequence profiles or structural templates, which can be related to some functions; see, e.g., http://www.expasy.ch/tools/#similarity. [Pg.777]
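As an illustration of what a pairwise local comparison computes, the following Python sketch scores two sequences with the Smith-Waterman dynamic programming recurrence. This is not the BLAST algorithm, which relies on heuristics to search large databases quickly; it is the exact local alignment method that such tools approximate. The flat match/mismatch/gap scores are assumptions chosen for simplicity; practical searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the score matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or mismatch score.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Hypothetical query and database entry.
    print(smith_waterman_score("MKTAYIAK", "TAYI"))
```

Ranking database entries by such a score, highest first, gives the basic shape of a similarity search result list.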

The conserved residues identified as 50 were chosen from a sequence alignment analysis. There are more than 100 residues among a set of 51 mammalian sodium-dependent NTs of known function that are 100% conserved (alignment not shown). To decide which residue is the most appropriate reference residue for each of the TMs, we searched for additional sequences with similarity to the mammalian NTs, and investigated the decrease in conservation for alignments of increasing size. [Pg.214]
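The idea of tracking how conservation decreases as more sequences are added to an alignment can be sketched in Python as below. The toy alignment and the function names are hypothetical, and the sketch assumes the sequences are already aligned to equal length with `-` marking gaps; it is not the procedure used by the authors.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where every sequence has the same residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in alignment}
        # A column counts as 100% conserved only if it holds one residue and no gap.
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


def conservation_by_size(alignment):
    """Count fully conserved columns for the first 1, 2, ..., n sequences."""
    return [len(conserved_columns(alignment[:n]))
            for n in range(1, len(alignment) + 1)]


if __name__ == "__main__":
    toy_alignment = ["MKLV", "MKIV", "MRLV"]  # hypothetical aligned sequences
    print(conserved_columns(toy_alignment))
    print(conservation_by_size(toy_alignment))
```

Positions that stay conserved as the alignment grows are the natural candidates for a reference residue.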
