Databases definition from sequence

Multiple alignments of repeats are constructed iteratively. The initial alignment is based on definitions from determined protein structures or, failing that, from the literature. In the initial database search step, a profile constructed from the multiple alignment is compared with a sequence database. Top-scoring sequences are examined using complementary approaches such as PSI-BLAST and FASTA to set two thresholds: the minimum E value and the minimum number of repeats per protein required. After one or two iterations, the final alignment and the thresholds are stored in the SMART database to allow the detection of repeats in any sequence. [Pg.212]
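The two-threshold idea can be illustrated with a minimal Python sketch. It is not SMART's implementation. The `RepeatHit` record, the `proteins_passing` function, and the convention that a hit counts when its E value is at or below the cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RepeatHit:
    """One profile match against a protein (hypothetical record layout)."""
    protein_id: str
    e_value: float


def proteins_passing(hits, e_value_cutoff, min_repeats):
    """Return the IDs of proteins that satisfy both thresholds.

    Assumption: a hit counts only if its E value is at or below the cutoff,
    and a protein is reported only if it has at least min_repeats such hits.
    """
    counts = {}
    for hit in hits:
        if hit.e_value <= e_value_cutoff:
            counts[hit.protein_id] = counts.get(hit.protein_id, 0) + 1
    return sorted(pid for pid, n in counts.items() if n >= min_repeats)
```

With such a filter, a protein carrying a single strong match is still rejected when at least two repeats are required, which is the purpose of the second threshold.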

The major advantage of the tandem mass spectrometry approach over MALDI peptide fingerprinting is that the sequence information obtained from the peptides identifies a protein more specifically than the peptide masses alone. This permits a search of expressed sequence tag nucleotide databases to discover new human genes based upon identification of the protein. This is a useful approach because, by definition, the genes identified actually express a protein. [Pg.14]

The GENEMAN application is a tool that allows you to access and search for DNA and protein sequences located in six different biological databases. The search for a sequence of interest can be made as broad or restrictive as desired, since there are 12 different fields (definition, reference, source, accession number, etc.) to choose from when the search is performed. In addition to performing database searches to find sequences of interest, GENEMAN allows you to search the database for sequences that share homology with the sequence of interest, or for entries that contain a particular conserved sequence. Any number of different DNA or protein sequences found in these databases can be isolated and stored as a sequence file for later analysis. [Pg.402]
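A field-restricted search of this kind can be sketched in a few lines of Python. This is not GENEMAN's interface. The record layout, the field names used as dictionary keys, and the `search_records` function are hypothetical, and matching by case-insensitive substring is an assumption.

```python
def search_records(records, **field_terms):
    """Return the records whose named fields all contain the given terms.

    records: list of dicts mapping field names (e.g. "definition", "source")
    to strings. Each keyword argument restricts one field; adding more
    keyword arguments narrows the search.
    """
    matches = []
    for record in records:
        if all(
            term.lower() in record.get(field, "").lower()
            for field, term in field_terms.items()
        ):
            matches.append(record)
    return matches
```

Calling `search_records(records, definition="kinase")` gives a broad search, while adding `source="human"` restricts it to entries that match both fields.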

The possibility of identifying and quantifying protein splicing variants by mass spectrometry has attracted great interest from researchers in recent years, owing to their variety of biological functions and their importance in many health- and disease-related processes. However, database searches are not yet optimized, and the success of this approach will depend on balancing the inclusion of all putative proteoform sequences (163) against the reduction of database size to control sequence redundancy and false positives. [Pg.402]

Figure 11.1. Results of a PROPSEARCH database query based on amino acid composition. The input sequence used was that of the human autoantigen NOR-90. Explanatory material and a histogram of distance scores against the entire target database have been removed for brevity. The columns in the table give the rank of the hit based on the distance score, the SWISS-PROT or PIR identifier, the distance score, the length of the overlap between the query and subject, the positions of the overlap (from POS1 to POS2), the calculated pI, and the definition line for the found sequence.
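To make the idea of a composition-based distance score concrete, here is a minimal Python sketch. It is not PROPSEARCH's scoring scheme, which is not described in this excerpt. The choice of a plain Euclidean distance between 20-component amino acid frequency vectors, and the function names, are assumptions.

```python
import math
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_distance(seq_a, seq_b):
    """Euclidean distance between two composition vectors (assumed metric)."""
    return math.dist(composition(seq_a), composition(seq_b))
```

Because only residue frequencies enter the calculation, two sequences with the same composition but a different residue order have a distance of zero, so a composition-based query can find relatives that alignment-based searches miss, and can also produce hits unrelated by sequence.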
