BLAST database

It is also possible to extend this concept to cover the presence of more than one distinct segment pair in a pair of sequences (for example, if three MSPs are present with scores of 40, … and 50, then one can calculate the probability of finding three pairs with at least a score of … by chance). The ability of BLAST to provide a quantitative estimate of the significance of any match found is a particularly useful feature of the program. Together with the program's continuing development and availability, this has made BLAST the most widely used method for sequence database searching. [Pg.549]
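The passage does not spell out how these probabilities are computed. The following is a minimal Python sketch that assumes the standard Karlin-Altschul expectation E = K·m·n·e^(−λS) for the number of chance MSPs scoring at least S. For the multiple-MSP case, it assumes a simple Poisson model for how many such MSPs occur by chance. The values λ = 0.3 and K = 0.1 in the usage lines are hypothetical placeholders, because real values depend on the scoring matrix and gap penalties.

```python
import math

def evalue(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance MSPs with score >= `score`
    for a query of length m searched against a database of total length n."""
    return k * m * n * math.exp(-lam * score)

def prob_at_least(count: int, expected: float) -> float:
    """Poisson probability of seeing `count` or more chance MSPs when
    `expected` such MSPs are expected (assumed Poisson model)."""
    below = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(count)
    )
    return 1.0 - below

# Hypothetical usage: lam and k are placeholder values, not real BLAST parameters.
e_single = evalue(80, m=300, n=10_000_000, lam=0.3, k=0.1)
p_three = prob_at_least(3, evalue(40, m=300, n=10_000_000, lam=0.3, k=0.1))
```

For the three-MSP example, the sketch evaluates E at the lowest of the three scores and then asks for the probability of at least three such MSPs. Later BLAST versions use the more refined Karlin-Altschul "sum statistics," which this sketch does not implement.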

Altschul S F, Madden T L, Schaffer A A, Zhang J, Zhang Z, Miller W, Lipman D J (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389-3402. [Pg.574]

Altschul S F, Madden T L, Schaffer A A, Zhang J, Zhang Z, Miller W, Lipman D J (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389-3402. [Pg.137]

SPTR is distributed in three files: sprot.dat.Z, trembl.dat.Z, and trembl_new.dat.Z. As their .Z extension indicates, these are Unix compress format files which, when decompressed, produce ASCII files in SWISS-PROT format. Three other files are also available (sprot.fas.Z, trembl.fas.Z, and trembl_new.fas.Z). These are compressed FASTA-format sequence files that are useful for building the databases used by FASTA, BLAST, and other sequence similarity search programs. They should not be used for other purposes, because all annotation is lost in this format. The SPTR files are stored in the directory /pub/databases/sp_tr_nrdb on the EBI FTP server (ftp.ebi.ac.uk) and in the directory /databases/sp_tr_nrdb on the ExPASy FTP server (ftp.expasy.ch). [Pg.67]
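The loss of annotation in the FASTA files can be seen by converting a SWISS-PROT-format entry to FASTA. The following minimal Python sketch assumes the input has already been decompressed, for example with `uncompress` or `gzip -d`. It reads only the ID line and the sequence block that follows the SQ line, up to the `//` terminator. Every other line type (AC, DE, OS, FT and so on) is skipped, so none of that annotation survives the conversion. The function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

def parse_swissprot(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (entry_name, sequence) pairs from SWISS-PROT-format text.

    Only the ID line and the sequence block are kept; all annotation
    lines are ignored, as in a FASTA export."""
    entry_id, seq_parts, in_seq = None, [], False
    for line in lines:
        if line.startswith("ID   "):
            entry_id = line.split()[1]          # entry name, e.g. "XXXX_HUMAN"
        elif line.startswith("SQ   "):
            in_seq = True                       # sequence lines follow
        elif line.startswith("//"):
            if entry_id is not None:
                yield entry_id, "".join(seq_parts)
            entry_id, seq_parts, in_seq = None, [], False
        elif in_seq:
            seq_parts.append("".join(line.split()))  # drop spacing between blocks

def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA with fixed-width lines."""
    out = []
    for entry_id, seq in records:
        out.append(f">{entry_id}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Usage: `to_fasta(parse_swissprot(open("sprot.dat")))`. The header keeps only the entry name. A real converter would usually also copy the accession number and description into the FASTA header.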

How does one go about finding all of the relevant proteins in a database once it has been decided to carry out an analysis of an entire protein family? The simplest approach is to use similarity search software such as SSEARCH or FASTA (Smith and Waterman, 1981; Pearson and Lipman, 1988) or BLAST (Altschul et al., 1997), with the amino acid sequences of one or two well-known members of the family as queries. The problem is initially the same as that of identifying all proteins that are homologous to a family of proteins, although with some important practical differ-... [Pg.112]
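SSEARCH is based on the Smith-Waterman local alignment algorithm cited above. The following minimal Python sketch shows that algorithm used to score database entries against one or two family members. It is a simplification: it assumes a flat match/mismatch score and a linear gap penalty, with hypothetical default values, whereas real tools use a substitution matrix such as BLOSUM62, affine gap penalties, and statistical significance estimates rather than a raw score cutoff.

```python
from typing import Dict, Iterable

def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Scoring values are illustrative assumptions, not those of SSEARCH."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def search(queries: Iterable[str], database: Dict[str, str], threshold: int) -> Dict[str, int]:
    """Return database entries whose best score against any query meets the threshold."""
    queries = list(queries)
    hits = {}
    for name, seq in database.items():
        score = max(smith_waterman(q, seq) for q in queries)
        if score >= threshold:
            hits[name] = score
    return hits
```

Using more than one query, as the text suggests, catches family members that are closer to one known sequence than to the other. The full dynamic-programming scan is why SSEARCH is slower than the heuristic FASTA and BLAST programs.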

This chapter begins with an introduction to protein domains, followed by the steps usually taken to define domains in a protein. The process begins by looking for well-known domains in the sequence using domain family databases. Then other, less well-known domains are sought in the sequence using two popular methods, HMMER and PSI-BLAST. [Pg.138]
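Both HMMER and PSI-BLAST score sequences against a position-specific model built from an alignment of family members rather than against a single sequence. The following minimal Python sketch shows only the simplest form of that idea: a log-odds position-specific scoring matrix (PSSM), of the kind PSI-BLAST builds at each iteration, followed by an ungapped scan of a sequence. It assumes a uniform background frequency, a flat pseudocount, no sequence weighting, and no gaps. PSI-BLAST's actual weighting and pseudocount schemes are more elaborate, and HMMER's profile HMMs add insert and delete states that this sketch does not model. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment: Sequence[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds PSSM (bits) from equal-length aligned sequences.

    Assumes a uniform background; gap characters are simply not counted."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            if seq[col] in counts:
                counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS})
    return pssm

def best_window_score(pssm: List[Dict[str, float]], seq: str) -> float:
    """Highest ungapped score of the PSSM over all windows of seq.

    Unknown residues score 0; returns -inf if seq is shorter than the PSSM."""
    width = len(pssm)
    best = float("-inf")
    for start in range(len(seq) - width + 1):
        score = sum(pssm[i].get(seq[start + i], 0.0) for i in range(width))
        best = max(best, score)
    return best
```

In an iterative search, the hits that score well would be added to the alignment, the PSSM rebuilt, and the database scanned again until no new members are found.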

The SBASE database is a collection of annotated protein sequence segments (Murvai et al., 1999). SBASE avoids using consensus methods such as profile-HMMs and uses pairwise methods to detect domains. The database includes more than 130,000 annotated sequence segments that have been clustered into groups on the basis of BLAST similarities. SBASE currently contains 1038 domain families. [Pg.147]
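The text does not say exactly how SBASE groups segments on the basis of BLAST similarities. One simple possibility is single-linkage clustering. Here, any two segments connected by a hit above a score threshold end up in the same group, directly or through intermediates. The following minimal Python sketch implements that with a union-find structure. The threshold, the `(a, b, score)` hit format, and the function name are assumptions for illustration, not SBASE's documented procedure.

```python
from typing import Dict, Iterable, List, Tuple

def cluster(segments: Iterable[str],
            hits: Iterable[Tuple[str, str, float]],
            min_score: float) -> List[List[str]]:
    """Single-linkage clustering of segments from pairwise similarity hits.

    Every segment named in `hits` must also appear in `segments`."""
    parent: Dict[str, str] = {s: s for s in segments}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        if score >= min_score:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra          # merge the two groups

    groups: Dict[str, List[str]] = {}
    for s in parent:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())
```

Single linkage is easy to compute but can chain unrelated segments together through a series of moderate hits. A real pipeline would therefore need a stringent threshold or additional checks.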

Big Chemical Encyclopedia