Public domain databases

The protein sequence database is also a text-numeric database with bibliographic links. It is the largest public domain protein sequence database. The current PIR-PSD release 75.04 (March, 2003) contains more than 280 000 entries of partial or complete protein sequences with information on functionalities of the protein, taxonomy (description of the biological source of the protein), sequence properties, experimental analyses, and bibliographic references. Queries can be started as a text-based search or a sequence similarity search. PIR-PSD contains annotated protein sequences with a superfamily/family classification. [Pg.261]

Technica has compiled computerized failure rate data from the public domain that can be developed into a database. Each database can be customized by adding client plant-specific data and updated easily in its electronic form. CLEF is also software compatible with the IRRAS fault tree package put out by EG&G. Failure rate libraries can be generated and imported from CLEF to the IRRAS program. [Pg.38]

Sources of incident data include a variety of public-domain databases, technical literature, and news accounts (Appendix E). Sources are categorized in Appendix E as reviewed only if incident data did not meet the CSB definition of reactive chemical incident (Section 1.3). [Pg.300]

As with chemical synthesis, the first step when prospecting for a particular biotransformation is to perform a literature search to check whether a suitable precedent has been described. Extensive technical literature resources in the public domain provide both examples of specific enzyme-catalysed reactions and descriptions of transformations where enzyme activity is inferred if not explicitly described. Currently, searches of online databases such as PubMed reveal over 2000 new publications per annum on the subject of enzyme catalysis (excluding reviews). [Pg.86]

A.2 Periodic Table of the Elements A.3 Public Domain Databases... [Pg.351]

Several attempts have been made to determine the accuracy of in silico prediction tools developed for lipophilicity (for a recent review, see [34]). The main factor limiting the accuracy of all predictive methods is the training sets used to generate the models, in terms of population and quality of the experimental data they contain. Since most of the methods proposed in commercial software were built with data available in the public domain, their accuracy can be expected to be comparable. Thus, in order to select the most suitable prediction tool, criteria other than accuracy have to be used, such as the speed of the calculation for large databases, the price of commercial software, or the application domain of the model. [Pg.96]

To make the best use of the technology, one of the prerequisites is the availability of extensive databases of toxicogenomics data, and there are already several databases, both public and commercial, that incorporate gene expression data with toxicology and biological end-points. The availability of highly annotated databases in the public domain would be extremely important to realize the full potential of such technology, and the formation of international consortia to harmonize the work would be a very effective way to move the field forward. [Pg.348]

Part of the EMBL, the European Bioinformatics Institute (EBI) is a centre for research and services in bioinformatics. The mission of the EBI is to ensure that information from molecular biology and genome research is placed in the public domain and is accessible freely to all facets of the scientific community. The Institute manages databases of biological data including nucleic acid, protein sequences, and macromolecular structures. [Pg.502]

Databases are electronic filing cabinets that serve as a convenient and efficient means of storing vast amounts of information. An important distinction exists between primary (archival) and secondary (curated) databases. The primary databases represent experimental results with some interpretation. Their record is the sequence as it was experimentally derived. The DNA, RNA, or protein sequences are the items to be computed on and worked with as the valuable components of the primary databases. The secondary databases contain the fruits of analyses of the sequences in the primary sources, such as patterns, motifs, functional sites, and so on. Most biochemical and/or molecular biology databases in the public domain are flat-file databases. Each entry of a database is given a unique identifier (i.e., an entry name and/or accession number) so that it can be retrieved uniformly by the combination of the database name and the identifier. [Pg.48]
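The retrieval rule described above (database name plus unique identifier) can be illustrated with a minimal Python sketch. It is only an in-memory stand-in for a set of flat-file databases, not the interface of any real resource; the class name `FlatFileIndex`, the database name `exampledb`, and the accession `X00001` are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key type: (database name, entry identifier).
Key = Tuple[str, str]


class FlatFileIndex:
    """Toy index that retrieves records by database name + identifier."""

    def __init__(self) -> None:
        self._records: Dict[Key, str] = {}

    @staticmethod
    def _key(database: str, identifier: str) -> Key:
        # Assumption: names and identifiers are matched case-insensitively.
        return (database.strip().lower(), identifier.strip().upper())

    def add(self, database: str, identifier: str, record: str) -> None:
        key = self._key(database, identifier)
        if key in self._records:
            # Identifiers must be unique within one database.
            raise ValueError(f"duplicate entry: {key[0]}:{key[1]}")
        self._records[key] = record

    def get(self, database: str, identifier: str) -> Optional[str]:
        # Returns None when no such entry exists.
        return self._records.get(self._key(database, identifier))


index = FlatFileIndex()
index.add("exampledb", "X00001", "ATGGCGTAA")  # made-up record
record = index.get("ExampleDB", "x00001")
missing = index.get("exampledb", "X99999")
```

The same identifier may appear in two different databases without conflict, because the database name is part of the lookup key.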

In addition to Target SAR databases, other databases contain information on ADME and toxicity parameters, and information on the proteins and signaling systems involved in the pathways of specific genes/proteins, such as sequence information, SNP details, and their functions. Information is derived from public domain data as well as patents and scientific journal articles. [Pg.164]

The development of protease assays is very straightforward if (1) the substrate recognition sequence for the protease of interest is known and (2) short peptides can be used as substrates. Extensive amounts of information about substrates are available from electronically searchable databases in the public domain. In addition, several strategies to experimentally identify substrates for proteases of unknown specificity have been described. The most recent method is the so-called PICS technology that covers all the sequences relevant in a human proteome. The MS/MS-based identification of the cleavage products and the subsequent identification of single substrate peptides... [Pg.43]

Today, the number of crystal structures in the public domain runs into hundreds of thousands, so that the traditional literature search has long since become impractical. Fortunately, a number of crystallographic databases are available, which store all the published and some unpublished structures and provide very efficient access to this wealth of information. [Pg.1129]

FIGURE 11.2. The 3D protein structure of fasciculin 1 derived from green mamba (Dendroaspis angusticeps) snake venom. [Image obtained from the public domain at the US National Library of Medicine, National Center for Biotechnology Information, Molecular Modeling Database 3D Structure Database (MMDB)]. [Pg.146]

As an essential component of NIH's Molecular Libraries Roadmap Initiative, PubChem is the largest chemical database in the public domain. As of October 2007 it contains 19 600 000 substance records for the Substance database and 10 900 000 unique compound records for the Compound database, with links to bioassay descriptions, literature, references, and assay data for each entry. Its BioAssay Database provides searchable descriptions of nearly 600 bioassays, including descriptions of the conditions and readouts specific to a screening protocol. [Pg.297]

Databases are the essential resource for work in this area. They are the repositories of sequence information on a variety of organisms, including the human genome, and databases in the public domain can all be accessed via the Internet. Sequences are stored in databases using the internationally agreed one-letter codes for nucleic acids and amino acids listed in Table 8-2. [Pg.314]
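As a rough illustration of what storing sequences in one-letter codes implies, the Python sketch below checks a sequence against the standard unambiguous alphabets. Table 8-2 is not reproduced here, so the letter sets are an assumption based on the usual IUPAC codes (four bases for DNA, four for RNA, twenty standard amino acids); ambiguity codes such as N or X are deliberately left out, and the function name `classify_sequence` is hypothetical.

```python
# Assumed alphabets: standard unambiguous IUPAC one-letter codes only.
DNA_CODES = frozenset("ACGT")
RNA_CODES = frozenset("ACGU")
PROTEIN_CODES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def classify_sequence(seq: str) -> str:
    """Return 'dna', 'rna', 'protein', 'unknown', or 'empty'.

    The alphabets overlap (A, C, and G are valid in all three), so the
    checks run from the smallest alphabet to the largest; a short peptide
    made only of A, C, G, and T would therefore be reported as 'dna'.
    """
    letters = set(seq.upper())
    if not letters:
        return "empty"
    if letters <= DNA_CODES:
        return "dna"
    if letters <= RNA_CODES:
        return "rna"
    if letters <= PROTEIN_CODES:
        return "protein"
    return "unknown"


dna_kind = classify_sequence("ATGC")
rna_kind = classify_sequence("AUGC")
protein_kind = classify_sequence("MKTAYIAK")
other_kind = classify_sequence("ATGX")
```

Because the alphabets overlap, real databases record the molecule type explicitly in each entry rather than inferring it from the letters alone.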

In terms of quantities of data, a single microarray experiment looking at 40,000 genes from 10 different samples, under 20 different conditions, produces at least 8,000,000 pieces of data (26). Chip technologies, though originally expensive because of the costs of chip fabrication, are now being used to contribute data to public domain databases and are... [Pg.344]
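The figure of 8,000,000 quoted above follows from multiplying the three counts given in the text. A short Python check of that arithmetic, assuming one measurement per gene, sample, and condition (the variable names are illustrative only):

```python
# Counts taken from the text above.
genes = 40_000
samples = 10
conditions = 20

# Assumption: exactly one data point per (gene, sample, condition) triple,
# which gives the lower bound ("at least") quoted in the text.
data_points = genes * samples * conditions

print(f"{data_points:,}")
```

Replicate measurements or several readouts per spot would multiply this figure further, which is consistent with the "at least" in the text.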

Whereas most patent sequences are available in the public domain for use in research and for commercial exploitation, there is a substantial body that is the subject of patent protection. It is often useful when conducting searches of sequence databases to be aware of the sequences that are patented, because this may imply certain restrictions on the use to which these sequences can be put in a commercial context. The commercial repository is maintained by Derwent (Thomson Scientific), which generates the Geneseq database of patented sequences. This is a useful collection because it contains a broad historical collection as well as more recent examples, although the terms for a commercial license to use the database may be off-putting to some potential users. There are also patent sections of the GenBank/EMBL DNA databases, but these are of limited value because they contain only more recent sequence data. [Pg.346]
