
The Database Approach

RDF descriptors exhibit a series of unique properties that correlate well with the similarity of structure models. Thus, it would be possible to retrieve a similar molecular model from a descriptor database by selecting the most similar descriptor. It may sound strange to use a database retrieval method again to elucidate the structure, and the obvious question is: why not directly use an infrared spectra database? The answer is simple. Spectral library identification is extremely limited: about 28 million chemical compounds have been reported in the literature, whereas only about 150,000 spectra are available in the largest commercial database. However, in most cases scientists work in a well-defined area of structural chemistry, and structure identification can then be restricted to special databases that already exist. The advantage of predicting a descriptor and then searching a descriptor database is that the descriptor database can easily be extended with any arbitrary compound, whether or not a corresponding spectrum exists. Thus, the structure space can be extended, or extrapolated, arbitrarily, whereas the spectrum space is limited. [Pg.181]

This simple fact is the major advantage of the database approach (which is also the basis for the modeling approach) over conventional spectra catalog searches. The approach is designed for use with an expert system and user-supplied databases. It generally provides a fast prediction, within 1 to 20 seconds, and a higher success rate than any other automated method of structure elucidation with infrared spectra. [Pg.181]
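As a rough illustration of the retrieval step described above, the following sketch performs a brute-force nearest-neighbor search over a dictionary of precomputed RDF descriptors, using the RMS difference as the similarity measure. The function and variable names are illustrative assumptions, not part of ARC or any published package.

```python
import numpy as np

def rms_difference(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square difference between two equally sampled descriptors."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def retrieve_most_similar(query: np.ndarray,
                          database: dict[str, np.ndarray]) -> tuple[str, float]:
    """Return the identifier and RMS value of the closest descriptor in the database."""
    best_id, best_rms = None, float("inf")
    for mol_id, descriptor in database.items():
        rms = rms_difference(query, descriptor)
        if rms < best_rms:
            best_id, best_rms = mol_id, rms
    return best_id, best_rms
```

In practice the query descriptor would be the one predicted from the infrared spectrum, and the retrieved database entry is the structure proposed to the user.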


The database approaches are heavily dependent on the size and quality of the database, particularly on the availability of entries that are related to the query structure. Such an approach is relatively fast; it is possible to predict the 1H NMR spectrum of a molecule with 50-100 atoms in a few seconds. The predicted values can be explained on the basis of the structures that were used for the predictions. Additionally, users can augment the database with their own structures and experimental data, allowing improved predictions for compounds similar to those added. [Pg.522]

In order to make as much data as possible on a structure and its determination available in the databases, approaches for automated data harvesting are being developed. Structure classification schemes, as implemented for example in the SCOP, CATH, and FSSP databases, elucidate the relationship between protein folds and function and shed light on the evolution of protein domains. [Pg.262]

An RDF descriptor cannot be back-transformed through explicit mathematical equations to provide the Cartesian coordinates of a 3D structure. Instead, we will focus on two other methods. The first method, the database approach, relies on the availability of a large, diverse descriptor database. The second method, the modeling approach, is a modeling technique designed to work without appropriate descriptor databases. [Pg.180]
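For orientation, a minimal sketch of how an RDF descriptor can be computed in the forward direction from 3D coordinates is shown below, using a Gaussian-smoothed radial distribution code of the form g(r) = Σ_{i<j} A_i A_j exp(−B (r − r_ij)²). The choice of atomic property A and smoothing parameter B is an assumption made here for illustration and may differ from the settings used in the text.

```python
import numpy as np

def rdf_descriptor(coords: np.ndarray, props: np.ndarray,
                   r_grid: np.ndarray, B: float = 100.0) -> np.ndarray:
    """Gaussian-smoothed RDF code evaluated on a fixed grid of distances.

    coords : (N, 3) Cartesian coordinates (hydrogens typically excluded)
    props  : (N,) atomic properties A_i, e.g. atomic numbers or partial charges
    r_grid : sampling points r in Angstroms
    B      : smoothing parameter controlling peak width
    """
    g = np.zeros_like(r_grid, dtype=float)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = np.linalg.norm(coords[i] - coords[j])
            g += props[i] * props[j] * np.exp(-B * (r_grid - r_ij) ** 2)
    return g
```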

The following steps outline the method for the database approach. The investigation can be performed with Algorithms for Radial Coding (ARC), presented in chapter 5 of this volume. [Pg.182]

The results prove the ability of the database approach to make correct predictions for a wide range of compounds, provided the compounds are available in the RDF descriptor database. Because, as mentioned previously, the RDF descriptor database can be compiled with any arbitrary compound, a prediction for any spectrum is generally possible. [Pg.187]

The database approach enables the prediction of structures that are already available in a descriptor database of arbitrary molecules. If the database contains no identical molecule but does contain similar ones, the modeling approach may provide a correct prediction. This approach is an enhancement of the previously described method that uses a modeling process to optimize the prediction (Figure 6.9) [52]. [Pg.187]

The most similar molecular descriptor is retrieved from a database in the same way as in the database approach. The retrieved molecule is referred to as the initial model. [Pg.187]

As in the database approach, the structure descriptor is calculated without hydrogen atoms, which can be added implicitly after the decoding process. However, the positions of the hydrogen atoms are stored and used later as potential vectors pointing to new atoms. Besides the previously used similarity criteria (RMS and R) for RDF descriptors, the difference in the number and position of peaks between two descriptors can be applied as an improvement criterion. [Pg.188]
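A minimal sketch of these criteria is given below: the RMS difference, the correlation coefficient R, and a crude peak comparison between two sampled descriptors. The peak definition used here (local maxima above a threshold) is an assumption; the criteria in the original work may be defined differently.

```python
import numpy as np

def rms(a, b):
    """Root-mean-square difference between two descriptors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def correlation_r(a, b):
    """Pearson correlation coefficient R between two descriptors."""
    return float(np.corrcoef(a, b)[0, 1])

def peak_positions(g, threshold=0.0):
    """Indices of local maxima above a threshold (assumed peak definition)."""
    g = np.asarray(g)
    return [i for i in range(1, len(g) - 1)
            if g[i] > g[i - 1] and g[i] > g[i + 1] and g[i] > threshold]

def peak_count_difference(g1, g2, threshold=0.0):
    """Difference in the number of detected peaks, used as an improvement criterion."""
    return abs(len(peak_positions(g1, threshold)) - len(peak_positions(g2, threshold)))
```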

The computation times are much higher than in the database approach, because the recalculations in the modeling process must be performed for each relevant initial model found in the database. Depending on the number of operations, this leads to between approximately 500 and 5,000 recalculations of new 3D models and RDF descriptors for each initial model. With about 100,000 compounds in the binary database for the initial models, this can result in several million calculations per prediction if several initial models are considered. The method can be improved by integrating a fast 3D structure generator into the prediction software; in this case, a reliable 3D structure is calculated directly after each modeling operation. [Pg.190]

The database approach is more straightforward and is well suited if user-defined databases with similar structures can be compiled. As previously mentioned, the advantage of the database approach is that the database can be compiled individually without needing the corresponding infrared spectra. This method can be seen as an intelligent database search. The success rate of this method depends on the chosen descriptor parameters. With the experimental conditions previously described, the database approach requires no experience in interpreting RDF descriptors and is therefore well suited for routine analysis. [Pg.190]

It can be said that these three main strategies have been applied equally, and very often in combination. Basically, the first approach implies the use of a faster computer or a parallel architecture. To some extent it sounds like a brute-force approach, but the exponential increase in computer power observed since 1970 has made the hardware solution one of the most popular approaches. The Chemical Abstracts Service (CAS) [10] was among the first to use the hardware solution by distributing the CAS database onto several machines. [Pg.297]

A number of other software packages are available to predict NMR spectra. The use of large NMR spectral databases is the most popular approach; it utilizes assigned chemical structures. In a more advanced approach, parameters such as solvent information can be used to refine the accuracy of the prediction. A typical application works with tables of experimental chemical shifts from experimental NMR spectra. Each shift value is assigned to a specific structural fragment. The query structure is dissected into fragments that are compared with the fragments in the database. For each coincidence, the experimental chemical shift from the database is used to compose the final set of chemical shifts for the... [Pg.519]
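The sketch below illustrates the fragment-lookup idea in its simplest form: each fragment of the query is matched against a table of experimentally assigned shifts, and the stored values are averaged. The fragment keys, shift values, and function names are placeholders invented for illustration; real systems use much richer fragment descriptions (for example HOSE codes) and many database hits per fragment.

```python
from statistics import mean

# Toy table: fragment key -> experimental chemical shifts (ppm) collected
# from assigned database spectra. Values are illustrative placeholders.
shift_table = {
    "CH3-C": [0.9, 1.0, 1.1],
    "CH2-O": [3.6, 3.7],
}

def predict_shifts(fragment_keys):
    """Average the stored experimental shifts for each matched fragment."""
    return {key: round(mean(shift_table[key]), 2)
            for key in fragment_keys if key in shift_table}

print(predict_shifts(["CH3-C", "CH2-O"]))   # {'CH3-C': 1.0, 'CH2-O': 3.65}
```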

Estimates of Composition. The best approach toward estimating the chemistry of most contaminant species is to assume chemical equilibrium. Computer programs and databases (qv) for calculating chemical equilibria are widely available (47). Care must be taken that all species of concern are in the database referenced by the program being used, and if necessary, important species must be added in order to get the complete picture. [Pg.58]

There are two main classes of loop modeling methods (1) the database search approaches, where a segment that fits on the anchor core regions is found in a database of all known protein structures [62,94], and (2) the conformational search approaches [95-97]. There are also methods that combine these two approaches [92,98,99]. [Pg.285]

Classicists believe that probability has a precise value; the uncertainty lies in finding that value. Bayesians believe that probability is not precise but is distributed over a range of values arising from heterogeneities in the database, past histories, construction tolerances, etc. The difference is subtle, but it distinguishes the two approaches. [Pg.50]
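A small numerical sketch of the contrast, for a failure probability estimated from k failures in n demands: the classical view yields a single point estimate, while the Bayesian view yields a whole posterior distribution (here a Beta posterior under an assumed uniform prior; the counts are purely illustrative).

```python
from scipy.stats import beta

k, n = 2, 100                        # illustrative data: 2 failures in 100 demands
p_classical = k / n                  # one "precise" value: 0.02
posterior = beta(k + 1, n - k + 1)   # distribution over plausible values (uniform prior)

print(p_classical)                   # 0.02
print(posterior.mean())              # ~0.029
print(posterior.interval(0.90))      # 90% credible interval
```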

Quantitative assessment requires historical data, which may be suspect for two reasons: there may be latent accidents not in the database, and past accidents may have been rectified and will not recur. In the absence of data, judgment based on experience and speculation must be used. Notwithstanding this weakness, the quantitative approach was adopted. The investigating team identified situations that could cause a number of public casualties. Events limited to the employees or which might cause single off-site casualties were not included in the assessment. [Pg.433]

This technique is the longest established of all the human reliability quantification methods. It was developed by Dr. A. D. Swain in the late 1960s, originally in the context of military applications, and was subsequently developed further in the nuclear power industry. A comprehensive description of the method and the database used in its application is contained in Swain and Guttmann (1983). Further developments are described in Swain (1987). The THERP approach is probably the most widely applied quantification technique, because it provides its own database and uses methods such as event trees, which are readily familiar to the engineering risk analyst. The most extensive application of THERP has been in nuclear power, but it has also been used in the military, chemical processing, transport, and other industries. [Pg.227]
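For readers unfamiliar with event trees, the arithmetic underlying this kind of quantification is simply the product of conditional probabilities along a failure path, as in the toy example below. The human error probability (HEP) values are purely illustrative and are not taken from the THERP database.

```python
# Toy two-branch HRA event tree: an operator may omit a procedural step,
# and a subsequent check may fail to recover the omission.
hep_omit_step   = 0.01   # illustrative HEP for omitting the step
hep_no_recovery = 0.1    # illustrative conditional HEP for failed recovery

p_unrecovered_failure = hep_omit_step * hep_no_recovery
print(p_unrecovered_failure)   # 0.001
```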

Although the steps outlined above would in theory be capable of generating a quantitative database, it seems unrealistic to expect the degree of cooperation that would be required across the industry to develop such a resource. A more likely possibility is that large multinationals will support the development of in-house databases, possibly using the same approach as advocated here. [Pg.254]

The need for a combined, overall chemical structure and data search system became clear to us some time ago, and resulted in the decision to build CHIRBASE, a molecular-oriented factual database. The concept behind this database approach reflects the importance of molecular interactions in chiral recognition mechanisms. Only a chemical information system permits recognition of the molecular key fingerprints of a new compound among the thousands of fingerprints of known compounds available in a database. [Pg.96]

From these initial results we have seen that this approach has exciting practical implications. However, we have also found that it does not match the accuracy of a database structure search, and the latter will certainly continue to be the best approach to CSP prediction for the separation of a particular structure. [Pg.122]

Toxicology. Many companies are known to use gene expression profiling to assess the potential toxicity of lead compounds. This approach may require a database of reference compounds with known pharmacological and toxicological properties. Lead compounds can be compared to the database to predict compound-related or mechanism-related toxicity [5]. [Pg.769]

For the Leadership Consortium, I used my networks within BT to identify a likely mentor in a given area. If this didn't work, I would use the database of BT people and contact details and make an initial approach by e-mail followed up with a telephone call. The e-mail contained a description of the mentee without giving away the identity, and what the mentee hoped to achieve from the mentoring. An attachment gave the background to the scheme and what was expected. [Pg.60]

Structure and substructure searching are very powerful ways of accessing a database, but they do assume that the searcher knows precisely the information that is needed, that is, a specific molecule or a specific class of molecules, respectively. The third approach to database searching, similarity searching, is less precise in nature because it searches the database for molecules that are similar to the user's query, without formally defining exactly how the molecules should be related (Fig. 8.3). [Pg.193]
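A minimal sketch of similarity searching over binary fingerprints, ranking database molecules by the Tanimoto coefficient against the query, is shown below. The fingerprints are toy bit sets; in a real system they would be generated from the structures themselves.

```python
def tanimoto(fp1: set, fp2: set) -> float:
    """Tanimoto (Jaccard) coefficient of two sets of 'on' fingerprint bits."""
    union = fp1 | fp2
    return len(fp1 & fp2) / len(union) if union else 0.0

query = {1, 4, 7, 9}
database = {
    "mol_A": {1, 4, 7, 8},
    "mol_B": {2, 3, 9},
    "mol_C": {1, 4, 7, 9, 12},
}

# Rank all database molecules by decreasing similarity to the query.
ranked = sorted(database.items(), key=lambda kv: tanimoto(query, kv[1]), reverse=True)
print([(name, round(tanimoto(query, fp), 2)) for name, fp in ranked])
# [('mol_C', 0.8), ('mol_A', 0.6), ('mol_B', 0.17)]
```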

