Spectrum prediction using databases

FIGURE 6.4 (continued) (c) Derivation of the 3D structure of a compound from its infrared spectrum. After training, the query infrared spectrum is used to predict the RDF descriptor, and a structure database is searched for the most similar descriptor. The corresponding structure is retrieved as the initial model. [Pg.184]

The query compound is considered unknown; that is, only its infrared spectrum is used for prediction. The molecule is predicted by searching a binary descriptor database for the most similar descriptors. The database contains compressed, low-pass filtered, D20-transformed RDF descriptors of 64 components each. The descriptors originally used for training (Cartesian RDF, 128 components) were compressed in the same way before the search process. [Pg.184]
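The search step described above can be sketched as a nearest-neighbour lookup. This is a minimal illustration, not the method of the cited work: the excerpt does not name the similarity measure, so Euclidean distance is an assumption, and the function names, structure ids, and 4-component toy vectors (standing in for the 64-component descriptors) are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(query, database, top_k=1):
    """Return the top_k (distance, structure_id) pairs closest to the query.

    `database` maps a structure id to its descriptor vector.
    The distance metric is an assumption; the source does not specify one.
    """
    ranked = sorted((euclidean(query, desc), sid) for sid, desc in database.items())
    return ranked[:top_k]

# Hypothetical toy data: 4 components instead of 64.
database = {
    "structure_A": [0.0, 1.0, 0.5, 0.0],
    "structure_B": [0.9, 0.1, 0.0, 0.3],
    "structure_C": [0.1, 0.9, 0.4, 0.1],
}
predicted = [0.1, 1.0, 0.5, 0.0]  # descriptor predicted from the query spectrum
hits = most_similar(predicted, database, top_k=2)
```

The best hit would serve as the initial structural model; returning several hits instead of one makes it easier to judge how distinct the best match is from the runners-up.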

The result of spectrum prediction also depends strongly on the reference data collection used as the knowledge base. In our own experience with the CSEARCH database system, deviations of 1-3 ppm can usually be expected, depending on the method used for calculation and the representation of the query structure within the database. [Pg.1856]

Basic conditions for efficient structure elucidation are the collection of spectroscopic data in centralized databases and easy access by all spectroscopic laboratories. Besides the classical similarity searches for structures and spectra, the prediction of data is of increasing importance. Database-supported spectrum predictions not only have the advantage of being very precise but they also enhance their precision automatically when new data are introduced. Another application of structure-oriented spectral databases is their use in partially or fully automated structure elucidation software. [Pg.2632]

A significant number of the spectra in present data collections have been abstracted from the literature. This procedure is acceptable for data from 13C (or other heteronuclei) NMR, where the chemical shift information plays the central role. Reduced information of that kind can even be used as a basis for spectrum prediction algorithms, as described later. In other techniques, such as IR spectroscopy and 1H NMR, lineshape and peak patterns play an important role and should be stored in a database, not only for enhanced search capabilities but also as a basis for prediction tools. A similar situation occurs in MS: peaks with low intensity may contain significant structural information, so all peaks above a certain intensity level should be stored. Since this means up to 64k data points for 1H NMR and around 4k for MS and IR, manual excerption is impossible. [Pg.2633]

Spectroscopic databases are a very valuable tool for the identification of known and unknown substances. They are available in most spectroscopic laboratories and are frequently used. Retrieval of data and spectral similarity searches are established tools for the fast identification of unknown compounds. The spectroscopic information stored in the databases allows structure-spectra correlations to be generated, which can be used for predicting spectral features of new compounds. Effective spectrum prediction tools are available for 13C NMR and 1H NMR, and will become available for IR spectroscopy in the near future. The prediction of mass spectra is still a challenge. [Pg.2645]

Several empirical approaches for NMR spectra prediction are based on the availability of large NMR spectral databases. By using special methods for encoding substructures that correspond to particular parts of the NMR spectrum, the correlation of substructures and partial spectra can be modeled. Substructures can be encoded by using the additive model developed extensively by Pretsch [11] and Clerc [12]. The authors represented skeleton structures and substituents by individual codes and calculation rules. A more general additive model was introduced... [Pg.518]
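In its simplest form, an additive model of this kind can be written as a base value for the skeleton plus one increment per substituent. The notation below is an assumed, generic formulation for illustration; the symbols are not taken from the cited authors.

```latex
\delta_{\mathrm{pred}} = \delta_{\mathrm{base}} + \sum_{i=1}^{n} z_i
```

Here \delta_{\mathrm{pred}} is the predicted chemical shift of the nucleus, \delta_{\mathrm{base}} is the shift assigned to the unsubstituted skeleton, and z_i is the tabulated increment for the i-th of n substituents.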

The spectral signals are assigned to the HOSE codes that represent the corresponding carbon atom. This approach has been used to create algorithms that allow the automatic creation of "substructure-sub-spectrum" databases that are now used in systems for predicting chemical structures directly from NMR. [Pg.519]
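A code-to-shift lookup of this kind might be sketched as follows. This is a hedged illustration under several assumptions: the environment of each carbon atom is represented as a tuple of per-sphere strings rather than a real HOSE string, the fallback to fewer spheres and the averaging of stored shifts are common choices but are not described in the excerpt, and every name, code, and shift value below is a hypothetical placeholder.

```python
from statistics import mean

def add_entry(table, spheres, shift_ppm):
    """Register an assigned shift under every prefix of its environment code."""
    for n in range(1, len(spheres) + 1):
        table.setdefault(spheres[:n], []).append(shift_ppm)

def predict_shift(spheres, table, min_spheres=1):
    """Predict a shift from a sphere-by-sphere environment code.

    The longest prefix of `spheres` found in `table` is used; fewer matching
    spheres mean a less specific (and usually less reliable) prediction.
    Returns (shift_ppm, spheres_used), or (None, 0) if nothing matches.
    """
    for n in range(len(spheres), min_spheres - 1, -1):
        shifts = table.get(spheres[:n])
        if shifts:
            return mean(shifts), n
    return None, 0

# Placeholder codes and shifts, for illustration only (not real data).
table = {}
add_entry(table, ("s1a", "s2a", "s3a"), 20.0)
add_entry(table, ("s1a", "s2a", "s3b"), 22.0)

exact = predict_shift(("s1a", "s2a", "s3a"), table)    # all three spheres match
partial = predict_shift(("s1a", "s2a", "s3c"), table)  # falls back to two spheres
missing = predict_shift(("zzz",), table)               # no related entry
```

Returning the number of spheres used alongside the shift gives a rough indication of how closely the database entries resemble the query environment.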

The database approaches are heavily dependent on the size and quality of the database, particularly on the availability of entries that are related to the query structure. Such an approach is relatively fast: it is possible to predict the 1H NMR spectrum of a molecule with 50-100 atoms in a few seconds. The predicted values can be explained on the basis of the structures that were used for the predictions. Additionally, users can augment the database with their own structures and experimental data, allowing improved predictions for compounds bearing similarities to those added. [Pg.522]

So if this all sounds a bit bleak, what's the good news? Well, strangely, there is quite a lot. For a start, let's not forget that had the 13C nucleus been the predominant carbon isotope, the development of the whole NMR technique itself would have been held back massively and possibly even totally overlooked, as proton spectra would have been too complex to interpret. Whimsical speculation aside, chemical shift prediction is far more reliable for 13C than it is for proton NMR, and there are chemical shift databases available to help you that are actually very useful (see Chapter 14). This is because 13C shifts are less prone to the effects of molecular anisotropy than proton shifts, as carbon atoms are more internal to a molecule than the protons, and also because, as the carbon chemical shifts are spread across approximately 200 ppm of the field (as opposed to the approx. 13 ppm of the proton spectrum), the effects are proportionately less dramatic. This large range of chemical shifts also means that it is relatively unlikely that two 13C nuclei are exactly coincident, though it does happen. [Pg.128]

We have already met one tool that can be used to investigate the links that exist among data items. When the features of a pattern, such as the infrared absorption spectrum of a sample, and information about the class to which it belongs, such as the presence in the molecule of a particular functional group, are known, feedforward neural networks can create a computational model that allows the class to be predicted from the spectrum. These networks might be effective tools to predict suitable protective glove material from a knowledge of molecular structure, but they cannot be used if the classes to which samples in the database belong are unknown because, in that case, a conventional neural network cannot be trained. [Pg.53]

Naive approaches avoid theoretical assumptions and instead focus on statistics about solved RNA structures, using these to probabilistically align new sequences with solved structures. One elegant approach to this problem has used an rRNA database to generate a novel RNA-specific substitution matrix. The advantage of this approach is that it makes the whole spectrum of primary-structure sequence-analysis tools available for secondary-structure prediction (27). [Pg.527]

Big Chemical Encyclopedia