Descriptor database

Several general measures are useful for describing the similarity of descriptors. The root mean square (RMS) error is the square root of the mean of the squared differences between the individual components of two descriptors $g_A$ and $g_B$:

$$\mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[g_A(i) - g_B(i)\right]^2}$$

where $n$ is the number of components. The RMS error is used as the default measure of similarity for descriptor database searches ... [Pg.81]
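As an illustration, here is a minimal Python sketch of the RMS error, that is, the square root of the mean squared component-wise difference. It assumes each descriptor is a plain sequence of n floats. The function name `rms_error` is illustrative and does not come from any particular package.

```python
import math
from typing import Sequence


def rms_error(g_a: Sequence[float], g_b: Sequence[float]) -> float:
    """Root mean square error between two descriptors with n components each."""
    if len(g_a) != len(g_b) or len(g_a) == 0:
        raise ValueError("descriptors must be non-empty and of equal length")
    n = len(g_a)
    # Mean of the squared component-wise differences, then the square root
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g_a, g_b)) / n)


print(rms_error([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 1.0]))
```

Identical descriptors give an RMS error of 0. The smaller the value, the more similar the two descriptors are.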

Wavelet transforms are useful for compressing descriptors for searches in binary descriptor databases, and as alternative representations of molecules for neural networks in classification tasks. [Pg.97]

If training has been performed in reverse mode, a descriptor command is available instead of a property command. This command opens a chart that compares two descriptors. Unlike a property vector, a descriptor can be searched for directly in a binary descriptor database (e.g., to find corresponding structures). The result window then contains a hit list and two three-dimensional molecule models: one displays the original molecule of the test set entry (if available), and the other shows the molecule of the entry currently selected in the hit list of similar molecules. [Pg.157]

An RDF descriptor cannot be back-transformed by explicit mathematical equations into the Cartesian coordinates of a 3D structure. Instead, we focus on two other methods. The first, called the database approach, relies on the availability of a large, diverse descriptor database. The second, the modeling approach, is a modeling technique designed to work without an appropriate descriptor database. [Pg.180]

RDF descriptors exhibit a series of unique properties that correlate well with the similarity of structure models. It should therefore be possible to retrieve a similar molecular model from a descriptor database by selecting the most similar descriptor. Using a database retrieval method again to elucidate the structure may seem strange, and the obvious question is: why not use an infrared spectrum database directly? The answer is simple. Spectral library identification is severely limited: about 28 million chemical compounds have been reported in the literature, but only about 150,000 spectra are available in the largest commercial database. In most cases, however, scientists work in a well-defined area of structural chemistry, so structure identification can be restricted to special databases that already exist. The advantage of predicting a descriptor and then searching a descriptor database is that the database can easily be extended with any compound, whether or not a corresponding spectrum exists. Thus, the structure space can be extended arbitrarily, or extrapolated, whereas the spectrum space is limited. [Pg.181]

The binary descriptor database is then searched for the molecule whose RDF descriptor is most similar to the one obtained from the neural network. The search criterion is either the minimum RMS error or the highest correlation coefficient between the descriptors. [Pg.184]
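The search can be sketched in Python as a ranked hit list. This sketch illustrates the two criteria under stated assumptions. It is not the software used in the source. The database is a hypothetical dict that maps compound names to descriptor lists, and `rank_hits`, `rms_error`, and `correlation` are illustrative names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rms_error(g_a: Sequence[float], g_b: Sequence[float]) -> float:
    # Root mean square of the component-wise differences (lower = more similar)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g_a, g_b)) / len(g_a))


def correlation(g_a: Sequence[float], g_b: Sequence[float]) -> float:
    # Pearson correlation coefficient (higher = more similar)
    n = len(g_a)
    mean_a, mean_b = sum(g_a) / n, sum(g_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(g_a, g_b))
    var_a = sum((a - mean_a) ** 2 for a in g_a)
    var_b = sum((b - mean_b) ** 2 for b in g_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined for a constant descriptor; treated as "no similarity" here
    return cov / math.sqrt(var_a * var_b)


def rank_hits(query: Sequence[float],
              database: Dict[str, Sequence[float]],
              criterion: str = "rms",
              k: int = 8) -> List[Tuple[str, float]]:
    """Return the k entries most similar to the query as (name, score) pairs."""
    if criterion == "rms":
        scored = [(name, rms_error(query, d)) for name, d in database.items()]
        scored.sort(key=lambda item: item[1])                # smallest error first
    elif criterion == "correlation":
        scored = [(name, correlation(query, d)) for name, d in database.items()]
        scored.sort(key=lambda item: item[1], reverse=True)  # highest r first
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return scored[:k]


# Hypothetical 4-component descriptors; real RDF descriptors are much longer
database = {
    "mol_a": [0.1, 0.5, 0.9, 0.4],
    "mol_b": [0.9, 0.1, 0.2, 0.8],
    "mol_c": [0.2, 1.0, 1.8, 0.8],  # mol_a scaled by 2
}
query = [0.1, 0.5, 0.9, 0.4]
print(rank_hits(query, database, "rms", k=3))
print(rank_hits(query, database, "correlation", k=3))
```

The example shows how the two criteria differ. The correlation coefficient ignores overall scaling, so `mol_c` scores as highly as `mol_a`. The RMS error penalizes the difference in magnitude.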

The query compound is treated as unknown; that is, only its infrared spectrum is used for the prediction. The molecule is predicted by searching a binary descriptor database for the most similar descriptors. The database contains compressed, low-pass-filtered, D20 wavelet-transformed RDF descriptors of 64 components each. The descriptors originally used for training (Cartesian RDF, 128 components) were compressed in the same way before the search. [Pg.184]
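A one-level wavelet transform can illustrate the compression step. The transform keeps only the low-pass (approximation) coefficients, so 128 components become 64. Note that this sketch deliberately swaps in the simpler Haar wavelet for the D20 wavelet named in the text. The name `haar_lowpass` and the synthetic descriptor are hypothetical.

```python
import math
from typing import List, Sequence


def haar_lowpass(descriptor: Sequence[float]) -> List[float]:
    """One level of a Haar wavelet transform, keeping only the low-pass
    (approximation) coefficients, so n components become n/2.
    Stand-in for the D20 wavelet used in the text."""
    if len(descriptor) % 2:
        raise ValueError("descriptor length must be even")
    scale = 1.0 / math.sqrt(2.0)
    # Each output value is a scaled sum of two neighbouring components;
    # the high-pass (detail) coefficients are discarded
    return [scale * (descriptor[i] + descriptor[i + 1])
            for i in range(0, len(descriptor), 2)]


# Hypothetical smooth 128-component descriptor with a single peak
rdf_128 = [math.exp(-(((r - 40) / 10.0) ** 2)) for r in range(128)]
rdf_64 = haar_lowpass(rdf_128)
print(len(rdf_128), "->", len(rdf_64))  # prints: 128 -> 64
```

The query descriptor and every database descriptor must pass through the same function. Otherwise the subsequent RMS or correlation comparison would operate on vectors of different lengths or representations.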

FIGURE 6.6 Benzene derivatives predicted by a CPG neural network (low-pass D20 Cartesian RDF, 128 components). The 2D images of the eight best-matching structures from the descriptor database are shown together with the correlation coefficients between their descriptors and the one predicted by the neural network. [Pg.185]

The results demonstrate that the database approach can make correct predictions for a wide range of compounds, provided those compounds are present in the RDF descriptor database. As noted above, the RDF descriptor database can be compiled from arbitrary compounds, so a prediction is in principle possible for any spectrum. [Pg.187]

The database approach can predict structures that are already present in a descriptor database of arbitrary molecules. If the database contains only similar molecules rather than identical ones, the modeling approach may still provide a correct prediction. This approach extends the previously described method with a modeling process that optimizes the prediction (Figure 6.9) [52]. [Pg.187]

To evaluate the performance of the descriptors, one needs a database of compounds whose biological activities are known, e.g., the MDDR or the NCI database. Queries are selected that are typical of a drug-like molecule and from therapeutic categories that... [Pg.312]
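The excerpt's evaluation protocol is truncated, so the sketch below should be read only as an assumption about one common way to score such searches. The database is ranked by descriptor similarity to a query, and an enrichment factor then measures how strongly the known actives concentrate near the top of the list. The compound IDs are hypothetical.

```python
from typing import AbstractSet, Sequence


def enrichment_factor(ranked_ids: Sequence[str], actives: AbstractSet[str], top_k: int) -> float:
    """EF at top_k = (fraction of actives among the top_k hits) /
    (fraction of actives in the whole database)."""
    n = len(ranked_ids)
    total_actives = sum(1 for cid in ranked_ids if cid in actives)
    if total_actives == 0 or not 0 < top_k <= n:
        raise ValueError("need at least one active and 0 < top_k <= database size")
    hits = sum(1 for cid in ranked_ids[:top_k] if cid in actives)
    return (hits / top_k) / (total_actives / n)


# Hypothetical database of 10 compounds, already ranked by similarity to one query
ranked = ["c1", "c3", "c2", "c4", "c5", "c6", "c7", "c8", "c9", "c10"]
known_actives = {"c1", "c3"}
print(enrichment_factor(ranked, known_actives, top_k=2))
```

An enrichment factor of 1 means the descriptor does no better than random selection. Larger values indicate that the descriptor places actives earlier in the ranking.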

The abbreviation QSAR stands for quantitative structure-activity relationships; QSPR stands for quantitative structure-property relationships. The properties of an organic compound usually cannot be predicted directly from its molecular structure, so an indirect approach is used. First, numerical descriptors that encode information about the molecular structure are calculated for a set of compounds. Second, statistical methods and artificial neural network models are used to predict the property or activity of interest from these descriptors or from a suitable subset of them. A typical QSAR/QSPR study comprises the following steps: structure entry (or starting from an existing structure database), descriptor calculation, descriptor selection, model building, and model validation. [Pg.432]
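The last three of the steps listed above can be sketched in a deliberately minimal form. The sketch assumes that descriptors have already been calculated into a hypothetical matrix `X` (rows are compounds, columns are descriptors). Descriptor selection keeps the single column with the largest absolute correlation to the property. Model building is a simple least-squares line, and validation uses the leave-one-out q². Real studies use richer selection methods and models; every name and data value here is illustrative.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def select_descriptor(X: List[List[float]], y: List[float]) -> int:
    """Descriptor selection: index of the column with the largest |r| to y."""
    columns = list(zip(*X))
    return max(range(len(columns)), key=lambda j: abs(pearson(columns[j], y)))


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Model building: ordinary least squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def loo_q2(x: List[float], y: List[float]) -> float:
    """Model validation: leave-one-out cross-validated q^2 = 1 - PRESS / SS."""
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    my = sum(y) / len(y)
    return 1.0 - press / sum((b - my) ** 2 for b in y)


# Hypothetical descriptor matrix (rows = compounds, columns = descriptors) and property
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0], [5.0, 2.0]]
y = [3.1, 4.9, 7.2, 8.8, 11.1]

j = select_descriptor(X, y)
x = [row[j] for row in X]
slope, intercept = fit_line(x, y)
print(j, round(slope, 3), round(intercept, 3), round(loo_q2(x, y), 3))
```

Leave-one-out validation predicts each compound from a model fitted without it. It therefore gives a more honest estimate of predictive power than the fit to the full training set.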

The structures in the database are encoded using the radial distribution function (RDF) as a descriptor (cf. Section 8.4.4). [Pg.531]
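For illustration, the sketch below computes an RDF descriptor in the commonly used form g(r) = f Σ_{i<j} A_i A_j exp(-B (r - r_ij)²). In this formula, r_ij is the distance between atoms i and j, A_i is an atomic property, and B is a smoothing parameter. The sampling grid (128 points spaced 0.1 Å apart), B = 100 Å⁻², and A_i = 1 are assumptions of this sketch, not values taken from the source. The function name `rdf_descriptor` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rdf_descriptor(coords: Sequence[Point],
                   props: Sequence[float],
                   n_components: int = 128,
                   r_min: float = 0.0,
                   r_step: float = 0.1,
                   b: float = 100.0,
                   f: float = 1.0) -> List[float]:
    """g(r) = f * sum_{i<j} A_i * A_j * exp(-B * (r - r_ij)**2),
    sampled at n_components distances r = r_min + k * r_step."""
    # Precompute the weight A_i * A_j and the distance r_ij for every atom pair
    pairs = [(props[i] * props[j], math.dist(coords[i], coords[j]))
             for i in range(len(coords))
             for j in range(i + 1, len(coords))]
    descriptor = []
    for k in range(n_components):
        r = r_min + k * r_step
        descriptor.append(f * sum(w * math.exp(-b * (r - r_ij) ** 2)
                                  for w, r_ij in pairs))
    return descriptor


# Hypothetical two-atom "molecule" with atoms 1.5 Å apart and A_i = 1
g = rdf_descriptor([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], [1.0, 1.0])
peak = max(range(len(g)), key=g.__getitem__)
print(len(g), peak)  # the single peak lies at index 15, i.e. r = 1.5 Å
```

Only interatomic distances enter the sum. As a result, the descriptor is unchanged when the molecule is translated or rotated, which makes a direct comparison with the database entries possible.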

When a structure is input for spectrum simulation, this structure is also encoded as an RDF descriptor, which allows an easy comparison with the structures in the database. The 50 structures most similar to the input structure are then selected, together with their spectra. [Pg.531]

Multivariate data analysis usually starts by generating a set of spectra and their corresponding chemical structures from a spectrum similarity search in a spectrum database. The peak data are transformed into a set of spectral features, and the chemical structures are encoded as molecular descriptors [80]. A spectral feature is a property that can be computed automatically from a mass spectrum. Typical spectral features are the peak intensity at a particular mass-to-charge (m/z) value, or logarithmic intensity ratios. Peak data are transformed into spectral features to obtain descriptors of spectral properties that are more suitable than the original peak list. [Pg.534]
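The two kinds of spectral feature named above can be sketched as follows. The peak list is represented as a hypothetical dict from nominal m/z to relative intensity. The chosen m/z values and the offset that keeps the logarithm finite for absent peaks are assumptions of this sketch, not part of the source method.

```python
import math
from typing import Dict, List, Sequence, Tuple

PeakList = Dict[int, float]  # nominal m/z -> relative intensity (% of base peak)


def intensity_at(peaks: PeakList, mz: int) -> float:
    # Peak intensity at a given m/z value; absent peaks count as 0
    return peaks.get(mz, 0.0)


def log_intensity_ratio(peaks: PeakList, mz_a: int, mz_b: int, offset: float = 1.0) -> float:
    # log10 of an intensity ratio; the offset (an assumption of this sketch)
    # keeps the feature finite when one of the peaks is absent
    return math.log10((intensity_at(peaks, mz_a) + offset) /
                      (intensity_at(peaks, mz_b) + offset))


def spectral_features(peaks: PeakList,
                      mz_values: Sequence[int],
                      ratio_pairs: Sequence[Tuple[int, int]]) -> List[float]:
    """Concatenate intensity features and log-ratio features into one vector."""
    features = [intensity_at(peaks, mz) for mz in mz_values]
    features += [log_intensity_ratio(peaks, a, b) for a, b in ratio_pairs]
    return features


# Hypothetical peak list
peaks = {51: 9.0, 77: 99.0, 105: 100.0}
print(spectral_features(peaks, mz_values=[77, 91], ratio_pairs=[(77, 51)]))
```

Every spectrum is mapped to a vector of the same fixed length. This makes the spectra directly usable in multivariate methods, unlike peak lists of varying length.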

Private (EPA) Databases. The U.S. EPA maintains a list of approximately 600 current information systems, as well as some of the models and databases used within the organization. The list is published in the Information Systems Inventory (ISI), which is updated yearly and maintained by the Information Management and Services Division of the Office of Information Resources Management (109). ISI lists the system name and acronym, system level, responsible organization, contact person, legislative authorities, database descriptors, access information, hardware and software, system abstract, and keywords. [Pg.130]

These structural key descriptors encode a remarkable number of relevant molecular arrangements, covering each type of interaction involved in ligand-receptor binding [26]. Every structure in a database is represented by one or more of the 960 key codes available in ISIS. Suppose that two molecules contain A and B key codes, respectively; then the Tanimoto coefficient is given by ... [Pg.113]
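The formula itself is elided in this excerpt. For illustration only, the sketch below uses the standard Tanimoto definition for binary keys, T = c / (a + b - c). Here a and b are the numbers of keys present in each molecule, and c is the number of keys the two molecules share. The key codes in the example are made up, and the return value for two empty key sets is a convention chosen for this sketch.

```python
from typing import AbstractSet


def tanimoto(keys_a: AbstractSet[int], keys_b: AbstractSet[int]) -> float:
    """Tanimoto coefficient c / (a + b - c) for two sets of structural key codes."""
    a, b = len(keys_a), len(keys_b)
    c = len(keys_a & keys_b)  # number of keys the two molecules share
    if a + b - c == 0:
        return 0.0  # both key sets empty; convention chosen for this sketch
    return c / (a + b - c)


# Hypothetical key codes for two molecules
mol_1 = {12, 45, 101, 333}
mol_2 = {45, 101, 870}
print(tanimoto(mol_1, mol_2))
```

The denominator a + b - c is simply the number of distinct keys set in either molecule. The coefficient therefore ranges from 0 (no shared keys) to 1 (identical key sets).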

Schuffenhauer A, Gillet VJ, Willett P. Similarity searching in files of 3D chemical structures: analysis of the BIOSTER database using 2D fingerprints and molecular field descriptors. J Chem Inf Comput Sci 2000;40:295-307. [Pg.208]

Big Chemical Encyclopedia
