Database comparison histograms

Figure 13.11. Database comparison histograms illustrating an optimally diverse database selection (upper panel), a highly redundant database selection (middle panel), and a database selection with loss of information (lower panel). The left column shows simplified representations of the databases as distributions of molecules (filled circles) in an arbitrary 2D molecular property space. The middle left column gives idealized self-similarity histograms, while the middle right column shows histograms obtained by comparing each database subset with the entire database. The right column shows histograms obtained by comparison with a corporate database. Dotted vertical lines indicate the similarity radius for a particular descriptor.

A similar pairwise comparison can be used to evaluate the self-similarity of a database of structures. When the distributions of the resulting similarity coefficients are plotted as graphs or histograms, the approach also allows databases or database subsets to be compared directly by eye, and it can be used either for self-similarity analysis or for database comparison (Chart 1). [Pg.120]
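The pairwise procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the source: fingerprints are assumed to be stored as sets of on-bit indices, the Tanimoto coefficient is assumed as the similarity measure (the passage does not name one), and the 10 equal-width bins over [0, 1] are an arbitrary choice. The helper names are hypothetical. Self-similarity uses every unique pair within one database; database comparison uses every cross pair between two databases.

```python
from itertools import combinations, product


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two fingerprints stored as sets of on-bits (assumed representation)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def histogram(values, n_bins=10):
    """Count coefficients in n_bins equal-width bins over [0, 1]; a value of 1.0 goes in the last bin."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


def self_similarity_histogram(db, n_bins=10):
    """Self-similarity: every unique pair of molecules within one database."""
    return histogram((tanimoto(a, b) for a, b in combinations(db, 2)), n_bins)


def comparison_histogram(db_a, db_b, n_bins=10):
    """Database comparison: every molecule in db_a against every molecule in db_b."""
    return histogram((tanimoto(a, b) for a, b in product(db_a, db_b)), n_bins)


db = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({5, 6})]
print(self_similarity_histogram(db))     # [2, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(comparison_histogram(db[:2], db))  # [2, 0, 0, 0, 0, 2, 0, 0, 0, 2]
```

Comparing a subset with its parent database (as in the middle right column of Figure 13.11) produces a peak in the top bin from the identical pairs, which is why the comparison histogram differs in shape from the self-similarity histogram.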

… Eq. [6]). The ES is the bin distance between the most populated bins, or statistical modes (M in Eq. [6]), of the two histograms being compared, divided by half of the average of the individual SE values of the two distributions. For example, if the molecular weight histogram of one database had its most populated bin at bin number 13, and that of the database it was compared with had its most populated bin at bin number 27 (with all histogram parameters held constant), the intermode bin distance, |Ma - Mb|, would be 14. [Pg.278]
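The calculation described above can be sketched as follows. Two assumptions are involved, so treat this as illustrative: SE is taken to be the Shannon entropy in bits of each histogram, and the normalization follows the passage's wording literally ("half of the average" of the two SE values), so it should be checked against Eq. [6], which is not reproduced here. The function names are hypothetical. The worked example uses the bin numbers from the text; because only the difference of the modes matters, 0-based and 1-based bin numbering give the same result.

```python
import math


def mode_bin(counts):
    """Index of the most populated bin (statistical mode); the first such bin wins on ties."""
    return max(range(len(counts)), key=counts.__getitem__)


def shannon_entropy(counts):
    """Shannon entropy in bits, -sum(p_i * log2(p_i)), over non-empty bins.
    ASSUMPTION: this is the SE referred to in the text."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def es_score(counts_a, counts_b):
    """Intermode bin distance divided by half the average SE of the two histograms
    (literal reading of the passage; verify against Eq. [6]).
    Both histograms must share the same binning parameters."""
    intermode = abs(mode_bin(counts_a) - mode_bin(counts_b))
    avg_se = (shannon_entropy(counts_a) + shannon_entropy(counts_b)) / 2
    if avg_se == 0:
        raise ValueError("both histograms have all counts in a single bin (SE = 0)")
    return intermode / (avg_se / 2)


# Worked example from the text: modes at bins 13 and 27, 40 bins each.
a = [0] * 40
a[12], a[13], a[14] = 5, 10, 5
b = [0] * 40
b[26], b[27], b[28] = 5, 10, 5
print(abs(mode_bin(a) - mode_bin(b)))  # 14
```

With these toy histograms each SE is 1.5 bits, so the score is 14 / 0.75; real descriptor histograms would of course give different SE values.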

FIGURE 16.3 (cont'd). Output from the CSD. (c) Molecular formula and three-dimensional diagrams generated from a search; (d) geometric searches leading to, for example, histograms comparing numerical output. Courtesy of the Cambridge Structural Database. [Pg.695]

No external databases were available for comparison, so this analysis relied on internal validation: the complete database was divided into a model development data set and a validation data set. The goodness-of-fit plot, residual plot, and histograms for the validation data set under Eq. (9.17) are shown in Fig. 9.20. [Pg.333]
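The internal validation described above amounts to holding out part of a single database. The sketch below shows one way to do it; the passage does not say how the database was divided, so the random split, the 30% validation fraction, and the fixed seed are all assumptions. The model of Eq. (9.17) is not reproduced: `model` is a hypothetical placeholder for any fitted predictor, and the residuals it returns are the values that would be plotted in a residual plot or histogram.

```python
import random


def split_development_validation(records, validation_fraction=0.3, seed=0):
    """Randomly partition one database into (development, validation) sets.
    The fraction and seed are illustrative assumptions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def residuals(model, validation):
    """Observed minus predicted values for (x, y) pairs in the validation set."""
    return [y - model(x) for x, y in validation]


data = [(x, 2 * x + 1) for x in range(10)]
dev, val = split_development_validation(data)
print(len(dev), len(val))                       # 7 3
print(residuals(lambda x: 2 * x + 1, val))      # [0, 0, 0]
```

In practice the model would be fitted only on `dev`, and the validation residuals would then show whether it generalizes beyond the data used to build it.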

Fig. 2.1 Property histograms of fragrance and taste databases in comparison to ChEMBL, ZINC and GDB-13...
