Representative subsets

OptiDOCK: an extension of the CombiDOCK methodology that selects a diverse but representative subset of compounds spanning the structural space encompassed by the full library. Tripos, Inc., http://www.tripos.com [33]... [Pg.359]

With this set of five optimized reaction conditions in hand (Fig. 5.6), a small DHPM library was produced. As a set of structurally diverse, representative building blocks, 17 CH-acidic carbonyl compounds, 25 aldehydes, and 8 ureas/thioureas were chosen. Combining all of these building blocks would give a library of 17 × 25 × 8 = 3400 individual DHPMs. To demonstrate the practicability of the concept, a representative subset library of 48 DHPM analogues involving all of the aforementioned building blocks was generated [2]... [Pg.102]
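The library arithmetic above is easy to check, and a covering subset is easy to construct. The sketch below shows one possible way to pick a small subset in which every building block appears at least once; it is an illustrative assumption, not necessarily how the 48 compounds in [2] were chosen. Compound i uses building blocks (i mod 17, i mod 25, i mod 8); because 17, 25 and 8 are pairwise coprime, the Chinese remainder theorem guarantees that the first 3400 indices give distinct combinations, and any n ≥ 25 covers every block. The function names are hypothetical.

```python
from math import prod

def full_library_size(block_counts):
    """Number of products when every building-block type is combined exhaustively."""
    return prod(block_counts)

def covering_subset(block_counts, n):
    """Hypothetical round-robin selection: compound i uses block (i mod count) of each type.

    Assumes pairwise-coprime counts, so the first prod(block_counts) indices are all
    distinct combinations (Chinese remainder theorem); n >= max(block_counts)
    ensures that every building block appears at least once.
    """
    if n > full_library_size(block_counts):
        raise ValueError("subset cannot be larger than the full library")
    return [tuple(i % c for c in block_counts) for i in range(n)]

counts = (17, 25, 8)  # CH-acidic carbonyls, aldehydes, ureas/thioureas
subset = covering_subset(counts, 48)
print(full_library_size(counts))  # 3400
print(len(set(subset)))           # 48
```

With 48 picks, every carbonyl compound, aldehyde and urea/thiourea is used at least once, and no combination is repeated.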

Large Data Set. LTAs were analyzed by FTIR for 50 coals ranging in rank from lignite to lvb. These were a representative subset of 95 unwashed and clean commercial coals from the eastern, mid-western and western United States and Alberta, Canada. The 50-coal set contained no duplicate coal samples, but different coal samples from the same mine were included. [Pg.45]

Median partitioning is another statistical method distinct from RR. The development of this methodology was driven by the need to select representative subsets from very large compound pools. Hierarchical clustering techniques... [Pg.292]

Fig. 1. Median partitioning and compound selection. In this schematic illustration, a two-dimensional chemical space is shown as an example. The axes represent the medians of two uncorrelated (and, therefore, orthogonal) descriptors, and dots represent database compounds. In A, a compound database is divided into equal subpopulations in two steps, and each resulting partition is characterized by a unique binary code (shared by the molecules occupying this partition). In B, diversity-based compound selection is illustrated: from the center of each partition, a compound is selected to obtain a representative subset. By contrast, C illustrates activity-based compound selection. Here, a known active molecule (gray dot) is added to the source database prior to MP, and compounds that ultimately occur in the same partition as this bait molecule are selected as candidates for testing. Finally, D illustrates the effects of descriptor correlation. In this case, the two applied descriptors are significantly correlated, and the dashed line represents a diagonal of correlation that affects the compound distribution. As can be seen, descriptor correlation leads to over- and underpopulated partitions.
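The partitioning and selection steps in panels A-C can be sketched as follows. This is a minimal illustration under stated assumptions: each descriptor is split once at its median over the whole database (consistent with panel D, where correlated descriptors give unevenly populated partitions), the "center" of a partition is taken to be the centroid of its members, and distances are Euclidean. The function names are hypothetical.

```python
from math import dist
from statistics import median

def median_partition(points):
    """Give each compound a binary code; bit k is '1' if descriptor k lies above its median."""
    n_desc = len(points[0])
    medians = [median(p[k] for p in points) for k in range(n_desc)]
    codes = ["".join("1" if p[k] > medians[k] else "0" for k in range(n_desc)) for p in points]
    return medians, codes

def diversity_selection(points, codes):
    """Panel B (assumed centroid rule): per occupied partition, pick the compound nearest its centroid."""
    groups = {}
    for idx, code in enumerate(codes):
        groups.setdefault(code, []).append(idx)
    picks = {}
    for code, members in groups.items():
        centroid = [sum(points[i][k] for i in members) / len(members) for k in range(len(points[0]))]
        picks[code] = min(members, key=lambda i: dist(points[i], centroid))
    return picks

def activity_selection(points, bait):
    """Panel C: add a known active (bait) before partitioning; return compounds sharing its code."""
    _, codes = median_partition(points + [bait])
    bait_code = codes[-1]
    return [i for i, code in enumerate(codes[:-1]) if code == bait_code]

pts = [(0, 0), (1, 1), (0, 3), (1, 4), (3, 0), (4, 1), (3, 3), (4, 4)]
medians, codes = median_partition(pts)
print(codes)                                # ['00', '00', '01', '01', '10', '10', '11', '11']
print(activity_selection(pts, (0.2, 0.3)))  # [0, 1]
```

With n descriptors this yields up to 2^n partitions; with uncorrelated descriptors they are roughly equally populated, which is what makes one pick per partition a representative subset.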

As mentioned above, the MP approach was originally designed to aid in diversity evaluation of large compound collections and selection of representative subsets (see Note 6). In this case, an MP grid with a predefined number of par-... [Pg.295]

It turns out that MR CISD again represents the most suitable source of the required higher-order clusters. An MR CISD with a carefully chosen small reference space involves a very small, yet representative, subset of such cluster amplitudes. Moreover, in this way we can also overcome possible intruder-state problems by including such states in the MR CISD while excluding them from the CMS SU CCSD. In other words, while we may have to exclude some references from $\mathcal{M}_0$ in order to avoid intruders, we can safely include them in the MR CISD model space $\mathcal{M}_1$. In fact, we can even choose the CMS for $\mathcal{M}_1$. Thus, designating the dimensions of the $\mathcal{M}_0$ and $\mathcal{M}_1$ spaces by M and N, respectively, we refer to the ec SU CCSD method employing an NR-CISD as the external source by the acronym (N,M)-CCSD. With this notation, (N,1)-CCSD = NR-RMR CCSD and (0,M)-CCSD = MR SU CCSD. Also, (0,1)-CCSD = SR CCSD. For details of this procedure and its applications we refer the reader to Refs. [63,64,71]. [Pg.28]

Clark, R. D. (1997) OptiSim: an extended dissimilarity selection method for finding diverse representative subsets. J. Chem. Inf. Comput. Sci. 37(6), 1181-1188. [Pg.89]

In this section we describe the various experimental techniques which have been used to measure aT and we include a critical evaluation of their limitations so as to aid in the intercomparison of the various sets of data. Many experimental groups have contributed to this field, but the aim here is not to discuss every technique in great detail but rather to select a representative subset in order to illustrate the most interesting features. Discussions of the results obtained are given in sections 2.5 and... [Pg.48]

The NCTR Four-Phase system was recently applied to two environmental data sets recognized by EPA as representative subsets of potential EDCs ... [Pg.315]

Hudson et al. [26] describe a method called the Most Descriptive Compound (MDC) method for selecting representative subsets, and a sphere-exclusion method for selecting sets of compounds that cover the available property space. The MDC method aims to select subsets that most effectively represent the compounds in the original collection. It operates by calculating a vector of N elements, where N is the number of compounds. For each compound, the other compounds are ranked in order of their distance to it, and the reciprocal of the rank of each compound n is then stored in position n of the vector. The process... [Pg.353]
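The ranking step described above can be sketched as follows. This sketch assumes Euclidean distances between descriptor vectors and assumes that the reciprocal-rank vectors are summed over all reference compounds; because the excerpt breaks off, it stops at identifying the single highest-scoring compound and does not reproduce how Hudson et al. continue the selection. The function names are hypothetical.

```python
from math import dist

def mdc_scores(points):
    """Sum, over every reference compound, the reciprocal rank of each other compound.

    Compounds that are near neighbours of many others accumulate high scores,
    while outliers (always ranked far away) score low.
    """
    n = len(points)
    scores = [0.0] * n
    for i in range(n):
        # Rank the other compounds by distance to compound i (rank 1 = nearest).
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist(points[i], points[j]))
        for rank, j in enumerate(others, start=1):
            scores[j] += 1.0 / rank
    return scores

def most_descriptive_compound(points):
    """Index of the compound with the highest summed reciprocal-rank score."""
    scores = mdc_scores(points)
    return max(range(len(points)), key=scores.__getitem__)

pts = [(0.0,), (1.0,), (3.0,), (7.0,), (20.0,)]
print(most_descriptive_compound(pts))  # 1
```

In this one-descriptor example, the compound at 20.0 is ranked last by every other compound and receives the lowest score, while the compound at 1.0, which sits inside the dense region, scores highest.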

Matter [58] has also validated a range of 2-D and 3-D structural descriptors for their ability to predict biological activity and to sample structurally and biologically diverse datasets effectively. The compound selection techniques used were maximum dissimilarity and clustering. The results also showed 2-D fingerprint-based descriptors to be the most effective in selecting representative subsets of bioactive compounds. [Pg.358]

Transferability of spectral data and models in NIR spectroscopy. This issue is pertinent to the future use of NIR for bioprocess monitoring. Calibrations can be transferred across instruments either by pre-processing to remove baseline shifts and noise in the spectra from individual machines, or by direct standardisation, i.e. a data transformation derived from a representative subset of samples measured on each instrument [61]. [Pg.89]
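Direct standardisation can be sketched in a few lines. Spectra of the same representative transfer subset are recorded on a master and a slave instrument, a transformation matrix F is fitted so that S_master ≈ S_slave F, and new slave spectra are multiplied by F before the master calibration is applied. The sketch assumes NumPy and fits F with the Moore-Penrose pseudo-inverse; the spectra and the slave instrument response in the example are invented for illustration.

```python
import numpy as np

def fit_direct_standardisation(S_master, S_slave):
    """Least-squares transfer matrix F with S_master ≈ S_slave @ F (rows = samples, columns = wavelengths)."""
    return np.linalg.pinv(S_slave) @ S_master

def transfer(spectra_slave, F):
    """Map slave-instrument spectra into the master instrument's response space."""
    return spectra_slave @ F

# Synthetic transfer subset: 3 samples measured at 4 wavelengths on the master.
S_master = np.array([[1.0, 2.0, 3.0, 4.0],
                     [2.0, 1.0, 0.0, 1.0],
                     [0.0, 1.0, 1.0, 3.0]])
# Invented slave response: wavelength-dependent gain plus slight spectral cross-talk.
A = np.array([[1.10, 0.00, 0.00, 0.00],
              [0.05, 0.90, 0.00, 0.00],
              [0.00, 0.05, 1.00, 0.00],
              [0.00, 0.00, 0.10, 0.95]])
S_slave = S_master @ A

F = fit_direct_standardisation(S_master, S_slave)
new_master = 0.5 * S_master[0] + 0.5 * S_master[1]  # a new sample within the span of the transfer set
recovered = transfer(new_master @ A, F)
print(np.allclose(recovered, new_master))  # True
```

Because there are usually fewer transfer samples than wavelengths, F is not unique and the pseudo-inverse returns the minimum-norm solution; the correction is only reliable for samples resembling the transfer subset, which is why that subset must be representative. Piecewise direct standardisation is a common local variant.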

Two main strategies have been developed to select diverse and representative subsets of molecules: cell-based methods and distance-based methods. [Pg.83]
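The two families can be contrasted with a minimal sketch. The distance-based example is a MaxMin (maximum-dissimilarity) selection, which repeatedly adds the compound farthest from everything already chosen; the cell-based example divides each descriptor range into equal-width bins and keeps one compound per occupied cell. Euclidean distance, the seed compound, the number of bins and the function names are all assumptions made for illustration.

```python
from math import dist

def maxmin_select(points, n, seed=0):
    """Distance-based: greedily add the compound whose nearest selected neighbour is farthest away."""
    selected = [seed]
    # Distance from every compound to its closest already-selected compound.
    nearest = [dist(p, points[seed]) for p in points]
    while len(selected) < min(n, len(points)):
        candidates = [i for i in range(len(points)) if i not in selected]
        best = max(candidates, key=nearest.__getitem__)
        selected.append(best)
        nearest = [min(d, dist(p, points[best])) for d, p in zip(nearest, points)]
    return selected

def cell_select(points, bins):
    """Cell-based: bin each descriptor into equal-width intervals and keep the first compound per cell."""
    n_desc = len(points[0])
    lows = [min(p[k] for p in points) for k in range(n_desc)]
    highs = [max(p[k] for p in points) for k in range(n_desc)]
    widths = [(hi - lo) / bins or 1.0 for lo, hi in zip(lows, highs)]  # avoid a zero width
    cells = {}
    for idx, p in enumerate(points):
        # Clamp so the maximum value falls into the last bin rather than an extra one.
        cell = tuple(min(int((p[k] - lows[k]) / widths[k]), bins - 1) for k in range(n_desc))
        cells.setdefault(cell, idx)
    return sorted(cells.values())

pts = [(0, 0), (0.1, 0.1), (5, 5), (5.1, 5), (10, 0), (0, 9)]
print(maxmin_select(pts, 3))  # [0, 4, 5]
print(cell_select(pts, 2))    # [0, 2, 4, 5]
```

MaxMin tends to favour extreme, outlying compounds, whereas the cell-based picks depend on the grid resolution; neither choice is implied by the source text.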

Select a factorial design as follows, with one extra point in the centre, to obtain a range of tests which is a representative subset of the original tests ... [Pg.112]
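The specific design referred to here is not reproduced in the excerpt, so the sketch below only illustrates the general pattern: a two-level full factorial design in coded units (-1 = low, +1 = high) for k factors, plus one centre point at 0. The factor names are hypothetical.

```python
from itertools import product

def two_level_factorial_with_centre(factors):
    """All 2**k combinations of coded low/high levels, followed by one centre point."""
    runs = [dict(zip(factors, levels)) for levels in product((-1, 1), repeat=len(factors))]
    runs.append({f: 0 for f in factors})  # centre point at the mid-level of every factor
    return runs

design = two_level_factorial_with_centre(["temperature", "pH", "concentration"])
print(len(design))  # 9
```

The 2^3 = 8 corner runs span the ranges of all factors, and the extra centre point allows curvature in the response to be detected.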

Shemetulskis et al. [44] describe a method based on clustering that was used to compare two external databases with a corporate database. Each database was clustered independently using the Jarvis-Patrick method [46]; representative subsets of each database were chosen, and the subsets were then mixed and re-clustered. The number of clusters containing compounds from only one of the databases was then used as an indication of the degree of overlap between the two databases. A limitation of this approach is the computational effort required to re-cluster the mixed subsets. [Pg.59]
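A minimal sketch of the re-clustering and overlap count, assuming Euclidean distances and the common form of the Jarvis-Patrick rule: two compounds are linked when each appears in the other's list of k nearest neighbours and the two lists share at least k_min members. The parameter values, coordinates, source labels and function names are illustrative assumptions, and the subset-selection step is omitted.

```python
from math import dist

def nearest_neighbours(points, k):
    """k nearest neighbours (by Euclidean distance) of every compound, as index sets."""
    lists = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i), key=lambda j: dist(p, points[j]))
        lists.append(set(others[:k]))
    return lists

def jarvis_patrick(points, k, k_min):
    """Cluster labels: mutual k-NN pairs sharing >= k_min neighbours are merged (union-find)."""
    nn = nearest_neighbours(points, k)
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for i in range(len(points)):
        for j in nn[i]:
            if i in nn[j] and len(nn[i] & nn[j]) >= k_min:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]

def single_source_clusters(labels, sources):
    """Count clusters whose members all come from one database (the overlap indicator)."""
    members = {}
    for label, source in zip(labels, sources):
        members.setdefault(label, set()).add(source)
    return sum(1 for origins in members.values() if len(origins) == 1)

# Mixed subsets from two databases, "A" and "B" (toy coordinates).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
origin = ["A", "A", "B", "A", "A", "A"]
labels = jarvis_patrick(pts, k=2, k_min=1)
print(single_source_clusters(labels, origin))  # 1
```

A larger number of single-source clusters suggests less overlap between the databases. Building the neighbour lists requires all pairwise distances, so the cost grows quadratically with the size of the mixed set, which is the computational burden noted above.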

Clark, R. D. OptiSim: An Extended Dissimilarity Selection Method for Finding Diverse Representative Subsets. J. Chem. Inf. Comput. Sci., 1997, 37, 1181-1188. [Pg.247]

Although there are no plans to improve the platform for the PHED model, some of its features are highly regarded, in particular that a representative subset, preferably large enough to cover the basic needs of the required extrapolation, can be selected from the database. This can now also be done by computer for EUROPOEM. On the other hand, the quality assessment of the studies from which exposure data are taken is much more science-based in EUROPOEM than in the PHED, where selection depends only on analytical quality. This latter deficiency should be rectified in the new Agricultural Handlers Exposure Database (AHED) described earlier. [Pg.203]

Big Chemical Encyclopedia
