Dissimilarity-based methods

A major potential drawback with cluster analysis and dissimilarity-based methods for selecting diverse compounds is that there is no easy way to quantify how completely one has filled the available chemical space or to identify whether there are any holes. This is a key advantage of the partition-based approaches (also known as cell-based methods). A number of axes are defined, each corresponding to a descriptor or some combination of descriptors. Each axis is divided into a number of bins. If there are N axes and each is divided into b bins then the number of cells in the multidimensional space so created is ... [Pg.701]
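The cell-based idea in the excerpt above can be illustrated with a minimal Python sketch. It assumes each compound is already described by a short tuple of numeric descriptors and that every axis is cut into the same number of equal-width bins; the function names, the clamping of out-of-range values, and the equal-width binning are illustrative assumptions, not something taken from the cited text.

```python
from collections import defaultdict
from itertools import product


def assign_cells(descriptors, lower, upper, bins):
    """Map each compound index to a cell, i.e. a tuple of bin indices (one per axis).

    descriptors: list of equal-length tuples of numeric descriptor values
    lower, upper: per-axis bounds of the descriptor space (assumed known)
    bins: number of equal-width bins per axis (assumed identical for all axes)
    """
    cells = defaultdict(list)
    for idx, point in enumerate(descriptors):
        key = []
        for value, lo, hi in zip(point, lower, upper):
            # Scale the value to a fraction of the axis range, convert it to a
            # bin index, and clamp values lying on or beyond the axis limits.
            frac = (value - lo) / (hi - lo)
            key.append(min(bins - 1, max(0, int(frac * bins))))
        cells[tuple(key)].append(idx)
    return cells


def empty_cells(cells, n_axes, bins):
    """List every cell of the grid that contains no compound (a 'hole')."""
    return [c for c in product(range(bins), repeat=n_axes) if c not in cells]
```

For example, calling `assign_cells` on three compounds with two descriptors each and two bins per axis fills at most three of the four cells, and `empty_cells` then reports the unoccupied ones directly, which is the kind of coverage check that clustering and dissimilarity-based selection do not readily provide.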

Given the variety of descriptors and subset selection methods that are available, several studies have been carried out in an attempt to validate both the compound selection methods and the various descriptors. To some extent the choice of descriptors and the choice of subset selection method are interlinked. For example, partitioning schemes are restricted to low-dimensional descriptors such as physicochemical descriptors, whereas clustering and dissimilarity-based methods can be used with high-dimensional descriptors such as fingerprints. [Pg.357]

There is already an extensive literature relating to compound-selection methods, from which it is possible to identify four major classes of method: cluster-based methods, dissimilarity-based methods, partition-based methods and optimisation methods, although, as we shall see, there is some degree of overlap between these four classes. The next four sections of this chapter present the various algorithms that have been suggested for each approach; we then discuss comparisons and applications of these algorithms, and the chapter concludes with some thoughts on further developments in the field. [Pg.117]

Dissimilarity-based methods seek to identify a subset comprising the n most diverse molecules in a dataset containing N molecules (where, typically, n ≪ N). There are no less than w size-n subsets that can be generated from a size-N dataset, where... [Pg.121]
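To give a feel for the scale involved, the following short Python sketch counts the distinct size-n subsets of a size-N dataset using the binomial coefficient N!/(n!(N-n)!), which is a standard combinatorial result rather than a formula quoted from the excerpt. The function name is an illustrative assumption.

```python
import math


def subset_count(dataset_size, subset_size):
    """Number of distinct subsets of the given size (binomial coefficient)."""
    return math.comb(dataset_size, subset_size)


# Even a modest selection problem has far too many subsets to enumerate.
print(subset_count(10, 3))
print(subset_count(1000, 10))
```

The rapid growth of this count is why practical dissimilarity-based methods build a subset heuristically instead of examining every possible subset.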

Cluster-based and dissimilarity-based methods for compound selection were first discussed in the Eighties but it is only in the last few years that the area has attracted substantial attention as a result of the need to provide a rational basis for the design of combinatorial libraries. The four previous sections have provided an overview of the main types of selection method that are already available, with further approaches continuing to appear in the literature. Given this array of possible techniques, it is appropriate to consider ways in which the various methods can be evaluated, both in absolute terms and when compared with each other. A method can be evaluated in terms of its efficiency, i.e., the computational costs associated with its use, and its effectiveness, i.e., the extent to which it achieves its aims. As we shall see, it is not immediately obvious how effectiveness should be quantified and we shall thus consider the question of efficiency first, focusing upon the normal algorithmic criteria of CPU time and storage requirements. [Pg.129]

Compound selection is a core process of library design, and three main methods can be mentioned. Dissimilarity-based methods select compounds in terms of similarity/distance between individuals in chemical space. Clustering methods first group compounds into clusters based on similarity/distance and then choose representative compounds from different clusters. Partitioning methods first create a uniform cell space that subdivides the chemical space, then assign all virtual compounds to the corresponding cells according to their properties, and finally choose representative compounds from different cells. [Pg.184]

Dissimilarity-Based Methods. The methods for compound selection described above essentially group compounds either by partitioning into cells or by clustering. Dissimilarity-based methods (66) avoid this step. [Pg.206]

In dissimilarity-based compound selection the required subset of molecules is identified directly, using an appropriate measure of dissimilarity (often taken to be the complement of the similarity). This contrasts with the two-stage procedure in cluster analysis, where it is first necessary to group together the molecules and then decide which to select. Most methods for dissimilarity-based selection fall into one of two categories: maximum dissimilarity algorithms and sphere exclusion algorithms [Snarey et al. 1997]. [Pg.699]
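A dissimilarity measure defined as the complement of a similarity can be sketched as follows. The choice of the Tanimoto coefficient, the representation of a fingerprint as a set of "on" bit positions, and the function names are all assumptions made for illustration; the excerpt does not specify a particular similarity coefficient.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of 'on' bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / union


def dissimilarity(fp_a, fp_b):
    """Dissimilarity taken as the complement of the similarity."""
    return 1.0 - tanimoto(fp_a, fp_b)
```

With this definition, identical fingerprints have dissimilarity 0 and fingerprints sharing no bits have dissimilarity 1, so the value can be used directly as the pairwise measure in a selection routine.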

Dissimilarity-based compound selection (DBCS) methods involve selecting a subset of compounds directly based on pairwise dissimilarities [37]. The first compound is selected, either at random or as the one that is most dissimilar to all others in the database, and is placed in the subset. The subset is then built up stepwise by selecting one compound at a time until it is of the required size. In each iteration, the next compound to be selected is the one that is most dissimilar to those already in the subset, with the dissimilarity normally being computed by the MaxMin approach [38]. Here, each database compound is compared with each compound in the subset and its nearest neighbor is identified; the database compound that is selected is the one that has the maximum dissimilarity to its nearest neighbor in the subset. [Pg.199]
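The stepwise MaxMin procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name, the caller-supplied `dissim` function, and the use of a fixed `seed_index` for the first compound (instead of a random or most-dissimilar choice) are all hypothetical choices made to keep the example deterministic.

```python
def maxmin_select(items, n, dissim, seed_index=0):
    """Select n item indices by the MaxMin rule.

    items: list of compound representations (any type accepted by dissim)
    n: required subset size
    dissim: function returning the dissimilarity between two items
    seed_index: index of the first compound placed in the subset (assumption)
    """
    if not 0 < n <= len(items):
        raise ValueError("n must be between 1 and the number of items")

    selected = [seed_index]
    # nearest[i] holds the dissimilarity between item i and its nearest
    # neighbour among the compounds already in the subset.
    nearest = [dissim(item, items[seed_index]) for item in items]

    while len(selected) < n:
        candidates = [i for i in range(len(items)) if i not in selected]
        # Pick the candidate whose nearest selected neighbour is furthest away.
        best = max(candidates, key=lambda i: nearest[i])
        selected.append(best)
        # Only the newly added compound can change a nearest-neighbour value.
        for i in candidates:
            nearest[i] = min(nearest[i], dissim(items[i], items[best]))
    return selected
```

Caching the nearest-neighbour dissimilarities means each iteration compares every remaining compound only with the newly added one, rather than recomputing against the whole subset. For one-dimensional test data such as `[0.0, 1.0, 10.0, 5.0]` with absolute difference as the dissimilarity, the routine starts at index 0, then picks the most distant point, and then the point midway between the two.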

Dissimilarity and clustering methods only describe the compounds that are in the input set; voids in diversity space are not obvious, and if compounds are added then the set must be re-analyzed. Cell-based partitioning methods address these problems by dividing descriptor space into cells, and then populating those cells with compounds [67, 68]. The library is chosen to contain representatives from each cell. The use of a partition-based method with BCUT descriptors [69] to design an NMR screening library has recently been described [70]. [Pg.401]

The selections of compounds are made using a variety of methods, such as dissimilarity selection (16), Optiverse library selection (17), Jarvis-Patrick clustering (18), and cell-based methods (19). All these methods attempt to choose a set of compounds that represent the molecular diversity of the available compounds as efficiently as possible. A consequence of this is that only a few compounds around any given molecular scaffold may be present in a HTS screening... [Pg.87]

Many different methods have been developed for compound selection. They include clustering, dissimilarity-based compound selection, partitioning a collection of compounds into a low-dimensional space and the use of optimization methods such as simulated annealing and genetic algorithms. Filtering techniques are often employed prior to compound selection to remove undesirable compounds. [Pg.351]

Figure 5.15 Dissimilarity-based selection methods: maximum dissimilarity approach.

Many other approaches have been developed for measuring similarity and dissimilarity, including a multitude of variations on clustering or partitioning strategies. It is not possible to summarise here the wide variety of structure-based methods for diversity analysis; however, a number of detailed papers and reviews are available. Particularly recommended for advanced theory is that of Agrafiotis. ... [Pg.120]

Big Chemical Encyclopedia