Dissimilar cluster selection

In dissimilarity-based compound selection the required subset of molecules is identified directly, using an appropriate measure of dissimilarity (often taken to be the complement of the similarity). This contrasts with the two-stage procedure in cluster analysis, where it is first necessary to group together the molecules and then decide which to select. Most methods for dissimilarity-based selection fall into one of two categories: maximum dissimilarity algorithms and sphere exclusion algorithms [Snarey et al. 1997]. [Pg.699]
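The two categories named above can be sketched in a few lines of Python. This is only a minimal illustration, not the algorithms as specified by Snarey et al.: it assumes fingerprints are stored as sets of "on" bit positions, takes dissimilarity as one minus the Tanimoto similarity, and uses a sequential form of sphere exclusion. The function names, the toy fingerprints, and the threshold of 0.7 are all hypothetical.

```python
def tanimoto_dissimilarity(a, b):
    """Return 1 - Tanimoto similarity for two sets of 'on' bit positions."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def maxmin_select(fps, k, seed_index=0):
    """Greedy maximum-dissimilarity (MaxMin) selection of up to k items."""
    selected = [seed_index]
    # nearest[i] is the dissimilarity from item i to its closest selected item
    nearest = [tanimoto_dissimilarity(fp, fps[seed_index]) for fp in fps]
    while len(selected) < min(k, len(fps)):
        candidates = [i for i in range(len(fps)) if i not in selected]
        # choose the candidate whose closest selected neighbour is furthest away
        best = max(candidates, key=lambda i: nearest[i])
        selected.append(best)
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], tanimoto_dissimilarity(fp, fps[best]))
    return selected


def sphere_exclusion_select(fps, threshold):
    """Take items in order, skipping any that lie within `threshold` of a pick."""
    selected = []
    for i, fp in enumerate(fps):
        if all(tanimoto_dissimilarity(fp, fps[j]) >= threshold for j in selected):
            selected.append(i)
    return selected


# Hypothetical toy fingerprints: two close pairs and one outlier.
fingerprints = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {10, 11, 12},
    {10, 11, 13},
    {20, 21},
]

print(maxmin_select(fingerprints, 3))
print(sphere_exclusion_select(fingerprints, 0.7))
```

In the MaxMin sketch the subset size is fixed in advance, whereas in the sphere exclusion sketch the threshold is fixed and the subset size follows from it.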

Since diversity is a collective property, its precise quantification requires a mathematical description of the distribution of the molecular collection in a chemical space. When one set of molecules is considered more diverse than another, its molecules cover more chemical space and/or are distributed more evenly in chemical space. Historically, diversity analysis is closely linked to compound selection and combinatorial library design. In reality, library design is also a selection process, selecting compounds from a virtual library before synthesis. There are three main categories of selection procedures for building a diverse set of compounds: cluster-based selection, partition-based selection, and dissimilarity-based selection. [Pg.39]

Many different methods have been developed for compound selection. They include selection based on clustering compounds, dissimilarity-based selection, selection based on partitioning a collection of compounds into some multidimensional space, experimental design methods and the use of stochastic methods such as simulated annealing and genetic algorithms. Filtering techniques are often employed prior to compound selection to remove undesirable compounds. [Pg.259]

Partitional clustering using Euclidean distance as a measure of dissimilarity between pattern classes has been selected for the grouping of AE hits. [Pg.39]
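As an illustration of partitional clustering with Euclidean distance, the following is a minimal k-means sketch. The excerpt does not say which partitional algorithm or which hit features were used, so k-means, the two-dimensional feature vectors, and the naive choice of the first k points as initial centroids are all assumptions made for the example.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: each point goes to its nearest centroid
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2-D feature vectors for five hits.
hits = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3)]
labels, centroids = kmeans(hits, 2)
print(labels)
```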

A major potential drawback with cluster analysis and dissimilarity-based methods for selecting diverse compounds is that there is no easy way to quantify how completely one has filled the available chemical space or to identify whether there are any holes. This is a key advantage of the partition-based approaches (also known as cell-based methods). A number of axes are defined, each corresponding to a descriptor or some combination of descriptors. Each axis is divided into a number of bins. If there are axes and each is divided into b bins then the number of cells in the multidimensional space so created is ... [Pg.701]
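The cell-based idea can be illustrated with a short Python sketch that bins each descriptor axis, assigns compounds to cells, and lists the cells left empty (the "holes"). The descriptor names, ranges, compound values, and equal-width binning are hypothetical choices for the example; the source does not specify them.

```python
from itertools import product


def cell_index(value, low, high, bins):
    """Map a descriptor value onto an equal-width bin in [0, bins - 1]."""
    if high <= low:
        raise ValueError("descriptor range must have high > low")
    fraction = (value - low) / (high - low)
    # clamp so out-of-range values fall into the first or last bin
    return min(bins - 1, max(0, int(fraction * bins)))


def occupancy(compounds, ranges, bins):
    """Return a dict mapping each occupied cell to the compounds inside it."""
    cells = {}
    for name, descriptors in compounds.items():
        key = tuple(
            cell_index(v, low, high, bins)
            for v, (low, high) in zip(descriptors, ranges)
        )
        cells.setdefault(key, []).append(name)
    return cells


# Hypothetical axes: molecular weight and logP, two bins per axis.
ranges = [(0.0, 500.0), (-2.0, 6.0)]
bins = 2
compounds = {"A": (120.0, 0.5), "B": (130.0, 1.0), "C": (420.0, 4.5)}

cells = occupancy(compounds, ranges, bins)
all_cells = set(product(range(bins), repeat=len(ranges)))
empty_cells = sorted(all_cells - set(cells))

print(cells)        # occupied cells and their members
print(empty_cells)  # holes in the partitioned space
# one representative per occupied cell (here simply the first member)
representatives = sorted(members[0] for members in cells.values())
print(representatives)
```

Because every cell of the grid is enumerated explicitly, coverage can be read off directly as the fraction of cells that are occupied, which is the property the paragraph above contrasts with clustering and dissimilarity-based selection.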

The selections of compounds are made using a variety of methods, such as dissimilarity selection (16), optiverse library selection (17), Jarvis-Patrick clustering (18), and cell-based methods (19). All these methods attempt to choose a set of compounds that represent the molecular diversity of the available compounds as efficiently as possible. A consequence of this is that only a few compounds around any given molecular scaffold may be present in an HTS screening... [Pg.87]

Given the variety of different descriptors and subset selection methods that are available, several studies have been carried out in an attempt to validate both the compound selection methods and the various descriptors. To some extent the choice of descriptors and subset selection methods are interlinked. For example, partitioning schemes are restricted to low-dimensional descriptors such as physicochemical descriptors, whereas clustering and dissimilarity-based methods can be used with high dimensional descriptors such as fingerprints. [Pg.357]

Matter [58] has also validated a range of 2-D and 3-D structural descriptors on their ability to predict biological activity and on their ability to sample structurally and biologically diverse datasets effectively. The compound selection techniques used were maximum dissimilarity and clustering. Their results also showed the 2-D fingerprint-based descriptors to be the most effective in selecting representative subsets of bioactive compounds. [Pg.358]

Potter and Matter [64] compared maximum dissimilarity methods and hierarchical clustering with random methods for designing compound subsets. The compound selection methods were applied to a database of 1283 compounds, extracted from the Index Chemicus 1993 database, that covers 55 biological activity classes. A second database consisted of 334 compounds from 11 different QSAR target series. They compared the distribution of actives in randomly chosen subsets with the rationally... [Pg.54]

There is already an extensive literature relating to compound-selection methods, from which it is possible to identify four major classes of method (although, as we shall see, there is some degree of overlap between these four classes), viz. cluster-based methods, dissimilarity-based methods, partition-based methods and optimisation methods. The next four sections of this chapter present the various algorithms that have been suggested for each approach; we then discuss comparisons and applications of these algorithms, and the chapter concludes with some thoughts on further developments in the field. [Pg.117]

Cluster-based and dissimilarity-based methods for compound selection were first discussed in the Eighties, but it is only in the last few years that the area has attracted substantial attention as a result of the need to provide a rational basis for the design of combinatorial libraries. The four previous sections have provided an overview of the main types of selection method that are already available, with further approaches continuing to appear in the literature. Given this array of possible techniques, it is appropriate to consider ways in which the various methods can be evaluated, both in absolute terms and when compared with each other. A method can be evaluated in terms of its efficiency, i.e., the computational costs associated with its use, and its effectiveness, i.e., the extent to which it achieves its aims. As we shall see, it is not immediately obvious how effectiveness should be quantified and we shall thus consider the question of efficiency first, focusing upon the normal algorithmic criteria of CPU time and storage requirements. [Pg.129]

Compound selection is a core process of library design, and three main methods can be mentioned. Dissimilarity-based methods select compounds in terms of similarity/distance between individuals in chemical space. Clustering methods first group compounds into clusters based on similarity/distance and then choose representative compounds from different clusters. Partitioning methods first create a uniform cell space that subdivides the chemical space, then assign all virtual compounds to the relevant cells according to their properties, and finally choose representative compounds from different cells. [Pg.184]

Big Chemical Encyclopedia
