MaxMin selection

Dissimilarity-based compound selection (DBCS) methods involve selecting a subset of compounds directly based on pairwise dissimilarities [37]. The first compound is selected, either at random or as the one that is most dissimilar to all others in the database, and is placed in the subset. The subset is then built up stepwise by selecting one compound at a time until it is of the required size. In each iteration, the next compound to be selected is the one that is most dissimilar to those already in the subset, with the dissimilarity normally being computed by the MaxMin approach [38]. Here, each database compound is compared with each compound in the subset and its nearest neighbor is identified; the database compound that is selected is the one that has the maximum dissimilarity to its nearest neighbor in the subset. [Pg.199]
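The MaxMin loop described above can be sketched as follows. This is a minimal illustration, not the implementation from the cited work: the function name `maxmin_select`, the toy 2-D points, the use of Euclidean distance as the dissimilarity, and the fixed seed compound are all assumptions made for the example.

```python
import math

def maxmin_select(points, k, seed_index=0):
    """Pick k indices from points using the MaxMin rule (illustrative sketch)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [seed_index]
    while len(selected) < k:
        best_index, best_score = None, -1.0
        for i, p in enumerate(points):
            if i in selected:
                continue
            # Dissimilarity of candidate i to its nearest neighbor in the subset.
            nearest = min(math.dist(p, points[j]) for j in selected)
            # Keep the candidate whose nearest neighbor is furthest away.
            if nearest > best_score:
                best_index, best_score = i, nearest
        selected.append(best_index)
    return selected

# Toy 2-D descriptor space (assumed data): a tight cluster plus three outliers.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (6.0, 0.0), (0.0, 4.0), (6.0, 5.0)]
picked = maxmin_select(points, k=4, seed_index=0)
print(picked)  # [0, 5, 3, 4]
```

With the seed at the origin, the three outliers are chosen before either of the near-duplicates of the seed, because each near-duplicate has a nearest neighbor in the subset only 0.1 away.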

In the Maximum Dissimilarity (MD) selection method described by Lajiness [40], the first compound is selected at random and subsequent compounds are then chosen iteratively, such that the distance to the nearest of the compounds already chosen is a maximum. This method is known as MaxMin. In this study, the compounds were represented by COUSIN 2-D fragment-based bitstrings. Polinsky et al. [41] use a similar algorithm in the LiBrain system. In this case, the molecules are represented by a feature vector that contains information about the following affinity types: aliphatic hydrophobic, aromatic hydrophobic, basic, acidic, hydrogen bond donor, hydrogen bond acceptor and polarizable heteroatom. [Pg.353]

The basic DBCS algorithm has time complexity O(n²N), where n compounds are selected from N. Since n is generally a small, fixed fraction of N, the time is thus cubic in N. DBCS can therefore be very computationally demanding; however, fast implementations have been developed, for example the MaxSum method described by Holliday et al. [42] and a MaxMin method described by Agrafiotis and Lobanov that can be used with low-dimensional descriptors [55],... [Pg.357]
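One generic way to avoid recomputing every candidate-to-subset distance in each iteration is to cache, for every compound, the distance to its nearest neighbor in the subset and update that cache only against the newly added compound. The sketch below illustrates this caching idea; it is not the method of Holliday et al. or of Agrafiotis and Lobanov. The function name `maxmin_cached`, the Euclidean distance, and the toy points are assumptions. It needs on the order of nN distance evaluations rather than n²N.

```python
import math

def maxmin_cached(points, k, seed_index=0):
    """MaxMin selection with a cached nearest-selected distance per compound."""
    n_points = len(points)
    if not 0 < k <= n_points:
        raise ValueError("k must be between 1 and len(points)")
    selected = [seed_index]
    # nearest[i]: distance from compound i to its nearest neighbor in the subset.
    nearest = [math.dist(p, points[seed_index]) for p in points]
    evaluations = n_points
    nearest[seed_index] = -1.0  # negative marks "already selected"
    while len(selected) < k:
        # The unselected compound with the largest cached value is the MaxMin pick.
        new = max(range(n_points), key=nearest.__getitem__)
        selected.append(new)
        nearest[new] = -1.0
        # Only distances to the newly added compound can lower a cached value.
        for i in range(n_points):
            if nearest[i] >= 0.0:
                d = math.dist(points[i], points[new])
                evaluations += 1
                if d < nearest[i]:
                    nearest[i] = d
    return selected, evaluations

# Toy 2-D descriptor space (assumed data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (6.0, 0.0), (0.0, 4.0), (6.0, 5.0)]
picked, evaluations = maxmin_cached(points, k=4)
print(picked, evaluations)  # [0, 5, 3, 4] 15
```

The cache costs one extra list of N numbers, and each iteration does a single pass over the unselected compounds.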

This approach is particularly efficient when combined with the Cosine coefficient (69) and was used by Pickett et al. in combination with pharmacophore descriptors (70). In lower-dimensional spaces the maxsum measure tends to force selection from the corners of diversity space (6b, 71) and hence maxmin is the preferred function in these cases. A similar conclusion was drawn from a comparison of algorithms for dissimilarity-based compound selection (72). [Pg.208]

Schmuker, M., Givehchi, A. and Schneider, G. (2004) Impact of different software implementations on the performance of the Maxmin method for diverse subset selection. Mol. Div., 8, 421–425. [Pg.1165]

The behaviour of some of these methods is illustrated using a two-dimensional example in Figure 12.30. If the most dissimilar compound is chosen as the first molecule in the maximum-dissimilarity cases, then the MaxSum method tends to select compounds at the extremities of the distribution. This is also the initial behaviour of the MaxMin approach, but it then starts to sample from the middle. The sphere exclusion methods typically start somewhere in the middle of the distribution and work outwards. [Pg.684]

Figure 14.1 Four common selection methods compared. The molecules (represented by dots; those selected are black with a white centre) are distributed in an arbitrary two-dimensional property space. A illustrates a cell-based selection of one molecule per cell, B a MaxMin dissimilarity selection, C uses sphere exclusion clustering, whilst D invokes a more sophisticated clustering method. This figure is adapted from ref. 56.

When the algorithms were compared, however, the MaxMin algorithm gave better results than the alternatives under study. In fact, several workers have highlighted a problem with the MaxSum procedure. The measure is based on the distance of the point from the centroid of the set and so tends to select molecules from the corners of diversity space, and duplicate selections can appear to add to the diversity. This situation is clearly a problem with traditional descriptors, because the extremes of space tend to be less relevant chemical compounds (very high or very low log P, etc.). [Pg.24]
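The contrast between the two criteria can be seen on a small example. The sketch below is illustrative only: it scores MaxSum as the direct sum of Euclidean distances to the compounds already selected (rather than through the centroid formulation mentioned above), and the helper `greedy_select` and the toy data, which include a deliberate duplicate at one corner, are assumptions.

```python
import math

def greedy_select(points, k, aggregate, seed_index=0):
    """Greedy dissimilarity selection; aggregate=sum gives MaxSum, min gives MaxMin."""
    selected = [seed_index]
    while len(selected) < k:
        candidates = [i for i in range(len(points)) if i not in selected]

        def score(i):
            # Combine the distances from candidate i to every selected compound.
            return aggregate(math.dist(points[i], points[j]) for j in selected)

        selected.append(max(candidates, key=score))
    return selected

# Toy 2-D space (assumed data): four corners, a duplicate of one corner
# (indices 1 and 2), and two interior points (indices 5 and 6).
points = [(0, 0), (10, 10), (10, 10), (10, 0), (0, 10), (5, 5), (4, 6)]

maxsum_pick = greedy_select(points, k=5, aggregate=sum)
maxmin_pick = greedy_select(points, k=5, aggregate=min)
print(maxsum_pick)  # [0, 1, 3, 4, 2]
print(maxmin_pick)  # [0, 1, 3, 4, 5]
```

Both criteria start with the four corners. For the fifth pick, MaxSum takes the duplicate corner, whose summed distance to the subset is still large, whereas MaxMin gives the duplicate a score of zero and moves to an interior point.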
