MaxSum

To decide which molecule to add at each iteration, the dissimilarity between each molecule remaining in the database and those already placed in the subset must be calculated. Again, this can be achieved in several ways. Snarey et al. investigated two common definitions, MaxSum and MaxMin. If there are m molecules in the subset then... [Pg.699]

An alternative way of measuring the dissimilarity of one compound to a set of compounds is to sum the pairwise dissimilarities between the compound and all compounds in the set, a method known as MaxSum. The most dissimilar compound to a set of compounds is the compound which has the maximum sum of pairwise dissimilarities. Holliday et al. [42] have implemented an efficient version of MaxSum that uses the cosine coefficient as the (dis)similarity coefficient. Their algorithm operates in O(nN) time complexity and can thus be applied to very large datasets. However, as Snarey et al. [43] have pointed out, there is a tendency for the algorithm to focus on outliers. [Pg.353]
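One way to see why a cosine-based MaxSum can run in O(nN) time: the sum of cosine similarities between a candidate and every member of a set equals a single dot product between the candidate's unit vector and the sum of the members' unit vectors. Since dissimilarity is taken here as 1 minus similarity, the compound with the maximum summed dissimilarity is the one with the minimum summed similarity. The sketch below only checks that identity on toy vectors. The function names and data are illustrative assumptions, and it is not claimed to reproduce the implementation of Holliday et al.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_sum_pairwise(candidate, subset):
    """Sum of cosine similarities, one dot product per subset member."""
    c = normalize(candidate)
    return sum(sum(a * b for a, b in zip(c, normalize(s))) for s in subset)

def cosine_sum_centroid(candidate, subset_sum):
    """Same sum from one dot product with the summed unit vectors."""
    c = normalize(candidate)
    return sum(a * b for a, b in zip(c, subset_sum))

# Toy descriptor vectors (hypothetical data).
subset = [[1.0, 0.0, 1.0], [0.0, 2.0, 1.0]]
candidate = [1.0, 1.0, 0.0]

# Running sum of the unit vectors already in the subset.
subset_sum = [sum(col) for col in zip(*(normalize(s) for s in subset))]

pairwise = cosine_sum_pairwise(candidate, subset)
centroid = cosine_sum_centroid(candidate, subset_sum)
```

Because the running sum can be updated in constant time per dimension whenever a compound is added, each candidate needs one dot product per iteration rather than one per subset member.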

The basic DBCS algorithm has time complexity O(n²N), where n compounds are selected from N. Since n is generally a small fraction of N, the time is thus cubic in N. DBCS can also be very computationally demanding; however, fast implementations have been developed, for example the MaxSum method described by Holliday et al. [42] and a MaxMin method described by Agrafiotis and Lobanov that can be used with low-dimensional descriptors [55],... [Pg.357]
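The O(n²N) cost of the basic algorithm can be seen in a naive greedy sketch: at each of the n iterations, every remaining compound is compared with every compound already selected. The code below is a minimal illustration under stated assumptions (Euclidean distance on toy coordinates, a caller-supplied seed, and the hypothetical function name `select_diverse`); it is not one of the fast implementations cited above.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_diverse(pool, n, score="maxmin", seed_index=0, dist=euclidean):
    """Greedy dissimilarity-based selection; returns indices into pool.

    score="maxmin": candidate score is its minimum distance to the subset.
    score="maxsum": candidate score is its summed distance to the subset.
    """
    aggregate = min if score == "maxmin" else sum
    selected = [seed_index]
    remaining = [i for i in range(len(pool)) if i != seed_index]
    while len(selected) < n and remaining:
        # Each candidate is scored against every selected item, which
        # gives the O(n^2 N) total cost of the basic algorithm.
        best = max(
            remaining,
            key=lambda i: aggregate(dist(pool[i], pool[j]) for j in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy one-dimensional layout (hypothetical data): two ends, a midpoint,
# and a point just beyond the seed.
pool = [(0, 0), (10, 0), (5, 0), (-1, 0)]
maxmin_pick = select_diverse(pool, 3, score="maxmin")
maxsum_pick = select_diverse(pool, 3, score="maxsum")
```

On this toy data MaxMin takes the midpoint as its third choice, whereas MaxSum takes the point lying just beyond the seed, a small-scale instance of its tendency to favour extremities.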

An alternative approach is to use the sum of pairwise similarities in the maxsum approach ... [Pg.208]

This approach is particularly efficient when combined with the Cosine coefficient (69) and was used by Pickett et al. in combination with pharmacophore descriptors (70). In lower dimensional spaces the maxsum measure tends to force selection from the corners of diversity space (6b, 71) and hence maxmin is the preferred function in these cases. A similar conclusion was drawn from a comparison of algorithms for dissimilarity-based compound selection (72). [Pg.208]

Several different DBCS algorithms have been described and they differ in the way the seed compound is chosen and the way in which the dissimilarity of one compound to a set of compounds is measured [28]. For example, in the MaxMin method, the subset is chosen to maximize the minimum distance between all pairs of molecules in the subset [29], whereas in the MaxSum method, the subset that maximizes the sum of pairwise dissimilarities in the subset is chosen [28]. The basic... [Pg.621]
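The two subset-level objectives described here can be written compactly. The notation is an assumption introduced for illustration: S is the selected subset and d(i, j) is the dissimilarity between molecules i and j.

```latex
D_{\mathrm{MaxMin}}(S) = \min_{\substack{i, j \in S \\ i < j}} d(i, j),
\qquad
D_{\mathrm{MaxSum}}(S) = \sum_{\substack{i, j \in S \\ i < j}} d(i, j)
```

In each case the subset sought is the one that maximizes the corresponding function.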

Mount et al. [32] describe a DBCS method that is based on a minimum spanning tree. A spanning tree is a set of edges that connect a set of objects. The objects in this method are the molecules in the subset, and each edge is labeled by the dissimilarity between the two molecules it connects. A minimum spanning tree is the spanning tree that connects all molecules in the subset with the minimum sum of pairwise dissimilarities; thus, the diversity is the sum of just some of the intermolecular similarities rather than all of them as in MaxSum. A similar function has also been developed by Brown et al. [34],... [Pg.622]
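The contrast between a spanning-tree diversity and the MaxSum total can be illustrated with a short sketch. It uses Prim's algorithm and Euclidean distance on toy coordinates, both chosen here for illustration; the source does not say how Mount et al. build the tree, and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mst_diversity(points, dist=euclidean):
    """Sum of edge weights in the minimum spanning tree (Prim, O(m^2))."""
    m = len(points)
    if m < 2:
        return 0.0
    in_tree = [False] * m
    best = [math.inf] * m  # cheapest known edge joining each point to the tree
    best[0] = 0.0          # start the tree at the first point
    total = 0.0
    for _ in range(m):
        # Add the point outside the tree with the cheapest connecting edge.
        u = min((i for i in range(m) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(m):
            if not in_tree[v]:
                best[v] = min(best[v], dist(points[u], points[v]))
    return total

def sum_pairwise(points, dist=euclidean):
    """Sum over all pairs, i.e. the MaxSum-style subset diversity."""
    m = len(points)
    return sum(dist(points[i], points[j])
               for i in range(m) for j in range(i + 1, m))

# Toy subset (hypothetical data): a 3-4-5 right triangle.
subset = [(0, 0), (3, 0), (3, 4)]
tree_total = mst_diversity(subset)  # uses the edges of length 3 and 4
pair_total = sum_pairwise(subset)   # uses all three edges: 3 + 4 + 5
```

For m molecules the tree uses only m - 1 of the m(m - 1)/2 pairwise values, which is the sense in which it sums "just some" of them.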

The behaviour of some of these methods is illustrated using a two-dimensional example in Figure 12.30. If the most dissimilar compound is chosen as the first molecule in the maximum-dissimilarity cases then the MaxSum method tends to select compounds at the extremities of the distribution. This is also the initial behaviour of the MaxMin approach, but it then starts to sample from the middle. The sphere exclusion methods typically start somewhere in the middle of the distribution and work outwards. [Pg.684]

When the algorithms were compared, however, the MaxMin algorithm gave better results than the alternatives under study. In fact, several workers have highlighted a problem with the MaxSum procedure. The measure is based on the distance of the point from the centroid of the set and so tends to select molecules from the corners of diversity space, and duplicate selections can appear to add to the diversity. This situation is clearly a problem with traditional descriptors, because the extremes of space tend to be less relevant chemical compounds (very high or very low log P, etc.). [Pg.24]
