Pairwise dissimilarity

Dissimilarity-based compound selection (DBCS) methods select a subset of compounds directly on the basis of pairwise dissimilarities [37]. The first compound is selected, either at random or as the one that is most dissimilar to all others in the database, and is placed in the subset. The subset is then built up stepwise by selecting one compound at a time until it is of the required size. In each iteration, the next compound to be selected is the one that is most dissimilar to those already in the subset, with the dissimilarity normally being computed by the MaxMin approach [38]. Here, each database compound is compared with each compound in the subset and its nearest neighbor is identified; the database compound that is selected is the one that has the maximum dissimilarity to its nearest neighbor in the subset. [Pg.199]
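
The MaxMin loop described above can be sketched as follows. This is a minimal illustration, not the implementation from [38]: the function names are hypothetical, fingerprints are assumed to be sets of on-bit positions, and the Tanimoto coefficient is an assumed choice of dissimilarity measure that the text does not specify.

```python
def tanimoto_dissimilarity(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits (assumed measure)."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def maxmin_select(fingerprints, subset_size, seed_index=0):
    """Return the indices of a subset built by MaxMin selection (hypothetical helper)."""
    selected = [seed_index]
    # nearest[i] holds the dissimilarity of compound i to its nearest neighbor in the subset.
    nearest = [tanimoto_dissimilarity(fp, fingerprints[seed_index]) for fp in fingerprints]
    while len(selected) < min(subset_size, len(fingerprints)):
        candidates = [i for i in range(len(fingerprints)) if i not in selected]
        # Pick the compound whose nearest subset neighbor is farthest away.
        best = max(candidates, key=lambda i: nearest[i])
        selected.append(best)
        # Only the newly added compound can shrink a nearest-neighbor distance.
        for i in candidates:
            d = tanimoto_dissimilarity(fingerprints[i], fingerprints[best])
            if d < nearest[i]:
                nearest[i] = d
    return selected


fingerprints = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {7, 8, 10}]
subset = maxmin_select(fingerprints, 3)
```

Caching the nearest-neighbor dissimilarity of every candidate means each iteration compares the database only with the most recently added compound, rather than with the whole subset again.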

SELECT has been designed to allow optimization of a variety of different objectives. Diversity (and similarity) is optimized using functions either based on pairwise dissimilarities and fingerprints or using cell-based measures. The physicochemical properties of libraries are optimized by minimizing the dif-... [Pg.341]

An alternative way of measuring the dissimilarity of one compound to a set of compounds is to sum the pairwise dissimilarities between the compound and all compounds in the set, a method known as MaxSum. The most dissimilar compound to a set of compounds is the compound which has the maximum sum of pairwise dissimilarities. Holliday et al. [42] have implemented an efficient version of MaxSum that uses the cosine coefficient as the (dis)similarity coefficient. Their algorithm operates in O(nN) time complexity and can thus be applied to very large datasets. However, as Snarey et al. [43] have pointed out, there is a tendency for the algorithm to focus on outliers. [Pg.353]
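
One way to see how MaxSum with the cosine coefficient can run in O(nN) time is that the sum of cosine similarities between a compound and a set equals the dot product of its unit vector with the sum of the set's unit vectors. The sketch below uses that identity; it is an illustration under that assumption, and whether it matches the details of the implementation in [42] is not confirmed by the text. The function names and the seed choice are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def maxsum_select(vectors, subset_size, seed_index=0):
    """Return the indices of a subset built by MaxSum selection with cosine dissimilarity."""
    unit = [normalize(v) for v in vectors]
    selected = [seed_index]
    # Running sum of the unit vectors already in the subset.
    centroid = list(unit[seed_index])
    while len(selected) < min(subset_size, len(unit)):
        best, best_score = None, -math.inf
        for i, u in enumerate(unit):
            if i in selected:
                continue
            # Sum over the subset of (1 - cosine) = |subset| - u . centroid
            score = len(selected) - sum(a * b for a, b in zip(u, centroid))
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        centroid = [c + x for c, x in zip(centroid, unit[best])]
    return selected


vectors = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
subset = maxsum_select(vectors, 3)
```

Each of the n selection steps makes one pass over the N database compounds, with one dot product per compound, which is where the O(nN) behavior comes from.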

Several different DBCS algorithms have been described and they differ in the way the seed compound is chosen and the way in which the dissimilarity of one compound to a set of compounds is measured [28]. For example, in the MaxMin method, the subset is chosen to maximize the minimum distance between all pairs of molecules in the subset [29], whereas in the MaxSum method, the subset that maximizes the sum of pairwise dissimilarities in the subset is chosen [28]. The basic... [Pg.621]

Mount et al. [32] describe a DBCS method that is based on a minimum spanning tree. A spanning tree is a set of edges that connect a set of objects. The objects in this method are the molecules in the subset, and each edge is labeled by the dissimilarity between the two molecules it connects. A minimum spanning tree is the spanning tree that connects all molecules in the subset with the minimum sum of pairwise dissimilarities; thus, the diversity is the sum of just some of the intermolecular dissimilarities rather than all of them as in MaxSum. A similar function has also been developed by Brown et al. [34],... [Pg.622]
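
As an illustration of the spanning-tree diversity measure, the sketch below sums the edge weights of a minimum spanning tree over a symmetric dissimilarity matrix. Prim's algorithm is an assumed choice here; the text does not say which algorithm Mount et al. use, and the function name and example matrix are hypothetical.

```python
def mst_diversity(dissim):
    """Sum of minimum-spanning-tree edge weights for a symmetric dissimilarity matrix."""
    n = len(dissim)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    in_tree[0] = True
    # cheapest[i] is the smallest dissimilarity between node i and any node in the tree.
    cheapest = list(dissim[0])
    total = 0.0
    for _ in range(n - 1):
        nxt = min((i for i in range(n) if not in_tree[i]), key=lambda i: cheapest[i])
        total += cheapest[nxt]
        in_tree[nxt] = True
        for i in range(n):
            if not in_tree[i] and dissim[nxt][i] < cheapest[i]:
                cheapest[i] = dissim[nxt][i]
    return total


dissim = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.6],
    [0.9, 0.7, 0.0, 0.1],
    [0.8, 0.6, 0.1, 0.0],
]
diversity = mst_diversity(dissim)
```

For this four-molecule example the tree uses only three of the six pairwise dissimilarities, whereas a MaxSum-style score would add up all six.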

The previous discussion subtly shifted between molecular similarity and molecular properties. It is important to elucidate the relationship between the two. If each of the molecular properties can be treated as a separate dimension in a Euclidean property space, and dissimilarity can be equated with distance between property vectors, similarity/diversity problems can be solved using analytical geometry. A set of vectors (chemical structures) in property space can be converted to a matrix of pairwise dissimilarities simply by applying the Pythagorean theorem. This operation is like measuring the distances between all pairs of cities from their coordinates on a map. [Pg.78]
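
The conversion from property vectors to a matrix of pairwise dissimilarities can be sketched in a few lines. This is a minimal example that assumes plain, unscaled Euclidean distance; the function name and the sample vectors are hypothetical.

```python
import math


def dissimilarity_matrix(property_vectors):
    """Symmetric matrix of Euclidean distances between all pairs of property vectors."""
    n = len(property_vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # math.dist applies the Pythagorean theorem in any number of dimensions.
            d = math.dist(property_vectors[i], property_vectors[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


matrix = dissimilarity_matrix([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
```

In practice the properties would usually need to be put on comparable scales first, since a property with a large numeric range would otherwise dominate the distance.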

For a set of 1133 carboxylic acid derived substituents, MDS reduced the 2048-bit fingerprints to just seven continuous variables that reproduce all 642,000 pairwise dissimilarities with a relative standard deviation of just 10%. The calculations required 7 hours on an IBM RS/6000 580 computer using SAS PROC MDS. Spellmeyer has written a specialized MDS code that reproduced this result in 11 minutes on the same machine. He was able to reduce a set of 3984 amines to 10 dimensions in 4.4 hours. [Pg.80]
