Maximum-dissimilarity algorithm

In dissimilarity-based compound selection the required subset of molecules is identified directly, using an appropriate measure of dissimilarity (often taken to be the complement of the similarity). This contrasts with the two-stage procedure in cluster analysis, where it is first necessary to group the molecules together and then decide which to select. Most methods for dissimilarity-based selection fall into one of two categories: maximum dissimilarity algorithms and sphere exclusion algorithms [Snarey et al. 1997]. [Pg.699]

The maximum dissimilarity algorithm works in an iterative manner: at each step one compound is selected from the database and added to the subset [Kennard and Stone 1969]. The compound selected is the one most dissimilar to the current subset. There are many variants of this basic algorithm, which differ in how the first compound is chosen and how the dissimilarity is measured. Three possible choices for the initial compound are (a) select it at random, (b) choose the molecule which is most representative (e.g. has the largest sum of similarities to the other molecules) or (c) choose the molecule which is most dissimilar (e.g. has the smallest sum of similarities to the other molecules). [Pg.699]
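The three seed choices can be sketched in a few lines of Python. This is only an illustration, not code from any of the cited works: the function names `tanimoto` and `choose_seed`, the strategy labels, and the representation of a fingerprint as a set of on-bit indices are all assumptions made for the example.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def choose_seed(fps, strategy="random", rng=None):
    """Return the index of the initial compound (hypothetical helper).

    strategy: "random", "representative" (largest sum of similarities)
    or "dissimilar" (smallest sum of similarities).
    """
    if strategy == "random":
        rng = rng or random.Random()
        return rng.randrange(len(fps))
    # Sum of similarities of each molecule to all the other molecules.
    sums = [
        sum(tanimoto(fp, other) for j, other in enumerate(fps) if j != i)
        for i, fp in enumerate(fps)
    ]
    if strategy == "representative":
        return max(range(len(fps)), key=sums.__getitem__)
    if strategy == "dissimilar":
        return min(range(len(fps)), key=sums.__getitem__)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Computing the similarity sums costs one pass over all pairs, so options (b) and (c) are considerably more expensive than a random seed on a large database.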

The D-score is computed using the maximum dissimilarity algorithm of Lajiness (20). This method utilizes a Tanimoto-like similarity measure defined on a 360-bit fragment descriptor used in conjunction with the Cousin/ChemLink system (21). The important feature of this method is that it starts with the selection of a seed compound with subsequent compounds selected based on the maximum diversity relative to all compounds already selected. Thus, the most obvious seed to use in the current scenario is the compound that has the best profile based on the already computed scores. Thus, one needs to compute a preliminary consensus score based on the Q-score and the B-score using weights as defined previously. To summarize this, one needs to... [Pg.121]

In the Maximum Dissimilarity (MD) selection method described by Lajiness [40] the first compound is selected at random and subsequent compounds are then chosen iteratively, such that the distance to the nearest of the compounds already chosen is a maximum. This method is known as MaxMin. In this study, the compounds were represented by COUSIN 2-D fragment-based bitstrings. Polinsky et al. [41] use a similar algorithm in the LiBrain system. In this case, the molecules are represented by a feature vector that contains information about the following affinity types: aliphatic hydrophobic, aromatic hydrophobic, basic, acidic, hydrogen bond donor, hydrogen bond acceptor and polarizable heteroatom. [Pg.353]
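A minimal MaxMin sketch follows, assuming fingerprints are stored as sets of on-bit indices and that dissimilarity is taken as one minus the Tanimoto similarity. The names `tanimoto_distance` and `maxmin_select` are hypothetical, and the seed is passed in by the caller rather than drawn at random so that the result is reproducible.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)


def maxmin_select(fps, k, seed_index=0):
    """Pick k compounds; each new pick maximizes the distance to its nearest
    already-selected compound (MaxMin)."""
    if not 0 < k <= len(fps):
        raise ValueError("k must be between 1 and the number of compounds")
    selected = [seed_index]
    # nearest[i] holds the distance from compound i to its closest selected
    # compound; -1.0 marks compounds that have already been selected.
    nearest = [tanimoto_distance(fp, fps[seed_index]) for fp in fps]
    nearest[seed_index] = -1.0
    while len(selected) < k:
        nxt = max(range(len(fps)), key=nearest.__getitem__)
        selected.append(nxt)
        nearest[nxt] = -1.0
        # Only the newly added compound can shrink a nearest-neighbor distance.
        for i, fp in enumerate(fps):
            if nearest[i] >= 0.0:
                nearest[i] = min(nearest[i], tanimoto_distance(fp, fps[nxt]))
    return selected
```

Caching the nearest-selected distance means each iteration compares every candidate only with the compound just added, rather than with the whole subset.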

At each iteration of the sphere-exclusion algorithm [Hudson et al. 1996], a compound is selected for inclusion in the subset and then all other molecules in the database which have a dissimilarity to this compound less than some threshold value are removed from further consideration. Variation is possible depending upon the way in which the first compound is selected, the threshold value, and the way in which the next compound is selected at each stage. It is typical to try to select this next compound so that it is least dissimilar to those already selected. Hudson et al. suggested the use of a MinMax method, where the molecule with the smallest maximum dissimilarity with the current subset is selected. However, it is also possible to select this next compound at random from those still remaining. [Pg.684]
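The simplest variant, in which every compound (including the first) is drawn at random from those still remaining, might look like the sketch below. It is not the MinMax procedure of Hudson et al.; the function names and the set-of-bits fingerprint representation are assumptions made for the illustration.

```python
import random


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)


def sphere_exclusion(fps, threshold, rng=None):
    """Select compounds until none remain, discarding after each pick every
    compound whose dissimilarity to the pick is below the threshold."""
    rng = rng or random.Random()
    remaining = list(range(len(fps)))
    selected = []
    while remaining:
        pick = rng.choice(remaining)  # random choice among the survivors
        selected.append(pick)
        # Keep only compounds lying outside the exclusion sphere of the pick.
        remaining = [
            i for i in remaining
            if i != pick and tanimoto_distance(fps[i], fps[pick]) >= threshold
        ]
    return selected
```

Unlike the maximum dissimilarity approach, the size of the subset is not specified in advance: it is determined by the threshold, with a larger threshold producing a smaller subset.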

Maximum Dissimilarity-Based Selection

The original algorithm for dissimilarity ranking in the chemical structure context seems to have been proposed by Bawden, although the basic algorithm may be due to Kennard and Stone. The basic operation of a dissimilarity selection algorithm is to start with a compound selected at random and make this the first selected compound. Subsequent compounds are selected so that they are maximally dissimilar to all those in the currently selected set. Dissimilarity may be measured by... [Pg.23]

MaxD Maximum dissimilarity selection ( DfragalT) algorithm... [Pg.1]

An alternative way of measuring the dissimilarity of one compound to a set of compounds is to sum the pairwise dissimilarities between the compound and all compounds in the set, a method known as MaxSum. The most dissimilar compound to a set of compounds is the compound which has the maximum sum of pairwise dissimilarities. Holliday et al. [42] have implemented an efficient version of MaxSum that uses the cosine coefficient as the (dis)similarity coefficient. Their algorithm operates in O(nN) time complexity and can thus be applied to very large datasets. However, as Snarey et al. [43] have pointed out, there is a tendency for the algorithm to focus on outliers. [Pg.353]
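The linear-time behavior rests on a property of the cosine coefficient: for vectors scaled to unit length, the sum of the cosine similarities between a candidate and every member of a set equals the dot product of the candidate with the sum of the set's vectors. The sketch below illustrates that idea only; it is not the implementation of Holliday et al., and the names `normalize` and `maxsum_select` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so that dot products equal cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def maxsum_select(vectors, k, seed_index=0):
    """Pick k vectors; each new pick maximizes the sum of (1 - cosine)
    dissimilarities to the vectors already selected (MaxSum)."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    unit = [normalize(v) for v in vectors]
    selected = [seed_index]
    chosen = {seed_index}
    total = list(unit[seed_index])  # running sum of the selected unit vectors
    while len(selected) < k:
        # dot(u, total) is the summed cosine similarity to the subset, so the
        # smallest dot product gives the largest summed dissimilarity.
        best = min(
            (i for i in range(len(unit)) if i not in chosen),
            key=lambda i: sum(a * b for a, b in zip(unit[i], total)),
        )
        selected.append(best)
        chosen.add(best)
        total = [t + u for t, u in zip(total, unit[best])]
    return selected
```

Each of the n selection steps scans the N candidates once against a single summed vector, so no pairwise comparison with the individual subset members is needed.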

Big Chemical Encyclopedia