Maximum dissimilarity

In dissimilarity-based compound selection the required subset of molecules is identified directly, using an appropriate measure of dissimilarity (often taken to be the complement of the similarity). This contrasts with the two-stage procedure in cluster analysis, where it is first necessary to group together the molecules and then decide which to select. Most methods for dissimilarity-based selection fall into one of two categories: maximum dissimilarity algorithms and sphere exclusion algorithms [Snarey et al. 1997]. [Pg.699]

The maximum dissimilarity algorithm works in an iterative manner: at each step one compound is selected from the database and added to the subset [Kennard and Stone 1969]. The compound selected is the one most dissimilar to the current subset. There are many variants of this basic algorithm, which differ in the way the first compound is chosen and in how the dissimilarity is measured. Three possible choices for the initial compound are (a) select it at random, (b) choose the molecule which is most representative (e.g. has the largest sum of similarities to the other molecules) or (c) choose the molecule which is most dissimilar (e.g. has the smallest sum of similarities to the other molecules). [Pg.699]
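The three seeding options (a) to (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `choose_seed`, the `strategy` labels and the representation of the database as a square similarity matrix are hypothetical and do not come from the cited work.

```python
import random


def choose_seed(sim, strategy="random", rng=None):
    """Return the index of the first compound to place in the subset.

    sim is assumed to be a square similarity matrix (list of lists),
    with sim[i][j] the similarity between molecules i and j.
    """
    n = len(sim)
    if strategy == "random":
        # Option (a): pick any molecule at random.
        rng = rng or random.Random()
        return rng.randrange(n)

    # Sum of similarities of each molecule to all *other* molecules.
    totals = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]

    if strategy == "representative":
        # Option (b): largest sum of similarities.
        return max(range(n), key=totals.__getitem__)
    if strategy == "dissimilar":
        # Option (c): smallest sum of similarities.
        return min(range(n), key=totals.__getitem__)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Ties are broken in favour of the lowest index here, which is an arbitrary choice of this sketch.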

Dissimilarity-based compound selection (DBCS) methods involve selecting a subset of compounds directly based on pairwise dissimilarities [37]. The first compound is selected, either at random or as the one that is most dissimilar to all others in the database, and is placed in the subset. The subset is then built up stepwise by selecting one compound at a time until it is of the required size. In each iteration, the next compound to be selected is the one that is most dissimilar to those already in the subset, with the dissimilarity normally being computed by the MaxMin approach [38]. Here, each database compound is compared with each compound in the subset and its nearest neighbor is identified; the database compound that is selected is the one that has the maximum dissimilarity to its nearest neighbor in the subset. [Pg.199]
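The MaxMin step described above can be illustrated with a minimal Python sketch. It assumes the pairwise dissimilarities are already available as a square matrix; the function name `maxmin_select` and its parameters are hypothetical, and the code is not the implementation used in the cited references.

```python
def maxmin_select(dissim, subset_size, seed_index=0):
    """Build a subset by MaxMin selection.

    dissim is assumed to be a square dissimilarity matrix (list of lists).
    Returns the indices of the selected compounds in selection order.
    """
    n = len(dissim)
    if not 1 <= subset_size <= n:
        raise ValueError("subset_size must be between 1 and len(dissim)")

    selected = [seed_index]
    # nearest[i] holds the dissimilarity between compound i and its
    # nearest neighbor in the current subset.
    nearest = list(dissim[seed_index])

    while len(selected) < subset_size:
        candidates = [i for i in range(n) if i not in selected]
        # Pick the candidate whose nearest subset member is farthest away.
        best = max(candidates, key=lambda i: nearest[i])
        selected.append(best)
        # Adding 'best' can only bring a compound's nearest neighbor closer.
        for i in range(n):
            nearest[i] = min(nearest[i], dissim[best][i])
    return selected
```

Keeping the running `nearest` list avoids comparing every candidate with every subset member again at each iteration. Ties are resolved by lowest index, which is a choice made for this sketch only.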

A number of different approaches have been used for selecting diverse sets of compounds. One of the simplest is maximum dissimilarity, in which each new compound is chosen to be as dissimilar as possible to those already selected [60]. This can drastically reduce the library size without significantly reducing the likelihood of discovering classes... [Pg.400]

The first application of a computational method to select structurally diverse compounds for purchase started in 1992 at the Upjohn Company, which predated the formation of Pharmacia & Upjohn by about three years. The basic approach selected compounds using a method based upon maximum dissimilarity and was implemented using SAS software [11]. This later evolved into the program Dfragall, which was written in C and is described in Section 13.6.3. Basically, a set of compounds that is maximally dissimilar from the corporate compound collection is chosen from the set of available vendor compounds. Early versions of the process relied solely on diversity-based metrics, but it was found that many non-drug-like compounds were identified. As a result, structural exclusion criteria were developed to eliminate compounds that were considered unsuitable for... [Pg.319]

The D-score is computed using the maximum dissimilarity algorithm of Lajiness (20). This method utilizes a Tanimoto-like similarity measure defined on a 360-bit fragment descriptor used in conjunction with the Cousin/ChemLink system (21). The important feature of this method is that it starts with the selection of a seed compound with subsequent compounds selected based on the maximum diversity relative to all compounds already selected. Thus, the most obvious seed to use in the current scenario is the compound that has the best profile based on the already computed scores. Thus, one needs to compute a preliminary consensus score based on the Q-score and the B-score using weights as defined previously. To summarize this, one needs to... [Pg.121]

An example of a final consensus list can be seen in Fig. 6. In this figure one can see the Q-score and B-score and the computed preliminary consensus score. On the basis of the preliminary consensus, NP-103930 was chosen as the best compound and selected to be the dissimilarity seed. After the maximum dissimilarity calculation, the diversity score was input and the final consensus score was calculated. As one can see from this figure, the first compound in the preliminary run remains the best. The second compound from the preliminary run does not appear in this list, as it was very similar to NP-103930 and was de-prioritized and moved down the list accordingly. Also, the 155th compound in the preliminary ranking moved up to the 14th rank because it was considered a structurally novel compound. This, we feel, illustrates the power of this approach. Compounds with the most desirable properties move up the list and compounds with less desirable properties move down the list. [Pg.122]

In the Maximum Dissimilarity (MD) selection method described by Lajiness [40] the first compound is selected at random and subsequent compounds are then chosen iteratively, such that the distance to the nearest of the compounds already chosen is a maximum. This method is known as MaxMin. In this study, the compounds were represented by COUSIN 2-D fragment-based bitstrings. Polinsky et al. [41] use a similar algorithm in the LiBrain system. In this case, the molecules are represented by a feature vector that contains information about the following affinity types: aliphatic hydrophobic, aromatic hydrophobic, basic, acidic, hydrogen bond donor, hydrogen bond acceptor and polarizable heteroatom. [Pg.353]
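When compounds are represented by fragment bitstrings, the distance between two compounds is commonly taken as the complement of a Tanimoto similarity. The sketch below assumes the standard Tanimoto coefficient on bit vectors stored as Python integers; the exact measure used with the COUSIN bitstrings is not given in these excerpts, so the formula, the function names and the handling of all-zero fingerprints are assumptions made for illustration.

```python
def tanimoto_similarity(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integers.

    Bit k of the integer is 1 when fragment k is present in the molecule.
    """
    common = bin(a & b).count("1")   # bits set in both fingerprints
    union = bin(a | b).count("1")    # bits set in at least one fingerprint
    if union == 0:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return common / union


def tanimoto_dissimilarity(a: int, b: int) -> float:
    """Complement of the Tanimoto similarity, in the range 0 to 1."""
    return 1.0 - tanimoto_similarity(a, b)


def dissimilarity_matrix(fingerprints):
    """Square matrix of pairwise Tanimoto dissimilarities."""
    return [[tanimoto_dissimilarity(a, b) for b in fingerprints]
            for a in fingerprints]
```

A matrix built this way has the shape a MaxMin-style selection expects as input: zeros on the diagonal and values up to 1 for fingerprints that share no bits.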

Matter [58] has also validated a range of 2-D and 3-D structural descriptors on their ability to predict biological activity and on their ability to sample structurally and biologically diverse datasets effectively. The compound selection techniques used were maximum dissimilarity and clustering. Their results also showed the 2-D fingerprint-based descriptors to be the most effective in selecting representative subsets of bioactive compounds. [Pg.358]

Potter and Matter [64] compared maximum dissimilarity methods and hierarchical clustering with random methods for designing compound subsets. The compound selection methods were applied to a database of 1283 compounds extracted from the Index Chemicus 1993 database that contains 55 biological activity classes. A second database consisted of 334 compounds from 11 different QSAR target series. They compared the distribution of actives in randomly chosen subsets with the rationally... [Pg.54]

Figure 5.15 Dissimilarity-based selection methods: maximum dissimilarity approach.

By different mathematical transformations, Molecular Quantum Similarity Indices (MQSI) are derived from molecular quantum similarity measures. They are divided into two main classes: C-class indices, referred to as correlation-like indices, ranging from 0 (maximum dissimilarity) to 1 (maximum similarity), and D-class indices, referred to as distance-like indices, ranging from 0 (maximum similarity) to infinity (maximum dissimilarity). C-class indices can be transformed into D-class indices d, by the following ... [Pg.400]

Fig. 13 Selection of compounds from a virtual combinatorial library. (a) First six steps of a maximum dissimilarity selection. (b) Selection by excluding similar compounds. (c) Maximum diversity selection as a result of clustering. (d) Grouping by partitioning of descriptor space.

Big Chemical Encyclopedia
