Method maximum dissimilarity

Figure 5.15 Dissimilarity-based selection methods: maximum dissimilarity approach.

In dissimilarity-based compound selection the required subset of molecules is identified directly, using an appropriate measure of dissimilarity (often taken to be the complement of the similarity). This contrasts with the two-stage procedure in cluster analysis, where it is first necessary to group together the molecules and then decide which to select. Most methods for dissimilarity-based selection fall into one of two categories: maximum dissimilarity algorithms and sphere exclusion algorithms [Snarey et al. 1997]. [Pg.699]
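As a minimal sketch of "dissimilarity as the complement of similarity", the example below assumes a Tanimoto coefficient on fingerprints stored as sets of on-bit indices. The choice of Tanimoto, the function names and the toy fingerprints are illustrative assumptions, not something this excerpt prescribes.

```python
from typing import FrozenSet


def tanimoto_similarity(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient for two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def dissimilarity(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Dissimilarity taken as the complement of the similarity."""
    return 1.0 - tanimoto_similarity(a, b)


# Toy fingerprints (hypothetical bit positions).
fp_a = frozenset({1, 4, 7, 9})
fp_b = frozenset({1, 4, 8})
print(dissimilarity(fp_a, fp_b))
```

Any similarity coefficient bounded in [0, 1] could be substituted for the Tanimoto coefficient here without changing the selection logic that is built on top of it.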

Dissimilarity-based compound selection (DBCS) methods involve selecting a subset of compounds directly based on pairwise dissimilarities [37]. The first compound is selected, either at random or as the one that is most dissimilar to all others in the database, and is placed in the subset. The subset is then built up stepwise by selecting one compound at a time until it is of the required size. In each iteration, the next compound to be selected is the one that is most dissimilar to those already in the subset, with the dissimilarity normally being computed by the MaxMin approach [38]. Here, each database compound is compared with each compound in the subset and its nearest neighbor is identified; the database compound that is selected is the one that has the maximum dissimilarity to its nearest neighbor in the subset. [Pg.199]
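The stepwise MaxMin procedure described above can be sketched as follows. This is an illustrative implementation under stated assumptions: the function name `maxmin_select`, the `seed_index` parameter and the one-dimensional toy data are hypothetical, and the seed is simply supplied by the caller rather than chosen at random or as the most dissimilar compound.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def maxmin_select(
    items: Sequence[T],
    k: int,
    dist: Callable[[T, T], float],
    seed_index: int = 0,
) -> List[int]:
    """Return indices of up to k items chosen by the MaxMin rule."""
    if k <= 0 or not items:
        return []
    selected = [seed_index]
    # nearest[i] holds the dissimilarity of item i to its nearest
    # neighbour among the items already in the subset.
    nearest = [dist(item, items[seed_index]) for item in items]
    target = min(k, len(items))
    while len(selected) < target:
        candidates = [i for i in range(len(items)) if i not in selected]
        # Pick the candidate whose nearest selected neighbour is farthest away.
        best = max(candidates, key=lambda i: nearest[i])
        selected.append(best)
        # Only the newly added item can lower a nearest-neighbour distance.
        for i in range(len(items)):
            nearest[i] = min(nearest[i], dist(items[i], items[best]))
    return selected


# Toy example: "compounds" are points on a line, dissimilarity is |a - b|.
points = [0.0, 1.0, 2.0, 10.0, 5.0]
print(maxmin_select(points, 3, lambda a, b: abs(a - b)))
```

Caching the nearest-neighbour distance for every database compound means each iteration needs only one new comparison per compound, instead of a comparison against the whole subset.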

The first application of a computational method to select structurally diverse compounds for purchase started in 1992 at the Upjohn Company, which predated the formation of Pharmacia & Upjohn by about three years. The basic approach selected compounds using a method based upon maximum dissimilarity and was implemented using SAS software [11]. This later evolved into the program Dfragall, which was written in C and is described in Section 13.6.3. Basically, a set of compounds that is maximally dissimilar from the corporate compound collection is chosen from the set of available vendor compounds. Early versions of the process relied solely on diversity-based metrics, but it was found that many non-drug-like compounds were identified. As a result, structural exclusion criteria were developed to eliminate compounds that were considered unsuitable for... [Pg.319]
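The idea of choosing vendor compounds that are maximally dissimilar from an existing collection, after applying exclusion criteria, might be sketched as below. This is not the SAS or Dfragall implementation, which the excerpt does not describe in detail; `pick_vendor_compounds`, the `is_acceptable` filter and the numeric toy data are all hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def pick_vendor_compounds(
    vendor: Sequence[T],
    corporate: Sequence[T],
    k: int,
    dist: Callable[[T, T], float],
    is_acceptable: Callable[[T], bool] = lambda c: True,
) -> List[T]:
    """Pick up to k vendor items far from the reference set and from each other."""
    if not corporate:
        raise ValueError("reference collection must not be empty")
    # Apply the (hypothetical) exclusion criteria before any diversity ranking.
    pool = [c for c in vendor if is_acceptable(c)]
    # Distance from each pool item to its nearest reference compound.
    nearest = [min(dist(c, r) for r in corporate) for c in pool]
    remaining = list(range(len(pool)))
    picked: List[T] = []
    while remaining and len(picked) < k:
        best = max(remaining, key=lambda i: nearest[i])
        picked.append(pool[best])
        remaining.remove(best)
        # A picked compound now counts as part of the collection.
        for i in remaining:
            nearest[i] = min(nearest[i], dist(pool[i], pool[best]))
    return picked


# Toy data: compounds as points on a line, dissimilarity is |a - b|.
corporate = [0.0, 1.0]
vendor = [0.5, 4.0, 9.0, 9.5]
print(pick_vendor_compounds(vendor, corporate, 2, lambda a, b: abs(a - b)))
```

Filtering before ranking reflects the order implied by the text: unsuitable compounds are removed first, so they cannot be selected merely because they are unusual.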

The D-score is computed using the maximum dissimilarity algorithm of Lajiness (20). This method uses a Tanimoto-like similarity measure defined on a 360-bit fragment descriptor used in conjunction with the Cousin/ChemLink system (21). The important feature of this method is that it starts with the selection of a seed compound, with subsequent compounds selected based on the maximum diversity relative to all compounds already selected. The most obvious seed to use in the current scenario is therefore the compound that has the best profile based on the already computed scores. Consequently, one needs to compute a preliminary consensus score based on the Q-score and the B-score using weights as defined previously. To summarize this, one needs to... [Pg.121]

In the Maximum Dissimilarity (MD) selection method described by Lajiness [40], the first compound is selected at random and subsequent compounds are then chosen iteratively, such that the distance to the nearest of the compounds already chosen is a maximum. This method is known as MaxMin. In this study, the compounds were represented by COUSIN 2-D fragment-based bitstrings. Polinsky et al. [41] use a similar algorithm in the LiBrain system. In this case, the molecules are represented by a feature vector that contains information about the following affinity types: aliphatic hydrophobic, aromatic hydrophobic, basic, acidic, hydrogen bond donor, hydrogen bond acceptor and polarizable heteroatom. [Pg.353]

Potter and Matter [64] compared maximum dissimilarity methods and hierarchical clustering with random methods for designing compound subsets. The compound selection methods were applied to a database of 1283 compounds, extracted from the Index Chemicus 1993 database, covering 55 biological activity classes. A second database consisted of 334 compounds from 11 different QSAR target series. They compared the distribution of actives in randomly chosen subsets with the rationally... [Pg.54]

Figure 13.7. Percent biological classes covered from the IC93 database versus subset sizes for selection using 2D fingerprints and maximum dissimilarity selection (Unity2D), theoretical random selections (Random_theo) and different implementations of the PDT selection method (PDT orig, PDT cul80, etc.).

Several product-based approaches to library design that do not require full enumeration have been developed. Pickett et al. have described the design of a diverse amide library where diversity is measured in product space. The DIVSEL program is a DBCS method where dissimilarity is measured in three-point pharmacophore space [83]. Initially, 11 amines were selected based on maximum pharmacophore diversity. Then a total of 1100 carboxylic acids were identified following substructure searching. A set of 1100 pharmacophore keys was generated, where each key corresponds to one acid combined with the 11 amines. DIVSEL was used to select 100 acids based on the diversity of the products. The final library was found to cover 85% of the pharmacophores represented by the entire 12,100-compound virtual library. [Pg.628]
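The numbers in this passage can be checked directly: 11 amines combined with 1100 acids give 12,100 virtual products. The sketch below also shows one way a pharmacophore coverage figure such as the quoted 85% could be computed from per-acid keys. It does not reproduce DIVSEL; the `coverage` function, the acid names and the toy keys are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def coverage(selected_keys: Iterable[Set[int]], all_keys: Iterable[Set[int]]) -> float:
    """Fraction of all pharmacophores that the selected keys cover."""
    universe: Set[int] = set().union(*all_keys)
    if not universe:
        return 0.0
    covered: Set[int] = set().union(*selected_keys)
    return len(covered & universe) / len(universe)


# Size of the virtual library described in the text.
n_amines, n_acids = 11, 1100
print(n_amines * n_acids)

# Toy keys: each acid maps to the set of pharmacophores shown by its
# products with all amines (hypothetical identifiers).
acid_keys: Dict[str, Set[int]] = {
    "acid_A": {1, 2, 3},
    "acid_B": {3, 4},
    "acid_C": {5},
}
chosen: List[str] = ["acid_A", "acid_C"]
print(coverage([acid_keys[a] for a in chosen], list(acid_keys.values())))
```

Storing one combined key per acid is what allows diversity to be assessed in product space without enumerating every product individually at selection time.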

At each iteration of the sphere-exclusion algorithm [Hudson et al. 1996], a compound is selected for inclusion in the subset and then all other molecules in the database which have a dissimilarity to this compound less than some threshold value are removed from further consideration. Variation is possible depending upon the way in which the first compound is selected, the threshold value, and the way in which the next compound is selected at each stage. It is typical to try to select this next compound so that it is least dissimilar to those already selected. Hudson et al. suggested the use of a MinMax method, where the molecule with the smallest maximum dissimilarity with the current subset is selected. However, it is also possible to select this next compound at random from those still remaining. [Pg.684]
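A minimal sketch of the sphere-exclusion loop follows, with the rule for choosing the next compound passed in as a function so that the variants mentioned above can be swapped. The names `sphere_exclusion` and `minmax_picker`, the default of taking the first remaining compound, and the toy data are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
Picker = Callable[[List[int], List[int]], int]


def sphere_exclusion(
    items: Sequence[T],
    threshold: float,
    dist: Callable[[T, T], float],
    pick_next: Optional[Picker] = None,
) -> List[int]:
    """Return indices selected by sphere exclusion with the given threshold."""
    remaining = list(range(len(items)))
    selected: List[int] = []
    while remaining:
        # Default (an assumption): take the first compound still available.
        chosen = remaining[0] if pick_next is None else pick_next(remaining, selected)
        selected.append(chosen)
        # Drop the chosen compound and everything inside its exclusion sphere.
        remaining = [
            i for i in remaining
            if i != chosen and dist(items[i], items[chosen]) >= threshold
        ]
    return selected


def minmax_picker(items: Sequence[T], dist: Callable[[T, T], float]) -> Picker:
    """MinMax rule: smallest maximum dissimilarity to the current subset."""
    def pick(remaining: List[int], selected: List[int]) -> int:
        if not selected:
            return remaining[0]
        return min(
            remaining,
            key=lambda i: max(dist(items[i], items[s]) for s in selected),
        )
    return pick


points = [0.0, 0.5, 1.0, 3.0, 3.2, 7.0]
d = lambda a, b: abs(a - b)
print(sphere_exclusion(points, 1.0, d))
print(sphere_exclusion(points, 1.0, d, minmax_picker(points, d)))
```

Unlike the maximum dissimilarity methods, the subset size is not fixed in advance: it follows from the threshold, with a larger threshold excluding more compounds per step and yielding a smaller subset.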

The behaviour of some of these methods is illustrated using a two-dimensional example in Figure 12.30. If the most dissimilar compound is chosen as the first molecule in the maximum-dissimilarity cases then the MaxSum method tends to select compounds at the extremities of the distribution. This is also the initial behaviour of the MaxMin approach, but it then starts to sample from the middle. The sphere exclusion methods typically start somewhere in the middle of the distribution and work outwards. [Pg.684]
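The contrast between the two scoring rules can be seen in a one-dimensional toy case. The sketch assumes the usual definition of MaxSum (the candidate with the largest sum of dissimilarities to the current subset), which this excerpt names but does not define; `next_pick` and the data are hypothetical.

```python
from typing import Callable, Sequence


def next_pick(
    candidates: Sequence[float],
    subset: Sequence[float],
    dist: Callable[[float, float], float],
    rule: str,
) -> float:
    """Return the candidate chosen next under the 'maxmin' or 'maxsum' rule."""
    if rule == "maxmin":
        # Score by the distance to the nearest member of the subset.
        score = lambda c: min(dist(c, s) for s in subset)
    elif rule == "maxsum":
        # Score by the summed distance to all members (assumed definition).
        score = lambda c: sum(dist(c, s) for s in subset)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(candidates, key=score)


d = lambda a, b: abs(a - b)
subset = [0.0, 10.0]            # two extremes already selected
candidates = [5.0, 9.0, 11.0]   # one central point, two near the edge
print(next_pick(candidates, subset, d, "maxsum"))
print(next_pick(candidates, subset, d, "maxmin"))
```

With both extremes already in the subset, the summed score still rewards the outlying candidate, whereas the nearest-neighbour score favours the central one, which matches the behaviour described for the two methods.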

Reagent-based design can also be applied to generate diverse or, in some cases, focused subsets based on bioisosteric replacement (18). Several methods have been used in the selection of monomers, including maximum dissimilarity (6), D-Optimal design (18), and clustering (15). Hierarchical cluster... [Pg.299]

We use the second-dimension separation from Fig. 6.6 with a 25 µL injection volume and 2.5 min sampling time; the separation is an RPLC method that uses a monolithic column. Thus, 10 µL/min is the maximum flow rate in the first dimension. Fig. 6.7 shows the development of the first-dimension column that utilizes a hydrophilic interaction (or HILIC) column for the separation of proteins at decreasing flow rates. The same proteins were separated in Fig. 6.6 (RPLC) and 6.7 (HILIC) and have a reversed elution order, which is known from the basics of HILIC (Alpert, 1990). It is believed that HILIC and RPLC separations are a good pair for 2DLC analysis of proteins as they appear to have dissimilar retention mechanisms, much like those of NPLC and RPLC; it has been suggested that HILIC is similar in retention to NPLC (Alpert, 1990). Because the HILIC column used in Fig. 6.7 gave good resolution at 0.1 mL/min and no smaller diameter column was available, the flow was split 10-fold to match the second-dimension requirement... [Pg.141]
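The flow-rate figures in this passage can be checked with a short calculation, assuming the injection volume is in microlitres (which is consistent with a 0.1 mL/min flow split 10-fold). The variable names are illustrative only.

```python
# Second-dimension constraint: the volume collected per sampling period
# must not exceed the injection volume.
injection_volume_ul = 25.0   # µL (assumed unit)
sampling_time_min = 2.5      # min
max_first_dim_flow_ul_min = injection_volume_ul / sampling_time_min

# First-dimension HILIC column run at 0.1 mL/min, then split 10-fold.
hilic_flow_ul_min = 0.1 * 1000.0   # convert mL/min to µL/min
split_ratio = 10
flow_after_split_ul_min = hilic_flow_ul_min / split_ratio

print(max_first_dim_flow_ul_min, flow_after_split_ul_min)
```

Both values come out at 10 µL/min, which is why a 10-fold split matches the first-dimension effluent to the second-dimension requirement.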

Big Chemical Encyclopedia