Maximum dissimilarity-based selection

The original algorithm for dissimilarity ranking in the chemical structure context seems to have been proposed by Bawden, although the basic algorithm may be due to Kennard and Stone. A dissimilarity selection algorithm starts by picking a compound at random and making it the first selected compound. Subsequent compounds are selected so that they are maximally dissimilar to all those in the currently selected set. Dissimilarity may be measured by... [Pg.23]

In dissimilarity-based compound selection the required subset of molecules is identified directly, using an appropriate measure of dissimilarity (often taken to be the complement of the similarity). This contrasts with the two-stage procedure in cluster analysis, where it is first necessary to group the molecules together and then decide which to select. Most methods for dissimilarity-based selection fall into one of two categories: maximum dissimilarity algorithms and sphere exclusion algorithms [Snarey et al. 1997]. [Pg.699]
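
As a minimal illustration of "dissimilarity as the complement of similarity", the sketch below assumes the similarity measure is the Tanimoto coefficient computed on sets of "on" bit positions. The choice of Tanimoto, the function names and the example fingerprints are assumptions made for illustration, not details taken from the excerpt.

```python
def tanimoto_similarity(a: set, b: set) -> float:
    """Tanimoto coefficient between two sets of 'on' bit positions."""
    if not a and not b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def dissimilarity(a: set, b: set) -> float:
    """Dissimilarity taken as the complement of the similarity."""
    return 1.0 - tanimoto_similarity(a, b)


# Hypothetical fingerprints: 2 bits in common, 5 bits set in total.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}
d12 = dissimilarity(fp1, fp2)  # 1 - 2/5
```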

In the Maximum Dissimilarity (MD) selection method described by Lajiness [40] the first compound is selected at random and subsequent compounds are then chosen iteratively, such that the distance to the nearest of the compounds already chosen is a maximum. This method is known as MaxMin. In this study, the compounds were represented by COUSIN 2-D fragment-based bitstrings. Polinsky et al. [41] use a similar algorithm in the LiBrain system. In this case, the molecules are represented by a feature vector that contains information about the following affinity types—aliphatic hydrophobic, aromatic hydrophobic, basic, acidic, hydrogen bond donor, hydrogen bond acceptor and polarizable heteroatom. [Pg.353]

Clark [46] has recently described a subset selection algorithm called OptiSim which includes maximum and minimum dissimilarity based selection as special cases. A parameter is used to adjust the balance between representativeness and diversity in the compounds that are selected. [Pg.354]
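
The excerpt does not say what the adjustable parameter is. The sketch below assumes it is the size of a random subsample of candidates examined at each step, with an optional exclusion radius; both are assumptions, as are all names. It is a simplified illustration of the idea and not Clark's published procedure: for instance, unselected subsample members are simply returned to the candidate pool here.

```python
import random


def optisim_like_select(items, dist, n_select, subsample_size, radius=0.0, rng=None):
    """Subsample-based selection (illustrative sketch, not the OptiSim program).

    A small subsample_size makes picks close to random (more representative);
    a subsample covering all candidates makes every pick a MaxMin pick.
    """
    rng = rng or random.Random(0)
    candidates = list(range(len(items)))
    rng.shuffle(candidates)
    selected = [candidates.pop()]  # random first compound
    while len(selected) < n_select and candidates:
        subsample = []
        while candidates and len(subsample) < subsample_size:
            i = candidates.pop()
            nearest = min(dist(items[i], items[j]) for j in selected)
            if nearest >= radius:  # candidates that are too close are dropped for good
                subsample.append((nearest, i))
        if not subsample:
            break  # nothing left outside the exclusion radius
        _, best = max(subsample)  # largest distance to its nearest selected item
        selected.append(best)
        candidates.extend(i for _, i in subsample if i != best)
        rng.shuffle(candidates)
    return selected


points = [0.0, 1.0, 2.0, 10.0, 11.0, 20.0]
gap = lambda a, b: abs(a - b)
diverse = optisim_like_select(points, gap, 3, subsample_size=len(points))
spaced = optisim_like_select(points, gap, 6, subsample_size=2, radius=5.0)
```

With `radius=5.0` the second call returns fewer than the six items requested, because the pool runs out of candidates that lie at least 5 units from everything already selected.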

Figure 5.15 Dissimilarity-based selection methods: maximum dissimilarity approach.

Dissimilarity-based compound selection (DBCS) methods involve selecting a subset of compounds directly based on pairwise dissimilarities [37]. The first compound is selected, either at random or as the one that is most dissimilar to all others in the database, and is placed in the subset. The subset is then built up stepwise by selecting one compound at a time until it is of the required size. In each iteration, the next compound to be selected is the one that is most dissimilar to those already in the subset, with the dissimilarity normally being computed by the MaxMin approach [38]. Here, each database compound is compared with each compound in the subset and its nearest neighbor is identified; the database compound that is selected is the one that has the maximum dissimilarity to its nearest neighbor in the subset. [Pg.199]
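
The MaxMin step described above can be sketched as follows. This is a minimal illustration assuming a user-supplied distance function; the function name, its parameters and the one-dimensional example data are hypothetical and stand in for real compounds and descriptors.

```python
import random


def maxmin_select(items, dist, k, seed_index=None, rng=None):
    """Return the indices of k items chosen by MaxMin selection."""
    n = len(items)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(items)")
    rng = rng or random.Random()
    # First compound: chosen at random unless a seed is given.
    first = rng.randrange(n) if seed_index is None else seed_index
    selected = [first]
    # nearest[i] = distance from item i to its nearest neighbor in the subset
    nearest = [dist(items[i], items[first]) for i in range(n)]
    while len(selected) < k:
        chosen = set(selected)
        # Pick the candidate whose nearest selected neighbor is farthest away.
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: nearest[i])
        selected.append(nxt)
        # Only distances to the newly added item can shrink a nearest distance.
        for i in range(n):
            d = dist(items[i], items[nxt])
            if d < nearest[i]:
                nearest[i] = d
    return selected


points = [0.0, 1.0, 2.0, 10.0, 11.0, 20.0]
picked = maxmin_select(points, lambda a, b: abs(a - b), k=3, seed_index=0)
# picked == [0, 5, 3]: the seed 0.0, then 20.0, then 10.0
```

Caching the nearest-neighbor distance for each candidate avoids recomparing every candidate with the whole subset at each iteration, so selecting k items from n takes on the order of n × k distance evaluations.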

The first application of a computational method to select structurally diverse compounds for purchase started in 1992 at the Upjohn Company, about three years before the formation of Pharmacia & Upjohn. The basic approach selected compounds using a method based upon maximum dissimilarity and was implemented using SAS software [11]. This later evolved into the program Dfragall, which was written in C and is described in Section 13.6.3. Basically, a set of compounds that is maximally dissimilar from the corporate compound collection is chosen from the set of available vendor compounds. Early versions of the process relied solely on diversity-based metrics, but it was found that many non-drug-like compounds were identified. As a result, structural exclusion criteria were developed to eliminate compounds that were considered unsuitable for... [Pg.319]

The D-score is computed using the maximum dissimilarity algorithm of Lajiness (20). This method utilizes a Tanimoto-like similarity measure defined on a 360-bit fragment descriptor used in conjunction with the Cousin/ChemLink system (21). The important feature of this method is that it starts with the selection of a seed compound with subsequent compounds selected based on the maximum diversity relative to all compounds already selected. Thus, the most obvious seed to use in the current scenario is the compound that has the best profile based on the already computed scores. Thus, one needs to compute a preliminary consensus score based on the Q-score and the B-score using weights as defined previously. To summarize this, one needs to... [Pg.121]

Matter [58] has also validated a range of 2-D and 3-D structural descriptors on their ability to predict biological activity and on their ability to sample structurally and biologically diverse datasets effectively. The compound selection techniques used were maximum dissimilarity and clustering. Their results also showed the 2-D fingerprint-based descriptors to be the most effective in selecting representative subsets of bioactive compounds. [Pg.358]

Figure 13.5. Selection of subsets from the IC93 database using random picking (theoretical expectation and representative experimental result) and maximum dissimilarity selection based on UNITY 2D fingerprints. The percentage of biological classes sampled from the IC93 database is plotted versus the subset size.

Reagent-based design can also be applied to generate diverse or, in some cases, focused subsets based on bioisosteric replacement (18). Several methods have been used in the selection of monomers, including maximum dissimilarity (6), D-Optimal design (18), and clustering (15). Hierarchical cluster... [Pg.299]

Several product-based approaches to library design that do not require full enumeration have been developed. Pickett et al. have described the design of a diverse amide library where diversity is measured in product space. The DIVSEL program is a DBCS method where dissimilarity is measured in three-point pharmacophore space [83]. Initially, 11 amines were selected based on maximum pharmacophore diversity. Then a total of 1100 carboxylic acids were identified following substructure searching. A set of 1100 pharmacophore keys was generated, where each key corresponds to one acid combined with the 11 amines. DIVSEL was used to select 100 acids based on the diversity of the products. The final library was found to cover 85% of the pharmacophores represented by the entire virtual library of 12,100 products (11 amines × 1100 acids). [Pg.628]

Filter-based methods rank each feature in isolation and do not consider the correlation among the features. Thus, redundancy among the selected features is not taken into account. To overcome this problem, the MRMR method has been used, which applies both minimum redundancy and maximum relevance criteria to select additional features that are maximally dissimilar to the already identified features. [Pg.194]
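
A greedy selection of this kind can be sketched as follows. The sketch assumes that relevance scores and pairwise redundancy scores have already been computed (for example from mutual information or correlation) and that the two criteria are combined as relevance minus mean redundancy; the excerpt does not specify the scoring, so that combination, the function name and the numbers are illustrative assumptions.

```python
def mrmr_select(relevance, redundancy, k):
    """Greedy minimum-redundancy, maximum-relevance selection (sketch).

    relevance[f]      precomputed relevance of feature f to the target
    redundancy[f][g]  precomputed redundancy between features f and g
    """
    n = len(relevance)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of features")
    # Start with the single most relevant feature.
    selected = [max(range(n), key=lambda f: relevance[f])]
    while len(selected) < k:
        def score(f):
            mean_red = sum(redundancy[f][s] for s in selected) / len(selected)
            return relevance[f] - mean_red  # assumed "difference" form
        remaining = [f for f in range(n) if f not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Hypothetical scores: features 0 and 1 are both relevant but nearly redundant.
relevance = [0.90, 0.85, 0.50]
redundancy = [
    [1.00, 0.95, 0.10],
    [0.95, 1.00, 0.10],
    [0.10, 0.10, 1.00],
]
chosen = mrmr_select(relevance, redundancy, k=2)
# chosen == [0, 2]: feature 1 is skipped because it duplicates feature 0
```

A filter that ranked on relevance alone would have returned features 0 and 1 here.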

Many of the algorithms discussed in this section construct the solution from the bottom up, that is, starting from the null set and incrementally augmenting this set until the desired number is reached. Taylor reported an alternative technique which works in the opposite direction. The method starts with the full set, and eliminates one compound at a time based on the maximum similarity principle. In particular, the N × N similarity matrix is scanned to identify the largest element, and one of the compounds associated with that element is selected at random and eliminated. The process continues until a single compound is left in the set. This algorithm sorts the compounds in reverse order of dissimilarity, placing the most diverse molecules at the top of the list. [Pg.750]
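
The elimination procedure just described can be sketched as follows. The sketch assumes a symmetric similarity matrix held as a list of lists and rescans it in full at every step, which is simple but slow for large N; the function name and the example matrix are hypothetical.

```python
import random


def elimination_ranking(sim, rng=None):
    """Rank items by repeatedly eliminating one member of the most similar pair.

    Returns indices with the last survivor first and the first item
    eliminated last.
    """
    rng = rng or random.Random()
    remaining = set(range(len(sim)))
    eliminated = []
    while len(remaining) > 1:
        # Largest off-diagonal element among the items still in the set.
        i, j = max(
            ((a, b) for a in remaining for b in remaining if a < b),
            key=lambda pair: sim[pair[0]][pair[1]],
        )
        drop = rng.choice((i, j))  # one of the two, chosen at random
        remaining.remove(drop)
        eliminated.append(drop)
    eliminated.append(remaining.pop())  # the single compound left
    return eliminated[::-1]


# Hypothetical similarities: items 0 and 1 are near-duplicates.
sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
ranked = elimination_ranking(sim, rng=random.Random(0))
# The item at the bottom of the list is 0 or 1, whichever was dropped first.
```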
