Maximal dissimilarity selection

The first application of a computational method to select structurally diverse compounds for purchase started in 1992 at the Upjohn Company, about three years before the formation of Pharmacia Upjohn. The basic approach selected compounds using a method based on maximum dissimilarity and was implemented in SAS software [11]. This later evolved into the program Dfragall, which was written in C and is described in Section 13.6.3. In essence, a set of compounds that is maximally dissimilar from the corporate compound collection is chosen from the set of available vendor compounds. Early versions of the process relied solely on diversity-based metrics, but many of the compounds identified in this way turned out to be non-drug-like. As a result, structural exclusion criteria were developed to eliminate compounds that were considered unsuitable for... [Pg.319]

In the left part of the figure the shapes of the response surfaces of the partition coefficients are dissimilar for both compounds. The maximal minimal partition coefficient is found in 0 . 0 also generates the maximal minimal selectivity, which is equal to unity in this case (a, is represented by the minimum value of a,j and afa. The selectivity varies largely with extraction liquid composition. Composition 0 yields a high ratio of P, and Pj, but this ratio is very sensitive to small fluctuations in the composition of the extraction liquid a fraction of extraction liquid component one (x,)... [Pg.272]

Dissimilarity analysis plays a major role in compound selection. Typical tasks include the selection of a maximally dissimilar subset of compounds from a large set or the identification of compounds that are dissimilar to an existing collection. Such issues have played a major role in compound acquisition in the pharmaceutical industry. A typical task would be to select a subset of maximally dissimilar compounds from a data set containing n molecules. This represents a non-trivial challenge because of the combinatorial problem involved in exploring all possible subsets. Therefore, other dissimilarity-based selection algorithms have been developed (Lajiness 1997). The basic idea of such approaches is to initially select a seed compound (either randomly or, better, based on dissimilarity to others), then calculate dissimilarity between the seed compound and all others and select the most dissimilar one. In the next step, the database compound most dissimilar to these two compounds is selected and added to the subset, and the process is repeated until a subset of desired size is obtained. [Pg.9]
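The sequential procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the published algorithm: it assumes compounds are represented as sets of "on" fingerprint bits, that dissimilarity is 1 minus the Tanimoto coefficient, and that "most dissimilar to the selected compounds" means the largest distance to the nearest selected compound (a MaxMin reading). The function names are hypothetical.

```python
def tanimoto_dissimilarity(a, b):
    """Return 1 - Tanimoto coefficient for two fingerprints given as sets of bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def select_dissimilar(fingerprints, seed_index, subset_size):
    """Greedy selection: repeatedly add the compound farthest from the selected set.

    Assumption: "farthest" is the MaxMin criterion, i.e. the compound whose
    nearest selected neighbour is most dissimilar.
    """
    selected = [seed_index]
    # min_dist[i] = dissimilarity of compound i to its nearest selected compound
    min_dist = [tanimoto_dissimilarity(fp, fingerprints[seed_index])
                for fp in fingerprints]
    while len(selected) < min(subset_size, len(fingerprints)):
        candidates = [i for i in range(len(fingerprints)) if i not in selected]
        best = max(candidates, key=lambda i: min_dist[i])
        selected.append(best)
        # Update the cached nearest-selected distances with the new member.
        for i, fp in enumerate(fingerprints):
            d = tanimoto_dissimilarity(fp, fingerprints[best])
            min_dist[i] = min(min_dist[i], d)
    return selected


# Toy fingerprints (made-up bit sets, for illustration only).
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {7, 8, 10}]
print(select_dissimilar(fps, seed_index=0, subset_size=3))
```

Other readings of "most dissimilar to these two compounds", such as the largest summed dissimilarity, would change only the line that picks `best`.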

Spread design objective: select the maximally dissimilar subset of molecules. This requires maximizing the distances between the points within the subset. One analogy for this is electron repulsion. [Pg.84]

The objective of a spread design is to identify a subset of molecules in which the molecules are as dissimilar as possible under a given similarity metric. Given a metric that measures the similarity of a subset, all subsets of size k (plus any molecules previously selected) could be evaluated and the subset that produces the lowest similarity measure chosen. In practice, simple non-optimal sequential algorithms are often used to approximate the maximally dissimilar subset; two such algorithms are described below. [Pg.84]
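To make the exhaustive formulation concrete, the sketch below evaluates every subset of size k and keeps the one with the lowest similarity score. The subset score used here (mean pairwise similarity) and the toy one-dimensional similarity function are assumptions chosen for illustration; the excerpt does not specify either. The function names are hypothetical.

```python
from itertools import combinations
from math import comb


def mean_pairwise_similarity(subset, similarity):
    """Average similarity over all pairs in the subset (needs at least 2 members)."""
    pairs = list(combinations(subset, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def best_subset_exhaustive(items, k, similarity, previously_selected=()):
    """Evaluate every k-subset together with any previously selected items."""
    fixed = tuple(previously_selected)
    return min(
        combinations(items, k),
        key=lambda s: mean_pairwise_similarity(fixed + s, similarity),
    )


# Toy example: molecules reduced to a single made-up descriptor value.
def toy_similarity(a, b):
    return 1.0 / (1.0 + abs(a - b))


descriptors = [0, 1, 5, 6, 10]
print(best_subset_exhaustive(descriptors, 3, toy_similarity))
print(comb(len(descriptors), 3), "subsets were evaluated")
```

The number of subsets grows as the binomial coefficient C(n, k), which is why exhaustive evaluation is only feasible for very small sets and sequential approximations are used instead.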

The subset selection can be performed iteratively. The first compound is chosen at random, and the next compound is selected to be maximally dissimilar to the first; the third is then selected to be maximally dissimilar to the first two, and so on. The selection stops when a prespecified number of compounds has been selected, or when no remaining compound falls below a given similarity to, or above a given distance from, every compound in the selected set. Pearlman (11b,c) refers to such methods as "addition" algorithms because they add compounds to a diverse set of increasing size. He notes that such algorithms are quite satisfactory when the size of the desired subset is relatively modest but, given that the time required for such algorithms is proportional to the size of the total population and the square of the size of the desired diverse subset, they are far less satisfactory when, for example, selecting a subset of 10,000 from a population of 1,000,000. [Pg.207]
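A minimal sketch of such an "addition" algorithm with both stopping criteria might look as follows. It is written deliberately naively, recomputing the distance from every remaining compound to every selected compound at each step, so that its cost matches the scaling quoted above (population size times the square of the subset size). The function name, the parameters, and the use of the nearest-selected distance as the selection criterion are assumptions for illustration.

```python
import random


def addition_selection(compounds, distance, max_size, min_distance, rng=None):
    """Grow a diverse subset one compound at a time.

    Stops when max_size compounds are selected, or when no remaining compound
    is at least min_distance away from every compound already selected.
    """
    rng = rng or random.Random()
    remaining = list(range(len(compounds)))
    first = rng.choice(remaining)  # the first compound is chosen at random
    selected = [first]
    remaining.remove(first)

    def nearest_selected(i):
        # Distance from compound i to its closest already-selected compound.
        return min(distance(compounds[i], compounds[j]) for j in selected)

    while remaining and len(selected) < max_size:
        best = max(remaining, key=nearest_selected)
        if nearest_selected(best) < min_distance:
            break  # every remaining compound is too close to the selected set
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: one made-up descriptor value per compound.
values = [0.0, 1.0, 2.0, 10.0, 11.0, 20.0]
picked = addition_selection(values, lambda a, b: abs(a - b),
                            max_size=10, min_distance=5.0,
                            rng=random.Random(0))
print(sorted(values[i] for i in picked))
```

Caching each compound's distance to its nearest selected neighbour and updating it after every addition would avoid the repeated inner loop, at the price of one extra array.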

Filter-based methods rank each feature in isolation and do not consider the correlation among features, so redundancy among the selected features is not taken into account. To overcome this problem, the MRMR method has been used: it applies both minimum-redundancy and maximum-relevance criteria, so that each additional feature selected is maximally dissimilar to the features already identified. [Pg.194]
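The idea can be illustrated with a simplified greedy selection. This sketch is an assumption-laden stand-in, not the method from the excerpt: it uses the absolute Pearson correlation for both relevance (feature versus target) and redundancy (feature versus already selected features), whereas MRMR is commonly formulated with mutual information. The function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mrmr_select(features, target, k):
    """Greedy selection scoring each candidate as relevance minus mean redundancy."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = [max(relevance, key=relevance.get)]  # start with the most relevant
    while len(selected) < min(k, len(features)):

        def score(name):
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance[name] - redundancy

        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data: f_b is nearly a copy of f_a; f_c is less relevant but not redundant.
target = [-3, -1, 1, 3]
features = {
    "f_a": [-1, -1, 1, 1],
    "f_b": [-2, -4, 2, 4],
    "f_c": [-1, 1, -1, 1],
}
print(mrmr_select(features, target, 2))
```

In this toy case a ranking by relevance alone would pick `f_a` and `f_b`, while the redundancy penalty favours `f_c` as the second feature.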

Maximum Dissimilarity-Based Selection. The original algorithm for dissimilarity ranking in the chemical structure context seems to have been proposed by Bawden, although the basic algorithm may be due to Kennard and Stone. A dissimilarity selection algorithm starts with a compound selected at random, which becomes the first selected compound. Subsequent compounds are selected so that they are maximally dissimilar to all those in the currently selected set. Dissimilarity may be measured by... [Pg.23]

Fig. 3.5. Similarities and dissimilarities between the selectivity of solvents, simultaneously taking into consideration each tetrazolium salt. Two-dimensional nonlinear selectivity map. Number of iterations, 547; maximal error, 1.42 × 10⁻². Reprinted with permission from E. Forgacs et al. [85].

Big Chemical Encyclopedia