Similarity/dissimilarity measures

Equations (1) and (2) represent the most general form of the optimal clustering problem. The objective is to find the clustering c that minimizes an internal clustering criterion J. J typically employs a similarity/dissimilarity measure to judge the quality of any c. The set C defines c's data structure, including all the feasible clusterings of the set Q of all objects to be clustered. [Pg.136]

Once we have the measures, we have to apply them to chemical objects. Objects of interest to a chemist include molecules, reactions, mixtures, spectra, patents, journal articles, atoms, functional groups, and complex chemical systems. Most frequently, the objects studied for similarity/dissimilarity are molecular structures. [Pg.309]

MDS presents the structure of a set of objects from data that approximate the distances between pairs of the objects. The data, called similarities, dissimilarities, distances, or proximities, must be in such a form that the degree of similarities and differences between the pairs of the objects (each of which represents a real-life data point) can be measured and handled as a distance (remember the discussion of measures of distances under classifications). Similarity is a matter of degree: small differences between objects cause them to be similar (a high degree of similarity) while large differences cause them to be considered dissimilar (a small degree of similarity). [Pg.947]

Molecular diversity is thus plagued not only with the problems inherent in molecular similarity/dissimilarity [5, 6] but also with those problems associated with molecular populations [7]. One of the foremost problems is that computed molecular similarity values are not invariant to the molecular representation and to the similarity measure used [5]. Nearest-neighbor (NN) relationships, which are employed extensively in many aspects of HTS, are thus problematic, and it is difficult, and in many cases impossible, to obtain consistent subsets [8]. The structure of chemistry space can also be altered significantly in a global sense. As molecular diversity also depends on these factors, it too can be problematic and inconsistencies will no doubt arise. [Pg.317]

Other forms for the pseudo-energy penalty term have also been investigated (61,62). In any case, the pseudo-energy penalty term acts as a constraint on the overall energy of the system, which is a balance between favorable conformational energies and overall molecular alignment as measured by field-based similarity (dissimilarity). [Pg.34]

Subheading 2.5. provides a very brief discussion of molecular dissimilarity measures that are basically the complement of their corresponding molecular similarity measures. This section also presents reasons as to why similarity is preferred over dissimilarity, except in studies of diversity, as a measure of molecular resemblance. [Pg.42]

Adamson, G. W. and Bush, J. A. (1975) A comparison of the performance of some similarity and dissimilarity measures in the automatic classification of chemical structures. J. Chem. Inf. Comput. Sci. 15, 55-58. [Pg.62]

Fig. 3. Coverage of chemistry space by four overlapping sublibraries. (A) Different diversity libraries cover similar chemistry space but show little overlap. This shows three libraries chosen using different dissimilarity measures to act as different representations of the available chemistry space. The compounds from these libraries are presented in this representation by first calculating the intermolecular similarity of each of the compounds to all of the other compounds using fingerprint descriptors and the Tanimoto similarity index. Principal component analysis was then conducted on the similarity matrix to reduce it to a series of principal components that allow the chemistry space to be presented in three dimensions.

One possible reason for such a discrepancy is that during regression fitting an experimental uncertainty may spread over various parameters, thus leading to a somewhat distorted final picture. A more reliable way to measure the similarity/dissimilarity of solvents would probably be to rely on direct experimental measurements of the distribution ratios rather than on derived quantities, that is, LSFER parameters. The present authors employed that approach [15]. [Pg.251]

Our approach to selecting a diverse subset is based on utilizing a minimum similarity between each molecule and all other molecules in the virtual library. For the 2-D fingerprints, the similarity is measured by a Tanimoto coefficient [20], which measures similarity on a pair-wise basis. A Tanimoto coefficient for any pair of molecular structures lies in the range of zero (dissimilar) to one (similar). It is defined as the number of common bits (in this case molecular fragments) set in both molecules divided by the number of bits set in either. [Pg.229]
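The Tanimoto definition above can be sketched in a few lines. This is a minimal illustration, assuming each fingerprint is stored as a set of on-bit indices; the function name `tanimoto`, the example fingerprints, and the handling of two empty fingerprints are assumptions made for the example, not part of the quoted source.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Assumed convention: two empty fingerprints are treated as identical.
        return 1.0
    # Bits set in both molecules divided by bits set in either molecule.
    return len(a & b) / len(union)


# Hypothetical fingerprints: 2 bits in common, 5 bits set in either.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}
print(tanimoto(fp_a, fp_b))  # 2 / 5 = 0.4
```

The value is 0 when no bits are shared and 1 when the two bit sets are identical, matching the range stated above.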

This chapter is concerned with some of the background theory for molecular diversity analysis and includes a discussion of diversity indices, intermolecular similarity and dissimilarity measures. The extent to which the different approaches to diversity analysis have been validated and compared is reviewed. Algorithms for the selection of diverse sets of compounds are covered in detail elsewhere in this book and are mentioned only briefly here. However, consideration is given to whether these algorithms should be applied in reactant or product space. [Pg.44]

SHAPE SIMILARITY MEASURES AND DISSIMILARITY MEASURES IN THE STUDY OF HOST-GUEST INTERACTIONS... [Pg.607]

For applications of the scaled fuzzy Hausdorff-type metric f p(A,B) for assessing the similarity of molecules, the f p(A,B) distance can be used as a dissimilarity measure. [Pg.154]

By applying the SNSM similarity measure to mirror images, the quantity is a measure of achirality, whereas the dissimilarity measure d A,A ), denoted as Xs J A), is a measure of chirality, where the interrelation (137) between Xs,J A) and implies that this measure can take values from the unit interval. The measure Xs A), first proposed as an example of dissimilarity measures of the second kind, is zero for achiral objects and takes positive values for all chiral objects. Objects perceived as having prominent chirality tend to have large Xs A) values. The SNSM measures have also been applied to more general molecular shape problems. More recently, Klein showed that by a logarithmic transformation of the scaling factors s g, a metric can be constructed to provide a proper distance-like measure of dissimilarity of shapes. [Pg.173]

If the two molecules A and B turn out to be dissimilar by a given (P,W)-shape similarity criterion [i.e., if they do not fulfill the equivalence relation A (P,W) B], then the differences between their numerical shape descriptors can serve as a dissimilarity measure. That is, for a (P,W)-dissimilar molecule pair A and B, the (P,W)-similarity concept allows one to quantify how different their topological invariants are. A simple and straightforward approach is based on a simple vector comparison of the lists of Betti numbers of the shape group technique, or on the numerical comparison of shape matrices. [Pg.146]

Similarity and distance (or dissimilarity) measures provide the means for converting the attributes of the objects into a relevant numerical score. [Pg.134]

Luque Ruiz, I., Urbano-Cuadrado, M. and Gomez-Nieto, M.A. (2007) Data fusion of similarity and dissimilarity measurements using Wiener-based indices for the prediction of the NPY Y5 receptor antagonist capacity of benzoxazinones. J. Chem. Inf. Model., 47, 2235-2241. [Pg.1111]

We will take the paths of Table 1 as molecular descriptors to obtain a quantitative measure of molecular similarity. In Table 2 we show the similarity/dissimilarity table for the octane isomers using the Euclidean distance as the measure of similarity. The smaller entries in Table 2 indicate molecules found similar under the procedure adopted, while the larger entries point to the least similar structures. [Pg.177]
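A similarity/dissimilarity table of this kind can be sketched as follows. The descriptor vectors below are hypothetical placeholders, not the path counts of Table 1, and the helper names `euclidean`, `distance_table`, and `most_similar_pair` are introduced only for this illustration.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def distance_table(descriptors):
    """Map each unordered pair of names to the distance between their vectors."""
    return {
        (a, b): euclidean(descriptors[a], descriptors[b])
        for a, b in combinations(descriptors, 2)
    }


def most_similar_pair(table):
    """The pair with the smallest entry is the most similar one."""
    return min(table, key=table.get)


# Hypothetical descriptor vectors (for illustration only).
paths = {
    "mol_1": [7, 6, 5],
    "mol_2": [7, 7, 6],
    "mol_3": [7, 10, 3],
}

table = distance_table(paths)
for pair, d in table.items():
    print(pair, round(d, 2))
print(most_similar_pair(table))  # ('mol_1', 'mol_2')
```

As in the text, small entries mark similar structures and large entries mark the least similar ones.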

Which pair of enantiomers of Figure 34 is most chiral? How will we decide, when the relative magnitudes for the similarity/dissimilarity depend on the length of the window used for the comparison? One way to obtain a single index for the measure of chirality is to add all of the entries in each row and view the total as the measure of chirality. The last column of Table 32 lists the total up to the windows of length 16. If we are to extend the table to the maximal paths of length 22, the relative values would not change since the additional values are all constant. From the last column of Table 32 we can read as the most chiral ... [Pg.226]
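The row-sum index described above can be sketched as follows. The numbers are hypothetical and do not reproduce Table 32; the names `row_totals` and `window_table` are introduced only for this illustration.

```python
def row_totals(table):
    """Collapse each row of per-window entries into a single total."""
    return {name: sum(entries) for name, entries in table.items()}


# Hypothetical per-window dissimilarity entries for two enantiomer pairs.
window_table = {
    "pair_a": [0.0, 1.0, 2.5],
    "pair_b": [0.5, 0.5, 1.0],
}

totals = row_totals(window_table)
most_chiral = max(totals, key=totals.get)
print(totals)       # {'pair_a': 3.5, 'pair_b': 2.0}
print(most_chiral)  # pair_a
```

Appending the same constant to every row raises all totals equally, so the ranking is unchanged, which is the point made about extending the table to longer windows.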

Since there is no single superior similarity measure that can address all the issues, the choice of measure is problem dependent. The optimal distance/similarity method can be determined from the clustering results as well as from the analysis of cluster quality assessment methods (see below). An example of K-means clustering results for a subset of the gene expression data published by Bhattacharjee et al. (2001) shows how the type of dissimilarity measure can have a great impact on the final clustering results (Fig. 5.2) and the cluster quality (Table 5.1). [Pg.92]
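The dependence on the choice of measure can be illustrated with a toy case. This is a sketch with made-up profiles, not the Bhattacharjee et al. data, and it compares nearest neighbors rather than running a full K-means; the Pearson correlation distance is chosen here as an example of an alternative measure and is not necessarily the one used in the source.

```python
import math


def euclidean(u, v):
    """Euclidean distance: sensitive to the magnitude of the profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def correlation_distance(u, v):
    """1 - Pearson correlation: sensitive to the shape of the profiles only.

    Undefined for constant vectors (zero variance); not handled in this sketch.
    """
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    return 1.0 - cov / (su * sv)


def nearest(query, candidates, dist):
    """Name of the candidate profile closest to the query under dist."""
    return min(candidates, key=lambda name: dist(query, candidates[name]))


# Hypothetical expression profiles.
query = [1, 2, 3]
candidates = {
    "same_shape_large": [10, 20, 30],  # same trend, much larger magnitude
    "reversed_small": [3, 2, 1],       # opposite trend, similar magnitude
}

print(nearest(query, candidates, euclidean))             # reversed_small
print(nearest(query, candidates, correlation_distance))  # same_shape_large
```

The two measures disagree about which profile is closest, so any clustering built on them can group the same objects differently.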

MCD is a 3D-measure of steric misfit between the most active compound and the others within a given series of ligands under study. It translates the topological similarity/dissimilarity MSD parameter, which is an extended Hamming distance, from 2D space into a 3D space (Ciubotariu et al. 1990). [Pg.370]

Gower JC. Measures of similarity, dissimilarity, and distance. In: Kotz S, Johnson NL, Read CB, editors. Encyclopedia of Statistics. New York: Wiley; 1982. p. 397-405. [Pg.394]

Although QS started from such similarity-dissimilarity index premises, the elementary computational building block of QS essentially reduces to the well-known scalar product of two DFs, a so-called similarity measure. Indeed, given two quantum systems, say {A, B}, the familiar quantum mechanical theoretical basis permits one to obtain their attached wavefunctions by solving the respective Schrödinger equations. From the system wavefunctions, a pair of associated DFs {ρ_A(r), ρ_B(r)} can be simply set up, with the vector r representing some number of particle coordinates. In molecular QS studies, the usual DF chosen is the first-order one; thus, the vector r = (x,y,z) corresponds to a one-electron position coordinate only. Then, the similarity measure between the system pair of DFs is simply defined as the overlap similarity integral ... [Pg.350]

As has been pointed out previously, both similarity-dissimilarity indices can be computed just knowing the corresponding QS measures. The CSI can be written using a generalized cosine expression of the angle subtended by two DFs ... [Pg.352]
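The overlap measure and a cosine-type index can be illustrated numerically. This is a sketch under strong assumptions: the densities are one-dimensional Gaussians on a grid rather than real electron densities, the integral is a plain Riemann sum, and the cosine form used (overlap divided by the geometric mean of the two self-overlaps) is the commonly quoted one, which may differ in notation from the expression elided in the excerpt. All function names are hypothetical.

```python
import math


def overlap(rho_a, rho_b, dx):
    """Riemann-sum approximation of the overlap integral of two sampled DFs."""
    return sum(a * b for a, b in zip(rho_a, rho_b)) * dx


def cosine_index(rho_a, rho_b, dx):
    """Cosine-type index: overlap normalized by the two self-overlaps."""
    z_ab = overlap(rho_a, rho_b, dx)
    z_aa = overlap(rho_a, rho_a, dx)
    z_bb = overlap(rho_b, rho_b, dx)
    return z_ab / math.sqrt(z_aa * z_bb)


def gaussian(center, width, grid):
    """Toy one-dimensional density used only for illustration."""
    return [math.exp(-(((x - center) / width) ** 2)) for x in grid]


dx = 0.01
grid = [-10.0 + i * dx for i in range(2001)]
rho_a = gaussian(0.0, 1.0, grid)
rho_b = gaussian(1.0, 1.0, grid)

print(cosine_index(rho_a, rho_a, dx))  # identical densities give 1 up to rounding
print(cosine_index(rho_a, rho_b, dx))  # shifted density gives a value between 0 and 1
```

Because the index is normalized by the self-overlaps, it depends only on the "angle" between the two densities and not on their overall scale.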

In Table 13.3, we show the similarity/dissimilarity table for the nine coding sequences of Table 13.2 based on viewing the 16 matrix elements of the condensed matrices as 16-component vectors and calculating the Euclidean distance between these vectors as a measure of the similarity/dissimilarity. As one can see, the smallest entries in Table 13.3, 1.00 and 3.46, belong to human and gorilla, and to cattle and goat, respectively. [Pg.328]

One of the central problems of bioinformatics is DNA and protein alignment, which allows one to arrive at the degree of similarity between different DNA and proteins. Graphical bioinformatics [2,3,6] allows one to arrive at measures of similarity-dissimilarity of DNA and proteins without considering DNA or the protein alignment problem. [Pg.344]

The remainder of this chapter covers set- and vector-based representations of structural and molecular data and how this information is converted into the various similarity, dissimilarity, and distance measures that have found wide application in chemical informatics. Examples of some of the types of structural and molecular descriptors are also presented, along with a discussion of their essential features. Significant emphasis is given to the concept of CS, a concept that plays... [Pg.4]

Big Chemical Encyclopedia
