Measures of Dissimilarity

Partitional clustering using Euclidean distance as a measure of dissimilarity between pattern classes has been selected for the grouping of AE hits. [Pg.39]
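
As a rough illustration of partitional clustering with Euclidean distance, the sketch below implements a plain k-means (Lloyd) loop. The excerpt does not say which partitional algorithm was used for the AE hits, so k-means, the function names, and the starting centroids are all assumptions made for illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, centroids, n_iter=20):
    """Hypothetical helper: assign points to the nearest centroid, then
    recompute centroids, until assignments stop moving the centroids."""
    centroids = [tuple(float(x) for x in c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Update step: mean of the members (an empty cluster keeps its centroid).
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

The result depends on the initial centroids, which is a known property of this family of methods rather than something specific to the excerpt.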

In dissimilarity-based compound selection the required subset of molecules is identified directly, using an appropriate measure of dissimilarity (often taken to be the complement of the similarity). This contrasts with the two-stage procedure in cluster analysis, where it is first necessary to group together the molecules and then decide which to select. Most methods for dissimilarity-based selection fall into one of two categories: maximum dissimilarity algorithms and sphere exclusion algorithms [Snarey et al. 1997]. [Pg.699]
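
A maximum dissimilarity selection can be sketched as a greedy loop. The version below uses a MaxMin rule, in which each step picks the candidate whose smallest dissimilarity to the already selected items is largest. It is one common variant, not necessarily the one discussed in the cited work. The function name, the `seed_index` parameter, and the toy data are assumptions.

```python
def maxmin_select(items, dissim, n_select, seed_index=0):
    """Greedy MaxMin selection (illustrative sketch).

    items      : list of objects (e.g. molecular descriptors)
    dissim     : function (a, b) -> non-negative dissimilarity
    n_select   : number of items to pick
    seed_index : index of the first selected item (assumed given)
    Returns the indices of the selected items, in selection order.
    """
    selected = [seed_index]
    # min_d[i] = smallest dissimilarity from item i to any selected item.
    min_d = [dissim(items[seed_index], x) for x in items]
    while len(selected) < min(n_select, len(items)):
        candidates = [i for i in range(len(items)) if i not in selected]
        best = max(candidates, key=lambda i: min_d[i])
        selected.append(best)
        for i in range(len(items)):
            min_d[i] = min(min_d[i], dissim(items[best], items[i]))
    return selected

# Toy usage with one-dimensional "descriptors" and absolute difference.
picked = maxmin_select([0, 1, 2, 10, 11], lambda a, b: abs(a - b), 3)
print(picked)  # [0, 4, 2]
```

Any dissimilarity function can be passed in, for example the complement of a similarity coefficient as mentioned in the text.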

Cluster analysis (CA) performs agglomerative hierarchical clustering of objects based on distance measures of dissimilarity or similarity. The hierarchy of clusters can be represented by a binary tree, called a dendrogram. A final partition, i.e. the cluster assignment of each object, is obtained by cutting the tree at a specified level [24],... [Pg.759]
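
The idea of building a hierarchy and then cutting it at a chosen level can be sketched as follows. This minimal version uses single linkage, which the excerpt does not specify, and stops merging once the closest pair of clusters is farther apart than `cut_height`. For single linkage this gives the same partition as cutting the full dendrogram at that height. All names are illustrative.

```python
def single_linkage_cut(points, dist, cut_height):
    """Agglomerative single-linkage clustering, cut at cut_height.

    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(
                    dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cut_height:
            break  # every remaining merge lies above the cut level
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

print(single_linkage_cut([0, 1, 2, 10, 11, 30], lambda x, y: abs(x - y), 3))
# [[0, 1, 2], [3, 4], [5]]
```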

Bayesian methods have often proved useful for design of experiments, especially in situations in which the optimal design depends on unknown quantities. Certainly, to identify a design for optimal estimation of β, the correct subset of active effects must be identified. Bayesian approaches that express uncertainty about the correct subset enable construction of optimality criteria that account for this uncertainty. Such approaches typically find a design that optimizes a criterion which is averaged over many possible subsets. DuMouchel and Jones (1994) exploited this idea with a formulation in which some effects have uncertainty associated with whether they are active. Meyer et al. (1996) extended the prior distributions of Box and Meyer (1993) and constructed a model discrimination design criterion. The criterion is based on a Kullback-Leibler measure of dissimilarity between... [Pg.263]
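
For orientation only, the Kullback-Leibler divergence between two discrete probability distributions can be computed as below. This is the generic textbook definition, not the design criterion of Meyer et al., whose details are cut off in the excerpt. The function name and the example distributions are assumptions.

```python
import math

def kl_divergence(p, q):
    """KL divergence D(p || q) for discrete distributions given as
    equal-length sequences of probabilities (natural logarithm)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
# The divergence is not symmetric, so it is a dissimilarity, not a metric.
print(kl_divergence(p, q), kl_divergence(q, p))
```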

Attribute data were identified from the photomicrographs of each location point in the sample, and each potential categorical variable (attribute) was recorded as present or absent. Because it was desirable to determine whether the evidence at the location points was related, the data were subjected to hierarchical clustering. The measure of dissimilarity used in the project was the number of mismatches among the attribute measurements of two location points. For example, two points had a dissimilarity of 0 if they matched on all attribute measurements, and at the other extreme, the two location points had a dissimilarity of 13 (the total number of measured attributes) if they did not match on any of the measurements. Each match was weighted as equally important. In addition to this intuitive measure of... [Pg.456]
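
The counting measure described here, 0 when all 13 attributes agree and 13 when none do, can be written in a few lines. The function name and the sample presence/absence vectors are hypothetical.

```python
def mismatch_count(a, b):
    """Number of attributes on which two presence/absence records differ.
    Every attribute carries equal weight."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(1 for x, y in zip(a, b) if x != y)

# Two hypothetical location points, 13 attributes each (1 = present).
point_1 = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1]
point_2 = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
print(mismatch_count(point_1, point_2))  # 3
```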

By applying the SNSM similarity measure to mirror images, the resulting quantity is a measure of achirality, whereas the corresponding dissimilarity measure between A and its mirror image, denoted as Xs(A), is a measure of chirality, where the interrelation (137) between Xs(A) and the similarity measure implies that this measure can take values from the unit interval. The measure Xs(A), first proposed as an example of dissimilarity measures of the second kind, is zero for achiral objects and takes positive values for all chiral objects. Objects perceived as having prominent chirality tend to have large Xs(A) values. The SNSM measures have also been applied to more general molecular shape problems. More recently, Klein showed that by a logarithmic transformation of the scaling factors, a metric can be constructed to provide a proper distance-like measure of dissimilarity of shapes. [Pg.173]

The dissimilarity of fuzzy sets A and A(R,c) provides a measure of the symmetry aspect R for set A with respect to center c. A large measure of dissimilarity implies a higher degree of symmetry deficiency of fuzzy set A, with respect to symmetry represented by element R. This symmetry deficiency can be described using either one of the fuzzy set dissimilarity metrics. For example, if the fuzzy metric FSNDSM df(A,B) is used, then one obtains the corresponding fuzzy symmetry deficiency measure ... [Pg.185]

Similarity and distance between objects are complementary concepts for which there is no single formal definition. In practice, distance as a measure of dissimilarity is a much more clearly defined quantity and is more extensively used in cluster analysis. [Pg.96]

Euclidean distance is not the only possible measure of dissimilarity. It is, however, ideal for dealing with mixtures. [Pg.100]

Again, molecular dissimilarity may be numerically defined and calculated in a variety of ways. Taking the simple definition above, however, the calculation is precisely the same as for molecular similarity. The same numerical value is used to describe similarity and dissimilarity. In applying the concept of molecular similarity in chemical information systems at Pfizer Central Research, we use the Tanimoto coefficient as the quantitative similarity measure. This measure, first used for the purpose at Sheffield University, has been widely adopted as the standard measure of molecular similarity, its value varying between 1.0 (identity) and 0.0 (no similarity). We use it, without adaptation, as a measure of dissimilarity also. High values imply low dissimilarity, and vice versa. [Pg.384]
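
A minimal sketch of the Tanimoto coefficient for binary fingerprints follows, with each fingerprint represented as the set of its "on" bit positions. The representation, the function name, and the convention chosen for two empty fingerprints are assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits divided by bits set in either.
    Returns 1.0 for identical fingerprints and 0.0 for no overlap."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # assumed convention: two empty fingerprints are identical
    return len(a & b) / union

# Hypothetical fingerprints given as positions of set bits.
print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
```

The passage reads the same value in both directions. Where a separate dissimilarity score is wanted, the complement `1 - tanimoto(a, b)` is a common choice.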

The shapes of similar molecules can be compared using the idea of the symmetric difference. The symmetric difference of two superposed objects is their union minus their intersection (Figure 2). By superposing molecules so that their overlap is maximal and then measuring the volume of the symmetric difference, it is possible to define a shape metric. A shape metric is a measure of dissimilarity that obeys the triangle inequality: the difference of A and B plus the difference of B... [Pg.1699]
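
The symmetric-difference volume can be illustrated on shapes discretized into grid cells. The sketch below assumes the shapes are already superposed and leaves out the overlap-maximizing step. The voxel representation, the function name, and the toy shapes are illustrative assumptions.

```python
def symmetric_difference_volume(shape_a, shape_b, voxel_volume=1.0):
    """Volume of (A union B) minus (A intersect B) for two shapes given as
    collections of occupied grid cells, assumed already superposed."""
    return len(set(shape_a) ^ set(shape_b)) * voxel_volume

# Three toy shapes, each occupying two cells along one axis.
A = {(0, 0, 0), (1, 0, 0)}
B = {(1, 0, 0), (2, 0, 0)}
C = {(2, 0, 0), (3, 0, 0)}

d_ab = symmetric_difference_volume(A, B)
d_bc = symmetric_difference_volume(B, C)
d_ac = symmetric_difference_volume(A, C)
# Triangle inequality check for this example.
print(d_ac <= d_ab + d_bc)  # True
```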

To choose k test stores, we partition the n stores of the chain into k clusters. The stores within each cluster are chosen to minimize a measure of dissimilarity based on the percent of total sales represented by sales of each of the prior products in each store. Two stores that sold exactly the same percentage of each of the prior products would be in the same cluster, and all of the stores within a cluster would sell approximately the same percentage of each of the prior products. We then choose a single test store within each cluster that best represents the cluster in the sense that using test sales at this store to forecast sales of other stores in the cluster minimizes the cost of forecast errors. [Pg.114]

To extract more information from the spectral data, 2D-COS can be employed. Basically, this analysis method creates a pair of synchronous Φ(ν1,ν2) and asynchronous Ψ(ν1,ν2) 2D correlation spectra, where the spectral variables ν1 and ν2 are wavenumbers. The synchronous 2D correlation intensity Φ(ν1,ν2) represents the overall similarity or coincidental changes between two separate intensity variations measured at different spectral variables during variation of the external perturbation. The asynchronous 2D correlation intensity Ψ(ν1,ν2) may be regarded as a measure of dissimilarity or, more strictly speaking, the out-of-phase character of the spectral intensity variations. [Pg.274]
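
The synchronous and asynchronous maps can be sketched with the commonly used discrete formulation based on the Hilbert-Noda transformation matrix. The excerpt does not give formulas, so the choice of the mean spectrum as reference, the function name, and the input layout (one row per perturbation step, one column per wavenumber) are assumptions.

```python
import math

def two_d_cos(spectra):
    """Return (sync, asyn) 2D correlation matrices for a list of m spectra,
    each a list of n intensities measured at successive perturbation steps."""
    m, n = len(spectra), len(spectra[0])
    if m < 2:
        raise ValueError("at least two spectra are required")
    # Dynamic spectra: subtract the mean spectrum (assumed reference).
    means = [sum(row[c] for row in spectra) / m for c in range(n)]
    dyn = [[row[c] - means[c] for c in range(n)] for row in spectra]
    # Hilbert-Noda matrix: 0 on the diagonal, 1 / (pi * (k - j)) elsewhere.
    noda = [
        [0.0 if j == k else 1.0 / (math.pi * (k - j)) for k in range(m)]
        for j in range(m)
    ]
    # Transformed dynamic spectra (noda applied along the perturbation axis).
    hil = [
        [sum(noda[j][k] * dyn[k][c] for k in range(m)) for c in range(n)]
        for j in range(m)
    ]
    sync = [
        [sum(dyn[j][a] * dyn[j][b] for j in range(m)) / (m - 1) for b in range(n)]
        for a in range(n)
    ]
    asyn = [
        [sum(dyn[j][a] * hil[j][b] for j in range(m)) / (m - 1) for b in range(n)]
        for a in range(n)
    ]
    return sync, asyn
```

In this formulation the synchronous matrix is symmetric and the asynchronous matrix is antisymmetric. Two bands that vary exactly in phase give an asynchronous intensity of zero.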

Big Chemical Encyclopedia
