Pairwise similarity measure

In Eq. (1), φ(a_{i,j}, a_{i',j}) represents any pairwise similarity measure between the elements of two rows. For instance, one might want to penalize large pairwise differences between adjacent elements in the final rearrangement. For this purpose, one could evaluate the squared difference between two rows as shown in Eq. (2). [Pg.570]

It should be noted that the form of the objective function presented in Eq. (2) is not limited to this Euclidean metric and can accommodate almost any pairwise similarity measure. [Pg.571]
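
Eqs. (1) and (2) are not reproduced in this excerpt, so the following Python sketch is only an illustration of the idea: a pluggable element-wise measure `phi` is summed over the columns of two rows, and the cost of a rearrangement is the sum over adjacent rows. The function names and the exact form of the sum are assumptions, not the source's formulation.

```python
from typing import Callable, Sequence

Number = float
Phi = Callable[[Number, Number], Number]


def squared_difference(x: Number, y: Number) -> Number:
    """Element-wise squared difference (the Euclidean-style choice)."""
    return (x - y) ** 2


def row_dissimilarity(row_a: Sequence[Number], row_b: Sequence[Number],
                      phi: Phi = squared_difference) -> Number:
    """Sum phi over corresponding elements of two rows (assumed form)."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same number of columns")
    return sum(phi(x, y) for x, y in zip(row_a, row_b))


def arrangement_cost(rows: Sequence[Sequence[Number]], order: Sequence[int],
                     phi: Phi = squared_difference) -> Number:
    """Total penalty between rows that are adjacent in the given ordering."""
    return sum(
        row_dissimilarity(rows[order[k]], rows[order[k + 1]], phi)
        for k in range(len(order) - 1)
    )


rows = [[0, 0], [3, 4], [1, 1]]
print(arrangement_cost(rows, [0, 1, 2]))  # original row order
print(arrangement_cost(rows, [0, 2, 1]))  # a rearrangement with lower cost
```

Passing a different `phi`, for example `lambda x, y: abs(x - y)`, swaps the metric without changing the rest of the calculation.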

Clustering is the process of dividing a collection of objects into groups (or clusters) so that the objects within a cluster are highly similar whereas objects in different clusters are dissimilar [41]. When applied to databases of compounds, clustering methods require the calculation of all the pairwise similarities of the compounds with similarity measures such as those described previously, for example, 2D fingerprints and the Tanimoto coefficient. [Pg.200]
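
As a minimal sketch of the all-pairs calculation mentioned above, the Python below represents each 2D fingerprint as the set of its "on" bit positions and fills a symmetric Tanimoto matrix. The set representation, the function names, and the convention that two empty fingerprints have similarity 1.0 are assumptions made for illustration.

```python
from typing import List, Sequence, Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # assumed convention for two empty fingerprints
    return len(fp_a & fp_b) / union


def pairwise_similarity_matrix(fps: Sequence[Set[int]]) -> List[List[float]]:
    """Symmetric n x n matrix of Tanimoto similarities."""
    n = len(fps)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = tanimoto(fps[i], fps[j])
    return sim


fingerprints = [{1, 2, 3}, {2, 3, 4}, {7}]  # toy data, not real compounds
for row in pairwise_similarity_matrix(fingerprints):
    print(row)
```

The double loop makes the quadratic cost explicit: n compounds require n(n - 1)/2 similarity evaluations before any clustering can start.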

The first piece of information tells us when two attributes are likely to be similar and is generated by a collection of schema matching modules. This information is typically given by some pairwise attribute similarity measure, say s. The similarity s(ai, aj) between two source attributes ai and aj depicts how closely the two attributes represent the same real-world concept. [Pg.102]

Martin et al. [2] have developed a diversity measure that uses a combination of log P, topological indices, pairwise similarities calculated from Daylight fingerprints, and atom layer properties based on receptor recognition descriptors. Principal-components analysis and multidimensional scaling are used to produce a vector that is input to D-optimal design. Hassan et al. [34] use molar refractivity and calculated log P. Cummins et al. [30] use free energy of solvation in database comparisons. [Pg.257]

In the first approach, which is also related to material covered in Section 15.5.6, consider a specific reference (probe) molecule, m, that may be active in some assay. The issue now is to identify the, say, 250 molecules in a large compound database that are most similar to m with respect to a given similarity measure. This creates a list, L1, of molecules that can be ordered from smallest to largest similarity value. The process is now repeated R - 1 times using other similarity measures that are not functionally related in a mathematical sense (see Gower [76] and Section 15.5.3 for further discussion). This yields a set of R ordered lists, L = {L1, L2, ..., LR}, that can be compared in a pairwise fashion using statistical correlation methods in the sequel. [Pg.375]

Structural similarity is a pairwise relation between molecules. Similarity values are determined by a similarity measure that has three key components: (1) a representation of the relevant chemical or structural features of the molecules being compared, (2) an appropriate weighting of these features, and (3) a function that maps the feature information for pairs of molecules to a value that lies on the unit interval of the real line, [0,1]. As noted in the previous section, representations can utilize macroscopic chemical features, electronic structural features of individual molecules, and/or geometric features associated with the structure or substructures of molecules... [Pg.5]

To choose the reference shape for MSA, each available conformation is used in turn as a reference to calculate the pairwise molecular similarity to all other conformations of all other molecules. The conformation of each molecule that has the highest overlap volume with the current reference is used as the similarity measure for that reference. Thus, given M conformations in the database, there will be M MSA parameters that describe the shapes of the compounds. In a 1994 study, the overlapped structures of four molecules were merged to define a reference shape. [Pg.198]

To demonstrate this, Figure 2.8 shows the comparison of similarity for Daylight structural and biological fingerprints created from a panel of 154 assays from the BioPrint database (measured by pairwise Tanimoto distance for 347 drugs with MW 200-600; 60031 points) [6]. Figure 2.8a shows the overall scatter plot of the... [Pg.32]
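
The figure of 60031 points appears consistent with one point per unordered pair of the 347 drugs. This reading is inferred from the arithmetic, not stated in the excerpt:

```latex
\binom{347}{2} = \frac{347 \times 346}{2} = \frac{120062}{2} = 60031
```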

Kendall's tau correlation (τ, Kendall) also measures the extent of monotonically increasing or decreasing relationships between the variables. It is also a nonparametric measure of association. It is computationally more intensive than the Spearman rank correlation because all slopes of pairs of data points have to be computed. Kendall's tau correlation is then defined as the average of the signs of all pairwise slopes. The range of τ is -1 to +1; the method is relatively robust against outliers; for many applications ρ and τ give similar answers. [Pg.57]
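
The definition as an average of slope signs translates almost directly into code. The sketch below is a plain O(n²) Python version; treating tied pairs (zero or undefined slope) as contributing 0 is an assumption, which makes this the variant usually called tau-a.

```python
from itertools import combinations
from typing import Sequence


def sign(value: float) -> int:
    """Return -1, 0, or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Average sign of the slope over all pairs of data points.

    The sign of the slope (yj - yi) / (xj - xi) equals
    sign(xj - xi) * sign(yj - yi); ties contribute 0 (assumed convention).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    signs = [
        sign(xj - xi) * sign(yj - yi)
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
    ]
    return sum(signs) / len(signs)


print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))    # one discordant pair of six
print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 100]))  # outlier keeps the ordering
```

Because only the signs of the slopes enter the average, the extreme value 100 in the second call has no more influence than any other value that preserves the ordering.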

Similarly, a measure of heterogeneity between two clusters can be based on the maximum, minimum, or average of all pairwise distances between the objects of the two clusters (compare complete, single, and average linkage), or on the pairwise distances between the cluster centers. The latter choice results in a measure of heterogeneity h_jl between clusters j and l as... [Pg.284]
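
The four options can be compared side by side in a short Python sketch. Euclidean distance and the function names are assumptions; the excerpt does not fix the distance measure, and its own formula for the center-based measure is cut off.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def pairwise_distances(cluster_a: Sequence[Point],
                       cluster_b: Sequence[Point]) -> List[float]:
    """All Euclidean distances between objects of two clusters."""
    return [math.dist(a, b) for a in cluster_a for b in cluster_b]


def complete_linkage(cluster_a, cluster_b) -> float:
    return max(pairwise_distances(cluster_a, cluster_b))


def single_linkage(cluster_a, cluster_b) -> float:
    return min(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b) -> float:
    return mean(pairwise_distances(cluster_a, cluster_b))


def centroid(cluster: Sequence[Point]) -> Point:
    """Coordinate-wise mean of the objects in a cluster."""
    return tuple(mean(coords) for coords in zip(*cluster))


def centroid_distance(cluster_a, cluster_b) -> float:
    """Distance between the two cluster centers."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))


a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]
for measure in (single_linkage, complete_linkage, average_linkage,
                centroid_distance):
    print(measure.__name__, measure(a, b))
```

`math.dist` requires Python 3.8 or later.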

Molecular diversity has a relatively brief history, which began in the late eighties and somewhat parallels the development of combinatorial chemistry [1]. Unlike molecular similarity [2-4], which is a pairwise measure, molecular diversity is a measure of the similarity distribution over a population of molecules. Alternatively, molecular diversity can be assessed in terms of the dissimilarity distribution over a population of molecules, since the dissimilarity of two molecules i and j is the complement of their similarity, that is, D(i,j) = 1 - S(i,j). [Pg.317]
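
A small Python sketch of the complement relation follows, assuming the similarities are already available as a symmetric matrix with values in [0, 1]. Summarizing the dissimilarity distribution by its mean is only one possible choice and is an assumption here, not a measure prescribed by the source.

```python
from itertools import combinations
from statistics import mean
from typing import List, Sequence


def dissimilarity(similarity: float) -> float:
    """Complement of a similarity value on the unit interval."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return 1.0 - similarity


def dissimilarity_distribution(sim: Sequence[Sequence[float]]) -> List[float]:
    """Dissimilarity of every unordered pair of molecules (i < j)."""
    n = len(sim)
    return [dissimilarity(sim[i][j]) for i, j in combinations(range(n), 2)]


def mean_pairwise_dissimilarity(sim: Sequence[Sequence[float]]) -> float:
    """One simple summary of the distribution (assumed diversity score)."""
    return mean(dissimilarity_distribution(sim))


similarity_matrix = [  # toy values, not measured data
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.25],
    [0.0, 0.25, 1.0],
]
print(dissimilarity_distribution(similarity_matrix))
print(mean_pairwise_dissimilarity(similarity_matrix))
```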

As discussed in Subheading 1., the primary design criterion is often based on either similarity or diversity. Quantifying these measures requires that the compounds are represented by numerical descriptors that enable pairwise molecular similarities or distances to be calculated or that allow the definition of a multidimensional property space in which the molecules can be placed. [Pg.339]

SELECT has been designed to allow optimization of a variety of different objectives. Diversity (and similarity) is optimized using functions either based on pairwise dissimilarities and fingerprints or using cell-based measures. The physicochemical properties of libraries are optimized by minimizing the dif-... [Pg.341]

In the third portion of the study, the results using five different sampler and analytical method combinations were compared. When obvious outliers were excluded from the data, the normalized percentage differences compared to the mean value for sulfur varied from -21 to +23%. Pairwise comparisons for other elements showed similar variability. The agreement overall for X-ray fluorescence compared to PIXE was good, although there was scatter in the individual measurements, perhaps due to differences in sampling (Nejedly et al., 1998). [Pg.622]

We note that similar conclusions were drawn from the data obtained in the rototranslational bands, and the purely translational bands, pp. 75ff. and 104ff. In all cases considered, the moments calculated with the assumption of pairwise-additivity are smaller than the measurements. [Pg.128]
