Measures of similarity

Thus, in the area of combinatorial chemistry, many compounds are produced in short time ranges, and their structures have to be confirmed by analytical methods. A high degree of automation is required, which has fueled the development of software that can predict NMR spectra starting from the chemical structure, and that calculates measures of similarity between simulated and experimental spectra. These tools are obviously also of great importance to chemists working with just a few compounds at a time, using NMR spectroscopy for structure confirmation. [Pg.518]

The Ugly-Duckling theorem thus loosely states that there can never be an internally objective way to ascribe a measure of similarity (or dissimilarity) between any two randomly chosen subsets of a given set. In other words, an asymmetry can be induced only via some necessarily external sense of esthetics. ... [Pg.630]

Similarity searching requires the specification of an entire molecule, called the target structure or reference structure, rather than the partial structure that is required for substructure searching. The target molecule is characterized by a set of structural features, and this set is compared with the corresponding sets of features for each of the database structures. Each such comparison enables the calculation of a measure of similarity between the... [Pg.193]

Dimensionless numbers (Reynolds number = udρ/μ, Nusselt number = hd/K, Schmidt number = ν/D_A, etc.) are the measures of similarity. Many correlations between them (known also as scale-up correlations) have been established. The correlations are used to calculate effective mass- and heat-transport coefficients, interfacial areas, power consumption, etc. [Pg.227]
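The three groups named above can be sketched in a few lines. This is a minimal illustration using the standard textbook definitions and SI units; the function names and the property values (roughly water near room temperature) are assumptions, not taken from the source.

```python
def reynolds(u, d, rho, mu):
    """Reynolds number: velocity * length * density / dynamic viscosity."""
    return u * d * rho / mu


def nusselt(h, d, k):
    """Nusselt number: heat-transfer coefficient * length / thermal conductivity."""
    return h * d / k


def schmidt(mu, rho, diffusivity):
    """Schmidt number: kinematic viscosity (mu / rho) / mass diffusivity."""
    return mu / (rho * diffusivity)


# Hypothetical operating point: water flowing at 1 m/s in a 5 cm pipe.
re = reynolds(u=1.0, d=0.05, rho=1000.0, mu=1.0e-3)
nu = nusselt(h=500.0, d=0.05, k=0.6)
sc = schmidt(mu=1.0e-3, rho=1000.0, diffusivity=1.0e-9)
print(re, nu, sc)
```

Two systems of different size that share the same values of these groups are treated as similar, which is what makes scale-up correlations between them usable.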

Similarity: Comparison of molecules using molecular descriptors and a measure of similarity, for example a 2D fingerprint and the Tanimoto coefficient... [Pg.32]

Methods that rank compounds based on some measure of similarity to known actives, based on 2D or 3D structure of the molecule (LBVS). [Pg.88]

The pragmatic beauty of the chemical fingerprint is that the more features two molecules have in common, the more bits they have set in common. The mathematical approach used to translate the fingerprint comparison data into a measure of similarity tunes the molecular comparison [5]. The Tanimoto similarity index works well when a relatively sparse fingerprint is used and when the molecules to be compared are broadly comparable in size and complexity [5]. If the Tanimoto index does not suit the nature of the molecules or the comparison desired, several other indices are available to the researcher. For example, the Daylight software offers the user over ten similarity metrics, and Pipeline Pilot as distributed offers at least three. Some of these metrics (e.g., Tversky, Cosine) behave better when the query molecule is significantly smaller than the molecule it is compared with. [Pg.94]
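The size effect described above can be illustrated with a small sketch. It assumes each fingerprint is represented as a Python set of "on" bit positions; the bit values are made up, and the functions follow the usual textbook forms of the three indices rather than the implementation of any particular package.

```python
import math


def tanimoto(a, b):
    """Common on-bits divided by on-bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def tversky(a, b, alpha, beta):
    """Asymmetric index: alpha weights bits unique to a, beta bits unique to b."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def cosine(a, b):
    """Common on-bits divided by the geometric mean of the two bit counts."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical fingerprints: a small query fully contained in a larger target.
query = {1, 4, 7}
target = {1, 4, 7, 9, 12, 15, 20, 22, 30}

print(tanimoto(query, target))           # penalised by the target's extra bits
print(cosine(query, target))             # penalised less strongly
print(tversky(query, target, 1.0, 0.0))  # ignores bits unique to the target
```

With alpha = beta = 1 the Tversky index reduces to the Tanimoto index, so the asymmetry comes entirely from weighting the two sets of unique bits differently.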

ART2 forms clusters from training patterns by first computing a measure of similarity (directional rather than distance) of each pattern vector to a cluster prototype vector, and then comparing this measure to an arbitrarily specified proximity criterion called the vigilance. If the pattern's similarity measure exceeds the vigilance, the cluster prototype or center is updated to incorporate the effect of the pattern, as shown in Fig. 25 for pattern 3. If the pattern fails the similarity test, competition resumes without the node... [Pg.63]
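The vigilance test can be sketched as follows. This is a deliberately simplified illustration, not the full ART2 network: it assumes cosine similarity as the directional measure, a simple weighted-average prototype update, and a new cluster whenever the best match fails the test. The names `assign`, `vigilance` and `rate` are hypothetical.

```python
import math


def cosine(u, v):
    """Directional similarity: cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def assign(pattern, prototypes, vigilance=0.95, rate=0.5):
    """Return the cluster index for pattern, updating prototypes in place."""
    best, best_sim = None, -1.0
    for i, proto in enumerate(prototypes):
        sim = cosine(pattern, proto)
        if sim > best_sim:
            best, best_sim = i, sim
    if best is not None and best_sim > vigilance:
        # Similarity exceeds the vigilance: pull the prototype toward the pattern.
        prototypes[best] = [
            (1.0 - rate) * p + rate * x for p, x in zip(prototypes[best], pattern)
        ]
        return best
    # Similarity test failed for the best candidate: start a new cluster.
    prototypes.append(list(pattern))
    return len(prototypes) - 1


protos = []
labels = [assign(p, protos) for p in ([1.0, 0.0], [0.0, 1.0], [0.9, 0.1])]
print(labels, protos)
```

Raising the vigilance makes the test stricter and tends to produce more, tighter clusters; lowering it merges more patterns into each prototype.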

Figure 3.13 Principal component analysis of repetitive GC/MS profiles of M. truncatula root (R), stem (S) and leaves (L). The first and second principal components of each GC/MS analysis were calculated and plotted. The relative distance between points is a measure of similarity or difference. The clustering shows good reproducibility within the independent tissues but clear differentiation of tissues. The results also show that roots and stems are more similar to each other than to leaves.

This technique functions by taking observed measures of similarity or dissimilarity between every pair of M objects, then finding a representation of the objects as points in Euclidean space so that the interpoint distances in some sense match the observed similarities or dissimilarities by means of weighting constants. [Pg.948]

On the other hand, factor analysis involves other manipulations of the eigenvectors and aims to gain insight into the structure of a multidimensional data set. The use of this technique was first proposed in biological structure-activity relationships (i.e., SAR) and illustrated with an analysis of the activities of 21 diphenylaminopropanol derivatives in 11 biological tests [116-119, 289]. This method has been more commonly used to determine the intrinsic dimensionality of certain experimentally determined chemical properties, that is, the number of fundamental factors required to account for the variance. One of the best FA techniques is the Q-mode, which groups a multivariate data set according to the data structure defined by the similarity between samples [1, 313-316]. It is devoted exclusively to the interpretation of the inter-object relationships in a data set, rather than to the inter-variable (or covariance) relationships explored with R-mode factor analysis. The measure of similarity used is the cosine theta matrix, i.e., the matrix whose elements are the cosines of the angles between all sample pairs [1, 313-316]. [Pg.269]
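A cosine theta matrix of the kind described above can be computed directly from the sample vectors. The sketch below assumes samples are rows of equal length with no all-zero row; the data are invented for illustration.

```python
import math


def cosine_theta_matrix(samples):
    """Matrix of cosines of the angles between all pairs of sample vectors."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in samples]
    n = len(samples)
    return [
        [
            sum(a * b for a, b in zip(samples[i], samples[j])) / (norms[i] * norms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical data: sample 1 is a scaled copy of sample 0,
# sample 2 is orthogonal to sample 0.
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 0.0, -1.0]]
m = cosine_theta_matrix(samples)
for row in m:
    print([round(v, 3) for v in row])
```

Because the cosine depends only on direction, samples with proportional compositions score 1 regardless of their overall magnitude, which is why the measure suits Q-mode grouping of samples.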

A new definition of molecular similarity is presented, based upon the similarity of the corresponding molecular graphs. First, all of the subgraphs of the molecular graph are listed, and then various similarity indices are derived from the numbers of subgraphs. One of these compares favorably with the standard distance measures of sequence comparison. Measurement of similarity provides a new way to measure molecular complexity, as long as the most (or least) complex member of a set of molecules can be identified. [Pg.169]

Carbó, R., Leyda, L., and Arnau, M. How similar is a molecule to another? An electron density measure of similarity between two molecular... [Pg.110]

Identification involves the confirmation of a certain chemical entity from its spectrum by matching against the components of a spectral library using an appropriate measure of similarity such as the correlation coefficient, also known as the spectral match value (SMV). SMV is the cosine of the angle formed by the vectors of the spectrum for the sample and the average spectrum for each product included in the library. [Pg.471]

We compiled literature and our own extraction data to compare the distribution of the same solutes from water to [C4C1Im][PF6] and 48 various conventional solvents. As a measure of similarity of the extraction properties of any two solvents, we used the Pearson correlation coefficient between log D for the same solutes. Note that a high correlation coefficient does not mean that the distribution ratios determined with the two solvents are close in absolute value; rather, it means that the distribution ratios change in the same manner from one solute to another. [Pg.251]
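The caveat about absolute values can be made concrete with a short sketch. The log D values below are invented for illustration and are not data from the source; the second solvent is simply the first shifted by two log units.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log D values for the same four solutes in two solvents.
log_d_solvent_a = [0.5, 1.0, 1.5, 2.0]
log_d_solvent_b = [2.5, 3.0, 3.5, 4.0]

r = pearson(log_d_solvent_a, log_d_solvent_b)
print(r)
```

The correlation is perfect even though every distribution ratio differs by a factor of 100 between the two solvents, which is exactly the distinction the passage draws.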

Relationships between the individual LOE can be examined via principal components analysis (PCA). Correlations among principal components for individual LOE indicate concordance or agreement. Relationships between different SQT LOE can also be assessed using other methods, including Mantel's test (Legendre and Fortin, 1989) coupled with a measure of similarity or ordination; canonical discriminant (or correspondence) analyses; and multidimensional scaling (MDS). [Pg.313]

Hodgkin and Richards suggested a new SI to provide a more sensitive measure of similarity [104] ... [Pg.63]

Figure 12 reveals that when V_A,i = α V_B,i (∀i) (α is a constant and α ≠ 0) the Petke MEP-SI is an even more sensitive measure of similarity than the Hodgkin index. This is true particularly in the region α ∈ [−1, 1], where MEP-SI(P) varies linearly with α. The SIs defined by Eqs. (17), (18) and (22) may be called cumulative indices since in each case the SI is computed by accumulating products of MEP values for a number of grid points [116]. [Pg.67]
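The behaviour described above can be checked numerically. The sketch below uses the commonly cited forms of the Carbó, Hodgkin and Petke indices; whether these correspond exactly to Eqs. (17), (18) and (22) of the source is an assumption, since those equations are not reproduced here. The grid-point MEP values are invented.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def carbo(va, vb):
    """Carbo index: overlap normalised by the geometric mean of self-overlaps."""
    return _dot(va, vb) / math.sqrt(_dot(va, va) * _dot(vb, vb))


def hodgkin(va, vb):
    """Hodgkin index: overlap normalised by the arithmetic mean of self-overlaps."""
    return 2.0 * _dot(va, vb) / (_dot(va, va) + _dot(vb, vb))


def petke(va, vb):
    """Petke index: overlap normalised by the larger of the two self-overlaps."""
    return _dot(va, vb) / max(_dot(va, va), _dot(vb, vb))


# Hypothetical MEP values of molecule B at four grid points.
vb = [0.2, -0.5, 1.0, 0.3]

for alpha in (-1.0, -0.5, 0.5, 1.0):
    va = [alpha * v for v in vb]  # molecule A is a scaled copy of B
    print(alpha, round(carbo(va, vb), 3), round(hodgkin(va, vb), 3), round(petke(va, vb), 3))
```

For V_A = αV_B these forms give a Hodgkin index of 2α/(1 + α²) and a Petke index equal to α whenever |α| ≤ 1, while the Carbó index stays at ±1 and so does not register the difference in magnitude at all.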

To find the structures of the objects in the data set, we need a measure of similarity. Although many types of measures can be applied, the Euclidean distance is the most frequently used similarity measure. According to the law of Pythagoras, the distance between two points O1 and O2 characterized by variables x and y can be presented as follows (Figure 15.1) ... [Pg.371]
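As a minimal sketch of the Pythagorean distance between two objects described by variables x and y (the coordinates are invented, and the function name is an assumption):

```python
import math


def euclidean_distance(o1, o2):
    """Distance between two objects given as (x, y) pairs."""
    dx = o2[0] - o1[0]
    dy = o2[1] - o1[1]
    return math.sqrt(dx * dx + dy * dy)


# Hypothetical objects O1 and O2.
o1 = (0.0, 0.0)
o2 = (3.0, 4.0)
d = euclidean_distance(o1, o2)
print(d)  # 5.0
```

A smaller distance means more similar objects, so strictly speaking this is a dissimilarity; variables on very different scales are usually standardised first so that no single variable dominates the sum.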

All pattern classification methods listed group items by similarity. Measurement of similarity differs depending upon the method, and therefore different methods yield different results. Table 12 describes several similarity metrics and why they produce different results for the same data set. [Pg.542]
