Distance measurements, molecular similarity

Properties other than distances can also measure molecular similarity and flexibility. Relative changes in electrostatic potential [23] and changes in principal moments of inertia have been used for this purpose. [Pg.237]

The distance matrix A, which holds the relative distances (by whatever similarity measure) between the individual conformations, is rarely informative by itself. For example, when sampling along a molecular dynamics trajectory, the A matrix can have a block diagonal form, indicating that the trajectory has moved from one conformational basin to another. Nonetheless, even in this case, the matrix by itself does not give reliable information about the size and shape of the respective basins. In general, the distance matrix requires further processing. [Pg.85]

In the original kNN method, an unknown object (molecule) is classified according to the majority of the class memberships of its K nearest neighbors in the training set (Fig. 13.4). The nearness is measured by an appropriate distance metric (a molecular similarity measure as applied to the classification of molecular structures). It is implemented simply as follows ... [Pg.314]
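The excerpt's own implementation is elided, so the following is only a minimal sketch of majority-vote kNN under stated assumptions: molecules are represented as sets of fingerprint on-bits, nearness is a Tanimoto distance (1 minus the Tanimoto coefficient), and the names `tanimoto_distance` and `knn_classify` are hypothetical.

```python
from collections import Counter


def tanimoto_distance(bits_a, bits_b):
    """Assumed metric: 1 - |A & B| / |A | B| for sets of fingerprint on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical (a convention)
    return 1.0 - len(bits_a & bits_b) / union


def knn_classify(query, training_set, k, distance=tanimoto_distance):
    """Assign the majority class among the k training objects nearest to query.

    training_set is a list of (fingerprint, class_label) pairs.
    """
    # Sort training objects by distance to the query and keep the k nearest.
    neighbors = sorted(training_set, key=lambda item: distance(query, item[0]))[:k]
    # Count the class memberships of those neighbors and return the most frequent.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


training = [
    ({1, 2, 3}, "active"),
    ({1, 2, 4}, "active"),
    ({7, 8, 9}, "inactive"),
    ({7, 8, 10}, "inactive"),
    ({1, 2, 5}, "active"),
]
predicted = knn_classify({1, 2, 6}, training, k=3)
```

An odd k avoids tied votes in two-class problems; any other distance function with the same two-argument signature can be passed in place of the assumed Tanimoto distance.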

In 1980, Carbo et al. were the first to express molecular similarity using the electron density [17]. They introduced a distance measure between two molecules A and B in the sense of a Euclidean distance in the following way ... [Pg.231]

A new definition of molecular similarity is presented, based upon the similarity of the corresponding molecular graphs. First, all of the subgraphs of the molecular graph are listed, and then various similarity indices are derived from the numbers of subgraphs. One of these compares favorably with the standard distance measures of sequence comparison. Measurement of similarity provides a new way to measure molecular complexity, as long as the most (or least) complex member of a set of molecules can be identified. [Pg.169]

As discussed in Subheading 1., the primary design criterion is often based on either similarity or diversity. Quantifying these measures requires that the compounds are represented by numerical descriptors that enable pairwise molecular similarities or distances to be calculated or that allow the definition of a multidimensional property space in which the molecules can be placed. [Pg.339]

The USR (Ultrafast Shape Recognition) Method. This method was reported by Ballester and Richards (53) for compound database search on the basis of molecular shape similarity. It was reportedly capable of screening billions of compounds for similar shapes on a single computer. The method is based on the notion that the relative position of the atoms in a molecule is completely determined by inter-atomic distances. Instead of using all inter-atomic distances, USR uses a subset of distances, reducing the computational costs. Specifically, the distances between all atoms of a molecule to each of four strategic points are calculated. Each set of distances forms a distribution, and the three moments (mean, variance, and skewness) of the four distributions are calculated. Thus, for each molecule, 12 USR descriptors are calculated. The inverse of the translated and scaled Manhattan distance between two shape descriptors is used to measure the similarity between the two molecules. A value of 1 corresponds to maximum similarity and a value of 0 corresponds to minimum similarity. [Pg.124]
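The descriptor and score described above can be sketched as follows. Several details are assumptions not stated in the excerpt: the four reference points are taken to be the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that last atom; skewness is computed as the standardized third central moment; and the Manhattan distance is scaled by the number of descriptors before inversion. The function names are hypothetical.

```python
import math


def _dist(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def _moments(values):
    """Mean, population variance, and skewness of a distance distribution."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return [mean, 0.0, 0.0]
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, var, third / var ** 1.5]


def usr_descriptor(coords):
    """Return 12 shape descriptors (3 moments x 4 reference points)."""
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))  # centroid
    d_ctd = [_dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [_dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([_dist(c, ref) for c in coords]))
    return descriptor


def usr_similarity(desc_a, desc_b):
    """Inverse of the translated (1 + ...) and scaled Manhattan distance."""
    manhattan = sum(abs(x - y) for x, y in zip(desc_a, desc_b))
    return 1.0 / (1.0 + manhattan / len(desc_a))
```

Because only inter-atomic distances enter the descriptor, no structural alignment is needed before comparing two molecules, which is what makes the comparison cheap.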

Distance-based methods require a definition of molecular similarity (or distance) in order to be able to select subsets of molecules that are maximally diverse with respect to each other or to select a subset that is representative of a larger chemical database. Ideally, to select a diverse subset of size k, all possible subsets of size k would be examined and a diversity measure of a subset (for example, average near neighbor similarity) could be used to select the most diverse subset. Unfortunately, this approach suffers from a combinatoric explosion in the number of subsets that must be examined and more computationally feasible approximations must be considered, a few of which are presented below. [Pg.81]
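As an illustration of a computationally feasible approximation, the sketch below uses a greedy MaxMin selection. This is one commonly used heuristic and is not necessarily among the methods the source goes on to present; the function name `maxmin_select`, the `seed_index` parameter, and the toy one-dimensional data are hypothetical.

```python
def maxmin_select(items, k, distance, seed_index=0):
    """Greedily pick up to k items, each maximizing its minimum distance
    to the items already selected. Returns indices into items."""
    if k <= 0 or not items:
        return []
    selected = [seed_index]
    # min_dist[i] is the distance from item i to its nearest selected item.
    min_dist = [distance(items[seed_index], x) for x in items]
    while len(selected) < min(k, len(items)):
        candidate = max(
            (i for i in range(len(items)) if i not in selected),
            key=lambda i: min_dist[i],
        )
        selected.append(candidate)
        # Update nearest-selected distances with the newly added item.
        for i, x in enumerate(items):
            d = distance(items[candidate], x)
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy example: four "molecules" described by a single numeric property.
values = [0.0, 1.0, 2.0, 10.0]
picked = maxmin_select(values, 2, lambda a, b: abs(a - b))
```

The greedy pass needs on the order of k times n distance evaluations instead of examining every subset of size k, at the cost of depending on the seed and giving no guarantee of the globally most diverse subset.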

However, these descriptor-based similarity definitions present only one class of available similarity and distance measures. Approaches to molecular... [Pg.126]

Analysis of molecular similarity is based on the quantitative determination of the overlap between fingerprints of the query structure and all database members. As descriptors of a given molecule can be considered as a vector of real or binary attributes, most of the similarity measures are derived as vectorial distances. Tanimoto and Cosine coefficients are the most popular measures of similarity. Definitions of similarity metrics are collected in Table 3. [Pg.4017]
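Table 3 is not reproduced in this excerpt. For binary fingerprints, the two coefficients named above are usually written in terms of the number of bits set in each fingerprint and the number of bits they share. The sketch below assumes fingerprints are given as sets of on-bit positions; the function names are hypothetical, and returning 0.0 for empty fingerprints is an arbitrary convention.

```python
import math


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: c / (a + b - c), with c the shared on-bits."""
    common = len(bits_a & bits_b)
    total = len(bits_a) + len(bits_b) - common
    return common / total if total else 0.0


def cosine(bits_a, bits_b):
    """Cosine coefficient for binary vectors: c / sqrt(a * b)."""
    if not bits_a or not bits_b:
        return 0.0
    return len(bits_a & bits_b) / math.sqrt(len(bits_a) * len(bits_b))


query = {1, 2, 3, 4}
database_member = {3, 4, 5, 6}
t = tanimoto(query, database_member)  # 2 shared bits out of 6 distinct bits
c = cosine(query, database_member)    # 2 / sqrt(4 * 4)
```

Both coefficients range from 0 (no shared bits) to 1 (identical fingerprints), so either can rank database members against a query.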

Most of the methods that have been introduced for the estimation of molecular similarity are based on substructure [53], topological [54], and graph theoretical approaches [55] (for an overview of similarity measures see, for example, Willett [56] or Johnson and Maggiora [57]). However, 3D distance measures have rarely been used for similarity purposes [58]. [Pg.194]

Molecular similarity. The degree of similarity between molecules, although quantitatively measurable, depends strongly on which molecular features are used to establish it. One of the many comparators is the electron density of a pair of molecules. Other comparators include electrostatic potentials, reactivity indices, hydrophobicity potentials, molecular geometry such as distances and angles between key atoms, solvent accessible surface area, etc. It is an open question how much, or which part(s), of the molecular structure should be compared. The Tanimoto coefficient, which compares the features two molecules share with the features present in either, is often used in molecular diversity analysis. [Pg.759]

Molecular Similarity and QSAR. - In a first contribution on the design of a practical, fast and reliable molecular similarity index, Popelier [107] proposed a measure operating in an abstract space spanned by properties evaluated at bond critical points (BCPs), called BCP space. Molecules are believed to be represented compactly and reliably in BCP space, as this space extracts the relevant information from the molecular ab initio wave functions. Typical problems of continuous quantum similarity measures are thereby avoided. The practical use of this novel method is illustrated via the Hammett equation for para- and meta-substituted benzoic acids. On the basis of the author's definition of distances between molecules in BCP space, the experimental sequence of acidities determined by the well-known σ constant of a set of substituted congeners is reproduced. Moreover, the approach points out where the common reactive centre of the molecules is. The generality and feasibility of this method will enable predictions in medically related Quantitative Structure Activity Relationships (QSAR). This contribution combines the historically disparate fields of molecular similarity and QSAR. [Pg.150]

We will take the paths of Table 1 as molecular descriptors to obtain a quantitative measure of molecular similarity. In Table 2 we show the similarity/dissimilarity table for the octane isomers using the Euclidean distance as the measure of similarity. The smaller entries in Table 2 indicate molecules found similar under the procedure adopted, while the larger entries point to the least similar structures. [Pg.177]

One key aspect of model applicability is the definition of the chemical space and the way in which chemical similarity is measured, as chemical similarity is a relative concept. The similarity or distance values depend on both the type of molecular representation and the distance measure used. Due to this lack of... [Pg.466]

The previous discussion subtly shifted between molecular similarity and molecular properties. It is important to elucidate the relationship between the two. If each of the molecular properties can be treated as a separate dimension in a Euclidean property space, and dissimilarity can be equated with distance between property vectors, similarity/diversity problems can be solved using analytical geometry. A set of vectors (chemical structures) in property space can be converted to a matrix of pairwise dissimilarities simply by applying the Pythagorean theorem. This operation is like measuring the distances between all pairs of cities from their coordinates on a map. [Pg.78]
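The conversion from property vectors to a matrix of pairwise dissimilarities can be sketched as follows. The function name `pairwise_distance_matrix` and the two-dimensional example vectors are hypothetical, chosen so the distances are easy to check by hand.

```python
import math


def pairwise_distance_matrix(vectors):
    """Return the symmetric matrix of Euclidean distances between all pairs
    of property vectors (the Pythagorean theorem in n dimensions)."""
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(
                sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            )
            matrix[i][j] = matrix[j][i] = d  # dissimilarity is symmetric
    return matrix


# Three "structures" placed in a two-property space, like cities on a map.
property_vectors = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dissimilarity = pairwise_distance_matrix(property_vectors)
```

In practice the properties usually need to be put on comparable scales first; otherwise the property with the largest numerical range dominates every distance.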

In 1980, Carbo, Arnau and Leyda were the first to use molecular quantum similarity. As an anecdote, in the submitted version of the manuscript, the title was "How far is one molecule from another?" After a reviewer's comment, this title was changed to "How similar is one molecule to another?" The revised title has a much more obvious reference to similarity. In a sense, both titles are descriptive, because that manuscript presented the first quantification of the degree of molecular similarity by means of a distance measure. More precisely, a distance measure was introduced as... [Pg.134]
