Distance measures molecular similarity/diversity

As discussed in Subheading 1., the primary design criterion is often based on either similarity or diversity. Quantifying these measures requires that the compounds are represented by numerical descriptors that enable pairwise molecular similarities or distances to be calculated or that allow the definition of a multidimensional property space in which the molecules can be placed. [Pg.339]

Distance-based methods require a definition of molecular similarity (or distance) in order to select subsets of molecules that are maximally diverse with respect to each other, or to select a subset that is representative of a larger chemical database. Ideally, to select a diverse subset of size k, all possible subsets of size k would be examined, and a diversity measure for each subset (for example, average nearest-neighbor similarity) would be used to select the most diverse one. Unfortunately, this approach suffers from a combinatorial explosion in the number of subsets that must be examined, so more computationally feasible approximations must be considered, a few of which are presented below. [Pg.81]
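To make the combinatorial problem concrete, the sketch below counts the subsets an exhaustive search would face and then applies one common approximation, greedy MaxMin selection. MaxMin is swapped in here for illustration and is not necessarily one of the methods the excerpt goes on to present. The function name `maxmin_select`, the `seed_index` parameter, and the toy 2-D coordinates are all assumptions.

```python
import math

def maxmin_select(points, k, seed_index=0):
    """Greedy MaxMin sketch: repeatedly add the point whose nearest
    already-selected neighbor is farthest away (Euclidean distance)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [seed_index]
    # min_dist[i] = distance from point i to its nearest selected point
    min_dist = [math.dist(p, points[seed_index]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected

# Exhaustive search: number of distinct 10-compound subsets of 1000 compounds.
n_subsets = math.comb(1000, 10)

# Hypothetical compounds placed in a 2-D descriptor space.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 5.0), (5.0, 0.1)]
picked = maxmin_select(points, k=3)
```

The greedy pass needs on the order of n·k distance evaluations instead of one diversity score per subset, at the cost of depending on the starting compound and giving no guarantee of the globally most diverse subset.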

Molecular similarity. The degree of similarity between molecules, although quantitatively measurable, depends very much on which molecular features are used to establish it. One of the many comparators is the electron density of a pair of molecules. Other comparators include electrostatic potentials, reactivity indices, hydrophobicity potentials, molecular geometry (such as distances and angles between key atoms), solvent-accessible surface area, etc. It is an open question how much, or which part(s), of the molecular structure should be compared. The Tanimoto coefficient, which compares the features shared by two molecules with the features present in either, is often used in molecular diversity analysis. [Pg.759]
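As a minimal sketch of the Tanimoto coefficient in its usual binary-fingerprint form: the number of features two molecules share, divided by the number of features present in either. The feature indices below are made up for illustration, and returning 1.0 for two empty fingerprints is an assumed convention.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient of two sets of 'on' fingerprint features."""
    a, b = set(features_a), set(features_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # assumed convention: two empty fingerprints are identical
    return len(a & b) / union

# Hypothetical fingerprints given as indices of the bits that are set.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}

similarity = tanimoto(fp_a, fp_b)   # 2 shared features / 5 distinct features
dissimilarity = 1.0 - similarity    # often used as the corresponding distance
```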

The previous discussion subtly shifted between molecular similarity and molecular properties. It is important to elucidate the relationship between the two. If each of the molecular properties can be treated as a separate dimension in a Euclidean property space, and dissimilarity can be equated with distance between property vectors, similarity/diversity problems can be solved using analytical geometry. A set of vectors (chemical structures) in property space can be converted to a matrix of pairwise dissimilarities simply by applying the Pythagorean theorem. This operation is like measuring the distances between all pairs of cities from their coordinates on a map. [Pg.78]
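The conversion from property vectors to pairwise dissimilarities can be sketched as follows. The three property vectors are invented for illustration, and `distance_matrix` is a hypothetical helper name.

```python
import math

def distance_matrix(vectors):
    """Symmetric matrix of Euclidean distances between property vectors."""
    n = len(vectors)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Pythagorean theorem generalized to any number of dimensions
            dist = math.sqrt(sum((x - y) ** 2
                                 for x, y in zip(vectors[i], vectors[j])))
            d[i][j] = d[j][i] = dist
    return d

# Hypothetical molecules described by two properties each.
props = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = distance_matrix(props)
# D[0][1] is 5.0, D[0][2] is 10.0, D[1][2] is 5.0
```

In practice the properties are usually scaled first, since a property with a large numeric range would otherwise dominate the distance.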

Förster's theory [1] has enabled the efficiency of EET to be predicted and analyzed. The significance of Förster's formulation is evinced by the numerous and diverse areas of study that have been impacted by his paper. This predictive theory was turned on its head by Stryer and Haugland [17], who showed that distances in the range of 2-50 nm between molecular tags in a protein could be measured by a spectroscopic ruler known as fluorescence resonance energy transfer (FRET). Similar kinds of experiments have been employed to analyze the structure and dynamics of interfaces in blends of polymers. [Pg.471]

The measurement of molecular diversity requires the definition of a chemical space. This N-dimensional chemical space is represented by a group of selected molecular descriptors. Each compound in a collection can be assigned coordinates based on the measurement of its descriptor values. Increasing distance, within the dimensions of the assigned coordinate space, should correlate with increasing diversity (or decreasing similarity) between compounds. [Pg.137]

The procedure consists of transforming the initial data matrix X, with n compounds and p molecular descriptors, into a similarity or diversity matrix, giving an n × n square symmetric matrix, after selecting the distance (similarity) measure and scaling the original variables appropriately. A regression model is then built using as molecular descriptors the columns d_j of the distance matrix (diversity descriptors), where the column elements d_ij represent the distances between the ith molecule and the jth molecule. Analogously, molecular descriptors can be defined as the columns s_j of the similarity matrix (similarity descriptors). [Pg.704]
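The transformation can be sketched in a few lines, under several assumptions that the excerpt leaves open: autoscaling (zero mean, unit variance) as the scaling step, Euclidean distance as the measure, and s_ij = 1 / (1 + d_ij) as the conversion from distance to similarity. The data matrix is invented, and the regression step itself is not shown.

```python
import math
import statistics

def autoscale(X):
    """Scale each descriptor column to zero mean and unit variance."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    # guard against constant columns (standard deviation of zero)
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def distance_matrix(rows):
    """n x n symmetric matrix of Euclidean distances between rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]

# Hypothetical data matrix X: n = 4 compounds, p = 2 descriptors.
X = [[1.0, 200.0], [2.0, 180.0], [3.0, 260.0], [4.0, 240.0]]

Z = autoscale(X)
D = distance_matrix(Z)                              # diversity matrix
S = [[1.0 / (1.0 + d) for d in row] for row in D]   # assumed similarity transform

# Column j of D is the diversity descriptor d_j; column j of S is s_j.
diversity_descriptors = [list(col) for col in zip(*D)]
similarity_descriptors = [list(col) for col in zip(*S)]
```

Each compound ends up described by its n distances (or similarities) to the compounds in the set, so the number of descriptors equals the number of compounds rather than p.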
