Distance-Based Metrics

Clustering techniques are mostly based on the concept of similarity, expressed through the definition of a metric (a rule for calculating distances) in... [Pg.153]

None of the cosine- or correlation-like similarity indices, nor their complements (see Subheading 2.5.), are true metrics; that is, they do not obey all of the distance axioms. Petitjean (68,69) has developed a distance-based methodology, but it has not been widely applied. [Pg.32]
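A quick numerical check makes the point concrete for the complement 1 - cos θ of the cosine index. The three two-dimensional vectors below are arbitrary illustrative choices, not data from the source. A single violation of the triangle inequality is enough to show that the complement is not a metric.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def cosine_complement(u, v):
    """1 - cosine similarity, often used as if it were a distance."""
    return 1.0 - cosine_similarity(u, v)

a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)
d_ab = cosine_complement(a, b)  # about 0.293
d_bc = cosine_complement(b, c)  # about 0.293
d_ac = cosine_complement(a, c)  # 1.0

# A metric would require d_ac <= d_ab + d_bc; here 1.0 > about 0.586
print(d_ac <= d_ab + d_bc)  # False
```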

A diversity metric is a function that quantifies the diversity of a set of compounds in some predefined chemical space. Diversity metrics fall into three main classes: (1) distance-based methods, which express diversity as a function of the pairwise molecular dissimilarities defined through measurement; (2) cell-based methods, which define diversity in terms of the occupancy of a finite number of cells that represent disjoint regions of chemical space; and (3) variance-based methods, which quantify diversity based on the degree of correlation between a compound's important features. [Pg.138]
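The cell-based class can be sketched with a simple grid. This minimal example makes three assumptions: descriptors are already scaled to [0, 1], every axis gets the same number of bins, and the fraction of occupied cells serves as the diversity score. The function names and the toy coordinates are hypothetical.

```python
def occupied_cells(points, bins):
    """Map each point (descriptors scaled to [0, 1]) to its grid cell; return the set of cells."""
    cells = set()
    for p in points:
        # Clamp x == 1.0 into the last bin so every point lands in a valid cell
        cells.add(tuple(min(int(x * bins), bins - 1) for x in p))
    return cells

def cell_coverage(points, bins):
    """Fraction of the bins**dim cells occupied by at least one compound."""
    dim = len(points[0])
    return len(occupied_cells(points, bins)) / bins ** dim

# Two near-duplicate compounds share a cell, so they add no extra diversity
compounds = [(0.05, 0.10), (0.07, 0.12), (0.55, 0.40), (0.95, 0.99)]
print(len(occupied_cells(compounds, bins=4)))  # 3
print(cell_coverage(compounds, bins=4))        # 0.1875
```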

Distance-based metrics quantify the diversity of a set of compounds as a function of their pairwise (dis)similarities in a descriptor space. Distance coefficients are analogous to distances in multidimensional geometric space, although they are usually not equivalent to such distances. For a distance coefficient to be described as a metric, it must possess the following four properties: (1) distance values must be nonnegative, and the distance from an object to itself must be zero; (2) distance values must be symmetric; (3) distance values must obey the triangle inequality; and (4) distances between nonidentical objects must be greater than zero. A coefficient that has only the first three properties is called a pseudometric, and one that lacks the third property is a nonmetric. [Pg.138]
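The four properties can be turned into a small checker for a precomputed distance matrix. This sketch assumes that every row and column is a distinct (nonidentical) object and uses a floating-point tolerance. It also adds a hypothetical "not a distance" label for matrices that fail properties (1) or (2), a case the text does not name.

```python
from itertools import product

def classify_distance_matrix(D, tol=1e-12):
    """Label a square distance matrix as metric, pseudometric or nonmetric.

    Assumes every row/column index refers to a distinct object.
    """
    idx = range(len(D))
    # (1) nonnegative values, zero self-distance
    prop1 = all(D[i][j] >= -tol for i, j in product(idx, idx)) and all(
        abs(D[i][i]) <= tol for i in idx)
    # (2) symmetry
    prop2 = all(abs(D[i][j] - D[j][i]) <= tol for i, j in product(idx, idx))
    # (3) triangle inequality for every triple
    prop3 = all(D[i][k] <= D[i][j] + D[j][k] + tol for i, j, k in product(idx, idx, idx))
    # (4) distinct objects are strictly separated
    prop4 = all(D[i][j] > tol for i, j in product(idx, idx) if i != j)
    if not (prop1 and prop2):
        return "not a distance"  # hypothetical label, not from the text
    if not prop3:
        return "nonmetric"
    return "metric" if prop4 else "pseudometric"

line = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]                        # points 0, 1, 3 on a line
pseudo = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]                      # two distinct objects at distance 0
cosine_like = [[0, 0.29, 1], [0.29, 0, 0.29], [1, 0.29, 0]]     # 1 - cosine values
print(classify_distance_matrix(line), classify_distance_matrix(pseudo),
      classify_distance_matrix(cosine_like))
```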

The descriptors used for pairwise distance measurements can be continuous, as with a physicochemical property, or binary (e.g., the presence or absence of a specific substructure). For continuous chemical spaces, nearly all metrics are based on the generalized Minkowski metric,

$$d_{ij} = \left( \sum_{f=1}^{k} \left| x_{if} - x_{jf} \right|^{r} \right)^{1/r} \qquad (1)$$

where $x_{if}$ represents the $f$th feature of the $i$th molecule, $k$ is the total number of features, and $r$ is the order of the metric. [Pg.138]
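Equation (1) translates directly into code. In this minimal sketch, which is not taken from the source, r = 1 gives the city-block (Manhattan) distance and r = 2 gives the Euclidean distance. Orders below 1 are rejected because the resulting function no longer satisfies the triangle inequality.

```python
def minkowski(x, y, r):
    """Generalized Minkowski distance of order r >= 1 between two feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same number of features k")
    if r < 1:
        raise ValueError("order r must be >= 1 for the result to be a metric")
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

x = (1.0, 2.0, 3.0)
y = (4.0, 6.0, 3.0)
print(minkowski(x, y, 1))  # 7.0  (city-block)
print(minkowski(x, y, 2))  # 5.0  (Euclidean)
```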

After a distance function is defined, the diversity of a compound collection $C$ can be measured in a number of ways. Two common distance-based diversity measures are the minimum intermolecular dissimilarity,

$$D(C) = \min_{i \neq j} d_{ij} \qquad (9)$$

where $d_{ij}$ is the distance between the $i$th and $j$th compounds in the collection $C$, and the average nearest-neighbor distance,

$$D(C) = \frac{1}{N} \sum_{i=1}^{N} \min_{j \neq i} d_{ij} \qquad (10)$$

where $N$ is the number of compounds in $C$. Figure 1 illustrates examples of compound subsets using a nearest-neighbor design metric. [Pg.140]
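Both measures are straightforward to compute once a distance function has been chosen. The sketch below assumes Euclidean distances between descriptor vectors (any metric could be passed in instead) and uses a brute-force O(N²) loop, which is adequate for small collections. The three-compound collection is a made-up example.

```python
import math
from itertools import combinations

def min_pairwise_distance(collection, dist=math.dist):
    """Minimum intermolecular dissimilarity: the smallest d_ij over all pairs i != j."""
    return min(dist(a, b) for a, b in combinations(collection, 2))

def mean_nearest_neighbor_distance(collection, dist=math.dist):
    """Average, over all compounds, of the distance to each compound's nearest neighbor."""
    n = len(collection)
    total = 0.0
    for i in range(n):
        total += min(dist(collection[i], collection[j]) for j in range(n) if j != i)
    return total / n

C = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]  # pairwise distances 5, 3 and 4
print(min_pairwise_distance(C))           # 3.0
print(mean_nearest_neighbor_distance(C))  # (3 + 4 + 3) / 3
```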

For all fuzzy sets, including three-dimensional functions of electron density-like continua provided with suitable membership functions, the differences between the corresponding fuzzy sets can be expressed by a metric based on a generalization of the Hausdorff distance. The basic idea is to take the ordinary Hausdorff distances h(α) for the α-cuts of the fuzzy sets for all relevant α values and scale each Hausdorff distance h(α) according to its α value; the supremum of this family of scaled Hausdorff distances then determines the fuzzy metric distance f(A, B) between the fuzzy sets A and B. If, in addition, the relative positions of the fuzzy sets A and B are allowed to change, then the infimum of the f(A, B) values obtained for the various positionings determines a fuzzy metric of the dissimilarities of the intrinsic shapes of the two fuzzy sets. [Pg.145]
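A discrete sketch of the idea follows, under several assumptions. Fuzzy sets are finite point sets with membership values in (0, 1] rather than continuous density functions. "Scaling according to α" is taken to mean multiplying h(α) by α, since the text does not give the scaling rule. Only α levels at which both α-cuts are non-empty are considered. The positional infimum used for shape comparison is not implemented. All names are hypothetical.

```python
import math

def hausdorff(P, Q):
    """Ordinary Hausdorff distance between two non-empty finite point sets."""
    def directed(X, Y):
        return max(min(math.dist(x, y) for y in Y) for x in X)
    return max(directed(P, Q), directed(Q, P))

def alpha_cut(fuzzy, alpha):
    """Points whose membership value is at least alpha."""
    return [p for p, mu in fuzzy.items() if mu >= alpha]

def fuzzy_hausdorff(A, B):
    """sup over alpha of alpha * h(alpha); the alpha * h scaling is an assumption.

    alpha-cuts only change at membership values, and alpha * h(alpha) grows within
    each constant stretch, so checking the membership levels is enough.
    """
    top = min(max(A.values()), max(B.values()))  # both cuts are non-empty up to here
    levels = sorted({mu for mu in list(A.values()) + list(B.values()) if mu <= top})
    return max(alpha * hausdorff(alpha_cut(A, alpha), alpha_cut(B, alpha))
               for alpha in levels)

A = {(0.0, 0.0): 1.0, (1.0, 0.0): 0.5}
B = {(0.0, 0.0): 1.0, (3.0, 0.0): 0.5}
print(fuzzy_hausdorff(A, B))  # 0.5 * h(0.5) = 0.5 * 2.0 = 1.0
```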

As measuring techniques became more precise and the demand for accuracy increased, the standards on which people based their units were improved. In the 18th century, the French invented the metric system, based on a more consistent, systematic, and carefully defined set of standards than had ever been used before. For example, the meter (or metre, from the Greek metron, "a measure") became the standard for length. The meter was first defined as one ten-millionth of the distance from the North Pole to the Equator. This definition became outdated as the precision of scientists' measuring instruments improved. Today, a meter is defined as the distance light travels in a vacuum in 1/299,792,458 of a second. Technical instruments for measuring length are calibrated in accordance with this very accurate definition. [Pg.10]

CONFORT performs an exhaustive conformational analysis of a molecule [71]. It offers two search modes: one generates a user-defined number of conformations, and the other outputs a maximally diverse set of conformations; the latter mode was used in this study. Its diversity metric is based on interconformational distances, which prevents the generation of duplicate structures. The conformations are relaxed and optimized using internal coordinates only, analytic gradients, and the Tripos force field. [Pg.207]
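The source does not describe CONFORT's selection algorithm, so the following is only a generic illustration of how a maximally diverse subset can be picked from interconformational distances. It uses a greedy MaxMin scheme with a hypothetical `duplicate_tol` threshold, below which a candidate counts as a duplicate. The one-dimensional toy positions stand in for a real interconformational distance such as RMSD.

```python
def maxmin_select(D, n_pick, duplicate_tol=0.0, start=0):
    """Greedy MaxMin picking on a symmetric distance matrix D.

    Repeatedly adds the candidate whose distance to its nearest already-selected
    conformation is largest; stops early if every remaining candidate lies within
    duplicate_tol of the selection (i.e. would be a duplicate).
    """
    n = len(D)
    selected = [start]
    nearest = list(D[start])  # nearest[i]: distance from i to its closest selected item
    while len(selected) < min(n_pick, n):
        best = max((i for i in range(n) if i not in selected),
                   key=lambda i: nearest[i], default=None)
        if best is None or nearest[best] <= duplicate_tol:
            break
        selected.append(best)
        nearest = [min(nearest[i], D[best][i]) for i in range(n)]
    return selected

positions = [0.0, 1.0, 5.0, 5.0]  # conformations 2 and 3 are duplicates
D = [[abs(a - b) for b in positions] for a in positions]
print(maxmin_select(D, 4, duplicate_tol=0.1))  # [0, 2, 1]
```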

Although any metric can be used for the inner distance on the complex plane, the Euclidean distance based on the 2-norm is usually taken as the distance between two complex numbers; see Equation (8) and Figure 1. The variable s denotes a frequency scale in general and stands for all possible frequency scales; see Section 3.4. [Pg.4]
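For two complex numbers, the Euclidean 2-norm distance is the modulus of their difference, that is, the square root of the summed squares of the real and imaginary differences. The small Python sketch below computes it explicitly and compares the result with the built-in `abs`. The function name and sample values are illustrative only.

```python
import math

def complex_distance(z1: complex, z2: complex) -> float:
    """Euclidean (2-norm) distance between two points of the complex plane."""
    diff = z1 - z2
    return math.hypot(diff.real, diff.imag)

z1, z2 = complex(1, 2), complex(4, 6)
print(complex_distance(z1, z2))  # 5.0
print(abs(z1 - z2))              # 5.0, the built-in modulus gives the same value
```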

Marengo, E. and Todeschini, R. (1992) A new algorithm for optimal distance-based experimental design. Chemometrics Intell. Lab. Syst. 16, 37-44. [Pg.305]

In practice, even approximate distances are not known for most atom pairs; rather, one can set upper and lower bounds on acceptable distances, based on the covalent structure of the protein and on the observed NOE cross peaks. Particular instances can then be generated by choosing distances (often randomly) between the upper and lower bounds and embedding the resulting metric matrix. [Pg.1873]
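The following minimal NumPy sketch illustrates the two operations just described under stated assumptions. Each distance is drawn uniformly between its bounds, since the text says only "often randomly". The metric matrix is formed by double-centering the squared distances, which places the origin at the centroid and is equivalent to classical multidimensional scaling. It is then diagonalized and truncated to the three largest eigenvalues. The helper names and toy bounds are hypothetical.

```python
import numpy as np

def random_distances(lower, upper, rng):
    """Draw each d_ij uniformly between its bounds; return a symmetric matrix with zero diagonal."""
    D = rng.uniform(np.asarray(lower, dtype=float), np.asarray(upper, dtype=float))
    D = np.triu(D, k=1)  # keep one draw per pair (i < j)
    return D + D.T

def embed(D, dim=3):
    """Embed a distance matrix in `dim` Cartesian dimensions via the metric matrix."""
    D2 = np.asarray(D, dtype=float) ** 2
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix: origin at the centroid
    G = -0.5 * J @ D2 @ J                   # metric matrix, G_ij = r_i . r_j
    evals, evecs = np.linalg.eigh(G)        # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:dim]     # indices of the `dim` largest eigenvalues
    lam = np.clip(evals[top], 0.0, None)    # negative eigenvalues signal non-Euclidean input
    return evecs[:, top] * np.sqrt(lam)

def pairwise(X):
    """Euclidean distance matrix of the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Toy bounds: +/- 0.1 around the distances of a known four-atom arrangement
ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
d_ref = pairwise(ref)
lower, upper = np.clip(d_ref - 0.1, 0.0, None), d_ref + 0.1
X = embed(random_distances(lower, upper, np.random.default_rng(0)))
print(np.round(pairwise(X), 2))  # close to, but not exactly, d_ref
```

With randomly chosen distances, the matrix is generally not exactly Euclidean, so the embedded structure only approximates the sampled distances. Practical programs refine such coordinates further.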

In the basic metric matrix implementation of the distance constraint technique [16], one starts by generating a distance bounds matrix. This is an N × N square matrix (N being the number of atoms) in which the upper bounds occupy the upper triangle and the lower bounds occupy the lower triangle. The matrix is filled with information based on the bond structure, experimental data, or a hypothesis. After the distance bounds matrix has been smoothed, a new distance matrix is generated by randomly selecting distances between the bounds. This distance matrix is then converted into a metric matrix, which is diagonalized to give a 3D conformation. A new distance matrix... [Pg.75]
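Bound smoothing can be sketched using the triangle inequality alone. This minimal version stores the lower and upper bounds in two separate symmetric matrices rather than in the two halves of a single matrix. It repeats Floyd-Warshall-style passes until no bound changes. Lower bounds from van der Waals radii and tighter smoothing schemes are not modelled, and the three-atom bounds are made-up values.

```python
def triangle_smooth(L, U):
    """Tighten lower (L) and upper (U) distance bounds using the triangle inequality.

    L and U are symmetric N x N lists of lists; they are updated in place and returned.
    Raises ValueError if any lower bound ends up above its upper bound.
    """
    n = len(U)
    changed = True
    while changed:
        changed = False
        for k in range(n):
            for i in range(n):
                for j in range(i + 1, n):
                    # Upper bound: d_ij <= d_ik + d_kj <= U_ik + U_kj
                    via_k = U[i][k] + U[k][j]
                    if U[i][j] > via_k:
                        U[i][j] = U[j][i] = via_k
                        changed = True
                    # Lower bound: d_ij >= d_ik - d_kj and d_ij >= d_jk - d_ki
                    floor = max(L[i][k] - U[k][j], L[j][k] - U[k][i])
                    if L[i][j] < floor:
                        L[i][j] = L[j][i] = floor
                        changed = True
                    if L[i][j] > U[i][j]:
                        raise ValueError(f"inconsistent bounds for atoms {i} and {j}")
    return L, U

# 0-1: bonded (1.0); 0-2: NOE-like range 5.0 to 6.0; 1-2: unknown (0 to 100)
U = [[0.0, 1.0, 6.0], [1.0, 0.0, 100.0], [6.0, 100.0, 0.0]]
L = [[0.0, 1.0, 5.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
triangle_smooth(L, U)
print(L[1][2], U[1][2])  # 4.0 7.0
```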

The procedure of DG calculations can be subdivided into three separate steps [119-121]. First, holonomic matrices (see below for explanation) containing upper and lower limits on the pairwise distances are generated from the topology of the molecule of interest. These limits can be further restrained by NOE-derived distance information obtained from NMR experiments. In the second step, random distances within the upper and lower limits are selected and stored in a metric matrix; this operation is called metrization. Finally, all distances are converted into a geometry by mathematical operations: the matrix-based distance space is projected into a Cartesian coordinate space (embedding). [Pg.237]
