Distance-based method

Distance-based methods possess a superior discriminating power and allow highly similar compounds (e.g. substances with different particle sizes or purity grades, products from different manufacturers) to be distinguished. One other choice for classification purposes is the residual variance, which is a variant of soft independent modeling of class analogy (SIMCA). [Pg.471]

To classify a new sample, k-NN computes its distances (usually the multivariate Euclidean distances, see Eq. 7) from each of the samples of a training set whose class membership is known. The k nearest samples are then taken into consideration to perform the classification; generally, a majority vote is employed, meaning that the new object is assigned to the class most represented among the k selected objects. Being a distance-based method, it is sensitive to the measurement units and to the scaling procedures applied. [Pg.85]
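
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the cited text: the function name `knn_classify`, the training data, and the choice k = 3 are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(training, new_sample, k=3):
    """Classify new_sample by majority vote among its k nearest neighbours.

    training is a list of (feature_vector, class_label) pairs.
    """
    # Sort the training samples by Euclidean distance to the new sample.
    neighbours = sorted(training, key=lambda item: math.dist(item[0], new_sample))[:k]
    # Count the class labels of the k nearest samples and return the most frequent.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training set with two variables and two classes.
training = [
    ((1.0, 10.0), "A"),
    ((1.2, 11.0), "A"),
    ((0.9, 9.5), "A"),
    ((3.0, 30.0), "B"),
    ((3.2, 29.0), "B"),
    ((2.9, 31.0), "B"),
]

prediction = knn_classify(training, (1.1, 10.5), k=3)
```

Because the second variable spans a much larger numerical range than the first, it dominates the Euclidean distance here. This is the sensitivity to units and scaling mentioned above, and it is why the variables are normally scaled before the distances are computed.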

Distance-based methods require a definition of molecular similarity (or distance) in order to be able to select subsets of molecules that are maximally diverse with respect to each other or to select a subset that is representative of a larger chemical database. Ideally, to select a diverse subset of size k, all possible subsets of size k would be examined and a diversity measure of a subset (for example, average near neighbor similarity) could be used to select the most diverse subset. Unfortunately, this approach suffers from a combinatoric explosion in the number of subsets that must be examined and more computationally feasible approximations must be considered, a few of which are presented below. [Pg.81]
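
The exhaustive approach and its cost can be made concrete with a small sketch. The assumptions here are that molecules are represented as numeric descriptor vectors, that Euclidean distance stands in for dissimilarity, and that the diversity score is the mean nearest-neighbour distance (the distance counterpart of the average near neighbor similarity mentioned above). The function names and data are hypothetical.

```python
import math
from itertools import combinations


def mean_nearest_neighbour_distance(subset):
    """Average, over the subset, of each point's distance to its closest other point."""
    total = 0.0
    for i, p in enumerate(subset):
        total += min(math.dist(p, q) for j, q in enumerate(subset) if j != i)
    return total / len(subset)


def most_diverse_subset(points, k):
    """Examine every subset of size k and keep the one with the highest score."""
    return max(combinations(points, k), key=mean_nearest_neighbour_distance)


# Hypothetical one-descriptor "database" of four compounds.
points = [(0.0,), (0.1,), (5.0,), (10.0,)]
best = most_diverse_subset(points, 3)

# Number of subsets the exhaustive search has to score.
n_subsets_small = math.comb(len(points), 3)
n_subsets_large = math.comb(1000, 10)
```

For four compounds only 4 subsets are scored, but choosing 10 compounds out of 1000 already gives more than 10^23 candidate subsets, which is why approximate selection schemes are used in practice.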

There are two main strategies developed to select diverse and representative subsets of molecules, namely, cell-based methods and distance-based methods. [Pg.83]

An alternative two-part classification has been proposed by Pearlman et al. [90], who characterise methods as either cell-based or distance-based, these classes corresponding to partition-based methods and to all the other types of method, respectively. As Pearlman et al. note, distance-based methods can be used with any type of structural representation but are most effective when the need is to identify subsets (of whatever sort) cell-based... [Pg.134]

A diversity metric is a function to aid the quantification of the diversity of a set of compounds in some predefined chemical space. Diversity metrics fall into three main classes: (1) Distance-based methods, which express diversity as a function of the pairwise molecular dissimilarities defined through measurement. (2) Cell-based methods, which define diversity in terms of occupancy of a finite number of cells that represent disjoint regions of chemical space. (3) Variance-based methods, which quantify diversity based on the degree of correlation between a compound's important features. [Pg.138]
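
The first two classes can be contrasted with a short sketch. It assumes a two-dimensional chemical space scaled to the unit square, uses the mean pairwise Euclidean distance as the distance-based metric, and uses the fraction of occupied grid cells as the cell-based metric. These particular choices, the function names, and the data are illustrative assumptions, not definitions from the cited text.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Distance-based diversity: average dissimilarity over all pairs of compounds."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def cell_occupancy(points, bins_per_axis, low=0.0, high=1.0):
    """Cell-based diversity: fraction of grid cells containing at least one compound."""
    width = (high - low) / bins_per_axis
    occupied = set()
    for p in points:
        # Map each coordinate to a bin index; the upper edge falls in the last bin.
        cell = tuple(min(int((x - low) / width), bins_per_axis - 1) for x in p)
        occupied.add(cell)
    return len(occupied) / bins_per_axis ** len(points[0])


# Two hypothetical compound sets in a 2D descriptor space.
clustered = [(0.10, 0.10), (0.12, 0.11), (0.15, 0.13), (0.11, 0.14)]
spread = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.9, 0.9)]

occupancy_clustered = cell_occupancy(clustered, bins_per_axis=2)
occupancy_spread = cell_occupancy(spread, bins_per_axis=2)
```

Both metrics rank the spread set as more diverse than the clustered one. The cell-based value depends on the chosen grid, whereas the distance-based value depends on the chosen dissimilarity measure.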

Different approaches to estimate interpolation regions in a multivariate space were evaluated by Jaworska [Jaworska, Nikolova-Jeliazkova et al, 2005], based on (a) ranges of the descriptor space; (b) distance-based methods, using Euclidean, Manhattan, and Mahalanobis distances, the Hotelling T² method, and leverage values; and (c) probability density distribution methods based on parametric and nonparametric approaches. Both ranges and distance-based methods were also evaluated in the principal component space by Principal Component Analysis. [Pg.18]
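
As a rough illustration of approaches (a) and (b), the sketch below checks a query compound against the descriptor ranges of a training set and computes its Euclidean, Manhattan, and Mahalanobis distances from the training centroid. It is restricted to two descriptors so that the covariance matrix can be inverted in closed form. The function names and data are hypothetical, and the Hotelling T² and leverage measures are not covered.

```python
import math


def centroid(X):
    """Mean of each descriptor over the training set."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def in_ranges(X, x):
    """Approach (a): is every descriptor of x inside the training min-max range?"""
    return all(
        min(r[j] for r in X) <= x[j] <= max(r[j] for r in X) for j in range(len(x))
    )


def euclidean(c, x):
    return math.dist(c, x)


def manhattan(c, x):
    return sum(abs(a - b) for a, b in zip(c, x))


def mahalanobis_2d(X, x):
    """Mahalanobis distance of x from the training centroid (two descriptors only)."""
    c = centroid(X)
    n = len(X)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((r[0] - c[0]) ** 2 for r in X) / (n - 1)
    syy = sum((r[1] - c[1]) ** 2 for r in X) / (n - 1)
    sxy = sum((r[0] - c[0]) * (r[1] - c[1]) for r in X) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - c[0], x[1] - c[1]
    # Quadratic form d^T S^-1 d using the closed-form 2x2 inverse.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical training set with two strongly correlated descriptors.
X = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 4.8)]
along = (4.0, 4.0)   # follows the correlation of the training data
across = (4.0, 2.0)  # same distance from the centroid, but off the trend

c = centroid(X)
```

Both query points lie inside the descriptor ranges and have equal Euclidean and Manhattan distances from the centroid, yet the Mahalanobis distance of `across` is far larger because it accounts for the correlation between the descriptors.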

Cell-based methods, as well as clustering or distance-based methods, aim at extracting representative structurally diverse subsets of compounds from large chemical databases [Cummins, Andrews et al, 1996; Mason and Pickett, 1997; Pearlman and Smith, 1999; Earnum, Desjarlais et al, 2003]. They are mainly used in the design and optimization of combinatorial libraries, the most important aspect here being to ensure maximum diversity within and between libraries before they are produced. Moreover, cell-based methods are used for lead discovery purposes, allowing the selection of the compounds most similar to the active reference target. [Pg.84]

The most widely used models of amino acid substitution include distance-based methods, which are based on matrices such as PAM and BLOSUM. Again, such matrices are described further in other chapters in this book. Briefly, Dayhoff's PAM 001 matrix (Dayhoff, 1979) is an empirical model that scales probabilities of change from one amino acid to another in terms of an expected 1% change between two amino acid sequences. This matrix is used to make a transition probability matrix that allows prediction of the probability of changing from one amino acid to another and also predicts equilibrium amino acid composition. Phylogenetic distances are calculated with the assumption that the probabilities in the matrix are correct. The... [Pg.338]

Phylogenetic inference methods can be broken into two categories, those that create trees based on genetic distances among taxa and those that create trees based on presence of shared character states (Felsenstein 1988, Swofford and Olsen 1990). Distance-based methods could be subdivided into... [Pg.50]

Sune, V., Carrasco, X. (2001). A failure-distance based method to bound the reliability of non-repairable fault-tolerant systems without the knowledge of minimal cuts. IEEE Trans. on Rel. 50(1), 60-74. [Pg.177]

Just as field-based methods in 3D similarity extend and complement the atom/distance-based methods, there are a variety of field-based 3D QSAR methods which complement pharmacophore searching. The most common of these is the CoMFA (comparative molecular field analysis) technique (and related methods). These methods have been applied to... [Pg.2997]

Two-dimensional maps of the surface of interest can be constructed by moving the SECCM probe meniscus laterally over the surface, following the topography and collecting the spatially resolved response. The probe is first approached toward the surface until the liquid meniscus makes contact with it, as described earlier. The probe is then moved laterally across the surface and a feedback loop is used to maintain a user-defined oscillating ion current magnitude, which corresponds to a constant probe-surface distance. Two different methods to maintain a constant set point have been reported: a distance-based method and a time-based method. In the distance-based method, the height of the probe is adjusted based on the lateral distance the probe has moved... [Pg.663]

Big Chemical Encyclopedia
