Clustering distance-based

Key Words: Biological activity; cell-based partitioning; chemical descriptors; classification; clustering; distance-based design; diversity selection; high-throughput screening; quantitative structure-activity relationship. [Pg.301]

Various partitions result from the different combinations of clustering parameters. The number of classes is estimated and the optimum clustering is selected using a separability criterion, such as the ratio of the minimum between-cluster distance to the maximum of the average within-class distances. In that case, the higher the criterion value, the more separable the clustering. By plotting the criterion value against the number of classes and/or the algorithm parameters, the partition that maximises the criterion value is identified and the number of classes is estimated. [Pg.40]
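The ratio criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say how the between-cluster and within-class distances are measured, so the sketch takes the between-cluster distance as the smallest point-to-point distance across two clusters and the within-class distance as the mean pairwise distance inside a cluster. The function name `separability` and the toy data are hypothetical.

```python
import math
from itertools import combinations


def separability(points, labels):
    """Ratio of the minimum between-cluster distance to the maximum
    average within-cluster distance (higher = more separable)."""
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Assumption: between-cluster distance = closest pair of points
    # drawn from two different clusters.
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in clusters[a]
        for q in clusters[b]
    )

    # Assumption: within-cluster distance = mean pairwise distance;
    # a single-member cluster contributes 0.
    def average_within(members):
        pairs = list(combinations(members, 2))
        if not pairs:
            return 0.0
        return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

    max_within = max(average_within(m) for m in clusters.values())
    return math.inf if max_within == 0 else min_between / max_within


# Two candidate partitions of the same four points.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = separability(points, [0, 0, 1, 1])  # groups the nearby points
poor = separability(points, [0, 1, 0, 1])  # splits the nearby points
```

Evaluating `separability` for each candidate partition and keeping the one with the largest value mirrors the selection step in the text; plotting the value against the number of classes would be the natural next step.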

Figure 13.25 (a) Structure of the anion As7^3-, isoelectronic with As4Se3 (p. 581). The sequence of As-As distances (base > cap > side) is typical for such cluster anions, but this alters to the sequence base > side > cap for neutral species such as As7(SiMe3)3 shown in (b). [Pg.589]

Another useful method for sample selection is cluster analysis-based selection.3 4,67 In this method, it is typical to start with a compressed PCA representation of the calibration data. An unsupervised cluster analysis (Section 8.6.3.1) is then performed, where the algorithm is terminated after a specific number of clusters has been determined. Then, a single sample is selected from each of the clusters as its representative in the final calibration data set. This cluster-wise selection is often done on the basis of the maximum distance from the overall data mean, but it can also be done using each of the cluster means instead. [Pg.313]
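The final selection step can be sketched as follows. The sketch assumes the PCA scores and the cluster labels are already available from earlier steps (neither the PCA nor the clustering is reproduced here), and the name `select_representatives` and its `reference` parameter are hypothetical. For the cluster-mean variant, the sketch assumes the sample farthest from its own cluster mean is taken; the source does not spell this out.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length score vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def select_representatives(scores, labels, reference="overall"):
    """Pick one sample index per cluster: the member farthest from the
    overall data mean (default) or from its own cluster mean."""
    overall = mean_vector(scores)

    # Collect the sample indices belonging to each cluster.
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)

    chosen = {}
    for label, members in groups.items():
        if reference == "overall":
            ref = overall
        else:
            ref = mean_vector([scores[i] for i in members])
        chosen[label] = max(members, key=lambda i: math.dist(scores[i], ref))
    return chosen


# Toy one-dimensional "scores" with two pre-assigned clusters.
scores = [(0.0,), (1.0,), (10.0,), (12.0,), (13.0,)]
labels = [0, 0, 1, 1, 1]
by_overall_mean = select_representatives(scores, labels)
by_cluster_mean = select_representatives(scores, labels, reference="cluster")
```

Selecting by distance from the overall mean favours samples at the edges of the data, which tends to widen the span of the calibration set.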

Further details of agglomerative and several other clustering strategies may be found in the book by MASSART and KAUFMAN [1983] or, along with remarks on the treatment of situations with missing values, in the monograph by MUCHA [1992]. Finally, it may be of interest that OZAWA [1983] even proposed a hierarchical cluster algorithm based on an asymmetric distance matrix. [Pg.159]

The chemistry of reduced Nb and Ta halides is rich in clusters with various structures. The metal atoms assemble, with metal-metal distances close to those in the metal, into triangular and tetranuclear clusters, but the dominant structural motif is that of the octahedral M6X12 and Nb6I8 types. Binary, ternary, and quaternary compounds all crystallize in that type. The M6 clusters are characteristic of the chemistry of the lower oxidation states of Nb and Ta, although not restricted to them. These electron-deficient clusters are based on metal ions with average oxidation numbers between III and I. [Pg.2948]

Analysis of diversity in response to polar and nonpolar vapors of all screened polymers was performed using PCA followed by cluster analysis. The scores plots of the first three PCs (Fig. 5.12) illustrate the diversity of performance of all sensing polymers. The larger the distance between polymer data points, the larger the difference in the response pattern between the respective CdSe/polymer nanocomposites. To quantify this diversity, cluster analysis was further performed where the distances based on principal component scores were adjusted to unit variance.45 This distance measure, known as the Mahalanobis distance, accounts for the different amounts of variation in different directions. An example of such a difference is shown in Fig. 5.12a, b, where the distance between polymer 4 and the other polymers is much larger on the plot of PC 1 vs. PC 2 than on the plot of PC 2 vs. PC 3. [Pg.126]
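The unit-variance adjustment can be illustrated with a short sketch. Because principal component scores are mutually uncorrelated, dividing each score column by its standard deviation before taking the Euclidean distance gives the Mahalanobis distance in score space. The function name `scaled_score_distance` and the toy scores are hypothetical and are not data from the study.

```python
import math
import statistics


def scaled_score_distance(scores, i, j):
    """Distance between samples i and j after each PC score column has
    been scaled to unit variance (sample standard deviation)."""
    stds = [statistics.stdev(col) for col in zip(*scores)]
    return math.sqrt(
        sum(((a - b) / s) ** 2 for a, b, s in zip(scores[i], scores[j], stds))
    )


# Toy scores for three samples on two PCs (mean-centred, uncorrelated).
# PC 1 carries far more variance than PC 2.
scores = [(-10.0, 1.0), (0.0, -2.0), (10.0, 1.0)]

raw_0_2 = math.dist(scores[0], scores[2])          # dominated by PC 1
raw_0_1 = math.dist(scores[0], scores[1])
scaled_0_2 = scaled_score_distance(scores, 0, 2)   # PC 1 difference only
scaled_0_1 = scaled_score_distance(scores, 0, 1)   # PC 1 and PC 2 differences
```

In this toy case the raw distances differ widely, while the scaled distances come out equal, because a small shift along the low-variance PC 2 counts as much as a large shift along PC 1.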

A simple distance-based clustering of the gene expression data is used to generate the seeded population. A distance matrix dab is defined, which represents the Euclidean distance between each pair of genes. It is calculated as. [Pg.382]
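The equation itself is not reproduced in this excerpt. As a hedged illustration only, the sketch below builds a pairwise distance matrix using the standard Euclidean distance between expression profiles; whether the original authors applied any scaling or normalisation first is not known. The name `distance_matrix` and the toy profiles are hypothetical.

```python
import math


def distance_matrix(expression):
    """Symmetric matrix of Euclidean distances between gene profiles.

    expression[a] is the list of expression values of gene a across
    the measured conditions."""
    n = len(expression)
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            # Standard Euclidean distance; fill both halves of the matrix.
            d[a][b] = d[b][a] = math.dist(expression[a], expression[b])
    return d


# Three toy genes measured under two conditions.
expression = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
d = distance_matrix(expression)
```

The diagonal is zero and the matrix is symmetric, so only the upper triangle needs to be computed.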

Cell-based methods, as well as clustering or distance-based methods, aim at extracting representative, structurally diverse subsets of compounds from large chemical databases [Cummins, Andrews et al., 1996; Mason and Pickett, 1997; Pearlman and Smith, 1999; Farnum, Desjarlais et al., 2003]. They are mainly used in the design and optimization of combinatorial libraries, the most important aspect here being to ensure maximum diversity within and between libraries before they are produced. Moreover, cell-based methods are used for lead discovery purposes, allowing the selection of the compounds most similar to the active reference target. [Pg.84]
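A minimal sketch of the cell-based idea, under simplifying assumptions: each descriptor axis is cut into the same number of equal-width bins, every compound falls into one cell of the resulting grid, and the first compound met in each occupied cell is kept as its representative. The cited methods may bin and choose representatives differently; the names `cell_index` and `cell_based_subset` are hypothetical.

```python
def cell_index(descriptor, lows, highs, bins):
    """Map a descriptor vector to the tuple of bin numbers (its cell)."""
    index = []
    for x, lo, hi in zip(descriptor, lows, highs):
        if hi == lo:
            k = 0  # constant descriptor: everything shares one bin
        else:
            k = int((x - lo) / (hi - lo) * bins)
        # Clamp so the maximum value lands in the last bin.
        index.append(min(max(k, 0), bins - 1))
    return tuple(index)


def cell_based_subset(descriptors, bins=2):
    """Return indices of one compound per occupied cell (first seen)."""
    lows = [min(col) for col in zip(*descriptors)]
    highs = [max(col) for col in zip(*descriptors)]
    seen = {}
    for idx, descriptor in enumerate(descriptors):
        cell = cell_index(descriptor, lows, highs, bins)
        seen.setdefault(cell, idx)
    return sorted(seen.values())


# Five toy compounds described by two descriptors.
descriptors = [(0.0, 0.0), (0.1, 0.2), (0.9, 0.1), (1.0, 1.0), (0.8, 0.9)]
subset = cell_based_subset(descriptors, bins=2)
```

Compounds 0 and 1 share a cell, as do compounds 3 and 4, so only one of each pair is retained. Unoccupied cells, here the low-first, high-second descriptor cell, point to regions of descriptor space the library does not cover.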
