Algorithms, clustering

After selecting a measure one has to decide which clustering algorithm (strategy) may be appropriate. Sometimes it is necessary for the algorithm to fit the similarity measure. In most cases one wishes to use the algorithm which yields the most interpretable or plausible data structure. [Pg.156]

Two main groups of cluster algorithms can be distinguished: hierarchical and nonhierarchical (partitioning) techniques. [Pg.156]

The typical output of hierarchical cluster methods is a so-called dendrogram, a treelike diagram which is very useful for discussing several possible results of the clustering process. For an illustration see Fig. 5-13; the underlying example will be explained in Section 5.3.4. [Pg.156]

As indicated, agglomerative methods start with single objects or pairs of objects; step by step, clusters are formed which are finally united in one cluster. Divisive methods, on the other hand, start from the one cluster of all objects and divide it step by step. One drawback of the commonly used agglomerative methods is that clusters, once formed, cannot be broken up in a subsequent step. With certain algorithms this sometimes leads to so-called inversions in the dendrogram, i.e. crossing lines in the diagram. [Pg.156]
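
The agglomerative procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance, neither of which is prescribed by the text, and the function name `single_linkage` and the sample points are hypothetical. Single linkage merge distances never decrease from step to step, so this particular variant does not produce inversions.

```python
from itertools import combinations
from math import dist


def single_linkage(points):
    """Agglomerative clustering; returns the merge history.

    Each entry is (members_a, members_b, distance): the two clusters
    joined at that step and the single-linkage distance between them.
    """
    clusters = [(i,) for i in range(len(points))]  # start: one object per cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest.
        best = None
        for a, b in combinations(clusters, 2):
            d = min(dist(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))  # a merged cluster is never split again
        merges.append((a, b, d))
    return merges


# Hypothetical sample data: two tight pairs and one distant object.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 2.0), (20.0, 0.0)]
merges = single_linkage(points)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

The merge distances are the heights at which branches join in a dendrogram. The exhaustive pair search is meant for readability, not for large data sets.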

One has to keep in mind that groups of objects found by any clustering procedure are not statistical samples from a certain distribution of data. Nevertheless, the groups or clusters are sometimes analyzed for their distinctness using statistical methods, e.g. multivariate analysis of variance and discriminant analysis (see Section 5.6). As a result one could then discuss only those clusters which are statistically different from others. [Pg.157]

There is no single correct method of performing cluster analysis, and a large number of algorithms have been devised, from which one must choose the most appropriate approach. There can also be a wide variation in the efficiency of the various cluster algorithms, which may be an important consideration if the data set is large. [Pg.507]

J. A. Hartigan, Clustering Algorithms, John Wiley & Sons, Inc., New York, 1975. [Pg.431]

A related method is the component synthesis method [17], which uses a so-called static condition to model the interactions between parts of a molecule whose corresponding diagonal blocks in the Hessian are first diagonalized. It has been combined with a residue clustering algorithm that provides a hierarchy of parts, which at the lowest level provides small enough matrices for efficient diagonalization [18]. It has been applied to double-helical DNA [17] and the protein crambin [18]. [Pg.157]

The applicability of a clustering algorithm to pattern recognition is entirely dependent upon the clustering characteristics of the patterns in the representation space. This structural dependence emphasizes the importance of representation. An optimal representation uses pattern features that result in easily identified clustering of the different pattern classes in the representation space. At the other extreme, a poor choice of representation can result in patterns from all classes being uniformly distributed with no discernible class structure. [Pg.60]

Hanagandi, V. and M. Nikolaou. A Hybrid Approach to Global Optimization Using a Clustering Algorithm in a Genetic Search Framework. Comput. Chem. Eng. 22, 1913-1925 (1998). [Pg.414]

Usually one cannot expect a unique solution for cluster analysis. The result depends on the distance measure used, the cluster algorithm, and the chosen parameters often... [Pg.267]
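
A small sketch of why the choice of distance measure matters: with the hypothetical points below, the nearest neighbour of the query object changes when Euclidean distance is swapped for Manhattan (city-block) distance. The two measures and all names are assumptions chosen for illustration; the text does not single them out.

```python
def euclidean(p, q):
    """Straight-line distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def manhattan(p, q):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def nearest(query, candidates, measure):
    """Return the name of the candidate closest to query under measure."""
    return min(candidates, key=lambda name: measure(query, candidates[name]))


# Hypothetical objects: b lies on the diagonal, c on the first axis.
query = (0.0, 0.0)
candidates = {"b": (3.0, 3.0), "c": (5.0, 0.0)}

nearest_euclidean = nearest(query, candidates, euclidean)  # b: about 4.24 vs 5.0
nearest_manhattan = nearest(query, candidates, manhattan)  # c: 5.0 vs 6.0
print(nearest_euclidean, nearest_manhattan)
```

Since an agglomerative method merges the closest objects first, such a change in the ranking of distances can alter the resulting clusters.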

All these distance measures allow a judgment of the similarity between the objects, and consequently the complete information between all n objects is contained in one-half of the n × n distance matrix. Because the size of this matrix grows quadratically with n, clustering algorithms that take the distance matrix into account are computationally not attractive for a large number of objects, and one has to resort to other algorithms (see Section 6.3). [Pg.268]
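
The "one-half of the matrix" remark can be illustrated by storing only the n(n-1)/2 distances above the diagonal in a flat list. The sketch below assumes Euclidean distance and a row-by-row ordering of the pairs; the helper names `condensed_distances` and `condensed_index` are hypothetical.

```python
from math import dist


def condensed_distances(points):
    """Distances for all pairs i < j, stored row by row in a flat list."""
    n = len(points)
    return [dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]


def condensed_index(i, j, n):
    """Position of the pair (i, j), with i != j, in the condensed list."""
    if i > j:
        i, j = j, i  # the matrix is symmetric, so order does not matter
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def pair_count(n):
    """Number of stored distances for n objects."""
    return n * (n - 1) // 2


# Hypothetical sample data.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
condensed = condensed_distances(points)
d_20 = condensed[condensed_index(2, 0, len(points))]  # distance between objects 2 and 0

print(len(condensed), d_20)
print(pair_count(100_000))  # entries needed for 100000 objects
```

Even the condensed form needs 4,999,950,000 entries for 100,000 objects, roughly 40 GB if each distance is an 8-byte float, which shows why matrix-based algorithms become unattractive for large n.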

Most of the standard clustering algorithms can be directly used for clustering the variables. In this case, the distance between the variables rather than between the objects has to be measured. A popular choice is the Pearson correlation distance, defined for two variables x_j and x_k as... [Pg.268]
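
The definition itself is cut off in the excerpt above, so the following sketch should not be read as the source's formula. It assumes one common convention, d = 1 - |r|, where r is the Pearson correlation coefficient of the two variables; other conventions (for example 1 - r) exist. The function names and sample variables are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_distance(x, y):
    # ASSUMPTION: 1 - |r| is used here; the source's own definition is elided.
    return 1.0 - abs(pearson_r(x, y))


# Hypothetical variables measured on the same four objects.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]    # perfectly correlated with x1
x3 = [4.0, 3.0, 2.0, 1.0]    # perfectly anticorrelated with x1
x4 = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with x1

for name, other in (("x2", x2), ("x3", x3), ("x4", x4)):
    print(name, round(correlation_distance(x1, other), 6))
```

Under this assumed convention, strongly correlated and strongly anticorrelated variables are both treated as close, so they would be clustered together.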

Four pairs of structures with identical descriptors merge at a distance of zero. From the chemist's point of view clustering appears more satisfying than the linear projection method PCA (with only 47.6% of the total variance preserved by the first two PCA scores). A number of different clustering algorithms have been applied to the 20 standard amino acids by Willet (1987). [Pg.273]

The standard hierarchical clustering algorithms produce a whole set of cluster solutions, namely a partitioning of the objects into k = 1, ..., n clusters. The partitions are ordered hierarchically, and there are two possible procedures ... [Pg.277]
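
To make the "whole set of cluster solutions" concrete, the sketch below replays a given sequence of agglomerative merges and records the partition obtained for every k from n down to 1. The input format (a list of object-index pairs, each meaning "join the clusters that currently contain these two objects") and the function name are assumptions made for this illustration.

```python
def partitions_from_merges(n, merges):
    """Replay a merge sequence; return {k: partition of n objects into k clusters}."""
    clusters = [{i} for i in range(n)]  # k = n: every object is its own cluster
    partitions = {n: sorted(sorted(c) for c in clusters)}
    for i, j in merges:
        a = next(c for c in clusters if i in c)
        b = next(c for c in clusters if j in c)
        if a is b:
            raise ValueError(f"objects {i} and {j} are already in one cluster")
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)  # each merge reduces the cluster count by one
        partitions[len(clusters)] = sorted(sorted(c) for c in clusters)
    return partitions


# Hypothetical merge sequence for five objects.
merges = [(0, 1), (2, 3), (0, 2), (4, 0)]
partitions = partitions_from_merges(5, merges)
for k in sorted(partitions, reverse=True):
    print(k, partitions[k])
```

Each partition is obtained from the previous one by joining two clusters, which is the hierarchical ordering referred to above: every cluster at level k is contained in exactly one cluster at level k - 1.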

High-Throughput Screen Clustering Algorithm (HTSCA)... [Pg.157]
