
Clustering Criteria

Feature mapping (i.e., numeric-symbolic mapping) requires decision mechanisms that can distinguish between possible label classes. As shown in Fig. 5, widely used decision mechanisms include linear discriminant surfaces, local data cluster criteria, and simple decision limits. Depending on the nature of the features and the feature extraction approaches, one or more of these decision mechanisms can be selected to assign labels. [Pg.6]

Perceptrons, multilayer perceptrons and radial basis function networks require supervised training with data for which the answers are known. Some applications require the automatic clustering of data for which the clusters and clustering criteria are not known. One of the best known architectures for such problems is the Kohonen self-organizing map (SOM), named after its inventor, Teuvo Kohonen (Kohonen, 1997). In this section the rationale behind such networks is described. [Pg.46]
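The following is a minimal sketch of the SOM training loop that this rationale leads to, written in Python with NumPy; the grid size, learning rate, neighbourhood width, and decay schedule are illustrative assumptions rather than values from Kohonen's text.

import numpy as np

def train_som(data, grid_shape=(5, 5), epochs=20, lr0=0.5, sigma0=2.0):
    # Unsupervised training: no class labels are used, only the data themselves.
    n_units = grid_shape[0] * grid_shape[1]
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(n_units, data.shape[1]))
    # Fixed 2-D grid coordinates of the map units.
    coords = np.array([(i, j) for i in range(grid_shape[0])
                       for j in range(grid_shape[1])], dtype=float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            # Best-matching unit: the unit whose weight vector is closest to x.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Learning rate and neighbourhood width shrink as training proceeds.
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            # Units close to the BMU on the grid are pulled toward x as well,
            # which is what produces the topology-preserving map.
            grid_dist = np.linalg.norm(coords - coords[bmu], axis=1)
            h = np.exp(-(grid_dist ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights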

In the next section of this chapter we formalize the clustering optimization problem. This formalization allows us to apply simulated annealing as a global optimization technique, which we describe in Section 3. Section 4 provides examples of the use of simulated annealing clustering algorithms and the importance of internal clustering criteria to these techniques. Section 5 contains our conclusions. [Pg.135]
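Ahead of that formalization, a minimal Python sketch of simulated annealing applied to a partitional clustering problem is given below; the choice of within-cluster squared error as the internal criterion, the cooling schedule, and the single-object reassignment move are assumptions made for illustration only.

import numpy as np

def sse(data, labels, k):
    # Internal criterion: total within-cluster sum of squared distances to the centroids.
    total = 0.0
    for j in range(k):
        members = data[labels == j]
        if len(members):
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def anneal_clustering(data, k=3, t0=1.0, cooling=0.995, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(data))
    cost = sse(data, labels, k)
    t = t0
    for _ in range(n_iter):
        # Candidate move: reassign one randomly chosen object to a random cluster.
        i, new = rng.integers(len(data)), rng.integers(k)
        old = labels[i]
        labels[i] = new
        new_cost = sse(data, labels, k)
        # Metropolis rule: always accept improvements, occasionally accept worse moves.
        if new_cost <= cost or rng.random() < np.exp((cost - new_cost) / t):
            cost = new_cost
        else:
            labels[i] = old
        t *= cooling
    return labels, cost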

Squared-error is one of the most common of all clustering criteria. Provided that the clusters are fairly spherical and are of approximately the same size, squared-error performs extremely well. Thus, in the absence of any prior information, squared-error is often a suitable choice for exploring a new data set. This explains why squared-error is fundamental to many partitioning algorithms like ISODATA (Ball and Hall, 1965) and K-MEANS (see Hartigan, 1975). It provides a compact, accurate measure of clustering... [Pg.139]
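As a small example, assuming scikit-learn and NumPy are available, the squared-error criterion can be evaluated on a synthetic data set of two roughly spherical, equally sized clusters, the favourable case described above:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two spherical, equally sized blobs -- the setting in which squared-error works well.
data = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                  rng.normal(4.0, 0.5, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
# inertia_ is exactly the squared-error criterion: the sum of squared distances
# of every object to its nearest cluster centre.
print(km.inertia_)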

We emphasize that the choice of the internal criterion for use in a partitional clustering problem is critical to the interpretability and usefulness of the resulting output. However, for computational reasons, it is desired that the criterion be simple. One of the simplest of the internal clustering criteria is total within-cluster distance ... [Pg.148]
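The expression itself did not survive extraction; a standard form, which may differ in detail from the source's equation, writes it with C_k the k-th cluster, z_k its representative point, and d(.,.) a dissimilarity measure:

J(c) = \sum_{k=1}^{K} \sum_{x_i \in C_k} d(x_i, z_k)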

Optimizing Techniques. Clusters are formed by optimization of a clustering criterion. The resulting classes are mutually exclusive; the objects are partitioned clearly into sets. [Pg.949]

The internal clustering criterion allows us to formulate clustering as an optimization problem. Unfortunately, this optimization problem is NP-hard, making it intractable for all but the smallest problem instances. Hence, a number of heuristic approaches have been advocated, and in many cases these approaches do not explicitly specify the internal criterion being optimized. [Pg.135]

Clustering problems can have numerous formulations depending on the choices for data structure, similarity/distance measure, and internal clustering criterion. This section first describes a very general formulation, then details special cases that correspond to two popular classes of clustering algorithms: partitional and hierarchical. [Pg.135]

Equations (1) and (2) represent the most general form of the optimal clustering problem. The objective is to find the clustering c that minimizes an internal clustering criterion J. J typically employs a similarity/dissimilarity measure to judge the quality of any c. The set C defines c's data structure and includes all the feasible clusterings of the set Q of all objects to be clustered. [Pg.136]
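Equations (1) and (2) are not reproduced here; a reconstruction consistent with this description, and not necessarily their exact form, is

c^* = \arg\min_{c \in C} J(c), \qquad C = \{\text{all feasible clusterings of } Q\}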

Agglomerative methods, such as single link and complete link, are stepwise procedures. The formulation in (5)-(7) allows us to define the hierarchical clustering problem in terms of combinatorial optimization. To do this, however, we need an appropriate internal clustering criterion. The most obvious is squared error. [Pg.139]

These results show clearly the importance of the optimization criterion to clustering. The computationally simple Ward's method performs better than the simulated annealing approach with a simplistic criterion. However, a criterion that more correctly accounts for the hierarchy, by minimizing the sum of squared error at each level, performs much better. As with partitional clustering, the application of simulated annealing to hierarchical clustering requires careful selection of the internal clustering criterion. [Pg.151]
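A short sketch, assuming SciPy and NumPy, of Ward's method on synthetic data, with the sum of squared error evaluated at several levels of the resulting hierarchy; the data set and the levels inspected are illustrative and not taken from the study quoted above.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(m, 0.4, (30, 2)) for m in (0.0, 3.0, 6.0)])

# Ward's method greedily merges, at each step, the pair of clusters whose fusion
# gives the smallest increase in the total within-cluster sum of squares.
Z = linkage(data, method="ward")

def level_sse(data, labels):
    # Sum of squared error of the partition obtained at one level of the hierarchy.
    return sum(((data[labels == j] - data[labels == j].mean(axis=0)) ** 2).sum()
               for j in np.unique(labels))

for k in (2, 3, 4):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, level_sse(data, labels))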

Fig. 3. Illustration of the "two-shell" cluster criterion. In this illustration particles 1 and 2 make up a cluster. The trajectory is terminated when one particle achieves "infinite" separation (particle 3). In calculating the breakup time, the period required to travel between the "critical" and infinite separations (3 -> 3) is removed. Reproduced with permission of copyright holder.
Various partitions result from the different combinations of clustering parameters. The estimation of the number of classes and the selection of the optimum clustering are based on separability criteria, such as the ratio of the minimum between-clusters distance to the maximum of the average within-class distances. In that case, the higher the criterion value, the more separable the clustering. By plotting the criterion value vs. the number of classes and/or the algorithm parameters, the partition which maximises the criterion value is identified and the number of classes is estimated. [Pg.40]
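A minimal sketch of such a separability criterion, assuming scikit-learn and NumPy; here the minimum between-clusters distance is taken between cluster centres, which is one common reading of the criterion, and the data are synthetic.

import numpy as np
from sklearn.cluster import KMeans

def separability(data, labels):
    classes = np.unique(labels)
    centres = np.array([data[labels == j].mean(axis=0) for j in classes])
    # Minimum distance between any pair of cluster centres.
    d_between = min(np.linalg.norm(centres[a] - centres[b])
                    for a in range(len(centres)) for b in range(a + 1, len(centres)))
    # Largest average distance of the members of a class to their own centre.
    d_within = max(np.linalg.norm(data[labels == j] - centres[i], axis=1).mean()
                   for i, j in enumerate(classes))
    return d_between / d_within

rng = np.random.default_rng(2)
data = np.vstack([rng.normal(m, 0.3, (40, 2)) for m in (0.0, 2.0, 5.0)])

# The partition (here, the number of classes) that maximises the criterion is selected.
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    print(k, round(separability(data, labels), 3))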

As oversimplified cases of the criterion to be used for the clustering of datasets, we may consider some high-quality Kohonen maps, or PCA plots, or hierarchical clustering. [Pg.208]

In particular it can be shown that the dynamic flocculation model of stress softening and hysteresis fulfils a plausibility criterion, important, e.g., for finite element (FE) applications. Accordingly, any deformation mode can be predicted based solely on uniaxial stress-strain measurements, which can be carried out relatively easily. From the simulations of stress-strain cycles at medium and large strain it can be concluded that the model of cluster breakdown and reaggregation for prestrained samples represents a fundamental micromechanical basis for the description of nonlinear viscoelasticity of filler-reinforced rubbers. Thereby, the mechanisms of energy storage and dissipation are traced back to the elastic response of tender but fragile filler clusters [24]. [Pg.621]

The metal size clearly increases when the decomposition takes place on the substrate. Nevertheless, the overall shift after complete decomposition is the same both on crystalline and amorphous substrates. This can be explained by the assumption that the increase of the number of metal atoms in the cluster takes place also on an amorphous substrate, on a scale high enough to shift the core levels but low enough to maintain a constant emitted intensity ratio between the substrate and the metal core levels. The authors concluded therefore that the core-level position is highly size-sensitive in the range of very small particles, e.g., <100 atoms, where the associated electronic properties are primarily atomic. However, on approaching the metallic state for >100 atoms, the core-level shift is a much poorer criterion of the cluster size. [Pg.81]

Both methods described above belong to a class of methods that is also called partitioning or optimization or partitioning-optimization techniques. They partition the set of objects into subsets according to some optimization criterion. Both methods use representative elements, in one case an object of the set to be clustered (the centrotype), in the other an object with real values for the variables that is not necessarily (and usually not) part of the objects to be clustered (the centroid). [Pg.78]
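A short NumPy illustration of the two kinds of representative element; the data are synthetic and Euclidean distance is assumed.

import numpy as np

rng = np.random.default_rng(3)
cluster = rng.normal(0.0, 1.0, (20, 2))

# Centroid: the mean vector, which is generally not one of the clustered objects.
centroid = cluster.mean(axis=0)

# Centrotype (medoid): the actual object with the smallest total distance to all others.
dists = np.linalg.norm(cluster[:, None, :] - cluster[None, :, :], axis=2)
centrotype = cluster[dists.sum(axis=1).argmin()]

print(centroid, centrotype)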

I. Bondarenko, H. Van Malderen, B. Treiger, P. Van Espen and R. Van Grieken, Hierarchical cluster analysis with stopping rules built on Akaike's information criterion for aerosol particle classification based on electron probe X-ray microanalysis, Chemom. Intell. Lab. Syst., 22 (1994) 87-95. [Pg.85]

Local methods, on the other hand, are characterized by input transformations that are approached using partition methods for cluster seeking. The overall thrust is to analyze input data and identify clusters of the data that have characteristics that are similar based on some criterion. The objective is to develop a description of these clusters so that plant behaviors can be compared and/or data can be interpreted. [Pg.28]

The most commonly used family of methods for cluster seeking uses optimization of a squared-error performance criterion in the form... [Pg.28]
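The expression itself was lost in extraction; the standard squared-error criterion, with m_k denoting the mean of cluster C_k, and presumably the intended form, is

J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - m_k \rVert^{2}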

The selection of cluster number, which is generally not known beforehand, represents the primary performance criterion. Optimization of performance therefore requires trial-and-error adjustment of the number of clusters. Once the cluster number is established, the neural network structure is used as a way to determine the linear discriminant for interpretation. In effect, the RBFN makes use of a known transformed feature space defined in terms of prototypes of similar patterns as a result of applying k-means clustering. [Pg.62]
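A minimal sketch, assuming scikit-learn and NumPy, of the idea described above: k-means prototypes define Gaussian basis functions, and a linear discriminant is then fitted on the transformed feature space. The number of prototypes, the basis width, and the least-squares fit are illustrative choices, not details from the original text.

import numpy as np
from sklearn.cluster import KMeans

def rbf_features(x, centres, width):
    # Gaussian basis functions centred on the k-means prototypes.
    d = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
    return np.exp(-(d ** 2) / (2.0 * width ** 2))

rng = np.random.default_rng(4)
x = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(3.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Prototypes of similar patterns found by k-means; the number of clusters is a
# trial-and-error choice, as noted above.
centres = KMeans(n_clusters=4, n_init=10, random_state=0).fit(x).cluster_centers_
phi = np.column_stack([rbf_features(x, centres, width=1.0), np.ones(len(x))])

# Linear discriminant on the transformed feature space, fitted by least squares.
w, *_ = np.linalg.lstsq(phi, y, rcond=None)
pred = (phi @ w > 0.5).astype(int)
print((pred == y).mean())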

ART2 forms clusters from training patterns by first computing a measure of similarity (directional rather than distance) of each pattern vector to a cluster prototype vector, and then comparing this measure to an arbitrarily specified proximity criterion called the vigilance. If the pattern's similarity measure exceeds the vigilance, the cluster prototype or center is updated to incorporate the effect of the pattern, as shown in Fig. 25 for pattern 3. If the pattern fails the similarity test, competition resumes without the node... [Pg.63]
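A simplified, single-pass Python sketch of the vigilance mechanism described above; it captures the directional similarity test and the prototype update, but omits the competition and reset dynamics of full ART2, and the vigilance and learning-rate values are arbitrary.

import numpy as np

def art_like_clustering(data, vigilance=0.95, lr=0.5):
    # Patterns are compared to prototypes by a directional (cosine) similarity,
    # not by distance, and tested against the vigilance threshold.
    prototypes, labels = [], []
    for x in data:
        xn = x / np.linalg.norm(x)
        if prototypes:
            sims = [float(xn @ (p / np.linalg.norm(p))) for p in prototypes]
            best = int(np.argmax(sims))
            if sims[best] >= vigilance:
                # Pattern passes the vigilance test: the winning prototype is
                # updated to incorporate the effect of the pattern.
                prototypes[best] = (1 - lr) * prototypes[best] + lr * x
                labels.append(best)
                continue
        # Pattern fails the test for every existing node: a new cluster is created.
        prototypes.append(x.astype(float))
        labels.append(len(prototypes) - 1)
    return np.array(labels), prototypes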

