Big Chemical Encyclopedia


K-Means Algorithm

One of the most popular and widely used clustering techniques is the K-means algorithm. It is available in all popular cluster analysis software packages and can be applied to relatively large data sets. The principal objective of the method is to partition the m objects, each characterized by n variables, into K clusters so that the within-cluster sum of squared distances is minimized. Because the method is optimization-based, the number of possible solutions cannot be predicted and the best possible partitioning of the objects may not be achieved. In practice, the method finds a local optimum, defined as a classification in which no movement of an observation from one cluster to another will reduce the within-cluster sum of squares. [Pg.115]
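The iteration described above can be sketched in a few lines of Python. This is a minimal Lloyd-style implementation, not Hartigan's exact updating scheme discussed below; the function and variable names are illustrative.

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Minimal K-means sketch: assign each object to the nearest
    cluster centre, then recompute each centre as a mean vector."""
    rng = np.random.default_rng(seed)
    m, _ = X.shape
    # Initialize centroids from K distinct objects chosen at random.
    centroids = X[rng.choice(m, size=K, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Euclidean distance of every object to every cluster centre.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # local optimum: no reassignment lowers the criterion
        labels = new_labels
        for k in range(K):
            members = X[labels == k]
            if len(members):  # guard against an emptied cluster
                centroids[k] = members.mean(axis=0)
    return labels, centroids
```

The stopping condition mirrors the local-optimum definition in the text: the loop ends once no observation changes cluster.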

Many versions of the algorithm exist, but in most cases the user is expected to supply the number of clusters, K. The algorithm described here is that proposed by Hartigan. [Pg.116]

X defines the data matrix with elements x_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n), where m is the number of objects and n is the number of variables used to characterize the objects. The cluster analysis seeks to find K partitions or clusters, with each object residing in only one of the clusters. [Pg.116]

The mean value of each variable j, over all objects in cluster L, is denoted by B(j, L). The number of objects residing in cluster L is R_L. [Pg.116]

The distance, D(i, L), between the i-th object and the centre, or average, of cluster L is given by the Euclidean metric, [Pg.116]

D(i, L) = [ Σ_j ( x_ij − B(j, L) )² ]^(1/2)

This index is employed by both the k-means (MacQueen, 1967) and the ISODATA algorithms (Ball and Hall, 1965), which partition a set of data into k clusters. With the k-means algorithm, the number of clusters is prespecified, while the ISODATA algorithm uses various heuristics to identify an unconstrained number of clusters. [Pg.29]

The traditional hierarchical and nonhierarchical (e.g., k-means) clustering algorithms [69] have a number of drawbacks that require caution in their implementation for time series data. The hierarchical clustering algorithms assume an implicit parent-child relationship between the members of a cluster, which may not be relevant for time series data. However, they can provide good initial estimates of patterns that may exist in the data set. The k-means algorithm requires an estimate of the number of clusters (i.e., k), and its solution depends on the initial assignments as the optimization... [Pg.208]

Clearly, the extent of exotherm-generated temperature overshoot predicted by the Chiao and finite element models differs substantially. The finite element results were not markedly changed by refining the mesh size or the time increments, so the difference appears to be inherent in the numerical algorithms used. Such comparison is useful in further development of the codes, as it provides a means of pinpointing those model parameters or algorithms which underlie the numerical predictions. These points will be explored more fully in future work. [Pg.280]

In the computer code, a sorting algorithm can be used to put the mean mass fractions (Ya) in descending order before defining Xp. By keeping track of the order of the indices, one can easily define the inverse transformation needed to recover Ya from Xp. [Pg.271]
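The bookkeeping described above (sort in descending order, remember the permutation, invert it) can be sketched with NumPy; the array names and values here are purely illustrative, not taken from the code in question.

```python
import numpy as np

# Illustrative mean mass fractions (the Ya of the text).
Y = np.array([0.10, 0.50, 0.15, 0.25])

order = np.argsort(Y)[::-1]   # permutation putting Y in descending order
X_p = Y[order]                # the reordered values used to define Xp

# Invert the permutation so Ya can be recovered from Xp.
inv = np.empty_like(order)
inv[order] = np.arange(order.size)
Y_recovered = X_p[inv]        # equal to Y again
```

Storing `order` once is enough: applying `inv` is the inverse transformation the text refers to.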

In principle, MC algorithms can be tuned for particular systems and can thus be more efficient than MD for obtaining equilibrium distributions. An interesting idea is to use MC simulations to obtain accurate initial guesses for subsequent MD simulations. As early as 1993, Venable and co-workers [68] used a scheme for efficiently sampling configurations of individual lipids in a mean field. These configurations were then used to develop the initial conditions for the molecular dynamics simulations. [Pg.48]

In R, a function has been provided for easy application of the NIPALS algorithm, for instance to calculate two PCs of a mean-centered matrix X; the R code is as follows ... [Pg.89]
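The R code itself is not reproduced here, but the NIPALS iteration it applies can be sketched in Python under the same assumption of a mean-centered X; the function name and starting-vector choice are illustrative.

```python
import numpy as np

def nipals(X, n_pc=2, tol=1e-10, max_iter=500):
    """NIPALS PCA sketch: power-iterate scores/loadings, then deflate."""
    X = X - X.mean(axis=0)          # mean-center, as the text assumes
    m, n = X.shape
    T = np.zeros((m, n_pc))         # scores
    P = np.zeros((n, n_pc))         # loadings
    for a in range(n_pc):
        # Start from the column with the largest variance.
        t = X[:, np.argmax(X.var(axis=0))].copy()
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)   # project objects onto current score
            p /= np.linalg.norm(p)  # normalize the loading vector
            t_new = X @ p           # updated score vector
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        T[:, a], P[:, a] = t, p
        X = X - np.outer(t, p)      # deflate before the next component
    return T, P
```

Each pass converges to the dominant remaining principal component, so the first loading should agree (up to sign) with the top right singular vector of the centered matrix.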

The most widely known algorithm for partitioning is the k-means algorithm (Hartigan 1975). It uses pairwise distances between the objects, and requires the input of the desired number k of clusters. Internally, the k-means algorithm uses so-called centroids (means) representing the center of each cluster. For example, a centroid c_j of a cluster j = 1, ..., k can be defined as the arithmetic mean vector of all objects of the corresponding cluster, i.e.,...
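The centroid definition can be written out directly; the numbers below are illustrative.

```python
import numpy as np

# Three objects characterized by two variables, with cluster labels.
X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]])
labels = np.array([0, 0, 1])

# Centroid c_j: arithmetic mean vector of the objects in cluster j.
centroids = np.array([X[labels == j].mean(axis=0) for j in range(2)])
# Cluster 0 contains the first two objects, so its centroid is their mean.
```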


See other pages where K-Means Algorithm is mentioned: [Pg.109]    [Pg.116]    [Pg.121]    [Pg.124]    [Pg.40]    [Pg.167]    [Pg.213]    [Pg.365]    [Pg.442]    [Pg.300]    [Pg.405]    [Pg.274]    [Pg.370]    [Pg.513]    [Pg.540]    [Pg.562]    [Pg.677]    [Pg.683]    [Pg.79]    [Pg.61]    [Pg.450]    [Pg.14]    [Pg.534]    [Pg.782]    [Pg.332]    [Pg.775]    [Pg.706]    [Pg.20]    [Pg.25]    [Pg.305]    [Pg.511]    [Pg.46]    [Pg.579]    [Pg.52]    [Pg.189]    [Pg.144]    [Pg.360]    [Pg.282]    [Pg.223]    [Pg.46]    [Pg.122]


© 2024 chempedia.info