Clustering similarity measures

Jarvis R A and E A Patrick 1973. Clustering Using a Similarity Measure Based on Shared Near Neighbours. IEEE Transactions on Computers C-22 1025-1034. [Pg.523]

Clustering is the process of dividing a collection of objects into groups (or clusters) so that the objects within a cluster are highly similar whereas objects in different clusters are dissimilar [41]. When applied to databases of compounds, clustering methods require the calculation of all the pairwise similarities of the compounds with similarity measures such as those described previously, for example, 2D fingerprints and the Tanimoto coefficient. [Pg.200]
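The pairwise calculation mentioned above can be sketched as follows. This is a minimal illustration, assuming each 2D fingerprint is stored as a set of "on" bit positions; the function names and the three toy fingerprints are hypothetical and not taken from the source.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty fingerprints count as identical
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def pairwise_similarities(fingerprints):
    """Symmetric matrix holding the Tanimoto coefficient of every pair."""
    n = len(fingerprints)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = tanimoto(fingerprints[i], fingerprints[j])
    return sim

# Toy fingerprints (hypothetical data, for illustration only).
fingerprints = [{1, 4, 7, 9}, {1, 4, 8}, {2, 3, 5}]
similarity = pairwise_similarities(fingerprints)
```

Because every pair is visited once, the work grows with the square of the number of compounds, which is the cost the passage refers to.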

Jarvis RA, Patrick EA. Clustering using a similarity measure based on shared near neighbours. IEEE Trans Comput 1973 C-22 1025-34. [Pg.206]

Pan, D., Iyer, M., Liu, J., Li, Y., Hopfinger, A. J., Constructing optimum blood brain barrier QSAR models using a combination of 4D-molecular similarity measures and cluster analysis descriptors. J. Chem. Inf. Model. 2004, 44, 2083-2098. [Pg.125]

The result of the clustering procedure depends on which procedure is applied and on the similarity measures used. Each gives a different view of the complex reality in the data set. It is therefore highly recommended that a clustering method is combined with a PCA or PLS display (see Chapters 17, 31 and 35) and, if possible, that several clustering methods and several types of similarity are used. [Pg.84]

ART2 forms clusters from training patterns by first computing a measure of similarity (directional rather than distance) of each pattern vector to a cluster prototype vector, and then comparing this measure to an arbitrarily specified proximity criterion called the vigilance. If the pattern's similarity measure exceeds the vigilance, the cluster prototype or center is updated to incorporate the effect of the pattern, as shown in Fig. 25 for pattern 3. If the pattern fails the similarity test, competition resumes without the node... [Pg.63]
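The vigilance test can be illustrated with a heavily simplified sketch; it is not the full ART2 network. Several choices here are assumptions made only for illustration: the cosine is used as the directional similarity, the prototype is updated by a weighted average with a hypothetical `rate` parameter, and a new cluster is opened when the best-matching prototype fails the test. Because one measure serves for both competition and the vigilance test in this sketch, only the best-matching prototype needs to be checked. Patterns are assumed to be non-zero vectors.

```python
import math

def cosine(u, v):
    """Directional similarity: cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

def assign(pattern, prototypes, vigilance, rate=0.5):
    """Return the index of the cluster that takes the pattern (modifies prototypes)."""
    if prototypes:
        best = max(range(len(prototypes)), key=lambda k: cosine(pattern, prototypes[k]))
        if cosine(pattern, prototypes[best]) > vigilance:
            # Passed the vigilance test: pull the prototype towards the pattern.
            prototypes[best] = [(1 - rate) * p + rate * x
                                for p, x in zip(prototypes[best], pattern)]
            return best
    # Failed the test (or no prototypes yet): open a new cluster (assumption).
    prototypes.append(list(pattern))
    return len(prototypes) - 1

prototypes = [[1.0, 0.0]]
first = assign([0.9, 0.1], prototypes, vigilance=0.9)   # close in direction
second = assign([0.0, 1.0], prototypes, vigilance=0.9)  # nearly orthogonal
```

Raising the vigilance makes the test stricter, so more patterns fail it and more, tighter clusters are formed.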

Furthermore, the RMS distance that is to be optimized is a valuable piece of information in itself. It is used, for example, to compare predictions with crystal structures, and it is invaluable for clustering similar placements. However, care must be taken to avoid problems with symmetry in the molecules. Again, the problem of correspondence must be treated carefully, since, for example, a rotation of a phenyl ring by 180° should not affect the result of such a quality assessment. [Pg.72]
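One way to handle the correspondence problem is to take the smallest RMS distance over all atom orderings that the molecular symmetry allows. The sketch below assumes both placements are already expressed in the same coordinate frame (no superposition is performed) and that the list of equivalent orderings is supplied by the caller; the four-atom fragment and all names are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMS distance between two equally ordered lists of (x, y, z) points."""
    squared = sum((p - q) ** 2
                  for a, b in zip(coords_a, coords_b)
                  for p, q in zip(a, b))
    return math.sqrt(squared / len(coords_a))

def symmetry_aware_rmsd(reference, placement, equivalent_orderings):
    """Smallest RMS distance over the atom orderings allowed by symmetry."""
    return min(rmsd(reference, [placement[i] for i in ordering])
               for ordering in equivalent_orderings)

# Hypothetical fragment: atoms 1 and 2 stand for two symmetry-related ring atoms.
reference = [(0.0, 0.0, 0.0), (1.2, 0.7, 0.0), (-1.2, 0.7, 0.0), (0.0, 2.8, 0.0)]
# The same fragment after a 180 degree ring flip: atoms 1 and 2 trade places.
flipped = [reference[0], reference[2], reference[1], reference[3]]
orderings = [(0, 1, 2, 3), (0, 2, 1, 3)]

naive = rmsd(reference, flipped)                               # penalizes the flip
corrected = symmetry_aware_rmsd(reference, flipped, orderings)  # does not
```

The naive value is large even though the two placements are chemically identical, whereas the symmetry-aware value is zero.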

On the basis of these clustering results, the EPA library of FTIR spectra was judged adequate as a source of spectra to form the data base for the mixture analysis problem and the dot product was deemed an adequate similarity measure. Every chemical class considered to be a candidate for inclusion was subjected to the clustering algorithm. Only those classes exhibiting a high degree of internal similarity were retained in the mixture analysis data base. [Pg.167]

Two recent papers further highlight the difficulty of choosing which similarity searching method to use (29,30). Given the vast array of different molecular descriptors available and the vast number of different similarity measures, this field has been the subject of many publications espousing different methods. As similarity searching and clustering are the easiest, yet very useful,... [Pg.90]

Clustering is a branch of exploratory analysis able to provide answers about the presence of groupings among objects or variables, by means of a similarity measurement (Vandeginste et al., 1998). The similarity between two objects is defined as an inverse function of their distance: the more distant two objects are, the less similar they are. Several metrics may be used to evaluate the distance D between two objects i and j in an n-dimensional space. The most common are... [Pg.82]

Raymond, J.W., Blankley, C.J., Willett, P. Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures. J. Mol. Graph. Model. 2003, 21, 421-433. [Pg.222]

After selecting a measure one has to decide which clustering algorithm (strategy) may be appropriate. Sometimes it is necessary for the algorithm to fit the similarity measure. In most cases one wishes to use the algorithm which yields the most interpretable or plausible data structure. [Pg.156]

As mentioned, hierarchical cluster analysis usually offers a series of possible cluster solutions which differ in the number of clusters. A measure of the total within-groups variance can then be utilized to decide the probable number of clusters. The procedure is very similar to that described in Section 5.4 under the name scree plot. If one plots the variance sum for each cluster solution against the number of clusters in the respective solution, a decay curve will result, ideally tailing off to a plateau; this indicates that further increasing the number of clusters in a solution will have no effect. [Pg.157]
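The decay curve can be produced with a short sketch like the one below. It is only an illustration under stated assumptions: the data are one-dimensional toy values, the within-group sum of squared deviations stands in for the variance sum, and clusters are merged by a Ward-like rule (smallest increase in that sum), which the source does not prescribe. All names are hypothetical.

```python
from itertools import combinations

def sum_of_squares(members):
    """Sum of squared deviations of the members from their mean."""
    mean = sum(members) / len(members)
    return sum((x - mean) ** 2 for x in members)

def variance_curve(values):
    """Map each number of clusters to the total within-group sum of squares."""
    clusters = [[v] for v in values]
    curve = {len(clusters): 0.0}  # every object alone: no within-group spread

    def increase(pair):
        # Growth of the within-group sum of squares if this pair were merged.
        i, j = pair
        return (sum_of_squares(clusters[i] + clusters[j])
                - sum_of_squares(clusters[i]) - sum_of_squares(clusters[j]))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2), key=increase)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        curve[len(clusters)] = sum(sum_of_squares(c) for c in clusters)
    return curve

# Toy data with three obvious groups (hypothetical values).
data = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.1, 9.0, 8.7]
curve = variance_curve(data)
for k in sorted(curve):
    print(k, round(curve[k], 3))
```

For these values the sum drops sharply up to three clusters and changes very little afterwards, which is the plateau the passage describes.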

The next step is to link the objects. The most common approach is called agglomerative clustering whereby single objects are gradually connected to each other in groups. Any similarity measure can be used in the first step, but for simplicity we will illustrate this using only the correlation coefficients of Table 4.17. Similar considerations apply to all the similarity measures introduced in Section 4.4.1, except that in the other cases the lower the distance the more similar the objects. [Pg.227]

The new data matrix using nearest neighbour clustering is presented in Table 4.20, with the new values shaded. Remember that there are many similarity measures and methods for linking, so this table represents only one possible way of handling the information. [Pg.228]

Table 4.20 Nearest neighbour cluster analysis, using correlation coefficients for similarity measures, and data in Table 4.16. ...
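A single agglomerative step with nearest neighbour linkage can be sketched as follows. The correlation values are invented for illustration and are not those of Tables 4.16, 4.17 or 4.20; the function name and the dictionary layout are likewise assumptions. Since correlation is a similarity, the pair with the highest value is merged, and the new group keeps the higher of the two old similarities to each remaining object.

```python
from itertools import combinations

def merge_most_similar(labels, sim):
    """One agglomerative step with nearest neighbour (single) linkage.

    labels: list of group names; sim: dict mapping frozenset({a, b}) to a
    similarity where larger means more alike (e.g. a correlation coefficient).
    """
    a, b = max(combinations(labels, 2), key=lambda pair: sim[frozenset(pair)])
    merged = f"({a},{b})"
    new_labels = [x for x in labels if x not in (a, b)] + [merged]
    new_sim = {}
    for x, y in combinations(new_labels, 2):
        if y == merged:  # the merged group is always last in new_labels
            # Nearest neighbour rule: keep the higher of the two old similarities.
            new_sim[frozenset((x, y))] = max(sim[frozenset((a, x))],
                                             sim[frozenset((b, x))])
        else:
            new_sim[frozenset((x, y))] = sim[frozenset((x, y))]
    return new_labels, new_sim

# Hypothetical correlation coefficients between four objects.
labels = ["A", "B", "C", "D"]
sim = {frozenset(p): r for p, r in [
    (("A", "B"), 0.90), (("A", "C"), 0.30), (("A", "D"), 0.10),
    (("B", "C"), 0.45), (("B", "D"), 0.20), (("C", "D"), 0.70),
]}

labels1, sim1 = merge_most_similar(labels, sim)    # joins A and B
labels2, sim2 = merge_most_similar(labels1, sim1)  # joins C and D
```

Repeating the step until one group remains gives the full linkage sequence; with a distance measure the same scheme applies with `max` replaced by `min`.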

Previously we discussed the use of different similarity measures in cluster analysis (Section 4.4.1), including various approaches for determining the distance between... [Pg.236]
