Cluster analysis

Cluster analysis is not a robust technique. In a classic review, Cormack [73] wrote  [Pg.150]

Different algorithms may produce radically different results when applied to the same dataset. For example, the single-linkage method has difficulty in recognizing two separate clusters linked by a chain of intervening observations (a situation par- [Pg.150]

Cluster analysis is a method for dividing a group of objects into classes so that similar objects are in the same class. As in PCA, the groups are not known prior to the mathematical analysis and no assumptions are made about the distribution of the variables. Cluster analysis searches for objects which are close together in the variable space. The distance, d, between two points in n-dimensional space with coordinates (x1, x2, ..., xn) and (y1, y2, ..., yn) is usually taken as the Euclidean distance defined by d = [(x1 − y1)² + (x2 − y2)² + ... + (xn − yn)²]^(1/2). [Pg.220]

For example, the distance between the compounds E and F in Table 8.3 (if the unstandardized variables are used) is given by  [Pg.220]

As in PCA, a decision has to be made as to whether or not the data are standardized. Standardizing the data will mean that all the variables are measured on a common scale so that one variable does not dominate the others. [Pg.220]
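
As an illustration of the distance calculation and the effect of standardization described above, the following sketch computes Euclidean distances before and after autoscaling with NumPy. The data values and the use of NumPy are assumptions for illustration only; they are not the values of Table 8.3.

```python
import numpy as np

# Hypothetical measurements for two compounds, E and F, on three variables
# whose raw scales differ widely (so the third variable dominates if the
# data are left unstandardized).
data = np.array([
    [0.52, 12.0, 1500.0],   # compound E (hypothetical)
    [0.48, 15.0,  900.0],   # compound F (hypothetical)
    [0.90,  3.0, 1100.0],   # a third compound, used only for scaling
])

def euclidean(a, b):
    """Euclidean distance between two points in n-dimensional space."""
    return np.sqrt(np.sum((a - b) ** 2))

# Distance on the raw (unstandardized) variables
print("raw distance E-F:", euclidean(data[0], data[1]))

# Autoscaling: subtract the column mean and divide by the column standard
# deviation so that every variable is measured on a common scale.
scaled = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
print("autoscaled distance E-F:", euclidean(scaled[0], scaled[1]))
```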

Apply the single linkage method to the (unstandardized) data in Table 8.1. [Pg.222]

The printout below was obtained using Minitab. With this software the linkages continue until there is only one cluster, unless the user specifies otherwise. [Pg.222]
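
The Minitab printout referred to above is not reproduced in this excerpt. As a rough analogue, the sketch below runs single-linkage clustering with SciPy and prints each merge step; like the Minitab procedure, the linkage continues until only one cluster remains. The data values are made up, standing in for Table 8.1, and the use of SciPy is an assumption of this sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Made-up data matrix: six objects described by two variables
# (a stand-in for the unstandardized data of Table 8.1).
X = np.array([[1.0, 2.0],
              [1.2, 2.1],
              [3.5, 4.0],
              [3.7, 4.2],
              [8.0, 8.5],
              [8.2, 8.3]])

# Single-linkage agglomeration on Euclidean distances.
Z = linkage(pdist(X, metric="euclidean"), method="single")

# Each row of Z records one merge: the two clusters joined, the distance
# at which they were joined, and the number of objects in the new cluster.
for step, (i, j, dist, size) in enumerate(Z, start=1):
    print(f"step {step}: merge {int(i)} and {int(j)} "
          f"at distance {dist:.3f} (cluster size {int(size)})")
```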

Clustering analysis (CA) is a statistical technique particularly suited to the grouping of data. It is gaining wide acceptance in many different fields of research such as data mining, marketing, operations research, and bioinformatics. CA is used when it is believed that the sample units come from an unknown population. Clustering is the classification of similar objects into [Pg.326]

There are two types of clustering algorithms (Khattree and Dayanand, 2000), namely [Pg.327]

The hierarchical clustering method uses the distances (or dissimilarities) between variables when forming the clusters. The distances that can be computed are based on a single dimension or multiple dimensions  [Pg.160]

Once several objects have been linked together, the next step is to determine the distances between the new clusters. This is done by linkage or amalgamation rules, which determine when two clusters are similar enough to be linked together. There are various possibilities  [Pg.160]
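
To make the choice of amalgamation rule concrete, the sketch below applies several common rules to the same objects and shows that the resulting two-cluster partitions need not agree. The data, the particular rules compared, and the use of SciPy are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical objects: two tight groups joined by a "chain" of points,
# the situation that tends to trip up single linkage.
X = np.array([[0, 0], [0, 1], [1, 0],          # group A
              [2, 2], [3, 3], [4, 4],          # chain of intervening points
              [6, 6], [6, 7], [7, 6]], float)  # group B

d = pdist(X)  # condensed Euclidean distance matrix
for method in ("single", "complete", "average", "ward"):
    Z = linkage(d, method=method)
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into two clusters
    print(f"{method:>8}: {labels}")
```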

Grouping of samples by their similarity is another important sensing task. Most techniques discussed above will yield clusters of similar responses. The similarity is computed as a distance d_ij between two samples x_i and x_j, which in N-dimensional space is given by the formula d_ij = [Σ_k |x_ik − x_jk|^p]^(1/p) (10.10). [Pg.327]

N represents the number of sensors in the array. For p = 2, the distance in (10.10) is Euclidean. The protocol is relatively simple. The distance matrix is created from the data points and scanned for the smallest values, which are then arranged and displayed in the form of a dendrogram (Fig. 10.9; Suslick, 2004) in which the dissimilarity is plotted on the horizontal axis. In a dendrogram, each horizontal line segment represents the distance—that is, the similarity—between samples. Thus, if we want [Pg.327]
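
A minimal sketch of the protocol described above, using hypothetical sensor responses: the full distance matrix of equation (10.10) is built for a chosen p, and the smallest off-diagonal entry (the first pair to be merged into the dendrogram) is located. The array responses, the value of p, and the use of NumPy are assumptions of this sketch.

```python
import numpy as np

def minkowski_matrix(X, p=2):
    """Distance matrix d_ij = (sum_k |x_ik - x_jk|^p)^(1/p) over all sample pairs."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = np.sum(np.abs(X[i] - X[j]) ** p) ** (1.0 / p)
    return D

# Hypothetical responses of a four-sensor array (N = 4) to five samples.
X = np.array([[0.1, 0.4, 0.3, 0.9],
              [0.2, 0.5, 0.3, 0.8],
              [0.9, 0.1, 0.7, 0.2],
              [0.8, 0.2, 0.6, 0.1],
              [0.5, 0.5, 0.5, 0.5]])

D = minkowski_matrix(X, p=2)                       # p = 2 gives the Euclidean case
masked = D + np.diag(np.full(len(D), np.inf))      # ignore the zero diagonal
i, j = np.unravel_index(np.argmin(masked), D.shape)
print(f"most similar pair: samples {i} and {j}, distance {D[i, j]:.3f}")
```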

Chatfield and Collins (1980), in the introduction to their chapter on cluster analysis, quote the first sentence of a review article on cluster analysis by Cormack (1971): "The availability of computer packages of classification techniques has led to the waste of more valuable scientific time than any other statistical innovation (with the possible exception of multiple-regression techniques)." This is perhaps a little hard on cluster analysis and, for that matter, multiple regression, but it serves as a note of warning. The aim of this book is to explain the basic principles of the more popular and useful multivariate methods so that readers will be able to understand the results obtained from the techniques and, if interested, apply the methods to their own data. This is not a substitute for a formal training in statistics; the best way to avoid wasting one's own valuable scientific time is to seek professional help at an early stage. [Pg.103]

Dendrogram of water samples characterized by their concentrations of Ca, K, Na, and Si (from Scarminio et al. 1982, with kind permission). [Pg.105]

The dendrogram in Fig. 5.9 is derived from a data matrix of ED50 values for 40 neuroleptic compounds tested in 12 different assays in rats (Lewi 1976). This is an example of a situation in which the data involves multiple dependent variables (see Chapter 8), but here the multiple biological data is used to characterize the tested compounds. The figure [Pg.105]

The final example of a dendrogram to be shown here, Fig. 5.10, is also one of the largest. This figure shows one thousand conformations of an insecticidal pyrethroid analogue (see Fig. 5.5) described by the values of four torsion angles (Hudson et al. 1992). A dendrogram such as this was used for the selection of representative conformations from the one thousand conformations produced by molecular dynamics simulation. [Pg.106]

[Labels from the Fig. 5.9 dendrogram: haloperidol, spiroperidol, trifluperazine, haloperidide, thioperazine, perphenazine, triperidol, ...] [Pg.107]

Fuzzy clustering methods that have recently become popular are distinct from traditional clustering techniques in that molecules are permitted to belong to multiple clusters or have fractional membership in all clusters. A potential advantage of such classification schemes is that more than one similarity relationship can be established by cluster analysis. [Pg.13]
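
As an illustration of fractional membership, the following is a minimal fuzzy c-means sketch written directly in NumPy; each object ends up with a membership value in every cluster rather than a single hard assignment. The data, the fuzziness exponent m, and the implementation details are assumptions of this sketch, not a specific algorithm named in the text.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: returns cluster centres and the membership
    matrix U (n_objects x c), whose rows sum to 1."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)              # fractional memberships
    for _ in range(n_iter):
        Um = U ** m
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distances from every object to every centre (avoid exact zeros).
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Standard membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
    return centres, U

# Hypothetical 2-D descriptor values for a handful of molecules.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [1.0, 1.1], [1.2, 0.9], [0.6, 0.6]])
centres, U = fuzzy_c_means(X, c=2)
print(np.round(U, 2))   # the intermediate point gets appreciable membership in both clusters
```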


Abstract. A smooth empirical potential is constructed for use in off-lattice protein folding studies. Our potential is a function of the amino acid labels and of the distances between the Ca atoms of a protein. The potential is a sum of smooth surface potential terms that model solvent interactions and of pair potentials that are functions of a distance, with a smooth cutoff at 12 Angstrom. Techniques include the use of a fully automatic and reliable estimator for smooth densities, of cluster analysis to group together amino acid pairs with similar distance distributions, and of quadratic programming to find appropriate weights with which the various terms enter the total potential. For nine small test proteins, the new potential has local minima within 1.3-4.7A of the PDB geometry, with one exception that has an error of S.SA. [Pg.212]

Keywords: protein folding, tertiary structure, potential energy surface, global optimization, empirical potential, residue potential, surface potential, parameter estimation, density estimation, cluster analysis, quadratic programming... [Pg.212]

Other methods consist of algorithms based on multivariate classification techniques or neural networks; they are constructed for automatic recognition of structural properties from spectral data, or for simulation of spectra from structural properties [83]. Multivariate data analysis for spectrum interpretation is based on the characterization of spectra by a set of spectral features. A spectrum can be considered as a point in a multidimensional space with the coordinates defined by spectral features. Exploratory data analysis and cluster analysis are used to investigate the multidimensional space and to evaluate rules to distinguish structure classes. [Pg.534]

There is no correct method of performing cluster analysis and a large number of algorithms have been devised from which one must choose the most appropriate approach. There can also be a wide variation in the efficiency of the various cluster algorithms, which may be an important consideration if the data set is large. [Pg.507]

A cluster analysis requires a measure of the similarity (or dissimilarity) between pairs of objects. When comparing conformations, the RMSD would be an obvious measure to use. [Pg.507]
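
A small sketch of the kind of dissimilarity measure mentioned above: the RMSD between two conformations with matched atom ordering. The coordinates are hypothetical, and no optimal superposition (e.g. a Kabsch alignment) is performed here, which a real comparison would normally include.

```python
import numpy as np

def rmsd(conf_a, conf_b):
    """Root-mean-square deviation between two (n_atoms, 3) coordinate sets
    with identical atom ordering. Pre-aligned coordinates are assumed."""
    diff = conf_a - conf_b
    return np.sqrt((diff ** 2).sum() / len(conf_a))

# Two hypothetical four-atom conformations.
a = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.3, 1.2, 0.0], [3.0, 1.0, 1.1]])
b = np.array([[0.1, 0.0, 0.0], [1.4, 0.1, 0.0], [2.5, 1.0, 0.1], [3.2, 1.3, 0.9]])
print(f"RMSD = {rmsd(a, b):.3f}")
```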

The aim of cluster analysis is to group together similar objects. [Pg.508]

The dimensionality of a data set is the number of variables that are used to describe each object. For example, a conformation of a cyclohexane ring might be described in terms of the six torsion angles in the ring. However, it is often found that there are significant correlations between these variables. Under such circumstances, a cluster analysis is often facilitated by reducing the dimensionality of a data set to eliminate these correlations. Principal components analysis (PCA) is a commonly used method for reducing the dimensionality of a data set. [Pg.513]
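
For instance, a set of correlated torsion-angle descriptors could be projected onto a few principal components before clustering. The sketch below does this on hypothetical, correlated data; the data, and the use of scikit-learn and SciPy, are assumptions of this sketch rather than tools named in the text.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)

# Hypothetical data: 50 "conformations" described by 6 correlated variables
# (standing in for six ring torsion angles).
latent = rng.normal(size=(50, 2))            # two underlying degrees of freedom
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(50, 6))

# Reduce to the principal components that carry most of the variance,
# then cluster in the reduced space.
pca = PCA(n_components=2)
scores = pca.fit_transform(X)
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))

Z = linkage(pdist(scores), method="average")
labels = fcluster(Z, t=3, criterion="maxclust")
print("cluster labels:", labels)
```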

Aldenderfer M S and R K Blashfield 1984. Cluster Analysis. Newbury Park, CA, Sage Publications. [Pg.521]

Chatfield C and A J Collins 1980. Introduction to Multivariate Analysis. London, Chapman & Hall. Desiraju G R 1997. Crystal Gazing: Structure Prediction and Polymorphism. Science 278:404-405. Everitt B S 1993. Cluster Analysis. Chichester, John Wiley & Sons. [Pg.521]

Although knowledge-based potentials are most popular, it is also possible to use other types of potential function. Some of these are more firmly rooted in the fundamental physics of interatomic interactions whereas others do not necessarily have any physical interpretation at all but are able to discriminate the correct fold from decoy structures. These decoy structures are generated so as to satisfy the basic principles of protein structure such as a close-packed, hydrophobic core [Park and Levitt 1996]. The fold library is also clearly important in threading. For practical purposes the library should obviously not be too large, but it should be as representative of the different protein folds as possible. To derive a fold database one would typically first use a relatively fast sequence comparison method in conjunction with cluster analysis to identify families of homologues, which are assumed to have the same fold. A sequence identity threshold of about 30% is commonly... [Pg.562]

Selection of Diverse Sets Using Cluster Analysis... [Pg.698]

In dissimilarity-based compound selection the required subset of molecules is identified directly, using an appropriate measure of dissimilarity (often taken to be the complement of the similarity). This contrasts with the two-stage procedure in cluster analysis, where it is first necessary to group together the molecules and then decide which to select. Most methods for dissimilarity-based selection fall into one of two categories: maximum dissimilarity algorithms and sphere exclusion algorithms [Snarey et al. 1997]. [Pg.699]
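
A minimal sketch of one maximum dissimilarity algorithm, the MaxMin variant: starting from a seed, each step adds the molecule whose minimum distance to the already selected set is largest. The descriptor values are hypothetical, and the MaxMin formulation shown is one common variant rather than the specific algorithm of the cited reference.

```python
import numpy as np

def maxmin_select(X, n_select, seed_index=0):
    """Maximum-dissimilarity (MaxMin) subset selection on a descriptor matrix X."""
    selected = [seed_index]
    # Distance of every molecule to its nearest already-selected molecule.
    min_dist = np.linalg.norm(X - X[seed_index], axis=1)
    while len(selected) < n_select:
        nxt = int(np.argmax(min_dist))           # most dissimilar to the current subset
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

# Hypothetical 2-D descriptors for ten molecules.
X = np.random.default_rng(7).random((10, 2))
print("selected molecules:", maxmin_select(X, n_select=4))
```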

A major potential drawback with cluster analysis and dissimilarity-based methods for selecting diverse compounds is that there is no easy way to quantify how completely one has filled the available chemical space or to identify whether there are any holes. This is a key advantage of the partition-based approaches (also known as cell-based methods). A number of axes are defined, each corresponding to a descriptor or some combination of descriptors. Each axis is divided into a number of bins. If there are n axes and each is divided into b bins then the number of cells in the multidimensional space so created is b^n. [Pg.701]
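
A sketch of the cell-based bookkeeping described above, under the assumption of hypothetical descriptors: with n axes and b bins per axis there are b^n cells, and coverage can be reported simply as the fraction of cells occupied by at least one compound.

```python
import numpy as np

def cell_coverage(X, bins_per_axis):
    """Assign each compound to a cell of a b-bins-per-axis grid over the descriptor
    ranges and report how much of the b**n-cell space is occupied."""
    n_axes = X.shape[1]
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Bin index along each axis (clipped so the maximum value falls in the last bin).
    idx = np.floor((X - lo) / (hi - lo) * bins_per_axis).astype(int)
    idx = np.clip(idx, 0, bins_per_axis - 1)
    occupied = {tuple(row) for row in idx}
    total = bins_per_axis ** n_axes
    return len(occupied), total

# Hypothetical 3-descriptor data for 200 compounds, 4 bins per axis -> 4**3 = 64 cells.
X = np.random.default_rng(0).random((200, 3))
occ, total = cell_coverage(X, bins_per_axis=4)
print(f"{occ} of {total} cells occupied ({occ / total:.0%} coverage)")
```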

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of the distribution are assumed in the underlying statistical method. A non-parametric technique does not rely upon the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model; an unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]

In a typical application of hierarchical cluster analysis, measurements are made on the samples and used to calculate interpoint distances using an appropriate distance metric. The general distance, d_ij, is given by... [Pg.422]

Fig. 10. R-mode cluster analysis of the Pacific Northwest rainwater study (24). Reprinted with permission.
D. L. Massart and L. Kaufman, The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis, John Wiley & Sons, Inc., New York, 1983. [Pg.431]

The PLS calibration set was built by mixing, in an agate mortar, different amounts of Mancozeb standard with kaolin, a coadjuvant usually formulated in agrochemicals. Cluster analysis was employed for sample classification and to select the adequate PLS model according to the characteristics of the sample matrix and the presence of other components. [Pg.93]

In order to evaluate possible classes among the samples considered, a clustering analysis was carried out before PFS treatment so that a reduced but well-representative calibration set could be properly selected. [Pg.142]

It has been shown that similar conformations that belong to adjacent energy basins separated by high energy barriers are incorrectly grouped together by the straightforward cluster analysis [29]. [Pg.86]

H Späth. Cluster Analysis Algorithms for Data Reduction and Classification of Objects. Chichester: Ellis Horwood, 1980. [Pg.90]

D Shalon, SJ Smith, PO Brown. A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Res 6:639-645, 1996. MB Eisen, PT Spellman, PO Brown, D Botstein. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95:14863-14868, 1998. [Pg.348]

P Willett, V Winterman, D Bawden. Implementation of nonhierarchic cluster analysis methods in chemical information systems: Selection of compounds for biological testing and clustering of substructure search output. J Chem Inf Comput Sci 26:109-118, 1986. [Pg.368]



Agglomerative cluster analysis methods

Analytical methods cluster analysis

Average-linkage cluster analysis

CA, see Cluster Analysis

Chemometric tools cluster analysis

Chemometrics cluster analysis

Classification hierarchical cluster analysis

Cluster Analysis (HCA)

Cluster Analysis Recognition of Inherent Data Structures

Cluster analysis (continued)

Cluster analysis (continued), problem

Cluster analysis additive trees

Cluster analysis approach

Cluster analysis average levels

Cluster analysis complete linkage clustering

Cluster analysis data mining

Cluster analysis definition

Cluster analysis examples

Cluster analysis exploratory data

Cluster analysis fuzzy clustering

Cluster analysis group separation

Cluster analysis grouping

Cluster analysis hierarchical clustering

Cluster analysis linkage methods

Cluster analysis multiple responses

Cluster analysis next steps

Cluster analysis of atmospheric

Cluster analysis of atmospheric particles

Cluster analysis parameters

Cluster analysis results

Cluster analysis techniques

Cluster analysis theoretical principles

Cluster analysis, pattern recognition technique

Cluster significance analysis

Clustering analysis

Clusters population analysis

Coal cluster analysis

Complete linkage cluster analysis

Dendrograms cluster analysis

Distance geometry cluster analysis

Euclidean distance cluster analysis

Examples of cluster and factor analyses

Exploratory data analysis clustering techniques

Fuzzy cluster analysis

Grouping, clustering analysis

Hierarchical cluster analysis

Hierarchical cluster analysis (HCA)

Hierarchical cluster analysis description

Hierarchical cluster analysis example

Hierarchical clustering analysis

Hydrophobic cluster analysis

Isotope cluster analysis

Main-group clusters fragment analysis

Metal clusters orbital analysis

Multivariate analysis clustering

Multivariate statistical analysis cluster analyses

Multivariate statistical techniques clusters analysis

Neutral clusters beam analysis

Nonhierarchical cluster analysis

Nonmetric Clustering and Association Analysis

Nonmetric clustering analysis

Objectives of Cluster Analysis

Osmium clusters bonding analysis

PCA and cluster analysis

Peptides clustering analysis

Polymorphism cluster analysis

Procedure for Cluster Analysis

Regression analysis clustered activity data

Results of Clustering Analyses

Single linkage cluster analysis

Statistical test cluster analysis

Substituents cluster analysis

The Move from Cluster Analyses to Neural Networks

Transmembrane cluster analysis

Tree cluster analysis

UPGMA cluster analysis

Unsupervised Pattern Recognition Cluster Analysis

Unsupervised cluster analysis

Unsupervised hierarchical clustering analysis

Ward algorithm, cluster analysis
