Cluster analysis techniques

Apart from the use of vocabulary control, there are many systems which aid retrieval by classification. (Indeed, vocabulary control is only one form of classification.) It has already been mentioned that the secondary services commonly classify and rearrange the documents they cover. These classifications are frequently computerised and so can be used for computer retrieval. For example, the Universal Decimal Classification 44) is in use as a basis of computerised systems (e.g. 45)). Much work is also going on in the development of automatic classifications, based on cluster analysis techniques 46,47). In these cases the classification may be applied by assigning the appropriate codes to file items stored in some entirely different order (e.g. chronologically or alphabetically by source journal). Alternatively, the classification scheme may be used as a basis for organising the file, so that all file items falling into a particular class occur together. This is most commonly achieved by the use of the inverted file approach discussed in Section II. [Pg.81]
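The inverted file idea mentioned in this excerpt can be illustrated with a minimal sketch. The records, field names and class codes below are hypothetical and chosen only for illustration; the excerpt does not specify any particular file layout. The items are stored chronologically, and the inverted file maps each classification code to the items that carry it.

```python
from collections import defaultdict

# Hypothetical file items stored in chronological order.
# The class codes are illustrative only, not taken from the source.
records = [
    {"id": 1, "year": 1970, "codes": ["543", "681.3"]},
    {"id": 2, "year": 1971, "codes": ["547"]},
    {"id": 3, "year": 1972, "codes": ["543"]},
]

def build_inverted_file(items):
    """Map each classification code to the ids of the items that carry it."""
    index = defaultdict(list)
    for item in items:
        for code in item["codes"]:
            index[code].append(item["id"])
    return dict(index)

inverted = build_inverted_file(records)
print(inverted["543"])  # ids of all items assigned to class 543
```

With such an index, all items in a class can be retrieved together without reordering the underlying file.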

Cluster analysis techniques are usually limited to rather small data sets with 20 to 100 patterns. [Pg.96]

Abstract. A smooth empirical potential is constructed for use in off-lattice protein folding studies. Our potential is a function of the amino acid labels and of the distances between the Cα atoms of a protein. The potential is a sum of smooth surface potential terms that model solvent interactions and of pair potentials that are functions of a distance, with a smooth cutoff at 12 Angstrom. Techniques include the use of a fully automatic and reliable estimator for smooth densities, of cluster analysis to group together amino acid pairs with similar distance distributions, and of quadratic programming to find appropriate weights with which the various terms enter the total potential. For nine small test proteins, the new potential has local minima within 1.3-4.7A of the PDB geometry, with one exception that has an error of S.SA. [Pg.212]

Other methods consist of algorithms based on multivariate classification techniques or neural networks; they are constructed for automatic recognition of structural properties from spectral data, or for simulation of spectra from structural properties [83]. Multivariate data analysis for spectrum interpretation is based on the characterization of spectra by a set of spectral features. A spectrum can be considered as a point in a multidimensional space with the coordinates defined by spectral features. Exploratory data analysis and cluster analysis are used to investigate the multidimensional space and to evaluate rules to distinguish structure classes. [Pg.534]

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of the distribution are assumed in the underlying statistical method. A non-parametric technique does not rely upon the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model. An unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]

Analytical results are often represented in a data table, e.g., a table of the fatty acid compositions of a set of olive oils. Such a table is called a two-way multivariate data table. Because some olive oils may originate from the same region and others from a different one, the complete table has to be studied as a whole instead of as a collection of individual samples, i.e., the results of each sample are interpreted in the context of the results obtained for the other samples. For example, one may ask for natural groupings of the samples in clusters with a common property, namely a similar fatty acid composition. This is the objective of cluster analysis (Chapter 30), which is one of the techniques of unsupervised pattern recognition. The results of the clustering do not depend on the way the results have been arranged in the table, i.e., the order of the objects (rows) or the order of the fatty acids (columns). In fact, the order of the variables or objects has no particular meaning. [Pg.1]

Multivariate chemometric techniques have subsequently broadened the arsenal of tools that can be applied in QSAR. These include, among others, multivariate ANOVA [9], Simplex optimization (Section 26.2.2), cluster analysis (Chapter 30) and various factor analytic methods such as principal components analysis (Chapter 31), discriminant analysis (Section 33.2.2) and canonical correlation analysis (Section 35.3). An advantage of multivariate methods is that they can be applied in... [Pg.384]

In this chapter we have only addressed a selected number of topics and for lack of space we have left out many others. Cluster analysis has played a larger role in QSAR than appears from our overview. This technique is an established QSAR tool in recognition or classification of known patterns [38,60] as well as for cognition or detection of novel patterns [61]. [Pg.416]

In the MD/QM technique each tool is used separately, in an attempt to exploit their particular strengths. Classical molecular dynamics as a very fast sampling technique is first used for efficient sampling of the conformational space for the molecule of interest. A cluster analysis of the MD trajectory is then used to identify the main conformers (clusters). Finally QM calculations, which provide a more accurate (albeit more computationally expensive) representation of the system, can be applied to just a small number of snapshots carefully extracted from each representative cluster from the MD-generated trajectory. [Pg.4]
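The cluster-then-select step of this workflow can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not say which clustering algorithm or similarity measure is used, so a simple leader algorithm with Euclidean distance on hypothetical two-component frame descriptors stands in here, and the function names `leader_clusters` and `medoid` are invented for the example. Real conformational clustering typically relies on measures such as RMSD between structures.

```python
import math

def leader_clusters(frames, cutoff):
    """Assign each frame to the first cluster whose leader lies within cutoff."""
    clusters = []  # lists of frame indices; the first index is the leader
    for i, frame in enumerate(frames):
        for members in clusters:
            if math.dist(frame, frames[members[0]]) <= cutoff:
                members.append(i)
                break
        else:
            # No existing leader is close enough: start a new cluster.
            clusters.append([i])
    return clusters

def medoid(members, frames):
    """Frame index with the smallest summed distance to its cluster mates."""
    return min(
        members,
        key=lambda i: sum(math.dist(frames[i], frames[j]) for j in members),
    )

# Hypothetical per-frame descriptors (assumed, not from the source).
trajectory = [(60, 180), (62, 178), (58, 183), (-65, 60), (-63, 64), (-66, 58)]
clusters = leader_clusters(trajectory, cutoff=15.0)
snapshots = [medoid(c, trajectory) for c in clusters]
print(clusters, snapshots)
```

Only the frames listed in `snapshots` would then be passed on to the more expensive QM calculation.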

Intact bacteria were first introduced into a mass spectrometer for analysis of molecular biomarkers without processing and fractionation around 1975 [6]. The ionization techniques available at the time limited analysis to secondary metabolites that could be volatilized, such as quinones and diglycerides, and vigorous pyrolysis of bacteria was explored as an alternative [7]. Although biomarkers were destroyed in pyrolysis strategies, computer-supported cluster analysis was developed to characterize pure samples. [Pg.257]

One of the most used techniques of non-hierarchical cluster analysis is the density method (potential method). The high density of objects in the m-dimension that characterizes clusters is estimated by means of a density function (potential function) P. For this, the objects are modelled by Gaus-... [Pg.259]
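As a rough illustration of a density (potential) function, the sketch below sums Gaussian kernels centred on the objects. Since the excerpt breaks off before giving the formula, the unnormalised kernel form, the `width` parameter and the sample coordinates are assumptions made for this example rather than the source's definition of P.

```python
import math

def potential(x, objects, width=1.0):
    """Sum of Gaussian kernels centred on the objects, evaluated at x."""
    return sum(
        math.exp(-math.dist(x, obj) ** 2 / (2.0 * width ** 2))
        for obj in objects
    )

# Two hypothetical groups of objects in a two-dimensional feature space.
objects = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (4.0, 4.0), (4.2, 3.9)]

inside = potential((0.0, 0.1), objects)   # point within the first group
between = potential((2.0, 2.0), objects)  # point in the gap between groups
print(inside > between)
```

Regions where the potential is high correspond to dense groups of objects, while the low values between them mark the cluster boundaries.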

Use of multivariate classification modelling approaches based on cluster analysis, factor analysis and the SIMCA technique [98,99], and the Kohonen artificial neural network [100]. All these methods, though rarely implemented, lead to very good results not achievable with classical strategies (comparisons, amino acid ratios, flow charts) and, moreover, it is possible to know the confidence level of the classification carried out. [Pg.251]

Cluster analysis will be discussed in Chapter 6 in detail. Here we introduce cluster analysis as an alternative nonlinear mapping technique for exploratory data analysis. The method allows gaining more insight into the relations between the objects if a... [Pg.96]

Another very useful exploration technique is cluster analysis, which quantifies similarities by calculating mathematical distances. The typical graphic output is a dendrogram. A common method of cluster analysis is hierarchical cluster analysis (HCA). [Pg.62]
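A minimal sketch of hierarchical cluster analysis is given below. It assumes Euclidean distance and single linkage, neither of which the excerpt specifies, and uses made-up sample coordinates. The returned merge list (which groups were joined, and at what distance) is the information a dendrogram displays.

```python
import math

def single_linkage(points):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest nearest-member distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical samples described by two measured variables.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
merges = single_linkage(samples)
for left, right, height in merges:
    print(left, right, round(height, 2))
```

Each merge distance corresponds to the height at which two branches join in the dendrogram, so a large jump in height suggests well-separated groups.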

In most applications chemometric methods are applied to analytical data in an off-line mode; that is, data have already been obtained by conventional techniques and are then subjected to a particular chemometric method. Examples of this use are in cluster analysis and in pattern recognition. They are applied to spectroscopic, chromatographic, and other analytical data. [Pg.101]

Barnard, J.M., Downs, G.M., Scholley-Pfab, A.V., and Brown, R.D. Use of Markush structure analysis techniques for descriptor generation and clustering of large combinatorial libraries. J. Mol. Graph. 2000, 38, 452-463. [Pg.112]

Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...