Statistical methods clustering

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of that distribution are assumed in the underlying statistical method. A non-parametric technique does not rely upon the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model. An unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]

In the following sections we propose typical methods of unsupervised learning and pattern recognition, the aim of which is to detect patterns in chemical, physicochemical and biological data, rather than to make predictions of biological activity. These inductive methods are useful in generating hypotheses and models which are to be verified (or falsified) by statistical inference. Cluster analysis has... [Pg.397]

Even when the patterns are known to cluster, there remain difficult issues that must be addressed before a kernel-based approach can be used effectively. Two of the more fundamental conceptual issues are the number and size of clusters that should be used to characterize the pattern classes. These are issues for which there are no hard and fast answers. Despite the application of well-developed statistical methods, including squared-error indices and variance analysis, determining the number and size of clusters remains extremely formidable. [Pg.60]

Percolation theory describes [32] the random growth of molecular clusters on a d-dimensional lattice. It has been suggested that it may describe gelation better than the classical statistical methods (which are in fact equivalent to percolation on a Bethe lattice or Cayley tree, Fig. 7a), since the mean-field assumptions (unlimited mobility and accessibility of all groups) are avoided [16,33]. In contrast, immobility of all clusters is implied, which is unrealistic because of the translational diffusion of small clusters. An important fundamental feature of percolation is the existence of a critical value pc of p (the bond formation probability in random bond percolation) beyond which the probability of finding a percolating cluster, i.e. a cluster which spans the whole sample, is non-zero. [Pg.181]
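The critical behaviour described above can be illustrated with a small Monte Carlo sketch of random bond percolation. This is only an illustration, not the method of the cited work: it assumes a two-dimensional square lattice of n x n sites, treats "spanning" as a connection between the top and bottom rows, and the function names, lattice size, trial count and seed are all hypothetical choices.

```python
import random

def find(parent, i):
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[ra] = rb

def spans(n, p, rng):
    """True if open bonds connect the top row to the bottom row."""
    parent = list(range(n * n + 2))
    top, bottom = n * n, n * n + 1  # two virtual nodes
    for c in range(n):
        union(parent, top, c)                    # top row
        union(parent, bottom, (n - 1) * n + c)   # bottom row
    for r in range(n):
        for c in range(n):
            site = r * n + c
            # Each nearest-neighbour bond forms independently with probability p.
            if c + 1 < n and rng.random() < p:
                union(parent, site, site + 1)
            if r + 1 < n and rng.random() < p:
                union(parent, site, site + n)
    return find(parent, top) == find(parent, bottom)

def spanning_fraction(n, p, trials, seed=0):
    """Fraction of random lattices that contain a spanning cluster."""
    rng = random.Random(seed)
    return sum(spans(n, p, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    for p in (0.3, 0.4, 0.5, 0.6, 0.7):
        print(p, spanning_fraction(30, p, trials=200))
```

On a finite lattice the spanning fraction rises smoothly rather than jumping at pc; for bond percolation on the infinite square lattice the threshold is known to be 1/2, so the steepest rise should appear near p = 0.5.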

Hierarchical Cluster Analysis (HCA) is a multivariate statistical method that can be used to assign groundwater samples or monitoring sites to distinct categories (hydrochemical facies). HCA offers several advantages over other methods of... [Pg.75]
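As a rough illustration of how HCA assigns samples to categories, the sketch below performs agglomerative clustering and stops once a chosen number of groups remains. Euclidean distance, average linkage, the two-variable sample data and all function names are assumptions made for this example; the excerpt does not state which distance or linkage was used.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(points, c1, c2):
    # Mean pairwise distance between the members of two clusters.
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

if __name__ == "__main__":
    # Hypothetical samples: two scaled concentrations per sample.
    samples = [(1.0, 1.1), (1.2, 0.9), (5.0, 5.2), (5.1, 4.9), (9.0, 1.0)]
    print(agglomerate(samples, 3))
```

Recording the distance at each merge, instead of stopping at k, would give the information needed to draw a dendrogram.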

Computational methods have been applied to determine the connections in systems that are not well-defined by canonical pathways. This is either done by semi-automated and/or curated literature causal modeling [1] or by statistical methods based on large-scale data from expression or proteomic studies (a mostly theoretical approach is given by reference [2] and a more applied approach is in reference [3]). Many methods, including clustering, Bayesian analysis and principal component analysis have been used to find relationships and "fingerprints" in gene expression data [4]. [Pg.394]

The application of various mathematical statistical methods such as the information content derived from Shannon's equation, calculation of discriminative power and formation of cluster dendrograms indicated that the best separation can be achieved by mobile phases 2 and 10 [116]. [Pg.138]
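The information content mentioned above is commonly computed from Shannon's equation, I = -sum(p_i * log2(p_i)), where p_i is the fraction of compounds falling into group i. The sketch below assumes that compounds are grouped by binning their retention (Rf) values into classes of fixed width; the bin width, the example values and the function names are hypothetical and are not taken from the cited study.

```python
import math
from collections import Counter

def shannon_information(labels):
    """Shannon information content (in bits) of a list of group labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def rf_information(rf_values, width=0.1):
    # Assumption: compounds whose Rf values fall into the same bin of the
    # given width are treated as unresolved (same group).
    bins = [int(rf / width) for rf in rf_values]
    return shannon_information(bins)

if __name__ == "__main__":
    # Hypothetical Rf values for four compounds in one mobile phase.
    print(rf_information([0.12, 0.18, 0.55, 0.91]))
```

A mobile phase that spreads the compounds over more distinct groups gives a higher information content, which is why the quantity can be used to rank separations.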

At the level of individual hits, the database can be queried to retrieve either marketed BioPrint drugs that have that same activity, or the ADR associations discussed in the previous section can be queried to identify potential ADRs and their relative risks. At the profile level, compounds with similar profiles can be identified using standard statistical methods such as similarity metrics and hierarchical clustering. This similarity can be assessed using the whole panel of assays or by using selected subsets of those assays as determined by the user. Once compounds with similar profiles have been identified, in vivo data for the similar compounds can be accessed and examined for information that may permit the user to anticipate in vivo effects. [Pg.198]
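Profile-level similarity over a whole assay panel or a user-selected subset could be sketched as follows. The example assumes binary hit/no-hit profiles and the Tanimoto coefficient as the similarity metric. The excerpt does not say which metric is actually used, and the compound names, profiles and function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length binary profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention chosen here: two all-zero profiles count as identical.
    return both / either if either else 1.0

def most_similar(query, profiles, assays=None):
    """Rank reference compounds by similarity to the query profile.

    assays: optional list of assay indices; None means the whole panel.
    """
    idx = list(range(len(query))) if assays is None else list(assays)
    q = [query[i] for i in idx]
    scored = [(tanimoto(q, [p[i] for i in idx]), name)
              for name, p in profiles.items()]
    return sorted(scored, reverse=True)

if __name__ == "__main__":
    # Hypothetical hit profiles over five assays (1 = active).
    reference = {"cmpd_A": [1, 1, 0, 0, 1], "cmpd_B": [0, 0, 1, 1, 0]}
    query = [1, 1, 0, 0, 0]
    print(most_similar(query, reference))
    print(most_similar(query, reference, assays=[0, 1]))
```

Restricting the comparison to a subset of assays changes the ranking only through the columns that are kept, which mirrors the user-selected subsets described above.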

Median partitioning is another statistical method distinct from RR. The development of this methodology was driven by the need to select representative subsets from very large compound pools. Hierarchical clustering techniques... [Pg.292]
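The excerpt is truncated before the procedure is described, so the following is only a sketch under a stated assumption: that median partitioning splits the compound pool at the median of each descriptor and groups compounds by the resulting pattern of "above" and "at or below" flags. The descriptor values and the function name are hypothetical.

```python
from statistics import median

def median_partition(rows):
    """Group row indices by their above/below-median code per descriptor."""
    d = len(rows[0])
    medians = [median(r[j] for r in rows) for j in range(d)]
    partitions = {}
    for idx, r in enumerate(rows):
        # 1 if the value lies above the descriptor median, else 0.
        code = tuple(int(r[j] > medians[j]) for j in range(d))
        partitions.setdefault(code, []).append(idx)
    return partitions

if __name__ == "__main__":
    # Hypothetical pool: four compounds, two descriptors each.
    pool = [(1, 10), (2, 1), (8, 9), (9, 2)]
    for code, members in median_partition(pool).items():
        print(code, members)
```

Under this assumption a representative subset could be drawn by taking one compound from each populated partition. No pairwise distances are needed, which is what would make such a scheme attractive for very large pools.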

Hydrogen transport. The approach that they have used to predict proton transport through complex membranes such as Nafion is to use ab initio methods to determine the barriers for migration of hydrogen as a function of the donor-acceptor separation and then to employ a statistical method that is based on the ab initio results. This method allows a proton jump among water clusters when the configuration around the proton is appropriate. [Pg.338]

Graphical methods in connection with pattern recognition algorithms, i.e. geometrical or statistical methods, e.g. minimum spanning tree or cluster analysis, are more powerful methods for explorative data analysis than graphical methods alone. [Pg.152]

One has to keep in mind that groups of objects found by any clustering procedure are not statistical samples from a certain distribution of data. Nevertheless the groups or clusters are sometimes analyzed for their distinctness using statistical methods, e.g. by multivariate analysis of variance and discriminant analysis, see Section 5.6. As a result one could then discuss only those clusters which are statistically different from others. [Pg.157]

In soil science, the empirical description of soil horizons predominates. Only a few applications of statistical methods in this scientific field are described. SCHEFFER and SCHACHTSCHABEL [1992] give an example for the classification of different soils into soil groups using cluster analysis. They claim the objectivity of the results to be one advantage of multivariate statistical methods. [Pg.336]

A wide variety of dynamical approximations have been applied to cluster dynamics and kinetics. Most calculations to date are based on simplified potentials and classical mechanics or statistical methods. In the near future, we can expect to see more work with detailed potential energy surfaces (both analytic and implicitly defined by electronic structure calculations) and progress in sorting out quantum effects and treating them more accurately. [Pg.33]

Principal component analysis is a popular statistical method that tries to explain the covariance structure of data by means of a small number of components. These components are linear combinations of the original variables, and often allow for an interpretation and a better understanding of the different sources of variation. Because PCA is concerned with data reduction, it is widely used for the analysis of high-dimensional data, which are frequently encountered in chemometrics. PCA is then often the first step of the data analysis, followed by classification, cluster analysis, or other multivariate techniques [44]. It is thus important to find those principal components that contain most of the information. [Pg.185]
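To make the idea of components as linear combinations concrete, the sketch below centres the data, forms the sample covariance matrix and extracts the first principal component by power iteration. Power iteration is chosen here only because it needs no external library; it is not suggested by the source. The data and function names are hypothetical, and the code assumes the covariance matrix is not all zeros.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of mean-centred data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Leading eigenvalue and unit eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the variance along the component.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

if __name__ == "__main__":
    # Hypothetical data in which the second variable is twice the first.
    data = [(1, 2), (2, 4), (3, 6), (4, 8)]
    cov = covariance_matrix(data)
    eigenvalue, loadings = first_component(cov)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    print(loadings, eigenvalue / total_variance)
```

The ratio of the leading eigenvalue to the total variance is the fraction of information carried by the first component, which is the quantity used to decide how many components to keep.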

There are many other statistical models which can be used for the evaluation of DIGE studies. If the experiment includes not only a group factor but also a time factor, methods of the analysis of variance (ANOVA) can be applied to find expression changes within the temporal course of the protein expression or to find interactions between the group and time factors. Several multivariate statistical methods are of use, too. Spots with similar expression profiles can be grouped by cluster analysis or, on the other hand, new spots can be assigned to existing groups by the methods of discriminant analysis. [Pg.53]

Linear regression analysis has pitfalls. There is always the possibility of chance correlations. Hence, we opted to analyze the data using an alternate statistical method, namely cluster analysis. The data were scaled so that each of the descriptors ranged in value between 0 and 1. Minimal spanning tree methods were employed in the determination of clusters (24). [Pg.558]
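The two steps described above, scaling each descriptor to the range 0 to 1 and clustering with a minimal spanning tree, might be sketched as follows. The excerpt does not say how the tree was built or how clusters were read from it; this example assumes Prim's algorithm with Euclidean distances and obtains k clusters by removing the k - 1 longest tree edges. The descriptor values and function names are hypothetical.

```python
import math

def min_max_scale(rows):
    """Scale every descriptor (column) to the range 0..1."""
    d = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(d)]
    hi = [max(r[j] for r in rows) for j in range(d)]
    return [[(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(d)] for r in rows]

def minimum_spanning_tree(points):
    """Prim's algorithm; returns a list of (length, i, j) tree edges."""
    n = len(points)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n   # (distance to tree, nearest tree node)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v]:
                dist = math.dist(points[u], points[v])
                if dist < best[v][0]:
                    best[v] = (dist, u)
    return edges

def mst_clusters(points, k):
    """Cut the k - 1 longest tree edges and return the connected groups."""
    kept = sorted(minimum_spanning_tree(points))[: len(points) - k]
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for _, a, b in kept:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

if __name__ == "__main__":
    # Hypothetical descriptors on very different scales.
    descriptors = [(10, 0.1), (12, 0.2), (50, 0.9), (52, 1.0)]
    print(mst_clusters(min_max_scale(descriptors), 2))
```

Scaling first keeps a descriptor with a large numerical range from dominating the distances on which the tree is built.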
