High-dimensional data

Previous work in our group had shown the power of self-organizing neural networks for projecting high-dimensional datasets into two dimensions while preserving the clusters present in the high-dimensional space [27]. In effect, the result is a 2D map of the high-dimensional data that shows clusters of similar objects. [Pg.193]

One technique for high dimensional data is to reduce the number of dimensions being plotted. For example, one slice of a three-dimensional data set can be plotted with a two-dimensional technique. Another example is plotting the magnitude of vectors rather than the vectors themselves. [Pg.118]
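Both reductions mentioned above can be sketched in a few lines. This is a minimal illustration only: the helper names `take_slice` and `magnitudes` and the nested-list layout (indexed `[z][y][x]`) are assumptions made for this example, not part of the quoted source.

```python
import math

def take_slice(volume, z_index):
    """Return a copy of one 2D slice of a 3D nested list indexed [z][y][x]."""
    return [row[:] for row in volume[z_index]]

def magnitudes(vectors):
    """Replace each vector by its Euclidean length (one scalar per vector)."""
    return [math.sqrt(sum(c * c for c in v)) for v in vectors]

# Hypothetical 2 x 2 x 3 volume: two z-planes, each with 2 rows of 3 values.
volume = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

plane = take_slice(volume, 1)                         # 3D data -> 2D slice
speeds = magnitudes([(3.0, 4.0), (1.0, 2.0, 2.0)])    # vectors -> scalars
```

The slice can then be handed to any two-dimensional plotting technique, and the magnitudes to any scalar one; what is lost in each case is the information along the discarded dimension or the vector direction.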

As stated earlier, the main motivation for using either PCA or PCA is to construct a low-dimensional representation of the original high-dimensional data. The notion behind this approach is that the effective (or essential, as some call it [33]) dimensionality of a molecular conformational space is significantly smaller than its full dimensionality (3N-6 degrees of freedom for an N-atom molecule). Following the PCA procedure, each new... [Pg.87]

PPR is a linear projection-based method with nonlinear basis functions and can be described with the same three-layer network representation as a BPN (see Fig. 16). Originally proposed by Friedman and Stuetzle (1981), it is a nonlinear multivariate statistical technique suitable for analyzing high-dimensional data. Again, the general input-output relationship is given by Eq. (22). In PPR, the basis functions θm can adapt their shape to provide the best fit to the available data. [Pg.39]
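The structure described here, linear projections followed by nonlinear one-dimensional basis functions whose outputs are summed, can be sketched as a forward pass. Eq. (22) is not reproduced in this excerpt, so the sketch follows the general textbook form of a projection pursuit model rather than the source's notation. The function name `ppr_predict`, the projection directions, and the fixed basis functions are all assumptions; in an actual PPR fit the basis functions are estimated from the data rather than chosen in advance.

```python
import math

def ppr_predict(x, directions, ridge_functions):
    """Evaluate sum_m f_m(a_m . x): project x onto each direction a_m,
    pass the scalar through the matching 1D function f_m, and add up."""
    total = 0.0
    for alpha, basis in zip(directions, ridge_functions):
        z = sum(a * xi for a, xi in zip(alpha, x))  # linear projection
        total += basis(z)                           # nonlinear basis function
    return total

# Hypothetical model with two projection directions in a 2D input space.
directions = [(1.0, 0.0), (0.6, 0.8)]
ridge_functions = [lambda z: z * z, math.tanh]  # fixed here; fitted in real PPR

y = ppr_predict((2.0, 0.0), directions, ridge_functions)
```

For the input (2.0, 0.0) the two projections are 2.0 and 1.2, so `y` is the sum of 2.0 squared and tanh(1.2).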

Allison PD (2002) Missing data. Sage, Thousand Oaks, CA
Andrews DF (1972) Plots of high dimensional data. Biometrics 28 125... [Pg.282]

Buja A, Cook D, Swayne DF (1996) Interactive high-dimensional data visualization. J Computat Graph Stat 5 78... [Pg.282]

Projection: Crushing of high-dimensional data into two dimensions. [Pg.90]

Alternative models. In complex high-dimensional data sets there will commonly be a number of very different-looking models all of which describe the data about equally well. Scanning the forest uncovers these alternative models. The alternative models can look different but be relatively trivial, based on correlated variables. Or they can point to multiple mechanisms. [Pg.326]

Discriminant analysis (DA) classifies samples under an a priori hypothesis. This hypothesis is based on a previously determined TCA or other CA protocols. DA is also called "discriminant function analysis", and its natural extension is MDA (multiple discriminant analysis), which is sometimes named "discriminant factor analysis" or CDA (canonical discriminant analysis). Among these types of analysis, linear discriminant analysis (LDA) has been widely used to emphasize differences among sample classes. Another classification method is QDA (quadratic discriminant analysis) (Frank and Friedman, 1989), an extension of LDA. RDA (regularized discriminant analysis), a compromise between LDA and QDA, works better with varied class distributions and with high-dimensional data (Friedman, 1989). [Pg.94]
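The sense in which RDA sits between LDA and QDA can be illustrated with the covariance estimate each method uses: LDA uses one pooled covariance matrix for all classes, QDA uses a separate matrix per class, and a regularized estimate blends the two. The sketch below uses a simplified convex combination with a single mixing parameter. The name `regularized_covariance`, the parameter `alpha`, and the example matrices are assumptions for illustration, and the weighting in Friedman's original formulation is more detailed than this.

```python
def regularized_covariance(class_cov, pooled_cov, alpha):
    """Blend a class-specific covariance with the pooled covariance.

    alpha = 1.0 keeps the class covariance (QDA-like);
    alpha = 0.0 keeps the pooled covariance (LDA-like).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * c + (1.0 - alpha) * p for c, p in zip(class_row, pooled_row)]
        for class_row, pooled_row in zip(class_cov, pooled_cov)
    ]

# Hypothetical 2 x 2 covariance matrices.
class_cov = [[2.0, 0.0], [0.0, 4.0]]
pooled_cov = [[1.0, 0.5], [0.5, 1.0]]

lda_like = regularized_covariance(class_cov, pooled_cov, 0.0)
qda_like = regularized_covariance(class_cov, pooled_cov, 1.0)
halfway = regularized_covariance(class_cov, pooled_cov, 0.5)
```

With few samples and many variables the class-specific matrices are poorly estimated, which is why shrinking them toward the pooled matrix tends to help for high-dimensional data.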

Principal component analysis is a popular statistical method that tries to explain the covariance structure of data by means of a small number of components. These components are linear combinations of the original variables, and often allow for an interpretation and a better understanding of the different sources of variation. Because PCA is concerned with data reduction, it is widely used for the analysis of high-dimensional data, which are frequently encountered in chemometrics. PCA is then often the first step of the data analysis, followed by classification, cluster analysis, or other multivariate techniques [44]. It is thus important to find those principal components that contain most of the information. [Pg.185]
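As a rough illustration of the description above, the following sketch centers a small dataset, forms its covariance matrix, and extracts the first principal component together with the fraction of total variance it explains. It uses plain power iteration instead of a full eigendecomposition to stay dependency-free; that substitution, the function names, and the toy data are assumptions of this example, and a numerical library would normally be used in practice.

```python
import math

def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of cov by power iteration.
    Assumes the start vector is not orthogonal to the leading eigenvector."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(vi * ci for vi, ci in zip(v, cov_v))
    return v, eigenvalue

# Hypothetical data: two strongly correlated variables.
rows = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.1], [4.0, 3.9]]

centered = center(rows)
cov = covariance(centered)
direction, eigenvalue = first_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
scores = [sum(r[j] * direction[j] for j in range(len(direction))) for r in centered]
```

Here `direction` holds the weights of the linear combination of the original variables, `scores` is the one-dimensional representation of each sample, and `explained` indicates how much of the total variance that single component retains.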

Maronna, R. and Zamar, R.H., Robust multivariate estimates for high dimensional data sets, Technometrics, 44, 307-317, 2002. [Pg.214]

The next step was to use a Kohonen two-dimensional self-organizing map to represent the spectral data in the 218-dimensional measurement space. Self-organizing maps are, for the most part, used to visualize high-dimensional data. However, classification and prediction of multivariate data can also be performed with these... [Pg.368]

International Conference on Management of Data, Seattle, WA, 1998, pp. 94—105. Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. [Pg.38]

Mining, MIT Press, Cambridge, MA, 2000. Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching. [Pg.39]

A specialized method for similarity-based visualization of high-dimensional data is formed by self-organizing feature maps (SOM). The data items are arranged on a two-dimensional plane with the aid of neural networks, especially Kohonen nets. Similarity between data items is represented by spatial closeness, while large distances indicate major dissimilarities [968]. At the authors' department, a system called MIDAS had already been developed which combines strategies for the creation of feature maps with the supervised generation of fuzzy terms from the maps [967]. [Pg.680]
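The arrangement of data items on a two-dimensional plane can be sketched with a very small Kohonen-style map. This is a generic illustration under stated assumptions, not the MIDAS system or any method from the cited works: the grid size, the linearly decaying learning rate and neighbourhood width, the Gaussian neighbourhood function, and the toy data are all choices made for this example.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid coordinates of the node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )

def train_som(data, rows=4, cols=4, epochs=60, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rows x cols map; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac                    # learning rate shrinks over time
        sigma = max(sigma0 * frac, 0.5)    # neighbourhood narrows over time
        for x in data:
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for j in range(dim):
                    w[j] += lr * h * (x[j] - w[j])  # pull node toward sample
    return weights

# Hypothetical 4-dimensional data with two well-separated groups.
cluster_a = [[0.0, 0.1, 0.0, 0.1], [0.1, 0.0, 0.1, 0.0], [0.0, 0.0, 0.1, 0.1]]
cluster_b = [[1.0, 0.9, 1.0, 0.9], [0.9, 1.0, 0.9, 1.0], [1.0, 1.0, 0.9, 0.9]]

som = train_som(cluster_a + cluster_b)
bmu_a = best_matching_unit(som, cluster_a[0])
bmu_b = best_matching_unit(som, cluster_b[0])
```

Each sample is represented on the map by the grid position of its best matching unit, so `bmu_a` and `bmu_b` are the 2D coordinates at which items from the two groups would be drawn.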

Big Chemical Encyclopedia
