Classical Multidimensional Visualization Techniques

Example 7.1 A biomedical researcher has a multidimensional dataset in which he or she measured the expression levels of 878 genes in 33 muscular dystrophy samples of three different types. He or she wants to visualize this dataset to check the distribution patterns of the muscular dystrophy samples. If he or she is especially interested in patterns that reflect the variance in the dataset, what is an appropriate way to visualize it? [Pg.158]

Most statistical packages and tools support principal-component analysis (PCA). Principal-component analysis itself is not a visualization method, but the [Pg.158]

We first have to load the package (labdsv) that provides a PCA implementation. Note that there are many other PCA implementations in R. Next, we load the dataset into R and transpose the data matrix so that each muscular dystrophy sample occupies one row of the matrix. After running PCA on the transposed matrix, we can plot the result using the first two principal components (the eigenvectors with the largest eigenvalues) as the x- and y-axes (Fig. 7.1). [Pg.159]
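The workflow above (transpose, run PCA, plot the first two components) can be sketched without R. The following is a minimal pure-Python sketch, not the R package's implementation: it transposes a genes × samples matrix so samples become rows, centers each gene, finds the two leading eigenvectors of the covariance matrix by power iteration with deflation, and projects every sample onto them to obtain its (PC1, PC2) plot coordinates. The function names and the 4-gene, 6-sample toy matrix are hypothetical; a real analysis of 878 genes would use an established PCA routine.

```python
import math
import random


def transpose(m):
    """Swap rows and columns, e.g. genes x samples -> samples x genes."""
    return [list(col) for col in zip(*m)]


def center_columns(x):
    """Subtract each column mean so every variable (gene) has mean zero."""
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]


def covariance(xc):
    """Sample covariance matrix (p x p) of column-centered data with n rows."""
    n, p = len(xc), len(xc[0])
    return [[sum(r[a] * r[b] for r in xc) / (n - 1) for b in range(p)]
            for a in range(p)]


def top_eigenpairs(a, k, iters=1000, seed=0):
    """Leading k eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    a = [row[:] for row in a]
    size = len(a)
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(size)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        av = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
        lam = sum(vi * t for vi, t in zip(v, av))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next-largest eigenvalue.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(size)]
             for i in range(size)]
    return pairs


def pca_scores_2d(genes_by_samples):
    """Return one (PC1, PC2) point per sample plus the two leading eigenvalues."""
    samples = transpose(genes_by_samples)  # each sample becomes a row
    xc = center_columns(samples)
    (l1, v1), (l2, v2) = top_eigenpairs(covariance(xc), 2)
    scores = [(sum(r * c for r, c in zip(row, v1)),
               sum(r * c for r, c in zip(row, v2))) for row in xc]
    return scores, (l1, l2)


# Hypothetical toy data: 4 genes (rows) x 6 samples (columns), two sample groups.
toy = [
    [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    [2.0, 2.1, 1.8, 6.0, 6.2, 5.9],
    [0.5, 0.4, 0.6, 0.5, 0.7, 0.4],
    [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],
]
scores, (l1, l2) = pca_scores_2d(toy)
for i, (x, y) in enumerate(scores, start=1):
    print(f"sample {i}: PC1 = {x:+.3f}, PC2 = {y:+.3f}")
```

In the toy data the first three samples and the last three separate cleanly along PC1, which is the kind of grouping the scatter plot in Fig. 7.1 is meant to reveal. The sign of each axis is arbitrary, as with any PCA.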

Example 7.2 The same researcher as in Example 7.1 wants to visually examine the same multidimensional dataset, but is now interested in a low-dimensional projection that preserves the similarity/distance information of the original multidimensional dataset as much as possible. What is an appropriate way to visualize the dataset? [Pg.160]

Most statistical packages and tools also support multidimensional scaling (MDS), which has been widely used to visualize multidimensional datasets (Fig. 7.2). We can use MDS to generate a 2D projection in which the distances/similarities among data items in the original multidimensional space are preserved as much as possible. In other words, MDS optimizes the objective function F_mds = …, where d(i,j) is the distance between i and j in the [Pg.160]
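To make the distance-preserving idea concrete, the sketch below assumes one common form of the MDS objective, the raw stress: the sum over pairs i < j of (d(i,j) - d'(i,j))^2, taking d(i,j) as the distance in the original space and d'(i,j) (a hypothetical symbol) as the distance in the 2D layout. The exact objective used in the source may differ. The layout is found by plain gradient descent from a random start; the function names, learning rate, and toy points are all illustrative assumptions, and practical MDS implementations use more robust optimizers.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def raw_stress(delta, layout):
    """Sum over pairs i < j of (original distance - layout distance)^2."""
    n = len(layout)
    return sum((delta[i][j] - euclidean(layout[i], layout[j])) ** 2
               for i in range(n) for j in range(i + 1, n))


def mds_by_gradient_descent(points, dims=2, lr=0.02, iters=2000, seed=0):
    """Place points in `dims` dimensions by gradient descent on raw stress."""
    n = len(points)
    delta = [[euclidean(p, q) for q in points] for p in points]
    rng = random.Random(seed)
    y = [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n)]
    before = raw_stress(delta, y)
    for _ in range(iters):
        grad = [[0.0] * dims for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = euclidean(y[i], y[j])
                if d == 0.0:
                    continue  # gradient undefined for coincident points; skip pair
                # d/dy_i of (delta_ij - d_ij)^2 = 2 (d_ij - delta_ij) (y_i - y_j) / d_ij
                coef = 2.0 * (d - delta[i][j]) / d
                for k in range(dims):
                    grad[i][k] += coef * (y[i][k] - y[j][k])
        y = [[c - lr * g for c, g in zip(row, grow)] for row, grow in zip(y, grad)]
    return y, before, raw_stress(delta, y)


# Hypothetical 3D points to be projected to 2D.
pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0.5, 0.5, 2), (2, 2, 1)]
layout, before, after = mds_by_gradient_descent(pts)
print(f"stress before: {before:.4f}, after: {after:.4f}")
for p, (x, y) in zip(pts, layout):
    print(p, "->", (round(x, 3), round(y, 3)))
```

Unlike PCA, this iterative form of MDS only needs the pairwise distances, so any dissimilarity measure could be substituted for `euclidean` when building `delta`; gradient descent may, however, stop in a local minimum.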

Multidimensional scaling [70] is a method for obtaining the best low-dimensional representation of a high-dimensional dataset. Normally a two- or three-dimensional representation is required, since it can then be plotted and inspected visually for clusters. In the classical scaling technique, the low-dimensional representation is obtained by extracting the eigenvectors of the (objects × objects) dissimilarity matrix, after double-centering its squared entries. However, it can be shown that this operation is equivalent to a PCA of the (variables × variables) covariance matrix, provided that the distances in the dissimilarity matrix are Euclidean or near-Euclidean [53]. Since the covariance matrix is invariably of smaller order than the dissimilarity matrix, PCA is to be preferred on computational grounds. The only exception is when the dissimilarity matrix is available but the covariance matrix is not, a circumstance that rarely arises in structural chemistry work. [Pg.149]
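The equivalence between classical scaling and PCA for Euclidean distances can be checked numerically. The pure-Python sketch below (hypothetical function names and toy points) runs classical (Torgerson) scaling, taking the eigenvectors of the double-centered matrix B = -1/2 J D² J of squared Euclidean distances and scaling them by the square roots of their eigenvalues, and separately projects the centered data onto the leading covariance eigenvectors. The two sets of coordinates should agree up to the sign of each axis. It also shows the size argument: B has one row per object, while the covariance matrix has one row per variable.

```python
import math
import random


def top_eigenpairs(a, k, iters=2000, seed=0):
    """Leading k eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    a = [row[:] for row in a]
    size = len(a)
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(size)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        av = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
        lam = sum(vi * t for vi, t in zip(v, av))  # Rayleigh quotient
        pairs.append((lam, v))
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(size)]
             for i in range(size)]
    return pairs


def classical_scaling(points, k=2):
    """Classical (Torgerson) scaling from squared Euclidean distances."""
    n = len(points)
    d2 = [[sum((x - y) ** 2 for x, y in zip(p, q)) for q in points] for p in points]
    row_means = [sum(r) / n for r in d2]
    grand_mean = sum(row_means) / n
    # Double centering: B = -1/2 * J D2 J, where J = I - (1/n) * ones.
    b = [[-0.5 * (d2[i][j] - row_means[i] - row_means[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(b, k)  # b is n x n (one row per object)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(n)]


def pca_scores(points, k=2):
    """Project centered data onto the k leading covariance eigenvectors."""
    n = len(points)
    means = [sum(col) / n for col in zip(*points)]
    xc = [[v - m for v, m in zip(row, means)] for row in points]
    p = len(means)
    cov = [[sum(r[a] * r[c] for r in xc) / (n - 1) for c in range(p)]
           for a in range(p)]
    pairs = top_eigenpairs(cov, k)  # cov is p x p (one row per variable)
    return [[sum(x * w for x, w in zip(row, v)) for lam, v in pairs] for row in xc]


# Hypothetical toy data: 5 objects described by 2 variables.
pts = [(0.0, 0.0), (4.0, 1.0), (8.0, 1.5), (2.0, -1.0), (6.0, 0.5)]
mds = classical_scaling(pts)
pca = pca_scores(pts)
for m, s in zip(mds, pca):
    print([round(t, 4) for t in m], [round(t, 4) for t in s])
```

Here the classical-scaling step decomposes a 5 × 5 matrix while PCA decomposes only a 2 × 2 one, which is the computational reason given above for preferring PCA when there are more objects than variables.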

Big Chemical Encyclopedia
