Dimension reduction

Thus, we see that CCA forms a canonical analysis, namely a decomposition of each data set into a set of mutually orthogonal components. A similar type of decomposition is at the heart of many types of multivariate analysis, e.g. PCA and PLS. Under the assumption of multivariate normality for both populations the canonical correlations can be tested for significance [6]. Retaining only the significant canonical correlations may allow for a considerable dimension reduction. [Pg.320]
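
As a concrete illustration, the sketch below computes canonical correlations from the SVD of the whitened cross-covariance matrix and applies Bartlett's chi-square test, one standard significance test for canonical correlations (the specific test meant by ref. [6] is not identified here). The synthetic data and all variable names are illustrative assumptions, not taken from the excerpt above.

```python
# Minimal sketch: canonical correlations plus Bartlett's significance test.
# Synthetic data; variable names are illustrative assumptions.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, p, q = 200, 4, 3
latent = rng.normal(size=(n, 2))                      # two shared components
X = latent @ rng.normal(size=(2, p)) + 0.5 * rng.normal(size=(n, p))
Y = latent @ rng.normal(size=(2, q)) + 0.5 * rng.normal(size=(n, q))

# Canonical correlations are the singular values of the whitened
# cross-covariance matrix  Lx^{-1} Sxy Ly^{-T}  (Cholesky whitening).
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Sxx = Xc.T @ Xc / (n - 1)
Syy = Yc.T @ Yc / (n - 1)
Sxy = Xc.T @ Yc / (n - 1)
K = (np.linalg.inv(np.linalg.cholesky(Sxx)) @ Sxy
     @ np.linalg.inv(np.linalg.cholesky(Syy)).T)
r = np.linalg.svd(K, compute_uv=False)                # canonical correlations

# Bartlett's chi-square test: H0 is that correlations k+1, ..., min(p, q)
# are all zero; retain only components whose test rejects H0.
for k in range(len(r)):
    stat = -(n - 1 - (p + q + 1) / 2) * np.log(1 - r[k:] ** 2).sum()
    df = (p - k) * (q - k)
    print(f"r_{k+1} = {r[k]:.3f}  chi2 = {stat:.1f}  p = {chi2.sf(stat, df):.4f}")
```

With two shared latent components, the first two correlations test as significant and the rest do not, so the pair of data sets can be reduced to two canonical dimensions.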

All other methods discussed in this chapter provide such a dimension reduction. They search for the most interesting directions in Y-space and/or interesting directions in X-space that are linearly related. They differ in the optimizing criterion used to discover those interesting directions. [Pg.324]

Reduced rank regression (RRR), also known as redundancy analysis (or PCA on Instrumental Variables), is the combination of multivariate least squares regression and dimension reduction [7]. The idea is that more often than not the dependent Y-variables will be correlated. A principal component analysis of Y might indicate that A (A < m) PCs may explain Y adequately. Thus, a full set of m... [Pg.324]
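
The following sketch implements the idea under one common formulation of RRR: fit ordinary least squares, then truncate the fitted responses to their first A principal components. The synthetic data and the names (X, Y, A) are assumptions for illustration.

```python
# Sketch of reduced rank regression: OLS followed by rank-A truncation
# of the fitted Y-space. Data and names are illustrative.
import numpy as np

def reduced_rank_regression(X, Y, A):
    """Return a rank-A coefficient matrix: project the OLS prediction
    onto the first A principal components of the fitted Y."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ B_ols
    # PCA of the fitted responses: right singular vectors of Y_hat.
    _, _, Vt = np.linalg.svd(Y_hat - Y_hat.mean(0), full_matrices=False)
    V_A = Vt[:A].T
    return B_ols @ V_A @ V_A.T

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
# Correlated responses: five Y-variables all driven by two combinations of X.
Y = (X @ rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
     + 0.1 * rng.normal(size=(100, 5)))
B_rrr = reduced_rank_regression(X - X.mean(0), Y - Y.mean(0), A=2)
print(np.linalg.matrix_rank(B_rrr))   # 2: all five responses via A=2 dimensions
```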

PCA is useful not only from a dimension reduction standpoint, but also in that the scores themselves provide additional information important in data analysis (Kresta and MacGregor, 1991; Piovoso and Kosanovich, 1994). A simple example of PCA is illustrated in Fig. 11. [Pg.26]
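
A minimal sketch of one such use of the scores, in the spirit of the process-monitoring literature cited above: flag unusual samples with Hotelling's T-squared computed on the retained components. The synthetic data and the choice of three components are assumptions.

```python
# Sketch: PCA scores used to flag unusual observations via Hotelling's T^2.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 8))
X[-1] += 6.0                          # one deliberately aberrant sample

Xc = X - X.mean(0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
A = 3
T = Xc @ Vt[:A].T                     # scores on the first A PCs
lam = s[:A] ** 2 / (len(X) - 1)       # variance of each score
t2 = (T ** 2 / lam).sum(axis=1)       # Hotelling T^2 per sample
print(t2.argmax(), round(t2.max(), 1))   # the aberrant sample stands out
```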

Work on dimension reduction methods for both input and input-output modeling and for interpretation has produced considerable practical interest, development, and application, so that this family of nonlocal methods is becoming a mainstream set of technologies. This section focuses on dimension reduction as a family of interpretation methods by relating to the descriptions in the input and input-output sections and then showing how these methods are extended to interpretation. [Pg.47]

The most serious problem with input analysis methods such as PCA that are designed for dimension reduction is the fact that they focus only on pattern representation rather than on discrimination. Good generalization from a pattern recognition standpoint requires the ability to identify characteristics that both define and discriminate between pattern classes. Methods that do one or the other are insufficient. Consequently, methods such as PLS that simultaneously attempt to reduce the input and output dimensionality while finding the best input-output model may perform better than methods such as PCA that ignore the input-output relationship, or OLS that does not emphasize input dimensionality reduction. [Pg.52]
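
The contrast can be made concrete with scikit-learn. In the constructed case below, the predictive information sits in a low-variance direction of X, so principal component regression (directions chosen from X alone) misses it, while PLS (directions chosen from X and y jointly) finds it. The synthetic data are an illustrative assumption.

```python
# Sketch comparing PLS (input-output directions) with principal component
# regression (input-only directions). Synthetic data for illustration.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 10))
X[:, 0] *= 0.1                                  # a low-variance direction ...
y = X[:, 0] * 10 + 0.1 * rng.normal(size=n)     # ... that carries the signal

pls = PLSRegression(n_components=2)
print("PLS R2:", cross_val_score(pls, X, y, cv=5).mean())

# PCR: PCA keeps high-variance directions and ignores y, so it can
# discard exactly the direction that predicts y.
pcr = make_pipeline(PCA(n_components=2), LinearRegression())
print("PCR R2:", cross_val_score(pcr, X, y, cv=5).mean())
```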

Principal component analysis (PCA) can be considered the mother of all methods in multivariate data analysis. The aim of PCA is dimension reduction, and PCA is the most frequently applied method for computing linear latent variables (components). PCA can be seen as a method to compute a new coordinate system formed by the latent variables, which is orthogonal, and in which only the most informative dimensions are used. Latent variables from PCA optimally represent the distances between the objects in the high-dimensional variable space (recall that the distance between objects is considered an inverse measure of their similarity). PCA considers all variables and accommodates the total data structure; it is a method for exploratory data analysis (unsupervised learning) and can be applied to practically any X-matrix; no y-data (properties) are considered and therefore none are necessary. [Pg.73]
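
A minimal numeric sketch of these properties, with synthetic data as an assumption: the loadings form an orthonormal coordinate system, the full set of scores preserves all inter-object distances, and the variance concentrates in the first components.

```python
# Sketch: PCA as an orthogonal coordinate system that preserves distances.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt.T                      # loadings: the new orthogonal axes
T = Xc @ P                    # scores: coordinates in the new system

print(np.allclose(P.T @ P, np.eye(5)))     # orthonormal axes
print(np.allclose(pdist(Xc), pdist(T)))    # object distances preserved
print(np.round(T.var(0, ddof=1), 2))       # variance concentrates in early PCs
```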

FIGURE 3.2 Matrix scheme for PCA. Since the aim of PCA is dimension reduction and the... [Pg.76]

The goal of dimension reduction can be best met with PCA if the data distribution is elliptically symmetric around the center. It will not work well as a dimension reduction tool for highly skewed data. Figure 3.9 (left) shows skewed autoscaled... [Pg.80]

As already noted in Section 3.4, outliers can strongly influence PCA. They can artificially increase the variance in an otherwise uninformative direction, which will then be identified as a PCA direction. This is undesirable, especially for the goal of dimension reduction, and it mainly appears with classical estimation of the PCs. Robust estimation determines the PCA directions in such a way that a robust measure of variance is maximized instead of the classical variance. Essential features of robust PCA can be summarized as follows ... [Pg.81]
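
The projection-pursuit idea behind one family of robust PCA methods can be sketched as follows: search for the direction that maximizes a robust spread measure (here the MAD) instead of the classical variance. The crude random search and the synthetic outlier pattern are illustrative assumptions; real robust PCA implementations are considerably more careful.

```python
# Sketch of projection-pursuit robust PCA: maximize a robust scale (MAD)
# over candidate directions instead of the classical variance.
import numpy as np

def mad(z):
    """Median absolute deviation, a robust scale estimate."""
    return np.median(np.abs(z - np.median(z)))

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4)) * [3, 1, 0.5, 0.2]
X[:5] += [0, 0, 0, 25]           # outliers inflating an uninformative axis

Xc = X - np.median(X, axis=0)    # robust centering
cands = rng.normal(size=(5000, 4))
cands /= np.linalg.norm(cands, axis=1, keepdims=True)

best_classical = cands[np.argmax((Xc @ cands.T).var(axis=0))]
best_robust = cands[np.argmax([mad(Xc @ d) for d in cands])]
print("classical PC1 ~", np.round(best_classical, 2))  # pulled to outlier axis
print("robust    PC1 ~", np.round(best_robust, 2))     # follows the real spread
```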

In contrast to PCA, which can be considered a method for basis rotation, factor analysis is based on a statistical model with certain model assumptions. Like PCA, factor analysis also results in dimension reduction, but while the PCs are derived simply by optimizing a statistical criterion (spread, variance), the factors are aimed at having a real meaning and an interpretation. Only a very brief introduction is given here; a classical book about factor analysis in chemistry is from Malinowski (2002), and many other books on factor analysis are available (Basilevsky 1994; Harman 1976; Johnson and Wichern 2002). [Pg.96]
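
A small sketch of the model difference, with an assumed two-factor structure in synthetic data: factor analysis fits common factors plus per-variable unique noise, and a varimax rotation makes the loadings interpretable. scikit-learn's FactorAnalysis is used here purely for illustration.

```python
# Sketch of the factor-analysis model X = f L + e with per-variable noise;
# the two-factor block structure of the data is an assumption.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(6)
f = rng.normal(size=(300, 2))                     # two latent factors
L = np.array([[1, 1, 1, 0, 0, 0],                 # factor 1 -> variables 1-3
              [0, 0, 0, 1, 1, 1]], dtype=float)   # factor 2 -> variables 4-6
X = f @ L + 0.3 * rng.normal(size=(300, 6))       # unique noise per variable

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
print(np.round(fa.components_, 2))    # recovers the block structure of L
```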

If PCA is used for dimension reduction and the creation of uncorrelated variables, the optimum number of components is crucial. This value can be estimated from a scree plot showing the accumulated variance of the scores as a function of the number of components used. More laborious but safer methods use cross validation or bootstrap techniques. [Pg.114]
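
A sketch of the variance-based choice described above; the 90% threshold is an illustrative convention, not a rule from the text.

```python
# Sketch: choose the number of PCs from the accumulated explained variance.
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(80, 3)) @ rng.normal(size=(3, 10))   # rank-3 signal
X += 0.2 * rng.normal(size=(80, 10))                      # plus noise

Xc = X - X.mean(0)
s = np.linalg.svd(Xc, compute_uv=False)
expl = s ** 2 / (s ** 2).sum()
cum = np.cumsum(expl)
print(np.round(cum, 3))                    # the curve levels off after 3 PCs
print("components for 90%:", np.searchsorted(cum, 0.90) + 1)
```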

If the assumptions (multivariate normal distributions with equal group covariance matrices) are fulfilled, the Fisher rule gives the same result as the Bayesian rule. However, there is an interesting aspect for the Fisher rule in the context of visualization, because this formulation allows for dimension reduction. By projecting the data... [Pg.217]
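
The dimension-reduction side of the Fisher rule can be sketched via the generalized eigenproblem S_B v = lambda S_W v: the leading eigenvectors give the discriminant directions onto which the data are projected. The two-group synthetic data below are an assumption.

```python
# Sketch of Fisher discriminant directions from between- and within-group
# scatter matrices; synthetic two-group data for illustration.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(8)
n, p = 100, 5
X = np.vstack([rng.normal(0, 1, size=(n, p)),
               rng.normal(0, 1, size=(n, p)) + [3, 0, 0, 0, 0]])
y = np.repeat([0, 1], n)

mean_all = X.mean(0)
Sw = np.zeros((p, p))
Sb = np.zeros((p, p))
for g in np.unique(y):
    Xg = X[y == g]
    mg = Xg.mean(0)
    Sw += (Xg - mg).T @ (Xg - mg)
    Sb += len(Xg) * np.outer(mg - mean_all, mg - mean_all)

# Largest generalized eigenvector gives the (here one-dimensional)
# discriminant axis for projecting the data.
vals, vecs = eigh(Sb, Sw)
w = vecs[:, -1]
print(np.round(w / np.linalg.norm(w), 2))   # ~ [1, 0, 0, 0, 0] up to sign
```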

Many of the hotplate devices presented so far rely on corresponding thermal simulations that are based on model assumptions and finite element methods (FEM) [47, 92-97]. Analytical models have also been developed [7,9,98,99]; another publication describes RC-network analysis and dimension reduction [100]. A reduction of the complexity and order of the model has been successfully realized, and the different relevant approaches have been summarized in recent articles [101,102]. [Pg.17]

Methods for unsupervised learning invariably aim at compression or the extraction of information present in the data. Most prominent in this field are clustering methods [140], self-organizing networks [141], any type of dimension reduction (e.g., principal component analysis [142]), or the task of data compression itself. All of the above may be useful to interpret and potentially to visualize the data. [Pg.75]

It is also beyond the graphical representation capabilities commonly used. Factor analysis is one of the pattern recognition techniques that uses all of the measured variables (features) to examine the interrelationships in the data. It accomplishes dimension reduction by minimizing minor variations so that major variations may be summarized. Thus, the maximum information from the original variables is included in a few derived variables or factors. Once the dimen-... [Pg.22]

Key Words: Biological activity; chemical features; chemical space; cluster analysis; compound databases; dimension reduction; molecular descriptors; molecule classification; partitioning algorithms; partitioning in low-dimensional spaces; principal component analysis; visualization. [Pg.279]

Low-Dimensional Chemical Space and Dimension-Reduction Techniques... [Pg.281]

How is dimension reduction of chemical spaces achieved? There are a number of different concepts and mathematical procedures to reduce the dimensionality of descriptor spaces with respect to a molecular dataset under investigation. These techniques include, for example, linear mapping, multidimensional scaling, factor analysis, or principal component analysis (PCA), as reviewed in ref. 8. Essentially, these techniques either try to identify those descriptors among the initially chosen ones that are most important for capturing the chemical information encoded in a molecular dataset or, alternatively, attempt to construct new variables from original descriptor contributions. A representative example will be discussed below in more detail. [Pg.282]
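
One of the techniques named above, multidimensional scaling, can be sketched directly: given only pairwise distances between molecules in descriptor space, MDS embeds them into a low-dimensional chemical space. The random descriptor matrix below is a stand-in assumption for real molecular descriptors.

```python
# Sketch: multidimensional scaling of a molecular dataset from its
# pairwise descriptor distances into a 2-D chemical space.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(9)
descriptors = rng.normal(size=(50, 20))   # 50 molecules, 20 descriptors
D = squareform(pdist(descriptors))        # pairwise distance matrix

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)             # 2-D chemical-space map
print(coords.shape, "stress:", round(mds.stress_, 1))
```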

In chemoinformatics research, partitioning algorithms are applied in diversity analysis of large compound libraries, subset selection, or the search for molecules with specific activity (1-4). Widely used partitioning methods include cell-based partitioning in low-dimensional chemical spaces (1,3) and decision tree methods, in particular, recursive partitioning (RP) (5-7). Partitioning in low-dimensional chemical spaces is based on various dimension reduction methods (4,8) and often permits simplified three-dimensional representation of... [Pg.291]
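
A sketch of cell-based partitioning as just described: reduce the descriptors to a low-dimensional space (PCA is used here, but any of the cited dimension reduction methods fits), overlay a regular grid, and assign each compound to a cell. The grid resolution and data are illustrative choices.

```python
# Sketch: cell-based partitioning in a 2-D chemical space derived by PCA.
import numpy as np

rng = np.random.default_rng(10)
descriptors = rng.normal(size=(200, 15))

# Reduce to two dimensions with PCA.
Xc = descriptors - descriptors.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T

bins = 5                                  # 5 x 5 = 25 cells
edges = [np.linspace(c.min(), c.max(), bins + 1) for c in coords.T]
cell = tuple(np.clip(np.digitize(coords[:, j], edges[j]) - 1, 0, bins - 1)
             for j in range(2))
cell_id = cell[0] * bins + cell[1]        # one integer label per compound
print("occupied cells:", len(np.unique(cell_id)), "of", bins * bins)
```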

In contrast to partitioning methods that involve dimension reduction of chemical reference spaces, MP is best understood as a direct space method. However, the n-dimensional descriptor space is simplified here by transforming property descriptors with continuous or discrete value ranges into a binary classification scheme. Essentially, this binary space transformation assigns less complex n-dimensional vectors to test molecules, with each dimension taking a value of either 0 or 1. Thus, although MP analysis proceeds in n-dimensional descriptor space, its dimensions are scaled and its complexity is reduced. [Pg.295]
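
The binary transformation can be sketched in the spirit of median partitioning: threshold each descriptor at its dataset median so that every molecule becomes an n-dimensional 0/1 vector. The data below are an illustrative assumption.

```python
# Sketch: binary space transformation by thresholding each descriptor
# at its median (median partitioning style).
import numpy as np

rng = np.random.default_rng(11)
descriptors = rng.normal(size=(100, 12))       # continuous value ranges

medians = np.median(descriptors, axis=0)
binary = (descriptors > medians).astype(int)   # each dimension now 0 or 1
print(binary[:3])

# Each molecule's binary code can serve directly as a partition label.
codes = {tuple(row) for row in binary}
print("distinct partitions:", len(codes))
```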

This chapter provides a brief overview of chemoinformatics and its applications to chemical library design. It is meant to be a quick starter and to serve as an invitation to readers for more in-depth exploration of the field. The topics covered in this chapter are chemical representation, chemical data and data mining, molecular descriptors, chemical space and dimension reduction, quantitative structure-activity relationship, similarity, diversity, and multiobjective optimization. [Pg.27]

In this chapter, we will give a brief introduction to the basic concepts of chemoinformatics and their relevance to chemical library design. In Section 2, we will describe chemical representation, molecular data, and molecular data mining in computers; introduce some of the chemoinformatics concepts such as molecular descriptors, chemical space, dimension reduction, similarity, and diversity; and review the most useful methods and applications of chemoinformatics: the quantitative structure-activity relationship (QSAR), the quantitative structure-property relationship (QSPR), multiobjective optimization, and virtual screening. In Section 3, we will outline some of the elements of library design and connect chemoinformatics tools, such as molecular similarity, molecular diversity, and multiple objective optimizations, with designing optimal libraries. Finally, we will put library design into perspective in Section 4. [Pg.28]

