Descriptor orthogonality

To avoid some drawbacks of stepwise approaches, the i-fold stepwise variable selection method was recently proposed [Lucic et al., 1999b]. This technique is based on descriptor orthogonalization and, at each step, adds the set of the i best descriptors. [Pg.468]

One of the central oversights in the use of multiple regression analysis (MRA) in QSAR is the failure to use orthogonal molecular descriptors. Orthogonal molecular descriptors, introduced almost a quarter of a century ago [7], continue to be overlooked by the vast majority of QSAR researchers, who apparently do not realize that using orthogonal descriptors in multiple regression analysis is not an option but a must if they want to elaborate on the structure-property or structure-activity relationship ... [Pg.133]
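Orthogonalized descriptors in the spirit of Randić's procedure are obtained by taking the descriptors in a chosen order and replacing each one by its residual after least-squares regression on the descriptors already orthogonalized. The sketch below is a minimal illustration of that idea (a Gram-Schmidt pass over centered descriptor columns); the function name `orthogonalize` and the synthetic data are assumptions for illustration, and exactly collinear descriptors (zero residual) are not handled.

```python
import numpy as np

def orthogonalize(X):
    """Sequentially orthogonalize descriptor columns (Gram-Schmidt on centered data).

    Column j of the result is the residual of descriptor j after regression on the
    already-orthogonalized columns 0..j-1; centering plays the role of the intercept.
    The first descriptor is kept unchanged (apart from centering), so the order matters.
    """
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    Z = np.empty_like(Xc)
    for j in range(Xc.shape[1]):
        v = Xc[:, j].copy()
        for k in range(j):
            zk = Z[:, k]
            v -= (v @ zk) / (zk @ zk) * zk  # remove the part explained by earlier column k
        Z[:, j] = v
    return Z

# Usage with synthetic data: x2 is built to be strongly correlated with x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=50)
X = np.column_stack([x1, x2])
Z = orthogonalize(X)
print("r(original) =", np.corrcoef(X.T)[0, 1])
print("r(orthogonalized) =", np.corrcoef(Z.T)[0, 1])
```

Because each orthogonalized descriptor carries only the information not already present in its predecessors, regression coefficients of earlier descriptors no longer change when later ones are added or removed.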

By applying this method, they demonstrated that removing insignificant variables increases the quality and reliability of the models, even though the correlation coefficient, r, always decreases, if only slightly. For example, a model with six orthogonalized descriptors had r = 0.99288, s = 0.9062 and F = 127.4, and its quality improved after removal of the two least significant descriptors, to r = 0.9925, s = 0.8553,... [Pg.207]
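Why a slightly lower r can still mean a better model follows from the degrees of freedom. The sketch below assumes the standard definitions F = (r²/k) / ((1 − r²)/(n − k − 1)) and s = sqrt((1 − r²)·SS_tot/(n − k − 1)), where k is the number of descriptors and n the number of compounds. The excerpt does not state n; n = 18 below is our assumption, chosen because it is consistent with the reported F for the six-descriptor model. Since SS_tot is the same for both models, only the ratio of s values is computed.

```python
import math

def regression_stats(r, n, k):
    """Return (F, s_rel) for a multiple linear regression model.

    r: multiple correlation coefficient, n: number of compounds, k: number of descriptors.
    s_rel = sqrt((1 - r^2) / (n - k - 1)) equals s / sqrt(SS_tot), so ratios of s_rel
    between models fitted to the same data equal ratios of s.
    """
    dof = n - k - 1
    F = (r ** 2 / k) / ((1 - r ** 2) / dof)
    s_rel = math.sqrt((1 - r ** 2) / dof)
    return F, s_rel

n = 18  # hypothetical sample size; not given in the excerpt
F6, s6 = regression_stats(0.99288, n, 6)  # six orthogonalized descriptors
F4, s4 = regression_stats(0.9925, n, 4)   # after removing two descriptors
print(f"F6 = {F6:.1f}, F4 = {F4:.1f}")
print(f"s4/s6 = {s4 / s6:.3f} (reported: {0.8553 / 0.9062:.3f})")
```

Dropping two descriptors frees two degrees of freedom, which more than compensates for the small loss in r, so F rises and s falls.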

Besides the aforementioned descriptors, grid-based methods are frequently used in QSAR (quantitative structure-activity relationships) [50]. A molecule is placed in a box, and at each point of an orthogonal grid the interaction energy between this molecule and a small probe molecule, such as water, is calculated. The resulting grid map characterizes the molecular shape, charge distribution, and hydrophobicity. [Pg.428]
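The grid construction can be sketched as follows. This is a toy model under explicit assumptions: the probe interacts with each atom through a Lennard-Jones term and a Coulomb term (332 kcal·Å/(mol·e²) times the product of charges over distance), and the parameters `eps`, `sigma`, `probe_charge`, `spacing`, `margin` and `cutoff` are hypothetical values, not those of any published force field or grid program.

```python
import numpy as np

def grid_energies(coords, charges, spacing=1.0, margin=4.0,
                  probe_charge=-0.4, eps=0.2, sigma=3.0, cutoff=1.0):
    """Probe interaction energy (kcal/mol) at every point of an orthogonal grid.

    coords: (N, 3) atom positions in angstrom; charges: (N,) partial charges.
    The box extends `margin` angstrom beyond the molecule in each direction.
    All energy parameters are illustrative assumptions.
    """
    coords = np.asarray(coords, dtype=float)
    charges = np.asarray(charges, dtype=float)
    lo = coords.min(axis=0) - margin
    hi = coords.max(axis=0) + margin
    axes = [np.arange(a, b + 1e-9, spacing) for a, b in zip(lo, hi)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    points = np.stack([gx, gy, gz], axis=-1)                      # (nx, ny, nz, 3)
    d = np.linalg.norm(points[..., None, :] - coords, axis=-1)    # (nx, ny, nz, N)
    d = np.maximum(d, cutoff)                                     # avoid singularities at nuclei
    lj = 4 * eps * ((sigma / d) ** 12 - (sigma / d) ** 6)         # steric (shape) term
    coul = 332.0 * charges * probe_charge / d                     # electrostatic term
    return axes, (lj + coul).sum(axis=-1)                         # sum over atoms

# Usage: a hypothetical two-atom "molecule" with opposite partial charges.
axes, E = grid_energies([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]], [0.4, -0.4])
print(E.shape, float(E.min()), float(E.max()))
```

The resulting 3D array of energies is the "grid map"; in a QSAR setting its values (or a reduced form of them) serve as descriptors.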

Descriptors used to characterize molecules in QSAR studies should be as independent of each other (orthogonal) as possible. Using correlated parameters increases the risk of non-predictive chance correlations [56]. To examine the correlation between PSA (calculated according to the fragment-based protocol [10]) and other descriptors, we studied a collection of 7010 bioactive molecules from the PubChem database [57]. In addition to PSA, the following parameters were used ... [Pg.121]
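A simple way to screen a descriptor set for such dependencies is to compute the pairwise Pearson correlation matrix and flag pairs above a cutoff. The sketch below is illustrative only: the data are synthetic (not the PubChem set), the descriptor names are placeholders, and the 0.8 threshold is an arbitrary choice.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.8):
    """Return (name_i, name_j, r) for descriptor pairs with |Pearson r| > threshold.

    X: (n_compounds, n_descriptors) matrix; columns are descriptors.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(R[i, j]), 3)))
    return pairs

# Synthetic example: "HBD" is constructed to track "PSA"; "logP" is independent.
rng = np.random.default_rng(1)
psa = rng.uniform(20, 140, size=200)
hbd = psa / 30 + rng.normal(0, 0.3, size=200)
logp = rng.normal(2.5, 1.0, size=200)
X = np.column_stack([psa, hbd, logp])
pairs = correlated_pairs(X, ["PSA", "HBD", "logP"])
print(pairs)
```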

Describing the degree of correlation in retention data is more complicated than it appears. For example, 2D retention maps cannot be characterized by a simple correlation coefficient (Slonecker et al., 1996), since it fails to describe datasets with apparent clustering (Fig. 12.2f). Several mathematical approaches have been developed to define the data spread in 2D separation space (Gray et al., 2002; Liu et al., 1995; Slonecker et al., 1996), but they are nonintuitive and complex, and they use multiple descriptors to define the degree of orthogonality. [Pg.271]
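The failure of the correlation coefficient for clustered data can be demonstrated with synthetic retention data. In the sketch below, peaks grouped into four symmetric clusters give r ≈ 0, just like peaks spread evenly over the separation space, yet they occupy only a few cells of the plane. The cell-occupancy fraction used here is a simple illustrative measure chosen for this sketch; it is not one of the cited approaches.

```python
import numpy as np

def pearson_r(t1, t2):
    """Pearson correlation of first- and second-dimension retention times."""
    return float(np.corrcoef(t1, t2)[0, 1])

def bin_coverage(t1, t2, bins=10):
    """Fraction of cells in a bins x bins grid over the normalized 2D space that hold a peak."""
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    u = (t1 - t1.min()) / (t1.max() - t1.min())  # normalize each dimension to [0, 1]
    v = (t2 - t2.min()) / (t2.max() - t2.min())
    counts, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 1], [0, 1]])
    return float(np.count_nonzero(counts)) / bins ** 2

rng = np.random.default_rng(2)
centers = np.array([[0.1, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.9]])
clustered = np.repeat(centers, 250, axis=0) + rng.normal(0, 0.01, size=(1000, 2))
spread = rng.uniform(0, 1, size=(1000, 2))

for label, data in [("clustered", clustered), ("spread", spread)]:
    r = pearson_r(data[:, 0], data[:, 1])
    cov = bin_coverage(data[:, 0], data[:, 1])
    print(f"{label}: r = {r:.3f}, coverage = {cov:.2f}")
```

Both datasets look "orthogonal" by r, but only the spread one actually uses the separation space, which is why dedicated spread measures are needed.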

Randic, M. (1991) Resolution of ambiguities in structure-property studies by use of orthogonalized descriptors. J. Chem. Inf. Comput. Sci. 31, 311-320. [Pg.48]

The orthogonality of a set of molecular descriptors is a very desirable property. Classification methodologies such as CART (11) (or other decision-tree methods) are not invariant to rotations of the chemistry space. Such methods may encounter difficulties with correlated descriptors (e.g., production of larger decision trees). Often, correlated descriptors necessitate principal component transforms that require a set of reference data for their estimation (at worst, the transforms depend only on the data at hand; at best, they are trained once on a larger collection of compounds). In probabilistic methodologies such as Binary QSAR (12), the approximation of statistical independence is simplified when uncorrelated descriptors are used. In addition,... [Pg.267]
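A principal component transform of the kind mentioned above can be sketched with NumPy's SVD: the rotation is estimated once on a reference descriptor matrix and then applied to any compound set. The function names and the synthetic reference data are assumptions for illustration. The transformed descriptors are exactly uncorrelated only on the reference set; for new compounds they are uncorrelated only approximately.

```python
import numpy as np

def fit_pca(reference):
    """Estimate a PCA rotation from a reference descriptor matrix (rows = compounds)."""
    ref = np.asarray(reference, dtype=float)
    mean = ref.mean(axis=0)
    # Rows of vt (right singular vectors of the centered data) are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
    return mean, vt

def transform(X, mean, vt, n_components=None):
    """Project descriptors onto the principal axes, optionally keeping the first few."""
    Z = (np.asarray(X, dtype=float) - mean) @ vt.T
    return Z if n_components is None else Z[:, :n_components]

# Synthetic reference set: descriptor 2 is largely a linear function of descriptor 1.
rng = np.random.default_rng(4)
base = rng.normal(size=(300, 1))
reference = np.hstack([base,
                       2 * base + 0.5 * rng.normal(size=(300, 1)),
                       rng.normal(size=(300, 1))])
mean, vt = fit_pca(reference)
Z = transform(reference, mean, vt)
print(np.round(np.corrcoef(reference, rowvar=False), 2))
print(np.round(np.corrcoef(Z, rowvar=False), 2))
```

Because the principal axes are a rotation of the original space, a rotation-sensitive method such as a decision tree can give different results on `Z` than on the raw descriptors.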

The table shows a number of representative descriptor types (there are many more) that can be used to define chemical spaces. Each descriptor adds a dimension (with discrete or continuous value ranges) to the chemical space representation (e.g., selection of 18 descriptors defines an 18-dimensional space). Axes of chemical space are orthogonal only if the applied molecular descriptors are uncorrelated (which is, in practice, hardly ever the case). [Pg.281]

Fig. 1. Median partitioning and compound selection. In this schematic illustration, a two-dimensional chemical space is shown as an example. The axes represent the medians of two uncorrelated (and, therefore, orthogonal) descriptors, and dots represent database compounds. In A, a compound database is divided into equal subpopulations in two steps, and each resulting partition is characterized by a unique binary code (shared by the molecules occupying this partition). In B, diversity-based compound selection is illustrated: from the center of each partition, a compound is selected to obtain a representative subset. By contrast, C illustrates activity-based compound selection. Here, a known active molecule (gray dot) is added to the source database prior to MP, and compounds that ultimately occur in the same partition as this bait molecule are selected as candidates for testing. Finally, D illustrates the effects of descriptor correlation. In this case, the two applied descriptors are significantly correlated, and the dashed line represents a diagonal of correlation that affects the compound distribution. As can be seen, descriptor correlation leads to over- and underpopulated partitions.

A major practical issue affecting MP calculations is caused by the use of correlated molecular descriptors. During subsequent MP steps, exact halves of values (and molecules) are generated only if the chosen descriptors are uncorrelated (orthogonal), as shown in Fig. 1A. By contrast, descriptor correlations (and departure from an orthogonal reference space) lead to overpopulated and underpopulated, or even empty, partitions (see also Note 5), as illustrated in Fig. 1D. For diversity analysis, compounds should be widely distributed over the computed partitions, and descriptor correlation effects should therefore be limited as much as possible. For other applications, however, correlated descriptors that produce skewed compound distributions may be unproblematic or even favorable (see Note 5). [Pg.295]
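The effect described above can be reproduced with a minimal median-partitioning sketch: each descriptor is split once at its median over the whole database, and each compound receives a binary code with one bit per descriptor. The function name and the synthetic two-descriptor data are assumptions for illustration.

```python
import numpy as np
from collections import Counter

def median_partition(X):
    """Assign each compound a binary partition code, one bit per descriptor.

    A bit is 1 if the compound's value lies above that descriptor's median
    (computed over the whole database), else 0.
    """
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    bits = (X > med).astype(int)
    return ["".join(map(str, row)) for row in bits]

rng = np.random.default_rng(3)
a = rng.normal(size=1000)
uncorr = np.column_stack([a, rng.normal(size=1000)])          # independent descriptors
corr = np.column_stack([a, a + 0.3 * rng.normal(size=1000)])  # strongly correlated pair

counts_uncorr = Counter(median_partition(uncorr))
counts_corr = Counter(median_partition(corr))
print("uncorrelated:", dict(sorted(counts_uncorr.items())))
print("correlated:  ", dict(sorted(counts_corr.items())))
```

With independent descriptors the four partitions hold roughly a quarter of the compounds each; with the correlated pair, most compounds fall into partitions "00" and "11" while "01" and "10" are nearly empty, mirroring Fig. 1D.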

Figure 1.4. Dimension reduction. The figure illustrates the transformation of an n-dimensional descriptor space into an orthogonal three-dimensional space formed by three uncorrelated descriptors, either selected from the original ones or derived from them as new composite descriptors.

The oriented skew-lines convention describes orthogonal skew lines that are not defined with the classical system. By analogy, the vector descriptors are chosen to be A/yl, respectively... [Pg.181]

Big Chemical Encyclopedia
