Transforming Descriptors

On the other hand, techniques like Principal Component Analysis (PCA) or Partial Least Squares Regression (PLS) (see Section 9.4.6) are used to transform the descriptor set into smaller sets with higher information density. The disadvantage of such methods is that the transformed descriptors may not be directly related to single physical effects or structural features, and the derived models are thus less interpretable. [Pg.490]

Each additional resolution level, up to the highest resolution level J, decomposes the coarse coefficients and leaves the detail coefficients unchanged. The remaining coarse coefficient cannot be decomposed further; it consists of just four components. J is determined by the size n of the original vector, with J = log2(n) - 2. Consequently, a wavelet-transformed descriptor can be represented by either single-level (j = 1) or multilevel (j = J) decomposition. [Pg.100]
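The decomposition scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the simple Haar filter pair rather than whichever wavelet the source text works with, and the names max_level, haar_step, and decompose are hypothetical. It shows that for n = 128 the rule J = log2(n) - 2 gives J = 5 and leaves a coarse part of four components.

```python
import math

def max_level(n):
    """Highest resolution level J = log2(n) - 2 for a vector of length n (a power of two)."""
    return int(math.log2(n)) - 2

def haar_step(coarse):
    """One decomposition step (Haar filter, an assumption): split into low-pass and high-pass halves."""
    s = math.sqrt(2.0)
    low = [(coarse[i] + coarse[i + 1]) / s for i in range(0, len(coarse), 2)]
    high = [(coarse[i] - coarse[i + 1]) / s for i in range(0, len(coarse), 2)]
    return low, high

def decompose(descriptor, level):
    """Return (coarse, details). Only the coarse part is decomposed again at each level;
    detail coefficients from earlier levels are left unchanged."""
    coarse, details = list(descriptor), []
    for _ in range(level):
        coarse, detail = haar_step(coarse)
        details.append(detail)
    return coarse, details

# Synthetic 128-component vector standing in for a descriptor (not real data).
descriptor = [math.exp(-((i - 40) / 6.0) ** 2) for i in range(128)]
J = max_level(len(descriptor))
coarse, details = decompose(descriptor, J)
```

Calling decompose(descriptor, 1) gives the single-level case, with 64 coarse and 64 detail coefficients.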

The atom pairs button enables a special mode for the display of a list of atom pairs if the mouse is clicked on a certain descriptor point. The corresponding distance is calculated, and a list of atom pairs together with their original values from the distance matrix is displayed. The statistics button calculates some statistical parameters for the actual descriptor and for a superimposed descriptor if available. The peak areas button separates the individual peaks and calculates the peak areas. The transform buttons enable the display of an additional transformed descriptor. [Pg.154]

Wavelet-Transformed Descriptor is a transformation of a descriptor into the frequency space to enhance or suppress characteristic features of a molecule. [Pg.165]

The reduction in descriptor size and resolution of Cartesian RDF descriptors leads to a significant decrease in the quality of classification. The high-pass D20 transformed descriptors, although half the size of the Cartesian RDF, are suited for classification even down to extremely short vectors with a resolution of just 0.8 Å (B = 1.5625 Å⁻²) of the original descriptor. [Pg.198]

Structural similarity in a shorter data vector. In practice, the recommendation given here is to use a length of 128 components for the original descriptor. The transformed descriptor then reduces to 64 components but represents the structures in almost the same quality as the original descriptor. [Pg.199]

In this work we employ the SIFT (scale-invariant feature transform) descriptor, which has been shown to perform better than other local descriptors (Mikolajczyk and Schmid, 2003). The SIFT descriptor is invariant to scale, rotation, intensity and contrast changes and, to a small degree, affine transformations. SIFT divides the region into a set of bins. For each bin, it computes a histogram of the intensity gradient orientation at each pixel. The result is a 128-dimensional real-valued vector. Once each detected region has been converted to a SIFT vector, the input image is discarded, and only the bag of SIFT vectors is retained for further analysis. [Pg.197]
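The binning step can be illustrated with a heavily simplified sketch. It is not the full SIFT algorithm: keypoint detection, scale and rotation normalization, Gaussian weighting, and interpolation between bins are all omitted. The 4 x 4 grid of cells with 8 orientation bins each is the usual layout that yields 128 values. The function name sift_like_descriptor and the synthetic patch are assumptions made for this example.

```python
import math

def sift_like_descriptor(patch, cells=4, orientations=8):
    """Gradient-orientation histograms over a cells x cells grid of a square patch.
    Simplified sketch only; border pixels are skipped."""
    size = len(patch)
    cell = size // cells
    hist = [0.0] * (cells * cells * orientations)
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]  # central differences
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0.0:
                continue
            angle = math.atan2(dy, dx) % (2 * math.pi)
            o = int(angle / (2 * math.pi) * orientations) % orientations
            cy = min(y // cell, cells - 1)
            cx = min(x // cell, cells - 1)
            hist[(cy * cells + cx) * orientations + o] += magnitude
    # Normalizing to unit length removes the effect of contrast scaling.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

# Synthetic 16 x 16 patch: a horizontal intensity ramp (not real image data).
patch = [[float(x) for x in range(16)] for _ in range(16)]
sift_vector = sift_like_descriptor(patch)
```

Because the histograms use only intensity differences and the vector is normalized, adding a constant to the patch or multiplying it by a positive factor leaves the result unchanged.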

We see from Figure 3-22 that we need three transpositions to transform the isomer of the product of this reaction into the reference isomer. Thus, for Eq. (7 we obtain = (-1)^3 (+1) = (-1) and everything seems fine as the descriptor of the... [Pg.198]
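The link between the number of transpositions and the sign can be checked with a short sketch. It assumes the permutation is given as a list in which perm[i] is the position that element i is mapped to; the function name permutation_sign is hypothetical. It counts transpositions through the cycle decomposition, which gives the minimum number, but any decomposition of the same permutation has the same parity.

```python
def permutation_sign(perm):
    """Return +1 for an even permutation and -1 for an odd one.
    perm[i] is the position that element i is mapped to (assumed convention)."""
    seen = [False] * len(perm)
    transpositions = 0
    for start in range(len(perm)):
        length = 0
        i = start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        if length > 0:
            # A cycle of length k can be written as k - 1 transpositions.
            transpositions += length - 1
    return -1 if transpositions % 2 else 1

# A 4-cycle needs three transpositions, so its sign is (-1)^3 = -1.
sign = permutation_sign([1, 2, 3, 0])
```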

The profits from using this approach are clear. Any neural network applied as a mapping device between independent variables and responses requires more computational time and resources than PCR or PLS. Therefore, an increase in the dimensionality of the input (characteristic) vector results in a significant increase in computation time. As our observations have shown, the same is not the case with PLS. Therefore, SVD as a data transformation technique enables one to apply as many molecular descriptors as are at one's disposal, but finally to use latent variables as an input vector of much lower dimensionality for training neural networks. Again, SVD concentrates most of the relevant information (very often about 95 %) in a few initial columns of the scores matrix. [Pg.217]
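A minimal sketch of this idea with NumPy follows. It assumes that "relevant information" is measured as the share of total variance, that the descriptor matrix is mean-centered first, and that 0.95 is used as the cut-off. The function name latent_variables and the synthetic data are illustrative only.

```python
import numpy as np

def latent_variables(X, variance_kept=0.95):
    """Return the leading columns of the scores matrix that together
    reach the requested share of the total variance (an assumed criterion)."""
    Xc = X - X.mean(axis=0)                      # center each descriptor column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)   # cumulative variance share
    k = int(np.searchsorted(explained, variance_kept) + 1)
    scores = U[:, :k] * s[:k]                    # scores matrix T = U * S
    return scores, k

# Synthetic data: 50 compounds, 40 correlated descriptors built from 3 hidden factors.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(50, 3))
X = hidden @ rng.normal(size=(3, 40)) + 0.01 * rng.normal(size=(50, 40))
scores, k = latent_variables(X)
```

The resulting scores array has one row per compound and only k columns, and it is this matrix, not the 40 original descriptors, that would be passed to the neural network as input.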

A structure descriptor is a mathematical representation of a molecule resulting from a procedure transforming the structural information encoded within a symbolic representation of a molecule. This mathematical representation has to be invariant to the molecule's size and number of atoms, to allow model building with statistical methods and artificial neural networks. [Pg.403]

A structure descriptor is a mathematical representation of a molecule resulting from a procedure transforming the structural information encoded within a symbolic representation of a molecule. [Pg.432]

Molecules are usually represented as 2D formulas or 3D molecular models. While the 3D coordinates of atoms in a molecule are sufficient to describe the spatial arrangement of atoms, they exhibit two major disadvantages as molecular descriptors: they depend on the size of a molecule and they do not describe additional properties (e.g., atomic properties). The first feature is most important for computational analysis of data. Even a simple statistical function, e.g., a correlation, requires the information to be represented in equally sized vectors of a fixed dimension. The solution to this problem is a mathematical transformation of the Cartesian coordinates of a molecule into a vector of fixed length. The second point can... [Pg.515]
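One such transformation is a radial distribution function (RDF) over all atom pairs. The sketch below assumes the commonly used form g(r) = sum over pairs of a_i * a_j * exp(-B * (r - r_ij)^2), where the optional weights a_i stand for atomic properties. The default B = 1 / step^2 is also an assumption, chosen so that a step of 0.8 Å corresponds to B = 1.5625. The function name, parameters, and coordinates are illustrative.

```python
import math

def rdf_descriptor(coords, n_points=128, r_max=12.8, B=None, weights=None):
    """Map 3D coordinates of any number of atoms onto a vector of n_points values."""
    n = len(coords)
    weights = weights or [1.0] * n
    step = r_max / n_points
    if B is None:
        B = 1.0 / step**2  # assumed link between smoothing parameter and resolution
    g = [0.0] * n_points
    for i in range(n - 1):
        for j in range(i + 1, n):
            r_ij = math.dist(coords[i], coords[j])  # interatomic distance
            for k in range(n_points):
                r = k * step
                g[k] += weights[i] * weights[j] * math.exp(-B * (r - r_ij) ** 2)
    return g

# Two made-up geometries with different atom counts give vectors of equal length.
small = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
large = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 2.0), (1.0, 1.0, 1.0)]
g_small = rdf_descriptor(small)
g_large = rdf_descriptor(large)
```

Since only interatomic distances enter the sum, the vector does not change when the molecule is translated or rotated, and the weights offer a place to encode atomic properties.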

Multivariate data analysis usually starts with generating a set of spectra and the corresponding chemical structures as a result of a spectrum similarity search in a spectrum database. The peak data are transformed into a set of spectral features and the chemical structures are encoded into molecular descriptors [80]. A spectral feature is a property that can be automatically computed from a mass spectrum. Typical spectral features are the peak intensity at a particular mass/charge value, or logarithmic intensity ratios. The goal of transformation of peak data into spectral features is to obtain descriptors of spectral properties that are more suitable than the original peak list data. [Pg.534]
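As a small illustration of the two feature types named above, the sketch below turns a peak list into intensities at chosen mass/charge values and logarithmic intensity ratios. The peak list, the selected masses, and the intensity floor used to avoid taking the logarithm of zero are assumptions made for this example; the function name is hypothetical.

```python
import math

def spectral_features(peaks, masses, ratio_pairs, floor=1.0):
    """peaks maps m/z to intensity (for example in % of the base peak).
    Returns a dict of named features."""
    features = {}
    for m in masses:
        features[f"I_{m}"] = peaks.get(m, 0.0)  # missing peaks count as zero
    for a, b in ratio_pairs:
        ia = max(peaks.get(a, 0.0), floor)      # floor is an assumed safeguard
        ib = max(peaks.get(b, 0.0), floor)
        features[f"ln(I_{a}/I_{b})"] = math.log(ia / ib)
    return features

# Made-up peak list, not taken from a real spectrum.
peaks = {43: 100.0, 58: 25.0, 71: 8.0}
features = spectral_features(peaks, masses=[43, 58, 77], ratio_pairs=[(43, 58)])
```

Every spectrum then yields the same set of named features, regardless of how many peaks its original list contained.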

Fig. 17.1. Multivariate characterization with VolSurf descriptors. Molecular Interaction Fields (MIFs, shaded areas) are computed from the 3D molecular structure. MIFs are transformed into a table of descriptors, and statistical multivariate analysis is performed.

3D-molecular descriptors, alignment-independent and based on molecular interaction, called GRIND, have been developed. These are autocorrelation transforms that are independent of the orientation of the molecules in 3D space. The original descriptors can be extracted from the autocorrelation transform with the ALMOND program. The basic idea is to compress the information present in 3D maps into a few 2D numerical descriptors which are very simple to understand and interpret. [Pg.197]
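Why an autocorrelation transform is independent of orientation can be seen in a short sketch. It is not the GRIND or ALMOND implementation. It assumes a set of field nodes with one value each and keeps, for every distance bin, the largest product of values over all node pairs in that bin. The function name, bin settings, and node data are illustrative.

```python
import math

def max_autocorrelation(nodes, values, n_bins=10, bin_width=1.0):
    """For each distance bin, store the largest product of field values
    found among node pairs separated by a distance in that bin."""
    descriptor = [0.0] * n_bins
    for i in range(len(nodes) - 1):
        for j in range(i + 1, len(nodes)):
            b = int(math.dist(nodes[i], nodes[j]) / bin_width)
            if b < n_bins:
                descriptor[b] = max(descriptor[b], values[i] * values[j])
    return descriptor

# Made-up nodes and field values; the second set is the first rotated 90 degrees about z.
nodes = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.5, 0.0), (0.0, 0.0, 3.5)]
values = [1.0, 2.0, 3.0, 4.0]
rotated = [(-y, x, z) for x, y, z in nodes]
original_descriptor = max_autocorrelation(nodes, values)
rotated_descriptor = max_autocorrelation(rotated, values)
```

Only pairwise distances and value products enter the result, so both descriptors are identical and no alignment of the molecules is needed.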

The major differences between behavior profiles of organic chemicals in the environment are attributable to their physical-chemical properties. The key properties are recognized as solubility in water, vapor pressure, the three partition coefficients between air, water and octanol, dissociation constant in water (when relevant) and susceptibility to degradation or transformation reactions. Other essential molecular descriptors are molar mass and molar volume, with properties such as critical temperature and pressure and molecular area being occasionally useful for specific purposes. A useful source of information and estimation methods on these properties is the handbook by Boethling and Mackay (2000). [Pg.3]

Obviously, the partly inverted Legendre-transformed representations for reactive systems would similarly generate descriptors of the partially relaxed (electronically or geometrically) reactive systems. [Pg.473]

The permutational descriptor of the given chemical constitution is then a permutation of atom indices (or the inverse permutation of graph indices) which transforms the reference constitution into the considered constitution. [Pg.12]

Big Chemical Encyclopedia
