Linear descriptors

A linear descriptor naturally contains an ordering of its elements, which makes it very easy to compare them. In a tree structure, however, it is not clear which parts should be mapped onto each other. Different mappings will result in different similarity values ... [Pg.83]

Fig. 4.4 Linear versus treelike molecular descriptors. Linear descriptors represent the occurrence of certain features such as fragments or paths; the relative arrangement is lost. Treelike descriptors represent the building blocks of the molecule as well as their relative arrangement.

The fields that are derived on a grid can be encoded into lengthy binary [167] or real-valued descriptors [147]. An abstract description in the form of so-called field graphs has also been attempted [197]. Another approach to convert fields back into linear descriptors is to extract the characteristic features [198]. [Pg.85]

Once a Feature Tree can be created from a molecule, the question arises of how to compare two Feature Trees. Using Eq. (1), we are able to compare two individual Feature Tree nodes. Owing to the additivity of the features stored at a node, we can also compare two sets of Feature Tree nodes. This is done by adding the features over all nodes within a set and applying Eq. (1) again. Obviously, we can also compare two complete Feature Trees in this way: we simply add all features in the two trees and apply Eq. (1). We call such a comparison level-0, because no division of the tree into pieces has been performed. Level-0 comparisons closely resemble the way linear descriptors work. If we assume for a moment that all components of a linear descriptor are additive and can be computed for each building block individually (such as the volume descriptor), adding the feature values over all Feature Tree nodes will create the linear descriptor. [Pg.85]
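
The level-0 comparison described above can be sketched in a few lines of Python. Eq. (1) is not reproduced in this excerpt, so `node_similarity` below is only a stand-in (a mean of min/max ratios, which assumes non-negative feature values). The feature names and numbers are likewise invented for illustration.

```python
from typing import Dict, List

Features = Dict[str, float]


def node_similarity(a: Features, b: Features) -> float:
    """Stand-in for Eq. (1): mean of min/max ratios over all features.

    ASSUMPTION: this form is illustrative only; the source's Eq. (1)
    is not shown in the excerpt. Feature values must be non-negative.
    """
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    total = 0.0
    for key in keys:
        x, y = a.get(key, 0.0), b.get(key, 0.0)
        larger = max(x, y)
        total += 1.0 if larger == 0 else min(x, y) / larger
    return total / len(keys)


def sum_features(nodes: List[Features]) -> Features:
    """Add the (additive) features over a set of nodes."""
    summed: Features = {}
    for node in nodes:
        for key, value in node.items():
            summed[key] = summed.get(key, 0.0) + value
    return summed


def level0_similarity(tree_a: List[Features], tree_b: List[Features]) -> float:
    """Level-0: collapse each tree to one feature vector, then compare once."""
    return node_similarity(sum_features(tree_a), sum_features(tree_b))


# Hypothetical trees, each given as a flat list of node feature dictionaries.
tree_a = [{"volume": 20.0, "donors": 1.0}, {"volume": 30.0, "donors": 0.0}]
tree_b = [{"volume": 25.0, "donors": 1.0}, {"volume": 25.0, "donors": 1.0}]

print(sum_features(tree_a))
print(level0_similarity(tree_a, tree_b))
```

The summed dictionary returned by `sum_features` plays the role of the linear descriptor: the tree topology is discarded and only the totals remain.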

In Chapter 5, when discussing the use of collinear descriptors in regression analysis, we mentioned the instability of regression equations. Here we will illustrate the Nightmare of QSAR on a regression using four connectivity indices χ of... [Pg.449]

Besides these LFER-based models, approaches have been developed using whole-molecule descriptors and learning algorithms other than multiple linear regression (see Section 10.1.2). [Pg.494]

The previously mentioned data set with a total of 115 compounds has already been studied by other statistical methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis, and the Partial Least Squares (PLS) method [39]. Thus, the choice and selection of descriptors has already been accomplished. [Pg.508]

An alternative to principal components analysis is factor analysis. This is a technique which can identify multicollinearities in the set - these are descriptors which are correlated with a linear combination of two or more other descriptors. Factor analysis is related to (and... [Pg.697]

Once the descriptors have been computed, it is necessary to decide which ones will be used. This is usually done by computing correlation coefficients. Correlation coefficients are a measure of how closely two values (descriptor and property) are related to one another by a linear relationship. If a descriptor has a correlation coefficient of 1, it describes the property exactly. A correlation coefficient of zero means the descriptor has no linear relationship to the property. The descriptors with the largest correlation coefficients are used in the curve fit to create a property prediction equation. There is no rigorous way to determine how large a correlation coefficient is acceptable. [Pg.244]
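
As an illustration of this selection step, the following sketch computes the Pearson correlation coefficient between each descriptor and the property and ranks the descriptors by it. The descriptor names and values are hypothetical, and ranking by absolute value (so that strong negative correlations also rank highly) is an assumption of this sketch.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_descriptors(
    descriptors: Dict[str, List[float]], prop: List[float]
) -> List[Tuple[str, float]]:
    """Sort descriptors by |r| against the property, strongest first."""
    scored = [(name, pearson_r(values, prop)) for name, values in descriptors.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: one property value and three descriptors per compound.
prop = [1.0, 2.0, 3.0, 4.0, 5.0]
descriptors = {
    "d_noise": [3.0, 1.0, 4.0, 1.0, 5.0],
    "d_flat": [7.0, 7.0, 7.0, 7.0, 7.0],
    "d_linear": [2.0, 4.0, 6.0, 8.0, 10.0],
}

ranking = rank_descriptors(descriptors, prop)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

Where the cut-off between "used" and "discarded" falls is left to the user, in line with the remark that there is no rigorous threshold.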

In order to parameterize a QSAR equation, a quantified activity for a set of compounds must be known. These are called lead compounds, at least in the pharmaceutical industry. Typically, test results are available for only a small number of compounds. Because of this, it can be difficult to choose a number of descriptors that will give useful results without fitting to anomalies in the test set. Three to five lead compounds per descriptor in the QSAR equation are normally considered an adequate number. If two descriptors are nearly collinear with one another, then one should be omitted even though it may have a large correlation coefficient. [Pg.247]
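
Both rules of thumb in this passage can be sketched in code: a cap on the number of descriptors given the number of lead compounds, and a filter that drops one of any nearly collinear pair. The threshold of 0.95 for "nearly collinear" is an assumption of this sketch rather than a value from the source, and the descriptor names and values are made up.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def max_descriptors(n_compounds: int, compounds_per_descriptor: int = 5) -> int:
    """Upper bound on descriptors under the 'three to five per descriptor' rule."""
    return n_compounds // compounds_per_descriptor


def drop_collinear(
    descriptors: Dict[str, List[float]],
    priority: List[str],
    threshold: float = 0.95,  # ASSUMED cut-off for "nearly collinear"
) -> List[str]:
    """Walk descriptors in priority order; keep one only if it is not
    nearly collinear with any descriptor already kept."""
    kept: List[str] = []
    for name in priority:
        if all(
            abs(pearson_r(descriptors[name], descriptors[other])) < threshold
            for other in kept
        ):
            kept.append(name)
    return kept


# Hypothetical descriptor table for six lead compounds.
descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "mol_weight": [10.1, 19.9, 30.2, 39.8, 50.1, 60.0],  # tracks logP closely
    "dipole": [2.0, 0.5, 1.8, 0.3, 1.9, 0.6],
}

kept = drop_collinear(descriptors, ["logP", "mol_weight", "dipole"])
print(kept)
print(max_descriptors(6, 3), max_descriptors(6, 5))
```

The filter only checks pairs of descriptors; a descriptor that is correlated with a linear combination of several others would not be caught by it.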

In the case of drug design, it may be desirable to use parabolic functions in place of linear functions. The descriptor for an ideal drug candidate often has an optimum value. Drug activity will decrease when the value is either larger or smaller than optimum. This functional form is described by a parabola, not a linear relationship. [Pg.247]
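
A minimal sketch of such a parabolic fit follows, assuming a single descriptor (called `logp` here purely for illustration) and synthetic activity values. It solves the least-squares normal equations for y = a*x^2 + b*x + c by Cramer's rule and reads the optimum descriptor value off the vertex, x = -b / (2a).

```python
from typing import List, Sequence, Tuple


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def fit_parabola(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares coefficients (a, b, c) of y = a*x^2 + b*x + c."""
    n = float(len(x))
    s1 = sum(x)
    s2 = sum(v ** 2 for v in x)
    s3 = sum(v ** 3 for v in x)
    s4 = sum(v ** 4 for v in x)
    t0 = sum(y)
    t1 = sum(u * v for u, v in zip(x, y))
    t2 = sum(u * u * v for u, v in zip(x, y))
    # Normal equations: matrix @ [a, b, c] = rhs
    matrix = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    rhs = [t2, t1, t0]
    det = det3(matrix)
    if det == 0:
        raise ValueError("need at least three distinct descriptor values")
    coeffs: List[float] = []
    for col in range(3):
        replaced = [list(row) for row in matrix]
        for row in range(3):
            replaced[row][col] = rhs[row]
        coeffs.append(det3(replaced) / det)
    return coeffs[0], coeffs[1], coeffs[2]


def optimum(a: float, b: float) -> float:
    """Descriptor value at the vertex; a maximum only when a < 0."""
    return -b / (2.0 * a)


# Synthetic data: activity peaks at a descriptor value of 2.
logp = [0.0, 1.0, 2.0, 3.0, 4.0]
activity = [1.0, 4.0, 5.0, 4.0, 1.0]

a, b, c = fit_parabola(logp, activity)
print(a, b, c, optimum(a, b))
```

A negative leading coefficient `a` confirms that the vertex is a maximum, which is the case the passage describes.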

Other techniques that work well on small computers are based on the molecule's topology or indices from graph theory. These fields of mathematics classify and quantify systems of interconnected points, which correspond well to atoms and bonds between them. Indices can be defined to quantify whether the system is linear or has many cyclic groups or cross links. Properties can be empirically fitted to these indices. Topological and group theory indices are also combined with group additivity techniques or used as QSPR descriptors. [Pg.308]
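
One well-known index of this kind is the Wiener index, the sum of shortest-path bond counts over all pairs of heavy atoms. It is chosen here as an example and is not named in the passage. The sketch below treats a molecule as a hydrogen-suppressed graph, given as an adjacency dictionary, and uses breadth-first search to obtain the distances.

```python
from collections import deque
from typing import Dict, List


def wiener_index(adjacency: Dict[int, List[int]]) -> int:
    """Sum of shortest-path lengths over all unordered atom pairs."""
    total = 0
    for start in adjacency:
        # Breadth-first search gives bond-count distances from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
    # Every pair was counted twice, once from each end.
    return total // 2


# Carbon skeletons of three four-carbon molecules (hydrogens omitted).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
cyclobutane = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

for name, graph in [("n-butane", n_butane), ("isobutane", isobutane), ("cyclobutane", cyclobutane)]:
    print(name, wiener_index(graph))
```

For the same number of atoms, the linear chain gives the largest value (10), while branching (9) and ring closure (8) lower it, which is the kind of distinction the passage refers to.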

Multimedia models can describe the distribution of a chemical between environmental compartments in a state of equilibrium. Equilibrium concentrations in different environmental compartments following the release of defined quantities of pollutant may be estimated by using distribution coefficients such as and H s (see Section 3.1). An alternative approach is to use fugacity (f) as a descriptor of chemical quantity (Mackay 1991). Fugacity has been defined as the tendency of a chemical to escape from one phase to another, and has the same units as pressure. When a chemical reaches equilibrium in a multimedia system, all phases should have the same fugacity. It is usually linearly related to concentration (C) as follows ... [Pg.70]

Aqueous solubility is selected to demonstrate the E-state application in QSPR studies. Huuskonen et al. modeled the aqueous solubility of 734 diverse organic compounds with multiple linear regression (MLR) and artificial neural network (ANN) approaches [27]. The set of structural descriptors comprised 31 E-state atomic indices, and three indicator variables for pyridine, aliphatic hydrocarbons and aromatic hydrocarbons, respectively. The dataset of 734 chemicals was divided into a training set (n = 675), a validation set (n = 38) and a test set (n = 21). A comparison of the MLR results (training, r = 0.94, s = 0.58; validation, r = 0.84, s = 0.67; test, r = 0.80, s = 0.87) and the ANN results (training, r = 0.96, s = 0.51; validation, r = 0.85, s = 0.62; test, r = 0.84, s = 0.75) indicates a small improvement for the neural network model with five hidden neurons. These QSPR models may be used for a fast and reliable computation of the aqueous solubility for diverse organic compounds. [Pg.93]
