Descriptors encoding

A challenging task in materials science as well as in pharmaceutical research is to custom-tailor a compound's properties. George S. Hammond stated that the most fundamental and lasting objective of synthesis is not production of new compounds, but production of properties (Norris Award Lecture, 1968). The molecular structure of an organic or inorganic compound determines its properties. Nevertheless, methods for the direct prediction of a compound's properties based on its molecular structure are usually not available (Figure 8-1). Therefore, the establishment of Quantitative Structure-Property Relationships (QSPRs) and Quantitative Structure-Activity Relationships (QSARs) uses an indirect approach to tackle this problem. In the first step, numerical descriptors encoding information about the molecular structure are calculated for a set of compounds. In the second step, statistical and artificial neural network models are used to predict the property or activity of interest based on these descriptors or a suitable subset. [Pg.401]
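
The two-step approach described above can be sketched in Python. This is only an illustration under stated assumptions, not a method taken from the source: the descriptor (a carbon count over a hypothetical atom list), the function names, and the training values are all invented for the example, and ordinary least squares stands in for the statistical or neural network model.

```python
def carbon_count(atoms):
    """Step 1: a toy descriptor, the number of carbon atoms in an atom list."""
    return sum(1 for atom in atoms if atom == "C")


def fit_line(xs, ys):
    """Step 2: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic training set: made-up property values, not measurements.
molecules = [["C"], ["C", "C"], ["C", "C", "O"], ["C", "C", "C", "C"]]
properties = [3.0, 5.0, 5.0, 9.0]

# Step 1: encode every structure as a number.
descriptors = [carbon_count(m) for m in molecules]

# Step 2: relate the descriptor to the property.
slope, intercept = fit_line(descriptors, properties)


def predict(atoms):
    """Predict the property of a new structure from its descriptor."""
    return slope * carbon_count(atoms) + intercept
```

A real QSPR study would use many descriptors, a validated model, and measured property data; the sketch only shows how the descriptor calculation and the model fit are kept as separate steps.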

The molecular descriptors refer to the molecular size and shape, to the size and shape of hydrophilic and hydrophobic regions, and to the balance between them. Hydrogen bonding, amphiphilic moments, and critical packing parameters are other useful descriptors. The VolSurf descriptors have been presented and explained in detail elsewhere [8]. Because they encode physico-chemical properties, the VolSurf descriptors allow both for design in the physico-chemical property space, in order to rationally modulate pharmacokinetic properties, and for establishing quantitative structure-property relationships (QSPR). [Pg.409]

At the low end of the hierarchy are the TS descriptors. This is the simplest of the four classes: molecular structure is viewed only in terms of atom connectivity, not as a chemical entity, and thus no chemical information is encoded. Examples include path length descriptors [13], path or cluster connectivity indices [13,14], and number of circuits. The TC descriptors are more complex in that they encode chemical information, such as atom and bond type, in addition to encoding information about how the atoms are connected within the molecule. Examples of TC descriptors include neighborhood complexity indices [23], valence path connectivity indices [13], and electrotopological state indices [17]. The TS and TC descriptors are two-dimensional descriptors which are collectively referred to as TIs (Section 31.2.1). They are straightforward in their derivation, uncomplicated by conformational assumptions, and can be calculated very quickly and inexpensively. The 3-D descriptors encode 3-D aspects of molecular structure. At the upper end of the hierarchy are the QC descriptors, which encode electronic aspects of chemical structure. As was mentioned previously, QC descriptors may be obtained using either semiempirical or ab initio calculation methods. The latter can be prohibitive in terms of the time required for calculation, especially for large molecules. [Pg.485]
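
To illustrate a descriptor that depends only on atom connectivity, the sketch below computes the first-order path connectivity (Randić) index from the edge list of a hydrogen-depleted graph. The excerpt does not give a formula; the code assumes the standard definition, the sum over all bonds of 1/sqrt(d_i * d_j), where d_i is the number of neighbours of atom i. The edge-list input format and the function name are assumptions made for the example.

```python
from math import sqrt


def connectivity_index(edges):
    """First-order connectivity index of a hydrogen-depleted graph.

    `edges` is a list of (atom_index, atom_index) pairs, one per bond.
    No atom or bond types are used, only which atoms are connected.
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(1.0 / sqrt(degree[a] * degree[b]) for a, b in edges)


# Carbon skeletons of two C4 isomers.
n_butane = [(0, 1), (1, 2), (2, 3)]    # linear chain
isobutane = [(0, 1), (0, 2), (0, 3)]   # central atom with three neighbours

chi_linear = connectivity_index(n_butane)
chi_branched = connectivity_index(isobutane)
```

The two isomers have the same atoms but different connectivity, so the index differs between them; branching lowers its value.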

The rows of the matrix represent the observations, and the columns are the values of the descriptors. In other words, each row is a data or pattern vector, and the components of the data vector are physically measurable quantities called descriptors. It is essential that descriptors encode the same information for all samples in the data set. If variable 5 is the area of a gas chromatographic (GC) peak for phenol in sample 1, it must also be the area of the GC peak for phenol in samples 2, 3, etc. Hence, peak matching is crucial when chromatograms or spectra are translated into data vectors. [Pg.341]

The CRI descriptor encodes information about all connectivities in the H-depleted molecular graph and is sensitive to the presence of heteroatoms in the molecule. [Pg.135]

Moreover, atom-type E-state indices were proposed as molecular descriptors encoding topological and electronic information related to particular atom types in the molecule [Hall and Kier, 1995a; Hall et al., 1995b]. They are calculated by summing the E-state values of all atoms of the same atom type in the molecule. Each atom type is first defined by atom identity, based on the atomic number Z, and valence state, itself identified by the valence state indicator (VST) defined as ... [Pg.162]
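
The summation step can be sketched as follows. The per-atom E-state values are taken as given input; the atom-type labels and the numbers below are hypothetical placeholders chosen for the example, not computed E-state values, and the function name is an assumption.

```python
from collections import defaultdict


def atom_type_sums(atom_types, estate_values):
    """Sum per-atom E-state values over all atoms sharing an atom type."""
    if len(atom_types) != len(estate_values):
        raise ValueError("need exactly one E-state value per atom")
    totals = defaultdict(float)
    for atom_type, value in zip(atom_types, estate_values):
        totals[atom_type] += value
    return dict(totals)


# Hypothetical atom-type labels and placeholder values for a four-atom graph.
types = ["CH3", "CH2", "OH", "CH3"]
values = [2.0, 1.0, 8.0, 2.5]

indices = atom_type_sums(types, values)
```

Each molecule is thus described by one number per atom type, so molecules of different size yield descriptor vectors of the same length once a fixed list of atom types is chosen.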

The most widely used molecular descriptor encoding this property is the octanol-water partition coefficient Kow (and log Kow, or also log P when no further specifications are given), i.e. the partition coefficient between 1-octanol and water ... [Pg.270]

Descriptor: A numerical representation of a molecular property, either a bulk property (like log P) or a two-dimensional (2D) or three-dimensional (3D) structural property. When descriptors encode the presence or absence of a property, they are usually represented by 1s and 0s, and the collection of descriptors is called a fingerprint of the molecule. [Pg.61]
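
A presence/absence encoding of this kind might look like the sketch below. The feature list and all names are hypothetical and chosen only for illustration; real fingerprints use far longer, standardized feature sets.

```python
# Hypothetical, fixed-order list of structural features.
FEATURES = ("aromatic_ring", "hydroxyl", "carbonyl", "halogen")


def fingerprint(present):
    """Encode a set of present features as a list of 1s and 0s."""
    present = set(present)
    unknown = present - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    # One bit per feature, always in the same order, so that position k
    # means the same thing for every molecule.
    return [1 if feature in present else 0 for feature in FEATURES]


# A molecule with an aromatic ring and a hydroxyl group.
bits = fingerprint({"aromatic_ring", "hydroxyl"})
```

Keeping the feature order fixed is what makes two fingerprints comparable bit by bit.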

Transferable atom equivalent (TAE) descriptors encode the distributions of electron-density-based molecular properties, such as kinetic energy densities,... [Pg.406]

Another application of GAs was published by Aires de Sousa et al., who used genetic algorithms to select the appropriate descriptors for representing structure-chemical shift correlations in the computer [69]. Each chromosome was represented by a subset of 486 potentially useful descriptors for predicting 1H-NMR chemical shifts. The task of a fitness function was performed by a CPG neural network that used the subset of descriptors encoded in the chromosome for predicting chemical shifts. Each proton of a compound is presented to the neural network as a set of descriptors, and a chemical shift is obtained as output. The fitness function was the RMS error for the chemical shifts obtained from the neural network and was verified with a cross-validation data set. [Pg.111]

The scoring of each chromosome is now performed by a CPG neural network that uses the subset of descriptors encoded in the chromosome for predicting chemical shifts. The score function of one chromosome (fitness function) is the RMS error for the chemical shifts obtained with a cross-validation set. [Pg.206]
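
The scoring of one chromosome can be sketched as below. This is not the published implementation: a one-nearest-neighbour predictor is swapped in for the CPG neural network to keep the example short, the chromosome is assumed to be a list of 0/1 flags (one per descriptor), and the data are synthetic. Only the overall idea follows the text: the chromosome selects a descriptor subset, and its fitness is the RMS error on a cross-validation set.

```python
from math import sqrt


def select(row, chromosome):
    """Keep only the descriptors whose chromosome bit is 1."""
    return [x for x, bit in zip(row, chromosome) if bit]


def predict(train_x, train_y, query, chromosome):
    """One-nearest-neighbour stand-in for the CPG neural network."""
    q = select(query, chromosome)

    def squared_distance(i):
        return sum((a - b) ** 2 for a, b in zip(select(train_x[i], chromosome), q))

    return train_y[min(range(len(train_x)), key=squared_distance)]


def fitness(chromosome, train_x, train_y, cv_x, cv_y):
    """RMS prediction error on the cross-validation set (lower is better)."""
    if not any(chromosome):
        return float("inf")  # an empty descriptor subset cannot predict anything
    errors = [predict(train_x, train_y, x, chromosome) - y for x, y in zip(cv_x, cv_y)]
    return sqrt(sum(e * e for e in errors) / len(errors))


# Synthetic data: descriptor 0 tracks the target, descriptor 1 is noise.
train_x = [[1.0, 9.0], [2.0, 1.0], [3.0, 5.0]]
train_y = [1.0, 2.0, 3.0]
cv_x = [[1.1, 1.2], [2.9, 8.8]]
cv_y = [1.0, 3.0]

score_informative = fitness([1, 0], train_x, train_y, cv_x, cv_y)
score_noise = fitness([0, 1], train_x, train_y, cv_x, cv_y)
```

A genetic algorithm would call `fitness` for every chromosome in the population and prefer those with the lowest error when selecting parents for the next generation.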

VolSurf descriptors were designed to compress relevant MIF information into a few alignment-independent descriptors encoding information about molecular size and shape, the overall distribution of hydrophobic and hydrophilic regions and the balance between them (Table G7). [Pg.360]

MOLMAP (Molecular Map of Atom-level Properties) descriptors are uniform-length vectorial descriptors derived by mapping physico-chemical properties of all the bonds in a molecule into a 2D Kohonen self-organizing map (SOM) [Zhang and Aires-de-Sousa, 2005; Gupta, Matthew et al., 2006]. These descriptors encode local features of a chemical structure, being calculated on the basis of properties of single elements in a molecule, such as bonds. [Pg.553]

Reactivity indices are molecular descriptors encoding information about the behavior of molecules in chemical reactions and are usually categorized as either electrophilic indices or nucleophilic indices, depending on whether the reaction of interest involves electrophilic or nucleophilic attack. Moreover, static reactivity indices, such as charges, describe isolated molecules in their ground state, while dynamic reactivity indices refer to molecules in their transition states during a reaction. [Pg.638]

Scoring functions are molecular descriptors encoding information about drug-likeness of compounds [Oprea, Gottfries et al., 2000; Walters and Murcko, 2002; Muegge, 2003; Leach, Hann et al., 2006]. Like the closely related property filters, they are mainly applied in the design... [Pg.662]

PathFinder fingerprints are vectorial descriptors encoding information about the molecular shape starting from a surface representation based on molecular interaction fields (MIFs) [McLay, Hann et al., 2006],... [Pg.691]

The Estrada index is a molecular descriptor encoding information on complexity of molecular graphs and is also used to describe characteristic features of complex systems of physicochemical interest, such as reaction, metabolic, and protein-protein interaction networks. [Pg.718]
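
As an illustration, the sketch below evaluates the Estrada index from an adjacency matrix. The excerpt does not state a formula; the code assumes the usual definition, EE = sum over i of exp(lambda_i), where lambda_i are the eigenvalues of the adjacency matrix A. It uses the equivalent series EE = sum over k of trace(A^k)/k!, in which trace(A^k) counts the closed walks of length k. The function name and the `terms` cut-off are assumptions, and the truncated series is only suitable for small graphs.

```python
def estrada_index(adjacency, terms=40):
    """Estrada index via the truncated series sum_k trace(A^k) / k!."""
    n = len(adjacency)
    # Start from A^0, the identity matrix.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = 0.0
    factorial = 1.0
    for k in range(terms):
        if k > 0:
            # Advance from A^(k-1) to A^k and from (k-1)! to k!.
            power = [
                [sum(power[i][m] * adjacency[m][j] for m in range(n)) for j in range(n)]
                for i in range(n)
            ]
            factorial *= k
        total += sum(power[i][i] for i in range(n)) / factorial
    return total


# Two bonded atoms: eigenvalues are +1 and -1, so EE = e + 1/e.
single_bond = [[0, 1],
               [1, 0]]
ee_bond = estrada_index(single_bond)

# One isolated vertex: the only eigenvalue is 0, so EE = 1.
ee_vertex = estrada_index([[0]])
```

Because short closed walks are weighted most heavily by the 1/k! factor, the index is dominated by local structure such as branching and small rings.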

Triplets of Pharmacophoric Point descriptors (or TOPP descriptors) encode information on presence/absence or occurrence frequencies of 3-point pharmacophores derived from molecular interaction fields (MIFs) [Sciabola, Morao et al., 2007; Lamanna, Catalano et al., 2007]. [Pg.778]

FLAP fingerprints (FLAP stands for Fingerprints for Ligands and Proteins) are vectorial descriptors encoding information about 3- and 4-point pharmacophores [Perruccio, Mason et al., 2006; Baroni, Cruciani et al., 2007]. The theory underpinning FLAP is similar to that of the TOPP descriptors. [Pg.780]
