Descriptor validation studies

Matter H, Selecting optimally diverse compounds from structural databases: A validation study of two-dimensional and three-dimensional molecular descriptors, J. Med. Chem., 40, 1219-1229, 1997. [Pg.366]

Many different methods have been developed both to measure diversity and to select diverse sets of compounds; however, there is currently no clear picture of which methods are best. To date, some work has been done on comparing the various methods; however, there is a great need for more validation studies to be performed both on the structural descriptors used and on the different compound selection strategies that have been devised. In some cases, the characteristics of the library itself might determine the choice of descriptors and the compound selection methods that can be applied. For example, computationally expensive methods such as 3D pharmacophore methods are limited in the size of libraries that can be handled. Thus, for product-based selection, they are currently restricted to handling libraries of tens of thousands of compounds rather than the millions that can be handled using 2D-based descriptors. [Pg.61]

Matter, H. (1997). Selecting Optimally Diverse Compounds from Structure Databases. A Validation Study of Two-Dimensional and Three-Dimensional Molecular Descriptors. J. Med. Chem., 40, 1219-1229. [Pg.613]

Given these numerous descriptors and selection techniques, the question arises whether chemical diversity can be correlated with biological diversity. This section summarizes our own and many published validation studies for selected diversity descriptors. Several physico-chemical descriptors were investigated in different studies to uncover the relationship between 2D/3D similarity and biological activity. [Pg.420]

Many different computational methods have been developed to assist in the design of combinatorial libraries. To date, very little has been done to compare the various methods. There is therefore a great need for more validation studies to be performed, both on the structural descriptors used and on the different compound selection strategies that have been devised. Once libraries have been synthesized and tested, a large amount of information is available that should enable much more effective libraries to be designed for that particular screen. However, current methods for QSAR are only able to handle datasets of a few hundred compounds [73], and structure-activity models are required that are able to handle the much larger datasets that are the result of combinatorial syntheses. [Pg.270]

The biological relevance of chemical descriptors was at the heart of the next stage of evolution in diversity evaluation. The similar property principle of Johnson and Maggiora states that structurally similar molecules are expected to show similar physical, chemical and biological properties. Patterson et al. extended this principle to neighbourhood behaviour: molecules close to each other in a defined property space should show similar physical, chemical and biological properties. Martin and co-workers described a validation study of a... [Pg.373]

H. Matter, J. Peptide Res., 52, 305 (1998). A Validation Study of Molecular Descriptors for the Rational Design of Peptide Libraries. [Pg.47]

Taken together, these results demonstrate in practice the reliability of the algebraic approach, both in explaining apparent statistical paradoxes and in overcoming them in a systematic, analytical manner. Still, a more general foundation for the algebraic QSAR approach, covering statistical descriptors, validity and predictability effects, should be studied at the purely mathematical level, regardless of the chemical or biological systems involved. [Pg.325]

The abbreviation QSAR stands for quantitative structure-activity relationships. QSPR means quantitative structure-property relationships. As the properties of an organic compound usually cannot be predicted directly from its molecular structure, an indirect approach is used to overcome this problem. In the first step, numerical descriptors encoding information about the molecular structure are calculated for a set of compounds. Secondly, statistical methods and artificial neural network models are used to predict the property or activity of interest, based on these descriptors or a suitable subset. A typical QSAR/QSPR study comprises the following steps: structure entry (or start from an existing structure database), descriptor calculation, descriptor selection, model building, model validation. [Pg.432]

When applied to QSAR studies, the activity of molecule u is calculated simply as the average activity of the K nearest neighbors of molecule u. An optimal K value is selected either by optimizing the classification of a test set of samples or by leave-one-out cross-validation. Many variations of the kNN method have been proposed in the past, and new and fast algorithms have continued to appear in recent years. The automated variable selection kNN QSAR technique optimizes the selection of descriptors to obtain the best models [20]. [Pg.315]
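The kNN prediction rule and the leave-one-out choice of K described in this excerpt can be illustrated with a minimal sketch. It assumes Euclidean distance in descriptor space and uses a small made-up dataset; the function names (knn_predict, loo_q2, select_k) and the data are hypothetical and not taken from the cited work. It does not cover the automated variable selection variant.

```python
import math

def knn_predict(train_X, train_y, query, k):
    """Predict activity as the mean activity of the k nearest training molecules."""
    # Assumption: Euclidean distance between descriptor vectors.
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

def loo_q2(X, y, k):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_total for a given k."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(X)):
        # Hold out molecule i and predict it from all the others.
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        press += (y[i] - knn_predict(rest_X, rest_y, X[i], k)) ** 2
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_total

def select_k(X, y, candidates):
    """Return the candidate k with the highest leave-one-out q2."""
    return max(candidates, key=lambda k: loo_q2(X, y, k))

# Hypothetical toy data: one descriptor per molecule, two activity clusters.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9]

if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        print(k, round(loo_q2(X, y, k), 3))
    print("selected k:", select_k(X, y, [1, 2, 3, 5]))
```

Selecting K by q2 on held-out molecules rather than by fit to the training set is what keeps the choice from simply favouring K = 1, where every training molecule is its own nearest neighbor.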

The literature of the past three decades has witnessed a tremendous explosion in the use of computed descriptors in QSAR. But it is noteworthy that this has exacerbated another problem: rank deficiency. This occurs when the number of independent variables is larger than the number of observations. Stepwise regression and other similar approaches, which are popularly used when there is a rank deficiency, often result in overly optimistic and statistically incorrect predictive models. Such models would fail in predicting the properties of future, untested cases similar to those used to develop the model. It is essential that subset selection, if performed, be done within the model validation step as opposed to outside of the model validation step, thus providing an honest measure of the predictive ability of the model, i.e., the true q² [39,40,68,69]. Unfortunately, many published QSAR studies involve subset selection followed by model validation, thus yielding a naive q², which inflates the predictive ability of the model. The following steps outline the proper sequence of events for descriptor thinning and LOO cross-validation, e.g.,... [Pg.492]
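The distinction between a naive and a true q² can be made concrete with a short sketch. It does not reconstruct the steps elided from the excerpt above; it only contrasts selecting a descriptor once on all the data with repeating the selection inside every leave-one-out fold. The selection rule (the single descriptor with the largest absolute Pearson correlation), the one-variable linear model, and all function names are assumptions made for illustration.

```python
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / (va * vb) ** 0.5

def best_descriptor(X, y):
    """Index of the descriptor column with the largest |correlation| to y."""
    return max(range(len(X[0])),
               key=lambda j: abs(pearson([row[j] for row in X], y)))

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    slope = 0.0 if sxx == 0 else sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def loo_q2(X, y, select_inside):
    """LOO q2; select_inside=True repeats descriptor selection in every fold."""
    mean_y = sum(y) / len(y)
    # Naive variant: the descriptor is chosen once, using the held-out cases too.
    fixed_j = None if select_inside else best_descriptor(X, y)
    press = 0.0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        j = best_descriptor(train_X, train_y) if select_inside else fixed_j
        slope, intercept = fit_line([row[j] for row in train_X], train_y)
        press += (y[i] - (slope * X[i][j] + intercept)) ** 2
    return 1.0 - press / sum((v - mean_y) ** 2 for v in y)

if __name__ == "__main__":
    # Hypothetical rank-deficient case: 200 pure-noise descriptors, 20 observations.
    rng = random.Random(0)
    X = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(20)]
    y = [rng.gauss(0, 1) for _ in range(20)]
    print("naive q2:", loo_q2(X, y, select_inside=False))
    print("true  q2:", loo_q2(X, y, select_inside=True))
```

Because the descriptors in the demonstration are unrelated to the activity, any apparent predictive ability in the naive figure comes from the selection step having already seen the held-out observations.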
