Solubility datasets

Huuskonen and coworkers [11] published a study on 211 drugs, all of which were solids. The solubility data were taken from the literature. We will discuss this work further in Section 10.7. Klopman et al. [16] published a water-solubility study on 483 compounds taken from the literature, but without the accompanying Supplementary Material it is difficult to determine the physical states of the substances they studied. [Pg.234]

McFarland et al. [1] recently published the results of studies carried out on 22 crystalline compounds. Their water solubilities were determined using pSOL [21], an automated instrument employing the pH-metric method described by Avdeef and coworkers [22]. This technique ensures that the thermodynamic equilibrium solubility is what is measured. Although only ionizable compounds can be measured by this method, their solubilities are expressed as the molarity of the un-ionized molecular species, the intrinsic solubility, S0. This avoids confusion about the dependence of a compound's overall solubility on pH. Thus S0 is analogous to P, the octanol/water partition coefficient: in both cases, the ionized species are implicitly factored out. To use pSOL, one must know the relevant pKa values; therefore, in principle, one can compute the total solubility of a compound over an entire pH range. However, the intrinsic solubility will be our focus here. There was one zwitterionic compound in this dataset. To obtain the best results, this compound was represented as the zwitterion rather than the neutral form in the HYBOT [23] calculations. [Pg.234]
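As a rough illustration of how known pKa values let one move from S0 to the total solubility at any pH, the sketch below assumes a monoprotic acid or base that obeys the Henderson-Hasselbalch relation, with only the neutral species limiting solubility (no salt precipitation, no aggregation). The function name `total_solubility` is hypothetical and is not part of pSOL; multiprotic and zwitterionic compounds would need additional terms.

```python
def total_solubility(s0, pka, ph, kind):
    """Total solubility S(pH) in mol/L of a monoprotic compound.

    Hypothetical helper. Assumes Henderson-Hasselbalch behaviour and that
    only the neutral species (intrinsic solubility s0) saturates, i.e. no
    salt precipitates over the pH range of interest.
    """
    if kind == "acid":
        # HA <-> A- + H+: the ionized fraction grows as pH rises above pKa
        return s0 * (1.0 + 10.0 ** (ph - pka))
    if kind == "base":
        # BH+ <-> B + H+: the ionized fraction grows as pH falls below pKa
        return s0 * (1.0 + 10.0 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")
```

For a hypothetical acid with S0 = 1e-5 mol/L and pKa = 4.5, `total_solubility(1e-5, 4.5, 6.5, "acid")` gives S0 × (1 + 10²) = 1.01e-3 mol/L, and at pH = pKa the total is exactly 2 × S0. This is why S0, not the pH-dependent total, is the quantity that parallels P.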

Forward stepwise MLR [24] was used to analyze the data from the solubility investigations that follow. Q2 was computed with the MODDE software [25]. The results from the study reported in [1] are summarized in Eq. (2). [Pg.235]
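The two ingredients named above, forward stepwise MLR and Q2, can be sketched in plain Python. This is a minimal sketch, not MODDE's implementation: Q2 is taken here as 1 − PRESS/SS, with PRESS from leave-one-out refits, and the forward step adds whichever descriptor most improves Q2. That entry criterion is an assumption; classical stepwise MLR often uses F-tests instead. All function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fit(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def loo_q2(X, y):
    """Q2 = 1 - PRESS / SS_tot, where PRESS comes from leave-one-out refits."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coef = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_tot


def forward_stepwise(X, y, max_terms):
    """Greedily add the descriptor (column index) that most improves Q2."""
    chosen, best_q2 = [], float("-inf")
    n_desc = len(X[0])
    while len(chosen) < max_terms:
        candidates = []
        for j in range(n_desc):
            if j in chosen:
                continue
            cols = chosen + [j]
            sub = [[row[k] for k in cols] for row in X]
            candidates.append((loo_q2(sub, y), j))
        if not candidates:
            break
        q2, j = max(candidates)
        if q2 <= best_q2:  # stop when no candidate improves Q2
            break
        chosen.append(j)
        best_q2 = q2
    return chosen, best_q2
```

Capping `max_terms` mirrors the point made for the 22-compound set: with few observations, only a small number of terms can be supported.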

With only 22 compounds available, the data could support no more than a three-term correlation equation. Since this work was published, the solubilities of an additional 36 compounds have been reported to us. Many of these were determined [27] in the same laboratories as those cited in [1], but 21 were determined in the laboratory of Artursson [28] and six in the laboratories of Faller and Wohnsland [29]. Although this introduces some inter-laboratory variability, we believe it is minimal because the same automated method was used on comparable commercial pSOL instruments. The new dataset contains six zwitterionic compounds; as before, their structures were entered in their zwitterionic forms for the HYBOT calculations. When they were entered in their neutral forms, poorer results were obtained in the subsequent MLR analysis. [Pg.235]

Using the solubilities of these compounds and the same set of descriptors as given in [1], the correlation summarized in Eq. (3) was obtained. [Pg.235]

The differences can also be seen in Figure 15.4, which compares cumulative distributions of these properties between the Gasteiger and Roche datasets. Many of the literature compounds are very simple, with low molecular weight, few polar atoms, and few functional groups. They have often been included in solubility datasets because they are well characterized and because accurate solubility data are available for them, rather than because they are druglike. The inclusion of many such simple compounds in a training set for a solubility prediction tool may focus the tool on an area of chemistry space that is not well populated with druglike molecules and may make the tool less useful for the prediction of the solubility... [Pg.389]
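A comparison like the one in Figure 15.4 can be quantified by computing each dataset's empirical cumulative distribution and the largest vertical gap between the two curves, which is the two-sample Kolmogorov-Smirnov statistic. The KS summary is an illustrative addition, not something the text reports, and the helper names are hypothetical.

```python
from bisect import bisect_right


def ecdf(values):
    """Return F(t) = fraction of values <= t (empirical CDF)."""
    data = sorted(values)
    n = len(data)
    return lambda t: bisect_right(data, t) / n


def max_cdf_gap(a, b):
    """Largest |F_a(t) - F_b(t)| over all observed values: the two-sample
    Kolmogorov-Smirnov statistic (0 = identical, 1 = no overlap)."""
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(t) - fb(t)) for t in set(a) | set(b))
```

Applied to, say, molecular weights of literature compounds versus in-house druglike compounds, a gap near 1 would indicate that the two sets barely overlap in that property.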

GPSVS2 explains water solubility to a lesser degree. Because most efforts to model solubility have focused on its relationship with logP [9], GPSVS2 was compared not only with Abraham's solubility dataset [32] (794 compounds) but also with the Pomona logP dataset [63] (7954 compounds). GPSVS2 scores correlated directly with measured logP values (P = 0.61) and with water solubility (P = 0.68). [Pg.259]

Table 16-2 Summary of different methods and models for the Huuskonen aqueous solubility dataset... [Pg.1025]

Fig. 16-5 Model of the Huuskonen aqueous solubility dataset using PLS [34]. Triangles - training set, circles - test set. The plot shows the "deceptively good...

In addition to confounding experimental factors, a number of published solubility models are somewhat misleading due to a lack of proper computational controls. While we sometimes have limited control over the experimental data used to build models, we have complete control over the way models are evaluated and should always employ appropriate means of evaluating our models. In subsequent sections, we will use solubility datasets to examine some of these control strategies. [Pg.3]
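One widely used computational control is y-randomization (response scrambling): refit the model after shuffling the measured log S values and check that the scrambled fits are clearly worse than the real one. The sketch below uses a one-descriptor linear fit for brevity; it is an illustrative choice and does not necessarily match the controls examined in this chapter. For stepwise models, descriptor selection should be repeated inside every scrambled trial. Function names are hypothetical.

```python
import random


def r_squared(x, y):
    """r^2 of a one-descriptor least-squares line (squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_scramble(x, y, n_trials=1000, seed=0):
    """Return the true r^2 and the fraction of fits on permuted responses
    that match or beat it (a large fraction signals chance correlation)."""
    rng = random.Random(seed)
    true_r2 = r_squared(x, y)
    hits = 0
    for _ in range(n_trials):
        shuffled = list(y)
        rng.shuffle(shuffled)
        if r_squared(x, shuffled) >= true_r2:
            hits += 1
    return true_r2, hits / n_trials
```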

In this chapter, we will consider three different, publicly available, solubility datasets. [Pg.3]

The Huuskonen Dataset. This set of 1274 experimental solubility values (log S) was one of the first large solubility datasets published [15,16] and has subsequently been used in a number of other publications [14,17]. The data in this set were extracted from the AQUASOL [18, 19] database, compiled by the Yalkowsky group at the... [Pg.3]

Aqueous solubility is selected to demonstrate the application of E-state indices in QSPR studies. Huuskonen et al. modeled the aqueous solubility of 734 diverse organic compounds with multiple linear regression (MLR) and artificial neural network (ANN) approaches [27]. The set of structural descriptors comprised 31 E-state atomic indices and three indicator variables for pyridine, aliphatic hydrocarbons, and aromatic hydrocarbons, respectively. The dataset of 734 chemicals was divided into a training set (n = 675), a validation set (n = 38), and a test set (n = 21). A comparison of the MLR results (training, r² = 0.94, s = 0.58; validation, r² = 0.84, s = 0.67; test, r² = 0.80, s = 0.87) and the ANN results (training, r² = 0.96, s = 0.51; validation, r² = 0.85, s = 0.62; test, r² = 0.84, s = 0.75) indicates a small improvement for the neural network model with five hidden neurons. These QSPR models may be used for fast and reliable computation of the aqueous solubility of diverse organic compounds. [Pg.93]
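Statistics like the r² and s quoted above can be computed from observed and predicted log S values as follows. This is a sketch under stated assumptions: r² is taken as the squared Pearson correlation, and s uses n − k − 1 degrees of freedom for a training set with k fitted descriptors (for the MLR above, k = 34) but n for external sets. Papers differ on both conventions, so the original study may have used others. The function name is hypothetical.

```python
import math


def fit_statistics(observed, predicted, n_descriptors=None):
    """Return (r2, s) for observed vs. predicted values.

    r2: squared Pearson correlation between observed and predicted.
    s:  residual standard deviation, sqrt(SSE / dof), with
        dof = n - k - 1 if k = n_descriptors is given (training set),
        otherwise dof = n (an RMSE, for validation/test sets).
    """
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    r2 = sxy * sxy / (sxx * syy)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    dof = n - n_descriptors - 1 if n_descriptors is not None else n
    return r2, math.sqrt(sse / dof)
```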

Delaney [4,14] and Klamt [16] argued that for datasets of drug-like compounds only about 20% of the variance of log S arises from ΔGfus. This is further confirmed by the study of Wassvik et al. [15], in which 77% of the variance is due to the solubility of the supercooled liquid. Hence, even when ΔGfus is estimated only crudely, from mean values or by QSAR approaches, we can reasonably expect that the inaccuracy introduced into drug solubility prediction by our theoretical ignorance of ΔGfus is smaller than, or at least not much larger than, the inaccuracies introduced by the estimates of the larger part, i.e. the liquid solubility, and by the experimental difficulties of solubility measurement. [Pg.291]
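The two-part split discussed above follows from the standard thermodynamic relation log S = log S_liquid − ΔG_fus / (RT ln 10), where S_liquid is the solubility of the supercooled liquid and ΔG_fus is the free energy of fusion, i.e. the crystal contribution. The sketch below implements only that relation. The `max(0, ...)` clipping, which treats a negative ΔG_fus (a compound above its melting point) as no penalty, is an assumed convention, and the function name is hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol K)


def log_s_solid(log_s_liquid, dg_fus_kj, temp_k=298.15):
    """log10 solubility of a crystalline solid from the log10 solubility of
    its supercooled liquid and the free energy of fusion (kJ/mol).

    Only a positive dg_fus lowers solubility (assumed clipping convention).
    """
    return log_s_liquid - max(0.0, dg_fus_kj) / (R_KJ * temp_k * math.log(10))
```

At 25 °C, RT ln 10 is about 5.71 kJ/mol, so each 5.71 kJ/mol of ΔG_fus lowers log S by one unit relative to the supercooled liquid.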

Refinement of a QSPR model requires experimental solubilities to train the model. Several models have used the dataset of Huuskonen [44], who sourced experimental data from the AQUASOL [45] and PHYSPROP [46] databases. The original set contained a small number of duplicates, which have been removed in most subsequent studies using this dataset, leaving 1290 compounds. When combined, the log Sw... [Pg.302]
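Duplicate removal of the kind described above can be sketched as grouping records by a structure identifier and keeping either the first value or the mean. The sketch assumes the identifiers are already canonicalized (for example, canonical SMILES produced by a cheminformatics toolkit), so identical strings mean identical structures. It does not reproduce the procedure of any particular study, and the identifiers and values in the test are made up.

```python
from statistics import mean


def deduplicate(records, how="mean"):
    """Collapse repeated compounds in a list of (identifier, log_s) pairs.

    how="first" keeps the first reported value; how="mean" averages repeats.
    Output order follows first appearance (dicts preserve insertion order).
    """
    groups = {}
    for ident, log_s in records:
        groups.setdefault(ident, []).append(log_s)
    if how == "first":
        return [(k, v[0]) for k, v in groups.items()]
    if how == "mean":
        return [(k, mean(v)) for k, v in groups.items()]
    raise ValueError("how must be 'first' or 'mean'")
```

Averaging hides discordant repeats, so in practice one may also want to flag groups whose values disagree by more than the expected experimental error before merging them.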

As a key first step towards oral absorption, considerable effort has been directed towards the development of computational solubility prediction [26-30]. However, partly due to a lack of large experimental datasets measured under identical conditions, today's methods are not sufficiently robust for reliable predictions [31]. Nonetheless, further fine-tuning of these models can be expected now that high-throughput data have become available for their construction. [Pg.7]

Are the equilibrium constants for the important reactions in the thermodynamic dataset sufficiently accurate? The collection of thermodynamic data is subject to error in the experiment, chemical analysis, and interpretation of the experimental results. Error margins, however, are seldom reported and never seem to appear in data compilations. Compiled data, furthermore, have generally been extrapolated from the temperature of measurement to that of interest (e.g., Helgeson, 1969). The stabilities of many aqueous species, for example, have been determined only at room temperature, whereas mineral solubilities are often measured at high temperatures, where reactions approach equilibrium most rapidly. Evaluating the stabilities, and sometimes even the stoichiometries, of complex species is especially difficult and prone to inaccuracy. [Pg.24]
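The simplest form of such a temperature extrapolation is the integrated van't Hoff equation, which assumes the reaction enthalpy ΔH is constant: log K(T) = log K(Tref) − ΔH/(R ln 10) × (1/T − 1/Tref). Compilations such as Helgeson's use more elaborate treatments that include heat-capacity terms. The constant-ΔH form below is a deliberately simplified stand-in, and its error grows with the size of the temperature step, which is one source of the inaccuracy described above. The function name is hypothetical.

```python
import math

R_J = 8.314462618  # gas constant, J/(mol K)


def log_k_vant_hoff(log_k_ref, delta_h_j, t_ref_k, t_k):
    """Extrapolate log10 K from t_ref_k to t_k (kelvin), assuming a
    temperature-independent reaction enthalpy delta_h_j (J/mol)."""
    return log_k_ref - delta_h_j / (R_J * math.log(10)) * (1.0 / t_k - 1.0 / t_ref_k)
```

For an endothermic reaction (ΔH > 0), log K increases with temperature. For an exothermic one, it decreases.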

To explore the differences between the methods, we use SpecE8 to calculate the solubility of gypsum (CaSO4·2H2O) at 25 °C as a function of NaCl concentration. We use two datasets: thermo.dat, which invokes the B-dot equation, and thermo_hmw.dat, based on the HMW model. The log K values for the gypsum dis-... [Pg.130]
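To show what the B-dot equation does in this calculation, here is a deliberately stripped-down sketch. It is not SpecE8's algorithm. It assumes approximate 25 °C B-dot parameters (A ≈ 0.5092, B ≈ 0.3283 with ion size in ångströms, Ḃ ≈ 0.041), assumed ion sizes of 6 Å for Ca²⁺ and 4 Å for SO₄²⁻, and an assumed gypsum log K of about −4.58. It also ignores ion pairs such as CaSO4(aq), sets water activity to 1, and iterates on ionic strength until the molality converges. The numbers are therefore only qualitative, and all names are hypothetical.

```python
import math

# Approximate B-dot parameters at 25 C (assumed values; ion size a in angstroms)
A, B, BDOT = 0.5092, 0.3283, 0.041


def log_gamma_bdot(z, a, ionic_strength):
    """B-dot (extended Debye-Hueckel) log10 activity coefficient."""
    sqrt_i = math.sqrt(ionic_strength)
    return -A * z * z * sqrt_i / (1.0 + a * B * sqrt_i) + BDOT * ionic_strength


def gypsum_solubility(m_nacl, log_k=-4.58, a_ca=6.0, a_so4=4.0, tol=1e-12):
    """Molality of dissolved gypsum in a NaCl solution of molality m_nacl.

    Solves K = m^2 * gamma_Ca * gamma_SO4 (water activity taken as 1, no ion
    pairs) by fixed-point iteration on ionic strength. Illustrative only.
    """
    k = 10.0 ** log_k
    m = math.sqrt(k)  # ideal-solution first guess
    for _ in range(200):
        ionic_strength = 4.0 * m + m_nacl  # 0.5 * (4m + 4m + m_nacl + m_nacl)
        log_g = (log_gamma_bdot(2, a_ca, ionic_strength)
                 + log_gamma_bdot(-2, a_so4, ionic_strength))
        m_new = math.sqrt(k / 10.0 ** log_g)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

Even this crude version reproduces the salting-in trend: added NaCl raises the ionic strength, lowers the activity coefficients of Ca²⁺ and SO₄²⁻, and so increases the computed gypsum solubility.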

One problem highlighted by several reviewers [14,20] is that datasets like the Huuskonen set cover unnecessarily large ranges of solubility. The Huuskonen set covers the range of log S (log of solubility in mol/L) from −11.62 to +1.58, which converts to approximately 9.6 × 10⁻⁷ to 1.5 × 10⁷ µg/mL for a MW of 400 Da. [Pg.453]
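The conversion quoted above is a short chain of unit changes: 10^logS mol/L × MW g/mol gives g/L, and 1 g/L = 1 mg/mL = 1000 µg/mL. The helper name below is hypothetical.

```python
def log_s_to_ug_per_ml(log_s, mw):
    """Convert log10 solubility (mol/L) to micrograms per mL for molar mass mw
    (g/mol): mol/L * g/mol = g/L, and 1 g/L = 1 mg/mL = 1000 ug/mL."""
    return 10.0 ** log_s * mw * 1000.0
```

With MW = 400, `log_s_to_ug_per_ml(-11.62, 400)` is about 9.6e-7 µg/mL and `log_s_to_ug_per_ml(1.58, 400)` is about 1.5e7 µg/mL, matching the range quoted for the Huuskonen set.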

Fig. 15.2 Predicted versus observed solubility values for 493 compounds in the Gasteiger dataset [33]; r² = 0.93.

Tab. 15.1 r² values for QMPRPlus solubility predictions for various datasets... [Pg.385]

Fig. 15.3 Predicted versus observed solubility values for 1526 compounds in the Roche dataset; r² = 0.17.

Votano, J.R., Parham, M., Hall, L.H., Kier, L.B., Hall, L.M. Prediction of aqueous solubility based on large datasets using several QSPR models utilizing topological structure representation. Chem. Biodivers. 2004, 1, 1829-41. [Pg.125]

Big Chemical Encyclopedia
