Boxplots, datasets

FIGURE 1.2 A boxplot comparison of Log S for the three datasets studied in this chapter. [Pg.5]

TABLE 1.1 Boxplot Statistics for the Three Datasets Studied in This Chapter... [Pg.6]

Once we have calculated, for each test set structure, its maximum similarity to any training set structure, we can use boxplots to compare the similarities of our test sets. Listing 7 provides the R code for reading the similarity data, assigning labels to the datasets, and plotting boxplots. [Pg.13]
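Listing 7 itself is not reproduced in this excerpt. As a rough illustration of the same steps (read the similarity values, label them by dataset, and compute the statistics a boxplot is drawn from), here is a minimal Python sketch using only the standard library. The column names `dataset` and `max_similarity`, the labels `A` and `B`, and all values are made up for illustration and are not taken from the original data.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical input: one row per test-set compound, holding a dataset label
# and the compound's maximum similarity to any training-set structure.
# The labels and numbers below are invented for this sketch.
SAMPLE_CSV = """dataset,max_similarity
A,0.90
A,0.80
A,0.70
A,1.00
B,0.50
B,0.60
B,0.55
B,0.65
"""


def read_similarities(text):
    """Group max-similarity values by dataset label."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row["dataset"]].append(float(row["max_similarity"]))
    return dict(groups)


def summarize(values):
    """Return the summary statistics behind a boxplot, plus the mean."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.fmean(values),
    }


groups = read_similarities(SAMPLE_CSV)
summary = {label: summarize(vals) for label, vals in groups.items()}
for label, stats in summary.items():
    print(label, {k: round(v, 3) for k, v in stats.items()})
```

Comparing the median and mean per label in `summary` mirrors the comparison made in the text; a plotting library would draw the boxes from the same quartiles.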

The JCIM set is less similar to the training set. The median is 0.62 and the mean is 0.74. The boxplot and higher mean indicate that there are a number of compounds in this set that are similar to the training set. The absence of whiskers in the boxplot indicates that the similarities are more narrowly distributed. We would expect moderate performance from this dataset. [Pg.14]

The PubChem dataset appears to be very different from the training set. Note that the boxplot is almost flat, with a few outliers drawn as circles. The mean and median similarities to the training set are both 0.56. Similarity values in this range are what we would tend to expect from pairs of random compounds. We would expect extremely poor performance from this dataset. [Pg.14]
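To see how a boxplot can end up almost flat, with no visible whiskers and a few outliers, the following Python sketch applies the common 1.5 × IQR convention for whiskers and outliers. This is an assumption about the plotting convention, not something stated in the excerpt, and the values are made up (chosen to be exact in binary floating point so the zero-IQR case is clean).

```python
import statistics


def box_stats(values, k=1.5):
    """Quartiles, whisker ends, and outliers under the k * IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence = q1 - k * iqr
    hi_fence = q3 + k * iqr
    # Whiskers extend to the most extreme points that lie inside the fences.
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }


# Invented values: most similarities are identical, a few are higher.
flat = [0.5] * 8 + [0.75, 1.0]
stats = box_stats(flat)
print(stats)
```

Here the quartiles coincide, so the box collapses to a line, the whiskers have zero length, and the two higher values fall outside the fences and would be drawn as individual points.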
